25 commits
27ee979
Create a dummy app for openapi web search project
Apr 1, 2023
185b6d8
fix port number
Apr 1, 2023
b7421c5
fix github cache
Apr 1, 2023
a8adced
Created a default Sails.js server with no frontend in the src/server …
Jun 10, 2023
ce75f4c
start implementating crawling controller - create fasade design patte…
Jun 15, 2023
47a02de
Fix the typo in whole codebase: Fasade to Facade
Jun 15, 2023
3484876
Reorganize the directory structure through refactoring and rewrite th…
Jun 20, 2023
c48cd21
write batch processing for common crawl directories and implement a …
Jun 22, 2023
a3a6319
Implement backoff for retriving index files URLs from CC server.
Jun 26, 2023
ef9181d
Refactor code for improved readability and maintainability
Jun 30, 2023
5427bd8
Implementing Queue-Based Architecture of Downloading Index Files from…
Jul 9, 2023
b078e88
refactor the both controller and error handling, Fix 503 error by fix…
Jul 26, 2023
cf908c6
Delete couple unnecessary files
Jul 27, 2023
223ec43
Add jsDoc to every function and create a new file ConsumeMessagesFrom…
Jul 27, 2023
4375c08
Restructure project directories, removing VS Code folder and refactor…
Jul 28, 2023
f016cb0
write tests for both controllers and remove code for storing openapi …
Aug 5, 2023
3f097c0
Introduce controller tests, implement validation service, and perform…
Aug 14, 2023
dbf1deb
Merge pull request #9 from priyanshu-kun/feature/tests
vinitshahdeo Aug 27, 2023
7905fca
completed openapi web search project server.
Aug 31, 2023
a1dce38
Fix common crawl server bug.
Sep 23, 2023
eb3e4e8
update readme file
Sep 27, 2023
4b2cf66
Fix markdown in readme file
Sep 27, 2023
8190817
Fix markdown and add postman collection
Sep 27, 2023
8490a0a
Update README file
Sep 27, 2023
12417c9
Merge pull request #11 from priyanshu-kun/priyanshu-kun/fix-cc-bug
HimanshuS129 Nov 14, 2023
62 changes: 49 additions & 13 deletions README.md
@@ -22,25 +22,61 @@ The goal of this project can be achieved with the following milestones:
4. **Providing an interface**: Design a UI for API consumers and producers to initiate a search looking for APIs. Initially, the search can be done using metadata—the info object of the [OpenAPI document](https://spec.openapis.org/oas/latest.html#info-object).
5. **Updating dataset**: Regularly update the crawl results and re-index them for better search results.

## Info about GSoC’23

> **Note**: This project idea is shortlisted for [Google Summer Of Code 2023](https://blog.postman.com/join-postman-at-google-summer-of-code-2023/). Find the initial conversation [here](https://github.com/postman-open-technologies/gsoc-2023/issues/7).
## Running the Server

> Fork and/or clone the OpenAPI Web Search repo and change directory into it:

```sh
git clone https://github.com/<username>/openapi-web-search.git
cd openapi-web-search/src/server
```

> Install dependencies via yarn:

```sh
yarn install
```

> Start local server:

```sh
yarn run dev
```

> After launching the local server, use Postman to send HTTP requests to the endpoints listed below. A Postman collection (`open api web search.postman_collection.json`) is included in the root of the project to get you started.


> After importing the collection into Postman, call the following endpoints in the order listed:

```text
1. http://localhost:1337/api/v1/run/crawler?latest=true
2. http://localhost:1337/api/v1/process/index-files?skip=0&limit=20&sort=aes
3. http://localhost:1337/api/v1/indexing
4. http://localhost:1337/api/v1/search?q=<query>
```

> Explanation:

1. The first endpoint crawls the Common Crawl website to fetch the files that list the paths to the index files; those paths are then converted into the appropriate endpoints.
2. The second endpoint starts a background process that downloads the index files, processes them, and stores the results (validated OpenAPI definitions) in MongoDB.
3. The third endpoint indexes the results previously stored in MongoDB into Elasticsearch.
4. The last endpoint runs a search query against the indexed data for optimal retrieval.
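
The four calls above can also be scripted instead of being sent by hand from Postman. The following is a minimal sketch, not part of the project: `BASE_URL`, `buildPipelineRequests`, and `runPipeline` are hypothetical names, the HTTP methods and request body are copied from the Postman collection in this PR, and the global `fetch` assumes Node.js 18 or newer.

```javascript
// Hypothetical helper script; none of these names exist in the repository.
const BASE_URL = 'http://localhost:1337/api/v1';

// Build the four requests in the order the README describes.
// Defaults mirror the query parameters used in the Postman collection.
function buildPipelineRequests(query, { skip = 0, limit = 20, sort = 'aes' } = {}) {
  const processParams = new URLSearchParams({
    skip: String(skip),
    limit: String(limit),
    sort
  });
  const searchParams = new URLSearchParams({ q: query });

  return [
    // 1. Crawl Common Crawl for the index file paths (POST in the collection).
    {
      method: 'POST',
      url: `${BASE_URL}/run/crawler?latest=true`,
      body: { dataSource: 'commonCrawl' }
    },
    // 2. Start downloading and processing the index files.
    { method: 'GET', url: `${BASE_URL}/process/index-files?${processParams}` },
    // 3. Index the stored results into Elasticsearch.
    { method: 'GET', url: `${BASE_URL}/indexing` },
    // 4. Search the indexed definitions.
    { method: 'GET', url: `${BASE_URL}/search?${searchParams}` }
  ];
}

// Send the requests one after another and log each HTTP status code.
async function runPipeline(query) {
  for (const { method, url, body } of buildPipelineRequests(query)) {
    const response = await fetch(url, {
      method,
      headers: body ? { 'Content-Type': 'application/json' } : undefined,
      body: body ? JSON.stringify(body) : undefined
    });
    console.log(method, url, response.status);
  }
}
```

With the server running, `runPipeline('austria').catch(console.error);` would send all four requests. Because the second endpoint only starts a background process, a response from it does not mean processing has finished, so in practice the indexing and search calls may need to wait until the downloads are complete.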



If you're an aspiring GSoC candidate, here's what you should know:

- Since the purpose of this project is the **discovery of APIs from lesser-known sources**, crawling is where you will spend a good chunk of time.
- The proposal should expand on each milestone mentioned in the above section. We understand that completing all the milestones within the 12 weeks of the GSoC period may not be feasible. We can figure it out based on the timeline provided.
- There is no restriction on the choice of language, framework, or tools for building the solution for Open API Web Search.
- We really don’t believe in reinventing the wheel. Feel free to use an existing solution like [Common Crawl](https://commoncrawl.org/).
- For any concerns, kindly reach out to [@vinitshahdeo](https://github.com/vinitshahdeo) or [@MikeRalphson](https://github.com/MikeRalphson).

#### Qualifying task

As mentioned in [`CONTRIBUTOR_GUIDANCE.md`](https://github.com/postman-open-technologies/gsoc-2023/blob/main/CONTRIBUTOR_GUIDANCE.md), please refer to **[#2](https://github.com/postman-open-technologies/openapi-web-search/issues/2)** for the qualifying task.


## Contact

If you have any questions or queries, please [create an issue](openapi-web-search) on this repo (with a prefix GSoC 2023), start a topic on [our community forums in the GSoC category](https://community.postman.com/c/open-technology/gsoc/42) or send an email to us at gsoc@postman.com.

[![Twitter](https://img.shields.io/badge/Twitter-%40getpostman-orange?logo=twitter&logoColor=white)](https://twitter.com/getpostman) [![YouTube](https://img.shields.io/badge/YouTube-%40postman-orange?logo=youtube)](https://www.youtube.com/c/postman)
131 changes: 131 additions & 0 deletions open api web search.postman_collection.json
@@ -0,0 +1,131 @@
{
  "info": {
    "_postman_id": "a19eee01-9a5f-4228-8353-c7fcc3615531",
    "name": "open api web search",
    "schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json",
    "_exporter_id": "20917353"
  },
  "item": [
    {
      "name": "start crawling",
      "request": {
        "method": "POST",
        "header": [],
        "body": {
          "mode": "raw",
          "raw": "{\n \"dataSource\": \"commonCrawl\"\n}",
          "options": {
            "raw": {
              "language": "json"
            }
          }
        },
        "url": {
          "raw": "http://localhost:1337/api/v1/run/crawler?latest=true",
          "protocol": "http",
          "host": [
            "localhost"
          ],
          "port": "1337",
          "path": [
            "api",
            "v1",
            "run",
            "crawler"
          ],
          "query": [
            {
              "key": "latest",
              "value": "true"
            }
          ]
        }
      },
      "response": []
    },
    {
      "name": "download-process-index-files",
      "request": {
        "method": "GET",
        "header": [],
        "url": {
          "raw": "http://localhost:1337/api/v1/process/index-files?skip=0&limit=20&sort=aes",
          "protocol": "http",
          "host": [
            "localhost"
          ],
          "port": "1337",
          "path": [
            "api",
            "v1",
            "process",
            "index-files"
          ],
          "query": [
            {
              "key": "skip",
              "value": "0"
            },
            {
              "key": "limit",
              "value": "20"
            },
            {
              "key": "sort",
              "value": "aes"
            }
          ]
        }
      },
      "response": []
    },
    {
      "name": "searchController",
      "request": {
        "method": "GET",
        "header": [],
        "url": {
          "raw": "http://localhost:1337/api/v1/search?q=austria",
          "protocol": "http",
          "host": [
            "localhost"
          ],
          "port": "1337",
          "path": [
            "api",
            "v1",
            "search"
          ],
          "query": [
            {
              "key": "q",
              "value": "austria"
            }
          ]
        }
      },
      "response": []
    },
    {
      "name": "indexController",
      "request": {
        "method": "GET",
        "header": [],
        "url": {
          "raw": "http://localhost:1337/api/v1/indexing",
          "protocol": "http",
          "host": [
            "localhost"
          ],
          "port": "1337",
          "path": [
            "api",
            "v1",
            "indexing"
          ]
        }
      },
      "response": []
    }
  ]
}
31 changes: 31 additions & 0 deletions src/server/.editorconfig
@@ -0,0 +1,31 @@
################################################
# ╔═╗╔╦╗╦╔╦╗╔═╗╦═╗┌─┐┌─┐┌┐┌┌─┐┬┌─┐
# ║╣ ║║║ ║ ║ ║╠╦╝│ │ ││││├┤ ││ ┬
# o╚═╝═╩╝╩ ╩ ╚═╝╩╚═└─┘└─┘┘└┘└ ┴└─┘
#
# > Formatting conventions for your Sails app.
#
# This file (`.editorconfig`) exists to help
# maintain consistent formatting throughout the
# files in your Sails app.
#
# For the sake of convention, the Sails team's
# preferred settings are included here out of the
# box. You can also change this file to fit your
# team's preferences (for example, if all of the
# developers on your team have a strong preference
# for tabs over spaces),
#
# To review what each of these options mean, see:
# http://editorconfig.org/
#
################################################
root = true

[*]
indent_style = space
indent_size = 2
end_of_line = lf
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
1 change: 1 addition & 0 deletions src/server/.eslintignore
@@ -0,0 +1 @@

85 changes: 85 additions & 0 deletions src/server/.eslintrc
@@ -0,0 +1,85 @@
{
  // ╔═╗╔═╗╦ ╦╔╗╔╔╦╗┬─┐┌─┐
  // ║╣ ╚═╗║ ║║║║ ║ ├┬┘│
  // o╚═╝╚═╝╩═╝╩╝╚╝ ╩ ┴└─└─┘
  // A set of basic code conventions designed to encourage quality and consistency
  // across your Sails app's code base. These rules are checked against
  // automatically any time you run `npm test`.
  //
  // > Note: If you're using mocha, you'll want to add an extra override file to your
  // > `test/` folder so that eslint will tolerate mocha-specific globals like `before`
  // > and `describe`.
  // Designed for ESLint v4.
  // - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  // For more information about any of the rules below, check out the relevant
  // reference page on eslint.org. For example, to get details on "no-sequences",
  // you would visit `http://eslint.org/docs/rules/no-sequences`. If you're unsure
  // or could use some advice, come by https://sailsjs.com/support.
  // - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

  "env": {
    "node": true
  },

  "parserOptions": {
    "ecmaVersion": 2018
  },

  "globals": {
    // If "no-undef" is enabled below, be sure to list all global variables that
    // are used in this app's backend code (including the globalIds of models):
    // - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    "Promise": true,
    "sails": true,
    "_": true
    // …and any others (e.g. `"Organization": true`)
    // - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  },

  "rules": {
    "block-scoped-var": ["error"],
    "callback-return": ["error", ["done", "proceed", "next", "onwards", "callback", "cb"]],
    "camelcase": ["warn", {"properties":"always"}],
    "comma-style": ["warn", "last"],
    "curly": ["warn"],
    "eqeqeq": ["error", "always"],
    "eol-last": ["warn"],
    "handle-callback-err": ["error"],
    "indent": ["warn", 2, {
      "SwitchCase": 1,
      "MemberExpression": "off",
      "FunctionDeclaration": {"body":1, "parameters":"off"},
      "FunctionExpression": {"body":1, "parameters":"off"},
      "CallExpression": {"arguments":"off"},
      "ArrayExpression": 1,
      "ObjectExpression": 1,
      "ignoredNodes": ["ConditionalExpression"]
    }],
    "linebreak-style": ["error", "unix"],
    "no-dupe-keys": ["error"],
    "no-duplicate-case": ["error"],
    "no-extra-semi": ["warn"],
    "no-labels": ["error"],
    "no-mixed-spaces-and-tabs": [2, "smart-tabs"],
    "no-redeclare": ["warn"],
    "no-return-assign": ["error", "always"],
    "no-sequences": ["error"],
    "no-trailing-spaces": ["warn"],
    "no-undef": ["off"],
    // - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    // ^^Note: If this "no-undef" rule is enabled (set to `["error"]`), then all model globals
    // (e.g. `"Organization": true`) should be included above under "globals".
    // - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    "no-unexpected-multiline": ["warn"],
    "no-unreachable": ["warn"],
    "no-unused-vars": ["warn", {"caughtErrors":"all", "caughtErrorsIgnorePattern": "^unused($|[A-Z].*$)", "argsIgnorePattern": "^unused($|[A-Z].*$)", "varsIgnorePattern": "^unused($|[A-Z].*$)" }],
    "no-use-before-define": ["error", {"functions":false}],
    "one-var": ["warn", "never"],
    "prefer-arrow-callback": ["warn", {"allowNamedFunctions":true}],
    "quotes": ["warn", "single", {"avoidEscape":false, "allowTemplateLiterals":true}],
    "semi": ["warn", "always"],
    "semi-spacing": ["warn", {"before":false, "after":true}],
    "semi-style": ["warn", "last"]
  }

}