Index Route
The index route (`/index`) is the primary route you'll be using from DocDex. This is what allows you to traverse all the stored data in a speedy, intuitive manner.
The index route accepts 4 parameters:
- `javadoc`: Which javadoc to search in
- `query`: What to search for
- `limit`: The maximum number of results to return (not guaranteed, default 5, max 10)
- `algorithm`: Which algorithm to use for the fuzzy search (`DUKE_JARO_WINKLER` by default, and that's the only algorithm atm)

The `javadoc` and `query` parameters are the only required ones, whereas `limit` and `algorithm` are optional, as they have defaults.
The `javadoc` parameter refers to one of the names you set for a particular javadoc object in the app's config.json. For example, by default, the jdk has two aliases set, `jdk` and `jdk11`, so for the `javadoc` parameter, you could use either of these values.
The query, while having lots going on behind the scenes, isn't too hard to use. First off, there are 3 kinds of objects you can search for (types, methods, and fields), and each of them can be searched for in more than one way. Here's a little table:
type | method | field |
---|---|---|
name | name | name |
fqn | fqn | fqn |
And for methods:
full params | type params | name params |
---|---|---|
name | name | name |
fqn | fqn | fqn |
Yeah so that's actually 12 ways to search, not 3, but I said 3 because it's kinda 3 if you think about it in the way I do, which I'm not going to delve into further because we're going off on a tangent at this point, let's get back on track.
A name is a name, not an fqn. An fqn is a name and a package. e.g.
- `String` = name, `java.lang.String` = fqn.
- `String~chars` = name, `java.lang.String~chars` = fqn.
- `String~getBytes(String charsetName)` = name full params, `java.lang.String~getBytes(String charsetName)` = fqn full params.
- `String~getBytes(String)` = name type params, `java.lang.String~getBytes(String)` = fqn type params.
- `String~getBytes(charsetName)` = name name params, `java.lang.String~getBytes(charsetName)` = fqn name params.
Hopefully those examples make it a bit more clear how it all works. By the way, queries are completely case insensitive, just a fun fact.
Now you're probably wondering what's with these little squiggly lines (tildes). Those are method separators. In the actual app, methods are separated with `#`, but because that symbol has special meaning in URLs, we simply replace it with `~` when querying the API. Additionally, because `%` also has special meaning in URLs, fields are separated with `-` instead of `%`.
/index?javadoc=1.16.4&query=commandexecutor
/index?javadoc=1.16.4&query=commandexecutor~oncommand
/index?javadoc=jdk&query=map~getordefault(key, defaultvalue)
/index?javadoc=1.16.4&query=material-birch_log
/index?javadoc=1.16.4&query=material-birch&limit=7
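If you'd rather build these URLs in code than by hand, here's a minimal Python sketch. The helper name `build_index_url` and the base URL are made up for illustration and aren't part of DocDex; the only things taken from this page are the `/index` route, its parameters, and the `#` to `~` and `%` to `-` separator swaps.

```python
from urllib.parse import urlencode


def build_index_url(base_url, javadoc, query, limit=None, algorithm=None):
    """Build an /index request URL (hypothetical helper, not part of DocDex)."""
    # Swap the in-app separators for the URL-friendly ones described above.
    api_query = query.replace("#", "~").replace("%", "-")
    params = {"javadoc": javadoc, "query": api_query}
    # limit and algorithm are optional, so only send them when given.
    if limit is not None:
        params["limit"] = limit
    if algorithm is not None:
        params["algorithm"] = algorithm
    # urlencode percent-encodes anything else that is unsafe in a URL.
    return f"{base_url}/index?{urlencode(params)}"


# "http://localhost:8080" is a placeholder for wherever your instance runs.
print(build_index_url("http://localhost:8080", "1.16.4", "Material%BIRCH_LOG", limit=7))
```

Queries are case insensitive, so there's no need to lowercase anything before sending it.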
Now, what do these results mean? If you want to use these results, you'll need to go look at the object structure, as that explains exactly what everything means. Actually, not quite everything: there's one thing the object structure doesn't explain, and that's the `name` field of the root objects in the result set. This name field exists for inheritance purposes. For example, if you query `player~getname` from the spigot javadoc on the official DocDex instance, the name of the documented object that is returned is `CommandSender#getName()`, because that's where Player inherits the getName method from. However, the name field of the root object is `player#getname()`. The name field is not equal to whatever you put in your query; it's equal to what's stored in memory in DocDex.
Assuming your query and javadoc are valid, even if there's no direct match, a result will always be returned due to the fuzzy search that kicks in if a perfect match isn't found. We don't use edit distances, so it'll just find whatever is closest.
DocDex is still initializing:
You'll get this message, along with a 503 status code if you attempt to query something while the API is still starting up.
Unkown javadoc: %blah%:
You'll get this message along with a 400 status code if you query from a non-existent javadoc.
Javadoc and/or query not provided:
You'll get this message along with a 400 status code if you do not provide the javadoc and/or query in the request.
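A client can treat the two status codes above differently: a 503 is worth retrying, while a 400 means the request itself needs fixing. Here's a rough Python sketch of that idea. The `fetch` callable is a stand-in for whatever HTTP client you use, and it's assumed to return a `(status, body)` pair; none of these names come from DocDex.

```python
import time


def query_with_retry(fetch, attempts=3, delay_seconds=1.0):
    """Call fetch() until DocDex has finished starting up (illustrative only)."""
    for attempt in range(attempts):
        status, body = fetch()
        if status == 503:
            # Still initializing: wait, then try again if attempts remain.
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
            continue
        if status == 400:
            # Unknown javadoc, or javadoc/query missing: retrying won't help.
            raise ValueError(body)
        # Assumption: any other status is treated as a successful response.
        return body
    raise TimeoutError("DocDex was still initializing after all attempts")
```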
So, you may be wondering, how does DocDex figure out that when you typed "colleciton", you actually probably meant "collection"? The answer is a highly optimized implementation of the Jaro Winkler algorithm. We're jumping ahead a bit though, so let's start from the beginning. When you send a query, it's routed to a class called DocumentationIndex, which will break down your query into its components and figure out what you really meant to type.
The first step is figuring out what kind of result you want: a type, method, or field. This is super simple: if it has a #, it's a method. If it has a %, it's a field. If it has neither, it's a type! Types and fields both use the same searching algorithm, whereas methods are MUCH more complicated. So let's start with types and fields. We need to figure out if you're searching for a name or an fqn. While this technique isn't perfect, 99% of the time just checking if the query contains a period will do the trick, although this doesn't work as intended if you're querying an inner class (feel free to pr a fix).
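The classification rules above can be sketched in a few lines of Python. This is not the code in DocumentationIndex, just an illustration of the checks as described; the function name is made up, and applying the period check only to the part before any parameter list is an assumption on my part.

```python
def classify_query(query):
    """Return (kind, form) for an in-app style query (illustrative sketch)."""
    # '#' marks a method, '%' marks a field, anything else is a type.
    if "#" in query:
        kind = "method"
    elif "%" in query:
        kind = "field"
    else:
        kind = "type"
    # Assumption: ignore any parameter list when looking for a package.
    head = query.split("(", 1)[0]
    # A period suggests an fqn; this misfires for inner classes.
    form = "fqn" if "." in head else "name"
    return kind, form


print(classify_query("java.lang.String%chars"))
```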
Now that we know exactly what your query is, we can figure out which collection to match it against. With that collection, we first check whether there's a direct match. If there is, we go straight to returning your object from mongo, so a direct match is returned in O(1) time.
What if there isn't a direct match? Unfortunately, that's where it gets a bit more complicated. Time to go back to something briefly mentioned at the start, the jaro winkler algorithm. The jaro winkler algorithm used in DocDex is a minimally modified version of the jaro winkler algorithm present in the duke project. The modifications I've made are that I ripped out the "isCommonCharInS2" stuff, as it slowed down the algorithm massively, and I also migrated from strings to ascii byte arrays (as that's what docdex uses to store object names). Compared to other jaro winkler algorithms, I found this one to be over twice the speed of others, such as "tdebatty"'s implementation. This is due to the duke implementation counting all of the metrics in a single loop, instead of the 4 separate loops you'll often see in other impls, such as the one linked.
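For reference, here's what a plain textbook Jaro Winkler similarity looks like over byte strings, written in Python. To be clear, this is not the duke-derived implementation DocDex uses: it keeps the usual separate matching and transposition passes and has none of the optimizations described above. It's only here to show what the score measures.

```python
def jaro(a: bytes, b: bytes) -> float:
    """Textbook Jaro similarity between two byte strings."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters only count as matching within this distance of each other.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_matched[j] and b[j] == ch:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    j = 0
    for i in range(len(a)):
        if a_matched[i]:
            while not b_matched[j]:
                j += 1
            if a[i] != b[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order / 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: bytes, b: bytes, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 bytes."""
    score = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1.0 - score)


print(jaro_winkler(b"colleciton", b"collection"))
```

Higher is closer, with 1.0 meaning the two byte strings are identical.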
To use this algorithm, we first loop through ALL of the byte arrays in the collection we determined earlier, and generate a distance from each individual element to the query. We cache this distance in a wrapper object, then sort the wrapper objects by the precalculated distance, which takes O(n log n). Technically we could make it O(n log k) (where k is the number of top elements to keep), however after running some benchmarks, I deemed it unnecessary due to speeds already being very good.
With our sorted collection, we lazily loop through it, filling a duplicate-free ordered set with the objects retrieved from mongo until it reaches the specified limit. We do this because the collection may have multiple names which refer to the same object, so we can't limit the collection itself (at least not to the limit specified). This is also what makes converting the sort to O(n log k) a pain, although it should be relatively safe to just keep 3x the specified limit. That said, we make no actual guarantee that the number of objects returned will equal the specified limit; it's merely a maximum. It's just a nicety really.
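The whole type and field lookup (direct match first, then score, sort, and de-duplicate up to the limit) can be sketched roughly like this in Python. Everything here is a stand-in: the index is a plain dict from lowercase name to object instead of byte arrays and mongo, `difflib.SequenceMatcher` substitutes for the Jaro Winkler scorer so the example runs on its own, and returning a single-element list for a direct match is an assumption.

```python
from difflib import SequenceMatcher


def similarity(name, query):
    # Stand-in scorer; DocDex uses its Jaro Winkler variant instead.
    return SequenceMatcher(None, name, query).ratio()


def search(index, query, limit=5, score=similarity):
    """index maps lowercase names to objects; several names may share one."""
    query = query.lower()
    # Direct match: a single hash lookup, no scoring needed.
    if query in index:
        return [index[query]]
    # Score every name once, then sort best-first: O(n log n).
    scored = [(score(name, query), name) for name in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Walk the ranking, keeping each object once, until the limit is hit.
    results = {}
    for _, name in scored:
        results.setdefault(index[name], None)
        if len(results) >= limit:
            break
    return list(results)
```

Because several names can point at the same object, the result can only be capped after de-duplication, which is why the loop stops on the size of the result set and not on the number of names visited.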
The searching algorithm for methods still uses a lot of the above logic, it just does it five times (worst case) instead of once. First thing we do is determine what kind of parameters you've got. If you've got none, we won't bother searching for them, and the query will just be processed as a "full parameter" query (in the query search, parameters aren't factored if they're empty, so this isn't an issue). If you've got parameters with a type and a name, then it'll be searched as a full parameter. This search has 2 parts, a method search, and a parameter search.
If your query has parameters, and they aren't full parameters, but rather type or name parameters, that's where we get to the worst case search. We have to do 5 searches here. The first 4 are method & parameter searches on the type and name collections. The last search is comparing the results from the 4 previous searches to your query, and determining the best match.
As with types and fields, if there's a direct match for a method, the result will be returned in O(1) time.
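To make the parameter cases above a bit more concrete, here's a small Python sketch of how a query's parameter list could be sorted into "full" versus "type or name". It's a guess at the shape of the decision, not DocDex's actual parsing: the function name is made up, and splitting on commas will trip over generic types that contain commas.

```python
def classify_params(query):
    """Return 'full' or 'type_or_name' for a method query (illustrative)."""
    start = query.find("(")
    end = query.rfind(")")
    # No parameter list at all: processed as a full parameter query.
    if start == -1 or end < start:
        return "full"
    params = [p.strip() for p in query[start + 1:end].split(",")]
    params = [p for p in params if p]
    # Empty parentheses are treated the same as having no parameters.
    if not params:
        return "full"
    # "String charsetName" has both a type and a name.
    if all(len(p.split()) >= 2 for p in params):
        return "full"
    # A lone token could be a type or a name, hence the worst case search.
    return "type_or_name"


print(classify_params("String#getBytes(String)"))
```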
Feel free to contact me (PiggyPiglet) via discord for support.