Dev federated query performance improvement merge request #1463

mhs62 · 2025-01-31T09:07:36Z

Implementation of an inverted index for query federation
Implementation of TIFF-CSV and CSV-TIFF conversion


Ushcode

JPS Base Lib does not build on this branch due to the change from Java 1.8 to Java 17.

From the build report:

Error:  Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.12.1:compile (default-compile) on project jps-base-lib: Fatal error compiling: error: release version 17 not supported -> [Help 1]

Ushcode

I will move the code here to our new JPS repo and keep this branch as a WIP in the new location

mhs62 added 30 commits April 23, 2024 17:29

Create build_kg_index.py

ff9d802

inverted index: class to files
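The commit message only names the mapping (class to files). A minimal sketch of such an index follows. It assumes each file has already been parsed into (subject, predicate, object) string triples, and `build_class_to_files_index` is a hypothetical name, not the function in build_kg_index.py.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_class_to_files_index(triples_by_file):
    """Map each rdf:type class to the sorted list of files whose triples use it.

    triples_by_file: dict of file name -> iterable of (subject, predicate, object).
    """
    index = defaultdict(set)
    for file_name, triples in triples_by_file.items():
        for _s, p, o in triples:
            # Only typing triples contribute a class.
            if p == RDF_TYPE:
                index[o].add(file_name)
    # Sorted lists keep the index deterministic and JSON-serialisable.
    return {cls: sorted(files) for cls, files in index.items()}
```

Looking up a class in the query then yields the only files that need to be searched.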

Create load_kg_index.py

fb7bef8

load inverted index from a saved file
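The on-disk format of the saved index is not described here. One plausible sketch, assuming a JSON file whose values are lists (the helper names `save_index` and `load_index` are hypothetical):

```python
import json
from pathlib import Path


def save_index(index, path):
    """Write an index of key -> list of values to a JSON file.

    Values must be lists (not sets), because json cannot serialise sets.
    """
    Path(path).write_text(json.dumps(index, indent=2, sort_keys=True), encoding="utf-8")


def load_index(path):
    """Read the index back, converting each value list to a set for fast membership tests."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return {key: set(values) for key, values in data.items()}
```

Loading a prebuilt index this way avoids re-scanning every file or endpoint at query time.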

Create README.md

89e5b0b

Update build_kg_index.py

fc333ff

Update load_kg_index.py

0457dc4

Update README.md

f343a97

Create build_kg_multilevel_index.py

45ef3e7

Multilevel inverted indexing
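A multilevel index nests one mapping inside another. The sketch below assumes the two levels are class and property (class -> property -> sources) and that a property belongs under a class when a subject typed with that class uses it; both are assumptions, and `build_multilevel_index` is a hypothetical name.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_multilevel_index(triples_by_source):
    """Build class -> property -> set of sources from (s, p, o) triples per source."""
    index = defaultdict(lambda: defaultdict(set))
    for source, triples in triples_by_source.items():
        triples = list(triples)
        # First pass: collect the classes of every subject in this source.
        classes_of = defaultdict(set)
        for s, p, o in triples:
            if p == RDF_TYPE:
                classes_of[s].add(o)
        # Second pass: file each non-type property under its subject's classes.
        for s, p, _o in triples:
            if p == RDF_TYPE:
                continue
            for cls in classes_of.get(s, ()):
                index[cls][p].add(source)
    # Convert the nested defaultdicts to plain dicts.
    return {cls: dict(props) for cls, props in index.items()}
```

Compared with two flat indices, the nested form can rule out a source that has the class and the property, but never on the same subject.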

Create load_kg_multilevel_index.py

89347fc

load saved multilevel inverted index from file

Update load_kg_index.py

63fe644

print_key_value_stats(self) is defined

renamed

694cbb2

Create build_cp2endpoints_indx.py

a15f500

We tested it against two Blazegraph namespaces

Update build_cp2endpoints_indx.py

ca4fc41

added progress bar

Create build_c2endpoints_indx.py

f18f0c8

It creates a class-to-Blazegraph-endpoint (namespace) index
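A sketch of how a class-to-endpoint index can be built by asking each namespace which classes it contains. The query runner is passed in as a function so the logic is independent of the HTTP client; it is assumed to return the `bindings` list of a SPARQL JSON result. `build_class_to_endpoints_index` and `CLASS_QUERY` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

# Distinct classes that have at least one instance in a namespace.
CLASS_QUERY = "SELECT DISTINCT ?c WHERE { ?s a ?c }"


def build_class_to_endpoints_index(
    endpoints: Iterable[str],
    run_query: Callable[[str, str], List[dict]],
) -> Dict[str, Set[str]]:
    """Map each class IRI to the endpoints (namespaces) containing instances of it.

    run_query(endpoint, query) must return SPARQL JSON bindings, e.g.
    [{"c": {"type": "uri", "value": "http://..."}}, ...].
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for endpoint in endpoints:
        for binding in run_query(endpoint, CLASS_QUERY):
            index[binding["c"]["value"]].add(endpoint)
    return dict(index)
```

With Blazegraph, each endpoint would typically be a namespace URL of the form `http://<host>:<port>/blazegraph/namespace/<name>/sparql`; the host and namespace names are deployment-specific.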

Update build_cp2endpoints_indx.py

371ffff

minor modification

Create analyse_sparql.py

e1dc4aa

It can analyse a SPARQL query to retrieve the classes and properties it uses, so that we can decide, using our proposed inverted index, at which endpoints the query should be run
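A deliberately simplified sketch of the analysis step. It tokenises the WHERE clause and treats the object of `a` (rdf:type) as a class and any other IRI predicate as a property. It assumes full IRIs, one `subject predicate object .` per pattern, and no PREFIX names, `;`/`,` shorthand, property paths or nested groups; a real SPARQL parser would be needed beyond that. `extract_classes_and_properties` is a hypothetical name.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Tokens: full IRIs, variables, string literals, the '.' terminator, or bare words like 'a'.
TOKEN = re.compile(r'<[^>]*>|\?\w+|"[^"]*"|\.|[^\s.]+')


def extract_classes_and_properties(query: str):
    """Return (classes, properties) used in the WHERE clause of a simple query."""
    body = query[query.index("{") + 1 : query.rindex("}")]
    classes, properties = set(), set()
    pattern = []
    # A trailing '.' flushes a final pattern written without a terminator.
    for tok in TOKEN.findall(body) + ["."]:
        if tok != ".":
            pattern.append(tok)
            continue
        if len(pattern) == 3:
            _s, p, o = pattern
            if p in ("a", f"<{RDF_TYPE}>"):
                if o.startswith("<"):
                    classes.add(o[1:-1])
            elif p.startswith("<"):
                properties.add(p[1:-1])
        pattern = []
    return classes, properties
```

Variable predicates (`?p`) are skipped because they cannot be looked up in an index.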

Create process_sparql.py

46463d6

It can run a SPARQL query and retrieve the results in JSON format
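A standard-library sketch of running a query and getting JSON results, following the SPARQL 1.1 Protocol (form-encoded POST with an `Accept: application/sparql-results+json` header). The function names are hypothetical and this is not the code in process_sparql.py.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_request(endpoint_url: str, query: str) -> urllib.request.Request:
    """Prepare a SPARQL 1.1 Protocol POST request that asks for JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=data,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def run_sparql(endpoint_url: str, query: str, timeout: float = 60.0) -> list:
    """Send the query and return the list of result bindings."""
    with urllib.request.urlopen(build_sparql_request(endpoint_url, query), timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]
```

For example, `run_sparql("http://localhost:8080/blazegraph/namespace/kb/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 5")` would query a local Blazegraph namespace named `kb`; that URL is illustrative only.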

Create build_p2endpoints_indx.py

8c26aec

It creates an inverted index from properties to endpoints
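The property index can be built the same way as a class index, just with a different query and result variable. The sketch below is a generic builder under that assumption; `build_index_from_endpoints` and `PROPERTY_QUERY` are hypothetical names, and the query runner is assumed to return SPARQL JSON bindings.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

# Distinct predicates used anywhere in a namespace.
PROPERTY_QUERY = "SELECT DISTINCT ?p WHERE { ?s ?p ?o }"


def build_index_from_endpoints(
    endpoints: Iterable[str],
    run_query: Callable[[str, str], List[dict]],
    query: str,
    var: str,
) -> Dict[str, Set[str]]:
    """Map every value bound to `var` by `query` to the endpoints that returned it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for endpoint in endpoints:
        for binding in run_query(endpoint, query):
            # Skip rows where the variable is unbound.
            if var in binding:
                index[binding[var]["value"]].add(endpoint)
    return dict(index)
```

`SELECT DISTINCT ?p WHERE { ?s ?p ?o }` touches every triple, so on large namespaces it may be slow. That is one reason to build the index once offline and load it at query time.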

load_c_indx is renamed as load_single_indx

287a8ac

renamed

6e75836

Update analyse_sparql.py

6ff68b9

It can analyse a SPARQL query and find the endpoints needed to run it
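How the endpoints are chosen is not spelled out here. One conservative rule, sketched below as an assumption, is to take the union of the endpoints indexed for each class and property in the query. It falls back to all endpoints when a term is missing from the index or the query has no indexable terms, so no endpoint holding relevant data is pruned. `select_endpoints` is a hypothetical name.

```python
from typing import Dict, Iterable, Set


def select_endpoints(
    classes: Iterable[str],
    properties: Iterable[str],
    class_index: Dict[str, Set[str]],
    property_index: Dict[str, Set[str]],
    all_endpoints: Set[str],
) -> Set[str]:
    """Return the endpoints that may contribute to a query's results."""
    terms = [(c, class_index) for c in classes] + [(p, property_index) for p in properties]
    if not terms:
        # Nothing to prune on (e.g. '?s ?p ?o'): every endpoint is relevant.
        return set(all_endpoints)
    selected: Set[str] = set()
    for term, index in terms:
        if term not in index:
            # Unknown term: no endpoint can be ruled out safely.
            return set(all_endpoints)
        selected |= index[term]
    return selected
```

The union is a safe choice when a federation engine such as FedX joins results across endpoints. A stricter per-triple-pattern source selection could prune further, at the cost of more bookkeeping.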

Update process_sparql.py

5c9b1bb

It can run a SPARQL query against an endpoint

moved to backup_extra

a980fe9

Experimented with earlier; not necessary now.

Create BuildInvertedIndex.java

bd25f4d

It can build an inverted index from a number of namespaces, save the inverted index in a local directory, update the inverted index for a new set of triples inserted into the KGs, and load the inverted index into memory.
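BuildInvertedIndex.java is not shown here, so the update step is sketched in Python under an assumed layout: one class index and one property index, each mapping an IRI to a set of endpoints. `update_indices` is a hypothetical name.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def update_indices(class_index, property_index, endpoint, new_triples):
    """Record, in place, the classes and properties of triples newly inserted into `endpoint`.

    class_index / property_index: dict of IRI -> set of endpoints.
    new_triples: iterable of (subject, predicate, object) strings.
    """
    for _s, p, o in new_triples:
        if p == RDF_TYPE:
            # A typing triple contributes its object as a class.
            class_index.setdefault(o, set()).add(endpoint)
        else:
            property_index.setdefault(p, set()).add(endpoint)
```

This update only ever adds entries. If triples are later deleted, a stale entry costs at most one unnecessary endpoint request and never causes results to be missed, which is why an add-only update is a reasonable simplification.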

Create ProcessQuery.java

6774e6e

It can load the inverted index from a local directory, analyse the user query to find its classes/properties, find the relevant endpoints, and finally run the user query against those endpoints using FedX.

Update pom.xml

cc75a50

Added maven dependencies for the federated query processing

Update TimeSeriesRDBClientIntegrationTest.java

de0ee5c

Update BuildInvertedIndex.java

5336db4

Java code is documented, and elimination of some stop_classes and stop_properties has been implemented so that unimportant classes/properties do not have any negative impact.

Create stopcps.json

bde4234

Some stop_classes and stop_properties are defined so that unimportant classes/properties do not have any negative impact.
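A term such as `owl:Thing` or `rdfs:label` that occurs in every endpoint points a query at all endpoints and so adds nothing to endpoint selection. The sketch below filters such terms out. It assumes stopcps.json holds two lists under the keys `stop_classes` and `stop_properties`; that layout and the helper names are hypothetical.

```python
import json


def load_stop_lists(json_text):
    """Parse the stop list JSON into (stop_classes, stop_properties) sets.

    Assumed layout: {"stop_classes": [...], "stop_properties": [...]}.
    """
    data = json.loads(json_text)
    return set(data.get("stop_classes", [])), set(data.get("stop_properties", []))


def drop_stop_terms(index, stop_terms):
    """Return a copy of an index without entries for stop classes/properties."""
    return {term: endpoints for term, endpoints in index.items() if term not in stop_terms}
```

The same sets should also be subtracted from the classes and properties extracted from a user query, so that stop terms are ignored during endpoint selection rather than treated as unknown.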

Update ProcessQuery.java

eed40fc

Java code is documented

mhs62 added 14 commits July 2, 2024 18:12

Update BuildInvertedIndex.java

2625698

Update ProcessQuery.java

c5174b7

Timer has been added to the main() module so that we can know the elapsed time

Update ProcessQuery.java

fa7d05e

Update ProcessQuery.java

fcb252b

compacted to produce overhead on 100 and 1000 iterations for a query
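Measuring per-query overhead over 100 and 1000 iterations amounts to timing the same step repeatedly and summarising the samples. A minimal Python sketch, where the timed `step` (for example, endpoint selection for one query) is an assumption and `measure_overhead` is a hypothetical name:

```python
import statistics
import time
from typing import Callable, Dict


def measure_overhead(step: Callable[[], object], iterations: int) -> Dict[str, float]:
    """Run step() `iterations` times and summarise wall-clock time per call in ms."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()  # monotonic, high-resolution clock
        step()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "iterations": float(iterations),
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "total_ms": sum(samples),
    }
```

Usage: `for n in (100, 1000): print(measure_overhead(lambda: select_for_query(q), n))`, where `select_for_query` stands for whichever step is being profiled. Reporting the median alongside the mean helps when a few iterations are slowed by warm-up or garbage collection.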

Update BuildInvertedIndex.java

17f20a8

code documentation

Update ProcessQuery.java

74e1970

Revert "Update ProcessQuery.java"

4c58ea1

dev-federated-query-performance-improvement: modified main to perform various experiments.

dev-federated-query-performance-improvement: a method has been renamed

f8010b3

dev-federated-query-performance-improvement: BuildInvertedIndex has been duplicated to change as required for integrating with RemoteStoreClient

97d9dcb

dev-federated-query-performance-improvement: integration is tested

cc236d8

dev-federated-query-performance-improvement: add test method on create and extend index

1b64182

dev-federated-query-performance-improvement: extraction of triples from INSERT sparql is done for updating index

6573ee3
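Pulling triples out of an `INSERT DATA` update can be sketched with a small tokenizer. This only covers a restricted form, assumed here: full IRIs, `a` as rdf:type, one `subject predicate object .` per pattern, no PREFIX names, no `;`/`,` shorthand, no GRAPH blocks and no `INSERT ... WHERE` templates. `triples_from_insert_data` is a hypothetical name.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Full IRIs, string literals (with optional datatype or language tag), '.', or bare words.
TOKEN = re.compile(r'<[^>]*>|"[^"]*"(?:\^\^<[^>]*>|@[\w-]+)?|\.|[^\s.]+')


def triples_from_insert_data(update: str):
    """Extract (s, p, o) triples from a simple 'INSERT DATA { ... }' update.

    IRIs are returned without angle brackets; literals are returned verbatim.
    """
    match = re.search(r"INSERT\s+DATA\s*\{(.*)\}", update, re.IGNORECASE | re.DOTALL)
    if match is None:
        return []
    triples, current = [], []
    # A trailing '.' flushes a final triple written without a terminator.
    for tok in TOKEN.findall(match.group(1)) + ["."]:
        if tok != ".":
            current.append(tok)
            continue
        if len(current) == 3:
            s, p, o = (t[1:-1] if t.startswith("<") else t for t in current)
            triples.append((s, RDF_TYPE if p == "a" else p, o))
        current = []
    return triples
```

The extracted triples can then be passed to an index update routine together with the endpoint the update was sent to.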

Create csv2tif.py

6788091

Create tif2csv.py

afddd85

mhs62 requested review from gpeb2 and Ushcode and removed request for gpeb2 January 31, 2025 09:15

mhs62 self-assigned this Jan 31, 2025

Merge remote-tracking branch 'origin/main' into dev-federated-query-performance-improvement

a92c4f6

Ushcode reviewed Feb 3, 2025
