Commit 62a4e04

README and License files for documentation generated during build.
1 parent cffa3c5 · commit 62a4e04

19 files changed: +233 -123 lines

.github/workflows/deploy-docs.yaml

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ jobs:
           wget https://github.com/quarto-dev/quarto-cli/releases/download/v1.6.33/quarto-1.6.33-linux-amd64.deb
           sudo dpkg -i quarto-1.6.33-linux-amd64.deb
           python -m pip install --upgrade pip
-          python -m pip install build itables jupyter myst-parser setuptools sphinx sphinx-autodoc-typehints
+          python -m pip install build jupyter myst-parser setuptools sphinx sphinx-autodoc-typehints sphinx-book-theme
 
       - name: Build Sphinx docs
         run: |
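Aside (not part of the committed diff): the workflow now installs `sphinx-book-theme` and drops `itables`. The theme configuration itself is not shown in this commit; a minimal sketch of how such a theme is typically enabled in `docs/source/conf.py` follows. The option names are standard Sphinx settings, and the extension list is only an assumption based on the packages installed above.

```python
# Hypothetical excerpt of docs/source/conf.py (not part of this commit): once
# sphinx-book-theme is installed, it only needs to be declared as the HTML theme.
project = "py-neer-match"
extensions = [
    "myst_parser",                # Markdown/MyST sources, installed in the workflow
    "sphinx.ext.autodoc",         # pull API documentation from docstrings
    "sphinx_autodoc_typehints",   # render type hints in the API documentation
]
html_theme = "sphinx_book_theme"  # the theme added to the pip install line above
html_static_path = ["_static"]
```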

docs/Makefile

Lines changed: 7 additions & 4 deletions
@@ -18,8 +18,11 @@ help:
 # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
 %: Makefile
 	cp "$(SOURCEDIR)/../../LICENSE.txt" "$(SOURCEDIR)/LICENSE.md"
-	cp "$(SOURCEDIR)/../../README.md" "$(SOURCEDIR)/README.md"
-	sed -i 's/LICENSE.txt/LICENSE.md/g' "$(SOURCEDIR)/README.md"
-	sed -i 's/<a.*hex-logo.png.*<\/a>//g' "$(SOURCEDIR)/README.md"
+	quarto render "$(SOURCEDIR)/_static/README.qmd" -t gfm --output-dir ../
+	cp "$(SOURCEDIR)/README.md" ../README.md
+	sed -i 's/\[MIT license\](LICENSE.txt)/<a href="LICENSE.html">MIT license<\/a>/g' "$(SOURCEDIR)/README.md"
+	sed -i 's/docs\/source\///g' "$(SOURCEDIR)/README.md"
 	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
-	quarto render "$(SOURCEDIR)/_static/examples/dl-matching.qmd" --output-dir ../../../build/html/
+	quarto render "$(SOURCEDIR)/_static/examples/dl-matching.qmd" -t gfm
+	mv "$(SOURCEDIR)/_static/examples/dl-matching.md" "$(SOURCEDIR)/dl-matching.md"
+	sed -i 's/dl-matching_files/_static\/examples\/dl-matching_files/g' "$(SOURCEDIR)/dl-matching.md"
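Aside (not part of the committed diff): the two `sed` calls above rewrite the rendered README so its links resolve inside the Sphinx site. The same rewrites are shown below in Python for illustration; the input string is a made-up example modeled on the README's license sentence, not a line taken from the repository.

```python
import re

# Illustration of what the Makefile's sed substitutions do to one README line.
line = "The package is distributed under the [MIT license](LICENSE.txt), see docs/source/_static/img/hex-logo.png."

# [MIT license](LICENSE.txt) -> an HTML link to the Sphinx-rendered LICENSE page.
line = re.sub(r"\[MIT license\]\(LICENSE\.txt\)", '<a href="LICENSE.html">MIT license</a>', line)

# docs/source/ prefixes are dropped because the rendered README sits next to the sources.
line = re.sub(r"docs/source/", "", line)

print(line)
# The package is distributed under the <a href="LICENSE.html">MIT license</a>, see _static/img/hex-logo.png.
```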
Lines changed: 34 additions & 60 deletions
@@ -1,4 +1,12 @@
-# Neer Match <a href="https://py-neer-match.pikappa.eu"><img src="docs/source/_static/img/hex-logo.png" align="right" height="139" alt="neermatch website" /></a>
+---
+title: "Neer Match"
+self-contained: true
+resource-path:
+  - "../../../"
+bibliography: bibliography.bib
+---
+
+<a href="https://py-neer-match.pikappa.eu" style="float:right;margin-left:10px;"><img src="docs/source/_static/img/hex-logo.png" align="right" height="139" alt="neermatch website" /></a>
 
 <!-- badges: start -->
 ![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)
@@ -14,9 +22,9 @@ The package has also an `R` implementation available at [r-neer-match](https://g
 
 The package is built on the concept of similarity maps. Similarity maps are concise representations of potential associations between fields in two datasets. Entities from two datasets can be matched using one or more pairs of fields (one from each dataset). Each field pair can have one or more ways to compute the similarity between the values of the fields.
 
-Similarity maps are used to automate the construction of entity matching models and to facilitate the reasoning capabilities of the package. More details on the concept of similarity maps and an early implementation of the package’s functionality (without neural-symbolic components) are given by (Karapanagiotis and Liebald 2023).
+Similarity maps are used to automate the construction of entity matching models and to facilitate the reasoning capabilities of the package. More details on the concept of similarity maps and an early implementation of the package’s functionality (without neural-symbolic components) are given by [@karapanagiotis2023].
 
-The training loops for both deep and symbolic learning models are implemented in [tensorflow](https://www.tensorflow.org) (see Abadi et al. 2015). The pure deep learning model inherits from the [keras](https://keras.io) model class (Chollet et al. 2015). The neural-symbolic model is implemented using the logic tensor network ([LTN](https://pypi.org/project/ltn/)) framework (Badreddine et al. 2022). Pure neural-symbolic and hybrid models do not inherit directly from the (Chollet et al. 2015) model class, but they emulate the behavior by providing custom `compile`, `fit`, `evaluate`, and `predict` methods, so that all model classes in `neermatch` have a uniform calling interface.
+The training loops for both deep and symbolic learning models are implemented in [tensorflow](https://www.tensorflow.org) [@tensorflow2015]. The pure deep learning model inherits from the [keras](https://keras.io) model class [@keras2015]. The neural-symbolic model is implemented using the logic tensor network ([LTN](https://pypi.org/project/ltn/)) framework [@badreddine2022]. Pure neural-symbolic and hybrid models do not inherit directly from the [keras](https://keras.io) model class, but they emulate the behavior by providing custom `compile`, `fit`, `evaluate`, and `predict` methods, so that all model classes in `neermatch` have a uniform calling interface.
 
 ## Auxiliary Features
 In addition, the package offers explainability functionality customized for the needs of matching problems. The default explainability behavior is built on the information provided by the similarity map. From a global explainability aspect, the package can be used to calculate partial matching dependencies and accumulated local effects on similarities. From a local explainability aspect, the package can be used to calculate local interpretable model-agnostic matching explanations and Shapley matching values.
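Aside (not part of the committed diff): the similarity-map idea described in this hunk can be pictured as plain data, with field pairs on one side and similarity measures on the other. The sketch below is purely conceptual; the field and measure names are invented for illustration and this is not the package's actual constructor input.

```python
# Conceptual sketch of a similarity map: each key names a field pair (a shared
# column name, or a left~right pair of names), each value lists the similarity
# measures used to compare that pair. All names here are illustrative assumptions.
similarity_instructions = {
    "title": ["jaro_winkler"],            # same column name in both datasets
    "developer~dev": ["levenshtein"],     # 'developer' on the Left, 'dev' on the Right
    "year": ["euclidean", "discrete"],    # one field pair, two similarity measures
}

for pair, measures in similarity_instructions.items():
    print(f"{pair}: compared with {', '.join(measures)}")
```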
@@ -31,13 +39,28 @@ Implementing matching models using `neermatch` is a three-step process:
 
 To train the model you need to provide three datasets. Two datasets should contain records representing the entities to be matched. By convention, the first dataset is called Left and the second dataset is called Right dataset in the package’s documentation. The third dataset should contain the ground truth labels for the matching entities. The ground truth dataset should have two columns, one for the index of the entity in the Left dataset and one for the index of the entity in the Right dataset.
 
-``` python
+```{python}
+#| label: data-setup
+#| include: false
+
+import os
+import sys
+sys.path.append("../../../")
+import test
+
+def prepare_data():
+    return test.left, test.right, test.matches
+```
+
+```{python}
+#| label: usage
+
 from neer_match.similarity_map import SimilarityMap
 from neer_match.matching_model import NSMatchingModel
 import tensorflow as tf
 
 # 0) replace this with your own data preprocessing function
-from neer_match.examples import games
+left, right, matches = prepare_data()
 
 # 1) customize according to the fields in your data
 smap = SimilarityMap(
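Aside (not part of the committed diff): the ground-truth format described at the top of this hunk, one row per matching pair with an index into each dataset, might look like the following toy frames. The column names `left` and `right` are assumptions for illustration, not names required by the package.

```python
import pandas as pd

# Toy Left/Right datasets plus a ground-truth frame of index pairs, in the shape
# described above. Values and column names are illustrative only.
left = pd.DataFrame({"title": ["Super Mario 64", "Doom II"], "year": [1996, 1994]})
right = pd.DataFrame({"title": ["super mario 64", "doom 2"], "year": [1996, 1994]})
matches = pd.DataFrame({"left": [0, 1], "right": [0, 1]})  # row 0 of left matches row 0 of right, etc.
```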
@@ -55,25 +78,10 @@ model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
 
 # 3) train
 model.fit(
-    games.left,
-    games.right,
-    games.matches,
-    epochs=100,
-    batch_size=16,
-    log_mod_n=10,
+    left, right, matches,
+    epochs=10, batch_size=16,
+    log_mod_n=1,
 )
-#>>> | Epoch | BCE | Recall | Precision | F1 | Sat |
-#>>> | 0 | 5.2150 | 1.0000 | 0.3333 | 0.5000 | 0.7245 |
-#>>> | 10 | 6.9364 | 0.0000 | nan | nan | 0.7806 |
-#>>> | 20 | 9.4707 | 0.0000 | nan | nan | 0.7853 |
-#>>> | 30 | 8.9746 | 0.0000 | nan | nan | 0.7857 |
-#>>> | 40 | 1.9495 | 0.0000 | nan | nan | 0.8339 |
-#>>> | 50 | 0.7654 | 1.0000 | 0.8919 | 0.9429 | 0.8853 |
-#>>> | 60 | 0.3452 | 1.0000 | 0.9429 | 0.9706 | 0.9083 |
-#>>> | 70 | 1.2782 | 1.0000 | 0.8462 | 0.9167 | 0.8718 |
-#>>> | 80 | 0.6670 | 1.0000 | 0.9167 | 0.9565 | 0.9039 |
-#>>> | 90 | 0.8415 | 1.0000 | 0.9167 | 0.9565 | 0.9002 |
-#>>> Training finished at Epoch 99 with DL loss 0.9324 and Sat 0.9020
 ```
 
 # Installation
@@ -82,12 +90,12 @@ model.fit(
 
 You can obtain the sources for the development version of `neermatch` from its github [repository](https://github.com/pi-kappa-devel/py-neer-match).
 
-``` bash
+```
 git clone https://github.com/pi-kappa-devel/py-neer-match
 ```
 
 To build and install the package locally, from the project's root path, execute
-```bash
+```
 python -m build
 python -m pip install dist/$(basename `ls -Art dist | tail -n 1` -py3-none-any.whl).tar.gz
 ```
@@ -99,7 +107,7 @@ Online documentation is available for the [release](https://py-neer-match.pikapp
 ## Reproducing Documentation from Source
 
 Make sure to build and install the package with the latest modifications before building the documentation. The documentation website is using [sphinx](https://www.sphinx-doc.org/). To build the documentation, from `<project-root>/docs`, execute
-```bash
+```
 make html
 ```
 
@@ -126,38 +134,4 @@ The package is distributed under the [MIT license](LICENSE.txt).
 
 # References
 
-<div id="refs" class="references csl-bib-body hanging-indent"
-entry-spacing="0">
-
-<div id="ref-tensorflow2015" class="csl-entry">
-
-Abadi, Martín, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen,
-Craig Citro, Greg S. Corrado, et al. 2015. “TensorFlow: Large-Scale
-Machine Learning on Heterogeneous Systems.”
-<https://www.tensorflow.org/>.
-
-</div>
-
-<div id="ref-badreddine2022logic" class="csl-entry">
-
-Badreddine, Samy, Artur d’Avila Garcez, Luciano Serafini, and Michael
-Spranger. 2022. “Logic Tensor Networks.” *Artificial Intelligence* 303:
-103649. <https://doi.org/10.1016/j.artint.2021.103649>.
-
-</div>
-
-<div id="ref-keras2015" class="csl-entry">
-
-Chollet, François et al. 2015. “Keras.” <https://keras.io>.
-
-</div>
-
-<div id="ref-karapanagiotis2023" class="csl-entry">
-
-Karapanagiotis, Pantelis, and Marius Liebald. 2023. “Entity Matching
-with Similarity Encoding: A Supervised Learning Recommendation Framework
-for Linking (Big) Data.” <http://dx.doi.org/10.2139/ssrn.4541376>.
-
-</div>
 
-</div>

docs/source/_static/bibliography.bib

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
+@article{badreddine2022,
+  title     = {Logic Tensor Networks},
+  journal   = {Artificial Intelligence},
+  volume    = 303,
+  pages     = 103649,
+  year      = 2022,
+  issn      = {0004-3702},
+  doi       = {10.1016/j.artint.2021.103649},
+  author    = {Samy Badreddine and Artur {d'Avila Garcez} and
+               Luciano Serafini and Michael Spranger},
+  keywords  = {Neurosymbolic AI, Deep learning and reasoning,
+               Many-valued logics}
+}
+
+@misc{karapanagiotis2023,
+  title     = {Entity Matching with Similarity Encoding: A
+               Supervised Learning Recommendation Framework for
+               Linking (Big) Data},
+  author    = {Pantelis Karapanagiotis and Marius Liebald},
+  year      = 2023,
+  url       = {http://dx.doi.org/10.2139/ssrn.4541376},
+  note      = {SAFE Working Paper No. 398},
+}
+
+@misc{keras2015,
+  title     = {Keras},
+  author    = {Chollet, Fran\c{c}ois and others},
+  year      = 2015,
+  url       = {https://keras.io},
+}
+
+@misc{neermatch2024,
+  title     = {{NEural-symbolic Entity Reasoning and Matching
+               (Python Neer Match)}},
+  author    = {Pantelis Karapanagiotis and Marius Liebald},
+  year      = 2024,
+  url       = {https://github.com/pi-kappa-devel/py-neer-match},
+}
+
+@misc{pkgdown2024,
+  title     = {pkgdown: Make Static HTML Documentation for a
+               Package},
+  author    = {Hadley Wickham and Jay Hesselberth and Maëlle Salmon
+               and Olivier Roy and Salim Brüggemann},
+  year      = 2024,
+  note      = {R package version 2.1.1,
+               https://github.com/r-lib/pkgdown},
+  url       = {https://pkgdown.r-lib.org/},
+}
+
+@misc{rapidfuzz2021,
+  author    = {Max Bachmann},
+  title     = {maxbachmann/RapidFuzz: Release 1.8.0},
+  month     = oct,
+  year      = 2021,
+  publisher = {Zenodo},
+  version   = {v1.8.0},
+  doi       = {10.5281/zenodo.5584996},
+  url       = {https://doi.org/10.5281/zenodo.5584996}
+}
+
+@misc{roxygen22024,
+  title     = {roxygen2: In-Line Documentation for R},
+  author    = {Hadley Wickham and Peter Danenberg and Gábor Csárdi
+               and Manuel Eugster},
+  year      = 2024,
+  note      = {R package version 7.3.2,
+               https://github.com/r-lib/roxygen2},
+  url       = {https://roxygen2.r-lib.org/},
+}
+
+@misc{tensorflow2015,
+  title     = {{TensorFlow}: Large-Scale Machine Learning on
+               Heterogeneous Systems},
+  url       = {https://www.tensorflow.org/},
+  note      = {Software available from tensorflow.org},
+  author    = {Mart\'{i}n~Abadi and Ashish~Agarwal and Paul~Barham
+               and Eugene~Brevdo and Zhifeng~Chen and Craig~Citro
+               and Greg~S.~Corrado and Andy~Davis and Jeffrey~Dean
+               and Matthieu~Devin and Sanjay~Ghemawat and
+               Ian~Goodfellow and Andrew~Harp and Geoffrey~Irving
+               and Michael~Isard and Yangqing Jia and
+               Rafal~Jozefowicz and Lukasz~Kaiser and
+               Manjunath~Kudlur and Josh~Levenberg and
+               Dandelion~Man\'{e} and Rajat~Monga and Sherry~Moore
+               and Derek~Murray and Chris~Olah and Mike~Schuster
+               and Jonathon~Shlens and Benoit~Steiner and
+               Ilya~Sutskever and Kunal~Talwar and Paul~Tucker and
+               Vincent~Vanhoucke and Vijay~Vasudevan and
+               Fernanda~Vi\'{e}gas and Oriol~Vinyals and
+               Pete~Warden and Martin~Wattenberg and Martin~Wicke
+               and Yuan~Yu and Xiaoqiang~Zheng},
+  year      = 2015,
+}

docs/source/_static/css/extra.css

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+#main-content .caption {
+    display: none;
+}
+
+#main-content p {
+    text-align: justify;
+}
+
+#sidebar > li {
+    display: none;
+}
