GitHub - BonnBytes/paper_crawler: Source code for crawling Machine Learning paper GitHub repository links from conference proceedings.

More Rigorous Software Engineering Would Improve Reproducibility in Machine Learning Research

Source code for our position paper on software engineering in machine learning. An example template repository for most concepts discussed in the paper is available here.

Getting Started

First, clone this repository:

git clone git@github.com:BonnBytes/position_we_need_more_tests_in_ml.git

Next, configure an environment for the code in this project. To do that, create a .env file with the following content:

PYTHONPATH=.
OPENREVIEW_USERNAME=YOUR_OPENREVIEW_ACCOUNT_NAME
OPENREVIEW_PASSWORD=YOUR_PASSWORD
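
Before running the crawler, it can help to confirm that the .env file contains both OpenReview variables. The following is a minimal sketch using only the Python standard library. The helper names `parse_env_text` and `missing_keys` are hypothetical and not part of this repository, and the parser handles only plain KEY=VALUE lines, which may differ from how the project itself loads the file.

```python
from pathlib import Path

# Variable names taken from the .env example above.
REQUIRED_KEYS = ("OPENREVIEW_USERNAME", "OPENREVIEW_PASSWORD")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    values = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    missing = missing_keys(values)
    print("missing:", missing if missing else "none")
```

Run from the repository root, the script prints which required variables, if any, are missing or empty.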

This crawler utilizes the Selenium package, which in turn requires an installed version of the Chrome browser.
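
To check quickly whether a Chrome executable is visible on the PATH, a sketch like the one below may be useful. The executable names are assumptions that vary by operating system and install method. On macOS and Windows, Chrome is often installed without being on the PATH, so a negative result here does not prove that Chrome is missing. The function name `find_browser` is hypothetical.

```python
import shutil

# Assumed executable names; adjust for your platform.
CHROME_CANDIDATES = ("google-chrome", "google-chrome-stable", "chrome")


def find_browser(candidates=CHROME_CANDIDATES):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    found = find_browser()
    print(found if found else "No Chrome executable found on PATH")
```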

Reusability

After cloning this repository and navigating into it, you can install the code via pip.

pip install .
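
One way to confirm that the installation worked is to check whether the package can be found by the interpreter. The sketch below assumes the importable module is named `paper_crawler`, inferred from the `src/paper_crawler` folder. The helper `is_importable` is hypothetical.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module name assumed from the src/paper_crawler folder.
    name = "paper_crawler"
    status = "is" if is_importable(name) else "is NOT"
    print(f"{name} {status} importable")
```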

Reproduction

To aggregate the statistical data we used for the paper, run the command below.

./run_all.sh

Run the tests

Set up a .env file with your OpenReview account credentials. Make sure the OPENREVIEW_USERNAME and OPENREVIEW_PASSWORD variables are set correctly. To run the tests, type

nox -s test

into the console.

Funding

The Bundesministerium für Bildung und Forschung (BMBF) supported this research through its "BNTrAInee" (16DHBK1022) and "WestAI" (01IS22094A) projects. The sole responsibility for the content of the paper and this corresponding code lies with the authors.

Folders and files

.github/workflows
scripts
src/paper_crawler
tests
.flake8
.gitignore
CITATION.bib
LICENCE
README.md
noxfile.py
pylock.toml
pyproject.toml
pytest.ini
