Login-Checker-Assignment

A small benchmark suite for login-checking data structures (linear search, binary search, hash set, Bloom filter, Cuckoo filter).

This repository includes scripts to generate synthetic username datasets and to benchmark lookup time and memory/space usage across several structures.
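For readers unfamiliar with the probabilistic structures in the list above, here is a minimal Bloom filter sketch. It is not the implementation in structures/; the class name, the SHA-256 double-hashing scheme, and the default error rate are assumptions chosen for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter (hypothetical; not the repository's class)."""

    def __init__(self, capacity, error_rate=0.01):
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.m = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from one digest via double hashing: h1 + i * h2.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

A lookup such as `"alice" in bf` never reports a stored username as missing, but it can occasionally report an unseen one as present. That trade-off is what buys the smaller memory footprint compared with a hash set.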

Requirements

Python 3.8+
Poetry (preferred; the project was developed with it). Alternatively, use your system Python and pip to install the dependencies listed in pyproject.toml.

Install dependencies with Poetry:

poetry install

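Or install with pip into a virtual environment (PowerShell shown; the requirements file is exported with Poetry first, because PowerShell does not support the `<(...)` process substitution):

python -m venv .venv; .\.venv\Scripts\Activate.ps1; poetry export -f requirements.txt --without-hashes -o requirements.txt; pip install -r requirements.txt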

Generate datasets

Datasets are created by data/generate_dataset.py and saved under data/datasets by default.

Usage example (creates 100 usernames):

python data/generate_dataset.py --n 100 --out data/datasets --seed 42

Using Poetry (recommended):

poetry run python data/generate_dataset.py --n 100 --out data/datasets --seed 42

You can generate larger datasets by increasing --n. Datasets are not committed to the repository because of their size.
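To make the role of --n and --seed concrete, the following sketch shows one way a seeded generator could produce unique usernames and write them one per line. It is not the code in data/generate_dataset.py; the function names, the alphabet, and the length range are assumptions, and the file name only mirrors the logins_n<N>.txt pattern seen in the benchmark command.

```python
import random
import string
from pathlib import Path


def generate_usernames(n, seed, min_len=6, max_len=12):
    """Return n distinct pseudo-random usernames; the same seed gives the same list."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    alphabet = string.ascii_lowercase + string.digits
    seen = set()
    names = []
    while len(names) < n:
        length = rng.randint(min_len, max_len)
        name = "".join(rng.choice(alphabet) for _ in range(length))
        if name not in seen:  # skip collisions so every username is unique
            seen.add(name)
            names.append(name)
    return names


def write_dataset(names, out_dir):
    """Write one username per line to <out_dir>/logins_n<N>.txt (assumed naming)."""
    out_path = Path(out_dir) / f"logins_n{len(names)}.txt"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text("\n".join(names) + "\n", encoding="utf-8")
    return out_path
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps the output reproducible even if other code draws random numbers in between.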

Run benchmarks

Benchmarks are executed by the bench.run_bench module. The following example runs lookups and measures space for several structures and dataset sizes. This is the exact command used for large-scale experiments:

poetry run python -m bench.run_bench --dataset data/datasets/logins_n10000000.txt --structure linear,binary,hash,bloom,cuckoo --n 100,1000,10000,100000,1000000,10000000 --runs 3 --out results/compare_various_n_lookup_space_10e7.json --seed 42 --measures lookup,space

Flags explained:

--dataset: path to a newline-separated file with usernames (one per line).
--structure: comma-separated list of structures to benchmark. Supported: linear, binary, hash, bloom, cuckoo.
--n: comma-separated list of numbers of items to use from the dataset for each experiment.
--runs: how many repetitions to average per experiment.
--out: output JSON file where results will be written.
--seed: random seed for reproducibility.
--measures: comma-separated list of measures to collect (e.g., lookup, space).
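The flags above map onto a simple experiment loop: for each size in --n, build each structure from the first n usernames, time a batch of lookups, and average over --runs repetitions. The sketch below illustrates that loop for three of the structures. It is a simplified stand-in, not the bench.run_bench module; every function name and the `n_queries` parameter are hypothetical.

```python
import bisect
import random
import time


def linear_lookup(items, key):
    return key in items  # list membership scans element by element


def binary_lookup(sorted_items, key):
    i = bisect.bisect_left(sorted_items, key)
    return i < len(sorted_items) and sorted_items[i] == key


def hash_lookup(item_set, key):
    return key in item_set  # average O(1) set membership


def time_lookups(lookup, container, queries, runs=3):
    """Average wall-clock seconds per query over `runs` repetitions."""
    totals = []
    for _ in range(runs):
        start = time.perf_counter()
        for q in queries:
            lookup(container, q)
        totals.append(time.perf_counter() - start)
    return sum(totals) / len(totals) / len(queries)


def run_experiment(usernames, sizes, runs=3, seed=42, n_queries=200):
    """Return {n: {structure: seconds per lookup}} for each dataset size n."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        subset = usernames[:n]
        # Half the queries are known members, half are names assumed to be absent.
        queries = [rng.choice(subset) for _ in range(n_queries // 2)]
        queries += [f"missing_{i}" for i in range(n_queries - len(queries))]
        results[n] = {
            "linear": time_lookups(linear_lookup, subset, queries, runs),
            "binary": time_lookups(binary_lookup, sorted(subset), queries, runs),
            "hash": time_lookups(hash_lookup, set(subset), queries, runs),
        }
    return results
```

Construction (sorting, building the set) happens outside the timed region here, so only lookup cost is measured; whether the real benchmark separates the two the same way is not stated in this README.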

Results are written to the specified JSON file and plots can be generated from these results using the visualization helpers in visualization/plot.py.

Tests

Run tests with Poetry:

poetry run pytest -q

Notes

If you need only a subset of records from a large dataset, use the --n option to limit the number of items used per experiment.
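As an illustration of limiting the number of records, the sketch below reads at most n usernames from a newline-separated file without loading the rest into memory. It assumes the subset is simply the first n lines; the README does not say how --n actually selects items, and `read_first_n` is a hypothetical helper.

```python
from itertools import islice


def read_first_n(path, n):
    """Read at most n non-empty usernames, stopping as soon as n are collected."""
    with open(path, encoding="utf-8") as f:
        stripped = (line.strip() for line in f)
        return list(islice((s for s in stripped if s), n))
```

Because `islice` stops consuming the file once n names have been read, this stays fast even when the dataset holds millions of lines.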
