hmdb_endogenous_animal

Fault-tolerant streaming crawler for HMDB Endogenous/Animal flags

hmdb_endogenous_animal is a single-file, fault-tolerant Python tool that streams the ~1-GB HMDB XML, extracts each on the fly, then concurrently queries every MetaboCard to see whether the compound is tagged “Endogenous → Animal,” writing a tidy TSV you can drop straight into R or Python. The design keeps memory in the low-MB range, recovers cleanly from network hiccups, and can resume mid-run.

Key Features 1] Uses xml.etree.ElementTree.iterparse so the HMDB dump is never fully loaded into RAM — ideal for big files on modest workstations. 2] A dedicated parser thread feeds a queue.Queue; a configurable swarm of ThreadPoolExecutor workers processes URLs in parallel, leveraging Python’s standard library concurrency primitives. 3] All requests are wrapped in a retry/back-off helper (built on requests + HTTPAdapter) to survive flaky institutional proxies or HMDB throttling. 4] Detects tqdm; if available, prints a dynamic progress bar, else falls back to periodic console updates. 5] --resume scans the existing TSV every 5 000 rows and skips already-done IDs, so power cuts or ^C won’t waste hours of crawling.

Quick Start

1. Grab the code

git clone https://github.com/SidSin0809/hmdb_endogenous_animal.git

2. Install minimal deps

python -m pip install -r requirements.txt # requests tqdm

3. Run

python hmdb_endogenous_animal.py
hmdb_metabolites.xml
--workers 8
--out endogenous_animal.tsv
--resume

Flag	Default	Description
`--out PATH`	`endogenous_animal.tsv`	Output file (TSV, 2 cols: ID & flag).
`--workers N`	`8`	Thread pool size. Tune for your bandwidth/CPU.
`--cutoff SECS`	`10`	Per-request timeout.
`--max-retries N`	`5`	Total retries before declaring a URL dead.
`--resume`	off	Skip IDs already in the output file.

Citing This Tool Singh S. hmdb_endogenous_animal (v1.x) [Software]. GitHub; 2025.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
LICENSE		LICENSE
README.md		README.md
hmdb_endogenous_animal.py		hmdb_endogenous_animal.py
requirements.txt		requirements.txt
sweat_metabolites.zip		sweat_metabolites.zip

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

hmdb_endogenous_animal

1. Grab the code

2. Install minimal deps

3. Run

About

Uh oh!

Releases 2

Packages

Uh oh!

Languages

License

SidSin0809/hmdb_endogenous_animal

Folders and files

Latest commit

History

Repository files navigation

hmdb_endogenous_animal

1. Grab the code

2. Install minimal deps

3. Run

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Languages

Packages