
hmdb_endogenous_animal.py v1.1

SidSin0809 released this on 25 May, 08:26 · commit 52b8824 · 3 commits to main since this release

Why v1.1?

The original v1.0 waited until every HMDB ID had been extracted
before starting the network crawl. On files well over 1 GB this made
the script look hung and wasted memory holding the full ID list.
v1.1 switches to a streaming producer‑consumer design: IDs are
extracted from the XML and fed to worker threads on the fly, so
crawling begins immediately and RAM usage stays flat.
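A minimal sketch of the streaming producer‑consumer idea, assuming each record is a `<metabolite>` element with an `<accession>` child (an assumption about the HMDB schema, not taken from the script). `iter_hmdb_ids`, `crawl` and the `fetch` callback are hypothetical names; a bounded `queue.Queue(maxsize=workers * 2)` stands in for the executor queue, so the XML parser blocks instead of racing ahead of the workers.

```python
import io
import queue
import threading
import xml.etree.ElementTree as ET

SENTINEL = None  # tells a worker that the ID stream has ended


def local_name(tag):
    # Strip a "{namespace}" prefix, if present
    return tag.rsplit("}", 1)[-1]


def iter_hmdb_ids(source):
    """Yield accession IDs one by one without building the whole tree.

    Hypothetical schema: <metabolite><accession>HMDB...</accession></metabolite>
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == "metabolite":
            acc = next((c.text for c in elem if local_name(c.tag) == "accession"), None)
            if acc:
                yield acc.strip()
            elem.clear()  # drop the record's children once consumed


def crawl(source, fetch, workers=4):
    """Producer (XML parser) feeds a bounded queue drained by worker threads."""
    q = queue.Queue(maxsize=workers * 2)  # at most ~workers*2 IDs buffered
    results, lock = [], threading.Lock()

    def worker():
        while True:
            hmdb_id = q.get()
            if hmdb_id is SENTINEL:
                break
            row = fetch(hmdb_id)  # hypothetical network request
            with lock:
                results.append(row)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for hmdb_id in iter_hmdb_ids(source):
        q.put(hmdb_id)  # blocks while the queue is full
    for _ in threads:
        q.put(SENTINEL)  # one stop marker per worker
    for t in threads:
        t.join()
    return results


xml = (b'<hmdb xmlns="http://www.hmdb.ca">'
       b"<metabolite><accession>HMDB0000001</accession></metabolite>"
       b"<metabolite><accession>HMDB0000002</accession></metabolite></hmdb>")
rows = crawl(io.BytesIO(xml), fetch=lambda i: (i, "ok"), workers=2)
print(sorted(rows))  # [('HMDB0000001', 'ok'), ('HMDB0000002', 'ok')]
```

Because `q.put` blocks when the queue is full, memory is bounded by the queue size rather than by the file size, which is the property the release describes.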

Key Updates

  • No memory blow‑up: at most the executor queue size
    (≈ workers × 2) IDs are held in memory at once.
  • Visible progress from the first second: both XML parsing and crawl
    speeds are shown via tqdm (with plain‑text counters as a fallback
    if tqdm is not installed).
  • Auto‑resume: identical --resume semantics, but now we also
    create a .partial checkpoint every 5 s to guard against abrupt
    power failures.
  • Python ≥ 3.7 compatible: dropped the {*}tag namespace wildcard in
    ElementTree paths, which requires Python 3.8+.
  • Graceful shutdown: Ctrl‑C or SIGTERM stops creating new tasks, but
    in‑flight requests finish and the partial TSV is flushed.
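One way the tqdm fallback could look: import tqdm if it is available, otherwise use a tiny counter that exposes the same `update()`/`close()` pair. `TextCounter`, `make_progress` and the `every` reporting interval are illustrative names, not necessarily the script's.

```python
import sys

try:
    from tqdm import tqdm
except ImportError:  # tqdm not installed: fall back to plain-text counters
    tqdm = None


class TextCounter:
    """Minimal tqdm stand-in: prints 'desc: N' each time N crosses a multiple of `every`."""

    def __init__(self, desc, every=1000, stream=sys.stderr):
        self.desc, self.every, self.stream = desc, every, stream
        self.n = 0
        self._next = every  # next count at which to print

    def update(self, k=1):
        self.n += k
        if self.n >= self._next:
            print(f"{self.desc}: {self.n}", file=self.stream)
            self._next = (self.n // self.every + 1) * self.every

    def close(self):
        print(f"{self.desc}: {self.n} (done)", file=self.stream)


def make_progress(desc, unit="id"):
    """Return a tqdm bar when possible, otherwise a TextCounter."""
    if tqdm is not None:
        return tqdm(desc=desc, unit=unit)
    return TextCounter(desc)
```

Callers only use `update()` and `close()`, so the parsing and crawling loops do not need to know which implementation they received.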
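A sketch of a time‑based checkpoint, assuming the checkpoint is simply a list of finished IDs written to `<output>.partial` (the real file format is not described in these notes). It writes to a temporary file, fsyncs it, and swaps it in with `os.replace`, so an abrupt power cut leaves either the previous or the new checkpoint rather than a half‑written one. `Checkpointer` and its methods are hypothetical.

```python
import os
import tempfile
import threading
import time


class Checkpointer:
    """Record finished IDs and snapshot them to '<path>.partial' every `interval` s."""

    def __init__(self, path, interval=5.0, clock=time.monotonic):
        self.partial = path + ".partial"
        self.interval = interval
        self.clock = clock  # injectable so tests can fake time
        self._last = clock()
        self._lock = threading.Lock()  # workers may call mark_done concurrently
        self.done = set()

    def mark_done(self, hmdb_id):
        with self._lock:
            self.done.add(hmdb_id)
            if self.clock() - self._last >= self.interval:
                self._write()

    def flush(self):
        """Force a snapshot, e.g. during shutdown."""
        with self._lock:
            self._write()

    def _write(self):
        # Caller must hold self._lock.
        directory = os.path.dirname(os.path.abspath(self.partial))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w") as fh:
            fh.write("".join(i + "\n" for i in sorted(self.done)))
            fh.flush()
            os.fsync(fh.fileno())  # ensure bytes reach the disk before the rename
        os.replace(tmp, self.partial)  # atomic rename on POSIX (same filesystem)
        self._last = self.clock()

    @staticmethod
    def load(path):
        """IDs from an existing checkpoint, or an empty set if there is none."""
        try:
            with open(path + ".partial") as fh:
                return {line.strip() for line in fh if line.strip()}
        except FileNotFoundError:
            return set()
```

On a resumed run, the IDs returned by `Checkpointer.load(...)` could be skipped as the parser streams records, so only unfinished work is re‑requested.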
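The `{*}` namespace wildcard in ElementTree paths (e.g. `elem.findtext("{*}accession")`) was added in Python 3.8. A 3.7‑compatible alternative is to compare local tag names by hand; `local_name` and `find_child_text` are illustrative helpers, not necessarily what the script does.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """'{http://www.hmdb.ca}accession' -> 'accession'; works on Python 3.7."""
    return tag.rsplit("}", 1)[-1] if tag.startswith("{") else tag


def find_child_text(elem, name):
    """Namespace-agnostic stand-in for elem.findtext('{*}' + name) (3.8+ only)."""
    for child in elem:  # direct children only, like findtext with a bare tag
        if local_name(child.tag) == name:
            return child.text
    return None


rec = ET.fromstring(
    '<metabolite xmlns="http://www.hmdb.ca"><accession>HMDB0000001</accession></metabolite>'
)
print(find_child_text(rec, "accession"))  # HMDB0000001
```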
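A minimal sketch of the graceful‑shutdown pattern: the signal handler only sets a `threading.Event`, the producer loop checks it before each submission, and leaving the `ThreadPoolExecutor` context waits for tasks already submitted. `run`, `fetch` and `write_row` are hypothetical; for brevity this version keeps every future in a list rather than bounding them.

```python
import signal
import threading
from concurrent.futures import ThreadPoolExecutor

stop = threading.Event()


def request_stop(signum, frame):
    # Keep the handler trivial: just raise a flag for the producer loop.
    stop.set()


def install_handlers():
    # signal.signal() must be called from the main thread.
    signal.signal(signal.SIGINT, request_stop)
    signal.signal(signal.SIGTERM, request_stop)


def run(ids, fetch, write_row, workers=4):
    """Submit tasks until `stop` is set; in-flight tasks still complete."""
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for hmdb_id in ids:
            if stop.is_set():
                break  # no new tasks after Ctrl-C / SIGTERM
            futures.append(pool.submit(fetch, hmdb_id))
        # Exiting the `with` block calls pool.shutdown(wait=True).
    for fut in futures:
        write_row(fut.result())  # flush every finished row to the partial output
```

Because the handler never touches the pool or the output file, it cannot interrupt a half‑written row; the cleanup happens in ordinary code after the loop exits.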