Releases: bottomless-archive-project/document-location-database
2021 - July/August
This release is a collection of 240 million URLs with the following file extensions:
pdf, doc, docx, ppt, pptx, xls, xlsx, rtf, mobi, epub
The URL list was collected by crawling Common Crawl's July and August 2021 datasets.
To merge and uncompress the files, use 7-Zip.
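Once the files are extracted, a quick sanity check is to count how many URLs fall under each of the listed extensions. The sketch below is only an illustration: it assumes the extracted data is plain text with one URL per line, which the release notes do not state, and the names `url_extension` and `count_by_extension` are made up for this example.

```python
import posixpath
from collections import Counter
from urllib.parse import urlsplit

# File extensions listed in the release notes.
EXTENSIONS = {"pdf", "doc", "docx", "ppt", "pptx", "xls", "xlsx", "rtf", "mobi", "epub"}


def url_extension(url):
    """Return the lowercase extension of the URL's path, or None if it is not listed."""
    try:
        path = urlsplit(url.strip()).path
    except ValueError:
        # urlsplit rejects some malformed URLs; skip them.
        return None
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return ext if ext in EXTENSIONS else None


def count_by_extension(lines):
    """Count URLs per listed extension. Assumes one URL per line (an assumption)."""
    counts = Counter()
    for line in lines:
        ext = url_extension(line)
        if ext is not None:
            counts[ext] += 1
    return counts
```

To use it, pass an open file handle for an extracted file, for example `count_by_extension(open("urls.txt", encoding="utf-8", errors="replace"))`, where `urls.txt` is a placeholder name. Because the function iterates line by line, it does not need to load the whole list into memory.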