Remove WayBack Machine bits from downloaded files

I'm just about done downloading a defunct website from the Wayback Machine. ``waybackpy`` has been quite helpful. The website is/was a small reference site. We think the owner passed away, so I'm trying to reconstruct it.

Initially, I thought the HTML files only had a bit of JS in the header and a footer identifying the capture times and dates. Looking deeper, there are quite a few more strands of Wayback Machine code embedded in the files. (Even some "JPG" files are actually bits of WM JavaScript.) I'm finessing any copyright issues for the moment (searching the Internet Archive doesn't turn up much, mostly material about copyright on movies and books).

Are any tools available for cleaning up the downloaded files? Note that I don't expect ``waybackpy`` to be modified to perform this function. I've come up empty in my own search, though, so I thought people here might have some pointers.
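To make the kind of cleanup concrete, here is a minimal sketch in Python. It is not part of ``waybackpy``; the function name ``strip_wayback`` is hypothetical. It rests on three assumptions about what the Wayback Machine injects, which should be checked against the actual downloaded files: the toolbar sits between ``BEGIN WAYBACK TOOLBAR INSERT`` and ``END WAYBACK TOOLBAR INSERT`` comments, a trailing HTML comment starts with ``FILE ARCHIVED ON``, and links are rewritten to ``/web/<14-digit timestamp><optional modifier>/<original URL>``.

```python
import re

# Assumption: the injected toolbar is wrapped in these HTML comment markers.
TOOLBAR_RE = re.compile(
    r"<!--\s*BEGIN WAYBACK TOOLBAR INSERT\s*-->.*?"
    r"<!--\s*END WAYBACK TOOLBAR INSERT\s*-->",
    re.DOTALL | re.IGNORECASE,
)

# Assumption: the capture footer is a single HTML comment that
# begins with "FILE ARCHIVED ON".
FOOTER_RE = re.compile(r"<!--\s*FILE ARCHIVED ON.*?-->", re.DOTALL)

# Assumption: rewritten links have the form
#   [https://web.archive.org]/web/<14 digits>[two-letter modifier + "_"]/<original URL>
# Only the prefix is matched, so removing it leaves the original URL.
REWRITTEN_URL_RE = re.compile(
    r"(?:https?:)?(?://web\.archive\.org)?/web/\d{14}(?:[a-z]{2}_)?/(?=https?://)",
    re.IGNORECASE,
)


def strip_wayback(html: str) -> str:
    """Remove the toolbar, the capture footer, and rewritten URL prefixes."""
    html = TOOLBAR_RE.sub("", html)
    html = FOOTER_RE.sub("", html)
    html = REWRITTEN_URL_RE.sub("", html)
    return html


if __name__ == "__main__":
    sample = (
        "<html><body>"
        "<!-- BEGIN WAYBACK TOOLBAR INSERT --><div>toolbar</div>"
        "<!-- END WAYBACK TOOLBAR INSERT -->"
        '<img src="/web/20200101000000im_/http://example.com/a.jpg">'
        "</body></html>\n<!--\n FILE ARCHIVED ON 00:00:00 Jan 01, 2020 -->"
    )
    print(strip_wayback(sample))
```

This sketch does not touch the scripts injected into the page head, and it would not help with the "JPG" files that contain JavaScript; those would need to be fetched again. One thing that may be worth trying for that: as far as I know, appending ``id_`` to the timestamp in a capture URL (``/web/<timestamp>id_/<original URL>``) asks the Wayback Machine for the capture without its modifications.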


Remove WayBack Machine bits from downloaded files #181