WaybackMachineCDXServerAPI.newest does not return latest snapshot #176

Open
@sissbruecker

Describe the bug

WaybackMachineCDXServerAPI.newest does not return the latest snapshot; it returns an older, though fairly recent, one. For example, for https://openlayers.org/ it returns a snapshot from 2022-06-16 17:20:36, while the latest snapshot (as of today, September 10th 2022) is from 2022-09-10 08:05:37. There are around 380 snapshots between these two.

I've debugged this a bit, and the issue seems to lie either in how sort and limit are configured or in how the CDX server interprets them. The method sets sort = 'closest' and limit = 1. If I configure the WaybackMachineCDXServerAPI instance manually and set limit = -1 instead, I do get the latest snapshot. #155 (comment) hints that limit = -1 should be used for the latest snapshot.
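
To make the two configurations concrete, here is a minimal sketch of the query strings involved. It assumes the public CDX endpoint at https://web.archive.org/cdx/search/cdx and that sort, closest, and limit are passed through as query parameters of the same name. build_cdx_query is a hypothetical helper for illustration, not part of waybackpy, and the closest value is just an example timestamp.

```python
from urllib.parse import urlencode

# Assumed public CDX endpoint; not taken from the waybackpy source.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, closest, limit):
    """Hypothetical helper: build a CDX query URL for the given parameters."""
    params = {"url": url, "sort": "closest", "closest": closest, "limit": limit}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


# What newest() appears to request (limit = 1) versus the workaround (limit = -1).
query_newest = build_cdx_query("https://openlayers.org/", "20220910120000", 1)
query_workaround = build_cdx_query("https://openlayers.org/", "20220910120000", -1)
print(query_newest)
print(query_workaround)
```

Opening both URLs in a browser should show whether the difference comes from the server or from the library.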

To Reproduce

import waybackpy

url = 'https://openlayers.org/'
cdx_api = waybackpy.WaybackMachineCDXServerAPI(url)
newest_snapshot = cdx_api.newest()
print(newest_snapshot.datetime_timestamp)
# prints 2022-06-16 17:20:36, should be 2022-09-10 08:05:37

Workaround

import time

import waybackpy

url = 'https://openlayers.org/'
# Use the current time as the target for the closest-match sort
unix_timestamp = int(time.time())
timestamp = waybackpy.utils.unix_timestamp_to_wayback_timestamp(unix_timestamp)
cdx_api = waybackpy.WaybackMachineCDXServerAPI(url)
cdx_api.closest = timestamp
cdx_api.sort = 'closest'
cdx_api.limit = -1  # newest() uses limit = 1 here

# Only the first snapshot yielded is needed
for item in cdx_api.snapshots():
    print(item.datetime_timestamp)
    break
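
A cross-check that does not depend on server-side sort or limit is to compare capture timestamps client-side. The sketch below assumes the 14-digit YYYYMMDDhhmmss Wayback timestamp format. newest_timestamp is a hypothetical helper, and the sample list contains only the two captures mentioned in this report.

```python
from datetime import datetime

WAYBACK_FORMAT = "%Y%m%d%H%M%S"  # assumed 14-digit Wayback timestamp layout


def newest_timestamp(timestamps):
    """Hypothetical helper: return the latest of several Wayback timestamps."""
    return max(timestamps, key=lambda ts: datetime.strptime(ts, WAYBACK_FORMAT))


# The two captures mentioned in this report, in Wayback timestamp form.
captures = ["20220616172036", "20220910080537"]
latest = newest_timestamp(captures)
print(datetime.strptime(latest, WAYBACK_FORMAT))
# prints 2022-09-10 08:05:37
```

Fetching every capture just to pick the maximum is wasteful for URLs with many snapshots, so this is meant only as a way to verify what newest() returns.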

Expected behavior

WaybackMachineCDXServerAPI.newest should return the most recent snapshot available for the URL.

Version:

  • OS: macOS
  • waybackpy version: 3.0.6
  • Is latest version? Yes

Labels

bug (Something isn't working)
