Description
Describe the bug
Using WaybackMachineCDXServerAPI.newest does not return the latest snapshot, only a fairly recent one. For example, for https://openlayers.org/ it returns a snapshot from 2022-06-16 17:20:36, while the latest snapshot (as of today, September 10th 2022) is from 2022-09-10 08:05:37. There are around 380 snapshots between these two.
I've debugged this a bit and it seems there is an issue either with how sort and limit are configured, or with how the CDX server interprets them. The method sets sort = 'closest' and limit = 1. If I configure the WaybackMachineCDXServerAPI instance manually and set limit = -1 instead, then I actually get the latest snapshot. #155 (comment) hints that limit = -1 should be used for the latest snapshot.
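To make the difference between the two configurations concrete, here is a minimal sketch of the query strings they would correspond to. It assumes the public CDX endpoint at https://web.archive.org/cdx/search/cdx and the url, limit, sort and closest query parameters; build_cdx_query is a hypothetical helper written for illustration, not waybackpy's internal code, and the closest value is just an example timestamp.

```python
from urllib.parse import urlencode

# Public CDX endpoint of the Wayback Machine (assumed here for illustration).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, limit, sort=None, closest=None):
    """Hypothetical helper: assemble a CDX query URL (not waybackpy's own code)."""
    params = {"url": url, "limit": limit}
    if sort is not None:
        params["sort"] = sort
    if closest is not None:
        params["closest"] = closest
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


# Example 14-digit Wayback timestamp, used only as a stand-in for "now".
now = "20220910080537"

# Configuration described above for newest(): sort=closest with limit=1.
reported = build_cdx_query("https://openlayers.org/", 1, sort="closest", closest=now)

# Configuration that returned the latest snapshot: same, but limit=-1.
workaround = build_cdx_query("https://openlayers.org/", -1, sort="closest", closest=now)

print(reported)
print(workaround)
```

The two URLs differ only in the limit parameter, which should make it easier to compare the raw CDX responses directly in a browser or with curl.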
To Reproduce
import waybackpy

url = 'https://openlayers.org/'
cdx_api = waybackpy.WaybackMachineCDXServerAPI(url)
newest_snapshot = cdx_api.newest()
print(newest_snapshot.datetime_timestamp)
# prints 2022-06-16 17:20:36, should be 2022-09-10 08:05:37
Workaround
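import time

import waybackpy

url = 'https://openlayers.org/'
# Use the current time as the target for the "closest" sort.
unix_timestamp = int(time.time())
timestamp = waybackpy.utils.unix_timestamp_to_wayback_timestamp(unix_timestamp)
cdx_api = waybackpy.WaybackMachineCDXServerAPI(url)
cdx_api.closest = timestamp
cdx_api.sort = 'closest'
# limit = -1 instead of the limit = 1 that newest() sets
cdx_api.limit = -1
# Only the first yielded snapshot is needed.
for item in cdx_api.snapshots():
    print(item.datetime_timestamp)
    break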
Expected behavior
The newest API should return the newest snapshot.
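As a rough way to state the expectation precisely: among all snapshots of a URL, the newest one is the one with the greatest capture time. The sketch below checks this offline using only the two capture times mentioned in this report, written as 14-digit Wayback timestamps; parse_wayback_timestamp and pick_newest are hypothetical helpers for illustration, not part of waybackpy.

```python
from datetime import datetime


def parse_wayback_timestamp(timestamp):
    """Hypothetical helper: parse a 14-digit Wayback timestamp (YYYYMMDDhhmmss)."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")


def pick_newest(timestamps):
    """Hypothetical helper: return the latest capture time from a list."""
    return max(parse_wayback_timestamp(ts) for ts in timestamps)


# The two capture times from this report: the one newest() returned,
# and the one that is actually the latest.
captures = ["20220616172036", "20220910080537"]

expected_newest = pick_newest(captures)
print(expected_newest)
```

A regression test for newest() could compare its datetime_timestamp against a value computed this way from the full snapshot list.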
Version:
- OS: macOS
- Version: 3.0.6
- Is latest version? Yes