Open
Description
Querying the index returns a (status, timestamp, url) triple for each capture, e.g.:
$ cdxt --cc --crawl CC-MAIN-2025-43 iter 'commoncrawl.org/get-started'
status 200, timestamp 20251014220259, url https://www.commoncrawl.org/get-started
status 200, timestamp 20251016192109, url https://commoncrawl.org/get-started
It would be good to have a direct method to bring back a particular record based on the timestamp alone. I'm aware you can do something like cdxt --cc --crawl CC-MAIN-2025-43 --from 20251016192109 --limit 1 warc 'commoncrawl.org/get-started' but a direct --timestamp flag or similar would be useful, given that the index output already shows the timestamp for each record.
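For illustration, another interim workaround is to filter the iter output on the client side. The sketch below is only that: find_by_timestamp is a hypothetical helper, not part of cdx_toolkit, and it assumes the output lines keep the exact "status ..., timestamp ..., url ..." layout shown above.

```python
import re
from typing import Dict, Iterable, Optional

# Assumed line layout, copied from the iter output shown above:
#   status 200, timestamp 20251014220259, url https://...
LINE_RE = re.compile(
    r"status (?P<status>[^,]+), timestamp (?P<timestamp>\d{14}), url (?P<url>\S+)"
)


def find_by_timestamp(lines: Iterable[str], timestamp: str) -> Optional[Dict[str, str]]:
    """Return the first record whose timestamp matches exactly, or None.

    Hypothetical helper for illustration; not part of cdx_toolkit.
    """
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("timestamp") == timestamp:
            return m.groupdict()
    return None


# The two lines from the example query above.
sample = [
    "status 200, timestamp 20251014220259, url https://www.commoncrawl.org/get-started",
    "status 200, timestamp 20251016192109, url https://commoncrawl.org/get-started",
]

match = find_by_timestamp(sample, "20251016192109")
missing = find_by_timestamp(sample, "20250101000000")
```

This only picks the matching index line; fetching the WARC record itself would still need the --from/--limit command above, which is why a built-in flag would be more convenient.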