Hello,
I am working on a workflow that pulls every S1 image of a certain type (specifically IW, GRD, VV) in a certain region, in a given timeframe.
The current system queries the API through Scihub (https://scihub.copernicus.eu/dhus) or APIHub (https://apihub.copernicus.eu/apihub), gets all of the matching product names, and downloads them from AWS.
The problem is that there seem to be duplicates: same capture date, same content, different ID. As an example, I’ll talk specifically about the capture S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E, but others are affected.
In this case, the duplicates are:
- S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E_083F with ID 4ed64f35-b0ba-4a15-83a1-d7e9e3e5b5d2
- S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E_5146 with ID 0bbde3de-099c-498b-bf6b-7e9e7e9bebcb
- S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E_BA14 with ID ffac2db7-89ac-440a-8a7c-50ced39bccd0
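A minimal sketch of how such duplicates could be spotted in a list of product names, assuming that products of the same capture differ only in the final 4-character suffix (as in the three names above). The helper names `capture_key` and `group_duplicates` are made up for illustration and are not part of any Copernicus tooling.

```python
from collections import defaultdict


def capture_key(product_name):
    """Return the product name without its trailing 4-character suffix."""
    return product_name.rsplit("_", 1)[0]


def group_duplicates(product_names):
    """Map each capture key to its product names, keeping only keys seen more than once."""
    groups = defaultdict(list)
    for name in product_names:
        groups[capture_key(name)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


names = [
    "S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E_083F",
    "S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E_5146",
    "S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E_BA14",
]
duplicates = group_duplicates(names)
for key, members in duplicates.items():
    print(key, "->", len(members), "products")
```

Grouping on the name alone avoids an extra API call per product, but it relies entirely on the suffix assumption stated above.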
I’ve read this other thread that concludes the duplicates are NRT products, but it doesn’t exactly match my symptoms. In particular:
- Months-old duplicates are still listed in APIHub: https://apihub.copernicus.eu/apihub/search?format=json&q=(filename:*1SDV_20220312T110121*)
- From my understanding, non-NRT products are supposed to replace NRT ones within 24 hours in the API (while staying in AWS). But those duplicates have an ingestion date 4 days after the capture. The three ingestion dates are:
  - 2022-03-12T12:15:12.011Z (1-2 hours after the capture date, beginPosition)
  - 2022-03-16T00:57:24.183Z
  - 2022-03-16T14:45:34.97Z
- On the graphical interface (https://scihub.copernicus.eu/dhus), the two “newer” versions are flagged as “offline”, available only for async access, while the original product is available for synchronous access.
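To make the timing concrete, here is a small sketch that computes the delay between the capture start and each of the three ingestion dates quoted above. It assumes the first timestamp embedded in the product name (20220312T110121) is the UTC capture start and matches beginPosition, and that ingestion dates always carry a fractional-seconds part followed by `Z`.

```python
from datetime import datetime, timezone

CAPTURE_START = "20220312T110121"  # first timestamp in the product name (assumed UTC)

INGESTION_DATES = [
    "2022-03-12T12:15:12.011Z",
    "2022-03-16T00:57:24.183Z",
    "2022-03-16T14:45:34.97Z",
]


def parse_capture(stamp):
    """Parse the compact timestamp used in product names."""
    return datetime.strptime(stamp, "%Y%m%dT%H%M%S").replace(tzinfo=timezone.utc)


def parse_ingestion(stamp):
    """Parse an ingestion date; %f accepts 1 to 6 fractional digits."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


capture = parse_capture(CAPTURE_START)
delays = [parse_ingestion(stamp) - capture for stamp in INGESTION_DATES]
for stamp, delay in zip(INGESTION_DATES, delays):
    print(stamp, "ingested", delay, "after capture start")
```

Under these assumptions the first product is ingested a little over one hour after capture, and the other two between three and five days later.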
All three of them have, in their metadata, “Status: ARCHIVED” and “timeliness: Fast-24h”. It also doesn’t seem obvious that the first product is of “lesser quality”.
Is this intended? If so, is there a way to know which products will have “improved” versions? While this is relatively trivial for older (2 months+) products, I don’t see anything clean for more recent captures.
Off the top of my head, the two solutions for my workflow are either
- Introduce a long delay (like 7 days) to ensure every product has had all of its versions released
- Ignore newer (and better?) releases of the same capture, if they eventually come out.
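Both options can be sketched as one selection step over (product name, ingestion date) pairs. This is only an illustration under several assumptions: `select_products` and its parameters are hypothetical, the capture start is read from the fifth underscore-separated field of the product name, duplicates share everything except the final suffix, and the pairing of names with ingestion dates in the usage example is my guess based on the order in which they are listed.

```python
from datetime import datetime, timedelta, timezone


def parse_ingestion(stamp):
    """Parse an ingestion date such as 2022-03-16T14:45:34.97Z."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def capture_start(product_name):
    """Read the capture start (assumed UTC) from the product name."""
    stamp = product_name.split("_")[4]
    return datetime.strptime(stamp, "%Y%m%dT%H%M%S").replace(tzinfo=timezone.utc)


def select_products(records, now, settle=timedelta(days=7), prefer_latest=True):
    """Keep one product per capture, and only captures older than `settle`.

    prefer_latest=True keeps the most recently ingested version (option 1);
    prefer_latest=False keeps the first ingested one (option 2).
    """
    best = {}
    for name, ingested in records:
        key = name.rsplit("_", 1)[0]  # name without the trailing suffix
        when = parse_ingestion(ingested)
        current = best.get(key)
        if current is None or (when > current[1]) == prefer_latest:
            best[key] = (name, when)
    return [name for name, _ in best.values() if now - capture_start(name) >= settle]


# Assumed pairing of names and ingestion dates, for illustration only.
records = [
    ("S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E_083F", "2022-03-12T12:15:12.011Z"),
    ("S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E_5146", "2022-03-16T00:57:24.183Z"),
    ("S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E_BA14", "2022-03-16T14:45:34.97Z"),
]
now = datetime(2022, 3, 20, tzinfo=timezone.utc)
print(select_products(records, now))
print(select_products(records, now, prefer_latest=False))
```

The 7-day default mirrors the delay mentioned above; it is a guess, not a documented upper bound on how late a new version can appear.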
Thank you in advance for your help and insight.
(sorry for non-hypertext links, can’t post more than 2)