The best approach would be to split this into chunks of about a year's worth of data. In principle such a long time series should work, but it also depends on the area (the size of the geometry).
Best,
Hi,
Thanks for your quick reply!
The area of our polygons is usually a few hectares, but can be up to several hundred. I also tested really small ones, less than 0.1 ha, which time out as well.
Is there a rule of thumb for setting the intervals (a year or shorter) when splitting into chunks, perhaps depending on the polygon area?
All the best!
Hi,
Timeouts can happen when too much data has to be read in order to fulfil your request. That can occur for several reasons:
- too many observations in the time range (a too-long time range, a very dense time series such as daily observations, a polygon lying on the intersection of several S-2 tiles, …)
- too many bands requested
- too large area requested
The last two would just “tip the scale” when the requested time range is already at the limit. Splitting into one-year chunks is therefore good practice in any case. Beyond that, it is unfortunately, even for us at the moment, a bit of “try and see if it works”.
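Splitting a long time range into roughly one-year chunks takes only a few lines of Python. This is just an illustrative sketch, not part of any Sentinel Hub SDK:

```python
from datetime import date, timedelta

def split_time_range(start, end, max_days=365):
    """Split [start, end] (inclusive) into consecutive chunks of at most max_days days."""
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks

# One Statistical API request per chunk instead of a single 4+ year request
chunks = split_time_range(date(2018, 1, 1), date(2022, 4, 3))
```

Each chunk then becomes the `timeRange` of its own request.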
Following on from the discussion above.
We implemented splitting the period into chunks of 180 days and tested it. For each chunk, the Statistical API is requested asynchronously.
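Roughly, the fan-out looks like this (a simplified sketch; `fetch_statistics` is a stand-in for our actual Statistical API call, not a real SDK function):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_statistics(interval):
    """Stand-in for the real Statistical API request covering one 180-day chunk."""
    start, end = interval
    # ... build and send the Statistical API request here ...
    return {"interval": (start, end), "data": None}

intervals = [("2018-01-01", "2018-06-29"), ("2018-06-30", "2018-12-26")]

# Fire one request per chunk; map() preserves chunk order in the results.
with ThreadPoolExecutor(max_workers=len(intervals)) as pool:
    results = list(pool.map(fetch_statistics, intervals))
```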
Still, some of the requests time out, and compared to FIS (from which we want to migrate due to its deprecation) the requests are very slow: they take from 80 to more than a thousand seconds, while a FIS request for the same data takes around 6 seconds.
We could split the period into even smaller chunks, but then we would burn through quite a few requests and also generate much more traffic.
We are requesting data for agricultural fields, usually around 10–20 ha, so a large area should not be the reason for the slowness.
Does anyone have any thoughts on this?
Hi,
Could you provide some examples of the larger timed-out requests (you can send them directly to me via https://zerobin.net/ or similar) so we can investigate? We’re constantly trying to improve our services, so any such reports are very helpful.
If you make the chunks smaller you’ll certainly get shorter and more deterministic execution times, and the traffic (in your direction) won’t increase that much.
Hi,
It seems to me that the problem arises when we send several (many) requests to the Statistical API at once.
When our users register a new polygon, we request cloud cover for all available dates since 2018 from Sentinel Hub.
It is a common case that users register tens of polygons at a time. Then our workers pick the polygons and process them in parallel, sending requests to Sentinel Hub to get the time-series.
As a single request for the whole time period tends to time out, we tried splitting it into 180-day chunks, which gives 9 requests. A single request for a 180-day period usually takes a few seconds, and all 9 requests processed in series take about a minute. To speed this up, we send all 9 requests in parallel, and here it seems that the Statistical API does not scale well: the first request returns quickly, but the response times of the remaining ones grow rapidly, and some may time out.
It helps if we wait a second before sending the next request, but that only works when a single polygon is processed at a time, and I don’t think it is the right way to solve the issue. If our users register 30 polygons, we send 9 × 30 requests to the Statistical API in a very short time.
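The 1-second delay is currently just a fixed sleep before each request. A shared throttle that enforces a minimum interval between request starts across all workers would look something like this sketch (the `Throttle` class is our own illustration, not an SDK feature):

```python
import threading
import time

class Throttle:
    """Enforce a minimum interval between request starts, shared across threads."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._last_start = float("-inf")

    def wait(self):
        # Holding the lock while sleeping serializes request starts on purpose,
        # so no two workers ever start a request less than min_interval apart.
        with self._lock:
            delay = self._last_start + self.min_interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            self._last_start = time.monotonic()

throttle = Throttle(min_interval=1.0)
# Each worker calls throttle.wait() right before sending its request.
```

This paces all polygons' requests together rather than per polygon.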
Here are some examples of request timings. The polygon size is about 0.2 ha.
Requests sent in parallel:
(‘2018-01-01’, ‘2018-06-29’): start at 2022-04-03T08:57:18.147204Z
(‘2018-06-30’, ‘2018-12-26’): start at 2022-04-03T08:57:18.157782Z
(‘2020-12-16’, ‘2021-06-13’): start at 2022-04-03T08:57:18.160281Z
(‘2018-12-27’, ‘2019-06-24’): start at 2022-04-03T08:57:18.162932Z
(‘2020-06-19’, ‘2020-12-15’): start at 2022-04-03T08:57:18.164990Z
(‘2019-12-22’, ‘2020-06-18’): start at 2022-04-03T08:57:18.165364Z
(‘2019-06-25’, ‘2019-12-21’): start at 2022-04-03T08:57:18.166185Z
(‘2021-06-14’, ‘2021-12-10’): start at 2022-04-03T08:57:18.177563Z
(‘2021-12-11’, ‘2022-04-03’): start at 2022-04-03T08:57:18.180407Z
(‘2020-06-19’, ‘2020-12-15’): finished after 4 seconds
(‘2021-12-11’, ‘2022-04-03’): finished after 6 seconds
(‘2018-06-30’, ‘2018-12-26’): finished after 130 seconds
(‘2018-01-01’, ‘2018-06-29’): finished after 134 seconds
(‘2019-06-25’, ‘2019-12-21’): finished after 136 seconds
(‘2019-12-22’, ‘2020-06-18’): finished after 264 seconds
(‘2020-12-16’, ‘2021-06-13’): finished after 430 seconds
(‘2021-06-14’, ‘2021-12-10’): finished after 433 seconds
(‘2018-12-27’, ‘2019-06-24’): finished after 435 seconds
Sleeping 1 sec between requests:
(‘2018-01-01’, ‘2018-06-29’): start at 2022-04-03T09:09:08.377943Z
(‘2018-06-30’, ‘2018-12-26’): start at 2022-04-03T09:09:09.224317Z
(‘2018-12-27’, ‘2019-06-24’): start at 2022-04-03T09:09:10.222963Z
(‘2019-06-25’, ‘2019-12-21’): start at 2022-04-03T09:09:11.225285Z
(‘2019-12-22’, ‘2020-06-18’): start at 2022-04-03T09:09:12.225422Z
(‘2018-01-01’, ‘2018-06-29’): finished after 3 seconds
(‘2020-06-19’, ‘2020-12-15’): start at 2022-04-03T09:09:13.227838Z
(‘2020-12-16’, ‘2021-06-13’): start at 2022-04-03T09:09:14.229192Z
(‘2019-12-22’, ‘2020-06-18’): finished after 2 seconds
(‘2021-06-14’, ‘2021-12-10’): start at 2022-04-03T09:09:15.232453Z
(‘2021-12-11’, ‘2022-04-03’): start at 2022-04-03T09:09:16.235208Z
(‘2020-06-19’, ‘2020-12-15’): finished after 3 seconds
(‘2020-12-16’, ‘2021-06-13’): finished after 2 seconds
(‘2021-06-14’, ‘2021-12-10’): finished after 2 seconds
(‘2021-12-11’, ‘2022-04-03’): finished after 2 seconds
(‘2018-06-30’, ‘2018-12-26’): finished after 10 seconds
(‘2019-06-25’, ‘2019-12-21’): finished after 9 seconds
(‘2018-12-27’, ‘2019-06-24’): finished after 13 seconds
If you need more info, I will be glad to provide it. Thank you!
Could you please provide, via direct message, the UID of the account you were/are using for the stat requests, so we can investigate things properly?
Please let me know if the UID reached you. I could not find out how to send a direct message here; I have a vitsyrovat account at gmail.
I did some investigation. What you were experiencing is a known usability issue that happens when we receive bursts of requests from multiple users (at the same time) during periods when the whole API is otherwise very idle. Since the infrastructure needs to scale by some orders of magnitude (relatively), there is a delay that the end user experiences. We already have plans in our roadmap to mitigate these issues and improve the overall user experience.
Thank you for your investigation. Approximately when can we expect the improved experience?