
S2CLOUDLESS: get_cloud_masks is very slow on large bbox

  • April 26, 2024
  • 13 replies
  • 102 views

Hi!

I'm still working on my script for temporal analysis, but I'm stuck because the .get_cloud_masks call is very slow on a large area.

I use BBoxSplitter to split my very big area into bboxes under 5000 pixels in width or height:

import numpy as np

largeur = int(np.ceil((Xmax_utm - Xmin_utm)/10))  # image width in pixels (10 m resolution)
hauteur = int(np.ceil((Ymax_utm - Ymin_utm)/10))  # image height in pixels
print('\nArea pixel size: {0} x {1}'.format(largeur, hauteur))

>>>Area pixel size: 9968 x 7245

if largeur > 5000 or hauteur > 5000:  # if the width or height exceeds 5000 pixels
    if largeur > 5000:
        L = int(np.ceil(largeur/5000))
        print('%s cells wide' % (L))
    else:
        L = 1
    if hauteur > 5000:
        H = int(np.ceil(hauteur/5000))
        print('%s cells high' % (H))
    else:
        H = 1

>>>2 cells wide
>>>2 cells high

Here is an illustration: [image not included]
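
For reference, the split itself can be done with sentinelhub's BBoxSplitter; a minimal sketch, assuming the area is in UTM zone 31N (use the zone that matches your area) and reusing the L and H computed above:

from sentinelhub import BBox, BBoxSplitter, CRS

area_bbox = BBox((Xmin_utm, Ymin_utm, Xmax_utm, Ymax_utm), crs=CRS.UTM_31N)  # assumed CRS

# Split the area into an L x H grid of sub-bboxes.
splitter = BBoxSplitter([area_bbox.geometry], area_bbox.crs, split_shape=(L, H))
sub_bboxes = splitter.get_bbox_list()
print('%s sub-bboxes' % len(sub_bboxes))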

I'm testing it on only 3 dates and it's already slow, so I can't imagine 3 years…
Do you have any ideas to speed it up?

13 replies

  • 4852 posts
  • April 26, 2024

Hi,

we usually run cloud detection at a lower resolution. We have found that running cloud detection at 160 m x 160 m resolution gives good results. Of course the post-processing parameters need to be adjusted accordingly; we usually set them to average_over=2 and dilation_size=1. If you do this you should see a speed-up by a factor of 256.
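
A minimal sketch of what that could look like, assuming bands_160m is a NumPy array of L1C reflectances already resampled to 160 m, with shape (dates, height, width, 13):

from s2cloudless import S2PixelCloudDetector

detector = S2PixelCloudDetector(
    threshold=0.4,     # default probability threshold
    average_over=2,    # post-processing value for 160 m
    dilation_size=1,   # post-processing value for 160 m
    all_bands=True,    # bands_160m holds all 13 L1C bands
)
cloud_masks = detector.get_cloud_masks(bands_160m)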


  • Author
  • 3963 posts
  • April 26, 2024

OK, maybe it's a good option, but I'm a little confused: I want the cloud percentage on agricultural parcels, so 160 m resolution could be too coarse… That's why I have been using 10 m resolution until now.
I'm still going to try this solution.


  • 4852 posts
  • April 26, 2024

Cloud detection is in any case a somewhat statistical exercise, and it is not pixel-accurate. It is perhaps worth exploring several options, e.g. 20 m, 40 m, 80 m, 160 m, to see which one produces the best price/performance result. E.g. 20 m will be 4 times as fast, 40 m 16 times as fast…
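
Since the cost scales with the number of pixels, the theoretical speed-up relative to 10 m grows quadratically with the resolution; a quick check of the factors mentioned above:

for res in (20, 40, 80, 160):
    # e.g. 20 m -> 4x, 40 m -> 16x, 160 m -> 256x
    print('%s m -> roughly %dx faster than 10 m' % (res, (res / 10) ** 2))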


  • Author
  • 3963 posts
  • April 26, 2024

Thanks for all this information. I'll try to find the best option.


  • 4852 posts
  • April 26, 2024

s2cloudless uses a pre-trained gradient-boosted decision-tree model for cloud classification. In the background all of this is handled by the lightgbm package, which is highly optimized for performance (speed and memory). By default it uses all processing cores available on your computer and can even run on a GPU.

Therefore one way to improve speed would be to run your code on a machine with more processors.


  • Author
  • 3963 posts
  • April 26, 2024

Yes, it will run on a JupyterHub on a Google server, so we can adjust the CPU power and the number of cores. I have to check that with my dev team. But does s2cloudless support multi-threading?


  • 4852 posts
  • April 26, 2024

Yes, s2cloudless always uses multiple processors and multiple threads, because lightgbm works that way by default.
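
If you need to check or cap the core usage (e.g. on a shared JupyterHub), one option, since lightgbm is built on OpenMP, is the standard OpenMP environment variable; a sketch (the value 8 is just an example):

import multiprocessing
import os

print('available cores:', multiprocessing.cpu_count())

# Cap the OpenMP thread pool that lightgbm uses; set this before
# lightgbm / s2cloudless are imported.
os.environ['OMP_NUM_THREADS'] = '8'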


  • 4852 posts
  • April 26, 2024

When running cloud detection at a lower resolution, don't forget to adjust the post-processing parameters (average_over and dilation_size). At 10 m resolution the values that work best are 22 and 11, respectively.

The recommended values are roughly:

Resolution [m]    average_over    dilation_size
10                22              11
20                11              6
40                6               3
80                3               2
160               2               1
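
For convenience, the table can be turned into a small helper (not part of s2cloudless) that builds a detector with the recommended values:

from s2cloudless import S2PixelCloudDetector

RECOMMENDED = {  # resolution [m] -> (average_over, dilation_size)
    10: (22, 11),
    20: (11, 6),
    40: (6, 3),
    80: (3, 2),
    160: (2, 1),
}

def detector_for_resolution(res_m, **kwargs):
    average_over, dilation_size = RECOMMENDED[res_m]
    return S2PixelCloudDetector(average_over=average_over,
                                dilation_size=dilation_size, **kwargs)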

  • Author
  • 3963 posts
  • April 26, 2024

For my use case I need to run this at 1 m resolution; can you recommend values for that?


  • 4852 posts
  • April 26, 2024

Why would you run it at 1 m if the resolution of the Sentinel-2 data is 10 m? You will not get any better results, yet you will use 100 times more compute resources…
(or are you using some other data source?)


  • Author
  • 3963 posts
  • April 26, 2024

At 10 m the images are way too pixelated…
I am making a Sentinel Hub WCS request with resx and resy set to 1 m and retrieving data for all bands.
Part of this activity is cloud masking.
The bounding box will always be smaller than a zoom level 14 tile.


  • 4852 posts
  • April 26, 2024

They might be pixelated, but this is the original data. When making requests with resx/resy = 1 m, you get 10 m resolution data interpolated to 1 m. This is useful in many ways, but you should not assume that the actual resolution is 1 m…
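
This is easy to verify by comparing array shapes; a sketch with a hypothetical 1 km x 1 km bbox and layer name (instance_id/config assumed to be set up elsewhere):

from sentinelhub import BBox, CRS, WcsRequest

bbox = BBox((460000, 5470000, 461000, 5471000), crs=CRS.UTM_31N)  # hypothetical

native = WcsRequest(layer='TRUE-COLOR', bbox=bbox, resx='10m', resy='10m').get_data()
upsampled = WcsRequest(layer='TRUE-COLOR', bbox=bbox, resx='1m', resy='1m').get_data()

print(native[0].shape)     # about (100, 100, 3): native 10 m pixels
print(upsampled[0].shape)  # about (1000, 1000, 3): same data interpolated to 1 m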


  • Author
  • 3963 posts
  • April 26, 2024

Okay… Thanks… I will keep that in mind…

Is there any way of converting a 10 m image to a 1 m image afterwards? I need it to display the true-color imagery.