
Hello.


I’m trying to figure out the difference between L2A products downloaded from the Planetary Computer service vs the Sentinel Hub service.


What I do:

I’m trying to download MSI GeoTIFFs for the same polygon and the same date, but I get totally different products.


I understand SH is, “under the hood”, applying some processing + mosaicking to cover the whole area of interest without the empty parts of the original product. Most concerning are the different value ranges of the two products. In terms of values, Planetary Computer seems much closer to the data from Copernicus Hub, which is quite raw.


Do you know if I can somehow convert the values from both data providers to get the same, or at least a similar, distribution / value range?


The SH evalscript is simple:


ALL = """
//VERSION=3
function setup() {
    return {
        input: [{
            bands: ["B01", "B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B09", "B11", "B12", "SCL"],
            units: "DN"
        }],
        output: {
            bands: 13,
            sampleType: "INT16"
        }
    };
}

function evaluatePixel(sample) {
    return [sample.B01,
            sample.B02,
            sample.B03,
            sample.B04,
            sample.B05,
            sample.B06,
            sample.B07,
            sample.B08,
            sample.B8A,
            sample.B09,
            sample.B11,
            sample.B12,
            sample.SCL];
}
"""

The Planetary Computer stackstac request doesn’t involve any special treatment either, apart from NaN handling:


        data = (
            stackstac.stack(
                signed_items,
                epsg=self.default_epsg,  # 4326
                resolution=self.default_raster_resolution,  # 10
                assets=assets,  # MSI bands list
                chunksize=self.default_chunksize,
                bounds_latlon=bbox,
            )
            .where(lambda x: x > 0, other=np.nan)
            .assign_coords(band=assets)
        )

        return data

Data preview for both products in QGIS:


Thanks for your time!

Hi @mz-fourpoint,

the biggest difference you are noticing most likely comes from the harmonize-values parameter, which was introduced a few months ago with the new S2 L2A processing baseline, in order to help with the consistency of long time series. If you set it to false, the results should be much more similar.

I see you are using EPSG:4326, which introduces another set of (small) differences, due to reprojection.


If you set up the request correctly (UTM CRS, 10-meter resolution, harmonize = false) and perfectly align your request with the UTM grid, you should get exactly the same values as they are in Copernicus Hub.
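To illustrate the alignment step: a minimal, pure-Python sketch (hypothetical helper names, not part of any library) of snapping a UTM bounding box outwards onto the 10 m Sentinel-2 pixel grid. The snapped coordinates could then be passed to sentinelhub-py's `BBox` in the tile's UTM CRS and to `bbox_to_dimensions(..., resolution=10)` when building the request.

```python
PIXEL = 10  # Sentinel-2 pixel grid at 10 m resolution, in metres

def snap_down(v, step=PIXEL):
    # floor a UTM coordinate to a multiple of the pixel size
    return int(v // step) * step

def snap_up(v, step=PIXEL):
    # ceil a UTM coordinate to a multiple of the pixel size
    return -int(-v // step) * step

def snap_bbox(min_x, min_y, max_x, max_y):
    # grow the box outwards so its edges sit exactly on the UTM pixel grid
    return (snap_down(min_x), snap_down(min_y), snap_up(max_x), snap_up(max_y))

print(snap_bbox(499993.7, 5199998.2, 501004.1, 5201001.9))
# → (499990, 5199990, 501010, 5201010)
```

Growing the box outwards (floor the minima, ceil the maxima) guarantees the requested window fully covers the original AOI while every edge lands on the grid.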


Hi,


Thanks for your insight, an excellent suggestion.


I ran a few more tests and was able to set harmonizeValues in an SH request generated with your SH request builder.


        img_request = SentinelHubRequest(
            data_folder=data_path,
            evalscript=evalscript,
            input_data=[
                SentinelHubRequest.input_data(
                    data_collection=data_collection,
                    time_interval=time_interval,
                    mosaicking_order="leastCC",
                    other_args={"processing": {"harmonizeValues": False}},
                )
            ],
            responses=[SentinelHubRequest.output_response("default", MimeType.TIFF)],
            bbox=object_bbox,
            size=object_size,
            config=self.configure_access_data(),
        )

The values look much more similar now (with the raw Copernicus data tile for B02 included as a reference):


I have another question. Is there an easy way to transform raw (non-harmonized) S2 L2A values into SH harmonized L2A values? I have a feeling it’s not that simple, since harmonization likely takes place for both L1C and L2A during the product creation chain, as presented here: https://scihub.copernicus.eu/news/News00931.


Thanks for your advice!


Not sure if I understand what you would like to do. Can you elaborate a bit on this?


That said, both L1C and L2A (“raw”) data are affected by the same newly added offset (i.e. after the baseline change). So if harmonizeValues=true, we remove that same offset, in order to make the data comparable/consistent with what it was before the baseline change.

If you use harmonizeValues=false, you get the data exactly as it is (i.e. we fetch it from the original files and stream it to you). But if you are interested in time series, you will probably have an issue, as the data within that time series will contain a disruptive change.
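For concreteness: since processing baseline 04.00 (deployed around 25 January 2022), Sentinel-2 L1C/L2A DN values carry a +1000 offset (BOA_ADD_OFFSET = -1000). A minimal numpy sketch of the two conversions, assuming harmonization simply subtracts that offset and clamps at zero:

```python
import numpy as np

OFFSET = 1000             # DN offset introduced with processing baseline 04.00
QUANTIFICATION = 10000.0  # divide DN by this to get reflectance

def harmonize(dn):
    # post-PB04 "raw" DN -> values comparable with pre-PB04 data (clamped at 0)
    return np.clip(dn.astype(np.int32) - OFFSET, 0, None).astype(np.uint16)

def to_reflectance(dn):
    # reflectance = (DN + BOA_ADD_OFFSET) / QUANTIFICATION_VALUE for post-PB04 data
    return (dn.astype(np.float32) - OFFSET) / QUANTIFICATION

raw = np.array([900, 1000, 2500], dtype=np.uint16)
print(harmonize(raw).tolist())  # → [0, 0, 1500]
```

Note that the clamp makes harmonization lossy below the offset value, so going back from harmonized to raw is only exact for pixels that were not clamped.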


Sure. I’m not a GIS expert, rather a Python/ML dev.


The main goal is to combine data from different providers, like Sentinel Hub, and feed it into DL models (for semantic segmentation, object detection, etc.). We’re trying to harmonize data from several providers to keep it useful in the long run.


What concerns us about this approach: if something changes (like the baseline in January 2022), the data distribution can become inconsistent with the historical training data we have been downloading in weekly batches for a long time. We’d like to retain control, so we can apply bigger changes manually on our side when product processing changes over time.


This is why I asked whether we can freely apply transformations to transition between the “raw” and “harmonized” SH products that we already have. 🙂


Thank you for your time!


I see. For data coming from Sentinel Hub, I see two options:


  • you fetch data without harmonization, i.e. original values, and implement the offset calculation on your side (for the past and for any forthcoming baseline changes), in order to have the data “harmonized”

  • you fetch data with harmonizeValues=true, which will make the data harmonized and comparable with the old data in your archive (if the old data contain original values, they should be the same regardless of whether you got them from Sentinel Hub, Copernicus Hub or Planetary Computer); if you want to have control, you can still apply the transformation “backwards” (in the case of PB4, the transformation can be done both ways).
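The first option can be sketched as a baseline-gated offset. Reading the baseline per product is an assumption here; as one example, Planetary Computer exposes it as the `s2:processing_baseline` STAC property:

```python
import numpy as np

OFFSET = 1000  # DN offset introduced with processing baseline 04.00

def major_baseline(processing_baseline: str) -> int:
    # "04.00" -> 4, "03.01" -> 3
    return int(processing_baseline.split(".")[0])

def to_harmonized(dn: np.ndarray, processing_baseline: str) -> np.ndarray:
    # remove the PB04 offset only for products processed with baseline >= 04.00
    if major_baseline(processing_baseline) >= 4:
        return np.clip(dn.astype(np.int32) - OFFSET, 0, None).astype(dn.dtype)
    return dn

def to_raw(dn: np.ndarray, processing_baseline: str) -> np.ndarray:
    # inverse direction; exact only where harmonized values were not clamped at 0
    if major_baseline(processing_baseline) >= 4:
        return (dn.astype(np.int32) + OFFSET).astype(dn.dtype)
    return dn
```

Gating on the product's own baseline (rather than on sensing date) keeps the archive consistent even when reprocessed products with a newer baseline appear for older dates.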

