Hi,
there is technically no limit (yet), but our system did flag your requests as faulty because they were very small - each less than 1 PU in size. This is not what Batch was designed for, so it temporarily halted the activity.
I sent you an e-mail about this roughly 30 minutes ago.
Could you explain what you are trying to do, so that we understand the use-case? Batch processing was designed for large-scale orders. These requests look like they could easily be handled with the process API.
Best
Hi, thanks for the quick reply.
I am trying to get all S1 data for specific AOIs for the 2014-2021 time period. Do you think the AOIs are small? I ask because I did not have similar issues in the past with similar types of AOIs. They are in the range of 60 sq. km to 150 sq. km. We input a CSV file with the list of requests, then form these requests and send them over.
I have written a whole pipeline using batch requests, and having to change it to the process API would be quite painful right now unless you can suggest a quick way.
A request of this size would be processed with the process API in roughly 5 seconds, and you would get the results much faster.
We have now re-enabled your account, hopefully it does not get flagged once again.
That said, such use of Batch is not recommended and will almost certainly be prevented in the future. Batch was optimized for large-scale requests (thousands, hundreds of thousands, even millions of sq. km); it consumes fewer compute resources, and we can pass these savings on to the user.
For such small requests, however, Batch consumes about 10 times as many resources as the normal API.
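For reference, a single small-AOI request against the process API could look roughly like the sketch below. This is only an illustration, not an official example: the bbox, output size, time range and the "S1GRD" collection identifier are placeholder assumptions you would adapt to your own AOIs.

from oauthlib.oauth2 import BackendApplicationClient
from requests_oauthlib import OAuth2Session

# Authenticate against Sentinel Hub (same OAuth pattern as the Batch example further below)
client = BackendApplicationClient(client_id="<client_id>")
oauth = OAuth2Session(client=client)
oauth.fetch_token(token_url="https://services.sentinel-hub.com/oauth/token",
                  client_id="<client_id>", client_secret="<client_secret>")

# Tiny evalscript returning the VV band of S1 GRD data
evalscript = """
//VERSION=3
function setup() {
  return { input: ["VV"], output: { bands: 1, sampleType: "FLOAT32" } };
}
function evaluatePixel(sample) {
  return [sample.VV];
}
"""

# Placeholder bbox of roughly 80-90 sq. km in WGS84; output size chosen to stay below 2500x2500 px
payload = {
    "input": {
        "bounds": {
            "bbox": [13.35, 45.95, 13.45, 46.05],
            "properties": {"crs": "http://www.opengis.net/def/crs/EPSG/0/4326"}
        },
        "data": [{
            "type": "S1GRD",
            "dataFilter": {"timeRange": {"from": "2021-01-01T00:00:00Z",
                                         "to": "2021-01-31T23:59:59Z"}}
        }]
    },
    "output": {
        "width": 1024, "height": 1024,
        "responses": [{"identifier": "default", "format": {"type": "image/tiff"}}]
    },
    "evalscript": evalscript
}

# One synchronous call; the GeoTIFF comes back in the response body
response = oauth.post("https://services.sentinel-hub.com/api/v1/process", json=payload)
with open("result.tif", "wb") as f:
    f.write(response.content)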
Hi,
Thanks! We were originally going to start with the process API, but when we asked in this thread, you suggested we use the batch processing API:
Hi @chinmay, the “area being too large” can be easily solved by splitting the request into smaller parts; the sentinelhub-py SDK has a helper function for that (large area utilities). That being said, the use-case you describe seems perfect for Batch processing - there you set the configuration for processing (probably just outputting data) and the AWS S3 bucket where you would like to have the data stored (it needs to be configured properly), and voila, you will have the data there. Batch processing …
Did something change? Also, I couldn't find an example for the process API that stores data on an S3 bucket. Can you please share one?
I guess it is about the “large area” part. The max size for the process API is 2500x2500 px, which means 625 sq. km at 10 m resolution. 60-150 sq. km is below this threshold and could therefore easily be handled by the process API.
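And if an AOI ever did exceed that 2500x2500 px limit, the large area utilities mentioned earlier can split it into parts; a minimal sketch with sentinelhub-py (the coordinates and the 2x2 split below are just placeholder assumptions) might look like this:

from sentinelhub import BBox, BBoxSplitter, CRS

# Placeholder WGS84 bounding box that is too large for a single process API request
large_aoi = BBox([13.0, 45.5, 14.0, 46.5], crs=CRS.WGS84)

# Split it into a 2x2 grid; each resulting bbox can then go into its own process API request
splitter = BBoxSplitter([large_aoi.geometry], CRS.WGS84, split_shape=(2, 2))
small_bboxes = splitter.get_bbox_list()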
If using Batch, you could/should request data for the full temporal stack in the chosen period, i.e. 3 months at a time. This would reduce the number of orders by an order of magnitude and make the whole process more efficient.
Find below sample Python code to request data for all temporal periods (it is written for S2 L2A, but you can adapt it for S1).
from oauthlib.oauth2 import BackendApplicationClient
from requests_oauthlib import OAuth2Session
from shapely.geometry import mapping
import geopandas as gpd
import time


def run_batch_requests(client_id, client_secret, aoi, crs, start, end, grid_id, grid_res, bucket_name,
                       data='S2L2A', mosaicking_order='mostRecent', max_cloud_coverage=100,
                       upsampling='NEAREST', downsampling='NEAREST', descr='default'):
    # Authentication: fetch an OAuth2 token for the Sentinel Hub services
    client = BackendApplicationClient(client_id=client_id)
    oauth = OAuth2Session(client=client)
    oauth.fetch_token(token_url='https://services.sentinel-hub.com/oauth/token',
                      client_id=client_id, client_secret=client_secret)

    # Evalscript: output B, G, R and cloud-mask bands, one band per observation (ORBIT mosaicking)
    evalscript = """
    //VERSION=3
    function setup() {
        return {
            input: [{
                bands: ["B02", "B03", "B04", "CLM"],
                units: "DN"
            }],
            output: [
                { id: "B", bands: 1, sampleType: SampleType.UINT16 },
                { id: "G", bands: 1, sampleType: SampleType.UINT16 },
                { id: "R", bands: 1, sampleType: SampleType.UINT16 },
                { id: "cloud_mask", bands: 1, sampleType: SampleType.UINT8 }
            ],
            mosaicking: "ORBIT"
        }
    }

    function updateOutput(outputs, collection) {
        // Set the number of output bands to the number of acquisitions found
        Object.values(outputs).forEach((output) => {
            output.bands = collection.scenes.length;
        });
    }

    function evaluatePixel(samples) {
        var n_observations = samples.length;
        let band_b = new Array(n_observations).fill(0);
        let band_g = new Array(n_observations).fill(0);
        let band_r = new Array(n_observations).fill(0);
        let band_clm = new Array(n_observations).fill(0);
        samples.forEach((sample, index) => {
            band_b[index] = sample.B02;
            band_g[index] = sample.B03;
            band_r[index] = sample.B04;
            band_clm[index] = sample.CLM;
        });
        return {
            B: band_b,
            G: band_g,
            R: band_r,
            cloud_mask: band_clm
        };
    }
    """

    # Build the "bounds" part of the payload from either a bbox list or a vector file path
    if isinstance(aoi, list):
        bounds = {"bbox": aoi, "properties": {"crs": crs}}
    elif isinstance(aoi, str):
        read_fi = gpd.read_file(aoi)
        geom = read_fi.geometry.iloc[0]
        if geom.geom_type not in ("Polygon", "MultiPolygon"):
            print("Error: the first feature of the file should be a Polygon or MultiPolygon")
            return
        # Convert the (Multi)Polygon to a GeoJSON-style geometry dict
        bounds = {"geometry": mapping(geom), "properties": {"crs": crs}}
    else:
        print("Error: aoi should be a bbox list or a path to a shapefile/GeoJSON file")
        return

    # Batch processing payload
    payload = {
        "processRequest": {
            "input": {
                "bounds": bounds,
                "data": [
                    {
                        "type": data,
                        "dataFilter": {
                            "timeRange": {
                                "from": f"{start}T00:00:00Z",
                                "to": f"{end}T23:59:59Z"
                            },
                            "mosaickingOrder": mosaicking_order,
                            "maxCloudCoverage": max_cloud_coverage
                        },
                        "processing": {
                            "upsampling": upsampling,
                            "downsampling": downsampling
                        }
                    }
                ]
            },
            "output": {
                "responses": [
                    {"identifier": "B", "format": {"type": "image/tiff"}},
                    {"identifier": "G", "format": {"type": "image/tiff"}},
                    {"identifier": "R", "format": {"type": "image/tiff"}},
                    {"identifier": "cloud_mask", "format": {"type": "image/tiff"}}
                ]
            },
            "evalscript": evalscript
        },
        "tilingGrid": {
            "id": grid_id,
            "resolution": grid_res
        },
        "bucketName": bucket_name,
        "description": descr
    }

    # Create the batch request
    response = oauth.request("POST", "https://services.sentinel-hub.com/api/v1/batch/process",
                             headers={'Content-Type': 'application/json'}, json=payload)
    print(response)

    # Get the batch request id
    batch_request_id = response.json()['id']

    # Start processing
    oauth.request("POST", f"https://services.sentinel-hub.com/api/v1/batch/process/{batch_request_id}/start")

    # Poll the status; restart partially failed runs up to 3 times
    status = oauth.request("GET", f"https://services.sentinel-hub.com/api/v1/batch/process/{batch_request_id}").json()['status']
    count = 1
    while status != "DONE":
        if status == "FAILED":
            print(f"Batch: {batch_request_id} is {status}.")
            break
        elif status == "PARTIAL":
            if count < 4:
                print(f"Batch: {batch_request_id} has PARTIALLY failed. Start re-processing all failed tiles.")
                oauth.request("POST", f"https://services.sentinel-hub.com/api/v1/batch/process/{batch_request_id}/restartpartial")
                count += 1
                time.sleep(30)
                status = oauth.request("GET", f"https://services.sentinel-hub.com/api/v1/batch/process/{batch_request_id}").json()['status']
            else:
                print(f"Stop re-processing Batch: {batch_request_id} after 3 tries.")
                break
        else:
            print(f"Batch: {batch_request_id} is {status}.")
            time.sleep(30)
            status = oauth.request("GET", f"https://services.sentinel-hub.com/api/v1/batch/process/{batch_request_id}").json()['status']
    else:
        print(f"Batch: {batch_request_id} is {status}.")
    return
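As a sketch of the "3 months at a time" suggestion above, the function could then be driven by a simple loop over quarterly windows; the credentials, AOI path, CRS, tiling grid and bucket name below are placeholders:

# Placeholder 3-month windows; extend the list (or generate it) up to the end of 2021
periods = [
    ("2014-01-01", "2014-03-31"),
    ("2014-04-01", "2014-06-30"),
    # ...
]
for start, end in periods:
    run_batch_requests("<client_id>", "<client_secret>",
                       aoi="aoi.geojson",
                       crs="http://www.opengis.net/def/crs/EPSG/0/4326",
                       start=start, end=end,
                       grid_id=1, grid_res=10,
                       bucket_name="my-batch-bucket")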
I do not have an example at hand that stores to S3, but this one stores to “disk” and it should be straightforward to modify it to store to S3 instead.
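If you do want the output in S3 rather than on disk, one straightforward option (a sketch, assuming boto3 credentials are already configured; "my-bucket" and the object key are placeholders) is to upload the response bytes yourself:

import boto3

# `response` holds the process API result, as in the process API sketch earlier in the thread
s3 = boto3.client("s3")
s3.put_object(Bucket="my-bucket", Key="s1/aoi_01/result.tif", Body=response.content)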
Just adding that Batch is available with sentinelhub-py as well: have a look at the example.
Batch processing is currently accessible to Enterprise users only, so this might be the case…
If I just started a trial, shouldn't my account have access to all features? Perhaps a follow-up question is whether the example should still use batch processing or if there's another process it should use.
Trial accounts support almost all features, with the exception of:
- Batch processing
- Querying and purchasing commercial data
We have examples of many of our services published on GitHub; this one requires Batch.