STAC, COG, Python and QGIS

  • blog

I’ve spent a lot of time in the past trying to find satellite data. What the SpatioTemporal Asset Catalog (STAC) does is to build a common specification for how Satellite images (read Assets, they can include other things like drone imagery) are queried and found online. Why should we care about this? Well, for a start, if all imagery is supplied as STAC assets then we can resuse our search queries for any collection of images. For a developer or end user, this makes life simpler. Learn more about the STAC and its spec here. As for a COG, the definition from the cogeo.org website covers all the main points:

‘A Cloud Optimized GeoTIFF (COG) is a regular GeoTIFF file, aimed at being hosted on a HTTP file server, with an internal organization that enables more efficient workflows on the cloud. It does this by leveraging the ability of clients issuing ​HTTP GET range requests to ask for just the parts of a file they need. ‘

https://www.cogeo.org/

Python is the most commonly used programming language for Geospatial and QGIS is the most commonly used open source GIS. There are other languages and softwares available.

For both parts I am drawing on the excellent resources on STAC, found below:

In this post I am going to step through how to search for a Sentinel 2 image with a geojson file (and a Shapefile in QGIS) and plot it using Python. We are lucky to have a great library called sat search that helps us with this. Then I’ll show how you can make some adjustments and get the code to work in QGIS as a script.

Installation

In your Python interpeter use the following command to install sat search:

pip install sat-search 

You will also need Geopandas, Matplotlib and Rasterio. If you need help getting started, I’ve written a whole series on Python for Geospatial workflows. Starting here.

Setup and test

Use the example from the resources above to check if sat-search is working:

from satsearch import Search
Search = Search(bbox=[-110, 39.5, -105, 40.5], url='https://earth-search.aws.element84.com/v0')
print('bbox search: %s items' % search.found())

This should return (at the time of writing):

bbox search: 20174 items 

As long as you get a count of items back you should be good to go. Hopefully you can see that it’s parsing a bounding box and a url containing the STAC API end point.

Building a custom search and plotting

The next step is to customise the search to what you are searching for. Let’s load a geojson file called map.geojson, get the minx, miny, maxx and maxy values that the bbox parameter needs (as shown above).

import geopandas as gpd
gdf = gpd.read_file('map.geojson')
bounds = gdf.bounds
print(bounds['minx'])
boundary = bounds.values.tolist()
print(boundary[0])

This will print:

0   -1.63147
Name: minx, dtype: float64
[-1.6314697265625, 50.523904629228625, -0.9558105468749999, 50.819818262156545]

The file I am using only has one feature, so be careful if you have multiple features, ie select the one you wish to use for the boundary. Let’s feed this boundary into the above search:

search = Search(bbox=boundary[0], url='https://earth-search.aws.element84.com/v0')
print('bbox search: %s items' % search.found())

This still returns way too many items. I can restrict the search to a date range (May 2020 – end of July 2020) using:

search = Search(bbox=boundary[0], datetime='2020-05-01/2020-07-30', url='https://earth-search.aws.element84.com/v0')
print('bbox search: %s items' % search.found())

At the time of writing I am getting back 393 items. Now, using code from the sat-search tutorial I can fetch these results, which will further enable me to improve my search.

items = search.items()
print('%s items' % len(items))
print('%s collections' % len(items._collections))
print(items._collections)

This prints me:

393 items
3 collections
[sentinel-s2-l1c, sentinel-s2-l2a-cogs, sentinel-s2-l2a]

This is super good news because I am after the cog collection. Now I can go back and rerun my query to just get these items. My aim here is to get a link to an image and plot it up (and later load into QGIS) without having to download the image. Making the adjustment to my query:

search = Search(bbox=boundary[0], datetime='2020-05-01/2020-07-30', collections=['sentinel-s2-l2a-cogs'], url='https://earth-search.aws.element84.com/v0')
print('bbox search: %s items' % search.found())

This prints me:

bbox search: 131 items 

That is still a lot of images (each one has several bands, remember!) You may wish to tune your searching further based on narrower date ranges or % cloud cover (this would be a fine idea). To print these items:

items = search.items() print(items.summary(['date', 'id', 'eo:cloud_cover'])) 

Which gives me:

Items (131):
date                      id                        eo:cloud_cover            
2020-07-30                S2B_30UWA_20200730_0_L2A  0                         
2020-07-30                S2B_30UXA_20200730_0_L2A  0                         
2020-07-30                S2B_30UWB_20200730_0_L2A  5.78                      
2020-07-30                S2B_30UXB_20200730_0_L2A  3.52                      
2020-07-28                S2A_30UWA_20200728_0_L2A  5.45                      
2020-07-28                S2A_30UWB_20200728_0_L2A  56.82                     
2020-07-28                S2A_30UXB_20200728_0_L2A  70.1 
etc etc...

If we select the first asset – conviently it has 0 cloud cover. Again, this is the new thing about STAC items, you can expect the above print out to have the same structure for any Catalog. Back to the first item:

print(type(items[0]))
print(items[0])
<class 'satstac.item.Item'>
S2B_30UWA_20200730_0_L2A

What does this item contain?

print(items[0].assets)
{'thumbnail': {'title': 'Thumbnail', 'type': 'image/png', 'href': 'https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/30/U/WA/2020/7/30/0/preview.jpg'}, 'overview': {'title': 'True color image', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/L2A_PVI.tif', 'proj:shape': [343, 343], 'proj:transform': [320, 0, 499980, 0, -320, 5600040, 0, 0, 1]}, 'info': {'title': 'Original JSON metadata', 'type': 'application/json', 'href': 'https://roda.sentinel-hub.com/sentinel-s2-l2a/tiles/30/U/WA/2020/7/30/0/tileInfo.json'}, 'metadata': {'title': 'Original XML metadata', 'type': 'application/xml', 'href': 'https://roda.sentinel-hub.com/sentinel-s2-l2a/tiles/30/U/WA/2020/7/30/0/metadata.xml'}, 'visual': {'title': 'True color image', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/TCI.tif', 'proj:shape': [10980, 10980], 'proj:transform': [10, 0, 499980, 0, -10, 5600040, 0, 0, 1]}, 'B01': {'title': 'Band 1 (coastal)', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/B01.tif', 'proj:shape': [1830, 1830], 'proj:transform': [60, 0, 499980, 0, -60, 5600040, 0, 0, 1]}, 'B02': {'title': 'Band 2 (blue)', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/B02.tif', 'proj:shape': [10980, 10980], 'proj:transform': [10, 0, 499980, 0, -10, 5600040, 0, 0, 1]}, 'B03': {'title': 'Band 3 (green)', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/B03.tif', 'proj:shape': [10980, 10980], 'proj:transform': [10, 0, 499980, 0, -10, 5600040, 0, 0, 1]}, 'B04': {'title': 'Band 4 (red)', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/B04.tif', 'proj:shape': [10980, 10980], 'proj:transform': [10, 0, 499980, 0, -10, 5600040, 0, 0, 1]}, 'B05': {'title': 'Band 5', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/B05.tif', 'proj:shape': [5490, 5490], 'proj:transform': [20, 0, 499980, 0, -20, 5600040, 0, 0, 1]}, 'B06': {'title': 'Band 6', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/B06.tif', 'proj:shape': [5490, 5490], 'proj:transform': [20, 0, 499980, 0, -20, 5600040, 0, 0, 1]}, 'B07': {'title': 'Band 7', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/B07.tif', 'proj:shape': [5490, 5490], 'proj:transform': [20, 0, 499980, 0, -20, 5600040, 0, 0, 1]}, 'B08': {'title': 'Band 8 (nir)', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/B08.tif', 'proj:shape': [10980, 10980], 'proj:transform': [10, 0, 499980, 0, -10, 5600040, 0, 0, 1]}, 'B8A': {'title': 'Band 8A', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/B8A.tif', 'proj:shape': [5490, 5490], 'proj:transform': [20, 0, 499980, 0, -20, 5600040, 0, 0, 1]}, 'B09': {'title': 'Band 9', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/B09.tif', 'proj:shape': [1830, 1830], 'proj:transform': [60, 0, 499980, 0, -60, 5600040, 0, 0, 1]}, 'B11': {'title': 'Band 11 (swir16)', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/B11.tif', 'proj:shape': [5490, 5490], 'proj:transform': [20, 0, 499980, 0, -20, 5600040, 0, 0, 1]}, 'B12': {'title': 'Band 12 (swir22)', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/B12.tif', 'proj:shape': [5490, 5490], 'proj:transform': [20, 0, 499980, 0, -20, 5600040, 0, 0, 1]}, 'AOT': {'title': 'Aerosol Optical Thickness (AOT)', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/AOT.tif', 'proj:shape': [1830, 1830], 'proj:transform': [60, 0, 499980, 0, -60, 5600040, 0, 0, 1]}, 'WVP': {'title': 'Water Vapour (WVP)', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/WVP.tif', 'proj:shape': [10980, 10980], 'proj:transform': [10, 0, 499980, 0, -10, 5600040, 0, 0, 1]}, 'SCL': {'title': 'Scene Classification Map (SCL)', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/SCL.tif', 'proj:shape': [5490, 5490], 'proj:transform': [20, 0, 499980, 0, -20, 5600040, 0, 0, 1]}}

Quite a bit of information there. I’ve printed it all so you can see a common pattern (remember STAC is a standard). Let’s have a look at just one band:

print(items[0].asset('red'))
{'title': 'Band 4 (red)', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/B04.tif', 'proj:shape': [10980, 10980], 'proj:transform': [10, 0, 499980, 0, -10, 5600040, 0, 0, 1]}

Ok great, just need that href link now and assign it to a variable (in this case called file_url):

print(items[0].asset(‘red’)[‘href’])
file_url = items[0].asset(‘red’)[‘href’]

https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/U/WA/2020/7/S2B_30UWA_20200730_0_L2A/B04.tif

This is pretty much all we need. To quickly plot this data in the lightest way let’s plot the thumnail in the code. In this last piece of code I am using code from the excellent automaing-gis-processes https://automating-gis-processes.github.io/CSC/notebooks/L5/read-cogs.html (go and check out the amazing resources).

%matplotlib inline
import rasterio
import matplotlib.pyplot as plt
with rasterio.open(file_url) as src:
   # List of overviews from biggest to smallest
   oviews = src.overviews(1)
   # Retrieve the smallest thumbnail
   oview = oviews[-1]
   # NOTE this is using a 'decimated read' (http://rasterio.readthedocs.io/en/latest/topics/resampling.html)
   thumbnail = src.read(1, out_shape=(1, int(src.height // oview), int(src.width // oview)))
plt.imshow(thumbnail)
plt.colorbar()

Using sat search in QGIS and plotting

One nice thing would be to load data directly into QGIS, where we can combine it with other data and perform desktop GIS analysis. We can also do this of course with Python.

The biggest challenge we face here is perhaps to get the library working in the Python interpreter in QGIS. Look what happens by default (QGIS 3.10, which at the time of writing was the LTR)

No module!

In Windows (I don’t use other OS so this will be different), assuming you have installed QGIS via OSGeo4W open a prompt and type:

py3_env

and then:

pip install sat-search

Restart QGIS and try importing satsearch in the console:

Success!

STAC, COG, Python and QGIS

So here we are. I am going to combine STAC, COG, Python and QGIS in a few lines of code – feel free to re-use this (its here). This time I am loading in a shapefile containing my boundary (in WGS 84) and making that the active layer. Then running the following code:

from satsearch import Search
layer = iface.activeLayer()
feats = layer.getFeatures()
for feat in feats:
   geom = feat.geometry().boundingBox()
bounds = [geom.xMinimum(), geom.yMinimum(), geom.xMaximum(), geom.yMaximum()]
search = Search(bbox=bounds, datetime='2020-05-01/2020-07-30', collections=['sentinel-s2-l2a-cogs'], url='https://earth-search.aws.element84.com/v0')
items = search.items()
file_url = items[0].asset('red')['href']
rlayer = iface.addRasterLayer(file_url, 'S2_cog')

In the first line I am importing satsearch. Then I am using pyqgis to get the boundary layer in the table of contents (which is my active layer), I am getting the features and looping over them to get the geometrey. I only have one feature in my shapefile. If you have more you will need to make adjustments to the code to get the boundary you need. I am then building a list of minx, miny, maxx and maxy and calling it bounds. Then reusing the code from above to search and get the url. Finally I create a new variable called rlayer (raster layer!) and using the addRaster method I add the cog url and assign the layer the name S2_cog. Which gives me:

COG and QGIS!

Great stuff, only that it doesn’t really cover the area I am interested in, so I need to go back and tune my search a little, adjust the boundary slightly or select a different tile, which in this case would be the one with the filename UXB (rather than UXA). I happen to know that that the a cloud free image is the 53rd item so adjusting this line:

file_url = items[52].asset('red')['href']

Gives me this (zoomed and shaded):

COG and QGIS

Nothing of course stops you from tuning up your search better, filtering for cloud cover etc. It’s also very much possible to write queries that will create you .vrt of the exact data that you want and load these into QGIS as needed. A vrt will store the url for the COG.

This post is aimed at being a very simple introduction. I’ve downloaded no imagery here – although it should be noted that you can use sat search to query different non cog catalogs and download imagery.

All the code for this blog post is available here. I hope this has been of use.

Image credit https://unsplash.com/photos/afW1hht0NSs

I am a freelancer able to help you with your projects. I offer consultancy, training and writing. I’d be delighted to hear from you.

Feel free to connect or follow me; I am always keen to talk about Earth Observation.

I am @map_andrew on twitter