I wasn’t sure what to call this post… “The Fastest Way to Read a Satellite Image in Python”? “Using Magic Functions in Jupyter Notebooks”? “Four Ways to Read Images into NumPy Arrays”?
There are several ways; I’ve not even looked at PIL here, to read your Satellite data into a NumPy array. After all, if you are working with Python and Satellite imagery then Satellite Image = NumPy array.
Magic Functions
First off, let’s take a look at a very handy magic function in Jupyter Notebooks. Namely the timeit function. The timeit function will automatically perform multiple runs of your code and report the best of three results – useful for testing.
timeit% # test for 1 line of code
timeit%% # test for multiple lines of code
There are other magic functions within Jupyter Notebooks. Have a look – time well spent!
To give you an idea of speed when using a list comprehension, versus a for loop. For this process, list comprehension is significantly faster.
That makes a pretty convincing case for using list comprehension! What is the fastest way to read a satellite image into a NumPy array? Let’s test using a Landsat 8 image (Band 4) downloaded from AWS https://landsat-pds.s3.amazonaws.com/c1/L8/139/045/LC08_L1TP_139045_20170304_20170316_01_T1/index.html
4 ways to read satellite data into NumPy arrays
Let’s import all the Python libraries first and then look at the code. This is just a simple test on one .tif file. Perhaps other image formats will impact the speed? At the bottom of this post is a link to the notebook, if you fancy trying it out.
import numpy as np
## method 1 - opencv
import cv2
# Import GDAL method 2
from osgeo import gdal
## import method 3 - skimage
from skimage import io
## import method 4 - rasterio
import rasterio
image_path = "../LC81390452014295LGN00_B4.TIF" ### set the path to your image choice
First up, OpenCV
OpenCV is great at working with images. I’ve written a great deal about this in the past – have a look here for a full list. It is, however, a little overkill if you are just looking at reading an image into an array. I don’t think it would be my instinctive first choice.
Secondly, GDAL
GDAL is made for this kind of operation, plus it gives you access to all the spatial data associated to the image. It would be my preferred choice. But how does it fare for speed?
Well, its faster. 857 ms per loop vs 991 ms.
Thirdly, and perhaps a little unexpectedly, Skimage
The slowest so far. I am only just starting to use Skimage but for more complex image processing it looks like a winner.
Fourth, and finally, Rasterio
I haven’t used Rasterio very much in the past. I tend to use OSGeo4W and as far as I am aware it doesn’t ship with a version and try as I might I couldn’t get Rasterio to work with it. The solution in the end for me was to use a Virtual Python Environment, which to be fair is the recommended way to use it. A Virtual Python Environment will work nicely with Jupyter Notebooks.
The results
GDAL is, again, the winner. Rasterio is a close second followed by OpenCV and then finally Skimage. This is probably not a surprise and of course you should choose the right library for what YOU want to do. This was a test on a Satellite image – when I ran the same code on a picture taken on my phone Skimage was the fastest.
Review
- The timeit% and timeit%% commands in Jupyter notebooks are magic functions that run multiple tests on your code.
- There are many ways to read an image into a NumPy array.
- GDAL is the fastest when working with Satellite images.
This code is on my GitHub page.
I am a freelancer able to help you with your projects. I offer consultancy, training and writing. Iād be delighted to hear from you. Please check out the books I have written on QGIS 3.4
https://www.packtpub.com/application-development/learn-qgis-fourth-edition
https://www.packtpub.com/application-development/qgis-quick-start-guide
I have grouped all my previous blogs (technical stuff / tutorials / opinions / ideas) are here http://gis.acgeospatial.co.uk.