I’m writing a series of irregular short posts on some tips and tricks on using Geospatial Python. They may not be best practice, but I hope they are of help. I’ll try and give some context to when I’ve used the ‘tip’.
Open files once if possible and.. use a context manager
Something I’ve been guilty of the past is just getting some code from stackoverflow and brute forcing it into my problem to create ‘my solution’. Sometimes this works, but often frankly it is just blind hope. If you don’t understand what your code is doing then you are hurting your future self.
Opening the same file multiple times is often unnecessary and should be avoided, it will slow down your code. If you don’t .close your file after your open method then you are also heading for future problems. This second problem is solved using a context manager – and you’ve seen it hundreds of times – its the with statement. Pretty much any opening and closing of files, that you do in Python, has a context manager – so use it.
with open(x) as y:
This is a super common ‘tip’ – its pretty practical in all your coding. However I want to highlight the idea of opening the same file multiple times. If you do this (even with a context manager) you are hurting yourself and it just will not scale – your code will run slowly.
Consider this: say you want to extract pixel values at given coordinates. Perhaps these coordinates are in a file themselves or a list or a numpy array (or …). You iterate over the each coordinate, open the image extract the value and do wantever you need to do with that value (save it?). If you have 100 coordinates, using this approach you will open (and close) the image 100 times. So instead just open the image and extract the points with the image already opened. And guess what you don’t even have to use a for loop.
with rasterio.open(raster_file) as src: band1 = src.read(1) # create a list random points in the raster x = np.random.randint(0, src.width, size=10) y = np.random.randint(0, src.height, size=10) # extract the values of the raster at the random points values = band1[y, x]
In this code block I’m opening a file name (raster_file) as src with a context manager, extract the first band to an array called band1. I then create 2 arrays (x and y) of 10 random integers between 0 and the width (src.width) in pixels for x and 0 and the height (src.height) in pixels for y. Finally I am creating a 1d numpy array of the values at these locations. It is very fast.
After this its upto you what you do with the values. I suppose most commonly you’d want to extract these to a shapefile (or other vector point format). So convert the data to a Pandas dataframe, calculate the xcoord and ycoord assign to the coordinate system of the input src and save. If you do this, all indented under the with statement, the context manager will close out the image after your code runs.
# create a dataframe with the random points and the values df = pd.DataFrame({'x': x, 'y': y, 'values': values}) # convert row and column to x coord and y coord df['xcoord'] = df['x'] * src.res[0] + src.bounds.left df['ycoord'] = df['y'] * src.res[1] + src.bounds.bottom # convert dataframe to geopandas gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.xcoord, df.ycoord)) # set crs gdf.crs = src.crs # save to shapefile gdf.to_file(raster_file.replace('.tif', '.shp'))
There are actually numerous tips and tricks in this post, but hopefully you can see the benefit of
a) using a context manager and
b) not opening the file multiple times
I’ve added comments where needed to the code here I hope this has been of use.
I’ll be adding more tips and tricks
Image credit https://unsplash.com/photos/fpxUB3PPbEg
I am a freelancer able to help you with your projects. I offer consultancy, training and writing. I’d be delighted to hear from you.
Feel free to connect or follow me; I am always keen to talk about Earth Observation.