If you work with Geospatial software you cannot ignore the Shapefile. Whatever your thoughts on them (and it does polarise opinion in the GIS world; look for #Teamshapefile or #switchfromshapefile), I feel that I am ultimately driven by whatever a client would prefer. More often than not that is a preference for a Shapefile.
FME awarded the Shapefile a lifetime achievement award at their user conference last year. 33% of all FME readers were Shapefiles and likewise for 40% of FME writers. Most surprisingly, the usage of Shapefiles as an output from FME has increased from 2006 to 2016 – go here for the graph. It is not as if we haven’t seen any new formats in the last decade.
Shapefiles in a raster world
The natural home for Satellite images, I feel, is within a GIS environment. This allows them to be manipulated and used for spatial analysis. For someone working with Raster data being able to understand and interact with Shapefiles is very useful. Whether it be extracting raster boundaries or getting the footprints of irrigation fields, often one of the products from a raster based analysis is a vector data set and more often than not this is a Shapefile.
Which got me wondering. How many Shapefiles do I have on my computer? How does that compare to the number of .geojson files and .kml files I have? (they are examples of equivalent popular vector formats).
To do this we will use Python and OGR. (What is this ogr stuff?). I am sure this could also be done with an ESRI Python library.
Opening a Shapefile
It is a relatively straightforward process.
from osgeo import ogr
# Open the shapefile
dataset = ogr.Open('../Shapefile1.shp') ## great name eh?
Once you have opened the Shapefile it is possible to extract properties such as its geometry or the fields contained within the Shapefile. For this example let’s get the geometry and how many features there are e.g. the number of rows in the Shapefile.
## get the layer
layer = dataset.GetLayerByIndex()
## count the feayures
feature_count = layer.GetFeatureCount()
## print the count
print feature_count
## get the geometry - this will be a number so...
geometry = layer.GetGeomType()
## convert that number to the associated geometry type
geometry_name = ogr.GeometryTypeToName(geometry)
## print the geometry type
print geometry_name
This is useful stuff. How many Shapefiles on my computer?
import os
count = 0
lsshape = []
for root, dirs, files in os.walk("D:/"):
for file in files:
if file.endswith(".shp"):
shapefile = (os.path.join(root, file))
lsshape.append(shapefile)
count +=1
print count
We use os.walk to traverse all the directories and sub directories on the computer. In this case it is walking through my D drive. As it walks through, the code looks for files ending in .shp and any found are added to a list called lsshape. Finally, I count the number of files. I could do this just by looking at the length of the list but this seems easiest to read.
Now that we can walk through the computer and find all the Shapefiles, what about returning all this information to a csv file with the properties including geometry and feature count, thus creating our own Shapefile report.
A Shapefile report
First up, run the walk to build a list of the Shapefiles on the computer – as above. Secondly, we will make a function to get the Shapefile statistics using the very first piece of code.
def infoShp(path):
# Open the shapefile
dataset = ogr.Open(path)
## get the layer
layer = dataset.GetLayerByIndex()
## count the feayures
feature_count = layer.GetFeatureCount()
## print the count
#print feature_count
## get the geometry - this will be a number so...
geometry = layer.GetGeomType()
## convert that number to the associated geometry type
geometry_name = ogr.GeometryTypeToName(geometry)
## print the geometry type
#print geometry_name
return dataset.name, geometry_name, feature_count
Thirdly, now that we have built the function we will just loop through the list of paths to the Shapefiles to extract the information and write to a Pandas data frame and then export to a .csv file. Actually it is much easier than it sounds.
## import pandas
import pandas as pd
## set the dataframe up, filename, its geometry and number of features
df = pd.DataFrame(columns=['File', 'Geometry', 'features'])
## loop through the list if shapefile paths
for i in range(0, len(lsshape)):
## call the function infoshp
name,geoName, featcount = infoShp(lsshape[i])
## write each result to the dataframe
df.loc[i] = name,geoName,featcount
Pandas is really great and if you are familiar with NumPy arrays they behave in a very similar way. If you want to read more about this then there is a really excellent blog on the subject here.
To check the data frame has been built as you want, print the dataframe’s first five rows with the command:
print df.head()
All that is left is to export to .csv file now. That can be done in one line of code:
df.to_csv('../outshapefilereport.csv', encoding='utf-8', index=False)
With that, you are done 🙂
Recap
- Read Shapefiles using OGR in Python
- Extract properties of each Shapefile
- Write out a Shapefile computer statistics report to .csv using Pandas.
- Count the number of Shapefiles on computer
I never did answer the question, how many Shapefiles on my computer?
How many Shapefiles on my computer? 318
How many .geojson files on my computer? 14
How many .kml files on my computer? 10
How many .kmz files on my computer? 4
Oh, and finally,
How many .tif files? 447 🙂
This code is on my GitHub as a Jupyter Notebook
I am a freelancer able to help you with your projects. I offer consultancy, training and writing. I’d be delighted to hear from you.
I have grouped all my previous blogs (technical stuff / tutorials / opinions / ideas) at http://gis.acgeospatial.co.uk.
Feel free to connect or follow me; I am always keen to talk about Earth Observation.