Season 1 Episode 2: What is Open Data?
The following is an attempt to transcribe the #scenefromabove podcast for those who might prefer to read instead of listen. For a full list of podcasts please visit www.acgeospatial.co.uk/podcasts
AC = Andrew Cutts
AG = Alastair Graham
AG – Hello and welcome to Season 1, Episode 2 of the #scenefromabove podcast. I’m Alastair.
AC – And I’m Andrew.
AG – And we are your hosts for a show that aims to bring you an informal discussion around the cool things happening in and around the world of Earth observation at the moment. You can reach us on Twitter using the hashtag #scenefromabove.
NEWS Item 1 – More things being launched in Jan 2018
AC – News! Let’s do the news for 12 January 2018. The big news today is the launch of 31 satellites, including Cartosat-2, Carbonite-2 for Earth-i, 4 more Doves, Telesat LEO Phase 1 and ICEYE-X1. I’ve got some numbers for you; do you want to hear them?
PSLV-C40 Successfully Launches Cartosat-2 Series Satellite along with 30 Co-passenger Satellites https://t.co/bTZaQRjLmo
— ISRO (@isro) January 12, 2018
AG – Yeah, go for it.
AC – 126, 204, 193, 172, 176, 367. These are the numbers of satellites launched each year from 2012 to 2017.
AG – Whooah
AC – 367 in total last year, according to space-track.org: more than double 2016.
AG – That is brilliant news. Amazing.
AC – And already we’ve had 31 today, and there was a launch last week, a Chinese launch.
AG – Yes
AC – So we’re already averaging more than one satellite launched per day; we’re way ahead of the world record already.
AG – That’s 2018 done then
AC – Yep, amazing isn’t it? 367 last year, an enormous amount. Even when I looked at it I was stunned.
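Taking the quoted figures at face value, the comparisons above can be checked in a few lines of Python. This is a sketch: the annual counts are the ones read out in the episode, and the 2018 tally is an assumption that counts only the two launches mentioned (31 satellites on 12 January plus the two Chinese satellites the week before).

```python
# Annual satellite launch counts as read out in the episode (2012-2017).
launches = {2012: 126, 2013: 204, 2014: 193, 2015: 172, 2016: 176, 2017: 367}

total_2012_2017 = sum(launches.values())
growth_2017_vs_2016 = launches[2017] / launches[2016]
per_day_2017 = launches[2017] / 365  # 2017 was not a leap year

# Assumption: 2018 so far = the 31 satellites on 12 January plus the
# 2 Chinese satellites launched the week before; 12 days elapsed.
so_far_2018 = 31 + 2
per_day_2018 = so_far_2018 / 12

print(f"Total 2012-2017: {total_2012_2017}")
print(f"2017 vs 2016: {growth_2017_vs_2016:.2f}x")
print(f"2017 average: {per_day_2017:.2f} satellites per day")
print(f"2018 to 12 January: {per_day_2018:.2f} satellites per day")
```

On these numbers 2017 was a little over twice 2016, and early 2018 was running well above one satellite a day.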
AG – And linked to that as well, you’ve already mentioned the 2 the Chinese put up last week. They are known as the Gaojing-1 satellites, and they’re the second pair, I think. The first pair was launched in December 2016, and these provide images at 0.5 m panchromatic and 2 m multispectral resolution. I think they are going to be commercial satellites, so hopefully people outside of China will be able to access the data as well. There is so much data out there at the moment; there are some amazing opportunities. So I see you’ve also got something about hyperspectral CubeSats.
NEWS Item 2 – Hyperspectral CubeSats
AC – Yeah, I saw ESA put up a news page on their website about the first hyperspectral camera to fly on ESA’s next CubeSat. We’ll put the link in the show notes. Hyperspectral imagers are generally quite big, bulky things, so this is an impressive hand-sized one. It’s due to be launched on 2 February 2018.
— ESA (@esa) January 9, 2018
AG – Ok
AC – I’m just reading their page at the moment: it’s going to have 45 visible and near-infrared spectral bands.
AG – Wow, that’s going to be proper hyperspectral, then.
AC – Yeah, and hyperspectral feels to me like the forgotten sensor. There aren’t that many of them. I haven’t worked too much with hyperspectral data in the past, but I think there is quite a lot of value in it. You know, we are seeing more bands on commercial satellites.
AG – I suppose the rush has been towards higher spatial resolution and then, secondly, higher temporal resolution.
NEWS Item 3 – USGS and NASA select new Landsat science team
AG – I’ve got 2 more things that I am excited about
AC – [laughter] you sound excited about them
AG – So one is that the USGS and NASA have selected their new Landsat science team. The reason this is super exciting is that the period this team is going to be working, 2018–2023, sees the launch of, fingers crossed, the Landsat 9 satellite.
AC – Yeah
AG – And everything that is going to happen from that. So this is a really good step forward in the whole workflow of getting Landsat 9 up and operational and being able to drive the science. And there are some really interesting names in there.
AC – A lot of people
AG – There are a lot of people. I think if you are involved in Earth Observation you can head over there and see who is involved, as these are the people who are going to be shaping the science behind the Landsat program for the next 5 years or so.
AC – Yeah, great, isn’t it?
NEWS Item 4 – Kryten and the Gigapixel
AG – The final thing I want to say: I’ve recently got into something called Fully Charged on YouTube. It’s Robert Llewellyn, the guy who plays Kryten in Red Dwarf.
AC – Yeah
AG – And basically, he hosts a whole load of things about electric vehicles, the move to renewables and the electrification of technology. He is really enthusiastic, especially when he talks about gigawatts of power. And I came across this... [laughter] I just want to be able to say “gigapixel”.
AG – Pierre Markuse and a team of people from Sentinel Hub and elsewhere have created a 4-gigapixel temporal mosaic of Sentinel-2 data for Europe, and it looks absolutely stunning. It’s a 64k by 64k pixel temporal mosaic. That’s amazing.
AC – [laughter] Wow, that was an amazing piece of news, to go from Kryten to 6 and a bit gigabytes of imagery.
AG – Quality link, I think.
AC – It is! I never know what you’re going to say; that’s the joy of this.
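As a rough sanity check on the mosaic size, the pixel count follows directly from the 64k by 64k dimensions. Whether “64k” means 65,536 (a power of two) or 64,000 is an assumption, so the sketch shows both; the band count and bit depth used for the gigabyte figure are purely illustrative (8-bit RGB, uncompressed) and are not properties of the published mosaic.

```python
def mosaic_size(side: int, bands: int = 3, bytes_per_band: int = 1) -> tuple[int, float, float]:
    """Return (pixels, gigapixels, uncompressed gigabytes) for a square image.

    bands and bytes_per_band are illustrative assumptions (8-bit RGB by default),
    not properties of the published mosaic.
    """
    pixels = side * side
    gigapixels = pixels / 1e9
    gigabytes = pixels * bands * bytes_per_band / 1e9
    return pixels, gigapixels, gigabytes


# "64k" could mean 64 * 1024 (a power of two) or 64,000; show both readings.
for label, side in [("64k = 65,536", 64 * 1024), ("64k = 64,000", 64_000)]:
    pixels, gpx, gb = mosaic_size(side)
    print(f"{label}: {pixels:,} pixels, {gpx:.2f} gigapixels, "
          f"~{gb:.1f} GB as uncompressed 8-bit RGB")
```

Either way it comes out at roughly 4 gigapixels; the size on disk then depends on the number of bands, the bit depth and the compression used.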
NEWS Item 5 – SCP plugin for QGIS
AG – Shall we do our linking bit of news? Our final bit of news.
AC – Yeah, so finally: the Semi-Automatic Classification Plugin, also known as SCP, version 6, codename Greenbelt apparently, which is only going to be compatible with QGIS 3, is due to be released in 10 days’ time on 22 January. This is a really brilliant plugin for QGIS; I cannot encourage people enough to look at this fantastic resource. There are good videos and good tutorials, and you can download data straight from it: Sentinel, Landsat, MODIS, all sorts. It’s a really great tool, and it’s great news that it’s going to be ready for QGIS 3. We are going to talk about open data in a little bit, not so much open software, but open software is an amazing resource that I use, and I am sure you use it a lot, in day-to-day operations.
I am very pleased to announce that the new Semi-Automatic Classification Plugin version 6 for #QGIS 3 will be released on the 22 of January 2018
Following a video that illustrates the main features of the new version 6 https://t.co/TEfLPnYTde
— Luca Congedo (@LucaCongedoGIS) January 8, 2018
AG – Yeah, definitely. I am fairly sure this is a one-man project as well, and the support is amazing. The quality of the product is really good: as soon as anyone brings up an issue on Google+ or wherever, it either gets fixed or the query gets answered. Being able to do the whole workflow, from obtaining the data through pre-processing to post-processing and band calculations and maths, is really neat. And it’s a good way to get people into remote sensing.
Topic discussion – Open Data
AC – Which takes us on quite nicely to our topic of Open Data.
AG – What is open data?
AC – It’s just free data, isn’t it? Right?
AG – Oooh
AC – [Laughter] – it’s a bit like QI!
AG – Well, I suppose it depends on what you mean by free. It’s data that is made available without restriction, without license restrictions, and as part of that it comes without cost to the person who is using it, though obviously there are costs elsewhere, which we will probably discuss in a little bit. In the same way as free and open source software, there is “free” as in no cost and “free” as in freedom. I think the thing that makes open data open data is the freedom to use the data however you want, with no restrictions as to whether it’s commercial or research or whatever, to generate new products off the back of it, and with no restrictions on the products that you generate. That, to me, is what open data is.
AC – And is it fair to say that most open data is either government or science based?
AG – Yeah, I would say that is probably the case, but it is more to do with the way those data were funded to be collected in the first place. If you think about it, the government is funded by taxpayers and science is funded by the government, so it makes more sense for those organisations to make the data available back to the people who ostensibly funded it, i.e. the taxpayer. There is no reason why commercial companies and other entities can’t open up their data. But yes, you are right, the majority of it is government and science based.
AC – I did a little bit of background reading; we haven’t totally winged this podcast, though I think it does sound like it.
AC – I refreshed my memory, or more like learnt, that in May 2007 the USGS announced that it was going to make scenes from Landsat 7 available over the internet for the first time, as a pilot project. Then in April 2008, so just under a decade ago, the USGS decided to release all of the imagery in its archive for free. So that’s ten years of free data. I read an article which said that, at the time it was written (2008), the USGS received a couple of million dollars a year to process the data, as well as some supplemental income of 2–4 million from scene sales. They figured that by putting the data on the web they would cut the money spent on processing and hopefully lose the need for the supplemental income. Under the old system, as we talked about in the pilot episode, if you wanted a Landsat scene you’d have to order the computer tape, and they would copy it and send it to you, which is obviously quite a high cost. So by going open they eliminated all of their billing and accounting systems. I went looking for the total number of downloads of Landsat scenes: between December 2008 and September 2017 there were 68 million downloads, and when it started it was running at barely a million. So, a huge number. Without sounding like a stuck record, listing numbers at you today, I also looked at the mission status documents for Sentinel-2, and this is really amazing: at the start of 2017 ESA’s SciHub had 60,000-odd registered users, and by the end of the year that had doubled to 113,000. At the start of 2017, 2.37 petabytes of data had been downloaded; by the end of last year they had a total of 12.24 petabytes. So we have doubled the users and gone to 6 times the amount of data downloaded in a year.
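For anyone who wants to check the Sentinel growth figures, here is the arithmetic as a short Python sketch. It treats both petabyte figures as cumulative totals and takes “60,000-odd” as exactly 60,000, both of which are assumptions; on that reading users grew by roughly 1.9 times and the cumulative download volume by roughly 5.2 times.

```python
# Figures quoted in the episode from the Sentinel-2 mission status documents.
users_start, users_end = 60_000, 113_000  # assumption: "60,000-odd" taken as 60,000
pb_start, pb_end = 2.37, 12.24            # assumption: both are cumulative totals (PB)

user_growth = users_end / users_start
volume_growth = pb_end / pb_start
# Volume downloaded during 2017 alone, if both figures are cumulative.
downloaded_in_2017 = pb_end - pb_start

print(f"Registered users grew {user_growth:.2f}x")
print(f"Cumulative download volume grew {volume_growth:.2f}x")
print(f"About {downloaded_in_2017:.2f} PB downloaded during 2017")
```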
AG – It really is hats off to the people at the USGS for deciding to open up the Landsat archive and just plonk it out there. No one knew whether it was going to work as an idea, or whether people would download it. They had to take a punt, and it’s really paid off; I think it’s been an absolute game changer. And the numbers you’ve quoted for Sentinel are absolutely insane in terms of the size and the growth. This is just great news.
AC – I think you can probably make a reasonable case that the open data policies of both ESA and the USGS have driven more people to look at satellite data, and have potentially driven more sales of, and interest in, commercial satellites.
AG – Yeah, I would agree with that. I worked on a project in 2017 whose overarching aim was to look at different river catchments. I just downloaded the most recent cloud-free image I could find, chopped it out to the catchment, created false colour composites and put a .png in front of people, and it totally transformed the project. It became “we have to have Sentinel imagery for all of our catchments”.
AC – To my mind that feels like a very good case for the economic benefits of open data.
AG – Without that being open there is no way I would have been able to get that image in front of them.
AC – How do you feel about the hesitation to use open data because it might lead to people making decisions and then blaming the data for errors that they may have introduced themselves?
AG – That is an interesting one. I suppose it comes back in some respects to knowledge-intensive datasets. The open data is out there, it’s accessible and anyone can download it. But what this concept means is that to access it, process it and get some information out of it, you have to be sufficiently technical and have quite a grounded understanding of what it is you are looking at. A good example of that is Sentinel-1 SAR data: most people without any Earth observation or remote sensing knowledge would look at it and just go, “it’s a grey speckly mess that I don’t really understand”. As long as the people who are championing open data also point out its limitations and issues, as well as all the great benefits and plus points, then hopefully the people using the data will understand that it is not the data’s fault, and that sometimes the data needs to be processed in a certain way in order to get the answer they are looking for.
AC – The opposing argument there, I guess, is: if you’d paid for that radar data and mis-processed or misinterpreted it, what would be the comeback on the supplier? Would there be a comeback on the supplier?
AG – No, I don’t think so.
AC – I don’t think so either. I don’t want to trivialise that issue, but I don’t feel it is that troublesome. We live in an age where, whatever you subscribe to or buy, you are faced with huge disclaimers that you click, sign or accept before you go ahead with that product. How often do you look at those for open data?
AG – I look at those disclaimers more for open data than I do for other things, in part because my business is based around supplying and delivering open data.
AC – Yes
AG – I need to know whether or not it’s fully open and whether I can use it, so I always check the usage rights of an open dataset. That said, I don’t know whether I am particularly strange in that way, or whether other people just don’t bother.
AC – Yeah, I don’t think you are alone.
AG – What about you? Do you use a lot of open data?
AC – Yes. Do I check? Not as often as I should. Maybe I should check a bit more. I take it as read; I think that is part of open data, removing those layers.
AG – So are there levels of openness? This is something that bugs me quite a bit. If you have to log in, create an account and provide details, is that pure open data? I understand that whoever is providing the open data would like to keep those details to get an idea of how it is being used and who is using it. And linked to that, the Environment Agency (EA) in the UK has released aerial photography, and it’s brilliant that they have done that; it’s an amazing data resource. However, in order to keep the data volumes down they have created it in something called ECW format, which is a proprietary format. So is that open data? It is openly available: you don’t have to log in, it’s just available. But it’s a proprietary data format, so – discuss!
AC – I don’t know. Certainly if you want to use or write ECW format you need a license. I think QGIS can open ECW files.
AG – I think it can on Windows, but I don’t think it can on Linux.
AC – I normally go with the OSGeo4W installer, but my gut feeling is that it does.
AG – I think it does in that installation, because it is all packaged together. When I was installing it on Ubuntu 16.04 it didn’t come with that driver, so I had to install it myself. gvSIG is the other piece of software that has it all bundled in. I downloaded that and stuck it onto my Linux box, and that worked fine. It’s fine that you can get at the data; it’s just that it was a step of faffiness that wasn’t really needed, and there didn’t seem to be anything on the website explaining the rationale, so the only reason I could think of was that it creates smaller files.
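On the ECW question: QGIS reads raster data through the GDAL library, so one way to find out whether a particular install can open ECW files is to ask GDAL whether its ECW driver is registered. This is a minimal sketch, assuming the GDAL Python bindings (`osgeo.gdal`) are available, as they are in the QGIS Python console; the function name `ecw_driver_available` is our own, not part of any library.

```python
def ecw_driver_available() -> bool:
    """Return True if the local GDAL build has its ECW driver registered.

    ecw_driver_available is a hypothetical helper name used for illustration.
    """
    try:
        from osgeo import gdal  # GDAL Python bindings (bundled with QGIS)
    except ImportError:
        # No GDAL bindings in this Python environment at all.
        return False
    gdal.AllRegister()  # make sure all compiled-in drivers are registered
    # GetDriverByName returns None when no driver with that short name exists.
    return gdal.GetDriverByName("ECW") is not None


if __name__ == "__main__":
    print("ECW driver available:", ecw_driver_available())
```

If this prints False on a Linux box, the GDAL build lacks the ECW driver, which depends on the proprietary ERDAS ECW/JP2 SDK; that is the extra install step described above.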
AC – I think as we go forward, though, we are going to see more open data.
AG – Yeah
AC – The quality of it will be good, but whether you can find it, and how accessible it is, will be another significant issue. We joked at the start about things being free, but someone’s got to maintain it. Documentation and integrity are important as well. The governmental data: some of it’s great, some of it’s way out of date.
AG – In terms of being able to generate new ideas, new science and new products, having open data is definitely way, way better than not having it. What I’d quite like to see is some of the commercial data providers starting to do an almost rolling release of some of their archive. I don’t know how much data gets sold from pre-2005, so could they open that chunk and then, at the end of every year, add an extra year onto the open archive? And they just sell the most recent 10, 15 or 20 years of data, or whatever is best for their business model.
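The rolling-release idea amounts to a cut-off year that moves forward by one every year. The sketch below is purely hypothetical: the function names, the 15-year commercial window and the archive start year in the example are illustrations, not any provider’s actual policy.

```python
def is_open(acquisition_year: int, current_year: int, commercial_window: int) -> bool:
    """Hypothetical rolling release: the most recent `commercial_window` years
    (counting the current year) stay commercial; anything older is open."""
    return acquisition_year <= current_year - commercial_window


def open_years(first_year: int, current_year: int, commercial_window: int) -> list[int]:
    """All archive years that would be open under the hypothetical rolling release."""
    return [year for year in range(first_year, current_year + 1)
            if is_open(year, current_year, commercial_window)]


# Illustrative only: an archive starting in 2000 with a 15-year commercial window.
print(open_years(2000, 2018, 15))
print(open_years(2000, 2019, 15))  # one more year opens each year
```

The design choice is simply to express the policy as a single moving threshold, so the provider only has to pick the window length that suits its business model.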
AC – Yeah, I don’t... I mean, that comes down to a business decision. They might say yes, we will do that, but then get bombarded by requests and incur huge costs. I think we are very lucky with satellite data, in how good and how standardised it is. I get a lot of my data from AWS, and someone’s got to pay for that. You don’t have to log in to get the data. You can log in, but you don’t have to.
AG – The costs on AWS are low and decreasing all the time, but even so there is an inherent cost, and some of these datasets are so huge and accessed by so many people that it must be a bit of a headache to keep on top of the budgets needed to keep them open.
AC – So how do we sum this all up, then? Open data isn’t free in the way you might instinctively think it is. There are significant costs involved behind the scenes as well.
AG – I think open data is probably one of the best things to happen, certainly in the last 10 years. It’s transformed the way in which people, businesses and organisations work, and who gets to see the data. I think the more data that is made open, as long as it’s maintained and documented and users are educated about what it can and can’t be used for, the more opportunities it will create for everyone.
AC – Yeah, let us know what you think using the hashtag #scenefromabove.