A look at the growth and the polarity of sentiment of Earth Observation and Remote Sensing since 2008, with Python and Pandas

 


About 18 months ago I looked at tweets from Twitter containing the phrases Earth Observation or Remote Sensing. Primarily I did this after the first Future EO event that ESA held. You can have a look at the post here:

#FutureEO and twitter data mining

With the next Future EO event taking place this November, I wondered whether the trend for tweets about the downstream satellite industry was still increasing. I also wanted to use the opportunity to explore the average sentiment of the tweets. By that I mean how positive or negative the feeling is towards both Earth Observation and Remote Sensing.

Before I explore the data, it is worth looking at the Google trend for both of these terms:

google_trend

The Google trend data does not seem to relate much to the data I have extracted from tweets. Firstly, Earth Observation as a term barely registers compared with Remote Sensing, and secondly, the trend for Remote Sensing seems to be in decline. I certainly do not see this in the tweet data. I don’t really have any insight into why that would be the case.

Extracting the tweet data

To extract the data I used a Python library called twint.

It is super easy to use and does not limit the number of tweets. After a little experimenting I found that the following code gave me CSV files of the data I wanted to look at.

#!/usr/bin/python3

import twint

# Configure the search
c = twint.Config()
c.Search = "RemoteSensing"  # change to "EarthObservation" for the second query
c.Store_csv = True          # store the results as CSV
c.Output = "RS"             # creates a folder called RS

# Run the search
twint.run.Search(c)

Use the twint help to refine a query. If you are looking for a heavily tweeted word, perhaps restrict the fields returned.

Processing the data (technical part)

I have written and commented a Jupyter Notebook showing the steps I took to derive the results (more on those in a moment). I set off intending to read and process the data in several Python lists, but this quickly became clunky. In the end I used Pandas, and it has been eye-opening to see how powerful this Python library is. Have a look at the Notebook here. You should be able to pass in any twint-derived CSV file and perform the analysis.
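As a rough illustration of why Pandas suits this job, here is a minimal sketch of counting tweets per month. It is not the Notebook's code: the inline sample stands in for a twint CSV file, and the column names `date` and `tweet` are assumptions about that file's layout.

```python
import io

import pandas as pd

# Stand-in for a twint CSV file; the `date` and `tweet` column names are assumed.
sample_csv = io.StringIO(
    "date,tweet\n"
    "2018-09-03,Great new #RemoteSensing imagery\n"
    "2018-09-17,Remote sensing workshop today\n"
    "2018-10-02,Cloudy scenes again\n"
)

# Read the file and parse the date column into proper timestamps.
df = pd.read_csv(sample_csv, parse_dates=["date"])

# Group the rows by calendar month and count the tweets in each month.
monthly_counts = df.groupby(df["date"].dt.to_period("M")).size()
print(monthly_counts)
```

With a real file you would replace `sample_csv` with the path to the CSV that twint wrote.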

Results

The number of tweets about Remote Sensing and Earth Observation is increasing. I looked at the time between January 2008 and the start of November 2018 (in the last month the values are not complete).

EO_RS_growth

The fact is that more satellites have been launched, and that could help explain this growth. Also, investment is increasing, more companies have been formed and the market is expected to grow to $66 billion by 2020. So, overall, the number of tweets about both Earth Observation and Remote Sensing is increasing on a month-by-month basis.

This is essentially an update of my previous post. I’ll continue to explore this data set and ponder why the Google searches don’t show the same thing.

Sentiment analysis

There are various Python libraries that allow you to look at the sentiment of a word or, more usefully, a group of words. We can use the tweet data with a bit of preprocessing to remove links, emojis, @ symbols etc. (see the Notebook code for more details). The simplest Python library is textblob. There are others though, and plenty of examples on GitHub, including this brilliant one which I used to help with preprocessing the tweets.
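To give a feel for that preprocessing, here is a minimal sketch using only the standard library. It is not the Notebook's code: the function name `clean_tweet` and the exact rules are my own assumptions about what a reasonable cleaning step could look like.

```python
import re


def clean_tweet(text):
    """Hypothetical helper: strip links, @mentions, '#' symbols and emojis."""
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop @mentions
    text = text.replace("#", " ")              # keep the hashtag word, drop the symbol
    # Crude emoji removal: this drops every non-ASCII character,
    # so accented letters are lost as well.
    text = text.encode("ascii", "ignore").decode("ascii")
    return " ".join(text.split())              # collapse repeated whitespace


print(clean_tweet("Loving the new #EarthObservation data from @esa 🛰 https://example.com/x"))
```

The ASCII step is deliberately blunt; a more careful version would target emoji ranges only.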

Sentiment polarity

Consider this:

from textblob import TextBlob

y = ['happy', 'sad', 'low', 'high', 'big', 'small', 'expensive', 'cheap']
for word in y:
    print(word, TextBlob(word).sentiment.polarity)

Polarity values range from -1 (most negative) to +1 (most positive). In the case above the printed values are:

happy 0.8
sad -0.5
low 0.0
high 0.16
big 0.0
small -0.25
expensive -0.5
cheap 0.4

If I convert this list of words into a string of words and print the polarity it will look like this:

y = 'happy sad low high big small expensive cheap'
print (TextBlob(y).sentiment.polarity)

Which will return the value:

0.013750000000000012

This also happens to be the average of the 8 scores above. So, together these 8 words make a statement that is about 1.4% positive.
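The averaging can be checked by hand with plain Python. This sketch simply copies the eight single-word scores printed above into a dictionary; it shows the arithmetic for this example only and is not a claim about how textblob scores text in general.

```python
# The eight single-word polarity scores printed above.
scores = {
    "happy": 0.8,
    "sad": -0.5,
    "low": 0.0,
    "high": 0.16,
    "big": 0.0,
    "small": -0.25,
    "expensive": -0.5,
    "cheap": 0.4,
}

# Mean of the scores: the sum is 0.11, and 0.11 / 8 is about 0.01375.
mean_polarity = sum(scores.values()) / len(scores)
print(round(mean_polarity, 5))
```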

I fed all the tweets for both remote sensing and Earth Observation into textblob:

sentiment

The scores are averaged over the month and in the case above I haven’t done any further filtering of the tweets. Whilst the data is spiky, there does seem to be a trend of the tweets being more positive.
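The monthly averaging can be sketched in Pandas as follows. This is an illustration rather than the Notebook's code: the dates are toy values, the polarity numbers reuse single-word scores from earlier in the post, and the column names `date` and `polarity` are assumptions.

```python
import pandas as pd

# Toy data: in practice each row would hold one tweet and its textblob polarity.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2018-09-03", "2018-09-17", "2018-10-02"]),
        "polarity": [0.8, -0.5, 0.4],
    }
)

# Average the polarity of all tweets within each calendar month.
monthly_polarity = df.groupby(df["date"].dt.to_period("M"))["polarity"].mean()
print(monthly_polarity)
```

Averaging per month smooths out individual tweets, although months with few tweets will still produce spikes.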

I added an additional processing step to remove stopwords (very common words such as "the" and "and" that carry little sentiment). I did this because the script I was adapting also filtered out stopwords.

Removing the stopwords slightly reduced the polarity of the sentiment. The trend does still seem to be increasingly positive for the terms Earth Observation and Remote Sensing.
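Stopword removal itself is a small step. The sketch below uses a tiny hand-picked set purely for illustration; the `STOPWORDS` set and the `remove_stopwords` helper are hypothetical, and a real run would use a much fuller list (NLTK ships one, for example).

```python
# Tiny illustrative stopword set; a real list would be far longer.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "for"}


def remove_stopwords(text):
    """Hypothetical helper: drop words that appear in STOPWORDS (case-insensitive)."""
    return " ".join(word for word in text.split() if word.lower() not in STOPWORDS)


print(remove_stopwords("The launch of the satellite is a success"))
```

The filtered text would then be passed to textblob in place of the original tweet.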

Conclusions?

If this data is anything to go by, then in 2018 more tweets are being sent about Earth Observation and Remote Sensing than at any time in the last decade (Twitter is only 12 years old). Combined on a month-by-month basis, these tweets are about 8-12% positive, with a trend towards increasing positivity.

As ever I have posted all the code on my GitHub; please do feel free to explore and use it. I would love to hear comments and thoughts on the tweets vs Google trend. And also whether 8-12% positivity is interesting/surprising/low/high in your opinion.

Thanks for reading!

Photo credit https://unsplash.com/photos/cYUMaCqMYvI

I am a freelancer able to help you with your projects. I offer consultancy, training and writing. I’d be delighted to hear from you.

I have grouped all my previous blogs (technical stuff / tutorials / opinions / ideas) at http://gis.acgeospatial.co.uk.

Feel free to connect or follow me; I am always keen to talk about Earth Observation.

I am @map_andrew on Twitter