Data Viz Storytelling with a Data Science Twist

Ghafar Shah
6 min read · Mar 12, 2023

As you may already know, text data is abundant in our everyday lives — from text messages to social media comments and articles. Recently, I’ve developed a strong interest in NLP and text analytics. I believe there is potential in leveraging these tools to get more insight from people’s remarks and feelings.

While I am no expert in this field, I’d like to share my perspective and experience with one specific aspect of text analytics — data visualization. I believe that visualizing text data can help us discover new insights. As you read through this blog, you’ll come across a few Python scripts, so having a basic grasp of Python would be helpful. At the end, I have also curated a list of incredibly helpful resources on the topics that will be discussed here.

Exploring the Tinder Experience: User Reviews and Insights

A few months ago, I created a Tableau dashboard on the dating app, Tinder. While working on this project, I experimented with various tools and techniques, including Figma for background design and a Python natural language processing library called TextBlob. TextBlob is an amazing library for quantifying the sentiment of text data along two dimensions: polarity and subjectivity.

I was really surprised to learn that polarity scores text on a scale from -1 (most negative) to 1 (most positive), with 0 being neutral. The script below is an example of how TextBlob is used in Python.

# Imports the textblob package for sentiment analysis
from textblob import TextBlob

# Some sample text data
text = "I love data viz!"

# Wraps the text in a TextBlob object called comment
comment = TextBlob(text)

# Preview polarity results, one sentence at a time
for sentence in comment.sentences:
    print(sentence.sentiment.polarity)

# Preview subjectivity results, one sentence at a time
for sentence in comment.sentences:
    print(sentence.sentiment.subjectivity)

TextBlob also scores text on subjectivity, which measures how factual or opinionated a text is, on a scale from 0 (objective) to 1 (subjective). Of course, I was aware that TextBlob’s subjectivity scale was probably not meant to be applied in the setting of a dating app, but I was nevertheless curious to give it a go and see what insights I might gain. And thanks to Zach Bowders from the #DataFam, when I swapped the axes on my scatterplot so that polarity sat on the X axis and subjectivity on the Y axis, the resulting shape almost looked like a heart!

In the end, I was able to design a user-friendly dashboard in Tableau and integrate text analytics to examine the comments made by Tinder app users. Overall, this method revealed some very intriguing insights.

Behind the Scenes of TikTok: Reviews by Users

Later on, I stumbled upon a new Python package that I just had to try: text2emotion. Excited to see what this package could do, I got to work on another Tableau dashboard, this time looking at user reviews for the TikTok app.

The functionality of text2emotion amazes me. The Python package scores a piece of text against five emotion categories: happiness, anger, sadness, surprise, and fear.

So, how does it work? Well, if you were to run text2emotion on a comment like “I love data viz!”, it would return a score for each of the emotion categories. Based on these scores, the package might determine that the comment is 100% Happy, 0% Angry, 0% Sad, 0% Surprise, and 0% Fear.

To flag the comment with a particular emotion, I simply pick the emotion with the highest score. So, in this case, “I love data viz!” would be categorized as “Happy.” It is honestly amazing how these kinds of packages can help us gain a deeper understanding of how people are feeling and reacting to various topics.

# Imports text2emotion
import text2emotion as te

text = "I love data viz!"

# Calls the get_emotion function from text2emotion
# Prints out the emotion scores: Happy, Angry, Sad, Surprise, and Fear
print(te.get_emotion(text))
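
The highest-score rule described above can be sketched in a few lines. This is a minimal sketch, assuming the scores arrive as a dictionary that maps each emotion name to a number. The `dominant_emotion` helper and the sample scores are hypothetical, and the "No emotion" fallback for all-zero scores is an added assumption, not something text2emotion provides.

```python
def dominant_emotion(scores, fallback="No emotion"):
    """Return the emotion with the highest score, or a fallback label."""
    # If every score is zero, no emotion stands out, so use the fallback
    if not scores or max(scores.values()) == 0:
        return fallback
    # max() with key=scores.get picks the key whose value is largest
    return max(scores, key=scores.get)


# Hypothetical scores shaped like a per-emotion score dictionary
sample_scores = {"Happy": 1.0, "Angry": 0.0, "Surprise": 0.0, "Sad": 0.0, "Fear": 0.0}
print(dominant_emotion(sample_scores))
```

If two emotions tie for the top score, `max` returns whichever appears first in the dictionary, so ties may deserve their own handling.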

To make the TikTok dashboard more informative, I analyzed the count of reviews by month for each of the emotion categories, and it turns out that most of the comments fell under the “Surprise” bucket. But I didn’t stop there. Using TextBlob’s [-1, 1] polarity scale, I looked more closely at each comment across the emotion categories and color-coded them according to whether they were positive, negative, or neutral.
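
The positive, negative, or neutral color-coding can be sketched as a small mapping from a polarity score to a label. This is only an illustration: the `sentiment_label` helper is hypothetical, and the choice of 0 as the cut-off (with a configurable `threshold`) is an assumption, not necessarily the rule used in the dashboard.

```python
def sentiment_label(polarity, threshold=0.0):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    # Scores above the threshold count as positive
    if polarity > threshold:
        return "positive"
    # Scores below the negative threshold count as negative
    if polarity < -threshold:
        return "negative"
    # Everything in between is treated as neutral
    return "neutral"


for score in (0.5, 0.0, -0.25):
    print(score, sentiment_label(score))
```

Raising `threshold` (for example to 0.1) widens the neutral band, which may help when many reviews score close to zero.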

This actually revealed some interesting insights, such as how some “Surprise” comments could be negative. And, there were even comments in the “Sad” category that were flagged as positive. Clearly, it is important to look beyond the sentiment and emotion scores because NLP is not perfect.
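
One way to surface these mismatches is to count how often each emotion label appears alongside each sentiment label. The sketch below uses made-up pairs purely for illustration; `labeled_reviews` is hypothetical data, not the actual TikTok reviews.

```python
from collections import Counter

# Hypothetical (emotion, sentiment) pairs, one per review
labeled_reviews = [
    ("Surprise", "positive"),
    ("Surprise", "negative"),
    ("Sad", "positive"),
    ("Sad", "negative"),
    ("Surprise", "negative"),
]

# Count how many reviews fall into each emotion/sentiment combination
pair_counts = Counter(labeled_reviews)

for (emotion, sentiment), count in sorted(pair_counts.items()):
    print(f"{emotion:<10}{sentiment:<10}{count}")
```

Combinations such as ("Sad", "positive") are the ones worth reading by hand, since they suggest that at least one of the two scores may be off.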

As I continue to grow and learn in the field of NLP analytics, there are a few key takeaways that I’d like to share with you:

Remove stop words

Next time, I plan to remove stop words using NLTK. It’s astonishing how simple things like removing these stop words (e.g., “a”, “the”, “is”, “are”) can help improve the accuracy of a TextBlob sentiment score. Even more, knowing that I can create my own custom list of stop words with NLTK is a huge game changer!
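
Here is a minimal sketch of stop-word removal with a custom list. To keep it self-contained it uses a tiny hand-written set instead of NLTK; `CUSTOM_STOP_WORDS` and `remove_stop_words` are hypothetical names. One thing to watch: NLTK's built-in English list includes negations such as "not", so a custom list may be the safer choice for sentiment work.

```python
# Small custom stop-word list for illustration only (hypothetical);
# NLTK's stopwords.words("english") could supply a fuller list.
CUSTOM_STOP_WORDS = {"a", "the", "is", "are"}


def remove_stop_words(text, stop_words=CUSTOM_STOP_WORDS):
    """Drop stop words from whitespace-separated text."""
    # Compare in lower case so "The" and "the" are both removed
    kept = [word for word in text.split() if word.lower() not in stop_words]
    return " ".join(kept)


print(remove_stop_words("The app is not a good experience"))
```

Because "not" is deliberately left out of the custom list, the negation survives and the cleaned text keeps its meaning.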

Reduce words to their root form

An awesome tip that I recently learned about in NLP is lemmatization. At the time, I had no clue this even existed. Basically, lemmatization reduces each word to its dictionary (root) form, so that inflected variants such as “loved”, “loves”, and “loving” are all counted as “love”. Note that it does not merge different words with similar meanings: “good”, “awesome”, “great”, and “fantastic” stay separate unless they are grouped in another step. By reducing words to their root forms, we can obtain more consistent word counts and emotion scores and avoid misleading results.
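
To make the idea concrete, here is a toy sketch in which lemmatization is just a lookup that maps inflected forms of a word to one dictionary form. The `LEMMA_TABLE` and `lemmatize` names are hypothetical and the table is made up for illustration. A real project would rely on a library lemmatizer such as NLTK's `WordNetLemmatizer` instead of a hand-written table.

```python
# Toy lemma table for illustration only (hypothetical); a real project
# would rely on a library lemmatizer such as NLTK's WordNetLemmatizer.
LEMMA_TABLE = {
    "loved": "love",
    "loves": "love",
    "loving": "love",
    "reviews": "review",
    "crashes": "crash",
    "crashed": "crash",
}


def lemmatize(text, table=LEMMA_TABLE):
    """Replace each known inflected word with its dictionary form."""
    # Words missing from the table are passed through unchanged
    return " ".join(table.get(word, word) for word in text.lower().split())


print(lemmatize("Loved it but it crashes"))
```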

Remove punctuation

I now recognize how critical it is to eliminate punctuation at the text cleaning and pre-processing stage. Although omitting extra punctuation may seem like a minor issue, it can actually affect how accurately our text analytics findings are generated. So, it is important that we spend the time cleaning up this portion of our data before moving on to NLP.
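
A minimal sketch of this cleaning step using only Python's standard library is shown below. The `strip_punctuation` helper is a hypothetical name. It relies on `string.punctuation`, which only covers ASCII symbols, so curly quotes and other non-ASCII marks would need extra handling.

```python
import string


def strip_punctuation(text):
    """Remove ASCII punctuation characters from text."""
    # str.maketrans with a third argument builds a table that deletes
    # every character listed in string.punctuation
    return text.translate(str.maketrans("", "", string.punctuation))


print(strip_punctuation("Wow!!! Best app, ever..."))
```

Note that apostrophes are removed too, so “don't” becomes “dont”; whether that matters depends on the analysis that follows.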

Remove case sensitivity

Lastly, another critical step in text preprocessing is to convert all text to either lower or upper case. This is particularly important in text analytics, where you want words like ‘good’, ‘Good’, or ‘GOOD’ to be counted as one word and not three distinct versions.
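
The effect of case normalization on word counts can be sketched as follows. This is a minimal illustration with a hypothetical `word_counts` helper that lowercases the text before counting.

```python
from collections import Counter


def word_counts(text):
    """Count words after converting the whole text to lower case."""
    # Lowercasing first makes 'good', 'Good', and 'GOOD' the same token
    return Counter(text.lower().split())


counts = word_counts("good Good GOOD app")
print(counts["good"])
```

Without the call to `lower()`, the same text would produce three separate entries for the one word.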

Wrap-up

Through the exploration of these dashboard exercises, I realized that, although these packages help detect emotions in text data, there is a lot of text cleaning and preprocessing to do.

This was a valuable lesson that really highlighted the importance of taking the time to clean and pre-process data before jumping into data visualization. The whole dashboard experience — from text analytics to data viz storytelling — was a challenging one. But, I am very excited to continue exploring NLP analytics, and I hope that my experience and insights will also inspire others to dive into the world of NLP and text analytics!

If you have any questions or would like to share ideas, please feel free to reach out to me on Twitter (@GhafarShah9) or LinkedIn.

Resources

Text Analytics for Beginners using Python NLTK

TextBlob: Simplified Text Processing

text2emotion

Text2emotion: Python package to detect emotions from textual data

How to add custom stopwords and remove them from text in NLP

Text Preprocessing: Removal of Punctuations
