
Project #1: Sentiment Analysis on Amazon Product Review using Helium10

  • Writer: Ira
  • Feb 8, 2021
  • 4 min read

Updated: Apr 1, 2021

As I rummaged through my makeup kit, I realized that it had been a long time since I last put on a full maquillage. At a time when only half of one’s face is revealed outdoors, I was surprised to see today’s #1 bestselling makeup item on Amazon: a mascara. It is an obvious way to highlight one’s eyes (considering the rest of the face is hidden under a mask), but a very elaborate choice, I must say. What is this item and why is it today’s bestseller?



With 37,776 reviews on Amazon, L’Oreal Paris Mascara Voluminous Lash Paradise made me curious. I went ahead and scraped these reviews with the Helium 10 software suite. Filtering its Review Insights to “only helpful” gave me a data set of 99 reviews. I was happy with this data set and loaded the file into Python using pandas. After cleaning the data, I was ready to run my code for the analyses.


Figure 1. Data Cleaning and Preparation



I was able to generate a word cloud (see Figure 2) by importing the Python Imaging Library (PIL), which returned various interesting tokens. Consistent with the product’s 4.3 out of 5 stars rating on Amazon, the cloud is a jumble of mostly positive words. As for the pain points in this customer review data set, ‘clump’ and ‘clumpy’ are the most prominent tokens. These initial findings will tell a fuller story after we conduct sentiment analysis and topic modelling.

Figure 2. Word Cloud
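
A word cloud is, at its core, a picture of token frequencies: the more often a word appears, the bigger it is drawn. As a rough sketch of that counting step (not the code behind Figure 2), the snippet below tallies tokens with Python's standard library. The stop-word list, the `token_counts` helper, and the sample reviews are all made up for illustration and do not come from the actual data set.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "it", "is", "my", "i", "but", "this", "to", "of"}

def token_counts(reviews):
    """Count lowercase word tokens across all reviews, skipping stop words."""
    counts = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

# Made-up review snippets, NOT taken from the scraped reviews.
sample_reviews = [
    "Love the volume, but it is clumpy.",
    "Amazing volume and I love the look!",
    "The brush makes it clump.",
]

counts = token_counts(sample_reviews)
print(counts.most_common(3))
```

Note that ‘clump’ and ‘clumpy’ are counted as separate tokens here, which is why both can show up side by side in a word cloud unless the words are stemmed first.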

To better analyze and present the sentiment analysis, Seaborn was imported. Simply put, “Seaborn is a data visualization library built on top of matplotlib and closely integrated with pandas data structures in Python”. Through simple pyplot code, I was able to arrive at a Polarity and Subjectivity scatterplot (see Figure 3), which showed mostly positive polarity, consistent with the high ratings and reviews of this product. And as expected, these reviews were highly subjective.


Figure 3. Reviews by Polarity and Subjectivity


While all my current findings support the promising claims of this item, the question remains: why is this a best-selling product? Topic modelling can answer this in more detail. I used Latent Dirichlet Allocation (LDA), visualized with pyLDAvis. LDA groups words that tend to occur together into topics, and pyLDAvis displays them on an Intertopic Distance Map. Eight topics were identified, which I will discuss in 4 parts.


The Rave for L’Oreal Lash Paradise


Figure 4. Saliency and Relevance of Topics 1 and 4

Topics 1, 3, 4, and 5 are closely clustered together, which explains the ‘rave’ over the product. Looking closely at topics 1 and 4 (see Figure 4), two of the most salient and relevant words for this cluster are ‘look’ and ‘volume’. This suggests that customers were talking about how the product ‘looks’ on them and whether it delivers on its promise of voluminous lashes. In addition, the relevant terms in the corpus include ‘recommend’, ‘like’, ‘love’, and ‘amazing’. Surely, these topics come from satisfied customers who were raving about how the product volumized their lashes and generally made them look good.


Notice that I moved the lambda slider to 0, instead of 1. This is because I wanted to find out which terms are exclusive to topic 1, the topic that covers the largest share of the words discussed.
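
To see why the lambda setting changes the ranking, here is a minimal sketch of the relevance score that pyLDAvis uses to order terms (as defined by Sievert and Shirley): relevance = λ · log p(w|t) + (1 − λ) · log(p(w|t) / p(w)). At λ = 1 terms are ranked by how frequent they are within the topic; at λ = 0 they are ranked by lift, that is, how much more likely they are in the topic than in the corpus overall. The two terms and their probabilities below are made-up numbers, and the `relevance` and `rank` helpers are names I chose for this illustration.

```python
import math

def relevance(p_w_given_t, p_w, lam):
    """Relevance of a term to a topic for a given lambda in [0, 1]."""
    lift = p_w_given_t / p_w
    return lam * math.log(p_w_given_t) + (1 - lam) * math.log(lift)

# Hypothetical probabilities: term -> (p(term | topic), p(term) in the whole corpus).
terms = {
    "mascara": (0.10, 0.10),  # common in the topic, but just as common everywhere
    "volume": (0.05, 0.01),   # less frequent, but concentrated in this topic
}

def rank(terms, lam):
    """Return the terms sorted from most to least relevant."""
    return sorted(terms, key=lambda w: relevance(*terms[w], lam), reverse=True)

print(rank(terms, 1.0))  # frequency within the topic wins
print(rank(terms, 0.0))  # exclusivity to the topic wins
```

With these toy numbers, the generic word leads at λ = 1, while the topic-specific word leads at λ = 0, which is the effect I was after when moving the slider.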


When water won’t work


Figure 5. Saliency and Relevance of Topics 3 and 5

Topics 3 and 5 talk about one of the biggest concerns for mascaras: is it waterproof? The tokens suggest that this product keeps that promise. In fact, customers mention needing a makeup remover to completely take off the product after a long day of wearing it (see Figure 5).


Is it really … Better Than Sex?


Figure 6. Saliency and Relevance of Topic 2

Topic 2 sits noticeably farther from the other topics on the Intertopic Distance Map (see Figure 6). At first glance, one would be surprised to see the words ‘better’ and ‘sex’ as top relevant terms. Being the makeup maven that I claim to be, I understood instantly that this topic covers reviews that compare L’Oreal Paris’ Lash Paradise with another mascara, Too Faced’s Better Than Sex. Reviewers are comparing these two products, which are in tight competition, and as it turns out, buyers of this item prefer L’Oreal.


It is worth noting that customers are comparing L’Oreal’s $8.98 mascara with Too Faced’s $54.13 one (both Amazon prices) and going for the former, with reviews based solely on product performance and without a mention of the price difference.


L'Oreal should Look into Clumping


Figure 7. Saliency and Relevance of Topics 7 and 8

Finally, topics 7 and 8 are also worth noting due to their distance from the other clusters (see Figure 7). In topic 7, the words ‘clumps’, ‘brush’, and ‘tube’ seem to dominate the reviews. Given this, L’Oreal should look into improving the product to fix the evident customer pain point, which is the ‘clumping’ of the mascara. Specifically, L’Oreal should look into its packaging rather than the mascara formulation, because the reviews suggest that the brush wand and the tube are causing this issue.


All in all, the data set has a mean Rating of 3.6 out of 5 stars. However, the percentiles tell a more revealing story: the distribution is negatively skewed. A few very low ratings are most likely pulling the mean down and masking the high ratings of the majority. This product has, after all, an Amazon rating of 4.3 out of 5. Moreover, the Subjectivity mean of 0.57 suggests that these reviews are largely based on personal opinion, while the Polarity mean of 0.20 reflects that the reviewers are positive about this product overall.


Figure 8. Statistical Summary of Rating, Subjectivity, and Polarity
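
To illustrate how a handful of low ratings can drag the mean below what most reviewers gave, here is a small sketch using Python's `statistics` module. The ratings are made up for the example and are not the 99 reviews from this project; the `skewness` helper is my own name for a simple moment-based calculation.

```python
import statistics

# Made-up ratings: most reviewers give 4 or 5 stars, two give 1 star.
ratings = [5, 5, 5, 5, 5, 5, 4, 4, 1, 1]

mean = statistics.mean(ratings)
median = statistics.median(ratings)

def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by the cubed std dev."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sum((x - mu) ** 3 for x in values) / len(values) / sigma ** 3

skew = skewness(ratings)
print(mean, median, round(skew, 2))
```

A mean below the median together with a negative skewness value is the pattern described above: the long tail of low ratings sits on the left and deflates the average.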

It seems like I will be checking out my Amazon cart sooner and more convinced than expected. Oh, and in case you are interested, check the link below. Until next time!



 
 
 
