Scraping Trustpilot Reviews Using Python

In the digital age, customer reviews have transitioned from casual word-of-mouth comments to impactful, online narratives that wield significant power over brands. 

These online testimonials have the potential to elevate a brand to stardom or push it into obscurity, based on authentic user experiences. 

Trustpilot stands at the forefront of this revolution, offering a global platform where customers candidly share their experiences and the pros and cons of various products and services.

For businesses, educators, and enthusiasts, tapping into this vast repository of insights is crucial.

In this article, we’ll learn to extract these valuable reviews from Trustpilot using Python. But first, let’s look at how scraping them can help us.

Some Popular Use Cases of Scraping Reviews

1. Market Research:

Companies and business owners often want to understand their market better. Scraped reviews show what customers think about different products in a category, which helps businesses decide what to build and plan for the future.


2. Alerts for Negative Reviews:

Negative reviews of your product can harm your online reputation. By extracting data from review sites, businesses can set up a system that sends regular updates on new reviews. This way, you can react quickly and fix problems before they hurt your business further.


3. Study Trends:

Over time, trends can emerge in customer feedback. For instance, there might be a growing demand for “integrated analytics features” or “multi-platform compatibility.” Recognizing these trends allows SaaS businesses to enhance their offerings and better cater to user needs.


4. Identifying Fake Reviews:

Sometimes, people might leave fake good or bad reviews. By analyzing the scraped data, it might be possible to spot patterns that show which reviews are not genuine.


5. Making Marketing Strategies:

If a business sees that it gets lots of good reviews about a particular feature of its product, it might decide to highlight that feature in its next ad campaign.

These are only a few basic use cases. In practice, data from review sites can help a business in many other ways.

Let’s start extracting data from Trustpilot.

Basic Requirements for Web Scraping Trustpilot

You need Python 3.x installed on your computer for this tutorial. You also need two libraries, both installable with pip (pip install requests beautifulsoup4):

  1. Requests will help us to make an HTTP connection with Trustpilot.com.
  2. BeautifulSoup will help us to create an HTML tree for smooth data extraction.


Let’s Begin Scraping Trustpilot

We will pull the reviews for Scrapingdog from its Trustpilot review page, https://www.trustpilot.com/review/scrapingdog.com. The data points we are going to scrape from Trustpilot are:

  1. Name of the reviewer
  2. Rating they have given
  3. Review body (the text that contains the review)

We will start by making a normal GET request, and then we will use BeautifulSoup to parse the HTML.

    import requests
    from bs4 import BeautifulSoup

    url = "https://www.trustpilot.com/review/scrapingdog.com"
    o = {}
    l = []

    response = requests.get(url)

We import the libraries discussed earlier and declare the target URL. Then we declare an empty list l and an empty dictionary o. Finally, we make a GET request using the requests library.
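The request above relies on the library defaults. A slightly more defensive variant, sketched below, sets a timeout and an explicit User-Agent header and reports failures instead of raising. The helper name fetch_html, the header value, and the 10-second timeout are illustrative assumptions, not something Trustpilot documents or requires.

```python
import requests

# Assumed values for illustration only; adjust them to your own client.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; review-scraper-tutorial)"}
TIMEOUT_SECONDS = 10


def fetch_html(url):
    """Return the page HTML as text, or None if the request fails."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=TIMEOUT_SECONDS)
        # raise_for_status() turns 4xx/5xx responses into exceptions.
        response.raise_for_status()
    except requests.RequestException as exc:
        print("Request failed:", exc)
        return None
    return response.text
```

Calling fetch_html("https://www.trustpilot.com/review/scrapingdog.com") would then return either the HTML string or None, so the caller can decide whether to continue parsing.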

Inspecting the page in the browser’s developer tools shows that all the data we need is located inside div tags with the class styles_cardWrapper__LcCPA. These class names come from the page’s current markup, so you may need to inspect the HTML again if they change. Let’s first find all of these elements, and then we will extract the data we are looking for.

    if response.status_code == 200:
        soup = BeautifulSoup(response.text, "html.parser")
        review_elements = soup.find_all("div", {"class": "styles_cardWrapper__LcCPA"})
    else:
        print("Failed to retrieve the page. Status code:", response.status_code)

Here we check whether the status code is 200. If it is, we create a BeautifulSoup object and then use the .find_all() function to extract all the div elements with the class styles_cardWrapper__LcCPA. Otherwise, we print the status code.
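To see what .find_all() returns without hitting the network, the same call can be run against a small inline document. The HTML below is a made-up stand-in that only borrows the class name from this article. The real page markup is more complex and its class names may change.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; not copied from Trustpilot.
SAMPLE_HTML = """
<div class="styles_cardWrapper__LcCPA"><span>Alice</span></div>
<div class="styles_cardWrapper__LcCPA"><span>Bob</span></div>
<div class="unrelated"><span>Footer</span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() returns a list-like ResultSet with every matching tag.
cards = soup.find_all("div", {"class": "styles_cardWrapper__LcCPA"})

names = []
for card in cards:
    span = card.find("span")
    # find() returns None when nothing matches, so guard before reading text.
    if span is not None:
        names.append(span.get_text(strip=True))

print(len(cards), names)  # prints: 2 ['Alice', 'Bob']
```

The None check matters on a live page: if one card lacks the expected tag, calling .text on the result of find() would raise an AttributeError and stop the whole loop.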

The name of the reviewer sits inside a span tag with the class typography_heading-xxs__QKBS8. In the same way, the review text is in an h2 tag with the class typography_heading-s__f7029, and the rating is the alt text of the img tag inside the div with the class star-rating_starRating__4rrcf.

Now, let’s use this information to extract the text.

    if response.status_code == 200:
        soup = BeautifulSoup(response.text, "html.parser")
        review_elements = soup.find_all("div", {"class": "styles_cardWrapper__LcCPA"})

        for review_element in review_elements:
            o["review_name"] = review_element.find("span", {"class": "typography_heading-xxs__QKBS8"}).text
            o["review_text"] = review_element.find("h2", {"class": "typography_heading-s__f7029"}).text
            # The rating is stored in the alt text of the star image
            o["review_rating"] = review_element.find("div", {"class": "star-rating_starRating__4rrcf"}).find("img").get("alt")
            l.append(o)
            o = {}
        print(l)

Once you run the code, it prints the list of scraped reviews.

We have now managed to scrape the name of each reviewer, the review they left, and the rating.
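The rating arrives as the alt text of an image rather than as a number. Assuming that text looks like "Rated 4 out of 5 stars" (an assumption about the page, worth checking against your own output), a small helper can turn it into an integer. That also makes it easy to pick out negative reviews for the alerting use case described earlier. The function name, the review_rating key, the sample records, and the threshold below are all hypothetical.

```python
import re

# Matches phrases such as "4 out of 5" inside the alt text (assumed format).
RATING_PATTERN = re.compile(r"(\d+)\s+out\s+of\s+(\d+)")


def parse_rating(alt_text):
    """Return the star count as an int, or None if the text does not match."""
    match = RATING_PATTERN.search(alt_text or "")
    return int(match.group(1)) if match else None


# Made-up records; "review_rating" is an assumed key holding the alt text.
reviews = [
    {"review_name": "Alice", "review_rating": "Rated 5 out of 5 stars"},
    {"review_name": "Bob", "review_rating": "Rated 1 out of 5 stars"},
    {"review_name": "Carol", "review_rating": ""},
]

NEGATIVE_THRESHOLD = 2  # assumed cut-off for a "negative" review

negative = []
for review in reviews:
    stars = parse_rating(review["review_rating"])
    # Skip records whose rating could not be parsed.
    if stars is not None and stars <= NEGATIVE_THRESHOLD:
        negative.append(review["review_name"])

print(negative)  # prints: ['Bob']
```

Returning None for unparseable text keeps a single malformed card from being silently counted as a zero-star review.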

Complete Code

    import requests
    from bs4 import BeautifulSoup

    # Define the URL of the Trustpilot page with reviews
    url = "https://www.trustpilot.com/review/scrapingdog.com"
    o = {}
    l = []

    # Send an HTTP GET request to the URL
    response = requests.get(url)

    # Check if the request was successful (status code 200)
    if response.status_code == 200:
        # Parse the HTML content of the page using Beautiful Soup
        soup = BeautifulSoup(response.text, "html.parser")

        # Find the review elements on the page (you may need to inspect the HTML structure to get the right selector)
        review_elements = soup.find_all("div", {"class": "styles_cardWrapper__LcCPA"})

        # Loop through the review elements and extract the name, text, and rating of each review
        for review_element in review_elements:
            o["review_name"] = review_element.find("span", {"class": "typography_heading-xxs__QKBS8"}).text
            o["review_text"] = review_element.find("h2", {"class": "typography_heading-s__f7029"}).text
            # The rating is stored in the alt text of the star image
            o["review_rating"] = review_element.find("div", {"class": "star-rating_starRating__4rrcf"}).find("img").get("alt")
            l.append(o)
            o = {}
        print(l)
    else:
        print("Failed to retrieve the page. Status code:", response.status_code)

Conclusion

It’s evident how transformative such data can be for businesses, especially in the SaaS domain. The information we’ve gathered isn’t just numbers or star ratings; it’s the voice of countless customers offering a roadmap for product evolution. 

As we wrap up, remember that with great data comes great responsibility. It’s not just about collecting feedback but acting on it, ensuring that you continuously strive hard to improve the user experience.
