May 5th, 2020
Monitor Competitor Prices on eBay with Python and BeautifulSoup

One of the most significant applications of web scraping in retail and e-commerce is monitoring competitor price movements. Done well, it can bring in extra revenue and help a retailer stay in the game, so they are never taken by surprise by anything the competition is doing.

Here is a simple script that does just that: it tracks prices on eBay, using BeautifulSoup to extract the information.

To start with, this is the boilerplate code we need to fetch an eBay search results page and set up BeautifulSoup, so that we can query the page for meaningful data with CSS selectors.

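# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

# Browser-like User-Agent (desktop Safari) instead of requests' default one
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}

# eBay search results for Apple iPhone 8 with 64 GB of storage
url = 'https://www.ebay.com/sch/i.html?_from=R40&_nkw=iphone&_sacat=0&LH_TitleDesc=0&Model=Apple%20iPhone%208&_sop=12&LH_PrefLoc=0&rt=nc&Storage%20Capacity=64%20GB&_dcat=9355'

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # stop early on HTTP errors such as 403 or 503

# The 'lxml' parser requires the lxml package (pip install lxml)
soup = BeautifulSoup(response.content, 'lxml')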

We also pass a browser User-Agent header to simulate a browser call, which makes it less likely that we get blocked.
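To see why the header matters, here is a minimal sketch that builds two requests without sending them, using only the requests library. `Session.prepare_request` merges in the session's default headers, so the request without our header carries requests' own `python-requests/<version>` User-Agent, which is easy for a site to recognize as a script. The shortened search URL is only a placeholder.

```python
import requests

BROWSER_UA = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 '
              '(KHTML, like Gecko) Version/9.0.2 Safari/601.3.9')
URL = 'https://www.ebay.com/sch/i.html?_nkw=iphone'  # placeholder search URL

session = requests.Session()

# Build (but do not send) a request that only has the session's default headers
default_req = session.prepare_request(requests.Request('GET', URL))

# Build the same request with our browser-like User-Agent overriding the default
browser_req = session.prepare_request(
    requests.Request('GET', URL, headers={'User-Agent': BROWSER_UA}))

print(default_req.headers['User-Agent'])  # python-requests/<version>
print(browser_req.headers['User-Agent'])  # the Safari string above
```

Nothing goes over the network here; `prepare_request` only assembles the URL and headers that `requests.get` would send.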

Now let's analyze the eBay site for players in the iPhone market, specifically the iPhone 8 with 64 GB storage. This is how it looks.

When we inspect the page, we find that each item's HTML is wrapped in an element with the class s-item.
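As an offline illustration, the sketch below runs the same `.s-item` selector against a tiny hand-written HTML snippet. The markup is a simplified assumption that reuses the class names from this article; it is not eBay's real page. It uses Python's built-in `html.parser`, so lxml is not needed.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified search-results markup (not real eBay HTML)
SAMPLE_HTML = """
<div class="results">
  <div class="s-item">
    <h3 class="s-item__title">Apple iPhone 8 64GB Gold</h3>
    <span class="s-item__price">$129.99</span>
  </div>
  <div class="s-item">
    <h3 class="s-item__title">Apple iPhone 8 64GB Space Gray</h3>
    <span class="s-item__price">$119.00</span>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Each element carrying the class token "s-item" is one listing card
cards = soup.select('.s-item')
for card in cards:
    # Join the card's text pieces with spaces for a quick one-line view
    print(card.get_text(' ', strip=True))
```

A class selector matches whole class tokens, so children with classes such as `s-item__title` are not counted as extra cards.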

We can use this class to break the HTML document into cards, each of which contains an individual item's information, like this:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.ebay.com/sch/i.html?_from=R40&_nkw=iphone&_sacat=0&LH_TitleDesc=0&Model=Apple%20iPhone%208&_sop=12&LH_PrefLoc=0&rt=nc&Storage%20Capacity=64%20GB&_dcat=9355'

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()

soup = BeautifulSoup(response.content, 'lxml')

# Each element with the class s-item is one listing card
for item in soup.select('.s-item'):
    print('----------------------------------------')
    # Print the card's visible text to check what each card contains
    print(item.get_text(' ', strip=True))

Save the script as PriceTracker.py and run it:

python3 PriceTracker.py

You can see that the code isolates the HTML of each card.

On further inspection, you can see that the title of the product always has the s-item__title class and the price has the s-item__price class. Let's also get the number of reviews and the ratings while we are at it.

# These lines run inside the `for item in soup.select('.s-item'):` loop
print(item.select('.s-item__title')[0].get_text().strip())
print(item.select('.s-item__subtitle')[0].get_text())
print(item.select('.s-item__reviews')[0].get_text())
print(item.select('.clipped')[0].get_text())
print(item.select('.s-item__reviews-count span')[0].get_text())
print(item.select('.s-item__price')[0].get_text())
print(item.select('.s-item__logisticsCost')[0].get_text())
print(item.select('.s-item__location')[0].get_text())
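Not every card has every field (a listing with no reviews, for example), and `item.select(...)[0]` raises an `IndexError` when a selector matches nothing. A minimal sketch of a more forgiving lookup uses BeautifulSoup's `select_one`, which returns `None` instead. The `field_text` helper and the sample card are hypothetical.

```python
from bs4 import BeautifulSoup

def field_text(card, selector, default=None):
    """Return the stripped text of the first match for `selector` inside
    `card`, or `default` when the card has no such element."""
    element = card.select_one(selector)
    if element is None:
        return default
    return element.get_text().strip()

# Hypothetical card with a title and a price but no reviews
card = BeautifulSoup(
    '<div class="s-item">'
    '<h3 class="s-item__title"> Apple iPhone 8 64GB </h3>'
    '<span class="s-item__price">$129.99</span>'
    '</div>',
    'html.parser',
)

print(field_text(card, '.s-item__title'))    # Apple iPhone 8 64GB
print(field_text(card, '.s-item__reviews'))  # None
```

With a helper like this, one missing field no longer aborts the whole card, which the original `try`/`except` around all the `print` calls would do.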

We also grab the item's location and its shipping cost, both crucial pieces of information when comparing competitor prices.
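To actually compare competitors, the price and shipping strings need to become numbers. The sketch below assumes the text looks like `$129.99`, `+$5.00 shipping` or `Free shipping` (US-dollar formatting only; other currencies and locales would need their own handling), and for a price range such as `$100.00 to $150.00` it keeps the lower bound. The `parse_amount` and `total_cost` helpers are hypothetical.

```python
import re
from decimal import Decimal

# First dollar amount in a string, allowing thousands separators: $1,299.00
_AMOUNT = re.compile(r'\$\s*([\d,]+(?:\.\d{1,2})?)')

def parse_amount(text):
    """Return the first dollar amount in `text` as a Decimal, or None."""
    if text is None:
        return None
    match = _AMOUNT.search(text)
    if match is None:
        return None
    return Decimal(match.group(1).replace(',', ''))

def total_cost(price_text, shipping_text):
    """Item price plus shipping; no dollar amount (e.g. 'Free shipping') counts as 0."""
    price = parse_amount(price_text)
    if price is None:
        return None
    shipping = parse_amount(shipping_text) or Decimal('0')
    return price + shipping

print(total_cost('$129.99', '+$5.00 shipping'))  # 134.99
print(total_cost('$119.00', 'Free shipping'))    # 119.00
print(total_cost('$100.00 to $150.00', None))    # 100.00
```

`Decimal` avoids the rounding surprises of binary floats when adding currency amounts.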

The whole code looks like this:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.ebay.com/sch/i.html?_from=R40&_nkw=iphone&_sacat=0&LH_TitleDesc=0&Model=Apple%20iPhone%208&_sop=12&LH_PrefLoc=0&rt=nc&Storage%20Capacity=64%20GB&_dcat=9355'

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()

soup = BeautifulSoup(response.content, 'lxml')

# Fields to print for each listing card, in order
SELECTORS = [
    '.s-item__title',
    '.s-item__subtitle',
    '.s-item__reviews',
    '.clipped',
    '.s-item__reviews-count span',
    '.s-item__price',
    '.s-item__logisticsCost',  # shipping cost
    '.s-item__location',
]

for item in soup.select('.s-item'):
    print('----------------------------------------')
    for selector in SELECTORS:
        # select_one returns None when this card lacks the field, so a
        # missing field is skipped instead of hiding the rest of the card
        element = item.select_one(selector)
        if element is not None:
            print(element.get_text().strip())
    print('----------------------------------------')

When run, it produces all the information we need.
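Since the goal is to monitor price movements, the output is most useful when two runs are compared. Below is a minimal sketch, assuming each run has already been reduced to a dictionary that maps a listing title to its price. The `compare_snapshots` helper and the sample data are hypothetical, and matching listings by title is a simplification; a real tracker would want a more stable key, such as the listing URL.

```python
from decimal import Decimal

def compare_snapshots(old, new):
    """Return (title, old_price, new_price) for listings whose price changed.

    `old` and `new` map a listing title to its price as a Decimal.
    Listings present in only one snapshot are ignored.
    """
    changes = []
    for title in sorted(old.keys() & new.keys()):
        if old[title] != new[title]:
            changes.append((title, old[title], new[title]))
    return changes

# Hypothetical data from two runs of the scraper
yesterday = {
    'Apple iPhone 8 64GB Gold': Decimal('129.99'),
    'Apple iPhone 8 64GB Silver': Decimal('125.00'),
}
today = {
    'Apple iPhone 8 64GB Gold': Decimal('119.99'),
    'Apple iPhone 8 64GB Silver': Decimal('125.00'),
    'Apple iPhone 8 64GB Red': Decimal('140.00'),
}

for title, before, after in compare_snapshots(yesterday, today):
    direction = 'down' if after < before else 'up'
    print(f'{title}: {before} -> {after} ({direction})')
```

New and vanished listings could be reported the same way with `new.keys() - old.keys()` and `old.keys() - new.keys()`.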

In more advanced implementations, you may even need to rotate the User-Agent string so that eBay can't tell the requests all come from the same browser!
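A minimal sketch of User-Agent rotation, assuming a small hand-picked pool of browser strings (the pool below is illustrative; a real scraper would keep a larger, up-to-date list). `itertools.cycle` hands the strings out in turn, so consecutive requests present different browsers. The `next_headers` helper is hypothetical.

```python
import itertools

# Illustrative pool of desktop browser User-Agent strings (assumption: any
# realistic, current browser strings would do here)
USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 '
    '(KHTML, like Gecko) Version/9.0.2 Safari/601.3.9',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0',
]

# Endless iterator that walks through the pool in order, then starts over
_ua_cycle = itertools.cycle(USER_AGENTS)

def next_headers():
    """Build request headers with the next User-Agent in the rotation."""
    return {'User-Agent': next(_ua_cycle)}

# Usage with the scraper:
#   response = requests.get(url, headers=next_headers(), timeout=30)
```

`random.choice(USER_AGENTS)` would work too; cycling simply guarantees that back-to-back requests never reuse the same string.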

Get a little more advanced and you will find that eBay can simply block your IP address, ignoring all your other tricks. This is a bummer, and it is where many web crawling projects fail.

Overcoming IP Blocks

Investing in a private rotating proxy service like Proxies API can often make the difference between a successful, headache-free web scraping project that gets the job done consistently and one that never really works.

Plus, with our running offer of 1,000 free API calls, you have almost nothing to lose by trying our rotating proxy and comparing notes. It only takes one line of integration, so it's hardly disruptive.

Our rotating proxy server, Proxies API, provides a simple API that can solve all IP blocking problems instantly.

• With millions of high speed rotating proxies located all over the world
• With our automatic IP rotation
• With our automatic User-Agent string rotation (which simulates requests from different, valid web browsers and web browser versions)
• With our automatic CAPTCHA solving technology

Hundreds of our customers have successfully solved the headache of IP blocks with a simple API.

The whole thing can be accessed with a simple API call like the one below, from any programming language.

    curl "http://api.proxiesapi.com/?key=API_KEY&url=https://example.com"
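The same call can be made from Python. This sketch assumes the endpoint and the `key`/`url` query parameters shown in the curl example above, with `API_KEY` as a placeholder. It uses `urllib.parse.urlencode` so that the `&` and `=` characters inside an eBay search URL are escaped rather than read as extra parameters of the proxy request. The `proxied_url` helper is hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

PROXY_ENDPOINT = 'http://api.proxiesapi.com/'

def proxied_url(api_key, target_url):
    """Wrap `target_url` in a Proxies API request URL (parameters as in the curl example)."""
    # urlencode percent-encodes ':', '/', '?', '&' and '=' in the target URL
    return PROXY_ENDPOINT + '?' + urlencode({'key': api_key, 'url': target_url})

target = 'https://www.ebay.com/sch/i.html?_nkw=iphone&_sacat=0'
request_url = proxied_url('API_KEY', target)
print(request_url)

# Decoding the query string gives back the original target URL unchanged
assert parse_qs(urlparse(request_url).query)['url'] == [target]

# With requests installed, the page would then be fetched with:
#   response = requests.get(request_url, timeout=60)
```

Passing the same dictionary as `params=` to `requests.get(PROXY_ENDPOINT, ...)` would encode it the same way.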

We have a running offer of 1,000 API calls, completely free. Register and get your free API key here.
