I recently came to love Python after starting a Data Analyst course on Udacity. Each time I go through the course content, I see things I could do with Python that my day-to-day experience with PHP wouldn't let me see.
After a few days of taking the course, with only a brief introduction to Python, I decided to dip my feet in the water with a real-world project, which I named OrdaMe.
In this article, I'm going to explain what OrdaMe is about, how it was built, and things I learned using the Python programming language.
The OrdaMe problem
I frequently purchase Chinese products on aliexpress.com. AliExpress is a global eCommerce site based in China. But one of the biggest issues I face is dealing with fake products: there are thousands of fake sellers on the platform, and if you are not familiar with their tricks you may waste your time and money.
Before I buy a product, I have to check the basics: the seller's reputation and followers, the number of units sold and the ratings, and the negative reviews (to check for delivery problems, a common issue). That takes time and bandwidth: 5 to 10 minutes for each product I want to buy.
Knowing what Python can do, I decided to save my time and bandwidth with Python.
What is Python?
Python is a computer programming language often used to build websites and software, automate tasks, and conduct data analysis. Python is a general-purpose language, meaning it can be used to create various programs and isn't specialized for any specific problems.
If you want to find out more about Python, I recommend reading this step-by-step guide on DataQuest.
How OrdaMe Works
OrdaMe will eventually be a Chrome plugin that tells you whether or not to add a product to your cart. For now, it is a command-line app written entirely in Python.
Here is how it works:
- You paste the product link.
- A headless request is made to the link you entered, and information about the product and its seller is scraped.
- An OrdaMe score is returned, along with details about the seller and the product.
$ python main.py
Welcome to AliExpress Product Rating System
Enter product link: https://www.aliexpress.com/item/2251832665683450.html
extracting data from link...this may take few minutes.
{'store_followers': 3832.0, 'store_rating': 94.4, 'store_name': 'S-u-p-e-r Laptop parts Store', 'store_status': '', 'product_title': 'English Laptop Keyboard for HP for EliteBook Folio 1040 G3 keyboard US silver', 'product_price': 7363.27, 'product_sold': 4.0, 'product_ratings': 5.0, 'avg_similar_price': 7730.108333333334, 'product_reviews': 1.0}
OrdaMe score 51.660000000000004
Warning: Consider this product carefully before you buy.
The OrdaMe score is determined by three variables:
- Seller reputation score (0 - 50)
- Product score (0 - 25)
- Price reputation (0 - 25)
A score above 75 indicates you should purchase the product; you only have to view the product photos and read a few negative reviews. A score below 75 means the product is not recommended and you should be very careful.
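As a rough sketch of how those pieces add up (the function below is my own illustration, not the actual OrdaMe code):

def ordame_score(seller_reputation, product_score, price_reputation):
    # seller reputation is out of 50, the other two are out of 25,
    # so the combined score is out of 100
    return seller_reputation + product_score + price_reputation

score = ordame_score(40, 18, 10)  # 68: below the 75 cutoff
if score > 75:
    print('Looks good: check the photos and a few negative reviews before buying.')
else:
    print('Warning: Consider this product carefully before you buy.')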
More details about the analysis will be explained in the GitHub readme.
Things I learned about Python
1. I learned how to read input from the command line with Python.
import sys
import helper as fn  # helper module with validation utilities like isLinkValid()

link = input('Enter product link: ')  # base url
if fn.isLinkValid(link) is False:
    print('Error: Please enter a valid aliexpress product page link e.g https://www.aliexpress.com/item/2251832665683450.html')
    sys.exit()
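The isLinkValid check lives in the helper module and isn't shown in this post; a minimal version, assuming all it needs to do is confirm the link points at an AliExpress item page, could be:

import re

def isLinkValid(link):
    # matches links like https://www.aliexpress.com/item/2251832665683450.html
    return bool(re.match(r'^https://(www\.)?aliexpress\.com/item/\d+\.html', link))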
I'm not a command-line guy; I've spent most of my time building APIs or websites.
2. I learned how scraping works with Beautiful Soup and Selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time

options = Options()
options.add_argument("--headless")  # run the browser without opening a window
driver = webdriver.Chrome(options=options)
driver.get(link)  # make request
try:
    # scroll down to the bottom of the page so dynamic content loads
    driver.execute_script("window.scrollTo(0,document.body.clientHeight)")
    time.sleep(3)
    page_source = driver.page_source
    # initialize beautiful soup
    soup = BeautifulSoup(page_source, "html.parser")
    root_element = soup.find(id="root")
    header_element = soup.find(id="header")
finally:
    driver.quit()  # close the browser when done
For the first time in my life, I learned how web scraping works. I learned a great deal about how Beautiful Soup and Selenium work, and I had to combine the two; otherwise it would have been impossible to crawl a site like aliexpress.com, which renders a lot of content dynamically.
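A possible refinement (not what the snippet above does) is to replace the fixed time.sleep(3) with Selenium's explicit waits, which poll for a specific element instead of sleeping blindly. A sketch using the id="root" element from the snippet above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 seconds for the page's root element to appear
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "root")))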
3. I learned how to measure the similarity between two strings
from nltk.translate.bleu_score import sentence_bleu as bleu, SmoothingFunction

smoothie = SmoothingFunction().method4  # a smoothing function; the post doesn't show which method the project uses

def bleu_score(a, b):
    return bleu([a], b, smoothing_function=smoothie)

# later, where related listings are compared:
# get the title similarity and check if the title is close enough
similarity = fn.bleu_score(payload['product_title'], item_title_element.get_text())
if similarity >= 0.5:
    price_wrapper_element = item_element.find('div', class_="mGXnE")
    price_elements = price_wrapper_element.findChildren()
For instance:
"iPhone 6 Charger white colour" and "iPhone 6 Charger black"
How do you measure their similarity, and why bother?
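On the measuring side, the two titles can simply be run through the bleu_score helper above; it returns a value between 0 and 1, and the earlier snippet treats anything at or above 0.5 as close enough:

a = "iPhone 6 Charger white colour"
b = "iPhone 6 Charger black"
print(bleu_score(a, b))  # between 0 and 1; >= 0.5 counts as a similar title here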
As for the why: some fake sellers post fake products and use price bait to lure people, e.g. an iPhone charger costing $1. That's obviously suspicious to you, right? But for some people out there, it's a deal they can't resist. To combat this, I used the Natural Language Toolkit package to find similar products and compare their prices with the original product's price. If the price differs by a large margin (up to 75%) from about 3 related products, the product is flagged with a price reputation score of zero (unless the store is a top brand).
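Here is my reading of that rule as a sketch (the price_reputation name and the exact thresholds are illustrative; the real scoring lives in the project code):

def price_reputation(product_price, similar_prices, top_brand=False):
    # illustrative only: full 25 points if the price is in line with similar listings
    if top_brand or len(similar_prices) < 3:
        return 25
    avg = sum(similar_prices) / len(similar_prices)
    # a price that deviates wildly from comparable products scores zero
    if abs(avg - product_price) / avg >= 0.75:
        return 0
    return 25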
I learned how to solve very complex problems with the help of a powerful NLP package.
4. I also learned how to create modules
import aliexpress   # the AliExpress-specific logic, split out of the main script
import helper as fn # shared utilities such as isLinkValid() and bleu_score()
I hated having multiple function declarations in the main script, so I had to learn how to create and work with modules to make my code cleaner and easier to maintain as I progressed.
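The split ended up as roughly three files; all three names appear in this post, though my one-line summary of the aliexpress module is my own shorthand:

main.py        # the command line entry point shown earlier
helper.py      # imported as fn: isLinkValid(), bleu_score(), and other utilities
aliexpress.py  # the AliExpress-specific scraping code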
5. I learned how to work with packages.
The most important thing I learned about Python with this project is that there is a package for every complex problem.
How to run this project
You can find the source on GitHub which you can clone and run on your local machine.
It requires the following:
- Python 3.8+
- The Selenium Chrome driver (chromedriver) installed
- A command-line interface, e.g. Git Bash, CMD, or PowerShell
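After cloning the repository, running it locally is roughly the following; the package names are just the libraries mentioned in this post, and the repository may pin exact versions:

$ pip install selenium beautifulsoup4 nltk
$ python main.py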
Thanks for reading - I appreciate it.