How Does Web Scraping Become Simpler and How To Prevent It? (With Scraping NBA Players’ Salary Example)

Introduction

 

Web scraping refers to extracting the content of a website programmatically. Specifically, developers create bots to get the HTML code of a website, parse the code and export the result to an external data source.

Developers scrape websites for different purposes. Search engines scrape data from websites and index it so that we can find information much more easily. However, there are many bad bots on the internet (25.6% of all website traffic comes from bad bots). These bots may try to steal your content; one example is the data leak at Alibaba’s Taobao, which was caused by web scraping.

Web Scraping On NBA Players’ Information

Today, web scraping has become much easier thanks to advances in technology. We will illustrate this with a simple example: scraping NBA players’ information, such as height, birth date and salary.

Here’s the main page of NBA players’ basic information:

https://hoopshype.com/salaries/players.

From this page, we can navigate to a separate web page that contains each player’s basic information.

Figure: Main page

Figure: Stephen Curry’s basic information

Prerequisite

Basic Python & HTML knowledge is required.

We will use Python for web scraping, with the following modules:

  1. Selenium
  2. Beautiful Soup
  3. Pandas

Selenium Driver Installation

For the Selenium module to work, we need to install a browser driver. The right driver depends on the operating system of your machine and the version of your web browser.

We will illustrate the installation steps for Windows. For more detail, refer to https://selenium-python.readthedocs.io/installation.html#drivers.

1. Download the zip file containing chromedriver.exe.

2. Unzip the file. Optionally, move the extracted folder to another directory.

3. Type “Environment Variables” in the Start menu and select “Edit the system environment variables”.

4. Update the PATH variable to include the path of the folder that contains the driver program.
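
After step 4, one quick way to confirm that the driver is reachable is to look it up on PATH from Python. This is a minimal sketch using only the standard library. The helper name `find_driver` is made up for illustration, and the executable name `chromedriver` assumes the Chrome driver downloaded in step 1.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable found on PATH, or None.

    `find_driver` is a hypothetical helper, not part of Selenium.
    On Windows, shutil.which also tries the extensions listed in PATHEXT,
    so "chromedriver" matches chromedriver.exe.
    """
    return shutil.which(name)


path = find_driver()
if path is None:
    print("chromedriver was not found on PATH; re-check step 4.")
else:
    print(f"chromedriver found at: {path}")
```

If the lookup fails right after editing PATH, opening a new terminal usually helps, because running programs keep the old environment.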

Get HTML code of the website

First, let us use Selenium to launch a new browser and get the HTML source of the NBA players’ salary page.
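
A minimal sketch of this step, assuming Chrome and the chromedriver installed above. The helper names `get_page_source` and `scrape_main_page` are made up for illustration; `webdriver.Chrome()`, `driver.get()` and the `driver.page_source` attribute are Selenium's own.

```python
def get_page_source(driver, url):
    """Load `url` in the browser controlled by `driver` and return its HTML."""
    driver.get(url)
    return driver.page_source


def scrape_main_page():
    """Launch Chrome, load the salary page and return its HTML source."""
    # Imported here so that the helper above can be reused (or tested)
    # on a machine without Selenium or a browser installed.
    from selenium import webdriver

    url = "https://hoopshype.com/salaries/players"
    driver = webdriver.Chrome()  # opens a new Chrome window
    try:
        return get_page_source(driver, url)
    finally:
        driver.quit()  # close the browser even if loading fails
```

Calling `html = scrape_main_page()` and then `print(html[:500])` should print the beginning of the page's HTML.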

You should get the page’s HTML code back, and at the same time you should see a new browser window being launched.

Navigate to NBA player’s basic information page

To get each player’s basic information, we need to navigate to the corresponding page and extract the data. The links to these pages are already in a table on the main page.

To get the links in this table, we first need to find the HTML elements that hold them. The following steps show how to find the HTML element behind a link to a player’s basic information page.

1. Right-click one of the links.

2. Select “Inspect”.

3. The developer tools should appear, with the corresponding HTML element highlighted.

4. The HTML elements for the other players are similar to this one.

Next, we use Beautiful Soup to extract the links to every player’s basic information page.

There are numerous ways to query the HTML code with Beautiful Soup. Here, we locate the salary table by its “table” tag and its classes, then extract all the links inside it.
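
The query described above can be sketched as follows. The sample HTML, the class name `salary-table` and the example.com URLs are placeholders so that the snippet runs on its own; on the real page, use the class shown by “Inspect”. `extract_player_links` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Stand-in for the main page's HTML. The class name and URLs are placeholders.
SAMPLE_HTML = """
<table class="salary-table">
  <tr><td><a href="https://example.com/player/a/">Player A</a></td></tr>
  <tr><td><a href="https://example.com/player/b/">Player B</a></td></tr>
</table>
<a href="https://example.com/about/">About</a>
"""


def extract_player_links(html, table_class="salary-table"):
    """Return the href of every link inside the table with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=table_class)
    if table is None:
        return []  # table not found, e.g. the page layout changed
    return [a["href"] for a in table.find_all("a", href=True)]


links = extract_player_links(SAMPLE_HTML)
print(links)
```

Restricting the search to the table keeps unrelated links, such as the “About” link in the sample, out of the result.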

Extract basic information

After getting the link to each player’s basic information page, we extract the basic information for each player. Most of these pieces of information are plain, non-clickable text, so we locate their elements by highlighting the text, right-clicking it and selecting “Inspect”.

Once again, you can use Beautiful Soup to extract the elements of those pieces of information by performing certain queries.

Unfortunately, there is no way to tell position, birth date, height, weight and salary apart by their elements alone, as all of them share the same attributes. Therefore, we get all the relevant elements and match them to the fields one by one.
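
One way to do this matching is to rely on the order in which the values appear. The sketch below assumes that order is fixed; the sample HTML, the class name `info-value`, the field order and all values are placeholders, and `extract_basic_info` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Stand-in for one player's page. Class name and values are placeholders.
SAMPLE_PLAYER_HTML = """
<div class="bio">
  <span class="info-value">Guard</span>
  <span class="info-value">01/01/90</span>
  <span class="info-value">6-3</span>
  <span class="info-value">190 lbs</span>
  <span class="info-value">$1,000,000</span>
</div>
"""

# Order in which the values are assumed to appear on the page.
FIELDS = ["position", "birth_date", "height", "weight", "salary"]


def extract_basic_info(html, value_class="info-value"):
    """Collect every element with the shared class and pair it with a field name."""
    soup = BeautifulSoup(html, "html.parser")
    values = [el.get_text(strip=True) for el in soup.find_all(class_=value_class)]
    if len(values) != len(FIELDS):
        # Fail loudly rather than silently shifting values into the wrong field.
        raise ValueError(f"expected {len(FIELDS)} values, found {len(values)}")
    return dict(zip(FIELDS, values))


info = extract_basic_info(SAMPLE_PLAYER_HTML)
print(info)
```

Matching by position is fragile: if a page omits one value, every later field would shift, which is why the sketch raises an error when the count does not match.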

The output should contain the player’s position, birth date, height, weight and salary.

Repeat the steps for each player

Now that we can extract the information for one player, we just need to repeat the whole process for every player and store the results.
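
The loop itself can be sketched as below. To keep the snippet runnable without a browser, the page download and the parsing are passed in as functions; `scrape_all_players`, `fetch_html` and `parse_player` are hypothetical names, and the demo at the bottom uses stand-in data rather than real pages.

```python
def scrape_all_players(links, fetch_html, parse_player):
    """Fetch and parse every player page and return a list of dicts.

    fetch_html(url) returns the page's HTML as a string, and
    parse_player(html) returns a dict of fields for one player.
    """
    players = []
    for url in links:
        info = parse_player(fetch_html(url))
        info["url"] = url  # remember where each record came from
        players.append(info)
    return players


# Demo with stand-ins: a dict plays the role of the website.
fake_pages = {
    "https://example.com/player/a/": "Guard",
    "https://example.com/player/b/": "Center",
}
players = scrape_all_players(
    list(fake_pages),
    fetch_html=fake_pages.get,
    parse_player=lambda html: {"position": html},
)
print(players)
```

In a real run, `fetch_html` would load the page with Selenium and `parse_player` would run the Beautiful Soup queries on the returned HTML.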

Export result to a csv file

We first convert the data to a table-like format.
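
With pandas, a natural table-like format is a `DataFrame`. The sketch below assumes the scraped data is a list of dictionaries, one per player; the records and column names are placeholders.

```python
import pandas as pd

# Placeholder records in the shape a scraping loop might produce.
players = [
    {"name": "Player A", "position": "Guard", "height": "6-3", "salary": "$1,000,000"},
    {"name": "Player B", "position": "Center", "height": "7-0", "salary": "$2,000,000"},
]

# Each dictionary becomes one row; the keys become the column names.
df = pd.DataFrame(players)
print(df)
```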

You should then see the data laid out as a table.

Finally, we export the information as a CSV file.
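
A minimal sketch of the export, using pandas' `DataFrame.to_csv`. The helper name `export_to_csv` and the default file name `nba_players.csv` are assumptions for illustration.

```python
import pandas as pd


def export_to_csv(df, path="nba_players.csv"):
    """Write the table to a CSV file and return the path that was written.

    index=False leaves out the DataFrame's row-number column.
    """
    df.to_csv(path, index=False, encoding="utf-8")
    return path
```

For example, `export_to_csv(pd.DataFrame([{"name": "Player A", "salary": "$1,000,000"}]))` writes a file with a header row and one data row to the current directory.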

Conclusion

The tutorial above shows how to scrape data from web pages with just three Python modules. In fact, anyone with basic knowledge of Python and HTML can learn web scraping quickly, since many mature tools are available. In other words, anybody can easily steal your web content if it has no protection. It is therefore crucial to protect your content by adopting cybersecurity technologies.

These technologies can monitor your website’s traffic, verify the authenticity of incoming traffic and block unwanted traffic. One example is GeeTest’s BotSonar, which is adopted by multinational companies such as KFC and Nike. It monitors your website 24/7 and uses AI to distinguish bad bots from human visitors. On top of that, you can choose how to handle bad traffic, e.g. by blocking it or by showing it fake content. In addition, GeeTest respects your data privacy: its products are GDPR compliant, which is a plus if you come from an enterprise background.
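
To make the idea of monitoring and blocking traffic concrete, here is a deliberately simple sketch of one signal such systems can use: request rate per client. It is not how BotSonar or any commercial product works; the class name `RateLimiter` and the thresholds are made up for illustration.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Flag clients that send more than `max_requests` within `window` seconds."""

    def __init__(self, max_requests=10, window=1.0):
        self.max_requests = max_requests
        self.window = window
        self._hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def allow(self, client_id, now=None):
        """Return True if the request may proceed, False if it should be blocked."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the sliding window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too many recent requests: treat as a likely bot
        hits.append(now)
        return True


limiter = RateLimiter(max_requests=3, window=1.0)
decisions = [limiter.allow("203.0.113.5", now=t) for t in (0.0, 0.1, 0.2, 0.3)]
print(decisions)
```

A scraper that loops over hundreds of pages as fast as it can would trip a check like this quickly, while a human visitor would not. Real products combine many more signals than request rate alone.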

Figure: GeeTest’s anti web scraping

Written by: Joe

Source Code

Source code is available at

https://github.com/JoeHO888/How-does-web-scraping-become-simpler-and-how-to-prevent-it/blob/main/How does web scraping become simpler and how to prevent it – Source Code.ipynb
