1. Introduction

Web scraping is the process of extracting data from websites. In Python, we commonly use libraries like requests (to fetch web pages) and BeautifulSoup (to parse and extract information from HTML).
2. Installing Required Libraries
Before scraping, install the libraries:
pip install requests beautifulsoup4
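As an optional sanity check, you can confirm that both packages import and print their versions (both libraries expose a __version__ attribute):
python -c "import requests, bs4; print(requests.__version__, bs4.__version__)"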
3. Basic Web Scraping Example
import requests
from bs4 import BeautifulSoup
# Step 1: Fetch a web page
url = "https://quotes.toscrape.com/"
response = requests.get(url)
# Step 2: Parse the HTML content
soup = BeautifulSoup(response.text, "html.parser")
# Step 3: Extract quotes and authors
quotes = soup.find_all("span", class_="text")
authors = soup.find_all("small", class_="author")
for q, a in zip(quotes, authors):
    print(f"{q.get_text()} - {a.get_text()}")
Output Example:
“The world as we have created it is a process of our thinking.” - Albert Einstein
“It is our choices, Harry, that show what we truly are.” - J.K. Rowling
...
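The example above assumes the request succeeds. In practice it is worth failing fast on HTTP errors before parsing; a minimal variant of Step 1 using requests' built-in raise_for_status() and a timeout:
import requests
from bs4 import BeautifulSoup

url = "https://quotes.toscrape.com/"
response = requests.get(url, timeout=10)  # don't hang forever on a slow server
response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
soup = BeautifulSoup(response.text, "html.parser")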
4. Commonly Used BeautifulSoup Methods
| Method | Description |
| --- | --- |
| soup.find(tag) | Finds the first occurrence of a tag |
| soup.find_all(tag) | Finds all occurrences of a tag |
| element.get_text() | Extracts the text content inside an element |
| element['attribute'] | Gets the value of an attribute (e.g., href) |
| soup.select("css_selector") | Finds elements using CSS selectors |
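A short sketch exercising each of these methods on the soup object from Section 3 (the CSS classes follow quotes.toscrape.com's markup):
# find: first matching element
first_quote = soup.find("span", class_="text")
print(first_quote.get_text())

# element['attribute']: read an attribute's value
first_link = soup.find("a")
print(first_link["href"])

# select: CSS selectors, e.g. every tag link inside a quote block
for tag in soup.select("div.quote a.tag"):
    print(tag.get_text())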
5. Extracting Links Example
links = soup.find_all("a")
for link in links:
    href = link.get("href")
    text = link.get_text(strip=True)
    print(f"Text: {text} -> Link: {href}")
6. Scraping with CSS Selectors
# Example: Get all quotes using CSS selectors
quotes = soup.select("span.text")
for q in quotes:
    print(q.text)
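CSS selectors also make it easy to keep related fields together. Instead of zipping two parallel lists as in Section 3, you can select each quote block and read its text and author from within it; a sketch using select_one (the classes again follow quotes.toscrape.com's markup):
for quote_div in soup.select("div.quote"):
    text = quote_div.select_one("span.text").get_text()
    author = quote_div.select_one("small.author").get_text()
    print(f"{text} - {author}")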
7. Handling Pagination (Multiple Pages)
page = 1
while True:
    url = f"https://quotes.toscrape.com/page/{page}/"
    response = requests.get(url)
    if "No quotes found!" in response.text:
        break
    soup = BeautifulSoup(response.text, "html.parser")
    quotes = soup.find_all("span", class_="text")
    for q in quotes:
        print(q.get_text())
    page += 1
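An alternative, sketched below, is to follow the site's own "Next" link instead of guessing page numbers, and to pause between requests (quotes.toscrape.com wraps the link in an li.next element; the one-second delay is an arbitrary polite default):
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "https://quotes.toscrape.com/"
while url:
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    for q in soup.select("span.text"):
        print(q.get_text())
    # Follow the "Next" pagination link if present; stop otherwise
    next_link = soup.select_one("li.next a")
    url = urljoin(url, next_link["href"]) if next_link else None
    time.sleep(1)  # be polite: pause between requests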
8. Best Practices for Web Scraping
- ✅ Always check the website’s robots.txt rules (see the sketch after this list).
- ✅ Avoid overloading servers (use delays with time.sleep).
- ✅ Consider using APIs if available instead of scraping.
- ✅ Be respectful and ethical when scraping.
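The robots.txt check from the first point can be automated with the standard library; a minimal sketch using urllib.robotparser (the user agent "*" stands for any crawler):
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://quotes.toscrape.com/robots.txt")
rp.read()  # fetch and parse the robots.txt file

# can_fetch() reports whether the given user agent may crawl a URL
print(rp.can_fetch("*", "https://quotes.toscrape.com/page/1/"))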
9. Summary
- requests → Fetches web pages.
- BeautifulSoup → Parses and extracts data from HTML.
- Methods like .find(), .find_all(), .select(), and .get_text() help extract elements.
- Pagination lets you scrape multiple pages in a loop.
- Always follow best practices and respect website policies.