Dauo

1. Introduction


Web scraping is the process of extracting data from websites. In Python, we commonly use libraries like requests (to fetch web pages) and BeautifulSoup (to parse and extract information from HTML).


2. Installing Required Libraries

Before scraping, install the libraries:

pip install requests beautifulsoup4

3. Basic Web Scraping Example

import requests
from bs4 import BeautifulSoup

# Step 1: Fetch a web page
url = "https://quotes.toscrape.com/"
response = requests.get(url)

# Step 2: Parse the HTML content
soup = BeautifulSoup(response.text, "html.parser")

# Step 3: Extract quotes and authors
quotes = soup.find_all("span", class_="text")
authors = soup.find_all("small", class_="author")

for q, a in zip(quotes, authors):
    print(f"{q.get_text()} - {a.get_text()}")

Example output:
“The world as we have created it is a process of our thinking.” - Albert Einstein
“It is our choices, Harry, that show what we truly are.” - J.K. Rowling
...

4. Commonly Used BeautifulSoup Methods

Method                          Description
soup.find(tag)                  Finds the first occurrence of a tag
soup.find_all(tag)              Finds all occurrences of a tag
element.get_text()              Extracts the text content inside an element
element['attribute']            Gets the value of an attribute (e.g., href)
soup.select("css_selector")     Finds elements using CSS selectors
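All of these methods can be tried without touching the network by parsing an inline HTML string. The snippet below is made up purely for illustration:

```python
from bs4 import BeautifulSoup

# A small hand-written HTML fragment (illustrative, not from a real site).
html = """
<div class="quote">
  <span class="text">Hello, world.</span>
  <small class="author">Anonymous</small>
  <a href="/tags/hello">hello</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

first_span = soup.find("span")               # first <span> tag
all_links = soup.find_all("a")               # list of every <a> tag
text = first_span.get_text()                 # text inside the element
href = all_links[0]["href"]                  # value of the href attribute
selected = soup.select("div.quote > small")  # CSS child-combinator selector

print(text)                    # Hello, world.
print(href)                    # /tags/hello
print(selected[0].get_text())  # Anonymous
```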

5. Extracting Links Example

links = soup.find_all("a")
for link in links:
    href = link.get("href")
    text = link.get_text(strip=True)
    print(f"Text: {text} -> Link: {href}")
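The href values collected this way are often relative (for example, /login), so to follow them you usually need absolute URLs. One common approach, shown here with sample href strings rather than live page data, is urllib.parse.urljoin:

```python
from urllib.parse import urljoin

base_url = "https://quotes.toscrape.com/"

# Sample href values as they might appear in scraped <a> tags (illustrative).
hrefs = ["/login", "/tag/inspirational/", "https://www.goodreads.com/quotes/"]

# urljoin resolves relative paths against the base URL and
# leaves already-absolute URLs unchanged.
absolute = [urljoin(base_url, h) for h in hrefs]
for link in absolute:
    print(link)
# https://quotes.toscrape.com/login
# https://quotes.toscrape.com/tag/inspirational/
# https://www.goodreads.com/quotes/
```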

6. Scraping with CSS Selectors

# Example: Get all quotes using CSS selectors
quotes = soup.select("span.text")
for q in quotes:
    print(q.text)
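select() accepts much richer selectors than a single class. A small offline sketch, using a hand-written fragment that mimics the quote markup:

```python
from bs4 import BeautifulSoup

# Hand-written fragment mimicking the quote markup (illustrative).
html = """
<div class="quote">
  <span class="text">First quote</span>
  <a class="tag" href="/tag/life/">life</a>
  <a class="tag" href="/tag/humor/">humor</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

tags = soup.select("div.quote a.tag")       # descendant combinator
prefixed = soup.select('a[href^="/tag/"]')  # attribute-prefix selector
one = soup.select_one("span.text")          # first match only

print([t.get_text() for t in tags])  # ['life', 'humor']
print(len(prefixed))                 # 2
print(one.get_text())                # First quote
```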

7. Handling Pagination (Multiple Pages)

page = 1
while True:
    url = f"https://quotes.toscrape.com/page/{page}/"
    response = requests.get(url)
    
    if "No quotes found!" in response.text:
        break
    
    soup = BeautifulSoup(response.text, "html.parser")
    quotes = soup.find_all("span", class_="text")
    
    for q in quotes:
        print(q.get_text())
    
    page += 1
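The loop above relies on the "No quotes found!" message that this particular site returns past the last page. An alternative that generalizes better is to follow the pager's Next link instead. The li.next markup below is an assumption about the site's HTML, checked here against a hand-written fragment rather than a live request:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def find_next_page(soup, current_url):
    """Return the absolute URL of the Next link, or None on the last page."""
    next_link = soup.select_one("li.next a")  # site-specific markup assumption
    if next_link is None:
        return None
    return urljoin(current_url, next_link["href"])

# Quick check against a hand-written fragment of the pager markup:
pager = BeautifulSoup(
    '<ul class="pager"><li class="next"><a href="/page/2/">Next</a></li></ul>',
    "html.parser",
)
print(find_next_page(pager, "https://quotes.toscrape.com/"))
# https://quotes.toscrape.com/page/2/

last = BeautifulSoup('<ul class="pager"></ul>', "html.parser")
print(find_next_page(last, "https://quotes.toscrape.com/page/10/"))
# None
```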

8. Best Practices for Web Scraping

  • ✅ Always check the website’s robots.txt rules.
  • ✅ Avoid overloading servers (use delays with time.sleep).
  • ✅ Consider using APIs if available instead of scraping.
  • ✅ Be respectful and ethical when scraping.
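One way to implement the delay from the second bullet is a small rate limiter that enforces a minimum gap between requests. RateLimiter is a hypothetical helper name, and the requests usage in the comment is a sketch, not executed here:

```python
import time

class RateLimiter:
    """Enforce a minimum delay between successive calls to wait()."""
    def __init__(self, min_delay=1.0):
        self.min_delay = min_delay
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_delay - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

# Sketch of usage with requests (not executed here):
#   limiter = RateLimiter(min_delay=1.0)
#   for page in range(1, 11):
#       limiter.wait()
#       response = requests.get(f"https://quotes.toscrape.com/page/{page}/")

# Offline demonstration with a short delay:
limiter = RateLimiter(min_delay=0.05)
start = time.monotonic()
for _ in range(3):
    limiter.wait()
elapsed = time.monotonic() - start
print(f"3 polite calls took {elapsed:.2f}s")  # at least two 0.05 s gaps
```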

9. Summary

  • requests → Fetches web pages.
  • BeautifulSoup → Parses and extracts data from HTML.
  • Methods like .find(), .find_all(), .select(), and .get_text() help extract elements.
  • Pagination lets you scrape multiple pages in a loop.
  • Always follow best practices and respect website policies.
