Web scraping is a powerful technique for extracting data from websites, and Python offers several libraries for the job. In this tutorial, we'll walk through a Python script that uses requests, BeautifulSoup, and Pandas to scrape book information from the 'https://books.toscrape.com/' website.
Step 1: Importing Libraries
We begin by importing the necessary libraries. BeautifulSoup is used for parsing HTML content, requests for making HTTP requests to the website, and Pandas for creating and manipulating data frames.
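In code, this step is just three import lines. This is a minimal sketch; the original script's exact import style may differ. Note that BeautifulSoup lives in the `bs4` package, which is installed under the name `beautifulsoup4`.

```python
# HTML parsing; the PyPI package is "beautifulsoup4" but it is imported as bs4
from bs4 import BeautifulSoup
# HTTP client used to download the page
import requests
# DataFrames for storing the scraped rows (conventionally aliased as pd)
import pandas as pd
```

If any of these imports fails, `pip install beautifulsoup4 requests pandas` installs all three.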
Step 2: Fetching Web Page Content
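Next, we specify the URL of the website and use the requests library to fetch the HTML content of the page. We then decode the raw response bytes into a string ourselves, so that any non-ASCII characters are read with the correct encoding rather than coming out garbled.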
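A minimal sketch of this step, assuming the page is UTF-8 encoded. The helper names `fetch_page` and `decode_html` are hypothetical, not taken from the original script. `response.content` holds the raw bytes, and decoding them explicitly sidesteps whatever charset requests guesses when it builds `response.text`.

```python
import requests

URL = "https://books.toscrape.com/"


def decode_html(raw: bytes, encoding: str = "utf-8") -> str:
    """Turn raw response bytes into text.

    The encoding is an assumption (UTF-8 by default). errors="replace"
    substitutes U+FFFD for any invalid byte instead of raising.
    """
    return raw.decode(encoding, errors="replace")


def fetch_page(url: str = URL, timeout: float = 10.0) -> str:
    """Download a page and return its HTML as a decoded string."""
    response = requests.get(url, timeout=timeout)
    # Fail loudly on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return decode_html(response.content)
```

Calling `html = fetch_page()` downloads the homepage; the timeout keeps the script from hanging indefinitely if the site does not respond.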
Step 3: Extracting Book Information
The book information is contained within <ol> (ordered list) tags on the webpage. We use BeautifulSoup to find all the <ol> tags.
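The sketch below shows the `find_all` call on a small, hand-written stand-in for the page (an assumption, not a copy of the real markup). In the actual script, the decoded HTML from Step 2 would take the place of `SAMPLE_HTML`.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup: a breadcrumb <ul> followed by the <ol> of books
SAMPLE_HTML = """
<ul class="breadcrumb"><li><a href="index.html">Home</a></li></ul>
<ol class="row">
  <li><article class="product_pod"><h3><a href="catalogue/book-one/index.html"
      title="Book One">Book One</a></h3></article></li>
  <li><article class="product_pod"><h3><a href="catalogue/book-two/index.html"
      title="Book Two">Book Two</a></h3></article></li>
</ol>
"""

# "html.parser" is Python's built-in parser, so no extra install is needed
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns every <ol> in document order; the breadcrumb <ul> is not matched
ol_tags = soup.find_all("ol")
print(len(ol_tags))  # 1
```

If a page contained other ordered lists, the search could be narrowed with `soup.find_all("ol", class_="row")`, assuming the book list carries that class as it does in the sample.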
Step 4: Creating a DataFrame
We define the column names for our data frame and create an empty data frame using Pandas.
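A minimal sketch of this step. The column names here are illustrative assumptions; the original script may label its columns differently.

```python
import pandas as pd

# Hypothetical column names, one per piece of book information
COLUMNS = ["Title", "Link", "Price", "Availability"]

# An empty DataFrame that has the columns but no rows yet
books_df = pd.DataFrame(columns=COLUMNS)
print(books_df.shape)  # (0, 4)
```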
Step 5: Looping Through Book Elements
We loop through each <ol> tag to find the <li> tags (list items) containing book information. Within each <li>, we locate the <article> tag that encompasses the book details.
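The nested loop can be sketched as follows. The sample markup is a hand-written, simplified stand-in for the page (an assumption), and collecting the `<article>` tags into a list is just one way to hand them to the next step.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup: the breadcrumb <li> sits outside the <ol>,
# so the loop below never visits it
SAMPLE_HTML = """
<ul class="breadcrumb"><li><a href="index.html">Home</a></li></ul>
<ol class="row">
  <li><article class="product_pod"><h3><a href="catalogue/book-one/index.html"
      title="Book One">Book One</a></h3></article></li>
  <li><article class="product_pod"><h3><a href="catalogue/book-two/index.html"
      title="Book Two">Book Two</a></h3></article></li>
</ol>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

articles = []
for ol in soup.find_all("ol"):          # every ordered list on the page
    for li in ol.find_all("li"):        # each list item inside that list
        article = li.find("article")    # the card holding one book's details
        if article is not None:         # skip items that are not book cards
            articles.append(article)

print(len(articles))  # 2
```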
Step 6: Extracting Book Details
Within the <article> tag, we locate the <h3> tag, whose link (<a>) gives us the book's title and URL. We also extract the book's price and its availability (stock status).
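A sketch of the extraction for a single book card. The class names `price_color` and `availability`, and the `title` attribute on the link, reflect the site's listing markup as we understand it; treat them as assumptions and check them against the live page. `extract_book`, `BASE_URL`, the sample card, and the dictionary keys are all hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://books.toscrape.com/"

# Hypothetical book card modeled on the listing markup
ARTICLE_HTML = """
<article class="product_pod">
  <h3><a href="catalogue/an-example-book_1/index.html"
         title="An Example Book With a Long Title">An Example Book With ...</a></h3>
  <div class="product_price">
    <p class="price_color">£10.00</p>
    <p class="instock availability">
      <i class="icon-ok"></i>
      In stock
    </p>
  </div>
</article>
"""


def extract_book(article, base_url=BASE_URL):
    """Pull title, absolute link, price and availability out of one <article>."""
    link = article.h3.a
    return {
        # The title attribute carries the full name; the visible text may be cut off
        "Title": link["title"],
        # href is relative, so resolve it against the site root
        "Link": urljoin(base_url, link["href"]),
        "Price": article.find("p", class_="price_color").get_text(strip=True),
        # class_ matches any one of the tag's classes ("instock availability")
        "Availability": article.find("p", class_="availability").get_text(strip=True),
    }


book = extract_book(BeautifulSoup(ARTICLE_HTML, "html.parser").article)
print(book["Title"])  # An Example Book With a Long Title
```

Reading the `title` attribute rather than the link text matters because long titles are shortened with "..." in the visible text.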
Step 7: Populating the DataFrame
For each book, we insert a new row into the data frame. Once every book has been processed, the data frame is saved to a CSV file.
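A minimal sketch of filling the data frame and writing it out. The column names, the two illustrative rows (not real scraped output), and the filename `books.csv` are assumptions for this example.

```python
import pandas as pd

COLUMNS = ["Title", "Link", "Price", "Availability"]  # hypothetical column names
books_df = pd.DataFrame(columns=COLUMNS)

# Illustrative rows standing in for what the extraction step returns per book
scraped_books = [
    {"Title": "Book One", "Link": "https://books.toscrape.com/catalogue/book-one/index.html",
     "Price": "£10.00", "Availability": "In stock"},
    {"Title": "Book Two", "Link": "https://books.toscrape.com/catalogue/book-two/index.html",
     "Price": "£20.00", "Availability": "In stock"},
]

for book in scraped_books:
    # Assigning to a label one past the end appends a new row (setting with enlargement)
    books_df.loc[len(books_df)] = [book[col] for col in COLUMNS]

# index=False keeps the 0, 1, 2, ... row labels out of the file
books_df.to_csv("books.csv", index=False)  # hypothetical output filename
```

Appending one row at a time mirrors the description above but is relatively slow for large scrapes; collecting the dictionaries in a list and calling `pd.DataFrame(scraped_books, columns=COLUMNS)` once is usually faster.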
That's it! You've successfully scraped book information from a website and stored it in a CSV file using Python. Feel free to customize the code for your specific needs or explore additional features provided by BeautifulSoup and Pandas. Happy coding!
For the complete code: https://github.com/Aaminah27/Python-Scripts/blob/main/books_website_scraping.py