Selenium web scraping

  1. Understand the Basics: Selenium is primarily a browser automation framework, not just a scraping tool. It controls web browsers like Chrome, Firefox, and Edge to simulate human interaction. This is crucial for websites that rely heavily on JavaScript to load content, which traditional libraries like Beautiful Soup or Requests might struggle with.

  2. Prerequisites:

    • Python: Ensure Python 3.x is installed on your system. You can download it from python.org.
    • pip: Python’s package installer, usually comes bundled with Python.
    • Selenium WebDriver: Install the Selenium library using pip: pip install selenium.
    • Web Browser: Choose a browser you want to automate (e.g., Google Chrome, Mozilla Firefox).
    • WebDriver Executable: Download the specific WebDriver executable for your chosen browser. For Chrome, it’s ChromeDriver (sites.google.com/a/chromium.org/chromedriver/downloads); for Firefox, it’s GeckoDriver (github.com/mozilla/geckodriver/releases). Place this executable in a directory that’s in your system’s PATH, or provide its full path in your script.
  3. Initial Setup Code Example:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By  # For locating elements

    # Path to your WebDriver executable (e.g., ChromeDriver)
    # Make sure this path is correct, or add chromedriver to your PATH
    webdriver_service = Service('/path/to/your/chromedriver')
    driver = webdriver.Chrome(service=webdriver_service)

    # Navigate to a website
    driver.get("https://www.example.com")  # Replace with your target URL

    # Print the page title
    print(driver.title)

    # Close the browser
    driver.quit()
    
  4. Locating Elements: Selenium offers various methods to find elements on a webpage:

    • find_element(By.ID, "element_id")
    • find_element(By.NAME, "element_name")
    • find_element(By.CLASS_NAME, "class_name")
    • find_element(By.TAG_NAME, "tag_name")
    • find_element(By.LINK_TEXT, "Link Text")
    • find_element(By.PARTIAL_LINK_TEXT, "Partial Link")
    • find_element(By.CSS_SELECTOR, "css_selector")
    • find_element(By.XPATH, "xpath_expression")

    Use find_elements (plural) to get a list of all matching elements.

  5. Interacting with Elements: Once you’ve located an element, you can interact with it:

    • element.click(): Clicks an element.
    • element.send_keys("your text"): Types text into an input field.
    • element.clear(): Clears text from an input field.
    • element.text: Retrieves the visible text of an element.
    • element.get_attribute("attribute_name"): Gets the value of an attribute (e.g., href, src).
  6. Handling Dynamic Content and Waits: Websites often load content dynamically. Selenium provides waiting mechanisms:

    • Implicit Waits: driver.implicitly_wait(10) sets a general timeout (here, 10 seconds) for finding elements.
    • Explicit Waits: More precise, waits for a specific condition to be met.
      
      
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.support import expected_conditions as EC

      try:
          element = WebDriverWait(driver, 10).until(
              EC.presence_of_element_located((By.ID, "some_dynamic_element"))
          )
          print(element.text)
      except Exception as e:
          print(f"Element not found: {e}")
      
  7. Ethical Considerations and Best Practices:

    • Respect robots.txt: Always check the robots.txt file of the website (e.g., example.com/robots.txt) to understand their scraping policies. Some sites explicitly disallow scraping.
    • Rate Limiting: Don’t bombard a server with requests. Introduce delays (time.sleep) between requests to avoid overwhelming the server, which could lead to your IP being blocked. A common practice is to wait 1-5 seconds between requests, or even longer for sensitive sites (see the sketch after this list).
    • User-Agent: Set a custom User-Agent string to mimic a real browser. While Selenium does this by default, sometimes customizing it can help.
    • Handle Errors: Implement robust error handling (try-except blocks) to gracefully manage situations where elements aren’t found or network issues occur.
    • Data Storage: Once data is extracted, store it responsibly in formats like CSV, JSON, or a database.
    • Avoid Illegal Activities: Never use scraping for illegal purposes like spamming, financial fraud, or unauthorized access to sensitive data. Always prioritize ethical conduct and legality in your data collection endeavors. If you’re looking for financial information, explore legitimate sources like official financial reports, public APIs, or reputable financial data providers instead of scraping private financial platforms.
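
    A minimal sketch of the rate-limiting and custom User-Agent points above (the User-Agent string and URL list are only illustrative, and chromedriver is assumed to be available on your PATH):

    import random
    import time

    from selenium import webdriver

    # Custom User-Agent via Chrome options (example string only)
    options = webdriver.ChromeOptions()
    options.add_argument(
        "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    )

    driver = webdriver.Chrome(options=options)

    urls = [
        "https://quotes.toscrape.com/page/1/",
        "https://quotes.toscrape.com/page/2/",
    ]  # Illustrative targets

    try:
        for url in urls:
            driver.get(url)
            # ... extract data here ...
            time.sleep(random.uniform(1, 5))  # Polite delay between requests
    finally:
        driver.quit()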

Understanding Selenium for Web Scraping

Selenium is a powerful tool primarily designed for automated testing of web applications, but its capability to control web browsers programmatically makes it an excellent choice for web scraping, especially when dealing with dynamic, JavaScript-heavy websites.

Unlike traditional scraping libraries that only fetch raw HTML, Selenium can interact with web elements, execute JavaScript, simulate user actions like clicks and scrolls, and wait for content to load, thereby mimicking a real user’s browsing experience.

This nuanced interaction allows access to data that is not immediately present in the initial HTML response.

The Core Difference: Dynamic Content vs. Static HTML

Websites today are rarely static HTML documents.

A significant portion of their content, especially on e-commerce sites, social media platforms, or news portals, is loaded asynchronously using JavaScript, APIs, and AJAX requests after the initial page load.

  • Static HTML Scraping: Libraries like requests and Beautiful Soup are highly efficient for static websites where all the desired data is present in the initial HTML source. They fetch the HTML and allow you to parse it directly. The primary advantages are speed and low resource consumption (a minimal sketch follows this list).
  • Dynamic Content Scraping: When content is loaded dynamically, requests will only get you the initial HTML, often devoid of the data you need. This is where Selenium shines. It launches a real browser, allowing the JavaScript to execute, AJAX calls to complete, and content to render just as it would for a human user. This enables you to scrape data from elements that appear after user interactions, scrolling, or specific time delays. For instance, infinite scrolling pages or those with “Load More” buttons are perfect candidates for Selenium.
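
For comparison, here is a minimal static-scraping sketch with requests and Beautiful Soup (assuming both are installed via pip install requests beautifulsoup4); it works only because quotes.toscrape.com serves its quotes in the initial HTML:

import requests
from bs4 import BeautifulSoup

response = requests.get("https://quotes.toscrape.com/")
soup = BeautifulSoup(response.text, "html.parser")

# Each quote sits in <div class="quote"> with its text in <span class="text">
for quote in soup.select(".quote .text"):
    print(quote.get_text())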

When to Choose Selenium Over Other Libraries

Choosing the right tool is crucial for efficient and ethical scraping.

Selenium, while powerful, comes with its own overhead.

  • When to Use Selenium:
    • JavaScript-Rendered Content: If the data you need is generated or displayed via JavaScript, Selenium is almost a necessity. This includes Single Page Applications (SPAs) built with frameworks like React, Angular, or Vue.js.
    • User Interactions Required: If you need to click buttons, fill out forms, scroll down the page, navigate through pagination, or interact with pop-ups to reveal content, Selenium is the ideal choice.
    • Capturing Screenshots/Page States: When you need to visually verify the page content or capture screenshots at specific interaction points.
    • Handling Iframes and Pop-ups: Selenium can easily switch contexts to interact with elements within iframes or handle various types of pop-ups (see the sketch after this list).
  • When to Consider Alternatives or Combine:
    • Static Content: For simple, static HTML pages, requests and Beautiful Soup are significantly faster and lighter on system resources. Always try these first.
    • API Discovery: Sometimes, dynamic content is loaded via an underlying API. If you can identify and directly call the API endpoint, it’s often much faster and more efficient than simulating browser actions with Selenium. Use your browser’s developer tools (Network tab) to inspect API calls.
    • Performance is Critical: Selenium is slower because it launches a full browser instance. For large-scale scraping of millions of pages, this can be a bottleneck.
    • Resource Constraints: Selenium consumes more CPU and RAM. If you’re running scraping jobs on resource-limited servers, this could be an issue.
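
A minimal sketch of the iframe point above (the frame name my_frame and the page URL are placeholders, and chromedriver is assumed to be available on your PATH):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

driver.get("https://www.example.com")  # Replace with a page that contains an iframe

# Switch into an iframe by name/id (you can also pass an index or a located element)
driver.switch_to.frame("my_frame")  # Placeholder name
print(driver.find_element(By.TAG_NAME, "body").text)

# Return to the main document before locating elements outside the iframe
driver.switch_to.default_content()

# Accept a JavaScript alert/pop-up, if one appears
# driver.switch_to.alert.accept()

driver.quit()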

Setting Up Your Selenium Web Scraping Environment

Getting your development environment ready is the first practical step towards building effective Selenium scrapers.

This involves installing Python, the Selenium library, and the specific browser drivers.

Installing Python and pip

Python is the programming language of choice for most web scraping projects due to its rich ecosystem of libraries and readability.

  • Python Installation:

    1. Visit the official Python website: python.org/downloads.

    2. Download the latest stable version of Python 3.x for your operating system (Windows, macOS, Linux).

    3. Crucially for Windows users: During installation, ensure you check the box that says “Add Python X.Y to PATH”. This makes it easier to run Python commands from your terminal. For macOS/Linux, Python is often pre-installed, but it’s good practice to install the latest version via package managers like brew for macOS or apt for Debian/Ubuntu.

  • Verifying Installation: Open your terminal or command prompt and type:

    python --version

    or, for macOS/Linux, sometimes `python3 --version`. You should see the installed Python version.

  • pip: pip is Python’s package installer. It typically comes bundled with Python 3.4 and later. You can verify its installation by typing:

    pip --version

    or `pip3 --version`.

Installing Selenium WebDriver Library

Once Python and pip are ready, installing the Selenium library is straightforward.

  • Installation Command: Open your terminal or command prompt and run:
    pip install selenium

    This command downloads and installs the latest version of the Selenium package from the Python Package Index (PyPI).

  • Verification: You can quickly verify the installation by opening a Python interpreter (type python or python3 in your terminal) and trying to import the module:

    import selenium
    print(selenium.__version__)

    If it runs without errors and prints a version number, Selenium is installed correctly.

Downloading Browser Drivers (e.g., ChromeDriver, GeckoDriver)

Selenium controls web browsers through specific executables called “browser drivers.” Each browser requires its own driver.

  • ChromeDriver (for Google Chrome):
    1. Check Chrome Version: Open Google Chrome, click the three-dot menu in the top-right corner, go to “Help” > “About Google Chrome”. Note your exact Chrome browser version number (e.g., 120.0.6099.109).

    2. Download ChromeDriver: Go to the official ChromeDriver downloads page: sites.google.com/a/chromium.org/chromedriver/downloads.

    3. Find the ChromeDriver version that matches your Chrome browser version. If an exact match isn’t available, choose the closest compatible one (usually the same major version; e.g., if Chrome is 120, use ChromeDriver 120).

    4. Download the appropriate `.zip` file for your operating system.

    5. Extract and Place: Extract the `chromedriver.exe` (Windows) or `chromedriver` (macOS/Linux) file from the downloaded zip.

    6. Add to PATH (Recommended): Place this executable file into a directory that is already in your system’s PATH environment variable (e.g., `C:\Windows` for Windows, `/usr/local/bin` for macOS/Linux). This allows Selenium to find the driver automatically without specifying its full path in your code.

    7. Alternatively (Specify Path): If you don’t want to add it to PATH, you can place it anywhere and provide the full path to the executable in your Selenium script.
  • GeckoDriver (for Mozilla Firefox):
    1. Check Firefox Version: Open Firefox, go to “Help” > “About Firefox”.

    2. Download GeckoDriver: Go to the official GeckoDriver releases page on GitHub: github.com/mozilla/geckodriver/releases.

    3. Download the latest stable release for your operating system.

    4. Extract and Place: Extract geckodriver.exe (or geckodriver) and place it in your system’s PATH, or note its location for explicit path specification.

  • Other Drivers: Similar drivers exist for Microsoft Edge (EdgeDriver) and Apple Safari (SafariDriver). The setup process is analogous.

Important Note on Paths: If you don’t add the driver to your system’s PATH, you will need to specify its location when initializing the WebDriver:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Example for ChromeDriver
# driver_path = "C:/path/to/your/chromedriver.exe"  # Windows
driver_path = "/usr/local/bin/chromedriver"  # macOS/Linux, if not in PATH

service = Service(driver_path)
driver = webdriver.Chrome(service=service)

Basic Navigation and Element Interaction

With your Selenium environment set up, you can now start writing code to control the browser.

The essence of web scraping with Selenium lies in navigating to pages and interacting with specific elements on those pages.

Launching a Browser and Navigating

The first step in any Selenium script is to launch a browser instance and direct it to a URL.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time  # For delays

# Path to your ChromeDriver executable
# Make sure this path is correct if not added to system PATH
service = Service('/path/to/your/chromedriver')
driver = webdriver.Chrome(service=service)

try:
    # Navigate to a specific URL
    target_url = "https://quotes.toscrape.com/"  # A simple, legal scraping target
    driver.get(target_url)
    print(f"Navigated to: {driver.current_url}")

    # Get the page title
    print(f"Page Title: {driver.title}")

    # You can also get the full page source
    # page_source = driver.page_source
    # print(page_source[:500])  # Print first 500 characters of source

    # Add a small delay to observe the browser (optional)
    time.sleep(3)

except Exception as e:
    print(f"An error occurred: {e}")

finally:
    # Always close the browser when done
    driver.quit()
    print("Browser closed.")

Explanation:

  • webdriver.Chrome(service=service): Initializes a Chrome browser instance. Replace Chrome with Firefox, Edge, etc., if using a different browser. The service object specifies the path to your WebDriver executable.
  • driver.get(url): This command instructs the browser to open the specified URL. It waits until the page (or at least the initial HTML) has loaded before proceeding.
  • driver.current_url: Returns the URL of the current page.
  • driver.title: Returns the title of the current page, as found in the <title> tag.
  • driver.page_source: Returns the complete HTML source code of the current page, including any modifications made by JavaScript.
  • driver.quit(): Crucially, this command closes the browser window and terminates the WebDriver session. Failing to call quit() can leave orphaned browser processes running in the background, consuming system resources.

Locating Elements: XPath and CSS Selectors

Once a page is loaded, the next critical step is to find the specific pieces of data you want to extract. Selenium provides several methods to locate elements, but XPath and CSS Selectors are generally the most robust and flexible.

What are Element Locators?

Element locators are strategies used by Selenium to identify unique elements on a web page.

Think of them as addresses for specific parts of the HTML document.

  • By.ID: Locates an element by its id attribute. IDs are supposed to be unique on a page. driver.find_element(By.ID, "some_id")
  • By.NAME: Locates an element by its name attribute. driver.find_element(By.NAME, "input_name")
  • By.CLASS_NAME: Locates an element by its class attribute. Be aware that multiple elements can share the same class name. driver.find_element(By.CLASS_NAME, "product-title")
  • By.TAG_NAME: Locates elements by their HTML tag name (e.g., div, a, p). driver.find_element(By.TAG_NAME, "h1")
  • By.LINK_TEXT and By.PARTIAL_LINK_TEXT: Used for <a> (anchor) elements, matching the visible text of the link. driver.find_element(By.LINK_TEXT, "Next Page")
  • By.XPATH: A powerful query language for navigating XML documents (and, by extension, HTML). It allows for complex selections based on element relationships, attributes, and text content.
  • By.CSS_SELECTOR: A common and often simpler way to select elements using CSS syntax. Developers use CSS selectors to style web pages, so they are naturally well-suited for identifying elements.

Practical Examples (Using quotes.toscrape.com)

Let’s try to scrape the first quote and its author from https://quotes.toscrape.com/.

from selenium.webdriver.common.by import By
import time

# (Assumes `driver` was initialized as in the earlier setup code.)
driver.get("https://quotes.toscrape.com/")
time.sleep(2)  # Give page time to load

# --- Locating a single element (the first quote's text) ---
# Inspect the page: the first quote text is usually within a <span class="text">
# Using CSS Selector:
first_quote_css = driver.find_element(By.CSS_SELECTOR, ".quote .text")
print(f"First Quote (CSS Selector): {first_quote_css.text}")

# Using XPath:
first_quote_xpath = driver.find_element(By.XPATH, '//div[@class="quote"]/span[@class="text"]')
print(f"First Quote (XPath): {first_quote_xpath.text}")

# --- Locating the author of the first quote ---
# The author is usually within a <small class="author">
first_author_css = driver.find_element(By.CSS_SELECTOR, ".quote .author")
print(f"First Author (CSS Selector): {first_author_css.text}")

first_author_xpath = driver.find_element(By.XPATH, '//div[@class="quote"]/small[@class="author"]')
print(f"First Author (XPath): {first_author_xpath.text}")

# --- Locating multiple elements (all quotes on the page) ---
# Use find_elements (plural) to get a list
all_quotes_elements = driver.find_elements(By.CLASS_NAME, "text")
print("\nAll Quotes on Page:")
for i, quote_element in enumerate(all_quotes_elements):
    print(f"{i+1}. {quote_element.text}")

# --- Extracting attributes (e.g., href from a link) ---
# Let's find the "Login" link and get its href attribute
login_link = driver.find_element(By.LINK_TEXT, "Login")
login_href = login_link.get_attribute("href")
print(f"\nLogin link Href: {login_href}")

# --- Interacting with elements (e.g., clicking a button) ---
# Let's try to click the "Next" button
try:
    next_button = driver.find_element(By.CLASS_NAME, "next")
    next_button_link = next_button.find_element(By.TAG_NAME, "a")  # The <a> tag inside the "next" element
    next_button_link.click()
    time.sleep(3)  # Wait for the new page to load
    print(f"\nNavigated to next page: {driver.current_url}")
except Exception as e:
    print(f"No 'Next' button found or clickable: {e}")

Key Takeaways:

  • find_element vs. find_elements: find_element returns the first matching element or raises a NoSuchElementException if none are found. find_elements returns a list of all matching elements (an empty list if none are found).
  • element.text: Retrieves the visible, rendered text content of an element.
  • element.get_attribute("attribute_name"): Retrieves the value of a specified HTML attribute (e.g., href, src, value, class).
  • element.click(): Simulates a mouse click on the element.
  • element.send_keys("text"): Simulates typing text into an input field or text area.
  • element.clear(): Clears any existing text from an input field.

Mastering these basic interactions is the foundation for any complex web scraping task with Selenium.

Handling Dynamic Content and Waits

Many modern websites load content dynamically using JavaScript, meaning that parts of the page might not be immediately available when Selenium first loads the URL.

This can lead to NoSuchElementException errors if your script tries to find an element before it has rendered.

Selenium provides powerful “wait” mechanisms to overcome this.

Implicit Waits

An implicit wait tells the WebDriver to poll the DOM (Document Object Model) for a certain amount of time when trying to find an element or elements if they are not immediately available. The default setting is 0 seconds.

Once set, an implicit wait remains in effect for the life of the WebDriver object.

# Set an implicit wait of 10 seconds
driver.implicitly_wait(10)  # seconds

driver.get("https://www.example.com")  # Replace with a site that has dynamic loading

# Selenium will wait up to 10 seconds for an element with ID 'dynamic_element' to appear
dynamic_element = driver.find_element(By.ID, "dynamic_element")
print(f"Found dynamic element: {dynamic_element.text}")

Pros: Simple to implement, applies globally to all find_element calls.
Cons: Can slow down tests or scrapers unnecessarily, because every lookup for a missing element waits out the full timeout before failing. It also only waits for the element to exist in the DOM, not necessarily to be visible or clickable.

Explicit Waits

Explicit waits are more sophisticated and allow you to pause your script until a specific condition has been met, or a maximum timeout has been reached.

This is generally preferred for its precision, as it only waits as long as necessary.

You’ll use the WebDriverWait class in conjunction with expected_conditions aliased as EC.

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException

driver.get"https://quotes.toscrape.com/scroll" # A page with infinite scroll

# Simulate scrolling to load more quotes


last_height = driver.execute_script"return document.body.scrollHeight"
 scroll_count = 0
max_scrolls = 3 # Limit to 3 scrolls for demonstration

 while scroll_count < max_scrolls:
    # Scroll down to bottom


    driver.execute_script"window.scrollTo0, document.body.scrollHeight."

    # Wait for new content to load
        # Explicitly wait for the new quotes to appear
        # We wait for the number of quote elements to be greater than before
         WebDriverWaitdriver, 10.until


            lambda driver: lendriver.find_elementsBy.CLASS_NAME, "quote" > lenoriginal_quotes if 'original_quotes' in locals else True
        # Alternatively, wait for a specific element that signals new content
        # EC.presence_of_element_locatedBy.CSS_SELECTOR, "div.quote:last-child"
        # This lambda is a bit advanced. often you'd wait for a loading spinner to disappear or specific content to appear.
        # For simplicity, let's just wait for a moment after scrolling to give time for new content to load
         time.sleep2



        new_height = driver.execute_script"return document.body.scrollHeight"
         if new_height == last_height:


            print"No more content loaded after scrolling."
             break
         last_height = new_height
        original_quotes = driver.find_elementsBy.CLASS_NAME, "quote" # Update count
         scroll_count += 1
         printf"Scrolled {scroll_count} times. Current quotes: {lenoriginal_quotes}"

     except TimeoutException:


        print"Timed out waiting for new content to load."
         break
     except NoSuchElementException:


        print"Element not found after scroll."

# After scrolling, let's extract some quotes


all_quotes = driver.find_elementsBy.CLASS_NAME, "quote"


printf"\nTotal quotes found: {lenall_quotes}"
for i, quote_elem in enumerateall_quotes: # Print first 5


    text = quote_elem.find_elementBy.CLASS_NAME, "text".text


    author = quote_elem.find_elementBy.CLASS_NAME, "author".text
     printf"{i+1}. \"{text}\" - {author}"


 printf"An unexpected error occurred: {e}"

Common expected_conditions:

  • EC.presence_of_element_located((By.ID, 'some_id')): Waits until an element is present in the DOM. It doesn’t necessarily mean it’s visible.
  • EC.visibility_of_element_located((By.CSS_SELECTOR, '.some_class')): Waits until an element is present in the DOM and visible.
  • EC.element_to_be_clickable((By.XPATH, '//button')): Waits until an element is visible and enabled, so you can click it.
  • EC.text_to_be_present_in_element((By.ID, 'status'), 'Complete'): Waits until a specific text is present in an element.
  • EC.invisibility_of_element_located((By.CLASS_NAME, 'loading-spinner')): Useful for waiting for a loading indicator to disappear.
  • EC.title_contains('keyword'): Waits until the page title contains a specific keyword.

Key Differences:

  • Implicit Wait: Set once, applies globally to every element lookup; each failed lookup polls until the element appears or the timeout expires. Simpler but less precise.
  • Explicit Wait: Applied on a per-element basis with specific conditions. Waits only as long as necessary or until timeout. More powerful and flexible.

Best Practice: While implicit waits are easy, explicit waits are generally recommended for robust scrapers, especially when dealing with highly dynamic content. Combine them with try-except blocks to handle TimeoutException gracefully.
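
For example, a minimal sketch of that pattern, assuming driver is an active WebDriver instance (the element ID load-more is hypothetical):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

try:
    # Wait up to 10 seconds for a (hypothetical) button to become clickable
    button = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.ID, "load-more"))
    )
    button.click()
except TimeoutException:
    print("The 'load-more' button never became clickable; moving on.")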

Advanced Techniques: Scrolling, Pagination, and Forms

Beyond basic element interaction, many scraping tasks require more advanced techniques to navigate complex website structures or extract data from interactive components.

Handling Scrolling (Infinite Scroll)

Many modern websites implement “infinite scrolling,” where content loads dynamically as the user scrolls down the page, eliminating traditional pagination. Selenium can simulate this behavior.

driver.get"https://quotes.toscrape.com/scroll" # A demo site with infinite scroll
time.sleep2 # Give initial page time to load

 all_quotes_data = 


 scroll_attempts = 0
max_scroll_attempts = 5 # Prevent infinite loops, adjust as needed

 while True:


    time.sleep2 # Short pause to allow new content to load

    # Calculate new scroll height and compare with last scroll height


    new_height = driver.execute_script"return document.body.scrollHeight"
     if new_height == last_height:
         print"No more content loaded. Reached end of scroll."
     last_height = new_height
     scroll_attempts += 1


    printf"Scrolled to new height: {new_height}. Attempt {scroll_attempts}/{max_scroll_attempts}"

     if scroll_attempts >= max_scroll_attempts:


        printf"Reached maximum scroll attempts {max_scroll_attempts}. Stopping."

# After scrolling, extract all visible quotes


quotes_elements = driver.find_elementsBy.CLASS_NAME, "quote"
 for quote_elem in quotes_elements:


        text = quote_elem.find_elementBy.CLASS_NAME, "text".text


        author = quote_elem.find_elementBy.CLASS_NAME, "author".text


        tags_elements = quote_elem.find_elementsBy.CLASS_NAME, "tag"


        tags = 
         all_quotes_data.append{
             "text": text,
             "author": author,
             "tags": tags
         }


        printf"Could not extract quote details: {e}"



printf"\nTotal quotes scraped: {lenall_quotes_data}"
# for quote in all_quotes_data: # Print first 10
#     printquote

driver.execute_script: This is a powerful method that allows you to execute arbitrary JavaScript code within the browser context. It’s essential for tasks like scrolling, interacting with hidden elements, or manipulating the DOM directly.
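
A few illustrative (not exhaustive) uses, assuming driver is an active WebDriver instance and element is a previously located element:

# Scroll the window to the bottom of the page
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

# Read a value computed by the browser, e.g. the total page height
height = driver.execute_script("return document.body.scrollHeight")

# Scroll a specific element into view before interacting with it
driver.execute_script("arguments[0].scrollIntoView(true);", element)

# Click an element via JavaScript (sometimes helps when a normal click is intercepted)
driver.execute_script("arguments[0].click();", element)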

Handling Pagination

Traditional pagination involves clicking “Next Page” buttons or navigating to specific page numbers. Selenium can simulate these clicks.

# Continuing from the previous example, but using a paginated site.
# Let's use https://quotes.toscrape.com/ for the pagination demo.
# (Assumes driver, By, WebDriverWait, EC, TimeoutException, and time are available as shown earlier.)

all_quotes_paginated = []
base_url = "https://quotes.toscrape.com/"

try:
    driver.get(base_url)
    time.sleep(2)

    current_page = 1

    while True:
        print(f"\nScraping Page {current_page} ({driver.current_url})")

        quotes_on_page = driver.find_elements(By.CLASS_NAME, "quote")
        for quote_elem in quotes_on_page:
            try:
                text = quote_elem.find_element(By.CLASS_NAME, "text").text
                author = quote_elem.find_element(By.CLASS_NAME, "author").text
                all_quotes_paginated.append({"text": text, "author": author})
            except Exception as e:
                print(f"Error scraping quote on page {current_page}: {e}")

        # Check for the "Next" button
        try:
            # Look for the <li> with class "next" and then the <a> inside it
            next_link_element = WebDriverWait(driver, 5).until(
                EC.element_to_be_clickable((By.XPATH, "//li[@class='next']/a"))
            )
            next_link_element.click()
            time.sleep(2)  # Wait for next page to load
            current_page += 1
        except TimeoutException:
            print("No 'Next' button found. End of pagination.")
            break
        except Exception as e:
            print(f"Error clicking next button: {e}")
            break

except Exception as e:
    print(f"An error occurred during pagination: {e}")

print(f"\nTotal quotes scraped across pages: {len(all_quotes_paginated)}")
# for quote in all_quotes_paginated[:20]:  # Print first 20
#     print(quote)

Key Points for Pagination:

  • Looping: Use a while True loop that breaks when the “Next” button is no longer found.
  • Waiting for the Next Button: Use WebDriverWait with EC.element_to_be_clickable to ensure the next button is ready before clicking.
  • Error Handling: Wrap the click operation in a try-except block to catch TimeoutException when the “Next” button is no longer present.

Interacting with Forms

Filling out forms is a common task, whether for logging in, searching, or filtering content.

driver.get"https://quotes.toscrape.com/login" # Login page

# Find the username and password input fields


username_field = driver.find_elementBy.ID, "username"


password_field = driver.find_elementBy.ID, "password"

# Type credentials use dummy credentials for demonstration
username_field.send_keys"test_user" # Replace with actual if needed
password_field.send_keys"test_password" # Replace with actual if needed

# Find and click the login button
# The button has a type="submit" and class "btn btn-primary"


login_button = driver.find_elementBy.CSS_SELECTOR, "input"
 login_button.click

# Wait for the login process to complete and page to redirect or show message
# For this site, it redirects to a generic page after login attempt
WebDriverWaitdriver, 10.untilEC.url_changesdriver.current_url # Wait for URL to change


printf"After login attempt, current URL: {driver.current_url}"

# You can then check for success/failure messages
# For example, look for an alert message or a specific element on the logged-in page
 if "No account found" in driver.page_source:


    print"Login failed: No account found or invalid credentials."
 elif "/login" not in driver.current_url:


    print"Login attempt might have been successful redirected."
 else:


    print"Login page still present or unexpected behavior."



printf"An error occurred during form interaction: {e}"

Form Interaction Details:

  • send_keys(): Used to type text into input fields (<input type="text">, <textarea>, <input type="password">).

  • click(): Used to activate buttons (<button>, <input type="submit">, <input type="button">), checkboxes, radio buttons, and select options.

  • Selecting from Dropdowns (<select>): For <select> elements, Selenium provides a Select class (from selenium.webdriver.support.ui import Select); a short sketch follows this list.
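
A minimal sketch of the Select helper, assuming driver is an active WebDriver instance on a page containing a hypothetical <select id="country"> element:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

# Wrap the located <select> element (the id "country" is hypothetical)
dropdown = Select(driver.find_element(By.ID, "country"))

# Choose an option by visible text, by value attribute, or by index
dropdown.select_by_visible_text("Germany")
# dropdown.select_by_value("de")
# dropdown.select_by_index(0)

# Read back the currently selected option
print(dropdown.first_selected_option.text)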
