- Understand the Basics: Selenium is primarily a browser automation framework, not just a scraping tool. It controls web browsers such as Chrome, Firefox, and Edge to simulate human interaction. This is crucial for websites that rely heavily on JavaScript to load content, which traditional libraries like Beautiful Soup or Requests might struggle with.
- Prerequisites:
- Python: Ensure Python 3.x is installed on your system. You can download it from python.org.
- pip: Python’s package installer, usually bundled with Python.
- Selenium WebDriver: Install the Selenium library using pip:
pip install selenium
- Web Browser: Choose a browser you want to automate, e.g., Google Chrome or Mozilla Firefox.
- WebDriver Executable: Download the specific WebDriver executable for your chosen browser. For Chrome, it’s ChromeDriver (sites.google.com/a/chromium.org/chromedriver/downloads); for Firefox, it’s GeckoDriver (github.com/mozilla/geckodriver/releases). Place this executable in a directory that’s in your system’s PATH, or provide its full path in your script.
- Initial Setup Code Example:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By  # For locating elements

# Path to your WebDriver executable (e.g., ChromeDriver)
# Make sure this path is correct, or add chromedriver to your PATH
webdriver_service = Service('/path/to/your/chromedriver')
driver = webdriver.Chrome(service=webdriver_service)

# Navigate to a website
driver.get("https://www.example.com")  # Replace with your target URL

# Print the page title
print(driver.title)

# Close the browser
driver.quit()
- Locating Elements: Selenium offers various methods to find elements on a webpage:
find_element(By.ID, "element_id")
find_element(By.NAME, "element_name")
find_element(By.CLASS_NAME, "class_name")
find_element(By.TAG_NAME, "tag_name")
find_element(By.LINK_TEXT, "Link Text")
find_element(By.PARTIAL_LINK_TEXT, "Partial Link")
find_element(By.CSS_SELECTOR, "css_selector")
find_element(By.XPATH, "xpath_expression")
Use find_elements (plural) to get a list of all matching elements.
- Interacting with Elements: Once you’ve located an element, you can interact with it (a combined sketch follows this list):
- element.click(): Clicks an element.
- element.send_keys("your text"): Types text into an input field.
- element.clear(): Clears text from an input field.
- element.text: Retrieves the visible text of an element.
- element.get_attribute("attribute_name"): Gets the value of an attribute (e.g., href, src).
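To see these calls working together, here is a minimal sketch of filling in and submitting a search form. The URL, the field name "q", and the button selector are hypothetical placeholders, not taken from a real site; it also assumes chromedriver is on your PATH.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumes chromedriver is discoverable on your PATH
driver.get("https://www.example.com")  # hypothetical page with a search form

search_box = driver.find_element(By.NAME, "q")  # hypothetical field name
search_box.clear()                              # remove any pre-filled text
search_box.send_keys("selenium web scraping")   # type the query
driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()  # hypothetical button

print(driver.title)  # title of the results page
driver.quit()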
- Handling Dynamic Content and Waits: Websites often load content dynamically. Selenium provides waiting mechanisms:
- Implicit Waits: driver.implicitly_wait(10) sets a general timeout for finding elements (e.g., 10 seconds).
- Explicit Waits: More precise; waits for a specific condition to be met.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "some_dynamic_element"))
    )
    print(element.text)
except Exception as e:
    print(f"Element not found: {e}")
- Ethical Considerations and Best Practices:
- Respect robots.txt: Always check the robots.txt file of the website (e.g., example.com/robots.txt) to understand their scraping policies. Some sites explicitly disallow scraping.
- Rate Limiting: Don’t bombard a server with requests. Introduce delays (time.sleep) between requests to avoid overwhelming the server, which could lead to your IP being blocked. A common practice is to wait 1-5 seconds between requests, or even longer for sensitive sites.
- User-Agent: Set a custom User-Agent string to mimic a real browser. While Selenium does this by default, sometimes customizing it can help (a combined sketch with a polite delay and a custom User-Agent follows this list).
- Handle Errors: Implement robust error handling (try-except blocks) to gracefully manage situations where elements aren’t found or network issues occur.
- Data Storage: Once data is extracted, store it responsibly in formats like CSV, JSON, or a database.
- Avoid Illegal Activities: Never use scraping for illegal purposes like spamming, financial fraud, or unauthorized access to sensitive data. Always prioritize ethical conduct and legality in your data collection. If you’re looking for financial information, explore legitimate sources like official financial reports, public APIs, or reputable financial data providers instead of scraping private financial platforms.
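As referenced above, here is a minimal sketch of polite scraping with a delay between requests and a custom User-Agent set through Chrome options. The User-Agent string and page URLs are placeholders, not values from this guide.
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# Placeholder User-Agent; ideally match the browser version you are actually driving
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36")

driver = webdriver.Chrome(options=options)
pages = ["https://quotes.toscrape.com/page/1/", "https://quotes.toscrape.com/page/2/"]
for url in pages:
    driver.get(url)
    # ... extract the data you need here ...
    time.sleep(3)  # polite delay between requests (1-5 seconds is a common range)
driver.quit()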
Understanding Selenium for Web Scraping
Selenium is a powerful tool primarily designed for automated testing of web applications, but its capability to control web browsers programmatically makes it an excellent choice for web scraping, especially when dealing with dynamic, JavaScript-heavy websites.
Unlike traditional scraping libraries that only fetch raw HTML, Selenium can interact with web elements, execute JavaScript, simulate user actions like clicks and scrolls, and wait for content to load, thereby mimicking a real user’s browsing experience.
This nuanced interaction allows access to data that is not immediately present in the initial HTML response.
The Core Difference: Dynamic Content vs. Static HTML
Websites today are rarely static HTML documents.
A significant portion of their content, especially on e-commerce sites, social media platforms, or news portals, is loaded asynchronously using JavaScript, APIs, and AJAX requests after the initial page load.
- Static HTML Scraping: Libraries like requests and Beautiful Soup are highly efficient for static websites where all the desired data is present in the initial HTML source. They fetch the HTML and allow you to parse it directly. The primary advantage is speed and low resource consumption (a minimal requests/Beautiful Soup sketch follows this list).
- Dynamic Content Scraping: When content is loaded dynamically, requests will only get you the initial HTML, often devoid of the data you need. This is where Selenium shines. It launches a real browser, allowing the JavaScript to execute, AJAX calls to complete, and content to render just as it would for a human user. This enables you to scrape data from elements that appear after user interactions, scrolling, or specific time delays. For instance, infinite-scrolling pages or those with “Load More” buttons are perfect candidates for Selenium.
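For comparison, the sketch below (referenced in the static-scraping bullet) fetches a page with requests and parses it with Beautiful Soup. It assumes the bs4 package is installed; the selectors mirror the quotes.toscrape.com structure used later in this guide.
import requests
from bs4 import BeautifulSoup

response = requests.get("https://quotes.toscrape.com/", timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
for quote in soup.select("div.quote"):
    text = quote.select_one("span.text").get_text()
    author = quote.select_one("small.author").get_text()
    print(f"{text} - {author}")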
When to Choose Selenium Over Other Libraries
Choosing the right tool is crucial for efficient and ethical scraping.
Selenium, while powerful, comes with its own overhead.
- When to Use Selenium:
- JavaScript-Rendered Content: If the data you need is generated or displayed via JavaScript, Selenium is almost a necessity. This includes Single Page Applications (SPAs) built with frameworks like React, Angular, or Vue.js.
- User Interactions Required: If you need to click buttons, fill out forms, scroll down the page, navigate through pagination, or interact with pop-ups to reveal content, Selenium is the ideal choice.
- Capturing Screenshots/Page States: When you need to visually verify the page content or capture screenshots at specific interaction points.
- Handling Iframes and Pop-ups: Selenium can easily switch contexts to interact with elements within iframes or handle various types of pop-ups.
- When to Consider Alternatives or Combine:
- Static Content: For simple, static HTML pages, requests and Beautiful Soup are significantly faster and lighter on system resources. Always try these first.
- API Discovery: Sometimes, dynamic content is loaded via an underlying API. If you can identify and directly call the API endpoint, it’s often much faster and more efficient than simulating browser actions with Selenium. Use your browser’s developer tools (Network tab) to inspect API calls (a minimal sketch follows this list).
- Performance is Critical: Selenium is slower because it launches a full browser instance. For large-scale scraping of millions of pages, this can be a bottleneck.
- Resource Constraints: Selenium consumes more CPU and RAM. If you’re running scraping jobs on resource-limited servers, this could be an issue.
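If you do find an underlying JSON endpoint in the Network tab, calling it directly can replace the browser entirely. A minimal sketch, assuming a purely hypothetical endpoint and response shape:
import requests

# Hypothetical endpoint discovered via the browser's Network tab
api_url = "https://example.com/api/products"
params = {"page": 1, "per_page": 50}

response = requests.get(api_url, params=params, timeout=10)
response.raise_for_status()

data = response.json()
for item in data.get("results", []):  # "results" is an assumed key
    print(item.get("name"), item.get("price"))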
Setting Up Your Selenium Web Scraping Environment
Getting your development environment ready is the first practical step towards building effective Selenium scrapers.
This involves installing Python, the Selenium library, and the specific browser drivers.
Installing Python and pip
Python is the programming language of choice for most web scraping projects due to its rich ecosystem of libraries and readability.
- Python Installation:
- Visit the official Python website: python.org/downloads.
- Download the latest stable version of Python 3.x for your operating system (Windows, macOS, Linux).
- Crucially for Windows users: During installation, ensure you check the box that says “Add Python X.Y to PATH”. This makes it easier to run Python commands from your terminal. For macOS/Linux, Python is often pre-installed, but it’s good practice to install the latest version via package managers like `brew` for macOS or `apt` for Debian/Ubuntu.
- Verifying Installation: Open your terminal or command prompt and type:
python --version
(or, on macOS/Linux, sometimes `python3 --version`). You should see the installed Python version.
- pip: pip is Python’s package installer. It typically comes bundled with Python 3.4 and later. You can verify its installation by typing:
pip --version
(or `pip3 --version`).
Installing Selenium WebDriver Library
Once Python and pip are ready, installing the Selenium library is straightforward.
- Installation Command: Open your terminal or command prompt and run:
pip install selenium
This command downloads and installs the latest version of the Selenium package from the Python Package Index (PyPI).
- Verification: You can quickly verify the installation by opening a Python interpreter (type `python` or `python3` in your terminal) and trying to import the module:
import selenium
print(selenium.__version__)
If it runs without errors and prints a version number, Selenium is installed correctly.
Downloading Browser Drivers e.g., ChromeDriver, GeckoDriver
Selenium controls web browsers through specific executables called “browser drivers.” Each browser requires its own driver.
- ChromeDriver for Google Chrome:
1. Check Chrome Version: Open Google Chrome, click the three-dot menu in the top-right corner, go to “Help” > “About Google Chrome”. Note your exact Chrome browser version number (e.g., 120.0.6099.109).
2. Download ChromeDriver: Go to the official ChromeDriver downloads page: sites.google.com/a/chromium.org/chromedriver/downloads.
3. Find the ChromeDriver version that matches your Chrome browser version. If an exact match isn’t available, choose the closest compatible one (often the major version number, e.g., if Chrome is 120, use ChromeDriver 120).
4. Download the appropriate `.zip` file for your operating system.
5. Extract and Place: Extract the `chromedriver.exe` (Windows) or `chromedriver` (macOS/Linux) file from the downloaded zip.
6. Add to PATH (Recommended): Place this executable file into a directory that is already in your system's PATH environment variable (e.g., `C:\Windows` for Windows, `/usr/local/bin` for macOS/Linux). This allows Selenium to find the driver automatically without specifying its full path in your code.
7. Alternatively (Specify Path): If you don't want to add it to PATH, you can place it anywhere and provide the full path to the executable in your Selenium script.
- GeckoDriver for Mozilla Firefox:
1. Check Firefox Version: Open Firefox, go to “Help” > “About Firefox”.
2. Download GeckoDriver: Go to the official GeckoDriver releases page on GitHub: github.com/mozilla/geckodriver/releases.
3. Download the latest stable release for your operating system.
4. Extract and Place: Extract `geckodriver.exe` (or `geckodriver`) and place it in your system’s PATH, or note its location for explicit path specification.
- Other Drivers: Similar drivers exist for Microsoft Edge (EdgeDriver) and Apple Safari (SafariDriver). The setup process is analogous.
Important Note on Paths: If you don’t add the driver to your system’s PATH, you will need to specify its location when initializing the WebDriver:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
# Example for ChromeDriver
# driver_path = "C:/path/to/your/chromedriver.exe" # Windows
driver_path = "/usr/local/bin/chromedriver" # macOS/Linux if not in PATH
service = Service(driver_path)
driver = webdriver.Chrome(service=service)
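The same pattern works for Firefox with GeckoDriver. A minimal sketch, assuming geckodriver lives at the path shown (adjust it for your system):
from selenium import webdriver
from selenium.webdriver.firefox.service import Service as FirefoxService

# Example for GeckoDriver
firefox_driver_path = "/usr/local/bin/geckodriver"  # adjust if not in PATH
service = FirefoxService(firefox_driver_path)
driver = webdriver.Firefox(service=service)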
Basic Navigation and Element Interaction
With your Selenium environment set up, you can now start writing code to control the browser.
The essence of web scraping with Selenium lies in navigating to pages and interacting with specific elements on those pages.
Launching a Browser and Navigating
The first step in any Selenium script is to launch a browser instance and direct it to a URL.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time  # For delays

# Path to your ChromeDriver executable
# Make sure this path is correct if not added to system PATH
service = Service('/path/to/your/chromedriver')
driver = webdriver.Chrome(service=service)

try:
    # Navigate to a specific URL
    target_url = "https://quotes.toscrape.com/"  # A simple, legal scraping target
    driver.get(target_url)
    print(f"Navigated to: {driver.current_url}")

    # Get the page title
    print(f"Page Title: {driver.title}")

    # You can also get the full page source
    # page_source = driver.page_source
    # print(page_source[:500])  # Print first 500 characters of source

    # Add a small delay to observe the browser (optional)
    time.sleep(3)

except Exception as e:
    print(f"An error occurred: {e}")

finally:
    # Always close the browser when done
    driver.quit()
    print("Browser closed.")
Explanation:
- webdriver.Chrome(service=service): Initializes a Chrome browser instance. Replace Chrome with Firefox, Edge, etc., if using a different browser. The service object specifies the path to your WebDriver executable.
- driver.get(url): Instructs the browser to open the specified URL. It waits until the page (or at least the initial HTML) has loaded before proceeding.
- driver.current_url: Returns the URL of the current page.
- driver.title: Returns the title of the current page, as found in the <title> tag.
- driver.page_source: Returns the complete HTML source code of the current page, including any modifications made by JavaScript.
- driver.quit(): Crucially, this command closes the browser window and terminates the WebDriver session. Failing to call quit() can leave orphaned browser processes running in the background, consuming system resources.
Locating Elements: XPath and CSS Selectors
Once a page is loaded, the next critical step is to find the specific pieces of data you want to extract. Selenium provides several methods to locate elements, but XPath and CSS Selectors are generally the most robust and flexible.
What are Element Locators?
Element locators are strategies used by Selenium to identify unique elements on a web page.
Think of them as addresses for specific parts of the HTML document.
- By.ID: Locates an element by its id attribute. IDs are supposed to be unique on a page. driver.find_element(By.ID, "some_id")
- By.NAME: Locates an element by its name attribute. driver.find_element(By.NAME, "input_name")
- By.CLASS_NAME: Locates an element by its class attribute. Be aware that multiple elements can share the same class name. driver.find_element(By.CLASS_NAME, "product-title")
- By.TAG_NAME: Locates elements by their HTML tag name (e.g., div, a, p). driver.find_element(By.TAG_NAME, "h1")
- By.LINK_TEXT and By.PARTIAL_LINK_TEXT: Used for <a> (anchor) elements, matching the visible text of the link. driver.find_element(By.LINK_TEXT, "Next Page")
- By.XPATH: A powerful language for navigating XML documents (and HTML, which is a form of XML). It allows for complex selections based on element relationships, attributes, and text content.
- By.CSS_SELECTOR: A common and often simpler way to select elements using CSS syntax. Developers use CSS selectors to style web pages, so they are naturally well-suited for identifying elements.
Practical Examples (Using quotes.toscrape.com)
Let’s try to scrape the first quote and its author from https://quotes.toscrape.com/.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

# Driver set up as in the earlier examples
service = Service('/path/to/your/chromedriver')
driver = webdriver.Chrome(service=service)

driver.get("https://quotes.toscrape.com/")
time.sleep(2)  # Give page time to load

# --- Locating a single element (the first quote's text) ---
# Inspect the page: the first quote text is usually within a <span class="text">

# Using CSS Selector:
first_quote_css = driver.find_element(By.CSS_SELECTOR, ".quote .text")
print(f"First Quote (CSS Selector): {first_quote_css.text}")

# Using XPath:
first_quote_xpath = driver.find_element(By.XPATH, "//div[@class='quote']/span[@class='text']")
print(f"First Quote (XPath): {first_quote_xpath.text}")

# --- Locating the author of the first quote ---
# The author is usually within a <small class="author">
first_author_css = driver.find_element(By.CSS_SELECTOR, ".quote .author")
print(f"First Author (CSS Selector): {first_author_css.text}")

first_author_xpath = driver.find_element(By.XPATH, "//div[@class='quote']//small[@class='author']")
print(f"First Author (XPath): {first_author_xpath.text}")

# --- Locating multiple elements (all quotes on the page) ---
# Use find_elements (plural) to get a list
all_quotes_elements = driver.find_elements(By.CLASS_NAME, "text")
print("\nAll Quotes on Page:")
for i, quote_element in enumerate(all_quotes_elements):
    print(f"{i+1}. {quote_element.text}")

# --- Extracting attributes (e.g., href from a link) ---
# Let's find the "Login" link and get its href attribute
login_link = driver.find_element(By.LINK_TEXT, "Login")
login_href = login_link.get_attribute("href")
print(f"\nLogin link Href: {login_href}")

# --- Interacting with elements (e.g., clicking a button) ---
# Let's try to click the "Next" button
try:
    next_button = driver.find_element(By.CLASS_NAME, "next")
    next_button_link = next_button.find_element(By.TAG_NAME, "a")  # The <a> tag inside the 'next' div
    next_button_link.click()
    time.sleep(3)  # Wait for the new page to load
    print(f"\nNavigated to next page: {driver.current_url}")
except Exception as e:
    print(f"No 'Next' button found or clickable: {e}")

driver.quit()
Key Takeaways:
- find_element vs. find_elements: find_element returns the first matching element (or raises a NoSuchElementException if none are found). find_elements returns a list of all matching elements (an empty list if none are found).
- element.text: Retrieves the visible, rendered text content of an element.
- element.get_attribute("attribute_name"): Retrieves the value of a specified HTML attribute (e.g., href, src, value, class).
- element.click(): Simulates a mouse click on the element.
- element.send_keys("text"): Simulates typing text into an input field or text area.
- element.clear(): Clears any existing text from an input field.
Mastering these basic interactions is the foundation for any complex web scraping task with Selenium.
Handling Dynamic Content and Waits
Many modern websites load content dynamically using JavaScript, meaning that parts of the page might not be immediately available when Selenium first loads the URL.
This can lead to NoSuchElementException errors if your script tries to find an element before it has rendered.
Selenium provides powerful “wait” mechanisms to overcome this.
Implicit Waits
An implicit wait tells the WebDriver to poll the DOM (Document Object Model) for a certain amount of time when trying to find an element (or elements) if it is not immediately available. The default setting is 0 seconds.
Once set, an implicit wait remains in effect for the life of the WebDriver object.
# Set an implicit wait of 10 seconds
driver.implicitly_wait(10)  # seconds

driver.get("https://www.example.com")  # Replace with a site that has dynamic loading

# Selenium will wait up to 10 seconds for an element with ID 'dynamic_element' to appear
dynamic_element = driver.find_element(By.ID, "dynamic_element")
print(f"Found dynamic element: {dynamic_element.text}")
Pros: Simple to implement; applies globally to all find_element calls.
Cons: It applies to every lookup, and the driver waits the full timeout whenever an element is genuinely absent, which can slow scrapers down unnecessarily. It also only waits for the element to exist in the DOM, not necessarily to be visible or clickable.
Explicit Waits
Explicit waits are more sophisticated and allow you to pause your script until a specific condition has been met, or a maximum timeout has been reached.
This is generally preferred for its precision, as it only waits as long as necessary.
You’ll use the WebDriverWait class in conjunction with expected_conditions (aliased as EC).
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException
try:
    driver.get("https://quotes.toscrape.com/scroll")  # A page with infinite scroll

    # Simulate scrolling to load more quotes
    last_height = driver.execute_script("return document.body.scrollHeight")
    original_quotes = driver.find_elements(By.CLASS_NAME, "quote")
    scroll_count = 0
    max_scrolls = 3  # Limit to 3 scrolls for demonstration

    while scroll_count < max_scrolls:
        # Scroll down to bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        try:
            # Wait for new content to load:
            # explicitly wait for the number of quote elements to grow beyond the previous count
            WebDriverWait(driver, 10).until(
                lambda d: len(d.find_elements(By.CLASS_NAME, "quote")) > len(original_quotes)
            )
            # Alternatively, wait for a specific element that signals new content, e.g.
            # EC.presence_of_element_located((By.CSS_SELECTOR, "div.quote:last-child"))
            # Often you'd instead wait for a loading spinner to disappear.

            # Give the new content a moment to finish rendering
            time.sleep(2)

            new_height = driver.execute_script("return document.body.scrollHeight")
            if new_height == last_height:
                print("No more content loaded after scrolling.")
                break
            last_height = new_height
            original_quotes = driver.find_elements(By.CLASS_NAME, "quote")  # Update count
            scroll_count += 1
            print(f"Scrolled {scroll_count} times. Current quotes: {len(original_quotes)}")
        except TimeoutException:
            print("Timed out waiting for new content to load.")
            break
        except NoSuchElementException:
            print("Element not found after scroll.")
            break

    # After scrolling, let's extract some quotes
    all_quotes = driver.find_elements(By.CLASS_NAME, "quote")
    print(f"\nTotal quotes found: {len(all_quotes)}")
    for i, quote_elem in enumerate(all_quotes[:5]):  # Print first 5
        text = quote_elem.find_element(By.CLASS_NAME, "text").text
        author = quote_elem.find_element(By.CLASS_NAME, "author").text
        print(f"{i+1}. \"{text}\" - {author}")

except Exception as e:
    print(f"An unexpected error occurred: {e}")
Common expected_conditions:
- EC.presence_of_element_located((By.ID, 'some_id')): Waits until an element is present in the DOM. It doesn’t necessarily mean it’s visible.
- EC.visibility_of_element_located((By.CSS_SELECTOR, '.some_class')): Waits until an element is present in the DOM and visible.
- EC.element_to_be_clickable((By.XPATH, '//button')): Waits until an element is visible and enabled, so you can click it.
- EC.text_to_be_present_in_element((By.ID, 'status'), 'Complete'): Waits until a specific text is present in an element.
- EC.invisibility_of_element_located((By.CLASS_NAME, 'loading-spinner')): Useful for waiting for a loading indicator to disappear.
- EC.title_contains('keyword'): Waits until the page title contains a specific keyword.
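For example, a short sketch combining two of these conditions, waiting for a (hypothetical) loading spinner to disappear and then for a button to become clickable; the class names here are illustrative, not from a real site:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)

# Hypothetical class name for a loading indicator
wait.until(EC.invisibility_of_element_located((By.CLASS_NAME, "loading-spinner")))

# Hypothetical button that only becomes active once loading has finished
load_more = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.load-more")))
load_more.click()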
Key Differences:
- Implicit Wait: Applied once, globally; the driver keeps polling until the element appears or the timeout expires. Simpler but less precise.
- Explicit Wait: Applied on a per-element basis with specific conditions. Waits only as long as necessary or until timeout. More powerful and flexible.
Best Practice: While implicit waits are easy, explicit waits are generally recommended for robust scrapers, especially when dealing with highly dynamic content. Combine them with try-except blocks to handle TimeoutException gracefully.
Advanced Techniques: Scrolling, Pagination, and Forms
Beyond basic element interaction, many scraping tasks require more advanced techniques to navigate complex website structures or extract data from interactive components.
Handling Scrolling Infinite Scroll
Many modern websites implement “infinite scrolling,” where content loads dynamically as the user scrolls down the page, eliminating traditional pagination. Selenium can simulate this behavior.
driver.get("https://quotes.toscrape.com/scroll")  # A demo site with infinite scroll
time.sleep(2)  # Give initial page time to load

all_quotes_data = []
last_height = driver.execute_script("return document.body.scrollHeight")
scroll_attempts = 0
max_scroll_attempts = 5  # Prevent infinite loops, adjust as needed

while True:
    # Scroll down to the bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # Short pause to allow new content to load

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        print("No more content loaded. Reached end of scroll.")
        break
    last_height = new_height
    scroll_attempts += 1
    print(f"Scrolled to new height: {new_height}. Attempt {scroll_attempts}/{max_scroll_attempts}")
    if scroll_attempts >= max_scroll_attempts:
        print(f"Reached maximum scroll attempts ({max_scroll_attempts}). Stopping.")
        break

# After scrolling, extract all visible quotes
quotes_elements = driver.find_elements(By.CLASS_NAME, "quote")
for quote_elem in quotes_elements:
    try:
        text = quote_elem.find_element(By.CLASS_NAME, "text").text
        author = quote_elem.find_element(By.CLASS_NAME, "author").text
        tags_elements = quote_elem.find_elements(By.CLASS_NAME, "tag")
        tags = [tag.text for tag in tags_elements]
        all_quotes_data.append({
            "text": text,
            "author": author,
            "tags": tags
        })
    except Exception as e:
        print(f"Could not extract quote details: {e}")

print(f"\nTotal quotes scraped: {len(all_quotes_data)}")
# for quote in all_quotes_data[:10]:  # Print first 10
#     print(quote)
driver.execute_script(): This is a powerful method that allows you to execute arbitrary JavaScript code within the browser context. It’s essential for tasks like scrolling, interacting with hidden elements, or manipulating the DOM directly.
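To follow the data-storage advice from earlier, here is a minimal sketch of writing the all_quotes_data list built above to a CSV file; the output filename is arbitrary.
import csv

with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "author", "tags"])
    writer.writeheader()
    for quote in all_quotes_data:
        row = dict(quote)
        row["tags"] = ", ".join(row["tags"])  # flatten the tag list for CSV
        writer.writerow(row)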
Handling Pagination
Traditional pagination involves clicking “Next Page” buttons or navigating to specific page numbers. Selenium can simulate these clicks.
# Continuing from the previous example, but using a paginated site
# Let's use https://quotes.toscrape.com/ for the pagination demo
all_quotes_paginated = []
base_url = "https://quotes.toscrape.com/"
driver.get(base_url)
time.sleep(2)

current_page = 1

try:
    while True:
        print(f"\nScraping Page {current_page} ({driver.current_url})")
        quotes_on_page = driver.find_elements(By.CLASS_NAME, "quote")
        for quote_elem in quotes_on_page:
            try:
                text = quote_elem.find_element(By.CLASS_NAME, "text").text
                author = quote_elem.find_element(By.CLASS_NAME, "author").text
                all_quotes_paginated.append({"text": text, "author": author})
            except Exception as e:
                print(f"Error scraping quote on page {current_page}: {e}")

        # Check for the "Next" button
        try:
            # Look for the <li> with class "next" and then the <a> inside it
            next_link_element = WebDriverWait(driver, 5).until(
                EC.element_to_be_clickable((By.XPATH, "//li[@class='next']/a"))
            )
            next_link_element.click()
            time.sleep(2)  # Wait for next page to load
            current_page += 1
        except TimeoutException:
            print("No 'Next' button found. End of pagination.")
            break
        except Exception as e:
            print(f"Error clicking next button: {e}")
            break
except Exception as e:
    print(f"An error occurred during pagination: {e}")

print(f"\nTotal quotes scraped across pages: {len(all_quotes_paginated)}")
# for quote in all_quotes_paginated[:20]:  # Print first 20
#     print(quote)
Key Points for Pagination:
- Looping: Use a while True loop that breaks when the “Next” button is no longer found.
- Waiting for the Next Button: Use WebDriverWait with EC.element_to_be_clickable to ensure the next button is ready before clicking.
- Error Handling: Wrap the click operation in a try-except block to catch TimeoutException when the “Next” button is no longer present.
Interacting with Forms
Filling out forms is a common task, whether for logging in, searching, or filtering content.
driver.get("https://quotes.toscrape.com/login")  # Login page

try:
    # Find the username and password input fields
    username_field = driver.find_element(By.ID, "username")
    password_field = driver.find_element(By.ID, "password")

    # Type credentials (use dummy credentials for demonstration)
    username_field.send_keys("test_user")      # Replace with actual if needed
    password_field.send_keys("test_password")  # Replace with actual if needed

    # Find and click the login button
    # The button has type="submit" and class "btn btn-primary"
    login_page_url = driver.current_url
    login_button = driver.find_element(By.CSS_SELECTOR, "input[type='submit']")
    login_button.click()

    # Wait for the login process to complete and the page to redirect or show a message
    # For this site, it redirects to a generic page after a login attempt
    WebDriverWait(driver, 10).until(EC.url_changes(login_page_url))  # Wait for URL to change

    print(f"After login attempt, current URL: {driver.current_url}")

    # You can then check for success/failure messages
    # For example, look for an alert message or a specific element on the logged-in page
    if "No account found" in driver.page_source:
        print("Login failed: No account found or invalid credentials.")
    elif "/login" not in driver.current_url:
        print("Login attempt might have been successful (redirected).")
    else:
        print("Login page still present or unexpected behavior.")

except Exception as e:
    print(f"An error occurred during form interaction: {e}")
Form Interaction Details:
- send_keys(): Used to type text into input fields (<input type="text">, <textarea>, <input type="password">).
- click(): Used to activate buttons (<button>, <input type="submit">, <input type="button">), checkboxes, radio buttons, and select options.
- Selecting from Dropdowns (<select>): For <select> elements, Selenium provides a Select class:
from selenium.webdriver.support.ui import Select
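A minimal sketch of the Select helper, assuming a hypothetical <select> element with id "country" (the quotes.toscrape.com login form has no dropdown, so the element here is illustrative only):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

# Hypothetical dropdown; replace the locator with one from your target page
dropdown_element = driver.find_element(By.ID, "country")
dropdown = Select(dropdown_element)

dropdown.select_by_visible_text("United Kingdom")  # choose by the option's visible text
# dropdown.select_by_value("uk")                   # or by the option's value attribute
# dropdown.select_by_index(2)                      # or by zero-based index

selected = dropdown.first_selected_option
print(selected.text)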