Selene selenium-based functionality¶

Submodules¶

selene.core.selenium.conditions module¶

selene.core.selenium.conditions.bool_clickable(driver, by, identifier, wait=10, logger=None)[source]¶

Wait a specified number of seconds until either:

A found element is clickable
A TimeoutException is raised

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
by (selenium.webdriver.common.by.By) – see https://selenium-python.readthedocs.io/locating-elements.html
identifier (str) – see https://selenium-python.readthedocs.io/locating-elements.html
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)

Returns:

output – True if the element is clickable, False otherwise

Return type:

bool
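The bool_* helpers in this module share one contract: poll until a condition holds or the wait expires, and report the outcome as a bool. The following is a minimal sketch of that contract in plain Python. It is not selene's implementation (which presumably builds on Selenium's own wait utilities); `wait_for` and `poll` are hypothetical names used only for illustration.

```python
import time


def wait_for(condition, wait=10, poll=0.5):
    """Poll `condition` until it is truthy or `wait` seconds have passed.

    Returns True if the condition held in time, False otherwise.
    """
    deadline = time.monotonic() + wait
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Stand-in for "is the element clickable yet?": truthy from the third check on.
checks = {"count": 0}


def clickable_after_three_checks():
    checks["count"] += 1
    return checks["count"] >= 3


print(wait_for(clickable_after_three_checks, wait=1, poll=0.01))  # True
print(wait_for(lambda: False, wait=0.05, poll=0.01))  # False
```

Returning a bool instead of propagating the timeout lets callers branch on the result without a try/except at every call site.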

selene.core.selenium.conditions.bool_correct_handle(driver, handle, wait, logger, message='Incorrect handle.')[source]¶

Wait a specified number of seconds until either:

The active handle (i.e. tab) is the expected one
A TimeoutException is raised

This is useful when navigating between different tabs.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
handle (str) – the expected handle
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)
message (str) – log message (default: “Incorrect handle.”)

Returns:

output – True if the active handle is the same as the expected handle, False otherwise

Return type:

bool

selene.core.selenium.conditions.bool_element_class_contains(driver, element, wait, logger, string, message='Element class does not contain')[source]¶

Wait a specified number of seconds until either:

An element’s class contains a specified string
A TimeoutException is raised

This is useful for cases where, for example, a dropdown element’s class contains “expanded” only if and when the dropdown has been expanded.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
element (core.selenium.element.Element) – the instance of the Element class representing the web element
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)
string (str) – the string to be found
message (str) – log message (default: “Element class does not contain {string}”)

Returns:

output – True if the element’s class contains the string, False otherwise

Return type:

bool
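As an illustration of the dropdown case described above, the sketch below polls a stand-in element until its class attribute contains "expanded". `FakeDropdown`, `class_contains` and `wait_for_class` are hypothetical names; only the `get_attribute` call shape mirrors a Selenium web element, and the real function may be implemented differently.

```python
import time


class FakeDropdown:
    """Stand-in for a web element; its class gains 'expanded' after a few reads."""

    def __init__(self, opens_after):
        self._reads = 0
        self._opens_after = opens_after

    def get_attribute(self, name):
        if name != "class":
            return None
        self._reads += 1
        if self._reads >= self._opens_after:
            return "dropdown expanded"
        return "dropdown"


def class_contains(element, string):
    """Return True if the element's class attribute contains `string`."""
    return string in (element.get_attribute("class") or "")


def wait_for_class(element, string, wait=1.0, poll=0.01):
    """Poll until the class contains `string`; False if `wait` seconds pass first."""
    deadline = time.monotonic() + wait
    while True:
        if class_contains(element, string):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


print(wait_for_class(FakeDropdown(opens_after=3), "expanded"))  # True
print(wait_for_class(FakeDropdown(opens_after=10**9), "expanded", wait=0.05))  # False
```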

selene.core.selenium.conditions.bool_element_class_does_not_contain(driver, element, wait, logger, string, message='Element class contains')[source]¶

Wait a specified number of seconds until either:

An element’s class DOES NOT contain a specified string
A TimeoutException is raised

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
element (core.selenium.element.Element) – the instance of the Element class representing the web element
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)
string (str) – the string to be found
message (str) – log message (default: “Element class contains {string}.”)

Returns:

output – True if the element’s class does not contain the string, False otherwise

Return type:

bool

selene.core.selenium.conditions.bool_element_text_contains(driver, element, wait, logger, string, message='Element text does not contain')[source]¶

Wait a specified number of seconds until either:

An element’s text contains a specified string
A TimeoutException is raised

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
element (core.selenium.element.Element) – the instance of the Element class representing the web element
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)
string (str) – the string to be found
message (str) – log message (default: “Element text does not contain {string}.”)

Returns:

output – True if the element’s text contains the string, False otherwise

Return type:

bool

selene.core.selenium.conditions.bool_element_text_does_not_contain(driver, element, wait, logger, string, message='Element text contains')[source]¶

Wait a specified number of seconds until either:

An element’s text DOES NOT contain a specified string
A TimeoutException is raised

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
element (core.selenium.element.Element) – the instance of the Element class representing the web element
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)
string (str) – the string to be found
message (str) – log message (default: “Element text contains {string}.”)

Returns:

output – True if the element’s text does not contain the string, False otherwise

Return type:

bool

selene.core.selenium.conditions.bool_invisible(driver, by, identifier, wait=10, logger=None)[source]¶

Wait a specified number of seconds until either:

A found element is NOT visible
A TimeoutException is raised

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
by (selenium.webdriver.common.by.By) – see https://selenium-python.readthedocs.io/locating-elements.html
identifier (str) – see https://selenium-python.readthedocs.io/locating-elements.html
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)

Returns:

output – True if the element is invisible, False otherwise

Return type:

bool

selene.core.selenium.conditions.bool_new_handle(driver, n_handles_old, wait, logger, message='No new handles found.')[source]¶

Wait a specified number of seconds until either:

The number of window handles (i.e. the number of tabs open) has increased by one
A TimeoutException is raised

This is useful when navigating between different tabs.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
n_handles_old (int) – the previous number of existing window handles
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)
message (str) – log message (default: “No new handles found.”)

Returns:

output – True if the number of handles has increased by one, False otherwise

Return type:

bool
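The two handle checks (bool_new_handle and bool_correct_handle) are typically used together when a click opens a new tab. The sketch below shows only the conditions being tested, without the waiting, against a stand-in driver. `FakeDriver`, `open_tab`, `new_handle_found` and `correct_handle` are hypothetical; the attribute names `window_handles`, `current_window_handle` and `switch_to.window` mirror Selenium's driver interface.

```python
class FakeSwitchTo:
    def __init__(self, driver):
        self._driver = driver

    def window(self, handle):
        self._driver.current_window_handle = handle


class FakeDriver:
    """Stand-in exposing window_handles, current_window_handle and switch_to.window."""

    def __init__(self):
        self.window_handles = ["tab-1"]
        self.current_window_handle = "tab-1"
        self.switch_to = FakeSwitchTo(self)

    def open_tab(self, handle):
        # Hypothetical helper: simulates a click that opens a link in a new tab.
        self.window_handles.append(handle)


def new_handle_found(driver, n_handles_old):
    """True once exactly one more window handle exists than before."""
    return len(driver.window_handles) == n_handles_old + 1


def correct_handle(driver, handle):
    """True if the active handle is the expected one."""
    return driver.current_window_handle == handle


driver = FakeDriver()
handles_old = list(driver.window_handles)  # record the state BEFORE the action
print(new_handle_found(driver, len(handles_old)))  # False
driver.open_tab("tab-2")
print(new_handle_found(driver, len(handles_old)))  # True

# Identify the new handle by difference rather than by list position.
new_handle = [h for h in driver.window_handles if h not in handles_old][0]
print(correct_handle(driver, new_handle))  # False
driver.switch_to.window(new_handle)
print(correct_handle(driver, new_handle))  # True
```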

selene.core.selenium.conditions.bool_scroll_height_changed(driver, wait, logger, height, element=None, message='Scroll height did not change.')[source]¶

Wait a specified number of seconds until either:

The page OR an element’s scroll height changes. This is what changes if the height of the page or element increases due to dynamically-generated content.
A TimeoutException is raised

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)
height (float) – the original scroll height value
element (EITHER selenium.webdriver.remote.webelement.WebElement OR core.selenium.element.Element OR None) – the scrollable element. If None, then the page itself is the element.
message (str) – log message (default: “Scroll height did not change.”)

Returns:

output – True if the scroll height has changed, False otherwise

Return type:

bool

selene.core.selenium.conditions.bool_scroll_position_changed(driver, element, wait, logger, position, message='Scroll position did not change.')[source]¶

Wait a specified number of seconds until either:

An element’s scroll position changes. This is what changes as you scroll down a scrollable element.
A TimeoutException is raised

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
element (EITHER selenium.webdriver.remote.webelement.WebElement OR core.selenium.element.Element) – the scrollable element
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)
position (float) – the original scroll position value
message (str) – log message (default: “Scroll position did not change.”)

Returns:

output – True if the element’s scroll position has changed, False otherwise

Return type:

bool

selene.core.selenium.conditions.bool_url_changed(driver, wait, logger, url, message='URL has not changed.')[source]¶

Wait a specified number of seconds until either:

The browser’s url changes.
A TimeoutException is raised

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)
url (str) – the original url
message (str) – log message (default: “URL has not changed.”)

Returns:

output – True if the url changes, False otherwise

Return type:

bool

selene.core.selenium.conditions.bool_url_contains(driver, wait, logger, string, message='URL does not contain the specified string.')[source]¶

Wait a specified number of seconds until either:

The browser’s url contains a specified string.
A TimeoutException is raised

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)
string (str) – the string to be found
message (str) – log message (default: “URL does not contain the specified string.”)

Returns:

output – True if the url contains the specified string, False otherwise

Return type:

bool

selene.core.selenium.conditions.bool_url_does_not_contain(driver, wait, logger, string, message='URL contains the specified string.')[source]¶

Wait a specified number of seconds until either:

The browser’s url DOES NOT contain a specified string.
A TimeoutException is raised

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)
string (str) – the string that the url should not contain
message (str) – log message (default: “URL contains the specified string.”)

Returns:

output – True if the url does not contain the specified string, False otherwise

Return type:

bool

selene.core.selenium.conditions.bool_url_expected(driver, wait, logger, url, message='URL is not the expected URL.')[source]¶

Wait a specified number of seconds until either:

The browser’s url matches the expected url.
A TimeoutException is raised

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)
url (str) – the expected url
message (str) – log message (default: “URL is not the expected URL.”)

Returns:

output – True if the url is the expected url, False otherwise

Return type:

bool

selene.core.selenium.conditions.bool_url_unexpected(driver, wait, logger, url, message='URL is the unexpected URL.')[source]¶

Wait a specified number of seconds until either:

The browser’s url matches the UNexpected url.
A TimeoutException is raised

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)
url (str) – the unexpected url
message (str) – log message (default: “URL is the unexpected URL.”)

Returns:

output – True if the url is the unexpected url, False otherwise

Return type:

bool

selene.core.selenium.conditions.bool_visible(driver, by, identifier, wait=10, logger=None)[source]¶

Wait a specified number of seconds until either:

A found element is visible
A TimeoutException is raised

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
by (selenium.webdriver.common.by.By) – see https://selenium-python.readthedocs.io/locating-elements.html
identifier (str) – see https://selenium-python.readthedocs.io/locating-elements.html
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)

Returns:

output – True if the element is visible, False otherwise

Return type:

bool

selene.core.selenium.conditions.bool_yoffset_changed(driver, wait, logger, yoffset, message='Y-offset did not change.')[source]¶

Wait a specified number of seconds until either:

The y-offset changes. This is what changes as you scroll down a page
A TimeoutException is raised

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)
yoffset (float) – the original y-offset value
message (str) – log message (default: “Y-offset did not change.”)

Returns:

output – True if the y-offset has changed, False otherwise

Return type:

bool

selene.core.selenium.crawler module¶

class selene.core.selenium.crawler.CrawlerSelene(id_crawler='Crawler', debug=True)[source]¶

Bases: Crawler

A crawler class to assist any workflow which requires selenium webdriver.

Inherits selene.core.crawler.Crawler

selene.core.selenium.driver module¶

selene.core.selenium.driver.get_driver(width=2560, height=1440, user_agent='default', incognito=False, disable_gpu=False, use_display=False)[source]¶

Get an instance of selenium.webdriver and start the browser.

Parameters:

width (int) – the width of the browser
height (int) – the height of the browser
user_agent (str or bool) – If False, then no user agent is used. If ‘default’, then a default user agent is used. If ‘random’, then a random user agent is selected. Otherwise, the specified user agent is used.
incognito (bool) – whether or not to start the browser in incognito mode
disable_gpu (bool) – whether or not to disable GPU
use_display (bool) – whether or not to use a virtual display

Returns:

driver – selenium.webdriver instance

Return type:

selenium.webdriver
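The four user_agent cases can be read as a small dispatch. The sketch below is one possible reading, not the library's code: `resolve_user_agent` is a hypothetical helper, the `USER_AGENTS` list is a placeholder for core.config.USER_AGENTS, and treating the first entry as the default is an assumption.

```python
import random

# Placeholder values standing in for core.config.USER_AGENTS (assumption).
USER_AGENTS = [
    "ExampleAgent/1.0",
    "ExampleAgent/2.0",
    "ExampleAgent/3.0",
]


def resolve_user_agent(user_agent="default"):
    """Map the `user_agent` argument to a user agent string, or None for no override."""
    if user_agent is False:
        return None  # no user agent is set
    if user_agent == "default":
        return USER_AGENTS[0]  # assumption: the first entry is the default
    if user_agent == "random":
        return random.choice(USER_AGENTS)
    return user_agent  # any other value is used verbatim


print(resolve_user_agent(False))  # None
print(resolve_user_agent("default"))  # ExampleAgent/1.0
print(resolve_user_agent("MyCrawler/0.1"))  # MyCrawler/0.1
```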

selene.core.selenium.driver.get_user_agent(i)[source]¶

Get a specific user agent string from core.config.USER_AGENTS

Parameters:

i (int) – the list index

Returns:

user_agent – The selected user agent

Return type:

str

selene.core.selenium.driver.get_user_agent_random()[source]¶

Get a random user agent string from core.config.USER_AGENTS

Returns:

user_agent – The selected user agent

Return type:

str

selene.core.selenium.driver.restart_driver(driver, wait=30)[source]¶

Stop and close the selenium.webdriver instance, wait for a specified number of seconds, then start a new instance

Parameters:

driver (selenium.webdriver) – the selenium webdriver instance to stop
wait (int) – a number of seconds to wait before starting the new instance

Returns:

driver – The new selenium.webdriver instance

Return type:

selenium.webdriver

selene.core.selenium.driver.stop_driver(driver, display=None)[source]¶

Stop and close the selenium.webdriver instance

Parameters:

driver (selenium.webdriver) – the selenium webdriver instance to stop
display (pyvirtualdisplay.Display, optional) – the virtual display to stop, if a pyvirtualdisplay display is in use

selene.core.selenium.element module¶

class selene.core.selenium.element.ElementSelene(element, logger=None)[source]¶

Bases: Element

An element class to wrap a selenium.webdriver.remote.webelement.WebElement object, in order to:

provide extra functionality
make it easier for crawlers to switch between Selenium workflows and BeautifulSoup workflows.

Inherits selene.core.element.Element

click(driver)[source]¶

Click the element.

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool

find(by, identifier, wait=10, log=True)[source]¶

This:

wraps core.selenium.tasks.task_find
finds only elements which are within this element.

Parameters:

by (selenium.webdriver.common.by.By) – see https://selenium-python.readthedocs.io/locating-elements.html
identifier (str) – see https://selenium-python.readthedocs.io/locating-elements.html
wait (int) – a number of seconds to wait before raising a TimeoutException

Returns:

output – returns the element if an element is found, None otherwise

Return type:

None or core.selenium.element.ElementSelene

find_all(by, identifier, wait=10, log=True)[source]¶

This:

wraps core.selenium.tasks.task_find_all
finds only elements which are within this element.

Parameters:

by (selenium.webdriver.common.by.By) – see https://selenium-python.readthedocs.io/locating-elements.html
identifier (str) – see https://selenium-python.readthedocs.io/locating-elements.html
wait (int) – a number of seconds to wait before raising a TimeoutException

Returns:

output – returns the elements if one or more elements are found, an empty list otherwise

Return type:

list

get_attribute(*args, **kwargs)[source]¶

Gets an attribute from the element. E.g. self.get_attribute(‘href’) would return the hyperlink.

Returns:

output – the attribute

Return type:

str

get_parent(driver)[source]¶

Get the element’s parent

Returns:

out – an ElementSelene object wrapping the parent element

Return type:

ElementSelene

get_text()[source]¶

Get the element’s text

TODO this is redundant, but removing it might break some things

Returns:

text

Return type:

str

has_attribute(*args, **kwargs)[source]¶

Check whether the element contains a specified attribute.

Returns:

output – True if the element has the attribute, False otherwise

Return type:

bool

scroll_down(driver, wait=10)[source]¶

Scroll down the element IF the element has a scrollbar.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
wait (int) – a number of seconds to wait before raising a TimeoutException

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool

scroll_to(driver, position_new, wait=10)[source]¶

Scroll to a new position on the element IF the element has a scrollbar.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
position_new (int) – the new scroll position (y position in pixels)
wait (int) – a number of seconds to wait before raising a TimeoutException

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool

scroll_to_bottom(driver, wait=10)[source]¶

Scroll to the bottom of the element IF the element has a scrollbar.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
wait (int) – a number of seconds to wait before raising a TimeoutException

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool

selene.core.selenium.page module¶

class selene.core.selenium.page.PageSelene(driver, url, logger=None, *args, **kwargs)[source]¶

Bases: Page

A page class to assist any workflow which requires selenium webdriver.

A website is made out of pages.
Dynamically-generated pages require Selenium Webdriver.
Each page will need general functionality (e.g. finding an element, scrolling etc.).
Inheriting this class provides that general functionality.

NOTE 1: Generally, the way to use this object is to initialise it using the from_url() method, as this will attach the url to the page AND navigate to the url.

NOTE 2: Any PageSelene object will also contain a PageSoup object (see core.soup.page). This is an attempt to allow both the use of Selenium (for dynamic elements) and BeautifulSoup (for static elements) when scraping.

Inherits selene.core.page.Page

click(driver, by, identifier, wait=10)[source]¶

Find and click an element on the page.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
by (selenium.webdriver.common.by.By) – see https://selenium-python.readthedocs.io/locating-elements.html
identifier (str) – see https://selenium-python.readthedocs.io/locating-elements.html
wait (int) – a number of seconds to wait before raising a TimeoutException

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool

static close_all_tabs_except_specified_tab(driver, handle_keep, attempts=3)[source]¶

Closes all open tabs EXCEPT for the tab given by the specified handle.

Useful for cleanup of any open tabs.

The attempts parameter sets how many times to retry, in case closing the tabs does not work the first time.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
handle_keep (str) – the tab/handle to not close.
attempts (int) – a number of attempts before returning False

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool
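A minimal sketch of the cleanup loop, assuming the method switches to each unwanted tab, closes it, and then returns to the kept tab. The actual implementation may differ. `FakeDriver` is a stand-in that mimics the Selenium call shapes `window_handles`, `switch_to.window` and `close`; `close_all_tabs_except` is a hypothetical free-function version of the method.

```python
class FakeSwitchTo:
    def __init__(self, driver):
        self._driver = driver

    def window(self, handle):
        if handle not in self._driver.window_handles:
            raise ValueError(f"no such window: {handle}")
        self._driver.current_window_handle = handle


class FakeDriver:
    """Stand-in driver holding a list of open tab handles."""

    def __init__(self, handles):
        self.window_handles = list(handles)
        self.current_window_handle = self.window_handles[0]
        self.switch_to = FakeSwitchTo(self)

    def close(self):
        # Closes the currently active tab only.
        self.window_handles.remove(self.current_window_handle)


def close_all_tabs_except(driver, handle_keep, attempts=3):
    """Close every tab except `handle_keep`; True if only that tab remains."""
    if handle_keep not in driver.window_handles:
        return False
    for _ in range(attempts):
        # Iterate over a copy, because closing tabs mutates the live list.
        for handle in list(driver.window_handles):
            if handle != handle_keep:
                driver.switch_to.window(handle)
                driver.close()
        driver.switch_to.window(handle_keep)
        if driver.window_handles == [handle_keep]:
            return True
    return False


driver = FakeDriver(["main", "popup-1", "popup-2"])
print(close_all_tabs_except(driver, "main"))  # True
print(driver.window_handles)  # ['main']
```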

expand_scroll_height(driver, wait=1)[source]¶

Keep scrolling to the bottom of the page, as the page dynamically expands due to the continued scrolling.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
wait (int) – a number of seconds to wait before raising a TimeoutException

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool
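The idea can be sketched as a loop that scrolls to the bottom and stops once the scroll height no longer grows. This is an illustration under simplifying assumptions: `FakeInfinitePage` is a hypothetical stand-in whose content loads instantly, whereas the real method has to wait (up to `wait` seconds) for the height to change after each scroll.

```python
class FakeInfinitePage:
    """Stand-in page that grows by 1000px each time the bottom is reached, up to a limit."""

    def __init__(self, height=1000, max_height=4000):
        self.height = height
        self.max_height = max_height
        self.position = 0

    def scroll_to_bottom(self):
        self.position = self.height
        if self.height < self.max_height:
            self.height += 1000  # reaching the bottom triggers more content


def expand_scroll_height(page, max_rounds=100):
    """Scroll to the bottom repeatedly; return how many times the page grew."""
    rounds = 0
    for _ in range(max_rounds):
        height_before = page.height
        page.scroll_to_bottom()
        if page.height == height_before:
            break  # nothing new loaded, the page is fully expanded
        rounds += 1
    return rounds


page = FakeInfinitePage()
print(expand_scroll_height(page))  # 3
print(page.height, page.position)  # 4000 4000
```

The `max_rounds` cap is an addition for this sketch; it guards against pages that never stop growing.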

find(driver, by, identifier, wait=10, log=True)[source]¶

This:

wraps core.selenium.tasks.task_find
returns the result, not as a selenium.webdriver.remote.webelement.WebElement object, but instead as a core.selenium.element.ElementSelene wrapper object, which gives added functionality.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
by (selenium.webdriver.common.by.By) – see https://selenium-python.readthedocs.io/locating-elements.html
identifier (str) – see https://selenium-python.readthedocs.io/locating-elements.html
wait (int) – a number of seconds to wait before raising a TimeoutException

Returns:

output – returns the element if an element is found, None otherwise

Return type:

None or core.selenium.element.ElementSelene

find_all(driver, by, identifier, wait=10, log=True)[source]¶

This:

wraps core.selenium.tasks.task_find_all
returns the result, not as a list of selenium.webdriver.remote.webelement.WebElement objects, but instead as a list of core.selenium.element.ElementSelene wrapper objects, which give added functionality.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
by (selenium.webdriver.common.by.By) – see https://selenium-python.readthedocs.io/locating-elements.html
identifier (str) – see https://selenium-python.readthedocs.io/locating-elements.html
wait (int) – a number of seconds to wait before raising a TimeoutException

Returns:

output – returns the elements if one or more elements are found, an empty list otherwise

Return type:

list

find_all_soup(*args, **kwargs)[source]¶

Each PageSelene object contains a PageSoup object. This wraps the core.soup.page.PageSoup.find_all function, so it can use BeautifulSoup to find elements.

Returns:

output – the list of ElementSoup instances relating to the found webelements

Return type:

list

find_soup(*args, **kwargs)[source]¶

Each PageSelene object contains a PageSoup object. This wraps the core.soup.page.PageSoup.find function, so it can use BeautifulSoup to find elements.

Returns:

output – the ElementSoup instance relating to the found webelement

Return type:

core.soup.element.ElementSoup

classmethod from_url(driver, url, string='', logger=None, *args, **kwargs)[source]¶

Initialise a PageSelene instance and navigate to the instance’s specified url

Checking the correct url can be done in 2 ways:

Checking for an exact match
Checking whether the url contains a specified string.

Parameters:

driver (selenium.webdriver) – the initialised webdriver instance
url (str) – the url of the page
string (str) – a specified string for the new url to contain
logger (logging.Logger) – a logger instance (see core.logger.py)
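The two ways of checking the url can be sketched as a single predicate. This is an assumption about how the `string` argument selects between them (a non-empty `string` switches to the containment check); `url_is_correct` is a hypothetical helper, not part of the library.

```python
def url_is_correct(current_url, url, string=""):
    """Check the browser's url either by containment or by exact match."""
    if string:
        # A non-empty `string` means "the new url must contain this".
        return string in current_url
    return current_url == url


# Exact match fails after a redirect that appends a query string...
print(url_is_correct("https://example.com/home?ref=1", "https://example.com/home"))  # False
# ...while the containment check still passes.
print(url_is_correct("https://example.com/home?ref=1", "https://example.com/home", string="/home"))  # True
```

The containment check is useful for sites that redirect or append tracking parameters, where an exact match would fail.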

get_page_soup(driver)[source]¶

Get a PageSoup object (see core.soup.page) with the current source html code as found by the webdriver instance.

Parameters:

driver (selenium.webdriver) – the initialised webdriver instance

Returns:

output – PageSoup object initialised using the page’s source html code.

Return type:

PageSoup

navigate_to_url(driver, url, string='', wait=10)[source]¶

This wraps core.selenium.tasks.task_navigate_to_url

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
url (str) – the url to navigate to
string (str) – a specified string for the new url to contain
wait (int) – a number of seconds to wait before raising a TimeoutException

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool

classmethod new_tab(driver, url, string='', logger=None)[source]¶

Initialise a PageSelene instance and navigate to the instance’s specified url in a new tab

Checking the correct url can be done in 2 ways:

Checking for an exact match
Checking whether the url contains a specified string.

Parameters:

driver (selenium.webdriver) – the initialised webdriver instance
url (str) – the url of the page
string (str) – a specified string for the new url to contain
logger (logging.Logger) – a logger instance (see core.logger.py)

refresh(driver, wait=0)[source]¶

Refresh the page by refreshing the driver and re-initialising the PageSelene object.

Parameters:

driver (selenium.webdriver) – the initialised webdriver instance
wait (int) – a number of seconds to wait before re-initialising

Returns:

output – re-initialised PageSelene object

Return type:

PageSelene

refresh_until_true(driver, func, message, attempts, *args, **kwargs)[source]¶

This wraps other functions such as self.find.

If the wrapped function returns anything other than False or None, then this function returns True.

If the wrapped function returns False or None, then this function calls self.refresh and tries again, up to the specified number of attempts. If all attempts fail, then this function returns False.

This becomes useful if a web page did not load properly, and therefore needs to be refreshed.

Parameters:

driver (selenium.webdriver) – the initialised webdriver instance
func (function) – the function to be wrapped
message (str) – the error message to print to the logs
attempts (int) – the number of attempts before returning False

Returns:

output – False if the function fails a specified number of times; True if it succeeds

Return type:

bool
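The retry behaviour described above can be sketched as follows. This is a free-function illustration, not the method itself: here `refresh` is passed in as a callable, while the real method calls self.refresh; `find_banner` and `page_state` are hypothetical stand-ins for a lookup on a page that only loads properly after some refreshes.

```python
def refresh_until_true(func, refresh, attempts, *args, **kwargs):
    """Call `func` up to `attempts` times, refreshing after each failure."""
    for _ in range(attempts):
        result = func(*args, **kwargs)
        # Only False and None count as failure; other falsy values (0, "") succeed.
        if result is not None and result is not False:
            return True
        refresh()
    return False


page_state = {"loads": 1}


def find_banner():
    # Stand-in for self.find: the element only exists from the third load on.
    return "banner" if page_state["loads"] >= 3 else None


def refresh():
    page_state["loads"] += 1


print(refresh_until_true(find_banner, refresh, attempts=5))  # True
print(page_state["loads"])  # 3
```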

static screenshot_to_local(driver, dirpath, filestem, logger=None)[source]¶

This wraps core.selenium.tasks.screenshot_to_local

Save a browser screenshot to a local directory

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
dirpath (str) – directory to save file
filestem (str) – a string to add to a datetime to create the filename
logger (logging.Logger) – a logger instance (see core.logger.py)

static screenshot_to_notebook(driver, width=600, height=400, logger=None)[source]¶

This wraps core.selenium.tasks.task_screenshot_to_notebook

Display a browser screenshot in a Jupyter notebook.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
width (int) – the width of the image
height (int) – the height of the image
logger (logging.Logger) – a logger instance (see core.logger.py)

scroll_down(driver, wait=10)[source]¶

Scroll down the page.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
wait (int) – a number of seconds to wait before raising a TimeoutException

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool

scroll_to(driver, position_new, wait=10)[source]¶

Scroll to a new position on the page.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
position_new (int) – y position in pixels
wait (int) – a number of seconds to wait before raising a TimeoutException

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool

scroll_to_bottom(driver, wait=10)[source]¶

Scroll to the bottom of the page.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
wait (int) – a number of seconds to wait before raising a TimeoutException

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool

selene.core.selenium.scripts module¶

selene.core.selenium.scripts.script_click_element(driver, element)[source]¶

Execute JavaScript to click an element

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
element (selenium.webdriver.remote.webelement.WebElement) – the element to click

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool

selene.core.selenium.scripts.script_expand_all_by_class_name(driver, identifier, attribute, indicator, clickable=None)[source]¶

WARNING: EXPERIMENTAL

Execute JavaScript to expand a list of dropdown menus.

Steps:

Find dropdowns by finding all elements with a class name specified with identifier.
For each dropdown found:
- Check if the dropdown is expanded or not. This can be done by:
  
  Does the attribute ‘class’ contain an indicator (e.g. ‘expanded’)?
  
  Does the attribute ‘text’ contain an indicator (e.g. ‘Show More’)?
  
  Is there an attribute called ‘exists’?
- Find the element to click to expand the dropdown. Sometimes the clickable element is not the dropdown itself, but is a button inside the dropdown.
- Click the clickable element.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
identifier (str) – the class name to search for
attribute (str) – the attribute of the element to check whether it is expanded or not
indicator (str) – the indicator within the attribute, which will indicate whether it is expanded or not
clickable (str) – the class name of the element within the dropdown which you have to click to expand the dropdown

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool
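The steps above can be sketched in Python against stand-in elements. The real function executes JavaScript in the browser, so this only illustrates the control flow for the case where the attribute being checked is ‘class’. `FakeElement` and `expand_all` are hypothetical names, and the fake click hard-codes "expanded" as the marker it adds.

```python
class FakeElement:
    """Stand-in element with a class name, optional children and a click counter."""

    def __init__(self, class_name, children=()):
        self.class_name = class_name
        self.children = list(children)
        self.clicks = 0
        self.parent = None
        for child in self.children:
            child.parent = self

    def click(self):
        self.clicks += 1
        # Clicking a dropdown, or a button inside it, expands the dropdown.
        target = self.parent if self.parent is not None else self
        if "expanded" not in target.class_name:
            target.class_name += " expanded"


def expand_all(elements, identifier, indicator, clickable=None):
    """Click every collapsed dropdown; return the number of clicks made."""
    clicked = 0
    for dropdown in elements:
        if identifier not in dropdown.class_name.split():
            continue  # not a dropdown
        if indicator in dropdown.class_name:
            continue  # already expanded
        target = dropdown
        if clickable is not None:
            # The clickable element is a child button, not the dropdown itself.
            buttons = [c for c in dropdown.children if clickable in c.class_name.split()]
            if not buttons:
                continue
            target = buttons[0]
        target.click()
        clicked += 1
    return clicked


collapsed = FakeElement("dropdown", [FakeElement("toggle")])
already_open = FakeElement("dropdown expanded", [FakeElement("toggle")])
banner = FakeElement("banner")

print(expand_all([collapsed, already_open, banner], "dropdown", "expanded", "toggle"))  # 1
print(collapsed.class_name)  # dropdown expanded
```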

selene.core.selenium.scripts.script_get_parent(driver, element)[source]¶

Execute JavaScript to get the parent of an element

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
element (selenium.webdriver.remote.webelement.WebElement) – the element whose parent is to be found

Returns:

output – the parent WebElement

Return type:

selenium.webdriver.remote.webelement.WebElement

selene.core.selenium.scripts.script_get_scroll_height(driver, element=None)[source]¶

Execute JavaScript to get the scroll height of either:

the page
an element with a scroll bar.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
element (selenium.webdriver.remote.webelement.WebElement) – the element from which to get the scroll height (if None, then the scroll height of the page is found).

Returns:

output – the scroll height in pixels

Return type:

int
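A sketch of how the page/element split might look when running JavaScript through a driver. The JavaScript snippets are assumptions (a real implementation could read `document.documentElement.scrollHeight` instead, for example), and `RecordingDriver` is a hypothetical stand-in that only mimics the `execute_script(script, *args)` call shape of a Selenium driver.

```python
PAGE_SCRIPT = "return document.body.scrollHeight;"
ELEMENT_SCRIPT = "return arguments[0].scrollHeight;"


class RecordingDriver:
    """Stand-in driver that records scripts and returns canned heights."""

    def __init__(self, page_height, element_heights):
        self._page_height = page_height
        self._element_heights = element_heights
        self.calls = []

    def execute_script(self, script, *args):
        self.calls.append((script, args))
        if args:
            return self._element_heights[args[0]]
        return self._page_height


def get_scroll_height(driver, element=None):
    """Return the scroll height of `element`, or of the page if element is None."""
    if element is None:
        return driver.execute_script(PAGE_SCRIPT)
    # The element is passed as an argument and read as arguments[0] in the script.
    return driver.execute_script(ELEMENT_SCRIPT, element)


driver = RecordingDriver(page_height=5200, element_heights={"sidebar": 1800})
print(get_scroll_height(driver))  # 5200
print(get_scroll_height(driver, "sidebar"))  # 1800
```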

selene.core.selenium.scripts.script_get_scroll_position(driver, element=None)[source]¶

Execute JavaScript to get the scroll position of either:

the page
an element with a scroll bar.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
element (selenium.webdriver.remote.webelement.WebElement) – the element from which to get the scroll position (if None, then the scroll position of the page is found).

Returns:

output – the scroll position in pixels

Return type:

int

selene.core.selenium.scripts.script_scroll_to(driver, position, element=None)[source]¶

Execute JavaScript to scroll to a position on either:

the page
an element with a scroll bar.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
position (int) – the y position (in pixels) to scroll to
element (selenium.webdriver.remote.webelement.WebElement) – the element to scroll (if None, then the page itself is scrolled).

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool

selene.core.selenium.tasks module¶

selene.core.selenium.tasks.mouse_move(driver, max_mouse_moves=10)[source]¶

Perform mouse movements to help with bot mitigation. Partially ported from OpenWPM.

selene.core.selenium.tasks.task_click(driver, by, identifier, wait=10, logger=None)[source]¶

Click an element using a By. selector and an identifier.

For more info, see: https://selenium-python.readthedocs.io/locating-elements.html

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
by (selenium.webdriver.common.by.By) – see https://selenium-python.readthedocs.io/locating-elements.html
identifier (str) – see https://selenium-python.readthedocs.io/locating-elements.html
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool
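
The wait-then-click behaviour can be sketched with a plain polling loop. This is an assumption-laden illustration, not Selene's implementation: the name click_when_ready and the poll parameter are hypothetical, and the loop stands in for selenium's WebDriverWait so that the example runs without a browser. The calls find_element, is_displayed, is_enabled and click are real WebDriver/WebElement methods.

```python
import time


def click_when_ready(driver, by, identifier, wait=10, poll=0.5):
    """Click the element once it is visible and enabled.

    Hypothetical helper: returns True after a successful click, or False
    if the element is not clickable within `wait` seconds.
    """
    deadline = time.monotonic() + wait
    while True:
        try:
            element = driver.find_element(by, identifier)
            if element.is_displayed() and element.is_enabled():
                element.click()
                return True
        except Exception:
            # Assumption: lookup or click failures are retried until the deadline.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

Returning False instead of letting a timeout propagate matches the bool return type documented above, and lets callers branch on the result without a try/except of their own.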

selene.core.selenium.tasks.task_close_tab_return_to_url_and_handle(driver, url, handle, string='', wait=10, logger=None)[source]¶

Close the current tab and check that the driver is back at the expected handle and url.

Checking the correct url can be done in 2 ways:

Checking for an exact match
Checking whether the url contains a specified string.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
handle (str) – the handle to navigate to
url (str) – the url to navigate to
string (str) – a specified string for the new url to contain
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool
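
The sequence described above (close the tab, switch back, verify handle and url) might look like the following sketch. The function name is hypothetical, and the rule that an empty string selects the exact-match check is an assumption; the source does not say how the two modes are chosen. The calls close, switch_to.window, current_window_handle and current_url are real Selenium WebDriver members.

```python
def close_tab_and_return(driver, url, handle, string=""):
    """Close the current tab, return to `handle`, and verify the location.

    Hypothetical helper. Assumption: an empty `string` means "require an
    exact url match"; otherwise the url only has to contain `string`.
    """
    driver.close()                   # closes the tab that currently has focus
    driver.switch_to.window(handle)  # focus does not move back automatically
    if driver.current_window_handle != handle:
        return False
    current = driver.current_url
    return string in current if string else current == url
```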

selene.core.selenium.tasks.task_find(parent, by, identifier, wait=10, logger=None)[source]¶

Find an element using a By. selector and an identifier.

For more info, see: https://selenium-python.readthedocs.io/locating-elements.html

If the operation is to find an element on the whole page, then the parent variable is a Selenium Webdriver instance (usually named driver).

If the operation is to find an element within another element, then the parent variable is a Selenium WebElement instance (NOT a core.selenium.element.Element instance).

Parameters:

parent (EITHER selenium.webdriver OR selenium.webdriver.remote.webelement.WebElement) – where to search for the element
by (selenium.webdriver.common.by.By) – see https://selenium-python.readthedocs.io/locating-elements.html
identifier (str) – see https://selenium-python.readthedocs.io/locating-elements.html
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)

Returns:

output – returns the webelement if it is found; None otherwise

Return type:

selenium.webdriver.remote.webelement.WebElement or None
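
The role of the parent argument can be illustrated with a short sketch. The names find_or_none and find_nested are hypothetical and the code is not Selene's implementation; it relies only on the fact that both a WebDriver and a WebElement expose find_element(by, value). The strings "id" and "css selector" used in the usage comment are the values of selenium's By.ID and By.CSS_SELECTOR.

```python
def find_or_none(parent, by, identifier):
    """Search below `parent` (a driver or a WebElement); None if not found."""
    try:
        return parent.find_element(by, identifier)
    except Exception:  # assumption: any lookup failure means "not found"
        return None


def find_nested(driver, locators):
    """Follow a chain of (by, identifier) pairs, narrowing the search each step.

    The first lookup searches the whole page (parent is the driver); every
    later lookup searches inside the element found by the previous step.
    """
    parent = driver
    for by, identifier in locators:
        parent = find_or_none(parent, by, identifier)
        if parent is None:
            return None
    return parent


# Usage (with a real driver):
#   row = find_nested(driver, [("id", "results"), ("css selector", "tr")])
```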

selene.core.selenium.tasks.task_find_all(parent, by, identifier, wait=10, logger=None)[source]¶

Find a list of elements using a By. selector and an identifier.

For more info, see: https://selenium-python.readthedocs.io/locating-elements.html

If the operation is to find elements on the whole page, then the parent variable is a Selenium Webdriver instance (usually named driver).

If the operation is to find elements within another element, then the parent variable is a Selenium WebElement instance (NOT a core.selenium.element.Element instance).

Parameters:

parent (EITHER selenium.webdriver OR selenium.webdriver.remote.webelement.WebElement) – where to search for the element
by (selenium.webdriver.common.by.By) – see https://selenium-python.readthedocs.io/locating-elements.html
identifier (str) – see https://selenium-python.readthedocs.io/locating-elements.html
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)

Returns:

output – returns a list of webelements if one or more are found; an empty list otherwise

Return type:

list

selene.core.selenium.tasks.task_navigate_to_url(driver, url, string='', wait=10, logger=None)[source]¶

Navigate to a new url and check that the url is correct.

Checking the correct url can be done in 2 ways:

Checking for an exact match
Checking whether the url contains a specified string.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
url (str) – the url to navigate to
string (str) – a specified string for the new url to contain
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool
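
The two url checks can be sketched as below. This is an illustration, not Selene's code: the names url_is_correct and navigate_to_url_sketch and the poll parameter are hypothetical, and the rule that a non-empty string switches from exact matching to substring matching is an assumption. The substring mode is useful when a site redirects, for example by appending a trailing slash, so that an exact comparison would fail.

```python
import time


def url_is_correct(current_url, url, string=""):
    """Exact match by default; substring match when `string` is given."""
    if string:
        return string in current_url
    return current_url == url


def navigate_to_url_sketch(driver, url, string="", wait=10, poll=0.5):
    """Load `url`, then poll driver.current_url until it is correct.

    Hypothetical helper: returns False if the url is still wrong after
    `wait` seconds.
    """
    driver.get(url)
    deadline = time.monotonic() + wait
    while not url_is_correct(driver.current_url, url, string):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True
```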

selene.core.selenium.tasks.task_navigate_to_url_in_new_tab(driver, url, string='', wait=10, logger=None)[source]¶

Navigate to a new url in a new tab, and check that the url is correct.

Checking the correct url can be done in 2 ways:

Checking for an exact match
Checking whether the url contains a specified string.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
url (str) – the url to navigate to
string (str) – a specified string for the new url to contain
wait (int) – a number of seconds to wait before raising a TimeoutException
logger (logging.Logger) – a logger instance (see core.logger.py)

Returns:

output – True if the operation was successful, False otherwise

Return type:

bool
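
One way the new-tab step could work is sketched here. It is an assumption, not Selene's implementation: the name open_in_new_tab is hypothetical, the tab is opened with a window.open script, and the sketch returns the new handle rather than a bool so the caller can switch back later. The members window_handles, execute_script, switch_to.window and get are real Selenium WebDriver members.

```python
def open_in_new_tab(driver, url):
    """Open a blank tab, switch to it, load `url`, and return its handle.

    Hypothetical helper; returns None if no new tab appeared.
    """
    before = set(driver.window_handles)
    driver.execute_script("window.open('');")
    # The new tab is whichever handle was not present before the script ran.
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        return None
    driver.switch_to.window(new_handles[0])  # opening a tab does not move focus
    driver.get(url)
    return new_handles[0]
```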

selene.core.selenium.tasks.task_screenshot_to_local(driver, dirpath, filestem, logger)[source]¶

Save a browser screenshot to a local directory.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
dirpath (str) – the directory path
filestem (str) – a string combined with a datetime to create the filename
logger (logging.Logger) – a logger instance (see core.logger.py)
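
How a filename might be built from a datetime and filestem is sketched below. The timestamp format, the ordering of the two parts, the .png extension, and both function names are assumptions; the source only says that the filestem is added to a datetime. save_screenshot is a real Selenium WebDriver method that returns True on success and False on an I/O error.

```python
from datetime import datetime
from pathlib import Path


def screenshot_path(dirpath, filestem, now=None):
    """Build <dirpath>/<timestamp>_<filestem>.png (format is an assumption)."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(dirpath) / f"{stamp}_{filestem}.png"


def save_screenshot_sketch(driver, dirpath, filestem):
    """Save a screenshot under `dirpath`, creating the directory if needed."""
    path = screenshot_path(dirpath, filestem)
    path.parent.mkdir(parents=True, exist_ok=True)
    return driver.save_screenshot(str(path))
```

Passing now explicitly keeps the path function deterministic, which makes the naming scheme easy to check in isolation.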

selene.core.selenium.tasks.task_screenshot_to_notebook(driver, width, height, logger)[source]¶

Display a browser screenshot in a Jupyter notebook.

Parameters:

driver (selenium.webdriver) – a selenium webdriver instance
width (int) – the width of the image
height (int) – the height of the image
logger (logging.Logger) – a logger instance (see core.logger.py)
