Selene soup-based functionality

Submodules

selene.core.soup.element module

class selene.core.soup.element.ElementSoup(element, logger=None)

Bases: Element

An element class that wraps BeautifulSoup functionality for finding elements and returning attributes from soup objects.

find(*args, **kwargs)

Find and return a specific element within the HTML.

Parameters:

element (str) – the type of html element searched for e.g. ‘div’
attributes (dict) – attributes of the searched element e.g. {“class”: “text-1”}

Returns:

el

Return type:

ElementSoup

find_all(*args, **kwargs)

Find and return all elements within the html that meet the given criteria

Parameters:

element (str) – the type of html element searched for e.g. ‘div’
attributes (dict) – attributes of the searched element e.g. {“class”: “text-1”}

Returns:

els – all ElementSoup objects that meet the criteria

Return type:

list
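The matching rule described for find and find_all (a tag name plus a dict of attributes) can be sketched with the standard library alone. This is not selene's implementation, which wraps BeautifulSoup. The names `find_tag` and `find_all_tags` are hypothetical, and comparing attribute values by exact equality is an assumption made for the sketch.

```python
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Collect the attributes of every start tag that matches the criteria."""

    def __init__(self, element, attributes):
        super().__init__()
        self.element = element
        self.attributes = attributes
        self.matches = []

    def handle_starttag(self, tag, attrs):
        found = dict(attrs)
        # Assumption: a tag matches when its name is equal and every
        # requested attribute has exactly the requested value.
        if tag == self.element and all(
            found.get(name) == value for name, value in self.attributes.items()
        ):
            self.matches.append(found)


def find_all_tags(html, element, attributes=None):
    """Return the attribute dicts of all matching tags, in document order."""
    collector = _TagCollector(element, attributes or {})
    collector.feed(html)
    collector.close()
    return collector.matches


def find_tag(html, element, attributes=None):
    """Return the first match, or None when nothing matches."""
    matches = find_all_tags(html, element, attributes)
    return matches[0] if matches else None


HTML = (
    '<div class="text-1" id="a">one</div>'
    '<div class="text-2" id="b">two</div>'
    '<div class="text-1" id="c">three</div>'
)
first = find_tag(HTML, "div", {"class": "text-1"})
every = find_all_tags(HTML, "div", {"class": "text-1"})
```

Here `first` holds the attributes of the first matching `div`, while `every` holds both matching `div` tags, mirroring the single-result and list-result split between find and find_all.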

classmethod from_selene(element_selene, logger=None)

Initialise an ElementSoup instance from an ElementSelene object. Allows interchangeability between selenium-based and soup-based elements.

Parameters:

element_selene (selene.core.selenium.ElementSelene) – the ElementSelene object to initialise from
logger (logging.Logger) – a logger instance (see core.logger.py)

get(*args, **kwargs)

Return a given attribute of the element.

get_text()

Return the text of the element.

has_attr(*args, **kwargs)

Check whether the element has a given attribute.

class selene.core.soup.element.ElementSoupBlank

Bases: ElementSoup

A class for blank soup objects. Used in cases where another method has not returned anything
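A blank element of this kind resembles the null-object pattern: callers can keep chaining calls without checking for a missing result. The sketch below illustrates the idea only. `BlankElement` and `first_or_blank` are hypothetical names, and the return values chosen here (empty string, None, False, empty list) are assumptions, not the documented behaviour of ElementSoupBlank.

```python
class BlankElement:
    """Hypothetical stand-in for a blank element (not selene's class)."""

    def find(self, *args, **kwargs):
        # Searching inside nothing still yields a blank element.
        return self

    def find_all(self, *args, **kwargs):
        return []

    def get(self, *args, **kwargs):
        return None

    def get_text(self):
        return ""

    def has_attr(self, *args, **kwargs):
        return False

    def __bool__(self):
        # Lets callers write `if element:` to detect a missing result.
        return False


def first_or_blank(matches):
    """Return the first match, or a blank element when there are none."""
    return matches[0] if matches else BlankElement()


# Chained calls stay safe even though nothing was found.
title = first_or_blank([]).find("span", {"class": "title"}).get_text()
```

With this shape, `title` is simply an empty string rather than the result of an AttributeError on None.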

selene.core.soup.page module

class selene.core.soup.page.PageSoup(url, soup, logger=None)

Bases: Page

A page class to assist any workflow which requires BeautifulSoup.

This makes Selenium WebDriver and BeautifulSoup more interchangeable, insofar as you can instantiate either a PageSoup or a PageSelene object and the .find and .find_all methods work in similar ways.

Inherits selene.core.page.Page
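One way to take advantage of this interchangeability is to write helpers against the shared find_all interface rather than a concrete page class. The following is a minimal sketch under stated assumptions: `SupportsFindAll`, `collect_text`, `FakeElement`, and `FakePage` are hypothetical names, the fakes stand in for real page objects so the snippet runs without a browser or network, and it assumes the returned elements expose get_text() as ElementSoup does.

```python
from typing import Protocol


class SupportsFindAll(Protocol):
    """Structural type for any page object with a find_all method."""

    def find_all(self, element: str, attributes: dict) -> list: ...


def collect_text(page: SupportsFindAll, element: str, attributes: dict) -> list:
    """Return the text of every matching element, whatever the page backend."""
    return [el.get_text() for el in page.find_all(element, attributes)]


class FakeElement:
    """Hypothetical element exposing only get_text()."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


class FakePage:
    """Hypothetical page that returns preset elements for any query."""

    def __init__(self, elements):
        self._elements = elements

    def find_all(self, element, attributes):
        return list(self._elements)


page = FakePage([FakeElement("first"), FakeElement("second")])
texts = collect_text(page, "div", {"class": "text-1"})
```

Because `collect_text` depends only on the method names, the same helper could in principle be passed either kind of page object.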

find(*args, **kwargs)

Find and return a specific element within the page HTML.

Parameters:

element (str) – the type of html element searched for e.g. ‘div’
attributes (dict) – attributes of the searched element e.g. {“class”: “text-1”}

Returns:

el

Return type:

ElementSoup

find_all(*args, **kwargs)

Find and return all elements within the page html that meet the given criteria

Parameters:

element (str) – the type of html element searched for e.g. ‘div’
attributes (dict) – attributes of the searched element e.g. {“class”: “text-1”}

Returns:

els – all ElementSoup objects that meet the criteria

Return type:

list

classmethod from_html(url, html, logger=None)

Initialise a PageSoup instance from existing html source code.

Parameters:

url (str) – the url of the page
html (str) – the html code to parse
logger (logging.Logger) – a logger instance (see core.logger.py)

classmethod from_request(url, logger=None)

Initialise a PageSoup instance by parsing a request to a web url.

Parameters:

url (str) – the url of the page
logger (logging.Logger) – a logger instance (see core.logger.py)

classmethod from_soup(url, soup, logger=None)

Initialise a PageSoup instance from existing, parsed soup.

Parameters:

url (str) – the url of the page
soup – the existing, parsed soup
logger (logging.Logger) – a logger instance (see core.logger.py)
