Selene soup-based functionality

Submodules

selene.core.soup.element module

class selene.core.soup.element.ElementSoup(element, logger=None)

Bases: Element

An element class that wraps BeautifulSoup functionality for finding elements and returning attributes from soup objects.

find(*args, **kwargs)

Find and return a specific element within the HTML.

Parameters:
  • element (str) – the type of HTML element searched for, e.g. 'div'

  • attributes (dict) – attributes of the searched element, e.g. {"class": "text-1"}

Returns:

el – the ElementSoup that meets the criteria

Return type:

ElementSoup

find_all(*args, **kwargs)

Find and return all elements within the HTML that meet the given criteria.

Parameters:
  • element (str) – the type of HTML element searched for, e.g. 'div'

  • attributes (dict) – attributes of the searched element, e.g. {"class": "text-1"}

Returns:

els – all ElementSoup objects that meet the criteria

Return type:

list
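As a rough illustration of the element/attributes matching that find and find_all describe, here is a standard-library sketch. It does not use selene or BeautifulSoup: the helper names (find, find_all, _TagCollector, _matches) are hypothetical, elements are modelled as plain (tag, attributes) tuples rather than ElementSoup objects, and returning the first match (or None) from find is an assumption, not documented behaviour.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Hypothetical element model: (tag name, attribute dict), in document order.
Element = Tuple[str, Dict[str, str]]


class _TagCollector(HTMLParser):
    """Record every start tag and its attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Element] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        self.elements.append((tag, {name: value or "" for name, value in attrs}))


def _matches(el: Element, element: str, attributes: Dict[str, str]) -> bool:
    tag, attrs = el
    if tag != element:
        return False
    for name, wanted in attributes.items():
        if name not in attrs:
            return False
        if name == "class":
            # Match one class name out of a space-separated list, a simplified
            # version of BeautifulSoup's multi-valued class matching.
            if wanted not in attrs[name].split():
                return False
        elif attrs[name] != wanted:
            return False
    return True


def find_all(html: str, element: str, attributes: Optional[Dict[str, str]] = None) -> List[Element]:
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return [el for el in collector.elements if _matches(el, element, attributes or {})]


def find(html: str, element: str, attributes: Optional[Dict[str, str]] = None) -> Optional[Element]:
    # First match in document order, or None when nothing matches.
    matches = find_all(html, element, attributes)
    return matches[0] if matches else None
```

With this sketch, find_all(html, "div", {"class": "text-1"}) returns every div whose class list contains text-1, while find returns only the first of them.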

classmethod from_selene(element_selene, logger=None)

Initialise an ElementSoup instance from an ElementSelene object. This allows interchangeability between Selenium-based and soup-based elements.

Parameters:
  • element_selene (selene.core.selenium.ElementSelene) – the Selenium-based element to convert

  • logger (logging.Logger) – a logger instance (see core.logger.py)
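How the conversion from an ElementSelene is performed is not stated here. One plausible approach, sketched below purely as an assumption, is to read the element's rendered markup through a get_attribute("outerHTML") call (as Selenium's WebElement supports) and re-parse it on the soup side. SoupLikeElement and _RootParser are hypothetical names, the standard library's html.parser stands in for BeautifulSoup, and the argument is only assumed to expose get_attribute().

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class _RootParser(HTMLParser):
    """Capture the outermost tag, its attributes and all text inside it."""

    def __init__(self) -> None:
        super().__init__()
        self.tag: Optional[str] = None
        self.attrs: Dict[str, str] = {}
        self.text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.tag is None:  # only the first (outermost) tag is the root
            self.tag = tag
            self.attrs = {name: value or "" for name, value in attrs}

    def handle_data(self, data):
        self.text.append(data)


class SoupLikeElement:
    """Hypothetical soup-side element, standing in for ElementSoup."""

    def __init__(self, tag: str, attrs: Dict[str, str], text: str) -> None:
        self.tag, self.attrs, self.text = tag, attrs, text

    @classmethod
    def from_selene(cls, element_selene) -> "SoupLikeElement":
        # Assumption: the Selenium-side object exposes get_attribute(), like
        # selenium's WebElement; its outerHTML is re-parsed here.
        html = element_selene.get_attribute("outerHTML")
        parser = _RootParser()
        parser.feed(html)
        parser.close()
        return cls(parser.tag or "", parser.attrs, "".join(parser.text))
```

In this sketch the soup-side copy is a snapshot: later changes in the live browser are not reflected in it.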

get(*args, **kwargs)

Return a given attribute of the element.

get_text()

Return the text of the element.

has_attr(*args, **kwargs)

Check whether the element has a given attribute.
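get, get_text and has_attr share their names with methods of BeautifulSoup's Tag class, and the *args, **kwargs signatures suggest that arguments are forwarded to the wrapped object. The sketch below assumes that delegation; ElementWrapper and FakeTag are hypothetical, with FakeTag standing in for a bs4 Tag so the example runs without BeautifulSoup installed.

```python
from typing import Any, Dict, Optional


class FakeTag:
    """Hypothetical stand-in for a bs4 Tag with get, get_text and has_attr."""

    def __init__(self, attrs: Dict[str, str], text: str) -> None:
        self.attrs = attrs
        self.text = text

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        return self.attrs.get(key, default)

    def get_text(self) -> str:
        return self.text

    def has_attr(self, key: str) -> bool:
        return key in self.attrs


class ElementWrapper:
    """Hypothetical wrapper that forwards calls to the wrapped tag."""

    def __init__(self, element: Any) -> None:
        self.element = element

    def get(self, *args, **kwargs):
        # Arguments pass through unchanged, e.g. get("href") or get("href", "").
        return self.element.get(*args, **kwargs)

    def get_text(self) -> str:
        return self.element.get_text()

    def has_attr(self, *args, **kwargs) -> bool:
        return self.element.has_attr(*args, **kwargs)
```

Forwarding *args, **kwargs keeps the wrapper in step with the wrapped object's signature, at the cost of hiding the accepted parameters from autogenerated signatures.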

class selene.core.soup.element.ElementSoupBlank

Bases: ElementSoup

A class for blank soup objects, used in cases where another method has not returned anything.
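One reading of a class like this is the null object pattern: instead of returning None when a lookup fails, a method returns an object with the same interface and empty results, so chained calls keep working. The sketch below assumes that is how ElementSoupBlank is meant to be used; Element and BlankElement are hypothetical classes, and the empty-string and None defaults are assumptions, not documented behaviour.

```python
from typing import Dict, List, Optional


class Element:
    """Hypothetical minimal element: tag, attributes, text and child elements."""

    def __init__(
        self,
        tag: str,
        attrs: Optional[Dict[str, str]] = None,
        text: str = "",
        children: Optional[List["Element"]] = None,
    ) -> None:
        self.tag = tag
        self.attrs = attrs or {}
        self.text = text
        self.children = children or []

    def find_all(self, tag: str) -> List["Element"]:
        # Only direct children are searched, to keep the sketch short.
        return [child for child in self.children if child.tag == tag]

    def find(self, tag: str) -> "Element":
        matches = self.find_all(tag)
        # Assumption: a miss yields a blank element rather than None.
        return matches[0] if matches else BlankElement()

    def get(self, name: str, default: Optional[str] = None) -> Optional[str]:
        return self.attrs.get(name, default)

    def get_text(self) -> str:
        return self.text


class BlankElement(Element):
    """Null object: no tag, no attributes, no text, no children."""

    def __init__(self) -> None:
        super().__init__(tag="")
```

With this design, root.find("table").find("tr").get_text() returns an empty string instead of raising AttributeError on None; the trade-off is that a missing element must be detected explicitly, for example with an isinstance check.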

selene.core.soup.page module

class selene.core.soup.page.PageSoup(url, soup, logger=None)

Bases: Page

A page class to support any workflow that requires BeautifulSoup.

It makes Selenium WebDriver and BeautifulSoup more interchangeable: you can instantiate either a PageSoup or a PageSelene object, and the .find and .find_all methods work in similar ways on both.

Inherits from selene.core.page.Page.
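The interchangeability described above amounts to programming against a shared interface. A minimal sketch, assuming only that both page types expose find_all(element, attributes) returning objects with get_text(): collect_texts is written once against a typing.Protocol and accepts either backend. FakeSoupPage and FakeSeleniumPage are hypothetical stand-ins, not PageSoup or PageSelene; the Selenium-style one shows how the same arguments could be translated into a CSS attribute selector.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class FakeElement:
    """Hypothetical element exposing only get_text()."""
    tag: str
    attrs: Dict[str, str]
    text: str

    def get_text(self) -> str:
        return self.text


class PageLike(Protocol):
    # Structural type: anything with this find_all signature qualifies.
    def find_all(self, element: str, attributes: Dict[str, str]) -> List[FakeElement]: ...


@dataclass
class FakeSoupPage:
    """Soup-style backend: filters an already-parsed list of elements."""
    elements: List[FakeElement]

    def find_all(self, element: str, attributes: Dict[str, str]) -> List[FakeElement]:
        return [
            e for e in self.elements
            if e.tag == element and all(e.attrs.get(k) == v for k, v in attributes.items())
        ]


@dataclass
class FakeSeleniumPage:
    """Selenium-style backend: turns the query into a CSS selector first."""
    results_by_selector: Dict[str, List[FakeElement]] = field(default_factory=dict)

    @staticmethod
    def to_css(element: str, attributes: Dict[str, str]) -> str:
        # e.g. ("div", {"class": "text-1"}) -> 'div[class="text-1"]'
        return element + "".join(f'[{k}="{v}"]' for k, v in attributes.items())

    def find_all(self, element: str, attributes: Dict[str, str]) -> List[FakeElement]:
        # A real driver would query the browser; this stub looks the selector up.
        return self.results_by_selector.get(self.to_css(element, attributes), [])


def collect_texts(page: PageLike) -> List[str]:
    """Written once, usable with either backend."""
    return [el.get_text() for el in page.find_all("div", {"class": "text-1"})]
```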

find(*args, **kwargs)

Find and return a specific element within the page HTML.

Parameters:
  • element (str) – the type of HTML element searched for, e.g. 'div'

  • attributes (dict) – attributes of the searched element, e.g. {"class": "text-1"}

Returns:

el – the ElementSoup that meets the criteria

Return type:

ElementSoup

find_all(*args, **kwargs)

Find and return all elements within the page HTML that meet the given criteria.

Parameters:
  • element (str) – the type of HTML element searched for, e.g. 'div'

  • attributes (dict) – attributes of the searched element, e.g. {"class": "text-1"}

Returns:

els – all ElementSoup objects that meet the criteria

Return type:

list

classmethod from_html(url, html, logger=None)

Initialise a PageSoup instance from existing HTML source code.

Parameters:
  • url (str) – the url of the page

  • html (str) – the HTML source code to parse

  • logger (logging.Logger) – a logger instance (see core.logger.py)

classmethod from_request(url, logger=None)

Initialise a PageSoup instance by requesting the given URL and parsing the response.

Parameters:
  • url (str) – the url of the page

  • logger (logging.Logger) – a logger instance (see core.logger.py)

classmethod from_soup(url, soup, logger=None)

Initialise a PageSoup instance from existing, parsed soup.

Parameters:
  • url (str) – the url of the page

  • soup – the already-parsed BeautifulSoup object for the page

  • logger (logging.Logger) – a logger instance (see core.logger.py)
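The three factory classmethods all end up at the same PageSoup(url, soup, logger) constructor. One plausible way to chain them, sketched below as an assumption, is for each factory to do a single step of work and delegate to the next: from_request fetches, from_html parses, from_soup constructs. Page and MiniSoup are hypothetical names, and the standard library's html.parser and urllib stand in for BeautifulSoup and whatever HTTP client selene actually uses.

```python
import logging
import urllib.request
from html.parser import HTMLParser
from typing import List, Optional


class MiniSoup(HTMLParser):
    """Hypothetical stand-in for a BeautifulSoup object: records tag names."""

    def __init__(self, html: str) -> None:
        super().__init__()
        self.tags: List[str] = []
        self.feed(html)
        self.close()

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


class Page:
    """Sketch of the from_request -> from_html -> from_soup -> __init__ chain."""

    def __init__(self, url: str, soup: MiniSoup, logger: Optional[logging.Logger] = None) -> None:
        self.url = url
        self.soup = soup
        self.logger = logger or logging.getLogger(__name__)

    @classmethod
    def from_soup(cls, url: str, soup: MiniSoup, logger=None) -> "Page":
        # Already parsed: hand straight to the constructor.
        return cls(url, soup, logger)

    @classmethod
    def from_html(cls, url: str, html: str, logger=None) -> "Page":
        # Parse raw HTML, then reuse from_soup.
        return cls.from_soup(url, MiniSoup(html), logger)

    @classmethod
    def from_request(cls, url: str, logger=None) -> "Page":
        # Fetch over HTTP, then reuse from_html (UTF-8 decoding is an assumption).
        with urllib.request.urlopen(url) as response:
            html = response.read().decode("utf-8", errors="replace")
        return cls.from_html(url, html, logger)
```

Keeping the real work in __init__ and the factories thin means a page built from a request, from HTML or from existing soup ends up in the same state.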

Module contents