Getting started with selenider

This vignette introduces you to the basics of automating a web browser using selenider.

library(selenider)

Starting the session

To use selenider, you first need a session, which you create with selenider_session(). If you don’t create one yourself, one is started automatically the first time it is needed, but you may want to change some of the options from their defaults (the backend, for example). Here, we use chromote as a backend (the default), and we set the timeout to 10 seconds (the default is 4).

session <- selenider_session(
  "chromote",
  timeout = 10
)

The session, once created, will be set as the local session inside the current environment, meaning that in this case, it can be accessed anywhere in this script, and will be closed automatically when the script finishes running.

One thing to remember is that if you start a session inside a function, it will be closed automatically when the function finishes running. If you want to use the session outside the function, you need to use the .env argument. For example, let’s say we want a wrapper function around selenider_session() that always uses selenium:

# Bad (unless you only need to use the session inside the function)
my_selenider_session <- function(...) {
  selenider_session("selenium", ...)
  # The session will be closed here
}

# Good - the session will be open in the caller environment/function
my_selenider_session <- function(..., .env = rlang::caller_env()) {
  selenider_session("selenium", ..., .env = .env)
}

Use open_url() to navigate to a website. selenider also provides the back() and forward() functions to move through your browser history, and the reload() function to reload the current page.

open_url("https://www.r-project.org/")

open_url("https://www.tidyverse.org/")

back()

forward()

reload()

Selecting elements

Use s() to select an element. By default, CSS selectors are used, but other options are available.

header <- s("#rStudioHeader")

header
#> { selenider_element }
#> <div id="rStudioHeader">
#>   \n          <div class="band">\n            <div class="innards bandContent ...
#> </div>

For example, an XPath can be used instead. XPaths can be useful for more complex selectors, and are not limited to selecting from the descendants of the current element. However, they can be difficult to read.

s(xpath = "//div/a")
#> { selenider_element }
#> <a class="productName" href="/">
#>   Tidyverse
#> </a>

Use ss() to select multiple elements.

all_links <- ss("a")

all_links
#> { selenider_elements (25) }
#> [1] <a class="productName" href="/">Tidyverse</a>
#> [2] <a class="menuItem " href="/packages/">Packages</a>
#> [3] <a class="menuItem " href="/blog/">Blog</a>
#> [4] <a class="menuItem " href="/learn/">Learn</a>
#> [5] <a class="menuItem " href="/help/">Help</a>
#> [6] <a class="menuItem " href="/contribute/">Contribute</a>
#> [7] <a href="https://dplyr.tidyverse.org"><img src="/css/images/hex/dplyr.png" al ...
#> [8] <a href="https://ggplot2.tidyverse.org"><img src="/css/images/hex/ggplot2.png ...
#> [9] <a href="https://readr.tidyverse.org"><img src="/css/images/hex/readr.png" al ...
#> [10] <a href="https://forcats.tidyverse.org"><img src="/css/images/hex/forcats.png ...
#> [11] <a href="https://stringr.tidyverse.org"><img src="/css/images/hex/stringr.png ...
#> [12] <a href="https://tibble.tidyverse.org"><img src="/css/images/hex/tibble.png"  ...
#> [13] <a href="https://tidyr.tidyverse.org"><img src="/css/images/hex/tidyr.png" al ...
#> [14] <a href="https://purrr.tidyverse.org"><img src="/css/images/hex/purrr.png" al ...
#> [15] <a href="/packages">collection of R packages</a>
#> [16] <a href="https://r4ds.hadley.nz/" target="_blank" rel="noopener">online</a>
#> [17] <a href="http://amzn.to/2aHLAQ1" target="_blank" rel="noopener">the book</a>
#> [18] <a href="/learn/">resource</a>
#> [19] <a href="http://amzn.to/2aHLAQ1"><img class="bookCover" src="/images/cover.pn ...
#> [20] <a href="/help/#reprex">reprex</a>
#> ...

Use find_element() and find_elements() to find elements inside an existing element. These can be chained with the pipe operator (|>) to specify paths to elements. Just like s() and ss(), a variety of selector types are available, but CSS selectors are used by default.

tidyverse_title <- s("#rStudioHeader") |>
  find_element("div") |>
  find_element(".productName")

tidyverse_title
#> { selenider_element }
#> <a class="productName" href="/">
#>   Tidyverse
#> </a>

menu_items <- s("#rStudioHeader") |>
  find_element("#menu") |>
  find_elements(".menuItem")

menu_items
#> { selenider_elements (5) }
#> [1] <a class="menuItem " href="/packages/">Packages</a>
#> [2] <a class="menuItem " href="/blog/">Blog</a>
#> [3] <a class="menuItem " href="/learn/">Learn</a>
#> [4] <a class="menuItem " href="/help/">Help</a>
#> [5] <a class="menuItem " href="/contribute/">Contribute</a>

Use elem_children() and related functions, such as elem_ancestors(), to find elements by their position relative to another element.

s("#menuItems") |>
  elem_children()
#> { selenider_elements (5) }
#> [1] <a class="menuItem " href="/packages/">Packages</a>
#> [2] <a class="menuItem " href="/blog/">Blog</a>
#> [3] <a class="menuItem " href="/learn/">Learn</a>
#> [4] <a class="menuItem " href="/help/">Help</a>
#> [5] <a class="menuItem " href="/contribute/">Contribute</a>

s("#menuItems") |>
  elem_ancestors()
#> { selenider_elements (8) }
#> [1] <html lang="en-us"><head>\n\n\n\n\n\n<link rel="stylesheet" href="/css/fonts. ...
#> [2] <body>\n    <div id="appTidyverseSite">\n      <div id="main">\n        \n    ...
#> [3] <div id="appTidyverseSite">\n      <div id="main">\n        \n        <div id ...
#> [4] <div id="main">\n        \n        <div id="rStudioHeader">\n          <div c ...
#> [5] <div id="rStudioHeader">\n          <div class="band">\n            <div clas ...
#> [6] <div class="band">\n            <div class="innards bandContent">\n           ...
#> [7] <div class="innards bandContent">\n              <div>\n                <a cl ...
#> [8] <div id="menu">\n  <div id="menuToggler"></div>\n  <div id="menuItems" class= ...

You can use elem_filter() and elem_find() to filter collections of elements using a custom function. elem_find() returns the first matching element, while elem_filter() returns all matching elements. These functions use the same interface as elem_expect(): see the “Expectations” section below.

# Find the blog item in the menu
menu_items |>
  elem_find(has_text("Blog"))
#> { selenider_element }
#> <a class="menuItem " href="/blog/">
#>   Blog
#> </a>

# Find the hex badges on the second row
s(".hexBadges") |>
  find_elements("img") |>
  elem_filter(
    \(x) substring(elem_attr(x, "class"), 1, 2) == "r2"
  )
#> { selenider_elements (3) }
#> [1] <img src="/css/images/hex/ggplot2.png" alt="ggplot2 hex sticker" class="r2 c0">
#> [2] <img src="/css/images/hex/forcats.png" alt="forcats hex sticker" class="r2 c1">
#> [3] <img src="/css/images/hex/tibble.png" alt="tibble hex sticker" class="r2 c2">

Interacting with an element

selenider elements are lazy, meaning that when you specify the path to an element or group of elements, they are not actually located in the DOM until you do something with them.
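To make the laziness concrete, here is a minimal sketch. It assumes a session is already active, as set up earlier in this vignette, and uses a made-up selector (`.does-not-exist`) that is not expected to match anything on the page.

```r
library(selenider)

# `.does-not-exist` is a hypothetical selector chosen so that nothing matches.
# Creating the element does not raise an error, because only the path to the
# element is recorded at this point.
ghost <- s(".does-not-exist")

# A condition forces the lookup. Conditions answer immediately, so this
# should report that the element is not on the page rather than waiting.
is_present(ghost)
```

Note that printing an element also forces it to be collected, which is why the sketch assigns the element to a variable instead of printing it.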

There are three types of functions that force an element to be collected: actions, properties, and conditions. Each type is described in its own section below.

Most functions that act on elements use the elem_ prefix.

Actions

There are various ways to interact with an HTML element.

Use elem_click(), elem_right_click(), or elem_double_click() to click on an element, and elem_hover() to hover over an element. Use elem_scroll_to() to scroll to an element before clicking it, which is useful if the element is not currently in view.

s(".blurb") |>
  find_element("a") |> # List of packages
  elem_scroll_to() |>
  elem_click()

Some links will not work when clicked on, since they will open their content in a new tab. Use open_url() manually to solve this. This approach is recommended over using elem_click(), as it is more reliable.

s(".packages") |>
  find_elements("a") |>
  elem_find(has_text("dplyr")) |> # Find the link to the dplyr documentation
  elem_attr("href") |> # Get the URL
  open_url()

Use elem_set_value() to set the value of an input element, and elem_clear_value() to clear the value.

s("input[type='search']") |>
  elem_set_value("filter")

# Go back to the main page
back()
back()

selenider also provides an elem_submit() function, allowing you to submit an HTML form using any element inside the form.
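As a rough sketch of how clearing, setting and submitting might be combined: the example below assumes the page currently open contains a search input that sits inside a form. The selector and the search term are illustrative only, so adjust them to the page you are automating.

```r
library(selenider)

# Assumption: the current page has a search box inside a <form>.
search_box <- s("input[type='search']")

search_box |>
  elem_clear_value() |>      # remove any text already in the input
  elem_set_value("dplyr") |> # enter the new value
  elem_submit()              # submit the form that contains the input
```

Because elem_submit() works from any element inside the form, the input itself can be used and there is no need to locate a separate submit button.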

Properties

HTML elements have a number of accessible properties.

# Get the tag name
s("#appTidyverseSite") |>
  elem_name()
#> [1] "div"

# Get the text inside the element
s(".tagline") |>
  elem_text()
#> [1] "\n          R packages for data science\n          "

# Get an attribute
s(".hexBadges") |>
  find_element("img") |>
  elem_attr("alt")
#> [1] "dplyr hex sticker"

# Get every attribute
s(".hexBadges") |>
  find_element("img") |>
  elem_attrs()
#> $src
#> [1] "/css/images/hex/dplyr.png"
#> 
#> $alt
#> [1] "dplyr hex sticker"
#> 
#> $class
#> [1] "r1 c0"

# Get the 'value' attribute (`NULL` in this case)
s("#homeContent") |>
  elem_value()
#> NULL

# Get a CSS property
s(".tagline") |>
  elem_css_property("font-size")
#> [1] "36px"

Conditions

Conditions are predicate functions on HTML elements. Unlike all other functions in selenider, they do not wait for the element to exist or for the condition to be met: they return TRUE or FALSE (or throw an error) instantly. For this reason, they are designed to be used with elem_expect() and elem_wait_until(), which will automatically wait for conditions to be met.

There is a wide range of conditions, many of which do the same thing. Each HTML property has a corresponding condition, and selenider also provides conditions for basic checks like is_present(), is_visible() and is_enabled(). In the documentation for any condition, you can find all other conditions in the “See Also” section.

s(".hexBadges") |>
  is_present()
#> [1] TRUE

Expectations

selenider provides a concise testing interface using the elem_expect() function. Provide an element, and one or more conditions, and the function will wait until all the conditions are met. Conditions can be functions or simple calls (e.g. has_text("text") will be turned into has_text(<THE ELEMENT>, "text")). elem_expect() tends to work well with R’s lambda function syntax.

s(".tagline") |>
  elem_expect(is_present) |>
  elem_expect(has_text("data science"))

s(".hexBadges") |>
  find_element("a") |>
  elem_expect(is_visible, is_enabled)

s("#menu") |>
  find_element("#menuItems") |>
  elem_children() |>
  elem_expect(has_at_least(4))

s(".productName") |>
  elem_expect(
    \(x) substring(elem_text(x), 1, 1) == "T" # Tidyverse starts with T
  )

Errors try to give as much information as possible. Since we know this condition is going to fail, we’ll set the timeout to a lower value so we don’t have to wait for too long.

s(".band.first") |>
  find_element(".blurb") |>
  find_element("code") |>
  elem_expect(has_text('install.packages("selenider")'), timeout = 1)
#> Error in `elem_expect()`:
#> ! Condition failed after waiting for 1 seconds:
#> `has_text("install.packages(\"selenider\")")`
#> ℹ `x` does not have text "install.packages(\"selenider\")".
#> ℹ Actual text: "install.packages(\"tidyverse\")".

And (&&), or (||) and not (!) can be used as if the conditions were logical values. Additionally, you can omit the first argument to elem_expect() (but in this case, all conditions must be calls).

s(".random-class") |>
  elem_expect(!is_present)

s(".innards") |>
  elem_expect(is_visible || is_enabled)

elem_1 <- s(".random-class")

elem_2 <- s("#main")

# Test that either the first or second element exists
elem_expect(is_present(elem_1) || is_present(elem_2))

Use elem_wait_until() if you don’t want an error to be thrown when a condition is not met. elem_wait_until() does exactly the same thing as elem_expect(), but returns TRUE or FALSE instead of throwing an error.

elem_wait_until(is_present(elem_1) || is_present(elem_2))
#> [1] TRUE
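Because the result is a plain logical value, it can be used to branch. The sketch below assumes an active session and reuses the `.random-class` selector from above, which is not expected to match anything. The short timeout is an arbitrary choice to keep the wait brief.

```r
library(selenider)

# Same made-up selector as above: nothing on the page is expected to match.
elem_1 <- s(".random-class")

# Wait up to 1 second for the element, then carry on either way.
found <- elem_1 |>
  elem_wait_until(is_present, timeout = 1)

if (found) {
  message("Found .random-class")
} else {
  message("No .random-class on this page; carrying on")
}
```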

The syntax used for elem_expect() and elem_wait_until() can also be used in elem_filter() and elem_find() to filter element collections. Additionally, selenider provides elem_expect_all() and elem_wait_until_all() to test a condition on every element in a collection.

s(".hexBadges") |>
  find_elements("a") |>
  elem_expect_all(is_visible)

Once we are done, we do not need to close the session; it is closed for us automatically!
