Fantastic Web Elements and how to find them — a guide to locate web elements for testing and web scraping

Mkhitar Mkrtchyan
Mar 1, 2021

Preface (skip this section if you already know why you need to locate web elements): Locating web elements is needed for testing (regardless of the framework), web scraping, and other tasks. You may think that locating the right element is as easy as a walk in the park, but depending on the architecture, rendering logic, and application flow, elements can become unstable (even mercurial). Another reason to organize your locators well is to keep your code optimized, elegant, and readable.

Here I’ve collected ways of targeting web elements. The methods are grouped by difficulty level: basic, intermediate, and advanced.

Locating element by CSS

CSS selectors are the fastest, most efficient, and most widely used way of locating elements. They offer a wide variety of syntaxes:

Basic

  • .classname: selects all elements with the given class name, e.g. .titleColumn
  • .classname1.classname2: selects elements that have both class names, e.g. for the element <a class="topRated anime"> the selector is .topRated.anime (no space between).
  • .classname1 .classname2: with a space between, selects elements with classname2 that are descendants (at any depth) of an element with classname1
  • #id: e.g. the element <a id="email"> can be found as #email
  • element: by tag name, e.g. h2, p, input, etc.
  • element.class: e.g. h2.title selects h2 elements that have the class title
  • element1, element2: e.g. p, div, td selects all p, div, and td elements
  • element1 element2: e.g. div a selects all “a” elements nested inside a div at any depth (children, grandchildren, and so on)
  • element1>element2: selects element2 only when it is a direct child of element1, e.g. div>a
  • element1 + element2: selects element2 when it is placed immediately after element1 and both share the same parent
  • [attribute]: selects all elements that have the given attribute, e.g. [aria-label]
  • [attribute="value"]: e.g. [data-testid="email"]

The last one is the most common way of locating web elements by CSS.

Intermediate (Locating by part of the value)

The [attribute="value"] form can be extended to target elements by only part of the value.

  • [attribute~="val"]: the attribute’s value is a whitespace-separated list of words and one of them is exactly val, e.g. for <a data-testid="id for testing link"> the locator can be [data-testid~="for"] or any other word of the value. Only whitespace separates words here; a period, hyphen, or underscore does not.
  • [attribute*="val"]: also “contains”, but unlike ~=, val can be any substring, including part of a word. For instance, <a data-testid="id for testing link"> can be located by [data-testid*="sting"], which is part of the word testing
  • [attribute|="val"]: the attribute’s value is exactly val or starts with val immediately followed by a hyphen, e.g. <a data-testid="id-for-testing-link"> can be located by [data-testid|="id"]. A value such as "id for testing link" does not match, because a space follows id.
  • [attribute^="val"]: also “starts with”, but here val does not have to be a whole word; it can be any prefix, e.g. the element <div data-testid="itemKZ98876654"> can be located by [data-testid^="item"]
  • [attribute$="val"]: ends with val, e.g. <a data-testid="id for testing link"> can be located by [data-testid$="link"]
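
To make the differences between the five partial-match operators concrete, here is a minimal sketch that restates their matching rules as plain string checks. It is not how a browser or Selenium evaluates selectors; the class and method names are made up for this illustration.

```java
class AttributeOperatorSketch {
    // [attr~="val"]: val equals one of the whitespace-separated words.
    static boolean includesWord(String value, String val) {
        if (val.isEmpty() || val.chars().anyMatch(Character::isWhitespace)) {
            return false; // an empty or multi-word val never matches
        }
        for (String word : value.trim().split("\\s+")) {
            if (word.equals(val)) {
                return true;
            }
        }
        return false;
    }

    // [attr*="val"]: val appears anywhere in the value.
    static boolean containsSubstring(String value, String val) {
        return !val.isEmpty() && value.contains(val);
    }

    // [attr|="val"]: the value is exactly val, or starts with val + "-".
    static boolean dashMatch(String value, String val) {
        return value.equals(val) || value.startsWith(val + "-");
    }

    // [attr^="val"]: the value starts with val.
    static boolean prefix(String value, String val) {
        return !val.isEmpty() && value.startsWith(val);
    }

    // [attr$="val"]: the value ends with val.
    static boolean suffix(String value, String val) {
        return !val.isEmpty() && value.endsWith(val);
    }

    public static void main(String[] args) {
        String testId = "id for testing link";
        System.out.println(includesWord(testId, "for"));            // true
        System.out.println(includesWord(testId, "sting"));          // false: not a whole word
        System.out.println(containsSubstring(testId, "sting"));     // true
        System.out.println(dashMatch("id-for-testing-link", "id")); // true
        System.out.println(dashMatch(testId, "id"));                // false: a space follows "id"
        System.out.println(prefix("itemKZ98876654", "item"));       // true
        System.out.println(suffix(testId, "link"));                 // true
    }
}
```

The pair worth remembering is ~= versus *=: the first needs a whole word, the second accepts any fragment.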

Locating element by class

We mostly locate elements by class name to collect a list of elements. The syntax for finding them in the DOM is simple: just write .className. For example, to collect all film titles in the IMDB Top 250, search for .titleColumn (same as in CSS; the difference is the method used in the code, e.g. getElementsByClassName() vs querySelectorAll() in JavaScript).

Searching for elements with class name .titleColumn

In your code, you need to use a method dedicated to getting by class name, for example in JS document.getElementsByClassName("className"), as shown in the screenshot below.

All located elements are enclosed within one array and can be easily accessed

It returns an array-like object with 250 entries, and you can access the desired one by index: elementsWithQueriedClassName[0]

In Selenium (Java), you should use driver.findElements(By.className("classname")), which returns a List of elements. No matter the framework or language, the outcome is similar…

Locating element by id

To find an element by id in the console, type # before the id: #myId (same as in CSS; the difference is in the method used). In the code, use the methods dedicated to ids: .getElementById("myId"), By.id("myId"), etc.

Easy, right? But expectations break apart when they face reality… Here are 2 inconveniences:

  1. You have to be lucky enough to have well-organized, consistent class names and ids on all elements
  2. In both of these cases there are no advanced ways to locate elements, unlike with XPath or CSS selectors, i.e. only the basic level is available

Locating element by XPath (XML path)

XPath is not used as frequently nowadays, mostly because it is slower than CSS selectors, but it offers options for selecting an element precisely that no other means has. Thus, for hard-to-locate or unstable elements, the performance tradeoff is acceptable. Let’s start with the basic use of XPath.

Basic

The syntax for finding a web element by XPath is very straightforward:

  • /: a slash navigates from a parent to a child element, e.g. /html/body/div
  • [ ]: indicates the 1-based position of the matching element, e.g. /html/body/div[4] (the fourth div child of the body)
  • [@attribute="value"]: filters nodes by an attribute, e.g. [@id="body"] (the ="value" part is optional: [@id] matches any node that has an id attribute)
  • //: selects matching nodes anywhere in the document, at any depth, by tag name, e.g. //tr[8]
  • //*: the same, but * matches any tag, so the node is usually picked by attribute, e.g. //*[@id="body"] (find any (*) element that has an “id” attribute with the value “body”)
  • ..: one of XPath’s unique features, which allows you to navigate from a child to its parent element. This is very handy if you have a relative path or code that generates the path.
Navigating to parent elements from relative XPath

We have just mentioned the relative path. Depending on the starting point, we distinguish 2 types of XPath:

  • Absolute: starts from the root of the document, e.g. the rating of the first movie in the IMDB Top 250 is /html/body/div[3]/div/div[2]/div[3]/div/div[1]/div/span/div/div/div[3]/table/tbody/tr[1]/td[3]/strong

  • Relative: starts from an element that has a unique identifier. For the same element, the relative XPath would be

//*[@id="main"]/div/span/div/div/div[3]/table/tbody/tr[1]/td[3]/strong

As you can see, the highlighted div element has a unique id attribute, and starting from it the desired element can be located unequivocally.
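
The basic syntax, and the difference between an absolute and a relative path, can be tried outside a browser. The sketch below uses the XPath 1.0 engine from the Java standard library (javax.xml.xpath) on a tiny, made-up, well-formed document; the class name, the markup, and the ids are assumptions for illustration only. In Selenium the same expression strings would typically be passed to By.xpath(...).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathBasics {
    // Hypothetical stand-in for a real page.
    static final String PAGE =
        "<html><body>"
        + "<div id='header'/>"
        + "<div id='main'><table id='list'>"
        + "<tr id='row1'><td id='cell1'>first</td></tr>"
        + "<tr id='row2'><td id='cell2'>second</td></tr>"
        + "</table></div>"
        + "</body></html>";

    // Returns the id attribute of every element matched by the expression.
    static List<String> ids(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ids("/html/body/div"));                   // [header, main]
        System.out.println(ids("/html/body/div[2]"));                // [main]
        System.out.println(ids("//*[@id='main']"));                  // [main]
        System.out.println(ids("//tr[2]"));                          // [row2]
        System.out.println(ids("//td[@id='cell2']/.."));             // [row2]
        // Absolute and relative paths to the same cell:
        System.out.println(ids("/html/body/div[2]/table/tr[1]/td")); // [cell1]
        System.out.println(ids("//*[@id='main']/table/tr[1]/td"));   // [cell1]
    }
}
```

The relative path keeps working if another div is added before the one with id "main", while the absolute path would then point at the wrong element.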

Intermediate (Predicates & more)

Let’s start the intermediate use of XPath with the index of an element (div[5]). In some cases we may not know the exact number, so the following come to help:

  • last(): selects the last of the queried elements. In the case of the IMDB Top 250, the following relative path finds the last row of the list: //*[@class="lister-list"]/tr[last()]. last() evaluates to a number, so you can easily do arithmetic on it, e.g. …/tr[last() - 8]. More about operators in the advanced section
  • position(): mostly used to locate a list of elements by comparing their position, e.g. //*[@class="lister-list"]/tr[position()<101] locates the first 100 films of the top, and …/tr[position()>200] the last 50 movies. You can use the equals sign as well, but it is pointless, because …/tr[position()=10] is the same as …/tr[10]
  • @attribute: selects an element that has a specific attribute, e.g. …/div[@data-testid]
  • @attribute and value: selects an element that has a specific attribute with a specific value, e.g. …/div[@data-testid="customId"]
  • @attribute, value, and logic: …/input[@value>90] or …/div[@tabindex<1]; this is applicable to any attribute that has a numeric value.
  • text(): matches elements by their exact text and is useful in countless cases: //tag[text()="text of the element"], e.g. //a[text()="The Godfather"]
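
The predicates above can be checked on a five-row stand-in for the list. This is a sketch using the JDK's javax.xml.xpath (XPath 1.0); the markup, ids, and input values are invented for the example and are not taken from the real IMDB page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathPredicates {
    // Hypothetical five-row list plus two inputs with numeric values.
    static final String PAGE =
        "<body><table class='lister-list'>"
        + "<tr id='r1'><td><a>The Shawshank Redemption</a></td></tr>"
        + "<tr id='r2'><td><a>The Godfather</a></td></tr>"
        + "<tr id='r3' data-testid='customId'><td><a>The Dark Knight</a></td></tr>"
        + "<tr id='r4'><td><a>12 Angry Men</a></td></tr>"
        + "<tr id='r5'><td><a>Pulp Fiction</a></td></tr>"
        + "</table>"
        + "<input id='i1' value='95'/><input id='i2' value='80'/>"
        + "</body>";

    // Returns the id attribute of every element matched by the expression.
    static List<String> ids(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ids("//*[@class='lister-list']/tr[last()]"));     // [r5]
        System.out.println(ids("//*[@class='lister-list']/tr[last() - 1]")); // [r4]
        System.out.println(ids("//tr[position() < 3]"));                     // [r1, r2]
        System.out.println(ids("//tr[position() > 3]"));                     // [r4, r5]
        System.out.println(ids("//tr[@data-testid]"));                       // [r3]
        System.out.println(ids("//tr[@data-testid='customId']"));            // [r3]
        System.out.println(ids("//input[@value > 90]"));                     // [i1]
        // text() finds the link; ../.. climbs from a to td to tr.
        System.out.println(ids("//a[text()='The Godfather']/../.."));        // [r2]
    }
}
```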

XPath also lets you select many nodes at once, although in my opinion these forms are not widely used.

  • //@*: selects all attribute nodes in the document (to select the elements that have at least one attribute, use //*[@*]).
  • //*: all element nodes at any depth, nested ones included
  • //node(): all nodes of any type at any depth (elements, text, comments), not only elements
  • By tag, //div[@*]: all divs that have at least one attribute.
  • “|” as union: if you want to select multiple elements by different XPaths, you can combine them with |, e.g. //div[@role="button"] | //a[@href] selects every element that matches either path
The number of found elements indicates how vast the coverage is
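
Counting the matches on a small document shows how broad these wide selectors are. The following is a sketch with the JDK's javax.xml.xpath (XPath 1.0); the markup and the class name are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathWideSelectors {
    // Hypothetical markup: 6 elements, 5 attributes, 3 text nodes.
    static final String PAGE =
        "<body>"
        + "<div id='d1' role='button'><a id='a1' href='/x'>x</a></div>"
        + "<div id='d2'><a>y</a></div>"
        + "<div>z</div>"
        + "</body>";

    // Returns how many nodes the expression selects.
    static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//@*"));       // 5 attribute nodes
        System.out.println(count("//*"));        // 6 elements, nested ones included
        System.out.println(count("//node()"));   // 9 = 6 elements + 3 text nodes
        System.out.println(count("//div[@*]"));  // 2 divs with at least one attribute
        // Union: nodes matched by either path.
        System.out.println(count("//div[@role='button'] | //a[@href]")); // 2
    }
}
```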

Another marvelous feature of XPath lets us locate elements even when part of the locator changes dynamically. For example, you need to locate an element by data-testid, but part of the value is not constant, e.g. <… data-testid="film-cRnSxT4ODC"> and <… data-testid="film-KCIZVMRRJ4"> are the same kind of element. Here XPath offers two ways of locating.

  • starts-with: //*[starts-with(@attributeName, "value")], e.g. //*[starts-with(@data-testid, "film")]
  • contains: //*[contains(@attributeName, "value")]; in this case the value can be anywhere, even in the middle of a word.

Note that in both cases the value is case sensitive.
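
A quick way to see starts-with, contains, and the case sensitivity side by side is the sketch below, again using the JDK's javax.xml.xpath (XPath 1.0). The two "film-…" values come from the example above; the other two values, the ids, and the class name are invented for contrast.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDynamicValues {
    // Hypothetical list items with partly dynamic data-testid values.
    static final String PAGE =
        "<ul>"
        + "<li id='f1' data-testid='film-cRnSxT4ODC'/>"
        + "<li id='f2' data-testid='film-KCIZVMRRJ4'/>"
        + "<li id='a1' data-testid='actor-film-1'/>"
        + "<li id='u1' data-testid='Film-upper'/>"
        + "</ul>";

    // Returns the id attribute of every element matched by the expression.
    static List<String> ids(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Prefix match only.
        System.out.println(ids("//*[starts-with(@data-testid, 'film')]")); // [f1, f2]
        // Substring match: also finds 'actor-film-1'.
        System.out.println(ids("//*[contains(@data-testid, 'film')]"));    // [f1, f2, a1]
        // Case matters: only the capitalized value matches.
        System.out.println(ids("//*[starts-with(@data-testid, 'Film')]")); // [u1]
    }
}
```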

Advanced (Logic, axes & more)

Here I’ll try to reveal the whole might of XPath. XPath allows us to use operators for numerical and logical operations:

Numerical operators that return a numeric value: + (addition), - (subtraction), * (multiplication), div (division), mod (remainder)

Logical operators that return true/false: = (equal), != (not equal), < (less than), <= (less than or equal), > (greater than), >= (greater than or equal)

Besides the logical operators mentioned above, there are also and and or, e.g.:

//input[@class="rating" and @value>7] selects the element(s) for which the whole expression is true

//a[text()="The Godfather" or text()="12 Angry Men"] selects both items
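
The two expressions above, plus a couple of arithmetic ones, can be tried with the sketch below. It uses the JDK's javax.xml.xpath (XPath 1.0); the markup, ids, values, and class name are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathOperators {
    // Hypothetical inputs with numeric values and a few links.
    static final String PAGE =
        "<form>"
        + "<input id='i1' class='rating' value='9'/>"
        + "<input id='i2' class='rating' value='6'/>"
        + "<input id='i3' class='other' value='8'/>"
        + "<a id='a1'>The Godfather</a>"
        + "<a id='a2'>12 Angry Men</a>"
        + "<a id='a3'>Pulp Fiction</a>"
        + "</form>";

    // Returns the id attribute of every element matched by the expression.
    static List<String> ids(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Both conditions must hold.
        System.out.println(ids("//input[@class='rating' and @value > 7]")); // [i1]
        // Either condition is enough.
        System.out.println(
            ids("//a[text()='The Godfather' or text()='12 Angry Men']"));   // [a1, a2]
        // Arithmetic inside a predicate: even values only.
        System.out.println(ids("//input[@value mod 2 = 0]"));               // [i2, i3]
        System.out.println(ids("//input[@value + 1 >= 9]"));                // [i1, i3]
    }
}
```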

And now axes… But first, what is an axis? An axis represents a relationship to the context (current) node.

For example, you have an XPath but need to find another element relative to it.

  • ancestor: //a[@id="email"]/ancestor::div selects all the div elements that are ancestors (parent, grandparent, and so on) of the given element
  • ancestor-or-self: //a[@id="email"]/ancestor-or-self::div selects all ancestor div elements plus the current element itself (if it’s a div)
  • child: //a[@id="email"]/child::div selects the direct child div elements
  • descendant: //a[@id="email"]/descendant::div selects all descendant div elements (children, grandchildren, and so on)
  • descendant-or-self: //a[@id="email"]/descendant-or-self::div selects all descendant div elements plus the given element itself (if it’s a div)
  • following: //a[@id="email"]/following::div selects all div elements that start after the closing tag (/> or </a> in this case) of the current node
  • following-sibling: //a[@id="email"]/following-sibling::div selects the div elements that come after the node on the same level (same parent).
  • parent: //a[@id="email"]/parent::div selects the parent div element. If the parent isn’t a div, it selects nothing.
  • preceding: //a[@id="email"]/preceding::div selects all div elements that end before the node starts, so its ancestors are not included
  • preceding-sibling: //a[@id="email"]/preceding-sibling::div selects all div “siblings” (elements that have the same parent) that come before the node.
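
To see what each axis picks up, the sketch below wraps the “email” link in a small, invented tree of divs and evaluates the axes with the JDK's javax.xml.xpath (XPath 1.0). The markup, the ids, and the class name are assumptions; the results are collected into a sorted set, so they are printed alphabetically and not in document order.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathAxes {
    // Hypothetical tree around the context node <a id='email'>.
    static final String PAGE =
        "<div id='outer'>"
        + "<div id='before'/>"
        + "<div id='wrap'>"
        + "<div id='prev'/>"
        + "<a id='email'><div id='inner'><div id='deep'/></div></a>"
        + "<div id='next'/>"
        + "<div id='next2'/>"
        + "</div>"
        + "<div id='after'/>"
        + "</div>";

    // Returns the ids of the divs found on the given axis of the link.
    static Set<String> divsOn(String axis) {
        String expr = "//a[@id='email']/" + axis + "::div";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            Set<String> result = new TreeSet<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        String[] axes = {
            "ancestor", "ancestor-or-self", "child", "descendant",
            "descendant-or-self", "following", "following-sibling",
            "parent", "preceding", "preceding-sibling"
        };
        for (String axis : axes) {
            System.out.println(axis + " -> " + divsOn(axis));
        }
    }
}
```

Note how preceding returns “before” and “prev” but neither “wrap” nor “outer”: ancestors contain the link, so they do not end before it starts.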

Bonus. If the element vanishes on click or when the mouse moves away (it can be a tooltip, a dropdown option container, a description, a toast, etc.), you can freeze its state by putting a breakpoint on the element.

For example, the “New Message” tooltip on the Facebook page.

Once you move the mouse away, the tooltip disappears. To locate it, you need to put a breakpoint.

Notice that the attributes of the element change on hover; Chrome indicates this by highlighting. A three-dot menu appears on the left side of the element. By clicking on it, you can find “Break on”.

In this case, hovering over the element modifies an attribute, but if you’re not sure which kind of change happens, check all of the options. The next time you hover over the element (when the attribute changes), the page will freeze in its current state.

Then you can easily find the element you want in any of the numerous ways described above.
