DOM Detection

DOM detection is implemented in packages/page-controller/src/dom/ and called by PageController.updateTree().

  1. PageController.getBrowserState() calls updateTree().
  2. updateTree() calls dom.getFlatTree().
  3. flatTreeToString() converts the flat DOM map into simplified text.
  4. getSelectorMap() stores highlightIndex -> InteractiveElementDomNode.
  5. DOM actions use getElementByIndex(selectorMap, index).
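
The index lookup in steps 4 and 5 can be pictured as a map keyed by highlightIndex. The sketch below is illustrative only: SketchNode, buildSelectorMap, and lookupByIndex are hypothetical names, and throwing on an unknown index is an assumption rather than documented behavior.

```typescript
// Hypothetical, simplified node shape; the real InteractiveElementDomNode carries more fields.
interface SketchNode {
  tagName: string;
  highlightIndex?: number; // present only on interactive nodes
}

// Collect every node that was assigned a highlightIndex into a lookup map.
function buildSelectorMap(nodes: SketchNode[]): Map<number, SketchNode> {
  const selectorMap = new Map<number, SketchNode>();
  for (const node of nodes) {
    if (node.highlightIndex !== undefined) {
      selectorMap.set(node.highlightIndex, node);
    }
  }
  return selectorMap;
}

// Resolve an index taken from the simplified text back to its node.
// Assumption: an index that is not in the map is reported as an error.
function lookupByIndex(selectorMap: Map<number, SketchNode>, index: number): SketchNode {
  const node = selectorMap.get(index);
  if (node === undefined) {
    throw new Error(`No interactive element at index ${index}`);
  }
  return node;
}
```

Because the map is rebuilt on each tree update, an index is only meaningful against the state it was read from.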

The extension content script constructs PageController with:

new PageController({
  enableMask: false,
  viewportExpansion: 400,
})

The scanner fast-paths these native tags:

a, button, input, select, textarea, details, summary, label

The broader interactive element set also includes:

option, optgroup, fieldset, legend

It also considers an element interactive when it has:

  • An interactive computed cursor, such as pointer, text, grab, move, zoom-in, zoom-out, or one of the resize cursors
  • contenteditable="true" or element.isContentEditable
  • ARIA roles including button, menu, menubar, menuitem, menuitemradio, menuitemcheckbox, radio, checkbox, tab, switch, slider, spinbutton, combobox, searchbox, textbox, listbox, option, and scrollbar
  • Attributes such as onclick, role, tabindex, data-action, or interactive ARIA attributes like aria-expanded, aria-checked, aria-selected, aria-pressed, aria-haspopup, aria-controls, and aria-owns
  • Event listener signals for click, mousedown, mouseup, keydown, keyup, submit, change, input, focus, or blur
  • Scrollability with meaningful scroll distance
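
The signals above can be combined into a single predicate. The following is a minimal sketch that works on a plain snapshot object instead of a live DOM element. ElementSignals and looksInteractive are hypothetical names, the order of the checks is an assumption, matching resize cursors by the "-resize" suffix is an assumption, and the scrollability check is left out.

```typescript
// Hypothetical snapshot of the signals listed above; not the scanner's real input type.
interface ElementSignals {
  tagName: string;
  cursor?: string;            // computed cursor value
  role?: string;              // value of the role attribute
  isContentEditable?: boolean;
  attributeNames?: string[];  // names of attributes present on the element
  listenerTypes?: string[];   // event types with detected listeners
}

const NATIVE_TAGS = new Set([
  "a", "button", "input", "select", "textarea", "details", "summary", "label",
  "option", "optgroup", "fieldset", "legend",
]);
const CURSORS = new Set(["pointer", "text", "grab", "move", "zoom-in", "zoom-out"]);
const ROLES = new Set([
  "button", "menu", "menubar", "menuitem", "menuitemradio", "menuitemcheckbox",
  "radio", "checkbox", "tab", "switch", "slider", "spinbutton", "combobox",
  "searchbox", "textbox", "listbox", "option", "scrollbar",
]);
const SIGNAL_ATTRIBUTES = new Set([
  "onclick", "role", "tabindex", "data-action", "aria-expanded", "aria-checked",
  "aria-selected", "aria-pressed", "aria-haspopup", "aria-controls", "aria-owns",
]);
const LISTENER_TYPES = new Set([
  "click", "mousedown", "mouseup", "keydown", "keyup",
  "submit", "change", "input", "focus", "blur",
]);

// Returns true as soon as any one signal matches.
function looksInteractive(el: ElementSignals): boolean {
  if (NATIVE_TAGS.has(el.tagName.toLowerCase())) return true;
  const cursor = el.cursor ?? "";
  if (CURSORS.has(cursor) || cursor.endsWith("-resize")) return true;
  if (el.isContentEditable === true) return true;
  if (el.role !== undefined && ROLES.has(el.role.toLowerCase())) return true;
  if ((el.attributeNames ?? []).some((name) => SIGNAL_ATTRIBUTES.has(name))) return true;
  return (el.listenerTypes ?? []).some((type) => LISTENER_TYPES.has(type));
}
```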

For elements, isElementVisible() requires:

element.offsetWidth > 0
element.offsetHeight > 0
getComputedStyle(element).visibility !== 'hidden'
getComputedStyle(element).display !== 'none'
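
Taken together, the four conditions amount to a predicate like the one below. This is a sketch under stated assumptions: isVisibleSketch is a hypothetical name, and the style lookup is passed in as a function so the example runs without a browser.

```typescript
// Structural stand-ins for the DOM types, so the sketch runs outside a browser.
interface SizedElement {
  offsetWidth: number;
  offsetHeight: number;
}
interface StyleSnapshot {
  visibility: string;
  display: string;
}

// All four conditions must hold for the element to count as visible.
function isVisibleSketch<E extends SizedElement>(
  element: E,
  getStyle: (el: E) => StyleSnapshot,
): boolean {
  if (element.offsetWidth <= 0 || element.offsetHeight <= 0) return false;
  const style = getStyle(element);
  return style.visibility !== "hidden" && style.display !== "none";
}
```

In a page context, getStyle would typically be (el) => getComputedStyle(el).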

For text nodes, the code measures the node with Range.getClientRects() and requires a rect with non-zero width and height, viewport intersection, and a visible parent. When viewportExpansion === -1, text nodes still require parent visibility, checked through checkVisibility({ checkOpacity: true, checkVisibilityCSS: true }) or a computed-style fallback.

Top-element checks use document.elementFromPoint() at the center, top-left, and bottom-right sample points. Elements in iframes are treated as top elements by default.
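
The three sample points can be derived from an element's bounding box. The sketch below is hypothetical throughout: samplePoints and isTopElementSketch are illustrative names, the inset parameter is an assumption, and treating a single matching point as sufficient is an assumption. The hit test is injected so the example runs without document.elementFromPoint().

```typescript
interface Point {
  x: number;
  y: number;
}
interface Box {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Center, top-left, and bottom-right, with the corners optionally pulled inward.
function samplePoints(box: Box, inset = 0): Point[] {
  return [
    { x: (box.left + box.right) / 2, y: (box.top + box.bottom) / 2 },
    { x: box.left + inset, y: box.top + inset },
    { x: box.right - inset, y: box.bottom - inset },
  ];
}

// Assumption: the element counts as "top" if any sample point hits it or a descendant.
function isTopElementSketch<T>(
  target: T,
  box: Box,
  hitTest: (p: Point) => T | null,
  contains: (ancestor: T, node: T) => boolean,
  inset = 0,
): boolean {
  return samplePoints(box, inset).some((p) => {
    const hit = hitTest(p);
    return hit !== null && contains(target, hit);
  });
}
```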

Viewport checks use getClientRects() or getBoundingClientRect() and compare top, bottom, left, and right against the configured viewportExpansion.
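
The comparison against viewportExpansion can be read as widening the viewport by that many pixels on every side. A minimal sketch follows, assuming that a value of -1 disables the viewport filter entirely; that reading is inferred from the text-node note above and is not confirmed by it. isInExpandedViewport is a hypothetical name.

```typescript
interface RectLike {
  top: number;
  bottom: number;
  left: number;
  right: number;
}
interface ViewportSize {
  width: number;
  height: number;
}

// True when the rect overlaps the viewport grown by viewportExpansion pixels per side.
function isInExpandedViewport(
  rect: RectLike,
  viewport: ViewportSize,
  viewportExpansion: number,
): boolean {
  if (viewportExpansion === -1) return true; // assumption: -1 means "no viewport filter"
  return (
    rect.bottom >= -viewportExpansion &&
    rect.top <= viewport.height + viewportExpansion &&
    rect.right >= -viewportExpansion &&
    rect.left <= viewport.width + viewportExpansion
  );
}
```

With viewportExpansion: 400 and an 800 px tall viewport, an element whose top edge sits 1100 px from the top of the viewport is still included, while one at 1300 px is not.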

flatTreeToString() emits interactive elements with a numeric index:

*[0]<input type=search placeholder=Search />
[1]<button aria-label=Submit>Search />
[2]<div role=listbox data-scrollable="top=0, bottom=420" />

* marks a newly observed interactive element. Attributes are filtered to a default allowlist:

title, type, checked, name, role, value, placeholder, data-date-format,
alt, aria-label, aria-expanded, data-state, aria-checked, id, for,
target, aria-haspopup, aria-controls, aria-owns, contenteditable

Long attribute values are truncated to 20 characters. Duplicate attribute values longer than five characters are removed, and redundant aria-label, placeholder, or title values are removed when they match the visible text.
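
The filtering rules can be sketched as one pass over the allowlist. This is an illustration, not the flatTreeToString() implementation: filterAttributes is a hypothetical name, and the sketch assumes that the first duplicate in allowlist order is kept, that the match against visible text ignores case and surrounding whitespace, and that truncation is a plain cut with no marker appended.

```typescript
const DEFAULT_ATTRIBUTES = [
  "title", "type", "checked", "name", "role", "value", "placeholder", "data-date-format",
  "alt", "aria-label", "aria-expanded", "data-state", "aria-checked", "id", "for",
  "target", "aria-haspopup", "aria-controls", "aria-owns", "contenteditable",
];
const LABEL_LIKE = new Set(["aria-label", "placeholder", "title"]);

function filterAttributes(
  attrs: Record<string, string>,
  visibleText: string,
  maxLength = 20,
): Record<string, string> {
  const result: Record<string, string> = {};
  const seenLongValues = new Set<string>();
  const text = visibleText.trim().toLowerCase();

  for (const name of DEFAULT_ATTRIBUTES) {
    const value: string | undefined = attrs[name];
    if (value === undefined) continue; // attribute not present

    // Drop label-like attributes that only repeat the visible text.
    if (LABEL_LIKE.has(name) && value.trim().toLowerCase() === text) continue;

    // Drop repeated values, but only when they are longer than five characters.
    if (value.length > 5) {
      if (seenLongValues.has(value)) continue;
      seenLongValues.add(value);
    }

    // Cut long values down to maxLength characters.
    result[name] = value.length > maxLength ? value.slice(0, maxLength) : value;
  }
  return result;
}
```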

The page controller returns:

interface BrowserState {
  url: string
  title: string
  header: string
  content: string
  footer: string
}

header includes the current page title, URL, viewport size, total page size, page position, and a scroll hint. content is the simplified HTML produced by flatTreeToString(). footer contains either an end-of-page marker or a scroll hint giving the number of pixels below the viewport.
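
A consumer might join the three text fields into a single block before handing the state to a model. The sketch below is one hypothetical way to do that. renderBrowserState and BrowserStateLike are illustrative names, and the newline separator is an assumption.

```typescript
// Mirrors the fields of BrowserState shown above.
interface BrowserStateLike {
  url: string;
  title: string;
  header: string;
  content: string;
  footer: string;
}

// Join header, content, and footer in order, skipping parts that are empty.
function renderBrowserState(state: BrowserStateLike): string {
  return [state.header, state.content, state.footer]
    .filter((part) => part.trim() !== "")
    .join("\n");
}
```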