(According to its author. Pinch of salt required.)
Reading this short document is enough to cover every aspect of the API.
Be sure to check out the API documentation
Yes, the best webdriver is hosted on github
Yes, it's also available on NPM
- Slim code: 817 lines of code and 7 active classes, compared to the selenium-webdriver's 5654 lines of code and 92 classes
- 100% W3C's webdriver compliant. The code only ever makes pure webdriver calls
- (Having said that) Compatibility layer for specific browsers, in order to fix mistakes and gaps in drivers' implementations
- Well documented API which comes with a simple quickstart guide
- The API is async/await friendly. Each call returns a promise. Development is a breeze
- Easy to debug. There is a 1:1 mapping between calls and the webdriver protocol, without trickery
- Simple system to define sequences of webdriver UI actions
First of all, install best-webdriver using NPM:
npm install --save best-webdriver
Also, make sure you install at least one webdriver (for example, chromedriver) on your computer.
Once you are done, you are pretty much ready to go.
To open up a driver, simply run:
;(async () => {
  try {
    const { drivers, Config, Actions } = require('best-webdriver')
    // Create a new driver object, using the Chrome browser
    var driver = new drivers.ChromeDriver(new Config())
    // Create a new session. This will also run `chromedriver` for you
    await driver.newSession()
    // ...add more code here
    // This is where code from this guide will live
  } catch (e) {
    console.log('ERROR:', e)
  }
})()
If everything goes well, you will see a Chrome window appear. Note that the (async () => { wrapper is there to make sure that you can use await.
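The wrapper pattern can be seen in isolation in the following minimal sketch. It does not use best-webdriver at all: fakeNewSession is a hypothetical stand-in for any promise-returning call, such as driver.newSession().

```javascript
// Hypothetical stand-in for a promise-returning call such as driver.newSession()
function fakeNewSession () {
  return new Promise((resolve) => setTimeout(() => resolve('session-ready'), 10))
}

// In a CommonJS script, `await` is only valid inside an async function,
// so the code is wrapped in an async function that is invoked immediately
const done = (async () => {
  try {
    const result = await fakeNewSession()
    console.log('Result:', result)
    return result
  } catch (e) {
    console.log('ERROR:', e)
  }
})()
```

The try/catch inside the wrapper matters: without it, a rejected promise would surface as an unhandled rejection rather than as a readable error message.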
The role of the Chrome-specific Driver here is:
- To provide a way to execute Chrome's webdriver command
- To provide a software layer around Chrome's own limitations or mistakes in implementing the W3c protocol
Please note that in this guide it will always be assumed that the code is placed in // ...add more code here, and that the async function, require and session creation won't be repeated.
Understanding how sessions are created is crucial. This section explains the config object itself (and helper methods), creating a session without spawning a webdriver process, and creating a session with the generic Driver.
Most of the time, especially when you are just starting with webdrivers, you tend to use APIs such as this one for one specific browser's webdriver. Most APIs (including this one) will spawn a Chrome webdriver process, for example, when you create a new session using the ChromeDriver:
var driver = new drivers.ChromeDriver(new Config())
At this point, no process is spawned yet. However, when you run:
await driver.newSession()
The driver will, by default, use its own run() method to spawn a chromedriver process, and will then connect to it and create a new browsing session.
You can use any one of the drivers available: ChromeDriver, FirefoxDriver, SafariDriver, EdgeDriver.
The basic configuration is pretty empty. To see it:
var config = new Config()
var params = config.getSessionParameters()
console.log('Session parameters:', require('util').inspect(params, { depth: 10 } ))
This will display the session parameters created by default. You will see:
{
  capabilities: {
    alwaysMatch: {
      'goog:chromeOptions': { w3c: true }
    },
    firstMatch: []
  }
}
It's important that you understand the configuration object:
- It must have a capabilities key
- Under capabilities, it must have the keys alwaysMatch (an object) and firstMatch (an array)
- It may have more keys in the object's root namespace
goog:chromeOptions (under alwaysMatch) represents Chrome-specific options. In this case, w3c: true is specified in order to use Chrome with this API (since this API implements webdriver in its pure form, you need Chrome to use the W3C protocol as much as possible).
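The shape rules listed above can be expressed as a small check. This is only a sketch: checkSessionParameters is a hypothetical helper written for this guide, not something best-webdriver provides.

```javascript
// Hypothetical helper (not part of best-webdriver): checks the shape
// described above and returns a list of problems (empty if none)
function checkSessionParameters (params) {
  const problems = []
  const caps = params && params.capabilities
  if (!caps || typeof caps !== 'object') {
    problems.push('missing "capabilities" key')
    return problems
  }
  const am = caps.alwaysMatch
  if (!am || typeof am !== 'object' || Array.isArray(am)) {
    problems.push('"capabilities.alwaysMatch" must be an object')
  }
  if (!Array.isArray(caps.firstMatch)) {
    problems.push('"capabilities.firstMatch" must be an array')
  }
  return problems
}

const params = {
  capabilities: {
    // The key contains a colon, so it has to be quoted in JavaScript
    alwaysMatch: { 'goog:chromeOptions': { w3c: true } },
    firstMatch: []
  }
}
console.log(checkSessionParameters(params)) // prints []
console.log(checkSessionParameters({ capabilities: { alwaysMatch: {} } }))
```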
You can set the config options using the setting methods:
var config = new Config()
config.setAlwaysMatch('browserName', 'chrome')
.setAlwaysMatch('pageLoadStrategy', 'eager')
.addFirstMatch({ platformName: 'linux' })
.set('login', 'merc')
.set('password', 'youwish')
.setSpecific('chrome', 'detach', true)
var params = config.getSessionParameters()
console.log('Session parameters:', require('util').inspect(params, { depth: 10 } ))
You will see:
{
  // Set by set()
  login: 'merc',
  password: 'youwish',
  capabilities: {
    alwaysMatch: {
      'goog:chromeOptions': {
        // Always here, to make Chrome compliant
        w3c: true,
        // Set by setSpecific()
        detach: true
      },
      // Set by setAlwaysMatch()
      browserName: 'chrome',
      pageLoadStrategy: 'eager'
    },
    firstMatch: [
      // Added by addFirstMatch()
      { platformName: 'linux' }
    ]
  }
}
Remember that in Config#setAlwaysMatch, Config#set and Config#setSpecific, the key can actually be a path: if it contains a . (e.g. config.setAlwaysMatch('timeouts.implicit', ...)), the nested property (here capabilities.alwaysMatch.timeouts.implicit) will be set.
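To make the path behaviour concrete, here is a minimal sketch of how a dot-separated key can be turned into a nested property. setPath is a hypothetical helper written for illustration; it is not the library's own code, and the value 1000 is an arbitrary example.

```javascript
// Hypothetical helper, NOT the library's code: sets a nested property
// from a dot-separated path, creating intermediate objects as needed
function setPath (target, path, value) {
  const keys = path.split('.')
  let obj = target
  for (const key of keys.slice(0, -1)) {
    if (typeof obj[key] !== 'object' || obj[key] === null) obj[key] = {}
    obj = obj[key]
  }
  obj[keys[keys.length - 1]] = value
  return target
}

const alwaysMatch = {}
setPath(alwaysMatch, 'timeouts.implicit', 1000) // 1000 is an arbitrary example value
setPath(alwaysMatch, 'browserName', 'chrome')
console.log(JSON.stringify(alwaysMatch))
// prints {"timeouts":{"implicit":1000},"browserName":"chrome"}
```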
You might decide to use this API without spawning a process for the chromedriver. This is especially handy if you are using for example an online service, or a webdriver already running on a different machine.
Here is how you do it. Notice the spawn: false property:
// Create the driver, using that browser's
// configuration WITHOUT spawning a chromedriver process
var driver = new drivers.ChromeDriver(new Config(), {
spawn: false,
hostname: '10.10.10.45',
port: 4444
})
Note that since you are using the ChromeDriver driver, the remote end will be assumed to be a Chrome webdriver: it will fix any mistakes and partial implementations of the W3C protocol.
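For orientation, this is roughly what connecting to an already running remote end involves at the protocol level: the W3C "New Session" command is a POST to /session carrying the session parameters as JSON. The sketch below only builds the request and does not send it; buildNewSessionRequest is a hypothetical helper, and some remote ends may expect a path prefix in front of /session.

```javascript
// Sketch only: builds (but does not send) the W3C "New Session" request
// for a remote end. buildNewSessionRequest is a hypothetical helper.
function buildNewSessionRequest (hostname, port, sessionParameters) {
  const body = JSON.stringify(sessionParameters)
  return {
    // Shaped like the options accepted by Node's http.request()
    options: {
      hostname,
      port,
      method: 'POST',
      path: '/session',
      headers: {
        'Content-Type': 'application/json; charset=utf-8',
        'Content-Length': Buffer.byteLength(body)
      }
    },
    body
  }
}

const req = buildNewSessionRequest('10.10.10.45', 4444, {
  capabilities: {
    alwaysMatch: { 'goog:chromeOptions': { w3c: true } },
    firstMatch: []
  }
})
console.log(req.options.method, `http://${req.options.hostname}:${req.options.port}${req.options.path}`)
// prints POST http://10.10.10.45:4444/session
```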
Lastly, you might want to connect to a generic webdriver proxy, which will accept your session requirements and will provide you with a suitable browser. In this case, you will use the generic driver Driver, which is a "plain" driver without the ability to spawn a webdriver process (obviously) and, more crucially, without any browser-specific layer to work around vendor-specific problems in the protocol implementation.
Here is how you would run it:
// Create a new generic browser object, specifying the alwaysMatch parameter
var config = new Config()
// We only care that this is a linux browser
config.setAlwaysMatch('platformName', 'linux')
// Creating the driver
var driver = new drivers.Driver(config, {
hostname: '10.10.10.45',
port: 4444
})
Note that you are using the generic Driver, which means that no browser-specific workarounds for W3C compliance will be applied.
If you have the following chunk of code:
// Create a new driver object, using the Chrome browser
var driver = new drivers.ChromeDriver(new Config())
// Create a new session. This will also run `chromedriver` for you
await driver.newSession()
You can then run commands using the webdriver. There are three types of call:
- Calls that will deal with parameters and values on the currently opened page
- Calls that will return Element objects (such as Driver#findElement and Driver#findElements)
- Calls that run user Actions
Finally, all calls can be "polled", which implies re-running the command at intervals until it succeeds, or until it fails (after it reaches a timeout).
Once you've created a driver object, you can use it to actually make webdriver calls.
For example:
var driver = new drivers.ChromeDriver(new Config())
await driver.newSession()
await driver.navigateTo('https://www.google.com')
var screenshotData = await driver.takeScreenshot()
var src = await driver.getPageSource()
var title = await driver.getTitle()
await driver.refresh()
All of these commands are self-explanatory, and fully documented in the Driver documentation (basically, all of the listed calls under the Driver object)
Remember that there is a 1:1 mapping between driver calls and Webdriver calls.
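As an illustration of that mapping, the calls used above correspond to the following endpoints of the W3C WebDriver specification. The method names are the ones from this guide; the lookup table and describeCall are a sketch written for illustration, not the library's internals.

```javascript
// Method names come from this guide; the endpoints are the ones defined
// by the W3C WebDriver specification. The table itself is only a sketch.
const endpoints = {
  navigateTo: { method: 'POST', path: '/session/{id}/url' },
  getTitle: { method: 'GET', path: '/session/{id}/title' },
  getPageSource: { method: 'GET', path: '/session/{id}/source' },
  takeScreenshot: { method: 'GET', path: '/session/{id}/screenshot' },
  refresh: { method: 'POST', path: '/session/{id}/refresh' }
}

// Hypothetical helper: describes the HTTP request behind a driver call
function describeCall (name, sessionId) {
  const e = endpoints[name]
  if (!e) throw new Error(`Unknown call: ${name}`)
  return `${e.method} ${e.path.replace('{id}', sessionId)}`
}

console.log(describeCall('getTitle', 'abc123')) // prints GET /session/abc123/title
```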
Some of the driver calls will return an Element object. For example:
await driver.navigateTo('https://www.google.com')
var el = await driver.findElementCss('[name=q]')
The returned element will be an instance of Element, created with the data returned by the findElementCss() call.
An element object is simply an object with a reference to the Driver that created it, and a unique ID returned by the webdriver call.
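A simplified sketch of that idea follows. In the W3C protocol, a remote end returns an element as an object whose ID is stored under a fixed, well-known key. ElementSketch and fakeDriver are hypothetical stand-ins written for this example, not the library's classes.

```javascript
// The key under which W3C remote ends return an element's ID
const ELEMENT_KEY = 'element-6066-11e4-a52e-4f735466cecf'

// Hypothetical, simplified stand-in for the library's Element class
class ElementSketch {
  constructor (driver, value) {
    this.driver = driver // the driver that found the element
    this.id = value[ELEMENT_KEY] // the unique ID from the remote end
  }

  // Element-scoped endpoints include the element's ID in the path
  path (suffix) {
    return `/session/${this.driver.sessionId}/element/${this.id}/${suffix}`
  }
}

const fakeDriver = { sessionId: 'abc123' } // stand-in for a real driver
const el = new ElementSketch(fakeDriver, { [ELEMENT_KEY]: 'e-42' })
// "Get Element Tag Name" is GET /session/{id}/element/{element id}/name
console.log(el.path('name')) // prints /session/abc123/element/e-42/name
```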
Element objects have several element-related methods. For example, you can get the tag name for a found element:
await driver.navigateTo('https://www.google.com')
var el = await driver.findElementCss('[name=q]')
var tagName = await el.getTagName()
More importantly, Element objects also offer methods that will return elements. In this case, the search will be limited to descendants of the element being searched. For example:
await driver.navigateTo('https://www.example.com')
// Get the OL tag
var ol = await driver.findElementTagName('ol')
// Get the LI tags within OL
var lis = await ol.findElementsTagName('li')
Actions are a rather complex part of the webdriver specs. Actions are important so that you can get the browser to perform a list of timed, complex UI actions.
Actions are always performed by either a keyboard device, or a pointer device (which could be a MOUSE, TOUCH or PEN).
Once the actions object is created, you can add "ticks" to it using the property tick (which is actually a getter). The way you use tick depends on the devices you created.
If you call the constructor like this:
var actions = new Actions()
It's the same as writing:
var actions = new Actions(
new Actions.Keyboard('keyboard'),
new Actions.Pointer('mouse', Pointer.Type.MOUSE)
)
This will make two devices, mouse and keyboard, available.
Such a scenario will allow you to call:
actions.tick.keyboardDown('r').mouseDown()
actions.tick.keyboardUp('r').mouseUp()
Here, keyboardUp was available as a combination of the keyboard ID keyboard and the keyboard action Up.
In short:
- Keyboard devices will have the methods Up, Down
- Pointer devices will have the methods Move, Up, Down, Cancel
- Both of them have the method pause
If you create an actions object like this:
var actions = new Actions(new Actions.Keyboard('cucumber'))
You are then able to run:
actions.tick.cucumberDown('r')
actions.tick.cucumberUp('r')
However running:
actions.tick.cucumberMove('r')
Will result in an error, since cucumber is a keyboard device, and it doesn't implement move (only pointers do).
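The naming rule (device ID followed by the action name) can be sketched as follows. This is an illustration only, not the library's implementation; methodNamesFor is a hypothetical helper, and pause is left out because this guide does not show how its method name is spelled.

```javascript
// Hypothetical sketch of the naming rule: <device id> + <action name>.
// pause is omitted: its exact method name is not shown in this guide.
const ACTIONS_BY_KIND = {
  keyboard: ['Up', 'Down'],
  pointer: ['Move', 'Up', 'Down', 'Cancel']
}

function methodNamesFor (deviceId, kind) {
  return ACTIONS_BY_KIND[kind].map((action) => deviceId + action)
}

console.log(methodNamesFor('cucumber', 'keyboard'))
// prints [ 'cucumberUp', 'cucumberDown' ]
console.log(methodNamesFor('cucumber', 'keyboard').includes('cucumberMove'))
// prints false
```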
If you have two devices set (like the default keyboard and mouse, which is the most common use-case), you can set one action per tick:
var actions = new Actions() // By default, mouse and keyboard
// Only a keyboard action in this tick. Mouse will pause
actions.tick.keyboardDown('r')
// Only a mouse action in this tick. Keyboard will pause
actions.tick.mouseDown()
// Both a mouse and a keyboard action this tick
actions.tick.keyboardUp('r').mouseUp()
You can only add one action per device in each tick. This will give an error, because the mouse device is trying to define two different actions in the same tick:
actions.tick.mouseDown().mouseUp()
You are able to chain tick calls if you want to:
actions
.tick.keyboardDown('r').mouseDown()
.tick.keyboardUp('r').mouseUp()
Once you have decided your actions, you can submit them:
await driver.performActions(actions)
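To show what "one action per device per tick" means at the protocol level, here is a minimal sketch of how a list of ticks can be turned into the per-device action lists used by the W3C "Perform Actions" command. compileTicks and the tick format are assumptions made for this example; the library's internal representation may differ.

```javascript
// Sketch only: turns a list of ticks into the per-device action lists
// used by the W3C "Perform Actions" command. Devices without an action
// in a tick get an explicit pause, so all lists stay aligned.
function compileTicks (devices, ticks) {
  return {
    actions: devices.map((device) => ({
      ...device,
      actions: ticks.map((tick) => tick[device.id] || { type: 'pause' })
    }))
  }
}

const devices = [
  { type: 'key', id: 'keyboard' },
  { type: 'pointer', id: 'mouse', parameters: { pointerType: 'mouse' } }
]
const ticks = [
  { keyboard: { type: 'keyDown', value: 'r' } }, // mouse pauses
  { mouse: { type: 'pointerDown', button: 0 } }, // keyboard pauses
  {
    keyboard: { type: 'keyUp', value: 'r' },
    mouse: { type: 'pointerUp', button: 0 }
  }
]
const payload = compileTicks(devices, ticks)
console.log(JSON.stringify(payload.actions[1].actions[0])) // prints {"type":"pause"}
```

Because each tick is keyed by device ID, a device can hold at most one action per tick, which matches the restriction described above.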
You can set multiple touch devices, and use them for multi-touch:
var actions = new Actions(
new Actions.Pointer('finger1', Pointer.Type.TOUCH),
new Actions.Pointer('finger2', Pointer.Type.TOUCH)
)
// Define actions: Moving two fingers vertically at the same time
actions
  .tick.finger1Move({ x: 40, y: 40 }).finger2Move({ x: 40, y: 60 })
  .tick.finger1Move({ x: 40, y: 440 }).finger2Move({ x: 40, y: 460 })
// Actually perform the actions
await driver.performActions(actions)
You can also move a pointer over a specific element, specifying how long it will take (in milliseconds):
await driver.navigateTo('https://www.google.com')
var el = await driver.findElementCss('[name=q]')
var actions = new Actions(new Actions.Pointer('mouse', Pointer.Type.MOUSE))
// Moving over `el`, taking 1 second
actions.tick.mouseMove({ origin: el, duration: 1000 })
As listed earlier, keyboard devices can perform Down, Up and pause, while pointer devices can perform Move, Down, Up, Cancel and pause.
The Actions class documentation explains exactly how actions work.
When writing tests for web sites and applications, timing can become an issue. For example, while you know that your page will be loaded after this:
await driver.navigateTo('https://www.google.com')
What you don't know is this: have all of the AJAX calls finished fetching data? Has all of the DOM been updated after the event?
The answer is "you don't know". So, the ability to poll is very important.
This API has the simplest, most streamlined approach possible in terms of polling: there is only one call, waitFor(), which is available on both Driver and Element objects (Driver#waitFor and Element#waitFor).
The way it works is really simple: waitFor() acts as a proxy to the real object's calls, with the twist that it will retry them until they succeed. Each call will also accept one extra parameter (compared to its normal signature): a checker function that must return a truthy value for the call to be considered successful.
So, while you would normally do:
var el = await driver.findElementCss('#main')
If you wanted to wait, you would run the following call, which will run findElementCss() every 300ms, until it finally works or until the default timeout of 10000ms (10 seconds) has expired:
var el = await driver.waitFor().findElementCss('#main')
You can set a different poll interval and timeout:
driver.setPollTimeout(15000)
driver.setPollInterval(200)
Or, you can set them on a per-call basis:
var el = await driver.waitFor(15000, 300).findElementCss('#main')
Finally, you can add one extra parameter to the call: it will be used as the checker function for the result:
var lis = await driver.waitFor().findElementsCss('li', (r) => r.length)
In this case, the callback (r) => r.length will only return a truthy value when r (the result from the call) is a non-empty array.
Behind the scenes, waitFor() returns a proxy object which will in turn run the call and check that it didn't return an error; it also checks that the result passes the required checker function, if one was passed.
The result of this is that one simple chained method, Driver#waitFor/Element#waitFor, turns every call for Driver and Element into a polling function able to check the result.
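The mechanism can be sketched with a plain JavaScript Proxy. What follows is a simplified illustration under stated assumptions, not the library's code: the standalone waitFor function, the way the checker is detected (a trailing function beyond the method's declared parameters) and fakeDriver are all made up for this example.

```javascript
// Sketch only, not the library's code. Wraps `target` in a Proxy whose
// methods are retried every `interval` ms until they succeed (and the
// optional checker passes) or `timeout` ms have elapsed.
function waitFor (target, timeout = 10000, interval = 300) {
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
  return new Proxy(target, {
    get (obj, prop) {
      if (typeof obj[prop] !== 'function') return obj[prop]
      return async (...args) => {
        // A trailing function argument beyond the method's own
        // declared parameters is treated as the checker
        const hasChecker = args.length > obj[prop].length &&
          typeof args[args.length - 1] === 'function'
        const checker = hasChecker ? args.pop() : null
        const deadline = Date.now() + timeout
        let lastError
        while (true) {
          try {
            const result = await obj[prop](...args)
            if (!checker || checker(result)) return result
            lastError = new Error('Checker did not pass')
          } catch (e) {
            lastError = e
          }
          // Give up if the next attempt would start after the deadline
          if (Date.now() + interval > deadline) throw lastError
          await sleep(interval)
        }
      }
    }
  })
}

// Fake driver: the list is empty for the first two calls
let calls = 0
const fakeDriver = {
  async findElementsCss (selector) {
    calls++
    return calls < 3 ? [] : ['li-1', 'li-2']
  }
}

const pending = waitFor(fakeDriver, 2000, 20)
  .findElementsCss('li', (r) => r.length)
pending.then((r) => console.log('Found after', calls, 'calls:', r))
```

In this sketch both a thrown error and a failing checker lead to a retry, which mirrors the two conditions described above.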
The main limitation of this API is that it will only ever speak in w3c webdriver protocol. For example, as of today Chrome doesn't yet implement Actions. While other APIs try to "emulate" actions (with crippling limitations) by calling non-standard endpoints, this API will simply submit the actions to the chrome webdriver and surely receive an error in response.
Another limitation is that it's an API that is very close to the metal: you are supposed to understand how the session configuration works, for example; so, while you do have helper methods such as setAlwaysMatch(), addFirstMatch() etc., you are still expected to understand what these calls do. Also, browser-specific parameters are added via setSpecific(); however, there are no helper methods to get these parameters right. For example, if you want to add plugins to Chrome using the extensions option, you will need to create an array of packed extensions loaded from the disk and converted to base64. This may change in the future, as this API matures; however, it won't add more classes and any enhancement will always be close enough to the API to be easy to understand.
That's all you need -- time to get testing!