| Start | ZIP Release, Maven, Windows Install Guide, Debugging, Karate - Main Index |
|---|---|
| Config | driver, robot options |
| Concepts | Methods, Element, Window, Finding Windows, Debugging, Retries, `karate.fork()`, Utility Functions, Conditional Start |
| Locators | Windows Locators, Image Locators, OCR Locators |
| App | `window()`, `windowExists()`, `windowOptional()`, `waitForWindowOptional()`, `robot.root`, `robot.active`, `robot.focused`, `robot.location`, `robot.region()`, `robot.clipboard`, `robot.allWindows`, `screenshot()`, `screenshotActive()` |
| Actions | `click()`, `doubleClick()`, `rightClick()`, `move()`, `press()`, `release()`, `input()`, `focus()`, `select()`, `highlight()`, `highlightAll()` |
| State | `exists()`, `optional()`, `waitForOptional()`, `locate()`, `locateAll()` |
| Retry / Wait | `retry()`, `waitFor()`, `waitUntil()`, `delay()` |

- Available as a standalone binary via the ZIP Release
- Native Mouse Events
- Native Keyboard Events
- Windows object-recognition using Microsoft UI Automation
- Navigation via image detection - cross-platform (mac, win, linux) via JavaCPP and OpenCV
- OCR driven navigation and text extraction - cross-platform (mac, win, linux) via JavaCPP and Tesseract
- Tightly integrated into Karate - which means a debugger, HTML reports, and more
- Clicking the native "File Upload" button in a Web Page
  - details, code and explanation here
- Clicking a button in an iOS Mobile Emulator
- Windows automation by natively accessing UI controls and the window / object tree
  - Refer to the `examples/robot-test` project, which is a stand-alone Maven project that can be used as a starting point
- Opening a browser tab and performing actions

If you are not that experienced with programming - or don't want to set up a Java development environment, please look at the ZIP Release which you can run using Visual Studio Code.
Maven (or Gradle) users can read on below. Make sure you follow the Karate conventions; you can use the `examples/robot-test` project as a template.
The `karate-robot` capabilities are not part of `karate-core`, because they bring in a few extra dependencies. Add this to the `<dependencies>`:
```xml
<dependency>
    <groupId>com.intuit.karate</groupId>
    <artifactId>karate-robot</artifactId>
    <version>${karate.version}</version>
    <scope>test</scope>
</dependency>
```
This may result in a few large JAR files getting downloaded by default because of the `javacpp-presets` dependency. But you can narrow this down to what is sufficient for your OS by following these instructions.
Windows desktop automation is one of the highlights of Karate's capabilities. You can see a video of it in action here.
Refer to the documentation on how to set it up and use it: Karate Robot Windows Install Guide.
Karate Robot is designed to only activate when you use the `robot` keyword, and if the `karate-robot` Java / JAR dependency is present in the project classpath.
In the example below, Karate will look for an application window called `Chrome` and will "focus" it so that it becomes the top-most window and is visible. This will work on Mac, Windows and Linux (X Window System / X11).
```cucumber
* robot { window: 'Chrome' }
```
In development mode, you can switch on a red highlight border around areas that Karate finds via image matching. Note that the `^` prefix means that Karate will look for a window where the name contains `Chrome`.
```cucumber
* robot { window: '^Chrome', highlight: true }
```
You can use `fork` to run a console command that starts an application if needed, before "activating" it. Also see `karate.fork()`.
If you want to do conditional logic depending on the OS, you can use `karate.os`, for example:
```cucumber
* if (karate.os.type == 'windows') karate.set('filename', 'start.bat')
```
The keys that the `robot` keyword supports are the following:

| key | description |
|---|---|
| `window` | (optional) the name of the window to bring to focus; you can use a `^` prefix to do a string "contains" match or `~` for a regular-expression match, also see `window()` |
| `fork` | (optional) calls an OS executable and takes a string (e.g. `'some.exe -h'`), string-array (e.g. `['some.exe', '-h']`) or JSON as per `karate.fork()` |
| `autoClose` | default `true` - close the current window if `fork` was used on startup |
| `attach` | default `true` - if the window exists, `fork` will not be executed |
| `basePath` | defaults to `null`, which means the "find by image" search will be relative to the "entry point" feature file, but can be used to point to prefixed / relative paths such as `classpath:some/folder` |
| `highlight` | default `false` - whether an element (or image) match should be highlighted |
| `highlightDuration` | default `3000` - time to highlight in milliseconds |
| `retryCount` | default normally `3` - overrides the default `retry()` count; this applies only to finding the window after a `fork` was executed |
| `retryInterval` | default normally `3000` - overrides the default `retry()` interval; this applies only to finding the window after a `fork` was executed |
| `autoDelay` | default `0` - time delay added (in milliseconds) after a native action (key press, mouse click); set this to a small value, e.g. `40`, only if OS actions are too fast or cause other issues |
| `tessData` | default `tessdata` - the path to a directory where the Tesseract (OCR engine) data files will be looked for; this is needed only if you use an OCR Locator or call `Element.extract()`. Note that the default value `tessdata` is all lower-case. |
| `tessLang` | default `eng` - the default OCR language to use, see OCR Locator |

For convenience, the same pattern as in Karate UI is supported. You can have a "central" config, perhaps set up in `karate-config.js`, and have your tests state the "intent" (or even override the "global" config) more clearly:
```cucumber
* configure robot = { highlight: true }
# and then later
* robot { window: '^My App' }
# or even
* robot '^My App'
```
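The following Python sketch shows how per-call `robot` options could be layered over a "central" `configure robot` value. It assumes two things suggested by the examples above: keys passed to `robot` take precedence over the global ones, and `robot '^My App'` is shorthand for `robot { window: '^My App' }`. `effective_robot_options` is a hypothetical name and not part of Karate.

```python
from __future__ import annotations

from typing import Any, Dict, Union


def effective_robot_options(
    global_config: Dict[str, Any], options: Union[str, Dict[str, Any]]
) -> Dict[str, Any]:
    """Hypothetical model: per-call options override the central 'configure robot' values."""
    if isinstance(options, str):
        # assumed shorthand: robot '^My App' == robot { window: '^My App' }
        options = {"window": options}
    merged = dict(global_config)  # copy, so the global config is left untouched
    merged.update(options)  # per-call keys win
    return merged
```

For example, `effective_robot_options({"highlight": True}, "^My App")` yields `{"highlight": True, "window": "^My App"}`.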
The `fork` option simply calls `karate.fork()`, so you can also use `karate.fork()` directly within a test any time you want to start an OS process. This is convenient for conditional logic, for example to start an application (which may involve a different main window) only if a certain window does not exist.
Here's an example using `karate.call()`:
```cucumber
* robot { highlight: true, highlightDuration: 500, autoClose: false }
* if (!windowExists('^Main Window')) karate.call('sign-in.feature')
```
And `sign-in.feature` looks like this. The code below also showcases a few Karate capabilities that are highly relevant for testing GUIs, such as `retry()` and `waitFor()`.
```cucumber
@ignore
Feature:

Scenario:
  * karate.fork('C:/MyDir/my.exe')
  * retry(5).window('Sign In')
  * waitFor('#userid').input(testUser)
  * input('#password', testPassword)
  * click('#submit-btn')
```
Also see Conditional Start, a more advanced version of the above flow, for when the "Sign In" window title is different.
Note how you can inject variables from global config, e.g. `testUser` and `testPassword`, using Karate.
Finding windows and dialogs is a critical aspect of UI automation, and Karate makes it easy to handle even dynamic window titles or unpredictable windows.
Here's a typical situation with some challenges, and the script that solves them:
- if the app is already running, don't start it
- the window title is unpredictable: it can be "MyApp" or "MYAPP"
- the app takes almost 20 seconds to start
- after the application starts, a modal dialog with the title "Tips on Startup" may or may not appear
```cucumber
* def windowName = '~MyApp|MYAPP'
* robot { window: '#(windowName)', fork: 'C:/Program Files (x86)/MyApp/myapp.exe', retryCount: 10 }
* windowOptional('Tips on Startup').locate('Close').click()
* window(windowName)
```
Explanation:
- `robot { window: '<name>' }` will not call `fork` if the window is already present
- the `~` prefix means that Karate will use a regex (regular expression) match to find the window by title
- `retryCount: 10` means that if `fork` was executed, Karate will retry up to 10 times, i.e. wait up to `10 x 3000` milliseconds, where `3000` is the default `retryInterval`
- `windowOptional()` will do nothing if the window does not exist
- note how the variable `windowName` can be used as an embedded expression, or directly when within "round brackets", e.g. `window(windowName)`
- the last line makes sure that we switch back to the main window and make it "active"

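
The `retryCount` / `retryInterval` behaviour can be pictured as a simple polling loop. The Python sketch below is illustrative only. `retry_until` and `worst_case_wait_ms` are hypothetical helpers, not Karate APIs, and it is an assumption that no pause follows the final failed attempt. With `retryCount: 10` and the default interval of 3000 ms, the upper bound is 30 seconds. That comfortably covers an app that takes about 20 seconds to start.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until(
    attempt: Callable[[], Optional[T]],
    count: int = 3,
    interval_ms: int = 3000,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `attempt` up to `count` times, pausing `interval_ms` between tries.

    Returns the first non-None result, or None if every attempt failed.
    """
    for i in range(count):
        result = attempt()
        if result is not None:
            return result
        if i < count - 1:  # no need to pause after the final attempt
            sleep(interval_ms / 1000.0)
    return None


def worst_case_wait_ms(count: int, interval_ms: int) -> int:
    # Upper bound matching the documented "10 x 3000" estimate
    return count * interval_ms
```

Injecting `sleep` keeps the loop testable. You can pass a stub instead of `time.sleep` to check how many pauses happen without actually waiting.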
Please refer to the available methods in `Robot.java`. Most of them are "chainable". The built-in `robot` JS object is where you script UI automation. It will be initialized only after the `robot` keyword has been used to start / attach to a desktop window.
Any method on the `Robot` type that returns `Element` can be chained for convenience. Here is an example:
```cucumber
* locate('Taxpayer').click(20, 40)
```
This locates a UI control by name and then, within the bounds of that element, clicks the mouse at an inner offset of 20 pixels (horizontal) and 40 pixels (vertical) from the top-left corner of the element.
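The offset arithmetic can be sketched as follows, assuming a simple rectangle model where `(0, 0)` is the top-left corner of the screen. `Region` here is a hypothetical Python class, not Karate's own `Region` type. Its `center()` method models where a click lands when an image locator is used, which the documentation describes as the center of the matched area.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical screen rectangle; (0, 0) is the top-left corner of the screen."""

    x: int
    y: int
    width: int
    height: int

    def offset(self, dx: int, dy: int) -> tuple[int, int]:
        # Point dx pixels right of and dy pixels below the top-left corner,
        # like locate('Taxpayer').click(20, 40)
        return (self.x + dx, self.y + dy)

    def center(self) -> tuple[int, int]:
        # Integer center point, where an image-locator click is described to land
        return (self.x + self.width // 2, self.y + self.height // 2)
```

For an element at `(100, 200)` that is 80 x 60 pixels, `offset(20, 40)` gives `(120, 240)` and `center()` gives `(140, 230)`.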
Also see `windowOptional()` for a good example of chaining a `click()` after calling `locate()`.
The following properties (Java getters) are available on an `Element` instance:
- `parent`
- `children` (returns a list / array of `Element`-s)
- `firstChild`
- `lastChild`
- `nextSibling`
- `previousSibling`

This is convenient in some cases, for example:
```cucumber
* locate('SomeName').parent.locate('Close').click()
* waitFor('//pane{Info}').children[3].click()
```
A call to `window()` will set the current or "active" window and also return an object of type `Window` (which extends `Element`). So to set the window and `restore()` it in one step, you could do this:
```cucumber
* window('^Tax Organizer').restore()
```
As a convenience, all the methods on the `robot` object have been injected into the context as special (JavaScript) variables, so you can omit the "`robot.`" part and save a lot of typing. For example, instead of:
```cucumber
* robot { window: '^Chrome', highlight: true }
* robot.input(Key.META + 't')
* robot.input('karate dsl' + Key.ENTER)
* robot.click('tams.png')
```
You can shorten all that to:
```cucumber
* robot { window: '^Chrome', highlight: true }
* input(Key.META + 't')
* input('karate dsl' + Key.ENTER)
* click('tams.png')
```
The above flow performs the following operations:
- finds an already open window where the name contains "Chrome"
  - note that on Windows you may need to use "New Tab" instead
- enables "highlight" mode for ease of development / troubleshooting
- triggers keyboard events for [COMMAND + t], which will open a new browser tab
  - on Windows this should be `Key.CONTROL` instead
- triggers keyboard events for the input "karate dsl" and an ENTER key-press
- waits for a section of the screen defined by `tams.png` to appear, and clicks in the center of that region
  - Karate will try to use different scaling factors for an image match; for best results, try to use images that are at the same resolution as (or as close as possible to) the desktop resolution
  - if you run into issues, try re-taking a PNG capture of the area to click on

Also see Image Locators.
Just like in Karate UI, the special keys are made available under the namespace `Key`. You can see all the available codes here.
```cucumber
* input('karate dsl' + Key.ENTER)
```
Setting `robot.basePath` directly is rarely needed, since `basePath` would typically be set by the `robot` options. But you can do this any time during a test to "switch". Note that `classpath:` would typically resolve to `src/test/java`.
```cucumber
* robot.basePath = 'classpath:some/package'
```
Images have to be in PNG format, with the extension `*.png`. Karate will attempt to match images that are somewhat smaller or larger than the captured image, but for the best results, try to save images at the same resolution as the application under test. Also see `robot.basePath`.
```cucumber
* click('someimage.png')
```
So any string that ends with `.png` will be treated as an "image locator".
You can optionally prefix a number and `:` to the image path like this:
```cucumber
* click('5:someimage.png')
```
This number is a "strictness" factor: 1 is the most strict and 10 (the default) is "normal". As of now, consider this experimental while we try to arrive at values that will work for most real-life situations.
If you find it really hard to get a "match", you can try values greater than 10, which means Karate will look for more "lenient" matches.
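The documented image-locator rules can be modelled with a small parser: a `.png` suffix, plus an optional numeric `<strictness>:` prefix that defaults to 10. This is a hypothetical Python sketch for illustration, not Karate's parsing code. In particular, treating a non-numeric prefix such as `classpath:` or a drive letter as part of the path is an assumption.

```python
from __future__ import annotations

from typing import Optional, Tuple

DEFAULT_STRICTNESS = 10  # documented default ("normal")


def parse_image_locator(locator: str) -> Optional[Tuple[int, str]]:
    """Return (strictness, image_path) for an image locator, or None otherwise."""
    if not locator.endswith(".png"):
        return None  # only strings ending in .png are image locators
    prefix, sep, rest = locator.partition(":")
    if sep and prefix.isdigit():
        return int(prefix), rest  # e.g. '5:someimage.png' -> (5, 'someimage.png')
    # No numeric prefix: keep the whole string (e.g. 'classpath:...' or 'C:/...')
    return DEFAULT_STRICTNESS, locator
```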
Tip: use the debugger and `highlight()` or `highlightAll()` to troubleshoot image matching.
Any string that starts with the `{lang}` pattern will be treated as an OCR locator.
Karate uses the Tesseract OCR engine (v4.X). You will need to acquire data files for the language of your choice, e.g. English (`eng`). You can choose between "tessdata", "tessdata-fast" and "tessdata-best", depending on the quality vs. speed (and data-file size) compromise you are willing to make. For example, you could use the English data file from "tessdata-best". Download it and make it available in a directory called `tessdata` in the root directory of the project you are working in. To change the `tessdata` location, see the `tessData` configuration option.
So to find the text "Click Me" and click on it:
```cucumber
* click('{eng}Click Me')
```
A variation: if the language-key is prefixed with a `-`, the screen or element-region capture will be converted to a "negative" before OCR processing. This is useful when the text is in a light font against a dark background.
```cucumber
* click('{-eng}Dark Mode')
```
You can omit the language, in which case the `tessLang` configuration option will be used:
```cucumber
* click('{}Some Text')
```
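Below is a minimal Python sketch of the three OCR locator forms shown above. `parse_ocr_locator` and `OcrLocator` are hypothetical names. The `tess_lang` parameter stands in for the `tessLang` option (default `eng`). Returning `None` for malformed input with no closing `}` is an assumption.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class OcrLocator(NamedTuple):
    lang: str  # Tesseract language key, e.g. "eng"
    negative: bool  # True when the key was prefixed with "-"
    text: str  # the text to search for


def parse_ocr_locator(locator: str, tess_lang: str = "eng") -> Optional[OcrLocator]:
    """Hypothetical model of '{lang}text', '{-lang}text' and '{}text' locators."""
    if not locator.startswith("{"):
        return None  # not an OCR locator
    end = locator.find("}")
    if end == -1:
        return None  # malformed: no closing brace
    key = locator[1:end]
    negative = key.startswith("-")
    if negative:
        key = key[1:]  # strip the "negative image" marker
    # an empty key falls back to the configured default language
    return OcrLocator(key or tess_lang, negative, locator[end + 1:])
```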
The `Element` has an `extract()` method, which can scrape out the text via OCR from the bounds of an element's position on the screen. Results may vary and include line-breaks and white-space, but you may be able to pull off some string "contains" comparisons:
```cucumber
* match locate('Some Pane').extract('eng') contains 'Search Results'
```
If you don't pass the language-key to the `extract()` method (as with `eng` above), the configured default `tessLang` will be used:
```cucumber
* def text = locate('Some Pane').extract()
```
To extract the text from the whole screen (desktop), you can use the `robot.root` API:
```cucumber
* def text = robot.root.extract()
```
For debugging and troubleshooting, there is an `Element.debugExtract()` API. This will highlight all the words found within the given `Element`, which is super-useful during a step-through debugger session.
For Windows Locators, prefixing with `#` means using the "Automation ID", which may or may not be available depending on the application under test. Finding by "name" is the default if the first character is not `/` or `#`. As a convenience, you can use the `^` prefix for a name "contains" match and `~` for a name regular-expression match.
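Taken together, the conventions for image, OCR and Windows locators amount to a dispatch on the first (or last) characters of the locator string. The Python sketch below is a hypothetical classifier for illustration. The order of the checks (for example, image before OCR) is an assumption and is not taken from Karate's source.

```python
def locator_kind(locator: str) -> str:
    """Classify a locator string by the documented prefix / suffix rules (hypothetical)."""
    if locator.endswith(".png"):
        return "image"  # e.g. 'tams.png' or '5:someimage.png'
    if locator.startswith("{"):
        return "ocr"  # e.g. '{eng}Click Me'
    if locator.startswith("/"):
        return "path"  # XPath-like, e.g. '//button{Click Me}'
    if locator.startswith("#"):
        return "automation-id"  # e.g. '#CalculatorResults'
    if locator.startswith("^"):
        return "name-contains"
    if locator.startswith("~"):
        return "name-regex"
    return "name-exact"  # finding by name is the default
```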
But the most useful locator strategy is an XPath-like one. It does not support all the extensions and functions of proper XPath, but it is designed to make selecting elements easy. For better performance and robustness, you can "scope" selectors to parent elements / paths.
Here are examples:

| Locator | Description |
|---|---|
| `click('Click Me')` | the first control (any type) where the name is exactly: "Click Me" |
| `click('//*{Click Me}')` | the "long-form" of the above. Try to use more specific path-selectors for better performance. |
| `click('^Click')` | the first control (any type) where the name contains: "Click" |
| `click('//*{^Click}')` | the "long-form" of the above. Try to use more specific path-selectors for better performance. |
| `click('//button{Click Me}')` | the first button where the name is equal to "Click Me" |
| `click('/pane[2]/button')` | absolute path: the second pane on the active window, and the first button on it |
| `click('//pane/*/button')` | other examples of what you can use; the `*` will match any control type |
| `click('//button.TButton{^Click}')` | the first button with a "class name" of "TButton" where the name contains "Click" |
| `click('//.TButton/{^Click}')` | a different example: you can use only a "class name" or element name; note the position of the `/` |

Use a tool like Inspect.exe to identify the properties needed for automation from an application window.
By default, all the locators above search from the currently active Window or Element, but you can force the search to start from the Desktop like this:
```cucumber
* def allPopUps = locateAll('/root//window')
```
This is extremely useful in some situations.
The control "type" is case-insensitive. Examples are `edit`, `button` and `checkbox`. The complete list of types can be found here. You don't have to rely on the `LocalizedControlType` shown in tools such as "Inspect.exe", because Karate uses the `ControlType`.
Similarly, the "class name" is not case-sensitive. This can be useful in some cases; for example, in Delphi you can use values such as `TScrollBox` and `TEdit`.
Also see `locateAll()` for ways to find the n-th control on a page that matches a locator and do something with it.
Here is an example that operates the Calculator app on Windows.
```cucumber
Feature: windows calculator

Scenario:
  * robot { window: 'Calculator', fork: 'calc' }
  * click('Clear')
  * click('One')
  * click('Plus')
  * click('Two')
  * click('Equals')
  * match locate('#CalculatorResults').name == 'Display is 3'
  * screenshot()
  * click('Close Calculator')
```
Please refer to the documentation for the Karate browser-automation syntax for `retry()`. It is the same for Karate Robot.
`waitFor()` is convenient for waiting for an element. Try to use it only when necessary: for example, once a window loads, all components within it would typically be immediately accessible without needing to "wait". So you can use `waitFor()` only for the first element within that window that you need to act upon:
```cucumber
* waitFor('Add New').click()
```
`waitUntil()` waits for a JS function to evaluate to `true`, polling using the configured `retry()` settings.
```cucumber
* def fun = function(){ return optional('Close').enabled }
* waitUntil(fun)
```
This gives you a lot of flexibility. Note that Karate can call OS commands using `karate.exec()`, or even make HTTP API requests. You can even call Java code if required.
`optional()` will return a "real" `Element` if it exists, or a "fake" object if it does not. This is useful to perform conditional logic as one-liners:
```cucumber
* optional('//pane{Warning}').locate('Close').click()
```
Note that `optional()`, `exists()`, `windowExists()` and `windowOptional()` are a little different from the other actions such as `locate()`. They will not honor any intent to `retry()`, and will immediately check the active window for the given locator. This is important because they are designed to answer the question: "does the element exist in the application right now?"
If you want to wait but move on even if something was not found, you can use `waitForOptional()` and `waitForWindowOptional()`.
`windowOptional()` returns an "optional" `Window` object and will not update the "active" window. You can call `activate()` on the returned `Window` object to set it as the current window, typically after checking that it exists (by using the `present` property getter).
Here's an example of clicking a button within an "optional" modal pop-up only if it exists:
```cucumber
* windowOptional('Tips on Startup').locate('Close').click()
```
Note that the `Element` API has no `click(locator)` method, but you can chain a `locate()` and then call `click()`.
Also see finding windows and conditional start.
`waitForOptional()` is useful for cases where you want to wait for something that may not appear. Since the `retry()` count defaults to 3, you may want to tone down the wait like this:
```cucumber
* retry(1).waitForOptional('Schrodingers Pane')
```
`waitForWindowOptional()` is just like `windowOptional()`, but it can retry and move on:
```cucumber
* retry(1).waitForWindowOptional('^My Window')
```
`exists()` is similar to `optional()` but returns a boolean, which is convenient to use with the `assert` keyword:
```cucumber
* assert exists('//pane{Main}')
```
The above is functionally equivalent to:
```cucumber
* assert optional('//pane{Main}').present
```
`windowExists()` returns `true` or `false` and will not set or "activate" the current window. See also `windowOptional()`.
`window()` sets focus to (and activates as "current") the window by title. Prefix the title with `^` for a string "contains" match or `~` for a regular-expression match. The "active" window will be used as the root of all operations, such as locating controls.
Also see finding windows.
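The three title-matching modes can be sketched in Python as below. `title_matches` is a hypothetical helper. Using a regex *search* for the `~` prefix, rather than requiring the whole title to match, is an assumption. Under that assumption, a pattern like `~MyApp|MYAPP` would also match a longer title that contains either word.

```python
import re


def title_matches(pattern: str, title: str) -> bool:
    """Hypothetical model of the ^ / ~ / exact window-title rules."""
    if pattern.startswith("^"):
        return pattern[1:] in title  # "contains" match
    if pattern.startswith("~"):
        # regular-expression match (re.search here is an assumption)
        return re.search(pattern[1:], title) is not None
    return pattern == title  # exact match by default
```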
`activate()` is a short-cut to activate any `Element` by locator. Unlike `window()`, it uses the Windows Locator system to find elements. If you do this at the start of a test without a window activated, or if `robot.active` is `null`, the search-root will be `robot.root` (the "Desktop"). This can be useful in rare cases where the application under test lives under a "pane" Control Type instead of a "window".
```cucumber
* activate('//pane{Some Name}')
```
`robot.root` gets the root of all available objects as an `Element` reference. This is useful when you want to search within the entire "Desktop" on Windows. Try to avoid "any-depth" searches on this element, e.g. `robot.root.locate('//button')`, and stick to things like `robot.root.locate('/pane')`.
Note that you can also start a locator with `/root` instead.
`robot.active` returns the currently "active" element, typically set after a previous call to `window()` or `windowOptional()`. This will fail the test if a window (or any other `Element` type) has not been "activated".
The `Element` API has an `activate()` method, so you can also do this:
```cucumber
* robot.root.locate('//pane{Some Name}').activate()
```
But it can be more convenient to use the pattern below, since `active` is also a "setter" property on the `robot` object:
```cucumber
* def e = locate('//{Some Name}')
* robot.active = e
```
`robot.focused` returns the `Element` that currently has "focus" on the screen, no matter where it is or what type it is.
`robot.location` returns a `Location` instance that represents the mouse position, useful for troubleshooting in debug mode.
```cucumber
* def region = locate('foo').region
* region.inset(30, region.height / 6).move()
* robot.location.highlight()
# you can also construct a location
* robot.location(885, 406).highlight()
```
`robot.region()` constructs a `Region` instance that can be used for debugging:
```cucumber
* def region = robot.region({ x: 100, y: 100, width: 100, height: 100 })
* region.debugCapture()
```
`robot.clipboard` returns the clipboard contents as text. This can be convenient for validating text in non-standard controls where `Element.value` does not work.
```cucumber
# assume that a control containing text has focus
* input(Key.CONTROL + 'a')
* input(Key.CONTROL + 'c')
* match robot.clipboard == 'hello world'
```
`robot.allWindows` returns an array of all windows that exist on the desktop. This is convenient for quickly listing all window names on the console, especially in debug mode. You could also loop over all of them and call methods on each `Element` or `Window` instance.
```cucumber
* print robot.allWindows
```
This is equivalent to the line below, with one difference: the elements returned above are of type `Window`, while those returned below are of type `Element`.
```cucumber
* print robot.root.locateAll('//window')
```
Also note that you can use `Element.children` to get all direct children of any element:
```cucumber
* print robot.root.children
```
Using `locate()` on its own is rarely needed. It is handy when you just want an `Element` instance, typically when you are writing custom re-usable functions, or using an element as a "waypoint" to access other elements in a large, complex "tree".
```cucumber
* def e = locate('//pane{Some Pane}')
# now you can have multiple steps refer to "e"
* e.locate('//edit').input('foo')
* e.locate('//button').click()
```
Note that `locate()` will fail the test if the element is not found. Think of it as just like `waitFor()`, but without the "wait" part.
Also see `exists()` and `optional()`.
`locateAll()` can be convenient if you need to loop over a bunch of elements and do something with each. More useful is the ability to target a single item by (zero-based) index. For example, here is how you can find the second control with the name "Address" and click on it:
```cucumber
* locateAll('Address')[1].click()
```
`highlight()` is designed for use within a debug session, and is very convenient for interactively locating an element by trial and error.
```cucumber
* highlight('Some Name')
```
Note that the `Element` API also has a `highlight()` method, so you can do things like this in debug mode:
```cucumber
* robot.active.highlight()
```
This will highlight the currently "active" element.
`highlightAll()` is like `highlight()` and super convenient: try the following to show all buttons on a window!
```cucumber
* highlightAll('//button')
```
`click()` defaults to a "left-click"; pass 1, 2 or 3 as the argument to specify the left, middle or right mouse button.
```cucumber
* click('Continue')
```
You can also click on any X and Y co-ordinate. Note that (0, 0) is the top-left corner of the screen.
```cucumber
* click(100, 200)
```
`doubleClick()` performs a double-click at the current mouse position. Note that you can also chain this off an `Element`.
`rightClick()` performs a right-click at the current mouse position.
`move()` moves the mouse pointer. The argument can be `x, y` co-ordinates or, typically, the name of an image, which will be looked for in the `basePath`. Note that relative paths will work.
`delay()` is not recommended unless unavoidable. The argument is time in milliseconds.
For `input()`, the single string argument can include special characters such as a line-feed:
```cucumber
* input('karate dsl' + Key.ENTER)
```
If you need to simulate key combinations, just ensure that the modifier keys such as `Key.CONTROL` and `Key.ALT` come first in the sequence (they will be auto-released at the end):
```cucumber
* input(Key.META + 't')
```
For convenience, you can pass an array of strings as a single argument, which is handy for a lot of "brute force" keyboard navigation:
```cucumber
* input([Key.DOWN, Key.RIGHT, Key.ENTER])
```
You can also add a second argument in the above case. This is useful when you want to slow things down, for example because Karate is too fast for the UI to perform validations or refresh:
```cucumber
* input([Key.DOWN, Key.RIGHT, Key.ENTER], 100)
```
A string argument is also supported, in which case the delay is applied before each character.
```cucumber
* input('type this slowly', 100)
```
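The modifier and delay rules for `input()` can be modelled as a sequence of press / release events. This Python sketch is hypothetical. `key_events` and `typing_schedule` are illustrative names. Releasing the modifiers in reverse order is also an assumption: the documentation only states that they are auto-released at the end.

```python
from __future__ import annotations

from typing import List, Sequence, Tuple

Event = Tuple[str, str]  # (action, key), e.g. ("press", "META")


def key_events(modifiers: Sequence[str], chars: str) -> List[Event]:
    """Hypothetical model: hold modifiers, tap each character, then release modifiers."""
    events: List[Event] = [("press", m) for m in modifiers]  # modifiers go first
    for c in chars:
        events += [("press", c), ("release", c)]
    # auto-release the modifiers at the end (reverse order is an assumption)
    events += [("release", m) for m in reversed(modifiers)]
    return events


def typing_schedule(chars: str, delay_ms: int) -> List[Tuple[int, str]]:
    """Elapsed time (ms) at which each character is typed when the delay precedes each one."""
    return [((i + 1) * delay_ms, c) for i, c in enumerate(chars)]
```

So `key_events(["META"], "t")` models `input(Key.META + 't')`: press META, press and release `t`, then release META. `typing_schedule("abc", 100)` places the three characters at 100, 200 and 300 ms.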
`select()` is used to choose from a drop-down, for elements with a control-type of `itemtype`. The pattern is to get a reference to the item and call `select()` on it:
```cucumber
* locate('Some Text').select()
```
`press()` performs a mouse press that is held down, useful for simulating a drag and drop operation.
`release()` releases the mouse button, also useful for simulating a drag and drop operation.
`screenshot()` will save a screenshot of the viewport (entire desktop), which will appear in the HTML report. Note that this returns a byte-array of the PNG image.
```cucumber
* screenshot()
```
You can also call this on an `Element` instance if you really want to "zoom in":
```cucumber
* locate('//pane{Tree}').screenshot()
```
`screenshotActive()` will screenshot only the active control, typically the window that has focus.
```cucumber
* screenshotActive()
```
Note that this is a convenience short-cut for:
```cucumber
* robot.active.screenshot()
```
A useful pattern is to run an app-boot and sign-in sequence only if the main application window is not present. Note how `karate.abort()` can be used to conditionally exit a "called" feature early.
This is also a great example of using `windowOptional()`.
```cucumber
* def mainWindowName = '^MyApp'
* robot {}
* def mainWindow = windowOptional(mainWindowName)
* if (mainWindow.present) { mainWindow.activate(); karate.abort() }
* karate.fork('C:/myapp/app.exe')
* retry(10).window('Sign In')
* waitFor('#userid').input('[email protected]')
* input('#password', 'Test@123')
* click('#submit-btn')
* retry(10).window(mainWindowName)
```
The "calling feature" can then jump directly into the flow to be tested after making a `call` to the above:
```cucumber
Feature: main

Background:
  * call read('start.feature')

Scenario:
  # main flow
  * click('#some-btn')
```
Also see finding windows.
Some of the Karate JS APIs that are more relevant to desktop or Windows app testing are described here:
`karate.toAbsolutePath()` returns the OS-specific form of a path; for example, on Windows, back-slash characters will be used. This is useful for generating file-names that you need to `input()` into file-chooser dialogs and the like.
Here is an example of creating a random file-name on Windows. Also refer to commonly needed utilities. We use `target` here because it is the standard build-output directory where temp-files and reports are created.
```cucumber
* def random = function(){ return java.lang.System.currentTimeMillis() + '' }
* def dataFolder = function(){ return karate.toAbsolutePath('file:target') }
* def tempTextFile = function(){ return dataFolder() + '\\' + random() + '.txt' }
```
You can use the "multiple functions in one file" pattern to set up these common utilities. Then, within a feature-file, you can do this:
```cucumber
* def tempFile = tempTextFile()
```
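For comparison, here is a Python analogue of the `tempTextFile()` helper above. It is only a sketch and not a Karate API; `temp_text_file` is a hypothetical name. Unlike the Karate version, which hard-codes `'\\'`, it uses `os.path.join` so the separator is correct on any OS. This mirrors how `karate.toAbsolutePath()` is described to return the OS-specific path form.

```python
import os
import time


def temp_text_file(data_folder: str = "target") -> str:
    """Return <absolute data_folder>/<current millis>.txt using the OS path separator."""
    millis = str(int(time.time() * 1000))  # like java.lang.System.currentTimeMillis()
    # os.path.abspath resolves relative to the current working directory
    return os.path.join(os.path.abspath(data_folder), millis + ".txt")
```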
`karate.exec()` can execute any OS command, wait for it to terminate, and return the system / console output as a string.
Also see `karate.fork()`.
The `karate-robot` JAR for Windows is around 150 MB, so it is not distributed with the ZIP Release. But you can download it separately, and it can be easily added to the classpath. You can find instructions here.
For macOS, Linux, Android or iOS, you can build a stand-alone JAR by following the Developer Guide.