Ferrum is a clean, high-level Ruby API to Chrome. It runs headless by default, but you can configure it to run in headful mode. All you need is Ruby and Chrome or Chromium. Ferrum connects to the browser over the Chrome DevTools Protocol (CDP), so there is no Selenium/WebDriver/ChromeDriver dependency. The emphasis is on raw CDP because Chrome allows you to do many things that are barely supported by WebDriver, which has to keep a design consistent across browsers.
- Cuprite is a pure Ruby driver for Capybara based on Ferrum. If you are going to crawl sites, you are better off using Ferrum or Vessel, because you crawl rather than test.
- Vessel is a high-level web crawling framework based on Ferrum. It looks like Scrapy, except that it uses a real browser to grab data.
If you like this project, please consider becoming a backer on Patreon.
- Install
- Examples
- Docker
- Customization
- Navigation
- Finders
- Screenshots
- Network
- Mouse
- Keyboard
- Cookies
- Headers
- JavaScript
- Frames
- Frame
- Dialog
- Thread safety
- License
There is no official Chrome or Chromium package for Linux. Don't install the browser from a distribution package, because such packages are either outdated or unofficial, and both are bad. Download it from the official source.
The Chrome binary should be in the PATH or BROWSER_PATH, or you can pass it as an option to the browser instance; see :browser_path in Customization.
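As a rough illustration of the lookup described above, the sketch below resolves a browser binary from an explicit option, then BROWSER_PATH, then the directories listed in PATH. The helper name locate_browser, the precedence order, and the candidate binary names are all assumptions made for this example; this is not Ferrum's actual implementation.

```ruby
# Hypothetical helper, not part of Ferrum's API.
# The default candidate names are an assumption for this sketch.
def locate_browser(option_path: nil, env: ENV,
                   names: %w[chrome google-chrome chromium chromium-browser])
  usable = ->(path) { path && File.file?(path) && File.executable?(path) }

  # 1. An explicit path wins (compare the :browser_path option).
  return option_path if usable.call(option_path)

  # 2. Then the BROWSER_PATH environment variable.
  env_path = env["BROWSER_PATH"]
  return env_path if usable.call(env_path)

  # 3. Finally, search every directory listed in PATH.
  env.fetch("PATH", "").split(File::PATH_SEPARATOR).each do |dir|
    names.each do |name|
      candidate = File.join(dir, name)
      return candidate if usable.call(candidate)
    end
  end

  nil # nothing usable was found
end
```

A nil result means none of the three sources yielded an executable file, which is the situation where the browser cannot be started.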
Add this to your Gemfile and run bundle install.
gem "ferrum"
Navigate to a website and save a screenshot:
browser = Ferrum::Browser.new
browser.goto("https://google.com")
browser.screenshot(path: "google.png")
browser.quit
Interact with a page:
browser = Ferrum::Browser.new
browser.goto("https://google.com")
input = browser.at_xpath("//div[@id='searchform']/form//input[@type='text']")
input.focus.type("Ruby headless driver for Chrome", :Enter)
browser.at_css("a > h3").text # => "rubycdp/ferrum: Ruby Chrome/Chromium driver - GitHub"
browser.quit
Evaluate some JavaScript and get full width/height:
browser = Ferrum::Browser.new
browser.goto("https://www.google.com/search?q=Ruby+headless+driver+for+Capybara")
width, height = browser.evaluate <<~JS
  [document.documentElement.offsetWidth,
   document.documentElement.offsetHeight]
JS
# => [1024, 1931]
browser.quit
Do any mouse movements you like:
# Trace a 100x100 square
browser = Ferrum::Browser.new
browser.goto("https://google.com")
browser.mouse
  .move(x: 0, y: 0)
  .down
  .move(x: 0, y: 100)
  .move(x: 100, y: 100)
  .move(x: 100, y: 0)
  .move(x: 0, y: 0)
  .up
browser.quit
When running in Docker as root, you must pass the no-sandbox browser option:
Ferrum::Browser.new(browser_options: { 'no-sandbox': nil })
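Browser options are given as a hash in which a nil value stands for a switch that takes no argument. A minimal sketch of how such a hash could map onto command-line flags follows; the to_flags helper is hypothetical and only illustrates the convention, it is not how Ferrum itself builds the command line.

```ruby
# Hypothetical helper, for illustration only.
def to_flags(browser_options)
  browser_options.map do |name, value|
    # nil means a bare switch such as --no-sandbox
    value.nil? ? "--#{name}" : "--#{name}=#{value}"
  end
end

flags = to_flags({ "no-sandbox" => nil, "window-size" => "1024,768" })
# flags is ["--no-sandbox", "--window-size=1024,768"]
```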
You can customize options with the following code in your test setup:
Ferrum::Browser.new(options)
- options Hash
  - :headless (Boolean) - Set browser as headless or not, true by default.
  - :window_size (Array) - The dimensions of the browser window in which to test, expressed as a 2-element array, e.g. [1024, 768]. Default: [1024, 768]
  - :extensions (Array[String | Hash]) - An array of paths to files or JS source code to be preloaded into the browser, e.g.: ["/path/to/script.js", { source: "window.secret = 'top'" }]
  - :logger (Object responding to puts) - When present, debug output is written to this object.
  - :slowmo (Integer | Float) - Set a delay to wait before sending each command. A useful companion to the :headless option, so that you have time to see changes.
  - :timeout (Numeric) - The number of seconds we'll wait for a response when communicating with the browser. Default is 5.
  - :js_errors (Boolean) - When true, JavaScript errors get re-raised in Ruby.
  - :browser_name (Symbol) - :chrome by default, only experimental support for :firefox for now.
  - :browser_path (String) - Path to the Chrome binary. You can also set it through an ENV variable, as in BROWSER_PATH=some/path/chrome bundle exec rspec.
  - :browser_options (Hash) - Additional command line options, e.g. { "ignore-certificate-errors" => nil }
  - :port (Integer) - Remote debugging port for headless Chrome
  - :host (String) - Remote debugging address for headless Chrome
  - :url (String) - URL for a running instance of Chrome. If this is set, a browser process will not be spawned.
  - :process_timeout (Integer) - How long to wait for the Chrome process to respond on startup
  - :ws_max_receive_size (Integer) - How big messages to accept from Chrome over the web socket, in bytes. Defaults to 64MB. Incoming messages larger than this will cause a Ferrum::DeadBrowserError.
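For orientation, here is a sketch of an options hash that combines several of the settings listed above. The concrete values are arbitrary choices for this example, and the final call is left commented out because it needs the ferrum gem and a local Chrome.

```ruby
options = {
  headless: false,           # show the browser window
  slowmo: 0.5,               # pause before each command so changes are visible
  window_size: [1366, 768],  # width and height in pixels
  timeout: 10,               # seconds to wait for a browser response
  js_errors: true,           # re-raise JavaScript errors in Ruby
  logger: $stdout,           # any object responding to puts
  browser_options: { "ignore-certificate-errors" => nil }
}

# browser = Ferrum::Browser.new(options)
```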
Navigate the page to a given URL.
- url (String) - The URL should include a scheme unless you set base_url when configuring the driver.
browser.goto("https://github.com/")
Navigate to the previous page in history.
browser.goto("https://github.com/")
browser.at_xpath("//a").click
browser.back
Navigate to the next page in history.
browser.goto("https://github.com/")
browser.at_xpath("//a").click
browser.back
browser.forward
Reload current page.
browser.goto("https://github.com/")
browser.refresh
Stop all navigations and the loading of pending resources on the page.
browser.goto("https://github.com/")
browser.stop
Find a node by selector. Runs document.querySelector within the document or the provided node.
- selector (String)
- options (Hash)
  - :within (Node | nil)
browser.goto("https://github.com/")
browser.at_css("a[aria-label='Issues you created']") # => Node
Find nodes by selector. The method runs document.querySelectorAll within the document or the provided node.
- selector (String)
- options (Hash)
  - :within (Node | nil)
browser.goto("https://github.com/")
browser.css("a[aria-label='Issues you created']") # => [Node]
Find a node by XPath.
- selector (String)
- options (Hash)
  - :within (Node | nil)
browser.goto("https://github.com/")
browser.at_xpath("//a[@aria-label='Issues you created']") # => Node
Find nodes by XPath.
- selector (String)
- options (Hash)
  - :within (Node | nil)
browser.goto("https://github.com/")
browser.xpath("//a[@aria-label='Issues you created']") # => [Node]
Returns current top window location href.
browser.goto("https://google.com/")
browser.current_url # => "https://www.google.com/"
Returns current top window title
browser.goto("https://google.com/")
browser.current_title # => "Google"
Returns current page's html.
browser.goto("https://google.com/")
browser.body # => '<html itemscope="" itemtype="http://schema.org/WebPage" lang="ru"><head>...
Saves a screenshot to disk or returns it as Base64.
- options (Hash)
  - :path (String) - to save a screenshot on the disk. :encoding will be set to :binary automatically
  - :encoding (Symbol) - :base64 | :binary, you can set it to return the image as Base64
  - :format (String) - "jpeg" | "png"
  - :quality (Integer) - 0-100, works for jpeg only
  - :full (Boolean) - whether you need a full page screenshot or a viewport
  - :selector (String) - CSS selector for a given element
  - :scale (Float) - zoom in/out
browser.goto("https://google.com/")
# Save on the disk in PNG
browser.screenshot(path: "google.png") # => 134660
# Save on the disk in JPG
browser.screenshot(path: "google.jpg") # => 30902
# Save to Base64 the whole page not only viewport and reduce quality
browser.screenshot(full: true, quality: 60) # "iVBORw0KGgoAAAANSUhEUgAABAAAAAMACAYAAAC6uhUNAAAAAXNSR0IArs4c6Q...
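When no :path is given, the screenshot comes back as a Base64 string, which you may want to persist yourself. The sketch below decodes such a string and writes the bytes in binary mode. The write_base64_image helper is hypothetical, and the sample string is only the Base64 form of the 8-byte PNG file signature, not a real screenshot.

```ruby
require "tmpdir"

# Hypothetical helper: decode a Base64 payload and write it to disk.
def write_base64_image(data, path)
  bytes = data.unpack1("m")  # "m" is Ruby's built-in Base64 directive
  File.binwrite(path, bytes) # returns the number of bytes written
end

# Base64 of the PNG file signature only, used here as stand-in data.
sample = "iVBORw0KGgo="
path = File.join(Dir.tmpdir, "signature.bin")
written = write_base64_image(sample, path) # 8 bytes for this sample
```

Writing in binary mode matters because image data is not text; a text-mode write could alter bytes on some platforms.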
Saves a PDF to disk or returns it as Base64.
- options (Hash)
  - :path (String) - to save a PDF on the disk. :encoding will be set to :binary automatically
  - :encoding (Symbol) - :base64 | :binary, you can set it to return the PDF as Base64
  - :landscape (Boolean) - paper orientation. Defaults to false.
  - :scale (Float) - zoom in/out
  - :format (Symbol) - standard paper sizes :letter, :legal, :tabloid, :ledger, :A0, :A1, :A2, :A3, :A4, :A5, :A6
  - :paper_width (Float) - set paper width
  - :paper_height (Float) - set paper height
  - Other native options can be passed as well
browser.goto("https://google.com/")
# Save to disk as a PDF
browser.pdf(path: "google.pdf", paper_width: 1.0, paper_height: 1.0) # => 14983
browser.network
Returns all information about network traffic as Network::Exchange instances, each of which is essentially a wrapper around request, response and error.
browser.goto("https://github.com/")
browser.network.traffic # => [#<Ferrum::Network::Exchange, ...]
Page request of the main frame.
browser.goto("https://github.com/")
browser.network.request # => #<Ferrum::Network::Request...
Page response of the main frame.
browser.goto("https://github.com/")
browser.network.response # => #<Ferrum::Network::Response...
Contains the status code of the main page response (e.g., 200 for a success). This is just a shortcut for response.status.
browser.goto("https://github.com/")
browser.network.status # => 200
Waits for network idle or raises Ferrum::TimeoutError.
- options (Hash)
  - :connections (Integer) - how many connections are allowed for the network to count as idle, 0 by default
  - :duration (Float) - sleep for the given amount of time and check again, 0.05 by default
  - :timeout (Float) - for how long we keep checking for idle, browser.timeout by default
browser.goto("https://example.com/")
browser.at_xpath("//a[text() = 'No UI changes button']").click
browser.network.wait_for_idle
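To make the three options concrete, here is a self-contained sketch of a polling loop with the same shape: check a pending-connection count, sleep for duration, and give up after timeout. The wait_until_idle method is a hypothetical stand-in written for this example, the block supplying the count replaces real network tracking, and Timeout::Error is used in place of Ferrum::TimeoutError.

```ruby
require "timeout"

# Hypothetical stand-in for the polling described above. The block must
# return the number of connections that are still pending.
def wait_until_idle(connections: 0, duration: 0.05, timeout: 5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    pending_count = yield
    return true if pending_count <= connections

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise Timeout::Error, "network did not become idle within #{timeout}s"
    end

    sleep(duration) # wait a little, then check again
  end
end

# Simulated traffic: two pending requests, then one, then none.
samples = [2, 1, 0]
wait_until_idle(duration: 0.01, timeout: 1) { samples.shift }
```

A monotonic clock is used for the deadline so that system clock adjustments cannot shorten or extend the wait.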
Clear browser's cache or collected traffic.
- type (Symbol) - it is either :traffic or :cache
traffic = browser.network.traffic # => []
browser.goto("https://github.com/")
traffic.size # => 51
browser.network.clear(:traffic)
traffic.size # => 0
Set request interception for the given options. This method only sets up request interception; you should use the on callback to catch requests and abort or continue them.
- options (Hash)
  - :pattern (String) - * by default
  - :resource_type (Symbol) - one of the resource types
browser = Ferrum::Browser.new
browser.network.intercept
browser.on(:request) do |request|
  if request.match?(/bla-bla/)
    request.abort
  elsif request.match?(/lorem/)
    request.respond(body: "Lorem ipsum")
  else
    request.continue
  end
end
browser.goto("https://google.com")
If a site uses authorization, you can provide credentials using this method.
- options (Hash)
  - :type (Symbol) - :server | :proxy, site or proxy authorization
  - :user (String)
  - :password (String)
browser.network.authorize(user: "login", password: "pass")
browser.goto("http://example.com/authenticated")
puts browser.network.status # => 200
puts browser.body # => Welcome, authenticated client
browser.mouse
Scroll the page to a given x, y.
- x (Integer) - the pixel along the horizontal axis of the document that you want displayed in the upper left
- y (Integer) - the pixel along the vertical axis of the document that you want displayed in the upper left
browser.goto("https://www.google.com/search?q=Ruby+headless+driver+for+Capybara")
browser.mouse.scroll_to(0, 400)
Click given coordinates, fires mouse move, down and up events.
- options (Hash)
  - :x (Integer)
  - :y (Integer)
  - :delay (Float) - defaults to 0. Delay between mouse down and mouse up events
  - :button (Symbol) - :left | :right, defaults to :left
  - :count (Integer) - defaults to 1
  - :modifiers (Integer) - bitfield for key modifiers. See keyboard.modifiers
Mouse down for given coordinates.
- options (Hash)
  - :button (Symbol) - :left | :right, defaults to :left
  - :count (Integer) - defaults to 1
  - :modifiers (Integer) - bitfield for key modifiers. See keyboard.modifiers
Mouse up for given coordinates.
- options (Hash)
  - :button (Symbol) - :left | :right, defaults to :left
  - :count (Integer) - defaults to 1
  - :modifiers (Integer) - bitfield for key modifiers. See keyboard.modifiers
Mouse move to given x and y.
- options (Hash)
  - :x (Integer)
  - :y (Integer)
  - :steps (Integer) - defaults to 1. Sends intermediate mousemove events.
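The :steps option asks for intermediate mousemove events between the current position and the target. One plausible way to compute those intermediate points is linear interpolation, sketched below. The intermediate_points helper is hypothetical, and whether Ferrum spaces the points exactly this way is an assumption.

```ruby
# Hypothetical helper: evenly spaced points from `from` (exclusive)
# to `to` (inclusive), one per step.
def intermediate_points(from, to, steps: 1)
  (1..steps).map do |i|
    ratio = i.to_f / steps
    {
      x: from[:x] + (to[:x] - from[:x]) * ratio,
      y: from[:y] + (to[:y] - from[:y]) * ratio
    }
  end
end

points = intermediate_points({ x: 0, y: 0 }, { x: 100, y: 50 }, steps: 4)
# four points: (25.0, 12.5), (50.0, 25.0), (75.0, 37.5), (100.0, 50.0)
```

With steps set to 1 the only point produced is the target itself, which matches a single direct move.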
browser.keyboard
Dispatches a keydown event.
- key (String | Symbol) - Name of key such as "a", :enter, :backspace
Dispatches a keyup event.
- key (String | Symbol) - Name of key such as "b", :enter, :backspace
Sends a keydown, keypress/input, and keyup event for each character in the text.
- text (String | Array<String> | Array<Symbol>) - A text to type into a focused element, e.g. [:Shift, "s"], "tring"
Returns a bitfield for the given keys.
- keys (Array<Symbol>) - :alt | :ctrl | :command | :shift
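A bitfield packs several on/off flags into one integer by giving each key its own bit. The sketch below uses the modifier values defined by the Chrome DevTools Protocol (Alt=1, Ctrl=2, Meta/Command=4, Shift=8). The modifier_bits helper is hypothetical and written for this example; it is not Ferrum's own method.

```ruby
# Hypothetical helper: OR the bit of every requested key into one integer.
def modifier_bits(keys, table = { alt: 1, ctrl: 2, command: 4, shift: 8 })
  keys.map { |key| table.fetch(key) }.reduce(0, :|)
end

modifier_bits([:shift])        # => 8
modifier_bits([:ctrl, :shift]) # => 10
modifier_bits([])              # => 0
```

The resulting integer is the kind of value the :modifiers option of the mouse methods expects.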
browser.cookies
Returns cookies hash
browser.cookies.all # => {"NID"=>#<Ferrum::Cookies::Cookie:0x0000558624b37a40 @attributes={"name"=>"NID", "value"=>"...", "domain"=>".google.com", "path"=>"/", "expires"=>1583211046.575681, "size"=>178, "httpOnly"=>true, "secure"=>false, "session"=>false}>}
Returns a cookie.
- value (String)
browser.cookies["NID"] # => <Ferrum::Cookies::Cookie:0x0000558624b67a88 @attributes={"name"=>"NID", "value"=>"...", "domain"=>".google.com", "path"=>"/", "expires"=>1583211046.575681, "size"=>178, "httpOnly"=>true, "secure"=>false, "session"=>false}>
Sets the given values as a cookie.
- options (Hash)
  - :name (String)
  - :value (String)
  - :domain (String)
  - :expires (Integer)
  - :samesite (String)
  - :httponly (Boolean)
browser.cookies.set(name: "stealth", value: "omg", domain: "google.com") # => true
Removes the given cookie.
- options (Hash)
  - :name (String)
  - :domain (String)
  - :url (String)
browser.cookies.remove(name: "stealth", domain: "google.com") # => true
Removes all cookies for current page
browser.cookies.clear # => true
browser.headers
Get all headers.
Set given headers. This effectively clears all headers and sets the given ones.
- headers (Hash) - key-value pairs, for example "User-Agent" => "Browser"
Adds given headers to the ones already set.
- headers (Hash) - key-value pairs, for example "Referer" => "http://example.com"
Clear all headers.
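The difference between setting and adding headers is easiest to see on plain hashes. In the sketch below, set_headers and add_headers are hypothetical stand-ins that model the replace and merge semantics described above; they do not talk to a browser.

```ruby
# Plain hashes stand in for the browser's extra headers in this sketch.
def set_headers(_current, given)
  given.dup # replace: previously set headers are dropped
end

def add_headers(current, given)
  current.merge(given) # merge: existing headers stay, same keys are overwritten
end

headers = set_headers({}, { "User-Agent" => "Browser" })
headers = add_headers(headers, { "Referer" => "http://example.com" })
# headers now holds both User-Agent and Referer

headers = set_headers(headers, { "Accept-Language" => "en" })
# headers now holds only Accept-Language
```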
Evaluate and return the result for a given JS expression.
- expression (String) - should be valid JavaScript
- args (Object) - you can pass arguments, though each should be a valid Node or a simple value.
browser.evaluate("[window.scrollX, window.scrollY]")
Evaluate an asynchronous expression and return the result.
- expression (String) - should be valid JavaScript
- wait_time - How long we should wait for the Promise to resolve or reject
- args (Object) - you can pass arguments, though each should be a valid Node or a simple value.
browser.evaluate_async(%(arguments[0]({foo: "bar"})), 5) # => { "foo" => "bar" }
Execute an expression. Doesn't return the result.
- expression (String) - should be valid JavaScript
- args (Object) - you can pass arguments, though each should be a valid Node or a simple value.
browser.execute(%(1 + 1)) # => true
- options (Hash)
  - :url (String)
  - :path (String)
  - :content (String)
  - :type (String) - text/javascript by default
browser.add_script_tag(url: "http://example.com/stylesheet.css") # => true
- options (Hash)
  - :url (String)
  - :path (String)
  - :content (String)
browser.add_style_tag(content: "h1 { font-size: 40px; }") # => true
- enabled (Boolean) - true by default
browser.bypass_csp # => true
browser.goto("https://github.com/ruby-concurrency/concurrent-ruby/blob/master/docs-source/promises.in.md")
browser.refresh
browser.add_script_tag(content: "window.__injected = 42")
browser.evaluate("window.__injected") # => 42
Returns all the frames the current page has.
browser.goto("https://www.w3schools.com/tags/tag_frame.asp")
browser.frames # =>
# [
# #<Ferrum::Frame @id="C6D104CE454A025FBCF22B98DE612B12" @parent_id=nil @name=nil @state=:stopped_loading @execution_id=1>,
# #<Ferrum::Frame @id="C09C4E4404314AAEAE85928EAC109A93" @parent_id="C6D104CE454A025FBCF22B98DE612B12" @state=:stopped_loading @execution_id=2>,
# #<Ferrum::Frame @id="2E9C7F476ED09D87A42F2FEE3C6FBC3C" @parent_id="C6D104CE454A025FBCF22B98DE612B12" @state=:stopped_loading @execution_id=3>,
# ...
# ]
Returns page's main frame, the top of the tree and the parent of all frames.
Find a frame by the given options.
- options (Hash)
  - :id (String) - Unique frame's id that the browser provides
  - :name (String) - Frame's name if there's one
browser.frame_by(id: "C6D104CE454A025FBCF22B98DE612B12")
Frame's unique id.
Parent frame id if this one is nested in another one.
Execution context id which is used by JS; each frame has its own context in which JS evaluates.
If the frame was given a name, it should be here.
One of the states the frame is in:
- :started_loading
- :navigated
- :stopped_loading
Returns current frame's location href.
browser.goto("https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe")
frame = browser.frames[1]
frame.url # => https://interactive-examples.mdn.mozilla.net/pages/tabbed/iframe.html
Returns current frame's title.
browser.goto("https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe")
frame = browser.frames[1]
frame.title # => HTML Demo: <iframe>
If current frame is the main frame of the page (top of the tree).
browser.goto("https://www.w3schools.com/tags/tag_frame.asp")
frame = browser.frame_by(id: "C09C4E4404314AAEAE85928EAC109A93")
frame.main? # => false
Returns current frame's top window location href.
browser.goto("https://www.w3schools.com/tags/tag_frame.asp")
frame = browser.frame_by(id: "C09C4E4404314AAEAE85928EAC109A93")
frame.current_url # => "https://www.w3schools.com/tags/tag_frame.asp"
Returns current frame's top window title.
browser.goto("https://www.w3schools.com/tags/tag_frame.asp")
frame = browser.frame_by(id: "C09C4E4404314AAEAE85928EAC109A93")
frame.current_title # => "HTML frame tag"
Returns current frame's html.
browser.goto("https://www.w3schools.com/tags/tag_frame.asp")
frame = browser.frame_by(id: "C09C4E4404314AAEAE85928EAC109A93")
frame.body # => "<html><head></head><body></body></html>"
Returns current frame's doctype.
browser.goto("https://www.w3schools.com/tags/tag_frame.asp")
browser.main_frame.doctype # => "<!DOCTYPE html>"
Sets the content of a given frame.
- html (String)
browser.goto("https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe")
frame = browser.frames[1]
frame.body # <html lang="en"><head><style>body {transition: opacity ease-in 0.2s; }...
frame.set_content("<html><head></head><body><p>lol</p></body></html>")
frame.body # => <html><head></head><body><p>lol</p></body></html>
Accept the dialog with the given text, or the default prompt if applicable.
- text (String)
Dismiss the dialog.
browser = Ferrum::Browser.new
browser.on(:dialog) do |dialog|
  if dialog.match?(/bla-bla/)
    dialog.accept
  else
    dialog.dismiss
  end
end
browser.goto("https://google.com")
Ferrum is fully thread-safe. You can create one browser or several, as you wish, and start playing around with threads. The example below shows how to create a few pages that share the same context. A context is similar to an incognito profile, but you can have more than one; think of it as an independent browser session:
browser = Ferrum::Browser.new
context = browser.contexts.create
t1 = Thread.new(context) do |c|
  page = c.create_page
  page.goto("https://www.google.com/search?q=Ruby+headless+driver+for+Capybara")
  page.screenshot(path: "t1.png")
end
t2 = Thread.new(context) do |c|
  page = c.create_page
  page.goto("https://www.google.com/search?q=Ruby+static+typing")
  page.screenshot(path: "t2.png")
end
t1.join
t2.join
context.dispose
browser.quit
Or you can create two independent contexts:
browser = Ferrum::Browser.new
t1 = Thread.new(browser) do |b|
  context = b.contexts.create
  page = context.create_page
  page.goto("https://www.google.com/search?q=Ruby+headless+driver+for+Capybara")
  page.screenshot(path: "t1.png")
  context.dispose
end
t2 = Thread.new(browser) do |b|
  context = b.contexts.create
  page = context.create_page
  page.goto("https://www.google.com/search?q=Ruby+static+typing")
  page.screenshot(path: "t2.png")
  context.dispose
end
t1.join
t2.join
browser.quit
Copyright 2018-2020 Machinio
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.