Commit
Remove the check for internet connection (elastic#1517)
Remove the check for Rally being online. Let it fail when trying to update the repositories instead.
j-bennet authored Jul 26, 2022
1 parent 20f7600 commit 8e2f733
Showing 10 changed files with 89 additions and 96 deletions.
5 changes: 2 additions & 3 deletions docs/configuration.rst
@@ -19,7 +19,6 @@ system
 This section contains global information for the current benchmark environment. This information should be identical on all machines where Rally is installed.

 * ``env.name`` (default: "local"): The name of this benchmark environment. It is used as meta-data in metrics documents if an Elasticsearch metrics store is configured. Only alphanumeric characters are allowed.
-* ``probing.url`` (default: "https://github.com"): This URL is used by Rally to check for a working Internet connection. It's useful to change this to an internal server if all data are hosted inside the corporate network and connections to the outside world are prohibited.
 * ``available.cores`` (default: number of logical CPU cores): Determines the number of available CPU cores. Rally aims to create one asyncio event loop per core and will distribute clients evenly across event loops.
 * ``async.debug`` (default: false): Enables debug mode on Rally's internal `asyncio event loop <https://docs.python.org/3/library/asyncio-eventloop.html#enabling-debug-mode>`_. This setting is mainly intended for troubleshooting.
 * ``passenv`` (default: "PATH"): A comma-separated list of environment variable names that should be passed to the Elasticsearch process.
@@ -140,7 +139,7 @@ Rally downloads all necessary data automatically for you:
 * Track meta-data from Github
 * Track data from an S3 bucket

-Hence, it needs to connect via http(s) to the outside world. If you are behind a corporate proxy you need to configure Rally and git. As many other Unix programs, Rally relies that the HTTP proxy URL is available in the environment variable ``http_proxy`` (note that this is in lower-case). Hence, you should add this line to your shell profile, e.g. ``~/.bash_profile``::
+Hence, it needs to connect via http(s) to the outside world. If you are behind a corporate proxy you need to configure Rally and git. As many other Unix programs, Rally relies that the proxy URL is available in the environment variables ``http_proxy`` (lowercase only), ``https_proxy`` or ``HTTPS_PROXY``, ``all_proxy`` or ``ALL_PROXY``. Hence, you should add this line to your shell profile, e.g. ``~/.bash_profile``::

     export http_proxy=http://proxy.acme.org:8888/

@@ -158,7 +157,7 @@ If the configuration is correct, git will clone this repository. You can delete

 To verify that Rally will connect via the proxy server you can check the log file. If the proxy server is configured successfully, Rally will log the following line on startup::

-    Rally connects via proxy URL [http://proxy.acme.org:3128/] to the Internet (picked up from the environment variable [http_proxy]).
+    Connecting via proxy URL [http://proxy.acme.org:3128/] to the Internet (picked up from the environment variable [http_proxy]).


 .. note::
2 changes: 1 addition & 1 deletion docs/faq.rst
@@ -95,6 +95,6 @@ No. Rally does not collect or send any usage data and also the complete source c
 Do I need an Internet connection?
 ---------------------------------

-You do NOT need Internet access on any node of your Elasticsearch cluster but the machine where you start Rally needs an Internet connection to download track data sets and Elasticsearch distributions. After it has downloaded all data, an Internet connection is not required anymore and you can specify ``--offline``. If Rally detects no active Internet connection, it will automatically enable offline mode and warn you.
+You do NOT need Internet access on any node of your Elasticsearch cluster but the machine where you start Rally needs an Internet connection to download track data sets and Elasticsearch distributions. After it has downloaded all data, an Internet connection is not required anymore and you can specify ``--offline``.

 We have a dedicated documentation page for :doc:`running Rally offline </offline>` which should cover all necessary details.
13 changes: 1 addition & 12 deletions docs/offline.rst
@@ -11,18 +11,7 @@ We provide a special offline installation package. Follow the :ref:`offline inst
 Command Line Usage
 ------------------

-Rally will automatically detect upon startup that no Internet connection is available and print the following warning::
-
-    [WARNING] No Internet connection detected. Automatic download of track data sets etc. is disabled.
-
-It detects this by trying to connect to ``https://github.com``. If you want it to probe against a different HTTP endpoint (e.g. a company-internal git server) you need to add a configuration property named ``probing.url`` in the ``system`` section of Rally's configuration file at ``~/.rally/rally.ini``. Specify ``--offline`` if you want to disable probing entirely.
-
-Example of ``system`` section with custom probing url in ``~/.rally/rally.ini``::
-
-    [system]
-    env.name = local
-    probing.url = https://www.company-internal-server.com/
-
+Rally will attempt to update tracks and teams repositories configured in ``rally.ini``, unless it's being run with the ``--offline`` flag.

 Using tracks
 ------------
28 changes: 21 additions & 7 deletions esrally/exceptions.py
@@ -16,21 +16,32 @@
 # under the License.


+MSG_NO_CONNECTION = "You may need to specify --offline if running without Internet connection."
+
+
 class RallyError(Exception):
     """
     Base class for all Rally exceptions
     """

     def __init__(self, message, cause=None):
-        super().__init__(message, cause)
+        super().__init__(message)
         self.message = message
         self.cause = cause

-    def __repr__(self):
-        return self.message
-
-    def __str__(self):
-        return self.message
+    @property
+    def full_message(self):
+        msg = str(self.message)
+        nesting = 0
+        current_exc = self
+        while hasattr(current_exc, "cause") and current_exc.cause:
+            nesting += 1
+            current_exc = current_exc.cause
+            if hasattr(current_exc, "message"):
+                msg += "\n%s%s" % ("\t" * nesting, current_exc.message)
+            else:
+                msg += "\n%s%s" % ("\t" * nesting, str(current_exc))
+        return msg


 class LaunchError(RallyError):
@@ -68,7 +79,10 @@ class DataError(RallyError):


 class SupplyError(RallyError):
-    pass
+    def __init__(self, message, cause=None):
+        super().__init__(message, cause)
+        if MSG_NO_CONNECTION not in self.full_message:
+            self.message += f" {MSG_NO_CONNECTION}"


 class BuildError(RallyError):
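The interaction between ``full_message`` and the new ``SupplyError`` constructor is easy to misread in diff form, so here is a minimal standalone sketch. It copies the two classes from the hunks above (docstrings omitted); the error texts used in the example are made up for illustration and are not taken from Rally.

```python
MSG_NO_CONNECTION = "You may need to specify --offline if running without Internet connection."


class RallyError(Exception):
    def __init__(self, message, cause=None):
        super().__init__(message)
        self.message = message
        self.cause = cause

    @property
    def full_message(self):
        # Walk the chain of causes, indenting each level by one more tab.
        msg = str(self.message)
        nesting = 0
        current_exc = self
        while hasattr(current_exc, "cause") and current_exc.cause:
            nesting += 1
            current_exc = current_exc.cause
            if hasattr(current_exc, "message"):
                msg += "\n%s%s" % ("\t" * nesting, current_exc.message)
            else:
                msg += "\n%s%s" % ("\t" * nesting, str(current_exc))
        return msg


class SupplyError(RallyError):
    def __init__(self, message, cause=None):
        super().__init__(message, cause)
        # Append the hint only if no error further down the chain carries it already.
        if MSG_NO_CONNECTION not in self.full_message:
            self.message += f" {MSG_NO_CONNECTION}"


# Illustrative messages only (assumptions, not Rally's real error texts).
inner = SupplyError("Could not fetch from remote")
outer = SupplyError("Could not update tracks", cause=inner)
plain = RallyError("x", cause=OSError("y"))

print(outer.full_message)
```

In this sketch the ``--offline`` hint is attached to ``inner`` and therefore not repeated on ``outer``, so it appears once in the combined message. A cause that is not a ``RallyError`` (here an ``OSError``) is rendered through ``str()``.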
25 changes: 5 additions & 20 deletions esrally/rally.py
@@ -915,6 +915,7 @@ def dispatch_sub_command(arg_parser, args, cfg):

     cfg.add(config.Scope.application, "system", "quiet.mode", args.quiet)
     cfg.add(config.Scope.application, "system", "offline.mode", args.offline)
+    logger = logging.getLogger(__name__)

     try:
         if sub_command == "compare":
@@ -1013,27 +1014,17 @@ def dispatch_sub_command(arg_parser, args, cfg):
                 raise exceptions.SystemSetupError(f"Unknown subcommand [{sub_command}]")
         return ExitStatus.SUCCESSFUL
     except (exceptions.UserInterrupted, KeyboardInterrupt) as e:
-        logging.getLogger(__name__).info("User has cancelled the subcommand [%s].", sub_command, exc_info=e)
+        logger.info("User has cancelled the subcommand [%s].", sub_command, exc_info=e)
         console.info("Aborted %s. %s" % (sub_command, e))
         return ExitStatus.INTERRUPTED
     except exceptions.RallyError as e:
-        logging.getLogger(__name__).exception("Cannot run subcommand [%s].", sub_command)
-        msg = str(e.message)
-        nesting = 0
-        while hasattr(e, "cause") and e.cause:
-            nesting += 1
-            e = e.cause
-            if hasattr(e, "message"):
-                msg += "\n%s%s" % ("\t" * nesting, e.message)
-            else:
-                msg += "\n%s%s" % ("\t" * nesting, str(e))
-
-        console.error("Cannot %s. %s" % (sub_command, msg))
+        logger.exception("Cannot run subcommand [%s].", sub_command)
+        console.error("Cannot %s. %s" % (sub_command, e.full_message))
         console.println("")
         print_help_on_errors()
         return ExitStatus.ERROR
     except BaseException as e:
-        logging.getLogger(__name__).exception("A fatal error occurred while running subcommand [%s].", sub_command)
+        logger.exception("A fatal error occurred while running subcommand [%s].", sub_command)
         console.error("Cannot %s. %s." % (sub_command, e))
         console.println("")
         print_help_on_errors()
@@ -1075,12 +1066,6 @@ def main():
     logger.debug("Command line arguments: %s", args)
     # Configure networking
     net.init()
-    if not args.offline:
-        probing_url = cfg.opts("system", "probing.url", default_value="https://github.com", mandatory=False)
-        if not net.has_internet_connection(probing_url):
-            console.warn("No Internet connection detected. Specify --offline to run without it.", logger=logger)
-            sys.exit(0)
-        logger.info("Detected a working Internet connection.")

     def _trap(function, path, exc_info):
         if exc_info[0] == FileNotFoundError:
57 changes: 25 additions & 32 deletions esrally/utils/net.py
@@ -27,26 +27,35 @@
 from esrally import exceptions
 from esrally.utils import console, convert

-__HTTP = None
+_HTTP = None
+_HTTPS = None


-def init():
-    logger = logging.getLogger(__name__)
-    global __HTTP
-    proxy_url = os.getenv("http_proxy")
+def __proxy_manager_from_env(env_var, logger):
+    proxy_url = os.getenv(env_var.lower()) or os.getenv(env_var.upper())
+    if not proxy_url:
+        env_var = "all_proxy"
+        proxy_url = os.getenv(env_var) or os.getenv(env_var.upper())
     if proxy_url and len(proxy_url) > 0:
         parsed_url = urllib3.util.parse_url(proxy_url)
-        logger.info("Connecting via proxy URL [%s] to the Internet (picked up from the env variable [http_proxy]).", proxy_url)
-        __HTTP = urllib3.ProxyManager(
+        logger.info("Connecting via proxy URL [%s] to the Internet (picked up from the environment variable [%s]).", proxy_url, env_var)
+        return urllib3.ProxyManager(
             proxy_url,
             cert_reqs="CERT_REQUIRED",
             ca_certs=certifi.where(),
             # appropriate headers will only be set if there is auth info
             proxy_headers=urllib3.make_headers(proxy_basic_auth=parsed_url.auth),
         )
     else:
-        logger.info("Connecting directly to the Internet (no proxy support).")
-        __HTTP = urllib3.PoolManager(cert_reqs="CERT_REQUIRED", ca_certs=certifi.where())
+        logger.info("Connecting directly to the Internet (no proxy support) for [%s].", env_var)
+        return urllib3.PoolManager(cert_reqs="CERT_REQUIRED", ca_certs=certifi.where())
+
+
+def init():
+    logger = logging.getLogger(__name__)
+    global _HTTP, _HTTPS
+    _HTTP = __proxy_manager_from_env("http_proxy", logger)
+    _HTTPS = __proxy_manager_from_env("https_proxy", logger)


 class Progress:
@@ -174,7 +183,7 @@ def download_from_bucket(blobstore, url, local_path, expected_size_in_bytes=None


 def download_http(url, local_path, expected_size_in_bytes=None, progress_indicator=None):
-    with __http().request(
+    with _request(
         "GET", url, preload_content=False, enforce_content_length=True, retries=10, timeout=urllib3.Timeout(connect=45, read=240)
     ) as r, open(local_path, "wb") as out_file:
         if r.status > 299:
@@ -250,32 +259,16 @@ def download(url, local_path, expected_size_in_bytes=None, progress_indicator=No


 def retrieve_content_as_string(url):
-    with __http().request("GET", url, timeout=urllib3.Timeout(connect=45, read=240)) as response:
+    with _request("GET", url, timeout=urllib3.Timeout(connect=45, read=240)) as response:
         return response.read().decode("utf-8")


-def has_internet_connection(probing_url):
-    logger = logging.getLogger(__name__)
-    try:
-        # We try to connect to Github by default. We use that to avoid touching too much different remote endpoints.
-        logger.debug("Checking for internet connection against [%s]", probing_url)
-        # We do a HTTP request here to respect the HTTP proxy setting. If we'd open a plain socket connection we circumvent the
-        # proxy and erroneously conclude we don't have an Internet connection.
-        response = __http().request("GET", probing_url, timeout=10.0, retries=8)  # wait up to 90s, 9 requests in total
-        status = response.status
-        logger.debug("Probing result is HTTP status [%s]", str(status))
-        return status == 200
-    except KeyboardInterrupt:
-        raise
-    except BaseException:
-        logger.info("Could not detect a working Internet connection", exc_info=True)
-        return False
-
-
-def __http():
-    if not __HTTP:
+def _request(method, url, **kwargs):
+    if not _HTTP or not _HTTPS:
         init()
-    return __HTTP
+    parsed_url = urllib3.util.parse_url(url)
+    manager = _HTTPS if parsed_url.scheme == "https" else _HTTP
+    return manager.request(method, url, **kwargs)


 def resolve(hostname_or_ip):
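The lookup order in ``__proxy_manager_from_env`` and the scheme-based choice in ``_request`` can be illustrated without urllib3. The following is a simplified sketch, not the code from this commit: ``proxy_from_env`` and ``manager_name_for`` are hypothetical helper names, the environment is passed in as a plain dict, and the standard library's ``urlparse`` stands in for ``urllib3.util.parse_url``. Note that, as written in the diff, the code consults both the lower-case and the upper-case spelling of each variable.

```python
from urllib.parse import urlparse


def proxy_from_env(env_var, environ):
    """Return (proxy_url, variable_name) using the same lookup order as the diff.

    Hypothetical helper: the real function builds a urllib3 manager instead.
    """
    proxy_url = environ.get(env_var.lower()) or environ.get(env_var.upper())
    if not proxy_url:
        # Fall back to the catch-all variable, lower-case first.
        env_var = "all_proxy"
        proxy_url = environ.get(env_var) or environ.get(env_var.upper())
    return (proxy_url or None), env_var


def manager_name_for(url):
    """Name of the module-level manager that _request would pick for this URL."""
    return "_HTTPS" if urlparse(url).scheme == "https" else "_HTTP"


# Example environment (an assumption for illustration).
environ = {"https_proxy": "http://proxy.acme.org:3128/"}

print(proxy_from_env("https_proxy", environ))
print(proxy_from_env("http_proxy", environ))
print(manager_name_for("https://github.com"))
```

With only ``https_proxy`` set, the sketch resolves a proxy for HTTPS requests and none for plain HTTP, which corresponds to the two separate managers ``_HTTPS`` and ``_HTTP`` created in ``init()``.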
6 changes: 1 addition & 5 deletions esrally/utils/process.py
@@ -28,14 +28,10 @@ def run_subprocess(command_line):
     return subprocess.call(command_line, shell=True)


-def run_subprocess_with_output(command_line, env_vars=None):
+def run_subprocess_with_output(command_line, env=None):
     logger = logging.getLogger(__name__)
     logger.debug("Running subprocess [%s] with output.", command_line)
     command_line_args = shlex.split(command_line)
-    env = None
-    if env_vars:
-        env = os.environ.copy()
-        env.update(env_vars)
     with subprocess.Popen(command_line_args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, env=env) as command_line_process:
         has_output = True
         lines = []
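Because ``env`` is now handed to ``subprocess.Popen`` unchanged, a caller that only wants to override a few variables has to build the full environment itself; passing a dict with just the overrides would replace the child's environment rather than extend it. A minimal sketch of the calling pattern follows, where ``run_with_output`` is a hypothetical stand-in for ``run_subprocess_with_output`` and the child command is chosen only for demonstration.

```python
import os
import subprocess
import sys


def run_with_output(args, env=None):
    """Hypothetical stand-in: run a command and return its output lines.

    As in the changed function, env is passed to Popen as-is.
    """
    with subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, env=env) as proc:
        out, _ = proc.communicate()
    return out.decode("utf-8").splitlines()


# Copy the current environment first, then apply the overrides.
env = os.environ.copy()
env["http_proxy"] = "http://invalid"

lines = run_with_output([sys.executable, "-c", "import os; print(os.environ['http_proxy'])"], env=env)
print(lines)
```

The updated integration tests in this commit follow the same pattern: they copy ``os.environ`` and then set ``http_proxy`` and ``https_proxy`` before invoking Rally.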
7 changes: 5 additions & 2 deletions esrally/utils/repo.py
@@ -44,8 +44,11 @@ def __init__(self, remote_url, root_dir, repo_name, resource_name, offline, fetc
             else:
                 try:
                     git.fetch(src=self.repo_dir, remote="origin")
-                except exceptions.SupplyError:
-                    console.warn("Could not update %s. Continuing with your locally available state." % self.resource_name)
+                except exceptions.SupplyError as e:
+                    console.warn(
+                        "Could not update %s. Continuing with your locally available state. Original error: %s\n"
+                        % (self.resource_name, e.message)
+                    )
         else:
             if not git.is_working_copy(self.repo_dir):
                 if io.exists(self.repo_dir):
16 changes: 12 additions & 4 deletions it/basic_test.py
@@ -14,6 +14,8 @@
 # KIND, either express or implied. See the License for the
 # specific language governing permissions and limitations
 # under the License.
+import os
+import tempfile

 import it
 from esrally.utils import process
@@ -37,7 +39,13 @@ def test_run_with_help(cfg):

 @it.rally_in_mem
 def test_run_without_http_connection(cfg):
-    cmd = it.esrally_command_line_for(cfg, "list races")
-    output = process.run_subprocess_with_output(cmd, {"http_proxy": "http://invalid"})
-    expected = "No Internet connection detected. Specify --offline"
-    assert expected in "\n".join(output)
+    cmd = it.esrally_command_line_for(cfg, "list tracks")
+    with tempfile.TemporaryDirectory() as tmpdir:
+        env = os.environ.copy()
+        env["http_proxy"] = "http://invalid"
+        env["https_proxy"] = "http://invalid"
+        # make sure we don't have any saved state
+        env["RALLY_HOME"] = tmpdir
+        output = process.run_subprocess_with_output(cmd, env=env)
+        expected = "[ERROR] Cannot list"
+        assert expected in "\n".join(output)
26 changes: 16 additions & 10 deletions it/proxy_test.py
@@ -75,20 +75,26 @@ def test_run_with_direct_internet_connection(cfg, http_proxy, fresh_log_file):


 @it.rally_in_mem
-def test_anonymous_proxy_no_connection(cfg, http_proxy, fresh_log_file):
+def test_anonymous_proxy_no_connection(cfg, http_proxy):
     env = dict(os.environ)
     env["http_proxy"] = http_proxy.anonymous_url
-    assert process.run_subprocess_with_logging(it.esrally_command_line_for(cfg, "list tracks"), env=env) == 0
-    assert_log_line_present(fresh_log_file, f"Connecting via proxy URL [{http_proxy.anonymous_url}] to the Internet")
-    # unauthenticated proxy access is prevented
-    assert_log_line_present(fresh_log_file, "No Internet connection detected. Specify --offline")
+    env["https_proxy"] = http_proxy.anonymous_url
+    lines = process.run_subprocess_with_output(it.esrally_command_line_for(cfg, "list tracks"), env=env)
+    output = "\n".join(lines)
+    # there should be a warning because we can't connect
+    assert "[WARNING] Could not update tracks." in output
+    # still, the command succeeds because of local state
+    assert "[INFO] SUCCESS" in output


 @it.rally_in_mem
-def test_authenticated_proxy_user_can_connect(cfg, http_proxy, fresh_log_file):
+def test_authenticated_proxy_user_can_connect(cfg, http_proxy):
     env = dict(os.environ)
     env["http_proxy"] = http_proxy.authenticated_url
-    assert process.run_subprocess_with_logging(it.esrally_command_line_for(cfg, "list tracks"), env=env) == 0
-    assert_log_line_present(fresh_log_file, f"Connecting via proxy URL [{http_proxy.authenticated_url}] to the Internet")
-    # authenticated proxy access is allowed
-    assert_log_line_present(fresh_log_file, "Detected a working Internet connection")
+    env["https_proxy"] = http_proxy.authenticated_url
+    lines = process.run_subprocess_with_output(it.esrally_command_line_for(cfg, "list tracks"), env=env)
+    output = "\n".join(lines)
+    # rally should be able to connect, no warning
+    assert "[WARNING] Could not update tracks." not in output
+    # the command should succeed
+    assert "[INFO] SUCCESS" in output
