diff --git a/docs/configuration.rst b/docs/configuration.rst
index 34a6a68f3..e0f75d712 100644
--- a/docs/configuration.rst
+++ b/docs/configuration.rst
@@ -19,7 +19,6 @@ system
 This section contains global information for the current benchmark environment. This information should be identical on all machines where Rally is installed.
 
 * ``env.name`` (default: "local"): The name of this benchmark environment. It is used as meta-data in metrics documents if an Elasticsearch metrics store is configured. Only alphanumeric characters are allowed.
-* ``probing.url`` (default: "https://github.com"): This URL is used by Rally to check for a working Internet connection. It's useful to change this to an internal server if all data are hosted inside the corporate network and connections to the outside world are prohibited.
 * ``available.cores`` (default: number of logical CPU cores): Determines the number of available CPU cores. Rally aims to create one asyncio event loop per core and will distribute clients evenly across event loops.
 * ``async.debug`` (default: false): Enables debug mode on Rally's internal `asyncio event loop `_. This setting is mainly intended for troubleshooting.
 * ``passenv`` (default: "PATH"): A comma-separated list of environment variable names that should be passed to the Elasticsearch process.
@@ -140,7 +139,7 @@ Rally downloads all necessary data automatically for you:
 * Track meta-data from Github
 * Track data from an S3 bucket
 
-Hence, it needs to connect via http(s) to the outside world. If you are behind a corporate proxy you need to configure Rally and git. As many other Unix programs, Rally relies that the HTTP proxy URL is available in the environment variable ``http_proxy`` (note that this is in lower-case). Hence, you should add this line to your shell profile, e.g. ``~/.bash_profile``::
+Hence, it needs to connect via http(s) to the outside world. If you are behind a corporate proxy you need to configure Rally and git. As many other Unix programs, Rally relies that the proxy URL is available in the environment variables ``http_proxy`` (lowercase only), ``https_proxy`` or ``HTTPS_PROXY``, ``all_proxy`` or ``ALL_PROXY``. Hence, you should add this line to your shell profile, e.g. ``~/.bash_profile``::
 
     export http_proxy=http://proxy.acme.org:8888/
 
@@ -158,7 +157,7 @@ If the configuration is correct, git will clone this repository. You can delete
 
 To verify that Rally will connect via the proxy server you can check the log file. If the proxy server is configured successfully, Rally will log the following line on startup::
 
-    Rally connects via proxy URL [http://proxy.acme.org:3128/] to the Internet (picked up from the environment variable [http_proxy]).
+    Connecting via proxy URL [http://proxy.acme.org:3128/] to the Internet (picked up from the environment variable [http_proxy]).
 
 .. note::
 
diff --git a/docs/faq.rst b/docs/faq.rst
index 2422a34f8..fd84f7f12 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -95,6 +95,6 @@ No. Rally does not collect or send any usage data and also the complete source c
 Do I need an Internet connection?
 ---------------------------------
 
-You do NOT need Internet access on any node of your Elasticsearch cluster but the machine where you start Rally needs an Internet connection to download track data sets and Elasticsearch distributions. After it has downloaded all data, an Internet connection is not required anymore and you can specify ``--offline``. If Rally detects no active Internet connection, it will automatically enable offline mode and warn you.
+You do NOT need Internet access on any node of your Elasticsearch cluster but the machine where you start Rally needs an Internet connection to download track data sets and Elasticsearch distributions. After it has downloaded all data, an Internet connection is not required anymore and you can specify ``--offline``.
 
 We have a dedicated documentation page for :doc:`running Rally offline ` which should cover all necessary details.
diff --git a/docs/offline.rst b/docs/offline.rst
index dd9191b66..46476456e 100644
--- a/docs/offline.rst
+++ b/docs/offline.rst
@@ -11,18 +11,7 @@ We provide a special offline installation package. Follow the :ref:`offline inst
 Command Line Usage
 ------------------
 
-Rally will automatically detect upon startup that no Internet connection is available and print the following warning::
-
-    [WARNING] No Internet connection detected. Automatic download of track data sets etc. is disabled.
-
-It detects this by trying to connect to ``https://github.com``. If you want it to probe against a different HTTP endpoint (e.g. a company-internal git server) you need to add a configuration property named ``probing.url`` in the ``system`` section of Rally's configuration file at ``~/.rally/rally.ini``. Specify ``--offline`` if you want to disable probing entirely.
-
-Example of ``system`` section with custom probing url in ``~/.rally/rally.ini``::
-
-    [system]
-    env.name = local
-    probing.url = https://www.company-internal-server.com/
-
+Rally will attempt to update tracks and teams repositories configured in ``rally.ini``, unless it's being run with the ``--offline`` flag.
 
 Using tracks
 ------------
diff --git a/esrally/exceptions.py b/esrally/exceptions.py
index dfc604ea1..91bdd09c2 100644
--- a/esrally/exceptions.py
+++ b/esrally/exceptions.py
@@ -16,21 +16,32 @@
 # under the License.
 
 
+MSG_NO_CONNECTION = "You may need to specify --offline if running without Internet connection."
+
+
 class RallyError(Exception):
     """
     Base class for all Rally exceptions
     """
 
     def __init__(self, message, cause=None):
-        super().__init__(message, cause)
+        super().__init__(message)
         self.message = message
         self.cause = cause
 
-    def __repr__(self):
-        return self.message
-
-    def __str__(self):
-        return self.message
+    @property
+    def full_message(self):
+        msg = str(self.message)
+        nesting = 0
+        current_exc = self
+        while hasattr(current_exc, "cause") and current_exc.cause:
+            nesting += 1
+            current_exc = current_exc.cause
+            if hasattr(current_exc, "message"):
+                msg += "\n%s%s" % ("\t" * nesting, current_exc.message)
+            else:
+                msg += "\n%s%s" % ("\t" * nesting, str(current_exc))
+        return msg
 
 
 class LaunchError(RallyError):
@@ -68,7 +79,10 @@ class DataError(RallyError):
 
 
 class SupplyError(RallyError):
-    pass
+    def __init__(self, message, cause=None):
+        super().__init__(message, cause)
+        if MSG_NO_CONNECTION not in self.full_message:
+            self.message += f" {MSG_NO_CONNECTION}"
 
 
 class BuildError(RallyError):
diff --git a/esrally/rally.py b/esrally/rally.py
index 4ad144882..c47563086 100644
--- a/esrally/rally.py
+++ b/esrally/rally.py
@@ -915,6 +915,7 @@ def dispatch_sub_command(arg_parser, args, cfg):
 
     cfg.add(config.Scope.application, "system", "quiet.mode", args.quiet)
     cfg.add(config.Scope.application, "system", "offline.mode", args.offline)
+    logger = logging.getLogger(__name__)
 
     try:
         if sub_command == "compare":
@@ -1013,27 +1014,17 @@ def dispatch_sub_command(arg_parser, args, cfg):
             raise exceptions.SystemSetupError(f"Unknown subcommand [{sub_command}]")
         return ExitStatus.SUCCESSFUL
     except (exceptions.UserInterrupted, KeyboardInterrupt) as e:
-        logging.getLogger(__name__).info("User has cancelled the subcommand [%s].", sub_command, exc_info=e)
+        logger.info("User has cancelled the subcommand [%s].", sub_command, exc_info=e)
         console.info("Aborted %s. %s" % (sub_command, e))
         return ExitStatus.INTERRUPTED
     except exceptions.RallyError as e:
-        logging.getLogger(__name__).exception("Cannot run subcommand [%s].", sub_command)
-        msg = str(e.message)
-        nesting = 0
-        while hasattr(e, "cause") and e.cause:
-            nesting += 1
-            e = e.cause
-            if hasattr(e, "message"):
-                msg += "\n%s%s" % ("\t" * nesting, e.message)
-            else:
-                msg += "\n%s%s" % ("\t" * nesting, str(e))
-
-        console.error("Cannot %s. %s" % (sub_command, msg))
+        logger.exception("Cannot run subcommand [%s].", sub_command)
+        console.error("Cannot %s. %s" % (sub_command, e.full_message))
         console.println("")
         print_help_on_errors()
         return ExitStatus.ERROR
     except BaseException as e:
-        logging.getLogger(__name__).exception("A fatal error occurred while running subcommand [%s].", sub_command)
+        logger.exception("A fatal error occurred while running subcommand [%s].", sub_command)
         console.error("Cannot %s. %s." % (sub_command, e))
         console.println("")
         print_help_on_errors()
@@ -1075,12 +1066,6 @@ def main():
     logger.debug("Command line arguments: %s", args)
     # Configure networking
     net.init()
-    if not args.offline:
-        probing_url = cfg.opts("system", "probing.url", default_value="https://github.com", mandatory=False)
-        if not net.has_internet_connection(probing_url):
-            console.warn("No Internet connection detected. Specify --offline to run without it.", logger=logger)
-            sys.exit(0)
-        logger.info("Detected a working Internet connection.")
 
     def _trap(function, path, exc_info):
         if exc_info[0] == FileNotFoundError:
diff --git a/esrally/utils/net.py b/esrally/utils/net.py
index bdd51ffc4..97f577a34 100644
--- a/esrally/utils/net.py
+++ b/esrally/utils/net.py
@@ -27,17 +27,19 @@
 from esrally import exceptions
 from esrally.utils import console, convert
 
-__HTTP = None
+_HTTP = None
+_HTTPS = None
 
 
-def init():
-    logger = logging.getLogger(__name__)
-    global __HTTP
-    proxy_url = os.getenv("http_proxy")
+def __proxy_manager_from_env(env_var, logger):
+    proxy_url = os.getenv(env_var.lower()) or os.getenv(env_var.upper())
+    if not proxy_url:
+        env_var = "all_proxy"
+        proxy_url = os.getenv(env_var) or os.getenv(env_var.upper())
     if proxy_url and len(proxy_url) > 0:
         parsed_url = urllib3.util.parse_url(proxy_url)
-        logger.info("Connecting via proxy URL [%s] to the Internet (picked up from the env variable [http_proxy]).", proxy_url)
-        __HTTP = urllib3.ProxyManager(
+        logger.info("Connecting via proxy URL [%s] to the Internet (picked up from the environment variable [%s]).", proxy_url, env_var)
+        return urllib3.ProxyManager(
             proxy_url,
             cert_reqs="CERT_REQUIRED",
             ca_certs=certifi.where(),
@@ -45,8 +47,15 @@ def init():
             proxy_headers=urllib3.make_headers(proxy_basic_auth=parsed_url.auth),
         )
     else:
-        logger.info("Connecting directly to the Internet (no proxy support).")
-        __HTTP = urllib3.PoolManager(cert_reqs="CERT_REQUIRED", ca_certs=certifi.where())
+        logger.info("Connecting directly to the Internet (no proxy support) for [%s].", env_var)
+        return urllib3.PoolManager(cert_reqs="CERT_REQUIRED", ca_certs=certifi.where())
+
+
+def init():
+    logger = logging.getLogger(__name__)
+    global _HTTP, _HTTPS
+    _HTTP = __proxy_manager_from_env("http_proxy", logger)
+    _HTTPS = __proxy_manager_from_env("https_proxy", logger)
 
 
 class Progress:
@@ -174,7 +183,7 @@ def download_from_bucket(blobstore, url, local_path, expected_size_in_bytes=None
 
 
 def download_http(url, local_path, expected_size_in_bytes=None, progress_indicator=None):
-    with __http().request(
+    with _request(
         "GET", url, preload_content=False, enforce_content_length=True, retries=10, timeout=urllib3.Timeout(connect=45, read=240)
     ) as r, open(local_path, "wb") as out_file:
         if r.status > 299:
@@ -250,32 +259,16 @@ def download(url, local_path, expected_size_in_bytes=None, progress_indicator=No
 
 
 def retrieve_content_as_string(url):
-    with __http().request("GET", url, timeout=urllib3.Timeout(connect=45, read=240)) as response:
+    with _request("GET", url, timeout=urllib3.Timeout(connect=45, read=240)) as response:
         return response.read().decode("utf-8")
 
 
-def has_internet_connection(probing_url):
-    logger = logging.getLogger(__name__)
-    try:
-        # We try to connect to Github by default. We use that to avoid touching too much different remote endpoints.
-        logger.debug("Checking for internet connection against [%s]", probing_url)
-        # We do a HTTP request here to respect the HTTP proxy setting. If we'd open a plain socket connection we circumvent the
-        # proxy and erroneously conclude we don't have an Internet connection.
-        response = __http().request("GET", probing_url, timeout=10.0, retries=8)  # wait up to 90s, 9 requests in total
-        status = response.status
-        logger.debug("Probing result is HTTP status [%s]", str(status))
-        return status == 200
-    except KeyboardInterrupt:
-        raise
-    except BaseException:
-        logger.info("Could not detect a working Internet connection", exc_info=True)
-        return False
-
-
-def __http():
-    if not __HTTP:
+def _request(method, url, **kwargs):
+    if not _HTTP or not _HTTPS:
         init()
-    return __HTTP
+    parsed_url = urllib3.util.parse_url(url)
+    manager = _HTTPS if parsed_url.scheme == "https" else _HTTP
+    return manager.request(method, url, **kwargs)
 
 
 def resolve(hostname_or_ip):
diff --git a/esrally/utils/process.py b/esrally/utils/process.py
index 52b9de32a..5d9e3bb41 100644
--- a/esrally/utils/process.py
+++ b/esrally/utils/process.py
@@ -28,14 +28,10 @@ def run_subprocess(command_line):
     return subprocess.call(command_line, shell=True)
 
 
-def run_subprocess_with_output(command_line, env_vars=None):
+def run_subprocess_with_output(command_line, env=None):
     logger = logging.getLogger(__name__)
     logger.debug("Running subprocess [%s] with output.", command_line)
     command_line_args = shlex.split(command_line)
-    env = None
-    if env_vars:
-        env = os.environ.copy()
-        env.update(env_vars)
     with subprocess.Popen(command_line_args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, env=env) as command_line_process:
         has_output = True
         lines = []
diff --git a/esrally/utils/repo.py b/esrally/utils/repo.py
index ad75b6758..966ae7528 100644
--- a/esrally/utils/repo.py
+++ b/esrally/utils/repo.py
@@ -44,8 +44,11 @@ def __init__(self, remote_url, root_dir, repo_name, resource_name, offline, fetc
             else:
                 try:
                     git.fetch(src=self.repo_dir, remote="origin")
-                except exceptions.SupplyError:
-                    console.warn("Could not update %s. Continuing with your locally available state." % self.resource_name)
+                except exceptions.SupplyError as e:
+                    console.warn(
+                        "Could not update %s. Continuing with your locally available state. Original error: %s\n"
+                        % (self.resource_name, e.message)
+                    )
         else:
             if not git.is_working_copy(self.repo_dir):
                 if io.exists(self.repo_dir):
diff --git a/it/basic_test.py b/it/basic_test.py
index cd43ca23d..d1cc45507 100644
--- a/it/basic_test.py
+++ b/it/basic_test.py
@@ -14,6 +14,8 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+import os
+import tempfile
 
 import it
 from esrally.utils import process
@@ -37,7 +39,13 @@ def test_run_with_help(cfg):
 
 @it.rally_in_mem
 def test_run_without_http_connection(cfg):
-    cmd = it.esrally_command_line_for(cfg, "list races")
-    output = process.run_subprocess_with_output(cmd, {"http_proxy": "http://invalid"})
-    expected = "No Internet connection detected. Specify --offline"
-    assert expected in "\n".join(output)
+    cmd = it.esrally_command_line_for(cfg, "list tracks")
+    with tempfile.TemporaryDirectory() as tmpdir:
+        env = os.environ.copy()
+        env["http_proxy"] = "http://invalid"
+        env["https_proxy"] = "http://invalid"
+        # make sure we don't have any saved state
+        env["RALLY_HOME"] = tmpdir
+        output = process.run_subprocess_with_output(cmd, env=env)
+        expected = "[ERROR] Cannot list"
+        assert expected in "\n".join(output)
diff --git a/it/proxy_test.py b/it/proxy_test.py
index 9ed6eff6c..47e77d336 100644
--- a/it/proxy_test.py
+++ b/it/proxy_test.py
@@ -75,20 +75,26 @@ def test_run_with_direct_internet_connection(cfg, http_proxy, fresh_log_file):
 
 
 @it.rally_in_mem
-def test_anonymous_proxy_no_connection(cfg, http_proxy, fresh_log_file):
+def test_anonymous_proxy_no_connection(cfg, http_proxy):
     env = dict(os.environ)
     env["http_proxy"] = http_proxy.anonymous_url
-    assert process.run_subprocess_with_logging(it.esrally_command_line_for(cfg, "list tracks"), env=env) == 0
-    assert_log_line_present(fresh_log_file, f"Connecting via proxy URL [{http_proxy.anonymous_url}] to the Internet")
-    # unauthenticated proxy access is prevented
-    assert_log_line_present(fresh_log_file, "No Internet connection detected. Specify --offline")
+    env["https_proxy"] = http_proxy.anonymous_url
+    lines = process.run_subprocess_with_output(it.esrally_command_line_for(cfg, "list tracks"), env=env)
+    output = "\n".join(lines)
+    # there should be a warning because we can't connect
+    assert "[WARNING] Could not update tracks." in output
+    # still, the command succeeds because of local state
+    assert "[INFO] SUCCESS" in output
 
 
 @it.rally_in_mem
-def test_authenticated_proxy_user_can_connect(cfg, http_proxy, fresh_log_file):
+def test_authenticated_proxy_user_can_connect(cfg, http_proxy):
     env = dict(os.environ)
     env["http_proxy"] = http_proxy.authenticated_url
-    assert process.run_subprocess_with_logging(it.esrally_command_line_for(cfg, "list tracks"), env=env) == 0
-    assert_log_line_present(fresh_log_file, f"Connecting via proxy URL [{http_proxy.authenticated_url}] to the Internet")
-    # authenticated proxy access is allowed
-    assert_log_line_present(fresh_log_file, "Detected a working Internet connection")
+    env["https_proxy"] = http_proxy.authenticated_url
+    lines = process.run_subprocess_with_output(it.esrally_command_line_for(cfg, "list tracks"), env=env)
+    output = "\n".join(lines)
+    # rally should be able to connect, no warning
+    assert "[WARNING] Could not update tracks." not in output
+    # the command should succeed
+    assert "[INFO] SUCCESS" in output
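
Review note on esrally/utils/net.py: the lookup order that __proxy_manager_from_env and _request introduce can be sketched in isolation as below. This is a minimal illustration and not the code from the patch. The names resolve_proxy and proxy_for_url are hypothetical, the environment is passed in as a plain mapping instead of being read through os.getenv, and the sketch returns the proxy URL rather than building a urllib3 manager. Unlike the patch, it also reports the exact variable name that matched.

```python
import os
from urllib.parse import urlparse


def resolve_proxy(env_var, environ=None):
    """Return (proxy_url, matched_variable), or (None, None) if nothing is set.

    Hypothetical helper mirroring the order used in the diff: the
    scheme-specific variable in lowercase, then uppercase, then
    all_proxy, then ALL_PROXY. Empty values are treated as unset.
    """
    environ = os.environ if environ is None else environ
    candidates = (env_var.lower(), env_var.upper(), "all_proxy", "ALL_PROXY")
    for name in candidates:
        value = environ.get(name)
        if value:
            return value, name
    return None, None


def proxy_for_url(url, environ=None):
    """Pick the proxy setting by URL scheme, as _request does for managers."""
    scheme = urlparse(url).scheme
    # Only "https" selects the HTTPS setting; every other scheme uses HTTP.
    env_var = "https_proxy" if scheme == "https" else "http_proxy"
    return resolve_proxy(env_var, environ)


if __name__ == "__main__":
    # Example environment (assumed values, reusing the host from the docs).
    example_env = {
        "http_proxy": "http://proxy.acme.org:8888/",
        "ALL_PROXY": "http://fallback.acme.org:3128/",
    }
    print(proxy_for_url("http://example.org/track.json", example_env))
    print(proxy_for_url("https://example.org/track.json", example_env))
```

With only http_proxy and ALL_PROXY set, an https URL falls through to ALL_PROXY, which is the fallback the patch adds. The sketch also makes visible that the uppercase form of the scheme-specific variable is consulted, because the patch calls os.getenv(env_var.upper()).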