Skip to content

Commit

Permalink
ICU: make optional
Browse files Browse the repository at this point in the history
  • Loading branch information
mjquinn-google committed Apr 13, 2023
1 parent 38c533b commit ce84415
Show file tree
Hide file tree
Showing 3 changed files with 56 additions and 14 deletions.
34 changes: 27 additions & 7 deletions client/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,17 +34,37 @@ but it is not officially supported.

**Download and extract [the latest release zip `dwh-migration-tools-vX.X.X.zip`](https://github.com/google/dwh-migration-tools/releases/latest).**

Python ≥ 3.7.2 is required, as well as the additional local dependencies
`pkg-config` and `libicu-dev`. Typical commands for installing these
dependencies on a Debian-based Linux distribution such as Ubuntu would be:
### Python

Python ≥ 3.7.2 is required.
You can check whether you have a recent enough version of Python installed by
running the command `python3 --version`.

You must also have the Python `pip` and `venv` modules installed. Altogether,
the distribution-specific commands to install these are:

* Debian-based distros: `sudo apt install python3-pip python3-venv`
* Red Hat-based distros: `sudo yum install python38 python38-pip` (for e.g. Python
3.8)

### Support for Encodings other than UTF-8

If all of the files you wish to translate are UTF-8 encoded
(this is commonly the case), you can skip this section.
Otherwise, you will need to install additional system dependencies:

* Debian-based distros: `sudo apt install pkg-config libicu-dev`
* RedHat-based distros: `sudo yum install gcc gcc-c++ libicu-devel
python38-devel`

**You must also remember**, upon reaching the step to `pip install` further down
in the Quickstart section below, to use this command instead:

```shell
sudo apt update
sudo apt install -y python3-pip python3-venv pkg-config libicu-dev
pip install ../dwh-migration-tools/client[icu]
```

You can check whether you have a recent enough version of Python installed by
running the command `python3 --version`.
### GCP

You need a GCP project and a Google Cloud Storage bucket to use for uploading
your input SQL files and downloading the translated output. [Learn how to
Expand Down
31 changes: 25 additions & 6 deletions client/bqms_run/encoding.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,6 @@

import logging

import icu

logger = logging.getLogger(__name__)


Expand All @@ -25,14 +23,35 @@ class EncodingDetector:
An encoding detector.
"""

DEFAULT_ENCODING = "utf-8"
has_logged_fallback_warning = False

def detect(self, data: bytes) -> str:
"""
Detect the encoding of the provided bytes, return the encoding name.
"""
encoding = icu.CharsetDetector(data).detect().getName()
if not isinstance(encoding, str):
return "utf-8"
return encoding
try:
# pylint: disable-next=import-outside-toplevel
import icu

encoding = icu.CharsetDetector(data).detect().getName()
if not isinstance(encoding, str):
return EncodingDetector.DEFAULT_ENCODING
return encoding
# pylint: disable-next=broad-exception-caught
except Exception as ex:
# any ICU-related exceptions should not halt execution,
# just assume UTF-8
if not EncodingDetector.has_logged_fallback_warning:
# only print a single per-process warning to avoid log noise
EncodingDetector.has_logged_fallback_warning = True
logger.warning(
# pylint: disable-next=line-too-long
"PyICU is either not available or misconfigured; assuming default encoding of %s (cause: %s)",
EncodingDetector.DEFAULT_ENCODING,
ex,
)
return EncodingDetector.DEFAULT_ENCODING

def decode(self, data: bytes) -> str:
"""
Expand Down
5 changes: 4 additions & 1 deletion client/pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,10 @@ PyYAML = "^6.0"
marshmallow = "^3.17.1"
cloudpathlib = "^0.10.0"
typing-extensions = "^4.4.0"
PyICU = "^2.10.2"
PyICU = { version = "^2.10.2", optional = true }

[tool.poetry.extras]
icu = ["PyICU"]

[tool.poetry.dev-dependencies]
black = "22.6.0"
Expand Down

0 comments on commit ce84415

Please sign in to comment.