Make full-icu the default #29522

Closed
wants to merge 2 commits into from
47 changes: 29 additions & 18 deletions BUILDING.md
@@ -35,21 +35,23 @@ file a new issue.
* [Building Node.js](#building-nodejs-1)
* [Android/Android-based devices (e.g. Firefox OS)](#androidandroid-based-devices-eg-firefox-os)
* [`Intl` (ECMA-402) support](#intl-ecma-402-support)
* [Default: `small-icu` (English only) support](#default-small-icu-english-only-support)
* [Build with full ICU support (all locales supported by ICU)](#build-with-full-icu-support-all-locales-supported-by-icu)
* [Unix/macOS](#unixmacos)
* [Windows](#windows-1)
* [Building without Intl support](#building-without-intl-support)
* [Trimmed: `small-icu` (English only) support](#trimmed-small-icu-english-only-support)
* [Unix/macOS](#unixmacos-1)
* [Windows](#windows-2)
* [Use existing installed ICU (Unix/macOS only)](#use-existing-installed-icu-unixmacOS-only)
* [Build with a specific ICU](#build-with-a-specific-icu)
* [Building without Intl support](#building-without-intl-support)
* [Unix/macOS](#unixmacos-2)
* [Windows](#windows-3)
* [Use existing installed ICU (Unix/macOS only)](#use-existing-installed-icu-unixmacOS-only)
* [Build with a specific ICU](#build-with-a-specific-icu)
* [Unix/macOS](#unixmacos-3)
* [Windows](#windows-4)
* [Building Node.js with FIPS-compliant OpenSSL](#building-nodejs-with-fips-compliant-openssl)
* [Building Node.js with external core modules](#building-nodejs-with-external-core-modules)
* [Unix/macOS](#unixmacos-3)
* [Windows](#windows-4)
* [Unix/macOS](#unixmacos-4)
* [Windows](#windows-5)
* [Note for downstream distributors of Node.js](#note-for-downstream-distributors-of-nodejs)

## Supported platforms
@@ -598,31 +600,40 @@ $ make
## `Intl` (ECMA-402) support

[Intl](https://github.com/nodejs/node/blob/master/doc/api/intl.md) support is
enabled by default, with English data only.
enabled by default.

### Default: `small-icu` (English only) support
### Build with full ICU support (all locales supported by ICU)

By default, only English data is included, but
the full `Intl` (ECMA-402) APIs. It does not need to download
any dependencies to function. You can add full
data at runtime.
This is the default option.

### Build with full ICU support (all locales supported by ICU)
#### Unix/macOS

With the `--download=all`, this may download ICU if you don't have an
ICU in `deps/icu`. (The embedded `small-icu` included in the default
Node.js source does not include all locales.)
```console
$ ./configure --with-intl=full-icu
```

#### Windows

```console
> .\vcbuild full-icu
```

### Trimmed: `small-icu` (English only) support

In this configuration, only English data is included, but
the full `Intl` (ECMA-402) APIs. It does not need to download
any dependencies to function. You can add full data at runtime.

#### Unix/macOS

```console
$ ./configure --with-intl=full-icu --download=all
$ ./configure --with-intl=small-icu
```

#### Windows

```console
> .\vcbuild full-icu download-all
> .\vcbuild small-icu
```

### Building without Intl support
87 changes: 61 additions & 26 deletions configure.py
@@ -11,6 +11,8 @@
import shlex
import subprocess
import shutil
import bz2

from distutils.spawn import find_executable as which

# If not run from node/, cd to node/.
@@ -409,7 +411,7 @@
intl_optgroup.add_option('--with-intl',
action='store',
dest='with_intl',
default='small-icu',
default='full-icu',
choices=valid_intl_modes,
help='Intl mode (valid choices: {0}) [default: %default]'.format(
', '.join(valid_intl_modes)))
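
The effect of changing the default can be sketched in isolation. The snippet below is a minimal, standalone illustration and not the actual `configure.py`: the contents of `valid_intl_modes` are an assumption taken from the modes listed in `doc/api/intl.md`, and `build_parser` and `parse_intl_mode` are hypothetical helper names.

```python
import optparse

# Assumption: the valid modes mirror the list documented in doc/api/intl.md.
valid_intl_modes = ('none', 'small-icu', 'full-icu', 'system-icu')


def build_parser():
    """Build a parser with only the --with-intl option (hypothetical helper)."""
    parser = optparse.OptionParser()
    intl_optgroup = optparse.OptionGroup(parser, 'Internationalization')
    intl_optgroup.add_option('--with-intl',
        action='store',
        dest='with_intl',
        default='full-icu',  # the default proposed by this PR
        choices=valid_intl_modes,
        help='Intl mode (valid choices: {0}) [default: %default]'.format(
            ', '.join(valid_intl_modes)))
    parser.add_option_group(intl_optgroup)
    return parser


def parse_intl_mode(argv):
    """Return the Intl mode selected by the given argument list."""
    options, _ = build_parser().parse_args(argv)
    return options.with_intl
```

With no arguments the sketch selects `full-icu`; passing `--with-intl=small-icu` still opts into the trimmed build.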
@@ -1399,38 +1401,35 @@ def write_config(data, name):
icu_parent_path = 'deps'

# The full path to the ICU source directory. Should not include './'.
icu_full_path = 'deps/icu'
icu_deps_path = 'deps/icu'
icu_full_path = icu_deps_path

# icu-tmp is used to download and unpack the ICU tarball.
icu_tmp_path = os.path.join(icu_parent_path, 'icu-tmp')

# canned ICU. see tools/icu/README.md to update.
canned_icu_dir = 'deps/icu-small'

# use the README to verify what the canned ICU is
canned_is_full = os.path.isfile(os.path.join(canned_icu_dir, 'README-FULL-ICU.txt'))
canned_is_small = os.path.isfile(os.path.join(canned_icu_dir, 'README-SMALL-ICU.txt'))
if canned_is_small:
warn('Ignoring %s - in-repo small icu is no longer supported.' % canned_icu_dir)

# We can use 'deps/icu-small' - pre-canned ICU *iff*
# - with_intl == small-icu (the default!)
# - with_icu_locales == 'root,en' (the default!)
# - deps/icu-small exists!
# - canned_is_full AND
# - with_icu_source is unset (i.e. no other ICU was specified)
# (Note that this is the *DEFAULT CASE*.)
#
# This is *roughly* equivalent to
# $ configure --with-intl=small-icu --with-icu-source=deps/icu-small
# $ configure --with-intl=full-icu --with-icu-source=deps/icu-small
# .. Except that we avoid copying icu-small over to deps/icu.
# In this default case, deps/icu is ignored, although make clean will
# still harmlessly remove deps/icu.

# are we using default locales?
using_default_locales = ( options.with_icu_locales == icu_default_locales )

# make sure the canned ICU really exists
canned_icu_available = os.path.isdir(canned_icu_dir)

if (o['variables']['icu_small'] == b(True)) and using_default_locales and (not with_icu_source) and canned_icu_available:
if (not with_icu_source) and canned_is_full:
# OK- we can use the canned ICU.
icu_config['variables']['icu_small_canned'] = 1
icu_full_path = canned_icu_dir

icu_config['variables']['icu_full_canned'] = 1
# --with-icu-source processing
# now, check that they didn't pass --with-icu-source=deps/icu
elif with_icu_source and os.path.abspath(icu_full_path) == os.path.abspath(with_icu_source):
@@ -1508,29 +1507,40 @@ def write_config(data, name):
icu_endianness = sys.byteorder[0]
o['variables']['icu_ver_major'] = icu_ver_major
o['variables']['icu_endianness'] = icu_endianness
icu_data_file_l = 'icudt%s%s.dat' % (icu_ver_major, 'l')
icu_data_file_l = 'icudt%s%s.dat' % (icu_ver_major, 'l') # LE filename
icu_data_file = 'icudt%s%s.dat' % (icu_ver_major, icu_endianness)
# relative to configure
icu_data_path = os.path.join(icu_full_path,
'source/data/in',
icu_data_file_l)
icu_data_file_l) # LE
compressed_data = '%s.bz2' % (icu_data_path)
if not os.path.isfile(icu_data_path) and os.path.isfile(compressed_data):
# unpack. deps/icu is a temporary path
if os.path.isdir(icu_tmp_path):
shutil.rmtree(icu_tmp_path)
os.mkdir(icu_tmp_path)
icu_data_path = os.path.join(icu_tmp_path, icu_data_file_l)
with open(icu_data_path, 'wb') as outf:
with bz2.BZ2File(compressed_data, 'rb') as inf:
shutil.copyfileobj(inf, outf)
# Now, proceed..

# relative to dep..
icu_data_in = os.path.join('..','..', icu_full_path, 'source/data/in', icu_data_file_l)
icu_data_in = os.path.join('..','..', icu_data_path)
if not os.path.isfile(icu_data_path) and icu_endianness != 'l':
# use host endianness
icu_data_path = os.path.join(icu_full_path,
'source/data/in',
icu_data_file)
# relative to dep..
icu_data_in = os.path.join('..', icu_full_path, 'source/data/in',
icu_data_file)
# this is the input '.dat' file to use .. icudt*.dat
# may be little-endian if from a icu-project.org tarball
o['variables']['icu_data_in'] = icu_data_in
icu_data_file) # will be generated
if not os.path.isfile(icu_data_path):
# .. and we're not about to build it from .gyp!
error('''ICU prebuilt data file %s does not exist.
See the README.md.''' % icu_data_path)

# this is the input '.dat' file to use .. icudt*.dat
# may be little-endian if from a icu-project.org tarball
o['variables']['icu_data_in'] = icu_data_in

# map from variable name to subdirs
icu_src = {
'stubdata': 'stubdata',
@@ -1547,6 +1557,31 @@ def write_config(data, name):
var = 'icu_src_%s' % i
path = '../../%s/source/%s' % (icu_full_path, icu_src[i])
icu_config['variables'][var] = glob_to_var('tools/icu', path, 'patches/%s/source/%s' % (icu_ver_major, icu_src[i]) )
# calculate platform-specific genccode args
# print("platform %s, flavor %s" % (sys.platform, flavor))
# if sys.platform == 'darwin':
# shlib_suffix = '%s.dylib'
# elif sys.platform.startswith('aix'):
# shlib_suffix = '%s.a'
# else:
# shlib_suffix = 'so.%s'
if flavor == 'win':
icu_config['variables']['icu_asm_ext'] = 'obj'
icu_config['variables']['icu_asm_opts'] = [ '-o ' ]
elif with_intl == 'small-icu' or options.cross_compiling:
icu_config['variables']['icu_asm_ext'] = 'c'
icu_config['variables']['icu_asm_opts'] = []
elif flavor == 'mac':
icu_config['variables']['icu_asm_ext'] = 'S'
icu_config['variables']['icu_asm_opts'] = [ '-a', 'gcc-darwin' ]
elif sys.platform.startswith('aix'):
icu_config['variables']['icu_asm_ext'] = 'S'
icu_config['variables']['icu_asm_opts'] = [ '-a', 'xlc' ]
Review comment (Member): We don't use xlc on AIX. I'm wondering if this is correct.

Reply (Member Author): It's "xlc format" assembly, which seems to be what the .S file taken in by gcc on AIX expects.

Reply (Member): @srl295 thanks for the clarification.

else:
# assume GCC-compatible asm is OK
icu_config['variables']['icu_asm_ext'] = 'S'
icu_config['variables']['icu_asm_opts'] = [ '-a', 'gcc' ]

# write updated icu_config.gypi with a bunch of paths
write(icu_config_name, do_not_edit +
pprint.pformat(icu_config, indent=2) + '\n')
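
The platform branches above can be read as a pure function from build settings to a file extension and an option list. The following is a minimal sketch under that reading; `genccode_asm_settings` is a hypothetical name, and the real script writes the values into `icu_config['variables']` instead of returning them.

```python
def genccode_asm_settings(flavor, platform, with_intl, cross_compiling):
    """Return (icu_asm_ext, icu_asm_opts) following the branch order in the diff.

    Hypothetical helper: the argument names stand in for configure.py's
    `flavor`, `sys.platform`, `with_intl` and `options.cross_compiling`.
    """
    if flavor == 'win':
        # Windows: emit an object file directly.
        return 'obj', ['-o ']
    if with_intl == 'small-icu' or cross_compiling:
        # Fall back to generated C source, which needs no assembler options.
        return 'c', []
    if flavor == 'mac':
        return 'S', ['-a', 'gcc-darwin']
    if platform.startswith('aix'):
        # "xlc format" assembly, per the review discussion above.
        return 'S', ['-a', 'xlc']
    # Assume GCC-compatible assembly everywhere else.
    return 'S', ['-a', 'gcc']
```

Note that the order matters: a `small-icu` or cross-compiled build on macOS or AIX takes the C-source branch before the platform-specific assembly branches are considered.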
9 changes: 9 additions & 0 deletions deps/icu-small/README-FULL-ICU.txt
@@ -0,0 +1,9 @@
ICU sources - auto generated by shrink-icu-src.py

This directory contains the ICU subset used by --with-intl=full-icu
It is a strict subset of ICU 64 source files with the following exception(s):
* deps/icu-small/source/data/in/icudt64l.dat.bz2 : compressed data file


To rebuild this directory, see ../../tools/icu/README.md
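
The `configure.py` change in this PR uses the README file name as a marker for which kind of ICU lives in `deps/icu-small`. A minimal sketch of that check follows; `classify_canned_icu` and `can_use_canned_icu` are hypothetical names, and the real script also emits a warning when it finds the small-ICU marker.

```python
import os


def classify_canned_icu(canned_icu_dir):
    """Return 'full', 'small', or None based on the README marker files."""
    if os.path.isfile(os.path.join(canned_icu_dir, 'README-FULL-ICU.txt')):
        return 'full'
    if os.path.isfile(os.path.join(canned_icu_dir, 'README-SMALL-ICU.txt')):
        return 'small'
    return None


def can_use_canned_icu(canned_icu_dir, with_icu_source=None):
    """The canned ICU is used only if it is a full ICU and no other source is given."""
    return (not with_icu_source) and classify_canned_icu(canned_icu_dir) == 'full'
```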

8 changes: 0 additions & 8 deletions deps/icu-small/README-SMALL-ICU.txt

This file was deleted.

Binary file removed deps/icu-small/source/data/in/icudt64l.dat
Binary file not shown.
Binary file added deps/icu-small/source/data/in/icudt64l.dat.bz2
Binary file not shown.
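
Because the data file is now checked in compressed, `configure.py` unpacks it into a scratch directory when no uncompressed copy is present. The sketch below isolates that step under stated assumptions: `locate_icu_data` is a hypothetical function name, and the arguments stand in for the script's `icu_full_path`, `icu_tmp_path` and `icu_ver_major` variables.

```python
import bz2
import os
import shutil


def locate_icu_data(icu_full_path, icu_tmp_path, icu_ver_major):
    """Return the path of the little-endian ICU .dat file.

    If only a .bz2 copy exists next to the expected location, it is
    unpacked into icu_tmp_path and that unpacked path is returned.
    """
    data_file = 'icudt%s%s.dat' % (icu_ver_major, 'l')  # LE filename
    data_path = os.path.join(icu_full_path, 'source', 'data', 'in', data_file)
    compressed = '%s.bz2' % data_path
    if os.path.isfile(data_path) or not os.path.isfile(compressed):
        # Nothing to unpack; the caller decides what to do if the file is missing.
        return data_path
    # Start from a clean scratch directory.
    if os.path.isdir(icu_tmp_path):
        shutil.rmtree(icu_tmp_path)
    os.makedirs(icu_tmp_path)
    unpacked = os.path.join(icu_tmp_path, data_file)
    with bz2.BZ2File(compressed, 'rb') as inf, open(unpacked, 'wb') as outf:
        shutil.copyfileobj(inf, outf)
    return unpacked
```

Streaming through `shutil.copyfileobj` avoids holding the whole decompressed data file in memory at once.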
26 changes: 11 additions & 15 deletions doc/api/intl.md
@@ -23,11 +23,9 @@ programs. Some of them are:
* [`RegExp` Unicode Property Escapes][]

Node.js (and its underlying V8 engine) uses [ICU][] to implement these features
in native C/C++ code. However, some of them require a very large ICU data file
in order to support all locales of the world. Because it is expected that most
Node.js users will make use of only a small portion of ICU functionality, only
a subset of the full ICU data set is provided by Node.js by default. Several
options are provided for customizing and expanding the ICU data set either when
in native C/C++ code. The full ICU data set is provided by Node.js by default.
However, due to the size of the ICU data file, several
options are provided for customizing the ICU data set either when
building or running Node.js.

## Options for building Node.js
@@ -38,8 +36,8 @@ in [BUILDING.md][].

* `--with-intl=none`/`--without-intl`
* `--with-intl=system-icu`
* `--with-intl=small-icu` (default)
* `--with-intl=full-icu`
* `--with-intl=small-icu`
* `--with-intl=full-icu` (default)

An overview of available Node.js and JavaScript features for each `configure`
option:
@@ -66,8 +64,8 @@ operation is identical to that of `Date.prototype.toString()`.

### Disable all internationalization features (`none`)

If this option is chosen, most internationalization features mentioned above
will be **unavailable** in the resulting `node` binary.
If this option is chosen, ICU is disabled and most internationalization
features mentioned above will be **unavailable** in the resulting `node` binary.

### Build with a pre-installed ICU (`system-icu`)

@@ -106,9 +104,7 @@ console.log(spanish.format(january));
// Should print "enero"
```

This mode provides a good balance between features and binary size, and it is
the default behavior if no `--with-intl` flag is passed. The official binaries
are also built in this mode.
This mode provides a balance between features and binary size.

#### Providing ICU data at runtime

@@ -149,8 +145,9 @@ enable full `Intl` support.

This option makes the resulting binary link against ICU statically and include
a full set of ICU data. A binary created this way has no further external
dependencies and supports all locales, but might be rather large. See
[BUILDING.md][BUILDING.md#full-icu] on how to compile a binary using this mode.
dependencies and supports all locales, but might be rather large. This is
the default behavior if no `--with-intl` flag is passed. The official binaries
are also built in this mode.

## Detecting internationalization support

@@ -205,7 +202,6 @@ to be helpful:
[`String.prototype.toUpperCase()`]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toUpperCase
[`require('buffer').transcode()`]: buffer.html#buffer_buffer_transcode_source_fromenc_toenc
[`require('util').TextDecoder`]: util.html#util_class_util_textdecoder
[BUILDING.md#full-icu]: https://github.com/nodejs/node/blob/master/BUILDING.md#build-with-full-icu-support-all-locales-supported-by-icu
[BUILDING.md]: https://github.com/nodejs/node/blob/master/BUILDING.md
[ECMA-262]: https://tc39.github.io/ecma262/
[ECMA-402]: https://tc39.github.io/ecma402/
42 changes: 20 additions & 22 deletions doc/api/util.md
@@ -932,26 +932,9 @@ Per the [WHATWG Encoding Standard][], the encodings supported by the
one or more aliases may be used.

Different Node.js build configurations support different sets of encodings.
While a very basic set of encodings is supported even on Node.js builds without
ICU enabled, support for some encodings is provided only when Node.js is built
with ICU and using the full ICU data (see [Internationalization][]).
(see [Internationalization][])

#### Encodings Supported Without ICU

| Encoding | Aliases |
| ----------- | --------------------------------- |
| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
| `'utf-16le'` | `'utf-16'` |

#### Encodings Supported by Default (With ICU)

| Encoding | Aliases |
| ----------- | --------------------------------- |
| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
| `'utf-16le'` | `'utf-16'` |
| `'utf-16be'` | |

#### Encodings Requiring Full ICU Data
#### Encodings Supported by Default (With Full ICU Data)

| Encoding | Aliases |
| ----------------- | -------------------------------- |
@@ -990,6 +973,21 @@ with ICU and using the full ICU data (see [Internationalization][]).
| `'shift_jis'` | `'csshiftjis'`, `'ms932'`, `'ms_kanji'`, `'shift-jis'`, `'sjis'`, `'windows-31j'`, `'x-sjis'` |
| `'euc-kr'` | `'cseuckr'`, `'csksc56011987'`, `'iso-ir-149'`, `'korean'`, `'ks_c_5601-1987'`, `'ks_c_5601-1989'`, `'ksc5601'`, `'ksc_5601'`, `'windows-949'` |

#### Encodings Supported when Node.js is built with the `small-icu` option

| Encoding | Aliases |
| ----------- | --------------------------------- |
| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
| `'utf-16le'` | `'utf-16'` |
| `'utf-16be'` | |

#### Encodings Supported when ICU is disabled

| Encoding | Aliases |
| ----------- | --------------------------------- |
| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
| `'utf-16le'` | `'utf-16'` |

The `'iso-8859-16'` encoding listed in the [WHATWG Encoding Standard][]
is not supported.

@@ -1005,9 +1003,9 @@ changes:
* `encoding` {string} Identifies the `encoding` that this `TextDecoder` instance
supports. **Default:** `'utf-8'`.
* `options` {Object}
* `fatal` {boolean} `true` if decoding failures are fatal. This option is only
supported when ICU is enabled (see [Internationalization][]). **Default:**
`false`.
* `fatal` {boolean} `true` if decoding failures are fatal.
This option is not supported when ICU is disabled
(see [Internationalization][]). **Default:** `false`.
* `ignoreBOM` {boolean} When `true`, the `TextDecoder` will include the byte
order mark in the decoded result. When `false`, the byte order mark will
be removed from the output. This option is only used when `encoding` is