Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing docs on how to run unicode-table-generator #131640

Open
RalfJung opened this issue Oct 13, 2024 · 14 comments
Open

Missing docs on how to run unicode-table-generator #131640

RalfJung opened this issue Oct 13, 2024 · 14 comments
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-bug Category: This is a bug. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)

Comments

@RalfJung
Copy link
Member

The file library/core/src/unicode/unicode_data.rs says

///! This file is generated by src/tools/unicode-table-generator; do not edit manually!

However, it doesn't say how to run that tool. The "obvious" ./x.py run src/tools/unicode-table-generator does not work. I didn't find anything in the dev guide either.

So I now edited the file by hand instead 🤷 but that can't be the goal of this.^^

Cc @Mark-Simulacrum

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 13, 2024
@jieyouxu
Copy link
Member

jieyouxu commented Oct 13, 2024

maybe

cargo +nightly run src/tools/unicode-table-generator -- library/core/src/unicode/unicode_data.rs

This looks like a helper util that's not managed by bootstrap AFAICT

EDIT: no it's not that simple, this will cause src/tool/miri errors because of how the workspace works.

@jieyouxu
Copy link
Member

I got something working, I think?

PS E:\Repos\rust> ./x run src/tools/unicode-table-generator
Building bootstrap
    Finished `dev` profile [unoptimized] target(s) in 0.06s
Building stage0 tool unicode-table-generator (x86_64-pc-windows-msvc)
   Compiling unicode-table-generator v0.1.0 (E:\Repos\rust\src\tools\unicode-table-generator)
    Finished `release` profile [optimized + debuginfo] target(s) in 1.72s
Alphabetic     : 1727 bytes, 142759 codepoints in 757 ranges (65 - 205744) using skiplist
Case_Ignorable : 1053 bytes, 2749 codepoints in 452 ranges (39 - 918000) using skiplist
Cased          : 407 bytes, 4578 codepoints in 159 ranges (65 - 127370) using skiplist
Cc             : 9 bytes, 65 codepoints in 2 ranges (0 - 160) using skiplist
Grapheme_Extend: 887 bytes, 2193 codepoints in 375 ranges (768 - 918000) using skiplist
Lowercase      : 935 bytes, 2569 codepoints in 675 ranges (97 - 125252) using bitset
N              : 457 bytes, 1911 codepoints in 144 ranges (48 - 130042) using skiplist
Uppercase      : 799 bytes, 1978 codepoints in 656 ranges (65 - 127370) using bitset
there are 25 points
White_Space    : 256 bytes, 25 codepoints in 10 ranges (9 - 12289) using cascading
Total table sizes: 6530 bytes
Build completed successfully in 0:00:15
PS E:\Repos\rust>

@jieyouxu
Copy link
Member

jieyouxu commented Oct 13, 2024

Ah nvm:

        modified:   library/core/src/unicode/unicode_data.rs

@RalfJung
Copy link
Member Author

That's very strange, I just get an error:

$ ./x.py run src/tools/unicode-table-generator
Building bootstrap
    Finished `dev` profile [unoptimized] target(s) in 0.08s
ERROR: no `run` rules matched ["src/tools/unicode-table-generator"]
HELP: run `x.py run --help --verbose` to show a list of available paths
NOTE: if you are adding a new Step to bootstrap itself, make sure you register it with `describe!`
Build completed unsuccessfully in 0:00:00

@jieyouxu
Copy link
Member

Oh sorry to be clear, I said "I got something working" i.e. I had to hook it up in bootstrap 😆

@RalfJung
Copy link
Member Author

Ah :D

@jieyouxu
Copy link
Member

When I ran it after hooking it up, it seemed like it just modified the FIXME messages, lol

PS E:\Repos\rust> git diff .\library\
diff --git a/library/core/src/unicode/unicode_data.rs b/library/core/src/unicode/unicode_data.rs
index db2e3ddd754..143beb37706 100644
--- a/library/core/src/unicode/unicode_data.rs
+++ b/library/core/src/unicode/unicode_data.rs
@@ -18,14 +18,16 @@ const fn bitset_search<
     let bucket_idx = (needle / 64) as usize;
     let chunk_map_idx = bucket_idx / CHUNK_SIZE;
     let chunk_piece = bucket_idx % CHUNK_SIZE;
-    // FIXME(const-hack): Revert to `slice::get` when slice indexing becomes possible in const.
+    // FIXME: const-hack: Revert to `slice::get` after `const_slice_index`
+    // feature stabilizes.
     let chunk_idx = if chunk_map_idx < chunk_idx_map.len() {
         chunk_idx_map[chunk_map_idx]
     } else {
         return false;
     };
     let idx = bitset_chunk_idx[chunk_idx as usize][chunk_piece] as usize;
-    // FIXME(const-hack): Revert to `slice::get` when slice indexing becomes possible in const.
+    // FIXME: const-hack: Revert to `slice::get` after `const_slice_index`
+    // feature stabilizes.
     let word = if idx < bitset_canonical.len() {
         bitset_canonical[idx]
     } else {

@jieyouxu
Copy link
Member

jieyouxu commented Oct 13, 2024

@RalfJung I hooked it up to bootstrap ./x run src/tools/unicode-table-generator in #131647 (maybe you can base your changes on top of that branch if you want to edit the tool itself to generate the unicode tables and run it via bootstrap 🤷). I can't say I have any clue about how that tool is supposed to be run, though.

@jieyouxu jieyouxu added T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-bug Category: This is a bug. labels Oct 13, 2024
@RalfJung
Copy link
Member Author

When I ran it after hooking it up, it seemed like it just modified the FIXME messages, lol

That's probably because I patched the unicode_data file to update the comments and didn't realize that this is a generated file with its sources somewhere else, and CI did not stop me. 😂

@jieyouxu
Copy link
Member

Ok cool, I synced the comments in the tool to your changes, and ./x run src/tools/unicode-table-generator now no longer produces a diff for library/core/src/unicode/unicode_data.rs.

@jieyouxu jieyouxu removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 13, 2024
@jieyouxu
Copy link
Member

jieyouxu commented Oct 13, 2024

Oh wait, but that means we'll have to sync back changes from #131641 once that's merged again. EDIT: the tool was also updated

@jieyouxu
Copy link
Member

I'll go add a triagebot message to remind that this is generated by the unicode-table-generator tool and should not be hand edited in source, but instead edit the tool itself.

@RalfJung
Copy link
Member Author

Oh wait, but that means we'll have to sync back changes from #131641 once that's merged again.

That PR changes both the tool and the generated file so it should not need any new manual sync... unless I screwed up and didn't properly do the same changes on both sides.

I'll go add a triagebot message to remind that this is generated by the unicode-table-generator tool and should not be hand edited in source, but instead edit the tool itself.

That's a good start, but ideally CI would fail if the file does not match. This could be a tidy check, maybe? Tidy already checks other, similar things.

@jieyouxu
Copy link
Member

jieyouxu commented Oct 13, 2024

That PR changes both the tool and the generated file so it should not need any new manual sync... unless I screwed up and didn't properly do the same changes on both sides.

Ah right good point.

That's a good start, but ideally CI would fail if the file does not match. This could be a tidy check, maybe? Tidy already checks other, similar things.

I mean, it can be checked by trying to run ./x run src/tools/unicode-table-generator then asserting nothing is modified. But yeah, it could be a tidy check. Probably Mark can decide on what to do here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-bug Category: This is a bug. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)
Projects
None yet
Development

No branches or pull requests

3 participants