Optimize remainder validation #51

Draft: wants to merge 74 commits into main

Commits (74):
f759a78 refactor: check_remainder() (hkratz, May 26, 2021)
e562d48 bit more idiomatic (hkratz, May 26, 2021)
1edb3b2 WIP: remainder optimization (hkratz, May 26, 2021)
a7efda1 aarch64 remainder processing: use lane loads (hkratz, May 29, 2021)
b3c7636 fix (hkratz, May 29, 2021)
defd4d2 SimdInput::new_partial() implementation (hkratz, May 29, 2021)
5df7b38 ascii special-casing (hkratz, May 29, 2021)
7f0cf3b ascii-optimized remainder checking (hkratz, May 30, 2021)
2cbf8a9 test-bed for different partial load implementations (hkratz, May 30, 2021)
42f1acd aarch64: select partial load implementation (hkratz, May 30, 2021)
8e2706f cleanup (hkratz, May 30, 2021)
695c4f8 remove unnecessary pub (hkratz, May 30, 2021)
fa25b46 basic x86 partial load impl. (hkratz, May 30, 2021)
3d71822 clippy (hkratz, May 30, 2021)
d4d3ccd add missing helper fns (hkratz, May 30, 2021)
d08c88b clippy (hkratz, May 30, 2021)
0c39149 add unit test for AVX 2 load_partial() (hkratz, Jun 1, 2021)
198e11c Implement AVX 2 simd value Display and LowerHex traits for debugging (hkratz, Jun 1, 2021)
57b0790 add load_partial_direct() method (hkratz, Jun 1, 2021)
38b58de only run avx2 masked load test if AVX 2 is available (hkratz, Jun 1, 2021)
b6b2388 fix avx2 detection (hkratz, Jun 1, 2021)
44f8fd1 prevent remainder loop unrolling in sse 4.2 which caused the methods … (hkratz, Jun 1, 2021)
cb7ed19 only implement Debug/LowerHex for tests (hkratz, Jun 1, 2021)
11de67f Merge branch 'remainder_optimization' of github.com:rusticstuff/simdu… (hkratz, Jun 1, 2021)
10df336 expand benchmarks (hkratz, Jun 2, 2021)
bab2fbf x86: make delegation to std for small inputs in-code configurable (hkratz, Jun 2, 2021)
0bb570d comment (hkratz, Jun 2, 2021)
cbae848 x86: delegate inputs < 9 bytes to std (hkratz, Jun 2, 2021)
5229492 Display for SimdU8Value (hkratz, Jun 2, 2021)
2952db3 SSE 4.2 load_partial() (hkratz, Jun 2, 2021)
2333e0e cleanup (hkratz, Jun 3, 2021)
54e774e testbed for partial load (hkratz, Jun 3, 2021)
1969057 AVX2 in-code config var (hkratz, Jun 3, 2021)
7e346a8 Faster than std impl on Apple Silicon (hkratz, Jun 3, 2021)
fcbb14c comment wording (hkratz, Jun 3, 2021)
74ecc4e Update algorithm.rs (hkratz, Jun 17, 2021)
7df6fe7 Update mod.rs (hkratz, Jun 17, 2021)
1458d7c Update mod.rs (hkratz, Jun 17, 2021)
439dc90 Update mod.rs (hkratz, Jun 17, 2021)
fb9b955 Merge branch 'remainder_optimization' of github.com:rusticstuff/simdu… (hkratz, Jun 18, 2021)
d650782 Rust impl for _mm_loadu_si64 intrinsic was wrong, see (hkratz, Jun 18, 2021)
25b4272 Rust 1.38.0 compat: _mm_loadu_si64() not available -> replace. Asm is… (hkratz, Jun 18, 2021)
8f5b758 clippy (hkratz, Jun 18, 2021)
d1753c2 remove extra println!() (hkratz, Jun 18, 2021)
1b2d9aa remove stray comment (hkratz, Jun 18, 2021)
150103b loop unrolling prevention no longer needed (hkratz, Jun 18, 2021)
44cd43c benchmark for small inputs of a random length in ranges (hkratz, Jun 18, 2021)
97794b3 aarch64 partial load asm impl. (hkratz, Jun 18, 2021)
138ea37 Update Cargo.toml (hkratz, Jun 18, 2021)
051dcc1 Merge branch 'remainder_optimization' of github.com:rusticstuff/simdu… (hkratz, Jun 18, 2021)
b6304c2 fix/allow lint warnings (hkratz, Jun 18, 2021)
7fea546 example is only for x86-64 (hkratz, Jun 18, 2021)
b995b7f fix small std benchmark (hkratz, Jun 18, 2021)
1141636 Update lib.rs (hkratz, Jun 18, 2021)
e1fd53c use classic lengths for throughput benchmark again now that we have a… (hkratz, Jun 19, 2021)
b4ce94f add small_basic benchmark (hkratz, Jun 19, 2021)
8babc9f ARM64 CI (hkratz, Jun 24, 2021)
4021faa fix ci (hkratz, Jun 24, 2021)
80b0bc8 Update ci.yml (hkratz, Jun 24, 2021)
d29695d fix ci (hkratz, Jun 24, 2021)
b2c0339 more ARM64 ci (hkratz, Jun 24, 2021)
4abbcbe Merge branch 'main' into remainder_optimization (hkratz, Jun 25, 2021)
b1416f6 fix ci (hkratz, Jun 25, 2021)
5753cad clippy (hkratz, Jun 25, 2021)
b86d00f add one more small benchmark (hkratz, Jun 30, 2021)
8eb9ebf more consistent small basic benchmarks (hkratz, Jun 30, 2021)
47588bc rename fn (hkratz, Jun 30, 2021)
119b622 make cargo test --all-features work on non-x86 (hkratz, Jul 6, 2021)
bb58d71 remove aarch64 partial load assembly impl. non-inlined fn (hkratz, Jul 6, 2021)
45befdb asm fn -> method (hkratz, Jul 6, 2021)
431e8cc Merge branch 'main' into remainder_optimization (hkratz, Jul 11, 2021)
37f9198 simplify (hkratz, Jul 11, 2021)
d3399cd Trigger GitHub actions (hkratz, Aug 15, 2021)
82c5fee fixes for current nightly (hkratz, Oct 24, 2024)
26 changes: 23 additions & 3 deletions .github/workflows/ci.yml
@@ -35,6 +35,23 @@ jobs:
         env:
           RUSTFLAGS: ${{ matrix.rustflags }}

+  test-arm64:
+    runs-on: ARM64
+    strategy:
+      matrix:
+        features: ["", "--features std", "--features aarch64_neon,std", "--features aarch64_neon,std,public_imp", "--features aarch64_neon,std,public_imp"]
+    steps:
+      - uses: actions/checkout@v2
+      - uses: actions-rs/toolchain@v1
+        with:
+          toolchain: nightly
+          profile: minimal
+          override: true
+      - name: Run tests
+        run: cargo test --no-default-features ${{ matrix.features }} --all-targets --verbose
+        env:
+          RUSTFLAGS: ${{ matrix.rustflags }}
+
   test-inlining-x86:
     runs-on: ubuntu-latest
     strategy:
@@ -106,12 +123,12 @@ jobs:
         env:
           RUSTDOCFLAGS: --cfg docsrs

-  cross-build-arm:
+  cross-build-arm32:
     runs-on: ubuntu-latest
     strategy:
       matrix:
         toolchain: ["1.38.0", stable, beta, nightly ]
-        target: [arm-unknown-linux-gnueabi, aarch64-unknown-linux-gnu]
+        target: [arm-unknown-linux-gnueabi]
         features: ["--features std", ""]
         include:
           - toolchain: nightly
@@ -172,7 +189,10 @@ jobs:
         run: cargo fmt -- --check

   clippy_check:
-    runs-on: ubuntu-latest
+    runs-on: ${{ matrix.runner }}
+    strategy:
+      matrix:
+        runner: [ubuntu-latest, ARM64]
     steps:
       - uses: actions/checkout@v1
       - uses: actions-rs/toolchain@v1
15 changes: 14 additions & 1 deletion bench/Cargo.toml
@@ -17,6 +17,7 @@ core_affinity = "0.5"
 criterion = "0.3"
 simdutf8 = { version = "*", path = "..", features = ["aarch64_neon"] }
 simdjson-utf8 = { version = "*", path = "simdjson-utf8", optional = true }
+rand = "0.8"

 [[bench]]
 name = "throughput_basic"
@@ -37,4 +38,16 @@ harness = false
 [[bench]]
 name = "throughput_simdjson"
 harness = false
-required-features = ["simdjson"]
+required-features = ["simdjson"]
+
+[[bench]]
+name = "small_basic"
+harness = false
+
+[[bench]]
+name = "small_compat"
+harness = false
+
+[[bench]]
+name = "small_std"
+harness = false
3 changes: 3 additions & 0 deletions bench/benches/small_basic.rs
@@ -0,0 +1,3 @@
+use simdutf8_bench::define_small_benchmark;
+
+define_small_benchmark!(BenchFn::Basic);
3 changes: 3 additions & 0 deletions bench/benches/small_compat.rs
@@ -0,0 +1,3 @@
+use simdutf8_bench::define_small_benchmark;
+
+define_small_benchmark!(BenchFn::Compat);
3 changes: 3 additions & 0 deletions bench/benches/small_std.rs
@@ -0,0 +1,3 @@
+use simdutf8_bench::define_small_benchmark;
+
+define_small_benchmark!(BenchFn::Std);
95 changes: 95 additions & 0 deletions bench/src/lib.rs
@@ -62,6 +62,37 @@ pub fn criterion_benchmark<M: Measurement>(c: &mut Criterion<M>, bench_fn: Bench
     bench_late_error(c, bench_fn);
 }

+pub fn criterion_benchmark_small<M: Measurement>(c: &mut Criterion<M>, bench_fn: BenchFn) {
+    let core_ids = core_affinity::get_core_ids().unwrap();
+    core_affinity::set_for_current(*core_ids.get(2).unwrap_or(&core_ids[0]));
+
+    bench_small(
+        c,
+        "1-latin",
+        &scale_to_one_mib(include_bytes!("../data/Latin-Lipsum.txt")),
+        bench_fn,
+    );
+
+    bench_small(
+        c,
+        "2-cyrillic",
+        &scale_to_one_mib(include_bytes!("../data/Russian-Lipsum.txt")),
+        bench_fn,
+    );
+    bench_small(
+        c,
+        "3-chinese",
+        &scale_to_one_mib(include_bytes!("../data/Chinese-Lipsum.txt")),
+        bench_fn,
+    );
+    bench_small(
+        c,
+        "4-emoji",
+        &scale_to_one_mib(include_bytes!("../data/Emoji-Lipsum.txt")),
+        bench_fn,
+    );
+}
+
 fn bench_empty<M: Measurement>(c: &mut Criterion<M>, bench_fn: BenchFn) {
     let mut group = c.benchmark_group("0-empty");
     bench_input(&mut group, b"", false, true, bench_fn);
@@ -129,6 +160,70 @@ fn bench<M: Measurement>(c: &mut Criterion<M>, name: &str, bytes: &[u8], bench_f
     group.finish();
 }

+fn bench_small<M: Measurement>(c: &mut Criterion<M>, name: &str, bytes: &[u8], bench_fn: BenchFn) {
+    let mut group = c.benchmark_group(name);
+    bench_range(&mut group, bytes, 0, 16, bench_fn);
+    bench_range(&mut group, bytes, 16, 32, bench_fn);
+    bench_range(&mut group, bytes, 32, 64, bench_fn);
+    bench_range(&mut group, bytes, 64, 128, bench_fn);
+    bench_range(&mut group, bytes, 128, 256, bench_fn);
+    group.finish();
+}
+
+fn gen_valid_in_range(bytes: &[u8], lower_limit: usize, upper_limit: usize) -> usize {
+    use rand::Rng;
+    let mut rng = rand::thread_rng();
+    loop {
+        let x = rng.gen_range(lower_limit..upper_limit);
+        if std_from_utf8(&bytes[0..x]).is_ok() {
+            return x;
+        }
+    }
+}
+
+fn bench_range<T: Measurement>(
+    group: &mut BenchmarkGroup<T>,
+    bytes: &[u8],
+    lower_limit: usize,
+    upper_limit: usize,
+    bench_fn: BenchFn,
+) {
+    let bench_id = format!("rand_{:03}-{:03}", lower_limit, upper_limit);
+    let gen_fn = || gen_valid_in_range(bytes, lower_limit, upper_limit);
+    match bench_fn {
+        BenchFn::Basic => {
+            group.bench_function(bench_id, |b| {
+                b.iter_batched(
+                    gen_fn,
+                    |x| assert!(basic_from_utf8(&bytes[0..x]).is_ok()),
+                    criterion::BatchSize::SmallInput,
+                )
+            });
+        }
+        BenchFn::Compat => {
+            group.bench_function(bench_id, |b| {
+                b.iter_batched(
+                    gen_fn,
+                    |x| assert!(compat_from_utf8(&bytes[0..x]).is_ok()),
+                    criterion::BatchSize::SmallInput,
+                )
+            });
+        }
+        BenchFn::Std => {
+            group.bench_function(bench_id, |b| {
+                b.iter_batched(
+                    gen_fn,
+                    |x| assert!(std_from_utf8(&bytes[0..x]).is_ok()),
+                    criterion::BatchSize::SmallInput,
+                )
+            });
+        }
+        _ => {
+            unimplemented!();
+        }
+    }
+}
+
 #[inline(never)]
 fn basic_from_utf8_no_inline(v: &[u8]) -> bool {
     basic_from_utf8(v).is_ok()
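`gen_valid_in_range` rejection-samples a length and re-validates `bytes[0..x]` on every attempt. If the source text is itself valid UTF-8 (presumably true for the Lipsum files), a prefix is valid exactly when `x` lies on a character boundary, so the same set of lengths is accepted by a cheap `str::is_char_boundary` check. A minimal std-only sketch under that assumption; `XorShift64` and `gen_boundary_in_range` are hypothetical names, and the xorshift generator merely stands in for `rand::thread_rng()`:

```rust
/// Minimal xorshift64 PRNG (hypothetical stand-in for `rand::thread_rng()`).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in `lo..hi`; modulo bias is ignored for brevity.
    fn gen_range(&mut self, lo: usize, hi: usize) -> usize {
        lo + (self.next_u64() % (hi - lo) as u64) as usize
    }
}

/// Picks a random length `x` in `lo..hi` such that `text.as_bytes()[..x]` is valid UTF-8.
/// Taking `&str` guarantees the whole text is valid, so a prefix is valid
/// exactly when `x` is a char boundary; no re-validation is needed.
fn gen_boundary_in_range(rng: &mut XorShift64, text: &str, lo: usize, hi: usize) -> usize {
    assert!(lo < hi && hi <= text.len());
    loop {
        let x = rng.gen_range(lo, hi);
        if text.is_char_boundary(x) {
            return x;
        }
    }
}

fn main() {
    let text = "Лорем ипсум 😀 dolor sit amet";
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    for _ in 0..5 {
        let x = gen_boundary_in_range(&mut rng, text, 0, 16);
        // Same acceptance criterion as `gen_valid_in_range`.
        assert!(std::str::from_utf8(&text.as_bytes()[..x]).is_ok());
        println!("{}", x);
    }
}
```

Because any four consecutive byte positions contain at least one char boundary, the loop terminates quickly for ranges at least four bytes wide, such as the 0..16 through 128..256 ranges used by `bench_small`.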
28 changes: 26 additions & 2 deletions bench/src/macros.rs
@@ -8,14 +8,38 @@ macro_rules! define_throughput_benchmark {

         use simdutf8_bench::*;

-        fn benchmark_compat<M: Measurement>(c: &mut Criterion<M>) {
+        fn benchmark_throughput<M: Measurement>(c: &mut Criterion<M>) {
             criterion_benchmark(c, $bench_fn);
         }

         criterion_group!(
             name = benches;
             config = Criterion::default().measurement_time(Duration::from_secs(1)).warm_up_time(Duration::from_secs(1)).sample_size(300);
-            targets = benchmark_compat
+            targets = benchmark_throughput
         );

         criterion_main!(benches);
+    };
+}
+
+#[macro_export]
+macro_rules! define_small_benchmark {
+    ($bench_fn:expr) => {
+        use std::time::Duration;
+
+        use criterion::measurement::Measurement;
+        use criterion::{criterion_group, criterion_main, Criterion};
+
+        use simdutf8_bench::*;
+
+        fn benchmark_small<M: Measurement>(c: &mut Criterion<M>) {
+            criterion_benchmark_small(c, $bench_fn);
+        }
+
+        criterion_group!(
+            name = benches;
+            config = Criterion::default().measurement_time(Duration::from_secs(1)).warm_up_time(Duration::from_secs(1)).sample_size(300);
+            targets = benchmark_small
+        );
+
+        criterion_main!(benches);
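`define_small_benchmark!` follows the same pattern as `define_throughput_benchmark!`: each bench target is a one-line file that passes a `BenchFn` value as an `expr` fragment, and the macro splices that expression into a generated function. The std-only sketch below reproduces the shape without criterion; `BenchFn` here is a trimmed-down hypothetical copy of the bench crate's enum, and `validate`, `define_small_runner!` and `run_small` are illustrative names:

```rust
/// Hypothetical, trimmed-down stand-in for the bench crate's `BenchFn` enum.
#[derive(Clone, Copy, Debug, PartialEq)]
enum BenchFn {
    Std,
    AsciiOnly,
}

/// Dispatches on `BenchFn`, standing in for `basic_from_utf8`/`std_from_utf8` etc.
fn validate(bench_fn: BenchFn, input: &[u8]) -> bool {
    match bench_fn {
        BenchFn::Std => std::str::from_utf8(input).is_ok(),
        BenchFn::AsciiOnly => input.is_ascii(),
    }
}

/// Like `define_small_benchmark!`, takes the variant as an `expr` fragment
/// and splices it into a generated item.
macro_rules! define_small_runner {
    ($bench_fn:expr) => {
        fn run_small(input: &[u8]) -> bool {
            validate($bench_fn, input)
        }
    };
}

// One invocation per bench target file.
define_small_runner!(BenchFn::Std);

fn main() {
    assert!(run_small("héllo".as_bytes()));
    assert!(!run_small(&[0xC3])); // truncated two-byte sequence
    assert!(!validate(BenchFn::AsciiOnly, "é".as_bytes()));
    println!("ok");
}
```

Keeping the per-variant choice in the one-line bench files lets Cargo build `small_basic`, `small_compat` and `small_std` as separate benchmark binaries from a single macro definition.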
8 changes: 3 additions & 5 deletions examples/streaming.rs
@@ -1,11 +1,9 @@
-#[cfg(feature = "public_imp")]
-use simdutf8::basic::imp::Utf8Validator;
-
 #[allow(unused_imports)]
 use std::io::{stdin, Read, Result};

-#[cfg(feature = "public_imp")]
+#[cfg(all(feature = "public_imp", target_arch = "x86_64"))]
 fn main() -> Result<()> {
+    use simdutf8::basic::imp::Utf8Validator;
     unsafe {
         if !std::is_x86_feature_detected!("avx2") {
             panic!("This example only works with CPUs supporting AVX 2");
@@ -32,5 +30,5 @@ fn main() -> Result<()> {
 }

 /// Dummy main. This example requires the crate feature `public_imp`.
-#[cfg(not(feature = "public_imp"))]
+#[cfg(not(all(feature = "public_imp", target_arch = "x86_64")))]
 fn main() {}
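The streaming example is now compiled only on x86-64, with a dummy `main` everywhere else, because `std::is_x86_feature_detected!` exists only on x86/x86-64 targets. A minimal sketch of the same two-level gating (compile-time `cfg` plus a run-time CPU check), using a hypothetical `avx2_available()` helper so the code also builds on other architectures such as aarch64:

```rust
/// Compile-time gate: this version only exists on x86-64, where the
/// `is_x86_feature_detected!` macro is available.
#[cfg(target_arch = "x86_64")]
fn avx2_available() -> bool {
    // Run-time gate: queries the CPU that is actually executing the code.
    std::is_x86_feature_detected!("avx2")
}

/// Fallback for every other architecture: AVX 2 can never be present.
#[cfg(not(target_arch = "x86_64"))]
fn avx2_available() -> bool {
    false
}

fn main() {
    if avx2_available() {
        println!("AVX 2 path");
    } else {
        println!("fallback path");
    }
}
```

Exactly one of the two `avx2_available` definitions is compiled on any target, so callers never need their own `cfg` attributes.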
5 changes: 5 additions & 0 deletions src/basic.rs
@@ -77,6 +77,11 @@ pub mod imp {
     /// use simdutf8::basic::imp::Utf8Validator;
     /// use std::io::{stdin, Read, Result};
     ///
+    /// # #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
+    /// # fn main() {
+    /// # }
+    ///
+    /// # #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
     /// fn main() -> Result<()> {
     ///     unsafe {
     ///         if !std::is_x86_feature_detected!("avx2") {
8 changes: 0 additions & 8 deletions src/implementation/aarch64/mod.rs
@@ -5,10 +5,6 @@ pub(crate) mod neon;
 #[inline]
 #[cfg(all(feature = "aarch64_neon", target_feature = "neon"))]
 pub(crate) unsafe fn validate_utf8_basic(input: &[u8]) -> Result<(), crate::basic::Utf8Error> {
-    if input.len() < super::helpers::SIMD_CHUNK_SIZE {
-        return super::validate_utf8_basic_fallback(input);
-    }
-
     validate_utf8_basic_neon(input)
 }

@@ -24,10 +20,6 @@ pub(crate) use super::validate_utf8_basic_fallback as validate_utf8_basic;
 #[inline]
 #[cfg(all(feature = "aarch64_neon", target_feature = "neon"))]
 pub(crate) unsafe fn validate_utf8_compat(input: &[u8]) -> Result<(), crate::compat::Utf8Error> {
-    if input.len() < super::helpers::SIMD_CHUNK_SIZE {
-        return super::validate_utf8_compat_fallback(input);
-    }
-
     validate_utf8_compat_neon(input)
 }
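With the partial-load work listed in the commits, the NEON entry points no longer fall back to the scalar validator for inputs shorter than `SIMD_CHUNK_SIZE`; presumably the SIMD path now handles such short inputs itself. One reason a short tail can be processed as a full chunk is that padding it with zero bytes does not change the verdict: `0x00` is a complete ASCII character, so a valid tail stays valid, and a truncated multi-byte sequence is still rejected because `0x00` is not a continuation byte. The scalar sketch below checks that property; the 64-byte `CHUNK` constant and `pad_remainder` are assumptions for illustration, not the crate's actual `load_partial()` implementation:

```rust
/// Assumed chunk width; the crate's `SIMD_CHUNK_SIZE` plays this role.
const CHUNK: usize = 64;

/// Copies a short remainder into a zero-filled, chunk-sized buffer
/// (a scalar stand-in for a SIMD partial load).
fn pad_remainder(tail: &[u8]) -> [u8; CHUNK] {
    assert!(tail.len() <= CHUNK, "remainder must fit into one chunk");
    let mut buf = [0u8; CHUNK];
    buf[..tail.len()].copy_from_slice(tail);
    buf
}

fn main() {
    let cases: [&[u8]; 5] = [
        b"",                 // empty input
        "hé😀".as_bytes(),    // valid multi-byte text
        &[0xF0, 0x9F, 0x98], // emoji truncated after 3 of 4 bytes
        &[0xC0, 0x80],       // overlong encoding of NUL
        &[0x80],             // lone continuation byte
    ];
    for tail in cases.iter() {
        let direct = std::str::from_utf8(tail).is_ok();
        let padded = std::str::from_utf8(&pad_remainder(tail)).is_ok();
        // Zero padding never changes whether the input is valid UTF-8.
        assert_eq!(direct, padded);
        println!("{:02x?} -> valid: {}", tail, direct);
    }
}
```

The padding only preserves validity; a real implementation must still report error positions relative to the original input, which a padded buffer alone does not address.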