Commit

v0.1.3 release
hkratz authored May 14, 2021
1 parent 5f020de commit 0798da0
Showing 7 changed files with 50 additions and 32 deletions.
7 changes: 6 additions & 1 deletion CHANGELOG.md
@@ -1,6 +1,10 @@
# Changelog
## [Unreleased]

## [0.1.3] - 2021-05-14
### New features
* Low-level streaming validation API in `simdutf8::basic::imp`

## [0.1.2] - 2021-05-09
### New features
* Aarch64 support (e.g. Apple Silicon, Raspberry Pi 4, ...) with nightly Rust and crate feature `aarch64_neon`
@@ -46,7 +50,8 @@
## [0.0.1] - 2021-04-20
- Initial release.

[Unreleased]: https://github.com/rusticstuff/simdutf8/compare/v0.1.2...HEAD
[Unreleased]: https://github.com/rusticstuff/simdutf8/compare/v0.1.3...HEAD
[0.1.3]: https://github.com/rusticstuff/simdutf8/compare/v0.1.2...v0.1.3
[0.1.2]: https://github.com/rusticstuff/simdutf8/compare/v0.1.1...v0.1.2
[0.1.1]: https://github.com/rusticstuff/simdutf8/compare/v0.1.0...v0.1.1
[0.1.0]: https://github.com/rusticstuff/simdutf8/compare/v0.0.3...v0.1.0
13 changes: 11 additions & 2 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "simdutf8"
version = "0.1.2"
version = "0.1.3"
authors = ["Hans Kratz <[email protected]>"]
edition = "2018"
description = "SIMD-accelerated UTF-8 validation."
@@ -11,7 +11,16 @@ readme = "README.md"
keywords = ["utf-8", "unicode", "string", "validation", "simd"]
categories = ["encoding", "algorithms", "no-std"]
license = "MIT OR Apache-2.0"
exclude = ["/.github", "/.vscode", "/bench", "/fuzzing", "/img", "expected-methods-*.txt"]
exclude = [
"/.gitignore",
"/.github",
"/.vscode",
"/bench",
"/fuzzing",
"/img",
"/inlining",
"TODO.md",
]

[features]
default = ["std"]
4 changes: 2 additions & 2 deletions README.md
@@ -28,12 +28,12 @@ This library has been thoroughly tested with sample data as well as fuzzing and
Add the dependency to your Cargo.toml file:
```toml
[dependencies]
simdutf8 = { version = "0.1.2" }
simdutf8 = { version = "0.1.3" }
```
or on ARM64 with Rust Nightly:
```toml
[dependencies]
simdutf8 = { version = "0.1.2", features = ["aarch64_neon"] }
simdutf8 = { version = "0.1.3", features = ["aarch64_neon"] }
```

Use `simdutf8::basic::from_utf8()` as a drop-in replacement for `std::str::from_utf8()`.
22 changes: 12 additions & 10 deletions src/basic.rs
@@ -28,7 +28,7 @@ impl std::error::Error for Utf8Error {}
/// Analogue to [`std::str::from_utf8()`].
///
/// Checks if the passed byte sequence is valid UTF-8 and returns an
/// [`std::str``] reference to the passed byte slice wrapped in `Ok()` if it is.
/// [`std::str`] reference to the passed byte slice wrapped in `Ok()` if it is.
///
/// # Errors
/// Will return the zero-sized Err([`Utf8Error`]) on if the input contains invalid UTF-8.
@@ -43,7 +43,7 @@ pub fn from_utf8(input: &[u8]) -> Result<&str, Utf8Error> {
/// Analogue to [`std::str::from_utf8_mut()`].
///
/// Checks if the passed mutable byte sequence is valid UTF-8 and returns a mutable
/// [`std::str``] reference to the passed byte slice wrapped in `Ok()` if it is.
/// [`std::str`] reference to the passed byte slice wrapped in `Ok()` if it is.
///
/// # Errors
/// Will return the zero-sized Err([`Utf8Error`]) on if the input contains invalid UTF-8.
@@ -58,16 +58,18 @@ pub fn from_utf8_mut(input: &mut [u8]) -> Result<&mut str, Utf8Error> {
/// Allows direct access to the platform-specific unsafe validation implementations.
#[cfg(feature = "public_imp")]
pub mod imp {
use crate::basic;

/// A low-level interfacne for streaming validation of UTF-8 data. It is meant to be integrated
/// in high-performance data processing pipelines.
///
/// Data can be streamed in arbitrarily-sized chunks using the [`Self::update()`] method. There is
/// no way to find out if the input so far was valid UTF-8 during the validation. Only when
/// the validation is completed with the [`Self::finalize()`] method the result of the validation is
/// returned. Use [`ChunkedUtf8Validator`] is possible for highest performance.
/// returned. Use [`ChunkedUtf8Validator`] if possible for highest performance.
///
/// This implementation requires CPU SIMD features specified by the module it resides in.
/// It is undefined behavior to call it if the required CPU features are not available which
/// It is undefined behavior to use it if the required CPU features are not available which
/// is why all trait methods are `unsafe`.
///
/// General usage:
@@ -123,13 +125,13 @@ pub mod imp {
/// Finishes the validation and returns `Ok(())` if the input was valid UTF-8.
///
/// # Errors
/// A [`crate::basic::Utf8Error`] is returned if the input was not valid UTF-8. No
/// A [`basic::Utf8Error`] is returned if the input was not valid UTF-8. No
/// further information about the location of the error is provided.
///
/// # Safety
/// This implementation requires CPU SIMD features specified by the module it resides in.
/// It is undefined behavior to call it if the required CPU features are not available.
unsafe fn finalize(self) -> core::result::Result<(), crate::basic::Utf8Error>;
unsafe fn finalize(self) -> core::result::Result<(), basic::Utf8Error>;
}

/// Like [`Utf8Validator`] this low-level API is for streaming validation of UTF-8 data.
@@ -146,7 +148,7 @@ pub mod imp {
/// data passed to it.
///
/// This implementation requires CPU SIMD features specified by the module it resides in.
/// It is undefined behavior to call it if the required CPU features are not available which
/// It is undefined behavior to use it if the required CPU features are not available which
/// is why all trait methods are `unsafe`.
pub trait ChunkedUtf8Validator {
/// Creates a new validator.
@@ -175,7 +177,7 @@ pub mod imp {
/// Finishes the validation and returns `Ok(())` if the input was valid UTF-8.
///
/// # Errors
/// A [`crate::basic::Utf8Error`] is returned if the input was not valid UTF-8. No
/// A [`basic::Utf8Error`] is returned if the input was not valid UTF-8. No
/// further information about the location of the error is provided.
///
/// # Safety
@@ -184,7 +186,7 @@ pub mod imp {
unsafe fn finalize(
self,
remaining_input: core::option::Option<&[u8]>,
) -> core::result::Result<(), crate::basic::Utf8Error>;
) -> core::result::Result<(), basic::Utf8Error>;
}

/// Includes the x86/x86-64 SIMD implementations.
@@ -201,7 +203,7 @@ pub mod imp {
}
/// Includes the validation implementation for SSE 4.2-compatible CPUs.
///
/// Using the provided functionality on CPUs which do not support AVX 2 is undefined
/// Using the provided functionality on CPUs which do not support SSE 4.2 is undefined
/// behavior and will very likely cause a crash.
pub mod sse42 {
pub use crate::implementation::x86::sse42::validate_utf8_basic as validate_utf8;
2 changes: 1 addition & 1 deletion src/compat.rs
@@ -7,7 +7,7 @@
//!
//! The functions in this module also fail early: errors are checked on-the-fly as the string is processed and once
//! an invalid UTF-8 sequence is encountered, it returns without processing the rest of the data.
//! This comes at a performance penality compared to the [`crate::basic`] module even if the input is valid UTF-8.
//! This comes at a slight performance penality compared to the [`crate::basic`] module if the input is valid UTF-8.
use core::fmt::Display;
use core::fmt::Formatter;
28 changes: 15 additions & 13 deletions src/implementation/algorithm.rs
@@ -5,6 +5,8 @@
macro_rules! algorithm_simd {
($feat:expr) => {
use crate::{basic, compat};

impl Utf8CheckAlgorithm<SimdU8Value> {
#[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
#[inline]
@@ -203,7 +205,7 @@ macro_rules! algorithm_simd {
/// Validation implementation for CPUs supporting the SIMD extension (see module).
///
/// # Errors
/// Return the zero-sized [`crate::basic::Utf8Error`] on failure.
/// Returns the zero-sized [`basic::Utf8Error`] on failure.
///
/// # Safety
/// This function is inherently unsafe because it is compiled with SIMD extensions
@@ -213,7 +215,7 @@ macro_rules! algorithm_simd {
#[inline]
pub unsafe fn validate_utf8_basic(
input: &[u8],
) -> core::result::Result<(), crate::basic::Utf8Error> {
) -> core::result::Result<(), basic::Utf8Error> {
use crate::implementation::helpers::SIMD_CHUNK_SIZE;
let len = input.len();
let mut algorithm = Utf8CheckAlgorithm::<SimdU8Value>::default();
@@ -250,7 +252,7 @@ macro_rules! algorithm_simd {
}
algorithm.check_incomplete_pending();
if algorithm.has_error() {
Err(crate::basic::Utf8Error {})
Err(basic::Utf8Error {})
} else {
Ok(())
}
@@ -259,7 +261,7 @@ macro_rules! algorithm_simd {
/// Validation implementation for CPUs supporting the SIMD extension (see module).
///
/// # Errors
/// Return [`crate::compat::Utf8Error`] with detailed error information on failure.
/// Returns [`compat::Utf8Error`] with detailed error information on failure.
///
/// # Safety
/// This function is inherently unsafe because it is compiled with SIMD extensions
@@ -269,7 +271,7 @@ macro_rules! algorithm_simd {
#[inline]
pub unsafe fn validate_utf8_compat(
input: &[u8],
) -> core::result::Result<(), crate::compat::Utf8Error> {
) -> core::result::Result<(), compat::Utf8Error> {
validate_utf8_compat_simd0(input)
.map_err(|idx| crate::implementation::helpers::get_compat_error(input, idx))
}
@@ -347,7 +349,7 @@ macro_rules! algorithm_simd {
}
}

/// Low-level implementation of the [`crate::basic::imp::Utf8Validator]` trait.
/// Low-level implementation of the [`basic::imp::Utf8Validator`] trait.
///
/// This is implementation requires CPU SIMD features specified by the module it resides in.
/// It is undefined behavior to call it if the required CPU features are not
@@ -371,7 +373,7 @@ macro_rules! algorithm_simd {
}

#[cfg(feature = "public_imp")]
impl crate::basic::imp::Utf8Validator for Utf8ValidatorImp {
impl basic::imp::Utf8Validator for Utf8ValidatorImp {
#[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
#[inline]
#[must_use]
@@ -424,7 +426,7 @@ macro_rules! algorithm_simd {

#[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
#[inline]
unsafe fn finalize(mut self) -> core::result::Result<(), crate::basic::Utf8Error> {
unsafe fn finalize(mut self) -> core::result::Result<(), basic::Utf8Error> {
if self.incomplete_len != 0 {
for i in &mut self.incomplete_data[self.incomplete_len..] {
*i = 0
@@ -433,14 +435,14 @@ macro_rules! algorithm_simd {
}
self.algorithm.check_incomplete_pending();
if self.algorithm.has_error() {
Err(crate::basic::Utf8Error {})
Err(basic::Utf8Error {})
} else {
Ok(())
}
}
}

/// Low-level implementation of the [`crate::basic::imp::ChunkedUtf8Validator]` trait.
/// Low-level implementation of the [`basic::imp::ChunkedUtf8Validator`] trait.
///
/// This is implementation requires CPU SIMD features specified by the module it resides in.
/// It is undefined behavior to call it if the required CPU features are not
@@ -451,7 +453,7 @@ macro_rules! algorithm_simd {
}

#[cfg(feature = "public_imp")]
impl crate::basic::imp::ChunkedUtf8Validator for ChunkedUtf8ValidatorImp {
impl basic::imp::ChunkedUtf8Validator for ChunkedUtf8ValidatorImp {
#[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
#[inline]
#[must_use]
@@ -480,7 +482,7 @@ macro_rules! algorithm_simd {
unsafe fn finalize(
mut self,
remaining_input: core::option::Option<&[u8]>,
) -> core::result::Result<(), crate::basic::Utf8Error> {
) -> core::result::Result<(), basic::Utf8Error> {
use crate::implementation::helpers::SIMD_CHUNK_SIZE;

if let Some(mut remaining_input) = remaining_input {
@@ -505,7 +507,7 @@ macro_rules! algorithm_simd {
}
self.algorithm.check_incomplete_pending();
if self.algorithm.has_error() {
Err(crate::basic::Utf8Error {})
Err(basic::Utf8Error {})
} else {
Ok(())
}
6 changes: 3 additions & 3 deletions src/lib.rs
@@ -25,12 +25,12 @@
//! Add the dependency to your Cargo.toml file:
//! ```toml
//! [dependencies]
//! simdutf8 = { version = "0.1.2" }
//! simdutf8 = { version = "0.1.3" }
//! ```
//! or on ARM64 with Rust Nightly:
//! ```toml
//! [dependencies]
//! simdutf8 = { version = "0.1.2", features = ["aarch64_neon"] }
//! simdutf8 = { version = "0.1.3", features = ["aarch64_neon"] }
//! ```
//!
//! Use [`basic::from_utf8()`] as a drop-in replacement for `std::str::from_utf8()`.
@@ -87,7 +87,7 @@
//!
//! ### Access to low-level functionality
//! If you want to be able to call a SIMD implementation directly, use the `public_imp` feature flag. The validation
//! implementations are then accessible via [`basic::imp`] and [`compat::imp`].Traits facilitating streaming validation are available
//! implementations are then accessible via [`basic::imp`] and [`compat::imp`]. Traits facilitating streaming validation are available
//! there as well.
//!
//! ## Optimisation flags
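
Note on the streaming API touched by this commit: the `Utf8Validator` docs in `src/basic.rs` describe feeding arbitrarily sized chunks through `update()` and only learning the result from `finalize()`. The std-only sketch below illustrates that contract. It is not simdutf8's SIMD implementation or its API. `SketchValidator`, `InvalidUtf8`, and `validate_chunks` are hypothetical names introduced here, and the carry-over buffer is an assumption about one simple way to handle multi-byte sequences split across chunk boundaries.

```rust
use std::str;

/// Hypothetical zero-sized error, loosely mirroring the shape of `basic::Utf8Error`.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidUtf8;

/// Hypothetical std-only streaming validator (not the simdutf8 implementation).
#[derive(Default)]
pub struct SketchValidator {
    /// Bytes of an incomplete trailing sequence carried over to the next chunk.
    pending: Vec<u8>,
    /// Set once a definitely invalid sequence has been seen.
    failed: bool,
}

impl SketchValidator {
    pub fn new() -> Self {
        Self::default()
    }

    /// Feeds one chunk. No result is reported until `finalize()`.
    pub fn update(&mut self, input: &[u8]) {
        if self.failed {
            return;
        }
        // Prepend any bytes left over from the previous chunk (copies the
        // chunk; acceptable for an illustration, not for a fast path).
        let mut buf = std::mem::take(&mut self.pending);
        buf.extend_from_slice(input);
        if let Err(err) = str::from_utf8(&buf) {
            match err.error_len() {
                // An invalid sequence was found inside the data.
                Some(_) => self.failed = true,
                // The data ended mid-sequence: keep the tail for later.
                None => self.pending = buf[err.valid_up_to()..].to_vec(),
            }
        }
    }

    /// Finishes validation. A leftover incomplete sequence is an error.
    pub fn finalize(self) -> Result<(), InvalidUtf8> {
        if self.failed || !self.pending.is_empty() {
            Err(InvalidUtf8)
        } else {
            Ok(())
        }
    }
}

/// Convenience wrapper: validates a sequence of chunks as one byte stream.
pub fn validate_chunks(chunks: &[&[u8]]) -> Result<(), InvalidUtf8> {
    let mut validator = SketchValidator::new();
    for chunk in chunks {
        validator.update(chunk);
    }
    validator.finalize()
}
```

Calling `validate_chunks` with the two bytes of `é` (`0xC3`, `0xA9`) split into separate chunks succeeds, while a stream that ends after `0xE2 0x82` fails in `finalize()`. Unlike the real traits, nothing here is `unsafe`, because the sketch does not depend on CPU features.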