Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it no-alloc compatible #33

Open
vmx opened this issue Nov 14, 2020 · 10 comments · May be fixed by #70
Open

Make it no-alloc compatible #33

vmx opened this issue Nov 14, 2020 · 10 comments · May be fixed by #70

Comments

@vmx
Copy link
Member

vmx commented Nov 14, 2020

The data-encoding library is no_alloc compatible, you can use e.g. encode_mut.

We could provide a similar trait, which writes into a mutable slice.

Things to keep in mind:

  • base-x currently needs alloc. It's used for bases which are not power of two.
  • It increases the API surface, currently this library is pretty small, should we change that?
  • Can we make it work without making this library hard to use when you don't need no_std and no-alloc compatibility?
  • How would it relate to rust-cid? Could we follow the idea of rust-multihash and use GenericArray when converting into a certain Multibase? Given that we know the maximum digest size of the CID, we also know the possible maximum base encoded size at compile time. Can this be wrapped in a nice to use API?
@mriise
Copy link
Contributor

mriise commented Nov 16, 2020

Feature flags can do a lot of work here to get no-alloc available. If for now we just want to have it as an option, we could throw in the necessary feature flags and be on our way. Default features would be std, with options to specify if they want no_std with alloc, or no features whatsoever (no base-x, no alloc, no std). It would just be a matter of making the effects of the features obvious, so probably a section in README.

For API, copying decode_len, decode_mut, etc. from data_encoding is only growing the amount of functions by 4 and it's only purpose would be for the specific use case of no alloc. Optionally it might be nice to add as/from_str

Generic array would really be a whole new section of the API (encode x as y with the size of z) as it could easily interfere with the core of the lib (encode x as y). If complexity of API is of concern it would probably be best for cid to do the extra logic at runtime for the time being. If it is really beneficial to cid then it might be worth the effort though currently I can't think of a clean way of doing it (at least without const generics and features from the RFC that don't even work currently).

@mriise
Copy link
Contributor

mriise commented Nov 16, 2020

It also looks like with some work base-x can be converted to no-alloc as well.

@vmx
Copy link
Member Author

vmx commented Nov 16, 2020

If complexity of API is of concern it would probably be best for cid to do the extra logic at runtime for the time being.

This aligns with my thoughts. I find it hard to come up with good/usable APIs without a proper consumer/use case. Hence I would lean towards making it possible to use rust-multibase with no-alloc somehow, but don't spend to much time on a super tight integration with rust-cid. Though I would use rust-cid use case as a consumer in the sense of "what would it look like if you would want to convert a CID into a base-encoded string in a no-alloc environment.

There wouldn't need to be a fleshed out API for rust-cid. I think it would be good enough to have it in the rust-cid documentation a section with an example on how to do it (or an example as code in the rust-cid repo). If it then really becomes used by someone we could think about a tighter integration.

Stebalien added a commit to multiformats/rust-cid that referenced this issue Oct 28, 2021
This:

1. Uses core2 when std isn't supported.
2. Adds an alloc feature for building with and without the alloc crate.

Uses multiformats/rust-multihash#146.

Unfortunately, we can't do multibase formatting yet due to
multiformats/rust-multibase#33. But we can fix
that later.
@mriise
Copy link
Contributor

mriise commented Feb 27, 2022

Decided to get back to work on the no-alloc base-x https://github.com/mriise/smol-base-x
Currently its rather rough, and im considering if supporting utf over 1 byte is worth it. It would simply things a fair amount, maybe just as a feature flag?

Also working on an API using const generics to generate the backing arrays.

@vmx
Copy link
Member Author

vmx commented Feb 28, 2022

im considering if supporting utf over 1 byte is worth it.

I don't think it's worth it. Base encodings are usually ASCII, which is also often the point of them. The only use case are those emoji encodings that I've so far only seen to be used for fun or experimenting with packing more data into less bytes (e.g. for Tweets). So from a practical level, I think having the limitation of supporting ASCII based alphabets only is totally fine.

@mriise
Copy link
Contributor

mriise commented Jul 4, 2022

@vmx i'd like to re-open this for some more discussion now, as i finally got around to releasing a version of https://docs.rs/smol-base-x/0.1.0/smol_base_x/

@mriise
Copy link
Contributor

mriise commented Jul 4, 2022

One option is modifying trait BaseCodec to

pub(crate) trait BaseCodec<En, De> {
    const CODE: char;
    
    /// Encode with the given byte slice.
    fn encode(input: impl AsRef<[u8]>) -> Result<En>;
    /// Decode with the given string (as slice).
    fn decode(input: impl AsRef<str>) -> Result<De>;
}

std impl will look like

impl BaseCodec<String, Vec<u8>> for $type

no-alloc could be (likely to change)

impl<const S: usize, const B: usize> BaseCodec<heapless::String<S>, heapless::Vec<B>> for $type

no_alloc public fn might look like

fn encode_arr<const S: usize>(
    base: BaseEnum,
    input: impl AsRef<[u8]>,
) -> Result<heapless::String<S>> {
    use impls::smol::Base10;
    use impls::BaseCodec;
    let input = input.as_ref();
    // needs to be a different error
    // String size <S> needs to be defined, though Decode <D> doesnt matter and isn't used so it can be whatever
    let mut out =
        <Base10 as BaseCodec<heapless::String<S>, heapless::Vec<u8, 0>>>::encode(input).unwrap();
    // SAFETY: this trusts that encode() leaves an open byte in the begining
    unsafe { out.as_mut_vec()[0] = base.code() as u8 };
    Ok(out)
}

This would allow for preservation of current API since BaseCodec is pub(crate). We will likely need a new Code enum for no_alloc encoding, but offering From (and probably TryInto for std -> no_std code) between the two should make it easy enough to work with.

@mriise
Copy link
Contributor

mriise commented Jul 4, 2022

I'm using heapless here but we can move to an alternative non-alloc output container.

@vmx
Copy link
Member Author

vmx commented Jul 5, 2022

Thanks, all this sounds good to me.

We will likely need a new Code enum for no_alloc encoding

Is this because you need different implementations of the same functions for alloc/no-alloc?

@mriise
Copy link
Contributor

mriise commented Jul 5, 2022

Is this because you need different implementations of the same functions for alloc/no-alloc?

yes, it can be done with feature flags per code [adding feature(..) and feature(not(..)) ] but then you are forced to use either alloc or no alloc, when it is simple enough to just have two enums where one is removed when alloc is disabled.

@mriise mriise linked a pull request Jul 12, 2022 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants