Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce r_array::TypedFrozenRArray and r_array::Iter #85

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

adampetro
Copy link
Contributor

I have found having a typed frozen wrapper around RArray to be quite useful in my own project, so thought I'd open a draft PR to see if there was interest in upstreaming it to Magnus.

Additionally, this includes a r_array::Iter which I've found to faster than the Enumerator returned by RArray::each, but safer than RArray::as_slice. While there is the potential for unexpected behaviour if the RArray is modified while being iterated, I think the worst-case scenario is skipping an item or iterating an item twice.

@matsadler
Copy link
Owner

So I have to admit that every now and again I check GitHub's dependents view and nose around other people's projects to see how they are using Magnus. I happened to be looking at bluejay-rb (very cool btw) just the other day and particularly at your TypedFrozenRArray, so I've already had a bit of a think about how something similar might work in Magnus.

One thing I'm not keen on is implementing TryConvert for TypedFrozenRArray. The type checking and modifying a method argument (by freezing it) doesn't seem like it's something that should happen implicitly.

I was also thinking it might be possible to have a typed array without freezing it, here's a rough design (I've not checked if it works/compiles):

impl RArray {
    /// Iterates though `self` and checks each element is a `T`.
    ///
    /// Returns a typed copy of `self`. Mutating the returned copy will not
    /// mutate `self`.
    pub fn typecheck<T>(self) -> Result<TypedArray<T>, Error>
    where
        T: TryConvert + ReprValue,
    {
        for r in self.each() {
            T::try_convert(r?)?;
        }
        unsafe {
            // new array without a class so it is invisible to ObjectSpace
            let ary = rb_ary_hidden_new(0);
            // turn ary into a copy-on-write clone of self
            rb_ary_replace(ary, self.as_rb_value());
            Ok(TypedArray(
                NonZeroValue::new_unchecked(Value::new(ary)),
                PhantomData,
            ))
        }
    }
}

// Very deliberately not Copy or Clone so that values of this type are consumed
// when TypedArray::to_array is called, so you can either have typed access
// from Rust *or* expose it to Ruby
//
// Maybe this could contain an RArray rather than untyped NonZeroValue. The
// inner value is a bit of a landmine as it doesn't have a class, NonZeroValue
// is an internal type that is hard to leak to Ruby so gives us a layer of
// defence. But maybe it's just annoying to work with?
#[repr(transparent)]
pub struct TypedArray<T>(NonZeroValue, PhantomData<T>);

impl<T> TypedArray<T> {
    // convert to a regular array
    pub fn to_array(self) -> RArray {
        let val = self.0.get();
        let ruby = Ruby::get_with(val);
        unsafe {
            rb_obj_reveal(val.as_rb_value(), ruby.class_array().as_rb_value());
            RArray::from_value_unchecked(val)
        }
    }

    // tediously proxy methods to RArray
    // unlike RArray has to take self by reference
    pub fn store(&self, offset: isize, val: T) -> Result<(), Error>
    where
        T: IntoValue,
    {
        unsafe { RArray::from_value_unchecked(self.0.get()) }.store(offset, val)
    }
}

impl Ruby {
    // the new function can counjure a Ruby object from thin air, so has to
    // take &Ruby to prove it's on a Ruby thread
    pub fn typed_ary_new<T>(&self) -> TypedArray<T> {
        unsafe {
            // new array without a class so it is invisible to ObjectSpace
            let ary = rb_ary_hidden_new(0);
            TypedArray(NonZeroValue::new_unchecked(Value::new(ary)), PhantomData)
        }
    }
}

impl<T> IntoValue for TypedArray<T>
where
    T: IntoValue,
{
    fn into_value_with(self, handle: &Ruby) -> Value {
        self.to_array().as_value()
    }
}

Possibly the two ideas aren't mutually exclusive? TypedFrozenRArray has the advantage it can be shared between Rust and Ruby. I didn't dig too deep, so I'm not sure your exact use case.

As for the iterator, yeah, I think the performance of each has been a pain point for a few people. I think maybe the copy + hide or copy + freeze trick would be the way to go to make sure the iterator doesn't misbehave if the array is modified during iteration.

It's a bit wonky with the way Magnus's types are all Copy, but think it might make sense to implement IntoIterator rather than follow the iter(&self) pattern, as iter() implies you're getting references out of the iterator.

@matsadler
Copy link
Owner

I've added the TypedArray<T> I described above.

I'm not sure if that covers your use case?

@adampetro
Copy link
Contributor Author

Sorry for the delayed reply!

Thanks for adding TypedArray<T>. It is a nice addition and will certainly come in handy. I do think TypedFrozenRArray still has some benefit, specifically the fact that it be shared between Ruby and Rust as you mentioned. That being said, if you don't feel it is a worthwhile addition to Magnus, I completely understand.

In the most recent commit, I tried to address the feedback, including:

  • Switch to using IntoIterator. I did keep RArray::iter for performance sensitive use-cases, but made it unsafe to indicate the risks (does it merit unsafe or is explaining the risks in the docs enough?)
  • Remove TryConvert implementation for TypedFrozenRArray, so there's no implicit freeze. The behavior is moved to TryFrom<RArray> (is the freeze acceptable in that context?).

Thanks!

Copy link
Owner

@matsadler matsadler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with your Iter<T>::next it should fetch the len on every iteration. I would hope that the compiler knows that the len is behind a pointer which could be mutated at any time, so won't optimise out the repeated calls.

So I don't think the possibility of modifying an Array while also using RArray::iter should cause it to be marked unsafe as I don't think it could lead to a memory safety violation.

But as that len check is chasing a pointer on every iteration I wonder if making len a field of Iter and passing it in when construing an Iter would be able to avoid that repeated pointer following and win some performance. If you were to make that change then RArray::iter would need to be unsafe.


I didn't think of using TryFrom for getting a TypedArray from an RArray. I probably skipped over that option because I've moved away from using From/Into for creating Ruby types as they could be used from non-Ruby threads. That's not a worry in this case as it's converting from one Ruby type to another, so if you have a Ruby type to start you know you're on a Ruby thread.

I think it'd be best if it was consistent on how you get ahold of TypedArray and TypedFrozenArray, but I'm not strongly wedded to one approach over the other. We can swap the typecheck method for an implementation of TryFrom.

I feel inherent methods are a little more discoverable than trait implementations, what are your thoughts?

/// ```
pub fn typed_frozen_ary_new<T>(&self) -> TypedFrozenRArray<T>
where
T: TryConvert + Copy,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can avoid the extra + Copy bound here (and in a couple of other spots) if, instead of deriving, you manually implement Copy/Clone like:

impl<T> Copy for TypedFrozenRArray<T> {}

impl<T> Clone for TypedFrozenRArray<T> {
    fn clone(&self) -> Self {
        *self
    }
}

/// # Ruby::init(example).unwrap()
/// ```
fn into_iter(self) -> Self::IntoIter {
Iter::new(self.to_r_array())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This either needs to be frozen, or cast to an RArray (like unsafe { RArray::from_value_unchecked(self.0.get()) } without calling to_r_array. As soon as to_r_array is called the array becomes visible to Ruby via ObjectSpace.each_object (even if it's never returned to Ruby), so could be mutated to contain a non-T value, but this iterator is promising a T.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants