Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a function slice start end = take (end − start) . drop start? #576

Open
kindaro opened this issue Mar 17, 2024 · 6 comments
Open

Add a function slice start end = take (end − start) . drop start? #576

kindaro opened this issue Mar 17, 2024 · 6 comments

Comments

@kindaro
Copy link
Contributor

kindaro commented Mar 17, 2024

Many languages offer a feature that lets you take a substring by indices real quick. It is handy. Here is an example from JavaScript:

> "Hello World".slice (6, 11)
'World'

One frequent use case is when you have a parse tree annotated with locations and you want to recover the literal source of a given syntactic element. With slice, this is done at once.

Can we add such a function to text? Maybe it can have a more efficient implementation than take … . drop …?

@Bodigrim
Copy link
Contributor

When I encounter slice :: Int -> Int -> f a -> f a, I never remember whether the second Int is the number of elements to take or the index of the final element to take: is it slice start len or slice start end? take ... . drop ... does not pose an ambiguity and is only marginally longer.

However, if we extend slice to allow negative numbers mimicking Python, it would offer asymptotic improvements. Currently take (length xs - 2) . drop 1 is O(n) because length is O(n), but slice 1 (-1) could be implemented in O(1).

@Lysxia
Copy link
Contributor

Lysxia commented Mar 17, 2024

Currently take (length xs - 2) . drop 1 is O(n) because length is O(n)

We can use dropEnd for this.

@kindaro
Copy link
Contributor Author

kindaro commented Mar 17, 2024

When I encounter slice :: Int -> Int -> f a -> f a, I never remember whether the second Int is the number of elements to take or the index of the final element to take: is it slice start len or slice start end? take ... . drop ... does not pose an ambiguity and is only marginally longer.

I agree, this is terrible. But convenience also matters. slice is convenient.

  • We should do what Python and JavaScript do. 9 out of 10 programmers have experience with these. Python and JavaScript do absolute indexing, so we should do absolute indexing.
  • A Text is conceptually an array. The simplest thing slice could do is take two indices within the bounds of the array and return a new appropriate array. It would not be possible with C strings but it is possible with Text — we should celebrate this possibility.
  • drop … . take … is list thinking, not array thinking. Or wait, was it take … . drop …?

It hurts me that there is no function out of the box that does what text is good for — taking substrings in constant time space.

@Lysxia
Copy link
Contributor

Lysxia commented Mar 17, 2024

what text is good for — taking substrings in constant time.

You can't take a substring of a UTF-8 string in constant time, because characters have encodings of variable lengths. In Python, strings are byte-indexed, i.e., they're really ByteString.

Haskell's vector has a slice that takes start and length instead of start and end.

@kindaro
Copy link
Contributor Author

kindaro commented Mar 18, 2024

Yes, of course. I meant «constant space» but I guess I was sleepy.

  • For a Haskell String, if you want to get a tail, you can do it in constant space, but you cannot take a general sub-list in constant space.
  • For a null terminated C string, again you cannot get a sub-string without copying because it must be null terminated.

Text is different. It stores a pointer to an array and start and end indices. It is in a special position here that makes slicing space efficient — the space cost of a slice is a pointer and two numbers. It makes sense to expose this special feature.

@meooow25
Copy link
Contributor

meooow25 commented Mar 21, 2024

If we add slice, users of Python or Javascript will be surprised to find it to be O(n) instead of the O(1) they expect. index already poses this problem.
In my opinion, Python and Javascript provide a reason to not add this function, because it would be misleading.

Not saying that this is enough reason to reject slice outright, of course.


In Python, strings are byte-indexed, i.e., they're really ByteString.

It's a little more complicated. Python strings are arrays of whatever the the largest code point in the string is: PEP-393

In Javascript, Strings are sequences of UTF-16 code units, which means slice arguably does the wrong thing if you want to slice on Unicode code points.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants