Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IO/FS/read_line never ends when reading last line with bend run-c #638

Closed
ArthurMPassos opened this issue Jul 19, 2024 · 5 comments · Fixed by #700
Closed

IO/FS/read_line never ends when reading last line with bend run-c #638

ArthurMPassos opened this issue Jul 19, 2024 · 5 comments · Fixed by #700
Assignees
Labels
bug Something isn't working good first issue Good for newcomers prelude Bend's builtin types and functions

Comments

@ArthurMPassos
Copy link

ArthurMPassos commented Jul 19, 2024

Reproducing the behavior

It seems that the function have a problem reading the value from the last line, maybe because it cant find the "\n" it searches for.

Example of source file used, (there are 3 lines, the last line is not empty):

41
23
54

If I run this program using "bend runc-c":

def main:
  with IO:
    fd <- IO/FS/open("../datasets/sequence_value_issue.txt", "r")

    bytes <- IO/FS/read_line(fd)
    bytes <- IO/FS/read_line(fd)
    # bytes <- IO/FS/read_line(fd)

    txt = Bytes/decode_utf8(bytes)
    return txt

I get:

Result: "23"

If I uncomment the last read_line, calling read 3 times, the program seems to never end.

It seems it have other behaviors for "bend run" for example.

System Settings

  • HVM: 2.0.21
  • Bend: 0.2.36
  • OS: Windows 11 WSL 2 with Ubuntu 22.04.4 LTS
  • CPU: Intel i5-1235U
  • RAM: 32 GB -> 26GB to WSL

Additional context

I guess the change can be made in the builtin's. Could change the generic Bytes/split_once to a Bytes/split_next_line that uses a condition to check for '\n' and EOF, but I didn't quite catch how that would work here:

Bytes/split_once xs cond = (Bytes/split_once.go xs cond @x x)
  Bytes/split_once.go  List/Nil        cond acc = (Result/Err (acc List/Nil))
  Bytes/split_once.go (List/Cons x xs) cond acc =
    if  (== cond x) { # <====== Here could have something like: (+ ( == '\n' x) ( == EOF x))
      (Result/Ok ((acc List/Nil), xs))
    } else {
      (Bytes/split_once.go xs cond @y (acc (List/Cons x y)))
    }
def IO/FS/read_line(fd):
  return IO/FS/read_line.read_chunks(fd, [])

def IO/FS/read_line.read_chunks(fd, chunks):
  with IO:
    # Read line in 1kB chunks
    chunk <- IO/FS/read(fd, 1024)
    match res = Bytes/split_once(chunk, '\n'):
      case Result/Ok:
        (line, rest) = res.val
        (length, *) = List/length(rest)
        * <- IO/FS/seek(fd, to_i24(length) * -1, IO/FS/SEEK_CUR)
        chunks = List/Cons(line, chunks)
        bytes = List/flatten(chunks)
        return wrap(bytes)
      case Result/Err:
        line = res.val
        chunks = List/Cons(line, chunks)
        return IO/FS/read_line.read_chunks(fd, chunks)
@developedby developedby added bug Something isn't working prelude Bend's builtin types and functions labels Jul 20, 2024
@imaqtkatt
Copy link
Contributor

imaqtkatt commented Jul 23, 2024

I don't think this is a bug, in the unix standard every line should terminate with a newline character

@ArthurMPassos
Copy link
Author

Make sense, but in this case, using "bend run-c", it just runs forever consuming 100% of all CPU cores.

Maybe adding some checking and returning and error or some kind of ".../Nil" value could be more user friendly. But I can see how this checking for each line or character could significantly decrease performance for some use cases.

@developedby
Copy link
Member

I don't think this is a bug, in the unix standard every line should terminate with a newline character

But if we reach an EOF it should also stop, no? Isn't that how other languages handle that?

@imaqtkatt
Copy link
Contributor

But if we reach an EOF it should also stop, no? Isn't that how other languages handle that?

Yes, I think the problem here is that the Result/Err returned by split_once needs to be interpreted for both cases when we are reading a line with +1024 characters or when it reached EOF

@developedby developedby added the good first issue Good for newcomers label Aug 9, 2024
@coder3112
Copy link
Contributor

So, I think the simplest solution is just to just check if SEEK_CUR exceeds total file size(which can be a single call to the underlying File System.

@developedby developedby added this to the IO lib v0 milestone Aug 20, 2024
@kings177 kings177 assigned kings177 and LunaAmora and unassigned kings177 Aug 28, 2024
@LunaAmora LunaAmora linked a pull request Aug 29, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers prelude Bend's builtin types and functions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants