Writesas #38

Open
wants to merge 61 commits into
base: main

JanMarvin
Owner

@JanMarvin JanMarvin commented Jan 8, 2023

Rebased the initial work from #14 (last commit was in July 2021) onto current main.
I do not remember anything, therefore the entire approach is HIGHLY EXPERIMENTAL.

  • writing mtcars and iris as sas7bdat files works (checked this with SAS Studio)
  • reading these in readsas works (but a lot of debug information is printed)
  • writing a subset from pivottabler::bhmtrains (first 4k rows with trimmed names) works

ToDo

The following is a simple checklist. Tasks marked with * are unlikely to be implemented.

  • check that writing various cell types works as expected
    • integer
    • numeric
    • date
    • posix
    • character
    • factor (as character)
    • missings
  • write multiple pages (iirc readsas currently only supports writing a single page)
  • write wide datasets. Right now a row must fit on a page and page 1 must contain data. With wide datasets SAS splits rows across pages and might create a page without data.* (This requires more research and I don't have a use case for it, i.e. I don't need it.)
  • compression*
  • writing 32/64 bit files* (Initially I wanted to use 32 bit files because they are much smaller, which would make testing the file format easier. Unfortunately they did not seem to work.)
  • encoding: files work with current, presumably Unicode-aware, SAS. They don't work with a latin1 SAS on Windows, which throws a YZR code bug for unknown reasons. Maybe one of the very many unks defines the encoding.
    I assume that SASYZR is the default compression algorithm used by SAS. This is probably related to case 8, where c8vec contains some kind of column integer information mixed together with a few blanks. Maybe it is possible to disable compression entirely for a file.
  • test limits, number of columns/rows* (with a wide matrix there's an overflow in some position seeking, which causes 4 GB files)
  • labels (already working? they only blow up page 1 and I don't have a use case for them)
  • various file metadata
  • check against SAS
    ...
library(readsas)

# in this data frame 485 rows is almost a full page; writing more corrupts the output
dat <- pivottabler::bhmtrains
# dat <- dat[1:485,]

write.sas(dat, "/tmp/trains.sas7bdat")
read.sas("/tmp/trains.sas7bdat")
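For the "check that writing various cell types works as expected" item, a small test data frame with one column per listed type could be a starting point. This is only a sketch: `make_test_data()` and its column names are hypothetical and not part of readsas, and the round trip at the end is left commented out because it assumes the experimental `write.sas()` from this branch.

```r
# Hypothetical helper (not part of readsas): builds a data frame with one
# column per cell type from the checklist and one missing value per column.
make_test_data <- function(n = 5L) {
  dat <- data.frame(
    int  = seq_len(n),                          # integer
    num  = seq_len(n) / 3,                      # numeric (double)
    date = as.Date("2023-01-08") + seq_len(n),  # Date
    time = as.POSIXct("2023-01-08 12:00:00", tz = "UTC") + 60 * seq_len(n),  # POSIXct
    chr  = letters[seq_len(n)],                 # character
    fct  = factor(rep_len(c("a", "b"), n)),     # factor (to be written as character)
    stringsAsFactors = FALSE
  )

  # Set the second value of every column to NA to cover missings.
  for (j in seq_along(dat)) {
    dat[[j]][2] <- NA
  }

  dat
}

dat <- make_test_data()
str(dat)

# Round trip; assumes the write.sas() from this PR is installed.
# tmp <- tempfile(fileext = ".sas7bdat")
# write.sas(dat, tmp)
# res <- read.sas(tmp)
# all.equal(dat, res)  # factors are expected to come back as character
```

Keeping the frame this small means it should fit on a single page, so a type problem is not confused with the multi-page limitation.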

ETA

Whenever I find time to implement things.
