Writesas #38

Open
wants to merge 61 commits into
base: main
61 commits
87bd90e
initial attempts at writing sas7bdat files
JanMarvin Jun 7, 2020
cc5e16b
write parts of the header in a way that I can read it afterwards
JanMarvin Jun 7, 2020
8c7b053
valid header, page 1 is broken
JanMarvin Jun 7, 2020
c339c71
more debug
JanMarvin Jun 7, 2020
ef7bb9c
write a file that kinda can be read
JanMarvin Jun 7, 2020
ac45d08
all cases & cleanups
JanMarvin Jun 7, 2020
2bc5530
debug information
JanMarvin Jun 7, 2020
34c33a2
re-order many things
JanMarvin Jun 8, 2020
9f6e10a
my first SAS file
JanMarvin Jun 8, 2020
7be5467
case 3 for every case
JanMarvin Jun 8, 2020
0370162
BLOCK_COUNT is relative to n and SUBHEADER_COUNT
JanMarvin Jun 8, 2020
76e1614
correct a few SH_LEN
JanMarvin Jun 8, 2020
b74432c
work on subheaders
JanMarvin Jun 8, 2020
0634fde
my first SAS readable sas file
JanMarvin Jun 15, 2020
3b0c902
write a longer file
JanMarvin Jun 15, 2020
ab565e1
sync reader & writer
JanMarvin Jun 20, 2020
4680c2a
write a first label
JanMarvin Jun 20, 2020
b02dff5
cleanups. writes sas7bdat identical to sas
JanMarvin Jun 20, 2020
340b9ea
fixes for k > 1
JanMarvin Jun 20, 2020
1d56be3
fix columname size for SAS
JanMarvin Jun 20, 2020
008e93c
fix writing the iris dataset
JanMarvin Jun 20, 2020
96318a7
initial attempt at write option bit32 for i686 files
JanMarvin Jun 21, 2020
c22fe55
playing around with 32bit
JanMarvin Jun 21, 2020
a771503
sync 64 and 32 bit read
JanMarvin Jun 21, 2020
e9c6099
add test: writing numeric and characters works
JanMarvin Jan 8, 2023
806423f
silence read.sas debug (not useful anymore?)
JanMarvin Jan 8, 2023
2933a47
new year
JanMarvin Jan 8, 2023
24c18f9
lintr
JanMarvin Jan 8, 2023
8129ab4
windows check
JanMarvin Jan 8, 2023
c234bca
write date and datetime
JanMarvin Jan 10, 2023
db4dd34
write date9 and datetime22
JanMarvin Jan 10, 2023
d18cc82
write NA
JanMarvin Feb 5, 2023
811c055
fix varname size
JanMarvin Feb 5, 2023
97b2d35
write the pivottabler::bhmtrains dataset
JanMarvin Feb 19, 2023
98b2a11
silence the entire output
JanMarvin Feb 19, 2023
5e9ac3f
a bit trial and error, but 32bit is still not working
JanMarvin Feb 20, 2023
1faaf96
attempt to fix writing character formats
JanMarvin Feb 21, 2023
51183cf
warn if position is wrong.
JanMarvin Feb 21, 2023
a6dadba
silence
JanMarvin Feb 21, 2023
8190ce6
fix format for characters
JanMarvin Feb 21, 2023
2053ebf
another attempt to fix the character format
JanMarvin Feb 21, 2023
ac81971
better positioning?
JanMarvin Feb 21, 2023
fe29598
add varlabels
JanMarvin Feb 21, 2023
bc78511
minor adjustments
JanMarvin Feb 22, 2023
2d0f0fe
toy around
JanMarvin Feb 22, 2023
ad0113c
make size optional
JanMarvin Dec 1, 2023
9641ae2
convert matrix to data frame
JanMarvin Dec 1, 2023
881ace2
fix typo
JanMarvin Dec 1, 2023
2ead104
cleanup and comment
JanMarvin Dec 1, 2023
e29d02a
lintr fixes
JanMarvin Dec 2, 2023
a11602f
roxygen update
JanMarvin Dec 2, 2023
b03532b
try to write a big endian file
JanMarvin Dec 2, 2023
41b771e
write timestamp and restore datetime test
JanMarvin Dec 2, 2023
5dbc8bc
update roxygen
JanMarvin Dec 2, 2023
32beda1
encoding should probably be the same as the SAS encoding
JanMarvin Dec 2, 2023
729ffb3
note on password
JanMarvin Dec 2, 2023
d451e58
pick a known pageseqnum32
JanMarvin Sep 17, 2024
fd0b1bc
fix lintr
JanMarvin Sep 17, 2024
28a0178
research case 8
JanMarvin Sep 17, 2024
a941e35
test with disabled compression
JanMarvin Sep 17, 2024
f692c29
Revert "research case 8"
JanMarvin Sep 17, 2024
1 change: 1 addition & 0 deletions NAMESPACE
@@ -4,6 +4,7 @@ export(convert_to_date)
export(convert_to_datetime)
export(convert_to_time)
export(read.sas)
export(write.sas)
import(Rcpp)
importFrom(stringi,stri_encode)
importFrom(utils,download.file)
15 changes: 15 additions & 0 deletions R/RcppExports.R
@@ -16,3 +16,18 @@ readsas <- function(filePath, debug, selectrows_, selectcols_, empty_to_na, temp
.Call(`_readsas_readsas`, filePath, debug, selectrows_, selectcols_, empty_to_na, tempstr)
}

#' Writes the binary SAS file
#'
#' @param filePath The full system path of the sas7bdat file to write.
#' @param dat an R object of class data.frame.
#' @param compress compression option for the file
#' @param debug print debug information
#' @param bit32 write a smaller 32 bit file
#' @param headersize,pagesize header and page size, multiples of 512 / 1024
#' @param dateval timestamp in seconds since 1960-01-01
#' @param encoding32 encoding identifier written to the file
#' @keywords internal
#' @noRd
writesas <- function(filePath, dat, compress, debug, bit32, headersize, pagesize, dateval, encoding32) {
  invisible(.Call(`_readsas_writesas`, filePath, dat, compress, debug, bit32, headersize, pagesize, dateval, encoding32))
}

12 changes: 12 additions & 0 deletions R/readsas.R
@@ -284,3 +284,15 @@ convert_to_datetime <- function(x) {
convert_to_time <- function(x) {
format(convert_to_datetime(x), format = "%H:%M:%S")
}


# days since the SAS epoch 1960-01-01
as_date <- function(x) {
  as.vector(
    julian(x, as.Date("1960-1-1", tz = "UTC"))
  )
}

# seconds since the SAS epoch 1960-01-01
as_datetime <- function(x) {
  # From SAS blog: Number of seconds between 01JAN1960 and 01JAN1970: 315619200
  as.numeric(x) + 315619200
}
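
The two helpers above shift R's 1970-based dates and date-times onto the SAS epoch of 1 January 1960. A minimal C++ sketch of the same arithmetic follows. The names `days_since_sas_epoch` and `seconds_since_sas_epoch` are hypothetical and not part of the package; the sketch only checks that the constant quoted in the comment is consistent with a gap of 3653 days.

```cpp
#include <cstdint>

// Days between 1960-01-01 (SAS epoch) and 1970-01-01 (Unix epoch):
// ten years, three of which (1960, 1964, 1968) are leap years.
constexpr std::int64_t kEpochGapDays = 10 * 365 + 3;
constexpr std::int64_t kEpochGapSeconds = kEpochGapDays * 86400;

// The constant used in as_datetime() matches the day-based gap.
static_assert(kEpochGapSeconds == 315619200, "epoch gap mismatch");

// Hypothetical helper: days since 1970-01-01 -> days since 1960-01-01.
constexpr std::int64_t days_since_sas_epoch(std::int64_t unix_days) {
  return unix_days + kEpochGapDays;
}

// Hypothetical helper: seconds since 1970-01-01 -> seconds since 1960-01-01.
constexpr double seconds_since_sas_epoch(double unix_seconds) {
  return unix_seconds + static_cast<double>(kEpochGapSeconds);
}
```

Under these assumptions the Unix epoch itself maps to day 3653 and second 315619200 on the SAS scale, which is what `as_date(as.Date("1970-01-01"))` and `as_datetime(0)` are expected to return.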
111 changes: 111 additions & 0 deletions R/writesas.R
@@ -0,0 +1,111 @@
#' write.sas
#'
#'@author Jan Marvin Garbuszus \email{jan.garbuszus@@ruhr-uni-bochum.de}
#'
#'@param dat data frame to save
#'@param filepath path to save file to
#'@param compress option to compress the file (currently not forwarded, the
#' file is always written uncompressed)
#'@param debug print debug information
#'@param bit32 write a 32 bit file
#'@param varlabels optional variable labels
#'@param size optional header and page size; a single value is used for both
#'@param encoding encoding: 62 = Windows, 20 = UTF-8
#'
#'@useDynLib readsas, .registration=TRUE
#'
#'@export
write.sas <- function(dat, filepath, compress = 0, debug = FALSE, bit32 = FALSE,
                      varlabels, size, encoding = 20) {

  filepath <- path.expand(filepath)

  if (missing(varlabels)) {
    varlabels <- rep("", ncol(dat))
  }

  if (missing(size)) {

    # default header and page size for 64 bit files
    hsize <- 65536
    psize <- 65536

    # smaller defaults for 32 bit files
    if (bit32) {
      hsize <- 1024
      psize <- 8192
    }

    size <- c(hsize, psize)
  } else if (length(size) != 2) {
    # a single value is used as both header and page size
    size <- c(size[1], size[1])
  }

  if (!inherits(dat, "data.frame")) {
    dat <- as.data.frame(dat)
  }

  # convert from factor
  ff <- sapply(dat, is.factor)
  dat[ff] <- lapply(dat[ff], as.character)

  # 1 = not character (numeric), 2 = character
  vartypes <- sapply(dat, is.character) + 1
  colwidth <- sapply(dat, function(x) max(nchar(x)))
  colwidth[vartypes == 1] <- 8 # non-character columns use a width of 8

  labels <- "testlab" # hard-coded label

  vartypen <- sapply(dat, is.numeric)

  formats <- NA
  formats[vartypen] <- "BEST"
  formats[!vartypen] <- ""

  width <- 0
  width[vartypen] <- 32 # fix for now
  width[!vartypen] <- 0 # colwidth[!vartypen]

  decim <- sapply(dat, is.integer)
  decim[!vartypen] <- TRUE

  is.Date <- function(x) inherits(x, "Date")
  is.POSIX <- function(x) inherits(x, "POSIXt")

  # dates are written as days since 1960-01-01 with format DATE9
  vartypen <- sapply(dat, is.Date)
  dat[vartypen] <- lapply(dat[vartypen], as_date)
  formats[vartypen] <- "DATE"
  width[vartypen] <- 9

  # date-times are written as seconds since 1960-01-01 with format DATETIME22.3
  vartypen <- sapply(dat, is.POSIX)
  dat[vartypen] <- lapply(dat[vartypen], as_datetime)
  formats[vartypen] <- "DATETIME"
  width[vartypen] <- 22
  decim[vartypen] <- 3

  if (debug) {
    message("vartypes")
    print(vartypes)
    message("colwidth")
    print(colwidth)
    message("formats")
    print(formats)
    message("width")
    print(width)
    message("decim")
    print(decim)
    message("labels")
    print(labels)
  }

  # for numerics
  # formats <- rep("BEST", ncol(dat))

  # column metadata is passed to the C++ writer as attributes of dat
  attr(dat, "vartypes") <- as.integer(vartypes)
  attr(dat, "colwidth") <- as.integer(colwidth)
  attr(dat, "formats") <- formats
  attr(dat, "width") <- width
  attr(dat, "decim") <- decim
  attr(dat, "labels") <- labels
  attr(dat, "varlabels") <- varlabels

  # `compress` is not forwarded yet: the file is always written with compress = 0
  writesas(filepath, dat, compress = 0, debug = debug, bit32 = bit32,
           headersize = size[1], pagesize = size[2],
           dateval = as_datetime(Sys.time()), encoding32 = encoding)
}
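
The handling of `size` in `write.sas()` is easy to misread, so here is a small C++ sketch of the same decision. It is only an illustration: `resolve_sizes` and `SizePair` are hypothetical names, and an empty vector stands in for a missing `size` argument. The default values are the ones from the R code above.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// {headersize, pagesize}
using SizePair = std::pair<std::int32_t, std::int32_t>;

// Hypothetical helper mirroring the size logic of write.sas().
// An empty vector plays the role of missing(size) in R.
inline SizePair resolve_sizes(const std::vector<std::int32_t>& size, bool bit32) {
  if (size.empty()) {
    // Defaults: 1024 / 8192 for 32 bit files, 65536 / 65536 otherwise.
    return bit32 ? SizePair{1024, 8192} : SizePair{65536, 65536};
  }
  if (size.size() != 2) {
    // Any length other than two: the first value is used for both sizes.
    return SizePair{size[0], size[0]};
  }
  return SizePair{size[0], size[1]};
}
```

Note that, as in the R code, a vector of three or more values silently falls back to its first element for both sizes.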
40 changes: 40 additions & 0 deletions man/write.sas.Rd

Some generated files are not rendered by default.

19 changes: 19 additions & 0 deletions src/RcppExports.cpp
@@ -26,9 +26,28 @@ BEGIN_RCPP
return rcpp_result_gen;
END_RCPP
}
// writesas
void writesas(const char * filePath, Rcpp::DataFrame dat, uint8_t compress, bool debug, bool bit32, int32_t headersize, int32_t pagesize, double dateval, int32_t encoding32);
RcppExport SEXP _readsas_writesas(SEXP filePathSEXP, SEXP datSEXP, SEXP compressSEXP, SEXP debugSEXP, SEXP bit32SEXP, SEXP headersizeSEXP, SEXP pagesizeSEXP, SEXP datevalSEXP, SEXP encoding32SEXP) {
BEGIN_RCPP
Rcpp::RNGScope rcpp_rngScope_gen;
Rcpp::traits::input_parameter< const char * >::type filePath(filePathSEXP);
Rcpp::traits::input_parameter< Rcpp::DataFrame >::type dat(datSEXP);
Rcpp::traits::input_parameter< uint8_t >::type compress(compressSEXP);
Rcpp::traits::input_parameter< bool >::type debug(debugSEXP);
Rcpp::traits::input_parameter< bool >::type bit32(bit32SEXP);
Rcpp::traits::input_parameter< int32_t >::type headersize(headersizeSEXP);
Rcpp::traits::input_parameter< int32_t >::type pagesize(pagesizeSEXP);
Rcpp::traits::input_parameter< double >::type dateval(datevalSEXP);
Rcpp::traits::input_parameter< int32_t >::type encoding32(encoding32SEXP);
writesas(filePath, dat, compress, debug, bit32, headersize, pagesize, dateval, encoding32);
return R_NilValue;
END_RCPP
}

static const R_CallMethodDef CallEntries[] = {
{"_readsas_readsas", (DL_FUNC) &_readsas_readsas, 6},
{"_readsas_writesas", (DL_FUNC) &_readsas_writesas, 9},
{NULL, NULL, 0}
};

21 changes: 20 additions & 1 deletion src/readsas.cpp
@@ -49,6 +49,7 @@ Rcpp::List readsas(const char * filePath,
const bool empty_to_na,
std::string tempstr)
{

std::ifstream sas(filePath, std::ios::in | std::ios::binary | std::ios::ate);
auto sas_size = sas.tellg();
if (sas) {
@@ -503,11 +504,24 @@ Rcpp::List readsas(const char * filePath,

pageseqnum[pg] = pageseqnum32;

if (debug) {
Rcout << "pageseqnum: " << pageseqnum32 << std::endl;
Rcout << unk1 << " " << unk2 << " " << PAGE_DELETED_POINTER_LENGTH << std::endl;
}

PAGE_TYPE = readbin(PAGE_TYPE, sas, swapit);
BLOCK_COUNT = readbin(BLOCK_COUNT, sas, swapit);
SUBHEADER_COUNT = readbin(SUBHEADER_COUNT, sas, swapit);
unk16 = readbin(unk16, sas, swapit);

if(debug) {
Rcout << "PAGE_TYPE: " << PAGE_TYPE <<
" - BLOCK_COUNT: " << BLOCK_COUNT <<
" - SUBHEADER_COUNT: " << SUBHEADER_COUNT << std::endl;
}

// Rcout << sas.tellg() << std::endl;

page_type.push_back(PAGE_TYPE);

rowsperpage[pg] = BLOCK_COUNT - SUBHEADER_COUNT;
@@ -709,8 +723,11 @@ Rcpp::List readsas(const char * filePath,
unk64 = readbin(unk64, sas, swapit);
if (debug) Rcout << unk64 << std::endl;
pgsize = readbin(pgsize, sas, swapit);
if (debug) Rcout << "pgsize " << pgsize << std::endl;
unk64 = readbin(unk64, sas, swapit);
rcmix = readbin(rcmix, sas, swapit);
if (debug)
Rcout << "rcmix " << rcmix << std::endl;

/* next two indicate the end of the initial header ? */
unk64 = readbin(unk64, sas, swapit);
@@ -756,6 +773,7 @@ Rcpp::List readsas(const char * filePath,
unk16 = readbin(unk16, sas, swapit); // padding

pgc = readbin(pgc, sas, swapit);
if (debug) Rcout << "pgc: " << pgc << std::endl;

unk16 = readbin(unk16, sas, swapit); // val ?
unk16 = readbin(unk16, sas, swapit); // padding
@@ -926,7 +944,8 @@ Rcpp::List readsas(const char * filePath,
unk32 = readbin(unk32, sas, swapit);
if (debug) Rcout << unk32 << std::endl;
rcmix = readbin((int32_t)rcmix, sas, swapit);
- if (debug) Rcout << "rcmix " << rcmix << std::endl;
+ if (debug)
+ Rcout << "rcmix " << rcmix << std::endl;
uunk32 = readbin(uunk32, sas, swapit);
if (debug) Rcout << uunk32 << std::endl;
uunk32 = readbin(uunk32, sas, swapit);
37 changes: 34 additions & 3 deletions src/sas.h
@@ -8,6 +8,36 @@

#include "swap_endian.h"

// #define NA_DOUBLE 00 00 00 00 00 FE FF FF

template <typename T>
inline void writebin(T t, std::fstream& sas, bool swapit)
{
  if (swapit) {
    T t_s = swap_endian(t);
    sas.write((char*)&t_s, sizeof(t_s));
  } else {
    sas.write((char*)&t, sizeof(t));
  }
}


inline void write_case_i(uint32_t f1, uint32_t f2, bool ENDIANNES, bool bit32, bool swapit, std::fstream& sas) {

  // with ENDIANNES == 0 aka big endian
  // f2 is written first. So this requires an additional swap
  if (ENDIANNES == 0) {
    writebin(f2, sas, swapit);
    if (bit32 == 0)
      writebin(f1, sas, swapit);
  } else if (ENDIANNES == 1) {
    writebin(f1, sas, swapit);
    if (bit32 == 0)
      writebin(f2, sas, swapit);
  }

}

inline void writestr(std::string val_s, int32_t len, std::fstream& sas)
{

@@ -62,6 +92,7 @@ T readbin( T t , std::istream& sas, bool swapit)
return(swap_endian(t));
}

inline
double readbinlen(double d, std::istream& sas, bool swapit, int len)
{

@@ -489,7 +520,7 @@ inline std::string SASEncoding(uint8_t encval) {
return enc;
}

- std::vector<int64_t> vec_order(const std::vector<int64_t> &v) {
+ inline std::vector<int64_t> vec_order(const std::vector<int64_t> &v) {
std::vector<int64_t> idx(v.size());
iota(idx.begin(), idx.end(), 0);
stable_sort(idx.begin(), idx.end(),
@@ -499,7 +530,7 @@ std::vector<int64_t> vec_order(const std::vector<int64_t> &v) {
}

// order only the valid options
- std::vector<int64_t> order_(std::vector<int64_t> v) {
+ inline std::vector<int64_t> order_(std::vector<int64_t> v) {
// if (std::count(v.begin(), v.end(), -1)) {
// std::vector<int64_t> idx(v.size());
// iota(idx.begin(), idx.end(), -1);
@@ -526,7 +557,7 @@ std::vector<int64_t> order_(std::vector<int64_t> v) {
// }
}

- bool any_keepr(Rcpp::IntegerVector rvec, uint64_t idx) {
+ inline bool any_keepr(Rcpp::IntegerVector rvec, uint64_t idx) {
return std::find(rvec.begin(), rvec.end(), idx) != rvec.end();
}
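
The `writebin()` and `write_case_i()` helpers added to src/sas.h above can be tried outside the package with a standalone sketch. The version below is an approximation under stated assumptions, not the package code: `reverse_bytes` stands in for the package's `swap_endian()`, `write_value` and `write_pair` are hypothetical names, the stream is a generic `std::ostream` rather than `std::fstream`, and the `little_endian` flag corresponds to `ENDIANNES == 1`.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>

// Stand-in for swap_endian(): reverse the byte order of a trivially copyable value.
template <typename T>
T reverse_bytes(T value) {
  unsigned char bytes[sizeof(T)];
  std::memcpy(bytes, &value, sizeof(T));
  std::reverse(bytes, bytes + sizeof(T));
  std::memcpy(&value, bytes, sizeof(T));
  return value;
}

// Same idea as writebin(): write the raw bytes, swapped first if requested.
template <typename T>
void write_value(T value, std::ostream& out, bool swapit) {
  if (swapit) value = reverse_bytes(value);
  out.write(reinterpret_cast<const char*>(&value), sizeof(value));
}

// Same ordering rule as write_case_i(): a big endian file gets f2 first,
// and a 32 bit file gets only the first of the two fields.
void write_pair(std::uint32_t f1, std::uint32_t f2, bool little_endian,
                bool bit32, bool swapit, std::ostream& out) {
  const std::uint32_t first = little_endian ? f1 : f2;
  const std::uint32_t second = little_endian ? f2 : f1;
  write_value(first, out, swapit);
  if (!bit32) write_value(second, out, swapit);
}
```

Writing into a `std::ostringstream` makes the effect visible without touching the file system: a 64 bit pair produces eight bytes, a 32 bit pair four, and the swapped output of a value is the byte-reversed unswapped output.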
