- Fix typos and revise outdated paragraphs in vignettes.
The recoding and transformation functions get scoped variants, which select variables based on logical conditions described in a function:
- rec_if() as scoped variant of rec().
- dicho_if() as scoped variant of dicho().
- center_if() as scoped variant of center().
- std_if() as scoped variant of std().
- split_var_if() as scoped variant of split_var().
- group_var_if() and group_label_if() as scoped variants of group_var() and group_label().
- recode_to_if() as scoped variant of recode_to().
- set_na_if() as scoped variant of set_na().
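The idea behind the scoped variants can be sketched in base R: a predicate function is evaluated on every column, and the transformation is applied only where the predicate returns TRUE. The helper name transform_if() and the example data are hypothetical. They only illustrate the selection logic and are not part of sjmisc.

```r
# Hypothetical helper (not part of sjmisc): apply `fn` to every column
# of `data` for which `predicate` returns TRUE, leave the rest untouched.
transform_if <- function(data, predicate, fn) {
  selected <- vapply(data, predicate, logical(1))
  data[selected] <- lapply(data[selected], fn)
  data
}

# Made-up example data
d <- data.frame(
  age = c(20, 30, 40),
  score = c(1, 2, 3),
  group = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Center only the numeric columns; the character column `group` is kept as is
centered <- transform_if(d, is.numeric, function(x) x - mean(x))
```

In the package itself, the scoped variants listed above take the predicate in a similar position, e.g. a function such as is.numeric.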
- New function remove_cols() as alias for remove_var().
- std() gets a new robust-option, robust = "2sd", which divides the centered variables by two standard deviations.
- Slightly improved performance for set_na().
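As a rough illustration of what dividing by two standard deviations means for the robust = "2sd" option, the computation can be written out in base R. The example values are made up, and the snippet is a sketch of the arithmetic only, not the implementation used in std().

```r
# Made-up example values
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Center the variable, then divide by two standard deviations
z_2sd <- (x - mean(x)) / (2 * sd(x))

# Mean and standard deviation of the rescaled variable
z_mean <- mean(z_2sd)
z_sd <- sd(z_2sd)
```

A variable scaled this way has a mean of 0 and a standard deviation of 0.5, instead of the standard deviation of 1 obtained from ordinary standardization.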
- frq() now removes empty columns before computing frequencies, because applying frq() on empty vectors caused an error.
- empty_cols() and empty_rows() (and hence, remove_empty_cols() and remove_empty_rows()) caused an error for data frames with only one column or row, respectively, or if x was a vector rather than a data frame.
- frq() now removes missing values from input when weights are applied, to ensure that input and weights have the same length.
- Breaking changes: The append-argument in recode and transformation functions like rec(), dicho(), split_var(), group_var(), center(), std(), recode_to(), row_sums(), row_count(), col_count() and row_means() now defaults to TRUE.
- The print()-method for descr() now accepts a digits-argument, to specify the rounding of the output.
- Cross references from dplyr::select_helpers were updated to tidyselect::select_helpers.
- is_whole() as counterpart to is_float().
- frq() now prints variable names for non-labelled data, adds variable names in braces for labelled data and omits the label column for non-labelled data.
- frq() now prints mean and standard deviation in the header line of the output.
- frq() now gets an auto.grp-argument to automatically group variables with many unique values.
- frq() now gets a show.strings-argument to omit string variables (character vectors) from being printed as frequency table.
- frq() now gets a grp.strings-argument to group similar string values in the frequency table.
- frq() gets an out-argument, to print output to console, or as HTML table in the viewer or web browser.
- descr() gets an out-argument, to print output to console, or as HTML table in the viewer or web browser.
- is_empty() returned TRUE for single vectors with NA being the first element.
- Fix issue where, due to a bug during code cleanup, remove_empty_rows() no longer removed empty rows, but columns.
- Revised examples that used removed methods from other packages.
- Use select-helpers from package tidyselect, instead of dplyr.
- Beautiful colored output for frq(), descr() and flat_table().
- rec() now also recodes doubles with floating points, if a range of values is specified.
- std() and center() now use include.fac = FALSE as default option.
- std() gets a robust-argument, to divide variables either by standard deviation, or (in case of asymmetrically distributed variables) by median absolute deviation or Gini's mean difference.
- frq() now shows total and valid N in output.
- center(), std(), dicho(), split_var() and group_var() did not work correctly for grouped data frames.
- frq() did not print multiple variables when applied on grouped data frames.
- Arguments as.df and as.varlab in function find_var() are now deprecated. Please use out instead.
- rotate_df() preserves attributes.
- is_float() is now exported as function.
- Fixed bug for to_label(), when x was a character vector and argument drop.levels was TRUE.
- Fixed issue with latest tidyr-update on CRAN.
- frq() did not correctly calculate valid and cumulative percentages when using weights.
- All labelled-data functions were removed and are now in package sjlabelled.
- remove_var() as pipe-friendly function to remove variables from data frames.
- var_type() as pipe-friendly function to determine the type of variables.
- all_na() to check whether a vector only consists of NA values.
- rotate_df() to rotate data frames (switch columns and rows).
- shorten_string(), to shorten strings to a certain maximum number of chars.
- Following functions now also work on grouped data frames: dicho(), split_var(), group_var(), std() and center().
- Argument groupcount in split_var(), group_var() and group_labels() is now named n.
- Argument groupsize in group_var() and group_labels() is now named size.
- frq() gets a revised print-method, which does not print the result to console when captured in an object (i.e., x <- frq(x) no longer prints the result).
- frq() no longer prints (redundant) labels for factors without value label attributes.
- frq() adds information about the variable type in the table caption (only for variables with variable labels).
- frq() adds information about groups when printing grouped, non-labelled variables.
- descr() now also prints information about the variable type.
- to_character() now preserves variable labels.
- sjmisc now uses dplyr's tidyeval-approach to evaluate arguments. This means that the select-helper-functions (like one_of() or contains()) no longer need to be prefixed with a ~ when used as argument within sjmisc-functions.
- All labelled-data functions are now deprecated and will become defunct in future package versions. The labelled-data functions have been moved into a separate package, sjlabelled.
- row_count() to count specific values in a data frame per observation.
- col_count() to count specific values in a data frame per variable.
- str_start() and str_end() to find starting and end indices of patterns inside strings.
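Counting a specific value per observation and per variable, as described for row_count() and col_count() above, corresponds roughly to the following base R computation. The data frame is made up, and the snippet does not reproduce the return format of the sjmisc functions.

```r
# Made-up example data
d <- data.frame(
  a = c(1, 2, 2),
  b = c(2, 2, NA),
  c = c(3, 2, 1)
)

# How often does the value 2 occur in each observation (row)?
per_row <- rowSums(d == 2, na.rm = TRUE)

# How often does the value 2 occur in each variable (column)?
per_col <- colSums(d == 2, na.rm = TRUE)
```

Here, the value 2 occurs once in the first and third row and three times in the second row. Missing values are ignored through na.rm = TRUE.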
- The output for frq() now always includes an NA-row, but no longer prints a value for the NA-row.
- merge_imputations() gets a summary-argument to plot a graphical summary of the quality of the merging process.
- add_columns() and replace_columns() crashed R when no data frame was specified in the ...-ellipses argument.
- descr() and frq() used wrong variable labels when processing grouped data frames in specific situations where the grouping variable had no sequential values.
- descr() did not work for large data frames, because psych::describe() internally switched to fast mode by default in that case (removing columns from the output).
- Argument value in set_na() is deprecated. Please use na instead.
- Argument recodes in rec() is deprecated. Please use rec instead.
- Argument lab in set_label() is deprecated. Please use label instead.
- Argument value in add_labels() and replace_labels() is deprecated. Please use labels instead.
- Argument value in ref_lvl() is deprecated. Please use lvl instead.
- row_sums() as wrapper of rowSums() to compute row sums, but works within pipe-workflow and with select-helpers for variables, and always returns a tibble.
- row_means() as wrapper of sjstats::mean_n() to compute row means, but works within pipe-workflow and with select-helpers for variables, and always returns a tibble.
- %nin% as complement to %in%.
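The behaviour of %nin% as the complement of %in% can be shown with a small stand-in definition. The operator is defined locally here so that the example runs without the package. Treat the definition as an assumed equivalent, not as the package source.

```r
# Local stand-in (assumed equivalent of the exported operator):
# TRUE for every element of `x` that does not occur in `y`.
`%nin%` <- function(x, y) !(x %in% y)

x <- c("a", "b", "c")
y <- c("b", "d")

not_in_y <- x %nin% y
in_y <- x %in% y
```

For every element, exactly one of the two results is TRUE: "a" and "c" are not in y, while "b" is.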
- Functions rec(), dicho(), center(), std(), recode_to() and group_var() get an append-argument, to optionally return the original data including the transformed variables as new columns.
- var_labels() and var_rename() now give a warning if specified variables to rename or relabel do not exist in the data frame. Non-matching variables are ignored.
- If model.term does not exist in models, spread_coef() now prints the name of non-existing coefficients.
- find_var() gets a fuzzy-argument to enable fuzzy-matching for search pattern.
- remove_empty_cols() returned an empty data frame, when input data frame had no empty columns.
- remove_empty_rows() returned an empty data frame, when input data frame had no empty rows.
- add_columns() and replace_columns() in some cases coerced data frames of class data.frame with only one column into a vector, which gave an error when binding columns.
- Argument part.dist.match in str_pos() caused an error when being larger than 0.
- Re-exports magrittr::%>% (Bob Rudis said more packages should do so).
- replace_columns() to replace variables in one data frame with variables from other data frames.
- descr() gets a max.length-argument to shorten variable labels in the output to a specific number of chars.
- descr() now also reports the percentage of missing values.
- set_na() no longer gives a warning when trying to replace values with NA for vectors that completely contained NAs.
- descr() now also works on single vectors as data argument.
- Fixed bugs with write_*()-functions.
- Added package-vignettes.
- Functions were largely revised to work seamlessly within the tidyverse. This may break existing code, but in the long run, consistent API-design makes working with the package more intuitive. See vignette("design_philosophy", "sjmisc") for more details.
- as_labelled() only converts vectors into labelled-class if vector has label attributes. This ensures that data can be properly saved into other formats, e.g. with write_spss().
- The write_*()-functions have been revised and should now save data frames with any common classes of vectors (labelled, factor, numeric, atomic...).
- center() and std() are moving from package sjstats to sjmisc.
- add_columns() to bind columns of first data frame at the end of all data frames.
- frq() now has the same argument-structure as flat_table().
- Following functions now follow a consistent tidyverse-approach, with the data being the first argument, followed by variable names: add_labels(), replace_labels(), remove_labels(), count_na(), rec(), dicho(), split_var(), drop_labels(), fill_labels(), group_var(), group_labels(), ref_lvl(), recode_to(), replace_na(), set_na() and set_labels().
- get_values() now sorts returned values by default, to be consistent with get_labels().
- spread_coef() gets arguments se and p.val, to define whether standard errors and p-values should be included in the return value or not.
- merge_df() did not copy label attributes for data frames with identical variables (that were row-bound).
- to_value() did not work for character vectors of class labelled, with empty values (which typically have no value label).
- The sort.frq-argument did not work in frq().
- zap_inf() to "clean" vectors from NaN and infinite values.
- descr() to provide basic descriptive statistics (similar to describe() in the psych-package), but including variable labels and usable in pipe-workflows. Also works with grouped data frames.
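What "cleaning" a vector from NaN and infinite values can look like is sketched below in base R. The snippet assumes that such values are replaced by NA. That replacement rule is an assumption made for this illustration, and the example vector is made up.

```r
# Made-up example vector with infinite, NaN and regular missing values
x <- c(1, Inf, NaN, -Inf, NA, 5)

# Assumption for this sketch: NaN and infinite values become NA,
# finite values and existing NAs are left unchanged.
cleaned <- x
cleaned[is.nan(cleaned) | is.infinite(cleaned)] <- NA
```

After this step, only the finite values 1 and 5 remain, and all other elements are NA.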
- rec(), split_var() and dicho() get an argument suffix, to append a suffix to variable (column) names, if applied on a data frame.
- Value labels in rec() can now directly be assigned inside the recodes-syntax (see 'Details' in ?rec).
- find_var() gets an as.df-argument, to return a data frame with matching variables, instead of their column indices only.
- find_var() gets an as.varlab-argument, to return a "summary" data frame with column number, variable name and variable label.
- flat_table() now also accepts grouped data frames.
- flat_table() gets a show.values-argument, to add values to associated labels in output.
- frq() now also accepts grouped data frames.
- frq() gets a weight.by-argument to weight frequencies.
- set_na() can now also find values by their value labels and replace them with NA.
- set_na() now removes unused value labels from values that have been replaced with NA.
- The as.tag-argument in set_na() now defaults to FALSE.
- get_labels() now always returns labels in sorted order of the associated values.
- get_labels() gets a drop.unused-argument, to automatically drop labels from values that don't occur in the vector.
- For a named vector as labels-argument, set_labels() now always sorts labels in sorted order of the associated values.
- is_empty() gets a first.only-argument, to evaluate either first or all elements of a character vector.
- set_na() did not work on vectors of class Date when argument as.tag = TRUE.
- flat_table() did not show values that had no value labels. Now all categories are shown in the frequency table.
- rec() did not properly copy labels of tagged NA values when not all recoded values appeared in the vector.
- frq() did not show correct values, when value labels of a vector were not sorted according to their values.
- set_labels() did not set labels properly for ordered factors.
- remove_labels() returned NA-values for value labels (instead of no value labels) when the last value label of a vector was removed.
- find_var() to find variables in data frames by name or label.
- var_labels() as "tidyversed" alternative to set_label() to set variable labels.
- var_rename() to rename variables.
- Following functions now get an ellipses-argument ..., to apply function only to selected variables, but return the complete data frame (thus, overwriting existing variables in a data frame, if requested): to_factor(), to_value(), to_label(), to_character(), to_dummy(), zap_labels(), zap_unlabelled(), zap_na_tags().
- Fixed bug with copying attributes from tibbles for merge_df().
- Fixed wrong argument-description in docs of frq().
- Removed package coin from Imports.
- count_na() to print a frequency table of tagged NA values.
- set_na() gets a drop.levels argument to keep or drop factor levels of values that have been replaced with NA.
- set_na() gets an as.tag argument to set NA values as regular or tagged NA.
- sjmisc now supports tagged NA values, a new structure for labelled missing values introduced by the haven-package. This means that functions or arguments that are no longer useful have been removed, while other functions dealing with NA values have been largely revised.
- All statistical functions have been removed and are now in a separate package, sjstats.
- Removed some S3-methods for labelled-class, as these are now provided by the haven-package.
- Functions no longer check input for type matrix, to avoid conflicts with scaled vectors (that were recognized as matrix and hence treated as data frame).
- table(*, exclude = NULL) was changed to table(*, useNA = "always"), because of planned changes in upcoming R version 3.4.
- More functions (like trim() or frq()) now also have data frame- or list-methods.
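The effect of the useNA = "always" form mentioned above can be checked directly in base R. The example vectors are made up.

```r
# Made-up example vector with one missing value
x <- c(1, 2, 2, NA)

# Missing values get their own category in the table
counts <- table(x, useNA = "always")

# With "always", the NA category is reported even if nothing is missing
counts_no_na <- table(c(1, 2, 2), useNA = "always")
```

In both tables the last category is NA. It has a count of 1 for x and a count of 0 for the vector without missing values.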
- zap_na_tags() to turn tagged NA values into regular NA values.
- spread_coef() to spread coefficients of multiple fitted models in nested data frames into columns.
- merge_imputations() to find the most likely imputed value for a missing value.
- flat_table() to print flat (proportional) tables of labelled variables.
- Added to_character() method.
- big_mark() to format large numbers with big marks.
- empty_cols() and empty_rows() to find variables or observations with exclusively NA values in a data frame.
- remove_empty_cols() and remove_empty_rows() to remove variables or observations with exclusively NA values from a data frame.
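Finding and removing variables or observations that consist exclusively of NA values, as described above, can be sketched in base R as follows. The data frame is made up, and the snippet illustrates the logic only. It is not the code behind empty_cols(), empty_rows() or the remove_*() functions.

```r
# Made-up example data: column `b` and the second row contain only NA
d <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, NA),
  c = c("x", NA, "z"),
  stringsAsFactors = FALSE
)

# Flag columns and rows that contain nothing but NA
is_empty_col <- vapply(d, function(col) all(is.na(col)), logical(1))
is_empty_row <- rowSums(!is.na(d)) == 0

# Logical indexing keeps all rows and columns if none of them is empty
d_clean <- d[!is_empty_row, !is_empty_col, drop = FALSE]
```

Using logical vectors rather than negative indices avoids returning an empty data frame when there is nothing to remove.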
- str_contains() gets a switch argument to switch the role of x and pattern.
- word_wrap() coerces vectors to character if necessary.
- to_label() gets a var.label and drop.levels argument, and now preserves variable labels by default.
- Argument def.value in get_label() now also applies to data frame arguments.
- If factor levels are numeric and factor has value labels, these are used in to_value() by default.
- to_factor() no longer generates NA or NaN-levels when converting input into factors.
- rec() did not recode values, when these were the first element of a multi-line string of the recodes argument.
- is_empty() returned NA instead of TRUE for empty character vectors.
- Fixed bug with erroneous assignment of value labels to subset data when using copy_labels() (#20).