convert function #6
Hello Pyry,
I get no error converting daF3840_fin.sav to DDI Codebook; the .xml file is produced just fine. To replicate that error, I would need the output from the following two commands:
I also get no error converting the (Finnish version of the) metadata plus the .csv file; the result is a dataset containing 1132 rows and 78 variables:
fsd <- convert("/Users/dusadrian/Downloads/FSD3840/Study/Data/daF3840_fin.xml")
head(fsd$q23)
<declared<numeric>[6]> [q23] Kuinka tyytyväinen olet demokratian toimivuuteen Suomessa?
[1] 3 3 3 3 3 4
Labels:
value label
0 En osaa sanoa
1 En lainkaan tyytyväinen
2 En kovinkaan tyytyväinen
3 Melko tyytyväinen
4 Erittäin tyytyväinen
9 En halua sanoa
In your R command, specifying the Specific to the function
Additionally, as the notes element is repeatable, there might even be multiple notes inside the codeBook/fileDscr element, but only that specific one is read and interpreted as a dataset. The other one(s) are read as text, and dedicated functions can be written to parse that text and interpret it in various ways.
There are examples and tests converting to DDI Codebook and back to R, more specifically lines 56-57. It is just difficult to insert such examples in the appropriate section of the documentation without making the CRAN checks very heavy. |
Thank you for your response and confirming that I was not completely on the wrong track when I attempted to use the function. However, I couldn't replicate your code, I continued to get the same error:
Here's my
(DDIwR_0.18.4 refers to the latest version available here on Github in master branch) |
That error is most likely triggered by package admisc, whose GitHub version is a few minor versions ahead of CRAN's 0.35. So just to make sure, could you please re-install everything from GitHub? I use:
remotes::install_github("dusadrian/admisc")
remotes::install_github("dusadrian/declared")
remotes::install_github("dusadrian/DDIwR")
If the error still persists, perhaps a |
Thank you, that was it!
It is actually now that I got to play with the function hands-on that I realised that without
However, while I could run the line
fsd <- convert("/Users/<username>/Downloads/FSD3840/Study/Data/daF3840_fin.xml")
with no problem when I used
convert("/Users/<username>/Downloads/FSD3840/Study/Data/daF3840_fin.sav", to = "DDI", embed = TRUE)
I got the same error message as before when I had set
convert("/Users/<username>/Downloads/FSD3840/Study/Data/daF3840_fin.sav", to = "DDI", embed = FALSE)
> fsd <- convert("/Users/<username>/Downloads/FSD3840/Study/Data/daF3840_fin.xml")
Error in which(sapply(elements, function(x) { :
argument to 'which' is not logical
3.
which(sapply(elements, function(x) {
xml_attr(x, "ID") == "rawdata" && grepl("serialized", xml_attr(x,
"subject"))
})) at internals.R#570
2.
extractData(xml) at convert.R#286
1.
convert("/Users/<username>/Downloads/FSD3840/Study/Data/daF3840_fin.xml")
The .xml file and the .csv file were in the same folder and had the same name as the convert function had generated them both. |
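The traceback above is consistent with a base R behaviour that can be reproduced without DDIwR: applied to an empty input, `sapply()` returns an empty list rather than a logical vector, and `which()` rejects it. The sketch below only illustrates that mechanism, with an empty list standing in for the empty node set (an assumption). The `vapply()` variant is one possible guard, not necessarily the fix that went into the package.

```r
# An empty list stands in for an empty xml_nodeset (assumption for illustration)
elements <- list()

# sapply() has nothing to simplify, so it returns list() instead of logical(0)
flags <- sapply(elements, function(x) TRUE)
is.list(flags)

# which() only accepts logical input, hence the error seen in the traceback
msg <- tryCatch(which(flags), error = function(e) conditionMessage(e))
msg

# vapply() declares the return type, so an empty input yields logical(0)
safe_flags <- vapply(elements, function(x) TRUE, logical(1))
which(safe_flags)
```

With the typed result, `which()` simply finds no matching element, and the calling code can then decide what to do when no embedded data is present.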
Good catch, I think now it works alright. |
Thank you, the latest commits made everything work! This is great. Returning to the first message in this issue, in the end what I actually wanted to do was to use the I had two hurdles:
|
I see. I guess a more descriptive error message could be introduced (see the latest commit), where the user is now more informed as to why the .csv file does not match the DDI .xml codebook. Any suggestion is welcome, I agree it doesn't sound ideal but I don't know any other way to avoid comparing apples and oranges. |
I agree it's a good idea to keep the identical function as tight as possible. More informative error messages are always very welcome in my opinion.
Lines 352 to 358 in 6f2ee52
would be taken out of the

    if (ncol(data) == length(variables)) {
        if (header) {
            if (!identical(names(data), names(variables))) {
                if (identical(tolower(names(data)), tolower(names(variables)))) {
                    names(data) <- tolower(names(variables))
                    message("The variable names have been translated to lower case to ensure equality between data and DDI codebook")
                } else {
                    admisc::stopError("The .csv file does not match the DDI Codebook")
                }
            }
        } else {
            names(data) <- names(variables)
        }
    }

The risk of this kind of comparison accidentally ruining the dataset labels is very slim, non-existent even. I guess the line
    names(data) <- tolower(names(variables))
could just be
    names(data) <- names(variables)
meaning that the variable names in the DDI codebook take precedence over the csv names. |
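The proposal above can be tried out in isolation with a small base R sketch. `align_names()` is a hypothetical helper written for this illustration, not a DDIwR function; it returns the names the data should carry, letting the codebook names win when the two sets differ only in case.

```r
# Hypothetical helper, not part of DDIwR: decide which names the data should get
align_names <- function(data_names, codebook_names) {
  if (length(data_names) != length(codebook_names)) {
    stop("The .csv file does not match the DDI Codebook (different number of variables)")
  }
  # Exact match: nothing to do
  if (identical(data_names, codebook_names)) {
    return(data_names)
  }
  # Same names up to case: the codebook names take precedence
  if (identical(tolower(data_names), tolower(codebook_names))) {
    message("Variable names differ only in case; using the names from the DDI Codebook")
    return(codebook_names)
  }
  stop("The .csv file does not match the DDI Codebook (variable names differ)")
}

aligned <- align_names(c("Q1", "Q2"), c("q1", "q2"))
aligned
```

Any pair of name vectors that differs in more than case still stops with an error, so the comparison stays strict.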
The latest commit now accepts an R data.frame as input for the "csv" argument; I think this is a decent addition:
csv <- read.csv("/Users/dusadrian/Downloads/FSD3840/Study/Data/daF3840_fin.csv")
# (use your own delimiters, etc.)
fsd <- convert("/Users/dusadrian/Downloads/FSD3840/Study/Data/daF3840_fin.xml", csv = csv)
As for your other suggestion, to lower the case for all variables, I think it is not a good idea. Again, the point is not only to compare SPSS upper case variables with SPSS lower case variables, but also to compare R (mixed) lower and upper case variable names. Something like:
foo <- data.frame(aa = 1:2, aA = 1:2, Aa = 1:2)
#   aa aA Aa
# 1  1  1  1
# 2  2  2  2
To R, these are three different variables, unlike in SPSS and Stata. Lowering the case for all three variables would create a mess, and there is no guarantee such a situation is impossible in real life. DDI is not only about SPSS, and R is case sensitive... |
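The collision described here can be checked directly in base R. The sketch below reuses the `foo` example; `safe_to_ignore_case()` is a hypothetical helper introduced only for this illustration.

```r
foo <- data.frame(aa = 1:2, aA = 1:2, Aa = 1:2)

# R keeps the three names apart because it is case sensitive
names(foo)

# After lowering the case, all three names collapse into the same string
lowered <- tolower(names(foo))
has_collision <- anyDuplicated(lowered) > 0
has_collision

# Hypothetical check: case-insensitive matching is only safe without collisions
safe_to_ignore_case <- function(nms) anyDuplicated(tolower(nms)) == 0L
safe_to_ignore_case(names(foo))
safe_to_ignore_case(c("q1", "Q2"))
```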
Yes, it's true that R makes the distinction between aa and aA and Aa. I think it's not a good convention to have data.frames with such names (and actually, wouldn't a csv file / data.frame with such names be impossible to label with a DDI file?) but since it's possible in R your point is very much valid. I think reading the csv file separately and then making sure that the names match is the most elegant solution in this case. Again, thank you for your work and for accommodating my requests. Actually, I think the work done within the scope of this issue might help solve issues #3 and #4 in the future as well. I might have some feature requests related to combining csv and DDI files, but I'll make a separate issue / PR if the need arises. |
You're more than welcome, please keep the suggestions coming. |
Hi,
in the internal extractData function
DDIwR/R/internals.R
Line 558 in 6a24c63
more specifically in lines 561-577
DDIwR/R/internals.R
Lines 561 to 577 in 6a24c63
the function attempts to find data from the xpath "/d1:codeBook/d1:fileDscr/d1:notes". In my file, the variable elements produces {xml_nodeset (0)} (list of 0) and consequently the variable attrs is an empty list, list() (list of 0). When the function proceeds to lines 570-573
DDIwR/R/internals.R
Lines 570 to 573 in 6a24c63
It is a problem since my file only has notes elements under docDscr and not fileDscr, containing data like this:
so I'm wondering whether my file specifically is not up to spec, or whether this function generally has hard-coding in it that prevents efficient handling of files. When debugging something like this, it would be helpful if the package had some example files that could be used for testing and validation.
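The empty node set can be reproduced on a tiny document with the xml2 package. The XML string below is a made-up, heavily reduced stand-in for a DDI Codebook 2.5 file (not taken from FSD3840), with a notes element under docDscr only; the `d1` prefix is the one `xml_ns()` assigns to the default namespace.

```r
library(xml2)

# Made-up minimal codebook: notes exists under docDscr, none under fileDscr
doc <- read_xml(
  '<codeBook xmlns="ddi:codebook:2_5">
     <docDscr><notes>general note</notes></docDscr>
     <fileDscr><fileTxt/></fileDscr>
   </codeBook>'
)
ns <- xml_ns(doc)

# The xpath from the issue finds nothing in such a file
in_file <- xml_find_all(doc, "/d1:codeBook/d1:fileDscr/d1:notes", ns)
length(in_file)

# The same element name under docDscr is found
in_doc <- xml_find_all(doc, "/d1:codeBook/d1:docDscr/d1:notes", ns)
length(in_doc)
```

Checking `length()` of the node set before working with its attributes is one way to tell a file without embedded data apart from a malformed one.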
For context, what I'm attempting to do here is to see whether DDIwR could be used to provide labels for a dataset in .csv format. I have an openly available dataset that has data both in .csv and SPSS's .sav format, alongside the DDI 2.5 XML file. I would like to try to replicate the .sav file using the DDIwR package to see if the DDI format would be useful for providing metadata with other types of datasets where such a neat arrangement is not available. Mainly for datasets that have data only in .csv format (or a .sav file or similar with no metadata) and whose metadata resides in a separate data resource catalogue.
My function code looked like the following:
It could be that, since the function examples covered only cases where an "SPSS file called test.sav" was converted to other formats, my use of this function was not as intended? As stated earlier, since there is no test data provided with the package, I couldn't really run the examples. However, I couldn't run the case that was more in line with the examples either:
I got
Error: 'x' should contain at least one (possibly) numeric value.
as the error message. The traceback looked like this:
Am I using the function wrong? Or is there a problem in the code?
The dataset I used for testing is FSD3840 Citizen's Pulse 10/2023, and the data and metadata can be freely downloaded from the Finnish Social Science Data Archive website: https://services.fsd.tuni.fi/catalogue/FSD3840?tab=download&study_language=en&lang=en
(you need to use the Finnish metadata file, as the English metadata file only has general dataset descriptions and no fileDscr, for example)