Consider changing the data types to something less internal #12

Open
nikonikoniko opened this issue Sep 26, 2023 · 4 comments

Comments

@nikonikoniko
Member

nikonikoniko commented Sep 26, 2023

[Screenshot 2023-09-26 at 15:23:23, showing the data-type column]

In this column, the data types are named after their internal validation functions for clojure.spec. As these names are relatively arbitrary, I would suggest replacing them with something that 1) is more widely used, 2) is more abstract, and 3) reveals less about the inner implementation.

One option would be to use TypeScript notation, as TypeScript is becoming a very widely used type system.

The changes would look like this:

inner implementation       typed reference     as multiple
uuid-string                uuid                uuid[]
strings<->uuids            uuid[]              uuid[]
single-string              string              string[]
cell-list                  string[]            string[]
YN<->bool                  boolean             boolean[]
string-date<->timestamp    timestamp           timestamp[]
status                     "3" | "2" | "1"     Array<"3" | "2" | "1">

Optional values get a trailing question mark: string?, string[]?, etc.
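A minimal sketch of how the proposed column could look as actual TypeScript declarations. TypeScript has no built-in `uuid` or `timestamp` type, so both are declared here as hypothetical aliases (treating a timestamp as an ISO 8601 string is an assumption). The `ExampleRecord` interface and its field names are illustrative only, not the project's schema. Note that in real TypeScript the question mark goes on the property name (`notes?: string`) rather than after the type, so `string?` in the notation above corresponds to an optional field.

```typescript
// Hypothetical aliases: TypeScript has no built-in uuid or timestamp type.
type Uuid = string;       // a string in UUID format
type Timestamp = string;  // assumed: an ISO 8601 date-time string
type Status = "3" | "2" | "1";

// Illustrative record; field names are assumptions, not the project's schema.
// Each comment names the inner-implementation type it would replace.
interface ExampleRecord {
  id: Uuid;                // uuid-string
  relatedIds: Uuid[];      // strings<->uuids
  name: string;            // single-string
  aliases: string[];       // cell-list
  isActive: boolean;       // YN<->bool
  firstSeen: Timestamp;    // string-date<->timestamp
  status: Status;          // status
  notes?: string;          // optional: "string?" in the proposed notation
  sources?: string[];      // optional: "string[]?"
}

// Optional fields may simply be left out.
const example: ExampleRecord = {
  id: "123e4567-e89b-12d3-a456-426614174000",
  relatedIds: [],
  name: "Example",
  aliases: ["Alias A", "Alias B"],
  isActive: true,
  firstSeen: "2023-09-26T15:23:23Z",
  status: "2",
};
```

Because `Status` is a union of string literals, assigning `status: "4"` would be rejected by the compiler, which is the same constraint the `"3" | "2" | "1"` row expresses.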

@nikonikoniko
Member Author

Then we can talk about how a boolean is represented as "Y" | "N" in the spreadsheet but as a boolean in ingested data.
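A minimal sketch of that mapping between the spreadsheet's `"Y" | "N"` and a boolean. The function names are hypothetical, and rejecting every other cell value (rather than, say, treating a blank cell as false) is an assumption of this sketch.

```typescript
// Spreadsheet representation of a boolean cell.
type YN = "Y" | "N";

// Sheet -> ingested data. Anything other than exactly "Y" or "N" is rejected;
// trimming and case folding are deliberately left out of this sketch.
function ynToBoolean(value: string): boolean {
  if (value === "Y") return true;
  if (value === "N") return false;
  throw new Error(`Expected "Y" or "N", got ${JSON.stringify(value)}`);
}

// Ingested data -> sheet.
function booleanToYn(value: boolean): YN {
  return value ? "Y" : "N";
}
```

For example, `booleanToYn(ynToBoolean("N"))` gives back `"N"`, so the two representations convert losslessly in both directions.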

@tlongers
Member

Yeah, agree. Or we could even have a human-readable version, like the one I give in the detailed section: uuid-strings -> String, formatted in UUID format.

I think there's a good case for explaining what different data types are (strings, integers, bools), the related validators and what we use them for. For some readers, this may be the first time anyone's done that for them!
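One way to keep the explanation, the validator, and the human-readable description together is a small descriptor per type. This is a hypothetical sketch, not existing project code: the `TypeDoc` shape, the descriptions, and the UUID pattern (the canonical 8-4-4-4-12 hexadecimal layout, with version and variant bits left unchecked) are all assumptions.

```typescript
// Hypothetical descriptor pairing a type with a plain-language
// explanation and a validator.
interface TypeDoc {
  name: string;                            // type as shown in the docs
  description: string;                     // human-readable explanation
  validate: (value: unknown) => boolean;   // does a value conform?
}

// Canonical 8-4-4-4-12 hex layout; version/variant bits are not checked.
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

const typeDocs: TypeDoc[] = [
  {
    name: "uuid",
    description: "String, formatted in UUID format",
    validate: (v) => typeof v === "string" && UUID_PATTERN.test(v),
  },
  {
    name: "string",
    description: "Text",
    validate: (v) => typeof v === "string",
  },
  {
    name: "integer",
    description: "Whole number",
    validate: (v) => typeof v === "number" && Number.isInteger(v),
  },
  {
    name: "boolean",
    description: "True or false (Y or N in the spreadsheet)",
    validate: (v) => typeof v === "boolean",
  },
];

// Look up a descriptor by the name used in the docs.
function findTypeDoc(name: string): TypeDoc | undefined {
  return typeDocs.find((d) => d.name === name);
}
```

The same list could drive a generated reference table for readers and be reused by the validators, so the two cannot drift apart.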

@nikonikoniko
Member Author

There's a bigger issue here: data types in the sheet vs. data types in the intermediate system vs. data types in JSON exports.

For example, Clojure/Script has a dedicated UUID data type, while the same value is a string in the sheet and a string in the export. Similarly, dates have multiple representations along the chain.

They do have predictable transformations, though.
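To illustrate what such a predictable chain could look like for dates, here is a minimal sketch of sheet string -> intermediate value -> JSON export string. The thread does not specify the formats, so the `YYYY-MM-DD` sheet format, the use of a JavaScript `Date` (at midnight UTC) as the intermediate value, and ISO 8601 in the export are all assumptions; the function names are hypothetical.

```typescript
// Assumed sheet format: "YYYY-MM-DD" (hypothetical; not specified in the thread).
const SHEET_DATE = /^(\d{4})-(\d{2})-(\d{2})$/;

// Sheet string -> intermediate Date, interpreted as midnight UTC.
function sheetToDate(cell: string): Date {
  const m = SHEET_DATE.exec(cell);
  if (!m) throw new Error(`Not a YYYY-MM-DD date: ${JSON.stringify(cell)}`);
  const [, y, mo, d] = m;
  const date = new Date(Date.UTC(Number(y), Number(mo) - 1, Number(d)));
  // Reject dates that rolled over or were remapped, e.g. "2023-02-30".
  if (
    date.getUTCFullYear() !== Number(y) ||
    date.getUTCMonth() !== Number(mo) - 1 ||
    date.getUTCDate() !== Number(d)
  ) {
    throw new Error(`Invalid calendar date: ${cell}`);
  }
  return date;
}

// Intermediate Date -> JSON export string (ISO 8601, UTC).
function dateToExport(date: Date): string {
  return date.toISOString();
}

// Intermediate Date -> sheet string, completing the round trip.
function dateToSheet(date: Date): string {
  return date.toISOString().slice(0, 10);
}
```

Under these assumptions, `dateToExport(sheetToDate("2023-09-26"))` yields `"2023-09-26T00:00:00.000Z"`, and `dateToSheet` recovers the original cell, so each representation can be documented alongside the rule that converts it.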

@tlongers
Member

I think there are three audiences, with various overlapping interests in the docs:
A) developers, who want to know how type handling should be managed;
B) researchers, who are likely to use the model in practice and will be interested in what they need to do for data validation, integrity and quality; and
C) downstream users of products derived from our data, who want to assess the meaning of a field and the rules used to create it, as an aid to understanding.

Stuff written for B and C is generally useful to A, but stuff written for A is less useful to B and C!
