JSON Schema Inference #216
-
I was trying to use recap today with some JSON data. I wanted to infer the schema from a JSON file and then use that as a way to navigate the recap types. See the JSON I used here:
after doing a
There are obviously fields missing, but I don't understand why. Also, if the JSON was stored in the file with newlines, recap failed to parse it, and frictionless produced a colorful but unhelpful error.
Replies: 2 comments
-
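I believe things are working now. I've made a bunch of changes since you ran your command (notably, implementing Recap's type spec in the Python library). Can you try re-running?

One thing to clarify: Recap currently only handles newline-delimited JSON (aka ndjson or jsonl), which requires that each JSON object be on a single line. I'll open a GH issue to infer whether to treat JSON as ndjson or a single multi-line JSON object.

Anyway, I just ran:

pdm run recap schema /tmp/recap-test/test-subdir/bar.json

Against: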
And got:

{
  "type": "struct",
  "fields": [
    {
      "name": "anonymousId",
      "type": "string32"
    },
    {
      "name": "channel",
      "type": "string32"
    },
    {
      "name": "context",
      "type": "struct",
      "fields": [
        {
          "name": "campaign",
          "type": "struct",
          "fields": [
            {
              "name": "name",
              "type": "string32"
            },
            {
              "name": "source",
              "type": "string32"
            },
            {
              "name": "medium",
              "type": "string32"
            },
            {
              "name": "term",
              "type": "string32"
            },
            {
              "name": "content",
              "type": "string32"
            }
          ]
        },
        {
          "name": "library",
          "type": "struct",
          "fields": [
            {
              "name": "name",
              "type": "string32"
            },
            {
              "name": "version",
              "type": "string32"
            }
          ]
        },
        {
          "name": "locale",
          "type": "string32"
        },
        {
          "name": "page",
          "type": "struct",
          "fields": [
            {
              "name": "path",
              "type": "string32"
            },
            {
              "name": "initial_referrer",
              "type": "string32"
            },
            {
              "name": "initial_referring_domain",
              "type": "string32"
            },
            {
              "name": "referrer",
              "type": "string32"
            },
            {
              "name": "referring_domain",
              "type": "string32"
            },
            {
              "name": "search",
              "type": "string32"
            },
            {
              "name": "title",
              "type": "string32"
            },
            {
              "name": "url",
              "type": "string32"
            }
          ]
        },
        {
          "name": "screen",
          "type": "struct",
          "fields": [
            {
              "name": "density",
              "type": "int64"
            },
            {
              "name": "height",
              "type": "int64"
            },
            {
              "name": "width",
              "type": "int64"
            },
            {
              "name": "innerHeight",
              "type": "int64"
            },
            {
              "name": "innerWidth",
              "type": "int64"
            }
          ]
        },
        {
          "name": "userAgent",
          "type": "string32"
        }
      ]
    },
    {
      "name": "event",
      "type": "string32"
    },
    {
      "name": "integrations",
      "type": "struct",
      "fields": [
        {
          "name": "All",
          "type": "bool"
        }
      ]
    },
    {
      "name": "messageId",
      "type": "string32"
    },
    {
      "name": "properties",
      "type": "struct",
      "fields": [
        {
          "name": "review_id",
          "type": "string32"
        },
        {
          "name": "product_id",
          "type": "string32"
        },
        {
          "name": "rating",
          "type": "float64"
        },
        {
          "name": "review_body",
          "type": "string32"
        }
      ]
    },
    {
      "name": "originalTimestamp",
      "type": "string32"
    },
    {
      "name": "type",
      "type": "string32"
    },
    {
      "name": "sentAt",
      "type": "string32"
    }
  ]
}
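For anyone who wants to navigate a schema like the one above, here is a minimal sketch that flattens it into dotted field paths. It assumes only the JSON shape shown in that output (a "struct" with a "fields" list of "name"/"type" entries). `iter_field_paths` is a hypothetical helper written for illustration, not part of Recap, and the embedded schema is a trimmed excerpt of the output.

```python
import json

def iter_field_paths(schema: dict, prefix: str = ""):
    """Yield (dotted_path, type) for every field in a struct-shaped schema dict."""
    for field in schema.get("fields", []):
        path = f"{prefix}{field['name']}"
        yield path, field["type"]
        # Nested structs carry their own "fields" list, so recurse into them.
        if field["type"] == "struct":
            yield from iter_field_paths(field, prefix=f"{path}.")

# Trimmed excerpt of the schema printed above (illustration only).
schema = json.loads("""
{
  "type": "struct",
  "fields": [
    {"name": "channel", "type": "string32"},
    {"name": "context", "type": "struct", "fields": [
      {"name": "screen", "type": "struct", "fields": [
        {"name": "density", "type": "int64"}
      ]}
    ]}
  ]
}
""")

paths = list(iter_field_paths(schema))
for path, type_name in paths:
    print(f"{path}: {type_name}")
```

Saving the command's output to a file and passing it through `json.load` should work the same way, as long as the output keeps the shape shown above.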
-
I've opened #226.
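Until the newline-delimited restriction mentioned in this thread is lifted, a pretty-printed file can be rewritten as ndjson before running recap on it. The sketch below is one way that could be done with only the standard library. It is not Recap's implementation, and `load_json_records` and `to_ndjson` are hypothetical helper names. It tries to parse the text as a single JSON document first and falls back to one JSON value per line.

```python
import json

def load_json_records(text: str) -> list:
    """Return the JSON records in `text`, accepting either layout."""
    try:
        # Case 1: the whole text is one JSON document (possibly multi-line).
        doc = json.loads(text)
    except json.JSONDecodeError:
        # Case 2: fall back to ndjson, one JSON value per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A top-level array is treated as a list of records; anything else is one record.
    return doc if isinstance(doc, list) else [doc]

def to_ndjson(records: list) -> str:
    """Serialize records as ndjson: one JSON value per line."""
    # json.dumps emits no raw newlines by default, so each record stays on one line.
    return "\n".join(json.dumps(record) for record in records) + "\n"

multi_line = '{\n  "event": "Product Reviewed",\n  "rating": 4.5\n}'
print(to_ndjson(load_json_records(multi_line)), end="")
```

Treating a top-level array as a list of records is a design choice made for this sketch; a tool could just as reasonably treat it as a single value.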