diff --git a/docs/StardustDocs/topics/gradleReference.md b/docs/StardustDocs/topics/gradleReference.md
index 946457ea2..9602a090d 100644
--- a/docs/StardustDocs/topics/gradleReference.md
+++ b/docs/StardustDocs/topics/gradleReference.md
@@ -10,7 +10,7 @@ dataframes {
     }
 }
 ```
-Note than name of the file and the interface are normalized: split by '_' and ' ' and joined to camel case.
+Note that the name of the file and the interface are normalized: split by '_' and ' ' and joined to CamelCase.
 You can set parsing options for CSV:
 ```kotlin
 dataframes {
@@ -23,9 +23,17 @@ dataframes {
     }
 }
 ```
-In this case output path will depend on your directory structure. For project with package `org.example` path will be `build/generated/dataframe/main/kotlin/org/example/dataframe/JetbrainsRepositories.Generated.kt
-`. Note that name of the Kotlin file is derived from the name of the data file with the suffix `.Generated` and the package
-is derived from the directory structure with child directory `dataframe`. The name of the **data schema** itself is `JetbrainsRepositories`. You could specify it explicitly:
+In this case, the output path will depend on your directory structure.
+For a project with the package `org.example`, the path will be
+`build/generated/dataframe/main/kotlin/org/example/dataframe/JetbrainsRepositories.Generated.kt`.
+
+Note that the name of the Kotlin file is derived from the name of the data file with the suffix
+`.Generated`, and the package
+is derived from the directory structure, with the child directory `dataframe`.
+
+The name of the **data schema** itself is `JetbrainsRepositories`.
+You can specify it explicitly:
+
 ```kotlin
 schema {
     // output: build/generated/dataframe/main/kotlin/org/example/dataframe/MyName.Generated.kt
@@ -33,14 +41,18 @@ schema {
     name = "MyName"
 }
 ```
-If you want to change default package for all schemas:
+
+If you want to change the default package for all schemas:
+
 ```kotlin
 dataframes {
     packageName = "org.example"
     // Schemas...
 }
 ```
+
 Then you can set packageName for specific schema exclusively:
+
 ```kotlin
 dataframes {
     // output: build/generated/dataframe/main/kotlin/org/example/data/OtherName.Generated.kt
@@ -50,7 +62,9 @@ dataframes {
     }
 }
 ```
-If you want non-default name and package, consider using fully-qualified name:
+
+If you want a non-default name and package, consider using a fully qualified name:
+
 ```kotlin
 dataframes {
     // output: build/generated/dataframe/main/kotlin/org/example/data/OtherName.Generated.kt
@@ -60,7 +74,10 @@ dataframes {
     }
 }
 ```
-By default, plugin will generate output in specified source set. Source set could be specified for all schemas or for specific schema:
+
+By default, the plugin will generate output in the specified source set.
+The source set can be specified for all schemas or for a specific schema:
+
 ```kotlin
 dataframes {
     packageName = "org.example"
@@ -76,7 +93,9 @@ dataframes {
     }
 }
 ```
-But if you need generated files in other directory, set `src`:
+
+If you need the generated files to be put in another directory, set `src`:
+
 ```kotlin
 dataframes {
     // output: schemas/org/example/test/OtherName.Generated.kt
@@ -87,10 +106,63 @@ dataframes {
     }
 }
 ```
+## Schema Definitions from SQL Databases
+
+To generate a schema for an existing SQL table,
+you need to define a few parameters to establish a JDBC connection:
+URL (passed to the `data` field), username, and password.
+
+Also, the `tableName` parameter should be specified to convert the data from the table with that name to the dataframe.
+
+```kotlin
+dataframes {
+    schema {
+        data = "jdbc:mariadb://localhost:3306/imdb"
+        name = "org.example.imdb.Actors"
+        jdbcOptions {
+            user = "root"
+            password = "pass"
+            tableName = "actors"
+        }
+    }
+}
+```
+
+To generate a schema for the result of an SQL query,
+you need to define the same parameters as before, together with the SQL query itself, to establish a connection.
+
+```kotlin
+dataframes {
+    schema {
+        data = "jdbc:mariadb://localhost:3306/imdb"
+        name = "org.example.imdb.TarantinoFilms"
+        jdbcOptions {
+            user = "root"
+            password = "pass"
+            sqlQuery = """
+                SELECT name, year, rank,
+                GROUP_CONCAT (genre) as "genres"
+                FROM movies JOIN movies_directors ON movie_id = movies.id
+                JOIN directors ON directors.id=director_id LEFT JOIN movies_genres ON movies.id = movies_genres.movie_id
+                WHERE directors.first_name = "Quentin" AND directors.last_name = "Tarantino"
+                GROUP BY name, year, rank
+                ORDER BY year
+                """
+        }
+    }
+}
+```
+
+**NOTE:** This is experimental functionality and, for now,
+we only support four databases: MariaDB, MySQL, PostgreSQL, and SQLite.
+
+Additionally, support for JSON and date-time types is limited.
+Please take this into consideration when using these functions.
 
 ## DSL reference
-Inside `dataframes` you can configure parameters that will apply to all schemas. Configuration inside `schema` will override these defaults for specific schema.
-Here is full DSL for declaring data schemas:
+Inside `dataframes`, you can configure parameters that will apply to all schemas.
+Configuration inside `schema` will override these defaults for a specific schema.
+Here is the full DSL for declaring data schemas:
 
 ```kotlin
 dataframes {
@@ -101,8 +173,8 @@ dataframes {
         // KOTLIN SCRIPT: DataSchemaVisibility.INTERNAL DataSchemaVisibility.IMPLICIT_PUBLIC, DataSchemaVisibility.EXPLICIT_PUBLIC
         // GROOVY SCRIPT: 'internal', 'implicit_public', 'explicit_public'
 
-        withoutDefaultPath() // disable default path for all schemas
-        // i.e. plugin won't copy "data" property of the schemas to generated companion objects
+        withoutDefaultPath() // disable the default path for all schemas
+        // i.e., the plugin won't copy the "data" property of the schemas to the generated companion objects
 
         // split property names by delimiters (arguments of this method), lowercase parts and join to camel case
         // enabled by default
@@ -125,8 +197,8 @@ dataframes {
         withNormalizationBy('_') // enable property names normalization for this schema and use these delimiters
         withoutNormalization() // disable property names normalization for this schema
 
-        withoutDefaultPath() // disable default path for this schema
-        withDefaultPath() // enable default path for this schema
+        withoutDefaultPath() // disable the default path for this schema
+        withDefaultPath() // enable the default path for this schema
     }
 }
 ```
diff --git a/docs/StardustDocs/topics/readSqlDatabases.md b/docs/StardustDocs/topics/readSqlDatabases.md
index 80cb46a2e..c1f212212 100644
--- a/docs/StardustDocs/topics/readSqlDatabases.md
+++ b/docs/StardustDocs/topics/readSqlDatabases.md
@@ -80,8 +80,21 @@
 val df = DataFrame.readSqlTable(dbConfig, tableName, 100)
 
 df.print()
 ```
+## Getting Started with Notebooks
 
+To use the latest version of the Kotlin DataFrame library
+and a specific version of the JDBC driver for your database (MariaDB is used as an example below) in your notebook, run the following cell.
+
+```jupyter
+%use dataframe
+
+USE {
+    dependencies("org.mariadb.jdbc:mariadb-java-client:$version")
+}
+```
+**NOTE:** The version of the JDBC driver must be specified explicitly.
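+
+For example, a cell pinning a concrete driver version might look like this
+(`3.3.1` is only an illustrative MariaDB driver version; pick the latest release for your database):
+
+```jupyter
+%use dataframe
+
+USE {
+    dependencies("org.mariadb.jdbc:mariadb-java-client:3.3.1")
+}
+```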
+ ## Reading Specific Tables These functions read all data from a specific table in the database. @@ -220,7 +233,7 @@ The versions with a limit parameter will only read up to the specified number of This function allows reading a ResultSet object from your SQL database and transforms it into an AnyFrame object. -The `dbType: DbType` parameter specifies the type of our database (e.g., PostgreSQL, MySQL, etc), +The `dbType: DbType` parameter specifies the type of our database (e.g., PostgreSQL, MySQL, etc.), supported by a library. Currently, the following classes are available: `H2, MariaDb, MySql, PostgreSql, Sqlite`. diff --git a/docs/StardustDocs/topics/schemas.md b/docs/StardustDocs/topics/schemas.md index dc55132a5..a7e44a15f 100644 --- a/docs/StardustDocs/topics/schemas.md +++ b/docs/StardustDocs/topics/schemas.md @@ -35,7 +35,10 @@ Here's a list of the most popular use cases with Data Schemas. Sometimes it is convenient to extract reusable code from Jupyter Notebook into the Kotlin JVM library. Schema interfaces should also be extracted if this code uses Custom Data Schemas. -* [**Import OpenAPI Schemas in Gradle project**](schemasImportOpenApiGradle.md)
+* [**Schema Definitions from SQL Databases in Gradle Project**](schemasImportSqlGradle.md)
+  When you need to take data from an SQL database.
+
+* [**Import OpenAPI Schemas in Gradle Project**](schemasImportOpenApiGradle.md)
When you need to take data from the endpoint with OpenAPI Schema. * [**Import Data Schemas, e.g. from OpenAPI, in Jupyter**](schemasImportOpenApiJupyter.md)
diff --git a/docs/StardustDocs/topics/schemasGradle.md b/docs/StardustDocs/topics/schemasGradle.md index 42c3145a6..7d61c409f 100644 --- a/docs/StardustDocs/topics/schemasGradle.md +++ b/docs/StardustDocs/topics/schemasGradle.md @@ -58,7 +58,7 @@ interface Person { } ``` -#### Execute assemble task to generate type-safe accessors for schemas: +#### Execute the `assemble` task to generate type-safe accessors for schemas: @@ -150,60 +150,3 @@ print(df.fullName.count { it.contains("kotlin") }) ``` - -### OpenAPI Schemas - -JSON schema inference is great, but it's not perfect. However, more and more APIs offer -[OpenAPI (Swagger)](https://swagger.io/) specifications. Aside from API endpoints, they also hold -[Data Models](https://swagger.io/docs/specification/data-models/) which include all the information about the types -that can be returned from or supplied to the API. Why should we reinvent the wheel and write our own schema inference -when we can use the one provided by the API? Not only will we now get the proper names of the types, but we will also -get enums, correct inheritance and overall better type safety. - -First of all, you will need the extra dependency: - -```kotlin -implementation("org.jetbrains.kotlinx:dataframe-openapi:$dataframe_version") -``` - -OpenAPI type schemas can be generated using both methods described above: - -```kotlin -@file:ImportDataSchema( - path = "https://petstore3.swagger.io/api/v3/openapi.json", - name = "PetStore", -) - -import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema -``` - -```kotlin -dataframes { - schema { - data = "https://petstore3.swagger.io/api/v3/openapi.json" - name = "PetStore" - } -} -``` - -The only difference is that the name provided is now irrelevant, since the type names are provided by the OpenAPI spec. -(If you were wondering, yes, the Kotlin DataFrame library can tell the difference between an OpenAPI spec and normal JSON data) - -After importing the data schema, you can now start to import any JSON data you like using the generated schemas. -For instance, one of the types in the schema above is `PetStore.Pet` (which can also be -explored [here](https://petstore3.swagger.io/)), -so let's parse some Pets: - -```kotlin -val df: DataFrame = - PetStore.Pet.readJson("https://petstore3.swagger.io/api/v3/pet/findByStatus?status=available") -``` - -Now you will have a correctly typed [`DataFrame`](DataFrame.md)! - -You can also always ctrl+click on the `PetStore.Pet` type to see all the generated schemas. - -If you experience any issues with the OpenAPI support (since there are many gotchas and edge-cases when converting -something as -type-fluid as JSON to a strongly typed language), please open an issue on -the [Github repo](https://github.com/Kotlin/dataframe/issues). diff --git a/docs/StardustDocs/topics/schemasImportOpenApiGradle.md b/docs/StardustDocs/topics/schemasImportOpenApiGradle.md index c410f8031..7a68c32cc 100644 --- a/docs/StardustDocs/topics/schemasImportOpenApiGradle.md +++ b/docs/StardustDocs/topics/schemasImportOpenApiGradle.md @@ -61,4 +61,4 @@ You can also always ctrl+click on the `PetStore.Pet` type to see all the generat If you experience any issues with the OpenAPI support (since there are many gotchas and edge-cases when converting something as type-fluid as JSON to a strongly typed language), please open an issue on -the [Github repo](https://github.com/Kotlin/dataframe/issues). +the [GitHub repo](https://github.com/Kotlin/dataframe/issues). 
diff --git a/docs/StardustDocs/topics/schemasImportSqlGradle.md b/docs/StardustDocs/topics/schemasImportSqlGradle.md
new file mode 100644
index 000000000..6fc48af7a
--- /dev/null
+++ b/docs/StardustDocs/topics/schemasImportSqlGradle.md
@@ -0,0 +1,134 @@
+[//]: # (title: Import SQL Metadata as a Schema in Gradle Project)
+
+
+
+Each SQL database contains the metadata for all the tables.
+This metadata can be used for schema generation.
+
+**NOTE:** Visit this [page](readSqlDatabases.md) to see how to set up all Gradle dependencies for your project.
+
+### With `@file:ImportDataSchema`
+
+To generate a schema for an existing SQL table,
+you need to define a few parameters to establish a JDBC connection:
+URL, username, and password.
+
+Also, the `tableName` parameter can be specified.
+
+You should also specify the name of the generated Kotlin class
+as the first parameter of the `@file:ImportDataSchema` annotation.
+
+```kotlin
+@file:ImportDataSchema(
+    "ActorSchema",
+    URL,
+    jdbcOptions = JdbcOptions(USER_NAME, PASSWORD, tableName = TABLE_NAME)
+)
+
+package databases
+
+import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
+```
+
+```kotlin
+const val URL = "jdbc:mariadb://localhost:3306/imdb"
+
+const val USER_NAME = "root"
+
+const val PASSWORD = "pass"
+
+const val TABLE_NAME = "actors"
+```
+To generate a schema for the result of an SQL query,
+you need to define the SQL query itself
+and the same parameters as before to establish a connection with the database.
+
+You should also specify the name of the generated Kotlin class
+as the first parameter of the `@file:ImportDataSchema` annotation.
+
+```kotlin
+@file:ImportDataSchema(
+    "TarantinoFilmSchema",
+    URL,
+    jdbcOptions = JdbcOptions(USER_NAME, PASSWORD, sqlQuery = TARANTINO_FILMS_SQL_QUERY)
+)
+
+package databases
+
+import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
+```
+
+```kotlin
+const val URL = "jdbc:mariadb://localhost:3306/imdb"
+
+const val USER_NAME = "root"
+
+const val PASSWORD = "pass"
+
+const val TARANTINO_FILMS_SQL_QUERY = """
+    SELECT name, year, rank,
+    GROUP_CONCAT (genre) as "genres"
+    FROM movies JOIN movies_directors ON movie_id = movies.id
+    JOIN directors ON directors.id=director_id LEFT JOIN movies_genres ON movies.id = movies_genres.movie_id
+    WHERE directors.first_name = "Quentin" AND directors.last_name = "Tarantino"
+    GROUP BY name, year, rank
+    ORDER BY year
+    """
+```
+
+### With Gradle Task
+
+To generate a schema for an existing SQL table,
+you need to define a few parameters to establish a JDBC connection:
+URL (passed to the `data` field), username, and password.
+
+Also, the `tableName` parameter should be specified to convert the data from the table with that name to the dataframe.
+
+```kotlin
+dataframes {
+    schema {
+        data = "jdbc:mariadb://localhost:3306/imdb"
+        name = "org.example.imdb.Actors"
+        jdbcOptions {
+            user = "root"
+            password = "pass"
+            tableName = "actors"
+        }
+    }
+}
+```
+
+To generate a schema for the result of an SQL query,
+
+ + +```kotlin +dataframes { + schema { + data = "jdbc:mariadb://localhost:3306/imdb" + name = "org.example.imdb.TarantinoFilms" + jdbcOptions { + user = "root" + password = "pass" + sqlQuery = """ + SELECT name, year, rank, + GROUP_CONCAT (genre) as "genres" + FROM movies JOIN movies_directors ON movie_id = movies.id + JOIN directors ON directors.id=director_id LEFT JOIN movies_genres ON movies.id = movies_genres.movie_id + WHERE directors.first_name = "Quentin" AND directors.last_name = "Tarantino" + GROUP BY name, year, rank + ORDER BY year + """ + } + } +} +``` + +After importing the data schema, you can start to import any data from SQL table or as a result of an SQL query +you like using the generated schemas. + +Now you will have a correctly typed [`DataFrame`](DataFrame.md)! + +If you experience any issues with the SQL databases support (since there are many edge-cases when converting +SQL types from different databases to Kotlin types), please open an issue on +the [GitHub repo](https://github.com/Kotlin/dataframe/issues), specifying the database and the problem.