[DOP-16959] - update documentation (#7)
* [DOP-16959] - update documentation

* [DOP-16959] - spread documentation out
maxim-lixakov authored Jul 16, 2024
1 parent d4904f6 commit 2320a27
Showing 4 changed files with 135 additions and 66 deletions.
73 changes: 73 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,73 @@
# Contributing to Spark Dialect Extension

This document provides detailed steps to build the Spark Dialect Extension from the source code.

### Prerequisites

Before you start, ensure you have the following installed:
- **Java**: Java 8 or higher. [Java Installation Guide](https://adoptopenjdk.net/)
- **Scala**: [Scala Installation Guide](https://scala-lang.org/download/)
- **SBT**: [SBT Installation Guide](https://www.scala-sbt.org/download.html)

### Compile the Project

To compile the project and generate a JAR file, run the following command in the project's root directory:

```bash
sbt package
```

This command compiles the source code and packages it into a .jar file located in the ``target/scala-2.12`` directory.


## Running Scala Tests

This section describes how to run Scala tests for the Spark Dialect Extension.

### Start Required Services

Before running the tests, you need to start the necessary database services using Docker Compose:

```bash
docker-compose -f docker-compose.test.yml up -d
```

### Execute Tests
To run the Scala tests, execute:

```bash
sbt test
```
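
For orientation, a dialect unit test might look like the following minimal sketch. It assumes ScalaTest is on the test classpath and that `ClickhouseDialectExtension` is an object extending Spark's `JdbcDialect`, imported from its actual package; the test class name is illustrative, and the expected mapping comes from `docs/data_type_mappings.md`.

```scala
import org.apache.spark.sql.types.ByteType
import org.scalatest.funsuite.AnyFunSuite

class ClickhouseDialectExtensionSpec extends AnyFunSuite {

  test("ByteType is written as ClickHouse Int8") {
    // getJDBCType returns the database-side type definition used when writing a Spark column
    val jdbcType = ClickhouseDialectExtension.getJDBCType(ByteType)
    assert(jdbcType.map(_.databaseTypeDefinition).contains("Int8"))
  }
}
```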

### With Coverage Report
To run the tests with coverage and generate a report, use:

```bash
sbt clean coverage test coverageReport
```

After the tests, you can view the coverage report by opening the ``target/scala-2.12/scoverage-report/index.html`` file in your web browser.

### Stopping Docker Containers
After completing the tests, you can stop the Docker containers with:

```bash
docker-compose -f docker-compose.test.yml down
```

# Code Formatting and Linting

## Using Scalafmt to Format Code

To format all Scala source files in the project, execute the following command from the project's root directory:
```bash
sbt scalafmtAll
```

## Using Scalafix for Linting and Refactoring

To lint and refactor the code, run Scalafix using the following command:
```bash
sbt scalafixAll
```
This command checks the code against the rules specified in the `.scalafix.conf` file and applies fixes where possible.
75 changes: 9 additions & 66 deletions README.md
@@ -1,68 +1,11 @@
# Spark Dialect Extension
# Spark Dialect Extension Project Documentation

## Overview
This repository hosts the Spark Dialect Extension, which provides custom handling for specific JDBC data types within Apache Spark.

## Prerequisites
Before you begin, ensure you have the following prerequisites installed:
- **Java**: Java 8 or higher is required. [Java Installation Guide](https://adoptopenjdk.net/)
- **Scala**: [Scala Installation Guide](https://scala-lang.org/download/)
- **SBT**: [SBT Installation Guide](https://www.scala-sbt.org/download.html)

## Getting Started
### Clone the Repository:
```bash
git clone https://github.com/MobileTeleSystems/spark-dialect-extension.git
cd spark-dialect-extension
```

### Format Source Code:
Use Scalafmt to format your code by running:
```bash
sbt scalafmtAll
```

Use Scalafix to lint and refactor your code by running:
```bash
sbt scalafixAll
```

### Build the Project:
Compile the project and generate a JAR file:
```bash
sbt package
```
This will place the generated `.jar` file in the `target/scala-2.12` directory.


### Testing Setup
Before running the tests, start the necessary database services using Docker Compose:

``` bash
docker-compose -f docker-compose.test.yml up -d
```

### Running Scala Tests:
To execute the Scala tests, use the following:
```bash
sbt test
```

##### With coverage report:
To run the tests with coverage and generate a report, use the following:
```bash
sbt clean coverage test coverageReport
```
After running the tests with coverage, you can view the coverage report by opening the following file in your web browser:
``spark-dialect-extension/target/scala-2.12/scoverage-report/index.html``

### Stopping Docker Containers:
After the tests, you can stop the Docker containers with:

``` bash
docker-compose -f docker-compose.test.yml down
```


Add Scalafix and Scalafmt to the continuous integration (CI) workflow to enhance code quality automatically, and add auto-committing of changes to emulate the behavior of the pre-commit hooks used in our Python repositories.
This repository contains the Spark Dialect Extension, which provides custom handling for specific JDBC data types within Apache Spark.
## Documentation Index

- [**Using the Dialect**](docs/using_the_dialect.md)
- How to configure and use the dialect in Spark applications.
- [**Data Type Mappings**](docs/data_type_mappings.md)
- Detailed mappings between ClickHouse data types and Spark data types.
- [**Contributing to the project**](CONTRIBUTING.md)
- Detailed instructions on how to build the project.
22 changes: 22 additions & 0 deletions docs/data_type_mappings.md
@@ -0,0 +1,22 @@
## Data Type Mappings for Spark Dialect Extension

This document outlines the customized type mappings that the Spark Dialect Extension implements to optimize interactions between Spark and ClickHouse.

#### Customized Type Mappings with Spark Dialect Extension

| ClickHouse Type (Read) | Spark Type | ClickHouse Type (Write) | ClickHouse Type (Create) |
|----------------------------|--------------------------------|-------------------------------|-----------------------------|
| `Int8` | `ByteType` | `Int8` | `Int8` |
| `Int16` | `ShortType` | `Int16` | `Int16` |
| `Datetime64(6)` | `TimestampType` | `Datetime64(6)` | `Datetime64(6)` |
| `Bool` | `BooleanType` | `Bool` | `Bool` |


#### Default Type Mappings without Spark Dialect Extension

| ClickHouse Type (Read) | Spark Type | ClickHouse Type (Write) | ClickHouse Type (Create) |
|----------------------------|--------------------------------|-------------------------------|-----------------------------|
| `Int8` | `IntegerType` | `Int32` | `Int32` |
| `Int16` | `IntegerType` | `Int32` | `Int32` |
| `Datetime64(6)` | `TimestampType` | `Datetime64(6)` | `DateTime32` |
| `Bool` | `BooleanType` | `Bool` | `UInt64` |
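
To see the difference in practice, the hedged sketch below writes a small DataFrame with `ByteType` and `ShortType` columns over JDBC. With the dialect registered, the created ClickHouse columns are `Int8` and `Int16`; without it, Spark falls back to `Int32`. The JDBC URL, credentials, and table name are placeholders, and `ClickhouseDialectExtension` is assumed to be imported from its actual package.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.jdbc.JdbcDialects

val spark = SparkSession.builder()
  .appName("Type mapping demo")
  .getOrCreate()
import spark.implicits._

// register the dialect first so the extended type mappings are applied
JdbcDialects.registerDialect(ClickhouseDialectExtension)

val df = Seq((1.toByte, 10.toShort), (2.toByte, 20.toShort))
  .toDF("int8_col", "int16_col")

df.write
  .format("jdbc")
  .option("url", "jdbc:clickhouse://localhost:8123/default")  // placeholder URL
  .option("dbtable", "type_mapping_demo")                     // placeholder table name
  .option("user", "default")                                  // placeholder credentials
  .option("password", "")
  .option("createTableOptions", "ENGINE = MergeTree() ORDER BY tuple()") // ClickHouse requires an engine clause
  .mode("overwrite") // the table is created by Spark using the dialect's type definitions
  .save()
```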
31 changes: 31 additions & 0 deletions docs/using_the_dialect.md
@@ -0,0 +1,31 @@
## Using the Spark Dialect Extension

This section provides instructions on how to configure Apache Spark to use the Spark Dialect Extension, enabling custom handling of JDBC data types.

### Configuration Steps

To integrate the Spark Dialect Extension into your Spark application, you need to add the compiled JAR file to the Spark classpath. This enables Spark to utilize the custom JDBC dialect for enhanced data type handling.

#### Add the JAR to Spark

1. **Locate the Compiled JAR**: Ensure you have built the project and locate the compiled `.jar` file, e.g. `/path/to/spark-dialect-extension_2.12-0.1.jar`.

2. **Configure Spark**: Add the JAR to your Spark job's classpath by modifying the `spark.jars` configuration parameter. This can be done in several ways depending on how you are running your Spark application:

- **Spark Submit Command**:
```bash
spark-submit --jars /path/to/spark-dialect-extension_2.12-0.1.jar --class YourMainClass your-application.jar
```

- **Programmatically** (within your Spark application):
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.jdbc.JdbcDialects

val spark = SparkSession.builder()
  .appName("My Spark App")
  .config("spark.jars", "/path/to/spark-dialect-extension_2.12-0.1.jar")
  .getOrCreate()

// register the custom ClickHouse dialect provided by the extension JAR
JdbcDialects.registerDialect(ClickhouseDialectExtension)
```
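
Once the dialect is registered, JDBC reads and writes go through it transparently. Continuing from the snippet above, the hedged sketch below reads a table back into Spark; the JDBC URL, credentials, and table name are placeholders, and the ClickHouse JDBC driver is assumed to be on the classpath.

```scala
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:clickhouse://localhost:8123/default") // placeholder URL
  .option("dbtable", "my_table")                             // placeholder table name
  .option("user", "default")                                 // placeholder credentials
  .option("password", "")
  .load()

// with the dialect registered, e.g. Int8 columns arrive as ByteType instead of IntegerType
df.printSchema()
```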
