[DOP-16959] - update documentation (#7)
* [DOP-16959] - update documentation

* [DOP-16959] - spread documentation out
maxim-lixakov authored Jul 16, 2024
1 parent d4904f6 commit 2320a27
Showing 4 changed files with 135 additions and 66 deletions.
73 changes: 73 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,73 @@
# Contributing to Spark Dialect Extension

This document provides detailed steps to build the Spark Dialect Extension from the source code.

### Prerequisites

Before you start, ensure you have the following installed:
- **Java**: Java 8 or higher. [Java Installation Guide](https://adoptopenjdk.net/)
- **Scala**: [Scala Installation Guide](https://scala-lang.org/download/)
- **SBT**: [SBT Installation Guide](https://www.scala-sbt.org/download.html)

### Compile the Project

To compile the project and generate a JAR file, run the following command in the project's root directory:

```bash
sbt package
```

This command compiles the source code and packages it into a .jar file located in the ``target/scala-2.12`` directory.


## Running Scala Tests

This section describes how to run Scala tests for the Spark Dialect Extension.

### Start Required Services

Before running the tests, you need to start the necessary database services using Docker Compose:

```bash
docker-compose -f docker-compose.test.yml up -d
```

### Execute Tests
To run the Scala tests, execute:

```bash
sbt test
```
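
For orientation, a dialect unit test might look like the following minimal sketch. It assumes ScalaTest is on the test classpath and that `ClickhouseDialectExtension` is an object extending Spark's `JdbcDialect`, imported from its actual package; the test class name is illustrative, and the expected mapping comes from `docs/data_type_mappings.md`.

```scala
import org.apache.spark.sql.types.ByteType
import org.scalatest.funsuite.AnyFunSuite

class ClickhouseDialectExtensionSpec extends AnyFunSuite {

  test("ByteType is written as ClickHouse Int8") {
    // getJDBCType returns the database-side type definition used when writing a Spark column
    val jdbcType = ClickhouseDialectExtension.getJDBCType(ByteType)
    assert(jdbcType.map(_.databaseTypeDefinition).contains("Int8"))
  }
}
```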

### With Coverage Report
To run the tests with coverage and generate a report, use:

```bash
sbt clean coverage test coverageReport
```

After the tests, you can view the coverage report by opening the ``target/scala-2.12/scoverage-report/index.html`` file in your web browser.

### Stopping Docker Containers
After completing the tests, you can stop the Docker containers with:

```bash
docker-compose -f docker-compose.test.yml down
```

# Code Formatting and Linting

## Using Scalafmt to Format Code

To format all Scala source files in the project, execute the following command from the project's root directory:
```bash
sbt scalafmtAll
```

## Using Scalafix for Linting and Refactoring

To lint and refactor the code, run Scalafix using the following command:
```bash
sbt scalafixAll
```
This command checks the code against the rules specified in the `.scalafix.conf` file and applies fixes where possible.
75 changes: 9 additions & 66 deletions README.md
@@ -1,68 +1,11 @@
# Spark Dialect Extension
# Spark Dialect Extension Project Documentation

## Overview
This repository hosts the Spark Dialect Extension, which provides custom handling for specific JDBC data types within Apache Spark.

## Prerequisites
Before you begin, ensure you have the following prerequisites installed:
- **Java**: Java 8 or higher is required. [Java Installation Guide](https://adoptopenjdk.net/)
- **Scala**: [Scala Installation Guide](https://scala-lang.org/download/)
- **SBT**: [SBT Installation Guide](https://www.scala-sbt.org/download.html)

## Getting Started
### Clone the Repository:
```bash
git clone https://github.com/MobileTeleSystems/spark-dialect-extension.git
cd spark-dialect-extension
```

### Format Source Code:
Use Scalafmt to format your code by running:
```bash
sbt scalafmtAll
```

Use Scalafix to lint and refactor your code by running:
```bash
sbt scalafixAll
```

### Build the Project:
Compile the project and generate a JAR file:
```bash
sbt package
```
This will place the generated `.jar` file in the `target/scala-2.12` directory.


### Testing Setup
Before running the tests, start the necessary database services using Docker Compose:

``` bash
docker-compose -f docker-compose.test.yml up -d
```

### Running Scala Tests:
To execute the Scala tests, use the following:
```bash
sbt test
```

##### With coverage report:
To run the tests with coverage and generate a report, use the following:
```bash
sbt clean coverage test coverageReport
```
After running the tests with coverage, you can view the coverage report by opening the following file in your web browser:
``spark-dialect-extension/target/scala-2.12/scoverage-report/index.html``

### Stopping Docker Containers:
After the tests, you can stop the Docker containers with:

``` bash
docker-compose -f docker-compose.test.yml down
```


Add Scalafix and Scalafmt to the continuous integration (CI) workflow to enhance code quality automatically, and add auto-committing of changes to emulate the behavior of the pre-commit hooks used in our Python repositories.
This repository contains the Spark Dialect Extension, which provides custom handling for specific JDBC data types within Apache Spark.
## Documentation Index

- [**Using the Dialect**](docs/using_the_dialect.md)
- How to configure and use the dialect in Spark applications.
- [**Data Type Mappings**](docs/data_type_mappings.md)
- Detailed mappings between ClickHouse data types and Spark data types.
- [**Contributing to the project**](CONTRIBUTING.md)
- Detailed instructions on how to build the project.
22 changes: 22 additions & 0 deletions docs/data_type_mappings.md
@@ -0,0 +1,22 @@
## Data Type Mappings for Spark Dialect Extension

This document outlines the customized type mappings that the Spark Dialect Extension implements to optimize interactions between Spark and ClickHouse.

#### Customized Type Mappings with Spark Dialect Extension

| ClickHouse Type (Read) | Spark Type | ClickHouse Type (Write) | ClickHouse Type (Create) |
|----------------------------|--------------------------------|-------------------------------|-----------------------------|
| `Int8` | `ByteType` | `Int8` | `Int8` |
| `Int16` | `ShortType` | `Int16` | `Int16` |
| `Datetime64(6)` | `TimestampType` | `Datetime64(6)` | `Datetime64(6)` |
| `Bool` | `BooleanType` | `Bool` | `Bool` |


#### Default Type Mappings without Spark Dialect Extension

| ClickHouse Type (Read) | Spark Type | ClickHouse Type (Write) | ClickHouse Type (Create) |
|----------------------------|--------------------------------|-------------------------------|-----------------------------|
| `Int8` | `IntegerType` | `Int32` | `Int32` |
| `Int16` | `IntegerType` | `Int32` | `Int32` |
| `Datetime64(6)` | `TimestampType` | `Datetime64(6)` | `DateTime32` |
| `Bool` | `BooleanType` | `Bool` | `UInt64` |
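
To see the difference in practice, the hedged sketch below writes a small DataFrame with `ByteType` and `ShortType` columns over JDBC. With the dialect registered, the created ClickHouse columns are `Int8` and `Int16`; without it, Spark falls back to `Int32`. The JDBC URL, credentials, and table name are placeholders, and `ClickhouseDialectExtension` is assumed to be imported from its actual package.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.jdbc.JdbcDialects

val spark = SparkSession.builder()
  .appName("Type mapping demo")
  .getOrCreate()
import spark.implicits._

// register the dialect first so the extended type mappings are applied
JdbcDialects.registerDialect(ClickhouseDialectExtension)

val df = Seq((1.toByte, 10.toShort), (2.toByte, 20.toShort))
  .toDF("int8_col", "int16_col")

df.write
  .format("jdbc")
  .option("url", "jdbc:clickhouse://localhost:8123/default")  // placeholder URL
  .option("dbtable", "type_mapping_demo")                     // placeholder table name
  .option("user", "default")                                  // placeholder credentials
  .option("password", "")
  .option("createTableOptions", "ENGINE = MergeTree() ORDER BY tuple()") // ClickHouse requires an engine clause
  .mode("overwrite") // the table is created by Spark using the dialect's type definitions
  .save()
```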
31 changes: 31 additions & 0 deletions docs/using_the_dialect.md
@@ -0,0 +1,31 @@
## Using the Spark Dialect Extension

This section provides instructions on how to configure Apache Spark to use the Spark Dialect Extension, enabling custom handling of JDBC data types.

### Configuration Steps

To integrate the Spark Dialect Extension into your Spark application, you need to add the compiled JAR file to the Spark classpath. This enables Spark to utilize the custom JDBC dialect for enhanced data type handling.

#### Add the JAR to Spark

1. **Locate the Compiled JAR**: Ensure you have built the project and locate the compiled `.jar` file, e.g. `/path/to/spark-dialect-extension_2.12-0.1.jar`.

2. **Configure Spark**: Add the JAR to your Spark job's classpath by modifying the `spark.jars` configuration parameter. This can be done in several ways depending on how you are running your Spark application:

- **Spark Submit Command**:
```bash
spark-submit --jars /path/to/spark-dialect-extension_2.12-0.1.jar --class YourMainClass your-application.jar
```

- **Programmatically** (within your Spark application):
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.jdbc.JdbcDialects

val spark = SparkSession.builder()
  .appName("My Spark App")
  .config("spark.jars", "/path/to/spark-dialect-extension_2.12-0.1.jar")
  .getOrCreate()

// register the custom ClickHouse dialect provided by the extension JAR
JdbcDialects.registerDialect(ClickhouseDialectExtension)
```
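
Once the dialect is registered, JDBC reads and writes go through it transparently. Continuing from the snippet above, the hedged sketch below reads a table back into Spark; the JDBC URL, credentials, and table name are placeholders, and the ClickHouse JDBC driver is assumed to be on the classpath.

```scala
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:clickhouse://localhost:8123/default") // placeholder URL
  .option("dbtable", "my_table")                             // placeholder table name
  .option("user", "default")                                 // placeholder credentials
  .option("password", "")
  .load()

// with the dialect registered, e.g. Int8 columns arrive as ByteType instead of IntegerType
df.printSchema()
```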
