This repository has been archived by the owner on Jul 29, 2024. It is now read-only.

Typos fixes #61

Open · wants to merge 1 commit into base: main

2 changes: 1 addition & 1 deletion src/components/PageLayout/PageFooter.jsx
@@ -104,7 +104,7 @@ const PageFooter = () => (
<Typography variant="p2">
Copyright © {new Date().getFullYear()} Delta Lake, a series of LF
Projects, LLC. For web site terms of use, trademark policy and other
- project polcies please see{" "}
+ project policies please see{" "}
<Link href="https://lfprojects.org" newTab>
https://lfprojects.org
</Link>
2 changes: 1 addition & 1 deletion src/pages/latest/concurrency-control.mdx
@@ -51,7 +51,7 @@ operate in three stages:

The following table describes which pairs of write operations can conflict. Compaction refers to [file compaction operation](/latest/best-practices#compact-files) written with the option dataChange set to false.

- | | **INSERT** | **UPDATE, DELTE, MERGE INTO** | **OPTIMIZE** |
+ | | **INSERT** | **UPDATE, DELETE, MERGE INTO** | **OPTIMIZE** |
| ------------------------------ | --------------- | ----------------------------- | ------------ |
| **INSERT** | Cannot conflict | | |
| **UPDATE, DELETE, MERGE INTO** | Can conflict | Can conflict | |
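
As context for the compaction behavior this table references, here is a minimal sketch of a `dataChange = false` compaction rewrite, assuming PySpark and a hypothetical table path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
path = "/tmp/delta/events"  # hypothetical table path

# Rewrite many small files into 16 larger ones. dataChange=false marks the
# rewrite as adding no new data, so it conflicts with other writers only as
# described in the OPTIMIZE/compaction column of the table above.
(spark.read.format("delta").load(path)
    .repartition(16)
    .write
    .option("dataChange", "false")
    .format("delta")
    .mode("overwrite")
    .save(path))
```
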
6 changes: 3 additions & 3 deletions src/pages/latest/delta-batch.mdx
@@ -13,7 +13,7 @@ For many Delta Lake operations on tables, you enable integration with Apache Spa

Delta Lake supports creating two types of tables --- tables defined in the metastore and tables defined by path.

- To work with metastore-defined tables, you must enable integration with Apache Spark DataSourceV2 and Catalog APIs by setting configurations when you create a new `SparkSession`. See [Configure SparkSesion](#configure-sparksession).
+ To work with metastore-defined tables, you must enable integration with Apache Spark DataSourceV2 and Catalog APIs by setting configurations when you create a new `SparkSession`. See [Configure SparkSession](#configure-sparksession).

You can create tables in the following ways.
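
For context on the "Configure SparkSession" step the corrected link points to, a minimal sketch of the session configuration the Delta docs describe (PySpark; the app name is illustrative):

```python
from pyspark.sql import SparkSession

# Enable the DataSourceV2 and Catalog integration needed for metastore-defined tables.
spark = (
    SparkSession.builder.appName("delta-example")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)
```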

@@ -1347,7 +1347,7 @@ pyspark --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --

## Configure storage credentials

- Delta Lake uses Hadoop FileSystem APIs to access the storage systems. The credentails for storage systems usually can be set through Hadoop configurations. Delta Lake provides multiple ways to set Hadoop configurations similar to Apache Spark.
+ Delta Lake uses Hadoop FileSystem APIs to access the storage systems. The credentials for storage systems usually can be set through Hadoop configurations. Delta Lake provides multiple ways to set Hadoop configurations similar to Apache Spark.

### Spark configurations

@@ -1365,7 +1365,7 @@ Spark SQL will pass all of the current [SQL session configurations](http://spark

Besides setting Hadoop file system configurations through the Spark (cluster) configurations or SQL session configurations, Delta supports reading Hadoop file system configurations from `DataFrameReader` and `DataFrameWriter` options (that is, option keys that start with the `fs.` prefix) when the table is read or written, by using `DataFrameReader.load(path)` or `DataFrameWriter.save(path)`.

- For example, you can pass your storage credentails through DataFrame options:
+ For example, you can pass your storage credentials through DataFrame options:

<CodeTabs>

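The elided `<CodeTabs>` block holds the docs' own example; as a rough illustration of the `fs.`-prefixed option pattern described above (bucket, keys, and path are placeholders, and an existing SparkSession `spark` is assumed), a read could look like:

```python
# Pass storage credentials for a single read via fs.-prefixed DataFrame options.
df = (
    spark.read.format("delta")
    .option("fs.s3a.access.key", "<access-key>")
    .option("fs.s3a.secret.key", "<secret-key>")
    .load("s3a://my-bucket/delta/events")
)
```
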
2 changes: 1 addition & 1 deletion src/pages/latest/delta-storage.mdx
@@ -203,7 +203,7 @@ that S3 is lacking.

- All of the requirements listed in [\_](#requirements-s3-single-cluster)
section
- - In additon to S3 credentials, you also need DynamoDB operating permissions
+ - In addition to S3 credentials, you also need DynamoDB operating permissions

#### Quickstart (S3 multi-cluster)

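As background for the multi-cluster quickstart this hunk touches, a hedged sketch of the DynamoDB-backed LogStore session configuration the Delta docs describe (table name and region are placeholders):

```python
from pyspark.sql import SparkSession

# S3 multi-cluster writes route commits through a DynamoDB table via
# S3DynamoDBLogStore; this is where the DynamoDB permissions above are needed.
spark = (
    SparkSession.builder
    .config("spark.delta.logStore.s3a.impl", "io.delta.storage.S3DynamoDBLogStore")
    .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName", "delta_log")
    .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.region", "us-west-2")
    .getOrCreate()
)
```
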
2 changes: 1 addition & 1 deletion src/pages/latest/delta-streaming.mdx
@@ -279,7 +279,7 @@ For applications with more lenient latency requirements, you can save computing
Available in Delta Lake 2.0.0 and above.
</Info>

- The command `foreachBatch` allows you to specify a function that is executed on the output of every micro-batch after arbitrary transformations in the streaming query. This allows implementating a `foreachBatch` function that can write the micro-batch output to one or more target Delta table destinations. However, `foreachBatch` does not make those writes idempotent as those write attempts lack the information of whether the batch is being re-executed or not. For example, rerunning a failed batch could result in duplicate data writes.
+ The command `foreachBatch` allows you to specify a function that is executed on the output of every micro-batch after arbitrary transformations in the streaming query. This allows implementing a `foreachBatch` function that can write the micro-batch output to one or more target Delta table destinations. However, `foreachBatch` does not make those writes idempotent as those write attempts lack the information of whether the batch is being re-executed or not. For example, rerunning a failed batch could result in duplicate data writes.

To address this, Delta tables support the following `DataFrameWriter` options to make the writes idempotent:

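To illustrate the pattern this hunk leads into, a minimal `foreachBatch` sketch assuming the `txnAppId` and `txnVersion` writer options documented for idempotent Delta writes (the paths and app id are placeholders, and `streaming_df` stands in for an existing streaming DataFrame):

```python
app_id = "my-streaming-app"  # placeholder; must stay stable across restarts

def write_batch(batch_df, batch_id):
    # txnAppId + txnVersion let Delta detect and skip a micro-batch that was
    # already committed, so retrying a failed batch does not duplicate data.
    (batch_df.write.format("delta")
        .option("txnAppId", app_id)
        .option("txnVersion", batch_id)
        .mode("append")
        .save("/tmp/delta/target"))  # placeholder target table path

streaming_df.writeStream.foreachBatch(write_batch).start()
```
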
2 changes: 1 addition & 1 deletion src/pages/latest/delta-update.mdx
@@ -455,7 +455,7 @@ You can reduce the time taken by merge using the following approaches:

</CodeTabs>

- will make the query faster as it looks for matches only in the relevant partitions. Furthermore, it will also reduce the chances of conflicts with other concurrent operations. See [concurency control](/latest/concurrency-control) for more details.
+ will make the query faster as it looks for matches only in the relevant partitions. Furthermore, it will also reduce the chances of conflicts with other concurrent operations. See [concurrency control](/latest/concurrency-control) for more details.

- **Compact files**: If the data is stored in many small files, reading the data to search for matches can become slow. You can compact small files into larger files to improve read throughput. See [best practices for compaction](/latest/best-practices/#compact-files) for details.

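Echoing the partition-pruning advice above, a small sketch of a merge whose condition carries a partition predicate (the table path, the `date` partition column, the column names, and the `updates` DataFrame are hypothetical):

```python
from delta.tables import DeltaTable

target = DeltaTable.forPath(spark, "/tmp/delta/events")  # hypothetical path

(target.alias("t")
    .merge(
        updates.alias("s"),
        # The t.date constraint lets the merge scan only matching partitions,
        # which also narrows the window for conflicts with concurrent writers.
        "t.date = s.date AND t.eventId = s.eventId",
    )
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```
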
2 changes: 1 addition & 1 deletion src/pages/latest/integrations.mdx
@@ -1,6 +1,6 @@
---
title: Access Delta tables from external data processing engines
- description: Docs for accessesing Delta tables from external data processing engines
+ description: Docs for accessing Delta tables from external data processing engines
---

You can access Delta tables from Apache Spark and [other data processing systems](https://delta.io/integrations/). Here is the list of integrations that enable you to access Delta tables from external data processing engines.
2 changes: 1 addition & 1 deletion src/pages/latest/porting.mdx
@@ -122,7 +122,7 @@ migrating from older to newer versions of Delta Lake.

Delta Lake 1.2.1, 2.0.0 and 2.1.0 have a bug in their DynamoDB-based S3 multi-cluster configuration implementations where an incorrect timestamp value was written to DynamoDB. This caused [DynamoDB’s TTL](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html) feature to cleanup completed items before it was safe to do so. This has been fixed in Delta Lake versions 2.0.1 and 2.1.1, and the TTL attribute has been renamed from `commitTime` to `expireTime`.

- If you already have TTL enabled on your DynamoDB table using the old attribute, you need to disable TTL for that attribute and then enable it for the new one. You may need to wait an hour between these two operations, as TTL settings changes may take some time to propagate. See the DynamoDB docs [here](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/time-to-live-ttl-before-you-start.html). If you don’t do this, DyanmoDB’s TTL feature will not remove any new and expired entries. There is no risk of data loss.
+ If you already have TTL enabled on your DynamoDB table using the old attribute, you need to disable TTL for that attribute and then enable it for the new one. You may need to wait an hour between these two operations, as TTL settings changes may take some time to propagate. See the DynamoDB docs [here](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/time-to-live-ttl-before-you-start.html). If you don’t do this, DynamoDB’s TTL feature will not remove any new and expired entries. There is no risk of data loss.

```bash
# Disable TTL on old attribute
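# (Hedged sketch: the docs' actual commands are truncated in this view. Assuming
#  the default delta_log DynamoDB table and the attribute names described above,
#  the AWS CLI calls would look roughly like this.)
aws dynamodb update-time-to-live --table-name delta_log \
  --time-to-live-specification "Enabled=false,AttributeName=commitTime"

# Enable TTL on the new attribute (allow time for the change to propagate first)
aws dynamodb update-time-to-live --table-name delta_log \
  --time-to-live-specification "Enabled=true,AttributeName=expireTime"
```
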
2 changes: 1 addition & 1 deletion src/pages/latest/quick-start.mdx
@@ -373,7 +373,7 @@ deltaTable.toDF().show();
You should see that some of the existing rows have been updated and new rows
have been inserted.

- For more information on these operations, see [Table delets, updates, and merges](/latestl/delta-update).
+ For more information on these operations, see [Table deletes, updates, and merges](/latestl/delta-update).

## Read older versions of data using time travel

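The time-travel section is truncated in this view; as a rough sketch of the read it introduces (the path and version number are illustrative, and an existing SparkSession `spark` is assumed):

```python
# Read the table as it existed at an earlier version using time travel.
df_v0 = (
    spark.read.format("delta")
    .option("versionAsOf", 0)
    .load("/tmp/delta-table")
)
```
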
2 changes: 1 addition & 1 deletion static/quickstart_docker/README.md
@@ -202,7 +202,7 @@ The current version is `delta-spark_2.12:3.0.0` which corresponds to Apache Spar

1. Open a bash shell (if on windows use git bash, WSL, or any shell configured for bash commands)

- 2. Run a container from the image with a JuypterLab entrypoint
+ 2. Run a container from the image with a JupyterLab entrypoint

```bash
# Build entry point
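# (Hedged sketch: the README's actual command is truncated in this view. An
#  illustrative run with a JupyterLab entrypoint, assuming the image was tagged
#  delta_quickstart and JupyterLab listens on port 8888, might be:)
docker run --rm -it -p 8888:8888 delta_quickstart
```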