feat(ingest): elasticsearch - add Elasticsearch Source (#3893)
Co-authored-by: Shirshanka Das <[email protected]>
rslanka and shirshanka authored Jan 14, 2022
1 parent 2daa06a commit a44b48a
Showing 6 changed files with 2,845 additions and 0 deletions.
11 changes: 11 additions & 0 deletions metadata-ingestion/examples/recipes/elasticsearch_to_datahub.yml
@@ -0,0 +1,11 @@
source:
  type: "elasticsearch"
  config:
    host: 'localhost:9200'
    username: ""
    password: ""

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
3 changes: 3 additions & 0 deletions metadata-ingestion/setup.py
@@ -100,6 +100,7 @@ def get_long_description():
"datahub-business-glossary": set(),
"dbt": {"requests"},
"druid": sql_common | {"pydruid>=0.6.2"},
"elasticsearch": {"elasticsearch"},
"feast": {"docker"},
"glue": aws_common,
"hive": sql_common
@@ -204,6 +205,7 @@ def get_long_description():
for plugin in [
"bigquery",
"bigquery-usage",
"elasticsearch",
"looker",
"glue",
"mariadb",
@@ -278,6 +280,7 @@ def get_long_description():
"bigquery-usage = datahub.ingestion.source.usage.bigquery_usage:BigQueryUsageSource",
"dbt = datahub.ingestion.source.dbt:DBTSource",
"druid = datahub.ingestion.source.sql.druid:DruidSource",
"elasticsearch = datahub.ingestion.source.elastic_search:ElasticsearchSource",
"feast = datahub.ingestion.source.feast:FeastSource",
"glue = datahub.ingestion.source.aws.glue:GlueSource",
"sagemaker = datahub.ingestion.source.aws.sagemaker:SagemakerSource",
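The `setup.py` change registers the new source as a plugin entry point, which is how a recipe's `type: "elasticsearch"` gets mapped to a class. The sketch below parses that registration string with the standard library's `importlib.metadata.EntryPoint` (Python 3.9+ for `.module`/`.attr`) without importing DataHub. It assumes the entry point group is named `datahub.ingestion.source.plugins`; the group name is not visible in this diff, and `parse_plugin_spec` is a hypothetical helper.

```python
from importlib.metadata import EntryPoint

# Registration string as added to setup.py in this commit.
SPEC = "elasticsearch = datahub.ingestion.source.elastic_search:ElasticsearchSource"

# Assumption: the entry point group name (not shown in the diff above).
GROUP = "datahub.ingestion.source.plugins"


def parse_plugin_spec(spec: str, group: str) -> EntryPoint:
    """Hypothetical helper: split 'name = module:attr' into an EntryPoint."""
    name, _, value = spec.partition("=")
    return EntryPoint(name=name.strip(), value=value.strip(), group=group)


ep = parse_plugin_spec(SPEC, GROUP)
print(ep.name, ep.module, ep.attr)
# ep.load() would import the class itself, which needs acryl-datahub[elasticsearch].
```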
62 changes: 62 additions & 0 deletions metadata-ingestion/source_docs/elastic_search.md
@@ -0,0 +1,62 @@
# Elasticsearch

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

## Setup

To install this plugin, run `pip install 'acryl-datahub[elasticsearch]'`.

## Capabilities

This plugin extracts the following:

- Metadata for indexes
- Column types associated with each index field
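To illustrate what "column types associated with each index field" means, the sketch below flattens an Elasticsearch mapping (the `properties` tree of an index mapping) into dotted field paths and their declared types. The helper `flatten_mapping` and the sample mapping are hypothetical, and the plugin itself may walk mappings differently; treating fields without a `type` as `object` is an assumption based on how Elasticsearch mappings represent object fields.

```python
from typing import Dict, Iterator, Tuple


def flatten_mapping(
    properties: Dict[str, dict], prefix: str = ""
) -> Iterator[Tuple[str, str]]:
    """Yield (dotted_field_path, type) pairs from a mapping's 'properties' tree."""
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        # Object fields often omit "type" and only carry sub-"properties".
        yield path, spec.get("type", "object")
        if "properties" in spec:
            yield from flatten_mapping(spec["properties"], prefix=f"{path}.")


# Hypothetical mapping for a single index.
sample = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "views": {"type": "long"},
            "author": {
                "properties": {
                    "name": {"type": "keyword"},
                    "joined": {"type": "date"},
                }
            },
        }
    }
}

for path, field_type in flatten_mapping(sample["mappings"]["properties"]):
    print(f"{path}: {field_type}")
```

Against a live cluster, the same tree for an index `name` could be read from the `elasticsearch` client's `indices.get_mapping(index=name)` response under `[name]["mappings"]["properties"]`.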

## Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
source:
  type: "elasticsearch"
  config:
    # Coordinates
    host: 'localhost:9200'
    # Credentials
    username: ""
    password: ""
    # Options
    env: "PROD"
    index_pattern:
      allow: [".*some_index_name_pattern*"]
      deny: [".*skip_index_name_pattern*"]

sink:
  # sink configs
```
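The same recipe can also be run from Python instead of the CLI. This is a minimal sketch that assumes `acryl-datahub[elasticsearch]` is installed and that `datahub.ingestion.run.pipeline.Pipeline.create` accepts the recipe as a dict. The `datahub-rest` sink and server URL are borrowed from the example recipe added in this commit; the index patterns are placeholders.

```python
from typing import Any, Dict

# Recipe mirroring the quickstart YAML, expressed as a plain dict.
RECIPE: Dict[str, Any] = {
    "source": {
        "type": "elasticsearch",
        "config": {
            "host": "localhost:9200",
            "username": "",
            "password": "",
            "env": "PROD",
            "index_pattern": {
                "allow": [".*some_index_name_pattern*"],
                "deny": [".*skip_index_name_pattern*"],
            },
        },
    },
    "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
}


def run_ingestion(recipe: Dict[str, Any]) -> None:
    # Imported lazily so this module still loads where DataHub is not installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    # Raise if the source or sink reported failures.
    pipeline.raise_from_status()
```

With Elasticsearch and DataHub both running, call `run_ingestion(RECIPE)`.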

## Config details

Note that a `.` is used to denote nested fields in the YAML recipe.
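For example, the row `index_pattern.allow` refers to the `allow` key nested under `index_pattern` in the recipe's `config` section. The hypothetical helper `get_dotted` below resolves such a dotted name against an already-parsed config dict; it only illustrates the notation and is not part of DataHub.

```python
from typing import Any, Mapping


def get_dotted(config: Mapping[str, Any], dotted_name: str) -> Any:
    """Follow a dotted field name such as 'index_pattern.allow' through nested dicts."""
    value: Any = config
    for key in dotted_name.split("."):
        value = value[key]  # raises KeyError if a level is missing
    return value


# A parsed 'config' section, as it would look after loading the YAML recipe.
config = {
    "host": "localhost:9200",
    "index_pattern": {"allow": [".*some_index_name_pattern*"], "deny": []},
}

print(get_dotted(config, "index_pattern.allow"))
```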


| Field | Required | Default | Description |
| --------------------------- | -------- | ---------------- |---------------------------------------------------------------|
| `host`                      |          | "localhost:9092" | The Elasticsearch host URI.                                   |
| `username`                  |          | ""               | The username credential.                                      |
| `password`                  |          | ""               | The password credential.                                      |
| `env`                       |          | `"PROD"`         | Environment to use in namespace when constructing URNs.       |
| `index_pattern.allow`       |          |                  | List of regex patterns for indexes to include in ingestion.   |
| `index_pattern.deny`        |          |                  | List of regex patterns for indexes to exclude from ingestion. |
| `index_pattern.ignoreCase`  |          | `True`           | Whether regex matching should ignore case.                    |
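The `index_pattern` fields follow DataHub's allow/deny convention. The sketch below re-implements the semantics of DataHub's `AllowDenyPattern` as we understand them, so treat them as assumptions: each pattern is applied with Python's `re.match` (anchored at the start of the index name), a deny match always wins, and an index must match at least one allow pattern. The class `IndexFilter` is hypothetical, and so is its allow-everything default.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class IndexFilter:
    """Hypothetical stand-in for the index_pattern allow/deny config."""

    allow: List[str] = field(default_factory=lambda: [".*"])  # assumed default: allow all
    deny: List[str] = field(default_factory=list)
    ignore_case: bool = True  # corresponds to the recipe key `ignoreCase`

    def allowed(self, index_name: str) -> bool:
        flags = re.IGNORECASE if self.ignore_case else 0
        # Deny patterns are checked first and take precedence.
        if any(re.match(p, index_name, flags) for p in self.deny):
            return False
        # Otherwise the name must match at least one allow pattern.
        return any(re.match(p, index_name, flags) for p in self.allow)


pattern = IndexFilter(
    allow=[".*some_index_name_pattern*"],
    deny=[".*skip_index_name_pattern*"],
)
for name in ["logs_some_index_name_pattern", "unrelated"]:
    print(name, pattern.allowed(name))
```

In the placeholder patterns, the trailing `*` repeats only the final `n`. Because `re.match` needs only a prefix match, the leading `.*` still makes each pattern behave as a "contains" check.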

## Compatibility

Coming soon!

## Questions

If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
