Skip to content

Commit

Permalink
automatically infer the separator in ElkDataset; bump to 0.6.0
Browse files Browse the repository at this point in the history
  • Loading branch information
diegobit committed Sep 18, 2024
1 parent 4ccdbdd commit a07701b
Show file tree
Hide file tree
Showing 3 changed files with 13 additions and 5 deletions.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -289,7 +289,7 @@ You can see examples for every operation in the [dedicated notebook](./examples/
| **start_isodate** | str (ISO datetime) | Yes | | Elastic start date range with format: "2021-09-15T10:00:00.000Z" |
| **end_isodate** | str (ISO datetime) | Yes | | Elastic end date range with format: "2021-09-15T10:00:00.000Z" |
| **date_field** | str | No | @timestamp | Elastic date field. Can be nested into list, eg. "messages.date" |
| **date_field_separator** | str | No | . | Separator for date_field used to split the path. Use different ones to NOT split and consider date_field as single field |
| **date_field_separator** | str | No | . | [DEPRECATED] (separator automatically inferred) Separator for date_field used to split the path. Use different ones to NOT split and consider date_field as single field |

**Returned element type**: ```dict```. Each element is a document matching the given query.

Expand Down
14 changes: 11 additions & 3 deletions datafun/sources/elk.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,8 @@ class ELKDatasetConfig:

class ELKDataset(DatasetSource):
def __init__(self, config: ELKDatasetConfig, **kwargs):
if 'date_field_separator' in kwargs:
print("WARN: date_field_separator is deprecated, the separator is now automatically inferred")
super().__init__(config=config, **kwargs)

self.es = Elasticsearch(
Expand Down Expand Up @@ -94,11 +96,17 @@ def set_time_interval(self, query: dict) -> dict:
if not isinstance(xs, List):
raise TypeError(f'Field query.bool.filter must be of type List, but found of type {type(xs)}')

sep = self.config.date_field_separator
path_sep_alts = ['/', '//--@@--//']
path_sep = '.'
while path_sep in self.config.date_field:
try:
path_sep = path_sep_alts.pop(0)
except Exception as e:
raise ValueError(f'Field {self.config.date_field} contains invalid characters. Exception: {e}')
for idx, obj in enumerate(xs):
if dl.has(obj, "range"):
obj = dl.update(obj, f"range{sep}{self.config.date_field}{sep}gte", value=self.config.start_isodate, sep=sep)
obj = dl.update(obj, f"range{sep}{self.config.date_field}{sep}lte", value=self.config.end_isodate, sep=sep)
obj = dl.update(obj, f"range{path_sep}{self.config.date_field}{path_sep}gte", value=self.config.start_isodate, sep=path_sep)
obj = dl.update(obj, f"range{path_sep}{self.config.date_field}{path_sep}lte", value=self.config.end_isodate, sep=path_sep)
if not obj:
raise ValueError(f'{self.config.date_field}.lte or {self.config.date_field}.lte fields can\'t be updated, e.g. check '
'if they exist in the query.')
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ authors = [
{ name = "Diego Giorgini, Luigi Di Sotto, Saeed Choobani", email = "[email protected]" }
]
description = "datafun brings the fun back to data pipelines"
version = "0.5.2"
version = "0.6.0"
requires-python = ">=3.8"
dependencies = [
"backoff",
Expand Down

0 comments on commit a07701b

Please sign in to comment.