Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ts-interpolation.md #112

Merged
merged 1 commit into from
Sep 22, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
60 changes: 50 additions & 10 deletions content/en/altinity-kb-queries-and-syntax/ts-interpolation.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,48 +5,88 @@ description: >
Time-series alignment with interpolation
---

This article demonstrates how to perform time-series data alignment with interpolation using window functions in ClickHouse. The goal is to align two different time-series (A and B) on the same timestamp axis and fill the missing values using linear interpolation.

Step-by-Step Implementation
We begin by creating a table with test data that simulates two time-series (A and B) with randomly distributed timestamps and values. Then, we apply interpolation to fill missing values for each time-series based on the surrounding data points.

#### 1. Drop Existing Table (if it exists)

```sql
DROP TABLE test_ts_interpolation;
```
This ensures that any previous versions of the table are removed.

--- generate test data
#### 2. Generate Test Data
In this step, we generate random time-series data with timestamps and values for series A and B. The values are calculated differently for each series:

```sql
CREATE TABLE test_ts_interpolation
ENGINE = Log AS
SELECT
((number * 100) + 50) - (rand() % 100) AS timestamp,
transform(rand() % 2, [0, 1], ['A', 'B'], '') AS ts,
if(ts = 'A', timestamp * 10, timestamp * 100) AS value
((number * 100) + 50) - (rand() % 100) AS timestamp, -- random timestamp generation
transform(rand() % 2, [0, 1], ['A', 'B'], '') AS ts, -- randomly assign series 'A' or 'B'
if(ts = 'A', timestamp * 10, timestamp * 100) AS value -- different value generation for each series
FROM numbers(1000000);
```
Here, the timestamp is generated randomly and assigned to either series A or B using the transform() function. The value is calculated based on the series type (A or B), with different multipliers for each.


#### 3. Preview the Generated Data
After generating the data, you can inspect it by running a simple SELECT query:
```sql
SELECT * FROM test_ts_interpolation;
```
This will show the randomly generated timestamps, series (A or B), and their corresponding values.

-- interpolation select with window functions
#### 4. Perform Interpolation with Window Functions
To align the time-series and interpolate missing values, we use window functions in the following query:

```sql
SELECT
timestamp,
if(
ts = 'A',
toFloat64(value),
prev_a.2 + (timestamp - prev_a.1 ) * (next_a.2 - prev_a.2) / ( next_a.1 - prev_a.1)
toFloat64(value), -- If the current series is 'A', keep the original value
prev_a.2 + (timestamp - prev_a.1 ) * (next_a.2 - prev_a.2) / ( next_a.1 - prev_a.1) -- Interpolate for 'A'
) as a_value,
if(
ts = 'B',
toFloat64(value),
prev_b.2 + (timestamp - prev_b.1 ) * (next_b.2 - prev_b.2) / ( next_b.1 - prev_b.1)
toFloat64(value), -- If the current series is 'B', keep the original value
prev_b.2 + (timestamp - prev_b.1 ) * (next_b.2 - prev_b.2) / ( next_b.1 - prev_b.1) -- Interpolate for 'B'
) as b_value
FROM
(
SELECT
timestamp,
ts,
value,
-- Find the previous and next values for series 'A'
anyLastIf((timestamp,value), ts='A') OVER (ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_a,
anyLastIf((timestamp,value), ts='A') OVER (ORDER BY timestamp DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS next_a,
-- Find the previous and next values for series 'B'
anyLastIf((timestamp,value), ts='B') OVER (ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_b,
anyLastIf((timestamp,value), ts='B') OVER (ORDER BY timestamp DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS next_b
FROM
test_ts_interpolation
)

```
#### Explanation:
**Timestamp Alignment:**
We align the timestamps of both series (A and B) and handle missing data points.

**Interpolation Logic:**
For each A-series timestamp, if the current series is not A, we calculate the interpolated value using the linear interpolation formula:

```bash
interpolated_value = prev_a.2 + ((timestamp - prev_a.1) / (next_a.1 - prev_a.1)) * (next_a.2 - prev_a.2)
```
Similarly, for the B series, interpolation is calculated between the previous (prev_b) and next (next_b) known values.

**Window Functions:**
anyLastIf() is used to fetch the previous or next values for series A and B based on the timestamps.
We use window functions to efficiently calculate these values over the ordered sequence of timestamps.


By using window functions and interpolation, we can align time-series data with irregular timestamps and fill in missing values based on nearby data points. This technique is useful in scenarios where data is recorded at different times or irregular intervals across multiple series.

Loading