Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjacency Search #1

Open
jcuenod opened this issue Jun 8, 2024 · 0 comments
Open

Adjacency Search #1

jcuenod opened this issue Jun 8, 2024 · 0 comments

Comments

@jcuenod
Copy link
Member

jcuenod commented Jun 8, 2024

With a sliding window function, we should be able to do "adjacency" searches in parabible. This will not force an order of wids (not a sequence search), but it is a useful first step. The relevant "WINDOW" function docs for clickhouse are pasted below. First, though, here's a mocked up query that does something useful:

select * from (
SELECT
    groupArray(rid) OVER w1 AS rids,
    groupArray(wid) OVER w1 AS wids,
    groupArray(text) OVER w1 AS words,
    groupArray(lexeme) OVER w1 AS lexemes,
    groupArrayIf(lexeme, lexeme='ἐκκλησία') OVER w1 AS w1,
    groupArrayIf(lexeme, lexeme='παροικέω') OVER w1 AS w2
FROM word_features
WINDOW
    w1 AS (PARTITION BY module_id ORDER BY wid ASC Rows BETWEEN 2 PRECEDING AND 2 FOLLOWING)
)
where length(w1) > 0 and length(w2) > 0
LIMIT 10
-- first_value and last_value respect the frame
SELECT
    groupArray(value) OVER w1 AS frame_values_1,
    first_value(value) OVER w1 AS first_value_1,
    last_value(value) OVER w1 AS last_value_1,
    groupArray(value) OVER w2 AS frame_values_2,
    first_value(value) OVER w2 AS first_value_2,
    last_value(value) OVER w2 AS last_value_2
FROM wf_frame
WINDOW
    w1 AS (PARTITION BY part_key ORDER BY order ASC),
    w2 AS (PARTITION BY part_key ORDER BY order ASC Rows BETWEEN 1 PRECEDING AND CURRENT ROW)
ORDER BY
    part_key ASC,
    value ASC;
┌─frame_values_1─┬─first_value_1─┬─last_value_1─┬─frame_values_2─┬─first_value_2─┬─last_value_2─┐
│ [1]            │             11 │ [1]            │             11 │
│ [1,2]          │             12 │ [1,2]          │             12 │
│ [1,2,3]        │             13 │ [2,3]          │             23 │
│ [1,2,3,4]      │             14 │ [3,4]          │             34 │
│ [1,2,3,4,5]    │             15 │ [4,5]          │             45 │
└────────────────┴───────────────┴──────────────┴────────────────┴───────────────┴──────────────┘

-- second value within the frame
SELECT
    groupArray(value) OVER w1 AS frame_values_1,
    nth_value(value, 2) OVER w1 AS second_value
FROM wf_frame
WINDOW w1 AS (PARTITION BY part_key ORDER BY order ASC Rows BETWEEN 3 PRECEDING AND CURRENT ROW)
ORDER BY
    part_key ASC,
    value ASC
┌─frame_values_1─┬─second_value─┐
│ [1]            │            0 │
│ [1,2]          │            2 │
│ [1,2,3]        │            2 │
│ [1,2,3,4]      │            2 │
│ [2,3,4,5]      │            3 │
└────────────────┴──────────────┘

-- second value within the frame + Null for missing values
SELECT
    groupArray(value) OVER w1 AS frame_values_1,
    nth_value(toNullable(value), 2) OVER w1 AS second_value
FROM wf_frame
WINDOW w1 AS (PARTITION BY part_key ORDER BY order ASC Rows BETWEEN 3 PRECEDING AND CURRENT ROW)
ORDER BY
    part_key ASC,
    value ASC
┌─frame_values_1─┬─second_value─┐
│ [1]            │         ᴺᵁᴸᴸ │
│ [1,2]          │            2 │
│ [1,2,3]        │            2 │
│ [1,2,3,4]      │            2 │
│ [2,3,4,5]      │            3 │
└────────────────┴──────────────┘
```sql
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant