Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IMMV with duplicate data in the pkey can't be created due to auto index creation #99

Open
aeg opened this issue Sep 7, 2024 · 5 comments · Fixed by #106
Open

IMMV with duplicate data in the pkey can't be created due to auto index creation #99

aeg opened this issue Sep 7, 2024 · 5 comments · Fixed by #106
Assignees
Labels
enhancement New feature or request

Comments

@aeg
Copy link

aeg commented Sep 7, 2024

In pg_ivm 1.9, IMMV creation fails due to issues with index creation, which worked in 1.5.

expect

I want the ability to suppress auto index creation when needed to create IMMV.

how to reproduce

CREATE TABLE table_name ( id SERIAL PRIMARY KEY, json jsonb );

INSERT INTO table_name (json) VALUES ('{"a": [{"name":"b"},{"name": "c"},{"name": "d"}]}'::jsonb);
INSERT INTO table_name (json) VALUES ('{"a": [{"name":"e"},{"name": "f"},{"name": "g"}]}'::jsonb);
postgres=#  SELECT
    id,
    (element->>'name') AS name
FROM
    table_name,
    jsonb_array_elements(json->'a') AS element;
 id | name
----+------
  2 | b
  2 | c
  2 | d
  3 | e
  3 | f
  3 | g
(6 rows)

then create immv(failed)

postgres=# select create_immv('json_iv', $$SELECT
   id,
    (element->>'name') AS name
FROM
    table_name,
    jsonb_array_elements(json->'a') AS element$$);
ERROR:  could not create unique index "json_iv_index"
DETAIL:  Key (id)=(2) is duplicated.

Added DISTINCT in the SELECT then I could create the IMMV, but the execution plan is different, so it's inefficient and not what I need.

postgres=# select create_immv('json_iv', $$SELECT
   distinct id,
    (element->>'name') AS name
FROM
    table_name,
    jsonb_array_elements(json->'a') AS element$$);
NOTICE:  created index "json_iv_index" on immv "json_iv"
 create_immv
-------------
           6
(1 row)

postgres=# \d table_name
                            Table "public.table_name"
 Column |  Type   | Collation | Nullable |                Default
--------+---------+-----------+----------+----------------------------------------
 id     | integer |           | not null | nextval('table_name_id_seq'::regclass)
 json   | jsonb   |           |          |
Indexes:
    "table_name_pkey" PRIMARY KEY, btree (id)
@yugo-n yugo-n self-assigned this Oct 14, 2024
@yugo-n yugo-n added the bug Something isn't working label Oct 14, 2024
@yugo-n
Copy link
Collaborator

yugo-n commented Oct 14, 2024

Thank you for your reporting

Yes, this would be a bug of automatic index creation when the view has set-returning function in FROM clause. I'll fix this.

I want the ability to suppress auto index creation when needed to create IMMV.

Even though putting the bug-fix aside, your proposal also may make sense. This can be workaround for issues like this case and also it might be possible that users can find another efficient index.

@yugo-n yugo-n added the enhancement New feature or request label Oct 21, 2024
@yugo-n
Copy link
Collaborator

yugo-n commented Oct 21, 2024

I want the ability to suppress auto index creation when needed to create IMMV.

Or adding GUC parameter for this is better, instead of adding function arguments?

@aeg
Copy link
Author

aeg commented Oct 26, 2024

Yes, this would be a bug of automatic index creation when the view has set-returning function in FROM clause. I'll fix this.

I want the ability to suppress auto index creation when needed to create IMMV.

Even though putting the bug-fix aside, your proposal also may make sense. This can be workaround for issues like this case and also it might be possible that users can find another efficient index.

一方で、自分が提示したようなユニークキーが重複しており、複数のレコードに展開されるようなimmvを設定してしまうと、元レコードに更新があった場合に、複数レコードに対して更新がされるためにあまり効率的ではないという事象も見えています。しかも、オリジナルレコードのどのレコードが変化したら、反映すべきかという判断までおそらく現状していないと思うのですが、immvで参照されていないレコードの更新でもトリガーにより処理がされるのか、オリジナルレコードの更新に時間がかかるという事象が見えています。
これはこれで別の話だとは思いますが、immv 的にはそのような使い方はあまり推奨していないというところなのでしょうか。

@yugo-n
Copy link
Collaborator

yugo-n commented Oct 29, 2024

一方で、自分が提示したようなユニークキーが重複しており、複数のレコードに展開されるようなimmvを設定してしまうと、元レコードに更新があった場合に、複数レコードに対して更新がされるためにあまり効率的ではないという事象も見えています。
With an IMMV that a record in a table is expanded to multiple records in the view, when a record is updated in the original record, it is not very efficient because multiple records are updated.)

ビューの定義上、テーブルの1つ行更新がビューの複数行の更新を要する場合には、複数行更新が発生するのは必然の結果です。

If updating one row in the table requires updating multiple rows in the view, it is inevitable that multiple rows will be updated.

しかも、オリジナルレコードのどのレコードが変化したら、反映すべきかという判断までおそらく現状していないと思うのですが、immvで参照されていないレコードの更新でもトリガーにより処理がされるのか、オリジナルレコードの更新に時間がかかるという事象が見えています。
I don't think that it is currently possible to determine which record in the original record is should be reflected to the view contents when it is changed, so I am seeing that it takes time to update the original record, as the trigger processes updates to records not referenced in IMMV.

WHERE句で除外されているはずのテーブル行が更新された場合にもトリガが起動してしまうことについて、でしょうか。これに対しては、不要な場合にトリガが関数が起動しないよう WHEN condition を付与することが思いつきましたが、文レベルトリガではこれは有用ではない[1]ためうまく行かなそうです。

As to it, I thought of adding a WHEN condition so that the trigger does not fire the function when it is not needed, but this is not useful for statement-level triggers [1] , so it doesn't seem to work.

[1] https://www.postgresql.org/docs/current/sql-createtrigger.html

これはこれで別の話だとは思いますが、immv 的にはそのような使い方はあまり推奨していないというところなのでしょうか
Does such usages of IMMV are not recommended?

非推奨ということではないですが、上記による更新性能への影響が要件を満たさないほどに大きい場合には使用に適さないでしょう。

It's not exactly unrecommended, but if the impact of the above on update performance is so great that it doesn't meet your requirements, it's probably not suitable for use.

yugo-n added a commit that referenced this issue Dec 9, 2024
Previously, a unique index is automatically created even when
a set-returning function is contained in FROM clause, but this
results in an error due to key duplication. Now, an index is
not created automatically when a relation other than table,
sub-query, or join is contained in FROM clause.

Issue (#99)
@yugo-n yugo-n linked a pull request Dec 9, 2024 that will close this issue
yugo-n added a commit that referenced this issue Dec 9, 2024
Previously, a unique index is automatically created even when
a set-returning function is contained in FROM clause, but this
results in an error due to key duplication. Now, an index is
not created automatically when a relation other than table,
sub-query, or join is contained in FROM clause.

Issue (#99)
@yugo-n
Copy link
Collaborator

yugo-n commented Dec 9, 2024

The bug is fixed. but there may be enhancement to suppress automatic index creation.

@yugo-n yugo-n reopened this Dec 9, 2024
@yugo-n yugo-n removed the bug Something isn't working label Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants