Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/search work by tag #99

Merged
merged 11 commits into from
Aug 3, 2022
Merged

Feature/search work by tag #99

merged 11 commits into from
Aug 3, 2022

Conversation

PigeonsHouse
Copy link
Collaborator

No description provided.

@PigeonsHouse PigeonsHouse requested a review from rkun123 July 5, 2022 00:43
@PigeonsHouse PigeonsHouse self-assigned this Jul 5, 2022
@PigeonsHouse PigeonsHouse linked an issue Jul 5, 2022 that may be closed by this pull request
@PigeonsHouse PigeonsHouse added this to the α milestone Jul 5, 2022
@PigeonsHouse PigeonsHouse requested a review from Simo-C3 July 6, 2022 00:04
@rkun123
Copy link
Contributor

rkun123 commented Jul 9, 2022

フォーマッタが既存コードを片っ端からフォーマッティングしてる?
みんなにフォーマッタをちゃんと設定してもらったほうが良いかもね〜

@PigeonsHouse
Copy link
Collaborator Author

PigeonsHouse commented Jul 10, 2022

flake8のlinterが警告出して来るのが目に入るたびに修正しちゃってブランチに直接関係ないコミットも乗っけちゃってますね
フォーマッタの導入は周知したいですけど既存のコードの修正はどのタイミングでやりましょうか?リファクタブランチとか作ります?
あとflake8のチェック通すActionsも作りたいなと考えてましたのでissue切っときます.
#89 #72
↑似たようなissueはすでにあったので着手するときはこのissueで対応することにします.

@PigeonsHouse
Copy link
Collaborator Author

一旦別ブランチでフォーマットは行うことを想定してフォーマットのみの変更点を切り戻すコミットを作ろうと思ってます.

@PigeonsHouse PigeonsHouse force-pushed the feature/search_work_by_tag branch from 1cc762f to fc3f7fa Compare July 11, 2022 10:20
@PigeonsHouse PigeonsHouse force-pushed the feature/search_work_by_tag branch from d583aef to 7dd2398 Compare July 12, 2022 02:02
Comment on lines 244 to 245
works_orm = works_orm.group_by(models.Work.id).having(func.count(models.Work.id) == len(tags))
Copy link
Contributor

@rkun123 rkun123 Jul 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[COMMENT]
とりあえず良さそう
taggingsテーブルのインデックスが上手に使えてない気がしているので、WorkとTagが増えたときにめっちゃ遅くなりそうな気持ちはある。

image

toybox=# EXPLAIN ANALYSE SELECT works.* FROM works, taggings, tags
WHERE taggings.work_id = works.id AND taggings.tag_id = tags.id AND tags.id IN ('0d435e98-1b63-4a8f-9433-1cfa7840e024', '51216642-ae4e-48d7-80ee-741ec76b0fd5', 'af040833-7ab9-4fff-844a-c234041f3f71') GROUP BY works.id
HAVING count(works.id) = 3;
                                                                                       QUERY PLAN                                                                                        
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=24.97..25.05 rows=1 width=1850) (actual time=0.017..0.018 rows=0 loops=1)
   Group Key: works.id
   Filter: (count(works.id) = 3)
   ->  Sort  (cost=24.97..24.98 rows=4 width=1850) (actual time=0.016..0.017 rows=0 loops=1)
         Sort Key: works.id
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=11.00..24.93 rows=4 width=1850) (actual time=0.005..0.006 rows=0 loops=1)
               ->  Hash Join  (cost=10.86..21.76 rows=4 width=516) (actual time=0.004..0.005 rows=0 loops=1)
                     Hash Cond: ((taggings.tag_id)::text = (tags.id)::text)
                     ->  Seq Scan on taggings  (cost=0.00..10.70 rows=70 width=1032) (actual time=0.003..0.004 rows=0 loops=1)
                     ->  Hash  (cost=10.82..10.82 rows=3 width=516) (never executed)
                           ->  Seq Scan on tags  (cost=0.00..10.82 rows=3 width=516) (never executed)
                                 Filter: ((id)::text = ANY ('{0d435e98-1b63-4a8f-9433-1cfa7840e024,51216642-ae4e-48d7-80ee-741ec76b0fd5,af040833-7ab9-4fff-844a-c234041f3f71}'::text[]))
               ->  Index Scan using works_pkey on works  (cost=0.14..0.79 rows=1 width=1850) (never executed)
                     Index Cond: ((id)::text = (taggings.work_id)::text)
 Planning Time: 0.296 ms
 Execution Time: 0.112 ms
(17 rows)

routers/works/__init__.py Outdated Show resolved Hide resolved
@Simo-C3
Copy link
Member

Simo-C3 commented Jul 19, 2022

作品一覧取得のエンドポイントと分ける必要がないと思う。既存の作品一覧取得のエンドポイントのクエリパラメータとしてタグを渡して絞り込むような実装でいいと思いました。フロントの実装的にもエンドポイントを分けない方がやりやすいと思います。

@PigeonsHouse
Copy link
Collaborator Author

作品一覧取得のエンドポイントにtagsというクエリパラメータを受け取り,tagのidをカンマ区切りで受け取り,選んだタグが全て入っている作品のみ(AND検索)を表示するように修正いたしました.

cruds/works/__init__.py Outdated Show resolved Hide resolved
models.Work.visibility != models.Visibility.draft)
def get_works_by_limit(db: Session, limit: int, visibility: models.Visibility, oldest_id: str, tags: str, auth: bool = False) -> List[Work]:
tag_list = tags.split(',')
works_orm = db.query(models.Work).filter(models.Tagging.work_id == models.Work.id).filter(models.Tagging.tag_id == models.Tag.id).filter(models.Tag.id.in_(tag_list))
Copy link
Member

@Simo-C3 Simo-C3 Jul 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

今のコード
sql

SELECT works.id AS works_id, works.title AS works_title, works.description AS works_description, works.description_html AS works_description_html, works.user_id AS works_user_id, works.visibility AS works_visibility, works.created_at AS works_created_at, works.updated_at AS works_updated_at 
FROM works, taggings, tags 
WHERE taggings.work_id = works.id AND taggings.tag_id = tags.id AND tags.id IN (__[POSTCOMPILE_id_1]) GROUP BY works.id 
HAVING count(works.id) = %(count_1)s

↓に書き換え
python

works_orm = db.query(models.Work).filter(models.Tagging.tag_id.in_(tag_list)).filter(models.Tagging.work_id == models.Work.id)

sql

SELECT works.id AS works_id, works.title AS works_title, works.description AS works_description, works.description_html AS works_description_html, works.user_id AS works_user_id, works.visibility AS works_visibility, works.created_at AS works_created_at, works.updated_at AS works_updated_at 
FROM works, taggings 
WHERE taggings.tag_id IN (__[POSTCOMPILE_tag_id_1]) AND taggings.work_id = works.id GROUP BY works.id 
HAVING count(works.id) = %(count_1)s

効率的にtaggingのtag_idで先に絞ってから、work_idで絞る方がいいと思う。上に書いたのは一例だからもっといい他の方法があるかも

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ここ複雑だし、生SQLで書いたほうが可読性上がったりしないかと思った
JOINでやるのが良いのかな〜

SELECT ...
FROM works
INNER JOIN taggings ON taggings.work_id = works.id
WHERE taggings.tag_id IN ?
GROUP BY work.id
HAVING count(works.id) = ? -- ここはtag_idsの数

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

↑動くかは知らん

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

orm系はsqlで先に考えて書き換えた方がいい気がします。バイトの時もjoinとかネストとか複雑なsqlが必要になった時にsql文作ってからormの記法に書き換えてました。

生のsqlを書くのが可読性以外(セキュリティー、安定性)でマイナスにならなければ生sqlでもいいと思います

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

一応この部分のORMはSQLを書いた後でORMに書き換える手法で記述しました.
シモと同意で,セキュリティ的に問題が無いのかという不安があるのと,また,他の記述で生のSQLを採用する基準がはっきりしないので(ここ以外で生SQL使わない方針なら心配ないですけど),一括で統一したほうが楽かなとも思いました.
その点問題が解消するなら可読性の観点からも生SQLもありかと思います.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

作品の絞り込み方の修正は対応しました.

Copy link
Contributor

@rkun123 rkun123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[COMMENT] シモくんコメントのとこだけ直してもらえるとよさそう

This was referenced Jul 31, 2022
if tags:
tag_list = tags.split(',')
works_orm = works_orm.filter(models.Tagging.tag_id.in_(tag_list)).filter(models.Tagging.work_id == models.Work.id)
works_orm = works_orm.group_by(models.Work.id).having(func.count(models.Work.id) == len(tag_list))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

これって何してるん?
いまいち分かってなくて、、、

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tag_list = tags.split(',')

','で区切ったtag_idを配列にして,

works_orm = works_orm.filter(models.Tagging.tag_id.in_(tag_list)).filter(models.Tagging.work_id == models.Work.id)

Taggingテーブルからそのタグが付いたデータを抽出して,そのデータのwork_idと一致する作品情報をWorkテーブルから持ってきて,

works_orm = works_orm.group_by(models.Work.id).having(func.count(models.Work.id) == len(tag_list))

重複してる作品をGROUPで一纏めにして,重複したデータ数がタグの数と一致したもので更に絞り込んでます.
1つの作品と検索してるタグそれぞれのTaggingデータがあればちょうどタグの数の分だけ重複するから最後のhavingがついてます.

SQLっぽく書くと,

SELECT works.* FROM works, tagging
WHERE tagging.tag_id in ("tagid1", ..., "tagid2") AND tagging.work_id = works.id
GROUP BY works.id HAVING count(works.id) = 4; -- 4はタグの数

って感じです.

Copy link
Member

@Simo-C3 Simo-C3 Aug 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works_orm = works_orm.group_by(models.Work.id).having(func.count(models.Work.id) == len(tag_list))
重複してる作品をGROUPで一纏めにして,重複したデータ数がタグの数と一致したもので更に絞り込んでます.
1つの作品と検索してるタグそれぞれのTaggingデータがあればちょうどタグの数の分だけ重複するから最後のhavingがついてます.

なるほど、理解

works_orm = db.query(models.Work).order_by(models.Work.created_at).filter(
models.Work.visibility != models.Visibility.draft)
def get_works_by_limit(db: Session, limit: int, visibility: models.Visibility, oldest_id: str, tags: str, auth: bool = False) -> List[Work]:
works_orm = db.query(models.Work).order_by(desc(models.Work.created_at)).filter(models.Work.visibility != models.Visibility.draft)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

タグで絞り込んで作品数が減った後にソートした方が処理が早い気がする

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sqlalchemyはall()やfirst()などが付くまではSQLを構成するだけで実行はされないため全て記述した順番通りに実行されるわけではないと記憶しております.
また,そもそもPostgreSQLではソートを行ったあとで絞り込みを行うという記述は実行することが出来ないため,正常に動作していることからも問題ないと考えています.
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

そっか
ほんまやわw

@PigeonsHouse PigeonsHouse requested review from rkun123 and Simo-C3 August 3, 2022 02:17
@Simo-C3 Simo-C3 merged commit 478cf4b into main Aug 3, 2022
@Simo-C3 Simo-C3 deleted the feature/search_work_by_tag branch October 13, 2022 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

workのタグ検索機能
3 participants