-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/search work by tag #99
Conversation
フォーマッタが既存コードを片っ端からフォーマッティングしてる? |
一旦別ブランチでフォーマットは行うことを想定してフォーマットのみの変更点を切り戻すコミットを作ろうと思ってます. |
1cc762f
to
fc3f7fa
Compare
d583aef
to
7dd2398
Compare
cruds/works/__init__.py
Outdated
works_orm = works_orm.group_by(models.Work.id).having(func.count(models.Work.id) == len(tags)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[COMMENT]
とりあえず良さそう
taggings
テーブルのインデックスが上手に使えてない気がしているので、WorkとTagが増えたときにめっちゃ遅くなりそうな気持ちはある。
toybox=# EXPLAIN ANALYSE SELECT works.* FROM works, taggings, tags
WHERE taggings.work_id = works.id AND taggings.tag_id = tags.id AND tags.id IN ('0d435e98-1b63-4a8f-9433-1cfa7840e024', '51216642-ae4e-48d7-80ee-741ec76b0fd5', 'af040833-7ab9-4fff-844a-c234041f3f71') GROUP BY works.id
HAVING count(works.id) = 3;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=24.97..25.05 rows=1 width=1850) (actual time=0.017..0.018 rows=0 loops=1)
Group Key: works.id
Filter: (count(works.id) = 3)
-> Sort (cost=24.97..24.98 rows=4 width=1850) (actual time=0.016..0.017 rows=0 loops=1)
Sort Key: works.id
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=11.00..24.93 rows=4 width=1850) (actual time=0.005..0.006 rows=0 loops=1)
-> Hash Join (cost=10.86..21.76 rows=4 width=516) (actual time=0.004..0.005 rows=0 loops=1)
Hash Cond: ((taggings.tag_id)::text = (tags.id)::text)
-> Seq Scan on taggings (cost=0.00..10.70 rows=70 width=1032) (actual time=0.003..0.004 rows=0 loops=1)
-> Hash (cost=10.82..10.82 rows=3 width=516) (never executed)
-> Seq Scan on tags (cost=0.00..10.82 rows=3 width=516) (never executed)
Filter: ((id)::text = ANY ('{0d435e98-1b63-4a8f-9433-1cfa7840e024,51216642-ae4e-48d7-80ee-741ec76b0fd5,af040833-7ab9-4fff-844a-c234041f3f71}'::text[]))
-> Index Scan using works_pkey on works (cost=0.14..0.79 rows=1 width=1850) (never executed)
Index Cond: ((id)::text = (taggings.work_id)::text)
Planning Time: 0.296 ms
Execution Time: 0.112 ms
(17 rows)
作品一覧取得のエンドポイントと分ける必要がないと思う。既存の作品一覧取得のエンドポイントのクエリパラメータとしてタグを渡して絞り込むような実装でいいと思いました。フロントの実装的にもエンドポイントを分けない方がやりやすいと思います。 |
作品一覧取得のエンドポイントにtagsというクエリパラメータを受け取り,tagのidをカンマ区切りで受け取り,選んだタグが全て入っている作品のみ(AND検索)を表示するように修正いたしました. |
cruds/works/__init__.py
Outdated
models.Work.visibility != models.Visibility.draft) | ||
def get_works_by_limit(db: Session, limit: int, visibility: models.Visibility, oldest_id: str, tags: str, auth: bool = False) -> List[Work]: | ||
tag_list = tags.split(',') | ||
works_orm = db.query(models.Work).filter(models.Tagging.work_id == models.Work.id).filter(models.Tagging.tag_id == models.Tag.id).filter(models.Tag.id.in_(tag_list)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
今のコード
sql
SELECT works.id AS works_id, works.title AS works_title, works.description AS works_description, works.description_html AS works_description_html, works.user_id AS works_user_id, works.visibility AS works_visibility, works.created_at AS works_created_at, works.updated_at AS works_updated_at
FROM works, taggings, tags
WHERE taggings.work_id = works.id AND taggings.tag_id = tags.id AND tags.id IN (__[POSTCOMPILE_id_1]) GROUP BY works.id
HAVING count(works.id) = %(count_1)s
↓に書き換え
python
works_orm = db.query(models.Work).filter(models.Tagging.tag_id.in_(tag_list)).filter(models.Tagging.work_id == models.Work.id)
sql
SELECT works.id AS works_id, works.title AS works_title, works.description AS works_description, works.description_html AS works_description_html, works.user_id AS works_user_id, works.visibility AS works_visibility, works.created_at AS works_created_at, works.updated_at AS works_updated_at
FROM works, taggings
WHERE taggings.tag_id IN (__[POSTCOMPILE_tag_id_1]) AND taggings.work_id = works.id GROUP BY works.id
HAVING count(works.id) = %(count_1)s
効率的にtaggingのtag_idで先に絞ってから、work_idで絞る方がいいと思う。上に書いたのは一例だからもっといい他の方法があるかも
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ここ複雑だし、生SQLで書いたほうが可読性上がったりしないかと思った
JOINでやるのが良いのかな〜
SELECT ...
FROM works
INNER JOIN taggings ON taggings.work_id = works.id
WHERE taggings.tag_id IN ?
GROUP BY work.id
HAVING count(works.id) = ? -- ここはtag_idsの数
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
↑動くかは知らん
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
orm系はsqlで先に考えて書き換えた方がいい気がします。バイトの時もjoinとかネストとか複雑なsqlが必要になった時にsql文作ってからormの記法に書き換えてました。
生のsqlを書くのが可読性以外(セキュリティー、安定性)でマイナスにならなければ生sqlでもいいと思います
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
一応この部分のORMはSQLを書いた後でORMに書き換える手法で記述しました.
シモと同意で,セキュリティ的に問題が無いのかという不安があるのと,また,他の記述で生のSQLを採用する基準がはっきりしないので(ここ以外で生SQL使わない方針なら心配ないですけど),一括で統一したほうが楽かなとも思いました.
その点問題が解消するなら可読性の観点からも生SQLもありかと思います.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
作品の絞り込み方の修正は対応しました.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[COMMENT] シモくんコメントのとこだけ直してもらえるとよさそう
if tags: | ||
tag_list = tags.split(',') | ||
works_orm = works_orm.filter(models.Tagging.tag_id.in_(tag_list)).filter(models.Tagging.work_id == models.Work.id) | ||
works_orm = works_orm.group_by(models.Work.id).having(func.count(models.Work.id) == len(tag_list)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
これって何してるん?
いまいち分かってなくて、、、
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tag_list = tags.split(',')
','で区切ったtag_idを配列にして,
works_orm = works_orm.filter(models.Tagging.tag_id.in_(tag_list)).filter(models.Tagging.work_id == models.Work.id)
Taggingテーブルからそのタグが付いたデータを抽出して,そのデータのwork_idと一致する作品情報をWorkテーブルから持ってきて,
works_orm = works_orm.group_by(models.Work.id).having(func.count(models.Work.id) == len(tag_list))
重複してる作品をGROUPで一纏めにして,重複したデータ数がタグの数と一致したもので更に絞り込んでます.
1つの作品と検索してるタグそれぞれのTaggingデータがあればちょうどタグの数の分だけ重複するから最後のhavingがついてます.
SQLっぽく書くと,
SELECT works.* FROM works, tagging
WHERE tagging.tag_id in ("tagid1", ..., "tagid2") AND tagging.work_id = works.id
GROUP BY works.id HAVING count(works.id) = 4; -- 4はタグの数
って感じです.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
works_orm = works_orm.group_by(models.Work.id).having(func.count(models.Work.id) == len(tag_list))
重複してる作品をGROUPで一纏めにして,重複したデータ数がタグの数と一致したもので更に絞り込んでます.
1つの作品と検索してるタグそれぞれのTaggingデータがあればちょうどタグの数の分だけ重複するから最後のhavingがついてます.
なるほど、理解
works_orm = db.query(models.Work).order_by(models.Work.created_at).filter( | ||
models.Work.visibility != models.Visibility.draft) | ||
def get_works_by_limit(db: Session, limit: int, visibility: models.Visibility, oldest_id: str, tags: str, auth: bool = False) -> List[Work]: | ||
works_orm = db.query(models.Work).order_by(desc(models.Work.created_at)).filter(models.Work.visibility != models.Visibility.draft) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
タグで絞り込んで作品数が減った後にソートした方が処理が早い気がする
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
そっか
ほんまやわw
No description provided.