-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BackfillTransactionSyncBatchJob sql queries are inefficient #247
Comments
Thanks for the detailed report and your proposal. 👍 I agree that there are a bunch of straightforward performance improvements we should make to this. |
We've confirmed this by testing if
The assumption we made is that if the start date is not set when are backfilling transactions, the user wants to backfill all existing orders. For anyone who wants to narrow the set of orders, the expectation is that they will set the start and the end dates for backfilling. |
You're right, if someone gives a start date. However, Taxjar only allows you to backfill up to the current calendar year or a few months for free. After that, you have to pay per year. It could be dangerous to offer an empty date as the default option. I would personally require to put a backfill date and set it as that so the user knows they have to make an explicit choice on the date. |
I have been testing the backfills a bit over the last couple months on production and I have some thoughts on efficiency. Lets take a look at the code for the backfill batch job:
The first line:
complete_orders = ::Spree::Order.complete.where(shipment_state: 'shipped')
By itself, that's an extremely expensive sql query. If a store has hundreds of thousands, or millions of orders, this could take a VERY long time, or freeze the thread. This has happened to me. There needs to be some lower bound to the date. Either require the
start_date
or have a date in the admin for when Taxjar started. For instance, we've been using spree / solidus since Dec 2010, but started using Taxjar April 2021. We backfilled to Feb 2021 when we started. A good default date would be Feb 2021 for thetaxjar_start_date
. That would prevent 10+ years of orders from getting queried.next if order.taxjar_order_transactions.any?
this is an N+1 query since it fetches
taxjar_order_transactions
for each order in the loop. This can be resolved with anincludes
andwhere
what I propose:
thoughts?
The text was updated successfully, but these errors were encountered: