New spec: Request body canonicalization #141
Comments
I had an attempt at documenting it while writing a Java implementation: http://iipc.github.io/warc-specifications/guidelines/cdx-non-get-requests/ There are some interesting quirks to it. JSON null is encoded as the string "None". It can also produce slightly different output when run on old versions of Python (< 3.7).
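To illustrate the "None" quirk, here is a minimal sketch. It assumes the quirk comes from stringifying parsed JSON values with Python's `str()`, which is my guess at the cause and is not taken from pywb's code. Both helper names are hypothetical, and the sketch only handles a flat JSON object.

```python
import json
from urllib.parse import urlencode


def flatten_json_pythonic(body: str) -> str:
    """Hypothetical helper: stringify each value with Python's str()."""
    data = json.loads(body)
    # str(None) is "None" and str(True) is "True", so JSON null/true
    # leak through as Python spellings.
    return urlencode([(key, str(value)) for key, value in data.items()])


def flatten_json_native(body: str) -> str:
    """Hypothetical helper: keep JSON spellings for non-string values."""
    data = json.loads(body)
    pairs = []
    for key, value in data.items():
        # Strings pass through unchanged; everything else is re-serialized
        # as JSON, so None becomes "null" and True becomes "true".
        pairs.append((key, value if isinstance(value, str) else json.dumps(value)))
    return urlencode(pairs)


body = '{"id": 7, "tag": null, "active": true}'
pythonic = flatten_json_pythonic(body)
native = flatten_json_native(body)
print(pythonic)
print(native)
```

The two outputs differ only in how `null` and `true` are spelled, which is enough to make the resulting index keys diverge between implementations.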
Thank you @ato, this is very helpful! Documenting the quirks especially - I'll have to look into the Python < 3.7 issues!
We will be modifying pywb and warcio.js to be consistent - necessary issues have been opened. Notably, we'll be making pywb use native JSON values rather than Pythonic values (e.g. We'll need to make sure that this isn't a breaking change and that already-canonicalized URLs created by pywb will either continue to work with pywb and wabac.js's fuzzy matching (preferred) or that we have a conversion process available.
@tw4l I have the modified outbackcdx from @ato using index version 5 running and indexed a warc using However, I can't get the replay of POST requests working. It seems the __wb_method and value parameters are not getting through to outbackcdx (latest pywb 2.7.4) in the request from pywb. Before looking too far - did I miss something in the config? (I'm using
Pywb needs this patch to pass them through:
It works very nicely now, thanks @ato! Do you see any problems with using index version 5 in production? (besides having to recreate the index if the spec changes at a later time)
We run index version 5 in production and haven't had any issues so far. The only reason it's not the default yet is because the upgrade process needs a bit more polishing.
Sounds good! Mentioning upgrade... so there is a way to upgrade the index? (so far I only used a newly created index) It would of course simplify things a lot.
I've written up some notes about upgrading here: nla/outbackcdx#117
POST canonicalization (or POST append) is implemented in pywb, warcio.js, and cdxj-indexer, but doesn't have a written specification.
This was first implemented in pywb in webrecorder/pywb#636. Various implementations are described here: webrecorder/replayweb.page#69 (comment)
A proper specification would likely help in resolving issues such as webrecorder/pywb#768 as well as generally making it clear to users and developers the rationale and expected behavior for POST canonicalization.
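As a rough illustration of the "POST append" idea, the sketch below folds a form-encoded request body into the URL's query string together with a `__wb_method` parameter (the parameter name comes from this thread). The function name, the parameter ordering, and the encoding rules are assumptions for illustration only. They are not the behavior of pywb, warcio.js, or cdxj-indexer, and they leave out JSON bodies, multipart bodies, and length limits.

```python
from urllib.parse import parse_qsl, urlencode


def append_post_body(url: str, method: str, body: bytes) -> str:
    """Hypothetical helper: fold a form-encoded body into the lookup URL."""
    # Record the method first so GET and POST captures of the same URL
    # end up under different keys.
    params = [("__wb_method", method.upper())]
    # Decode the application/x-www-form-urlencoded body into key/value pairs.
    params += parse_qsl(body.decode("utf-8"), keep_blank_values=True)
    # Extend an existing query string, or start a new one.
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode(params)


example = append_post_body("https://example.com/api?x=1", "post", b"q=warc&page=2")
print(example)
```

Under these assumptions the example prints `https://example.com/api?x=1&__wb_method=POST&q=warc&page=2`. A written specification would pin down exactly the choices this sketch makes arbitrarily.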