Extend and explain in more detail how to upload large files. #315
Conversation
Force-pushed from ae297f4 to f082527
@tsodring and @ivaylomitrev, please have a look at the rewritten example and description and let me know whether they make sense and seem correct to you.
[ivaylomitrev]
> It seems correct to me on a general level, but I have several
> (not-so-small) concerns on the topic:
Thank you for taking the time to review the proposal.
> 1. I do not see a reason to limit "small" file uploads. If a client
> decides to upload a 5GB file as a "small" file that is up to them. I
> would assume that in such a case, they simply do not want a resumable
> session and are OK with repeating the upload if necessary (i.e. if it
> fails). In other words, to me the distinction is between resumable and
> non-resumable upload, not between small and large file upload and I do
> not see the need for limitations. Also, 1.0 and 1.1 _imply_ that
> resumable file upload _can be_ used for files >150MB, but does not
> _prevent_ the upload of >150 MBs using the "small file upload
> implementation". To the contrary, the distinction between the two
> contracts is based on headers, not based on size. This is a
> _backwards-incompatible_ change that would break any client
> implementations that have used the "small file upload" contract for
> files > 150MB as vendors will be required to start rejecting them.
I believe this is a misunderstanding on several levels. First of all,
there are scenarios where long-running HTTP sessions are blocked or
cut, which can be handled by doing chunked uploads across several HTTP
sessions. I have personally experienced this with a transparent proxy
administered by a third party causing problems. This is independent of
any wish for resumable upload. Second, a server implementation
specifying bulkgrense=0 does not need to reject any sizes using the
small file upload method, so those servers can keep handling clients
that ignore the recommendation to upload files larger than 150 MiB in
chunks. Clients following the recommendation will keep working with
any server implementing the specification too. As far as I know, the
only clients that might break against servers following this new
specification are those that ignore the recommended 150 MiB cutoff
point and upload large files using the small file upload method
without checking the server's limit in bulkgrense. Rejecting such
uploads is both expected and intended.
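To make the decision concrete, here is a minimal client-side sketch in Python. The bulkgrense value, its zero-means-unlimited semantics, and the 150 MiB recommendation are from this proposal; the assumption that bulkgrense is read from the admin/system resource as a JSON field, and the endpoint path itself, are illustrative guesses, not part of the specification.

```python
import requests

# Recommended cutoff from the proposal: 150 MiB.
DEFAULT_BULK_LIMIT = 150 * 1024 * 1024

def choose_upload_method(base_url: str, file_size: int) -> str:
    """Pick the small or chunked upload method for a file.

    Hypothetical JSON shape: assumes bulkgrense is published as a
    field of the admin/system resource.  bulkgrense == 0 is taken to
    mean the server accepts any size via the small-file method.
    """
    system = requests.get(f"{base_url}/admin/system").json()
    bulk_limit = system.get("bulkgrense", DEFAULT_BULK_LIMIT)
    if bulk_limit == 0 or file_size <= bulk_limit:
        return "small"
    return "chunked"
```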
> 2. The query parameter `filsesjon` change to `ref` is also
> backwards-incompatible. Any integration developed against a 1.0/1.1
> vendor implementation needs to be _modified_ to support 1.2 meaning
> that vendors cannot upgrade to 1.2 seamlessly without client
> changes. In the real world, it is nigh impossible to request client
> changes for such scenarios. My advice would be to still allow
> `filsesjon` as fallback if `ref` is missing in 1.2. If not, I believe
> the version should be bumped to 2.0 as a result of that. Even if it is
> bumped to 2.0, there are still other problems with
> backwards-incompatibility in the specification, however. Please check
> my comment at the end.
The values of href fields are not standardized in this specification,
and it is entirely up to the server to decide what they should
contain. The same is the case with the content of the Location
header. I will add some text to make this clearer. I changed the
examples to avoid bad language caused by questionable translation
(English 'session' = Norwegian 'økt', not 'sesjon'), but I place no
new constraints on what servers need to put in their href fields.
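In other words, a client should treat the returned URI as opaque rather than parsing or rebuilding it. A minimal sketch of that idea, assuming a Python requests response object (the function name is illustrative):

```python
def next_chunk_uri(response) -> str:
    """Extract the URI for the next chunk PUT from a response.

    The Location value is opaque to the client: whether it carries
    ?ref=..., ?filsesjon=..., or no query parameter at all is purely
    a server implementation detail, so never construct it yourself.
    """
    return response.headers["Location"]
```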
> 3. I see a sequence number appended to the "session" in the examples -
> `ref=abc1234567-1`, `ref=abc1234567-2`, etc., and I am not sure I
> understand where the requirement originates from. I do not see this as
> described and I also do not see the need of it considering that an
> implementation can reject an out-of-place "chunk" based on the Range
> and Content-Range headers alongside the state of the chunks maintained
> on the server-side (since this is a stateful implementation after
> all). I believe this _sequence number_ is derived information that is
> not needed and simply introduces noise and complexity. It is also
> _backwards-incompatible_ which implies it would require a version
> change to 2.0 and would prevent vendors from supporting both 1.x and
> the new format.
I added these sequence numbers to make it clearer in the example that
the Location header returned is the one to use in the next PUT. It is
already expressed in the protocol description: "Responsen du mottar vil
inneholde en Location-Header som inneholder en økt-URI som skal benyttes
i neste PUT-forespørsel for å overføre neste bolk av filen." (in
English: "The response you receive will contain a Location header with
a session URI to be used in the next PUT request to transfer the next
chunk of the file."), but this was not made obvious in the example. I
hope it is more obvious now. A server implementation is free to reuse
the same URL in every response and allow multiple PUTs to the same URL,
but that is not required by the specification.
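Put together, the sequential chunk loop might look like the following Python sketch. The Content-Range header format and the Location chaining are from the proposal; the session-creation step (not shown), the chunk size, and the function names are hypothetical.

```python
import os
import requests

CHUNK_SIZE = 8 * 1024 * 1024  # illustrative; any server-acceptable size

def chunked_upload(first_chunk_uri: str, path: str):
    """Upload a file one chunk at a time, strictly in sequence.

    first_chunk_uri is assumed to come from an earlier request that
    opened the resumable session.  After every PUT, the Location
    header of the response names the URI for the next PUT; a server
    may return the same URI each time or a fresh one.
    """
    total = os.path.getsize(path)
    uri = first_chunk_uri
    offset = 0
    resp = None
    with open(path, "rb") as f:
        while offset < total:
            chunk = f.read(CHUNK_SIZE)
            end = offset + len(chunk) - 1
            resp = requests.put(
                uri,
                data=chunk,
                # Content-Range: bytes first-last/total (RFC 9110 form)
                headers={"Content-Range": f"bytes {offset}-{end}/{total}"},
            )
            resp.raise_for_status()
            # Follow the server-chosen URI; it is opaque to the client.
            uri = resp.headers.get("Location", uri)
            offset = end + 1
    return resp
```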
> A note on the _backwards-incompatible_ mentions in my comment:
> The specification only (indirectly) requires OData version to be
> transferred between the client and server. It does not, in any way,
> allow a client and server to negotiate on the version to be used other
> than the purely informational admin/system. Even if admin/system was
> not intended as informational, a server implementation upgrading its
> version automatically and inadvertently breaks all clients which only
> expect a lower version to have been supported. As a result, performing
> a 1.y change that _breaks_ previous 1.x requirements cannot be
> communicated effectively between the client and server. As a result, a
> 1.2 vendor implementation (assuming such backwards-incompatible
> changes will go there) cannot respond to a client with the 1.1 format
> as the client has not indicated, in any way, that it only supports
> 1.1, but _not_ any versions above that. As a result, a vendor
> implementation cannot actually upgrade its implementation to 1.2 as it
> cannot communicate with clients supporting different versions
> simultaneously. This means that vendor implementations are _locked_ at
> the version at which the first client integration was performed with
> no way to support newer features.
I do not know OData well enough to quite understand what you mean
here. I have assumed that the OData reference pointed to a
specification that is backwards compatible with earlier versions of
OData. If this is not the case, I guess we might need to handle OData
differently. In any case, so far N5TG has only referred to OData 4.01,
so no backwards or forwards compatibility issue should be present at
the moment. Did I misunderstand?
If we need to handle multiple versions of OData, do you have any ideas
how to do that?
--
Happy hacking
Petter Reinholdtsen
[ivaylomitrev]
> I only gave OData as an example, but I meant to the N5WS
> specification. Since different 1.x versions will now require different
> handling on the client-side, this means that there must be a way for
> the server and client to negotiate the "version"(format) they will be
> communicating in. Otherwise a 1.1 client cannot talk to a 1.2 server
> since certain fields will change as part of this pull request.
As far as I can tell, the 1.0 and 1.1 versions are backwards
compatible. Once we come up with backwards-incompatible changes, we
should also come up with a version negotiation method. Personally, I
prefer to keep the specification backwards compatible.
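Until such a negotiation method exists, the closest a client can get is to read the informational version from admin/system. A hypothetical sketch; the thread only says admin/system carries version information, so the field name "versjon" and the JSON shape are guesses:

```python
import requests

def published_version(base_url: str):
    """Read the purely informational N5WS version from admin/system.

    Hypothetical field name "versjon"; the specification does not
    define a version negotiation mechanism, so this is at best a
    hint to the client, not a contract.
    """
    system = requests.get(f"{base_url}/admin/system").json()
    return system.get("versjon")  # e.g. "1.1"
```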
--
Happy hacking
Petter Reinholdtsen
I think this can go in, but it likely needs implementations from both the server and client perspectives to identify potential weaknesses.
Force-pushed from 30ac9f7 to 963487e
Introduce new bulkgrense system value to allow clients to learn exactly how small
"small" files are. Specify how to reject large files sent using the small method.
Explain the format of the Range and Content-Range headers. Make it clear that the
upload needs to be done in sequence (not out of order) by specifying that the
Location header returned after each bulk upload is to be used in the next bulk upload.
Drop the Google Drive reference. It does not really explain much, the URL is broken,
and this specification should describe its own protocol as it is not identical to
the Google Drive protocol.
Fixes #313.