Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signed url upload #1099

Open
wants to merge 3 commits into
base: generic-storage
Choose a base branch
from
Open

Signed url upload #1099

wants to merge 3 commits into from

Conversation

davidfarkas
Copy link
Contributor

With this PR the clients will be able to upload files using signed URLs. The only limitation is that only one file can be uploaded per request.
The packfile upload is not supported because it is not possible to create the zip file without the API downloads the files again.
It is not a braking change, the upload endpoints are backward compatible, so the clients can use the old way.

Review Checklist

  • Tests were added to cover all code changes
  • Documentation was added / updated
  • Code and tests follow standards in CONTRIBUTING.md

@codecov-io
Copy link

Codecov Report

Merging #1099 into generic-storage will increase coverage by 0.02%.
The diff coverage is 87.73%.

@@                 Coverage Diff                 @@
##           generic-storage    #1099      +/-   ##
===================================================
+ Coverage            91.05%   91.07%   +0.02%     
===================================================
  Files                   49       49              
  Lines                 7067     7139      +72     
===================================================
+ Hits                  6435     6502      +67     
- Misses                 632      637       +5

Copy link
Contributor

@ambrussimon ambrussimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job!
Reusing placers as-is is a fat 👍 for both the placer abstraction's creator and for you for embracing it.

I left a couple of comments and would also like to see additional integration tests (marked with xfail if using osfs, eg. in travis).

@@ -248,7 +248,7 @@ def initialize_db():
log_db.access_log.create_index([('timestamp', pymongo.DESCENDING)])

create_or_recreate_ttl_index('authtokens', 'timestamp', 2592000)
create_or_recreate_ttl_index('uploads', 'timestamp', 60)
#create_or_recreate_ttl_index('uploads', 'timestamp', 60)
Copy link
Contributor

@ambrussimon ambrussimon Mar 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of disabling, I think this needs an appropriate timeout instead. It feels like it should be roughly in sync with the job_tickets ttl, as the most important user of this feature would be the engine.

@@ -348,6 +350,25 @@ class FileListHandler(ListHandler):
def __init__(self, request=None, response=None):
super(FileListHandler, self).__init__(request, response)

def _create_ticket(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename to _create_upload_ticket instead to avoid confusion with download-related methods. At the same time, rename download methods, too. (eg. _check_ticket -> _check_download_ticket)

@@ -547,7 +547,7 @@ def finalize(self):
})

# Similarly, create the attributes map that is consumed by helper funcs. Clear duplication :(
# This could be coalesced into a single map thrown on file fields, for example.
# This could be coalesced into a co single map thrown on file fields, for example.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix comment.

return self._create_ticket()

# Check ticket id and skip permissions check if it clears
ticket_id = self.get_param('ticket')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having URL params 'ticket' and 'job_ticket' on the same endpoint seems very confusing. I would suggest renaming this one to 'upload_ticket' unless there's a problem with that I don't see yet.

})
]

if level is not 'analysis':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can analysis job results be uploaded directly to storage (using a ticket)?

if ticket_id:
ticket = self._check_ticket(ticket_id)
if not self.origin.get('id'):
# If we don't have an origin with this request, use the ticket's origin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what situation would you expect us to not have an origin? Above we check to make sure the request has admin rights, so we only allow authenticated requests for this endpoint.

@nagem
Copy link
Contributor

nagem commented Mar 23, 2018

I'm not sure what kind of Swagger coverage we have for the engine and reaper upload endpoints, but could you amend and add information about how to use the signed URL ticket method?

if not ticket:
self.abort(404, 'no such ticket')
if ticket['ip'] != self.request.client_addr:
self.abort(400, 'ticket not for this resource or source IP')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could argue this is a 403 because ip is our rough permission check (other than ownership of the ticket).

strategy = strategy.replace('-', '')
strategy = getattr(Strategy, strategy)
else:
self.abort(500, 'strategy {} not implemented'.format(strategy))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the 500 here because something not accounted for won't make it past the routing logic in api.py.

if ticket_id:
ticket = self._check_ticket(ticket_id)
if not self.origin.get('id'):
# If we don't have an origin with this request, use the ticket's origin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as before - I don't think it's possible to have an empty origin with an authenticated request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants