-
Notifications
You must be signed in to change notification settings - Fork 26
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this progresses, I hope the comments below are useful.
Also, we should consider using psl or another maintained and tested module. I don't think it's worth our time to write our own public suffix implementation; the (meta) rules have subtle gotchas.
return filter.split(/\s+/); | ||
} | ||
|
||
function tryMakeURL(string, defaultProtocol = null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what the user types, or the extension grabs from the activeTab (let's call this target
), using exactly what is provided is appropriate. For what we get from a saved entry in Lockbox (even if the entry had "http:") (let's call this allowed
), I am far less sure.
Matching on "http:" carries a lot of risk. It's ok to for "http://localhost' (or "http://127.0.0.1", or "http://[::1]"); this is considered a secure origin (at least for now). For other hosts/ports, I worry we are potentially exposing users to exfiltration. Also, matching on other origins carries risk, too (e.g., "javascript:").
This is probably a long way of saying the data massaging here needs to be more robust; my personal preference is sooner than later, but it'll need to happen.
[I do see the valid protocol constraints, but they are far from the rest of this constraints logic]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where would we want this validation to happen? If we want to be strict about what's allowed in the datastore, should we just throw an exception when trying to save an entry if the user specifies a "bad" origin? At the moment, this code is assuming that the origin could be literally anything, since we have no constraints on what a user enters into the datastore. However, we could mandate that origins take a particular form, and possibly even normalize them when saving them (e.g. www.google.com
is normalized to https://www.google.com
). I'd actually prefer this to what I have now, since it's one less place where I have to make guesses about what the user meant.
For HTTP, how do we balance the fact that some sites just don't use HTTPS (and users would like to log into them) with the potential for some kind of MITM attack? Note that the code here only applies to the doorhanger, which is in chrome, not content. When we have a matcher used by (semi-) autofill, we may well choose to be even more restrictive. For example, I'd be ok with a plan where we say, "Autofill only works on HTTPS, but you can get pre-filtered results in the doorhanger even on HTTP".
I also think it would make sense to provide more restrictions here (e.g. constraining protocols), but we'd probably also have the constraints where they are now (or in a Redux action creator) since I don't think we should even fill in the search box on URLs with "bad" protocols. That said, a lot of this stuff will probably move to a common file somewhere once it's fleshed out a bit better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where would we want this validation to happen? If we want to be strict about what's allowed in the datastore, should we just throw an exception when trying to save an entry if the user specifies a "bad" origin? At the moment, this code is assuming that the origin could be literally anything, since we have no constraints on what a user enters into the datastore. However, we could mandate that origins take a particular form, and possibly even normalize them when saving them (e.g.
www.google.com
is normalized tohttps://www.google.com
). I'd actually prefer this to what I have now, since it's one less place where I have to make guesses about what the user meant.
The datastore should have been more restrictive, and we can (should) make it more restrictive now (ish). We still have to deal with existing data, though.
For HTTP, how do we balance the fact that some sites just don't use HTTPS (and users would like to log into them) with the potential for some kind of MITM attack? Note that the code here only applies to the doorhanger, which is in chrome, not content. When we have a matcher used by (semi-) autofill, we may well choose to be even more restrictive. For example, I'd be ok with a plan where we say, "Autofill only works on HTTPS, but you can get pre-filtered results in the doorhanger even on HTTP".
Since this code only applies to doorhanger's filter, I suppose for now it's ok to match on http:
also. Howver, it is MUST NOT to extend that to our upcoming fill work, unless security-review finds it acceptable.
for (let i of values) { | ||
if (match(filterElement, i)) { | ||
if (matcher(filterElement, i)) { | ||
return true; | ||
} | ||
} | ||
return false; | ||
} | ||
|
||
export function filterItem(filter, item) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at these changes, my sense is filtering needs to get more differentiating -- instead of relying on a string that is (potentially mis)interpreted in a few ways, it's an object that better outlines what the filter intent is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this means. What part of filtering needs to be converted from a string to an object? The visible representation of a filter should be a string, since it's what goes into the text field, and one of the main reasons for having filters in the first place is as something a user could type.
If you're suggesting that the automatically-generated filter should only find results in the origin
field of the item, I agree and that's #627. I'd like it to be something like Google's mini-format for search queries, e.g. url:https://www.example.com
(there'd be other prefixes too, like title:
, username:
, and maybe one day, tag:
).
@@ -26,7 +26,7 @@ browser.tabs.query({ active: true, currentWindow: true }).then((result) => { | |||
const validProtocols = ["http:", "https:"]; | |||
const url = new URL(result[0].url); | |||
if (validProtocols.includes(url.protocol)) { | |||
store.dispatch(filterItems(url.hostname)); | |||
store.dispatch(filterItems(url.protocol + "//" + url.host)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have expected url
to be passed in directly here, rather than breaking it up. This feels like the constraints are spread around the codebase; that will make it harder for us to understand the implications and limitations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't (well, we shouldn't) pass the URL object, since that goes right into the Redux state, and Redux is supposed to be just JSON (as I understand it, this is very helpful for Redux devtools). We could make a new Redux action creator, but it'd do pretty much exactly what this does before getting stored in Redux.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a reasonable argument against passing the URL directly.
However, this still needs to be hostname
and not host
. It needs to keep the optional port, or it risks false positives. For the vast majority of people for the vast majority of their entries, hostname
and host
are identical, so I think false negatives by real users will be rare.
I can see an argument for treating the lack of an explicit port as matching any port, but I'd like to get sign off from security-review.
@linuxwolf Good call on using
Next up, I'd like to break some of these bits out into a generic "URL Utilities" file, since much of this can be used by autofill as well. Both |
return match(filterElement.string, value.string); | ||
} | ||
return (filterElement.url.protocol === value.url.protocol && | ||
filterElement.url.host === value.url.host); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at this more, I'm less and less comfortable with the missing port.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
URL.host
includes the port if it's not the default port for that protocol. So:
- For
https://www.google.com
,URL.host
iswww.google.com
- For
https://www.google.com:8000
,URL.host
iswww.google.com:8000
I plan to make this a lot more explicit when I build some common functions to use here (i.e. I'll write documentation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize for that; I keep having them swapped in my head.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries; this is why we need documentation (and instinctively, hostname
is longer than host
, so the value seems like it should be longer too!).
@@ -26,7 +26,7 @@ browser.tabs.query({ active: true, currentWindow: true }).then((result) => { | |||
const validProtocols = ["http:", "https:"]; | |||
const url = new URL(result[0].url); | |||
if (validProtocols.includes(url.protocol)) { | |||
store.dispatch(filterItems(url.hostname)); | |||
store.dispatch(filterItems(url.protocol + "//" + url.host)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a reasonable argument against passing the URL directly.
However, this still needs to be hostname
and not host
. It needs to keep the optional port, or it risks false positives. For the vast majority of people for the vast majority of their entries, hostname
and host
are identical, so I think false negatives by real users will be rare.
I can see an argument for treating the lack of an explicit port as matching any port, but I'd like to get sign off from security-review.
This is still very WIP, but it should give an idea of how I plan to pass the data around. The general idea is that when the popup opens, we fill in the protocol, host, and port (if the port is default, it's implicitly defined by the protocol), which then gets passed to the filter code.
The filter code rebuilds a URL object and tries to match it to URL objects for the origins of each item. Origins are allowed to omit their protocol, in which case we treat them as HTTPS. If the search query omits its protocol (this will only happen when the user is typing in the filter themselves), we just do a string comparison. This way, you only have to type part of a URL (e.g. "goo" for "https://www.google.com") to filter the results down when you're searching manually.
Up next, I plan to make the matching on valid URLs work closer to our plan.