Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Percent encode the legacy_url_path when requesting Whitehall assets #794

Merged

Conversation

chrisroos
Copy link
Contributor

@chrisroos chrisroos commented Jan 11, 2018

We've found a number of Whitehall assets whose filenames (and therefore
their legacy_url_path) contain non-ascii characters.

We need to URI encode the legacy_url_path to avoid RestClient, and
then the gds-api-adapters, from raising an InvalidUrl exception (as per
this example in Sentry1). Addresses alphagov/asset-manager#384.

I've used the Addressable Gem to avoid the Rubocop violation warning
that I saw when making a similar change to Asset Manager in
alphagov/asset-manager#396.

We've found a number of Whitehall assets whose filenames (and therefore
their legacy_url_path) contain non-ascii characters.

We need to URI encode the legacy_url_path to avoid `RestClient`, and
then the gds-api-adapters, from raising an InvalidUrl exception (as per
this example in Sentry[1]). Addresses issue #384.

I've used the Addressable Gem to avoid the Rubocop violation warning
that I saw when making a similar change to Asset Manager in
alphagov/asset-manager#396.

[1]: https://sentry.io/govuk/app-whitehall/issues/429050852/
@chrisroos chrisroos force-pushed the handle-non-ascii-legacy-url-paths-for-whitehall-assets branch from c11544b to 998c702 Compare January 11, 2018 16:16
@floehopper floehopper self-assigned this Jan 11, 2018
Copy link
Contributor

@floehopper floehopper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 👍

@chrisroos
Copy link
Contributor Author

Thanks @floehopper. I'll get this merged now.

@chrisroos chrisroos merged commit e3fff86 into master Jan 11, 2018
@chrisroos chrisroos deleted the handle-non-ascii-legacy-url-paths-for-whitehall-assets branch January 11, 2018 16:48
chrisroos added a commit to alphagov/whitehall that referenced this pull request Jan 11, 2018
So that I can get the change in
alphagov/gds-api-adapters#794 which should allow
me to fix alphagov/asset-manager#384.
chrislo added a commit to alphagov/whitehall that referenced this pull request Jan 29, 2018
#3728

When the `legacy_url_path` contains non-ascii characters, generating a
URI with `URI.join` raises an `URI::InvalidURIError`. We fix this by
percent-encoding `legacy_url_path` before joining.

We fixed a similar issue[1] in `gds_api_adapters` in this way.

[1] alphagov/gds-api-adapters#794
ollietreend added a commit to alphagov/whitehall that referenced this pull request Feb 8, 2022
This fixes a bug where CSVs couldn't be previewed if their filenames included non-ASCII/special characters. This bug was observed in production as Sentry issue [APP-WHITEHALL-8SQ][sentry].

Special characters in file names were causing the following error:
```
URI::InvalidURIError: URI must be ascii only "/government/uploads/system/uploads/attachment_data/file/729119/page_60_details_of_closed_cases_over_u\u0301300_000.csv"
```

This was because the app was attempting to download the raw CSV file over HTTP, but without escaping the CSV filename first to make it URL safe.

`Addressable::URI.escape` is used rather than `CGI.escape` to avoid escaping "/" slash characters, which are expected and valid in this context.

There is some prior art for this fix, so it's in good company:
- ac3c346 protects against the same error in `AssetManagerStorage`
- alphagov/gds-api-adapters#794
- alphagov/asset-manager#396

[sentry]: https://sentry.io/organizations/govuk/issues/2986708524/events/a5b647ed215e44ae9d4807f6debe47bb/?project=202259
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants