FileNameNormalizer/URLNormalizer does not support files with long file extensions #65

pgrunewald · 2024-06-24T12:19:58Z

The current implementation does not properly normalize filenames with long file extensions (i.e. extensions longer than 4 characters). For example: base.geojson

When I upload this file, the generated ID for the filename is base-geojson instead of base.geojson. When downloading, however, the filename of the download is the correct base.geojson. Shouldn't the FileNameNormalizer also cause the dash here?

The proposed solution would be to update the FILENAME_REGEX from:

FILENAME_REGEX = re.compile(r"^(.+)\.(\w{,4})$")

to:

FILENAME_REGEX = re.compile(r"^(.+)\.(\w+)$")
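
To illustrate the difference between the two patterns, here is a minimal sketch. The helper split_extension is hypothetical and only mimics how a normalizer might separate base name and extension; it is not the actual FileNameNormalizer code, and the OLD_/NEW_ prefixes are added here just to compare both variants side by side.

```python
import re

# The two patterns quoted above, renamed so both can be compared.
OLD_FILENAME_REGEX = re.compile(r"^(.+)\.(\w{,4})$")
NEW_FILENAME_REGEX = re.compile(r"^(.+)\.(\w+)$")


def split_extension(regex, filename):
    """Hypothetical helper: return (base, extension).

    If the pattern does not match, the whole name is treated as the
    base and the extension is None. In that case a normalizer would
    see the dot as part of the base name.
    """
    match = regex.match(filename)
    if match is None:
        return (filename, None)
    return match.groups()


for name in ("photo.jpeg", "base.geojson"):
    print(name, split_extension(OLD_FILENAME_REGEX, name),
          split_extension(NEW_FILENAME_REGEX, name))
```

With the old pattern, base.geojson does not match at all, because the 7-character extension exceeds the limit of 4, so the dot ends up in the part that gets normalized. One side effect worth noting: \w{,4} also accepts an empty extension (a name ending in a dot), whereas \w+ requires at least one character.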

Additionally, we should check whether FileNameNormalizer is actually used, since it should have been applied to the filename for downloads.


davisagli · 2024-06-26T01:32:10Z

Makes sense to me.

pgrunewald added 01 type: bug 14 prio: low 41 lvl: easy labels Jun 24, 2024
