.pdf files can be detected as .ai based on content #582

eric-yuan-vanta · 2023-02-15T21:07:29Z

When PDF files contain images created in Photoshop or Adobe Illustrator, file-type detects them as .ai because of the byte-checking heuristic we have in place.
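To make the failure mode concrete, here is a rough sketch of this kind of heuristic. It is not file-type's actual code: the marker string `ASSUMED_AI_MARKER` and every function name are assumptions made up for illustration. The point is only that a scan for a marker anywhere in the bytes cannot tell a real Illustrator file from a PDF that embeds Illustrator-produced content.

```typescript
// Hypothetical marker, assumed for illustration only.
const ASSUMED_AI_MARKER = "AIPrivateData";

// Convert an ASCII string to bytes without relying on platform globals.
function asciiBytes(text: string): Uint8Array {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    out[i] = text.charCodeAt(i) & 0xff;
  }
  return out;
}

// True if `data` begins with `prefix`.
function startsWithBytes(data: Uint8Array, prefix: Uint8Array): boolean {
  if (data.length < prefix.length) return false;
  for (let i = 0; i < prefix.length; i++) {
    if (data[i] !== prefix[i]) return false;
  }
  return true;
}

// Naive substring search over bytes.
function containsBytes(haystack: Uint8Array, needle: Uint8Array): boolean {
  for (let i = 0; i + needle.length <= haystack.length; i++) {
    let match = true;
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) {
        match = false;
        break;
      }
    }
    if (match) return true;
  }
  return false;
}

// Sketch: anything starting with "%PDF" is a PDF, unless the marker
// shows up anywhere in the content, in which case it is reported as "ai".
function sketchDetect(data: Uint8Array): "pdf" | "ai" | undefined {
  if (!startsWithBytes(data, asciiBytes("%PDF"))) return undefined;
  if (containsBytes(data, asciiBytes(ASSUMED_AI_MARKER))) return "ai";
  return "pdf";
}
```

With a heuristic shaped like this, a genuine PDF whose embedded image happens to carry the marker is reported as "ai", which matches the behavior described above.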

I'm proposing that even if the magic string is found, if the original file's extension is .pdf, file-type should consider it a PDF and not change its type based on some content inside it.

An even stricter approach that I would also support is to return the ai file type only if the file extension is already .ai. It seems more natural/compatible to default to .pdf if .ai isn't explicitly specified, since the ai detection is just a loose heuristic anyway.


eric-yuan-vanta · 2023-02-15T21:08:11Z

@sindresorhus Curious if you have thoughts on this.

I plan to put up a fix with the second approach, but I would like to get https://github.com/sindresorhus/file-type/pulls in first

eric-yuan-vanta · 2023-02-15T21:21:15Z

But it seems we don't have access to the original file extension, since we only use the stream (which makes sense), so maybe this approach is no good.

In my own usage, I'll work around it by managing this case in the caller.
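For anyone hitting the same thing, a minimal sketch of that caller-side workaround might look like the following. The helper name `preferPdfForPdfNames` and the `DetectedType` interface are hypothetical; the only assumption about file-type is that its result is an object with `ext` and `mime` fields, or `undefined` when nothing matches.

```typescript
// Shape assumed for a detection result: an extension plus a MIME type.
interface DetectedType {
  ext: string;
  mime: string;
}

// Hypothetical helper: if detection says "ai" but the caller knows the
// original file name ends in .pdf, report it as a PDF instead.
// Every other result is passed through unchanged.
function preferPdfForPdfNames(
  detected: DetectedType | undefined,
  originalName: string
): DetectedType | undefined {
  const namedPdf = originalName.toLowerCase().endsWith(".pdf");
  if (detected?.ext === "ai" && namedPdf) {
    return { ext: "pdf", mime: "application/pdf" };
  }
  return detected;
}
```

This only trusts the file name in the single ambiguous case (detected as ai, named .pdf), so content-based detection still wins everywhere else.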

Still, I wonder if there's a better way to do this than what we have today.

Borewit · 2023-02-17T07:55:29Z

None of the implemented file recognition is perfect (guaranteed to be correct). By writing 4 characters at the beginning of a text file you can probably mimic half of the file recognition heuristics. The reliability of the heuristics varies strongly.

If the recognition is likely to introduce false positives (for which there is no clear definition), it may indeed be better to improve the algorithm, preferably, or, as you suggest, fall back on its parent file type.
