technosaurus/MIMEtype

MIMEtype

Public domain MIME-type detector using file extensions and file signatures

How it Works

Detection by extension takes a file name and first checks whether the whole name is itself registered with a MIME type (for example, LICENSE -> text/plain). If it is not, it checks the file's extensions from longest to shortest (.tar.bz2 before just .bz2). The extension list is stored in lower case and sorted alphabetically, so it can be binary searched: roughly n comparisons for 2^n entries.
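
A minimal C sketch of this lookup, under stated assumptions: a single hypothetical table (`ext_table`) holds both whole file names and extensions, and POSIX `strcasecmp` from `<strings.h>` stands in for MIMEtype's own version. The table entries and the function names `ext_lookup` and `mime_from_name` are illustrative only, not the project's actual API.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX) */

typedef struct {
    const char *ext;   /* lower-case key: a whole file name or an extension without the dot */
    const char *mime;
} ext_entry;

/* Hypothetical, tiny table. It must stay sorted in ASCII order of `ext`
   for the binary search below to work. */
static const ext_entry ext_table[] = {
    { "bz2",     "application/x-bzip2" },
    { "gz",      "application/gzip" },
    { "html",    "text/html" },
    { "license", "text/plain" },
    { "png",     "image/png" },
    { "tar.bz2", "application/x-bzip-compressed-tar" },
    { "tar.gz",  "application/x-compressed-tar" },
    { "txt",     "text/plain" },
};

/* Binary search over ext_table, ignoring the case of `key`. */
static const char *ext_lookup(const char *key) {
    size_t lo = 0, hi = sizeof ext_table / sizeof ext_table[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = strcasecmp(key, ext_table[mid].ext);
        if (c == 0) return ext_table[mid].mime;
        if (c < 0) hi = mid; else lo = mid + 1;
    }
    return NULL;
}

/* Returns a MIME type for `path`, or NULL if nothing matches. */
const char *mime_from_name(const char *path) {
    /* Ignore any directory part. */
    const char *name = strrchr(path, '/');
    name = name ? name + 1 : path;

    /* 1. The whole file name may itself be registered (e.g. LICENSE). */
    const char *mime = ext_lookup(name);
    if (mime) return mime;

    /* 2. Try extensions from longest to shortest: for "a.tar.bz2",
          "tar.bz2" is tried before "bz2". */
    for (const char *dot = strchr(name, '.'); dot; dot = strchr(dot + 1, '.')) {
        mime = ext_lookup(dot + 1);
        if (mime) return mime;
    }
    return NULL;
}
```

For example, `mime_from_name("src/backup.tar.bz2")` first tries `backup.tar.bz2`, then `tar.bz2`, which matches, so the compound extension wins over plain `bz2`.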

Note: MIMEtype includes its own version of strcasecmp for speed: it can be inlined, and it skips the superfluous tolower() calls on the internal strings.
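
One way such a specialised comparison could look, as a sketch rather than MIMEtype's actual code: since every internal string is stored in lower case, only the caller's string needs folding, and an inline ASCII range check replaces `tolower()`. The name `lc_strcasecmp` is hypothetical, and unlike `strcasecmp` it ignores the locale and handles ASCII letters only.

```c
#include <stddef.h>

/* Hypothetical helper: compare a caller-supplied string `s` with an internal
   string `lc` that is already lower case. Only `s` is case-folded (ASCII only).
   Returns <0, 0 or >0, like strcasecmp. */
static inline int lc_strcasecmp(const char *s, const char *lc) {
    for (;; s++, lc++) {
        unsigned char a = (unsigned char)*s;
        unsigned char b = (unsigned char)*lc;
        if (a >= 'A' && a <= 'Z')
            a = (unsigned char)(a + ('a' - 'A'));  /* fold the caller's side only */
        if (a != b || b == '\0')
            return a - b;  /* difference of the first mismatch, or 0 at the end */
    }
}
```

Because the ordering it produces matches that of the lower-case table, it can be dropped into the binary search without re-sorting anything.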

The file signatures are grouped by offset and, within each group, sorted by their magic bytes in ASCII order for comparison with memcmp. This allows a binary search over the signatures at the common offsets (most file types use an offset of 0, 4 or 8 bytes). Signatures at all other offsets are checked last, using a linear search.
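
A sketch of this scheme in C, under stated assumptions: the tables `at0`, `at4`, `at8` and `other`, their few signatures, and the function names are hypothetical examples, not MIMEtype's own data or API. Within each common-offset table, no magic may be a prefix of another (with such overlaps a plain binary search can miss a match). A header too short to hold a whole signature is treated as sorting before it, which keeps the comparison consistent with the table order.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t offset;      /* byte offset of the magic within the file */
    const char *magic;  /* magic bytes; may contain NULs, hence `len` */
    size_t len;
    const char *mime;
} sig_entry;

#define SIG(off, lit, type) { (off), (lit), sizeof(lit) - 1, (type) }
#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical tables, each sorted by memcmp order of `magic`. */
static const sig_entry at0[] = {
    SIG(0, "\x1F\x8B",          "application/gzip"),
    SIG(0, "%PDF-",             "application/pdf"),
    SIG(0, "BZh",               "application/x-bzip2"),
    SIG(0, "GIF8",              "image/gif"),
    SIG(0, "PK\x03\x04",        "application/zip"),
    SIG(0, "\x89PNG\r\n\x1a\n", "image/png"),
    SIG(0, "\xFF\xD8\xFF",      "image/jpeg"),
};
static const sig_entry at4[] = {
    SIG(4, "ftyp", "video/mp4"),
    SIG(4, "moov", "video/quicktime"),
};
static const sig_entry at8[] = {
    SIG(8, "AVI ", "video/x-msvideo"),
    SIG(8, "WAVE", "audio/x-wav"),
    SIG(8, "WEBP", "image/webp"),
};
/* Uncommon offsets: searched linearly, last. */
static const sig_entry other[] = {
    SIG(257, "ustar", "application/x-tar"),
};

/* Lexicographic comparison of the header at e->offset with e->magic.
   A header too short for the whole magic never matches and sorts before it. */
static int sig_cmp(const unsigned char *buf, size_t buflen, const sig_entry *e) {
    size_t avail = buflen > e->offset ? buflen - e->offset : 0;
    size_t n = avail < e->len ? avail : e->len;
    int c = n ? memcmp(buf + e->offset, e->magic, n) : 0;
    if (c != 0) return c;
    return avail < e->len ? -1 : 0;
}

static const char *sig_bsearch(const sig_entry *t, size_t count,
                               const unsigned char *buf, size_t buflen) {
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = sig_cmp(buf, buflen, &t[mid]);
        if (c == 0) return t[mid].mime;
        if (c < 0) hi = mid; else lo = mid + 1;
    }
    return NULL;
}

/* Returns a MIME type for the first `buflen` bytes of a file, or NULL. */
const char *mime_from_magic(const unsigned char *buf, size_t buflen) {
    const char *m;
    if ((m = sig_bsearch(at0, COUNT(at0), buf, buflen))) return m;
    if ((m = sig_bsearch(at4, COUNT(at4), buf, buflen))) return m;
    if ((m = sig_bsearch(at8, COUNT(at8), buf, buflen))) return m;
    for (size_t i = 0; i < COUNT(other); i++)
        if (sig_cmp(buf, buflen, &other[i]) == 0) return other[i].mime;
    return NULL;
}
```

The caller reads enough of the file (here at least 262 bytes, to reach the tar magic) into `buf` before calling `mime_from_magic`; each binary search touches only a few entries even as the tables grow.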

TODO

  • add more types (ongoing - submit an issue here if one is broken/missing)
  • add MIMEverify() to ensure extension matches magic
  • win32 version of the magic detection (FILE* instead of fd)
  • config menu to enable/disable types (for servers that only want some)
    • just delete lines from header files for now
  • basic charset detection @ offset 0
    • utf-8 "\xEF\xBB\xBF"
    • utf-16be "\xFE\xFF"
    • utf-16le "\xFF\xFE"
    • utf-32be "\x00\x00\xFE\xFF"
    • utf-32le "\xFF\xFE\x00\x00"
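
The planned charset check could be sketched as a short table scan over these byte-order marks. This is an assumption about the design, not existing MIMEtype code, and `charset_from_bom` is a hypothetical name. Order matters: the UTF-32LE mark (FF FE 00 00) begins with the UTF-16LE mark (FF FE), so the four-byte marks are tested first.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *bom;      /* byte-order mark; may contain NULs, hence `len` */
    size_t len;
    const char *charset;
} bom_entry;

/* Longest marks first, so UTF-32LE is not mistaken for UTF-16LE. */
static const bom_entry boms[] = {
    { "\x00\x00\xFE\xFF", 4, "utf-32be" },
    { "\xFF\xFE\x00\x00", 4, "utf-32le" },
    { "\xEF\xBB\xBF",     3, "utf-8" },
    { "\xFE\xFF",         2, "utf-16be" },
    { "\xFF\xFE",         2, "utf-16le" },
};

/* Hypothetical helper: charset named by a BOM at offset 0, or NULL. */
const char *charset_from_bom(const unsigned char *buf, size_t buflen) {
    for (size_t i = 0; i < sizeof boms / sizeof boms[0]; i++)
        if (buflen >= boms[i].len && memcmp(buf, boms[i].bom, boms[i].len) == 0)
            return boms[i].charset;
    return NULL;
}
```

One inherent ambiguity remains: UTF-16LE text whose first character after the mark is U+0000 is indistinguishable from a UTF-32LE mark.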
