Syntax, in the context of the lang-list syntax_uid, means an identified computer language (or data or information) representation format which is viewable by humans.
syntax_uid is:
- a globally unique syntax identifier
- protocol versioned
- release versioned
- inclusive of every identified syntax
- inclusive of every syntax category
- internationalized
- includes (optional) deprecated, obsolete, unknown and alias_of fields
- composed of only the following characters: [a-z0-9_], that is 0 to 9, lowercase a to z, and _ (underscore)
- trivially convertible into a variable name in most programming languages, by simply adding an alphabetic prefix
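The character-set rule and the "add an alphabetic prefix" conversion can be sketched as follows. This is a minimal illustration, not part of lang-list: the function names and the default prefix "s_" are assumptions made for this example.

```python
import re

# Character set stated above: digits, lowercase ASCII letters, underscore.
SYNTAX_UID_RE = re.compile(r"[a-z0-9_]+")


def is_valid_syntax_uid(uid: str) -> bool:
    """Return True if uid uses only the characters [a-z0-9_]."""
    return SYNTAX_UID_RE.fullmatch(uid) is not None


def to_variable_name(uid: str, prefix: str = "s_") -> str:
    """Prefix a syntax_uid so it is a legal identifier even if it starts with a digit.

    The prefix "s_" is an arbitrary choice for this sketch.
    """
    if not is_valid_syntax_uid(uid):
        raise ValueError(f"not a valid syntax_uid: {uid!r}")
    return prefix + uid
```

The prefix matters because a syntax_uid may begin with a digit, which most languages do not allow at the start of a variable name.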
Syntax categories include:
- programming languages, including low level (e.g. asm) languages
- document markup (/markdown) styles
- tagged markup languages
- common combination formats (e.g. html_cheetah)
- log file formats
- program output formats
- configuration syntaxes
- data representation formats
- protocol formats
- Background
- Data structure
- Lookup protocol functions
- Example with EditorConfig
- Jobs
- Indentation
- Localisation
- Feature requests and support
- License
- See also
EditorConfig could use a syntax_uid for mapping that which is, inter alia, referred to as file type, [programming] language [name], syntax, [file] format or sometimes even [file] extension. The concept handled here is "display syntax", i.e. a syntax as viewed by humans, and is hereinafter referred to as "syntax".
Other programs may also find syntax_uid useful.
Commonly used syntax names often clash, e.g. APM.
File extensions are far from unique - there are 1000s, possibly 10s of 1000s.
The concept "file type" or "file format" includes concepts such as file extension, encoding, and textual vs binary representation; e.g. an "XML file" may be stored as a "plain text" file which may be encoded say in UTF-8 or UTF-16LE, and further it may be stored in a compressed format, either textual or binary (".zip" for one binary compression format example).
Given an arbitrary file, there are many ways to identify file type, including any preferred combination of the following or other file type identification systems:
- filename extension
- internal metadata
- internal "magic numbers"
- external metadata
- MIME types and other 'standard' mappings
- operating system specific type-codes
- operating system- and filesystem-specific extended attributes
- declaration
The EditorConfig project, in particular editorconfig/editorconfig#190 and editorconfig/editorconfig#404, inspired the lang-list.
The initial creation took some rather tedious effort; from here the list should be relatively stable, broadly applicable, and easy to maintain and translate.
The primary store for syntax uids is the file syntax_uids.yaml, a YAML file.
Per-application syntax_uid maps are in the subdirectory app_maps/ with one YAML file per application, named ${APPNAME}.yaml.
Boolean values may be true or false only.
The lang-list data structures (schema) are as follows:
A. syntax_uids (syntax_uids.yaml): map, keyed by field name:
- protocol: integer, currently 0, monotonically increasing if necessary
- version: integer, initially 0, monotonically increasing with each public release
- released: integer, public release date of this version, in "ISO format, no punctuation" i.e. YYYYMMDD
- uids: map, keyed by $syntax_uid
B. syntax_uid (syntax_uids . uids . $syntax_uid): map, keyed by field name:
- unknown: boolean, optional, true if this uid is not (yet) properly identified, false otherwise or if not present
- alias_of: string, optional, MUST be non-empty if present; if present, this field's value is the syntax_uid of the "primary" or "parent" syntax, of which the present uid is an alias
- deprecated: boolean, optional, true if this syntax_uid is deprecated, false otherwise or if not present; deprecated MUST only be present if alias_of is also present
- family: list of string (each a syntax_uid), optional, if present the value is the list of syntax_uids with which this syntax is associated
- name: map keyed by IETF BCP 47 language tag (a string identified herein as $LANG), with a minimum inclusion of "{name: {en: "${English_syntax_name}"}}"; include only the LANG en in the root syntax_uids.yaml file; name is optional in the case of an alias or unknown syntax_uid, mandatory otherwise
- supercedes: list of string (each a syntax_uid), optional, syntaxes superseded by this syntax
- influenced_by: list of string (each a syntax_uid), optional, languages which influenced this language
- obsolete: boolean, optional, true if this syntax is "obsolete", false otherwise or if not present
C. app_map (app_maps/${APPNAME}.yaml): map, keyed by field name:
- protocol: integer, currently 0, only increase if necessary
- version: integer, initially 0, identifies this specific application "syntax uid map", monotonically increasing with each public release
- released: integer, public release date of this version, in "ISO format, no punctuation" i.e. YYYYMMDD
- syntax_uid_to: map, keyed by $syntax_uid
D. syntax_uid (app_map . syntax_uid_to . $syntax_uid): map, keyed by field name:
- map: string, mandatory, the application's syntax name corresponding to this syntax_uid
- run: list of string, optional, a list of commands and/or settings to apply, in combination with the map value, in order to cause syntax_uid to be applied
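To make structures A and B concrete, here is a small sketch that holds the parsed data as plain Python dicts and checks the rules stated above. The sample entries ("shell" as an alias, "mystery" as an unknown syntax, the release date) are made up for illustration, and check_syntax_uids is a hypothetical helper, not something lang-list ships.

```python
import re

UID_RE = re.compile(r"[a-z0-9_]+")
BOOL_FIELDS = ("unknown", "deprecated", "obsolete")
LIST_FIELDS = ("family", "supercedes", "influenced_by")

# Hypothetical sample data shaped like structures A and B.
SYNTAX_UIDS = {
    "protocol": 0,
    "version": 0,
    "released": 20200101,  # YYYYMMDD as an integer (made-up date)
    "uids": {
        "sh": {"name": {"en": "POSIX shell"}},
        "bash": {"name": {"en": "Bash"}, "family": ["sh"]},
        "shell": {"alias_of": "sh", "deprecated": True},  # hypothetical alias
        "mystery": {"unknown": True},  # hypothetical unidentified syntax
    },
}


def check_syntax_uids(doc: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key in ("protocol", "version", "released"):
        value = doc.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f"{key}: expected an integer")
    for uid, entry in doc.get("uids", {}).items():
        if not isinstance(uid, str) or not UID_RE.fullmatch(uid):
            problems.append(f"{uid}: characters outside [a-z0-9_]")
        for field in BOOL_FIELDS:
            if field in entry and not isinstance(entry[field], bool):
                problems.append(f"{uid}.{field}: must be true or false")
        for field in LIST_FIELDS:
            value = entry.get(field, [])
            if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
                problems.append(f"{uid}.{field}: expected a list of strings")
        alias_of = entry.get("alias_of")
        if "alias_of" in entry and (not isinstance(alias_of, str) or alias_of == ""):
            problems.append(f"{uid}.alias_of: must be a non-empty string")
        if "deprecated" in entry and "alias_of" not in entry:
            problems.append(f"{uid}.deprecated: only allowed together with alias_of")
        is_alias_or_unknown = "alias_of" in entry or entry.get("unknown") is True
        if not is_alias_or_unknown and "en" not in entry.get("name", {}):
            problems.append(f"{uid}.name: an 'en' name is mandatory")
    return problems
```

Returning a list of problems rather than raising on the first one lets a maintainer see every violation in a single pass over the file.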
Protocols for using syntax_uid are necessary to provide for:
- backwards compatibility
- forwards compatibility
- deprecatability
- changing, adding, and removing of aliases
As seen here, an EDITOR plugin will need to do something like the following:
1. see that a file has been opened
2. check the .editorconfig for settings for this file
3. check if syntax (filetype) is declared for this file,
3.a. if so, use the .editorconfig declared $syntax as the syntax_uid (née "filetype") for this file
3.b.i. if not, use the $EDITOR's filetype detection to determine the $EDITOR's filetype/syntax SID for this file
3.b.ii. given $EDITOR's syntax $SID, use the lang-list app_maps/$EDITOR.yaml map to reverse-lookup the corresponding syntax_uid
4. Finally, given the syntax_uid for this file, look up the .editorconfig settings for this $syntax_uid
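Steps 3 and 4 above might be sketched like this. The function and parameter names are assumptions for illustration only; the editor's detection routine and the reverse lookup are passed in as callables so that the sketch does not depend on any particular editor API.

```python
from typing import Callable, Optional


def resolve_syntax_uid(
    declared_syntax: Optional[str],
    detect_editor_sid: Callable[[], Optional[str]],
    sid_to_uid: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Steps 3.a to 3.b.ii: prefer the declared syntax, else reverse-map the editor's SID."""
    if declared_syntax is not None:  # step 3.a
        return declared_syntax
    sid = detect_editor_sid()  # step 3.b.i
    if sid is None:
        return None
    return sid_to_uid(sid)  # step 3.b.ii


def settings_for(uid: Optional[str], settings_by_uid: dict) -> dict:
    """Step 4: return the settings recorded for this syntax_uid (empty if none)."""
    if uid is None:
        return {}
    return settings_by_uid.get(uid, {})
```

A plugin would call resolve_syntax_uid once per opened file and then apply whatever settings_for returns.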
In the protocol pseudo code functions below:
- // begins a comment
- . (period) means lookup or access a map key (field)
- := means assign RHS to LHS
- == and != are equality comparisons
- "" means the empty string
- $ means dereference or "get the value stored in this variable"
- NULL means a non-existent map, field or value
- The error() function MUST show an error where errors are normally shown.
- The warn() function MUST show a warning where warnings are normally shown.
define function str_to_syntax_uid (STR :: <string>)
  // Convert STRing to syntax_uid; an alias resolves to its primary syntax_uid
  syntax_uid := syntax_uids . uids . $STR
  if syntax_uid == NULL
  then
    error("syntax_uid '$STR' does not exist")
    return NULL
  end if
  suid_primary := syntax_uid . alias_of
  if suid_primary == NULL
  then
    return $STR
  else
    // deprecated is a boolean field, so compare with true rather than the string "true"
    if syntax_uid . deprecated == true
    then
      warn("syntax_uid '$STR' is deprecated and will be removed, use '$suid_primary' instead")
    end if
    return $suid_primary
  end if
end str_to_syntax_uid

define function str_to_app_sid (STR :: <string>)
  // Convert STRing to app Syntax ID
  syntax_uid := str_to_syntax_uid($STR)
  if syntax_uid == NULL
  then
    return NULL
  end if
  app_sid := app_map . syntax_uid_to . $syntax_uid
  if app_sid == NULL
  then
    error("$EDITOR does not support syntax_uid '$STR'")
    return NULL
  end if
  return app_sid . map
end str_to_app_sid

define function app_sid_to_syntax_uid (SID :: <string>)
  // Convert app Syntax ID to syntax_uid (reverse lookup over the app map)
  app_suids := app_map . syntax_uid_to
  foreach syntax_uid in keys of app_suids do
    if app_suids . $syntax_uid . map == $SID
    then
      return $syntax_uid
    end if
  end foreach
  error("$EDITOR syntax id '$SID' not found in $EDITOR's syntax uid map")
  return NULL
end app_sid_to_syntax_uid
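For readers who prefer runnable code, the three pseudo code functions above could be ported to Python roughly as follows. This is a sketch under stated assumptions: the UIDS and APP_MAP contents and the EDITOR name are invented sample data standing in for the parsed YAML files, and error() and warn() simply print to stderr.

```python
import sys
from typing import Optional

# Hypothetical sample data standing in for the parsed YAML files.
UIDS = {
    "sh": {"name": {"en": "POSIX shell"}},
    "shell": {"alias_of": "sh", "deprecated": True},
    "java": {"name": {"en": "Java"}},
}
APP_MAP = {"syntax_uid_to": {"sh": {"map": "sh"}, "java": {"map": "Java"}}}
EDITOR = "my_editor"  # placeholder editor name


def error(msg: str) -> None:
    print("error:", msg, file=sys.stderr)


def warn(msg: str) -> None:
    print("warning:", msg, file=sys.stderr)


def str_to_syntax_uid(s: str) -> Optional[str]:
    """Resolve a string to a primary syntax_uid, following alias_of if present."""
    entry = UIDS.get(s)
    if entry is None:
        error(f"syntax_uid '{s}' does not exist")
        return None
    primary = entry.get("alias_of")
    if primary is None:
        return s
    if entry.get("deprecated") is True:
        warn(f"syntax_uid '{s}' is deprecated and will be removed, use '{primary}' instead")
    return primary


def str_to_app_sid(s: str) -> Optional[str]:
    """Resolve a string to the application's own syntax ID."""
    uid = str_to_syntax_uid(s)
    if uid is None:
        return None
    app_entry = APP_MAP["syntax_uid_to"].get(uid)
    if app_entry is None:
        error(f"{EDITOR} does not support syntax_uid '{s}'")
        return None
    return app_entry["map"]


def app_sid_to_syntax_uid(sid: str) -> Optional[str]:
    """Reverse lookup: find the syntax_uid whose map value equals the app's syntax ID."""
    for uid, app_entry in APP_MAP["syntax_uid_to"].items():
        if app_entry["map"] == sid:
            return uid
    error(f"{EDITOR} syntax id '{sid}' not found in {EDITOR}'s syntax uid map")
    return None
```

The reverse lookup is a linear scan, as in the pseudo code; a plugin that calls it often might build an inverted dict once at load time instead.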
THIS IS A PROPOSAL/proof-of-concept only, it is NOT yet implemented.
For a new EditorConfig $EDITOR syntax uid map, copy app_maps/vim.yaml to app_maps/${EDITOR}.yaml, and update each syntax_uid_to entry as follows:
- comment out each line not supported by your EDITOR, using "#" (sharp/hash sign)
- edit the string after "map:" to match your EDITOR's name for that syntax_uid (the name near the start of the line); be sure to keep a space character after the ":" (colon) as well as keep the trailing "}" (closing brace).
Next implement the above lookup protocol functions, str_to_syntax_uid(STR), str_to_app_sid(STR) and app_sid_to_syntax_uid(SID), in your plugin (if they are not already available in your editor).
Voilà, your EDITOR's Syntax ID ("file format" or "filetype"), and lang-list's syntax_uid, can now each be used to look up editorconfig language-specific settings, e.g. in an editorconfig group named say "[: syntax_uid=[java,json]]".
Conversely, an editorconfig file-group setting for syntax can ensure the correct syntax highlighting for those files in your editor.
NOTE: THIS IS A PROPOSAL/proof-of-concept only, it is NOT yet implemented.
[src/sh/*]
syntax = sh
indent_size = 8
[src/bash/*]
syntax = bash
indent_size = 4
# use this if you are happy with your EDITOR's auto detection for Java-syntax files:
[: syntax_uid=[java,json]] # exact .editorconfig "group syntax" TBD
indent_size = 3
# and if your editor does not properly auto detect Java-syntax files, add this:
[src/java/*]
syntax_uid = java
Note: The syntax_uids.yaml file is not normally loaded by editorconfig plugins; only the app_maps/$EDITOR.yaml file should be needed.
NOTE: THIS IS A PROPOSAL/proof-of-concept only, it is NOT yet implemented.
- app_maps/${MY_EDITOR}.yaml
- l10n/${LANG}/syntax_uids.yaml
To apply the lang-list to another editor, copy app_maps/vim.yaml to app_maps/${MY_EDITOR}.yaml and update each map: value to the corresponding syntax ID for your editor. Implement the lookup protocol functions.
Find happiness in gratitude.
To translate syntax_uid names, see section "Localisation..." below.
Find happiness in gratitude.
YAML files, per specification, are unfortunately space indented (control freaks will control in freaky ways). Although some (many?) YAML parsers may treat tabs as spaces, syntax_uids.yaml shall be a YAML specification compliant file. See here:
- https://stackoverflow.com/questions/19975954/a-yaml-file-cannot-contain-tabs-as-indentation
- http://yaml.org/faq.html
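A quick way to catch stray tab indentation before committing a YAML file is a check along these lines. It is a simple line-based heuristic with a hypothetical function name, not a YAML parser, so it only reports tabs found in leading whitespace.

```python
def lines_with_tab_indent(text: str) -> list:
    """Return the 1-based numbers of lines whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace is whatever lstrip() would remove from the left.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad.append(number)
    return bad
```

Tabs that appear later in a line, for example inside a quoted string, are deliberately ignored.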
NOTE: THIS IS A PROPOSAL/proof-of-concept only, it is NOT yet implemented.
l10n/$LANG/syntax_uids.yaml
Localization files translating syntax_uid names, README.md and other material have the same name and relative location as the corresponding file in the root directory, but are located under the l10n/${LANG}/ directory.
To be clear, syntax_uid name translations are not included in the root syntax_uids.yaml file, but in the appropriate l10n/${LANG}/syntax_uids.yaml file, and similarly l10n/${LANG}/README.md for a translation of the README.md file.
Such localisation files must be:
- UTF-8 encoded
- "three space chars" indented
- For .yaml file translations, YAML files with the same structure as the original file, but with only the necessary content included - do NOT include data that is not being translated, just maintain the same structure and keep the same field names (the part to the left of the colon ":").
$LANG is the IETF BCP 47 language tag identifying the corresponding localized human language.
For information about the IETF BCP 47 "human language" tag, see:
- https://en.wikipedia.org/wiki/IETF_language_tag
- http://cldr.unicode.org/index/cldr-spec/picking-the-right-language-code
- https://en.wikipedia.org/wiki/Lists_of_ISO_639_codes
- https://salsa.debian.org/iso-codes-team/iso-codes
- https://stackoverflow.com/questions/2511076/which-iso-format-should-i-use-to-store-a-users-language-code
Each localization file must have the same structure as syntax_uids.yaml but less content. In particular your translated file contains the following YAML mapping keys:
- protocol: 0 - integer, mandatory.
- version: 0 - integer, mandatory, starts at 0, is the version for this localisation file.
- name: {$LANG: "..."} - string, mandatory, translation of the phrase "computer language syntax UID".
- uids: - map, mandatory, keyed by $syntax_uid.
  - Do NOT include the alias and unknown groups (the first two syntax uid groups).
  - You should not include languages for which the translation is the same as the English name.
  - Each $syntax_uid in the uids map contains ONLY the {name: {$LANG: "..."}} map, where ... is replaced with your translation of the name of this syntax/language.
  - That is, remove all fields other than the name map, such as family, successor, influenced_by and supercedes, run, and remove comments (but keep the section dividers and the copyright header line in English, and add a second, translated version of the copyright header line).
  - See templates/syntax_uids.yaml for an example to start from, but do note that entries in the root syntax_uids.yaml file will be added and changed over time; git commands can be used to identify such differences.
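The trimming rules above can be illustrated with a small sketch that derives an untranslated skeleton from the parsed root data. l10n_skeleton is a hypothetical helper, and copying the root protocol value into the localisation file is an assumption of this example; the translator still replaces every English string by hand and deletes entries whose translation equals the English name.

```python
def l10n_skeleton(root: dict, lang: str) -> dict:
    """Build an untranslated localisation skeleton from the parsed root data."""
    uids = {}
    for uid, entry in root.get("uids", {}).items():
        # The alias and unknown groups are not translated.
        if "alias_of" in entry or entry.get("unknown") is True:
            continue
        english = entry.get("name", {}).get("en", "")
        # Keep ONLY the name map; the English text is a placeholder to translate.
        uids[uid] = {"name": {lang: english}}
    return {
        "protocol": root.get("protocol", 0),  # assumption: mirror the root protocol
        "version": 0,  # a localisation file carries its own version, starting at 0
        "name": {lang: "computer language syntax UID"},  # phrase to be translated
        "uids": uids,
    }
```

Comments, section dividers and the copyright header lines are not modelled here, since plain dicts do not carry YAML comments.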
For feature requests and support, please search the lang-list issues list and if your feature or issue is not present, add a new issue.
If you wish to register your interest in an issue, click the "subscribe" button - please do not add "+1"s or "me too" comments.
You may set your GitHub settings to receive emails for updates on issues you are subscribed to.
lang-list is licensed under the GNU Lesser General Public License version 3, aka LGPL3.
The LGPL reflects the need for lang-list to be usable as a library in relation to any other software licenses.
- https://github.com/editorconfig/editorconfig/wiki/%5BDevelopment%5D-Discussion-of-language-filetype-support
- https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
- https://github.com/github/linguist.git
- https://en.wikipedia.org/wiki/List_of_programming_languages
- https://en.wikipedia.org/wiki/Markup_language
- https://en.wikipedia.org/wiki/List_of_markup_languages
- https://en.wikipedia.org/wiki/Common_Log_Format
- https://en.wikipedia.org/wiki/Configuration_file
- https://medium.com/web-development-zone/a-complete-list-of-computer-programming-languages-1d8bc5a891f
- https://github.com/garabik/grc
- https://userstyles.org/?utm_campaign=stylish_homepage
- https://userstyles.org/styles/70979/github-better-sized-tabs-in-code, or better yet, the similar style but configured to apply to all websites: http://userstyles.org/styles/89425/all-code-has-custom-tab-size (I suggest configure, add style block, duplicate the "code" block to be a "pre" block, now we're cooking) ; experiencing gratitude in browser TAB indent_size :)