<?xml version="1.0" encoding="utf-8"?>
<!-- COPYRIGHT 1998-2024, REGENSTRIEF INSTITUTE, INC. AND THE UCUM ORGANIZATION ALL RIGHTS RESERVED.
Use and redistribution of this data is permitted without charge
as long as you make no changes to it. DO NOT CHANGE THIS FILE if
you want to redistribute it. For any issues that you think you
need to change this file, please create a new issue on the
GitHub repository at <https://github.com/ucum-org/ucum>, or
contact Regenstrief Institute using the [email protected] address.
-->
<!DOCTYPE spec [
<!ENTITY alpha "α">
<!ENTITY beta "β">
<!ENTITY mu "μ">
<!ENTITY pi "π">
<!ENTITY Pi "Π">
<!ENTITY rho "ρ">
<!ENTITY epsilon "ε">
<!ENTITY kappa "κ">
<!ENTITY tau "τ">
<!ENTITY iotatonos "ί">
<!ENTITY omikron "ο">
<!ENTITY nu "ν">
<!ENTITY omega "ω">
<!ENTITY Omega "Ω">
<!ENTITY egrave "è">
<!ENTITY eacute "é">
<!ENTITY ouml "ö">
<!ENTITY Aring "Å">
<!ENTITY aring "å">
<!ENTITY ccedilla "ç">
<!ENTITY deg "°">
<!ENTITY box "□">
<!ENTITY middot "·">
<!ENTITY mult "×">
<!ENTITY nbsp " ">
<!ENTITY reg "®">
<!ENTITY ndash "–">
<!ENTITY mdash "—">
<!ENTITY ldquo "“">
<!ENTITY rdquo "”">
<!ENTITY lquo "‘">
<!ENTITY rquo "’">
<!ENTITY langle "‹">
<!ENTITY rangle "›">
<!ENTITY sharps "ß">
<!ENTITY sim "~">
<!ENTITY ssim "≈">
<!ENTITY bullet "•">
<!ENTITY element "∈">
<!ENTITY cap "∩">
<!ENTITY cup "∪">
<!ENTITY emptyset "{}"> <!-- ∅ -->
<!ENTITY UCUM "<i>Unified Code for Units of Measure</i>">
<!ENTITY TUCUM "<i>The Unified Code for Units of Measure</i>">
]>
<spec xmlns:u="http://aurora.regenstrief.org/UCUM">
<header>
<title>The Unified Code for Units of Measure</title>
<version>2.2</version>
<revision>N/A</revision>
<date>2024-06-17</date>
<authlist>
<author>
<name>Gunther Schadow</name>
<affiliation>Pragmatic Data LLC</affiliation>
<email href="[email protected]"/>
</author>
<author>
<name>Clement J. McDonald</name>
<affiliation>National Library of Medicine, Lister Hill</affiliation>
<email href="[email protected]"/>
</author>
</authlist>
<copyright>1998-2024, Regenstrief Institute, Inc. and the UCUM Organization. All rights reserved.</copyright>
</header>
<body>
<div1 id="section-introduction">
<head>Introduction</head>
<p>
&TUCUM; is a code system intended to include <emph>all</emph> units of
measure currently used in international science,
engineering, and business. The purpose is to facilitate unambiguous
electronic communication of quantities together with their units. The
focus is on electronic communication, as opposed to communication
between humans. Typical applications of &TUCUM; are electronic data
interchange (EDI) protocols, but there is nothing that prevents it
from being used in other types of machine communication.
</p>
<p>
&TUCUM; is inspired by and heavily based on ISO 2955-1983, ANSI
X3.50-1986, and HL7's extensions called “ISO+”. The
respective ISO and ANSI standards are both entitled
“Representation of [...] units in systems with limited
character sets” where ISO 2955 refers to SI and other units
provided by ISO 1000-1981, while ANSI X3.50 extends ISO 2955 to
include U.S. customary units. Because these standards carry the
restriction of “limited character sets” in their names
they seem to be of less value today, when graphical user interfaces and
laser printers are in widespread use. For this reason, the European
standard ENV 12435 in its clause 7.3 declares ISO 2955 obsolete.
</p>
<p>
ENV 12435 is dedicated exclusively to the communication of
measurements between humans in display and print, and does not provide
codes that can be used in communication between systems. It does not
even provide a specification that would allow communication of units
from one system to the screen or printer of another system. The issue
about displaying units in the common style defined by the 9th
<emph>Conférence Générale des Poids et Mesures</emph>
(CGPM) in 1948 is not just the character set. Although the
<emph>Unicode</emph> standard and its predecessor ISO/IEC 10646 provide
the richest character set ever, that is still not enough to specify the
presentation of units, because there are important typographical
details such as superscripts, subscripts, roman and
italics.<footnote>
<p>
Interestingly the authors of ENV 12435 forgot to include
superscripts in the minimum requirements as given by subclause 7.1.4
for which they do not specify an alternative.
</p>
</footnote>
</p>
<p>
The real value of the restriction on the character set and
typographical details, however, is not to cope with legacy systems and
less powerful technology, but to facilitate unambiguous communication
and interpretation of the meaning of units from one computer system to
another. In this respect, ISO 2955 and ANSI X3.50 are not
obsolete because there is no other standard that would fill in for
inter-systems communication of units. However, ISO 2955 and ANSI
X3.50 currently have severe defects:
</p>
<list role="ordered">
<item>
<p>
ISO 2955 and ANSI X3.50 contain numerous name conflicts,
both direct conflicts (e.g., “<code>a</code>” being used
for both “year” and “are”) and conflicts
that are generated through combination of unit symbols with prefixes
(e.g., “<code>cd</code>” means candela and centi-day and
“<code>PEV</code>” means peta-volt and pico-electronvolt.)
</p>
</item>
<item>
<p>
Neither ISO 2955 nor ANSI X3.50 covers all units that are
currently used in practice. There are many more units in use than what
is allowed by the <emph>Système International
d'Unités</emph> (SI) and accompanying standards. For example,
the older CGS units dyne and erg are still used in the science of
physiology. Although ANSI X3.50 extends ISO 2955 with some
U.S. customary units, it is still not complete in this respect. For
example, it does not define the degree Fahrenheit.
</p>
</item>
<item>
<p>
ANSI X3.50 is semantically ambiguous with respect to customary
units, even if we do not consider the history and international
aspects of customary units. Three systems of mass units are used in
the U.S.: avoirdupois used generally, apothecaries' used by
pharmacists, and troy used in trade with gold and other precious
metals. ANSI X3.50 has no way to select any one of those
specifically, which is a problem in medicine, where both apothecaries' and
avoirdupois weights are frequently used.
</p>
</item>
</list>
<p>
ISO 2955 and all standards that look only to the resolutions
and recommendations of the CGPM and the <emph>Comité
International des Poids et Mesures</emph> (CIPM) as published by the
<emph>Bureau International des Poids et Mesures</emph> (BIPM) and various
ISO standards (ISO 1000 and ISO 31) fail to recognize that
the needs in practice are often different from the ideal propositions
of the CGPM. Although not allowed by the CGPM and related ISO
standards, many other units are used in international sciences,
healthcare, engineering, and business, some of them meaningful and some
of questionable meaning. A coding system that is to be useful in
practice must cover the requirements and habits of the
practice, even some of the bad habits.
</p>
<p>
None of the current standards attempt to specify a semantics of units
that can be deployed in information systems with moderate
requirements. Metrological standards such as those published by the
BIPM are dedicated to maximal scientific correctness of reproducible
definitions of units. These definitions make sense only to human
specialists and can hardly be deployed to their full extent by any
information system that is not dedicated to metrology. On the other
hand, ISO 2955 and ANSI X3.50 provide no semantics at all for the
codes they define.
</p>
<p>
&TUCUM; provides a single coding system for units that is complete,
free of all ambiguities, and that assigns to each defined unit a
concise semantics. In communication it is not only important that all
communicating parties have the same repertoire of symbols, but also that
all attach the same meaning to the symbols they exchange. The common
meaning must be computationally verifiable. &TUCUM; assumes a
semantics for units based on dimensional analysis.<footnote>
<p>
A more extensive introduction into this semantics of units can be
found in: Schadow G, McDonald CJ et al: Units of Measure in Clinical
Information Systems; <it>JAMIA</it> 6(2); Mar/Apr 1999;
p. 151–162.
</p>
</footnote>
</p>
<p>
<!-- FIXME this paragraph is a lump: we need Roadmap, scope, etc. -->
In short, each unit is defined relative to a system of base units by a
numeric factor and a vector of exponents by which the base units
contribute to the unit to be defined. Although we can reflect all the
meaning of units covered by dimensional analysis with this vector
notation, the following tables do not show these vectors. One reason
is that the vectors depend on the base system chosen and even on the
ordering of the base units. The other reason is that these vectors are
hard for human readers to understand, while they can be easily derived
computationally. Therefore we define new unit symbols using algebraic
terms of other units. Those algebraic terms are also valid codes of
&TUCUM;.
</p>
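<comment>
<p>
The factor-and-vector representation described above can be sketched in
Python as follows. This is only an illustration under stated assumptions:
the three-element base system <code>BASE</code> and its ordering, the class
<code>Unit</code>, and the constants built from it are hypothetical names
chosen for this example and are not part of &TUCUM;.
</p>
<p><![CDATA[
```python
from dataclasses import dataclass
from fractions import Fraction

# Hypothetical base system for illustration only; the ordering is arbitrary.
BASE = ("m", "s", "g")


@dataclass(frozen=True)
class Unit:
    factor: Fraction  # numeric factor relative to the base units
    dim: tuple        # exponent of each base unit, in BASE order

    def __mul__(self, other):
        return Unit(self.factor * other.factor,
                    tuple(a + b for a, b in zip(self.dim, other.dim)))

    def __truediv__(self, other):
        return Unit(self.factor / other.factor,
                    tuple(a - b for a, b in zip(self.dim, other.dim)))

    def __pow__(self, n):
        return Unit(self.factor ** n, tuple(a * n for a in self.dim))


def scalar(value):
    """A dimensionless number, used here for prefix-like factors."""
    return Unit(Fraction(value), (0, 0, 0))


METER = Unit(Fraction(1), (1, 0, 0))
SECOND = Unit(Fraction(1), (0, 1, 0))
GRAM = Unit(Fraction(1), (0, 0, 1))

# New units defined as algebraic terms of other units.
KILOGRAM = scalar(1000) * GRAM
CENTIMETER = scalar(Fraction(1, 100)) * METER
NEWTON = KILOGRAM * METER / SECOND ** 2
DYNE = GRAM * CENTIMETER / SECOND ** 2
```
]]></p>
<p>
Under these assumptions the newton and the dyne share one exponent vector
and differ only in the numeric factor, which is what lets a program detect
that two differently written units measure the same kind of quantity.
</p>
</comment>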
</div1>
<div1>
<head>Grammar of Units and Unit Terms</head>
<p name="preliminaries">
<verse>
&TUCUM; consists of a basic set of terminal symbols for units, called
<emph>atomic unit symbols</emph> or <emph>unit atoms</emph>, and multiplier
prefixes. It also consists of an expression syntax by which these
symbols can be combined to yield valid units.
</verse>
<verse>
The tables of terminal symbols are fixed as of every revision of
&TUCUM;. Additions, deletions, or changes are <emph>not</emph> allowed.
</verse>
<verse>
All expressions that can be derived from these terminal symbols and the
expression syntax are valid codes. Any expression of &TUCUM; has a
precisely defined semantics.
</verse>
</p>
<comment>
<p>
The expression syntax of &TUCUM; generates an infinite number of codes
with the consequence that it is impossible to compile a table of all
valid units.
</p>
<p>
That the tables of terminal symbols may not be extended does not mean
that missing symbols will never be available in &TUCUM;. Suggestions
for additions of new symbols are welcome and revisions of
&TUCUM; will be released as soon as a change request has been approved.
</p>
</comment>
<p name="full and limited conformance">
<verse>
The semantics of &TUCUM; implies equivalence classes such that
different expressions may have the same meaning.
</verse>
<verse>
Programs that declare <emph>full conformance</emph> with &TUCUM; must
compare unit expressions by their semantics, i.e. they must detect
equivalence for different expressions with the same meaning.
</verse>
<verse>
Programs with <emph>limited conformance</emph> may compare unit expressions
literally and thus may not detect equivalence of unit expressions.
</verse>
</p>
<comment>
<p>
The option for “limited conformance” allows &TUCUM; to be adopted
even by less powerful systems that cannot or do not want to deal with
the full semantics of units. Those systems typically have a table of
fixed unit expression literals that may be related to other literals
with fixed conversion factors. Although these systems will have
difficulty receiving unit expressions from various sources, they
will at least send out valid expressions of &TUCUM;, which is an
important step towards a commonly used coding scheme for units.
</p>
</comment>
<div2>
<head>Character Set and Lexical Rules</head>
<p name="character set">
<verse> All expressions of &TUCUM; shall be built from characters of
the 7-bit US-ASCII character set exclusively.
</verse>
<verse> Terminal unit symbols can consist of all ASCII characters in
the range of 33–126 (0x21–0x7E) excluding
double quotes (&lquo;<code>"</code>&rquo;),
parentheses (&lquo;<code>(</code>&rquo; and &lquo;<code>)</code>&rquo;),
plus sign (&lquo;<code>+</code>&rquo;),
minus sign (&lquo;<code>-</code>&rquo;),
period (&lquo;<code>.</code>&rquo;),
solidus (&lquo;<code>/</code>&rquo;),
equal sign (&lquo;<code>=</code>&rquo;),
square brackets (&lquo;<code>[</code>&rquo;
and &lquo;<code>]</code>&rquo;), and
curly braces (&lquo;<code>{</code>&rquo; and &lquo;<code>}</code>&rquo;),
which have special meaning.
</verse>
<verse> A terminal unit symbol cannot consist of only digits
(&lquo;<code>0</code>&rquo;–&lquo;<code>9</code>&rquo;) because
those digit strings are interpreted as positive integer
numbers. However, a symbol “<code>10*</code>” is allowed
because it ends with a non-digit allowed to be part of a symbol.
</verse>
<verse> For every terminal symbol there is a case insensitive variant
defined, to be used when there is a risk of upper and lower case being
confused. Although upper and lower case can be mixed in case
insensitive symbols, the case carries no meaning. Case insensitive
symbols are incompatible with the case sensitive symbols.
</verse>
</p>
<comment>
<p>
The 7-bit US-ASCII character code is the greatest common denominator
that can be expected to be available in any communication environment.
Only very few units normally require symbols from the Greek alphabet
and thus the benefit of requiring Unicode does not outweigh the cost.
As explained above, the real issue about writing unit terms naturally
is not the character set but the ability to write subscripts and
superscripts and distinguish roman letters from italics.
</p>
<p>
Some computer systems or programming languages still have the
requirement of case insensitivity and some humans who are not familiar
with SI units tend to confuse upper and lower case or cannot
interpret the difference in upper and lower case correctly. For this
reason the case insensitive symbols are defined. Although &TUCUM;
does not encourage use of case insensitive symbols where not
absolutely necessary, in some circumstances the case insensitive
representation may be the greatest common denominator. Thus some
systems that can handle case sensitivity may end up using case
insensitive symbols in order to communicate with less powerful
systems.
</p>
<p>
ISO 2955 and ANSI X3.50 call case sensitive symbols “mixed
case” and case insensitive symbols “single case” and
list two columns for “single case” symbols, one for upper
case and one for lower case. In &TUCUM; all units can be written in
mixed upper and lower case, but in the case insensitive variant the
mixing of case does not matter.
</p>
<p>
Whitespace is not recognized in a unit term and should generally
not occur. UCUM implementations may flag whitespace as an error
rather than ignore it. Whitespace is not used as a separator of
otherwise ambiguous parts of a unit term.
</p>
</comment>
<p name="prefixes">
<verse>Metric units (cf. <pref ref="para-metric"/>) may be
combinations of a unit symbol with a prefix symbol.
</verse>
<verse>The unit symbol to be combined with the prefix must not itself
contain a prefix. Such a prefix-less unit symbol is called <emph>unit
atom</emph>.
</verse>
<verse>Prefix and atom are connected immediately without any
delimiter. Separation of an optional prefix from the atom occurs on
the lexical level by finding a matching combination of an optional
prefix and a unit atom.
</verse>
<verse> The prefix is the longest leading substring that matches a
valid prefix where the remainder is a valid metric unit atom. If no
such prefix can be matched, the unit atom is without prefix and may be
either metric or non-metric.</verse>
<smallref>[1–3: ISO 1000, 3; ISO 2955-1983, 3.7;
ANSI X3.50-1986, 3.7 (Rule No. 6).]</smallref>
</p>
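<comment>
<p>
The lexical separation of prefix and atom can be sketched in Python as
follows. The tables <code>PREFIXES</code> and <code>ATOMS</code> are tiny
illustrative excerpts, and the function name
<code>split_simple_unit</code> is hypothetical; a real implementation
would load the complete tables of this specification.
</p>
<p><![CDATA[
```python
# Illustrative excerpts only; the full tables are defined by the specification.
PREFIXES = {"da", "d", "c", "m", "k"}
# Maps each unit atom to its "metric" predicate.
ATOMS = {"m": True, "g": True, "cd": True, "d": False}


def split_simple_unit(symbol):
    """Return (prefix, atom); prefix is None when no prefix applies."""
    # Try candidate prefixes from the longest leading substring to the shortest.
    for end in range(len(symbol) - 1, 0, -1):
        prefix, rest = symbol[:end], symbol[end:]
        # The remainder must be a known atom carrying the metric predicate.
        if prefix in PREFIXES and ATOMS.get(rest, False):
            return prefix, rest
    # No prefix matched: the whole symbol must be an atom, metric or not.
    if symbol in ATOMS:
        return None, symbol
    raise ValueError(f"unknown unit symbol: {symbol!r}")
```
]]></p>
<p>
With these excerpt tables, <code>cd</code> is read as the atom candela
rather than as centi-day, because the day is not a metric atom and
therefore cannot take a prefix.
</p>
</comment>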
<p name="square brackets">
<verse> Square brackets (&lquo;<code>[</code>&rquo; and
&lquo;<code>]</code>&rquo;) may be part of a
unit atom at any place but only as matched pairs. Square brackets are
lexical elements and not separate syntactical tokens. </verse>
<verse> Within a matching pair of square brackets the full range of
characters 33–126 can be used.<footnote>
<p>
See the section about style in <pref from="para-curly"/>, to find out
how square brackets are actually used. Note, however, that the user
has no choice about square bracket symbols, as these are fixed in the
list of atomic unit symbols.
</p>
</footnote>
</verse>
<verse> Square brackets do <emph>not</emph> determine the boundary between
prefix and unit atom, but they never span the boundary of unit atoms.
</verse>
<verse>
Square brackets must not be nested.
</verse>
</p>
<comment>
<p>
For example,
“<code>[abc+ef]</code>”,
“<code>ab[c+ef]</code>”, and
“<code>[abc+]ef</code>”
could all be valid symbols if defined in the tables.
In “<code>ab[c+ef]</code>” either
“<code>a</code>” or “<code>ab</code>”
could be defined as a prefix, but not “<code>ab[c</code>”.
</p>
<p>
Square brackets take on one task of round parentheses in HL7's
“ISO+” code, where one use of parentheses is to augment
unit symbols with suffixes, as in “<code>mm(Hg)</code>”.
Another use is to enclose one full unit symbol into parentheses, as
“<code>(ka_u)</code>” (for the King-Armstrong unit of
catalytic amount of phosphatase). Apparently, one is not supposed
to expect a prefix in a unit symbol enclosed this way. Thus, even if
“<code>a_u</code>” had been defined,
“<code>(ka_u)</code>” should not be matched against
kilo-<code>a_u</code>.
</p>
<p>
Parentheses, however, were also used for the nesting of terms since
HL7 version 2.3. At this point it became ambiguous whether parentheses
are part of the unit symbol or whether they are syntactic tokens. For
instance, “<code>(ka_u)</code>” could mean a nested
“<code>ka_u</code>” (where “<code>k</code>”
could possibly be a prefix), but also the proper symbol
“<code>(ka_u)</code>” that happens to have parentheses as
part of the symbol. &TUCUM; uses parentheses for the usual meaning of
term nesting and uses square brackets where HL7's “ISO+”
assumes parentheses to be part of the unit symbol.
</p>
</comment>
<p name="curly braces">
<verse> The full range of characters 33–126 can be used within a
pair of curly braces (&lquo;<code>{</code>&rquo; and
&lquo;<code>}</code>&rquo;). The material enclosed in curly braces is
called <emph>annotation</emph>.
</verse>
<verse>
Annotations do not contribute to the semantics of the unit but are
meaningless by definition. Therefore, any fully conformant parser must
discard all annotations. Parsers of limited conformance <emph>should</emph>
not value annotations in comparison of units.
</verse>
<verse>
Annotations do, however, signify the end of a unit symbol.
</verse>
<verse> An annotation without a leading symbol implies the default
unit <unit>1</unit> (the unity).
</verse>
<verse>
Curly braces must not be nested.
</verse>
</p>
<comment>
<p>
Curly braces are here because people want annotations and deeply
believe that they need annotations. Especially in chemistry and
biomedical sciences, there are traditional habits to write annotations
at units or instead of units, such as “%vol.”,
“RBC”, “CFU”, “kg(wet tis.)”, or
“mL(total)”. These habits are hard to overcome. Any
attempt of a coding scheme to restrict this perceived expressiveness
will ultimately result in the coding scheme not being adopted, or just
“half-way” adopted (which is as bad as not adopted).
</p>
<p>
Two alternative responses to this reality exist: either give in to the
bad habits and blow up the code with dimensionless and meaningless
unit atoms, or channel this habit so that it does no harm. &TUCUM;
channels this habit using curly braces. Nevertheless, we make continuing
efforts to upgrade doubtful units to genuine units of &TUCUM; by
defining and linking them to the other units as well as
possible. Thus, “<code>g%</code>” is a valid metric unit
atom (so that “<code>mg%</code>” is a valid unit too).
A <emph>drop</emph>, although quite imprecise, is a valid unit of volume
“<code>[drp]</code>”. Even HPF and LPF (the so-called
“high-” and “low power field” in the
microscope) have been defined so that at least they relate to each
other.
</p>
</comment>
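<comment>
<p>
How a parser might discard annotations can be sketched in Python as
follows. The function name <code>strip_annotations</code> is hypothetical,
and the sketch assumes the input already has well-formed, non-nested curly
braces; it does not validate the rest of the expression.
</p>
<p><![CDATA[
```python
import re

# One annotation: an opening brace, any characters except braces, a closing brace.
ANNOTATION = re.compile(r"\{[^{}]*\}")


def strip_annotations(expr):
    """Remove annotations; a bare annotation becomes the unity '1'."""
    def replace(match):
        before = expr[match.start() - 1] if match.start() > 0 else ""
        # No leading unit symbol: the annotation stands for the unit 1.
        return "1" if before in ("", ".", "/", "(") else ""
    return ANNOTATION.sub(replace, expr)
```
]]></p>
<p>
For instance, <code>mL{total}/h</code> reduces to <code>mL/h</code>, while
a bare <code>{RBC}</code> reduces to <code>1</code>.
</p>
</comment>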
</div2>
<div2>
<head>Syntax Rules</head>
<p name="algebraic unit terms">
<verse> All units can be combined in an algebraic term using the
operators for multiplication (period &lquo;<code>.</code>&rquo;) and
division (solidus &lquo;<code>/</code>&rquo;). </verse>
<verse> The multiplication operator is mandatory; it must not be
omitted or replaced by a space. The multiplication operator is a
strict binary operator that may occur only <emph>between two</emph> unit
terms. </verse>
<verse> The division operator can be used as a binary and unary
operator, i.e. a leading solidus will invert the unit that directly
follows it. </verse>
<verse> Terms are evaluated from left to right with the period and the
solidus having the same operator precedence. Multiple division
operators are allowed within one term. </verse>
<smallref>[ISO 1000, 4.5.2; ISO 2955-1983, 3.3f; ANSI X3.50-1986, 3.3f
(Rule No. 2f).]</smallref>
</p>
<comment>
<p>
The use of the period instead of the asterisk
(&lquo;<code>*</code>&rquo;) as a multiplication operator continues a
tradition codified in ISO 1000 and maintained in ISO 2955. Because
floating point numbers may not occur in unit terms the period is not
ambiguous. A period in a unit term has no other meaning than to be the
multiplication operator.
</p>
<p>
Since Resolution 7 of the 9th CGPM in 1948 the myth of ambiguity being
introduced by more than one solidus lives on and is quoted in all
standards concerning the writing of SI units. However, when the strict
left to right rule is followed there is no ambiguity, neither with one
solidus nor with more than one solidus. In human practice, though, we
find the tendency to assign a lower precedence to the solidus which
misleads people to write <v>a</v>/<v>b</v>·<v>c</v> when they
really mean <v>a</v>/(<v>b</v>·<v>c</v>). When this is
rewritten as <v>a</v>/<v>b</v>/<v>c</v> there is actually less
ambiguity than in <v>a</v>/<v>b</v>·<v>c</v>. So the real
source of ambiguity is when a multiplication operator follows a
solidus, not when there is more than one solidus in a term. Hence, we
remove the restriction for only one solidus and introduce parentheses
which may be used to remove any perceived ambiguity.
</p>
</comment>
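<comment>
<p>
The strict left-to-right rule can be illustrated with a small Python
sketch that evaluates terms made only of positive integers, periods, and
solidi. The function name <code>eval_flat_term</code> is hypothetical, and
the sketch assumes well-formed input without parentheses or unit symbols.
</p>
<p><![CDATA[
```python
import re
from fractions import Fraction


def eval_flat_term(term):
    """Evaluate a term of integers joined by '.' and '/', left to right."""
    tokens = re.findall(r"[0-9]+|[./]", term)
    value = Fraction(1)
    op = "."  # implicit leading multiplication; a leading '/' inverts instead
    for token in tokens:
        if token in (".", "/"):
            op = token
        elif op == ".":
            value *= int(token)
        else:
            value /= int(token)
    return value
```
]]></p>
<p>
Here <code>8/2.4</code> evaluates to 16, that is (8/2) &mult; 4, whereas
<code>8/2/4</code> evaluates to 1, which is what a writer who expects the
solidus to bind loosely would have meant by the first form.
</p>
</comment>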
<p name="integer numbers">
<verse> A positive integer number may appear in place of a simple unit
symbol. </verse>
<verse> Only a pure string of decimal digits
(&lquo;<code>0</code>&rquo;–&lquo;<code>9</code>&rquo;)
is interpreted as a number. If after one or more digits there is any
non-digit character found that is valid for unit atoms, all the
characters (including the digits) will be interpreted as a simple unit
symbol.
</verse>
</p>
<comment>
<p>
For example, the string “<code>123</code>” is a positive
integer number while “<code>12a</code>” is a symbol.
</p>
<p>
Note that the period is only used as a multiplication operator, thus
“<code>2.5</code>” means 2 &mult; 5 and is not equal to 5/2.
</p>
</comment>
<p name="exponents">
<verse> Simple units may be raised to a power. The exponent is an
integer number and is written immediately after the unit
term. Negative exponents must be preceded by a minus sign
(&lquo;<code>-</code>&rquo;); positive exponents may be preceded by an
optional plus sign (&lquo;<code>+</code>&rquo;). </verse>
<verse> If the simple unit raised to a power is a combination of a
prefix and a unit atom, both are raised to the power, e.g. “1
<code>cm3</code>” equals “10<sup>-6</sup>
<code>m3</code>” not “10<sup>-2</sup>
<code>m3</code>”.
</verse>
<smallref>[ISO 2955-1983, 3.5f; ANSI X3.50-1986, 3.5f (Rule
No. 4f).]</smallref>
</p>
<comment>
<p>
ISO 2955 and ANSI X3.50 actually do not allow a plus sign leading a
positive exponent. However, if there can be any perceived ambiguities,
an explicit leading plus sign may be of help sometimes. &TUCUM;
therefore allows such plus signs
at exponents. The plus sign on positive exponents can be used to
delimit exponents from integer numbers used as simple units. Thus,
<code>2+10</code> means 2<sup>10</sup> = 1024.
</p>
</comment>
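<comment>
<p>
The rule that an exponent applies to prefix and atom together can be
checked with a short Python sketch. The table
<code>PREFIX_FACTOR</code> is an illustrative excerpt and the function name
<code>prefixed_power_factor</code> is hypothetical.
</p>
<p><![CDATA[
```python
from fractions import Fraction

# Illustrative excerpt of prefix values.
PREFIX_FACTOR = {
    "c": Fraction(1, 100),
    "m": Fraction(1, 1000),
    "k": Fraction(1000),
}


def prefixed_power_factor(prefix, exponent):
    """Factor relating (prefix+atom)^exponent to atom^exponent."""
    # The prefix value itself is raised to the power, not applied once.
    return PREFIX_FACTOR[prefix] ** exponent
```
]]></p>
<p>
For <code>cm3</code> this gives 10<sup>-6</sup>, matching the example
above, and for <code>km-1</code> it gives 10<sup>-3</sup>.
</p>
</comment>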
<p name="nested terms">
<verse> Unit terms with operators may be enclosed in parentheses
(&lquo;<code>(</code>&rquo; and &lquo;<code>)</code>&rquo;) and used
in place of simple units. Normal left-to-right evaluation can be
overridden with parentheses. </verse>
<verse> Parenthesized terms are <emph>not</emph> considered unit atoms
and hence must not be preceded by a prefix. </verse>
</p>
<comment>
<p>
Up until revision 1.9 there was a third clause
“Since a unit term in parentheses can be used in place of
a simple unit, an exponent may follow on a closing parenthesis which
raises the whole term within the parentheses to the power.”
However, this feature was inconsistent with any BNF or other syntax
description ever provided, was never used and seems to have no
relevant use case. For this reason this clause has been stricken.
This is a <emph>tentative</emph> change. Users who have used this
feature in the past, should please comment on this deprecation.
If we receive indication that this feature was used by anyone, we
would undo the deprecation. If no comments are received, the
deprecation continues to take effect.
</p>
</comment>
<exhibit>
<table class="plain" border="0">
<tr valign="top"><td><sym name="sign"/></td><td>::=</td>
<td><lit value="+"/> | <lit value="-"/></td></tr>
<tr valign="top"><td><sym name="digit"/></td><td>::=</td>
<td><lit value="0"/> | <lit value="1"/> | <lit value="2"/> |
<lit value="3"/> | <lit value="4"/> | <lit value="5"/> |
<lit value="6"/> | <lit value="7"/> | <lit value="8"/> |
<lit value="9"/></td></tr>
<tr valign="top"><td><sym name="digits"/></td><td>::=</td>
<td> <sym name="digit"/> <sym name="digits"/>
| <sym name="digit"/></td></tr>
<tr valign="top"><td><sym name="factor"/></td><td>::=</td>
<td><sym name="digits"/></td></tr>
<tr valign="top"><td><sym name="exponent"/></td><td>::=</td>
<td> <sym name="sign"/> <sym name="digits"/>
| <sym name="digits"/></td></tr>
<tr valign="top"><td><sym name="simple-unit"/></td><td>::=</td>
<td><sym name="ATOM-SYMBOL"/><br/>
| <sym name="PREFIX-SYMBOL"/><sym name="ATOM-SYMBOL[metric]"/></td></tr>
<tr valign="top"><td><sym name="annotatable"/></td><td>::=</td>
<td><sym name="simple-unit"/><sym name="exponent"/><br/>
| <sym name="simple-unit"/></td></tr>
<tr valign="top"><td><sym name="component"/></td><td>::=</td>
<td><sym name="annotatable"/><sym name="annotation"/><br/>
| <sym name="annotatable"/><br/>
| <sym name="annotation"/><br/>
| <sym name="factor"/><br/>
| <lit value="("/> <sym name="term"/> <lit value=")"/></td></tr>
<tr valign="top"><td><sym name="term"/></td><td>::=</td>
<td> <sym name="term"/> <lit value="."/> <sym name="component"/><br/>
| <sym name="term"/> <lit value="/"/> <sym name="component"/><br/>
| <sym name="component"/></td></tr>
<tr valign="top"><td><sym name="main-term"/></td><td>::=</td>
<td> <lit value="/"/> <sym name="term"/><br/>
| <sym name="term"/></td></tr>
<tr valign="top"><td><sym name="annotation"/></td><td>::=</td>
<td><lit value="{"/> <sym name="ANNOTATION-STRING"/>
<lit value="}"/></td></tr>
</table>
<caption>
The complete syntax in the Backus-Naur Form.
</caption>
</exhibit>
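<comment>
<p>
A recursive-descent recognizer that follows the productions above
literally can be sketched in Python as follows. It is an illustration
under stated assumptions, not a conformant parser: the names
<code>Recognizer</code> and <code>is_valid_syntax</code> are hypothetical,
and because the sketch has no atom or prefix tables it accepts any symbol
built from permitted characters and reads a trailing run of digits as the
exponent.
</p>
<p><![CDATA[
```python
# Character classes taken from the lexical rules (ASCII 33-126).
PRINTABLE = {chr(c) for c in range(33, 127)}
PLAIN = PRINTABLE - set('"()+-./=[]{}')  # usable outside square brackets
BRACKETED = PRINTABLE - set("[]")        # usable inside square brackets
ANNOTATED = PRINTABLE - set("{}")        # usable inside curly braces
DIGITS = set("0123456789")


class Recognizer:
    def __init__(self, text):
        self.s, self.i = text, 0

    def peek(self):
        return self.s[self.i] if self.i < len(self.s) else ""

    def expect(self, ch):
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at position {self.i}")
        self.i += 1

    def run(self, allowed):
        # Consume characters while they belong to the given set.
        while self.peek() in allowed:
            self.i += 1

    def main_term(self):
        if self.peek() == "/":  # unary leading solidus
            self.i += 1
        self.term()
        if self.i != len(self.s):
            raise ValueError(f"unexpected {self.peek()!r} at position {self.i}")

    def term(self):
        self.component()
        while self.peek() in (".", "/"):
            self.i += 1
            self.component()

    def component(self):
        if self.peek() == "(":
            self.i += 1
            self.term()
            self.expect(")")
        elif self.peek() == "{":
            self.annotation()
        elif self.annotatable() and self.peek() == "{":
            self.annotation()

    def annotation(self):
        self.expect("{")
        self.run(ANNOTATED)
        self.expect("}")

    def annotatable(self):
        """Scan a simple unit with optional exponent; return False for a factor."""
        start, end_of_symbol = self.i, None
        while self.peek() == "[" or self.peek() in PLAIN:
            if self.peek() == "[":
                self.i += 1
                self.run(BRACKETED)
                self.expect("]")
                end_of_symbol = self.i
            else:
                if self.peek() not in DIGITS:
                    end_of_symbol = self.i + 1
                self.i += 1
        if self.i == start:
            raise ValueError(f"expected a unit at position {start}")
        if end_of_symbol is None:
            return False  # digits only: a factor
        if end_of_symbol == self.i and self.peek() in ("+", "-"):
            self.i += 1  # signed exponent
            digits_start = self.i
            self.run(DIGITS)
            if self.i == digits_start:
                raise ValueError(f"expected digits at position {self.i}")
        return True


def is_valid_syntax(text):
    try:
        Recognizer(text).main_term()
        return True
    except ValueError:
        return False
```
]]></p>
<p>
The left-recursive <code>term</code> production is implemented as a loop,
which preserves left-to-right evaluation order. Checking symbols against
the tables of prefixes and atoms would be a separate step after this
purely syntactic pass.
</p>
</comment>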
<figure id="ucum-state-automaton">
<pixmap source="https://raw.githubusercontent.com/ucum-org/ucum/main/assets/images/ucum-state-automaton.gif"/>
<caption>Pushdown-state automaton describing the syntax.</caption>
</figure>
</div2>
<div2>
<head>The Predicate “Metric”</head>
<p id="para-metric" name="metric and non-metric unit atoms">
<verse> Only metric unit atoms may be combined with a prefix.
</verse>
<verse> To be metric or not to be metric is a predicate assigned to
each unit atom where that unit atom is defined.
</verse>
<verse> All base units are metric. No non-metric unit can be part of
the basis.
</verse>
<verse> A unit must be a quantity on a ratio scale in order to be
metric.
</verse>
</p>
<comment>
<p>
The metric predicate accounts for the fact that there are units that
are prefixed and others that are not. This helps to disambiguate the
parsing of simple units into prefix and atom.
</p>
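<p>
As a minimal sketch of this disambiguation, the following Python
fragment lists every reading of a simple unit as an atom or as a
prefix followed by an atom, and accepts a prefix only before a metric
atom. The tables <code>PREFIXES</code> and <code>ATOMS</code> are
tiny subsets chosen for illustration, and <code>readings</code> is a
hypothetical helper, not something defined by &TUCUM;.
</p>
```python
# Illustrative subsets only; the full tables are defined elsewhere.
PREFIXES = {"k": 1e3, "c": 1e-2, "m": 1e-3, "d": 1e-1}
# Maps each unit atom to its metric predicate.
ATOMS = {"m": True, "g": True, "s": True, "cd": True, "min": False, "d": False}


def readings(symbol):
    """List every (prefix, atom) reading of a simple unit that the rules allow."""
    found = []
    if symbol in ATOMS:
        found.append((None, symbol))
    for prefix in PREFIXES:
        atom = symbol[len(prefix):]
        # Only metric unit atoms may be combined with a prefix.
        if symbol.startswith(prefix) and ATOMS.get(atom, False):
            found.append((prefix, atom))
    return found
```
<p>
With these tables, <code>cd</code> has only the reading as the atom
candela, because the day is non-metric and so cannot carry the prefix
centi. For the same reason <code>md</code> has no reading at all.
</p>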
<p>
To determine whether a given unit atom is metric or not is not
trivial. It is a cultural phenomenon, subject to change, just like
language, the meaning of words and how words can be used. At one time
we can clearly tell right or wrong usage of words, but these
decisions may need to be revised with the passage of time.
</p>
<p>
Generally, metric units are those defined “in the spirit”
of the metric system, which emerged in France in the 18th century and
was rapidly adopted by scientists. Metric units are usually based on
reproducible natural phenomena and are usually not part of a system of
comparable units with different magnitudes, especially not if the
ratios of these units are not powers of 10. Instead, metric units use
multiplier prefixes that magnify or diminish the value of the unit
by powers of ten.
</p>
<p>
Conversely, customary units are in the spirit of the Middle Ages, as
most of them can be traced back to a time around the 10th century;
some are even older, dating from the Roman and Babylonian empires. Most
customary units are based on the average size of human anatomical or
botanic structures (e.g., foot, ell, fathom, grain, rod) and come in
series of comparable units with ratios 1/2, 1/4, 1/12, 1/16, and
others. Thus all customary units are non-metric.
</p>
<p>
Not all units from ISO 1000 are metric: degree, minute, and second of
plane angle are non-metric, as are minute, hour, day, month, and
year. The second is a metric unit because it is part of the SI
basis, although it used to be part of a series of customary units
(originating in the Babylonian era).
</p>
<p>
Furthermore, for a unit to be metric it must be a quantity on a ratio
scale where multiplication and division with scalars are defined. The
<emph>Comité Consultatif d'Unités</emph> (CCU) decided
in February 1995 that SI prefixes may be used with the degree
Celsius. This statement had not been made explicitly before. It is
an unfortunate decision because difference-scale units like the degree
Celsius have no multiplication operation by which the prefix value
could be multiplied with the unit. Instead, the prefix on a non-ratio
unit scales the measurement value. One dekameter is 10 times a
meter, but “10 times 1 °C” has no meaning, in the
same way that 30 °C is not 3 times as much as
10 °C. See <pref from="para-special"/> on how &TUCUM; finds a
way to accommodate this different use of prefixes with units such as the
degree Celsius, bel or neper.
</p>
</comment>
</div2>
<div2>
<head>Style</head>
<comment>
<p>
Except for the rule on curly braces (<pref ref="para-curly"/>), the
rules on style govern the creation of the tables of unit atoms, not
their individual use. Users of &TUCUM; need not care about style rules
(<pref from="para-underscore" to="para-apostrophe"/>) because users
just use the symbols defined in the tables. Hence, style rules do not
affect conformance to &TUCUM;. New submissions of unit atoms, however,
must conform to the style rules.
</p>
</comment>
<p id="para-curly" name="curly braces">
<verse>
Curly braces may be used to enclose annotations that are often written
in place of units or behind units but that do not have a proper
meaning of a unit and do not change the meaning of a unit.
</verse>
<verse>
Annotations have no semantic value.
</verse>
</p>
<comment>
<p>
For example one can write “<code>%{vol}</code>”,
“<code>kg{total}</code>”, or “<code>{RBC}</code>”
(for “red blood cells”) as pseudo-units. However, these
annotations do not have any effect on the semantics, which is why
these example expressions are equivalent to
“<code>%</code>”, “<code>kg</code>”, and
“<code>1</code>” respectively.
</p>
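<p>
The equivalences in this example can be sketched in Python as
follows. The function <code>strip_annotations</code> is a
hypothetical helper written for this illustration. It assumes that an
annotation never contains curly braces, and it replaces an annotation
that stands alone as a component by “<code>1</code>”.
</p>
```python
import re

# Assumed shape of an annotation: curly braces around text without curly braces.
ANNOTATION = re.compile(r"\{[^{}]*\}")


def strip_annotations(expr):
    """Return expr without annotations; a bare annotation becomes "1"."""
    def replace(match):
        before = expr[:match.start()]
        # An annotation that is a component by itself stands for the unity.
        if before == "" or before[-1] in "./(":
            return "1"
        # An annotation behind a unit is simply dropped.
        return ""
    return ANNOTATION.sub(replace, expr)
```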
</comment>
<p id="para-underscore" name="underscore">
<verse> When in print a unit would have a subscript, an underscore
(&lquo;<code>_</code>&rquo;) is used to separate the subscript from
the stem of the unit symbol. </verse>
<verse>
The subscript is part of the unit atom.
</verse>
<verse> Subscripts are used to disambiguate units with the
same name but different meanings.</verse>
</p>
<comment>
<p>
For example when distinguishing the International Table calorie from
the thermochemical calorie, we would use 1 cal<sub>IT</sub> or
1 cal<sub>th</sub> in print. &TUCUM; defines the symbols
“<code>cal_IT</code>” and
“<code>cal_th</code>” with the underscore signifying that
“IT” and “th” are subscripts. Other examples
are the distinction of the Julian and Gregorian calendar years
from the tropical year, and of the British imperial gallon from the U.S.
gallon (see <pref ref="para-other"/> and <pref from="para-us-volumes"/>).
</p>
</comment>
<p name="square brackets">
<verse> Square brackets enclose suffixes of unit symbols that change
the meaning of a unit stem.
</verse>
<verse> All customary units shall be enclosed completely by square
brackets.
</verse>
<verse>
Other unit atoms shall be enclosed in square brackets if they are very
rare, if they will conflict with other units, or if they are normally
not used as a unit symbol but do have a proper meaning as a unit in
&TUCUM;.
</verse>
<verse>
Square brackets are part of the unit atom.
</verse>
</p>
<comment>
<p>
For example 1 m H<sub>2</sub>O is written as
“<code>m[H2O]</code>” in &TUCUM; because the suffix
H<sub>2</sub>O changes the meaning of the unit atom for meter (length)
to a unit of pressure.
</p>
<p>
Customary units are defined in &TUCUM; in order to accommodate
practical needs. However, metric units are still preferred and the
customary symbols should not interfere with metric symbols in any
way. Thus, customary units are “stigmatized” by enclosing
them in square brackets.
</p>
<p>
If unit symbols for the purpose of display and print are derived from
&TUCUM; units, the square brackets can be removed. However, display
units are out of scope of &TUCUM;.
</p>
</comment>
<p id="para-apostrophe" name="apostrophe">
<verse>
The apostrophe (&lquo;<code>'</code>&rquo;) is used to separate words
or abbreviated words in a multi-word unit symbol.
</verse>
<verse>
Since units are mathematically defined symbols and not abbreviations
of words, multi-word unit symbols should be defined only to reflect
existing habits, not in order to create new ones.
</verse>
<verse>
Multi-word units should always be enclosed in square brackets.
</verse>
</p>
<comment>
<p>
For example, the legacy units called “Bodansky unit” and
“Todd unit” have the unit symbols
“<code>[bdsk'U]</code>” and
“<code>[todd'U]</code>”, respectively.
</p>
</comment>
</div2>
</div1>
<div1>
<head>Semantics</head>
<p name="preliminaries">
<verse> The semantics of &TUCUM; is defined by the algebraic
operations of multiplication, division and exponentiation between
units, by the equivalence relations of equality and commensurability
of units, and by the multiplication of a unit with a scalar.
</verse>
<verse> Every expression in &TUCUM; is mapped to one and only one
semantic element. But every semantic element may have more than one
valid representation in &TUCUM;.
</verse>
<verse>
The set of expressions in &TUCUM; is infinite.
</verse>
</p>
<p name="equality and commensurability">
<verse> The set of expressions in &TUCUM; has two binary, symmetric,
reflexive, and transitive relations (equivalence relations)
“equals” = and “is commensurable with”
∼. All expressions that are equal are also commensurable but not
all commensurable expressions are equal.</verse>
</p>
<p name="algebra of units">
<verse> The equivalence classes generated by the equality relation =
are called <emph>units</emph>.
</verse>
<verse> The set of units <v>U</v> has a binary multiplication operator
· that is associative and commutative and has the neutral
element <un>1</un> (called <emph>the unity</emph>). For each unit
<un>u</un> &element; <v>U</v> there is an inverse unit
<un>u</un><sup>-1</sup> such that <un>u</un> ·
<un>u</un><sup>-1</sup> = <un>1</un>. Thus, (<v>U</v>, ·) is
an Abelian group.
</verse>
<verse> The division operation <un>u</un> / <un>v</un> is defined as
<un>u</un> · <un>v</un><sup>-1</sup>. </verse>
<verse> The exponentiation operation with integer exponents <v>n</v>
is defined as <un>u</un><sup><v>n</v></sup> = <prod from="1"
to="n"><un>u</un></prod>.
</verse>
<verse>
The product <un>u</un>' = <v>r</v> <un>u</un> of a real number scalar
<!-- <v>r</v> &element; \Re --> with the unit <un>u</un> is also a
unit, where <un>u</un>' ∼ <un>u</un>.
</verse>
</p>
<p id="para-dimension" name="dimension and magnitude">
<verse> The equivalence classes generated by the commensurability
relation ∼ are called <emph>dimensions</emph>. The set <v>D</v>
of dimensions is infinite in principle, but only a finite subset of
dimensions are used in practice. Thus, implementations of &TUCUM; need
not be able to represent the infinite set of dimensions.
</verse>
<verse>
Two commensurable units that are not equal differ only by their
magnitude.
</verse>
<verse> The quotient <un>u</un> / <un>v</un> of any two commensurable
units <un>u</un> ∼ <un>v</un> is of the same dimension as the
unity (<un>u</un> / <un>v</un> ∼ <un>1</un>). This quotient is
also equal to the unity multiplied with a scalar <v>r</v> <!--
&element; \Re -->: <un>u</un> / <un>v</un> = <v>r</v> <un>1</un>,
where <v>r</v> is called the <emph>relative magnitude</emph> of
<un>u</un> regarding <un>v</un>.</verse>
</p>
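<comment>
<p>
Commensurability and relative magnitude can be sketched in Python as
follows, assuming that a unit is stored as a magnitude together with
a vector of integer exponents over a fixed basis. The class
<code>Unit</code>, the two-element basis (meter, second), and the use
of exact rational magnitudes are assumptions made for this
illustration only; real scalars are not all rational.
</p>
```python
from fractions import Fraction


class Unit:
    """A unit as a magnitude and a tuple of integer exponents over an assumed basis."""

    def __init__(self, magnitude, dimension):
        self.magnitude = Fraction(magnitude)
        self.dimension = tuple(dimension)

    def __mul__(self, other):
        exponents = tuple(a + b for a, b in zip(self.dimension, other.dimension))
        return Unit(self.magnitude * other.magnitude, exponents)

    def inverse(self):
        return Unit(1 / self.magnitude, tuple(-a for a in self.dimension))

    def __truediv__(self, other):
        # Division is multiplication with the inverse unit.
        return self * other.inverse()

    def commensurable(self, other):
        # Commensurable units have the same dimension.
        return self.dimension == other.dimension

    def __eq__(self, other):
        # Equal units are commensurable and have the same magnitude.
        return self.commensurable(other) and self.magnitude == other.magnitude


# Assumed basis for this sketch: (meter, second).
UNITY = Unit(1, (0, 0))
METER = Unit(1, (1, 0))
KILOMETER = Unit(1000, (1, 0))
SECOND = Unit(1, (0, 1))

# The quotient of two commensurable units is commensurable with the unity;
# its magnitude is the relative magnitude of the kilometer regarding the meter.
quotient = KILOMETER / METER
```
</comment>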
<p name="base units">
<verse>
Any system of units is constructed from a finite set <sys>B</sys> of
mutually independent base units <sys>B</sys> = {
<un>b</un><sub>1</sub>, <un>b</un><sub>2</sub>, ...,
<un>b</un><sub><v>n</v></sub> }, on which any other unit <un>u</un>
&element; <v>U</v> is defined as <un>u</un> = <v>r</v><sub>1</sub>
<un>b</un><sub>1</sub><sup><v>u</v><sub>1</sub></sup> ·
<v>r</v><sub>2</sub>
<un>b</un><sub>2</sub><sup><v>u</v><sub>2</sub></sup> ·
... · <v>r</v><sub><v>n</v></sub>
<un>b</un><sub><v>n</v></sub><sup><v>u</v><sub><v>n</v></sub></sup>,
where <v>r</v> = <v>r</v><sub>1</sub> · <v>r</v><sub>2</sub>
· ... · <v>r</v><sub><v>n</v></sub> is called the
<emph>magnitude</emph> of the unit <un>u</un> regarding <sys>B</sys>.
</verse>
<verse>
With respect to a basis <sys>B</sys> every unit can thus be
represented as a pair (<v>r</v>, <vec>u</vec>) of magnitude <v>r</v>
<!-- &element; \Re --> and dimension <vec>u</vec> =
(<v>u</v><sub>1</sub>, <v>u</v><sub>2</sub>, ...,
<v>u</v><sub><v>n</v></sub>).
</verse>