UMath2LaTeX: Parsing UnicodeMath and converting to LaTeX
All UnicodeMath expressions in the box below are converted when you click the "Parse" button.
Feel free to edit the expressions.
Typing Unicode characters in a browser can be awkward, so you can instead type a LaTeX command followed by a space.
For each expression, the page shows the typeset output together with the raw input, its LaTeX translation, and a prefix form that can be used to check the term tree.
Some remarks about the parser:
The top-down parsing algorithm is due to Vaughan Pratt (1973).
The implementation is based on Douglas Crockford's article "Top Down Operator Precedence", where he uses the algorithm to parse Simplified JavaScript.
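To make Pratt's idea concrete, here is a minimal Python sketch of top-down operator precedence parsing for a tiny arithmetic fragment. It is an illustration, not the author's implementation. As in Crockford's article, each token has a null denotation (`nud`, used when the token begins an expression) and a left denotation (`led`, used when it follows a left operand), and numeric binding powers decide how far an operator's right operand extends. The token set, the binding-power values and the tuple output format are assumptions chosen for the example.

```python
import re

# Hypothetical binding powers: larger numbers bind more tightly.
BINDING_POWER = {"=": 10, "+": 20, "-": 20, "*": 30, "/": 30}

def tokenize(text):
    """Split text into numbers, single letters, operators and parentheses."""
    return re.findall(r"\d+|[A-Za-z]|[=+\-*/()]", text)

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def nud(self, tok):
        """Null denotation: how a token starts an expression."""
        if tok == "(":
            expr = self.expression(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return expr
        if tok == "-":  # prefix minus binds tighter than infix + and -
            return ("-", self.expression(25))
        if tok is None or tok in BINDING_POWER or tok == ")":
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok  # operand (letter or number)

    def led(self, tok, left):
        """Left denotation: how an infix token continues an expression.
        Passing the operator's own power makes it left-associative."""
        return (tok, left, self.expression(BINDING_POWER[tok]))

    def expression(self, rbp):
        left = self.nud(self.advance())
        # Keep extending `left` while the next operator binds more tightly than rbp.
        while self.peek() in BINDING_POWER and BINDING_POWER[self.peek()] > rbp:
            left = self.led(self.advance(), left)
        return left

def parse(text):
    p = Parser(text)
    tree = p.expression(0)
    if p.peek() is not None:
        raise SyntaxError(f"unexpected token {p.peek()!r}")
    return tree
```

For example, `parse("a+b*c")` yields `("+", "a", ("*", "b", "c"))`. In this sketch, passing a slightly smaller value (binding power minus one) in `led` would make an operator right-associative instead.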
The current input language is UnicodeMath, which is based on standard conventions for mathematical symbols and operations. The types for symbols and expressions are:
- variable symbols: single Latin letters ($u,v,w,x,y,z$) with optional subscripts ($u_0,u_1,\dots,z_0,z_1,\dots$)
- constant symbols: the default type for all symbols not assigned to another type ($a,b,\dots,\alpha,\beta,\dots,\emptyset,\infty,0,1,\dots$)
- function symbols: prefix, infix, postfix or aroundfix (enclosing their argument), with standard precedence ($+,-,\cdot,/,\cup,\cap,\sqrt{\phantom{x}},\ln,\sin,| |, \dots$)
- terms: built from function symbols applied to constants and terms
- relation symbols: prefix or infix, with lower precedence than function symbols ($∈,=,≤,<,≥,>,…$)
- atomic formulas: built from relation symbols applied to terms
- logical symbols: prefix or infix, with lower precedence than relation symbols ($\neg$, or, and, $\implies,\iff,\exists,\forall,\dots$)
- formulas: built from logical symbols applied to atomic formulas and formulas
- metalogical symbols and large operator symbols (osym): these combine mathematical expressions of several of the above types ($\vdash,\models,\bigvee,\bigwedge,\bigcup,\bigcap,\sum,\prod,\lim,\int,\dots$)
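The type categories above could be encoded roughly as follows. This is a minimal Python sketch, assuming a small hypothetical symbol table (only a few Unicode characters from each category) and illustrative numeric precedence levels; anything not recognized falls back to the constant type, matching the default described above.

```python
import re

# Hypothetical subset of the symbol table (not the page's actual table).
FUNCTION_SYMBOLS = {"+", "-", "⋅", "/", "∪", "∩", "√", "ln", "sin"}
RELATION_SYMBOLS = {"∈", "=", "≤", "<", "≥", ">"}
LOGICAL_SYMBOLS = {"¬", "or", "and", "⟹", "⟺", "∃", "∀"}
OSYM_SYMBOLS = {"⊢", "⊨", "⋁", "⋀", "⋃", "⋂", "∑", "∏", "lim", "∫"}

# Variables: one of the letters u..z, optionally followed by a numeric subscript.
VARIABLE = re.compile(r"[u-z](_\d+)?")

def symbol_type(sym: str) -> str:
    if VARIABLE.fullmatch(sym):
        return "variable"
    if sym in FUNCTION_SYMBOLS:
        return "function"
    if sym in RELATION_SYMBOLS:
        return "relation"
    if sym in LOGICAL_SYMBOLS:
        return "logical"
    if sym in OSYM_SYMBOLS:
        return "osym"
    return "constant"  # default type for every other symbol

# Illustrative precedence levels: function > relation > logical.
PRECEDENCE = {"logical": 1, "relation": 2, "function": 3}

def binds_tighter(a: str, b: str) -> bool:
    """True if symbol a binds more tightly than symbol b.
    Only defined for function, relation and logical symbols (KeyError otherwise)."""
    return PRECEDENCE[symbol_type(a)] > PRECEDENCE[symbol_type(b)]
```

For instance, `symbol_type("z_1")` is `"variable"`, `symbol_type("a")` is `"constant"`, and `binds_tighter("+", "=")` is `True`.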
Standard mathematical notation makes liberal use of invisible times and of function application by juxtaposition (i.e., two symbols are adjacent, the one on the left is neither prefix nor infix, and the one on the right is neither infix nor postfix). The parser treats this situation as a binary operation named \, (the LaTeX thin space).
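A minimal sketch of this juxtaposition rule, assuming a hypothetical fixity table and an already tokenized input: the invisible operator `\,` is inserted between two adjacent tokens exactly when the left token is neither prefix nor infix and the right token is neither infix nor postfix. For this example only, an opening parenthesis is treated like a prefix symbol and a closing parenthesis like a postfix symbol, so that `f(x)` becomes a function application while `(x` gets no operator.

```python
# Hypothetical fixity table; tokens not listed are ordinary operands.
PREFIX = {"√", "sin", "ln", "¬"}
INFIX = {"+", "⋅", "=", "<", "∈"}
POSTFIX = {"!"}

# For this sketch, "(" behaves like a prefix symbol and ")" like a postfix one.
NO_JUX_LEFT = PREFIX | INFIX | {"("}
NO_JUX_RIGHT = INFIX | POSTFIX | {")"}

THIN_SPACE = "\\,"  # name of the invisible juxtaposition operator

def insert_juxtaposition(tokens):
    """Insert the invisible operator between adjacent tokens a, b whenever
    a is neither prefix nor infix and b is neither infix nor postfix."""
    out = []
    for i, tok in enumerate(tokens):
        if i > 0 and tokens[i - 1] not in NO_JUX_LEFT and tok not in NO_JUX_RIGHT:
            out.append(THIN_SPACE)
        out.append(tok)
    return out
```

In this sketch, `["2", "x", "y"]` becomes `["2", "\\,", "x", "\\,", "y"]`, while `["sin", "x"]` is left unchanged because `sin` is prefix. After this step, `\,` can be handled like any other infix symbol.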
Infix symbols that are usually considered associative ($+,\cdot,=$) have variable arity and chain over a list of arguments (rather than producing left- or right-associated parse trees).
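One way to obtain such list-valued arguments is sketched here in Python, under the assumption that the parser first builds ordinary binary tuples `(op, left, right)`: a post-pass merges nested nodes of the same associative operator into one node. This post-pass is a swapped-in technique for illustration; a Pratt parser could equally append to the argument list directly inside its `led` function.

```python
# Symbols treated as associative, following the examples in the text.
ASSOCIATIVE = {"+", "⋅", "="}

def flatten(tree):
    """Merge nested binary nodes of the same associative operator into
    one node (op, [arg1, arg2, ...]); other nodes keep their shape."""
    if not isinstance(tree, tuple):
        return tree  # leaf symbol
    op, *args = tree
    args = [flatten(a) for a in args]
    if op in ASSOCIATIVE and len(args) == 2:
        chained = []
        for a in args:
            # A child already flattened into (op, [...]) with the same op is spliced in.
            if isinstance(a, tuple) and a[0] == op and isinstance(a[1], list):
                chained.extend(a[1])
            else:
                chained.append(a)
        return (op, chained)
    return (op, *args)
```

For example, `flatten(("+", ("+", "a", "b"), "c"))` gives `("+", ["a", "b", "c"])`, while a non-associative `-` keeps its nested binary shape.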
Many symbols are overloaded, but context is used to disambiguate them. Each symbol has a default type, but its type can be changed dynamically.
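The following Python sketch illustrates both ideas under stated assumptions: a hypothetical `SymbolTable` holds each symbol's default type and lets it be changed at run time, and the overloaded `-` is resolved from context (prefix at the start, after `(`, or after a prefix or infix symbol; infix otherwise). The type names and the disambiguation rule are illustrative, not taken from the UMath2LaTeX source.

```python
# Hypothetical default types; any symbol not listed defaults to "constant".
DEFAULT_TYPES = {"+": "infix", "-": "infix", "⋅": "infix", "=": "infix",
                 "sin": "prefix", "!": "postfix"}

class SymbolTable:
    def __init__(self):
        # Copy the defaults so that dynamic changes do not alter them.
        self.types = dict(DEFAULT_TYPES)

    def set_type(self, sym, new_type):
        """Change the type of a symbol at run time (e.g. declare f a prefix function)."""
        self.types[sym] = new_type

    def type_of(self, sym):
        return self.types.get(sym, "constant")

    def type_in_context(self, tokens, i):
        """Return the effective type of tokens[i], resolving the overloaded '-'."""
        sym = tokens[i]
        if sym == "-":
            prev = tokens[i - 1] if i > 0 else None
            # Prefix (negation) at the start, after '(' or after a prefix/infix symbol.
            if prev is None or prev == "(" or self.type_of(prev) in ("prefix", "infix"):
                return "prefix"
        return self.type_of(sym)
```

With this table, `-` in `a - x` is infix, but in `a + - x` it is prefix; after `set_type("f", "prefix")`, the symbol `f` is no longer treated as a constant.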
Each node of the abstract syntax tree contains the symbol (sym: string) and up to three arguments (arg, arg2, arg3), each of which is either a tree or a list of trees.
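A minimal Python rendering of such a node is sketched below, assuming a dataclass with the field names given above and a hypothetical `prefix_form` printer. The printer writes each node as `sym(args)` and list arguments in square brackets, similar in spirit to the prefix form used to check the term tree; the exact output format of the page may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class Node:
    """Hypothetical AST node: a symbol plus up to three arguments,
    each either a tree or a list of trees."""
    sym: str
    arg: "Optional[Union[Node, List[Node]]]" = None
    arg2: "Optional[Union[Node, List[Node]]]" = None
    arg3: "Optional[Union[Node, List[Node]]]" = None

def prefix_form(t: "Union[Node, List[Node]]") -> str:
    """Render a tree as sym(arg, ...); list arguments are shown in square
    brackets and missing (None) arguments are skipped."""
    if isinstance(t, list):
        return "[" + ", ".join(prefix_form(x) for x in t) + "]"
    args = [a for a in (t.arg, t.arg2, t.arg3) if a is not None]
    if not args:
        return t.sym  # leaf: a variable or constant symbol
    return t.sym + "(" + ", ".join(prefix_form(a) for a in args) + ")"
```

For example, the chained sum $x+y+1$ could be stored as `Node("+", [Node("x"), Node("y"), Node("1")])`, which this sketch prints as `+([x, y, 1])`.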
Peter Jipsen --- January 2019 --- Chapman University