Last update: July 30, 2008
Overview of an Open Source/GPL'ed Mumps Interpreter / Compiler
and
MultiDimensional and Hierarchical Toolkit (MDH)
for Linux, Cygwin and Windows
Mumps (also referred to as M) is a general purpose programming language that supports a
unique, hierarchical (or multidimensional) database facility. It was originally developed in the late 1960s and the acronym
stands for the Massachusetts General Hospital Utility Multi-programming System. It was (and is) widely
used in clinical computing. Its original purpose was to store tree structured medical records.
Over the years a number of commercial versions were developed. Most of these, however,
are now extinct, have merged, or have evolved into forms considerably different from the original.
The version described here is an open source, GPL licensed implementation begun in the
early 1980s. It too has undergone considerable evolution, and Mumps/II is the current result.
The main motivation for its development was to implement tools for information storage
and retrieval, text processing and bioinformatics.
(for example, see Online I&SR Notes)
This implementation is written entirely in C/C++ and compiles under Linux, Cygwin and Windows (some
features are omitted from the Windows version due to differences in the MS VC++ compiler
versus gcc/g++).
The implementation consists of two parts: a compiler which translates Mumps to C++ and then to binaries and
an interpreter scripting shell which executes source code directly.
The hierarchical or multidimensional database is perhaps the most interesting feature of Mumps.
It permits the construction of arbitrary trees by means of string-indexed
array references. Data may be stored at any node and there are functions to
sequentially access siblings and children.
For example, the NLM MeSH codes,
a hierarchy of terminology used in the health sciences,
consists of text such as the following excerpt:
Cardiovascular System;A07
Blood Vessels;A07.231
Arteries;A07.231.114
Aorta;A07.231.114.056
Aorta, Abdominal;A07.231.114.056.205
Aorta, Thoracic;A07.231.114.056.372
Sinus of Valsalva;A07.231.114.056.847
Arterioles;A07.231.114.060
Axillary Artery;A07.231.114.085
Basilar Artery;A07.231.114.106
Brachial Artery;A07.231.114.139
Brachiocephalic Trunk;A07.231.114.145
Bronchial Arteries;A07.231.114.158
Carotid Arteries;A07.231.114.186
Carotid Artery, Common;A07.231.114.186.200
Carotid Artery, External;A07.231.114.186.200.210
Carotid Artery, Internal;A07.231.114.186.200.230
Carotid Sinus;A07.231.114.186.456
Celiac Artery;A07.231.114.207
Cerebral Arteries;A07.231.114.228
Anterior Cerebral Artery;A07.231.114.228.100
Circle of Willis;A07.231.114.228.351
Middle Cerebral Artery;A07.231.114.228.550
Posterior Cerebral Artery;A07.231.114.228.700
Temporal Arteries;A07.231.114.228.868
where the text on each line before the semicolon
is the name of the item being described and the code following
the semicolon is the hierarchical code assigned to that item.
In Mumps, this can be represented as:
[Figure: Mumps global array representation of the MeSH excerpt]
In the above, a sparse disk resident global array named ^mesh()
is created to store the items being described. Each array reference
consists of a sequence of codes from the MeSH hierarchy and the text of the
item described is stored at the final leaf node.
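The idea of a sparse array indexed by a sequence of codes can be sketched outside Mumps. The following is a minimal Python model, not Mumps code and not the MDH API: a dictionary keyed by tuples of code segments stands in for the global array, and the names `build_mesh` and `children` are hypothetical. It assumes each MeSH code is split at the dots, so that a reference such as ^mesh("A07","231","114") would hold "Arteries".

```python
# Illustrative model of a sparse, string-indexed hierarchical array.
# Plain Python, not Mumps or the MDH API; all names are hypothetical.

EXCERPT = """\
Cardiovascular System;A07
Blood Vessels;A07.231
Arteries;A07.231.114
Aorta;A07.231.114.056
Aorta, Abdominal;A07.231.114.056.205
Aorta, Thoracic;A07.231.114.056.372
Arterioles;A07.231.114.060
"""

def build_mesh(text):
    """Map each code path, e.g. ("A07", "231"), to the item's name."""
    mesh = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the last semicolon: names may contain commas but the
        # code is always the final field.
        name, code = line.rsplit(";", 1)
        mesh[tuple(code.split("."))] = name
    return mesh

def children(mesh, parent):
    """Return the sorted paths exactly one level below `parent`."""
    depth = len(parent)
    return sorted(
        key for key in mesh
        if len(key) == depth + 1 and key[:depth] == parent
    )

mesh = build_mesh(EXCERPT)
for path in children(mesh, ("A07", "231", "114")):
    print(".".join(path), mesh[path])
```

Only nodes that were assigned exist in the dictionary, which is the sense in which the array is sparse. Walking the sorted result of `children` plays the role of the sibling and child access functions described earlier.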
For further examples using the MeSH codes,
click here.
The upper limit on global arrays in this version is 256 TB.
To legacy Mumps, Mumps/II adds the following features:
- Relational database access. Mumps/II interoperates with PostgreSQL, a widely used,
free, open source (Berkeley license) RDBMS. Mumps/II can access PostgreSQL databases
and can also store the Mumps/II hierarchical and multidimensional file system in PostgreSQL tables.
- Advanced text processing support. Mumps/II adds many functions to the legacy
Mumps base, including Smith-Waterman sequence alignment, access to the Perl
Compatible Regular Expression Library, the Cosine, Jaccard, and Dice similarity coefficients,
and a number of matrix manipulation routines.
- Shell scripting. Mumps/II has facilities to interact fully with the underlying operating
system through shell scripts. These permit a full range of system functions to be directly
executed from the Mumps/II environment.
- Translation to and compatibility with C++. The Mumps/II compiler translates Mumps/II programs
to standard C++. Thus, Mumps/II programs can call upon the complete resources of the C++ runtime
environment. Mumps/II programs may contain embedded C++ statements and there is a C++ class hierarchy to give user written C++ programs access to all Mumps/II facilities.
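As an illustration of the similarity coefficients named in the list above, here is a short Python sketch using the standard set-based definitions, with terms treated as present or absent. It does not show the Mumps/II functions themselves, whose names and signatures are not given here; the function names below are hypothetical.

```python
import math

def jaccard(a, b):
    """Jaccard coefficient: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dice(a, b):
    """Dice coefficient: 2 * |A & B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def cosine(a, b):
    """Cosine coefficient for binary term vectors: |A & B| / sqrt(|A| * |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

# Two short term lists sharing two of their three terms.
doc1 = "aorta abdominal artery".split()
doc2 = "aorta thoracic artery".split()
print(jaccard(doc1, doc2), dice(doc1, doc2), cosine(doc1, doc2))
```

All three coefficients range from 0 (no terms in common) to 1 (identical term sets). Weighted variants of these measures, for example using term frequencies, are common in information retrieval but are not shown here.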
Mumps is essentially an interpreted language because Mumps programs can construct
code and execute it at run time.
The MDH (Multi-Dimensional and Hierarchical Data Base Toolkit) is a Linux/Cygwin based, open
source toolkit of portable software that makes many features of Mumps
available to C++ programs. It supports very fast, flexible, multi-dimensional
and hierarchical storage, retrieval and manipulation of databases ranging in size up
to 256 terabytes. The package is written in C and C++ and is available under the GNU
GPL/LGPL licenses in source code form.
You must install the Mumps Compiler in order to use the MDH.
Documentation and Installation
See the MDH manual for Multi-Dimensional and Hierarchical Toolkit C++ Library details.
See the Mumps/II Interpreter/Compiler manual for Interpreter/Compiler details. A file in the
distribution named mumpsc/doc/compiler.html has additional details.
(--> author makes shameless profit from sales of book <--).
Unpack the distribution with a command such as:
tar xvzf mumpscompiler-11.0.src.tar.gz
which will create a subdirectory named mumpsc.
Then, in mumpsc, as root:
configure prefix=/usr
make
make install
Use:
configure prefix=/usr --with-cpu64
if you have a 64 bit CPU and are using Linux.
See the PDF for advice if your system hides its libraries somewhere odd.
You may need to install additional
software on your Linux or Cygwin system. Please see the documentation for details.
Mumps needs:
- The Perl Compatible Regular Expression Development Library (required)
- The PostgreSQL RDBMS (optional)
If you use this work, please cite:
O'Kane, Kevin C. (1999), "An M Compiler for Internet server applications", M Computing, 7(1):11-17.
and/or:
http://www.cs.uni.edu/~okane
License
The Mumps Compiler is distributed under the GNU GPL and GNU LGPL licenses.
Please see each source module to determine which license applies.
Generally speaking, the compiler itself is distributed under the
GNU GPL license and the runtime libraries under the GNU LGPL.
Copies of the licenses are included in the distributions along
with copyright information.
The PCRE code is distributed under its own license.
Kevin C. O'Kane
http://www.cs.uni.edu/~okane