A Mumps Compiler for Internet Server Applications
Department of Computer Science
University of Northern Iowa
Cedar Falls, IA 50614-0507
okane@cs.uni.edu
Abstract
This paper presents the development of a compiler that translates Mumps to C code that is subsequently compiled to binary executables on a host system (e.g., Windows 98/NT, Linux, or Solaris). During compilation, the Mumps routines are linked to library routines that provide b-tree based global array access and the usual Mumps functions. ODBC/SQL support is also included, which permits Mumps programs to load and store the global arrays from and to any ODBC/SQL compliant server. The compiler also supports an inline HTML scripting facility for use with web servers: HTML statements may be interspersed with ordinary Mumps code and may contain Mumps expressions that are evaluated, substituted into the HTML, and written back to the invoking web server. Built-in routines automatically extract from the server environment the parameters passed from browsers in HTML forms and instantiate these values in the Mumps environment. Inline C/C++ code fragments, including objects, may also be used to enhance functionality.
Introduction
Mumps (also referred to as "M") is a general purpose programming language that supports a native hierarchical data base facility. It is supported by a large user community (mainly biomedical) and a diversified installed application software base. The language originated in the mid-60's at the Massachusetts General Hospital [1, 2] and it became widely used in both clinical and commercial settings. Many implementations exist for the language [3] and it is available on many computer platforms. There are ANSI, ISO (ISO/IEC 11756:1992) and DOD approved standards [4, 5, 6] for Mumps.
As originally conceived, Mumps differed from other mini-computer based languages of the late 1960's by providing: (1) an easily manipulated hierarchical data base that was well suited to representing medical records; (2) flexible string handling support; and (3) multiple concurrent tasks in limited memory on very small machines. Syntactically, Mumps is based on an earlier language named JOSS and has an appearance that is similar to early versions of Basic that were also based on JOSS.
Initially, the structure of the language was limited by the constraints imposed by the primitive minicomputers and operating systems on which it was originally implemented. An early design goal for Mumps was to provide multi-user, time shared access despite the very limited memory and mainly single user operating systems available at the time. Consequently, early implementations of Mumps were mainly standalone, interpreter based, dedicated operating systems. In these implementations, each user was assigned a very small region of memory for both code and local data. Source code was loaded from external storage and stored in memory in source form. Since Mumps programs were mainly interactive and dominantly data base access dependent, direct interpretation of source code did not introduce serious performance penalties. Typically, program partitions were less than 4,000 bytes, including all data, stacks, code and buffers, thus allowing multiple time-shared partitions on even the smallest machines. Because of memory limitations, applications usually consisted of many small, task-specific programs that were modular, compact, highly abbreviated, concise and focused on a limited objective. Program modules then, as now, were loaded frequently from a library. Applications typically consisted of a tree-like hierarchy of program modules that often corresponded to the structure of the underlying database in an early form of object-oriented programming. An excellent example of an early system structured this way can be found in the COSTAR System [7], which employed well over a thousand tightly coupled and encapsulated separate code modules to service an ambulatory patient record data base. A more recent example is the widely used Veterans Administration Decentralized Hospital Computer Program (DHCP) system, which consists of several thousand Mumps routines.
Mumps Implementations
Initially, all Mumps implementations were pure interpreters. As Mumps evolved, various methods were developed to partly compile Mumps code to intermediate representations, similar to Java byte codes or UCSD Pascal P code. However, due to indirection and the Xecute command, the interpretive nature of the language was always present. Indirection permits a Mumps program to dynamically construct and execute Mumps expressions and commands.
The compiler presented here began as an interpreter implementation [8,9,10,11,12,13]. The interpreter was used to develop several web based medical record applications in Mumps [14,15,16,17] and also for document indexing projects [18,19]. In the networking applications, efficiency, speed of execution and size of the program module are each critical to the number of server transactions that can be supported per second. Similarly, document indexing applications, based on the work of G. Salton [20] and others, are very CPU intensive and speed of execution is also critical. In some cases, document indexing programs can execute for one or more days even on a high speed workstation. Both types of applications are very dependent upon data base access, which led to another extension: the ability to load and store the global arrays from SQL servers [21,22] via ODBC calls.
Both the document indexing and internet applications led to the recent rewrite, as a compiler, of a Mumps interpreter developed in the 1980's by this author. The compiler generates C code as output. The C code is then compiled and linked with library routines to create binary executables. The binaries execute as much as five times faster than the same code running on the interpreter and the overall size of each module is generally much smaller than the interpreter as a whole. The resulting C code has about ten times the number of lines as the original Mumps code, although many lines involve very primitive operations.
Figure 1 gives a brief example of some C code output. The original Mumps program was considerably longer and only a few statements are shown here for illustration. A complete example is available at the web site listed below. The comment lines give the line number and original Mumps commands. Following the comments are the resulting lines of C code. Variables used exclusively by the compiler begin with underscore characters to distinguish them from Mumps local variable names.
The compiler includes most of the traditional Mumps global array and string processing features but it does not support any form of indirection. To support indirection would mean retaining the interpreter as part of each compiled module and this would introduce considerable overhead and size. While many Mumps programs use indirection, it is one of the features of the language that is hard to justify in terms of modern programming practices.
In this author's opinion, Mumps development diverged from the larger programming language community in the mid 80's and attempted to construct a language that was at odds with commonly accepted programming standards and practices. Mumps should have converged with other developing standards rather than separating itself from them. Indirection, including Xecute, although originating in the earliest dialects of Mumps, should have been among the first features to go. It is inherently at odds with good programming practices and it means that every implementation of the language must include an interpreter. Without the ability to have freestanding compilations of Mumps programs, which implies standardized subroutine linkage, predictable memory allocation of data structures and compatible data typing, many opportunities are lost because Mumps programs cannot properly link with other environments. Successive layers of increasingly less standard features have been added to the point that the language has nearly vanished.
Use with Web Servers and Browsers
One of the main purposes for this development effort was to build a web server-side Mumps host that exploits the facilities of the web server to provide networking and the end user's browser to provide a graphical user interface. This version of Mumps supports two ways in which to accomplish these goals:
First, the interface with the Web server environment is automatic. When a user at a browser enters data into an HTML FORM, the data is sent to the web server that, in turn, invokes an application program through the Common Gateway Interface (CGI). The data entered by the browser user is made available to the invoked application through environment variables. The application program extracts the values from the environment variables, processes the data and sends a web page in HTML to the web server that, in turn, transmits it to the originating browser.
Web based applications using this Mumps compiler have immediate access in the compiled Mumps program to the data values in the web server environment variables. Initialization routines in the compiled module prologue extract the data from the web server environment, translate it to ASCII, and make it available in the Mumps program via function calls where the programmer gives the name of the data field and the function replies with the value received from the server.
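The lookup described above can be sketched in C as follows. This is only an illustration under stated assumptions, not the code in the compiler's prologue: the names form_value, cgi_form_value and url_decode are hypothetical, and only the QUERY_STRING environment variable used by CGI GET requests is handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hexadecimal digit (caller guarantees isxdigit). */
static int hex_value(int c) {
    if (isdigit(c)) return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decode the URL-encoded span src[0..len) into out (capacity outsize):
   '+' becomes a blank and %XX becomes the byte with that hex code. */
static void url_decode(const char *src, size_t len, char *out, size_t outsize) {
    size_t i = 0, o = 0;
    if (outsize == 0) return;
    while (i < len && o + 1 < outsize) {
        unsigned char c = (unsigned char)src[i];
        if (c == '+') {
            out[o++] = ' ';
            i++;
        } else if (c == '%' && i + 2 < len &&
                   isxdigit((unsigned char)src[i + 1]) &&
                   isxdigit((unsigned char)src[i + 2])) {
            out[o++] = (char)(hex_value((unsigned char)src[i + 1]) * 16 +
                              hex_value((unsigned char)src[i + 2]));
            i += 3;
        } else {
            out[o++] = (char)c;
            i++;
        }
    }
    out[o] = '\0';
}

/* Hypothetical lookup: find field `name` in a query string such as
   "ptid=123&lab=Na%2B" and copy its decoded value to out.
   Returns 1 if the field is present, 0 otherwise. */
int form_value(const char *query, const char *name, char *out, size_t outsize) {
    assert(query != NULL && name != NULL && out != NULL);
    size_t nlen = strlen(name);
    const char *p = query;
    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, '&');
        size_t flen = end ? (size_t)(end - p) : strlen(p);
        if (flen > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            url_decode(p + nlen + 1, flen - nlen - 1, out, outsize);
            return 1;
        }
        p = end ? end + 1 : NULL;
    }
    if (outsize > 0) out[0] = '\0';
    return 0;
}

/* Convenience wrapper that reads the CGI QUERY_STRING variable. */
int cgi_form_value(const char *name, char *out, size_t outsize) {
    const char *q = getenv("QUERY_STRING");
    return form_value(q ? q : "", name, out, outsize);
}
```

With a request such as labmore1.exe?ptid=123&lab=Na%2B, a call to cgi_form_value("lab", buf, sizeof buf) would leave the string Na+ in buf. Data sent by the POST method arrives on standard input rather than in QUERY_STRING and is not covered by this sketch.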
Secondly, this version supports inline HTML. When a Mumps application is invoked by a web server, the web server captures all the standard output of the Mumps application and passes it to a browser. The Mumps application, therefore, must write to the standard output text which browsers can interpret, that is, HTML. The compiler when parsing a Mumps source program permits lines of source code to be written in either Mumps or HTML. The Mumps code lines are compiled while the HTML code lines are translated into standard output print statements. If during compilation the compiler detects a Mumps expression embedded in an HTML code line, the compiler generates the C code to evaluate the expression at runtime and substitute the result into the HTML. This permits the Mumps programmer to easily construct complex browser displays with data generated by the Mumps program.
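The translation of a single HTML source line might be sketched as below. The sketch assumes the &~ and ~ delimiters that this paper uses to mark embedded Mumps expressions. The function name translate_html_line is hypothetical, and the compilation of the expression itself is replaced by a placeholder comment followed by a print of _tmp0, so this shows only how a line is split into literal text and expressions, not the compiler's actual output.

```c
#include <assert.h>
#include <string.h>

/* Bounded output buffer used to collect the generated C text. */
typedef struct { char *buf; size_t size; size_t used; } Out;

static void out_char(Out *o, char c) {
    if (o->used + 1 < o->size) {        /* silently truncate when full */
        o->buf[o->used++] = c;
        o->buf[o->used] = '\0';
    }
}

static void out_str(Out *o, const char *s) {
    while (*s) out_char(o, *s++);
}

/* Emit a print statement for a literal HTML segment,
   escaping double quotes and backslashes for the C string literal. */
static void emit_literal(Out *o, const char *s, size_t len) {
    if (len == 0) return;
    out_str(o, "fputs(\"");
    for (size_t i = 0; i < len; i++) {
        if (s[i] == '"' || s[i] == '\\') out_char(o, '\\');
        out_char(o, s[i]);
    }
    out_str(o, "\", stdout);\n");
}

/* Emit a placeholder for an embedded expression. A real compiler would
   generate the evaluation code here; this sketch only records the text. */
static void emit_expression(Out *o, const char *s, size_t len) {
    out_str(o, "/* code to evaluate Mumps expression: ");
    for (size_t i = 0; i < len; i++) out_char(o, s[i]);
    out_str(o, " */\nfputs(_tmp0, stdout);\n");
}

/* Split one inline HTML line at &~ ... ~ and write C statements to out. */
void translate_html_line(const char *line, char *out, size_t outsize) {
    assert(line != NULL && out != NULL && outsize > 0);
    Out o = { out, outsize, 0 };
    out[0] = '\0';
    const char *p = line;
    while (*p) {
        const char *open = strstr(p, "&~");
        const char *close = open ? strchr(open + 2, '~') : NULL;
        if (close == NULL) {            /* no further complete expression */
            emit_literal(&o, p, strlen(p));
            break;
        }
        emit_literal(&o, p, (size_t)(open - p));
        emit_expression(&o, open + 2, (size_t)(close - (open + 2)));
        p = close + 1;
    }
}
```

Applied to the line <td>&~test~<br>, the sketch produces a print statement for <td>, the placeholder for the expression test, and a print statement for <br>. An expression that itself contains a ~ character would be split incorrectly; handling that case is left out for brevity.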
For example, Figure 2 contains a section of a larger program that displays patient record data to a browser. In this fragment, a list of test names (test) performed for some patient ptid on or after some date Date is displayed in a table with a button for each test name. An additional button for all tests is also presented. If the browser user clicks on one of the buttons, the appropriate Mumps routine is invoked and passed as parameters the name of the test and the date. The end user browser buttons are created by JavaScript routines, two of which are shown. When the button is clicked, it is the JavaScript routine that actually sends the message to the server with the name of the compiled Mumps program to execute and the parameters.
The Mumps part of the program is very simple. It cycles through (using $Next) each of the patient's test name values. Upon encountering the inline HTML code, the Mumps compiler generates print statements containing the HTML code. Remember, when this program runs, it will do so as a result of an invocation by the web server. All output generated by this program will be captured by the web server and sent to the browser that originated the request. If the compiler detects a Mumps expression in the inline HTML statement, it generates the code necessary to evaluate the expression and generates print statements to print the result in place of the expression. Mumps expressions are detected by the leading &~ and the trailing ~ characters. The function $zh encodes the value of the string argument into the mixed ASCII character and hexadecimal format required in URL parameter lists (that is, non-alphanumeric characters in parameter lists must be encoded in hexadecimal; the blank is encoded as a plus sign).
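The encoding rule just described can be sketched in C as shown below. The function name url_encode is hypothetical and the code is not the library's implementation of $zh. It simply applies the stated rule: alphanumeric characters are copied, the blank becomes a plus sign, and every other character becomes a percent sign followed by two hexadecimal digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the encoding performed by $zh.
   Returns the encoded length, or -1 if out (capacity outsize) is too small. */
int url_encode(const char *in, char *out, size_t outsize) {
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;
    assert(in != NULL && out != NULL);
    if (outsize == 0) return -1;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        size_t need = (isalnum(c) || c == ' ') ? 1 : 3;
        if (o + need + 1 > outsize) {   /* keep room for the terminator */
            out[o] = '\0';
            return -1;
        }
        if (isalnum(c)) {
            out[o++] = (char)c;         /* letters and digits pass through */
        } else if (c == ' ') {
            out[o++] = '+';             /* blank is sent as a plus sign */
        } else {
            out[o++] = '%';             /* everything else as %XX */
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 15];
        }
    }
    out[o] = '\0';
    return (int)o;
}
```

Under this rule a test name such as "Na+ K" would be sent as Na%2B+K, which the receiving program decodes back to the original string.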
Data Base Access
The compiler allows three options for global array access. The first of these is no database. Programs compiled with this option have no access to global variables. Consequently, the support routines for global array processing are not linked into the final executable, thereby making the executable much smaller and faster loading. Also, as there is no opening or closing of the globals and no initialization of the buffers and other environmental code, overall execution time of the compiled module is reduced.
The second option includes the standalone b-tree global array processing routines in the compiled executable. This includes full global array processing, storage and retrieval. The user may select the directory in which the globals are placed. If other compiled modules access the same global array files, the global array routines synchronize file access. All globals are stored in b-trees with stored data residing in one file and the keys in another. The key and data files may be placed on separate disk drives to improve performance.
The third option allows temporary global arrays that are created upon initiation of the Mumps program and deleted upon termination. These may be either disk or virtual memory resident. This would mainly be used in conjunction with loading and storing the globals from another data base system as discussed next.
With the second and third options, global arrays may be loaded from and/or stored into a relational data base management system server. Upon initiation, the Mumps program can load global arrays from the server, manipulate the globals using the traditional Mumps data base view, and, finally, store all or part of the global arrays on the server before termination. The compiler supports z function extensions to load, store, and search an ODBC compliant SQL server.
For applications where the primary storage of data is on a SQL server, the temporary global array option has the least overhead. For some applications, such as a practitioner with a laptop making rounds to remote locations without online access, preloading the laptop global arrays and then updating the server at the end of the day would be a viable option.
For specialized applications, the global array processing library provided with the compiler could be substituted with a locally written package or interfaced to a larger system. The communication between the compiled Mumps program and the global array routine is simple and straightforward.
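As an illustration of how small such a substitute package could be, the following sketch keeps global array nodes in a fixed in-memory table rather than in b-trees. It is an assumption-laden example, not the interface of the supplied library: the names mem_global, make_key2, G_STORE and G_RETRIEVE are hypothetical, and only the key layout (the global name followed by subscripts separated by the bytes 0xce, 0xd0 and 0xcf) is taken from the generated code in Figure 1.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 256
#define MAX_KEY   128
#define MAX_VAL   128

enum { G_STORE, G_RETRIEVE };           /* hypothetical operation codes */

static struct { char key[MAX_KEY]; char val[MAX_VAL]; } nodes[MAX_NODES];
static int node_count = 0;

/* Build the key for name(s1,s2) using the delimiter bytes seen in Figure 1.
   Returns 1 on success, 0 if the key does not fit in `size` bytes. */
int make_key2(char *key, size_t size, const char *name,
              const char *s1, const char *s2) {
    size_t need = strlen(name) + strlen(s1) + strlen(s2) + 3 + 1;
    if (need > size) return 0;
    strcpy(key, name);
    strcat(key, "\xce");                /* opens the subscript list */
    strcat(key, s1);
    strcat(key, "\xd0");                /* separates subscripts */
    strcat(key, s2);
    strcat(key, "\xcf");                /* closes the subscript list */
    return 1;
}

/* Store or retrieve one node. For G_RETRIEVE the caller must supply a
   value buffer of at least MAX_VAL bytes. Returns 1 on success, else 0. */
int mem_global(int op, const char *key, char *value) {
    assert(key != NULL && value != NULL);
    int i;
    for (i = 0; i < node_count; i++)    /* linear search, not a b-tree */
        if (strcmp(nodes[i].key, key) == 0) break;
    if (op == G_RETRIEVE) {
        if (i == node_count) { value[0] = '\0'; return 0; }
        strcpy(value, nodes[i].val);
        return 1;
    }
    if (op != G_STORE) return 0;
    if (strlen(key) >= MAX_KEY || strlen(value) >= MAX_VAL) return 0;
    if (i == node_count) {              /* new node */
        if (node_count == MAX_NODES) return 0;
        strcpy(nodes[i].key, key);
        node_count++;
    }
    strcpy(nodes[i].val, value);
    return 1;
}
```

A linear table is used here only to keep the example short. It would not scale to real global arrays, and it omits ordered traversal such as $Next, which a usable replacement would need.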
Inline C Code and System Facilities
The Mumps compiler translates the Mumps program to C source code that is subsequently translated to binary executables on the host system. The Mumps program variables are C character string variables and must be declared with a special Z command. Consequently, Mumps variables may also be accessed by, passed to, and loaded from other C environment functions. The Mumps compiler permits inline C code to be interspersed with the Mumps and HTML code (C code lines have a special character at their beginning) and Mumps expressions will be evaluated and their values substituted into the C code as with HTML code lines. C control and block structures may also be used.
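Because Mumps variables are held as C character strings, arithmetic in the generated code is performed by library routines that take and return strings, such as the add and numcomp calls visible in Figure 1. A minimal sketch of what such routines could look like is given below. The names str_add and str_numcomp and the buffer size STR_MAX are hypothetical, and strtod only approximates the Mumps rules for interpreting a string as a number.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define STR_MAX 64                      /* hypothetical result buffer size */

/* result = a + b, with all three values held as character strings.
   result may be the same buffer as a or b, as in add(i,_for1_incr,i). */
void str_add(const char *a, const char *b, char *result) {
    assert(a != NULL && b != NULL && result != NULL);
    double sum = strtod(a, NULL) + strtod(b, NULL);
    snprintf(result, STR_MAX, "%.15g", sum);
}

/* Numeric comparison of two strings: returns -1, 0 or 1. */
int str_numcomp(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    double x = strtod(a, NULL);
    double y = strtod(b, NULL);
    return (x > y) - (x < y);
}
```

The nested expression 1+(2+(3+(4+5))) from Figure 1 would then be evaluated by four successive calls, each writing its result into a temporary string that feeds the next call. Any other C function that accepts character strings can operate on Mumps variables in the same way.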
Mumps programs can be compiled to functions, callable from other system environments. Thus, for example, a Windows based application written in C or C++ can call upon Mumps functions and pass and receive data from the Mumps functions in a normal manner. This would allow any application to take advantage of global arrays.
Summary
Copies of the compiler for several host operating systems and associated library routines are available without charge from: http://www.cs.uni.edu/~okane. Updates and improvements will be posted to this address as they are developed. Future work will involve optimizing the C output of the compiler and extending its features. A complete Internet based clinical information system, originally implemented for the web server side Mumps interpreter, is being refitted for use with the compiler and it will be the subject of a further report.
References
1. J. Bowie, and G. O. Barnett, MUMPS - an economical and efficient time-sharing language for information management, Computer Programs in Biomedicine, Vol 6, pp 11-21 (1976).
2. Barnett, G. Octo; and Greenes, R. A. (1970). High level programming languages, Computers and Biomedical Research, Vol 3, pp 488-497.
3. M Technology Association, M Sources '94, M Technology Association, 1738 Elton Road, Suite 205, Silver Spring, Maryland 20903, Tel: (301) 431-4070, Fax: (301) 431-0017 (1994)
4. American National Standards Institute, Inc., ANSI/MDC X11.1-1995 : Information Systems - Programming Languages - M, American National Standards Institute, 11 West 42nd Street, New York, New York 10036 (http://web.ansi.org/default_js.htm)
5. American National Standards Institute, Inc., ANSI/MDC X11.4-1995 : MUMPS - X Window System Binding, American National Standards Institute, 11 West 42nd Street, New York, New York 10036 (http://web.ansi.org/default_js.htm)
6. American National Standards Institute, Inc., ANSI/MDC X11.3-1994 : Graphical Kernel Systems (GKS) - MUMPS Language Binding, American National Standards Institute, 11 West 42nd Street, New York, New York 10036 (http://web.ansi.org/default_js.htm)
7. Barnett, G.O.; et al., COSTAR - a computer-based medical information system for ambulatory care, Proc of the IEEE, Vol 67, No 9 (1979).
8. O'Kane, K.C.; An RT-11 single user standard MUMPS interpreter, MUMPS Users' Group Quarterly, Vol 10, p 5-6 (1980).
9. O'Kane, K.C.; A portable FORTRAN based implementation of MUMPS, MUMPS Users' Group Quarterly, Vol 12, p 19-21 (1982).
10. O'Kane, K.C.; A MUMPS language development and experimentation laboratory, MUMPS Users' Group Quarterly. Vol 12, No 4., p 47-51 (1983).
11. O'Kane, K.C.; MUMPS under IBM VM/CMS, MUMPS Users' Group Quarterly, Vol 13, No 1, p 55-57 (1983).
12. O'Kane, K.C.; A portable hybrid MUMPS development system host, Proc IEEE Computer Society 7th International Computer Software Applications Conference, p 60-65 (1983).
13. O'Kane, K.C.; A C Based MUMPS Interpreter, MUMPS Users' Group Quarterly, Vol 14, No 2, p 23-24 (1984).
14. O'Kane, K.C.; McColligan, E.E.; Davis, G. A., Implementing a Distributed Intranet Based Information System, Topics in Health Information Management, Vol 17, No 2 pp 54-62 (1996).
15. O'Kane, K.C.; and McColligan, E. E., A case study of a Mumps intranet patient record, Journal of the Healthcare Information and Management Systems Society, Vol 11, No 3, pp 81-95 (1997).
16. O'Kane, K.C.; and McColligan, E.E., A Web Based Mumps Virtual Machine, Proceedings of the American Medical Informatics Association 1997 Fall Symposium, D. R. Masys, ed., Abstract: p 881, Text: CDROM Document D004079.pdf (Hanley & Belfus, Philadelphia 1997)
17. O'Kane, K.C.; and McColligan, E.E., A Web access script language to support clinical application development, Computer Methods and Programs in Biomedicine, Vol 55, pp 85-97 (1998).
18. O'Kane, K.C.; "The design of a text-based information storage and retrieval system in MUMPS," MUMPS Users' Group Quarterly, Vol XXI, No 4, pp 21-26 (1991).
19. O'Kane, K.C.; A language for implementing information retrieval software, Online Review, Vol 16, No 3, pp 127-137 (1992).
18. Sanders, Roger, ODBC 3.5 Developer's Guide, Computing McGraw-Hill, ISBN 0070580871 (1998).
19. O'Kane, K. C.; Design for a relational data base system in MUMPS, MUMPS Users' Group Quarterly, Vol 15, No 2 p 33 (1985).
20. Salton, Gerard, Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, Addison-Wesley, ISBN 0-201-12227-8 (Reading, MA, 1988).
21. O'Kane, K.C.; An expert systems and relational data base management facility for Mumps, Computers in Biology and Medicine, Vol 16, No. 3, pp 205-213 (1986).
22. O'Kane, K.C., Mumps Using SQL RDBMS Data Warehousing, in preparation.
/*** 8 open 1:"tmp.tmp,old" for i=1:1 use 1 read a quit:'$t use 5 write "***",a,! */
_i=atoi("1");
strcpy(_tmp0,"tmp.tmp,old");
for(_j=0; _tmp0[_j]!=0 && _tmp0[_j]!=',';_j++);
if (_tmp0[_j]!=',') {printf("*** File Error\n"); abort();}
_tmp0[_j++]=0;
strcpy(_gtmp,_tmp0);
if (strcmp(&_tmp0[_j],"old")==0||strcmp(&_tmp0[_j],"OLD")==0) _new=0;
else if (strcmp(&_tmp0[_j],"new")==0||strcmp(&_tmp0[_j],"NEW")==0) _new=1;
else if (strcmp(&_tmp0[_j],"append")==0||strcmp(&_tmp0[_j],"APPEND")==0) _new=2;
else {printf("*** File error\n"); abort();}
if (_new==1) { /* output */
    _file[_i]=fopen(_gtmp,"w");
    if(_file[_i]==NULL) _tpx=0;
    else _tpx=1;
}
else if (_new==2) { /* append */
    _file[_i]=fopen(_gtmp,"a");
    if(_file[_i]==NULL) _tpx=0;
    else _tpx=1;
}
else { /* input */
    _file[_i]=fopen(_gtmp,"r");
    if(_file[_i]==NULL) _tpx=0;
    else _tpx=1;
}
strcpy(_for1_init,"1");
strcpy(_for1_incr,"1");
for(strcpy(i,_for1_init);
    1; /* limit expression - no limit */
    add(i,_for1_incr,i)) {
    _io=atoi("1");
    _file[5]=stdin;
    _tpx=getstr1(_file[_io],_gtmp);
    if (_tpx>=0) _tpx=1; else _tpx=0;
    strcpy(a,_gtmp);
    if (_tpx) _tmp0[0]='1'; /* $test */
    else _tmp0[0]='0';
    _tmp0[1]='\0';
    if (numcomp(_tmp0,"0")==0) strcpy(_tmp1,"1");
    else strcpy(_tmp1,"0");
    if (atoi(_tmp1)) { /* postconditional */
        break;
    } /* post conditional */
    _io=atoi("5");
    _file[5]=stdout;
    _hor[_io]+=fprintf(_file[_io],"%s","***");
    strcpy(_tmp0,a);
    _file[5]=stdout;
    _hor[_io]+=fprintf(_file[_io],"%s",_tmp0);
    _file[5]=stdout;
    fprintf(_file[_io],"\n");
    _hor[_io]=1;
    _ver[_io]++;
}
/*** 10 set:1=1 a=1+(2+(3+(4+5))) set a=0 write "0 ",a,! */
if (strcmp("1","1")==0) strcpy(_tmp0,"1");
else strcpy(_tmp0,"0");
if (atoi(_tmp0)) { /* postconditional */
    add("4","5",_tmp0);
    add("3",_tmp0,_tmp1);
    add("2",_tmp1,_tmp2);
    add("1",_tmp2,_tmp3);
    strcpy(a,_tmp3);
} /* post conditional */
strcpy(a,"0");
_file[5]=stdout;
_hor[_io]+=fprintf(_file[_io],"%s","0 ");
strcpy(_tmp0,a);
_file[5]=stdout;
_hor[_io]+=fprintf(_file[_io],"%s",_tmp0);
_file[5]=stdout;
fprintf(_file[_io],"\n");
_hor[_io]=1;
_ver[_io]++;
/*** 11 for i=1:1:10 for j=1:1:10 set ^a(i,j)=i_","_j */
strcpy(_for1_init,"1");
strcpy(_for1_incr,"1");
strcpy(_for1_lim,"10");
for(strcpy(i,_for1_init);
    numcomp(i,_for1_lim)<=0;
    add(i,_for1_incr,i)) {
    strcpy(_for2_init,"1");
    strcpy(_for2_incr,"1");
    strcpy(_for2_lim,"10");
    for(strcpy(j,_for2_init);
        numcomp(j,_for2_lim)<=0;
        add(j,_for2_incr,j)) {
        strcpy(_tmp0,i);
        strcpy(_tmp1,j);
        strcpy(_gtmp,"^a\xce");
        strcat(_gtmp,_tmp0);
        strcat(_gtmp,"\xd0");
        strcat(_gtmp,_tmp1);
        strcat(_gtmp,"\xcf");
        strcpy(_tmp0,i);
        strcpy(_tmp1,_tmp0);
        strcat(_tmp1,",");
        strcpy(_tmp2,j);
        strcpy(_tmp3,_tmp1);
        strcat(_tmp3,_tmp2);
        _f=global(STORE,_gtmp,_tmp3);
    }
}
Figure 1. Example of Generated C Code
Content-type: text/html &!&!
<html>
<script language='JavaScript'>
<!-- &!
function labmore(lab,date) {
str="/cgi-bin/labmore1.exe?ptid=&~ptid~&lab="+lab+"&D="+date;
labmwin=open(str,'labm','screenX=0,screenY=200,height=120,width=630,resizable=yes,status=no,location=no,toolbar=no,scrollbars=yes');
labmwin.focus();
}&!
function labcht() {
str="/cgi-bin/flow2l.exe?ptid=&~ptid~";
labchtwin=open(str,'labc','screenX=0,screenY=200,height=254,width=628,resizable=yes,status=no,location=no,toolbar=no,scrollbars=yes');
labchtwin.focus();
...
//-->
</script>
...
Set test=-1, i=1
<center><form><table border bgcolor=silver><font size=1><tr>
<td align=center><font size=2>
<input type=Button onClick="labcht()" value=" ALL Labs "></td>
L107 Set test=$Next(^patient(ptid,"lab",test))
If test<0 Write "</form></tr></table></center>" Goto labend
<td align=center><font size=1>
<font size=1> &~test~<br>
<font size=2>
<input type=Button onClick="labmore('&~$zh(test)~','&~$zh(Date)~')" value=" * ">
</td>
Set i=i+1
If i>12 Write "</tr><tr><td></td>" Set i=0
Goto L107
labend ...
Figure 2. Example of inline HTML code