810:115g Spring 2009
810:115g Information Storage and Retrieval (3 hours)

Last updated: October 28, 2008
Objective: To understand computer-based automatic indexing and retrieval of text- and web-based information.
Books: Experiments in Information Storage and Retrieval Using Mumps
Course Materials: Programming examples will mainly use the Mumps language, although other languages, including C++, Perl, PHP, and Java, may be used as desired.

Data bases, examples and text
Mumps Pocket Guide JPG images

Requirements: The requirements consist of a set of assignments to be performed individually and a project which may be done either individually or in small groups (approx. 3 persons max). As this course depends heavily on lecture content, attendance is required. Excessive absence will result in a reduced grade for the project. The term "excessive absence" means: don't push your luck.

Term Project (30%)
Assignments (40%). Late assignments are charged 5% per class day, to a maximum of 25%. Chronically late assignments will be charged at a higher rate.
Tests (2 at 10% each)
Final (10%).

Classes: Classes are lecture format. Cell phones, pagers, laptops, and PDAs may not be used.

Test 1
Test 2
Final: Click Here
Makeup Tests: Makeup tests will be given only in cases of demonstrated need, such as serious illness, family emergency, or University-sanctioned schedule conflict. In all cases, written documentation will be required.
Penalty for Hacking & Cheating: A grade of "F" for the course; termination of computer access. If your work duplicates, in whole or in part, the work of another, both works will receive a grade of F.
Project

All projects will be presented in class during the last week of classes, with a brief online demonstration. A vote will be taken for the best project, and the winner will be exempt from the final (a grade of A will be entered for the final exam grade). You may work in teams of 1 to 3 (hint: have one person do the indexing, one the retrieval code, and the other the web interface). All group members will receive the same grade.

Project: Using the OSU Medline file, design a web-based system to access the data. Your project will include code to index and access the information. Use the first 20,000 abstracts.

You may design your own access interface method: keywords, hierarchies, etc. Input can be either by typed queries or point-and-click. Be original!
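
As a starting point for the indexing and retrieval parts of the project, here is a minimal sketch of an inverted index with Boolean AND keyword retrieval. It is written in Python only for brevity (Python is an assumption here; the course examples use Mumps). The sample texts, the stop list, and the function names are all made up for illustration, and the sketch does not read the OSU Medline file format.

```python
import re
from collections import defaultdict

# Hypothetical stop list for illustration; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_index(abstracts):
    """Map each term to the set of document numbers whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(abstracts):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


# Made-up sample texts standing in for real abstracts.
abstracts = [
    "Treatment of cystic fibrosis in children",
    "Genetic markers of cystic fibrosis",
    "Treatment of asthma with inhaled steroids",
]
index = build_index(abstracts)
print(search(index, "cystic fibrosis"))     # documents 0 and 1
print(search(index, "treatment fibrosis"))  # document 0 only
```

A real project would also need to parse the abstracts from the file, keep the index on disk rather than in memory, and probably rank the results instead of returning a plain Boolean match.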

Final Grades: Final grades will not be available via email. If you want your grade mailed to you, bring a stamped, self-addressed envelope to the final.
Contact: Click Here
Computer: I assume you will want to use your own computer for the assignments and project. If that is not the case, you may have an account on one of my Linux servers.
Getting Started
  1. Get and install Ubuntu (natively, under WUBI, or under Sun VirtualBox), or install Cygwin for Windows.
  2. Get and install Mumps under Cygwin: In Cygwin, type:
    1. wget http://cns2.uni.edu/~okane/source/MUMPS-MDH/mumpscompiler-10.0.src.tar.gz
      (Check http://cns2.uni.edu/~okane/source/MUMPS-MDH/ for the latest version number).
    2. tar xvzf mumpscompiler-10.0.src.tar.gz
    3. cd mumpsc
    4. ./configure prefix=/usr
    5. make
    6. make install
  3. Learn an adult editor (nano/pico are for children): vi Tutorial
  4. Learn Mumps.
HTML: BareBones HTML Guide
Assignments: Do not push assignments under the office door; leave them in the CS Office. All pages must be stapled.

Other Resources: MeSH

Documentation and Availability
Documentation Descriptor Data Elements
Descriptor Records
Documentation Qualifier Data Elements
Qualifier Records
Supplementary Records

Resources:

Salton, G., Automatic Information Organization and Retrieval, McGraw-Hill (1968).
Salton, G., Automatic Text Processing, Addison-Wesley (1989).
Salton, G., ed., The SMART Retrieval System, Prentice-Hall (1971).
Salton, G., and McGill, M., Introduction to Modern Information Retrieval, McGraw-Hill (1983).
Borko, H., Automated Language Processing (1968).
The Smart System from Cornell: ftp://cs.cornell.edu/pub/smart


Data Sets for Machine Learning
WordNet
Apache
Web Archive (you think you have disk capacity problems!)
PostgreSQL Tutorial
Rod Library Electronic Resources
ACM Digital Library
Lemur Toolkit for Language Modeling
Cornell Smart System
SIGIR List and Archives
UVA Electronic Text Center
NLM Gateway
Digital Library Research Laboratory
Automatic Text Browsing Using the Vector Space Model
Lawrence Berkeley Lab Science Articles Archive
The Internet Archive
Search Engine Features
Anatomy of a Search Engine (Google)
Medline (National Library of Medicine)
Information Retrieval by C. J. van Rijsbergen
Modern Information Retrieval Chapter 10
Cystic Fibrosis Reference Collection
Marti Hearst Site
Online Papers
Ed Fox Links
IIT IR Publications
Web IR
WWW 10
WWW 9
WWW 8
WWW 7
WWW 5
IRIS Project
Lots of Links
Top Ten Issues
Searching Genomic Databases
Amazon.com's recommender algorithm
Huffman Trees
Knuth Optimal Binary Trees
Hu-Tucker Trees
AVL (Balanced) Trees
B trees and AVL Trees
B Trees
IBM Clever Project
flex documentation
Homology Searching
Project Gutenberg

The following notice is required by the University:

"The Americans with Disabilities Act of 1990 (ADA) provides protection from illegal discrimination for qualified individuals with disabilities. Students requesting instructional accommodations due to disabilities must arrange for such accommodation through the Office of Disability Services. The ODS is located at: 213 Student Services Center, and the phone number is: 273-2676."

Because the Office of Disability Services has procedures in place to determine the validity of disability claims as well as the need for instructional accommodations, faculty are reminded that they are to direct all students with accommodation requests to the above listed office.

UNDER NO CIRCUMSTANCE SHOULD A FACULTY MEMBER MAKE AN ACCOMMODATION INDEPENDENT OF THE OFFICE OF DISABILITY SERVICES.

Questions may be directed to: Jane Slykhuis, Disability Services Coordinator, at 273-2676 or to this office at 273-2846.