|Objective||To understand computer-based automatic indexing and retrieval of text- and web-based information.|
Experiments in Information Storage and Retrieval Using Mumps (to be distributed)
The Mumps Programming Language (Note: copies of the PDF for this will be sent to members of the class once the email class list has been established. If you want a printed copy, I have a limited number available for $2.50 each.)
|Notifications|| Assignments and other notifications will be sent by email. If you have blocked your email address or
you register late, you will not be on the class list.
If you are not on the initial registration list provided by the Registrar, it is your responsibility to add yourself to the class list. See the instructions on my homepage. You may add additional email addresses to this list if the default is not your primary account.
|Course Materials:||Slides in PDF Format|
The requirements will consist of a set of assignments to be
performed individually and a project which may be done
either individually or in small groups (up to 3).
As this course depends heavily on lecture content, attendance is
Excessive absence will result in a reduced grade
for the project.
Extra credit may be awarded for attendance at selected presentations and
seminars both on campus and off. These opportunities will be announced in class.
Classes are lecture format. Cell phones, pagers, and PDAs may not be used.
Laptop use is permitted if class related. YouTube, Facebook, email, chat (any flavor), and Angry Birds (or any other game) are not class related.
Assignments may be submitted by email as plain text, PDFs or image files (jpg or png). The subject line must contain the course (CS 3150), your name and the assignment number. If these are not present, the email will not be accepted.
If an assignment involves programming, you must turn in your code and an example of its execution. If you elect to submit code without output, your program will be assumed to be non-functional and there will be an automatic 40% deduction.
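The subject-line rule above can be checked before sending. The following is only a rough sketch: the function name `subject_is_acceptable` and the pattern used to spot an assignment number (the word "assignment" followed by digits) are assumptions made for illustration, not the instructor's actual filter.

```python
import re

COURSE = "CS 3150"  # course code as given in the syllabus


def subject_is_acceptable(subject: str, student_name: str) -> bool:
    """Return True if the subject has the course, the name, and an assignment number."""
    s = subject.lower()
    has_course = COURSE.lower() in s
    has_name = student_name.lower() in s
    # Assumed convention: the word "assignment" followed by a number.
    has_number = re.search(r"assignment\s*#?\s*\d+", s) is not None
    return has_course and has_name and has_number


print(subject_is_acceptable("CS 3150 - Jane Doe - Assignment 3", "Jane Doe"))
print(subject_is_acceptable("homework", "Jane Doe"))
```

The first call prints `True` and the second prints `False`, since the second subject is missing all three required parts.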
|Makeup Tests||Makeup tests will be given only in cases of demonstrated need for causes such as serious illness, family emergency, or University-sanctioned schedule conflict. In all cases, written documentation will be required. Where a makeup test is granted, it must be taken within 1 week of the originally scheduled exam or, in the case of illness, within 1 week of your return to classes.|
|Penalty for Hacking & Cheating||A grade of "F" for the course and possible University disciplinary action.
If your work duplicates in whole or part the work of another,
both works will receive a grade of F.
You may use material from the Internet if you document the source (URL). Any undocumented use of the Internet in an assignment will be considered a serious case of cheating and will result in a grade of F for the course.
|Project||Projects will be presented in class with a brief online demonstration. A vote will be taken for the best project, and the winner will be exempt from the final (a grade of A will be entered for the final exam grade). You may work alone or in teams of up to 3 (hint: have one person do the indexing, one the retrieval code, and the other the web interface). All group members will receive the same grade.|
I will send an anonymized spreadsheet by email each time there is a significant grading event. The spreadsheet will list
the grades I have recorded for you and the final score, assuming all remaining work is perfect.
If you see an error, please contact me immediately to have it corrected. If I do not hear from you within a reasonable period of time, I will assume that all grades are correct and that no typos have happened.
The spreadsheet is sent by email to the class list, so you must register for the class email list. You must also give me a code word by which your row in the spreadsheet will be known. Spreadsheet rows will be randomized and not in alphabetic order. If you do not give me a secret code word, your row will not appear in the spreadsheet.
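The "final score, assuming all remaining work is perfect" figure can be read as a best-case projection. The sketch below assumes a simple points-based scheme, which the syllabus does not specify; the function name and the sample numbers are hypothetical.

```python
def best_case_percent(recorded, remaining_max):
    """Best-case final percentage.

    recorded: list of (earned, possible) pairs for graded work (hypothetical data).
    remaining_max: total points still ungraded, assumed earned in full.
    """
    earned = sum(e for e, _ in recorded)
    possible = sum(p for _, p in recorded)
    return 100.0 * (earned + remaining_max) / (possible + remaining_max)


# Two graded items (8/10 and 45/50) with 40 points of work remaining.
print(best_case_percent([(8, 10), (45, 50)], 40))
```

With these sample numbers the projection is (8 + 45 + 40) / (10 + 50 + 40), or 93 percent.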
You may want to use your own computer to do the assignments and project.
If that is not the case, you may have an account on one of my
The assignments and project are extremely I/O intensive. If you use your own machine, a desktop with lots of memory is best.
Assignments will be done in Linux. I will distribute an Oracle VirtualBox virtual disk with a preconfigured Debian system that contains the code and databases. You will need to install the (free) Oracle VirtualBox and install the distributed virtual disk in it.
|Data Base||To be distributed.|
|| HTML ||
BareBones HTML Guide ||
Salton, G., Automatic Information Organization and Retrieval,
McGraw Hill (1968)
Data Sets for Machine Learning
Web Archive (you think you have disk capacity problems!)
Rod Library Electronic Resources
ACM Digital Library
Lemur Toolkit for Language Modeling
Cornell Smart System
SIGIR List and Archives
UVA Electronic Text Center
Digital Library Research Laboratory
Automatic Text Browsing Using the Vector Space Model
Lawrence Berkeley Lab Science Articles Archive
The Internet Archive
Search Engine Features
Anatomy of a Search Engine (Google)
Medline (National Library of Medicine)
Information Retrieval by C. J. van Rijsbergen
Modern Information Retrieval Chapter 10
Cystic Fibrosis Reference Collection
Marti Hearst Site
Ed Fox Links
IIT IR Publications
Lots of Links
Top Ten Issues
Searching Genomic Databases
Amazon.com's recommender algorithm
Knuth Optimal Binary Trees
AVL (Balanced) Trees
B trees and AVL Trees
IBM Clever Project
The following notice is required by the University:
Students seeking disability accommodations are directed to see: UNI Policy 13.15 Accommodations of Disabilities