WAIS Client for OS/2 Documentation

  An Online version of this help file is available from the help menu within
WAIS or by entering the 'view os2wais.inf' command from a command window.


Introduction

  The WAIS OS/2 Client is a Public Domain software product developed at the
Library of Congress.  The Client allows OS/2 users to connect to WAIS
Servers on the Internet and to search for and retrieve documents from those
Servers.  Documents returned can be text, pictures, or other types of data,
depending on the type of server being accessed.  The Client and WAIS
Servers communicate using the WAIS Protocol.  This allows a single user to
query many different data servers without having to learn a new query
language or interface.  The Client can also be used to access local WAIS
Servers across a local area network (LAN).

  This manual describes how to contact a WAIS Server, search its database,
and retrieve documents.  The Client user first selects a SOURCE (a pointer
to a WAIS Server) to query.  Then the user forms a QUERY (a string of
words) to search the database and receives document HEADLINES (document
titles or descriptions).  Using RELEVANCE FEEDBACK (defined below), the
user can refine her search to find a desired document.  Finally, the user
can retrieve the document, and display it or store it on her machine for
later reference.

Installation

  After unzipping the os2wais.zip file using unzip.exe (available from
ftp-os2.cdrom.com), start the os2wais program by typing 'os2wais'.  Some
sample source files are included, and some default viewers are selected.
To add an icon to a folder or to the desktop, right-click on another
application and choose 'create another' -> 'program'.  In the path name
field of the setup for the new icon, enter the path of the os2wais.exe
program and the program name.

Network Requirements

  The OS/2 Client runs on top of IBM's TCP/IP for OS/2 network software.
The user must be able to open a socket connection to a remote WAIS Server
machine on the network.  The WAIS Client will work with either the 16-bit
or 32-bit flavor of IBM's TCP/IP for OS/2 product.  The Client will not
work with non-IBM TCP/IP products, but conversion should not be difficult.
Since the Client is Public Domain software, source code is available for
porting, modification, or improvement.


Sources

  The first step in beginning a search is to select a source to contact.
The user lists the currently known sources by clicking on the "Sources"
button.  A window listing current known sources will appear.  You can
select one or more of these sources with the mouse and then hit the "Use
Selected Sources" button or double click on a source.  The selected sources
will then appear in the "Look in these Sources:" window, ready to be
searched.  Most searches are single-source, but there are times when it
is desirable to search multiple sources simultaneously.

  If you want to stop searching a source, select the source in the "Look in
these Sources:" window and execute the "Stop Using Source" command in the
"Sources" pull-down menu.  You can also double click on the source to make
it go away.

  Known sources are described in files with ".src" extensions.  The first
time the user lists sources, the Client loads in all the .src files in the
local directory.  To see what these files contain, select a source in the
"Known Sources" window (just one) and then click on the "Edit Source"
button.  A window will appear, showing all the information associated with
that source.  Typically, the source description provides information on how
to search that source, how to obtain more information on that source,
whether or not the server service costs money, and the email address of the
source administrator.  Be careful: you can click in any of these windows
and edit the contents.  If you change the network information, you may not
be able to contact that source in the future.

  You can also select sources and hit the "Delete Selected Sources" button.
This erases all the information related to that source and erases the .src
file in the local directory.
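
  As an illustration of how the .src files in a directory could be listed
and removed, here is a minimal Python sketch.  It is not the Client's own
code: the function names are hypothetical, and on a case-sensitive file
system only a lowercase ".src" extension is matched.

```python
from pathlib import Path


def list_sources(directory="."):
    """Return the names of all .src files in a directory, sorted."""
    return sorted(p.name for p in Path(directory).glob("*.src"))


def delete_source(name, directory="."):
    """Erase one source file; return True if a file was removed."""
    path = Path(directory) / name
    # Refuse anything that is not an existing .src file.
    if path.suffix != ".src" or not path.is_file():
        return False
    path.unlink()
    return True
```

  Like the "Delete Selected Sources" button, delete_source() removes the
file for good, so it refuses to touch anything without a ".src" extension.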


Queries

  Once you have selected a source to use, the Client should put you back
into the Query window.  This is the window which is labeled, "Tell me
about:".  You can now enter a natural language question in this window, or
just type a set of words and phrases that are relevant to the type of
information you are seeking from the selected source.

  The general algorithm for weighting words and phrases is as follows: if a
word is rarely used in the database, it gets more weight; if a phrase
matches exactly, it gets more weight; and if a word appears in the document
title, it gets more weight.  Once you have entered your query, hit return
or click on the "Search" button to begin a search.
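
  The three weighting rules above can be made concrete with a toy scoring
function.  This is only a sketch under stated assumptions: the manual does
not give the server's actual formula, so the logarithmic rarity weight, the
size of each bonus, and all names below are hypothetical.

```python
import math


def score_document(query, title, body, doc_freq, total_docs):
    """Toy relevance score built from the three rules in the text.

    doc_freq maps a word to the number of documents that contain it;
    total_docs is the number of documents in the database.
    """
    words = query.lower().split()
    title_words = set(title.lower().split())
    body_words = body.lower().split()
    score = 0.0
    for word in words:
        if word not in body_words and word not in title_words:
            continue
        # Rule 1: a word that is rare in the database gets more weight.
        rarity = math.log((1 + total_docs) / (1 + doc_freq.get(word, 0))) + 1.0
        score += rarity
        # Rule 3: a word that appears in the title gets more weight.
        if word in title_words:
            score += rarity
    # Rule 2: the whole query appearing as one run of text gets a bonus
    # (a simple substring test, assumed here for illustration).
    if len(words) > 1 and " ".join(words) in " ".join(body_words):
        score += len(words)
    return score
```

  With these assumptions, a query word that is in few documents adds more
to the score than a common one, and a document with the word in its title
outranks an otherwise identical document without it.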

  You can also enter more complex queries, depending on the type of server
you are contacting.  For example, WAIS Inc. commercial servers allow you to
enter boolean queries by using logical words in capital letters, like AND,
OR, and NOT.  The source description should tell you what kind of server
it is and what kinds of queries it supports.  Also, the server description
often contains a method for getting a help document about that server.


Results

  Search results are displayed in the results window, the largest window in
the display with the column headings "Score Size HEADLINES".  The server
should return a number of document titles or headlines, along with their
score and size.  The score runs from 0 to 1000.  The highest scoring
documents are listed first at the top of the display.  By default, the
size is the number of bytes (characters) in the document.  Larger sizes are
expressed in multiples of 1024: a size followed by a "k" is in units of
1024 bytes, "M" stands for megabytes, or units of 1024 squared (slightly
more than a million), and "G" stands for gigabytes, or units of 1024 cubed
(slightly more than a billion).
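
  The size units can be illustrated with a short Python function.  It is a
sketch only: the point at which the Client switches from plain bytes to a
suffix, and whether it rounds or truncates, are assumptions made here.

```python
def format_size(num_bytes):
    """Render a byte count with the k, M, and G suffixes described above."""
    units = (("G", 1024 ** 3), ("M", 1024 ** 2), ("k", 1024))
    for suffix, unit in units:
        if num_bytes >= unit:
            # Whole units only; any remainder is dropped (assumed).
            return f"{num_bytes // unit}{suffix}"
    return str(num_bytes)
```

  For example, format_size(153600) gives "150k", because 153600 bytes is
exactly 150 units of 1024.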


Retrieving Documents

  You can double click on any displayed headline in the results window to
retrieve and display the document.  Before retrieving a document, it's a
good idea to look at how large it is to get an idea of how long it will
take to retrieve the document.  A 150k file will take anywhere from 10
seconds to a minute to download, depending on network traffic, network
bandwidth, and server workload.
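
  The figures above imply a transfer rate of roughly 2.5k to 15k bytes per
second.  The following sketch turns that arithmetic into a small estimate;
the function name is hypothetical, and the rates are simply the ones
implied by the 150k example, not measurements.

```python
def transfer_seconds(size_bytes, bytes_per_second):
    """Estimate the transfer time for a document of the given size."""
    if bytes_per_second <= 0:
        raise ValueError("throughput must be positive")
    return size_bytes / bytes_per_second


SIZE = 150 * 1024  # a "150k" document, in bytes

fast = transfer_seconds(SIZE, 15 * 1024)  # 10.0 seconds at 15k per second
slow = transfer_seconds(SIZE, 2560)       # 60.0 seconds at 2.5k per second
```

  The same function gives a rough idea for any size shown in the results
window once you know the rate your own connection typically achieves.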

  The Client retrieves the document and puts it into a file called
"new_doc.tmp" and launches a viewer to display the document.  The type of
viewer depends on the type of document retrieved.

  The user can select which type of viewer to launch with each type of
document by selecting the "Document Viewers" menu item from the "Options"
menu list.  Typically, editors are used for text documents, while an image
viewer is used to display GIF, JPEG, or TIFF documents.
  
  The Client comes with default viewer settings.  The OS/2 epm editor is
called on text documents.  Also included on the Client distribution disk is
a Public Domain image viewer which is called by the Client for GIF and JPEG
images.  The user can substitute his preferred editors and viewers for
these default values.

  The document viewer runs as a separate program.  When you are done
viewing a document, simply quit or close out the editor or viewer.  The
WAIS Client will still be running.

Saving Documents

  Each document retrieval erases the previous contents of "new_doc.tmp".
If the user wishes to permanently store a document, she should copy the
file "new_doc.tmp" to another file before retrieving another document.  In
the case of text documents, simply use the "Save As" command in the editor
to save the file under another name.  With images, the user may have to go
to another OS/2 command window to copy the file, unless the viewer has a
"Save As" command.

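  Copying the temporary file before the next retrieval can also be
scripted.  The sketch below assumes "new_doc.tmp" is in the directory you
name; the function keep_document() is hypothetical and is not part of the
Client.

```python
import shutil
from pathlib import Path


def keep_document(dest, temp_name="new_doc.tmp", directory="."):
    """Copy the most recently retrieved document to a permanent file."""
    src = Path(directory) / temp_name
    dst = Path(dest)
    # Do not silently replace a document that was saved earlier.
    if dst.exists():
        raise FileExistsError(f"{dst} already exists")
    shutil.copyfile(src, dst)
    return dst
```

  Run the copy before retrieving another document, since the next
retrieval erases the previous contents of "new_doc.tmp".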

Finding New Sources

  The Client disk comes with a few .src files, but these are only for
demonstration purposes.  The one source which is essential to have is the
Directory of Servers.  This is a WAIS Server which is a database of
databases.  Begin your search with this source in order to locate sources
which are relevant to your query.

  The Directory of Servers functions like a normal WAIS Server, except that
the documents it returns are source descriptions, not ordinary documents.  To
examine a source description, simply double click on the headline in the
Results window.  The "document" will be retrieved and displayed.  At this
point you have the option to discard the source description ("Cancel") or
to save it for future use ("Save").

  If you wish to save the source, be sure to edit the "Filename" field to
indicate the filename to use.  The default name is "new-src", which will
be overwritten the next time you save a source description without changing
the file name.  The Client will append a ".src" extension to the source
filename.  The new source should now appear in the known sources window,
listed under the filename you chose, ready to be used.

  If you are running WAIS on a FAT formatted disk, you will get an error if
you specify a filename longer than eight characters.
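
  A filename can be checked against the eight-character limit before it is
typed into the "Filename" field.  The sketch below is an illustration only;
the function name and the on_fat switch are hypothetical, and it checks the
length rule alone, not which characters FAT permits.

```python
FAT_BASE_LIMIT = 8  # characters allowed before the extension on FAT


def source_filename(base_name, on_fat=True):
    """Return the .src filename for a base name, or raise ValueError."""
    if not base_name:
        raise ValueError("filename must not be empty")
    if on_fat and len(base_name) > FAT_BASE_LIMIT:
        raise ValueError(
            f"{base_name!r} is longer than {FAT_BASE_LIMIT} characters"
        )
    # The Client appends the extension itself; it is added here only to
    # show the resulting file name.
    return base_name + ".src"
```

  The default name "new-src" is seven characters long and passes the check,
while a longer name is accepted only when on_fat is False.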


Creating Source Pointers

  You can also create source descriptions if you know the database name,
the internet address, and the port number of the Server you are trying to
contact.  Call the "Create a New Source" command under the "Sources"
pull-down menu.  Then fill out the necessary information by clicking in
each field.  The IP Number is not required, but if you know it, put it in
as it will save lookup time.  The rest of the information is optional.
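
  The required and optional fields can be pictured as a small record with a
few sanity checks.  This is a sketch, not the layout of a .src file: the
class and field names are hypothetical, and the check on the database name
anticipates the UNIX path name rules given in the next paragraph.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourcePointer:
    database_name: str               # required, exact spelling and case
    host_name: str                   # required internet address
    port: int                        # required port number
    ip_number: Optional[str] = None  # optional; saves a name lookup
    description: str = ""            # optional free text

    def problems(self) -> List[str]:
        """Return a list of problems; an empty list means it looks usable."""
        found = []
        if not self.database_name:
            found.append("database name is required")
        if "\\" in self.database_name:
            found.append("database name should use / rather than \\")
        if not self.host_name:
            found.append("internet address is required")
        if not 1 <= self.port <= 65535:
            found.append("port number must be between 1 and 65535")
        return found
```

  The checks never change the capitalization of the database name, since
the Server needs it exactly as given.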

  You must enter the exact Database Name; the machine name and port number
are not sufficient.  Servers run under the UNIX Operating System. The
Database Name is actually a UNIX path name which the Server uses to access
the database.  UNIX is case sensitive.  This means that the database name
must have the correct capitalization.  Also note that UNIX pathnames use
"/", not "\" as in DOS or OS/2.


Relevance Feedback

  One of the most powerful aspects of WAIS is the ability to say to a
server, find me more documents like this one.  This is called relevance
feedback.  This is a quick, intuitive way of searching large databases to
obtain the documents you are looking for.  If you find a document that you
want to use for relevance feedback, select the document headline and
execute the "Use Document for Relevance Feedback" command under the
"Documents" menu list.  The document headline, along with the source it
comes from, will appear in the relevance feedback window which is titled
"Similar to:".

  You can now run the search again (by hitting the "Search" button), but
this time, in addition to your query, the document pointers in the
relevance feedback window will be passed to the server to refine your
search.  Relevance feedback can be used iteratively, adding and deleting
documents until you find what you are looking for.


Relevance Feedback and Multiple Source Searches

  Relevance feedback works best with single-source searches with documents
which come from that source.  If you are doing a multiple-source query,
relevance feedback becomes more complicated.  For those of you who want to
know how it really works, read on.

  Although all relevance feedback document IDs are sent to all the servers
being searched, only those servers that can access the relevance feedback
documents on their own file systems will use them; the other servers will
ignore them.  That is, relevance feedback documents from Server X cannot be
used by Server Y, unless Server X and Y are on the same file system.

  Thus, if you are simultaneously searching on two servers (X and Y) with
relevance feedback documents from both servers, and if they are not on the
same file system, then each server will perform its search only with the
relevance feedback documents from its own database.
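
  The rule for which server uses which feedback documents can be written
out as a short function.  This is a sketch of the behavior described above,
not server code; the data layout (server names mapped to file system
labels, and document/origin pairs) is an assumption made for illustration.

```python
def feedback_for_server(server, file_system_of, feedback):
    """Return the feedback documents that one server is able to use.

    file_system_of maps a server name to the file system it runs on;
    feedback is a list of (document_id, origin_server) pairs.
    """
    own_fs = file_system_of[server]
    return [
        doc_id
        for doc_id, origin in feedback
        # A document is usable only if its origin shares this file system.
        if file_system_of.get(origin) == own_fs
    ]
```

  With Servers X and Y on different file systems, each one keeps only its
own documents; if both are given the same file system label, each server
can use the documents from both.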

  Also, when a user removes a source from the "Look in these Sources:"
window (via the "Stop Using Source" command), all the relevance feedback
documents from that source are placed at the bottom of the list, with the
label "These documents may be ignored:" to indicate that their source is no
longer being used.  If they exist on a file system that is still in use,
they may still be used, but otherwise they will be ignored.
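
  The reordering of the relevance feedback list can be sketched as follows.
The label text is taken from the description above; the function name and
the (headline, source) pairs are hypothetical, and the real window may
differ in detail.

```python
IGNORED_LABEL = "These documents may be ignored:"


def feedback_listing(feedback, sources_in_use):
    """Build the display lines for the relevance feedback window.

    feedback is a list of (headline, source) pairs; sources_in_use is
    the set of sources still selected for searching.
    """
    active = [h for h, src in feedback if src in sources_in_use]
    orphaned = [h for h, src in feedback if src not in sources_in_use]
    lines = list(active)
    if orphaned:
        # Documents whose source was removed go to the bottom of the list.
        lines.append(IGNORED_LABEL)
        lines.extend(orphaned)
    return lines
```

  The label appears only when at least one document belongs to a source
that is no longer in use.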

