Workshops on Neural Networks and Information Retrieval in Amsterdam

Sun Apr 3 15:21:09 EDT 1994

Preliminary Program

Neural Networks and Information Retrieval in a Libraries Context

Amsterdam, The Netherlands
Friday June 24, 1994 and Friday September 16, 1994

M.S.C. Information Retrieval Technologies BV, based in Amsterdam, the 
Netherlands, is currently undertaking a study on Neural Networks and 
Information Retrieval in a Libraries Context, in collaboration with the 
Department of Computational Linguistics of the University of Amsterdam and 
the Department of Information Technology and Information Science at 
Amsterdam Polytechnic. This study is funded by the European Commission as a 
complementary measure under the Libraries Programme.

In this study the general application of artificial neural net (ANN) 
technology to information retrieval (IR) problems is investigated in a 
libraries context. Typical applications of this technology are advanced 
interface design, current awareness, SDI, fuzzy search and concept 
formation.

In order to discuss and disseminate the results obtained through this 
study, two one-day workshops will be organized by M.S.C. Information 
Retrieval Technologies BV, the first one after compilation of the State of 
the Art Report and the second one after completion of the prototyping and 
experimentation phase. 

During both workshops, there will be ample room for discussion on how to 
commercialise such applications of ANN in a libraries context.

Both workshops are open to participants from other organizations, 
commercial and academic, that are interested in various applications of 
ANNs in existing library systems.

Who should attend:

Of interest to all:

	- Computer Companies
	- Information Management and Supply Companies
	- Government Agencies
	- Libraries
	- Universities and Polytechnics

That are interested in:

	- Neural Networks
	- Information Retrieval
	- Library Science
	- Natural Language Processing
	- Advanced Computer Science
	- Data compression

For applications such as:

	- Current Awareness
	- Selective Dissemination of Information (SDI)
	- Information Filtering
	- Automatic Contents Based Information Distribution
	- Categorization
	- Advanced Interface Design
	- Fuzzy Retrieval (Information recognized by Optical Character 
	  Recognition and Speech Recognition)
	- Retrieval Generalization
	- Thesaurus Generation
	- Information Compression
	- Juke box staging

General Information

Costs per participant for both days:

Commercial companies					Dfl. 950,-
Universities and non-profit institutions (*)		Dfl. 500,-
Students (*)						Dfl. 150,-

(*) A letter from the university or non-profit institution must be shown 
at registration.

These costs include:

	- Workshop Proceedings
	- State of the Art report on Neural Networks in Information 
	  Retrieval, as compiled by MSC
	- Achievements report on Neural Networks in Information 
	  Retrieval, as compiled by MSC
	- Ongoing coffee & tea
	- Lunch
	- Dinner
	- Future mailings on progress

A limited number of travel grants is available for students (please apply).

All other expenses, such as travel, hotels and short stays, are not 
included in the fee.

Payment

The following payment methods are accepted:

1.	Credit Cards
2.	Prepayment by bank
3.	Personal cheques

More information:

	M.S.C. Information Retrieval Technologies BV
	Dr Johannes C. Scholtes
	Dufaystraat 1
	1075 GR AMSTERDAM
	the Netherlands

	Telephone:	+31 20 679 4273
	Fax:			+31 20 6710 793
	Internet:		100322.250 at compuserve.com or
				scholtes at msc.mhs.compuserve.com
	Compuserve:	MHS: SCHOLTES at MSC or
				100322,250

Background & Introduction

Recent research on artificial neural networks (ANN) in the field of 
pattern recognition and pattern classification has provided successful 
alternatives to traditional techniques. Products for optical character 
recognition (OCR), speech recognition, hand-written character recognition 
and the prediction of non-linear time series are good examples of the 
commercialization of these ANN techniques. 

So far, the European Commission has funded more than 40 projects of 
different sizes under the ESPRIT and other programmes which involve 
research on or the application of ANN technology.

The task of Information Retrieval (IR), that is, the matching of a large 
number of documents against a query, can also be seen as a pattern 
recognition or pattern classification task. Therefore, there have been 
several approaches to the application of ANN in IR in order to increase the 
quality of the retrieval process.

Despite the theoretical and practical evidence that ANN are good tools for 
pattern recognition tasks, it is still an open question whether they are 
appropriate tools within the specific domain of Bibliographic Information 
Retrieval. Apart from some minor studies it seems no real attempt has been 
made up until now to integrate an ANN as a main component of a 
bibliographical information retrieval system or an on-line library 
catalogue (OPAC). It is therefore not clear whether and how ANN techniques 
can be combined with more "classical" methods, for instance rule-based or 
statistical approaches. By the same token it is not clear either to what 
extent existing OPACs could benefit from ANN technology.

Objectives

The objectives of this study are:

	- to ascertain the State-of-the-Art of the application of Artificial 
	  Neural Net (ANN) technology to Information Retrieval (IR), with 
	  particular emphasis on bibliographic information in a libraries 
	  context;
	- to assess the (potential) quality of ANN-based approaches to IR in 
	  this particular domain of interest, in comparison with traditional 
	  practices. Here "quality" must be understood in terms of both 
	  (measurable) efficiency and practical benefits;
	- to stimulate interest in the practical application of ANN 
	  technology to bibliographic information retrieval in a libraries 
	  context.

Information Retrieval

It can be stated that Information Retrieval (IR) is the ultimate 
combination of Natural Language Processing (NLP) and Artificial 
Intelligence (AI). On the one hand there is an enormous amount of NLP data 
that needs to be processed and understood to return the proper information 
to the user. On the other hand, one needs to understand what the user 
intends with his or her query given the context of the other queries and 
some kind of user model. 

Most IR systems still use techniques that were developed over thirty 
years ago and that implement nothing more than a global surface analysis of 
the textual (layout) properties. No deep structure whatsoever is 
incorporated in the decision whether or not to retrieve a text. 

There is one large dilemma in IR research. The data collections are so 
incredibly large, that any method other than a global surface analysis 
would fail. However, such a global analysis could never implement a 
contextually sensitive method to restrict the number of possible candidates 
returned by the retrieval system. 

Information retrieval can also be a very frustrating area of research. 
Whenever one invents a new model, it is difficult to show that it works 
better (qualitatively and quantitatively) than any previous model. The 
addition of new dependencies often results in much too slow a system. 
Systems such as Salton's SMART have existed for over 30 years without any 
serious competition.

The field of information retrieval would be greatly indebted to a method 
that could incorporate more context without slowing down. Since computers 
are only capable of processing numbers within reasonable time limits, such 
a method should be based on vectors of numbers rather than on symbol 
manipulations. This is exactly where the challenge lies: on the one hand 
keep up the speed, and on the other incorporate more context. 
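To illustrate what "vectors of numbers" means here, the following is a 
minimal sketch of classical vector matching, assuming plain word counts as 
vector components and cosine similarity as the match score. It is not the 
method under study; the function names and the three-title catalogue are 
invented for illustration only.

```python
import math
from collections import Counter


def term_vector(text):
    """Map a text to a sparse vector of lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, document) pairs, best match first."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical three-title catalogue, invented for illustration only.
catalogue = [
    "neural networks for pattern recognition",
    "cataloguing rules for public libraries",
    "information retrieval with neural networks",
]
ranking = rank("neural information retrieval", catalogue)
for score, title in ranking:
    print(f"{score:.3f}  {title}")
```

Such a match is fast because it only counts shared surface terms, which is 
also why it carries no context: word order and meaning play no role in the 
score.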

Artificial Neural Networks

The connectionist approach offers a massively parallel, highly distributed 
and highly interconnected solution for the integration of various kinds of 
knowledge, with preservation of generality. It might be that connectionism 
or neural networks (despite all currently unsolved questions concerning 
learning, stability, recursion, firing rules, network architecture, etc.), 
will contribute to the research in natural-language processing and 
information retrieval.

Distributed data representation may solve many of the unsolved problems in 
IR by introducing a powerful and efficient knowledge integration and 
generalization tool. However, distributed data representation and 
self-organization trigger new problems that should be solved in an elegant 
manner.

Current Problems in Information Retrieval

The main objectives of current IR research can be characterised as the 
search for systems that exhibit adaptive behaviour, interactive behaviour 
and transparency. More specifically, these models should implement 
properties for:
	- Understanding incomplete queries or making incomplete matches,
	- Understanding vague user intentions,
	- Generalising over queries as well as over query results,
	- Adapting to the needs of an evolving user (model),
	- Allowing dynamic relevance feed-back,
	- Helping the user to browse intelligently through the data, and
	- Adding (language) context sensitivity.

Different Approaches in Information Retrieval and Neural Networks

Two main directions of neural network related research in information 
retrieval can be observed. 

First, there are relatively static databases that are investigated with a 
dynamic query (free text search, also known as document retrieval systems). 

Next, there are the more dynamic databases that need to be filtered with 
respect to a relatively static query (the filtering problem also known as 
current awareness systems and Selective Dissemination of Information, SDI). 
In the first case the data can be preprocessed due to their static 
character. In the second case, the amounts of data are so large that there 
is no time whatsoever for a preprocessing phase. A direct context-sensitive 
hit-and-go must be made.

Early neural models adapt well to the paradigms currently used in 
information retrieval. Index terms can be replaced by processing units, 
hyperlinks by connections between units, and network training resembles the 
index normalisation process. However, these models do not adapt well to the 
general notion of neural networks.

In addition, it is difficult to imagine what to teach a neural information 
retrieval system if it is used as a supervised training algorithm. The 
address space will almost always be too limited due to the large amounts of 
data to be processed. A combination of structured (query, retrieved 
document numbers) pairs does not seem plausible either, considering the 
restricted amount of memory of (current) neural network technology. 
Nevertheless, most of the neural IR models found in literature are based on 
these principles.

Also problematic are the so-called clustering networks. Due to the large 
amounts of data in free text databases, clustering is very expensive and is 
therefore considered irrelevant in changing information retrieval 
environments.

More interesting are the unsupervised, associative-memory type of models, 
which can be used to implement a specific pattern matching task. This type 
of neural network can be particularly useful in a filtering application. 
Here, the memory demands of the neural network only need to fulfil the 
query (or interest) size, and not the size of the entire data base. It is 
in this area where neural networks are expected to be most useful and 
relevant for information retrieval.
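The memory argument above can be sketched as follows. This is not a neural 
network and not the model developed in the study: it stands in for the 
associative memory with a single stored interest pattern of character 
trigram counts, compared against each incoming document in turn. The class 
name, the trigram representation and the threshold of 0.3 are all 
assumptions made for illustration. The point it shows is that only the 
interest profile is held in memory, while documents are scored one at a 
time and never indexed.

```python
import math
from collections import Counter


def trigrams(text):
    """Count the overlapping character trigrams of a lower-cased text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


class InterestFilter:
    """Holds one stored interest pattern; memory grows with the profile only."""

    def __init__(self, profile_text, threshold=0.3):
        self.profile = trigrams(profile_text)
        self.norm = math.sqrt(sum(c * c for c in self.profile.values()))
        self.threshold = threshold  # assumed cut-off, not taken from the study

    def score(self, document):
        """Cosine overlap between the stored pattern and one document."""
        doc = trigrams(document)
        doc_norm = math.sqrt(sum(c * c for c in doc.values()))
        if self.norm == 0 or doc_norm == 0:
            return 0.0
        dot = sum(c * doc[t] for t, c in self.profile.items() if t in doc)
        return dot / (self.norm * doc_norm)

    def accept(self, stream):
        """Yield matching documents one at a time; nothing is indexed."""
        for document in stream:
            if self.score(document) >= self.threshold:
                yield document


# Hypothetical incoming stream; the first item imitates OCR damage.
incoming = [
    "neurai netw0rks",
    "annual budget report",
    "neural networks for libraries",
]
interest = InterestFilter("neural networks")
kept = list(interest.accept(incoming))
```

Because the comparison works on character trigrams rather than whole words, 
a text with a few misrecognized characters still overlaps with the stored 
pattern, which is the kind of tolerance a fuzzy retrieval application would 
need.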

Especially topics such as fuzzy retrieval, current awareness, SDI, concept 
formation and advanced interface design are in the scope of the project. 
However, input from the workshops is very important for the final 
determination of the direction of the research.

Program

Day 1: June 24, 1994

9.15-9.30	Welcome and Introduction
		Dr Ir Johannes C. Scholtes, President of 
		MSC Information Retrieval Technologies B.V.

9.30-11.00	Tutorial Neural Networks (Back Propagation, 
		Kohonen Feature Maps)
		Dr Ir Johan Henseler, Forensic Laboratories, 
		Head of Section Computer Criminality

11.00-11.15	Break

11.15-12.30	Information Retrieval Application in Libraries
		Dr E. Sieverts, Professor at Amsterdam Polytechnic, 
		Library Program

12.30-13.30	Lunch

13.30-15.00	Presentation Findings & State of the Art Report

15.00-15.15	Break

15.15-16.00	Directions for (Commercial) Applications
		Dr Ir Johannes C. Scholtes

16.00-17.00	Panel Discussion

17.00-18.00	Reception

19.00-...	Dinner and evening program

Day 2: September 16, 1994

9.15-9.30	Welcome and Introduction
		Dr Ir Johannes C. Scholtes, President of 
		MSC Information Retrieval Technologies B.V.

9.30-11.00	Achievements
		Dr Ir Johannes C. Scholtes, President of 
		MSC Information Retrieval Technologies B.V., and 
		Dr E. Sieverts, Professor at Amsterdam Polytechnic, 
		Library Program

11.00-12.30	Hands-on demonstrations

12.30-13.30	Lunch

13.30-15.00	Problem Issues
		Dr E. Sieverts, Professor at Amsterdam Polytechnic, 
		Library Program

15.00-15.15	Break

15.15-16.00	Commercial Implications
		Dr Ir Johannes C. Scholtes, President of 
		MSC Information Retrieval Technologies B.V.

16.00-17.00	Panel Discussion

17.00-18.00	Reception

19.00-...	Dinner and evening program

During the day, demos of the prototypes will be available to the 
participants of the workshop. Each demo will be guided by a specialist who 
demonstrates the software.