Internet Draft Wang Liang,et al
Document: draft-liang-irpdl-01.txt hust
Expires: April 2004 October 2003
Information Retrieval Protocol for Digital Library
draft-liang-irpdl-01.txt
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of [RFC2026].
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 10, 2004.
Copyright Notice
Copyright (C) The Internet Society (2003). All Rights Reserved.
Abstract
This document specifies an information retrieval protocol for digital
libraries. The protocol has two parts: a standard search Webservice,
which defines the format of query words and of search results, and a
method to find and select such search Webservices. Using this
protocol, any database, including web page databases, digital issue
databases, and video databases, can publish the same uniform search
Webservice, even though these databases may use different metadata
standards and architectures. Search systems can then easily find and
invoke these Webservices. This protocol thus makes it possible for
users to obtain all kinds of information on the Internet through a
WANG,et al Expires - April 2004 [Page 1]
Internet Draft Information RPDL October 2003
single search engine, instead of visiting many different search
engines one by one. Use of this protocol is not limited to libraries.
Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC-2119].
Table of Contents
1. Introduction
2. Standard Search Webservice
   2.1 Data Encoding
   2.2 The Format of Query Words
   2.3 Classes and Functions
      2.3.1 Class Search
      2.3.2 Class Search Response
      2.3.3 Class Result Element
   2.4 WSDL of the Standard Search Webservice
3. Description of the Search Webservice
   3.1 The XML Schema of SDDI
   3.2 The API of SDDI
      3.2.1 Publish
      3.2.2 Inquiry API
4. Security Considerations
5. References
   5.1 Normative References
   5.2 Informative References
6. Authors' Addresses
7. Full Copyright Statement
1. Introduction
Hundreds of databases have been introduced in many libraries, and
there are many more free information resources on the Internet.
Finding complete and precise results for a query across so many
databases has become a feat of acrobatics. Users want to obtain all
kinds of information, such as web pages and videos, from a single
search engine, without caring where the information is stored.
Webservices [4] offer a good way to achieve this: as long as these
databases provide Webservices, integrating all kinds of information
resources into one search engine becomes straightforward.
Google [5] and some other databases already provide search
Webservices, but no standard protocol for such search Webservices
exists. Even the Webservices of different web search engines use
distinct formats for queries and search results, let alone
the Webservices of many other kinds of databases. Thus, a uniform
Webservice applicable to all information resources, and an efficient
method to find such Webservices, need to be established. This memo
addresses these two goals. The protocol comprises two interacting
parts: the Standard Search Webservice (SSW), which can be applied to
all databases, and Search Webservice Description, Discovery and
Integration (SDDI), which provides an efficient way to find the
appropriate search Webservice.
2. Standard Search Webservice
The Standard Search Webservice (SSW) defines a standard search
Webservice, including its classes and functions. Using this
definition, most databases can publish a uniform search Webservice.
2.1 Data Encoding
To support searching documents in multiple languages, all requests
and responses should be encoded in UTF-8.
2.2 The Format of Query Words
Query words specify attribute-based Boolean queries. Different
communities will require their own sets of attributes, so the query
words are flexible enough to accommodate attributes from different
communities.
There are three kinds of query words: basicQuery, advancedQuery, and
fullQuery. The XML Schema of the query words is as follows.
The three logical operators are defined as follows.
The other operators, used for string matching, are defined as follows.
The operator "equal" provides simple equality matching on property
values. The operator "contain" matches documents that contain the
specified words.
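The following is a minimal sketch of how a server might evaluate the
two string operators against one field value. It is not part of the
protocol: the helper name `matches` is hypothetical, and treating
"contain" as a case-insensitive substring test is an assumption, since
this document does not say how case or word boundaries are handled.

```python
def matches(operator: str, field_value: str, query_value: str) -> bool:
    """Evaluate one string operator on one field value (hypothetical helper)."""
    if operator == "equal":
        # Simple equality matching on the property value.
        return field_value == query_value
    if operator == "contain":
        # Assumption: case-insensitive substring match.
        return query_value.casefold() in field_value.casefold()
    raise ValueError(f"unsupported operator: {operator!r}")


print(matches("contain", "Thinking in Java", "java"))  # True
print(matches("equal", "Thinking in Java", "java"))    # False
```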
Every query word is defined as a combination of a logical operator,
an operator, and a field. For example, a query word that combines the
logical operator "AND", the operator "contain", and the value "java"
in the field "title" means that the title of each result must contain
"java".
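To illustrate, such a term can be serialized as XML with Python's
standard xml.etree.ElementTree module and encoded in UTF-8, as Section
2.1 requires. Because the query-word schema is not reproduced in this
text, the element names `queryWord`, `logic`, `operator`, `field`, and
`value` are hypothetical placeholders, not the schema of this protocol.

```python
import xml.etree.ElementTree as ET


def build_query_term(logic: str, operator: str, field: str, value: str) -> ET.Element:
    """Build one query-word term; all element names are hypothetical."""
    term = ET.Element("queryWord")
    ET.SubElement(term, "logic").text = logic
    ET.SubElement(term, "operator").text = operator
    ET.SubElement(term, "field").text = field
    ET.SubElement(term, "value").text = value
    return term


# The title of each result must contain "java".
term = build_query_term("AND", "contain", "title", "java")
payload = ET.tostring(term, encoding="utf-8")  # UTF-8 bytes, per Section 2.1
print(payload.decode("utf-8"))
```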
The basicQuery is defined as follows. The field "title" represents
the basic description of a record. It is available in all metadata
standards and databases: book, video, and web page databases can all
provide search on "title".
The advancedQuery is defined as follows. The fields "title",
"author", and "keywords" are chosen as its search fields because they
are normally available in most databases and metadata standards.
The fullQuery is defined as follows. A fullQuery provides all the
search fields of a database; these fields are decided by the database
owner.
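The field restrictions of the three query kinds can be sketched as a
validation step. This is an illustration under stated assumptions, not
the schema of this protocol: it takes the advancedQuery fields to be
"title", "author", and "keywords", models the owner-defined fullQuery
fields as a list supplied by the caller, and uses the hypothetical
helper names `allowed_fields` and `validate_fields`.

```python
from typing import Iterable, List, Optional

BASIC_FIELDS = frozenset({"title"})
# Assumption: the advancedQuery fields are title, author, and keywords.
ADVANCED_FIELDS = frozenset({"title", "author", "keywords"})


def allowed_fields(kind: str, owner_fields: Optional[Iterable[str]] = None) -> frozenset:
    """Return the searchable fields for a query kind (hypothetical helper)."""
    if kind == "basicQuery":
        return BASIC_FIELDS
    if kind == "advancedQuery":
        return ADVANCED_FIELDS
    if kind == "fullQuery":
        # Full queries expose whatever fields the database owner publishes.
        if owner_fields is None:
            raise ValueError("fullQuery requires the owner's field list")
        return frozenset(owner_fields)
    raise ValueError(f"unknown query kind: {kind!r}")


def validate_fields(kind: str, fields: Iterable[str],
                    owner_fields: Optional[Iterable[str]] = None) -> List[str]:
    """Return the requested fields that the query kind does not allow."""
    permitted = allowed_fields(kind, owner_fields)
    return [f for f in fields if f not in permitted]


print(validate_fields("basicQuery", ["title", "author"]))  # ['author']
```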
2.3 Classes and Functions
The classes and functions of the standard Webservice are detailed
here and are also presented in the form of WSDL [1].
A search Webservice has three functional components: (1) receiving
query words and returning results, (2) analyzing and explaining the
results, and (3) describing each record in the results. These
functions are implemented by three classes of the Webservice.
2.3.1 Class Search
The main function of this class is to submit a query string and a set
of parameters to the search service and receive a set of search
results in return.
This class provides three levels of search function, corresponding to
the three kinds of query words: basic search, advanced search, and
full search.
a. Basic search
basicSearch(
query as basicQuery,
start as Integer,
maxResults as Integer)
query: an XML-format parameter that conforms to the corresponding
definition of query words.
start: zero-based index of the first desired result.
maxResults: number of results desired per query. The maximum value
per query is 100 and the minimum is 1. If a query has few matching
items, the actual number of results returned may be smaller than the
number requested.
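The constraints on start and maxResults can be illustrated with a toy,
in-memory stand-in for basicSearch. The sketch is hypothetical: it
searches a Python list of titles instead of a database, accepts a
plain string instead of an XML basicQuery, and assumes a
case-insensitive "contain" match on "title".

```python
from typing import List

MIN_RESULTS, MAX_RESULTS = 1, 100


def basic_search(titles: List[str], query: str, start: int, max_results: int) -> List[str]:
    """Hypothetical in-memory basicSearch over record titles."""
    if start < 0:
        raise ValueError("start is a zero-based index and must be >= 0")
    if not MIN_RESULTS <= max_results <= MAX_RESULTS:
        raise ValueError("maxResults must be between 1 and 100")
    hits = [t for t in titles if query.casefold() in t.casefold()]
    # Fewer than max_results items come back when few records match.
    return hits[start:start + max_results]


catalog = ["Java Network Programming", "Thinking in Java", "Digital Libraries"]
print(basic_search(catalog, "java", start=0, max_results=10))
```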
b. Advanced search
advancedSearch(
query as advancedQuery,
dateStart as date,
dateEnd as date,
start as Integer,
maxResults as Integer,
orderby as String,
order as String)
dateStart, dateEnd: a date range. To limit results to documents
published within a specific date range, use these parameters.
orderby: the sort key of the results. It can be "date" (sort by
date), "Relevance" (sort by the relevance between the result records
and the query), or "title" (sort by the field "title").
order: either "descending" or "ascending"; the records are sorted in
descending or ascending order accordingly.
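The semantics of dateStart, dateEnd, orderby, and order can be
sketched as a post-processing step over candidate records. The
`Record` type and its `relevance` score are hypothetical, the orderby
values are spelled exactly as in the text above, and treating the date
range as inclusive is an assumption this document does not state.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List


@dataclass
class Record:
    title: str
    published: date
    relevance: float  # hypothetical score assigned by the search service


SORT_KEYS: Dict[str, Callable[[Record], object]] = {
    "date": lambda r: r.published,
    "Relevance": lambda r: r.relevance,
    "title": lambda r: r.title,
}


def filter_and_sort(records: List[Record], date_start: date, date_end: date,
                    orderby: str, order: str) -> List[Record]:
    """Apply dateStart/dateEnd, then orderby/order (hypothetical helper)."""
    if orderby not in SORT_KEYS:
        raise ValueError(f"unsupported orderby value: {orderby!r}")
    if order not in ("ascending", "descending"):
        raise ValueError(f"unsupported order value: {order!r}")
    # Assumption: the date range is inclusive at both ends.
    in_range = [r for r in records if date_start <= r.published <= date_end]
    return sorted(in_range, key=SORT_KEYS[orderby], reverse=(order == "descending"))


records = [
    Record("B", date(2002, 5, 1), 0.4),
    Record("A", date(2003, 1, 1), 0.9),
    Record("C", date(1999, 1, 1), 0.7),
]
top = filter_and_sort(records, date(2000, 1, 1), date(2003, 12, 31), "Relevance", "descending")
print([r.title for r in top])  # ['A', 'B']
```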
c. Full search
fullSearch(
query as fullQuery,
start as Integer,
maxResults as Integer,
other parameters)
Full search provides all the query formats of a database. These
formats are defined and guaranteed by the database owner.
2.3.2 Class Search Response
Each time a search request is issued to the search service, a
response is returned. This class describes the meaning of the
returned values. Its attributes are as follows.
TotalResultsCount: the estimated total number of results that exist
for the query.
resultElements: an array of "resultElement" items, corresponding to
the actual list of search results.
startIndex: the index (1-based) of the first search result in
"resultElements".
endIndex: the index (1-based) of the last search result in
"resultElements".
searchTime: text; a floating-point number giving the total server
time, in seconds, taken to return the search results.
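Because the start parameter of a request is zero-based while
startIndex and endIndex are 1-based, a server has to convert between
the two. The sketch below shows one way to do it; the `make_response`
helper is hypothetical, and plain title strings stand in for the
resultElement items described in Section 2.3.3.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchResponse:
    TotalResultsCount: int
    resultElements: List[str] = field(default_factory=list)  # titles stand in for items
    startIndex: int = 0
    endIndex: int = 0
    searchTime: float = 0.0  # seconds; carried as text on the wire


def make_response(total: int, page: List[str], start: int, elapsed: float) -> SearchResponse:
    """Build a response for a page whose first item had zero-based request index `start`."""
    return SearchResponse(
        TotalResultsCount=total,
        resultElements=page,
        # The request's start is zero-based; startIndex and endIndex are 1-based.
        startIndex=start + 1,
        # For an empty page this yields endIndex == startIndex - 1.
        endIndex=start + len(page),
        searchTime=elapsed,
    )


resp = make_response(total=250, page=["a", "b", "c"], start=10, elapsed=0.12)
print(resp.startIndex, resp.endIndex)  # 11 13
```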
2.3.3 Class Result Element
This class describes each record in the returned results. It has the
following four attributes.
Sourcename: the name of the information source.
Title: the title of the record.
URL: the URL of the record, returned as text, as an absolute URL.
Otherinformation: additional information such as a snippet of a web
page or the author of the record. This attribute is defined by each
search Webservice.
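A minimal sketch of the Result Element class follows, keeping the
attribute names given above. Checking that the URL is absolute by
requiring both a scheme and a host is an assumption about what
"absolute" means here; the example source name and URL are made up.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class ResultElement:
    Sourcename: str
    Title: str
    URL: str
    Otherinformation: str = ""  # content is defined by each search Webservice

    def __post_init__(self) -> None:
        # Assumption: "absolute" means the URL has both a scheme and a host.
        parts = urlparse(self.URL)
        if not (parts.scheme and parts.netloc):
            raise ValueError(f"URL is not absolute: {self.URL!r}")


item = ResultElement("Example Library", "Thinking in Java", "http://example.org/record/42")
print(item.Title)
```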
2.4 WSDL of the Standard Search Webservice
3. Description of the Search Webservice
To describe search Webservices, we draw on UDDI [2]. This section
proposes Search Webservice Description, Discovery and Integration
(SDDI), which helps search systems find and select appropriate data
sources.
3.1 The XML Schema of SDDI
We use the Dublin Core (DC) [3] standard to describe the
characteristics of a search Webservice. Nine of the 15 DC elements
are selected and divided into three groups. Other basic information
about a Webservice is also added to SDDI. Because all search services
use the same standard Webservice, the businessService,
bindingTemplate, and tModel structures of UDDI are not needed in
SDDI; information like the businessEntity in UDDI is enough to
identify a search Webservice.
The elements and attributes that describe a search Webservice are as
follows.
1 content
Title: A name given to the resource.
Description: An account of the content of the resource.
Language: A language of the intellectual content of the resource.
2 copyright
Creator: An entity primarily responsible for making the content of
the resource.
Publisher: An entity responsible for making the resource available.
Rights: Information about rights held in and over the resource.
3 characters
Date: A date of an event in the lifecycle of the resource.
Format: The physical or digital manifestation of the resource.
Identifier: An unambiguous reference to the resource within a given
context.
4 UDDI Key: the UDDI key of this Webservice, if one is available.
5 categorybag: an optional list of name-value pairs used to tag a
search Webservice with specific taxonomy information. Subject
classification schemes can be adopted, such as the Chinese Library
Classification (CLC) or the Library of Congress Classification (LCC).
6 accesspoint: the URL of this Webservice.
7 queryschema: all the supported search parameters.
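As an illustration only, the SDDI elements listed above can be
mirrored in a Python data class. Field names are lower-cased versions
of the element names, `uddi_key` stands for "UDDI Key", and all
example values are made up; the XML schema of SDDI remains the
authoritative structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SDDIRecord:
    # Group 1, content (Dublin Core)
    title: str
    description: str
    language: str
    # Group 2, copyright (Dublin Core)
    creator: str
    publisher: str
    rights: str
    # Group 3, characters (Dublin Core)
    date: str
    format: str
    identifier: str
    # Webservice information
    accesspoint: str                # URL of the search Webservice
    uddi_key: Optional[str] = None  # only if the service is also registered in UDDI
    categorybag: List[Tuple[str, str]] = field(default_factory=list)  # name-value pairs
    queryschema: List[str] = field(default_factory=list)  # supported search parameters


# All values below are invented for illustration.
acm = SDDIRecord(
    title="ACM", description="Example description", language="en",
    creator="Example creator", publisher="Example publisher", rights="Licensed",
    date="2003-10-01", format="text/html", identifier="urn:example:acm",
    accesspoint="http://example.org/acm/search",
    categorybag=[("LCC", "QA76")],
    queryschema=["title", "author", "keywords"],
)
print(acm.accesspoint)
```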
The XML schema of SDDI is as follows.
3.2 The API of SDDI
3.2.1 Publish
When a library purchases a database, an SDDI for that database is
authorized at the same time and saved on the library's local servers.
The library can revise the SDDI according to its own needs.
3.2.2 Inquiry API
The Inquiry API provides two simple functions that help the search
engine find the appropriate Webservice matching the requirements of
users. Element names refer to the elements of SDDI; an element may
also have a complex structure.
1 find(element, value)
element: the name of the element.
value: the value of the element, in XML format according to the
corresponding element of SDDI.
This function returns the accesspoint that matches the element and
its value. For example:
find(title, "ACM")
returns the "accesspoint" of the SDDI whose element "title" is "ACM".
2 get(element1, value, element2)
Returns the value of element2 for the SDDI whose element1 has the
given value. The results are in XML format. For example:
get(title, "ACM", character)
returns the element "character" of the SDDI whose element "title" is
"ACM".
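The behaviour of find and get can be sketched over a hypothetical
in-memory registry in which each SDDI is a dictionary keyed by element
name. Plain string equality stands in for comparing XML values, find
returns a list because more than one SDDI could match, and the wrapper
element `results` used by get is an assumption, since the shape of the
returned XML is not specified here. Registry contents and URLs are
invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SDDI = Dict[str, str]  # element name -> value (simplified; real values are XML)


def find(registry: List[SDDI], element: str, value: str) -> List[str]:
    """Return the accesspoint of every SDDI whose `element` equals `value`."""
    return [sddi["accesspoint"] for sddi in registry if sddi.get(element) == value]


def get(registry: List[SDDI], element1: str, value: str, element2: str) -> str:
    """Return element2 of every SDDI whose element1 equals value, as XML text."""
    results = ET.Element("results")  # hypothetical wrapper element
    for sddi in registry:
        if sddi.get(element1) == value and element2 in sddi:
            ET.SubElement(results, element2).text = sddi[element2]
    return ET.tostring(results, encoding="unicode")


registry = [
    {"title": "ACM", "language": "en", "accesspoint": "http://example.org/acm/search"},
    {"title": "Example DB", "language": "zh", "accesspoint": "http://example.org/db/search"},
]
print(find(registry, "title", "ACM"))             # ['http://example.org/acm/search']
print(get(registry, "title", "ACM", "language"))  # <results><language>en</language></results>
```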
4. Security Considerations
Since databases are usually purchased by an organization, IP-based
access control will be used to protect copyright.
Each organization builds its own search engine based on this
protocol, so the Webservice and SDDI are not open to end users.
Additional security measures are left to the corresponding
organization.
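IP-based access control of the kind described above could be
implemented as an allowlist check with Python's standard ipaddress
module. The helper name `is_allowed` and the licensed ranges (the
documentation address blocks 192.0.2.0/24 and 2001:db8::/32) are
hypothetical; this document does not specify how the check is
performed.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, licensed_networks: Iterable[str]) -> bool:
    """Return True if client_ip lies in any licensed network (hypothetical check)."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa); `in` returns False.
    return any(addr in ipaddress.ip_network(net) for net in licensed_networks)


LICENSED = ["192.0.2.0/24", "2001:db8::/32"]
print(is_allowed("192.0.2.17", LICENSED))    # True
print(is_allowed("198.51.100.5", LICENSED))  # False
```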
5. References
5.1 Normative References
[1] Web Services Description Language (WSDL) Version 1.2,
    http://www.w3.org/TR/wsdl12
[2] Universal Description, Discovery and Integration (UDDI)
    Version 3, http://uddi.org/pubs/uddi_v3.htm
[3] S. Weibel, J. Kunze, et al., "Dublin Core Metadata for Resource
    Discovery", RFC 2413, September 1998.
5.2 Informative References
[4] Web Services, http://www.w3.org/2002/ws
[5] The Web service of Google, http://www.google.com/apis/
6. Authors' Addresses
Wang Liang
HUST
WUHAN 430074
China
Phone: 86-27-87553494
Email: wangliang_f@163.com
Guo YiPing
HUST
WUHAN 430074
China
Email: gyp@hust.edu.cn
Fang Ming
HUST
WUHAN 430074
China
Email: fangming_w@263.net
Xu Yuedong
HUST
WUHAN 430074
China
Email: xuyaodong2000@yahoo.com.cn
7. Full Copyright Statement
Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be followed,
or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY, THE INTERNET ENGINEERING
TASK FORCE, THE AUTHOR AND THE AUTHOR'S EMPLOYER DISCLAIM ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.