Search services: Alta Vista
Header
Service name: Alta Vista by Digital
Last update of this description: 4.4.1997
Description written by: Kai Halttunen
General information
-
Type of service (according to TK's typology): Robot based index
-
Access (free, commercial): free
-
Volume: 31 000 000 WWW pages from 476 000 servers and 4 000 000 Usenet
articles from 14 000 newsgroups
-
URLs known: 31 000 000
-
Number of documents indexed: 31 000 000
-
Publisher: Digital Equipment Corporation
-
URL for Top-level Page: http://altavista.digital.com/
-
Mirror sites: Yes (Northern Europe) http://altavista.telia.com/ and
(Australia) http://www.altavista.yellowpages.com.au/
-
URL for the organization: http://www.digital.com/
-
History: "ALTA VISTA is the result of a research project started in
the summer of 1995 at Digital's Research Laboratories in Palo Alto, California.
By combining a fast Web crawler with scalable indexing software, the team
was able to build a large index of the Web in the Fall of 1995. After two
months of internal testing, we produced an even larger index consisting
of the full text of over 16,000,000 pages. We made the site public on the
15th of December 1995. Within three weeks of launch, we were handling over
two million HTTP requests per day. Nowadays Alta Vista is accessed 29 million
times per weekday."
-
Update frequency of the whole database: continuous, 4 000 000 pages/day
-
Document rating, reviews, "added value" included: No
-
Registration needed: No
-
Costs: No
-
Performance: Fast, good
-
Response time: Fast
-
Time outs: No
-
Image download time: Fast
Harvesting
-
Harvesting software: Scooter by Digital
-
Robot (type; follows robot exclusion standard?): Yes
-
Method:
-
Human: No
-
Automatic: Yes
-
User registration: Yes
-
User deletable: No
-
Depth first:
-
Breadth first: Yes
-
Type coverage:
-
WWW: Yes
-
gopher:
-
WAIS:
-
ftp:
-
telnet (OPACs):
-
UseNet News: Yes
-
Listserv:
-
IRC:
-
Other databases (numeric, commercial):
-
Multimedia products (images, movie, sounds):
-
Other types:
-
Geographic coverage: World wide
-
Subject coverage (General or specialized content): General
-
Update frequency for visiting the same sites/documents again: 4-6 weeks
-
Number of dead links:
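
The harvesting entries above state that the robot follows the robot exclusion standard and works breadth first. A minimal sketch of what that combination means is given here. It is not Scooter's actual implementation: the robots.txt content, the link graph and the function name are hypothetical, and no network access takes place.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example host (assumption, not a real site).
ROBOTS_TXT = """\
User-agent: Scooter
Disallow: /private/
"""

# Hypothetical in-memory link graph standing in for fetched pages.
LINKS = {
    "http://example.com/": [
        "http://example.com/a.html",
        "http://example.com/private/x.html",
    ],
    "http://example.com/a.html": ["http://example.com/b.html"],
    "http://example.com/b.html": [],
    "http://example.com/private/x.html": [],
}


def crawl_breadth_first(start, agent="Scooter"):
    """Visit pages level by level, skipping URLs that robots.txt disallows."""
    rules = RobotFileParser()
    rules.parse(ROBOTS_TXT.splitlines())

    queue = deque([start])   # FIFO queue gives breadth-first order
    seen = {start}           # URLs already queued, to avoid revisits
    visited = []
    while queue:
        url = queue.popleft()
        if not rules.can_fetch(agent, url):
            continue         # excluded by the robot exclusion rules
        visited.append(url)
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling crawl_breadth_first("http://example.com/") visits the start page, then a.html, then b.html, and never visits the page under /private/.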
Indexing
-
Indexing software: Indexer by Digital
-
What is indexed:
-
Extracted information, fields indexed:
-
Titles: Yes
-
Headings: Yes
-
Header information (included metainformation): META-tag
-
File information (size, date): Yes, both
-
Links (URLs): Yes
-
The anchor text of links: Yes
-
Other HTML tags: Applet, Host, Image (searches can be restricted to the
text only, with links and images excluded)
-
Summary/excerpts (how generated):
-
Full text: Yes
-
What is not indexed:
-
Separate metainformation provided by the search service: The META tag can
be used to control what Alta Vista indexes from a page. AltaVista will
index the description and keywords up to a limit of 1,024 characters.
-
Human cataloguing and indexing: No
-
Human summary/abstract, excerpt, review: No
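
The META-tag entry above can be illustrated with a short sketch that pulls the description and keywords out of a page and cuts them at 1,024 characters. This is an illustration only, not AltaVista's indexer: the class and function names are hypothetical, and applying the limit to each field separately is an assumption, since the description does not say whether the limit is per field or combined.

```python
from html.parser import HTMLParser

META_LIMIT = 1024  # character limit quoted in the description above


class MetaExtractor(HTMLParser):
    """Collect the description and keywords META tags of a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        fields = dict(attrs)
        name = (fields.get("name") or "").lower()
        if name in ("description", "keywords"):
            # Assumption: the limit is applied to each field on its own.
            self.meta[name] = (fields.get("content") or "")[:META_LIMIT]


def indexed_meta(html):
    """Return the META fields that would be kept, truncated to the limit."""
    parser = MetaExtractor()
    parser.feed(html)
    return parser.meta
```

A page whose description is longer than the limit yields a description of exactly 1,024 characters, while shorter fields are returned unchanged.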
Retrieval system:
Search software: Indexer by Digital
Type of retrieval system:
-
Boolean (exact match): Yes (advanced query)
-
Best match: Yes (simple query)
-
Combination: Yes
-
Vector retrieval:
-
nonverbal (citation indexing):
-
Other:
Query structures and operations supported
-
Natural language:
-
Word list (no Boolean operators associated): Yes (simple query)
-
Boolean query: Yes (advanced query)
-
Boolean operators:
-
AND: Yes
-
OR: Yes
-
NOT: Yes
-
Nesting (parentheses supported): Yes (advanced query)
-
Restrictions:
-
mixing of operators: Possible
-
number of search keys: Not limited
-
distance in number of words: NEAR (within 10 words)
-
distance in text structure: possible to limit the search to certain
elements in the document (title, link, anchor, text, url, image, java applet)
-
bound phrases: Yes (by "word word")
-
Other:
-
Ranking algorithm: "A document has a higher score if the following hold:
The query words or phrases are found in the first few words of the document
(especially in the title of a Web page or in the headers of Usenet news
articles). The query words or phrases are found close to one another in
the document. The document contains more of the query words than some other
document."
-
ranking factors: In advanced query the user must specify the search keys
used for calculating relevance. In simple query relevance ranking is the default.
-
calculation of scores:
-
User weighted words: The user must specify the relevance ranking criteria.
Search keys and other keys can and should be used (advanced query).
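
The NEAR operator listed above (within 10 words) can be sketched as a simple position comparison. This is a minimal illustration, not AltaVista's implementation: the function name, the tokenisation and the way the distance is counted (difference between word positions, inclusive) are assumptions.

```python
import re


def near(text, first, second, window=10):
    """Return True if `first` and `second` occur within `window` words."""
    # Assumption: words are runs of letters/digits, compared case-insensitively.
    words = re.findall(r"\w+", text.lower())
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    # Any pair of occurrences close enough satisfies the operator.
    return any(abs(i - j) <= window for i in pos_first for j in pos_second)
```

With this counting, two words separated by nine other words still match, while two words separated by eleven other words do not.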
Search terms:
-
Truncation:
-
Not supported:
-
Automatic: No
-
stemming algorithm (morphological):
-
add wildcard (mechanical):
-
left (mechanical):
-
right (mechanical):
-
Manual:
-
left:
-
right: Yes, with an asterisk (*). The *-notation matches from zero up
to five additional lower-case letters only. Capital letters and digits
are therefore not matched.
-
internal: Yes (see above)
-
What is the default and is it user changeable?: Truncation is not applied
by default; the user can request it.
-
String match features:
-
regular expressions:
-
internal masking: Yes (see above)
-
case sensitive specify: capital letters force an exact match
-
others:
-
Any limits for a search term (character sets supported): Searches and
displays Latin-1.
-
Any limits for the size of a result set: No
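
The truncation and case rules described above (the asterisk matches zero to five additional lower-case letters, and capital letters in a search term force an exact match) can be sketched as a translation into a regular expression. This is only an illustration under stated assumptions: the function names are hypothetical, "lower-case letters" is simplified to a-z, and a term without capitals is assumed to match any capitalisation.

```python
import re


def term_to_regex(term):
    """Translate a search term with * into a compiled regular expression."""
    # A capital letter anywhere in the term forces an exact-case match.
    exact_case = any(ch.isupper() for ch in term)
    parts = []
    for ch in term:
        if ch == "*":
            # Zero to five additional lower-case letters (a-z is a simplification).
            parts.append("[a-z]{0,5}")
        elif exact_case or not ch.isalpha():
            parts.append(re.escape(ch))
        else:
            # Lower-case query letter: accept either case in the document.
            parts.append("[" + ch.lower() + ch.upper() + "]")
    return re.compile("".join(parts))


def matches(term, word):
    """Return True if the whole word matches the search term."""
    return term_to_regex(term).fullmatch(word) is not None
```

Under these assumptions "search*" matches "searching" but not "searchability" (seven extra letters) or "search99" (digits), "colo*r" matches both "color" and "colour", and "Paris" matches only "Paris" while "paris" matches either form.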
WHAT IS SEARCHABLE:
-
Possibility to specify source types: WWW or Usenet-news
-
System searches as default:
-
URL:
-
Title, headings:
-
Keywords:
-
Summary:
-
Fulltext: Yes
-
cited URL, anchor text:
-
others:
-
User selectable search fields:
-
URL: Yes
-
Title, headings: Yes, title
-
keywords:
-
Summary:
-
Fulltext: Default
-
cited URL, anchor text: Yes both
-
others: It is possible to search text only (link text excluded), image, host
and java applet. In Usenet news articles it is possible to search the following
fields: From, Subject, Newsgroups, Summary, Keywords
-
Other search options:
-
Stopword list:
-
Does the system use a stopword list?: No
-
How is the stopword list constructed?: (e.g. words exceeding a given
absolute frequency are automatically put into the stopword list)
-
Can the stopword list be sidestepped in a search?: (e.g. in a phrase
search)
SEARCH IMPROVEMENT:
-
Concept search: No
-
Query expansion: Yes (Live Topics)
-
Controlled Vocabulary, thesauri: No
-
Relevance feedback, find similar: No
-
Improve your search support or form: The previous search statement is displayed
on the result page, where the search can be modified.
-
Navigation and graphical features:
-
Other features: The query is presented on the result page. It is possible to
enter a new query or modify the query on the result page.
RESULT DISPLAY:
-
Result set information:
-
total: Yes (about)
-
subsets: Yes (only when search keys are entered in the result ranking criteria)
-
Possible to choose number of displayed hits?: No
-
Is the number of hits displayed limited by the service?: Yes (if result
ranking is used, AltaVista displays only the first 200 hits)
-
What can be displayed:
-
URL: Yes
-
Hotlink to original document: Yes
-
Title, headings: Title
-
Keywords:
-
Summary: Excerpt
-
Fulltext:
-
cited URL, anchor text:
-
Show hits in context:
-
Highlight hits:
-
document size: Yes
-
document last updated:
-
document last visited: Yes
-
Pre-defined display formats: standard, compact, detailed form or count
only. Standard and detailed forms are the same.
-
Other display options: No
-
Information about relevance scores: "Documents with a high score will
appear at the head of the list. High scores are assigned if the selected
ranking word appears in the first few words of the document (say, in the
title of a Web page or in a header), or if the document contains more than
one instance of the ranking word."
-
Score displayed?: No
-
Matching terms: No
-
Sorting:
-
URL-based:
-
others (size, number of links): Relevance based
-
Afterprocessing of the result by the service:
-
duplicate check: Yes
-
link check: No
-
Browsing structure (Subject catalogue), Organization of the result:
No
-
Browsing structure integrated with index?: No
User interface
-
General description of interface:
-
Clarity of interface:
-
Clarity of search page or index:
-
Text-Only support: Yes
-
HTML Forms support: Yes
-
URL for Forms Search Page: http://www.altavista.digital.com/cgi-bin/query?pg=q
(simple query) ; http://www.altavista.digital.com/cgi-bin/query?pg=aq (advanced
query)
-
Query input form: The homepage of AltaVista is the same page as simple
search interface.
-
Optional forms for input:
-
simple but limited: Yes (simple query)
-
structured:
-
free not limited: Yes (advanced query)
-
other supported:
-
Non-Forms support: No
-
URL for Non-Forms Search Page:
-
Adaptations to special browsers (Netscape, lynx):
-
Online Help?:
-
URL for FAQ Page: http://www.altavista.digital.com/cgi-bin/query?pg=h&what=web#FAQ
-
URL for Help Page: http://www.altavista.digital.com/cgi-bin/query?pg=h&what=web
-
Navigation Aids: Good
-
Search Tutorials: Good
-
Sample Searches: Yes
-
Server Load Indicators: No
-
What's New page: No
-
What's Popular page: No
Documentation
-
Manual:
-
Literature: The AltaVista search revolution / Richard Seltzer, Deborah
S. Ray, Eric J. Ray. - Berkeley : McGraw-Hill, cop. 1997. - ISBN: 0-07-882235-1
-
Reviews:
URL for Copyright/Legal Page: http://www.altavista.digital.com/cgi-bin/query?pg=legal
URL for Subscription Page:
URL for Creator's Page: http://www.digital.com/
Our evaluation of the service
(Summary. strong points, weaknesses, criticism, recommendations to users
etc.)
Alta Vista seems to be a powerful search engine. The ability to limit
searches to certain fields is an advantage, as are the Boolean and proximity
operators. The online manual is extensive. Query structures are not limited,
but in certain situations I have doubted its performance on complex search
statements. The harvesting program Scooter does not seem to index sites very
deeply. Sometimes AltaVista drops pages from its index, and after a
while they reappear. Identical searches run simultaneously can return different
results. The databases of the mirror sites differ from that of the main site.
Traugott Koch (Traugott.Koch@ub2.lu.se)
Anna Brümmer, anna@munin.ub2.lu.se
Lotta Åstrand, lotta@munin.ub2.lu.se
Kai Halttunen, likaha@uta.fi
Eero Sormunen, lieeso@uta.fi
Anne Suoniemi, tmansu@uta.fi
Last update: 97-04-04