bitsifter
friday, april 11
[sift this] Most of the Bitsifter columns begin with a friend at the Company walking in the office saying, "Hey, have you seen INSERT TOPIC HERE?" Word of mouth, it seems, often travels faster than bits. This idea would make Einstein a liar. Groovy.
When it comes to Internet search engines, the spoken rule is: "Alta Vista is the fastest and largest index." Geeks probably think this because Digital is known for its monstrous hardware, and when you throw that technical excellence at the problem of indexing the Internet, you must have the best solution.
Hence, the experiment. The following table shows the number of pages each search engine found for each query. When possible, we searched by phrase, which means an exact match must be found on the page in order to qualify as a valid response.
| Search phrase | AltaVista | Excite | HotBot | Infoseek | Lycos | OpenText | WebCrawler |
|---|---|---|---|---|---|---|---|
| DVD Player | 600 | 970 | 954 | 975 | 24 | 272 | 50 |
| Challenger Disaster | 600 | 735 | 1102 | 683 | 148 | 122 | 65 |
| Ebola Virus | 2000 | 2850 | 3907 | 2247 | 310 | 323 | 167 |
| Ronald Reagan | 10000 | 8050 | 24699 | 13305 | 994 | 1457 | 982 |
| Cold Fusion | 6000 | 11450 | 19549 | 3888 | 607 | 1255 | 489 |
| Andy Warhol | 10000 | 3960 | 17977 | 6657 | 432 | 1426 | 731 |
| Hubble Telescope | 3000 | 2861 | 5123 | 2697 | 5179 | 2914 | 356 |
| U.S.S. Enterprise | 1000 | 8460 | 506777 | 3336 | 0 | 9592 | 147 |
| World Trade Center Bombing | 700 | 940 | 1312 | 835 | 3 | 187 | 45 |
| My Fat Elephant | 0 | 3210 | 16 | 0 | 0 | 60 | 1 |
| Total | 33900 | 43486 | 581416 | 34623 | 7697 | 17608 | 3033 |
With zero background in statistical analysis, the Bitsifter staff derived the following:
- In terms of this comparison, Alta Vista shoots itself in the foot by rounding its counts, apparently to the nearest hundred or thousand. A significantly geeky reason exists for this, but we've no idea what it is.
- Both Excite's and HotBot's results are padded. This is probably due to search algorithms that are doing their best to return the most results. On Excite, our placebo "My Fat Elephant" returned 3,210 results even though we could not find a single page that contained the phrase. Our guess: for "My Fat Elephant", Excite is simply looking for "elephant", and for "U.S.S. Enterprise", HotBot is simply looking for "Enterprise".
- Bringing HotBot's 506,777 result down to a normalized 5,000 and Excite's 3,210 down to 50, HotBot still has a significant lead over all the other engines: almost 40,000 pages over the runner-up, Excite. Whether this is due to generous interpretations of our search phrases is unknown.
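For anyone who wants to check our arithmetic, the normalization in the last point can be sketched in a few lines of Python. The counts are copied straight from the table; the names `counts`, `adjustments`, and `adjusted_totals` are made up for this sketch, and the two replacement values (5,000 and 50) are the same guesses used above, not measurements.

```python
# Page counts copied from the table, in row order:
# DVD Player, Challenger Disaster, Ebola Virus, Ronald Reagan, Cold Fusion,
# Andy Warhol, Hubble Telescope, U.S.S. Enterprise,
# World Trade Center Bombing, My Fat Elephant
counts = {
    "AltaVista":  [600, 600, 2000, 10000, 6000, 10000, 3000, 1000, 700, 0],
    "Excite":     [970, 735, 2850, 8050, 11450, 3960, 2861, 8460, 940, 3210],
    "HotBot":     [954, 1102, 3907, 24699, 19549, 17977, 5123, 506777, 1312, 16],
    "Infoseek":   [975, 683, 2247, 13305, 3888, 6657, 2697, 3336, 835, 0],
    "Lycos":      [24, 148, 310, 994, 607, 432, 5179, 0, 3, 0],
    "OpenText":   [272, 122, 323, 1457, 1255, 1426, 2914, 9592, 187, 60],
    "WebCrawler": [50, 65, 167, 982, 489, 731, 356, 147, 45, 1],
}

# The two adjustments from the text: (engine, row index) -> normalized count.
# These replacement values are the column's own guesses, not measured data.
adjustments = {
    ("HotBot", 7): 5000,  # "U.S.S. Enterprise": 506,777 -> 5,000
    ("Excite", 9): 50,    # "My Fat Elephant": 3,210 -> 50
}

def adjusted_totals(counts, adjustments):
    """Sum each engine's column after swapping in the normalized counts."""
    totals = {}
    for engine, column in counts.items():
        fixed = [adjustments.get((engine, i), n) for i, n in enumerate(column)]
        totals[engine] = sum(fixed)
    return totals

totals = adjusted_totals(counts, adjustments)
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
for engine, total in ranked:
    print(f"{engine:<11}{total:>8,}")

lead = ranked[0][1] - ranked[1][1]
print(f"{ranked[0][0]} leads {ranked[1][0]} by {lead:,}")
```

With these adjustments HotBot's total comes to 79,639 and Excite's to 40,326, a gap of 39,313, which is where "almost 40,000" comes from.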
If we're to take this table at face value, it seems that HotBot's claim to have the biggest index of the Web is true. HotBot consistently outperformed the other indexes, although it remains a gray area whether the pages returned always match the exact phrase being searched for. In terms of speed, only Excite was consistently slow on queries of its index. All others returned results almost instantaneously.
The understatement of the 1990s will be "Gosh, the Web is big". Any of the indexes listed above is likely to direct the casual surfer to a page relevant to the topic because the Web is THAT big.
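To make that gray area concrete, here is a rough Python sketch of the difference between a strict phrase match and a generous any-word match. We have no idea how any of these engines actually match queries; `phrase_match` and `any_word_match` are hypothetical helpers, and the sample pages are invented for illustration.

```python
import re

def words(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_match(query, page):
    """True only if the query's words appear on the page consecutively, in order."""
    q, p = words(query), words(page)
    return any(p[i:i + len(q)] == q for i in range(len(p) - len(q) + 1))

def any_word_match(query, page):
    """True if at least one query word appears anywhere on the page."""
    return bool(set(words(query)) & set(words(page)))

# Invented sample pages, for illustration only.
pages = [
    "The elephant exhibit at the zoo reopens in May.",
    "My cat is fat, but at least it is not an elephant.",
    "Photos of my fat elephant collection.",
]

query = "My Fat Elephant"
strict = [p for p in pages if phrase_match(query, p)]
loose = [p for p in pages if any_word_match(query, p)]
print(len(strict), "phrase match;", len(loose), "any-word matches")
```

On these three pages the strict test accepts one and the loose test accepts all three. An engine using something like the loose test would report far bigger counts for the same index, which is the kind of padding we suspect.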