Thousands of servers... billions of web pages... the possibility of individually sifting through the WWW is nil. The search engine gods cull the information you need from the Internet, from tracking down an elusive expert to contact to presenting the most unconventional views on the planet. Name it and click it. But beyond all the hype about the web heavens they rule, let's attempt to keep the argument balanced. From Google to Voice of the Shuttle (for humanities research), these ubiquitous gods that enrich the net can be unfair... and do have their pitfalls. And considering the rate at which the Internet continues to grow, the problems of these gods are only exacerbated further.

Primarily, what you need to digest is the fact that search engines fall short of Mandrake's magic! They don't create URLs out of thin air; instead they send their spiders crawling across those sites that have rendered prayers (and expensive offerings!) to them for consideration. Even though Google claims to have a massive 3 billion web pages in its database, a large portion of the web nation is invisible to these spiders. And to think they are simply ignorant of this Invisible Web! The invisible web holds content that normal search engines can't index, because the information on many web sites sits in databases that are only searchable within that site. Sites like The Internet Movie Database, IncyWincy (the invisible web search engine) and The Complete Planet, which cover this area, are perhaps the only way you can access content from that portion of the Internet, invisible to the search gods. Here you don't perform a direct content search; you search for the resources that may give access to the content. (Meaning: be sure to set aside considerable time for digging.)

None of the search engines indexes everything on the Web (I mean none). Tried looking for research literature on the popular search engines? Everything from AltaVista to Yahoo will list thousands of sources on education, human resource development, etc., but mostly from magazines, newspapers and various organizations' own Web pages, rather than from research journals and dissertations, the main sources of research literature. That's because most journals and dissertations are not yet publicly available on the Web. Think they'll get you everything that's hosted on the web? Think again.

The Web is huge and growing exponentially. Simple searches using a single word or phrase will often yield thousands of "hits", most of which will be irrelevant. A layman going to the Internet for a piece of information has to deal with a more severe issue: too much information! And if you don't learn how to control the information overload from the websites a search returns, roll out the red carpet for some frustration. A very common problem comes from sites that have a lot of pages with similar content. For example, if a discussion thread in a forum runs to a hundred posts, there will be a hundred pages, all with similar titles, each containing a wee bit of information. Now, instead of just one link, all hundred of those darn pages will crop up in your search results, crowding out other relevant sites. Despite all the sophistication technology has brought in, many well thought-out search phrases produce list after list of irrelevant web pages. The typical search still requires sifting through dirt to find the gold. If you are not specific enough, you may get too many irrelevant hits.

As said, these search engines do not actually search the web directly, but rather the database on their centralized servers. And unless this database is updated continually to reflect modified, moved, deleted or renamed documents, you will land yourself amid broken links and stale copies of web pages. So if they handle dynamic web pages (whose content changes frequently) inadequately, chances are the information they reference will quickly go out of date. After waging their never-ending war with over-zealous promoters (spamdexers, rather), where do they find the time to keep their databases current and their search algorithms tuned? No surprise if a perfectly worthwhile site goes unlisted!

Similarly, many of the Web search engines are undergoing rapid development and are not well documented. You will have only an approximate idea of how they work, and unknown shortcomings may cause them to miss the information you want. Not to mention that, alongside first-class information, the web also houses false, misleading, deceptive and dressed-up information produced by charlatans. The Web itself is unstable, and tomorrow the engines may not find you the site they found you today. Well, if you could predict them, they would not be gods... would they?!

The syntax (word order and punctuation) for various types of complex searches varies somewhat from search engine to search engine, and small errors in the syntax can seriously compromise the search. For instance, try the same phrase search on different search engines and you'll see what I mean. Novices, read this line: using search engines does involve a learning curve. Because of these disadvantages, many beginning Internet users become discouraged and frustrated.

As a journalist put it, "Not showing favoritism to its business clients is certainly a rare virtue in these times." Search engines have increasingly turned to two significant revenue streams:

- Paid placement: in addition to the main editorial-driven search results, the search engines display a second (and sometimes third) listing that's usually commercial in nature. The more you pay, the higher you'll appear in the search results.
- Paid inclusion: an advertiser or content partner pays the search engine to crawl its site and include the results in the main editorial listing. So? You're more likely to be in the hit list, but then again, no guarantees.

Of course, among those refusing to favor certain devotees are industry leaders like Google, which publishes paid listings but clearly marks them as 'Sponsored Links.'

The possibility that these 'for-profit' search gods (which haven't yet made much profit) take fees to skew their searches can't be ruled out. But as a searcher, you'd expect the hit list the engine gives you to be ranked in order of relevance and interest. Search command languages can often be complex and confusing, and the ranking algorithm is unique to each god. It may be based on the number of occurrences of the search phrase in a page, on whether the phrase appears in the page title, a heading, the URL itself or the meta tags, or on a weighted average of several of these relevance scores. Google, for example, uses its patented PageRank™ technology, which ranks the importance of search results by examining the links that lead to a specific site. The more links that lead to a site, the higher the site is ranked (and links from pages that are themselves highly ranked count for more). Pop on popularity!

AltaVista, HotBot, Lycos, Infoseek and MSN Search use keyword indexes, which give fast access to millions of documents across a large number of indexed sites. But the lack of an index structure, and poor accuracy given the sheer size of the WWW, do not make searching any easier, and keyword searching can be difficult to get right. In reality, the prevalence of a certain keyword is not always in proportion to the relevance of a page. Take this example. A search on sari, the national costume of India, in a popular search engine returned the following links among its top sites:

- a page of the Scottish Crop Research Institute
- a health resort in Indonesia
- the South Asia Regional Initiative for Energy Cooperation and Development

Pretty useful sites for someone keen to learn how to drape a sari, or about its traditions?! (Well, no prayer goes unanswered... whether you like the answer or not!) Rather than simply counting the number of instances of a word on a page, search engines are trying to improve their rankings by giving more weight to where the keywords appear: titles, subheadings and so on. Now, unless you have a clear idea of what you're looking for, it may be difficult or impossible to use a keyword search, especially if the vocabulary of the subject is unfamiliar. Similarly, Excite's concept-based search takes on a difficult task and yields inconsistent results. Instead of matching individual words, it groups the words you enter and tries to determine their meaning.

Besides, who reviews or evaluates these sites for quality or authority? They are simply compiled by a computer program. These active search engines rely on computerized retrieval mechanisms called "spiders", "crawlers" or "robots". These visit Web sites on a regular basis and retrieve relevant keywords to index and store in a searchable database. And this huge database often yields unmanageably comprehensive results... results whose relevance is determined by a computer. The irrelevant sites (a high percentage of "noise", as it's called), questionable ranking mechanisms and poor quality control may be the result of too little human involvement in weeding out junk. Think human intervention would solve all the problems? Read on.

From Yahoo, one of the very first, to about.com, Snap.com, Magellan, NetGuide, Go Network, LookSmart, NBCi and Starting Point, all subject directories index and review documents under categories, making them more manageable. Unlike active search engines, these passive or human-selected search engines don't roam the web directly; they are human-controlled and rely on individual submissions. They are perhaps the easiest to use in town, but the directories cover only a small portion of the actual number of WWW sites. So they are certainly not your best bet if you're after specific, narrow or complex topics. Subject designations may be arbitrary, confusing or wrong. A search looks for matches only in the submitted descriptions; a directory never contains the full text of the pages it links to, so you can only search what you see: titles, descriptions, subject categories, etc. The labor-intensive human process limits the database's currency, size, rate of growth and timeliness. You may have to branch through the categories repeatedly before arriving at the right page, and directories may be several months behind the times because of the need for human organization. Try looking for some obscure topic... chances are the people who maintain the directory have excluded those pages. Obviously, machines can blindly count keywords, but they can't make common-sense judgements as humans can. But then why do human-edited directories respond with all this junk?!

And now about those meta search engines. A comprehensive search of the entire WWW using The Big Hub, Dogpile, Highway61, Internet Sleuth or SavvySearch, covering as many documents as possible, may sound as good an idea as one-stop shopping. Meta search engines do not create their own databases. They rely on existing active and passive search engine indexes to retrieve search results. And the very fact that they access multiple keyword indexes increases their response time. Searching several search engines at once sure does save you time, but at the expense of redundant, unwanted and overwhelming results and, worse still, important misses. The default search mode differs from search site to search site, so the same search is not always appropriate for different search engine software. The quality and size of the databases vary widely.

Weighted search engines like Ask Jeeves and RagingSearch allow the user to type queries in plain English without advanced searching knowledge, again at the expense of inaccurate and less detailed searching. Then there are review or ranking sources like Argus Clearinghouse, eblast.com and the Librarian's Index to the Internet (lii.org). They evaluate the quality of websites they find themselves or accept as submissions, but cover only a minimal number of sites.

As a webmaster, registering your site with these biggest billboards in Times Square can bring you closer to a "bingo!" for the searcher. Those who didn't even know you existed before are in your living room in a New York minute!

Registering your URL is a no-brainer, considering the traffic it can send flocking to your site. It is certainly a quick and inexpensive method, yet it is only one component of an overall marketing strategy. In itself it offers no guarantees and no instant results, and it demands continued effort from the webmaster. Commerce rules the web. As a notable Internet caveman put it, "Web publishers also find dealing with search engines to be a frustrating pursuit. Everybody wants their pages to be easy for the world to find, but getting your site listed can be tough. Search sites may take a long time to list your site, may never list it at all, and may drop it after a few months for no reason. If you resubmit often, as it is very tempting to do, you may even be branded a spamdexer and barred from a search site. And as for trying to get a good ranking, forget it! You have to keep up with all the arcane and ever-changing rules of a dozen different search engines, and adjust the keywords on your pages just so... all the while fighting against the very plausible theory that in fact none of this stuff matters, and the search sites assign rankings at random or by whim."

"To make the best use of Web search engines (to find what you need and avoid an avalanche of irrelevant hits), pick search engines that are well suited to your needs." And lest you want to cry "Ye immortal gods! Where in the world are we?", spend a few hours becoming moderately proficient with each. Each works somewhat differently, most importantly in how you broaden or narrow a search.

Finding the appropriate search engine for your particular information need can be frustrating. To use these search engines effectively, it is important to understand what they are, how they work and how they differ. For example, when using a meta search engine, remember that each engine has its own methods of displaying and ranking results. Remember, too, that search strategies affect the results: if the user is unaware of basic search strategies, results may be spotty.

Quoting Charlie Morris (the former editor of The Web Developer's Journal): "Search engines and directories survive, and indeed flourish, because they're all we've got. If you want to use the wealth of information that is the Web, you've got to be able to find what you want, and search engines and directories are the only way to do that. Getting good search results is a matter of chance. Depending on what you're searching for, you may get a meaty list of good resources, or you may get page after page of irrelevant drivel. By laboriously refining your search, and using several different search engines and directories (and especially by using appropriate specialty directories), you can usually find what you need in the end."

Search engines are very useful, no doubt. From getting a quick view of a topic to finding expert contact info, these tasks verily lie in their lap. The very reason we bother about these search engines so much is that they're all we've got! Though there is surely a lot of room for improvement, the need of the hour is not to get caught in the middle of the road. By simply understanding what, how and where to seek, you'd spare yourself the fate of chanting that old Jewish proverb: "If God lived on earth, people would break his windows."

Happy searching!

Liji is a postgraduate in Software Science with a flair for writing on anything under the sun. She puts her dexterity to work writing technical articles in her areas of interest, which include Internet programming, web design and development, e-commerce and other related issues.
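The passage on active search engines says their spiders visit sites, pull out keywords and store them in a searchable database. That database, not the live web, is what actually gets searched. Below is a minimal sketch of the idea in Python. It assumes the pages have already been fetched: the `pages` snapshot and its example.* URLs are hypothetical, and real engines do far more. The core structure is an inverted index that maps each keyword to the pages containing it:

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each keyword to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every query keyword (a simple AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical snapshot of what a spider might have fetched earlier.
pages = {
    "http://example.com/drape": "How to drape a sari",
    "http://example.org/resort": "Sari health resort",
    "http://example.net/energy": "Regional energy cooperation",
}
index = build_index(pages)
print(sorted(search(index, "sari")))
```

`search` only ever consults the snapshot. Any page edited, moved or deleted after the crawl is answered from stale data, which is the broken-link problem the article describes. A plain keyword match also happily returns the health resort for "sari".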
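The ranking discussion describes scoring a page by how often the search phrase occurs and whether it appears in the title, a heading, the URL or the meta tags. These signals may then be combined as a weighted average. The sketch below is a hypothetical version of such a weighted score. The field weights and the cap on repeated words are invented for illustration and do not describe any particular engine's algorithm. Matching is a crude substring test:

```python
from dataclasses import dataclass, field

# Hypothetical weights: *where* a keyword appears matters more than how often.
WEIGHTS = {"title": 5.0, "heading": 3.0, "url": 2.0, "meta": 1.0, "body": 0.5}


@dataclass
class Page:
    url: str
    title: str = ""
    headings: list[str] = field(default_factory=list)
    meta: str = ""
    body: str = ""


def field_scores(page: Page, term: str) -> dict[str, float]:
    """Per-field relevance signals (each between 0 and 1) for one search term."""
    term = term.lower()
    return {
        "title": 1.0 if term in page.title.lower() else 0.0,
        "heading": 1.0 if any(term in h.lower() for h in page.headings) else 0.0,
        "url": 1.0 if term in page.url.lower() else 0.0,
        "meta": 1.0 if term in page.meta.lower() else 0.0,
        # Cap the occurrence count so repeating a word stops paying off.
        "body": min(page.body.lower().count(term), 5) / 5,
    }


def relevance(page: Page, term: str) -> float:
    """Weighted average of the per-field scores, in the range 0..1."""
    scores = field_scores(page, term)
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * score for name, score in scores.items()) / total_weight


def rank(pages: list[Page], term: str) -> list[Page]:
    """Order pages from most to least relevant for the term."""
    return sorted(pages, key=lambda p: relevance(p, term), reverse=True)


# Hypothetical pages: one stuffs the keyword, one actually uses it.
stuffed = Page(url="http://example.com/deals", title="Cheap deals", body="sari " * 50)
guide = Page(url="http://example.org/sari-guide", title="How to drape a sari",
             headings=["Pleating the sari"], body="The sari is draped.")
for page in rank([stuffed, guide], "sari"):
    print(page.url, round(relevance(page, "sari"), 3))
```

Capping the body count is one simple way to handle the spamdexer's trick. Without the cap, a page that repeats a keyword fifty times could outrank a page that actually uses it in its title and headings.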
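The PageRank idea mentioned for Google can be illustrated with the textbook power-iteration formulation that Brin and Page published. Each page shares its score among the pages it links to. A damping factor (commonly 0.85) models a surfer who occasionally jumps to a random page. This is a small sketch of that published formula, not Google's production ranking, which combines many other signals. The four-page link graph is made up:

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a small link graph.

    `links` maps each page to the pages it links to. Pages with no
    outgoing links spread their rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its rank uniformly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web: a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page "c", with three inbound links, ends up ranked highest. Pages "b" and "d", which nobody links to, keep only the baseline random-jump share. The scores always sum to 1.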
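The meta search engines described above forward one query to several underlying engines and merge what comes back. The hypothetical sketch below assumes each engine is just a Python function returning a ranked list of URLs; `engine_a` and `engine_b` are stand-ins, not real services. It queries them concurrently and merges the results with reciprocal rank fusion. That fusion technique is a common one, chosen here for illustration; it is not what Dogpile or SavvySearch actually did:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Engine = Callable[[str], list[str]]


def normalize(url: str) -> str:
    """Crudely treat URLs differing only in case or a trailing slash as one page.

    (Real URL paths can be case-sensitive; this is a simplification.)
    """
    return url.lower().rstrip("/")


def meta_search(query: str, engines: list[Engine], k: int = 60) -> list[str]:
    """Query every engine concurrently and fuse their ranked lists.

    Reciprocal rank fusion: a URL earns 1 / (k + rank) from each engine
    that returns it, so pages found by several engines float to the top.
    """
    with ThreadPoolExecutor(max_workers=max(len(engines), 1)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    scores: dict[str, float] = {}
    for results in result_lists:
        seen: set[str] = set()
        for rank, url in enumerate(results, start=1):
            key = normalize(url)
            if key in seen:  # ignore an engine's own duplicate hits
                continue
            seen.add(key)
            scores[key] = scores.get(key, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda u: scores[u], reverse=True)


# Hypothetical engines returning canned results.
def engine_a(query: str) -> list[str]:
    return ["http://example.com/sari/", "http://example.org/resort"]


def engine_b(query: str) -> list[str]:
    return ["http://Example.com/sari", "http://example.net/energy"]


merged = meta_search("sari", [engine_a, engine_b])
print(merged)
```

The merge step collapses the hit that both engines returned into one entry. It can only rank what the engines return, though, so anything all of them miss stays missed. Waiting on every engine is also why a meta search can be slower than any single engine.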