Search Engine Concepts:

It’s the only place to look, everyone tells you, when you hunt for information on any common or obscure issue that might come up in your daily doings, or even when you want to settle a squabble.

The trouble is finding just where that information sits in the sprawling, massive database that is the World Wide Web. It is supposed to have somewhere around a billion pages of data available to the general public, and more is being added each day, what with all the universities and dot-coms (even if most of them have gone bust) working overtime. If we did not have the right tools to dig through this pile of information, it would be of no use. Fortunately, such tools exist, and many of us use them. They are called search engines and web directories. They essentially help you find the web site on which the information you want is most likely to be.

To make the best of the money you have invested in the PC and that you spend on the Internet connection, make an effort to learn to use search engines efficiently and save time. As search engines are nothing but software running on machines placed “out there,” you have to realize that they do not have brains to understand what you are looking for. You have to understand how search engines “think” and then present your requirement according to their limited, though magnificent, capabilities. You have to explore smartly.

HOW DO SEARCH ENGINES WORK?

A search engine is, in essence, a database of the information available on the web, kept in a certain order so that it can readily be located and accessed when an ordinary user asks for it through a simple interface on a web site.

DATABASE

All the data of the Internet in one place. Yes! That’s almost what it is, or at least that’s what every search engine attempts to have. Google, one of the most popular search engines, is said to have 8,058,044,651 web pages in its database. Software robots called spiders that are sent out to crawl the web perform this Herculean task.

These spiders start visiting sites by getting links from server lists (DNS entries or websites submitted to be indexed) and from lists of the most popular or best sites. They then follow the links on these pages to find more links to add to the database. While some databases want the spider to send back only the title and URL (address) of each page visited, or just some HTML tags, nowadays most want the entire text of each page along with information on where it was found. One more source of data is the submission of new websites to the search engines by their authors.
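The link-following behaviour described above can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine builds its database: the three-page TOY_WEB, the fetch argument, and the names crawl and LinkExtractor are all made up for the example, and the pages are held in memory rather than fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl starting from seed_urls.

    fetch(url) must return the page's HTML, or None if it cannot be reached.
    Returns a dict mapping each visited URL to the full text that was fetched.
    """
    queue = deque(seed_urls)
    seen = set(seed_urls)
    database = {}
    while queue and len(database) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        database[url] = html  # keep the entire page, as most engines now do
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # never queue the same address twice
                seen.add(link)
                queue.append(link)
    return database


# A hypothetical three-page "web" (assumption: stands in for real HTTP fetches).
TOY_WEB = {
    "a.example": '<title>A</title><a href="b.example">b</a>',
    "b.example": '<title>B</title><a href="a.example">a</a><a href="c.example">c</a>',
    "c.example": "<title>C</title>No links here.",
}

pages = crawl(["a.example"], TOY_WEB.get)
print(sorted(pages))  # ['a.example', 'b.example', 'c.example']
```

Starting from a single seed page, the spider reaches the other two pages purely by following links, which is why one well-linked starting list can be enough to discover a large part of the web.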

The information, once acquired, then has to be stored on the servers of the company providing the search engine in an ordered way that makes it useful. The data is indexed so that the engine knows which bit of data was found where. Every word found on a page is given a weight according to where on the page it was found (page title, heading, sub-head, and so on) and how many times it occurs. Using these statistics and other algorithms, search engines try to establish the context and relevance of each word in the database.
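As a rough sketch of this kind of weighted index, the Python below gives every occurrence of a word a weight that depends on the field it was found in. The weight values, the field names, and the two sample pages are assumptions chosen for illustration; real engines use their own, unpublished schemes.

```python
import re
from collections import defaultdict

# Hypothetical weights: a word counts for more in the title than in the body.
FIELD_WEIGHTS = {"title": 5.0, "heading": 3.0, "body": 1.0}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(pages):
    """Return {word: {url: weight}} for pages given as {url: {field: text}}."""
    index = defaultdict(lambda: defaultdict(float))
    for url, fields in pages.items():
        for field, text in fields.items():
            weight = FIELD_WEIGHTS.get(field, 1.0)
            for word in tokenize(text):
                # Each occurrence adds the weight of the field it was found in,
                # so both position and repetition raise the total.
                index[word][url] += weight
    return index


pages = {
    "garden.example": {
        "title": "Tree care",
        "heading": "Pruning a tree",
        "body": "A healthy tree needs water and light.",
    },
    "family.example": {
        "title": "Genealogy basics",
        "body": "Draw your family tree on paper first.",
    },
}

index = build_index(pages)
print(dict(index["tree"]))  # {'garden.example': 9.0, 'family.example': 1.0}
```

The gardening page scores 9.0 for “tree” (5 for the title, 3 for the heading, 1 for the body) against 1.0 for the genealogy page, which mentions the word only once in its body text.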

Meta-tags: These are unseen words put in the code of a web page by its author to tell the search engines which concepts the page should be indexed under. This can be particularly helpful when certain words are likely to have more than one meaning. However, some web authors, in order to get more hits, also stuff their meta-tags with irrelevant words. This undermines the system. As a result, the latest trend is for the web-crawling spiders to ignore most meta-tags, except description and title, especially if there are too many of them.
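To make the idea concrete, here is a small Python sketch of a spider that reads a page’s title and description but deliberately skips every other meta-tag. The HONORED_META set, the HeadReader class, and the sample page are assumptions for the example, not the rules of any actual engine.

```python
from html.parser import HTMLParser

# Assumption for this sketch: only the description meta-tag is trusted.
HONORED_META = {"description"}


class HeadReader(HTMLParser):
    """Reads the <title> text and the honored <meta> tags of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in HONORED_META:  # every other meta-tag is ignored
                self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


PAGE = """<html><head>
<title>Joint pain explained</title>
<meta name="description" content="Causes of joint pain.">
<meta name="keywords" content="cheap flights, lottery, free music">
</head><body>...</body></html>"""

reader = HeadReader()
reader.feed(PAGE)
print(reader.title)  # Joint pain explained
print(reader.meta)   # {'description': 'Causes of joint pain.'}
```

The stuffed keywords tag never reaches the index, so the irrelevant words in it cannot attract extra hits.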

To search the database, engines have basically two methods: keyword search and concept search.

Keyword search is the most commonly employed method. Search engines spot and index words that they consider significant. Words mentioned towards the top of a document, and those that are repeated within it, are taken to be more important. While most search engines index every word, others index only part of the document, such as the title, headings, sub-headings, hyperlinks to other sites, or the first few lines of text. Some search engines discriminate between upper and lower case while others don’t. One problem with keyword searching is that it may not be possible to distinguish between two words that are spelled the same way but mean different things. A search on “tree” may lead to ‘family trees’ as well as the horticultural variety.

Concept-based search systems try to understand what you mean. A concept-based system looks for the subject you are exploring, even if the keyword you give does not precisely match the words in the document. Such a system usually examines words in relation to the other words found nearby and calculates the frequency with which certain words appear. When several words or phrases that are tagged to signal a particular concept appear close to each other, the search engine concludes that the document is “about” a certain subject.

For example, the word joint, when used in a medical context, is likely to be accompanied by words such as bone, fracture, or arthritis. If it appears in a document with words like pipes or houses, the search engine gives results on the subject of plumbing. This system is, however, far from perfect. The results are good only when you enter a lot of relevant words. Thus, most search engines use the keyword method.
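The joint example can be turned into a toy Python sketch. It counts how many concept-signalling words appear within a few words of the keyword and picks the concept with the most votes. The CONCEPT_WORDS table, the window size of five words, and the function name guess_concept are all assumptions; a real concept-based engine would rely on far larger vocabularies and statistics.

```python
import re
from collections import Counter

# Hypothetical signal words, taken from the example in the text.
CONCEPT_WORDS = {
    "medicine": {"bone", "fracture", "arthritis"},
    "plumbing": {"pipe", "pipes", "house", "houses"},
}


def guess_concept(text, keyword, window=5):
    """Guess which concept `keyword` belongs to from the words around it.

    Returns a concept name, or None when no signal word is found nearby.
    """
    words = re.findall(r"[a-z]+", text.lower())
    votes = Counter()
    for i, word in enumerate(words):
        if word != keyword:
            continue
        # Look at up to `window` words on each side of this occurrence.
        nearby = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        for concept, signals in CONCEPT_WORDS.items():
            votes[concept] += sum(1 for w in nearby if w in signals)
    if not votes or max(votes.values()) == 0:
        return None
    return votes.most_common(1)[0][0]


medical = "Arthritis can make every joint ache, and a fracture near the bone is worse."
plumbing = "Seal the joint where the pipes enter the house."

print(guess_concept(medical, "joint"))   # medicine
print(guess_concept(plumbing, "joint"))  # plumbing
print(guess_concept("The joint was crowded.", "joint"))  # None
```

The last call shows the weakness mentioned above: with too few relevant words around the keyword, the system has nothing to go on.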

Ranking: Even after search engines have narrowed down the relevant results with this logic, the results they deliver for a particular query may still be so numerous that you are not much better off than before the search. To fix this problem, search engines use some more logic to present the results in a sequence such that, according to them, the most relevant result is at the top, followed by other results of diminishing relevance.

Note the expression “according to them.” The results are what the engine believes you wanted, and that belief may not be correct. While you may be talking of heart as in romance, the results may be on heart as in heart attacks! This is where your smartness comes in. You have to understand the way the search engines’ logic works. Various parameters are used to guess the relevance. Most search engines assume that if the term you searched for appears more frequently in a particular document, then that document is more relevant, and they show it near the top of the search results list.

The position of the keyword in a particular document is also used to determine relevance. If the keywords appear early in the document, or in the headers, the document is considered more relevant. So hits may be ranked according to how many times the keyword appears in the index entries for the document and in which fields it appears (i.e., in headers, titles, or text). Search engines may use some or all of these ways to assess the relevance of a particular page to your query. Certain search engines even allow you to assign relevance weights to your query terms before conducting a search. Although this requires practice, it allows you to have a say in what results you get.
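A minimal Python sketch of such a ranking might combine the three signals mentioned here: how often a term appears, whether it appears in the title, and how early it first appears in the text, each scaled by a weight the user assigns to the term. The scoring formula, the title bonus of 2.0, and the two sample documents are assumptions made for illustration only.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def score(document, query_weights):
    """Score one document, given as {"title": ..., "text": ...}."""
    title_words = tokenize(document["title"])
    body_words = tokenize(document["text"])
    total = 0.0
    for term, weight in query_weights.items():
        frequency = body_words.count(term)
        in_title = 2.0 if term in title_words else 0.0  # hypothetical bonus
        if term in body_words:
            # An earlier first occurrence earns a bonus closer to 1.
            early = 1.0 - body_words.index(term) / len(body_words)
        else:
            early = 0.0
        total += weight * (frequency + in_title + early)
    return total


def rank(documents, query_weights):
    """Return the document URLs ordered from most to least relevant."""
    return sorted(documents,
                  key=lambda url: score(documents[url], query_weights),
                  reverse=True)


documents = {
    "cardio.example": {
        "title": "Heart attack warning signs",
        "text": "A heart attack happens when blood flow to the heart is blocked.",
    },
    "poems.example": {
        "title": "Love poems",
        "text": "Romance fills the heart in these poems about love and romance.",
    },
}

print(rank(documents, {"heart": 1.0}))
# ['cardio.example', 'poems.example']
print(rank(documents, {"heart": 1.0, "romance": 3.0}))
# ['poems.example', 'cardio.example']
```

A bare search for heart puts the heart-attack page first, while adding romance with a heavier weight moves the poetry page to the top, which is the kind of say in the results that weighted query terms give you.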