ABOUT SEARCH ENGINES
HOW DO SEARCH ENGINES WORK ON THE INTERNET?
Re: ABOUT SEARCH ENGINES
Please read Forum Guide and Rules before you post.
Use $\LaTeX$; it makes our work a lot easier!
Nur Muhammad Shafiullah | Mahi
Re: ABOUT SEARCH ENGINES
It has been estimated that the amount of textual information accessible via search engines is at least 40 times larger than the digitized content of all the books in the Library of Congress, the world's largest library. It is a challenge to provide access to such a large volume of information, yet current search engines do remarkably well at sifting through the content and identifying the links relevant to a query.
There is a multitude of information providers on the web. These include the commonly known and publicly available sources such as Google, InfoSeek, NorthernLight and AltaVista, to name a few. A second group of sources, sometimes referred to as the "hidden web," is much larger than the public web in terms of the amount of information it provides. This latter group includes sources such as Lexis-Nexis, Dialog, Ingenta and LoC. They remain hidden for various reasons: they may not allow other information providers access to their content; they may require subscription; or they may demand payment for access. This article is concerned with the former group, the publicly available web search services, collectively referred to here as search engines.
Search engines employ various techniques to speed up searches. Some of the common techniques are briefly described below.
Preprocessed Data
One way search engines save time is by preprocessing the content of the web. That is, when a user issues a query, it is not sent to millions of web sites. Instead, the matching takes place against preprocessed data stored in one site. The preprocessing is carried out with the aid of a software program called a crawler. The crawler is sent out periodically by the database maintainers to collect web pages. A specialized computer program parses the retrieved pages to extract words. These words are then stored along with the links to the corresponding pages in an index file. Users' queries are matched against this index file, not against other web sites.
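To make that preprocessing step concrete, here is a minimal Python sketch: crawled page text (keyed by URL) is parsed into words, the words are stored in an inverted index, and a query is matched against that index rather than against live web sites. The URLs, page texts and function names are invented for illustration only.
[code]
import re
from collections import defaultdict

def parse_words(text):
    """Extract lowercase words from a page's text (very simplified)."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(pages):
    """pages: dict mapping URL -> page text. Returns word -> set of URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in parse_words(text):
            index[word].add(url)
    return index

def search(index, query):
    """Match a query against the precomputed index, not against live sites."""
    words = parse_words(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # keep pages containing every query word
    return results

# Two made-up pages standing in for crawled content:
pages = {
    "http://example.org/a": "Shop for a window blind to block the sunlight.",
    "http://example.org/b": "Blind ambition drives some people.",
}
index = build_index(pages)
print(search(index, "blind"))         # both pages
print(search(index, "window blind"))  # only the first page
[/code]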
Smart Representation
In this technique, the representation for the index is carefully selected with an eye toward minimizing search time. Information scientists have produced an efficient data structure called a tree that can guarantee significantly shorter overall search time compared with searches conducted against a sequential list (see sidebar). To accommodate searches conducted by many users simultaneously and eliminate "wait queues," the index is usually duplicated on multiple computers in the search site.
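As a rough illustration of why a tree-shaped index beats a sequential list, the sketch below compares a linear scan with a binary search over a sorted term list, which needs about the same logarithmic number of comparisons a balanced search tree guarantees. The term list is invented; production engines use more elaborate structures (B-trees, tries and the like).
[code]
import bisect

def sequential_lookup(terms, target):
    """Linear scan: up to len(terms) comparisons."""
    for i, term in enumerate(terms):
        if term == target:
            return i
    return -1

def tree_like_lookup(sorted_terms, target):
    """Binary search over a sorted list: roughly log2(len(terms)) comparisons,
    the same bound a balanced search tree provides."""
    i = bisect.bisect_left(sorted_terms, target)
    if i < len(sorted_terms) and sorted_terms[i] == target:
        return i
    return -1

terms = sorted(["ambition", "blind", "query", "window", "zebra"])
print(sequential_lookup(terms, "window"))  # walks most of the list
print(tree_like_lookup(terms, "window"))   # only a few comparisons
[/code]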
Prioritizing Results
The URLs or links produced as a result of searches are usually numerous. But due to ambiguities of language (for instance, "window blind" versus "blind ambition"), the resulting links would generally not be equally relevant to a user's query. To provide quicker access to the most relevant records (and to place them at or near the top), the search algorithm applies various ranking strategies. A common ranking method known as term frequency-inverse document frequency (TF-IDF) considers the distribution of words and their frequencies and generates numerical weights for words signifying their importance in individual documents. It produces word weights whereby words that are highly frequent (such as 'or,' 'to' or 'with') and that appear in many documents have substantially less weight than words that are semantically more relevant and appear in relatively few documents.
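The sketch below computes the standard textbook TF-IDF weight over three invented one-line documents, just to show the effect described above: a word that appears in every document ("the") receives a weight of zero, while a rarer, more topical word ("blind") receives a higher weight. Real engines use tuned variants of these formulas.
[code]
import math
from collections import Counter

documents = [
    "the window blind blocks the sun",
    "ambition can ruin the career",
    "open the window to let in air",
]
tokenized = [doc.split() for doc in documents]
N = len(tokenized)

# Document frequency: in how many documents does each word occur?
df = Counter()
for words in tokenized:
    for word in set(words):
        df[word] += 1

def tfidf(word, words):
    """TF-IDF weight of `word` within one document (given as a list of words)."""
    tf = words.count(word) / len(words)   # term frequency
    idf = math.log(N / df[word])          # inverse document frequency
    return tf * idf

print(round(tfidf("the", tokenized[0]), 3))    # appears in every document -> 0.0
print(round(tfidf("blind", tokenized[0]), 3))  # appears in one document -> higher weight
[/code]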
In addition to term weighting, web pages can be weighted using other strategies. For example, link analysis considers the nature of each page in terms of its association with other pages, namely whether it is an authority (the number of other pages that point to it) or a hub (the number of pages it points to). The highly successful Google search engine uses link analysis to improve the ranking of its search results.
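The small sketch below counts in-links (a crude authority score) and out-links (a crude hub score) over an invented four-page link graph. Google's actual method, PageRank, goes further and weights each link by the importance of the page it comes from, which a simple count does not capture.
[code]
# An invented link graph: each page maps to the pages it links to.
links = {
    "page_a": ["page_b", "page_c"],
    "page_b": ["page_c"],
    "page_c": [],
    "page_d": ["page_a", "page_c"],
}

def authority_score(page):
    """How many other pages point to this page (in-links)."""
    return sum(page in targets for targets in links.values())

def hub_score(page):
    """How many pages this page points to (out-links)."""
    return len(links[page])

for page in links:
    print(page, "authority:", authority_score(page), "hub:", hub_score(page))
# page_c is the strongest authority here: three other pages link to it.
[/code]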
Re: About Search Engines
To learn this, you have to join Google! lol
A man is not finished when he's defeated, he's finished when he quits.
Re: ABOUT SEARCH ENGINES
The good news about the Internet and its most visible component, the World Wide Web, is that there are hundreds of millions of pages available, waiting to present information on an amazing variety of topics.
- emeryhen121
Re: ABOUT SEARCH ENGINES
Search engines are programs that retrieve items from a database in response to a query. The information is selected according to the keywords or values supplied by the user, and it is drawn from a database built from the pages published on the World Wide Web.
Search engines build that database by crawling hundreds of billions of pages with their web crawlers, which are automated programs. The URLs of known pages are kept in an extensive list from which the crawlers work. A search engine explores the web by downloading pages and following the links on those pages to discover new pages as they become available.
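A minimal sketch of that crawl loop, using an invented in-memory "web" (a dictionary of pages and the links they contain) instead of real HTTP requests: the crawler keeps a frontier list of URLs to visit and adds every newly discovered link back to that list.
[code]
from collections import deque

web = {  # hypothetical pages and the links each one contains
    "http://example.org/": ["http://example.org/a", "http://example.org/b"],
    "http://example.org/a": ["http://example.org/b"],
    "http://example.org/b": ["http://example.org/c"],
    "http://example.org/c": [],
}

def crawl(seed_url):
    frontier = deque([seed_url])   # URLs waiting to be fetched
    seen = {seed_url}
    while frontier:
        url = frontier.popleft()
        links = web.get(url, [])   # a real crawler would fetch and parse the page here
        for link in links:
            if link not in seen:   # only enqueue pages not discovered yet
                seen.add(link)
                frontier.append(link)
    return seen

print(crawl("http://example.org/"))  # all four pages are discovered
[/code]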