
Search Engine

Search engines are information retrieval systems designed to help locate information stored on computer systems. Search results are usually presented as a list and are commonly called hits. Search engines help minimize both the time required to find information and the amount of information that must be consulted, akin to other techniques for managing information overload. The most common and visible form of search engine is the web search engine, which searches for information on the World Wide Web.

How search engines work

A search engine provides an interface to a group of items that enables users to specify criteria about an item of interest and have the engine find matching items. The criteria are referred to as a search query. In the case of a text search engine, the search query is typically expressed as a set of words that identify the desired concept one or more documents may contain. There are several styles of search query syntax that vary in strictness. Whereas some text search engines require users to enter two or three words separated by whitespace, other search engines allow users to specify entire documents, pictures, sounds, and various forms of natural language. Some search engines apply improvements to search queries to increase the likelihood of providing a quality set of items, through a process known as query expansion. Query understanding methods can be used to standardize the query language.
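
As a concrete illustration, here is a minimal Python sketch of parsing a whitespace-separated text query and expanding it. The synonym table and its entries are hypothetical stand-ins for the expansion sources a real engine would use (query logs, curated thesauri, and so on).

    # Hypothetical synonym table; real engines derive expansions from
    # sources such as query logs or curated thesauri.
    SYNONYMS = {
        "car": ["automobile", "vehicle"],
        "picture": ["image", "photo"],
    }

    def parse_query(raw: str) -> list[str]:
        """Split a whitespace-separated text query into lowercase terms."""
        return raw.lower().split()

    def expand_query(terms: list[str]) -> list[str]:
        """Append known synonyms so more relevant items can match."""
        expanded = list(terms)
        for term in terms:
            expanded.extend(SYNONYMS.get(term, []))
        return expanded

    print(expand_query(parse_query("car picture")))
    # ['car', 'picture', 'automobile', 'vehicle', 'image', 'photo']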

The list of items that meet the criteria specified by the query is typically sorted, or ranked. Ranking items by relevance, from highest to lowest, reduces the time required to find the desired information. To quickly provide a set of matching items sorted according to some criteria, a search engine typically collects metadata about the group of items under consideration beforehand, through a process called indexing. Because an index usually requires less computer storage, some search engines store only the indexed information rather than the full content of each item, and instead provide a method of navigating to the items from the search engine results page. Alternatively, the search engine may store a copy of each item in a cache, so that users can see the state of the item at the time it was indexed, for archival purposes, or to make repetitive processes run more efficiently and quickly.
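
The following sketch shows the idea under simplifying assumptions: a toy inverted index built ahead of time, and a ranking that simply counts how many query terms each document matches (real engines use far more elaborate relevance signals). The document texts are illustrative.

    from collections import defaultdict

    docs = {
        "doc1": "search engines rank results by relevance",
        "doc2": "crawlers feed search engines",
        "doc3": "relevance ranking of results",
    }

    # Indexing: map each term to the set of documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)

    def search(query: str) -> list[str]:
        """Rank documents by how many query terms they contain."""
        scores = defaultdict(int)
        for term in query.lower().split():
            for doc_id in index.get(term, ()):
                scores[doc_id] += 1
        return sorted(scores, key=scores.get, reverse=True)

    print(search("relevance of search results"))  # doc2 ranks last: one matched term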

Other types of search engines do not store an index. Crawler, or spider-type, search engines (also known as real-time search engines) collect and assess items at the time of the search query, dynamically considering additional items based on the contents of a starting item (known as a seed, or a seed URL in the case of an Internet crawler). Metasearch engines store neither an index nor a cache; instead, they simply reuse the index or results of one or more other search engines to provide an aggregated, final set of results.
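
A metasearch engine can be sketched as little more than a merge step over other engines' results. In the hypothetical example below, the two backend functions stand in for calls to real search engine APIs, and the scoring (rewarding high ranks and agreement between backends) is an arbitrary illustrative choice.

    def engine_a(query: str) -> list[str]:
        return ["example.org/a", "example.org/b"]  # stand-in for a real API

    def engine_b(query: str) -> list[str]:
        return ["example.org/b", "example.org/c"]  # stand-in for a real API

    def metasearch(query: str) -> list[str]:
        """Merge ranked lists; results found by several engines score higher."""
        merged: dict[str, int] = {}
        for backend in (engine_a, engine_b):
            for rank, url in enumerate(backend(query)):
                merged[url] = merged.get(url, 0) + (10 - rank)
        return sorted(merged, key=merged.get, reverse=True)

    print(metasearch("search engines"))
    # ['example.org/b', 'example.org/a', 'example.org/c']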

Types of search engines

By source

  • Desktop search
  • Federated search
  • Human search engine
  • Metasearch engine
  • Multisearch
  • Search aggregator
  • Web search engine

By content type

  • Full-text search
  • Image search
  • Video search engine

By interface

  • Incremental search
  • Instant answer
  • Semantic search
  • Selection-based search

By topic

  • Bibliographic database
  • Enterprise search
  • Medical literature retrieval
  • Vertical search

Approach

A search engine maintains the following processes in near real-time (a minimal sketch of the whole pipeline follows the list):

  1. Web crawling
  2. Indexing
  3. Searching
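
This is a deliberately skeletal sketch of those three processes run as one repeating loop, with searching reduced to a direct index lookup. The page contents are hard-coded stand-ins for real HTTP fetches, and a production engine runs the stages continuously and concurrently rather than in a short loop.

    import time

    def crawl_step(frontier: list[str], pages: dict[str, str]) -> None:
        """Fetch one URL from the frontier (stand-in for an HTTP fetch)."""
        if frontier:
            url = frontier.pop(0)
            pages[url] = f"placeholder contents of {url}"

    def index_step(pages: dict[str, str], index: dict[str, set]) -> None:
        """Associate each term with the pages containing it (rebuilt each pass)."""
        for url, text in pages.items():
            for term in text.lower().split():
                index.setdefault(term, set()).add(url)

    frontier = ["https://example.org", "https://example.net"]
    pages: dict[str, str] = {}
    index: dict[str, set] = {}
    for _ in range(3):  # a real engine loops indefinitely
        crawl_step(frontier, pages)
        index_step(pages, index)
        # searching: queries are answered against the index as it stands
        time.sleep(0.1)

    print(sorted(index["contents"]))  # both fetched URLs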

A web search engine acquires its information by crawling the web from site to site. The “spider” checks for the standard filename robots.txt, addressed to it, before sending certain information back to be indexed, depending on many factors such as the titles, page content, JavaScript, Cascading Style Sheets (CSS), headings, or metadata in HTML meta tags. Because of the virtually endless number of websites, spider traps, spam, and other exigencies of the real web, crawlers apply a crawl policy to determine when the crawling of a site should be deemed sufficient: some sites are crawled exhaustively, while others are crawled only partially.
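
The robots.txt check itself can be done with Python's standard urllib.robotparser module, as in the sketch below. The URLs and the user-agent name are illustrative, and a real crawler would additionally honor crawl delays and its own crawl policy; note that read() performs an actual HTTP fetch.

    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.org/robots.txt")
    rp.read()  # fetches and parses the site's robots.txt

    # "MySpider" is a hypothetical crawler user-agent name.
    if rp.can_fetch("MySpider", "https://example.org/some/page.html"):
        print("allowed: fetch the page and send its content to the indexer")
    else:
        print("disallowed: skip this page")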

Indexing involves associating words and other definable tokens found on web pages with their domain names and HTML fields. The associations are stored in a public database and made available for web search queries. A query from a user can be a single word. The index helps find information relating to the query as quickly as possible. Some of the techniques for indexing and caching are trade secrets, whereas web crawling is a straightforward process of visiting all sites on a systematic basis.
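
Here is a sketch of that association, under the assumption that pages have already been reduced to labeled fields: each term is indexed together with the field it came from, so a ranker could later weight a title match above a body match. The pages and their contents are illustrative.

    from collections import defaultdict

    pages = {
        "example.org/page1": {"title": "Search engines", "body": "How spiders crawl the web"},
        "example.org/page2": {"title": "Web crawlers", "body": "Search engines use spiders"},
    }

    # index[term] -> set of (url, field) pairs
    index = defaultdict(set)
    for url, fields in pages.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[term].add((url, field))

    print(sorted(index["search"]))
    # [('example.org/page1', 'title'), ('example.org/page2', 'body')]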

Between visits by the spider, the cached version of a page (some or all of the content needed to render it) stored in the search engine's working memory is quickly sent to the requester. If a visit is overdue, the search engine can simply act as a web proxy instead; in this case, the page may differ from the indexed search terms. The cached page holds the appearance of the version whose words were indexed, so a cached version of a page can be useful to the website when the actual page has been lost, but this problem is also considered a mild form of link rot.
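
The cache-then-proxy behavior reduces to a lookup with a live-fetch fallback, as in this sketch; fetch_live is a placeholder for a real HTTP request, and a real engine would populate the cache at indexing time rather than on demand.

    cache: dict[str, str] = {}

    def fetch_live(url: str) -> str:
        return f"<html>live contents of {url}</html>"  # placeholder fetch

    def serve(url: str) -> str:
        if url in cache:
            # Fast path: the copy stored when the page was indexed. It
            # may be older than the live page (a mild form of link rot).
            return cache[url]
        page = fetch_live(url)  # otherwise act as a web proxy
        cache[url] = page
        return page

    print(serve("https://example.org"))  # miss: fetched live, then cached
    print(serve("https://example.org"))  # hit: served from the cache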

Typically, when a user enters a query into a search engine, it is a few keywords. The index already holds the names of the sites containing the keywords, and these are retrieved instantly from the index. The real processing load is in generating the web pages that make up the search results list: every page in the entire list must be weighted according to information in the indexes. Then the top search result items require the lookup, reconstruction, and markup of snippets showing the context of the keywords matched. These are only part of the processing each search results web page requires, and further pages (beyond the top) require more of this post-processing.
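
The snippet step can be sketched as locating a matched keyword in the stored page text and marking up a short window of context around it; the window size and the <b> markup here are arbitrary illustrative choices.

    def snippet(text: str, keyword: str, radius: int = 30) -> str:
        """Return a short context window around the first keyword match."""
        pos = text.lower().find(keyword.lower())
        if pos == -1:
            return text[: 2 * radius]  # no match: fall back to the page lead
        start = max(0, pos - radius)
        end = min(len(text), pos + len(keyword) + radius)
        window = text[start:end]
        hit = text[pos : pos + len(keyword)]
        return "..." + window.replace(hit, f"<b>{hit}</b>") + "..."

    page = "Search engines generate result pages with snippets for each hit."
    print(snippet(page, "snippets"))
    # prints a short window with the keyword wrapped in <b> tags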

 
