{"id":563,"date":"2019-10-14T08:54:01","date_gmt":"2019-10-14T08:54:01","guid":{"rendered":"http:\/\/nitk.acm.org\/blog\/?p=563"},"modified":"2019-10-14T08:54:01","modified_gmt":"2019-10-14T08:54:01","slug":"everything-behind-a-google-search","status":"publish","type":"post","link":"https:\/\/nitk.acm.org\/blog\/2019\/10\/14\/everything-behind-a-google-search\/","title":{"rendered":"Everything behind a GOOGLE Search"},"content":{"rendered":"<p>Have you ever wondered what happens when you search for something on Google? How does Google manage to find the most useful data in such a huge collection of webpages?<\/p>\n<p><strong><b>Organizing Data<\/b><\/strong><\/p>\n<p>This is done in two steps: crawling and indexing.<\/p>\n<ol>\n<li><b><\/b><strong><b>Crawling<\/b><\/strong><\/li>\n<\/ol>\n<p>This happens before the search. Web crawlers gather information from billions of web pages. The way they work is interesting. They start by visiting webpages known from previous crawls and from data provided by website owners. They then find the links on those pages and follow them to discover new pages. The data from these pages is sent to the Google servers.<\/p>\n<ol start=\"2\">\n<li><b><\/b><strong><b>Indexing<\/b><\/strong><\/li>\n<\/ol>\n<p>As crawling happens, each webpage is rendered and the systems collect all the key data from the page. All this data is tracked in the search index. The Google Search index contains hundreds of billions of webpages and is well over 100,000,000 gigabytes in size. It\u2019s like the index in the back of a book, with an entry for every word seen on every indexed webpage.<\/p>\n<p>&nbsp;<\/p>\n<p><strong><b>Searching<\/b><\/strong><\/p>\n<p>With so much data on the web, finding the necessary information would be impossible without filtering and sorting it. 
So, Google uses a set of algorithms.<\/p>\n<ul>\n<li><b><\/b><strong><b>Understanding the query<\/b><\/strong><\/li>\n<\/ul>\n<p>The intent behind the query has to be understood first. This involves understanding the language, correcting errors, and so on. Google has algorithms to interpret spelling mistakes and to match keywords with their intended meanings when a word has more than one meaning. This is done mainly using natural language processing.<\/p>\n<ul>\n<li><b><\/b><strong><b>Quality of content<\/b><\/strong><\/li>\n<\/ul>\n<p>The PageRank algorithm, developed by Larry Page and Sergey Brin, ranks webpages based on the links between them: a page gains rank from the links coming into it, weighted by the importance of the pages those links come from. It serves as an estimate of the page\u2019s reliability and importance.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-565\" src=\"https:\/\/nitk.acm.org\/blog\/wp-content\/uploads\/2019\/10\/Google-Search.png\" alt=\"\" width=\"850\" height=\"685\" srcset=\"https:\/\/nitk.acm.org\/blog\/wp-content\/uploads\/2019\/10\/Google-Search.png 850w, https:\/\/nitk.acm.org\/blog\/wp-content\/uploads\/2019\/10\/Google-Search-300x242.png 300w, https:\/\/nitk.acm.org\/blog\/wp-content\/uploads\/2019\/10\/Google-Search-768x619.png 768w\" sizes=\"auto, (max-width: 850px) 100vw, 850px\" \/><\/p>\n<p>Page C has a higher rank than Page E even though it has only one incoming link. This is because the link to C comes from a more important page and hence carries more value. 
If web surfers who start on a random page have an 85% likelihood of choosing a random link from the page they are currently visiting, and a 15% likelihood of jumping to a page chosen at random from the entire web, they will reach Page E 8.1% of the time.<\/p>\n<ul>\n<li><b><\/b><strong><b>Relevance of webpages<\/b><\/strong><\/li>\n<\/ul>\n<p>The content of a webpage is assessed by an algorithm.\u00a0Beyond simple keyword matching, aggregated and anonymized interaction data is used to check whether search results are relevant to queries. The algorithm also looks for other relevant content on the page, such as pictures and videos.<\/p>\n<ul>\n<li><b><\/b><strong><b>Usability of webpages<\/b><\/strong><\/li>\n<\/ul>\n<p>The algorithm checks the usability of the webpage. Pages that are easier to use are given higher priority. It checks whether the page opens properly in different browsers, whether it works on different devices, and so on.<\/p>\n<p>Google informs website owners about the changes it plans to make to the algorithm so that they can make their pages easier to use.<\/p>\n<ul>\n<li><b><\/b><strong><b>Context and settings<\/b><\/strong><\/li>\n<\/ul>\n<p>This is a very important part, which makes Google Search more user-friendly and relevant. The algorithm uses the user\u2019s search history, location and search settings to provide the most useful results. Search results are also personalized based on the activity in the user\u2019s Google account. Google also allows the user to control the search settings.<\/p>\n<p>&nbsp;<\/p>\n<p>The next time you search for something on Google, you will know what is going on behind the scenes.<\/p>\n<p><em>\u00a0 \u00a0 &#8211; Hardik Harti<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Have you ever wondered what happens when you search for something on google? 
How does google manage to find the most useful data from such a huge collection of webpages Organizing Data This is basically done in two steps : Crawling and Indexing Crawling This happens before the search. The web crawlers gather information from&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[25],"tags":[52,138],"class_list":["post-563","post","type-post","status-publish","format-standard","hentry","category-sanganitra","tag-google","tag-search_engine"],"_links":{"self":[{"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/posts\/563","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/comments?post=563"}],"version-history":[{"count":3,"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/posts\/563\/revisions"}],"predecessor-version":[{"id":568,"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/posts\/563\/revisions\/568"}],"wp:attachment":[{"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/media?parent=563"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/categories?post=563"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/tags?post=563"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}