{"id":564,"date":"2019-10-14T09:02:57","date_gmt":"2019-10-14T09:02:57","guid":{"rendered":"http:\/\/nitk.acm.org\/blog\/?p=564"},"modified":"2019-10-23T11:32:33","modified_gmt":"2019-10-23T11:32:33","slug":"trie-ing-new-things","status":"publish","type":"post","link":"https:\/\/nitk.acm.org\/blog\/2019\/10\/14\/trie-ing-new-things\/","title":{"rendered":"Trie-ing new things"},"content":{"rendered":"<p>If I were to give you a bunch of strings, how would you store them? Perhaps as an array or a list? But what if you\u2019re required to search for a string? That\u2019s a search complexity of O(n) (Where n is the number of strings stored). Can this be done more efficiently?<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-569\" src=\"https:\/\/nitk.acm.org\/blog\/wp-content\/uploads\/2019\/10\/trie-1.png\" alt=\"\" width=\"331\" height=\"268\" srcset=\"https:\/\/nitk.acm.org\/blog\/wp-content\/uploads\/2019\/10\/trie-1.png 331w, https:\/\/nitk.acm.org\/blog\/wp-content\/uploads\/2019\/10\/trie-1-300x243.png 300w\" sizes=\"auto, (max-width: 331px) 100vw, 331px\" \/><\/p>\n<p>Instead of storing words individually, perhaps the overlapping parts can be stored just once, like a tree. This type of data structure is called a Trie (Nope not a typo).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-570\" src=\"https:\/\/nitk.acm.org\/blog\/wp-content\/uploads\/2019\/10\/trie-2.png\" alt=\"\" width=\"279\" height=\"538\" srcset=\"https:\/\/nitk.acm.org\/blog\/wp-content\/uploads\/2019\/10\/trie-2.png 279w, https:\/\/nitk.acm.org\/blog\/wp-content\/uploads\/2019\/10\/trie-2-156x300.png 156w\" sizes=\"auto, (max-width: 279px) 100vw, 279px\" \/><\/p>\n<p>This makes searching for a string \u2018 STR\u2019 O(Length of STR)\u2019 . This gives rise to a big problem though. What about the string \u2018HEL\u2019? This certainly wasn\u2019t a part of the initial list but it seems to exist in our data structure. How do we differentiate between strings like \u2018HEL\u2019 and \u2018HELL\u2019? We can use 0s and 1s to indicate the existence of a string.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-571\" src=\"https:\/\/nitk.acm.org\/blog\/wp-content\/uploads\/2019\/10\/trie-3.png\" alt=\"\" width=\"286\" height=\"527\" srcset=\"https:\/\/nitk.acm.org\/blog\/wp-content\/uploads\/2019\/10\/trie-3.png 286w, https:\/\/nitk.acm.org\/blog\/wp-content\/uploads\/2019\/10\/trie-3-163x300.png 163w\" sizes=\"auto, (max-width: 286px) 100vw, 286px\" \/><\/p>\n<p>This was a very simplistic explanation of what a trie is. In reality each node holds an array of pointers (Nonexistent children are NULL values) along with a key value and the root node holds no key. Tries can be used to store a multitude of different types of data, not just strings.<\/p>\n<p>If a string \u2018HELE\u2019 needs to be added, then the key value of the last \u2018E\u2019 is set to 1. If the string \u2018HELLO\u2019 needs to be deleted, the key value of the last \u2018O\u2019 is set to 0. This makes searching for a string quite trivial. For example, searching for the string \u2018HE\u2019, we start off at the root. From the root we try to move to the first letter \u2018H\u2019. Since the child \u2018H\u2019 exists, we move to it. Now we look for \u2018E\u2019 from \u2018H\u2019. Since the child \u2018E\u2019 exists, we move to it. As we have reached the end of the string, we look at the key value of \u2018E\u2019. \u2018E\u2019 has a key value of 0 which means the string \u2018HE\u2019 doesn\u2019t exist in our data structure.<\/p>\n<p>On the surface a trie might seem to be more space efficient than a simple list but tries grow very quickly and each node holds a lot of pointers. In the case of the English alphabet, each node holds 26 pointers, so that\u2019s a lot of useless pointers holding only NULL values. In the above example the root node would have 25 NULL children since only child exists (H). Similarly with the \u2018H\u2019 and \u2018E\u2019 nodes. So we\u2019re sacrificing memory to improve our running time.<\/p>\n<p>Tries are an extremely useful data structure. They are used to implement (Albeit in a much more complex manner) search and auto-complete suggestions, look-ups for blacklisted\/whitelisted IP addresses, spell-checkers and many more interesting things.<\/p>\n<p>&nbsp;<\/p>\n<p>If you\u2019d like to learn more, Radix Trees are a step up, they occupy less space compared to tries while maintaining the running time.<\/p>\n<p><em>\u00a0 \u00a0&#8211; Ajay Bharadwaj<\/em><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If I were to give you a bunch of strings, how would you store them? Perhaps as an array or a list? But what if you\u2019re required to search for a string? That\u2019s a search complexity of O(n) (Where n is the number of strings stored). Can this be done more efficiently? Instead of storing&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[25],"tags":[139,140],"class_list":["post-564","post","type-post","status-publish","format-standard","hentry","category-sanganitra","tag-data-structures","tag-time-complexity"],"_links":{"self":[{"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/posts\/564","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/comments?post=564"}],"version-history":[{"count":3,"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/posts\/564\/revisions"}],"predecessor-version":[{"id":585,"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/posts\/564\/revisions\/585"}],"wp:attachment":[{"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/media?parent=564"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/categories?post=564"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nitk.acm.org\/blog\/wp-json\/wp\/v2\/tags?post=564"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}