Applications Of Trie Data Structure

Applications of Trie data structure

Ever wondered how swipe feature works in mobile keypads or how auto correct works while writing a document, which data structure actually holds the character values ??

The answer is Trie Data Structure also known as Digital search algorithm.

What is a Trie ?

The word trie is an infix of the word “retrieval” because the trie can find a single word in a dictionary with only a prefix of the word. So, a Trie is nothing but a tree and each node in it contains the number of pointers equal to the number of characters of the alphabet. For example, if we assume that all the strings are formed with english alphabet characters 'a' to 'z' then each node of the trie can contain maximum 26 pointers.

The standard trie for a set of string S is an ordered tree such that

  • the root represents an empty string("")
  • each node but the root is labeled with a character
  • the children of a node is alphabetically ordered
  • the paths from the external nodes to the root yields the strings of S

The below diagram shows a trie representation for words bell, bear, fare, fair, sell, stop.

                root(empty)
                /   \    \
                b   f     s
                |   |     | \
                e   a     e  t
              / |   | \   |  |
             l  a   r  i  l  o
             |  |   |  |  |  |
             l  r   e  r  l  p

Why Tries ?

The tries can insert and find strings in o(L) time(where L represent the length of a single word) which is much faster than hash tables and binary search trees.

A binary search tree which stores each word as a node requires O(log(n)) time to search for a string where n represents the number of nodes in BST.
Contrary to Binary trees, Trie holds a single character and has a maximum fan-out equal to the length of the alphabet which says that the operations (insert, search, remove) would take time O(dn) where
d = alphabet size (26 in case of english alphabet)
n = size of the string involved in the operation

and the space complexity can be stated as O(W) where
W = total size of the string S

Implementation of a standard Trie

Here we can see how a trie data structure can be implemented in Java with the implementation of common insert and search methods.

A trie node can be constructed as a smallest unit to store a character and the possible pointers to other nodes

public class TrieNode {

    char character;
    HashMap<Character, TrieNode> child = new HashMap<Character, TrieNode>();
    boolean isLeaf;

    public TrieNode() {}
    public TrieNode(char character){
        this.character = character;
    }
}

Inserting a key into trie is simple approach. Every character of input key is inserted as an individual trie node. Note that the children is a hash map storing pointers to next level trie nodes. If the input key is new or an extension of existing key, we need to construct non-existing nodes of the key, and mark leaf node. If the input key is prefix of existing key in trie, we simply mark the last node of key as leaf. The key length determines trie depth.

public void insert(String word) {
        HashMap<Character, TrieNode> children = root.child;

        for(int i=0; i<word.length(); i++){
            char c = word.charAt(i);

            TrieNode t;
            if(children.containsKey(c)){
                    t = children.get(c);
            }else{
                t = new TrieNode(c);
                children.put(c, t);
            }

            children = t.child;

            //set leaf node
            if(i==word.length()-1)
                t.isLeaf = true;
        }
    }

Searching for a key is similar to insert operation, however we only compare the characters and move down. The search can terminate due to end of string or lack of key in trie. In the former case, if the value field of last node is non-zero then the key exists in trie. In the second case, the search terminates without examining all the characters of key, since the key is not present in trie.

public boolean search(String word) {
        TrieNode trieNode = searchNode(word);

        if(trieNode != null && trieNode.isLeaf)
            return true;
        else
            return false;
    }

    public TrieNode searchNode(String str){
        Map<Character, TrieNode> children = root.child;
        TrieNode node = null;
        for(int i=0; i<str.length(); i++){
            char c = str.charAt(i);
            if(children.containsKey(c)){
                node = children.get(c);
                children = node.child;
            }else{
                return null;
            }
        }

        return node;
    }

Applications of Trie

Auto Complete

Auto Complete functionality is used widely in mobile apps and text editors. Trie is an efficient data structure widely used for its implementation. Trie provides an easy way to search for the possible dictionary words to complete word because of the following reasons

  • Looking up data in a trie is faster in the worst case O(n)(n = size of the string involved in the operation) time compared to an imperfect hash table. An imperfect hash table can have key collisions. A key collision is the hash function mapping of different keys to the same position in a hash table.

  • A trie can provide an alphabetical ordering of the entries by key.

  • Searching in a trie enables us to trace pointers to get to a node that represent the string user has entered. By exploring a trie traversing down the tree, we can easily enumerate all strings that complete the user input.

Spell Checkers

Spell checking is a three-step process. Check if a word is in a dictionary, generate potential suggestions, and then sort the suggestions–hopefully with the intended word on top.
Tries can be used to store that dictionary and by searching the words over the data structure one can easily implement a spell checker in the most efficient way. Using trie not only the lookup for a word into the dictionary becomes easy but an algorithm to provide the list of valid words or suggestions can be easily constructed.

Longest Prefix Matching

Also called Maximum prefix length match refers to an algorithm used by routers in Internet protocol(IP) networking to select an entry from a routing table.
One of the first IP lookup techniques to employ tries is the radix trie implementation in the BSD kernel. Optimizations requiring contiguous masks bound the worst case lookup time to O(W) where W is the length of the address
in bits. In order to speed up the lookup process,multi bit trie schemes were developed which perform a search using multiple bits of the address at a time.

Automatic Command completion

when using an operating system such as Unix, we type in system commands to accomplish certain tasks. For example to see the list of commands (ls /usr/bin/ps*) having prefix ps can be autosuggested by just pressing tab

/usr/bin/psed        /usr/bin/pstopdf     /usr/bin/pstruct5.16
/usr/bin/psed5.12    /usr/bin/pstruct
/usr/bin/psed5.16    /usr/bin/pstruct5.12

We can simply the task of typing in commands by providing a command completion facility which automatically types in the command suffix once the user has typed in a long enough prefix to uniquely identify the command. For instance, once the letters psi have been entered, we know that the command must be psidtopgm because there is only one command that has the prefix psi.

Network browser history

A network browser keeps a history of the URLs of sites that you have visited. By organizing this history as a trie, the user need only type the prefix of a previously used URL and the browser can complete the URL.

Leave a Reply

Your email address will not be published. Required fields are marked *