Tuesday, August 2, 2011

Auto Completion - Find Strings starting with Prefix



We are all familiar with the auto completion function provided by IDE, for example, in eclipse, if we type Collections.un, then eclipse would list all methods that start with "un" such as unmodifiableCollection, unmodifiableList etc.

So we have the following question:
You have a list of string, you need find all strings that starts with prefix provided repeatedly, how to do this efficiently?

Answer:
We need to preprocess the list of string, so later we can quickly search it.

One way is to sort the string list by alphabetical order, then when search with the prefix (say app), we binary search this list and get a lower index whose string is larger than “app”, and get a higher index whose string is less than “apr”, then all strings between the lower index and higher index[lower index, higher index) are the strings that starts with the prefix.
Each query would take O(longn), n is the length of the string list.

Another better way is to create a tree from the string list, for example, for string "append", it would look like this:
  [root node(flag)]
         /
        a
       / \
     [ST] p
          \
          p -- return all strings from this sub tree
         /
         e
         \
         n
        / \
        d [Sub Tree]
       /
[leaf node(flag)]
So when we search all strings that starts with "app", it can search this tree, and get all strings of the p node, the time complexity depends on the length of the prefix, having nothing to do with the length of the string list. This is much better.

Code:
The complete algorithm/test code and also many other algorithm problems and solutions are available from https://github.com/programmer-plus.
package org.codeexample.autocomplete;

public class WordTree {
 private WordNode root;
 public WordTree() {
  root = new WordNode(null);
 }
 /**
  * Add a string into this word tree, if word is null or an empty string, do
  * nothing
  */
 public void addWord(String word) {
  if (word == null)
   return;
  word = word.trim();
  if ("".equals(word)) {
   return;
  }
  WordNode parentNode = root, curretNode;
  for (int i = 0; i < word.length(); i++) {
   char character = word.charAt(i);
   Map<Character, WordNode> children = parentNode.getChildrenMap();
   if (children.containsKey(character)) {
    curretNode = children.get(character);
   } else {
    curretNode = new WordNode(character);
    parentNode.addChild(curretNode);
   }
   parentNode = curretNode;
  }
  // at last, add a leaf node - whose character value is null to indicate
  // the end of the word
  curretNode = new WordNode(null);
  parentNode.addChild(curretNode);
 }
 /**
  * @param prefix
  * @return all words in this tree that starts with the prefix, <br>
  *         if prefix is null, return an empty list, if prefix is empty
  *         string, return all words in this word tree.
  */
 public List<String> wordsPrefixWith(String prefix) {
  List<String> words = new ArrayList<String>();
  if (prefix == null)
   return words;
  prefix = prefix.trim();
  WordNode currentNode = root;
  for (int i = 0; i < prefix.length(); i++) {
   char character = prefix.charAt(i);
   Map<Character, WordNode> children = currentNode.getChildrenMap();
   if (!children.containsKey(character)) {
    return words;
   }
   currentNode = children.get(character);
  }
  return currentNode.subWords();
 }
 /**
  * @param word
  * @return whether this tree contains this word, <br>
  *         if the word is null return false, if word is empty string, return
  *         true.
  */
 public boolean hasWord(String word) {
  if (word == null)
   return false;
  word = word.trim();
  if ("".equals(word))
   return true;
  WordNode currentNode = root;
  for (int i = 0; i < word.length(); i++) {
   char character = word.charAt(i);
   Map<Character, WordNode> children = currentNode.getChildrenMap();
   if (!children.containsKey(character)) {
    return false;
   }
   currentNode = children.get(character);
  }
  // at last, check whether the parent node contains one null key - the
  // leaf node, if so return true, else return false.
  return currentNode.getChildrenMap().containsKey(null);
 }
}
class WordNode {
 private Character character;
 private WordNode parent;
 private Map<Character, WordNode> childrenMap = new HashMap<Character, WordNode>();
 public WordNode(Character character) {
  this.character = character;
 }
 /**
  * @return all strings of this sub tree
  */
 public List<String> subWords() {
  List<String> subWords = new ArrayList<String>();
  String prefix = getPrefix();
  List<String> noPrefixSubWords = subWordsImpl();
  for (String noPrefixSubWord : noPrefixSubWords) {
   subWords.add(prefix + noPrefixSubWord);
  }
  return subWords;
 }
 private List<String> subWordsImpl() {
  List<String> words = new ArrayList<String>();
  Iterator<Character> keyIterator = childrenMap.keySet().iterator();
  while (keyIterator.hasNext()) {
   Character key = keyIterator.next();
   if (key == null) {
    words.add(convertToString(this.character));
   } else {
    WordNode node = childrenMap.get(key);
    List<String> childWords = node.subWordsImpl();
    for (String childWord : childWords) {
     words.add(convertToString(this.character) + childWord);
    }
   }
  }
  return words;
 }
 public void addChild(WordNode child) {
  child.parent = this;
  childrenMap.put(child.getCharacter(), child);
 }
 public Character getCharacter() {
  return character;
 }
 public WordNode getParent() {
  return parent;
 }
 public Map<Character, WordNode> getChildrenMap() {
  return childrenMap;
 }
 private String convertToString(Character character) {
  return (character == null) ? "" : String.valueOf(character);
 }
 private String getPrefix() {
  StringBuilder sb = new StringBuilder();
  WordNode parentNode = this.parent;
  while (parentNode != null) {
   if (parentNode.getCharacter() != null) {
    sb.append(parentNode.getCharacter());
   }
   parentNode = parentNode.parent;
  }
  return sb.reverse().toString();
 }
}

Labels

Review (572) System Design (334) System Design - Review (198) Java (189) Coding (75) Interview-System Design (65) Interview (63) Book Notes (59) Coding - Review (59) to-do (45) Linux (43) Knowledge (39) Interview-Java (35) Knowledge - Review (32) Database (31) Design Patterns (31) Big Data (29) Product Architecture (28) MultiThread (27) Soft Skills (27) Concurrency (26) Cracking Code Interview (26) Miscs (25) Distributed (24) OOD Design (24) Google (23) Career (22) Interview - Review (21) Java - Code (21) Operating System (21) Interview Q&A (20) System Design - Practice (20) Tips (19) Algorithm (17) Company - Facebook (17) Security (17) How to Ace Interview (16) Brain Teaser (14) Linux - Shell (14) Redis (14) Testing (14) Tools (14) Code Quality (13) Search (13) Spark (13) Spring (13) Company - LinkedIn (12) How to (12) Interview-Database (12) Interview-Operating System (12) Solr (12) Architecture Principles (11) Resource (10) Amazon (9) Cache (9) Git (9) Interview - MultiThread (9) Scalability (9) Trouble Shooting (9) Web Dev (9) Architecture Model (8) Better Programmer (8) Cassandra (8) Company - Uber (8) Java67 (8) Math (8) OO Design principles (8) SOLID (8) Design (7) Interview Corner (7) JVM (7) Java Basics (7) Kafka (7) Mac (7) Machine Learning (7) NoSQL (7) C++ (6) Chrome (6) File System (6) Highscalability (6) How to Better (6) Network (6) Restful (6) CareerCup (5) Code Review (5) Hash (5) How to Interview (5) JDK Source Code (5) JavaScript (5) Leetcode (5) Must Known (5) Python (5)

Popular Posts