Monday, April 21, 2014

LeetCode: Substring with Concatenation of All Words

You are given a string, S, and a list of words, L, that are all of the same length. Find all starting indices of substring(s) in S that is a concatenation of each word in L exactly once and without any intervening characters.

For example, given:
S: "barfoothefoobarman"
L: ["foo", "bar"]

You should return the indices: [0,9].
(order does not matter).


The basic idea is to check every possible substring of the given length. Another idea is from online, also start from every possible position, the trick is that when we have n words matched, no matter the n words can formed a valid substring, we can test the next possibility by move forward by one word instead of all words. For example, if the string is "barfoothefoobarman", assume we have "barfoo" now, we can check "foothe" for the next time. Another case, if the string is "barbarfoofoobarman", assume we have "barbar" now, we also can check "barfoo" for the next time. I implemented the first approach in C++, second one in Python.

C++ Code:

/*
 * func: test_substring
 * goal: test if the current subtring match the requirements
 * @param current: current string
 * @param word_length: length of each word
 * @param word_dict: given word dict
 * return: true of false
 */
bool test_substring(const string &current, const size_t& word_length, unordered_map<string, int> word_dict){
    for(int i = 0; i < current.length() - word_length + 1; i += word_length){
        string current_word = current.substr(i, word_length);
        if(word_dict.find(current_word) != word_dict.end() && word_dict[current_word] >= 1){
            word_dict[current_word] -= 1;
        }else{
            return false;
        }
    }
    //Test if all count in word_dict is 0
    for(const auto& it : word_dict){
        if(it.second > 0){
            return false;
        }
    }
    return true;
}

/*
 * func: find_substring_all_concatenation
 * goal: find the starting indices of substrings made of all words once in the set
 * @param S: target string S
 * @param L: vector of words
 * return: vector of starting indices
 */
/*
 * Just test every possible substring
 * complexity: time O(m * n), space O(m) where m is the length of substring, n is the length
               of string S
 */
vector<int> find_substring_all_concatenation(const string& S, const vector<string>& L){
    vector<int> result;
    if(L.size() == 0 || S.size() < L.at(0).size() * L.size()){
        return result;
    }
    
    //Add all words to a map
    unordered_map<string, int> word_dict;
    for(const auto &it : L){
        if(word_dict.find(it) == word_dict.end()){
            word_dict.emplace(it, 1);
        }else{
            word_dict[it] += 1;
        }
    }
    //Test every possible substring with length L.at(0).size() * L.size()
    int length = static_cast<int>(L.at(0).size() * L.size());
    for(int i = 0; i < S.size() - length + 1; i++){
        string current = S.substr(i, length);
        if(word_dict.find(current.substr(0, L.at(0).size())) != word_dict.end()){
            if(test_substring(current, L.at(0).size(), word_dict)){
                result.emplace_back(i);
            }
        }
    }
    
    return result;
}

Python Code:

# func: Find all starting indices of substring(s) in S that is a
#       concatenation of each word in L exactly once and without
#       any intervening characters
# @param S: target string S
# @param L: word list
# @return: a list of starting indices
# complexity: time O(n * m), space O(m * size) where n is the size of string
#             m is the length of a word, size is the number of words in L
def find_substring_concatenation(S, L):
    if len(L) == 0 or len(S) < len(L) * len(L[0]):
        return []
    word_length = len(L[0])
    str_length = len(S)
    result = []
    # put all words in a dict
    word_dict = {}
    for word in L:
        if word in word_dict:
            word_dict[word] += 1
        else:
            word_dict[word] = 1

    for offset in xrange(word_length):
        match_dict = {}

        #start position of the substring
        start = offset

        #counter of the words
        count = 0

        end = offset
        while end + word_length <= str_length:
            current_str = S[end:end+word_length]
            if current_str in word_dict:
                if current_str in match_dict:
                    match_dict[current_str] += 1
                else:
                    match_dict[current_str] = 1

                #If the current word's frequency is less than the dict
                if match_dict[current_str] <= word_dict[current_str]:
                    count += 1

                if count == len(L):
                    result.append(start)

                #If len(L) words have been tested, we can move start forward by
                # one word length
                if (end-start)/word_length + 1 == len(L):
                    current_str = S[start: start+word_length]
                    match_dict[current_str] -= 1
                    if match_dict[current_str] < word_dict[current_str]:
                        count -= 1
                    start += word_length
            else:
                match_dict.clear()
                start = end + word_length
                count = 0

            end += word_length
    return result

Reference Link:

1.Substring with Concatenation of All Words - LeetCode: http://discuss.leetcode.com/questions/210/substring-with-concatenation-of-all-words

No comments:

Post a Comment