30. Substring with Concatenation of All Words

Leetcode Hash Table Two Pointers String

题目¶

You are given a string, s, and a list of words, words, that are all of the same length. Find all starting indices of substring(s) in s that is a concatenation of each word in words exactly once and without any intervening characters.

Example 1:

Input:
  s = "barfoothefoobarman",
  words = ["foo","bar"]
Output: [0,9]
Explanation: Substrings starting at index 0 and 9 are "barfoor" and "foobar" respectively.
The output order does not matter, returning [9,0] is fine too.

Example 2:

Input:
  s = "wordgoodstudentgoodword",
  words = ["word","student"]
Output: []

思路¶

因为words中的单词可能有重复，所以要有一个dict来记录一下每个字符串的数目。然后在遍历原字符串的时候，只需要遍历单词的长度次即可，如"barfoothefoobarman"，因为目标单词的长度为3，所以只需遍历：

'bar' | 'foo' | 'the' | 'foo' | 'bar' | 'man' 'arf' | 'oot' | 'hef' | 'oob' | 'arm' 'rfo' | 'oth' | 'efo' | 'oba' | 'rma'

在遍历时，需要两个指针，一个用来标记子字符串的开始，另一个用来标记子字符串的结束。再用一个dict来记录当前字符串中单词的数量，如果下一个单词不在words中，那么清空该dict，把前指针直接跳到后指针处；如果在words中，那么相应的键值要加一，此时如果那个单词的数量超过了目标中的数目，那么前指针要不断后移直到吐出一个那个单词。通过前后指针之差是否等于所有目标单词长度之和来判断是否有目标子字符串。

def findSubstring(self, s, words):
    """
    :type s: str
    :type words: List[str]
    :rtype: List[int]
    """
    s_length = len(s)
    word_num = len(words)
    word_length = len(words[0])
    words_length = word_num * word_length
    result = []
    words_dict = {}

    for word in words:
        if word in words_dict:
            words_dict[word] = words_dict[word] + 1
        else:
            words_dict[word] = 1

    for i in range(word_length):
        # 两个指针，left, right:
        # 一个用来标记子字符串的开始，另一个用来标记子字符串的结束
        left = i
        right = i

        # curr_dict 记录当前字符串中单词的数量
        curr_dict = {}
        while right + word_length <= s_length:
            word = s[right: right + word_length] # 取出单词
            right += word_length
            # 如果在words中，那么相应的键值要加1
            # 此时如果那个单词的数量超过了目标中的数目，那么左指针要不断后移直到吐出一个那个单词。
            if word in words_dict:
                curr_dict[word] = curr_dict[word] + 1 if word in curr_dict else 1
                while curr_dict[word] > words_dict[word]:
                    curr_dict[s[left: left+word_length]] -= 1
                    left += word_length
                # 通过前后指针之差是否等于所有目标单词长度之和来判断是否有目标子字符串。
                if right - left == words_length:
                    result.append(left)
            else:
                # 如果下一个单词不在words中，那么清空该dict，把前指针直接跳到后指针处
                curr_dict.clear()
                left = right
    return result