438. Find All Anagrams in a String

Leetcode Hash Table

Given a string s and a non-empty string p, find all the start indices of p's anagrams in s.

Strings consists of lowercase English letters only and the length of both strings s and p will not be larger than 20,100.

The order of output does not matter.

Example 1:

Input:
s: "cbaebabacd" p: "abc"

Output:
[0, 6]

Explanation:
The substring with start index = 0 is "cba", which is an anagram of "abc".
The substring with start index = 6 is "bac", which is an anagram of "abc".

Example 2:

Input:
s: "abab" p: "ab"

Output:
[0, 1, 2]

Explanation:
The substring with start index = 0 is "ab", which is an anagram of "ab".
The substring with start index = 1 is "ba", which is an anagram of "ab".
The substring with start index = 2 is "ab", which is an anagram of "ab".

分析¶

这道题目是LeetCode 242. Valid Anagram的延伸。前者只要判断字符串是否是变位词，这里需要寻找出变位词的位置。一个容易想到的办法就是依次从字符串s中取出与p长度相等的子字符串，然后判定该子字符串是否是变位词。

public List<Integer> findAnagrams(String s, String p) {
    int slen = s.length(), plen = p.length();
    List<Integer> list = new ArrayList<>();
    for (int i = 0; i < slen - plen + 1; i++)
        if (isAnagram(p, s.substring(i, i + plen))) list.add(i);

    return list;
}

private boolean isAnagram(String s, String t) {
    int[] count = new int[26];
    for (int i = 0; i < s.length(); i++) {
        count[s.charAt(i) - 'a']++;
        count[t.charAt(i) - 'a']--;
    }

    for (int i = 0; i < count.length; i++) 
        if (count[i] != 0) return false;

    return true;
}

但是这种方法的时间复杂度不是很理想。一种被广泛应用在子字符串算法的滑动窗口方法能够很好的降低复杂度：它的基本思想是维持一个哈希表，哈希表的键值分别为字符和字符出现的次数；另外有begin，end指针分别表示滑动窗口的起始点和终点位置。哈希表中的值，也就是字符次数，初始化为变位词中各个字符出现的次数。遍历字符串，当出现变位词的字符时，哈希表中相应的值减去1，当哈希表中的值都为0时，说明滑动窗口内的子字符串很可能是变位词，如果此时起始点和终点的距离end-begin恰好为变位词的长度时，说明滑动窗口内肯定是变位词，记录下此时变位词的位置。然后调整滑动窗口的位置，寻找下一个变位词。

public List<Integer> findAnagrams(String s, String p) {
    //init a collection or int value to save the result according the question.
    List<Integer> result = new LinkedList<>();
    if(p.length() > s.length()) return result;

    // create a hashmap to save the Characters of the target substring.
    // (K, V) = (Character, Frequence of the Characters)
    Map<Character, Integer> map = new HashMap<>();
    for(char c : p.toCharArray())
        map.put(c, map.getOrDefault(c, 0) + 1);

    //maintain a counter to check whether match the target string.
    //must be the map size, NOT the string size because the char may be duplicate.
    int counter = map.size();

    //Two Pointers: begin - left pointer of the window; end - right pointer of the window
    int begin = 0, end = 0;

    //loop at the beginning of the source string
    while(end < s.length()){
        //get a character
        char c = s.charAt(end);
        if(map.containsKey(c)){
            // plus or minus one
            map.put(c, map.get(c)-1);
            // modify the counter according the requirement(different condition).
            if (map.get(c) == 0) counter--;
        }
        end++;

        //increase begin pointer to make it invalid/valid again
        while(counter == 0){
            // ***be careful here: choose the char at begin pointer, NOT the end pointer
            char tempc = s.charAt(begin);
            if (map.containsKey(tempc)){
                //plus or minus one
                map.put(tempc, map.get(tempc) + 1);
                //modify the counter according the requirement(different condition).
                if (map.get(tempc) > 0) counter++;
            }

            // save / update(min/max) the result if find a target
            // result collections or result int value
            if (end - begin == p.length())
                result.add(begin);
            begin++;
        } //end while
    } // end while
    return result;
} // end findAnagrams

但是这种做法还是稍显麻烦，毕竟这里变位词只是26个字母，而且还都是小写的，真的没有必要用HashMap来做哈希表，用一个长26的整形数组来做哈希表更加好。注意下面的counter是变位词长度，不是上面的出现次数，所以代码简洁了一些。

public List<Integer> findAnagrams(String s, String p) {
    List<Integer> res = new ArrayList<>();
    if (p.length() > s.length()) return res;

    // total number of character in p to be contained in s
    int counter = p.length();
    // calculate the number of each character to be contained in S
    int[] map = new int[26];
    for (char c : p.toCharArray()) map[c - 'a']++;

    // the begin/end of the sliding window
    int begin = 0, end = 0;
    // current char
    int cur;
    while (end < s.length()) {
        // current char
        cur = s.charAt(end++) - 'a';
        // if the character needs to be contained, include it and minus the counter
        if (map[cur] > 0) counter--;
        map[cur]--;


        // all included, move the begin pointer to minimize the window
        while (counter == 0) {
            // current char
            cur = s.charAt(begin) - 'a';
            // current window contains same number of the current character as in p,
            // cannot move forward anymore
            if (map[cur] == 0) {
                // if the window size is same as p, an anagram is found
                if (end - begin == p.length()) res.add(begin);
                counter++;
            } // end if
            map[cur]++;
            begin++;
        } // end while
    } // end while
    return res;
} // end findAnagrams