find Minimum Window Substring

Given a string S and a string T, find the minimum window in S that contains all the characters in T, in O(n) time.
For example,
S = "ADOBECODEBANC"
T = "ABC"
Minimum window is "BANC".
Update:
I read one implementation at http://articles.leetcode.com/2010/11/finding-minimum-window-in-s-which.html, and it seems wrong to me: it does not decrease count, and it does not move begin when it finds a matched window. Is that right? Thanks.
// Returns false if no valid window is found. Else returns
// true and updates minWindowBegin and minWindowEnd with the
// starting and ending position of the minimum window.
bool minWindow(const char* S, const char *T,
int &minWindowBegin, int &minWindowEnd) {
int sLen = strlen(S);
int tLen = strlen(T);
int needToFind[256] = {0};
for (int i = 0; i < tLen; i++)
needToFind[T[i]]++;
int hasFound[256] = {0};
int minWindowLen = INT_MAX;
int count = 0;
for (int begin = 0, end = 0; end < sLen; end++) {
// skip characters not in T
if (needToFind[S[end]] == 0) continue;
hasFound[S[end]]++;
if (hasFound[S[end]] <= needToFind[S[end]])
count++;
// if window constraint is satisfied
if (count == tLen) {
// advance begin index as far right as possible,
// stop when advancing breaks window constraint.
while (needToFind[S[begin]] == 0 ||
hasFound[S[begin]] > needToFind[S[begin]]) {
if (hasFound[S[begin]] > needToFind[S[begin]])
hasFound[S[begin]]--;
begin++;
}
// update minWindow if a minimum length is met
int windowLen = end - begin + 1;
if (windowLen < minWindowLen) {
minWindowBegin = begin;
minWindowEnd = end;
minWindowLen = windowLen;
} // end if
} // end if
} // end for
return (count == tLen) ? true : false;
}

Assume that strings S and T contain only the characters A-Z (26 characters).
First, create an array count, which stores the frequency of each character in T.
Process each character in S, maintaining a window [l, r], which will be the current minimum window ending at r that contains all characters in T.
We maintain an array cur to store the current frequency of each character in the window. While the frequency of the character at the left end of the window is greater than the needed frequency, we increase l.
Sample Code:
int[] count = new int[26];
for (int i = 0; i < T.length(); i++)
    count[T.charAt(i) - 'A']++;
int need = 0; // number of unique characters in T
for (int i = 0; i < 26; i++)
    if (count[i] > 0)
        need++;
int l = 0, r = 0;
int matched = 0; // number of characters whose needed frequency is met
int result = Integer.MAX_VALUE; // stays MAX_VALUE if no window exists
int[] cur = new int[26];
for (int i = 0; i < S.length(); i++) {
    cur[S.charAt(i) - 'A']++;
    r = i;
    if (cur[S.charAt(i) - 'A'] == count[S.charAt(i) - 'A']) {
        matched++;
    }
    // update the start of the window
    while (l <= r && cur[S.charAt(l) - 'A'] > count[S.charAt(l) - 'A']) {
        cur[S.charAt(l) - 'A']--;
        l++;
    }
    if (matched == need)
        result = Math.min(result, r - l + 1);
}
Each character in S will be processed at most two times, which give us O(n) complexity.

def minWindow(self, s, t):
    """
    :type s: str
    :type t: str
    :rtype: str
    """
    count = len(t)
    require = [0] * 128
    chSet = [False] * 128
    for i in range(count):
        require[ord(t[i])] += 1
        chSet[ord(t[i])] = True
    i = -1
    j = 0
    minLen = 999999999
    minIdx = 0
    while i < len(s) and j < len(s):
        if count > 0:
            i += 1
            if i == len(s):
                index = 0
            else:
                index = ord(s[i])
            require[index] -= 1
            if chSet[index] and require[index] >= 0:
                count -= 1
        else:
            if minLen > i - j + 1:
                minLen = i - j + 1
                minIdx = j
            require[ord(s[j])] += 1
            if chSet[ord(s[j])] and require[ord(s[j])] > 0:
                count += 1
            j += 1
    if minLen == 999999999:
        return ""
    return s[minIdx:minIdx+minLen]
The method I used was to map the characters and how many are in the substring vs how many are needed. If all the values are non-negative, then you can remove characters from the start of the substring until you reach a negative, and if there's a negative, you add to the end of the substring until it is 0 again. You continue this until you've reached the end of S, and then remove characters until you have a negative count for one of the characters.
Going through the example, S="ADOBECODEBANC" and T="ABC". Starting out, the map has the values A=-1, B=-1, C=-1, and has a count of 3 negatives. Adding the first letter increases A to 0, which removes a negative, leaving a count of 2. You can count the others as well, since they will never become negative, resulting in A=0,B=0,C=0,D=1,O=1,E=1 when you add the C. Since the negative count is 0, you start removing characters from the start, which is A, dropping it to -1, and switching back to adding at the end.
You then add to the end until you reach an A again, which results in A=0,B=1,C=0,D=2,E=2,O=2 and a count of 0. Remove from the start until you reach a negative again, which removes D,O,B,E,C, since B's removal only drops it to 0, not a negative. At that point, the substring is "ODEBA" and C = -1. Add to the end until you reach a C and you have "ODEBANC", and remove from the start until you get a negative again, leaving "ANC". You've reached the end of the string and have a negative, so there is no shorter string remaining with all the characters.
You can retrieve the shortest substring by taking the start and end indices of the mapped substring whenever you switch from removing to adding and storing them if they are shorter than the previous shortest. If you never switch from removing to adding, then the result is the empty string.
If S="BANC" and T="ABC", then the result is adding until you reach "BANC", switching to remove, hitting a negative (and therefore copying those lengths at 0 and 3), and attempting to add beyond the end which ends the algorithm with the substring starting at 0 and ending at 3.
As every character gets added once and removed at most once, the algorithm takes at most 2n steps to complete, which makes it an O(n) solution.
Idea from mike3
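A minimal Python sketch of the walkthrough above, assuming T is non-empty. The names min_window_deficit, balance, and negatives are made up for illustration. One small difference from the description: it records the window at every removal step while nothing is missing, rather than only at the switch from removing to adding, which gives the same minimum.

```python
def min_window_deficit(s, t):
    """Shortest substring of s containing all characters of t, or ""."""
    if not t:
        return ""
    balance = {}                      # count in window minus count needed
    for ch in t:
        balance[ch] = balance.get(ch, 0) - 1
    negatives = len(balance)          # characters that are still missing
    best_start, best_len = 0, None
    start = 0
    for end, ch in enumerate(s):
        # Add one character at the end of the window.
        balance[ch] = balance.get(ch, 0) + 1
        if balance[ch] == 0:
            negatives -= 1
        # Nothing is missing: remove from the start until something is.
        while negatives == 0:
            if best_len is None or end - start + 1 < best_len:
                best_start, best_len = start, end - start + 1
            left = s[start]
            balance[left] -= 1
            if balance[left] < 0:
                negatives += 1
            start += 1
    return "" if best_len is None else s[best_start:best_start + best_len]
```

With S="ADOBECODEBANC" and T="ABC" this returns "BANC", and with S="BANC" it returns the whole string.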

You can try this method:
Create a hash of T (because order of characters in t does not matter, we will use its hash)
Now take two pointers (to iterate through S), both indexed at 0 to begin with. Let their names be i,j.
Increment j at each step and calculate hash of S as you move forward. When this hash covers hash of T (of course you will need to compare the two hashes at each step), start to increment i (and decrement hash values in hash of S) until hash remains covered.
When hash of S < hash of T, start again by incrementing j.
At any point, the least window size of i..j that covers hash of T is your answer.
PS: take care of the corner cases, like end of string and all. I'll help if you need the code, but I'd recommend if you try it yourself first and then ask doubts.
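If you get stuck, the steps above could look roughly like this in Python, with collections.Counter standing in for the "hash". This is only a sketch written for clarity rather than speed: it re-checks the whole hash of T at every step, so it costs O(n * k) for k distinct characters in T, and the names min_window_cover and covers are my own.

```python
from collections import Counter

def min_window_cover(s, t):
    """Shortest substring of s whose character counts cover those of t."""
    if not t:
        return ""
    target = Counter(t)               # hash of T
    window = Counter()                # hash of S[i..j]

    def covers():
        # True when the window holds at least the needed count of each character.
        return all(window[ch] >= n for ch, n in target.items())

    best = ""
    i = 0
    for j, ch in enumerate(s):        # increment j at each step
        window[ch] += 1
        while i <= j and covers():    # increment i while still covered
            if not best or j - i + 1 < len(best):
                best = s[i:j + 1]
            window[s[i]] -= 1
            i += 1
    return best
```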

A more 'pythonic' approach on the algorithm explained in http://articles.leetcode.com/2010/11/finding-minimum-window-in-s-which.html
In short: Use head and tail 'pointers', advance head until match is found, then advance tail (decrease the size of the window), while the substring still matches.
import collections
def windows(S, T):
    # empty string/multiset is matched everywhere
    if not T:
        yield (0, 0)
        return
    # target multiset initialized to contents of T
    target_ms = collections.Counter(T)
    # empty test multiset
    test_ms = collections.Counter()
    head = enumerate(S)
    tail = enumerate(S)
    # while the condition is not met, advance head
    # and add to the test multiset
    # iterate over the whole input with head
    for i_head, char_head in head:
        test_ms[char_head] += 1
        # while the condition is met, advance tail
        # (and subtract from test multiset)
        # (a - b) for Counters has only elements from a that
        # remained >0 after subtraction
        while not target_ms - test_ms:
            # next(tail) works in both Python 2.6+ and Python 3
            i_tail, char_tail = next(tail)
            yield (i_tail, i_head + 1)
            test_ms[char_tail] -= 1
def min_window(S, T):
    # initialize
    min_len = len(S) + 1
    min_start, min_end = None, None
    # go through all matching windows, pick the shortest
    for start, end in windows(S, T):
        if end - start < min_len:
            min_start, min_end = start, end
            min_len = end - start
    return (min_start, min_end)

My C++ solution that runs in O(n) time (accepted solution at leetcode, runs faster than 99% of submitted C++ solutions):
#include<string>
#include<vector>
#include<limits>
using namespace std;
class CharCounter
{
private:
const int fullCount;
int currentCount;
vector<pair<short, short>> charMap;
public:
CharCounter(const string &str) :fullCount(str.size()), currentCount(0) {
charMap = vector<pair<short, short>>(128, { 0,0 });
for (const auto ch : str) {
charMap[ch].second++;
}
};
void reset() {
for (auto &entry : charMap)
entry.first = 0;
currentCount = 0;
}
bool complete() const {
return (currentCount == fullCount);
}
void add(char ch) {
if (charMap[ch].second > 0) {
if (charMap[ch].first < charMap[ch].second)
currentCount++;
charMap[ch].first++;
}
}
void subtract(char ch) {
if (charMap[ch].second > 0) {
if (charMap[ch].first <= charMap[ch].second)
currentCount--;
charMap[ch].first--;
}
}
};
class Solution
{
public:
string minWindow(string s, string t) {
if ((s.size() < 1) || (t.size() < 1))
return "";
CharCounter counter(t);
pair<size_t, size_t> shortest = { 0, numeric_limits<size_t>::max() };
size_t beg = 0, end = 0;
while (end < s.size()) {
while ((end < s.size()) && (!counter.complete())) {
counter.add(s[end]);
if (counter.complete())
break;
end++;
}
while (beg < end) {
counter.subtract(s[beg]);
if (!counter.complete()) {
counter.add(s[beg]);
break;
}
beg++;
}
if (counter.complete()) {
if ((end - beg) < shortest.second - shortest.first) {
shortest = { beg, end };
if (shortest.second - shortest.first + 1 == t.size())
break;
}
if (end >= s.size() - 1)
break;
counter.subtract(s[beg++]);
end++;
}
}
return s.substr(shortest.first, shortest.second - shortest.first + 1);
}
};
The idea is simple: iterate over the source string (s) char by char using two "pointers", beg and end. Add every char encountered at end. As soon as all chars contained in t have been added, update the shortest interval. Then increment the left pointer beg and subtract the left char from the counter.
Example of usage:
int main()
{
Solution fnder;
string str = "figehaeci";
string chars = "aei";
string shortest = fnder.minWindow(str, chars); // returns "aeci"
}
The only purpose of the CharCounter is to count encountered chars contained in t.

Related

Cartesian product in Gray code order : including affected set in this order?

Having an excellent solution to: Cartesian product in Gray code order with itertools?, is there a way to add something simple to this solution to also report the set (its index) that underwent the change in going from one element to the next of the Cartesian product in Gray code order? That is, a gray_code_product_with_change(['a','b','c'], [0,1], ['x','y']) which would produce something like:
(('a',0,'x'), -1)
(('a',0,'y'), 2)
(('a',1,'y'), 1)
(('a',1,'x'), 2)
(('b',1,'x'), 0)
(('b',1,'y'), 2)
(('b',0,'y'), 1)
(('b',0,'x'), 2)
(('c',0,'x'), 0)
(('c',0,'y'), 2)
(('c',1,'y'), 1)
(('c',1,'x'), 2)
I want to avoid taking the "difference" between consecutive tuples, but to have constant-time updates --- hence the Gray code order thing to begin with. One solution could be to write an index_changed iterator, i.e., index_changed(3,2,2) would return the sequence -1,2,1,2,0,2,1,2,0,2,1,2 that I want, but can something even simpler be added to the solution above to achieve the same result?

There are several things wrong with this question, but I'll keep it like this, rather than only making it worse by turning it into a "chameleon question"
Indeed, why even ask for the elements of the Cartesian product in Gray code order, when you have this "index changed" sequence? So I suppose what I was really looking for was efficient computation of this sequence. So I ended up implementing the above-mentioned gray_code_product_with_change, which takes a base set of sets, e.g., ['a','b','c'], [0,1], ['x','y'], computing this "index changed" sequence, and updating this base set of sets as it moves through the sequence. Since the implementation ended up being more interesting than I thought, I figured I would share, should someone find it useful:
(Disclaimer: probably not the most pythonic code, rather almost C-like)
def gray_code_product_with_change(*args, repeat=1) :
    sets = args * repeat
    s = [len(x) - 1 for x in sets]
    n = len(s)
    # setup parity array and first combination
    p = n * [True] # True: move forward (False: move backward)
    c = n * [0] # initial combo: all 0's (first element of each set)
    # emit the first combination
    yield tuple(sets[i][x] for i, x in enumerate(c))
    # incrementally update combination in Gray code order
    has_next = True
    while has_next :
        # look for the smallest index to increment/decrement
        has_next = False
        for j in range(n-1,-1,-1) :
            if p[j] : # currently moving forward..
                if c[j] < s[j] :
                    c[j] += 1
                    has_next = True
                    # emit affected set (forward direction)
                    yield j
            else : # ..moving backward
                if c[j] > 0 :
                    c[j] -= 1
                    has_next = True
                    # emit affected set (reverse direction)
                    yield -j
            # we did manage to increment/decrement at position j..
            if has_next :
                # emit the combination
                yield tuple(sets[i][x] for i, x in enumerate(c))
                for q in range(n-1,j,-1) : # cascade
                    p[q] = not p[q]
                break
Trying to tease out as much performance as I could in just computing this sequence --- since the number of elements in the Cartesian product of a set of sets grows exponentially with the number of sets (of size 2 or more) --- I implemented this in C. What it essentially does, is implement the above-mentioned index_changed (using a slightly different notation):
(Disclaimer: there is much room for optimization here)
void gray_code_sequence(int s[], int n) {
// set up parity array
int p[n];
for(int i = 0; i < n; ++i) {
p[i] = 1; // 1: move forward, (-1: move backward)
}
// initialize / emit first combination
int c[n];
printf("(");
for(int i = 0; i < n-1; ++i) {
c[i] = 0; // initial combo: all 0s (first element of each set)
printf("%d, ", c[i]); // emit the first combination
}
c[n-1] = 0;
printf("%d)\n", c[n-1]);
int has_next = 1;
while(has_next) {
// look for the smallest index to increment/decrement
has_next = 0;
for(int j = n-1; j >= 0; --j) {
if(p[j] > 0) { // currently moving forward..
if(c[j] < s[j]) {
c[j] += 1;
has_next = 1;
printf("%d\n", j);
}
}
else { // ..moving backward
if(c[j] > 0) {
c[j] -= 1;
has_next = 1;
printf("%d\n", -j);
}
}
if(has_next) {
for(int q = n-1; q > j; --q) {
p[q] = -1 * p[q]; // cascade
}
break;
}
}
}
}
When compared to the above python (where the yielding of the elements of the Cartesian product is suppressed, and only the elements of the sequence are yielded, so that the output is essentially the same, for a fair comparison), this C implementation seems to be about 15 times as fast, asymptotically.
Again, this C code could be highly optimized (the irony that the python code is so C-like being well-noted). For example, the parity array could be stored in a single int type, using bit shift >> operations, etc., so I bet that even a 30 or 40x speedup could be achieved.
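As a rough illustration of that last idea, here is a Python sketch of index_changed that keeps the directions in the bits of one int instead of a parity array. It yields only the index that changed (with -1 for the first combination, as in the question), not the signed value that the code above emits, and the bit layout (bit j set means position j is moving backward) is an arbitrary choice for this sketch.

```python
def index_changed(*sizes):
    """Yield the index of the set that changes at each step of the Gray code order."""
    n = len(sizes)
    c = [0] * n            # current position within each set
    backward = 0           # bit j set: position j is currently moving backward
    all_bits = (1 << n) - 1
    yield -1               # first combination: nothing has changed yet
    while True:
        for j in range(n - 1, -1, -1):
            if (backward >> j) & 1:
                if c[j] > 0:
                    c[j] -= 1
                    break
            elif c[j] < sizes[j] - 1:
                c[j] += 1
                break
        else:
            return         # no position could move: sequence is finished
        # cascade: flip the direction of every position to the right of j
        backward ^= all_bits & ~((1 << (j + 1)) - 1)
        yield j

print(list(index_changed(3, 2, 2)))
# [-1, 2, 1, 2, 0, 2, 1, 2, 0, 2, 1, 2]
```

The cascade becomes a single XOR with a mask instead of a loop over the parity array, which is where the hoped-for speedup would come from in C.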

PowerSum time out test case

I need your help: one test case fails with a timeout. Can anyone help me reduce the running time of this code? The problem is from the HackerRank website; if anyone needs more explanation, I will link the problem in the comments below.
from itertools import combinations
def powerSum(X, N,n=1,poss=[]):
    if(n**N <= X):
        poss.append(n)
        n+=1
        rslt = powerSum(X,N,n,poss)
    else:
        tmp=[]
        for _ in range(len(poss)):
            oc=combinations(poss,_+1)
            for x in oc:
                ok = sum([num**N for num in x])
                if(ok == X):
                    tmp.append(ok)
        return len(tmp)
    return rslt

I am not good at Python, but I hope the Java code below is easy to follow. This is indirectly a variation of the subset sum problem, a dynamic programming problem in which you count the number of ways to reach a given sum from an array of values. Before applying subset sum, I build the list of numbers that can be used in the sum, stopping at the first natural number whose kth power exceeds x: every later natural number has an even larger kth power, so there is no need to keep them. After that it is just the dynamic programming problem mentioned above: the list holds the kth powers of the valid natural numbers, and we count the different ways to get the sum x using those values.
The code below should make this clearer:
import java.util.*;
public class Main {
public static int find_it(int x , int n , List<Integer> a , int [][] dp){
for(int i = 0; i < n; ++i){
dp[i][0] = 1;
}
for(int i = 1; i <= n; ++i){
for(int j = 1; j <= x; ++j){
dp[i][j] += dp[i - 1][j];
if(j - a.get(i - 1) >= 0){
dp[i][j] += dp[i - 1][j - a.get(i - 1)];
}
}
}
return dp[n][x];
}
public static void main(String [] args){
Scanner input = new Scanner(System.in);
int x = input.nextInt() , k = input.nextInt();
List<Integer> a = new ArrayList<>();
for(int i = 1; ; ++i){
double value = Math.pow(i , k);
if(value > x){
break;
}
a.add((int)value);
}
int n = a.size();
int [][]dp = new int[n + 1][x + 1];
int answer = find_it(x , n , a , dp);
System.out.println(answer);
input.close();
}
}
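Since the question is in Python, here is a minimal sketch of the same counting idea in Python. It is a one-dimensional variant of the table rather than a line-by-line translation of the Java code, and the name power_sum is only for illustration. Each kth power may be used at most once, so the sums are updated from high to low.

```python
def power_sum(x, k):
    """Count the ways to write x as a sum of k-th powers of distinct positive integers."""
    # ways[s] = number of subsets of the powers seen so far that sum to s
    ways = [0] * (x + 1)
    ways[0] = 1                      # the empty subset
    base = 1
    while base ** k <= x:
        p = base ** k
        # Go downwards so each power is used at most once per subset.
        for s in range(x, p - 1, -1):
            ways[s] += ways[s - p]
        base += 1
    return ways[x]

print(power_sum(10, 2))    # 1, since 10 = 1^2 + 3^2
print(power_sum(100, 2))   # 3
```

This does roughly x work for each usable base, instead of enumerating every combination of bases as the code in the question does.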

I'm trying to understand how to print all the possible combinations of an array

i = start;
while(i <= end and end - i + 1 >= r - index):
    data[index] = arr[i];
    combinationUtil(arr, data, i + 1,
                    end, index + 1, r);
    i += 1;
I'm having a hard time understanding why the condition "end - i + 1 >= r - index" is needed. I've tried running the code with and without it, and it produced the same output. What is the edge case that makes this condition return False?
The full code is available here.

Try to group the variables into pieces that are easier to understand e.g.
int values_left_to_print = r - index; // (size of combination to be printed) - (current index into data)
int values_left_in_array = end - i + 1; // number of values left until the end of given arr
Now we can interpret it like this:
for (int i = start; i <= end && (values_left_in_array >= values_left_to_print); i++)
{
so if i is near the end of the given array and there are not enough values left to print a full combination, then the loop (and function) will stop. Let's look at an example:
Given
arr = {1,2,3,4}
n = 4; // size of arr
r = 3; // size of combination
The top-level function will start to form a combination with 1 and then with 2, resulting in (1,2,3), (1,2,4), (1,3,4), (2,3,4).
It will not try 3 and 4, because (values_left_in_array < values_left_to_print).
If the condition was not there, then the function would try 3 and 4, but the values in the sequence only ever increase in index from left to right in the given array, so those calls would print nothing: i reaches end before 3 values have been collected. The condition only skips wasted work, which is why the output is the same with and without it.
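To see what the condition buys you, here is a small Python sketch that runs the same recursion with and without it and counts the calls. The function combinations_with_calls and its prune flag are made up for this illustration; they are not from the linked code.

```python
def combinations_with_calls(arr, r, prune=True):
    """Return (list of r-combinations of arr, number of recursive calls made)."""
    result = []
    calls = 0
    data = [None] * r
    end = len(arr) - 1

    def util(start, index):
        nonlocal calls
        calls += 1
        if index == r:                # a full combination has been formed
            result.append(tuple(data))
            return
        i = start
        # With prune=True this is the loop condition from the question.
        while i <= end and (not prune or end - i + 1 >= r - index):
            data[index] = arr[i]
            util(i + 1, index + 1)
            i += 1

    util(0, 0)
    return result, calls

with_check, calls_with = combinations_with_calls([1, 2, 3, 4], 3, prune=True)
without_check, calls_without = combinations_with_calls([1, 2, 3, 4], 3, prune=False)
print(with_check == without_check)    # True
print(calls_with, calls_without)      # 10 15
```

The extra calls in the unpruned run are the dead ends that start too close to the end of the array to ever reach r values.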

Longest palindromic substring top down dynamic programming

Here is the algorithm for finding the longest palindromic substring of a given string s using bottom-up dynamic programming. The algorithm explores all substrings of each length j, for j from 1 to n, and checks whether each one is a valid palindrome. The resulting time and space complexity is O(n^2).
def longestPalindrome(s):
    n = len(s)
    if n < 2:
        return s
    P = [[False for _ in range(n)] for _ in range(n)]
    longest = s[0]
    # j is the length of palindrome
    for j in range(1, n+1):
        for i in range(n-j+1):
            # if length is less than 3, checking s[i] == s[i+j-1] is sufficient
            P[i][i+j-1] = s[i] == s[i+j-1] and (j < 3 or P[i+1][i+j-2])
            if P[i][i+j-1] and j > len(longest):
                longest = s[i:i+j]
    return longest
I am trying to implement the same algorithm in top-down approach with memoization.
Question:
Is it possible to convert this algorithm to top-down approach?
There are many questions about longest palindromic substring, but they are mostly using this bottom-up approach. The answer in https://stackoverflow.com/a/29959104/6217326 seems to be the closest to what I have in mind. But the answer seems to be using different algorithm from this one (and much slower).

Here is my recursive solution:
Start with i = 0, j = length - 1.
If (i, j) is a palindrome, then the max substring length is j - i + 1.
Otherwise, recurse on (i+1, j) and (i, j-1) and take the max of the two.
The code will explain more.
The code is in Java, but I hope it gives the idea of how to implement it. @zcadqe wanted the idea of how to implement this top-down; I gave the idea and, as a bonus, the Java code for better understanding. Anyone who knows Python can easily convert it!
public class LongestPalindromeSubstringWithSubStr {
static String str;
static int maxLen;
static int startLen;
static int endLen;
static int dp[][];// 0: not calculated. 1: from index i to j is palindrome. -1: not a palindrome
static boolean isPal(int i, int j) {
if (dp[i][j] != 0) {
System.out.println("Res found for i:" + i + " j: " + j);
return (dp[i][j] == 1);
}
if (i == j) {
dp[i][j] = 1;
return true;
}
if (i + 1 == j) {// len 2
if (str.charAt(i) == str.charAt(j)) {
dp[i][j] = 1;
return true;
}
dp[i][j] = -1;
return false;
}
if (str.charAt(i) == str.charAt(j)) {
boolean res = isPal(i + 1, j - 1);
dp[i][j] = (res) ? 1 : -1; // store -1, not 0, so a negative result is memoized too
return res;
}
dp[i][j] = -1;
return false;
}
// update if whole string from i to j is palindrome
static void longestPalCalc(int i, int j) {
if (isPal(i, j)) {
if (j - i + 1 > maxLen) {// update res
maxLen = j - i + 1;
startLen = i;
endLen = j;
}
} else {
longestPalCalc(i + 1, j);
longestPalCalc(i, j - 1);
}
}
public static void main(String[] args) {
str = "abadbbda";
dp = new int[str.length()][str.length()];
longestPalCalc(0, str.length() - 1);
System.out.println("Longest: " + maxLen);
System.out.println(str.subSequence(startLen, endLen + 1));
}
}

The problem with a top-down approach here is that it is hard to get the topological order right. You can't just run two for loops over (i, j) and add memoization: that order enumerates substrings, but it isn't the right topological order for palindromes, because a palindrome of length 3 always needs information about the palindrome inside it (of length 1 in this case). To know whether a _ _ a is a palindrome, you must first know whether _ _ is one. So the order you need is substrings of increasing length: x, x, xx, xx, xx, xxx, xxx, xxxx, xxxxx.
I'll post Top Down approach when I code or get one.

I tried to port Junaed's Java code to Python. It runs quite well on LeetCode but gets Memory Limit Exceeded on one of the test cases. If you can see a way to improve it, or if I missed something, please correct me.
from functools import lru_cache
def longestPalindrome(self, s: str) -> str:
    @lru_cache(maxsize=None)
    def dp(i,j):
        if i==j:
            return True
        if i+1==j:
            if s[i]==s[j]:
                return True
            return False
        if s[i]==s[j]:
            return dp(i+1,j-1)
        return False
    self.maxlen=0
    @lru_cache(maxsize=None)
    def dp2(i,j):
        if dp(i,j):
            if (j-i+1 > self.maxlen):
                self.maxlen=j-i+1
                self.ans=s[i:j+1]
        else:
            dp2(i+1,j)
            dp2(i,j-1)
    self.ans=""
    i=0
    j=len(s)-1
    dp2(i,j)
    return self.ans

This problem can be solved by adding memoization to the brute-force approach.
We need to generate each substring, which takes O(n^2) time, and
we need to check whether each generated substring is a palindrome, which takes an additional O(n),
so in total it is O(n^3) time.
Now, by storing the states we have already encountered, the time complexity can be reduced by a factor of n, so the total time complexity will be O(n^2).
Here's the solution:
class Solution:
    def longestPalindrome(self, s: str) -> str:
        memo = {}
        def isPalindrome(left,right):
            state = (left, right)
            if state in memo: return memo[state]
            if left >= right: return True
            if s[left] != s[right]: return False
            memo[state] = isPalindrome(left+1, right-1)
            return memo[state]
        N = len(s)
        result = ''
        for i in range(N):
            for j in range(i,N):
                if (j-i+1) > len(result) and isPalindrome(i,j):
                    result = s[i:j+1]
        return result

#include<iostream>
#include<string>
#include<vector>
using namespace std;
bool isPalindrome(string str, int startIdx, int stopIdx, vector<vector<int>>& T) {
const int i = startIdx;
const int j = stopIdx - 1;
if (i == (j + 1)) {
return true;
}
if (i >= j) {
return true; // i == j here: a single character is a palindrome
}
if (T[i][j] == -1) {
if (str[i] == str[j]) {
T[i][j] = isPalindrome(str, startIdx + 1, stopIdx - 1, T);
}
else {
T[i][j] = 0;
}
}
return (T[i][j] == 1);
}
string getLongestStr(string str, int startIdx, int stopIdx, vector<vector<int>>& T) {
if (isPalindrome(str, startIdx, stopIdx, T)) {
return str.substr(startIdx, (stopIdx - startIdx));
}
else {
string str1 = getLongestStr(str, startIdx + 1, stopIdx, T);
string str2 = getLongestStr(str, startIdx, stopIdx - 1, T);
return str1.size() > str2.size() ? str1 : str2;
}
return "";
}
string getLongestStr(string str) {
const int N = str.size();
vector<vector<int>> T(N, vector<int>(N, -1));
return getLongestStr(str, 0, N, T);
}
int main() {
string str = "forgeeksskeegfor";
//string str = "Geeks";
cout << getLongestStr(str) << endl;
return 0;
}

Find the unique element in an unordered array consisting of duplicates

For example, if L = [1,4,2,6,4,3,2,6,3], then we want 1 as the unique element. Here's pseudocode of what I had in mind:
initialize a dictionary to store number of occurrences of each element: ~O(n),
look through the dictionary to find the element whose value is 1: ~O(n)
This ensures that the total time complexity stays O(n). Does this seem like the right idea?
Also, if the array were sorted, how would the time complexity change? I'm thinking it would be some variation of binary search, which would reduce it to O(log n).

You can use collections.Counter
from collections import Counter
uniques = [k for k, cnt in Counter(L).items() if cnt == 1]
Complexity will always be O(n). You only ever need to traverse the list once (which is what Counter is doing). Sorting doesn't matter, since dictionary assignment is always O(1).

There is a very simple-looking solution that is O(n): XOR elements of your sequence together using the ^ operator. The end value of the variable will be the value of the unique number.
The proof is simple: XOR-ing a number with itself yields zero, so since each number except one contains its own duplicate, the net result of XOR-ing them all would be zero. XOR-ing the unique number with zero yields the number itself.
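A minimal sketch of the XOR idea in Python, assuming the elements are integers and every element except one occurs exactly twice (or, more generally, an even number of times). The name find_unique_xor is just for illustration.

```python
from functools import reduce
from operator import xor

def find_unique_xor(values):
    """XOR all values together; duplicated values cancel out to zero."""
    return reduce(xor, values, 0)

print(find_unique_xor([1, 4, 2, 6, 4, 3, 2, 6, 3]))   # 1
```

Unlike the dictionary approach, this needs only O(1) extra space, but it gives a meaningless answer if the assumption about the duplicates does not hold.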

Your outlined algorithm is basically correct, and it's what the Counter-based solution by @BrendanAbel does. I encourage you to implement the algorithm yourself without Counter as a good exercise.
You can't beat O(n) even if the array is sorted (unless the array is sorted by the number of occurrences!). The unique element could be anywhere in the array, and until you find it, you can't narrow down the search space (unlike binary search, where you can eliminate half of the remaining possibilities with each test).
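For reference, one way that exercise could look with a plain dict instead of Counter (the name find_uniques is only illustrative):

```python
def find_uniques(values):
    """Return the elements that occur exactly once, in order of first appearance."""
    counts = {}
    for v in values:                      # first pass: count occurrences, O(n)
        counts[v] = counts.get(v, 0) + 1
    # second pass over the dictionary: keep the keys with a count of 1, O(n)
    return [v for v, n in counts.items() if n == 1]

print(find_uniques([1, 4, 2, 6, 4, 3, 2, 6, 3]))   # [1]
```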

In the general case, where duplicates can be present any number of times, I don't think you can reduce the complexity below O(N), but for the special case outlined in dasblinkenlight's answer, one can do better.
If the array is already sorted and if duplicates are present an even number of times as is the case in the simple example shown, you can find the unique element in O(log N) time with a binary search. You will search for the position where a[2*n] != a[2*n+1]:
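size_t find_unique_index(type *array, size_t size) {
    /* size is odd: size / 2 pairs plus the one unique element */
    size_t a = 0, b = size / 2;
    while (a < b) {
        size_t m = (a + b) / 2;
        if (array[2 * m] == array[2 * m + 1]) {
            /* the unique element is in the right half */
            a = m + 1;
        } else {
            b = m;
        }
    }
    /* a is the first pair index that is out of step; the unique element is array[2 * a] */
    return 2 * a;
}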

You can use a variation of binary search if the array is already sorted. It will reduce the cost to O(lg N). You just have to search for the appropriate left and right positions. Here is a C/C++ implementation that counts the occurrences of a key (I am assuming the array is already sorted).
#include<stdio.h>
#include<stdlib.h>
// Input: Indices Range [l ... r)
// Invariant: A[l] <= key and A[r] > key
int GetRightPosition(int A[], int l, int r, int key)
{
int m;
while( r - l > 1 )
{
m = l + (r - l)/2;
if( A[m] <= key )
l = m;
else
r = m;
}
return l;
}
// Input: Indices Range (l ... r]
// Invariant: A[r] >= key and A[l] < key
int GetLeftPosition(int A[], int l, int r, int key)
{
int m;
while( r - l > 1 )
{
m = l + (r - l)/2;
if( A[m] >= key )
r = m;
else
l = m;
}
return r;
}
int CountOccurances(int A[], int size, int key)
{
// Observe boundary conditions
int left = GetLeftPosition(A, -1, size - 1, key); // range is (l ... r], so start at -1 and end at the last index
int right = GetRightPosition(A, 0, size, key);
return (A[left] == key && key == A[right])?
(right - left + 1) : 0;
}
int main() {
int arr[] = {1,1,1,2,2,2,3};
printf("%d",CountOccurances(arr,7,2));
return 0;
}
