Using greedy algorithm within two lists in Python - python

We observe a particular data sample explained by an int value n and two lists A and B, where the two lists contain integer element or elements ranging from 1 to n, and the elements in each list aren't repeated. (There could be the same element in both lists, however.)
n represents the size of the observed sample.
Elements in A represent the numbers that are 'taken out' from the sample. Hence, if n=5 and A=[2,3], the size of our resulting sample would be 3.
Elements in B represent the numbers that are 'put back into' the sample. The maximum size of the resulting sample cannot exceed n.
However, the elements in B can only be put back in if and only if there is an element in A that is either equal to the element in B, or one less or greater than the element in B. For example, if n=5, A=[2,3], B=[4], the size of our sample would be 4, as there exists an element in A that is one less than the element in B.
Finally, the elements in B are only considered once if they are 'put back in'. If n=5, A=[2,3,5], B=[3,4], even though the elements in B satisfy the condition twice each, the size of the resulting sample would still be 4.
Some of the test cases are given:
n   A        B            return
5   [2, 4]   [1, 3, 5]    5
5   [2, 4]   [3]          4
3   [3]      [1]          2
I'm aware that this is a type of a greedy algorithm (which I am not super familiar with), but I also tried the following:
def solution(n, A, B):
    count = n - len(A)
    for i in range(len(B)):
        if B[i]-1 in A:
            count += 1
        elif B[i]+1 in A:
            count += 1
        elif B[i] in A:
            count += 1
        else:
            count += 0
    if n > count:
        answer = count
    else:
        answer = n
    return answer
While this seemingly works, it doesn't take into account that the elements in B cannot be considered once they are put back in already. Is there any edit I can make to my code, and how would this problem be optimally solved?

I guess the key was to use set() to first get the sets without any overlapping elements, and then remove each element of A as soon as it has been matched (which is done similarly to my initial code).
def solution(n, A, B):
    B_uniq = set(B)-set(A)
    A_uniq = set(A)-set(B)
    for i in B_uniq:
        if i-1 in A_uniq:
            A_uniq.remove(i-1)
        elif i+1 in A_uniq:
            A_uniq.remove(i+1)
    return n-len(A_uniq)
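One caveat with the loop above: Python does not guarantee that a set is iterated in ascending order, and the greedy choice depends on that order. With A=[2,4] and B=[1,3], visiting 3 before 1 would use up 2 and leave 1 without a neighbour. A minimal sketch of the same idea that sorts the candidates first, so the lower neighbour is always tried first (the name solution_sorted is hypothetical, not from the question):

```python
def solution_sorted(n, A, B):
    # Elements present in both lists cancel out immediately.
    taken = set(A) - set(B)   # still missing from the sample
    spare = set(B) - set(A)   # still available to put back

    # Visit candidates in ascending order; prefer the lower neighbour so
    # the higher one stays available for the next candidate.
    for b in sorted(spare):
        if b - 1 in taken:
            taken.remove(b - 1)
        elif b + 1 in taken:
            taken.remove(b + 1)
    return n - len(taken)


print(solution_sorted(5, [2, 4], [1, 3, 5]))
print(solution_sorted(5, [2, 4], [3]))
print(solution_sorted(3, [3], [1]))
```

Sorting costs O(len(B) log len(B)); the set lookups and removals are O(1) on average.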

Related

Function that returns the length of the longest run of repetition in a given list

I'm trying to write a function that returns the length of the longest run of repetition in a given list
Here is my code:
def longest_repetition(a):
    longest = 0
    j = 0
    run2 = 0
    while j <= len(a)-1:
        for i in a:
            run = a.count(a[j] == i)
            if run == 1:
                run2 += 1
        if run2 > longest:
            longest = run2
        j += 1
        run2 = 0
    return longest
print(longest_repetition([4,1,2,4,7,9,4]))
print(longest_repetition([5,3,5,6,9,4,4,4,4]))
3
0
The first test function works fine, but the second test function is not counting at all and I'm not sure why. Any insight is much appreciated
Just noticed that the question I was given and the expected results are not consistent. So what I'm basically trying to do is find the most repeated element in a list and the output would be the number of times it is repeated. That said, the output for the second test function should be 4 because the element '4' is repeated four times (elements are not required to be in one run as implied in my original question)
First of all, let's check if you were consistent with your question (function that returns the length of the longest run of repetition):
e.g.:
a = [4,1,2,4,7,9,4]
b = [5,3,5,6,9,4,4,4,4]
(assuming you are only checking single positions, e.g. c = [1,2,3,1,2,3] could have one repetition of the sequence 1,2,3 - I am assuming that is not your goal)
So:
for a, there are no repetitions of the same value, therefore the length equals 0
for b, you have one quadruple repetition of 4, therefore the length equals 4
First, set max_amount_of_repetitions=0 and current_repetitions_run=0. To detect a repetition, simply check whether the values of the (n-1)'th and n'th elements are the same. If so, increment current_repetitions_run; otherwise, reset current_repetitions_run=0.
The last step is to check whether your current run is the longest of all:
max_amount_of_repetitions = max(max_amount_of_repetitions, current_repetitions_run)
To make sure both n-1 and n stay within your list range, I'd simply start the iteration from the second element. That way, n-1 is the first element.
for n in range(1,len(a)):
    if a[n-1] == a[n]:
        print("I am sure, you can figure out the rest")
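For reference, the steps above could be completed roughly as follows. This is only a sketch: the function name longest_run is hypothetical, and it assumes the convention from the examples above, where four equal neighbours count as a run of 4 and a list without adjacent repeats gives 0. To get that, the run counter restarts at 1 instead of 0.

```python
def longest_run(a):
    longest = 0   # stays 0 when no two neighbours are equal
    current = 1   # length of the run ending at the current element
    for n in range(1, len(a)):
        if a[n - 1] == a[n]:
            current += 1
            longest = max(longest, current)
        else:
            current = 1   # the run is broken, start counting again
    return longest


print(longest_run([4, 1, 2, 4, 7, 9, 4]))
print(longest_run([5, 3, 5, 6, 9, 4, 4, 4, 4]))
```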
You can use a hash map to calculate the frequency of each element and then take the max of the frequencies.
Using a functional approach:
from collections import Counter
def longest_repitition(array):
    return max(Counter(array).values())
Another way, without using Counter:
def longest_repitition(array):
    freq = {}
    for val in array:
        if val not in freq:
            freq[val] = 0
        freq[val] += 1
    values = freq.values()
    return max(values)

When making comparison between the elements in a list, how to efficiently iterate and improve the time complexity from O(n^2)?

I have a list where I would like to compare each element of the list with each other. I know we can do that using a nested loop but the time complexity is O(n^2). Is there any option to improve the time complexity and make the comparisons efficient?
For example:
I have a list where I would like to find the difference in digits among each element. Consider a list array=[100,110,010,011,100] where I am trying to find the difference in the digits among each integer. array[0] is same as array[4] (i.e 100 and 100), while array[0] has 1 digit that is different from array[1] (i.e 100 and 110) and array[0] has 3 digits that are different from array[3] (i.e 100 and 011). Assuming similar integers are defined as integers that have either identical or the difference in digits is just 1, I would like to return a list as output, where every element denotes the integers with similar digits (i.e difference in digits <=1).
For the input list array=[100,110,010,011,100], my expected output should be [2,3,2,1,2]. In the output list, the output[0] indicates that array[0] is similar to array[1] and array[4] (i.e similar to 100 , we have 2 other integers 110,100 in the list)
This is my code that works, though very inefficient O(n^2):
def diff(a,b):
    difference= [i for i in range(len(a)) if a[i]!=b[i]]
    return len(difference)
def find_similarity_int(array):
    # write your code in Python 3.6
    res=[0]*len(array)
    string=[]
    for n in array:
        string.append(str(n))
    for i in range(0,len(string)):
        for j in range(i+1,len(string)):
            count=diff(string[i],string[j])
            if(count<=1):
                res[i]=res[i]+1
                res[j]=res[j]+1
    return res
input_list=['100','110','010','011','100']
output=find_similarity_int(input_list)
print("The similarity metrics for the given list is : ",output)
Output:
The similarity metrics for the given list is : [2, 3, 2, 1, 2]
Could anyone please suggest an efficient way to make the comparison, preferably with just 1 loop? Thanks!
If the values are binary digits only, you can get an O(n×m) solution (where m is the width of the values) using a multiset (Counter from collections). Using the counts of the values in the multiset, add up the counts of the items that differ from each number by exactly one bit (plus the number of duplicates):
from collections import Counter
def simCount(L):
    counts = Counter(L)                     # multiset of distinct values / count
    result = []
    for n in L:
        r = counts[n]-1                     # duplicates
        for i,b in enumerate(n):            # 1 bit changes
            r += counts[n[:i]+"01"[b=="0"]+n[i+1:]]  # count others
        result.append(r)                    # sum of similars
    return result
Output:
A = ['100','110','010','011','100']
print(simCount(A)) # [2, 3, 2, 1, 2]
To avoid the string manipulations on every item, you can convert them to integers and use bitwise operators to make the 1-bit changes:
from collections import Counter
def simCount(L):
    bits = [1<<i for i in range(len(L[0]))] # bit masks
    L = [int(n,2) for n in L]               # numeric values
    counts = Counter(L)                     # multiset n:count
    result = []
    for n in L:
        result.append(counts[n]-1)          # duplicates
        for b in bits:                      # 1 bit changes
            result[-1] += counts[b^n]       # sum similars
    return result
A = ['100','110','010','011','100']
print(simCount(A)) # [2, 3, 2, 1, 2]

Reducing nested-loops of a python question on array

for _ in range(int(input())):
    num=int(input())
    a=list(map(int,input().split()))[:num]
    sum=0
    for i in range(len(a)):
        j=a[i]
        count=0
        key=0
        for k in range(len(a)):
            if j==a[k]:
                key=k
        sum+=abs(key-i)
    print(sum)
Given an integer array, the task is to calculate, for every integer present in the array, the absolute difference between the indices of its first and last occurrence.
The required answer is the sum of these differences over every such integer that occurs in the array.
Sample Input:
1 2 3 3 2
Sample Output:
4
Explanation: The elements which occur in the array are 1, 2, 3.
1: it has only occurred once, so the answer for 1 is 0.
2: it has two occurrences, at 2 and 5, so |5-2|=3.
3: it has two occurrences, at 3 and 4, so |4-3|=1.
So total sum=0+3+1=4.
P.S.: The first loop is for test cases.
Please suggest how I can reduce the time complexity.
Initially you can create a dictionary of the unique numbers and append every index of each number to it; then, in a second loop, you can get the index difference for each integer.
for _ in range(int(input())):
    num=int(input())
    a=list(map(int,input().split()))[:num]
    sum=0
    nums = {}
    for i in range(len(a)):
        j=a[i]
        if j not in nums:
            nums[j] = []
        nums[j].append(i)
    for key in nums:
        sum += abs(nums[key][-1] - nums[key][0])
    print(sum)
This answer uses the same reasoning as others: that is storing the indices as a list of values in a dictionary, but uses a few built-in functions and methods to reduce code and make it 'cleaner'.
In [11]: array = [1, 2, 3, 3, 2]
In [12]: indices = {}
In [13]: for ix, num in enumerate(array, start=1):
    ...:     indices.setdefault(num, []).append(ix)
    ...:
In [14]: total = 0
In [15]: for num, ixes in indices.items():
    ...:     if len(ixes) == 1:
    ...:         continue
    ...:     else:
    ...:         total += abs(ixes[-1] - ixes[0])
    ...:
In [16]: total
Out[16]: 4
enumerate is a function that creates a sequence of tuple pairs from a given sequence like a list. The first element is an "index" (by default, set to 0, but you can start from any integer) and the second is the actual value from the original sequence.
setdefault is a method on a dictionary that returns the value for a given key, but if that key doesn't exist, inserts the key and sets as its default value the item passed in as the second parameter; in this case, it's an empty list to store the indices.
items is again a method on dictionaries with which one can loop through one key-value pair at a time.
Sounds like HackerRank. As usual, most of the provided information of the problem is irrelevant and can be forgotten as soon as seen.
You need:
the index when an element occurs first: you add it as a negative to the total and put it into the dictionary
if the value is already in the dict, update the position only
at the end you sum all values of the dict and add that to your total
Code:
num = 5
a = list(map(int,"1 2 3 3 2".split()))
s = 0
d = {}
for idx, num in enumerate(a):
    if num not in d:
        s -= idx
    d[num] = idx
print(s+sum(d.values()))
Output:
4
This uses a dictionary and strictly 2 loops: one over the n given numbers and one over the u distinct numbers among them, if you ignore the int-conversion step, which already loops once over all of them.
Space:
the total sum and 1 index for each unique number, which makes it worst case O(n+1) in space (when every number is unique)
Time:
normally you get O(n + u), which is less than the worst case (all numbers are unique), which would be O(2*n). 2 is only a constant factor, so it is essentially O(n) in time.
If you still have time problems, using a collections.defaultdict(int) might be faster.
Solution 1 (dict):
One way to do it is by using a dictionary that saves all indices of each item, and then taking the difference between the last and first occurrence of each item. See below:
def get_sum(a):
    d={i:[] for i in set(a)}
    for i in range(len(a)):
        d[a[i]].append(i)
    sum=0
    for i in d.values():
        sum+=i[-1]-i[0]
    return sum
Solution 2 (reversed list):
Another way is to use the reversed list and use list.index(item) for both original and reverse list and get the difference. See below:
def get_sum2(a):
    m=a[::-1]
    return sum(len(m)-m.index(i)-1-a.index(i) for i in set(a))
Output:
>>>get_sum([1,2,3,3,2])
4
>>>get_sum2([1,2,3,3,2])
4

Find number of subset that satisfy these two conditions?

def findNumber(N,A,B):
    return count
Count is the total number of subsets of the array [1,2,3,...,N] satisfying these conditions:
1. All subsets should be contiguous.
2. No subset should contain both A[i] and B[i] (order doesn't matter).
Example
N = 3, A=[2,1,3], B=[3,3,1]
All subsets = [1],[2],[3],[1,2],[2,3],[1,2,3]
Invalid subsets = [2,3], because A[0] and B[0] are in it, and [1,2,3], because it contains A[1],B[1] and A[2],B[2]
so count will be 4.
I was able to figure out that the total number of contiguous subsets will be N(N+1)/2, but I got stuck on how to satisfy condition 2.
I tried explaining it as best as I could; please ask for clarification if needed.
EDIT
def findallvalid(n,a,b):
    for w in range(1, n+1):
        for i in range(n-w+1):
            if not((a[0],b[0]) in (i+1,i+w+1)):
                yield range(i+1,i+w+1)
I tried this code, but I don't know how to iterate over all values of a and b without making this very slow. It's already too slow for n>10^2.
1<=n<=10^5
1<=len(A)<=10^6
I'm interested in how to approach this problem without generating subsets, for example I found total contiguous subsets will be n(n+1)/2 I just want to know how to know number of subsets to rule out.
That gave me an idea - indeed it is quite simple to compute the number of subsets ruled out by a single pair (A[i], B[i]). It is a little more challenging to do for multiple pairs, since the excluded subsets can overlap, so just subtracting a number for each pair won't work. What works is to have a set of the numbers or indexes of all N(N+1)/2 subsets, and remove the indexes of the excluded subsets from it; at the end, the cardinality of the reduced index set is the wanted number of remaining subsets.
def findNumber(N, A, B):
    count = N*(N+1)//2
    powerset = set(range(count)) # set of enumeration of possible intervals
    for a, b in zip(A, B):
        if a > b: a, b = b, a # let a be the lower number
        # when a and b are in a subset, they form a sub-subset of length "span"
        span = (b-a)+1
        start = 0 # index where the intervals of current length w begin
        for w in range(1, N+1): # for all interval lengths w
            if span <= w: # if a and b can be in interval of length w
                first = 0 if b <= w else b-w # index of first containment
                # end (exclusive) of containment, capped at the number of
                # intervals of length w so it cannot spill into the next length
                last = min(a, N+1-w)
                # remove the intervals containing a and b from the enumeration
                powerset -= set(range(start+first, start+last))
            start += N+1-w # compute start index of next length w
    return len(powerset) # number of remaining intervals
I made some small modifications to your code.
This code is still really slow, because it iterates over the entire list, which can consist of 10^5 items, and performs nested operations, which makes the complexity skyrocket up to 10^10.
from collections import defaultdict
def findallvalid(N,A,B):
    a_in = defaultdict(list)
    b_in = defaultdict(list)
    for idx, a in enumerate(A):
        a_in[a].append(idx)
    for idx, b in enumerate(B):
        b_in[b].append(idx)
    def diff_elem_index(subset):
        indecies = []
        for elem in subset:
            indecies.extend(a_in[elem])
            indecies.extend(b_in[elem])
        return len(set(indecies)) == len(indecies)
    for set_window in range(1, N+1):
        for start_idx in range(N - set_window + 1):
            sett = list(range(start_idx+1,start_idx + set_window + 1))
            if diff_elem_index(sett):
                yield sett
My best guess: since the code only needs to return the count of items,
it can be solved mathematically.
The number of contiguous subsets of an N-size list is (N*(N+1))/2 (plus 1 if the empty subset is counted).
After that you need to deduct the count of the subsets that don't comply with the second condition, which can be figured out from lists A and B.
I think calculating the count of excluded subsets from lists A and B will be much more efficient than going through all subsets from 1 to N.
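One possible way to do that counting without generating any subsets is sketched below. It assumes the values in A and B are 1-based numbers in 1..N, and the name count_valid_intervals is made up for this illustration. The idea: for each right endpoint r, keep the smallest left endpoint for which [left..r] contains no forbidden pair. A pair (lo, hi) with hi <= r forces left > lo, so left only ever moves to the right, and the number of valid subsets ending at r is r - left + 1.

```python
def count_valid_intervals(N, A, B):
    # block[hi] = largest lower endpoint among the pairs whose upper
    # endpoint is hi (0 means no pair ends at hi).
    block = [0] * (N + 1)
    for a, b in zip(A, B):
        lo, hi = min(a, b), max(a, b)
        block[hi] = max(block[hi], lo)

    count = 0
    left = 1  # smallest allowed left endpoint for the current r
    for r in range(1, N + 1):
        left = max(left, block[r] + 1)
        count += r - left + 1  # valid intervals [left..r], ..., [r..r]
    return count


print(count_valid_intervals(3, [2, 1, 3], [3, 3, 1]))
```

This runs in O(N + len(A)) time and O(N) extra space, which should fit the stated limits of N up to 10^5 and len(A) up to 10^6.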

Why is bitwise operator needed in this powerset generator?

I am currently following MITx's 6.00.2x, and we are asked to come up with a variant of the power set generator at the bottom.
But before I can work on the variant, I do not even understand what's going on with the given generator. Specifically:
What does (i >> j) % 2 == 1, and in fact the whole for j in range(N): block do? I understand that i >> j shifts
the binary of i by j, then returns the decimal representation of
that shifted binary number. But I have absolutely no clue why binary
is even needed in a powerset generator in the first place, let alone
the necessity of this conditional.
I understand that for any given set A a cardinality n, the
cardinality of its powerset is 2**n - because for every subset of A
every member is either in or not, and we repeat that for n times.
Is that what for i in range(2**N): is doing? i.e. going over 2**n subsets and either include or not include any given member of the set?
I tried running it with items=['apple','banana','orange'] and items=[1,2,3], and both returned an empty list, which makes it all the more confusing.
def powerSet(items):
    # generate all combinations of N items, items is a Python list
    N = len(items)
    # enumerate the 2**N possible combinations
    for i in range(2**N):
        combo = []
        for j in range(N):
            # test bit jth of integer i
            if (i >> j) % 2 == 1:
                combo.append(items[j])
        return combo
So the algorithm here starts with an observation that any subset of {1,...,N} can be seen as a function f:{1,...,N}->{0,1}, i.e. the characteristic function. How does it work? Well, if A is a subset of {1,...,N} then f is given by f(x)=0 if x not in A and f(x)=1 otherwise.
Now another observation is that any function f:{1,...,N}->{0,1} can be encoded as a binary number of N bits: j-th bit is 1 if f(j)=1 and 0 otherwise.
And so if we want to generate all subsets of {1,..,N} it is enough to generate all binary numbers of length N. So how many such numbers are there? Of course 2**N. And since every number between 0 and 2**N - 1 (-1 since we count from 0) uniquely corresponds to some subset of {1,...,N} then we can simply loop through them. That's where the for i in range(2**N): loop comes from.
But we don't simply deal with subsets of {1,...,N}, we actually have some unknown set/list items of length N. So if A is a subset of {1,...,N}, meaning A is a number between 0 and 2**N - 1 then how do we convert it to a subset of items? Well, again, we use the fact that the bit 1 corresponds to "is in set" and the bit 0 corresponds to "is not in set". And that's where (i >> j) % 2 == 1 comes from. It simply means "if j-th bit is 1" which in the consequence leads to "j-th element should be in the subset".
There's a slight issue with your code. You should maybe yield instead of return:
def powerSet(items):
    N = len(items)
    for i in range(2**N):
        combo = [] # <-- this is our subset
        for j in range(N):
            if (i >> j) % 2 == 1:
                combo.append(items[j])
        yield combo # <-- here we yield it to caller
subsets = list(powerSet(["apple", "banana", "pear"]))
Here's an example of this binary encoding of subsets. Say you have a list
["apple", "banana", "pear"]
It has 3 elements so we are looking at numbers of (binary) length 3. So here are all possible subsets and their encodings in the "loop" order:
000 == []
001 == ["apple"]
010 == ["banana"]
011 == ["apple", "banana"]
100 == ["pear"]
101 == ["apple", "pear"]
110 == ["banana", "pear"]
111 == ["apple", "banana", "pear"]
Your code was basically creating new lists in every loop and not saving the previous results.
Here is the corrected code to get all combinations:
def powerSet(items):
    # generate all combinations of N items, items is a Python list
    N = len(items)
    # This will store the complete set of combinations
    outer_combo = []
    # enumerate the 2**N possible combinations
    for i in range(2**N):
        # This will store the intermediate sets
        inner_combo = []
        for j in range(N):
            # test bit jth of integer i
            if (i >> j) % 2 == 1:
                inner_combo.append(items[j])
        # Uncomment below to understand each step
        # print(inner_combo)
        # Add the intermediate set to final result
        outer_combo.append(inner_combo)
    return outer_combo
print(powerSet([1,2,3]))
# Output : [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]
Now let's come to your points:
Basically you are generating all numbers from 0 to (2**N)-1. So, in our example [1, 2, 3], i takes the values 0,1,2,3,4,5,6,7
The binary representations of these values are 000, 001, 010, 011, 100, 101, 110, 111 respectively
Using i>>j you shift the binary representation of i to the right by j positions, so that bit j of i ends up in the rightmost position.
Then using (i>>j)%2==1 you check whether that rightmost bit is 1
The second loop for j in range(N): helps in two ways. First, N here is not only the number of elements in the list, but also the number of relevant bits to look up in the operation (i>>j)%2==1. This is because the binary representation of an integer can have more bits in general, but the relevant bits here are the lowest N bits (remember the upper bound (2**N)-1?). Secondly, it shifts i by each of the N offsets in turn, so every one of those N bits gets tested.
An example is something like this. Take i=5, i.e. 101. Now j can have the values 0, 1, 2. In the first case, when j=0, the operation (i>>j)%2==1 will return True since the bit at the 0th position is 1. So, items[0], i.e. 1, is appended to the intermediate combination, i.e. we have [1] till now. Now j=1 and the operation (i>>j)%2==1 will return False since the bit at the 1st position is 0. So no element is added. Finally, when j=2, (i>>j)%2==1 will return True since the bit at the 2nd position is 1. Hence items[2], i.e. 3, is added to the intermediate result, i.e. the set now becomes [1, 3].
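The i=5 walk-through can be reproduced with a few lines. This is just an illustrative snippet; the variable names bits and subset are not part of the original generator.

```python
items = [1, 2, 3]
i = 5  # binary 101

# Bit j of i, read from the least significant bit (j = 0) upwards.
bits = [(i >> j) % 2 for j in range(len(items))]

# Keep items[j] exactly when bit j of i is set.
subset = [items[j] for j in range(len(items)) if (i >> j) % 2 == 1]

print(bits)
print(subset)
```

Here bits comes out as [1, 0, 1] and subset as [1, 3], matching the steps described above.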
