Remove elements that appear more often than once from numpy array - python

How can I completely remove elements that appear more than once in an array? The approach below works, but it is very slow for larger arrays.
Is there a way to do this the numpy way? Thanks in advance.
import numpy as np
count = 0
result = []
input = np.array([[1,1], [1,1], [2,3], [4,5], [1,1]]) # array with points [x, y]
# count appearance of elements with same x and y coordinate
# append to result if element appears just once
for i in input:
    for j in input:
        if (j[0] == i[0]) and (j[1] == i[1]):
            count += 1
    if count == 1:
        result.append(i)
    count = 0
print np.array(result)
UPDATE: BECAUSE OF FORMER OVERSIMPLIFICATION
To be clear: how can I remove elements that appear more than once with respect to a certain attribute from an array or list? Here I have a list whose elements have length 6; if the first and second entries of an element both match those of another element in the list, all of the matching elements should be removed. I hope this is not too confusing. Eumiro helped me a lot on this, but I can't manage to flatten the output list as it should be :(
import numpy as np
import collections
input = [[1,1,3,5,6,6],[1,1,4,4,5,6],[1,3,4,5,6,7],[3,4,6,7,7,6],[1,1,4,6,88,7],[3,3,3,3,3,3],[456,6,5,343,435,5]]
# here, from input there should be removed input[0], input[1] and input[4] because
# first and second entry appears more than once in the list, got it? :)
d = {}
for a in input:
    d.setdefault(tuple(a[:2]), []).append(a[2:])
outputDict = [list(k)+list(v) for k,v in d.iteritems() if len(v) == 1 ]
result = []
def flatten(x):
    if isinstance(x, collections.Iterable):
        return [a for i in x for a in flatten(i)]
    else:
        return [x]
# I took flatten(x) from http://stackoverflow.com/a/2158522/1132378
# And I need it, because output is a nested list :(
for i in outputDict:
    result.append(flatten(i))
print np.array(result)
So, this works, but it's impractical with big lists.
First I got
RuntimeError: maximum recursion depth exceeded in cmp
and after applying
sys.setrecursionlimit(10000)
I got
Segmentation fault
How could I implement Eumiro's solution for big lists with more than 100000 elements?

np.array(list(set(map(tuple, input))))
returns
array([[4, 5],
[2, 3],
[1, 1]])
UPDATE 1: If you want to remove the [1, 1] too (because it appears more than once), you can do:
from collections import Counter
np.array([k for k, v in Counter(map(tuple, input)).iteritems() if v == 1])
returns
array([[4, 5],
[2, 3]])
UPDATE 2: with input=[[1,1,2], [1,1,3], [2,3,4], [4,5,5], [1,1,7]]:
input=[[1,1,2], [1,1,3], [2,3,4], [4,5,5], [1,1,7]]
d = {}
for a in input:
    d.setdefault(tuple(a[:2]), []).append(a[2])
d is now:
{(1, 1): [2, 3, 7],
(2, 3): [4],
(4, 5): [5]}
so we want to take all key-value pairs, that have single values and re-create the arrays:
np.array([k+tuple(v) for k,v in d.iteritems() if len(v) == 1])
returns:
array([[4, 5, 5],
[2, 3, 4]])
UPDATE 3: For larger arrays, you can adapt my previous solution to:
import numpy as np
input = [[1,1,3,5,6,6],[1,1,4,4,5,6],[1,3,4,5,6,7],[3,4,6,7,7,6],[1,1,4,6,88,7],[3,3,3,3,3,3],[456,6,5,343,435,5]]
d = {}
for a in input:
    d.setdefault(tuple(a[:2]), []).append(a)
np.array([v for v in d.itervalues() if len(v) == 1])
returns:
array([[[456, 6, 5, 343, 435, 5]],
[[ 1, 3, 4, 5, 6, 7]],
[[ 3, 4, 6, 7, 7, 6]],
[[ 3, 3, 3, 3, 3, 3]]])
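The extra level of nesting in the output above comes from each dictionary value being a one-element list of rows. A minimal Python 3 sketch of the same idea that keeps the surviving rows flat and in their original order is shown here; it is not the answerer's code, and the names `rows`, `key_counts`, and `unique_rows` are assumed purely for illustration.

```python
from collections import Counter

rows = [[1, 1, 3, 5, 6, 6], [1, 1, 4, 4, 5, 6], [1, 3, 4, 5, 6, 7],
        [3, 4, 6, 7, 7, 6], [1, 1, 4, 6, 88, 7], [3, 3, 3, 3, 3, 3],
        [456, 6, 5, 343, 435, 5]]

# Count how often each (first, second) pair occurs across all rows.
key_counts = Counter(tuple(row[:2]) for row in rows)

# Keep only the rows whose key pair occurs exactly once.
# A single pass over rows preserves the original order.
unique_rows = [row for row in rows if key_counts[tuple(row[:2])] == 1]

print(unique_rows)
```

Both passes are linear in the number of rows and there is no recursion, so the recursion limit mentioned in the question should not come into play.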

This is a corrected, faster version of Hooked's answer. count_unique counts the number of occurrences of each unique key in keys.
import numpy as np
input = np.array([[1,1,3,5,6,6],
                  [1,1,4,4,5,6],
                  [1,3,4,5,6,7],
                  [3,4,6,7,7,6],
                  [1,1,4,6,88,7],
                  [3,3,3,3,3,3],
                  [456,6,5,343,435,5]])
def count_unique(keys):
    """Finds an index to each unique key (row) in keys and counts the number of
    occurrences for each key"""
    order = np.lexsort(keys.T)
    keys = keys[order]
    diff = np.ones(len(keys)+1, 'bool')
    diff[1:-1] = (keys[1:] != keys[:-1]).any(-1)
    count = np.where(diff)[0]
    count = count[1:] - count[:-1]
    ind = order[diff[1:]]
    return ind, count
key = input[:, :2]
ind, count = count_unique(key)
print key[ind]
#[[  1   1]
# [  1   3]
# [  3   3]
# [  3   4]
# [456   6]]
print count
#[3 1 1 1 1]
ind = ind[count == 1]
output = input[ind]
print output
#[[  1   3   4   5   6   7]
# [  3   3   3   3   3   3]
# [  3   4   6   7   7   6]
# [456   6   5 343 435   5]]
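On NumPy 1.13 or later, where `np.unique` accepts an `axis` argument, a similar result can be sketched without a hand-written `count_unique`. This is an alternative under that version assumption, not the answer's method; the names `rows`, `inverse`, `counts`, and `mask` are chosen here for illustration.

```python
import numpy as np

rows = np.array([[1, 1, 3, 5, 6, 6],
                 [1, 1, 4, 4, 5, 6],
                 [1, 3, 4, 5, 6, 7],
                 [3, 4, 6, 7, 7, 6],
                 [1, 1, 4, 6, 88, 7],
                 [3, 3, 3, 3, 3, 3],
                 [456, 6, 5, 343, 435, 5]])

# Treat the first two columns of every row as the key.
# inverse[i] is the index of row i's key among the unique keys,
# counts[j] is how often unique key j occurs.
_, inverse, counts = np.unique(rows[:, :2], axis=0,
                               return_inverse=True, return_counts=True)
inverse = inverse.ravel()  # guard against versions that return a 2-D inverse

# Keep the rows whose key occurs exactly once.
mask = counts[inverse] == 1
output = rows[mask]
print(output)
```

Because the rows are selected with a boolean mask, they come out in their original order rather than in sorted key order.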

Updated Solution:
From the comments below, the new solution is:
from numpy import argsort, where, sum, unique, concatenate, arange
# A is the input from the question as a numpy array
idx = argsort(A[:, 0:2], axis=0)[:,1]
kidx = where(sum(A[idx,:][:-1,0:2]!=A[idx,:][1:,0:2], axis=1)==0)[0]
kidx = unique(concatenate((kidx,kidx+1)))
for n in arange(0,A.shape[0],1):
    if n not in kidx:
        print A[idx,:][n]
> [1 3 4 5 6 7]
[3 3 3 3 3 3]
[3 4 6 7 7 6]
[456 6 5 343 435 5]
kidx is an index list of the rows you don't want. This preserves rows whose first two elements do not match the first two elements of any other row. Since everything is done with indexing, it should be fast(ish), though it requires a sort on the first two elements. Note that the original row order is not preserved, though I don't think this is a problem.
Old Solution:
If I understand it correctly, you simply want to filter out the results of a list of lists where the first element of each inner list is equal to the second element.
With your input from your update A=[[1,1,3,5,6,6],[1,1,4,4,5,6],[1,3,4,5,6,7],[3,4,6,7,7,6],[1,1,4,6,88,7],[3,3,3,3,3,3],[456,6,5,343,435,5]], the following line removes A[0],A[1] and A[4]. A[5] is also removed since that seems to match your criteria.
[x for x in A if x[0]!=x[1]]
If you can use numpy, there is a really slick way of doing the above. Assume that A is an array, then
A[A[:,0] != A[:,1]]
will pull out the same rows. This is probably faster than the list comprehension above.

Why not create another array to hold the output?
Iterate through your main list and for each i check if i is in your other array and if not append it.
This way, your new array will not contain more than one of each element

Related

Count how many permutations of list possible as long as it 'fits' into another list

I'm trying to find how many arrangements of a list are possible with each arrangement 'fitting' into another list (i.e. all elements of the arrangement have to be less than or equal to the corresponding element). For example, the list [1, 2, 3, 4] has to fit in the list [2, 4, 3, 4].
There are 8 possible arrangements in this case:
[1, 2, 3, 4]
[1, 4, 2, 3]
[1, 3, 2, 4]
[1, 4, 3, 2]
[2, 1, 3, 4]
[2, 4, 1, 3]
[2, 3, 1, 4]
[2, 4, 3, 1]
Because 3 and 4 cannot fit into the first slot of the list, all arrangements that start with 3 or 4 are cut out. Additionally, 4 cannot fit into the third slot, so any remaining arrangements with 4 in the third slot are removed.
This is my current code trying to brute-force the problem:
from itertools import permutations
x = [1, 2, 3, 4]
box = [2, 4, 3, 4] # this is the list we need to fit our arrangements into
counter = 0
for permutation in permutations(x):
    foo = True
    for i in range(len(permutation)):
        if permutation[i] > box[i]:
            foo = False
            break
    if foo:
        counter += 1
print(counter)
It works, but because I'm generating all the possible permutations of the first list, it's very slow, and I just can't find a better algorithm for it. I realize that it's basically a math problem, but I'm bad at math.
If you sort x in descending order, you can count the spots in the box where each element fits, one element at a time.
In your example:
4 has 2 spots it can go
3 has 3 spots, but you have to account for already placing the "4",
so you have 3 - 1 = 2 available
2 has 4 spots, but you have to account for already placing two things
(the "4" and "3"), so you have 4 - 2 = 2 available
1 has 4 spots, but you have already placed 3... so 4 - 3 = 1
The product 2 * 2 * 2 * 1 is 8.
Here's one way you can do that:
import numpy as np
counter = 1
for i, val in enumerate(reversed(sorted(x))):
    counter *= ( (val <= np.array(box)).sum() - i)
print(counter)
...or without numpy (and faster, actually):
counter = 1
for i, val in enumerate(reversed(sorted(x))):
    counter *= ( sum( ( val <= boxval for boxval in box)) - i)
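The counting argument above can be wrapped in a function and checked against brute force on the example from the question. This is a minimal sketch; the function name `count_fitting` is hypothetical, and clamping each factor at zero is an added assumption so that inputs with no valid arrangement return 0 instead of a negative or sign-flipped product.

```python
from itertools import permutations

def count_fitting(x, box):
    """Count arrangements of x in which every element is <= the box value at its position."""
    count = 1
    # Place the largest values first: they have the fewest candidate slots.
    for placed, val in enumerate(sorted(x, reverse=True)):
        slots = sum(1 for b in box if val <= b)
        # Each previously placed (larger or equal) value occupies one of these slots.
        count *= max(slots - placed, 0)
    return count

x = [1, 2, 3, 4]
box = [2, 4, 3, 4]

# Brute-force reference, only feasible for short lists.
brute = sum(all(p[i] <= box[i] for i in range(len(box))) for p in permutations(x))

print(count_fitting(x, box), brute)
```

The function does a number of comparisons quadratic in the list length, whereas the brute-force check enumerates all n! permutations.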
I've experimented a bit with timings and here's what I found:
Your original code
for permutation in permutations(x):
    foo = True
    for i in range(len(permutation)):
        if permutation[i] > box[i]:
            foo = False
            break
    if foo:
        counter += 1
Took about 13569 ns per run
Filtering the permutation
for i in range(100):
    res = len(list(filter(lambda perm: all([perm[i] <= box[i] for i in range(len(box))]), permutations(x))))
Took slightly longer at 16717 ns
Rick M
counter = 1
for i, val in enumerate(reversed(sorted(x))):
    counter *= ((val <= np.array(box)).sum() - i)
Took even longer at 20146 ns
Recursive Listcomprehension
def findPossiblities(possibleValues, box):
    return not box or sum([findPossiblities([rem for rem in possibleValues if rem != val], box[1:]) for val in [val for val in possibleValues if val <= box[0]]])
findPossiblities(x, box)
Even longer at 27052 ns.
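In conclusion, for this 4-element input, using itertools and filtering is probably the best option. Note that the approaches based on permutations enumerate all n! arrangements, so this ranking may not hold for longer lists.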

Unexpected behavior of python code while solving a hackerranks problem called Lily's Homework

Link to the problem: https://www.hackerrank.com/challenges/lilys-homework/forum
Summary: We have to find the minimum no. of swaps required to convert an array into sorted array. It can be sorted in ascending or descending order. So, here is the array I want to sort:
arr = [3, 4, 2, 5, 1]
Sorting it in ascending order takes 4 swaps; sorting it in descending order takes 2 swaps.
For descending: swap 5 and 3, and then swap 3 and 2.
Now, I have written a python code to solve this test case. Here is the code:
arr = [3, 4, 2, 5, 1]
arr2 = arr[:]
count = 0; count2 = 0; n = len(arr)
registry = {}
for i in range(n):
    registry[arr[i]] = i
sorted_arr = sorted(arr)
#######################first for loop starts#########################
#find no. of swap required when we sort arr is in ascending order.
for i in range(n-1):
    if arr[i] != sorted_arr[i]:
        index = registry[sorted_arr[i]]
        registry[sorted_arr[i]],registry[arr[i]]= i, index
        temp = arr[i]
        arr[i],arr[index]=sorted_arr[i],temp
        count = count + 1
###################first for loop ends#######################
# re-initalising registry and sorted_arr for descending problem.
registry = {}
for i in range(n):
    registry[arr2[i]] = i
sorted_arr = sorted(arr2)
sorted_arr.reverse()
print(arr2) #unsorted array
print(registry) #dictionary which stores the index of the array arr2
print(sorted_arr) #array in descending order.
#find no. of swap required when array is in descending order.
for i in range(n-1):
    print('For iteration i = %i' %i)
    if arr2[i] != sorted_arr[i]:
        print('\tTrue')
        index = registry[sorted_arr[i]]
        registry[sorted_arr[i]],registry[arr[i]]= i, index
        temp = arr2[i]
        arr2[i],arr2[index]=sorted_arr[i],temp
        print('\t '+ str(arr2))
        count2 = count2 + 1
    else:
        print('\tfalse')
        print('\t '+ str(arr2))
print('######Result######')
print(arr)
print(count)
print(arr2)
print(count2)
Here's the problem:
When I run the code, the second for loop i.e. the for loop for descending gives wrong value of count which is 3. But, when I comment the first for loop, i.e. the for loop for ascending it gives correct value of count which is 2.
I want to know why for loop 2 changes output when for loop 1 is present.
The output I get when loop 1 is NOT commented.
arr2: [3, 4, 2, 5, 1]
Registry: {3: 0, 4: 1, 2: 2, 5: 3, 1: 4}
sorted_arr: [5, 4, 3, 2, 1]
For iteration i = 0
True
[5, 4, 2, 3, 1]
For iteration i = 1
false
[5, 4, 2, 3, 1]
For iteration i = 2
True
[2, 4, 3, 3, 1]
For iteration i = 3
True
[2, 4, 3, 2, 1]
######Result######
[1, 2, 3, 4, 5]
4
[2, 4, 3, 2, 1]
3
The error is in your second loop, where you have:
registry[sorted_arr[i]],registry[arr[i]]= i, index
This should be:
registry[sorted_arr[i]],registry[arr2[i]]= i, index
Generally, it is a bad idea to work with paired variables such as arr and arr2. Instead, write two functions and pass arr as an argument to each call. Each function should then make a local copy of that array (arr[:]) before mutating it. All other variables should be local to the function. That way the two algorithms each use their own variable scope, and there is no risk of accidentally "leaking" a wrong variable into the other algorithm.
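One way to follow that advice is sketched below. Instead of two nearly identical functions, it uses a single helper with a `reverse` flag; that choice, and the names `count_swaps` and `min_swaps`, are assumptions made for illustration. The sketch also assumes all values are distinct, as in the example.

```python
def count_swaps(values, reverse=False):
    """Count the swaps needed to sort values (ascending, or descending if reverse)."""
    arr = list(values)                        # local copy; the caller's list is untouched
    target = sorted(arr, reverse=reverse)
    position = {v: i for i, v in enumerate(arr)}  # value -> current index
    swaps = 0
    for i, wanted in enumerate(target):
        if arr[i] != wanted:
            j = position[wanted]              # where the wanted value currently sits
            # Update the lookup table before swapping the two entries.
            position[arr[i]] = j
            position[wanted] = i
            arr[i], arr[j] = arr[j], arr[i]
            swaps += 1
    return swaps

def min_swaps(values):
    """Smaller of the ascending and descending swap counts."""
    return min(count_swaps(values), count_swaps(values, reverse=True))

data = [3, 4, 2, 5, 1]
print(count_swaps(data), count_swaps(data, reverse=True), min_swaps(data))
```

Since every name lives inside the function, the ascending run cannot affect the descending one, which is exactly the failure mode described in the question.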

Generate random array of integers with a number of appearance of each integer

I need to create a random array of 6 integers between 1 and 5 in Python, but I also have other data, say a=[2 2 3 1 2], which can be considered the capacity: 1 can occur no more than 2 times, 3 can occur no more than 3 times, and so on.
I need to set up a counter for each integer from 1 to 5 to make sure each integer is not generated by the random function more than a[i].
Here is the initial array I created in python but I need to find out how I can make sure about the condition I described above. For example, I don't need a solution like [2 1 5 4 5 4] where 4 is shown twice or [2 2 2 2 1 2].
solution = np.array([np.random.randint(1,6) for i in range(6)])
Even if I can add probability, that should work. Any help is appreciated on this.
You can create a pool of data in which each number appears as many times as its maximum count and then pick from there:
import numpy as np
a = [2, 2, 3, 1, 2]
data = [i + 1 for i, e in enumerate(a) for _ in range(e)]
print(data)
result = np.random.choice(data, 6, replace=False)
print(result)
Output
[1, 1, 2, 2, 3, 3, 3, 4, 5, 5]
[1 3 2 2 3 1]
Note that data is a list that contains each number exactly as many times as its specified count. Because we then sample from data without replacement, no number can appear more often than its count allows.
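The same pool idea can be sketched with only the standard library, together with a check that the capacities are respected. The function name `sample_with_capacity` and the `ValueError` guard are assumptions added for illustration, not part of the answer above.

```python
import random
from collections import Counter

def sample_with_capacity(capacity, size):
    """Draw `size` numbers from 1..len(capacity); number n appears at most capacity[n-1] times."""
    # Pool holds each number as many times as it is allowed to appear.
    pool = [value for value, cap in enumerate(capacity, start=1) for _ in range(cap)]
    if size > len(pool):
        raise ValueError("size exceeds the total capacity")
    # random.sample draws without replacement, so the capacities cannot be exceeded.
    return random.sample(pool, size)

capacity = [2, 2, 3, 1, 2]
result = sample_with_capacity(capacity, 6)
counts = Counter(result)
print(result, dict(counts))
```

`random.sample` returns the draws in random order, so no separate shuffle is needed.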
UPDATE
If you need that each number appears at least one time, you can start with a list of each of the numbers, sample from the rest and then shuffle:
import numpy as np
result = [1, 2, 3, 4, 5]
a = [1, 1, 2, 0, 1]
data = [i + 1 for i, e in enumerate(a) for _ in range(e)]
print(data)
result = result + np.random.choice(data, 1, replace=False).tolist()
np.random.shuffle(result)
print(result)
Output
[1, 2, 3, 3, 5]
[3, 4, 2, 5, 1, 2]
Notice that I subtracted 1 from each of the original values of a, and the original 6 was changed to 1 because the variable result already holds 5 numbers.
You could test your count against a dictionary
import random
a = [2, 2, 3, 1, 2]
d = {idx: item for idx,item in enumerate(a, start = 1)}
l = []
while len(set(l) ^ set([*range(1, 6)])) > 0:
    l = []
    while len(l) != 6:
        x = random.randint(1,5)
        while l.count(x) == d[x]:
            x = random.randint(1,5)
        l.append(x)
print(l)

Largest Subset whose sum is less than equal to a given sum

A list is defined as follows: [1, 2, 3]
and the sub-lists of this are:
[1], [2], [3],
[1,2]
[1,3]
[2,3]
[1,2,3]
Given K, for example 3, the task is to find the largest length of a sub-list whose sum of elements is less than or equal to K.
I am aware of itertools in Python, but it results in a segmentation fault for larger lists. Is there a more efficient algorithm to achieve this? Any help would be appreciated.
My code is as follows:
from itertools import combinations
def maxLength(a, k):
    #print a,k
    l= []
    i = len(a)
    while(i>=0):
        lst= list(combinations(sorted(a),i))
        for j in lst:
            #print list(j)
            lst = list(j)
            #print sum(lst)
            sum1=0
            sum1 = sum(lst)
            if sum1<=k:
                return len(lst)
        i=i-1
You can use the dynamic programming solution that @Apy linked to. Here's a Python example:
def largest_subset(items, k):
    res = 0
    # We can form subset with value 0 from empty set,
    # items[0], items[0...1], items[0...2]
    arr = [[True] * (len(items) + 1)]
    for i in range(1, k + 1):
        # Subset with value i can't be formed from empty set
        cur = [False] * (len(items) + 1)
        for j, val in enumerate(items, 1):
            # cur[j] is True if we can form a set with value of i from
            # items[0...j-1]
            # There are two possibilities
            # - Set can be formed already without even considering item[j-1]
            # - There is a subset with value i - val formed from items[0...j-2]
            cur[j] = cur[j-1] or ((i >= val) and arr[i-val][j-1])
        if cur[-1]:
            # If subset with value of i can be formed store
            # it as current result
            res = i
        arr.append(cur)
    return res
ITEMS = [5, 4, 1]
for i in range(sum(ITEMS) + 1):
    print('{} -> {}'.format(i, largest_subset(ITEMS, i)))
Output:
0 -> 0
1 -> 1
2 -> 1
3 -> 1
4 -> 4
5 -> 5
6 -> 6
7 -> 6
8 -> 6
9 -> 9
10 -> 10
In the code above, arr[i][j] is True if a subset with sum i can be chosen from items[0...j-1]. Naturally, arr[0] contains only True values, since the empty set can always be chosen. Similarly, in every later row the first cell is False, since the empty set cannot have a non-zero sum.
For the rest of the cells there are two options:
If there is already a subset with sum i even without considering items[j-1], the value is True.
If there is a subset with sum i - items[j-1], then we can add the item to it and get a subset with sum i.
As far as I can see (since you treat a sub array as any selection of items from the initial array), you can use a greedy algorithm with O(N*log(N)) complexity (you have to sort the array):
1. Assign the entire array to the sub array
2. If sum(sub array) <= k then stop and return the sub array
3. Remove the maximum item from the sub array
4. Go to 2
Example
[1, 2, 3, 5, 10, 25]
k = 12
Solution
sub array = [1, 2, 3, 5, 10, 25], sum = 46 > 12, remove 25
sub array = [1, 2, 3, 5, 10], sum = 21 > 12, remove 10
sub array = [1, 2, 3, 5], sum = 11 <= 12, stop and return
As an alternative, you can start with an empty sub array and add items from minimum to maximum while the sum is less than or equal to k:
sub array = [], sum = 0 <= 12, add 1
sub array = [1], sum = 1 <= 12, add 2
sub array = [1, 2], sum = 3 <= 12, add 3
sub array = [1, 2, 3], sum = 6 <= 12, add 5
sub array = [1, 2, 3, 5], sum = 11 <= 12, add 10
sub array = [1, 2, 3, 5, 10], sum = 21 > 12, stop,
return prior one: [1, 2, 3, 5]
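The second variant (adding items from smallest to largest) can be sketched in a few lines of Python. The function name `largest_sublist` is hypothetical, and the sketch assumes all items are non-negative, since the greedy argument relies on the sum never decreasing.

```python
def largest_sublist(items, k):
    """Return the longest selection of items whose sum is <= k (items assumed non-negative)."""
    chosen = []
    total = 0
    # Taking the smallest items first fits as many items as possible under k.
    for value in sorted(items):
        if total + value > k:
            break          # every remaining item is at least as large, so stop here
        chosen.append(value)
        total += value
    return chosen

print(largest_sublist([1, 2, 3, 5, 10, 25], 12))
print(len(largest_sublist([1, 2, 3], 3)))
```

The sort dominates the cost, which gives the O(N*log(N)) bound mentioned above.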
Look, for generating the power-set it takes O(2^n) time. It's pretty bad. You can instead use the dynamic programming approach.
Check in here for the algorithm.
http://www.geeksforgeeks.org/dynamic-programming-subset-sum-problem/
And yes, https://www.youtube.com/watch?v=s6FhG--P7z0 (Tushar explains everything well) :D
Assume everything is positive. (Handling negatives is a simple extension of this and is left to the reader as an exercise). There exists an O(n) algorithm for the described problem. Using the O(n) median select, we partition the array based on the median. We find the sum of the left side. If that is greater than k, then we cannot take all elements, we must thus recur on the left half to try to take a smaller set. Otherwise, we subtract the sum of the left half from k, then we recur on the right half to see how many more elements we can take.
Partitioning the array based on median select and recurring on only 1 of the halves yields a runtime of n+n/2 +n/4 +n/8.. which geometrically sums up to O(n).
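A rough sketch of this partition-and-recurse idea follows. Note the substitutions: it picks a random pivot instead of the deterministic O(n) median select, so the linear bound holds only in expectation, and it uses a loop rather than recursion. It also assumes positive integers. The name `max_count` is hypothetical.

```python
import random

def max_count(items, k):
    """Largest number of items (positive integers) whose sum is <= k, expected O(n)."""
    items = list(items)
    taken = 0
    while items:
        pivot = random.choice(items)
        smaller = [v for v in items if v < pivot]
        equal = [v for v in items if v == pivot]
        larger = [v for v in items if v > pivot]
        small_sum = sum(smaller)
        if small_sum > k:
            # Not even the smaller part fits: continue inside it only.
            items = smaller
            continue
        # The whole smaller part fits; take it and shrink the budget.
        taken += len(smaller)
        k -= small_sum
        # Take as many pivot-valued items as the remaining budget allows.
        fit = min(len(equal), k // pivot)
        taken += fit
        k -= fit * pivot
        if fit < len(equal):
            return taken   # budget is below pivot, so nothing larger fits either
        items = larger
    return taken

print(max_count([1, 2, 3, 5, 10, 25], 12))
```

Each round discards the pivot group plus one side of the partition, which is where the n + n/2 + n/4 + ... argument comes from when the pivot lands near the median.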

Updating list values with new values read - Python [duplicate]

I wasn't really sure how to ask this. I have a list of 3 values initially set to zero. Then I read 3 values at a time from the user, and I want to update the 3 values in the list with the new ones I read.
cordlist = [0]*3
Input:
3 4 5
I want list to now look like:
[3, 4, 5]
Input:
2 3 -6
List should now be
[5, 7, -1]
How do I go about accomplishing this? This is what I have:
cordlist += ([int(g) for g in raw_input().split()] for i in xrange(n))
but that just adds a new list, and doesn't really update the values in the previous list
In [17]: import numpy as np
In [18]: lst=np.array([0]*3)
In [19]: lst+=np.array([int(g) for g in raw_input().split()])
3 4 5
In [20]: lst
Out[20]: array([3, 4, 5])
In [21]: lst+=np.array([int(g) for g in raw_input().split()])
2 3 -6
In [22]: lst
Out[22]: array([ 5, 7, -1])
I would do something like this:
cordlist = [0, 0, 0]
for i in xrange(n):
    cordlist = map(sum, zip(cordlist, map(int, raw_input().split())))
Breakdown:
map(int, raw_input().split()) is equivalent to [int(i) for i in raw_input().split()]
zip basically takes a number of lists and returns a list of tuples containing the elements that are at the same index. See the docs for more information.
map, as I explained earlier, applies a function to each of the elements in an iterable, and returns a list. See the docs for more information.
cordlist = [v1+int(v2) for v1, v2 in zip(cordlist, raw_input().split())]
tested like that:
l1 = [1,2,3]
l2 = [2,3,4]
print [v1+v2 for v1, v2 in zip(l1, l2)]
result: [3, 5, 7]
I would go that way using itertools.zip_longest:
from itertools import zip_longest
def add_lists(l1, l2):
    return [int(i)+int(j) for i, j in zip_longest(l1, l2, fillvalue=0)]
result = []
while True:
    l = input().split()
    # store the running total so the next line is added to it
    result = add_lists(result, l)
    print('result = ', result)
Output:
>>> 1 2 3
result = [1, 2, 3]
>>> 3 4 5
result = [4, 6, 8]
More compact version of @namit's numpy solution
>>> import numpy as np
>>> lst = np.zeros(3, dtype=int)
>>> for i in range(2):
...     lst += np.fromstring(raw_input(), dtype=int, sep=' ')
3 4 5
2 3 -6
>>> lst
array([ 5, 7, -1])
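The answers above target Python 2 (`raw_input`, `xrange`). A minimal Python 3 sketch of the same running-total idea is given here; to keep it testable it takes the input lines as a list instead of reading from the keyboard, and the function name `accumulate_lines` and its `size` parameter are hypothetical.

```python
def accumulate_lines(lines, size=3):
    """Add the whitespace-separated integers on each line to a running total per position."""
    totals = [0] * size
    for line in lines:
        values = [int(token) for token in line.split()]
        # zip pairs entries by position; it stops at the shorter sequence,
        # so each line is expected to carry exactly `size` numbers.
        totals = [t + v for t, v in zip(totals, values)]
    return totals

print(accumulate_lines(["3 4 5", "2 3 -6"]))
```

For interactive use, the list could be replaced by something like `(input() for _ in range(n))`, where `n` is the number of lines to read.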
