Extracting items from array using variable sliding window in Python [duplicate] - python

This question already has answers here:
Rolling or sliding window iterator?
(29 answers)
Closed 6 days ago.
I have an array of digits: array = [1.0, 1.0, 2.0, 4.0, 1.0]
I would like to create a function that extracts sequences of digits from the input array and appends to one of two lists depending on defined conditions being met
The first condition f specifies the number of places to look ahead from index i and check if a valid index exists. If true, append array[i] to list1. If false, append to list2.
I have implemented it as follows:
def somefunc(array, f):
    list1, list2 = [], []
    for i in range(len(array)):
        if i + f < len(array):
            list1.append(array[i])
        else:
            list2.append(array[i])
    return list1, list2
This functions correctly as follows:
somefunc(array,f=1) returns ([1.0, 1.0, 2.0, 4.0], [1.0])
somefunc(array,f=2) returns ([1.0, 1.0, 2.0], [4.0, 1.0])
somefunc(array,f=3) returns ([1.0, 1.0], [2.0, 4.0, 1.0])
However, I would like to add a second condition to this function, b, that specifies the window length for previous digits to be summed and then appended to the lists according to the f condition above.
The logic is this:
iterate through array and at each index i check if i+f is a valid index.
If true, append the sum of the previous b digits to list1
If false, append the sum of the previous b digits to list2
If the length of window b isn't possible (i.e. b=2 when i=0) continue to next index.
With both the f and b conditions implemented, I would expect:
somefunc(array,f=1, b=1) returns ([1.0, 1.0, 2.0, 4.0], [1.0])
somefunc(array,f=1, b=2) returns ([2.0, 3.0, 6.0], [5.0])
somefunc(array,f=2, b=2) returns ([2.0, 3.0], [6.0, 5.0])
My first challenge is implementing the b condition; I cannot seem to figure out how. (See the edit below.)
I also wonder if there is a more efficient approach than the iterative method I have begun?
Given only the f condition, I know that the following functions correctly and would bypass the need for iteration:
def somefunc(array, f):
    return array[:-f], array[-f:]
However, I again don't know how to implement the b condition in this approach.
Edit
I have managed an iterative solution which implements the f and b conditions:
def somefunc(array, f, b):
    list1, list2 = [], []
    for i in range(len(array)):
        if i >= (b-1):
            if i + f < len(array):
                list1.append(sum(array[i+1-b: i+1]))
            else:
                list2.append(sum(array[i+1-b: i+1]))
    return list1, list2
However, the indexing syntax feels horrible, so I am certain there must be a more elegant solution. Also, anything with improved runtime would be preferable.

I can see two minor improvements you could implement in your code:
def somefunc(array, f, b):
    list1, list2 = [], []
    size = len(array) # Will only measure the length of the array once
    for i in range(b-1, size): # By starting from b-1 you can remove an if statement
        if i + f < size: # We use the size here
            list1.append(sum(array[i+1-b: i+1]))
        else:
            list2.append(sum(array[i+1-b: i+1]))
    return list1, list2
Edit:
An even better solution would be to add the new digit and subtract the oldest one at each iteration. This way you don't need to recompute the whole sum each time:
def somefunc(array, f, b):
    list1, list2 = [], []
    value = 0
    size = len(array)
    for i in range(b-1, size):
        if value != 0:
            value = value - array[i-b] + array[i] # Get the last value, add the value at index i and remove the value at index i-b
        else:
            value = sum(array[i+1-b: i+1])
        if i + f < size:
            list1.append(value)
        else:
            list2.append(value)
    return list1, list2
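A variation on the same idea, in case it helps: the window sums can also be derived from running totals (prefix sums), and the split between the two lists can then be done with one slice, much like the array[:-f], array[-f:] shortcut in the question. This is only a sketch; the function name split_window_sums and the cut variable are made up for illustration, and it assumes f >= 0 and b >= 1.

```python
from itertools import accumulate

def split_window_sums(array, f, b):
    # prefix[k] holds the sum of array[:k], so a window sum is a difference of two entries
    prefix = [0.0] + list(accumulate(array))
    # one sum per window ending at index i, for every i where a full window of length b fits
    sums = [prefix[i + 1] - prefix[i + 1 - b] for i in range(b - 1, len(array))]
    # windows ending at index i belong to the first list while i + f < len(array)
    cut = max(len(array) - f - (b - 1), 0)
    return sums[:cut], sums[cut:]

array = [1.0, 1.0, 2.0, 4.0, 1.0]
print(split_window_sums(array, f=1, b=2))
print(split_window_sums(array, f=2, b=2))
```

With long arrays of floats, differences of running totals can pick up small rounding errors, so the results may differ from a direct sum in the last digits.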

Related

How to pick same values in a list if the list contain floating numbers

In the following code I want to find the unique values in the list, which can be done in a for loop. Once I know the unique values, I want to see how many times each one appears in a and count the occurrences. Can someone please guide me on how to do that? The list contains floating-point numbers. Would it help to convert it to a NumPy array and then look for equal values?
import numpy as np
a = [1.0, 1.0, 1.0, 1.0, 1.5, 1.5, 1.5, 3.0, 3.0]
list = []
for i in a:
    if i not in list:
        list.append(i)
print(list)
for j in range(len(list)):
    g = np.argwhere(a == list[j])
    print(g)
You can use np.unique to get it done
np.unique(np.array(a),return_counts=True)
You can also do it using counters from collections
from collections import Counter
Var=dict(Counter(a))
print(Var)
The primitive way is to use loops
[[x,a.count(x)] for x in set(a)]
If you are not familiar with list comprehensions, here is the equivalent loop:
ls = []
for x in set(a):
    ls.append([x, a.count(x)])
print(ls)
If you want it using if else,
counter = dict()
for k in a:
    if k not in counter:
        counter[k] = 1
    else:
        counter[k] += 1
print(counter)
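One caveat with all of the approaches above is that they rely on exact equality, and floats that come out of a computation may differ in the last digits. A possible workaround, sketched here under the assumption that values agreeing to a fixed number of decimals should count as the same, is to round before counting. The helper name count_rounded and the ndigits default are illustrative choices, not something from the question.

```python
from collections import Counter

def count_rounded(values, ndigits=6):
    # round each value first so that tiny floating-point differences collapse into one key
    return Counter(round(v, ndigits) for v in values)

# 0.1 + 0.2 is not exactly 0.3, but both round to the same key
print(count_rounded([0.1 + 0.2, 0.3, 1.5]))
```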

How to find the repeating arrays in a list

I have a list of around 131,000 arrays, each of length 300. I am using Python.
I want to check which of the arrays are repeated in this list. I am trying to do this by comparing each array with all the others, like this:
import numpy as np
wordEmbeddings = [[0.8,0.4....upto 300 elements]....upto 131000 arrays]
count = 0
for i in range(0,len(wordEmbeddings)):
    for j in range(0,len(wordEmbeddings)):
        if i != j:
            if np.array_equal(wordEmbeddings[i],wordEmbeddings[j]):
                count += 1
This runs very slowly and might take hours to finish. How can I do this efficiently?
You can use collections.Counter to count the frequency of each sub list
>>> from collections import Counter
>>> Counter(list(map(tuple, wordEmbeddings)))
We need to convert each sublist to a tuple, since a list is unhashable, i.e. it cannot be used as a key in a dict.
This will give you a result like this:
Counter({(...4, 5, 6...): 1, (...1, 2, 3...): 1})
Each key of the Counter object is a tuple built from one sublist, and the value is the number of times that sublist occurs. Next, you can filter the resulting Counter object to yield only the elements whose value is > 1:
>>> items = Counter(list(map(tuple, wordEmbeddings)))
>>> list(filter(lambda x: items[x] > 1,items))
Timeit results:
$ python -m timeit -s "a = [range(300) for _ in range(131000)]" -s "from collections import Counter" "Counter(list(map(tuple, a)))"
10 loops, best of 3: 1.18 sec per loop
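For reference, the steps above could be put together roughly as follows. This is a small sketch on made-up data; the function name find_repeated is just for illustration.

```python
from collections import Counter

def find_repeated(rows):
    # tuples are hashable, so each row can serve as a Counter key
    counts = Counter(map(tuple, rows))
    # keep only the rows that occur more than once, together with their counts
    return {row: n for row, n in counts.items() if n > 1}

rows = [[0.8, 0.4], [0.2, 0.3], [0.8, 0.4]]
print(find_repeated(rows))
```

Each row is visited once, so the work grows linearly with the number of rows rather than quadratically.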
You can remove duplicate comparisons by starting the inner loop just after i, so that each pair is compared only once:
for i in range(0,len(wordEmbeddings)):
    for j in range(i+1,len(wordEmbeddings)):
You could look into PyPy for general-purpose speed-ups.
It might also be worth looking into hashing the arrays somehow.
Here's a question on speeding up np array comparison. Does the order of the elements matter to you?
You can use set and tuple to find the duplicated arrays. Create a new list of tuples (tuples are used because lists are unhashable), then filter that list and collect the results in a set. Note that count scans the whole list on every call, so this is quadratic in the number of arrays.
tuples = list(map(tuple, wordEmbeddings))
duplications = set([t for t in tuples if tuples.count(t) > 1])
print(duplications)
Maybe you can reduce the initial list to unique hashes, or non-unique sums, and go over those first, which may be a faster way to compare elements.
I suggest you first sort the list (which might also help with further processing) and then compare. The advantage is that you only need to compare each array to the previous one:
import numpy as np
from functools import cmp_to_key
wordEmbeddings = [[0.8, 0.4, 0.3, 0.2], [0.2,0.3,0.7], [0.8, 0.4, 0.3, 0.2], [ 1.0, 3.0, 4.0, 5.0]]
def smaller(x, y):
    # comparator: the array with the larger leading element sorts first
    for i in range(min(len(x), len(y))):
        if x[i] < y[i]:
            return 1
        elif y[i] < x[i]:
            return -1
    # the common prefix is equal: order by length, and report equal arrays as 0
    if len(x) > len(y):
        return 1
    elif len(x) < len(y):
        return -1
    return 0
wordEmbeddings = sorted(wordEmbeddings, key=cmp_to_key(smaller))
print(wordEmbeddings)
# output: [[1.0, 3.0, 4.0, 5.0], [0.8, 0.4, 0.3, 0.2], [0.8, 0.4, 0.3, 0.2], [0.2, 0.3, 0.7]]
count = 0
for i in range(1, len(wordEmbeddings)):
    if np.array_equal(wordEmbeddings[i], wordEmbeddings[i-1]):
        count += 1
print(count)
# output: 1
If N is the length of wordEmbeddings and n is the length of each inner array, then your approach does O(N*N*n) comparisons. Reducing the comparisons as in con--'s answer still leaves O(N*N*n/2) comparisons.
Sorting takes O(N*log(N)*n) time and the subsequent counting step takes only O(N*n) time, which altogether is less than O(N*N*n/2).
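If a custom comparator is not needed, the same sort-then-compare-neighbours idea can probably be written more compactly, because tuples already compare element by element. The sketch below uses only the standard library; the name count_adjacent_duplicates is made up for this example.

```python
from itertools import groupby

def count_adjacent_duplicates(rows):
    # tuples sort lexicographically, so identical rows end up next to each other
    ordered = sorted(tuple(row) for row in rows)
    # every group of k identical rows contributes k - 1 repeats
    return sum(len(list(group)) - 1 for _, group in groupby(ordered))

rows = [[0.8, 0.4, 0.3, 0.2], [0.2, 0.3, 0.7], [0.8, 0.4, 0.3, 0.2], [1.0, 3.0, 4.0, 5.0]]
print(count_adjacent_duplicates(rows))
```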

Python: How to generate all combinations of lists of tuples without repeating contents of the tuple

I'm working with a bit of a riddle:
Given a dictionary with tuples for keys: dictionary = {(p,q):n}, I need to generate a list of new dictionaries of every combination such that neither p nor q repeat within the new dictionary. And during the generation of this list of dictionaries, or after, pick one of the dictionaries as the desired one based on a calculation using the dictionary values.
example of what I mean (but much smaller):
dictionary = {(1,1): 1.0, (1,2): 2.0, (1,3): 2.5, (1,4): 5.0, (2,1): 3.5, (2,2): 6.0, (2,3): 4.0, (2,4): 1.0}
becomes
listofdictionaries = [{(1,1): 1.0, (2,2): 6.0}, {(1,1): 1.0, (2,3): 4.0}, {(1,1): 1.0, (2,4): 1.0}, {(1,2): 2.0, (2,1): 3.5}, {(1,2): 2.0, (2,3): 4.0}, etc.
a dictionary like: {(1,1): 1.0, (2,1): 3.5} is not allowable because q repeats.
Now my sob story: I'm brand new to coding... but I've been trying to write this script to analyze some of my data. But I also think it's an interesting algorithm riddle. I wrote something that works with very small dictionaries but when I input a large one, it takes way too long to run (copied below). In my script attempt, I actually generated a list of combinations of tuples instead that I use to refer to my master dictionary later on in the script. I'll copy it below:
The dictionary tuple keys were generated using two lists: "ExpList1" and "ExpList2"
#first, I generate all the tuple combinations from my ExpDict dictionary
combos = itertools.combinations(ExpDict, min(len(ExpList1), len(ExpList2)))
#then I generate a list of only the combinations that don't repeat p or q
uniquecombolist = []
for foo in combos:
    counter = 0
    listofp = []
    listofq = []
    for bar in foo:
        if bar[0] in listofp or bar[1] in listofq:
            counter += 1
            break
        else:
            listofp.append(bar[0])
            listofq.append(bar[1])
    if counter == 0:
        uniquecombolist.append(foo)
After generating this list, I apply a function to all of the dictionary combinations (iterating through the tuple lists and calling their respective values from the master dictionary) and pick the combination with the smallest resulting value from that function.
I also tried to apply the function while iterating through the combinations picking the unique p,q ones and then checking whether the resulting value is smaller than the previous and keeping it if it is (this is instead of generating that list "uniquecombolist", I end up generating just the final tuple list) - still takes too long.
I think the solution lies in embedding the p,q-no-repeat and the final selecting function DURING the generation of combinations. I'm just having trouble wrapping my head around how to actually do this.
Thanks for reading!
Sara
EDIT:
To clarify, I wrote an alternative to my code that incorporates the final function (basically root mean squares) to the sets of pairs.
combos = itertools.combinations(ExpDict, min(len(ExpList1), len(ExpList2)))
prevRMSD = float('inf')
for foo in combos:
    counter = 0
    distanceSUM = 0
    listofp = []
    listofq = []
    for bar in foo:
        if bar[0] in listofp or bar[1] in listofq:
            counter += 1
            break
        else:
            listofp.append(bar[0])
            listofq.append(bar[1])
            distanceSUM = distanceSUM + RMSDdict[bar]
    RMSD = math.sqrt(distanceSUM**2/len(foo))
    if counter == 0 and RMSD < prevRMSD:
        chosencombo = foo
        prevRMSD = RMSD
So if I could incorporate the RMS calculation during the set generation and only keep the smallest one, I think that will solve my combinatorial problem.
If I understood your problem, you are interested in all the possible combinations of pairs (p,q) with unique p's and q's, drawn from given sets of possible values for p and q. In my answer I assume those possible values are in list_p and list_q, respectively (I think this is what you have in ExpList1 and ExpList2, am I right?).
min_size = min(len(list_p), len(list_q))
combos_p = itertools.combinations(list_p, min_size)
combos_q = itertools.permutations(list_q, min_size)
prod = itertools.product(combos_p, combos_q)
uniquecombolist = [tuple(zip(i[0], i[1])) for i in prod]
Let me know if this is what you're looking for. By the way welcome to SO, great question!
Edit:
If you're concerned that your list may become enormous, you can always use a generator expression and apply whatever function you desire to it, e.g.,
min_size = min(len(list_p), len(list_q))
combos_p = itertools.combinations(list_p, min_size)
combos_q = itertools.permutations(list_q, min_size)
prod = itertools.product(combos_p, combos_q)
uniquecombo = (tuple(zip(y[0], y[1])) for y in prod) # this is now a generator expression, not a list -- observe the parentheses
def your_function(x):
    # do whatever you want with the values, here I'm just printing and returning
    print(x)
    return x
# now prints the minimum value
# (map is lazy in Python 3; on Python 2 use itertools.imap instead)
print(min(map(your_function, uniquecombo)))
When you use generators instead of lists, the values are computed as they are needed. Here since we're interested in the minimum value, each value is computed and is discarded right away unless it is the minimum.
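To make this concrete, here is a small self-contained sketch that runs the generator approach on the example dictionary from the question. The function name unique_pairings and the scoring by a plain sum of the values are assumptions for illustration; the real selection function (the RMSD calculation) would take the place of the key function.

```python
import itertools

def unique_pairings(list_p, list_q):
    # lazily yield every set of (p, q) pairs in which no p and no q repeats
    k = min(len(list_p), len(list_q))
    for ps in itertools.combinations(list_p, k):
        for qs in itertools.permutations(list_q, k):
            yield tuple(zip(ps, qs))

dictionary = {(1, 1): 1.0, (1, 2): 2.0, (1, 3): 2.5, (1, 4): 5.0,
              (2, 1): 3.5, (2, 2): 6.0, (2, 3): 4.0, (2, 4): 1.0}

# min consumes the generator one pairing at a time and keeps only the best so far
best = min(unique_pairings([1, 2], [1, 2, 3, 4]),
           key=lambda pairing: sum(dictionary[pair] for pair in pairing))
print(best)
```

Nothing is stored apart from the current candidate and the best one seen so far, so memory use stays flat even when the number of pairings is large.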
This answer assumes that you are trying to generate sets with |S| elements, where S is the smaller pool of tuple coordinates. The larger pool will be denoted L.
Since the set will contain |S| pairs with no repeated elements, each element from S must occur exactly once. From here, match up the permutations of L where |S| elements are chosen with the ordered elements of S. This will generate all requested sets exhaustively and without repetition.
Note that P(|L|, |S|) is equal to |L|!/(|L|-|S|)!
Depending on the sizes of the tuple coordinate pools, there may be too many permutations to enumerate.
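To get a feel for how quickly that count grows before committing to a full enumeration, it can be computed directly. This sketch assumes Python 3.8 or later, where math.perm is available; the helper name pairing_count is made up.

```python
import math

def pairing_count(small_size, large_size):
    # P(|L|, |S|) = |L|! / (|L| - |S|)!
    return math.perm(large_size, small_size)

print(pairing_count(2, 4))    # the toy example: 4 * 3 = 12 pairings
print(pairing_count(10, 20))  # already in the hundreds of billions
```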
Some code to replicate this enumeration might look like:
from itertools import permutations
S, L = range(2), range(4) # or ExpList1, ExpList2
for p in permutations(L, len(S)):
    print(list(zip(S, p))) # list() so the pairs are also shown on Python 3
In total, your final code might look something like:
import itertools
import math
S, L = ExpList1, ExpList2
pairset_maker = lambda p: zip(S, p)
if len(S) > len(L):
    S, L = L, S
    pairset_maker = lambda p: zip(p, S)
n = len(S)
get_perm_value = lambda p: math.sqrt(sum(RMSDdict[t] for t in pairset_maker(p))**2/n)
min_pairset = min(itertools.permutations(L, n), key=get_perm_value)
If this doesn't get you to within an order of magnitude or two of your desired runtime, then you might need to consider an algorithm that doesn't produce an optimal solution.

Python print nth element from list of lists [duplicate]

This question already has answers here:
How to print column in python array?
(2 answers)
Closed 5 years ago.
I have the following list:
[[50.954818803035948, 55.49664787231189, 8007927.0, 0.0],
[50.630482185654436, 55.133473852776916, 8547795.0, 0.0],
[51.32738085400576, 55.118344981379266, 6600841.0, 0.0],
[49.425931642638567, 55.312890225131163, 7400096.0, 0.0],
[48.593467836476407, 55.073137270550006, 6001334.0, 0.0]]
I want to print the third element from every list. The desired result is:
8007927.0
8547795.0
6600841.0
7400096.0
6001334.0
I tried:
print data[:][2]
but it is not outputting the desired result.
There are many ways to do this. Here's a simple list comprehension, without an explicit for loop.
tt = [[50.954818803035948, 55.49664787231189, 8007927.0, 0.0], [50.630482185654436, 55.133473852776916, 8547795.0, 0.0], [51.32738085400576, 55.118344981379266, 6600841.0, 0.0], [49.425931642638567, 55.312890225131163, 7400096.0, 0.0], [48.593467836476407, 55.073137270550006, 6001334.0, 0.0]]
print [x[2] for x in tt]
> [8007927.0, 8547795.0, 6600841.0, 7400096.0, 6001334.0]
And making it safe for potentially shorter lists:
print [x[2] for x in tt if len(x) > 2]
More sophisticated output (Python 2.7), printing the values separated by newlines (\n):
print '\n'.join([str(x[2]) for x in tt])
> 8007927.0
> 8547795.0
> 6600841.0
> 7400096.0
> 6001334.0
Try this:
for item in data:
    if len(item) >= 3: # to prevent list out of bound exception.
        print(int(item[2]))
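map and list comprehension solutions have already been given; I would like to provide two more ways. Say d is your list:
With zip (Python 2; in Python 3 use list(zip(*d))[2], since zip returns an iterator there):
zip(*d)[2]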
With numpy:
>>> import numpy
>>> nd = numpy.array(d)
>>> print(nd[:,2])
[ 8007927., 8547795., 6600841., 7400096., 6001334.]
You could also try the map function (here z is your list):
In python 3:
list(map(lambda l: l[2], z))
In python 2:
map(lambda l: l[2], z)
In order to print the nth element of every list from a list of lists, you need to first access each list, and then access the nth element in that list.
In practice, it would look something like this
def print_nth_element(listset, n):
    for listitem in listset:
        print(int(listitem[n])) # Since you want them to be ints
Which could then be called in the form print_nth_element(data, 2) for your case.
The reason your data[:][2] does not yield the desired result is that data[:] returns a copy of the entire list of lists, so taking the element at index 2 of that copy just gives you the third element of the original list. So data[:][2] is practically equivalent to data[2].
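The difference is easy to check on a small made-up list. In this sketch the helper name column is only for illustration; it collects the nth element of every inner list with a list comprehension.

```python
def column(rows, n):
    # take the nth element from each inner list
    return [row[n] for row in rows]

data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

# slicing with [:] copies the outer list, so indexing afterwards picks a whole row
assert data[:][2] == data[2]

for value in column(data, 2):
    print(value)
```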

Equal elements in list of lists. Delete one

I have a list of lists
list = [[-2.0, 5.0], [-1.0, -3.0], [1.0, 3.0], [2.0, -5.0]]
What I want to do is delete one of any two elements that produce the same value when the first element is divided by the second. For example, [-2.0, 5.0] gives -2/5 and [2.0, -5.0] also gives -2/5. I want to delete either [-2.0, 5.0] or [2.0, -5.0], since they produce the same value.
Any ideas?
You could try it like this:
A tuple can be a dictionary key, so I converted each list into a tuple of the absolute values of its elements and kept the original list as the value.
>>> lis
[[-2.0, 5.0], [-1.0, -3.0], [1.0, 3.0], [2.0, -5.0]]
>>> dict([(tuple([abs(x[0]), abs(x[1])]), x) for x in lis]).values()
[[2.0, -5.0], [1.0, 3.0]]
>>>
Assuming all your values are floats (so you can always use float division), you can do the following:
my_list = [[-2.0, 5.0], [-1.0, -3.0], [1.0, 3.0], [2.0, -5.0]]
values_seen = []
new_list = []
for x,y in my_list:
    if x/y in values_seen:
        continue
    else:
        values_seen.append(x/y)
        new_list.append([x,y])
Now the list you want will be stored as new_list. Note that you should avoid assigning a value to the name list as you have above, since that shadows the built-in list type.
*Clarification, I am assuming that if you have any more than 2 values that return the same ratio (for example [[1,3],[2,6],[3,9]]) you will want to keep only one of these.
If you want to eliminate all equivalent fractions (meaning [-2.0, 5.0] and [4.0, -10.0] are considered equivalent), then the following code would work.
def unique_fractions(lst):  # illustrative wrapper name, so that yield is valid
    seen = set()
    for numerator, denominator in lst:
        quotient = numerator / denominator
        if quotient not in seen:
            seen.add(quotient)
            yield numerator, denominator
Otherwise, if you want the final list to contain both [-2.0, 5.0] and [4.0, -10.0]:
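def unique_signed_pairs(lst):  # illustrative wrapper name, so that yield is valid
    seen = set()
    for numerator, denominator in lst:
        # sign() is not a Python built-in; see the note that follows
        value = (abs(numerator), abs(denominator), sign(numerator)*sign(denominator))
        if value not in seen:
            seen.add(value)
            yield numerator, denominator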
If you're writing this in Python, a language that lacks a sign function, you'll either need to use math.copysign or (numerator > 0) ^ (denominator > 0) where ^ is the xor operator.
This code assumes both numerator and denominator are nonzero.
If you really are keeping a list of numerator-denominator number pairs, consider storing the pairs as immutable tuples or better yet, as Python fractions.
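As a rough, runnable sketch of the second variant, math.copysign can stand in for the missing sign function. The function name dedupe_signed_pairs is made up, and like the code above it assumes that numerator and denominator are both nonzero.

```python
import math

def dedupe_signed_pairs(pairs):
    seen = set()
    result = []
    for numerator, denominator in pairs:
        # copysign(1.0, x) gives +1.0 or -1.0, which serves as the sign of x
        sign = math.copysign(1.0, numerator) * math.copysign(1.0, denominator)
        key = (abs(numerator), abs(denominator), sign)
        if key not in seen:
            seen.add(key)
            result.append([numerator, denominator])
    return result

lst = [[-2.0, 5.0], [-1.0, -3.0], [1.0, 3.0], [2.0, -5.0]]
print(dedupe_signed_pairs(lst))
```

Because the key keeps the absolute values, [-2.0, 5.0] and [4.0, -10.0] are treated as different entries here, which matches the second variant rather than the first.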
I would first get a unique set of ratios using set:
In [1]: lst = [[-2.0, 5.0], [-1.0, -3.0], [1.0, 3.0], [2.0, -5.0]]
In [2]: rs = list(set([ l[0]/l[1] for l in lst]))
And then just pick the first occurrence of each ratio:
In [3]: [ filter(lambda m: m[0]/m[1] == r , lst )[0] for r in rs ]
Out[3]: [[-2.0, 5.0], [-1.0, -3.0]]
In [4]:
Quick and dirty way, since keys in a dictionary are unique.
{num/denom : [num, denom] for (num, denom) in lst}.values()
In general, comparing floats using == is unreliable; it's normally better to check whether they're within a tolerance, e.g.
abs(x-y) < tolerance
A more robust way might look like the following. An else attached to a for loop runs only if the loop was not exited early with break, which is quite handy. This version, however, is quadratic rather than linear time.
tolerance = 1e-9 # example value; choose one that suits your data
div = lambda x,y : x/y
unique = []
for j in range(len(lst)):
    for i in range(j):
        if abs( div(*lst[i])-div(*lst[j]) ) < tolerance:
            break
    else:
        unique.append(lst[j])
unique
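The same tolerance check can also be expressed with math.isclose from the standard library, which compares with a relative tolerance by default. This is a sketch only: the function name unique_by_ratio and the rel_tol default are illustrative, and it is still quadratic in the number of pairs.

```python
import math

def unique_by_ratio(pairs, rel_tol=1e-9):
    unique = []
    for x, y in pairs:
        ratio = x / y
        # keep the pair only if no already-kept pair has (nearly) the same ratio
        if not any(math.isclose(ratio, ux / uy, rel_tol=rel_tol) for ux, uy in unique):
            unique.append([x, y])
    return unique

lst = [[-2.0, 5.0], [-1.0, -3.0], [1.0, 3.0], [2.0, -5.0]]
print(unique_by_ratio(lst))
```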
