Max of 99percentile in a window of list of lists - python

I have a list of lists (2000x1000), but as an example consider this one (10x3):
l = [[8, 7, 6], [5, 3, 1], [4, 5, 9], [1, 5, 1], [3, 5, 7], [8, 2, 5], [1, 9, 2], [8, 7, 6], [9, 9, 9], [4, 5, 9]]
In this example, every list corresponds to 3 measurements of each instant:
t0 -> [8,7,6]
t1 -> [5,3,1]
and so on.
For each window of 4 consecutive instants, I would like to compute the peak-to-peak value at each position and then take the largest of those values that falls below their 99th percentile.
EXAMPLE
Lets consider the first window:
[8, 7, 6], [5, 3, 1], [4, 5, 9], [1, 5, 1] :
[8,5,4,1] -> peak to peak: 8-1=7
[7,3,5,5] -> ptp=4
[6,1,9,1] -> ptp=8
with these 3 values [7,4,8] I want to take the max in the 99percentile, in this case 7
For the second window:
[5, 3, 1], [4, 5, 9], [1, 5, 1], [3, 5, 7]:
[5,4,1,3] -> ptp=4
[3,5,5,5] -> ptp=2
[1,9,1,7] -> ptp=8
max in 99percentile -> 4
After I do that for all the windows of size 4, I want to make a list with these values.
My code is the following, but it is slow. Is there a faster way to implement this?
NOTE: I cannot use pandas, and Numpy version should be <=1.6
num_meas = 4
m = []
for index, i in enumerate(l):
    if index < len(l) - num_meas + 1:
        p = []
        for j in range(len(i)):
            t = []
            for k in range(num_meas):
                t.append(l[index + k][j])
            t = [x for x in t if ~np.isnan(x)]
            try:
                a = np.ptp(t)
            except ValueError:
                a = 0
            p.append(a)
        perce = np.percentile(p, 99)
        p = max([el for el in p if el < perce])
        m.append(p)
print m
The output:
[7, 4, 7, 6, 5, 7, 7]

Please check if the following code works with NumPy 1.6:
import numpy as np
l = [[8, 7, 6], [5, 3, 1], [4, 5, 9], [1, 5, 1], [3, 5, 7], [8, 2, 5],
     [1, 9, 2], [8, 7, 6], [9, 9, 9], [4, 5, 9]]
l = np.array(l)
# range matrix
mat_ptp = np.zeros((l.shape[0]-3, l.shape[1]))
for i in range(l.shape[0]-3):
    l[i:i+4].ptp(axis=0, out=mat_ptp[i])
percentiles = np.percentile(mat_ptp, 99, axis=1)
greater_pos = np.greater_equal(mat_ptp, percentiles.reshape(-1, 1))
mat_ptp[greater_pos] = -np.inf
result = np.max(mat_ptp, axis=1)
To improve performance, try to vectorize your operations as much as possible using NumPy. That can be much faster than using for loops and the append function.
EDIT
Sorry, I didn't notice that you wanted the selected elements strictly less than the percentile. Here is the correct version.
BENCHMARK
Just to validate the question about performance, here is the result with:
l = np.random.randint(0, 100, size=(200, 100))
run 100 times with timeit:
OP code: 0.5197743272900698 ms on average
Code above: 0.0021439407201251015 on average
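As a further sketch, not part of the answer above: the loop over windows can be replaced by a loop over the window length, building a running maximum and minimum from shifted slices. The helper names `window_ptp` and `max_below_percentile` are made up for illustration. The sketch ignores the NaN filtering in the question's code, and whether it runs unchanged on NumPy 1.6 is an assumption that should be checked.

```python
import numpy as np


def window_ptp(a, w):
    """Peak-to-peak value of every length-w window along axis 0 (hypothetical helper)."""
    n = a.shape[0] - w + 1          # number of complete windows
    hi = a[:n].copy()               # running maximum per window and column
    lo = a[:n].copy()               # running minimum per window and column
    for k in range(1, w):           # w - 1 iterations instead of one per window
        hi = np.maximum(hi, a[k:k + n])
        lo = np.minimum(lo, a[k:k + n])
    return hi - lo


def max_below_percentile(ptp, q=99):
    """Largest value in each row that is strictly below that row's q-th percentile."""
    perc = np.percentile(ptp, q, axis=1)
    # Values at or above the percentile are masked out with -inf.
    masked = np.where(ptp < perc[:, None], ptp, -np.inf)
    return masked.max(axis=1)


l = np.array([[8, 7, 6], [5, 3, 1], [4, 5, 9], [1, 5, 1], [3, 5, 7],
              [8, 2, 5], [1, 9, 2], [8, 7, 6], [9, 9, 9], [4, 5, 9]])
result = max_below_percentile(window_ptp(l, 4))
print(result.tolist())
```

For the 10x3 example this gives the same values as the list in the question. A window whose peak-to-peak values are all equal has nothing strictly below the percentile, so its entry would be -inf.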

Related

Get fixed size combinations of a list of lists in python?

I am looking for modified version of itertools.product(*a). This command returns combinations by selecting elements from each list but I need to restrict size.
Suppose,
mylist = [[6, 7, 8], [3, 5, 9], [2, 1, 4]]
output: (6, 3), (6, 2),....(3, 2)... when size is 2
Number of lists and size are not fixed. I need something that can be dynamic enough.
You can try:
from itertools import product, combinations, chain
mylist=[[6, 7, 8], [3, 5, 9], [2, 1]]
size = 2
results = chain.from_iterable(product(*t) for t in combinations(mylist, size))
print(list(results))
Perhaps you can try this:
from itertools import chain, combinations
l=[[6, 7, 8], [3, 5, 9], [2, 1, 4]]
x=list(combinations(chain.from_iterable(l),2))
print(x)
Solution:
import itertools
size = 2
mylist = [[6, 7, 8], [3, 5, 9], [2, 1, 4]]
res = []
for x in list(itertools.product(*mylist)):
    res += itertools.combinations(x, size)
print(set(res))
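To compare the answers, here is a small sketch that wraps the combinations-then-product idea from the first answer in a function; the name `fixed_size_product` is made up for illustration. Note that the answer that flattens everything with chain.from_iterable also pairs elements from the same inner list, such as (6, 7), which may not be what the question wants.

```python
from itertools import chain, combinations, product


def fixed_size_product(lists, size):
    """Return an iterator of tuples taking one element from each of `size` distinct lists."""
    # combinations() picks which lists take part; product() picks one item from each.
    return chain.from_iterable(product(*group) for group in combinations(lists, size))


mylist = [[6, 7, 8], [3, 5, 9], [2, 1, 4]]
pairs = list(fixed_size_product(mylist, 2))
print(len(pairs))   # 3 pairs of lists, 3 * 3 tuples each
print(pairs[:3])
```

Because the result is lazy, it can be consumed one tuple at a time when the number of lists or the size grows.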

How to make new list of the numbers not appended into its own list?

If I have a multidimensional list called t and I append some numbers from the list into a new list called TC, how do I take all of the numbers that were not appended into the new list and put them in their own list, called nonTC? For example:
t = [[1, 3, 4, 5, 6, 7],[9, 7, 4, 5, 2], [3, 4, 5]]
And I write some conditions to append only some values from each list to create the new list, TC:
TC = [[3, 4, 6], [9, 7, 2], [5]]
How do I append the values not included in TC into its own list? So I would get:
nonTC = [[1, 5, 7],[4, 5],[3,4]]
You can use list comprehensions and a list of sets to filter your original list:
t = [[1, 3, 4, 5, 6, 7],[9, 7, 4, 5, 2], [3, 4, 5]]
# filter sets - each index corresponds to one inner list of t - the numbers in the
# set should be put into TC - those that are not go into nonTC
getem = [{3,4,6},{9,7,2},{5}]
TC = [ [p for p in part if p in getem[i]] for i,part in enumerate(t)]
print(TC)
nonTC = [ [p for p in part if p not in getem[i]] for i,part in enumerate(t)]
print(nonTC)
Output:
[[3, 4, 6], [9, 7, 2], [5]] # TC
[[1, 5, 7], [4, 5], [3, 4]] # nonTC
Readup:
list comprehensions
sets
enumerate(iterable)
And: Explanation of how nested list comprehension works?
Suggestion for other way to do it, creds to AChampion:
TC_1 = [[p for p in part if p in g] for g, part in zip(getem, t)]
nonTC_1 = [[p for p in part if p not in g] for g, part in zip(getem, t)]
See zip() - it essentially bundles the two lists into an iterable of tuples
((getem[0], t[0]), (getem[1], t[1]), (getem[2], t[2]))
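A quick way to see what zip() produces here is to materialize it; this is only an illustration using the data from the answer above.

```python
getem = [{3, 4, 6}, {9, 7, 2}, {5}]
t = [[1, 3, 4, 5, 6, 7], [9, 7, 4, 5, 2], [3, 4, 5]]

pairs = list(zip(getem, t))   # list() is needed on Python 3, where zip is lazy
for g, part in pairs:
    # each tuple holds one filter set and the inner list it applies to
    print(sorted(g), part)
```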
Add-on for multiple occurrences, forfeiting list comprehensions and sets:
t = [[1, 3, 4, 5, 6, 7, 3, 3, 3],[9, 7, 4, 5, 2], [3, 4, 5]]
# filter lists - each index corresponds to one inner list of t - the numbers in the list
# should be put into TC - those that are not go into nonTC - exactly with the amounts given
getem = [[3,3,4,6],[9,7,2],[5]]
from collections import Counter
TC = []
nonTC = []
for f, part in zip(getem,t):
    TC.append([])
    nonTC.append([])
    c = Counter(f)
    for num in part:
        if c.get(num,0) > 0:
            TC[-1].append(num)
            c[num]-=1
        else:
            nonTC[-1].append(num)
print(TC) # [[3, 4, 6, 3], [9, 7, 2], [5]]
print(nonTC) # [[1, 5, 7, 3, 3], [4, 5], [3, 4]]
It needs only one pass over your items instead of two (separate list comprehensions), which probably makes it more efficient in the long run.
Just out of curiosity, using NumPy:
import numpy as np
t = [[1, 3, 4, 5, 6, 7],[9, 7, 4, 5, 2], [3, 4, 5]]
TC = [[3, 4, 6], [9, 7, 2], [5]]
print([np.setdiff1d(a, b) for a, b in zip(t, TC)])
#=> [array([1, 5, 7]), array([4, 5]), array([3, 4])]
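One caveat with the NumPy variant: np.setdiff1d returns the sorted unique values, so the original order and any duplicates are lost. The data below is made up to show the difference from the list-comprehension approach.

```python
import numpy as np

part = [5, 4, 4, 1]      # made-up data with a duplicate and unsorted values
taken = [1]

by_numpy = np.setdiff1d(part, taken).tolist()
by_comprehension = [p for p in part if p not in taken]

print(by_numpy)           # sorted, duplicates removed
print(by_comprehension)   # original order, duplicates kept
```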

How to modify the elements in a list within list

I am very new to Python, trying to learn the basics. Have a doubt about the list.
Have a list:
L = [[1,2,3],[4,5,6],[3,4,6]]
The output should be:
[[2,4,6],[8,10,12],[6,8,12]]
The code that works for me is the following
for x in range(len(L)):
    for y in range(len(L[x])):
        L[x][y] = L[x][y] + L[x][y]
print L
It gives the output [[2,4,6],[8,10,12],[6,8,12]].
Now I want the same output with a different code:
for x in L:
    a = L.index(x)
    for y in L[a]:
        b = L[a].index(y)
        L[a][b] = L[a][b] + L[a][b]
print L
With the above code the output obtained is:
[[4,2,6],[8,10,12],[12,8,6]]
I tried to debug the above output.
I put a print statement above the line "L[a][b] = L[a][b] + L[a][b]" for printing a and b. I was surprised to see the values of a and b are :
0,0
0,0
0,2
1,0
1,1
1,2
2,0
2,1
2,0
Again if I comment out the line "L[a][b] = L[a][b] + L[a][b]" then the values of a and b are as expected:
0,0
0,1
0,2
1,0
1,1
1,2
2,0
2,1
2,2
I suspect this might be happening due to variable scoping in Python, so I tried to read up on scoping. But I didn't find a suitable answer, either about scoping or about the question above.
You are modifying your list with the statement L[a][b] = L[a][b] + L[a][b]
e.g. -
L = [[1, 2, 3], [4, 5, 6], [3, 4, 6]]
L[0][0] = 1 initially
Then you modify it as L[0][0] = 2
L = [[2, 2, 3], [4, 5, 6], [3, 4, 6]]
In the next iteration you search for the index of 2, which is now 0, because you modified list L.
I tried to print L along with a,b in your example. Result explains the behavior -
0 0
[[1, 2, 3], [4, 5, 6], [3, 4, 6]]
0 0
[[2, 2, 3], [4, 5, 6], [3, 4, 6]]
0 2
[[4, 2, 3], [4, 5, 6], [3, 4, 6]]
1 0
[[4, 2, 6], [4, 5, 6], [3, 4, 6]]
1 1
[[4, 2, 6], [8, 5, 6], [3, 4, 6]]
1 2
[[4, 2, 6], [8, 10, 6], [3, 4, 6]]
2 0
[[4, 2, 6], [8, 10, 12], [3, 4, 6]]
2 1
[[4, 2, 6], [8, 10, 12], [6, 4, 6]]
2 0
[[4, 2, 6], [8, 10, 12], [6, 8, 6]]
As other people have explained, when you use the index function, it finds the first occurrence of the value you are searching for. So the first time through your loop (for the first row), it looks like
y = 1
[1,2,3].index(1) # returns index 0
# Then you modify the first element of the list
y = 2
[2,2,3].index(2) # returns index 0 again!
To get the indices in an easier, more deterministic way, you can use the enumerate function on a list. It provides you with an iterator that returns the index AND the value as you move through the list.
for rowInd, x in enumerate(L):
    for colInd, y in enumerate(x):
        L[rowInd][colInd] = y + y
Note that this will do it in place, as in your original solution.
L = [[2, 4, 6], [8, 10, 12], [6, 8, 12]]
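To make the difference concrete, the sketch below runs both loops on copies of the same data. The function names are made up for illustration; the first one reproduces the loop from the question, the second the enumerate version.

```python
import copy


def double_with_index(rows):
    """Reproduces the loop from the question; goes wrong when values repeat."""
    rows = copy.deepcopy(rows)
    for x in rows:
        a = rows.index(x)
        for y in rows[a]:
            b = rows[a].index(y)     # first occurrence of the value, not the current position
            rows[a][b] = rows[a][b] + rows[a][b]
    return rows


def double_with_enumerate(rows):
    """Uses the position supplied by enumerate, so repeated values do not matter."""
    rows = copy.deepcopy(rows)
    for row_ind, x in enumerate(rows):
        for col_ind, y in enumerate(x):
            rows[row_ind][col_ind] = y + y
    return rows


L = [[1, 2, 3], [4, 5, 6], [3, 4, 6]]
print(double_with_index(L))
print(double_with_enumerate(L))
```

The first call prints the unexpected list from the question, the second the desired one.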
The best way to achieve your desired output is to use a list comprehension. You could do as follows:
L = [[1,2,3], [4,5,6], [3,4,6]]
answer = [[2*el for el in sublist] for sublist in L]
print(answer)
Output
[[2, 4, 6], [8, 10, 12], [6, 8, 12]]
This iterates over each sublist in your list L and multiplies each el in the sublist by 2, thus achieving the desired result.
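Note that the comprehension builds a new list and leaves L itself untouched. If other names refer to the same list and should see the change, slice assignment is one option; the sketch below assumes that is what is wanted, and `alias` is just an illustrative second reference.

```python
L = [[1, 2, 3], [4, 5, 6], [3, 4, 6]]
alias = L                       # a second name for the same outer list

doubled = [[2 * el for el in sublist] for sublist in L]   # new list, L unchanged
L[:] = doubled                  # slice assignment replaces the contents in place

print(alias)
```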
I think the following piece of code might be better
for x in L: # iterating over the original list
    for i in range(len(x)): # iterating over the positions of the inner list
        x[i] = x[i] + x[i]
If you insist on using your second method, then you need to store the results in a temporary variable:
L = [[1, 2, 3], [4, 5, 6], [3, 4, 6]]
M = [[0 for y in range(3)] for x in range(3)]
for x in L:
    a = L.index(x)
    for y in L[a]:
        b = L[a].index(y)
        M[a][b] = L[a][b] + L[a][b]
L = M
print L
Output:
[[2, 4, 6], [8, 10, 12], [6, 8, 12]]

Find in python combinations of mutually exclusive sets from a list's elements

In a project I am currently working on I have implemented about 80% of what I want my program to do and I am very happy with the results.
In the remaining 20% I am faced with a problem which puzzles me a bit on how to solve.
Here it is:
I have come up with a list of lists which contain several numbers (arbitrary length)
For example:
listElement[0] = [1, 2, 3]
listElement[1] = [3, 6, 8]
listElement[2] = [4, 9]
listElement[4] = [6, 11]
listElement[n] = [x, y, z...]
where n could reach up to 40,000 or so.
Assuming each list element is a set of numbers (in the mathematical sense), what I would like to do is to derive all the combinations of mutually exclusive sets; that is, like the powerset of the above list elements, but with all non-disjoint-set elements excluded.
So, to continue the example with n=4, I would like to come up with a list that has the following combinations:
newlistElement[0] = [1, 2, 3]
newlistElement[1] = [3, 6, 8]
newlistElement[2] = [4, 9]
newlistElement[4] = [6, 11]
newlistElement[5] = [[1, 2, 3], [4, 9]]
newlistElement[6] = [[1, 2, 3], [6, 11]]
newlistElement[7] = [[1, 2, 3], [4, 9], [6, 11]]
newlistElement[8] = [[3, 6, 8], [4, 9]]
newlistElement[9] = [[4, 9], [6, 11]]
An invalid case, for example would be combination [[1, 2, 3], [3, 6, 8]] because 3 is common in two elements.
Is there any elegant way to do this? I would be extremely grateful for any feedback.
I must also specify that I would not like to do the powerset function, because the initial list could have quite a large number of elements (as I said n could go up to 40000), and taking the powerset with so many elements would never finish.
I'd use a generator:
import itertools
def comb(seq):
    for n in range(1, len(seq) + 1): # lengths 1..len(seq), so the full combination is tested too
        for c in itertools.combinations(seq, n): # all combinations of length n
            if len(set.union(*map(set, c))) == sum(len(s) for s in c): # pairwise disjoint?
                yield list(c)
for c in comb([[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]]):
    print c
This produces:
[[1, 2, 3]]
[[3, 6, 8]]
[[4, 9]]
[[6, 11]]
[[1, 2, 3], [4, 9]]
[[1, 2, 3], [6, 11]]
[[3, 6, 8], [4, 9]]
[[4, 9], [6, 11]]
[[1, 2, 3], [4, 9], [6, 11]]
If you need to store the results in a single list:
print list(comb([[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]]))
The following is a recursive generator:
def comb(input, lst = [], lset = set()):
    if lst:
        yield lst
    for i, el in enumerate(input):
        if lset.isdisjoint(el):
            for out in comb(input[i+1:], lst + [el], lset | set(el)):
                yield out
for c in comb([[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]]):
    print c
This is likely to be a lot more efficient than the other solutions in situations where a lot of sets have common elements (of course in the worst case it still has to iterate over the 2**n elements of the powerset).
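For readers on Python 3, the recursive generator above might be written as follows. This is a sketch, not the answer's own code: the name `disjoint_subsets` is made up, the mutable default arguments are swapped for an immutable tuple and frozenset, and `yield from` replaces the inner loop.

```python
def disjoint_subsets(sets, chosen=(), used=frozenset()):
    """Yield every non-empty selection of pairwise disjoint lists (hypothetical helper)."""
    if chosen:
        yield list(chosen)
    for i, s in enumerate(sets):
        # Only descend when s shares no element with what is already chosen,
        # which prunes every selection that would extend a conflict.
        if used.isdisjoint(s):
            yield from disjoint_subsets(sets[i + 1:], chosen + (s,), used | set(s))


data = [[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]]
found = list(disjoint_subsets(data))
for c in found:
    print(c)
```

The results come out in depth-first order, so a combination is followed directly by its extensions rather than being grouped by length.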
The method used in the program below is similar to a couple of previous answers in excluding not-disjoint sets and therefore usually not testing all combinations. It differs from previous answers by greedily excluding all the sets it can, as early as it can. This allows it to run several times faster than NPE's solution. Here is a time comparison of the two methods, using input data with 200, 400, ... 1000 size-6 sets having elements in the range 0 to 20:
Set size = 6, Number max = 20 NPE method
0.042s Sizes: [200, 1534, 67]
0.281s Sizes: [400, 6257, 618]
0.890s Sizes: [600, 13908, 2043]
2.097s Sizes: [800, 24589, 4620]
4.387s Sizes: [1000, 39035, 9689]
Set size = 6, Number max = 20 jwpat7 method
0.041s Sizes: [200, 1534, 67]
0.077s Sizes: [400, 6257, 618]
0.167s Sizes: [600, 13908, 2043]
0.330s Sizes: [800, 24589, 4620]
0.590s Sizes: [1000, 39035, 9689]
In the above data, the left column shows execution time in seconds. The lists of numbers show how many single, double, or triple unions occurred. Constants in the program specify data set sizes and characteristics.
#!/usr/bin/python
from random import sample, seed
import time
nsets, ndelta, ncount, setsize = 200, 200, 5, 6
topnum, ranSeed, shoSets, shoUnion = 20, 1234, 0, 0
seed(ranSeed)
print 'Set size = {:3d}, Number max = {:3d}'.format(setsize, topnum)
for casenumber in range(ncount):
    t0 = time.time()
    sets, sizes, ssum = [], [0]*nsets, [0]*(nsets+1);
    for i in range(nsets):
        sets.append(set(sample(xrange(topnum), setsize)))
    if shoSets:
        print 'sets = {}, setSize = {}, top# = {}, seed = {}'.format(
            nsets, setsize, topnum, ranSeed)
        print 'Sets:'
        for s in sets: print s
    # Method by jwpat7
    def accrue(u, bset, csets):
        for i, c in enumerate(csets):
            y = u + [c]
            yield y
            boc = bset|c
            ts = [s for s in csets[i+1:] if boc.isdisjoint(s)]
            for v in accrue (y, boc, ts):
                yield v
    # Method by NPE
    def comb(input, lst = [], lset = set()):
        if lst:
            yield lst
        for i, el in enumerate(input):
            if lset.isdisjoint(el):
                for out in comb(input[i+1:], lst + [el], lset | set(el)):
                    yield out
    # Uncomment one of the following 2 lines to select method
    #for u in comb (sets):
    for u in accrue ([], set(), sets):
        sizes[len(u)-1] += 1
        if shoUnion: print u
    t1 = time.time()
    for t in range(nsets-1, -1, -1):
        ssum[t] = sizes[t] + ssum[t+1]
    print '{:7.3f}s Sizes:'.format(t1-t0), [s for (s,t) in zip(sizes, ssum) if t>0]
    nsets += ndelta
Edit: In function accrue, arguments (u, bset, csets) are used as follows:
• u = list of sets in current union of sets
• bset = "big set" = flat value of u = elements already used
• csets = candidate sets = list of sets eligible to be included
Note that if the first line of accrue is replaced by
def accrue(csets, u=[], bset=set()):
and the seventh line by
for v in accrue (ts, y, boc):
(i.e., if parameters are re-ordered and defaults given for u and bset) then accrue can be invoked via list(accrue(listofsets)) to produce its list of compatible unions.
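Put together, the re-ordered version described above might look like the sketch below. It assumes the candidates are already Python sets, as in the benchmark program; the defaults are never mutated, so sharing them between calls is harmless here.

```python
def accrue(csets, u=[], bset=set()):
    """Yield every list of pairwise disjoint sets that can be built from csets."""
    for i, c in enumerate(csets):
        y = u + [c]                  # current union, as a list of sets
        yield y
        boc = bset | c               # all elements used so far
        # Keep only the later candidates that are still compatible.
        ts = [s for s in csets[i + 1:] if boc.isdisjoint(s)]
        for v in accrue(ts, y, boc):
            yield v


listofsets = [set(x) for x in [[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]]]
unions = list(accrue(listofsets))
print(len(unions))
```

Filtering the candidate list before recursing is what lets this variant discard incompatible sets once instead of re-testing them at every level.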
Regarding the ValueError: zero length field name in format error mentioned in a comment as occurring when using Python 2.6, try the following.
# change:
print "Set size = {:3d}, Number max = {:3d}".format(setsize, topnum)
# to:
print "Set size = {0:3d}, Number max = {1:3d}".format(setsize, topnum)
Similar changes (adding appropriate field numbers) may be needed in other formats in the program. Note, the what's new in 2.6 page says “Support for the str.format() method has been backported to Python 2.6”. While it does not say whether field names or numbers are required, it does not show examples without them. By contrast, either way works in 2.7.3.
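As a quick check, the two spellings produce the same string on interpreters that accept both; automatic field numbering was added in Python 2.7 and 3.1. The values below are the constants from the program above.

```python
setsize, topnum = 6, 20

# Automatic numbering: needs Python 2.7+ or 3.1+.
implicit = "Set size = {:3d}, Number max = {:3d}".format(setsize, topnum)
# Explicit field numbers: the form the answer suggests for Python 2.6.
explicit = "Set size = {0:3d}, Number max = {1:3d}".format(setsize, topnum)

print(explicit)
```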
using itertools.combinations, set.intersection and for-else loop:
from itertools import *
lis=[[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]]
def func(lis):
    for i in range(1,len(lis)+1):
        for x in combinations(lis,i):
            s=set(x[0])
            for y in x[1:]:
                if len(s & set(y)) != 0:
                    break
                else:
                    s.update(y)
            else:
                yield x
for item in func(lis):
    print item
output:
([1, 2, 3],)
([3, 6, 8],)
([4, 9],)
([6, 11],)
([1, 2, 3], [4, 9])
([1, 2, 3], [6, 11])
([3, 6, 8], [4, 9])
([4, 9], [6, 11])
([1, 2, 3], [4, 9], [6, 11])
Similar to NPE's solution, but it's without recursion and it returns a list:
def disjoint_combinations(seqs):
    disjoint = []
    for seq in seqs:
        disjoint.extend([(each + [seq], items.union(seq))
                         for each, items in disjoint
                         if items.isdisjoint(seq)])
        disjoint.append(([seq], set(seq)))
    return [each for each, _ in disjoint]
for each in disjoint_combinations([[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]]):
    print each
Result:
[[1, 2, 3]]
[[3, 6, 8]]
[[1, 2, 3], [4, 9]]
[[3, 6, 8], [4, 9]]
[[4, 9]]
[[1, 2, 3], [6, 11]]
[[1, 2, 3], [4, 9], [6, 11]]
[[4, 9], [6, 11]]
[[6, 11]]
One-liner without employing the itertools package.
Here's your data:
lE={}
lE[0]=[1, 2, 3]
lE[1] = [3, 6, 8]
lE[2] = [4, 9]
lE[4] = [6, 11]
Here's the one-liner:
results=[(lE[v1],lE[v2]) for v1 in lE for v2 in lE if (set(lE[v1]).isdisjoint(set(lE[v2])) and v1>v2)]

Return min/max of multidimensional in Python?

I have a list in the form of
[ [[a,b,c],[d,e,f]] , [[a,b,c],[d,e,f]] , [[a,b,c],[d,e,f]] ... ] etc.
I want to return the minimal c value and the maximal c+f value. Is this possible?
For the minimum c:
min(c for (a,b,c),(d,e,f) in your_list)
For the maximum c+f
max(c+f for (a,b,c),(d,e,f) in your_list)
Example:
>>> your_list = [[[1,2,3],[4,5,6]], [[0,1,2],[3,4,5]], [[2,3,4],[5,6,7]]]
>>> min(c for (a,b,c),(d,e,f) in your_list)
2
>>> max(c+f for (a,b,c),(d,e,f) in your_list)
11
11
List comprehension to the rescue
>>> a=[[[1,2,3],[4,5,6]], [[2,3,4],[4,5,6]]]
>>> min([x[0][2] for x in a])
3
>>> max([x[0][2]+ x[1][2] for x in a])
10
You have to map your list to one containing just the items you care about.
Here is one possible way of doing this:
x = [[[5, 5, 3], [6, 9, 7]], [[6, 2, 4], [0, 7, 5]], [[2, 5, 6], [6, 6, 9]], [[7, 3, 5], [6, 3, 2]], [[3, 10, 1], [6, 8, 2]], [[1, 2, 2], [0, 9, 7]], [[9, 5, 2], [7, 9, 9]], [[4, 0, 0], [1, 10, 6]], [[1, 5, 6], [1, 7, 3]], [[6, 1, 4], [1, 2, 0]]]
minc = min(l[0][2] for l in x)
maxcf = max(l[0][2]+l[1][2] for l in x)
The argument passed to the min and max calls is what is called a "generator expression"; it is responsible for mapping the original data to just the values you care about.
Of course it's possible. You've got a list containing a list of two-element lists that turn out to be lists themselves. Your basic algorithm is
for each of the pairs
    if c is less than minimum c so far
        make minimum c so far be c
    if (c+f) is greater than max c+f so far
        make max c+f so far be (c+f)
suppose your list is stored in my_list:
min_c = min(e[0][2] for e in my_list)
max_c_plus_f = max(map(lambda e : e[0][2] + e[1][2], my_list))
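The loop described in the pseudocode above can also be written out directly. This is a minimal sketch with a made-up function name, using the sample data from the first answer; the min()/max() one-liners remain the shorter option.

```python
def min_c_max_cf(data):
    """Return (smallest c, largest c + f) in a single pass (hypothetical helper)."""
    min_c = None
    max_cf = None
    for (a, b, c), (d, e, f) in data:
        if min_c is None or c < min_c:
            min_c = c
        if max_cf is None or c + f > max_cf:
            max_cf = c + f
    return min_c, max_cf


your_list = [[[1, 2, 3], [4, 5, 6]], [[0, 1, 2], [3, 4, 5]], [[2, 3, 4], [5, 6, 7]]]
print(min_c_max_cf(your_list))
```

For an empty list this returns (None, None), whereas min() and max() would raise a ValueError.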
