I have a list containing an even number of floats:
[2.34, 3.45, 4.56, 1.23, 2.34, 7.89, ...].
My task is to calculate the average of elements 1 and 2, of 3 and 4, of 5 and 6, and so on. What is a short way to do this in Python?
data = [2.34, 3.45, 4.56, 1.23, 2.34, 7.89]
print([(a + b) / 2 for a, b in zip(data[::2], data[1::2])])
Explanation:
data[::2] takes every second element starting at index 0: 2.34, 4.56, 2.34
data[1::2] takes every second element starting at index 1: 3.45, 1.23, 7.89
zip combines them into 2-tuples: (2.34, 3.45), (4.56, 1.23), (2.34, 7.89)
If the list is not too long, Paul Draper's answer is easy. If it is really long, you probably want to consider one of two other options.
First, using iterators, you can avoid copying around giant temporary lists:
avgs = [(a + b) / 2 for a, b in zip(*[iter(data)]*2)]
This does effectively the same thing, but lazily, meaning it only has to store one pair at a time in memory (well, three values: a, b, and their average) instead of all of them.
iter(data) creates a lazy iterator over the data.
[iter(data)]*2 creates a list with two references to the same iterator, so when one advances, the other does as well.
Then we're using the same zip and list comprehension that Paul already explained so well. (In Python 2.x, as opposed to 3.x, zip is not lazy, so you'll want to use itertools.izip rather than zip.)
If you don't actually need the result list, but just something you can iterate over, change the outer square brackets to parentheses and it becomes a generator expression, meaning it gives you an iterator instead of a list, and you're not storing anything at all.
Notice that the itertools docs have a recipe for a grouper that does the tricky bit (and you can also find it in the third-party module more-itertools), so you can just write grouper(data, 2) instead of zip(*[iter(data)]*2), which is certainly more readable if you're doing it frequently. If you want more explanation, see How grouper works.
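For reference, the recipe is short; here's a rough sketch of the Python 3 version (2.x would use itertools.izip_longest instead):

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # collect data into fixed-length chunks: grouper('ABCDEF', 2) --> AB CD EF
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

avgs = [(a + b) / 2 for a, b in grouper(data, 2)]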
Alternatively, you could use NumPy arrays instead of lists:
import numpy as np
data_array = np.array(data)
And then you can just do this:
avg_array = (data_array[::2] + data_array[1::2]) / 2
That's not only simpler (no need for explicit loops), it's also about 10x faster, and takes about 1/4th the memory.
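Those numbers are machine- and size-dependent, of course; if you'd rather verify them yourself than take my word for it, a quick timeit sketch along these lines will do (the list size here is an arbitrary choice):

import timeit
setup = ("import numpy as np; "
         "data = [float(i) for i in range(10**6)]; "
         "data_array = np.array(data)")
# time the plain list-comprehension version
print(timeit.timeit("[(a + b) / 2 for a, b in zip(data[::2], data[1::2])]", setup, number=10))
# time the NumPy sliced-array version
print(timeit.timeit("(data_array[::2] + data_array[1::2]) / 2", setup, number=10))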
If you want to generalize this to arbitrary-length groups…
For the iterator solution, it's trivial:
[sum(group) / size for group in zip(*[iter(data)]*size)]
For the NumPy solution, it's a bit trickier. You have to dynamically create something to iterate over data[::size], data[1::size], …, data[size-1::size], like this:
sum(data[x::size] for x in range(size)) / size
There are other ways to do this in NumPy, but as long as size isn't too big, this will be fine—and it has the advantage that the exact same trick will work for Paul Draper's solution:
[sum(group) / size for group in zip(*(data[x::size] for x in range(size)))]
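As a quick sanity check, with the six-element data from above and size = 3, both versions should produce the same two group averages, approximately [3.45, 3.82]:

data = [2.34, 3.45, 4.56, 1.23, 2.34, 7.89]
size = 3
# iterator version
print([sum(group) / size for group in zip(*[iter(data)]*size)])
# NumPy version
import numpy as np
arr = np.array(data)
print(sum(arr[x::size] for x in range(size)) / size)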
s = [2.34, 3.45, 4.56, 1.23, 2.34, 7.89, ...]
res = [(s[i] + s[i+1]) / 2 for i in range(0, len(s) - 1, 2)]
Using NumPy to find the mean of each pair of consecutive values; this is more efficient in terms of time and space complexity:
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6])
k = 2  # pair size, in your case
data1 = np.mean(data.reshape(-1, k), axis=1)
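For this input, data1 comes out as array([1.5, 3.5, 5.5]). One caveat: reshape(-1, k) raises a ValueError when the length of data isn't a multiple of k, so for ragged input you'd have to trim (or pad) first; a minimal sketch of the trimming variant:

n = len(data) - len(data) % k  # drop the incomplete trailing group, if any
data1 = np.mean(data[:n].reshape(-1, k), axis=1)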
Just use indexing for the task.
For a simple example:
avg = []
list1 = [2.34, 3.45, 4.56, 1.23, 2.34, 7.89]
for i in range(len(list1)):
    if i + 1 < len(list1):
        avg.append((list1[i] + list1[i+1]) / 2.0)
avg2 = [j for j in avg[::2]]
avg2 is what you want. This may be easier to understand.
I have a strange problem so I'll just demo it for you to make it easier to understand. I have two lists:
>>> a = [1, 2, 20, 6, 210]
>>> b = [20, 6, 1]
The result I'm looking for is 3 (the position of the last matching item in list a, based on matches in b).
b always has less data, as it contains only duplicates of items from list a. I want to know which item in b matches furthest along in list a. So in this example, it would be 6, as 6 is the furthest along in a.
Is there an easy way to do this? My initial approach is a nested loop, but I suspect there's a simpler approach.
The simplest (if not necessarily most efficient) code is just:
max(b, key=a.index)
or if you want the whole list sorted, not just the maximum value as described, one of:
b.sort(key=a.index)
or
sorted(b, key=a.index)
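With the example lists, these behave as follows (a.index ranks each element of b by where it appears in a):
>>> a = [1, 2, 20, 6, 210]
>>> b = [20, 6, 1]
>>> max(b, key=a.index)
6
>>> sorted(b, key=a.index)
[1, 20, 6]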
If duplicates in the reference list are a possibility and you need to get the last index of a value, replace index with one of these simulations of rindex.
Update: Addressing your requirement for getting the position, not the value, there is an alternative way to solve this that would involve less work. It's basically a modification of one of the better solutions to emulating rindex:
bset = frozenset(b)
last_a_index = len(a) - next(i for i, x in enumerate(reversed(a), 1) if x in bset)
This gets the work down to O(m + n) (vs. O(m * n) for the other solutions), and it short-circuits the loop over a: it scans in reverse until it finds a value that is in b, then immediately produces the index. It can trivially be extended to produce the value with a[last_a_index], since it doesn't really matter where the match sits in b. More complex code, but faster, particularly if a and b might be huge.
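A quick check against the example data (it should print 3, the index of 6, the furthest match in a):
>>> a = [1, 2, 20, 6, 210]
>>> b = [20, 6, 1]
>>> bset = frozenset(b)
>>> len(a) - next(i for i, x in enumerate(reversed(a), 1) if x in bset)
3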
Since you asked for the position:
>>> max(map(a.index, b))
3
I have an array with numbers like this, in this case 5 numbers:
a = array([2.88581812, 2.8930633, 2.85603976, 2.86739916, 2.85736707])
I would like to create an array of 10 elements containing the pairwise differences of all the numbers in the array.
For now I used nested for loops, like this:
diffmean = []
for i in range(len(a) - 1):
    for j in range(i + 1, len(a)):
        diffmean.append(a[i] - a[j])
Obtaining a list with the 10 pairwise differences:
[-0.007245185215707384,
0.029778354907735505,
0.018418952746142025,
0.0284510446999775,
0.03702354012344289,
0.02566413796184941,
0.035696229915684885,
-0.01135940216159348,
-0.0013273102077580035,
0.010032091953835476]
Is there a "pythonic" way to perform this, without a for loop or nested for loops?
You can use combinations from the built-in itertools library. Like:
from itertools import combinations
a = [2.88581812, 2.8930633 , 2.85603976, 2.86739916, 2.85736707]
diffmean = []
for b, c in combinations(a, 2):
    diffmean.append(b - c)
The second argument of the function is the number of elements you want to combine. The function ignores order; the order-sensitive version is permutations, which would return 20 values in this case.
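Since the loop body only appends, the same thing collapses into a list comprehension, if you prefer one:

diffmean = [b - c for b, c in combinations(a, 2)]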
Assuming a is a NumPy array, the below should work, though perhaps a more efficient solution exists, since this calculates each difference twice:
d = np.expand_dims(a, 1) - a
d[np.tril_indices(a.size, k=-1)]
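One subtlety worth noting: tril_indices(..., k=-1) selects the lower triangle, i.e. a[i] - a[j] with i > j, which is the sign-flipped version of the nested loop above. To reproduce the loop's output exactly (i < j, in the same order), take the upper triangle instead:

diffmean = d[np.triu_indices(a.size, k=1)]  # same order and signs as the nested loop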
I'm working with a bit of a riddle:
Given a dictionary with tuples for keys: dictionary = {(p,q): n}, I need to generate a list of new dictionaries of every combination such that neither p nor q repeats within a new dictionary. During the generation of this list of dictionaries, or after, I need to pick one of the dictionaries as the desired one, based on a calculation using the dictionary values.
example of what I mean (but much smaller):
dictionary = {(1,1): 1.0, (1,2): 2.0, (1,3): 2.5, (1,4): 5.0, (2,1): 3.5, (2,2): 6.0, (2,3): 4.0, (2,4): 1.0}
becomes
listofdictionaries = [{(1,1): 1.0, (2,2): 6.0}, {(1,1): 1.0, (2,3): 4.0}, {(1,1): 1.0, (2,4): 1.0}, {(1,2): 2.0, (2,1): 3.5}, {(1,2): 2.0, (2,3): 4.0}, etc.
a dictionary like: {(1,1): 1.0, (2,1): 3.5} is not allowable because q repeats.
Now my sob story: I'm brand new to coding... but I've been trying to write this script to analyze some of my data. I also think it's an interesting algorithm riddle. I wrote something that works with very small dictionaries, but when I input a large one it takes way too long to run (copied below). In my script attempt, I actually generated a list of combinations of tuples instead, which I use to refer back to my master dictionary later in the script. I'll copy it below:
The dictionary tuple keys were generated using two lists: "ExpList1" and "ExpList2"
#first, I generate all the tuple combinations from my ExpDict dictionary
combos = itertools.combinations(ExpDict, min(len(ExpList1), len(ExpList2)))
#then I generate a list of only the combinations that don't repeat p or q
uniquecombolist = []
for foo in combos:
    counter = 0
    listofp = []
    listofq = []
    for bar in foo:
        if bar[0] in listofp or bar[1] in listofq:
            counter += 1
            break
        else:
            listofp.append(bar[0])
            listofq.append(bar[1])
    if counter == 0:
        uniquecombolist.append(foo)
After generating this list, I apply a function to all of the dictionary combinations (iterating through the tuple lists and calling their respective values from the master dictionary) and pick the combination with the smallest resulting value from that function.
I also tried applying the function while iterating through the combinations: picking the unique p,q ones, then checking whether the resulting value is smaller than the previous one and keeping it if it is (instead of generating that list "uniquecombolist", I end up generating just the final tuple list). It still takes too long.
I think the solution lies in embedding the p,q-no-repeat and the final selecting function DURING the generation of combinations. I'm just having trouble wrapping my head around how to actually do this.
Thanks for reading!
Sara
EDIT:
To clarify, I wrote an alternative to my code that applies the final function (basically a root mean square) to the sets of pairs.
combos = itertools.combinations(ExpDict, min(len(ExpList1), len(ExpList2)))
prevRMSD = float('inf')
for foo in combos:
    counter = 0
    distanceSUM = 0
    listofp = []
    listofq = []
    for bar in foo:
        if bar[0] in listofp or bar[1] in listofq:
            counter += 1
            break
        else:
            listofp.append(bar[0])
            listofq.append(bar[1])
            distanceSUM = distanceSUM + RMSDdict[bar]
    RMSD = math.sqrt(distanceSUM**2 / len(foo))
    if counter == 0 and RMSD < prevRMSD:
        chosencombo = foo
        prevRMSD = RMSD
So if I could incorporate the RMS calculation during the set generation and only keep the smallest one, I think that will solve my combinatorial problem.
If I understood your problem, you are interested in all the possible combinations of pairs (p,q) with unique p's and q's, respecting a given set of possible values for p's and q's. In my answer I assume those possible values are, respectively, in list_p and list_q (I think this is what you have in ExpList1 and ExpList2, am I right?)
min_size = min(len(list_p), len(list_q))
combos_p = itertools.combinations(list_p, min_size)
combos_q = itertools.permutations(list_q, min_size)
prod = itertools.product(combos_p, combos_q)
uniquecombolist = [tuple(zip(i[0], i[1])) for i in prod]
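On a toy input (hypothetical lists, just to show the shape of the output), this pairs every combination of p's with every permutation of q's:

import itertools

list_p = [1, 2]
list_q = ['a', 'b', 'c']
min_size = min(len(list_p), len(list_q))  # 2
prod = itertools.product(itertools.combinations(list_p, min_size),
                         itertools.permutations(list_q, min_size))
print([tuple(zip(i[0], i[1])) for i in prod])
# [((1, 'a'), (2, 'b')), ((1, 'a'), (2, 'c')), ((1, 'b'), (2, 'a')), ...]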
Let me know if this is what you're looking for. By the way welcome to SO, great question!
Edit:
If you're concerned that your list may become enormous, you can always use a generator expression and apply whatever function you desire to it, e.g.,
min_size = min(len(list_p), len(list_q))
combos_p = itertools.combinations(list_p, min_size)
combos_q = itertools.permutations(list_q, min_size)
prod = itertools.product(combos_p, combos_q)
uniquecombo = (tuple(zip(y[0], y[1])) for y in prod) # this is now a generator expression, not a list -- observe the parentheses
def your_function(x):
    # do whatever you want with the values; here I'm just printing and returning
    print(x)
    return x
# now prints the minimum value
print(min(itertools.imap(your_function, uniquecombo)))
When you use generators instead of lists, the values are computed as they are needed. Here since we're interested in the minimum value, each value is computed and is discarded right away unless it is the minimum.
This answer assumes that you are trying to generate sets with |S| elements, where S is the smaller pool of tuple coordinates. The larger pool will be denoted L.
Since each set will contain |S| pairs with no repeated elements, each element from S must occur exactly once. From there, match up the |S|-element permutations of L with the ordered elements of S. This will generate all requested sets exhaustively and without repetition.
Note that P(|L|, |S|) is equal to |L|!/(|L|-|S|)!
Depending on the sizes of the tuple coordinate pools, there may be too many permutations to enumerate.
Some code to replicate this enumeration might look like:
from itertools import permutations
S, L = range(2), range(4) # or ExpList1, ExpList2
for p in permutations(L, len(S)):
    print(zip(S, p))
In total, your final code might look something like:
S, L = ExpList1, ExpList2
pairset_maker = lambda p: zip(S, p)
if len(S) > len(L):
    S, L = L, S
    pairset_maker = lambda p: zip(p, S)
n = len(S)
get_perm_value = lambda p: math.sqrt(sum(RMSDdict[t] for t in pairset_maker(p))**2 / n)
min_pairset = min(itertools.permutations(L, n), key=get_perm_value)
If this doesn't get you to within an order of magnitude or two of your desired runtime, then you might need to consider an algorithm that doesn't produce an optimal solution.
I need to create a list comprehension that extracts values from a dict within a list within a list, and my attempts so far are failing me. The object looks like this:
MyList=[[{'animal':'A','color':'blue'},{'animal':'B','color':'red'}],[{'animal':'C','color':'blue'},{'animal':'D','color':'Y'}]]
I want to extract the values for each element in the dict/list/list so that I get two new lists:
Animals=[[A,B],[C,D]]
Colors=[[blue,red],[blue,Y]]
Any suggestions? Doesn't necessarily need to use a list comprehension; that's just been my starting point so far. Thanks!
Animals = [[d['animal'] for d in sub] for sub in MyList]
Colors = [[d['color'] for d in sub] for sub in MyList]
Gives the desired result:
[['A', 'B'], ['C', 'D']]
[['blue', 'red'], ['blue', 'Y']] # No second 'red'.
What I have done here is take each sub-list, then each dictionary, and then access the correct key.
In a single assignment (with a single list comprehension, and the help of map and zip):
Colors, Animals = map(list,
                      zip(*[map(list,
                                zip(*[(d['color'], d['animal']) for d in a]))
                            for a in MyList]))
If you are fine with tuples, you can avoid the two calls to map => list
EDIT:
Let's see it in some details, by decomposing the nested comprehension.
Let's also assume MyList has m elements, for a total of n objects (dictionaries).
[[d for d in sub] for sub in MyList]
This would iterate through every dictionary in the sublists. For each of them, we create a couple with its color property in the first element and its animal property in the second one:
(d['color'], d['animal'])
So far, this will take time proportional to O(n) - exactly n elements will be processed.
print [[(d['color'], d['animal']) for d in sub] for sub in MyList]
Now, for each of the m sublists of the original list, we have one list of couples that we need to unzip, i.e. transform into two lists of singletons. In Python, unzipping is performed using the zip function by passing a variable number of tuples as arguments (the arity of the first tuple determines the number of tuples it outputs). For instance, passing 3 couples, we get two tuples of 3 elements each:
>>> zip((1,2), (3,4), (5,6)) #Prints [(1, 3, 5), (2, 4, 6)]
To apply this to our case, we need to pass array of couples to zip as a variable number of arguments: that's done using the splat operator, i.e. *
[zip(*[(d['color'], d['animal']) for d in sub]) for sub in MyList]
This operation requires going through each sublist once, and in turn through each one of the couples we created in the previous step. Total running time is therefore O(n + n + m) = O(n), with approximatively 2*n + 2*m operations.
So far we have m sublists, each one containing two tuples (the first one will gather all the colors for the sublist, the second one all the animals). To obtain two lists with m tuples each, we apply unzip again
zip(*[zip(*[(d['color'], d['animal']) for d in sub]) for sub in MyList])
This will require an additional m steps - the running time will therefore stay O(n), with approximatively 2*n + 4*m operations.
For sake of simplicity we left out mapping tuples to lists in this analysis - which is fine if you are ok with tuples instead.
Tuples are immutable, however, so that might not be acceptable.
If you need lists of lists, you need to apply the list function to each tuple: once for each of the m sublists (with a total of 2*n elements), and once for each of the 2 first level lists, i.e. Animals and Colors, (which have a total of m elements each). Assuming list requires time proportional to the length of the sequence it is applied to, this extra step requires 2*n + 2*m operations, which is still O(n).
This is an offshoot of a previous question which started to snowball. If I have a matrix A and I want to use the mean/average of each row [1:] values to create another matrix B, but keep the row headings intact, this list comprehension works.
# matrix A with row headings and values
A = [('Apple', 0.95, 0.99, 0.89, 0.87, 0.93),
     ('Bear', 0.33, 0.25, 0.85, 0.44, 0.33),
     ('Crab', 0.55, 0.55, 0.10, 0.43, 0.22)]

# list comprehension
def average(lst):
    return sum(lst) / len(lst)

B = [(a[0], average(a[1:])) for a in A]
Expected outcome
B = [('Apple', 0.926), ('Bear', 0.44), ('Crab', 0.37)]
However, if the dataset has holes in it (symbolized by 'x'), the analysis won't run, i.e.
# matrix A with row headings and values
A = [('Apple', 0.95, x, 0.89, 0.87, 0.93),
     ('Bear', 0.33, 0.25, 0.85, 0.44, 0.33),
     ('Crab', x, 0.55, 0.10, x, 0.22)]
In a matrix where the relative placement of each row and column means something, I can't just delete the "blank" entries, so how can I fill or skip over them and make this work, again? In retrospect, my data has more holes than an old bed sheet.
Also, how would I introduce the filters suggested below into the following definitions (which choke when they hit something that isn't a number) so that hitting an 'X' value would return another 'X' value?
def plus(matrix, i):
    return [row[i] for row in matrix]

def minus(matrix, i):
    return [1.00 - row[i] for row in matrix]
Try this:
B = [(a[0], average(filter(lambda elt: elt != x, a[1:]))) for a in A]
Performance could be improved by using ifilter from itertools, especially for large matrices. This should give you the expected result without changing the average function or modifying A.
EDIT
You may want to consider implementing your matrix differently if it is sparse. If you want to keep your current implementation, you should use the value None to represent missing values. This is the Python equivalent to null that you may be familiar with from other languages.
How you implement the matrix drastically changes how you implement the functions you want, and I'll try to cover your way and an alternate method that could be more efficient for sparse matrices.
For both I'll use your example matrix with holes:
# matrix A with row headings and values
A = [('Apple',0.95, x, 0.89, 0.87, 0.93),
('Bear', 0.33, 0.25, 0.85, 0.44, 0.33),
('Crab', x, 0.55, 0.10, x, 0.22)]
List of lists (or tuples, or whatever)
Like I said before, use None for an empty value:
A = [('Apple', 0.95, None, 0.89, 0.87, 0.93),
('Bear', 0.33, 0.25, 0.85, 0.44, 0.33),
('Crab', None, 0.55, 0.10, None, 0.22)]
B is similar to what I posted earlier:
B = [(a[0], average(filter(lambda x: x is not None, a[1:]))) for a in A]
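With the None-filled A above, this yields roughly [('Apple', 0.91), ('Bear', 0.44), ('Crab', 0.29)] (values rounded; the holes are simply excluded from each average). Note that in Python 3, filter returns an iterator and average calls len on its argument, so there you'd need to wrap it in list(...):

B = [(a[0], average(list(filter(lambda x: x is not None, a[1:])))) for a in A]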
Define column as a generator (iterable) that returns only the filled values:
def column(M, i):
    i += 1  # this will allow you to use zero-based indices if you want
    return (row[i] for row in M if row[i] is not None)
Then you can implement minus more easily and efficiently:
from operator import sub
from itertools import imap, repeat
def minus(M, i):
    return list(imap(sub, repeat(1.0), column(M, i)))
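As a quick sanity check against the None-filled example above (the Crab hole in column 0 is simply skipped):
>>> list(column(A, 0))
[0.95, 0.33]
>>> minus(A, 0)  # roughly [0.05, 0.67], up to float rounding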
Dictionaries
Another way to represent your matrix is with Python dicts. There are some advantages here, especially that you don't waste storage space if you have a lot of holes in the matrix. A con to this method is that it can be more of a pain to create the matrix depending on how you construct it.
Your example might become (whitespace for clarity):
A = [('Apple', dict([(0, 0.95), (2, 0.89), (3, 0.87), (4, 0.93)])),
('Bear', dict([(0, 0.33), (1, 0.25), (2, 0.85), (3, 0.44), (4, 0.33)])),
('Crab', dict([ (1, 0.55), (2, 0.10), (4, 0.22)]))]
This is an ugly way to construct it for sure, but if you are constructing the matrix from other data with a loop it can be a lot nicer.
Now,
B = [(a[0], sum(a[1].itervalues())/len(a[1])) for a in A]
This is uglier than it should be but I'm not so good at Python and I can't get it to do exactly what I want...
You can define a column function which returns a generator that will be more efficient than a list comprehension:
def column(M, i):
    return (row[1][i] for row in M if i in row[1])
minus is done exactly as in the other example.
I have a feeling that there is something I'm not getting about what you want, so feel free to let me know what needs fixing. Also, my lack of Python codez probably didn't do the dictionary version justice, but it can be efficient for sparse matrices. This whole example would be easier if you created a matrix class, then you could switch implementations and see which is better for you. Good luck.
This doesn't work because x is not necessarily a number (you don't tell us what it is either).
So you probably have to write your own summing function that checks whether an item is an x or something else (maybe you'll have to use isinstance(element, int) or isinstance(element, float)).
In average(), use a loop to remove all x values from the list with lst.remove(x) before you calculate the average (you'll have to catch the ValueError that remove() raises when it can't find what it's looking for).
I recommend using something like "" for representing holes, unless you have something made up already.