I have a dictionary:
groups = {'group1': array([450, 449.]), 'group2': array([490, 489.]), 'group3': array([568, 567.])}
I have to iterate over a txt file that I have loaded using numpy.loadtxt() with many values:
subjects =
[1.0, -1.0
2.0, 1.0
3.0, 2.0
...
565.0, 564.0
566.0, 565.0
567.0, 566.0
568.0, 567.0]
What I want to do is check whether the value in the first column of "subjects" equals the value in the second column of each array in my dictionary.
So basically, when the condition is met, that line of "subjects" should be appended to the corresponding array in the dictionary.
The output that I expect is this:
groups = {'group1': array([[450, 449.], [449, 448]]), 'group2': array([[490, 489.], [489, 488]]), 'group3': array([[568, 567.], [567, 566]])}
You have to do it in parts.
Note: I've done this only partially. Analyze the code below; the result is close to what you expect. I've observed that the array ends up flattened rather than split into rows, and I'm reading the documentation to find a way to split it. In the meantime, explore this logic and modify it accordingly.
import numpy as np
from numpy import array
groups = {'group1': array([450, 449.]), 'group2': array([490, 489.]), 'group3': array([568, 567.])}
lv = np.array(list(groups.values()), dtype=object)
lv1 =lv.copy()
lv[:, [1, 0]] = lv[:, [0, 1]]
lv=np.concatenate((lv1, lv), axis=1)
groups = {k : [] for k in groups}
groups = {e: lv[i] for i, e in enumerate(groups)}
print(groups)
Output:
{'group1': array([450.0, 449.0, 449.0, 450.0], dtype=object), 'group2': array([490.0, 489.0, 489.0, 490.0], dtype=object), 'group3': array([568.0, 567.0, 567.0, 568.0], dtype=object)}
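The partial attempt above swaps columns instead of looking rows up in "subjects". A minimal sketch of the lookup the question describes, assuming "subjects" is a 2-D array as returned by numpy.loadtxt(); the data below is a hypothetical stand-in for the file, and the helper name extend_groups is introduced here for illustration only:

```python
import numpy as np

# Hypothetical stand-in for the loaded file: rows of (n, n - 1).
subjects = np.array([[float(i), float(i - 1)] for i in range(1, 569)])

groups = {
    'group1': np.array([450, 449.]),
    'group2': np.array([490, 489.]),
    'group3': np.array([568, 567.]),
}

def extend_groups(groups, subjects):
    """Append to each group the subjects row whose first column equals
    the second column of the group's last row."""
    result = {}
    for name, arr in groups.items():
        rows = np.atleast_2d(arr)  # treat a 1-D group as a single row
        matches = subjects[subjects[:, 0] == rows[-1, 1]]
        result[name] = np.vstack([rows, matches])
    return result

extended = extend_groups(groups, subjects)
print(extended['group1'])
```

If no row matches, the group is returned unchanged as a single-row 2-D array. The comparison uses exact float equality, which is only safe here because the values are whole numbers.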
I am trying to create a list based on some data, but the code I am using is very slow when I run it on large data. So I suspect I am not using all of the Python power for this task. Is there a more efficient and faster way of doing this in Python?
Here is an explanation of the code:
You can think of this problem as a list of games (list_type), each with a list of participating teams, and the scores for each team in the game (list_xx). For each team in the current game, the code first calculates the sum of the score differences from previous games (win_comp_past_difs), including only the pairs present in the current game. Then it updates each pair in the current game with the difference in scores. A defaultdict keeps track of the accumulated score difference for each pair and is updated as each game is played.
In the example below, based on some data, there are for-loops used to create a new variable list_zz.
The data and the for-loop code:
import pandas as pd
import numpy as np
from collections import defaultdict
from itertools import permutations
list_type = [['A', 'B'], ['B'], ['A', 'B', 'C', 'D', 'E'], ['B'], ['A', 'B', 'C'], ['A'], ['B', 'C'], ['A', 'B'], ['C', 'A', 'B'], ['A'], ['B', 'C']]
list_xx = [[1.0, 5.0], [3.0], [2.0, 7.0, 3.0, 1.0, 6.0], [3.0], [5.0, 2.0, 3.0], [1.0], [9.0, 3.0], [2.0, 7.0], [3.0, 6.0, 8.0], [2.0], [7.0, 9.0]]
list_zz= []
# for-loop
wd = defaultdict(float)
for i, x in zip(list_type, list_xx):
    # staff 1
    if len(i) == 1:
        #print('NaN')
        list_zz.append(np.nan)
        continue
    # Pairs and difference generator for current game (i)
    pairs = list(permutations(i, 2))
    dgen = (value[0] - value[1] for value in permutations(x, 2))
    # Sum of differences from previous games, including only pairs of teams in the current game
    for team, result in zip(i, x):
        win_comp_past_difs = sum(wd[key] for key in pairs if key[0] == team)
        #print(win_comp_past_difs)
        list_zz.append(win_comp_past_difs)
    # Update pair differences for current game
    for pair, diff in zip(pairs, dgen):
        wd[pair] += diff
print(list_zz)
Which looks like this:
[0.0,
0.0,
nan,
-4.0,
4.0,
0.0,
0.0,
0.0,
nan,
-10.0,
13.0,
-3.0,
nan,
3.0,
-3.0,
-6.0,
6.0,
-10.0,
-10.0,
20.0,
nan,
14.0,
-14.0]
If you could elaborate on the code to make it more efficient and execute faster, I would really appreciate it.
Without reviewing the overall design of your code, one improvement pops out at me: move your code to a function.
As currently written, all of the variables you use are global variables. Due to the dynamic nature of the global namespace, Python must look up each global variable every time you access it.(1) In CPython, this corresponds to a hash table lookup, which can be expensive, particularly if hash collisions are present.
In contrast, local variables can be known at compile time, and so are stored in a fixed-size array. Accessing these variables therefore only involves dereferencing a pointer, which is comparatively much faster.
With this principle in mind, you should be able to boost your performance (somewhere around a 40% drop in run time) by moving all of your code into a "main" function:
def main():
    ...
    # Your code here

if __name__ == '__main__':
    main()
(1) Source
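Applied to the loop from the question, the suggestion might look like the sketch below. The logic is unchanged; the only differences are that every name is now local to main(), the loop variables are renamed for readability, and float('nan') stands in for np.nan so the sketch needs only the standard library. Any speed-up should be measured on your own data rather than assumed.

```python
from collections import defaultdict
from itertools import permutations

def main():
    list_type = [['A', 'B'], ['B'], ['A', 'B', 'C', 'D', 'E'], ['B'],
                 ['A', 'B', 'C'], ['A'], ['B', 'C'], ['A', 'B'],
                 ['C', 'A', 'B'], ['A'], ['B', 'C']]
    list_xx = [[1.0, 5.0], [3.0], [2.0, 7.0, 3.0, 1.0, 6.0], [3.0],
               [5.0, 2.0, 3.0], [1.0], [9.0, 3.0], [2.0, 7.0],
               [3.0, 6.0, 8.0], [2.0], [7.0, 9.0]]
    list_zz = []
    wd = defaultdict(float)  # accumulated score difference per ordered pair
    for teams, scores in zip(list_type, list_xx):
        if len(teams) == 1:
            list_zz.append(float('nan'))
            continue
        pairs = list(permutations(teams, 2))
        diffs = [s1 - s2 for s1, s2 in permutations(scores, 2)]
        # Past differences, restricted to pairs present in this game.
        for team in teams:
            list_zz.append(sum(wd[p] for p in pairs if p[0] == team))
        # Record this game's differences for later games.
        for pair, diff in zip(pairs, diffs):
            wd[pair] += diff
    return list_zz

if __name__ == '__main__':
    print(main())
```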
Is it possible to rename/alter all the keys of a dict? As an example, let's look at the following dictionary:
a_dict = {'a_var1': 0.05,
'a_var2': 4.0,
'a_var3': 100.0,
'a_var4': 0.3}
I want to remove all the a_ in the keys, so I end up with
a_dict = {'var1': 0.05,
'var2': 4.0,
'var3': 100.0,
'var4': 0.3}
If you want to alter the existing dict instead of creating a new one, you can loop over the keys, pop the old key, and insert the new, modified key with the old value.
>>> for k in list(a_dict):
...     a_dict[k[2:]] = a_dict.pop(k)
...
>>> a_dict
{'var2': 4.0, 'var1': 0.05, 'var3': 100.0, 'var4': 0.3}
(Iterating over list(a_dict) rather than a_dict itself prevents errors due to concurrent modification.)
Strictly speaking, this, too, does not alter the existing keys, but inserts new keys, as it has to re-insert them according to their new hash codes. But it does alter the dictionary as a whole.
As noted in comments, updating the keys in the dict in a loop can in fact be slower than a dict comprehension. If this is a problem, you could also create a new dict using a dict comprehension, and then clear the existing dict and update it with the new values.
>>> b_dict = {k[2:]: a_dict[k] for k in a_dict}
>>> a_dict.clear()
>>> a_dict.update(b_dict)
You can use:
{k[2:]: v for k, v in a_dict.items()}
You can do that easily enough with a dict comprehension.
a_dict = {'a_var1': 0.05,
'a_var2': 4.0,
'a_var3': 100.0,
'a_var4': 0.3}
a_dict = { k[2:]:v for k,v in a_dict.items() }
Result:
{'var1': 0.05, 'var2': 4.0, 'var3': 100.0, 'var4': 0.3}
You could use str.replace to strip the 'a_' from each key so that it matches the desired format.
a_dict = {'a_var1': 0.05,
'a_var2': 4.0,
'a_var3': 100.0,
'a_var4': 0.3}
a_dict = {k.replace('a_', ''): v for k, v in a_dict.items()}
# {'var1': 0.05, 'var2': 4.0, 'var3': 100.0, 'var4': 0.3}
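One caveat with str.replace: it removes every occurrence of 'a_' in a key, not just a leading one, and slicing with k[2:] drops the first two characters whether or not they are the prefix. A minimal sketch that strips only a leading prefix; the helper name strip_prefix and the key 'data_var3' are hypothetical, added here to show the difference:

```python
def strip_prefix(d, prefix):
    """Return a new dict with `prefix` removed from the start of each key.

    Keys that do not start with `prefix` are kept unchanged.
    """
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in d.items()
    }

a_dict = {'a_var1': 0.05, 'a_var2': 4.0, 'data_var3': 100.0}
renamed = strip_prefix(a_dict, 'a_')
print(renamed)

# For comparison, replace() also touches the 'a_' inside 'data_var3'.
print('data_var3'.replace('a_', ''))
```

On Python 3.9 and later, str.removeprefix does the same job per key.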
I have two multi-index dataframes: mean and std
arrays = [['A', 'A', 'B', 'B'], ['Z', 'Y', 'X', 'W']]
mean=pd.DataFrame(data={0.0:[np.nan,2.0,3.0,4.0], 60.0: [5.0,np.nan,7.0,8.0], 120.0:[9.0,10.0,np.nan,12.0]},
index=pd.MultiIndex.from_arrays(arrays, names=('id', 'comp')))
mean.columns.name='Times'
std=pd.DataFrame(data={0.0:[10.0,10.0,10.0,10.0], 60.0: [10.0,10.0,10.0,10.0], 120.0:[10.0,10.0,10.0,10.0]},
index=pd.MultiIndex.from_arrays(arrays, names=('id', 'comp')))
std.columns.name='Times'
My task is to combine them into a nested dictionary: the first level is keyed by id, the second level by comp, and each comp maps to a list of tuples of (time point, mean, std). The result should look like this:
{'A': {
'Z': [(60.0,5.0,10.0),
(120.0,9.0,10.0)],
'Y': [(0.0,2.0,10.0),
(120.0,10.0,10.0)]
},
'B': {
'X': [(0.0,3.0,10.0),
(60.0,7.0,10.0)],
'W': [(0.0,4.0,10.0),
(60.0,8.0,10.0),
(120.0,12.0,10.0)]
}
}
Additionally, when there is a NaN in the data, the triplet is left out; here that applies to A,Z at time 0, A,Y at time 60, and B,X at time 120.
How do I get there? I constructed already a dict of dict of list of tuples for a single line:
iter=0
{mean.index[iter][0]:{mean.index[iter][1]:list(zip(mean.columns, mean.iloc[iter], std.iloc[iter]))}}
>{'A': {'Z': [(0.0, 1.0, 10.0), (60.0, 5.0, 10.0), (120.0, 9.0, 10.0)]}}
Now I need to extend this to a dictionary built by looping over each line (the inner dict) and adding the ids (the outer dict). I started with iterrows and a dict comprehension, but I have problems indexing with the iter ('A','Z') that I get from iterrows(), and building the whole dict iteratively.
{mean.index[iter[1]]:list(zip(mean.columns, mean.loc[iter[1]], std.loc[iter[1]])) for (iter,row) in mean.iterrows()}
creates errors, and I would only have the inner loop
KeyError: 'the label [Z] is not in the [index]'
Thanks!
EDIT: I changed the numbers in this example to floats, because the integers generated before were inconsistent with my real data and would fail in the subsequent JSON dump.
Here is a solution using a defaultdict:
from collections import defaultdict
mean_as_dict = mean.to_dict(orient='index')
std_as_dict = std.to_dict(orient='index')
mean_clean_sorted = {k: sorted([(i, j) for i, j in v.items()]) for k, v in mean_as_dict.items()}
std_clean_sorted = {k: sorted([(i, j) for i, j in v.items()]) for k, v in std_as_dict.items()}
sol = {k: [j + (std_clean_sorted[k][i][1],) for i, j in enumerate(v) if not np.isnan(j[1])] for k, v in mean_clean_sorted.items()}
solution = defaultdict(dict)
for k, v in sol.items():
    solution[k[0]][k[1]] = v
The resulting object is a defaultdict, which you can easily convert to a plain dict:
solution = dict(solution)
con = pd.concat([mean, std])
primary = dict()
for i in set(con.index.values):
    if i[0] not in primary.keys():
        primary[i[0]] = dict()
    primary[i[0]][i[1]] = list()
    for x in con.columns:
        # con holds each (id, comp) row twice: first the mean, then the std
        primary[i[0]][i[1]].append((x, tuple(con.loc[i[0]].loc[i[1]][x].values)))
I found a way to build this nested dict with a single comprehension:
mean_dict_items=mean.to_dict(orient='index').items()
{k[0]:{u[1]:list(zip(mean.columns, mean.loc[u], std.loc[u]))
for u,v in mean_dict_items if (k[0],u[1]) == u} for k,l in mean_dict_items}
creates:
{'A': {'Y': [(0.0, 2.0, 10.0), (60.0, nan, 10.0), (120.0, 10.0, 10.0)],
'Z': [(0.0, nan, 10.0), (60.0, 5.0, 10.0), (120.0, 9.0, 10.0)]},
'B': {'W': [(0.0, 4.0, 10.0), (60.0, 8.0, 10.0), (120.0, 12.0, 10.0)],
'X': [(0.0, 3.0, 10.0), (60.0, 7.0, 10.0), (120.0, nan, 10.0)]}}
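Note that the result above still contains the NaN triplets the question wanted to leave out. A minimal sketch of one way to drop them, assuming mean and std share the same row labels and columns (std is reindexed to mean to be safe); the names nested and std_aligned are introduced here for illustration only:

```python
import numpy as np
import pandas as pd

arrays = [['A', 'A', 'B', 'B'], ['Z', 'Y', 'X', 'W']]
index = pd.MultiIndex.from_arrays(arrays, names=('id', 'comp'))
mean = pd.DataFrame({0.0: [np.nan, 2.0, 3.0, 4.0],
                     60.0: [5.0, np.nan, 7.0, 8.0],
                     120.0: [9.0, 10.0, np.nan, 12.0]}, index=index)
std = pd.DataFrame({0.0: [10.0] * 4, 60.0: [10.0] * 4, 120.0: [10.0] * 4},
                   index=index)

# Align std to mean so rows and columns can be walked in lockstep.
std_aligned = std.reindex(index=mean.index, columns=mean.columns)

nested = {}
rows = zip(mean.index, mean.to_numpy(), std_aligned.to_numpy())
for (id_, comp), mean_vals, std_vals in rows:
    # Keep only the time points where the mean is not NaN.
    triplets = [(float(t), float(m), float(s))
                for t, m, s in zip(mean.columns, mean_vals, std_vals)
                if not np.isnan(m)]
    nested.setdefault(id_, {})[comp] = triplets

print(nested)
```

Converting each value with float() yields plain Python floats, which keeps the result friendly to a later JSON dump.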
I'm looking to convert lists like:
idx = ['id','m','x','y','z']
a = ['1, 1.0, 1.11, 1.11, 1.11']
b = ['2, 2.0, 2.22, 2.22, 2.22']
c = ['3, 3.0, 3.33, 3.33, 3.33']
d = ['4, 4.0, 4.44, 4.44, 4.44']
e = ['5, 5.0, 5.55, 5.55, 5.55']
Into a dictionary where:
dictlist = {
'id':[1,2,3,4,5],
'm':[1.0,2.0,3.0,4.0,5.0],
'x':[1.11,2.22,3.33,4.44,5.55],
'y':[1.11,2.22,3.33,4.44,5.55],
'z':[1.11,2.22,3.33,4.44,5.55]
}
But I would like to be able to do this for a longer set of lists >> 6 elements per list. So I assume a function would be best to be able to create dict for the len of elements in the idx list.
Edit:
in response to g.d.d.c:
I had tried something like:
def make_dict(indx):
    data = dict()
    for item in range(len(indx)):
        # use the key names from indx, not from the data list a
        data.update({indx[item]: ''})
    return data
data = make_dict(idx)
Which worked for making:
{'id': '', 'm': '', 'x': '', 'y': '', 'z': ''}
but then adding each value to the dictionary became an issue.
result = {}
keys = idx
lists = [a, b, c, d, e]
for index, key in enumerate(keys):
    result[key] = []
    for l in lists:
        result[key].append(l[index])
As a single comprehension
Start by grouping your lists {a,b,c,d,e,...} into a list of lists
dataset = [a,b,c,d,e]
idx = ['id','m','x','y','z']
d = { k: [v[i] for v in dataset] for i,k in enumerate(idx) }
The last line builds a dictionary by enumerating over idx: each value becomes a dict key, and its index picks out the matching column from every data sample.
The comprehension works regardless of the number of fields, as long as each list has the same length as idx and holds one value per field.
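In the question, each of a to e is a list holding a single comma-separated string, so the rows have to be parsed before they can be indexed by column. A minimal sketch, assuming the first field is an integer id and the rest are floats, and that the trailing 2,22 in b was meant to be 2.22; the helper name parse_row is hypothetical:

```python
idx = ['id', 'm', 'x', 'y', 'z']
a = ['1, 1.0, 1.11, 1.11, 1.11']
b = ['2, 2.0, 2.22, 2.22, 2.22']
c = ['3, 3.0, 3.33, 3.33, 3.33']
d = ['4, 4.0, 4.44, 4.44, 4.44']
e = ['5, 5.0, 5.55, 5.55, 5.55']

def parse_row(row):
    """Split the single string in `row` into an int id followed by floats."""
    fields = [field.strip() for field in row[0].split(',')]
    return [int(fields[0])] + [float(field) for field in fields[1:]]

rows = [parse_row(r) for r in (a, b, c, d, e)]

# zip(*rows) transposes the rows into columns, one column per key in idx.
dictlist = {key: list(column) for key, column in zip(idx, zip(*rows))}
print(dictlist)
```

Because zip stops at the shortest input, a row with too few fields would silently truncate every column, so it may be worth checking len(row) == len(idx) first.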
You can try this:
idx = ['id','m','x','y','z']
a = [1, 1.0, 1.11, 1.11, 1.11]
b = [2, 2.0, 2.22, 2.22, 2.22]
c = [3, 3.0, 3.33, 3.33, 3.33]
d = [4, 4.0, 4.44, 4.44, 4.44]
e = [5, 5.0, 5.55, 5.55, 5.55]
dictlist = {x[0] : list(x[1:]) for x in zip(idx,a,b,c,d,e)}
print(dictlist)
answer = {}
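# a..e are the one-string lists from the question; parse each into floats
rows = [[float(i) for i in s[0].split(',')] for s in (a, b, c, d, e)]
# transpose the rows so that each key gets its own column
for key, column in zip(idx, zip(*rows)):
    answer[key] = list(column)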