I have a dictionary in this format:
d = {'type 1':[1,2,3],'type 2':['a','b','c']}
It's a symmetric dictionary, in the sense that for each key I'll always have the same number of elements.
Is there a way to loop over it as if it were rows? Something like:
for row in d.rows():
    print row
so that I get the output:
[1]: 1, a
[2]: 2, b
[3]: 3, c
You can zip the .values(), but the order is not guaranteed unless you are using a collections.OrderedDict (or a fairly recent version of PyPy, or Python 3.7+, where a plain dict preserves insertion order):
for row in zip(*d.values()):
    print(row)
e.g. when I run this, I get
('a', 1)
('b', 2)
('c', 3)
but I could have just as easily gotten:
(1, 'a')
(2, 'b')
(3, 'c')
(and I might get it on future runs if hash randomization is enabled).
If you want a specified order and have a finite set of keys that you know up front, you can just zip those items directly:
zip(d['type 1'], d['type 2'])
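The same idea can be sketched for any number of keys. This assumes Python 3; the helper name iter_rows and its keys parameter are made up for illustration and are not part of the dict API.

```python
def iter_rows(d, keys):
    """Return an iterator of row tuples, taking columns in the order given by keys."""
    # zip(*columns) transposes the per-key lists into per-row tuples.
    return zip(*(d[k] for k in keys))


d = {'type 1': [1, 2, 3], 'type 2': ['a', 'b', 'c']}
rows = list(iter_rows(d, ['type 1', 'type 2']))
for i, row in enumerate(rows, start=1):
    print('[%d]: %s' % (i, ', '.join(str(x) for x in row)))
```

Because the key order is passed in explicitly, the result does not depend on how the dictionary happens to order its keys.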
If you want your rows ordered alphabetically by key you might consider:
zip(*(v for k, v in sorted(d.iteritems())))
# [(1, 'a', 'd', 4), (2, 'b', 'e', 5), (3, 'c', 'f', 6)]
If your dataset is huge then consider itertools.izip (on Python 3 the built-in zip is already lazy) or pandas.
import pandas as pd
df = pd.DataFrame(d)
print(df.to_string(index=False))
# type 1  type 2  type 3  type 4
#      1       a       d       4
#      2       b       e       5
#      3       c       f       6
print(df.to_csv(index=False, header=False))
# 1,a,d,4
# 2,b,e,5
# 3,c,f,6
Even if you pass the dictionary to OrderedDict, you will not get the entries back in the order you wrote them, because the plain dictionary has already lost that order. A tuple of tuples can be a good option here.
See the code and output below:
Code (Python 3):
import collections
t = (('type 1',[1,2,3]),
     ('type 2',['a','b','c']),
     ('type 4',[4,5,6]),
     ('type 3',['d','e','f']))
d = collections.OrderedDict(t)
d_items = list(d.items())
values_list = []
for i in range(len(d_items)):
    values_list.append(d_items[i][1])
values_list_length = len(values_list)
single_list_length = len(values_list[0])
for i in range(0,single_list_length):
    for j in range(0,values_list_length):
        print(values_list[j][i],' ',end='')
    print('')
Output:
1 a 4 d
2 b 5 e
3 c 6 f
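As a hedged alternative to the nested index loops, the same rows can probably be produced by transposing the values with zip. This is a sketch assuming Python 3 and the same OrderedDict as above, not the answerer's own code.

```python
from collections import OrderedDict

d = OrderedDict([
    ('type 1', [1, 2, 3]),
    ('type 2', ['a', 'b', 'c']),
    ('type 4', [4, 5, 6]),
    ('type 3', ['d', 'e', 'f']),
])

# zip(*d.values()) pairs up the i-th element of every list, in key order.
lines = [' '.join(str(x) for x in row) for row in zip(*d.values())]
for line in lines:
    print(line)
```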
Given I have the following list:
group  code
A      1
A      2
A      3
B      4
B      5
B      6
B      7
How do I create the following list in a pythonic way?:
group  code  code
A      1     2
A      1     3
A      2     3
B      4     5
B      4     6
B      4     7
B      5     6
B      5     7
B      6     7
I saw another question that suggests using combinations from itertools, but how do I handle the grouping restriction? I don't want all pairs, just the ones within each group.
You need to use itertools.combinations to get all possible code combinations for each group:
from itertools import combinations
A = [1,2,3]
B = [4,5,6,7]
comb_A = combinations(A, 2)
comb_B = combinations(B, 2)
# to see the results, iterate through all the combinations
# For A (the same applies for B)
for comb in comb_A:
    print(comb)
# (1, 2)
# (1, 3)
# (2, 3)
Note: It will be more helpful if you could provide the "list" so we can give a more specific answer
Since you didn't post a MWE of what you tried, I'll show you steps that you can implement yourself.
Build a dictionary of data groups, say group_dict.
Create an empty list for result.
Iterate through the items in group_dict, where each item is a group name and a list of codes for that group.
For each group, use the combinations function to generate all possible combinations of 2 codes.
For each combination, append a tuple of the group name, the first code, and the second code to result.
This should give you a result like this:
[('A', 1, 2), ('A', 1, 3), ('A', 2, 3), ('B', 4, 5), ('B', 4, 6), ('B', 4, 7), ('B', 5, 6), ('B', 5, 7), ('B', 6, 7)]
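The steps above can be sketched as follows. The input format, a list of (group, code) pairs named pairs, is an assumption based on the table in the question, since the actual data structure was not posted.

```python
from collections import defaultdict
from itertools import combinations

# Assumed input: (group, code) pairs matching the table in the question.
pairs = [('A', 1), ('A', 2), ('A', 3),
         ('B', 4), ('B', 5), ('B', 6), ('B', 7)]

# Step 1: build a dictionary mapping each group to its list of codes.
group_dict = defaultdict(list)
for group, code in pairs:
    group_dict[group].append(code)

# Steps 2-5: collect every 2-code combination, but only within a group.
result = []
for group, codes in group_dict.items():
    for first, second in combinations(codes, 2):
        result.append((group, first, second))

print(result)
```

Grouping first is what enforces the restriction: combinations only ever sees the codes of one group at a time.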
Suppose I have two tables A and B where
A is
and B is
I have to merge them in such a way that the new table looks like this
For the common first elements in A and B, I'm taking the weighted average of middle elements of both rows having the common first elements. For example:
A and B have 'AAA' in common, so I'll compute the middle element using (5 * 3 + 5 * 2) / (3 + 2) = 5. Hence the first row of the third table becomes 'AAA', 5, 3 + 2 = 5.
I know it can be done by iterating over all the elements if I use lists, but is there a faster way to do this?
edit from comments: I'm also searching for a simpler way using pandas.DataFrame
The simplest pure Python solution is to use a dictionary-like data structure, with your labels as keys and [value, quantity] pairs as values, where value = weight * quantity:
from collections import defaultdict
A = [
    ('A', 5, 3),
    ('B', 6, 1),
    ('D', 10, 2),
    ('C', 2, 4),
]
B = [
    ('A', 5, 2),
    ('D', 2, 1),
    ('B', 5, 4),
]
# we need to calculate [value, quantity] for each label:
a = {key: [weight * quantity, quantity] for key, weight, quantity in A}
b = {key: [weight * quantity, quantity] for key, weight, quantity in B}
# defaultdict is a dictionary-like structure that creates
# a new item if the key is not found, in our case a new [0, 0] list:
merged = defaultdict(lambda: [0, 0])
# let's sum values and quantities per label
for key, pair in list(a.items()) + list(b.items()):
    # add both value and quantity respectively
    value, quantity = map(sum, zip(merged[key], pair))
    merged[key] = [value, quantity]
# now let's calculate means
for key, (value, quantity) in merged.items():
    mean = value / float(quantity)
    merged[key] = [mean, quantity]
for item in merged.items():
    print(item)
It is even simpler with pandas:
import pandas as pd
# first let's create dataframes
colnames = 'label weight quantity'.split()
A = pd.DataFrame.from_records([
('A', 5, 3),
('B', 6, 1),
('D', 10, 2),
('C', 2, 4),
], columns=colnames)
B = pd.DataFrame.from_records([
('A', 5, 2),
('D', 2, 1),
('B', 5, 4),
], columns=colnames)
# we can just concatenate those DataFrames and do calculation:
df = pd.concat([A, B])
df['value'] = df.weight * df.quantity
# sum each group with the same label
df = df.groupby('label').sum()
del df['weight'] # it got messed up anyway and we don't need it
# and calculate means:
df['mean'] = df.value / df.quantity
print(df[['mean', 'quantity']])
#            mean  quantity
# label
# A      5.000000         5
# B      5.200000         5
# C      2.000000         4
# D      7.333333         3
You can do better than this, but here is a pandas solution
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df1 = pd.DataFrame({'AAA':np.array([5,3]),'BBB':np.array([6,1]),
.....: 'DDD':np.array([10,2]),'CCC':np.array([2,4])})
In [4]: df2 = pd.DataFrame({'AAA':np.array([5,2]),'DDD':np.array([2,1]),
.....: 'BBB':np.array([5,4])})
In [5]: df = pd.concat([df1,df2])
In [6]: df.transpose()
       0  1    0    1
AAA    5  3    5    2
BBB    6  1    5    4
CCC    2  4  NaN  NaN
DDD   10  2    2    1
In [7]: vals = np.nan_to_num(df.values)
In [8]: _mean = (vals[0,:]*vals[1,:]+vals[2,:]*vals[3,:])/(vals[1,:]+vals[3,:])
In [9]: _sum = (vals[1,:]+vals[3,:])
In [10]: result = pd.DataFrame(columns = df.columns,data = [_mean,_sum], index=['mean','sum'])
In [11]: result.transpose()
         mean  sum
AAA  5.000000    5
BBB  5.200000    5
CCC  2.000000    4
DDD  7.333333    3
It probably isn't the most elegant solution, but gets the job done.
I know how to get the most frequent element of list of list, e.g.
a = [[3,4], [3,4],[3,4], [1,2], [1,2], [1,1],[1,3],[2,2],[3,2]]
print max(a, key=a.count)
should print [3, 4] even though the most frequent number is 1 for the first element and 2 for the second element.
My question is how to do the same kind of thing with Pandas.DataFrame.
For example, I'd like to know the implementation of the following method get_max_freq_elem_of_df:
def get_max_freq_elem_of_df(df):
    # do some things
    return freq_list
df = pd.DataFrame([[3,4], [3,4],[3,4], [1,2], [1,2], [1,1],[1,3],[2,2],[4,2]])
x = get_max_freq_elem_of_df(df)
print x # => should print [3,4]
Please note that the DataFrame.mode() method does not work here. For the above example, df.mode() returns [1, 2], not [3, 4].
Update
have explained why DataFrame.mode() doesn't work.
You could use groupby.size and then find the max:
>>> df.groupby([0,1]).size()
0  1
1  1    1
   2    2
   3    1
2  2    1
3  4    3
4  2    1
dtype: int64
>>> df.groupby([0,1]).size().idxmax()
(3, 4)
In python you'd use Counter*:
In [11]: from collections import Counter
In [12]: c = Counter(df.itertuples(index=False))
In [13]: c
Out[13]: Counter({(3, 4): 3, (1, 2): 2, (1, 3): 1, (2, 2): 1, (4, 2): 1, (1, 1): 1})
In [14]: c.most_common(1) # get the top 1 most common items
Out[14]: [((3, 4), 3)]
In [15]: c.most_common(1)[0][0] # get the item (rather than the (item, count) tuple)
Out[15]: (3, 4)
* Note that your solution
max(a, key=a.count)
(although it works) is O(N^2), since on each iteration it needs to iterate through a (to get the count), whereas Counter is O(N).
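For comparison, here is a minimal sketch of the Counter approach applied to the plain list of lists from the question, with no pandas involved. The variable names counts, most_common_row and frequency are illustrative only.

```python
from collections import Counter

a = [[3, 4], [3, 4], [3, 4], [1, 2], [1, 2], [1, 1], [1, 3], [2, 2], [3, 2]]

# Lists are unhashable, so convert each row to a tuple before counting.
counts = Counter(tuple(row) for row in a)

# most_common(1) returns a one-element list of (item, count) pairs.
most_common_row, frequency = counts.most_common(1)[0]
print(list(most_common_row), frequency)
```

This makes a single pass over the data, whereas max(a, key=a.count) rescans the whole list for every element.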
Let's say there is a dictionary
foo = {'b': 1, 'c':2, 'a':3 }
I want to iterate over this dictionary in the order of the appearance of items in the dictionary.
for k,v in foo.items():
    print k, v
prints
a 3
c 2
b 1
If we use the sorted() function:
for k,v in sorted(foo.items()):
    print k, v
prints
a 3
b 1
c 2
But I need them in the order in which they appear in the dictionary, i.e.
b 1
c 2
a 3
How do I achieve this?
Dictionaries have no order. If you want to do that, you need to find some method of sorting in your original list. Or, save the keys in a list in the order they were added and then access the dictionary using those as keys.
From The Python Docs
It is best to think of a dictionary as an unordered set of key: value
pairs, with the requirement that the keys are unique (within one
dictionary).
Example -
>>> testList = ['a', 'c', 'b']
>>> testDict = {'a' : 1, 'c' : 2, 'b' : 3}
>>> for elem in testList:
...     print elem, testDict[elem]
...
a 1
c 2
b 3
Or better yet, use an OrderedDict -
>>> from collections import OrderedDict
>>> testDict = OrderedDict([('a', 1), ('c', 2), ('b', 3)])
>>> for key, value in testDict.items():
...     print key, value
...
a 1
c 2
b 3
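On newer interpreters the picture is different: from Python 3.7 onward the language guarantees that a plain dict preserves insertion order. The sketch below assumes such a version; on anything older the OrderedDict approach above is still the one to use.

```python
import sys

# Assumption: Python 3.7 or later, where dict keeps insertion order.
assert sys.version_info >= (3, 7)

foo = {'b': 1, 'c': 2, 'a': 3}

# Iterating yields the items in the order they were written in the literal.
ordered_items = list(foo.items())
for key, value in ordered_items:
    print(key, value)
```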
Maybe this?
sorted(foo, key=foo.get)
If you want to use the ordering multiple times, use an OrderedDict like people have said. :) If you just want a one-liner for a one-off, change your sort function (Python 2 only, since sorted() no longer accepts a comparison function in Python 3):
sorted(foo.items(), lambda a,b:a[1]-b[1])
You can do this with a one-liner:
>>> sorted(foo.items(), key=lambda x: x[1])
[('b', 1), ('c', 2), ('a', 3)]
An ordered dictionary would have to be used to remember the order in which the items were stored:
>>> from collections import OrderedDict
>>> od = OrderedDict()
>>> od['b'] = 1
>>> od['c'] = 2
>>> od['a'] = 3
>>> print od
OrderedDict([('b', 1), ('c', 2), ('a', 3)])
To see this more directly, the order you used to create the dict is not the order of the dict. The order is indeterminate.
>>> {'b': 1, 'c':2, 'a':3 }
{'a': 3, 'c': 2, 'b': 1}
If you just want to sort them by value, invert the dictionary and sort on the new keys:
sorted_by_keys_dict = dict((y,x) for x,y in foo.iteritems())
for k,v in sorted(sorted_by_keys_dict.items()):
    print v, k
b 1
c 2
a 3
or simply:
for k,v in sorted(dict((y,x) for x,y in foo.iteritems()).items()):
    print v, k
b 1
c 2
a 3
I am trying to do something in PyGTK where I build a list of HBoxes:
self.keyvalueboxes = []
for keyval in range(1,self.keyvaluelen):
    self.keyvalueboxes.append(gtk.HBox(False, 5))
But I then want to run over the list and assign a text entry and a label to each one, both of which are stored in lists.
If your lists are of equal length, use zip:
>>> x = ['a', 'b', 'c', 'd']
>>> y = [1, 2, 3, 4]
>>> z = zip(x,y)
>>> z
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> for l in z: print l[0], l[1]
...
a 1
b 2
c 3
d 4
>>>
Check out http://docs.python.org/library/functions.html#zip. It lets you iterate over two lists at the same time.
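Applied to the question, zip can walk the boxes, labels and entries together. This is a sketch with plain Python stand-ins instead of real gtk widgets; the list names and contents are invented for illustration.

```python
# Stand-ins for the widgets; in the real program these would be gtk objects.
boxes = [[] for _ in range(3)]   # each inner list plays the role of an HBox
labels = ['name', 'host', 'port']
entries = ['entry-1', 'entry-2', 'entry-3']

# zip yields one (box, label, entry) triple per position.
for box, label, entry in zip(boxes, labels, entries):
    box.append(label)   # with PyGTK this would be something like box.pack_start(label)
    box.append(entry)

print(boxes)
```

zip stops at the shortest input, so if the lists can differ in length the extra items are silently skipped.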