Print a dictionary by rows


I have a dictionary in this format:
d = {'type 1':[1,2,3],'type 2':['a','b','c']}
It's a symmetric dictionary, in the sense that for each key I'll always have the same number of elements.
Is there a way to loop over it as if it had rows?
for row in d.rows():
    print row
So that I get the output:
[1]: 1, a
[2]: 2, b
[3]: 3, c

You can zip the .values(), but on Python versions before 3.7 the order is not guaranteed unless you are using a collections.OrderedDict (or a fairly recent version of PyPy):
for row in zip(*d.values()):
    print(row)
e.g. when I run this, I get
('a', 1)
('b', 2)
('c', 3)
but I could have just as easily gotten:
(1, 'a')
(2, 'b')
(3, 'c')
(and I might get it on future runs if hash randomization is enabled).
If you want a specified order and have a finite set of keys that you know up front, you can just zip those items directly:
zip(d['type 1'], d['type 2'])
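As a minimal sketch of the zip approach that also produces the numbered output from the question: this assumes Python 3.7 or later (where plain dicts keep insertion order), and the helper name rows is made up for illustration.

```python
def rows(d):
    """Return an iterator of row tuples from a dict of equal-length lists."""
    # zip(*...) transposes the columns (the dict values) into rows.
    return zip(*d.values())

d = {'type 1': [1, 2, 3], 'type 2': ['a', 'b', 'c']}

# enumerate(..., start=1) supplies the [1], [2], ... labels from the question.
lines = ['[{}]: {}'.format(i, ', '.join(str(x) for x in row))
         for i, row in enumerate(rows(d), start=1)]
print('\n'.join(lines))
# [1]: 1, a
# [2]: 2, b
# [3]: 3, c
```

On older interpreters the same code runs, but the column order within each row may differ.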

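If you want your rows ordered alphabetically by key, you might consider the following (the outputs shown assume d also has 'type 3' and 'type 4' keys):
zip(*(v for k, v in sorted(d.iteritems())))
# [(1, 'a', 'd', 4), (2, 'b', 'e', 5), (3, 'c', 'f', 6)]
If your dataset is huge, then consider itertools.izip (Python 2; zip is already lazy in Python 3) or pandas.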
import pandas as pd
df = pd.DataFrame(d)
print df.to_string(index=False)
# type 1 type 2 type 3 type 4
#      1      a      d      4
#      2      b      e      5
#      3      c      f      6
print df.to_csv(index=False, header=False)
# 1,a,d,4
# 2,b,e,5
# 3,c,f,6

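Even if you pass the dictionary to OrderedDict, you will not get the entries back in their original order (on Python versions where plain dicts are unordered), because the plain dictionary never kept it. So building the OrderedDict from a tuple of tuples can be a good option.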
See the code and output below:
Code (Python 3):
import collections
t = (('type 1',[1,2,3]),
     ('type 2',['a','b','c']),
     ('type 4',[4,5,6]),
     ('type 3',['d','e','f']))
d = collections.OrderedDict(t)
d_items = list(d.items())
values_list = []
for i in range(len(d_items)):
    values_list.append(d_items[i][1])
values_list_length = len(values_list)
single_list_length = len(values_list[0])
for i in range(0,single_list_length):
    for j in range(0,values_list_length):
        print(values_list[j][i],' ',end='')
    print('')
Output:
1 a 4 d
2 b 5 e
3 c 6 f
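The nested index loops above can probably be shortened with zip. This is only a sketch on the same data, not a change to the answer's approach of keeping the order in an OrderedDict; the name table is introduced here for illustration.

```python
from collections import OrderedDict

d = OrderedDict([
    ('type 1', [1, 2, 3]),
    ('type 2', ['a', 'b', 'c']),
    ('type 4', [4, 5, 6]),
    ('type 3', ['d', 'e', 'f']),
])

# Transpose the value lists into rows, then join each row with spaces.
table = [' '.join(str(x) for x in row) for row in zip(*d.values())]
print('\n'.join(table))
# 1 a 4 d
# 2 b 5 e
# 3 c 6 f
```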

Related

Pythonic way of making combinations in groups

Given I have the following list:
group  code
A      1
A      2
A      3
B      4
B      5
B      6
B      7
How do I create the following list in a pythonic way?
group  code  code
A      1     2
A      1     3
A      2     3
B      4     5
B      4     6
B      4     7
B      5     6
B      5     7
B      6     7
I saw another question that suggests using from itertools import combinations. But how do I handle the grouping restriction? I don't want all matches, just the ones within each group.

You need to use itertools to get all possible code combinations for each group:
from itertools import combinations
A = [1,2,3]
B = [4,5,6,7]
comb_A = combinations(A, 2)
comb_B = combinations(B, 2)
# to see the results, iterate through all the combinations
# For A (the same applies for B)
for comb in comb_A:
    print(comb)
# (1, 2)
# (1, 3)
# (2, 3)
Note: It would be more helpful if you could provide the "list" so we can give a more specific answer.

Since you didn't post an MWE of what you tried, I'll show you steps that you can implement yourself:
1. Build a dictionary of data groups, say group_dict.
2. Create an empty list for result.
3. Iterate through the items in group_dict, where each item is a group name and a list of codes for that group.
4. For each group, use the combinations function to generate all possible combinations of 2 codes.
5. For each combination, append a tuple of the group name, the first code, and the second code to result.
This should give you a result like this:
[('A', 1, 2), ('A', 1, 3), ('A', 2, 3), ('B', 4, 5), ('B', 4, 6), ('B', 4, 7), ('B', 5, 6), ('B', 5, 7), ('B', 6, 7)]
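One way the steps listed above might be implemented is sketched below. The input format (a list of (group, code) pairs named data) is an assumption, since the question does not show the actual list, and the group order in the result relies on dicts keeping insertion order (Python 3.7+).

```python
from collections import defaultdict
from itertools import combinations

# Assumed input: (group, code) pairs matching the table in the question.
data = [('A', 1), ('A', 2), ('A', 3), ('B', 4), ('B', 5), ('B', 6), ('B', 7)]

# Build the dictionary of groups: group name -> list of codes.
group_dict = defaultdict(list)
for group, code in data:
    group_dict[group].append(code)

# Pair up codes within each group only, never across groups.
result = []
for group, codes in group_dict.items():
    for first, second in combinations(codes, 2):
        result.append((group, first, second))

print(result)
```

Because combinations is called once per group, a code from A is never paired with a code from B.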

Merge two tables based on intersection in Python

Suppose I have two tables A and B where
A is
and B is
I have to merge them in such a way that the new table looks like this
For the common first elements in A and B, I'm taking the weighted average of middle elements of both rows having the common first elements. For example:
A and B have 'AAA' in common, so I'll compute the middle element using (5 * 3 + 5 * 2) / (3 + 2) = 5. Hence the first row of the third table becomes 'AAA', 5, 3 + 2 = 5.
I know it can be done by iterating over all the elements if I use lists, but is there a faster way to do this?
edit from comments: I'm also searching for a simpler way using pandas.DataFrame

The simplest pure Python solution would be to use a dictionary-like data structure, with your labels as keys and (value, quantity) pairs as values, where value = weight * quantity:
from collections import defaultdict
A = [
    ('A', 5, 3),
    ('B', 6, 1),
    ('D', 10, 2),
    ('C', 2, 4),
]
B = [
    ('A', 5, 2),
    ('D', 2, 1),
    ('B', 5, 4),
]
# we need to calculate (value, quantity) for each label:
a = {key: [weight * quantity, quantity] for key, weight, quantity in A}
b = {key: [weight * quantity, quantity] for key, weight, quantity in B}
# defaultdict is a dictionary-like structure, but able to create
# a new item if key is not found, in our case a new [0, 0] list:
merged = defaultdict(lambda: [0, 0])
# let's sum values and quantities per label
for key, pair in list(a.items()) + list(b.items()):
    # add both value and quantity respectively
    value, quantity = map(sum, zip(merged[key], pair))
    merged[key] = [value, quantity]
# now let's calculate means
for key, (value, quantity) in merged.items():
    mean = value / float(quantity)
    merged[key] = [mean, quantity]
for item in merged.items():
    print(item)
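The same idea can be wrapped in a small function and checked against the 'AAA' example from the question. This is a sketch under assumptions: it targets Python 3 (true division), the function name merge_weighted is hypothetical, and it assumes no label has a total quantity of zero.

```python
from collections import defaultdict

def merge_weighted(*tables):
    """Merge (label, weight, quantity) rows into {label: (mean weight, total quantity)}."""
    # label -> [sum of weight * quantity, sum of quantity]
    totals = defaultdict(lambda: [0, 0])
    for table in tables:
        for label, weight, quantity in table:
            totals[label][0] += weight * quantity
            totals[label][1] += quantity
    # The weighted mean is the summed value divided by the summed quantity.
    return {label: (value / quantity, quantity)
            for label, (value, quantity) in totals.items()}

A = [('AAA', 5, 3), ('BBB', 6, 1), ('DDD', 10, 2), ('CCC', 2, 4)]
B = [('AAA', 5, 2), ('DDD', 2, 1), ('BBB', 5, 4)]
merged = merge_weighted(A, B)
print(merged['AAA'])  # (5.0, 5), i.e. (5 * 3 + 5 * 2) / (3 + 2) = 5
```

Labels that appear in only one table, such as 'CCC', pass through with their own weight and quantity.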
It is even simpler using pandas:
import pandas as pd
# first let's create dataframes
colnames = 'label weight quantity'.split()
A = pd.DataFrame.from_records([
    ('A', 5, 3),
    ('B', 6, 1),
    ('D', 10, 2),
    ('C', 2, 4),
], columns=colnames)
B = pd.DataFrame.from_records([
    ('A', 5, 2),
    ('D', 2, 1),
    ('B', 5, 4),
], columns=colnames)
# we can just concatenate those DataFrames and do the calculation:
df = pd.concat([A, B])
df['value'] = df.weight * df.quantity
# sum each group with the same label
df = df.groupby('label').sum()
del df['weight'] # the summed weights are meaningless and we don't need them
# and calculate means:
df['mean'] = df.value / df.quantity
print(df[['mean', 'quantity']])
#            mean  quantity
# label
# A      5.000000         5
# B      5.200000         5
# C      2.000000         4
# D      7.333333         3

You can do better than this, but here is a pandas solution
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df1 = pd.DataFrame({'AAA':np.array([5,3]),'BBB':np.array([6,1]),
.....: 'DDD':np.array([10,2]),'CCC':np.array([2,4])})
In [4]: df2 = pd.DataFrame({'AAA':np.array([5,2]),'DDD':np.array([2,1]),
.....: 'BBB':np.array([5,4])})
In [5]: df = pd.concat([df1,df2])
In [6]: df.transpose()
      0  1    0    1
AAA   5  3    5    2
BBB   6  1    5    4
CCC   2  4  NaN  NaN
DDD  10  2    2    1
In [7]: vals = np.nan_to_num(df.values)
In [8]: _mean = (vals[0,:]*vals[1,:]+vals[2,:]*vals[3,:])/(vals[1,:]+vals[3,:])
In [9]: _sum = (vals[1,:]+vals[3,:])
In [10]: result = pd.DataFrame(columns = df.columns,data = [_mean,_sum], index=['mean','sum'])
In [11]: result.transpose()
         mean  sum
AAA  5.000000    5
BBB  5.200000    5
CCC  2.000000    4
DDD  7.333333    3
It probably isn't the most elegant solution, but gets the job done.

The most frequent pattern of specific columns in Pandas.DataFrame in python

I know how to get the most frequent element of list of list, e.g.
a = [[3,4], [3,4],[3,4], [1,2], [1,2], [1,1],[1,3],[2,2],[3,2]]
print max(a, key=a.count)
should print [3, 4] even though the most frequent number is 1 for the first element and 2 for the second element.
My question is how to do the same kind of thing with Pandas.DataFrame.
For example, I'd like to know the implementation of the following method get_max_freq_elem_of_df:
def get_max_freq_elem_of_df(df):
    # do some things
    return freq_list
df = pd.DataFrame([[3,4], [3,4],[3,4], [1,2], [1,2], [1,1],[1,3],[2,2],[4,2]])
x = get_max_freq_elem_of_df(df)
print x # => should print [3,4]
Please notice that DataFrame.mode() method does not work. For above example, df.mode() returns [1, 2] not [3,4]
Update
I have explained why DataFrame.mode() doesn't work.

You could use groupby.size and then find the max:
>>> df.groupby([0,1]).size()
0  1
1  1    1
   2    2
   3    1
2  2    1
3  4    3
4  2    1
dtype: int64
>>> df.groupby([0,1]).size().idxmax()
(3, 4)

In python you'd use Counter*:
In [11]: from collections import Counter
In [12]: c = Counter(df.itertuples(index=False))
In [13]: c
Out[13]: Counter({(3, 4): 3, (1, 2): 2, (1, 3): 1, (2, 2): 1, (4, 2): 1, (1, 1): 1})
In [14]: c.most_common(1) # get the top 1 most common items
Out[14]: [((3, 4), 3)]
In [15]: c.most_common(1)[0][0] # get the item (rather than the (item, count) tuple)
Out[15]: (3, 4)
* Note that your solution
max(a, key=a.count)
(although it works) is O(N^2), since on each iteration it needs to iterate through a (to get the count), whereas Counter is O(N).
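For comparison, here is a sketch of the Counter approach applied directly to the list of lists from the question, without pandas. The only adjustment needed is converting each inner list to a tuple, since lists are not hashable.

```python
from collections import Counter

a = [[3, 4], [3, 4], [3, 4], [1, 2], [1, 2], [1, 1], [1, 3], [2, 2], [3, 2]]

# Lists are not hashable, so convert each row to a tuple before counting.
counts = Counter(tuple(row) for row in a)
most_common_row, frequency = counts.most_common(1)[0]

print(most_common_row, frequency)  # (3, 4) 3

# Same answer as the quadratic version from the question.
assert list(most_common_row) == max(a, key=a.count)
```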

Python Sorted by Index

Let's say there is a dictionary:
foo = {'b': 1, 'c':2, 'a':3 }
I want to iterate over this dictionary in the order of the appearance of items in the dictionary.
for k,v in foo.items():
    print k, v
prints
a 3
c 2
b 1
If we use the sorted() function:
for k,v in sorted(foo.items()):
    print k, v
prints
a 3
b 1
c 2
But I need them in the order in which they appear in the dictionary, i.e.
b 1
c 2
a 3
How do I achieve this?

Dictionaries have no order (this holds for plain dicts before Python 3.7, which made insertion order a language guarantee). If you want to do that, you need to find some way of sorting based on your original data. Or, save the keys in a list in the order they are added and then access the dictionary using those as keys.
From the Python docs:
It is best to think of a dictionary as an unordered set of key: value
pairs, with the requirement that the keys are unique (within one
dictionary).
Example -
>>> testList = ['a', 'c', 'b']
>>> testDict = {'a' : 1, 'c' : 2, 'b' : 3}
>>> for elem in testList:
...     print elem, testDict[elem]
...
a 1
c 2
b 3
Or better yet, use an OrderedDict -
>>> from collections import OrderedDict
>>> testDict = OrderedDict([('a', 1), ('c', 2), ('b', 3)])
>>> for key, value in testDict.items():
...     print key, value
...
a 1
c 2
b 3
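On newer interpreters the workaround may not be needed at all. The sketch below assumes Python 3.7 or later, where a plain dict is guaranteed to iterate in insertion order. The name pairs is introduced only for this example.

```python
foo = {'b': 1, 'c': 2, 'a': 3}

# On Python 3.7+ iteration follows insertion order, so no OrderedDict is needed.
pairs = list(foo.items())
for k, v in pairs:
    print(k, v)
# b 1
# c 2
# a 3
```

OrderedDict is still the safer choice if the code has to run on older Python versions.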

Maybe this?
sorted(foo, key=foo.get)

If you want to use the ordering multiple times, use an OrderedDict like people have said. :) If you just want a one-liner for a one-off, change your sort function (this positional comparison function only works in Python 2):
sorted(foo.items(), lambda a,b:a[1]-b[1])

You can do this with a one-liner:
>>> sorted(foo.items(), key=lambda x: x[1])
[('b', 1), ('c', 2), ('a', 3)]

An ordered dictionary would have to be used to remember the order in which the items were stored:
>>> from collections import OrderedDict
>>> od = OrderedDict()
>>> od['b'] = 1
>>> od['c'] = 2
>>> od['a'] = 3
>>> print od
OrderedDict([('b', 1), ('c', 2), ('a', 3)])

To see this more directly, the order you used to create the dict is not the order of the dict. The order is indeterminate.
>>> {'b': 1, 'c':2, 'a':3 }
{'a': 3, 'c': 2, 'b': 1}

If you just want to sort them by their values, invert the dictionary and sort that:
inverted_dict = dict((y,x) for x,y in foo.iteritems())
for k,v in sorted(inverted_dict.items()):
    print v, k
b 1
c 2
a 3
or simply:
for k,v in sorted(dict((y,x) for x,y in foo.iteritems()).items()):
    print v, k
b 1
c 2
a 3

How to iterate over two lists?

I am trying to do something in PyGTK where I build a list of HBoxes:
self.keyvalueboxes = []
for keyval in range(1,self.keyvaluelen):
    self.keyvalueboxes.append(gtk.HBox(False, 5))
But I then want to run over the list and assign a text entry and a label to each one, both of which are stored in lists.

If your lists are of equal length, use zip:
>>> x = ['a', 'b', 'c', 'd']
>>> y = [1, 2, 3, 4]
>>> z = zip(x,y)
>>> z
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> for l in z: print l[0], l[1]
...
a 1
b 2
c 3
d 4
>>>

Check out http://docs.python.org/library/functions.html#zip. It lets you iterate over two lists at the same time.
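zip also accepts more than two iterables, which fits the case in the question (boxes, entries and labels). The sketch below uses plain lists and strings as hypothetical stand-ins for the gtk widgets, so it runs without PyGTK. With real widgets the append calls would be replaced by whatever packing call the toolkit provides.

```python
# Hypothetical stand-ins for the lists of gtk.HBox, entry and label widgets.
boxes = [[], [], []]
entries = ['entry 1', 'entry 2', 'entry 3']
labels = ['label 1', 'label 2', 'label 3']

# zip yields one (box, entry, label) triple per position and stops at the shortest list.
for box, entry, label in zip(boxes, entries, labels):
    box.append(label)
    box.append(entry)

print(boxes)
# [['label 1', 'entry 1'], ['label 2', 'entry 2'], ['label 3', 'entry 3']]
```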
