Values of dictionary = sum of 2dlist at i - python

I have the following 2d list and dictionary:
List2d = [['1', '55', '32', '667' ],
['43', '76', '55', '100'],
['23', '70', '15', '300']]
dictionary = {'New York':0, "London": 0, "Tokyo": 0, "Toronto": 0 }
How do I replace all the values of the dictionary with sums of the columns in List2d? So dictionary will look like this:
dictionary= {'New York' : 67, 'London': 201, 'Tokyo': 102, 'Toronto': 1067}
#67 comes from adding up first column (1+43+23) in 'List2d'
#201 comes from adding up second column (55+76+70) in 'List2d'
#102 comes from adding up third column (32+55+15) in 'List2d'
#1067 comes from adding up fourth column (667+100+300) in 'List2d'

Since Python 3.7, dictionaries are guaranteed to preserve insertion order, so the position of each key can be matched to a column.
You can use enumerate to keep track of each key's position while iterating over the dictionary. Then use i as an index into each row of the 2D list, convert each value to int, and sum the results.
List2d = [['1', '55', '32', '667' ],
['43', '76', '55', '100'],
['23', '70', '15', '300']]
dictionary = {'New York':0, "London": 0, "Tokyo": 0, "Toronto": 0 }
for i, city in enumerate(dictionary.keys()):
    dictionary[city] = sum(int(row[i]) for row in List2d)
print(dictionary)
# {'New York': 67, 'London': 201, 'Tokyo': 102, 'Toronto': 1067}
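As a variation on the loop above, the rows can be transposed with zip(*List2d) so that each column arrives as one tuple and can be paired directly with a key. This is only a sketch: it assumes the keys were inserted in the same order as the columns and that every row has the same length.

```python
List2d = [['1', '55', '32', '667'],
          ['43', '76', '55', '100'],
          ['23', '70', '15', '300']]
dictionary = {'New York': 0, 'London': 0, 'Tokyo': 0, 'Toronto': 0}

# zip(*List2d) yields one tuple per column; the outer zip pairs it with a key.
# Only existing keys are reassigned, so iterating over the dict here is safe.
for city, column in zip(dictionary, zip(*List2d)):
    dictionary[city] = sum(int(value) for value in column)

print(dictionary)
```

Because zip stops at the shortest input, extra keys or extra columns would be silently ignored rather than raising an error.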

Use pandas
#!pip install pandas
import pandas as pd
pd.DataFrame(List2d, columns=dictionary.keys()).astype(int).sum(axis=0).to_dict()
output:
{'New York': 67, 'London': 201, 'Tokyo': 102, 'Toronto': 1067}

Related

Mapping items in one list to the items in another list

I have two lists in Python and I'm trying to map the values of one to the other.
List 1 (coordinates):
['7,16', '71,84', '72,48', '36,52', '75,36', '52,28', '76,44', '11,69', '56,35',
'15,21', '32,74', '88,32', '10,74', '61,34', '51,85', '10,75', '55,96',
'94,12', '34,64', '71,59', '76,75', '25,16', '54,100', '62,1', '60,85',
'16,32', '14,77', '40,78', '2,60', '71,4', '78,91', '100,98', '42,32', '37,49',
'49,34', '3,5', '42,77', '39,60', '38,77', '49,40', '40,53', '57,48', '14,99',
'66,67', '10,9', '97,3', '66,76', '86,68', '10,60', '8,87']
List 2 (index):
[3, 2, 3, 3, 3, 3, 3, 1, 3, 3, 2, 3, 1, 3, 2, 1, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3,
1, 2, 1, 3, 2, 2, 3, 3, 3, 3, 2, 2, 2, 3, 3, 3, 1, 2, 3, 3, 2, 2, 1, 1]
For the output, I need to have something like:
cluster_1: [x, y], [a,b]...
cluster_2: [c, d], [e, f]...
cluster_3: [g, h], [o, j]...
I tried doing this in a dictionary, but I can only get it to put in the last coordinate in the for loop for each value. It also always outputs keys starting from 0, and I'm looking to label them starting from 1.
for i in range(len(patients)):
    # other stuff
    k = 3
    for b in range(k):
        if cluster == (k - b):
            dct['cluster_%s' % b] = patients[i]
which outputs:
{'cluster_0': '97,3', 'cluster_1': '86,68', 'cluster_2': '8,87'}
I've tried using dct['cluster_%s' % b].append(patients[i]) but I get a key error on cluster_0. Any help would be much appreciated!
You can zip your indices and coordinates, then loop over them element-wise and populate a dictionary based on the index.
clusters = {}
for idx, coord in zip(index, coords):
    if idx in clusters:
        clusters[idx].append(coord.split(','))
    else:
        clusters[idx] = [coord.split(',')]
The result, where clusters[i] refers to the i-th cluster:
>>> clusters
{
3: [['7', '16'], ['72', '48'], ['36', '52'], ['75', '36'], ['52', '28'], ['76', '44'], ['56', '35'], ['15', '21'], ['88', '32'], ['61', '34'], ['94', '12'], ['71', '59'], ['25', '16'], ['62', '1'], ['16', '32'], ['71', '4'], ['42', '32'], ['37', '49'], ['49', '34'], ['3', '5'], ['49', '40'], ['40', '53'], ['57', '48'], ['10', '9'], ['97', '3']],
2: [['71', '84'], ['32', '74'], ['51', '85'], ['55', '96'], ['34', '64'], ['76', '75'], ['54', '100'], ['60', '85'], ['40', '78'], ['78', '91'], ['100', '98'], ['42', '77'], ['39', '60'], ['38', '77'], ['66', '67'], ['66', '76'], ['86', '68']],
1: [['11', '69'], ['10', '74'], ['10', '75'], ['14', '77'], ['2', '60'], ['14', '99'], ['10', '60'], ['8', '87']]
}
You could use defaultdict along with zip:
from collections import defaultdict
clusters = defaultdict(list)
for id, value in zip(cluster_indices, values):
    clusters[id].append(value.split(","))
print(dict(clusters)) # {3: [['7', '16'], ['72', '48'], ...
A defaultdict can be converted to a dict with dict(clusters). However, this may not be necessary since defaultdict basically extends dict.
Note: If you need int values, then you may replace value.split(",") with [int(v) for v in value.split(",")] or list(map(int, value.split(","))). Casting them already at this point will save you an iteration later.
from collections import defaultdict
clusters = defaultdict(list)
for id, value in zip(cluster_indices, values):
    clusters[id].append([int(v) for v in value.split(",")])
print(dict(clusters)) # {3: [[7, 16], [72, 48], ...
The group-by behaviour can be extracted into a reusable function groupby, which takes a mapping function (here a lambda) so that any kind of transformation can be applied:
from collections import defaultdict
def groupby(indices, values, map_fn):
    grouped = defaultdict(list)
    for id, value in zip(indices, values):
        grouped[id].append(map_fn(id, value))
    return dict(grouped)
clusters = groupby(cluster_indices, values, lambda _, value: value.split(","))
print(clusters) # {3: [['7', '16'], ['72', '48'], ...
Here is another way, using itertools.groupby (which only groups adjacent items, hence the sort first):
from itertools import groupby
from operator import itemgetter
data = sorted(zip(cluster_indices, values), key=itemgetter(0))
grouped = groupby(data, key=itemgetter(0))
clusters = {
    cluster: [value[1].split(",") for value in list(values)]
    for cluster, values in grouped
}
print(clusters) # {3: [['7', '16'], ['72', '48'], ...
However, I would use the defaultdict approach above or Cory Kramer's answer, as they are simpler and easier to read (and therefore preferable)!
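The question also asked for labels of the form cluster_1, cluster_2, and so on. A minimal sketch of that, building on the defaultdict approach: the names coords and index and the shortened sample (the first eight items of each list in the question) are assumptions made for illustration.

```python
from collections import defaultdict

# Shortened sample: the first eight items of each list from the question.
coords = ['7,16', '71,84', '72,48', '36,52', '75,36', '52,28', '76,44', '11,69']
index = [3, 2, 3, 3, 3, 3, 3, 1]

clusters = defaultdict(list)
for idx, coord in zip(index, coords):
    x, y = coord.split(',')
    # The cluster number itself becomes part of the key, so labels start at 1.
    clusters['cluster_%s' % idx].append([int(x), int(y)])

for label in sorted(clusters):
    print(label, clusters[label])
```

Appending to a defaultdict(list) avoids the KeyError from the question, because a missing key is created with an empty list on first access.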

Python Coercing Mixed Factor Levels to String

I have a certain column in a Pandas DataFrame that has the following unique factor levels:
My_Factor_Levels = [9.0, 0, 6.0, '9', '6', 9, 6, 'DE', '3U', '9.0', '6Z', '6.0', '9.', '6.', '3B', '1U', '2Z', '68', '6B']
Note that there are ten separate values in My_factor_Levels (9.0, 6.0, '9', '6', 9, 6, '9.0', '6.0', '9.', '6.') that represent values from two different factor levels - '9' and '6'. How can I coerce these values to conform to one unique grouping (preferably in string format)? Any help would be much appreciated!
You can try casting values as either int or float and then converting to a set (all unique values in the iterable):
My_Factor_Levels = [9.0, 0, 6.0, '9', '6', 9, 6, 'DE', '3U', '9.0', '6Z', '6.0', '9.', '6.', '3B', '1U', '2Z', '68', '6B']
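def safe_convert(x):
    # Numeric-looking values become the string form of their float value;
    # anything that cannot be parsed as a float is returned unchanged.
    try:
        return str(float(x))
    except (TypeError, ValueError):
        return x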
coerced = set([safe_convert(x) for x in My_Factor_Levels])
>>> coerced
{'0.0', '1U', '2Z', '3B', '3U', '6.0', '68.0', '6B', '6Z', '9.0', 'DE'}
If you would prefer the final coerced result to be a list, simply do list(set(...)) instead.
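If the target labels should be '9' and '6' rather than '9.0' and '6.0', whole-number floats can be turned into integer strings before comparison. This is a sketch under that assumption; normalize_level is a hypothetical helper name, not part of the answer above.

```python
My_Factor_Levels = [9.0, 0, 6.0, '9', '6', 9, 6, 'DE', '3U', '9.0', '6Z',
                    '6.0', '9.', '6.', '3B', '1U', '2Z', '68', '6B']

def normalize_level(x):
    """Return a string label; whole numbers lose their decimal part."""
    try:
        number = float(x)
    except (TypeError, ValueError):
        # Not numeric (e.g. 'DE', '6Z'): keep the value, as a string.
        return str(x)
    if number.is_integer():
        return str(int(number))
    return str(number)

coerced = sorted({normalize_level(x) for x in My_Factor_Levels})
print(coerced)
```

On a DataFrame column the same function could be applied with Series.map, for example df['col'].map(normalize_level), where 'col' stands in for the actual column name.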

Shuffling text from file by group of data

I am looking for an approach in Python or with Unix commands to shuffle a large text data set while keeping lines grouped by the value of their first field, like below:
Input Text:
"ABC", 21, 15, 45
"DEF", 35, 3, 35
"DEF", 124, 33, 5
"QQQ" , 43, 54, 35
"XZZ", 43, 35 , 32
"XZZ", 45 , 35, 32
So it would be randomly shuffled but keep the group together like below
Output Sample-
"QQQ" , 43, 54, 35
"XZZ", 43, 35 , 32
"XZZ", 45 , 35, 32
"ABC", 21, 15, 45
"DEF", 35, 3, 35
"DEF", 124, 33, 5
I found solution by normal shuffling, but I am not getting the idea to keep the group while shuffling.
It is possible to do this using collections.defaultdict. By keying each line on its first field you can collect the lines into groups, and then shuffle only the dictionary's keys, like so:
import random
from collections import defaultdict
# Read all the lines from the file, grouped by their first field
lines = defaultdict(list)
with open("/path/to/file", "r") as in_file:
    for line in in_file:
        s_line = line.split(",")
        # strip() so that '"QQQ" ' and '"QQQ"' count as the same key
        lines[s_line[0].strip()].append(line)
# Randomize the order of the groups; random.sample needs a sequence,
# so the keys are converted to a list first
rnd_keys = random.sample(list(lines), len(lines))
# Write back to the file?
with open("/path/to/file", "w") as out_file:
    for k in rnd_keys:
        for line in lines[k]:
            out_file.write(line)
Hope this helps in your endeavor.
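The same idea can be sketched on an in-memory list of lines, which makes it easy to try without touching a file. The function name shuffle_groups and its rng parameter are assumptions for illustration; passing a seeded random.Random makes the order reproducible.

```python
import random
from collections import defaultdict

def shuffle_groups(lines, rng=random):
    """Shuffle groups of lines, keeping lines with the same first field together."""
    groups = defaultdict(list)
    for line in lines:
        # The first comma-separated field, without surrounding spaces, is the key.
        key = line.split(',', 1)[0].strip()
        groups[key].append(line)
    keys = list(groups)
    rng.shuffle(keys)  # shuffle the group order only, not the lines inside
    return [line for key in keys for line in groups[key]]

sample = [
    '"ABC", 21, 15, 45',
    '"DEF", 35, 3, 35',
    '"DEF", 124, 33, 5',
    '"QQQ" , 43, 54, 35',
    '"XZZ", 43, 35 , 32',
    '"XZZ", 45 , 35, 32',
]
shuffled = shuffle_groups(sample, random.Random(0))
print('\n'.join(shuffled))
```

Because the lines are collected in a dictionary, this also works when lines of the same group are not adjacent in the input.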
You could also store each line from the file into a nested list:
lines = []
with open('input_text.txt') as in_file:
    for line in in_file.readlines():
        line = [x.strip() for x in line.strip().split(',')]
        lines.append(line)
Which gives:
[['"ABC"', '21', '15', '45'], ['"DEF"', '35', '3', '35'], ['"DEF"', '124', '33', '5'], ['"QQQ"', '43', '54', '35'], ['"XZZ"', '43', '35', '32'], ['"XZZ"', '45', '35', '32']]
Then you could group these lists by the first item with itertools.groupby():
import itertools
from operator import itemgetter
grouped = [list(g) for _, g in itertools.groupby(lines, key = itemgetter(0))]
Which gives a list of your grouped items:
[[['"ABC"', '21', '15', '45']], [['"DEF"', '35', '3', '35'], ['"DEF"', '124', '33', '5']], [['"QQQ"', '43', '54', '35']], [['"XZZ"', '43', '35', '32'], ['"XZZ"', '45', '35', '32']]]
Then you could shuffle this with random.shuffle():
import random
random.shuffle(grouped)
Which gives a randomized list of your grouped items intact:
[[['"QQQ"', '43', '54', '35']], [['"ABC"', '21', '15', '45']], [['"XZZ"', '43', '35', '32'], ['"XZZ"', '45', '35', '32']], [['"DEF"', '35', '3', '35'], ['"DEF"', '124', '33', '5']]]
And now all you have to do is flatten the final list and write it to a new file, which you can do with itertools.chain.from_iterable():
with open('output_text.txt', 'w') as out_file:
    for line in itertools.chain.from_iterable(grouped):
        out_file.write(', '.join(line) + '\n')
print(open('output_text.txt').read())
Which gives a new shuffled version of your file:
"QQQ", 43, 54, 35
"ABC", 21, 15, 45
"XZZ", 43, 35, 32
"XZZ", 45, 35, 32
"DEF", 35, 3, 35
"DEF", 124, 33, 5

How to convert a list into a multi value dictionary

I have a list of lists that I want to convert into a dictionary with four values per key, where the first element of each inner list is the key. For example, the list would be:
[['267-10-7633', '66', '85', '74', 0], ['709-40-8165', '71', '96', '34', 0]]
and I want it to be:
{"267-10-7633": [66, 85, 74, 0], "709-40-8165": [71, 96, 34, 0]}
You can use a dictionary comprehension:
lst = [['267-10-7633', '66', '85', '74', 0], ['709-40-8165', '71', '96', '34', 0]]
{k: v for k, *v in lst}
# {'267-10-7633': ['66', '85', '74', 0], '709-40-8165': ['71', '96', '34', 0]}
On Python 2, extended unpacking with *v is not available, so slice instead:
{x[0]: x[1:] for x in lst}
# {'267-10-7633': ['66', '85', '74', 0], '709-40-8165': ['71', '96', '34', 0]}
Didn't take care of the type conversion here. I guess you can refer to other answers as to how to do that.
A dict comprehension to compile the dictionary with a list comprehension to convert the strings to int:
> lst = [['267-10-7633', '66', '85', '74', 0], ['709-40-8165', '71', '96', '34', 0]]
> {l[0]: [int(x) for x in l[1:]] for l in lst}
{'267-10-7633': [66, 85, 74, 0], '709-40-8165': [71, 96, 34, 0]}
A simple and straightforward solution:
lst = [['267-10-7633', '66', '85', '74', 0], ['709-40-8165', '71', '96', '34', 0]]
# create an empty dict
new_dict = {}
# iterate through the list
for item in lst:
    # key is the first element of the inner list
    # value is the list of the remaining elements
    key = item[0]
    value = item[1:]
    new_dict[key] = value
print(new_dict)
A dict comprehension with a nested list comprehension is suitable in this case:
{element[0]: [int(x) for x in element[1:]] for element in\
[['267-10-7633', '66', '85', '74', 0], ['709-40-8165', '71', '96', '34', 0]]}
A simple approach:
your_list = [['267-10-7633', '66', '85', '74', 0], ['709-40-8165', '71', '96', '34', 0]]
dictionary = {}
for item in your_list:
    dictionary[item[0]] = [int(i) for i in item[1:]]
print(dictionary)
With list and dict comprehension:
dictionary = {item[0]: [int(i) for i in item[1:]] for item in your_list}
print(dictionary)
In both cases, output:
{'267-10-7633': [66, 85, 74, 0], '709-40-8165': [71, 96, 34, 0]}
ll = [['267-10-7633', '66', '85', '74', 0], ['709-40-8165', '71', '96', '34', 0]]
mydict = {}
for item in ll:
    key, *values = item
    mydict[key] = values
print(mydict)
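The unpacking loop above keeps the values as strings. A small sketch that combines extended unpacking with the int conversion from the other answers, and additionally rejects repeated keys: the function name rows_to_dict and the duplicate check are assumptions added for illustration, not something the question asked for.

```python
rows = [['267-10-7633', '66', '85', '74', 0],
        ['709-40-8165', '71', '96', '34', 0]]

def rows_to_dict(rows):
    """Map the first element of each row to the remaining elements as ints."""
    result = {}
    for key, *values in rows:
        if key in result:
            # A plain comprehension would silently keep only the last row.
            raise ValueError('duplicate key: %r' % key)
        result[key] = [int(v) for v in values]
    return result

print(rows_to_dict(rows))
```

int() accepts both the numeric strings and the trailing integer 0, so the mixed types in each row need no special handling.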

python pandas groupby about categorical variables

Here is what my dataframe looks like:
import pandas as pd
df = pd.DataFrame([
    ['01', 'aa', '1+', 1200],
    ['01', 'ab', '1+', 1500],
    ['01', 'jn', '1+', 1600],
    ['02', 'bb', '2', 2100],
    ['02', 'ji', '2', 785],
    ['03', 'oo', '2', 5234],
    ['04', 'hg', '5-', 1231],
    ['04', 'kf', '5-', 454],
    ['05', 'mn', '6', 45],
], columns=['faculty_id', 'sub_id', 'default_grade', 'sum'])
df
I want to group by faculty_id, ignore sub_id, aggregate sum, and assign one default_grade to each faculty_id. How can I do that? I know how to group by faculty_id and aggregate sum, but I'm not sure how to assign the default_grade to each faculty_id.
Thanks a lot!
You can apply different functions by column in a groupby using dictionary syntax.
df.groupby('faculty_id').agg({'default_grade': 'first', 'sum': 'sum'})
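For reference, a self-contained sketch of the same aggregation using named aggregation (available since pandas 0.25), which also lets the output columns be renamed. The output column name total is an assumption chosen for illustration, and taking the first default_grade assumes the grade is constant within each faculty_id, as it is in the sample data.

```python
import pandas as pd

df = pd.DataFrame([
    ['01', 'aa', '1+', 1200],
    ['01', 'ab', '1+', 1500],
    ['01', 'jn', '1+', 1600],
    ['02', 'bb', '2', 2100],
    ['02', 'ji', '2', 785],
    ['03', 'oo', '2', 5234],
    ['04', 'hg', '5-', 1231],
    ['04', 'kf', '5-', 454],
    ['05', 'mn', '6', 45],
], columns=['faculty_id', 'sub_id', 'default_grade', 'sum'])

# One row per faculty_id: the first default_grade seen and the summed 'sum' column.
result = df.groupby('faculty_id', as_index=False).agg(
    default_grade=('default_grade', 'first'),
    total=('sum', 'sum'),
)
print(result)
```

sub_id is dropped simply because it is not named in the aggregation.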
