I have two lists in Python and I'm trying to map the values of one to the other.
List 1 (coordinates):
['7,16', '71,84', '72,48', '36,52', '75,36', '52,28', '76,44', '11,69', '56,35',
'15,21', '32,74', '88,32', '10,74', '61,34', '51,85', '10,75', '55,96',
'94,12', '34,64', '71,59', '76,75', '25,16', '54,100', '62,1', '60,85',
'16,32', '14,77', '40,78', '2,60', '71,4', '78,91', '100,98', '42,32', '37,49',
'49,34', '3,5', '42,77', '39,60', '38,77', '49,40', '40,53', '57,48', '14,99',
'66,67', '10,9', '97,3', '66,76', '86,68', '10,60', '8,87']
List 2 (index):
[3, 2, 3, 3, 3, 3, 3, 1, 3, 3, 2, 3, 1, 3, 2, 1, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3,
1, 2, 1, 3, 2, 2, 3, 3, 3, 3, 2, 2, 2, 3, 3, 3, 1, 2, 3, 3, 2, 2, 1, 1]
For the output, I need to have something like:
cluster_1: [x, y], [a,b]...
cluster_2: [c, d], [e, f]...
cluster_3: [g, h], [o, j]...
I tried doing this in a dictionary, but I can only get it to put in the last coordinate in the for loop for each value. It also always outputs keys starting from 0, and I'm looking to label them starting from 1.
for i in range(len(patients)):
    # other stuff
    k = 3
    for b in range(k):
        if cluster == (k - b):
            dct['cluster_%s' % b] = patients[i]
which outputs:
{'cluster_0': '97,3', 'cluster_1': '86,68', 'cluster_2': '8,87'}
I've tried using dct['cluster_%s' % b].append(patients[i]) but I get a key error on cluster_0. Any help would be much appreciated!
You can zip your indices and coordinates, then loop over them element-wise and populate a dictionary based on the index.
clusters = {}
for idx, coord in zip(index, coords):
    if idx in clusters:
        clusters[idx].append(coord.split(','))
    else:
        clusters[idx] = [coord.split(',')]
Result, where clusters[i] refers to the i-th cluster:
>>> clusters
{
3: [['7', '16'], ['72', '48'], ['36', '52'], ['75', '36'], ['52', '28'], ['76', '44'], ['56', '35'], ['15', '21'], ['88', '32'], ['61', '34'], ['94', '12'], ['71', '59'], ['25', '16'], ['62', '1'], ['16', '32'], ['71', '4'], ['42', '32'], ['37', '49'], ['49', '34'], ['3', '5'], ['49', '40'], ['40', '53'], ['57', '48'], ['10', '9'], ['97', '3']],
2: [['71', '84'], ['32', '74'], ['51', '85'], ['55', '96'], ['34', '64'], ['76', '75'], ['54', '100'], ['60', '85'], ['40', '78'], ['78', '91'], ['100', '98'], ['42', '77'], ['39', '60'], ['38', '77'], ['66', '67'], ['66', '76'], ['86', '68']],
1: [['11', '69'], ['10', '74'], ['10', '75'], ['14', '77'], ['2', '60'], ['14', '99'], ['10', '60'], ['8', '87']]
}
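If the keys should be labelled cluster_1, cluster_2, ... as in the question, and the coordinates stored as integers, a minimal sketch could look like the following. The shortened coords and index lists are only the first eight entries of the question's data, and the use of dict.setdefault is one possible choice, not the only one.

```python
# First eight entries of the question's two lists (shortened for the example).
coords = ['7,16', '71,84', '72,48', '36,52', '75,36', '52,28', '76,44', '11,69']
index = [3, 2, 3, 3, 3, 3, 3, 1]

clusters = {}
for idx, coord in zip(index, coords):
    label = f"cluster_{idx}"          # keys follow the cluster numbers, starting at 1
    x, y = coord.split(",")
    # setdefault creates the empty list the first time a label is seen,
    # which avoids the KeyError from calling append on a missing key.
    clusters.setdefault(label, []).append([int(x), int(y)])

for label in sorted(clusters):
    print(f"{label}: {clusters[label]}")
```

The KeyError in the question comes from calling append on a key that has not been created yet; setdefault (or defaultdict) covers that first-insert case.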
You could use defaultdict along with zip:
from collections import defaultdict
clusters = defaultdict(list)
for id, value in zip(cluster_indices, values):
    clusters[id].append(value.split(","))
print(dict(clusters)) # {3: [['7', '16'], ['72', '48'], ...
A defaultdict can be converted to a dict with dict(clusters). However, this may not be necessary since defaultdict is a subclass of dict.
Note: If you need int values, then you may replace value.split(",") with [int(v) for v in value.split(",")] or list(map(int, value.split(","))). Casting them already at this point will save you an iteration later.
from collections import defaultdict
clusters = defaultdict(list)
for id, value in zip(cluster_indices, values):
    clusters[id].append([int(v) for v in value.split(",")])
print(dict(clusters)) # {3: [[7, 16], [72, 48], ...
The group-by behaviour can be extracted into a function groupby so it can be reused (it takes a mapping function, here a lambda, to allow any kind of transformation):
from collections import defaultdict
def groupby(indices, values, map_fn):
    grouped = defaultdict(list)
    for id, value in zip(indices, values):
        grouped[id].append(map_fn(id, value))
    return dict(grouped)
clusters = groupby(cluster_indices, values, lambda _, value: value.split(","))
print(clusters) # {3: [['7', '16'], ['72', '48'], ...
Here is another way, using itertools.groupby:
from itertools import groupby
from operator import itemgetter
data = sorted(zip(cluster_indices, values), key=itemgetter(0))
grouped = groupby(data, key=itemgetter(0))
clusters = {
    cluster: [value[1].split(",") for value in list(values)]
    for cluster, values in grouped
}
print(clusters) # {1: [['11', '69'], ['10', '74'], ...
However, I would use the defaultdict approach above or Cory Kramer's answer, as it is simpler and easier to read (and therefore preferable)!
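The sorted() call in the itertools.groupby version matters because groupby only merges consecutive items with the same key. The small sketch below, using a made-up four-item pairs list, illustrates what can go wrong if the sort is skipped.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical miniature input: (cluster index, coordinate string) pairs.
pairs = [(3, "7,16"), (2, "71,84"), (3, "72,48"), (1, "11,69")]

# Without sorting, the two cluster-3 items are not adjacent, so groupby
# yields cluster 3 twice and the later run overwrites the earlier one.
unsorted_result = {
    key: [value for _, value in group]
    for key, group in groupby(pairs, key=itemgetter(0))
}

# Sorting by the key first makes equal keys adjacent.
sorted_result = {
    key: [value for _, value in group]
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))
}

print(unsorted_result)  # {3: ['72,48'], 2: ['71,84'], 1: ['11,69']}
print(sorted_result)    # {1: ['11,69'], 2: ['71,84'], 3: ['7,16', '72,48']}
```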
I have a certain column in a Pandas DataFrame that has the following unique factor levels:
My_Factor_Levels = [9.0, 0, 6.0, '9', '6', 9, 6, 'DE', '3U', '9.0', '6Z', '6.0', '9.', '6.', '3B', '1U', '2Z', '68', '6B']
Note that there are ten separate values in My_Factor_Levels (9.0, 6.0, '9', '6', 9, 6, '9.0', '6.0', '9.', '6.') that represent values from two different factor levels, '9' and '6'. How can I coerce these values to conform to one unique grouping (preferably in string format)? Any help would be much appreciated!
You can try converting each value to float (keeping the original value when that fails), turning the result back into a string, and then collecting everything into a set (which keeps only the unique values):
My_Factor_Levels = [9.0, 0, 6.0, '9', '6', 9, 6, 'DE', '3U', '9.0', '6Z', '6.0', '9.', '6.', '3B', '1U', '2Z', '68', '6B']
def safe_convert(x):
    try:
        return str(float(x))
    except (TypeError, ValueError):
        # not numeric: keep the value unchanged
        return x
coerced = set(safe_convert(x) for x in My_Factor_Levels)
>>> coerced
{'0.0', '1U', '2Z', '3B', '3U', '6.0', '68.0', '6B', '6Z', '9.0', 'DE'}
If you would prefer the final coerced result to be a list, simply do list(set(...)) instead.
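Since the question prefers labels such as '9' and '6' rather than '9.0' and '6.0', a variation is sketched below. The helper name normalize_level is hypothetical, and the sketch assumes that whole-number floats should be written without a decimal point.

```python
My_Factor_Levels = [9.0, 0, 6.0, '9', '6', 9, 6, 'DE', '3U', '9.0', '6Z',
                    '6.0', '9.', '6.', '3B', '1U', '2Z', '68', '6B']

def normalize_level(x):
    """Return a string label; numeric values with no fractional part lose the '.0'."""
    try:
        number = float(x)
    except (TypeError, ValueError):
        return str(x)            # not numeric: keep the text as-is
    if number.is_integer():
        return str(int(number))  # 9.0, '9.', '9.0' and 9 all become '9'
    return str(number)

levels = sorted({normalize_level(x) for x in My_Factor_Levels})
print(levels)
# ['0', '1U', '2Z', '3B', '3U', '6', '68', '6B', '6Z', '9', 'DE']
```

On a DataFrame column, the same function could be applied with Series.map, for example df[col].map(normalize_level), where df and col stand in for your own DataFrame and column name.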
I am looking for an approach in Python or a Unix command to shuffle a large text data set while keeping lines grouped by the value of their first field, like below:
Input Text:
"ABC", 21, 15, 45
"DEF", 35, 3, 35
"DEF", 124, 33, 5
"QQQ" , 43, 54, 35
"XZZ", 43, 35 , 32
"XZZ", 45 , 35, 32
So it would be randomly shuffled but keep each group together, like below.
Output sample:
"QQQ" , 43, 54, 35
"XZZ", 43, 35 , 32
"XZZ", 45 , 35, 32
"ABC", 21, 15, 45
"DEF", 35, 3, 35
"DEF", 124, 33, 5
I found a solution for normal shuffling, but I can't work out how to keep the groups together while shuffling.
It is possible to do this using collections.defaultdict. By keying each line on its first field, you can group the lines and then shuffle only the dictionary's keys, like so:
import random
from collections import defaultdict
# Read all the lines from the file, grouped by their first field
lines = defaultdict(list)
with open("/path/to/file", "r") as in_file:
    for line in in_file:
        s_line = line.split(",")
        lines[s_line[0].strip()].append(line)
# Randomize the order of the groups (random.sample needs a sequence,
# so the keys are copied into a list first)
rnd_keys = random.sample(list(lines), len(lines))
# Write back to the file?
with open("/path/to/file", "w") as out_file:
    for k in rnd_keys:
        for line in lines[k]:
            out_file.write(line)
Hope this helps in your endeavor.
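To try the idea without touching any files, here is a small in-memory sketch of the same grouping-then-shuffling approach. The function name shuffle_groups and the optional rng parameter are assumptions made for this illustration; the sample lines are the ones from the question.

```python
import random
from collections import defaultdict

def shuffle_groups(lines, rng=None):
    """Shuffle the order of the groups while keeping each group's lines together."""
    rng = rng or random.Random()
    groups = defaultdict(list)
    for line in lines:
        key = line.split(",")[0].strip()   # first field, without stray spaces
        groups[key].append(line)
    keys = list(groups)
    rng.shuffle(keys)                      # only the group order is randomized
    return [line for key in keys for line in groups[key]]

sample = [
    '"ABC", 21, 15, 45',
    '"DEF", 35, 3, 35',
    '"DEF", 124, 33, 5',
    '"QQQ" , 43, 54, 35',
    '"XZZ", 43, 35 , 32',
    '"XZZ", 45 , 35, 32',
]

# A seeded generator is used here only to make the run repeatable.
shuffled = shuffle_groups(sample, rng=random.Random(0))
print("\n".join(shuffled))
```

Because the lines are collected in a dictionary first, lines that share a first field end up together even if they were not adjacent in the input.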
You could also store each line from the file into a nested list:
lines = []
with open('input_text.txt') as in_file:
    for line in in_file.readlines():
        line = [x.strip() for x in line.strip().split(',')]
        lines.append(line)
Which gives:
[['"ABC"', '21', '15', '45'], ['"DEF"', '35', '3', '35'], ['"DEF"', '124', '33', '5'], ['"QQQ"', '43', '54', '35'], ['"XZZ"', '43', '35', '32'], ['"XZZ"', '45', '35', '32']]
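Then you could group these lists by the first item with itertools.groupby(). Note that groupby() only merges consecutive items, so this assumes lines with the same first item are already adjacent, as they are here: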
import itertools
from operator import itemgetter
grouped = [list(g) for _, g in itertools.groupby(lines, key = itemgetter(0))]
Which gives a list of your grouped items:
[[['"ABC"', '21', '15', '45']], [['"DEF"', '35', '3', '35'], ['"DEF"', '124', '33', '5']], [['"QQQ"', '43', '54', '35']], [['"XZZ"', '43', '35', '32'], ['"XZZ"', '45', '35', '32']]]
Then you could shuffle this with random.shuffle():
import random
random.shuffle(grouped)
Which gives a randomized list with your grouped items intact:
[[['"QQQ"', '43', '54', '35']], [['"ABC"', '21', '15', '45']], [['"XZZ"', '43', '35', '32'], ['"XZZ"', '45', '35', '32']], [['"DEF"', '35', '3', '35'], ['"DEF"', '124', '33', '5']]]
And now all you have to do is flatten the final list and write it to a new file, which you can do with itertools.chain.from_iterable():
with open('output_text.txt', 'w') as out_file:
    for line in itertools.chain.from_iterable(grouped):
        out_file.write(', '.join(line) + '\n')
print(open('output_text.txt').read())
Which gives a new shuffled version of your file:
"QQQ", 43, 54, 35
"ABC", 21, 15, 45
"XZZ", 43, 35, 32
"XZZ", 45, 35, 32
"DEF", 35, 3, 35
"DEF", 124, 33, 5
I have a list of lists that I want to convert into a dictionary with four values per key, where the first value in each list is the key. So for example the list would be:
[['267-10-7633', '66', '85', '74', 0], ['709-40-8165', '71', '96', '34', 0]]
and I want it to be
{"267-10-7633": [66,85,74,0], "709-40-8165": [71,96,34,0]}
You can use a dictionary comprehension:
lst = [['267-10-7633', '66', '85', '74', 0], ['709-40-8165', '71', '96', '34', 0]]
{k: v for k, *v in lst}
# {'267-10-7633': ['66', '85', '74', 0], '709-40-8165': ['71', '96', '34', 0]}
If you are on Python 2, starred unpacking (*v) is not available, so slice each inner list instead:
{x[0]: x[1:] for x in lst}
# {'267-10-7633': ['66', '85', '74', 0], '709-40-8165': ['71', '96', '34', 0]}
This does not handle the type conversion; see the other answers for how to do that.
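For completeness, the starred unpacking can be combined with an int() conversion in the same comprehension. This is only a sketch for Python 3, and it assumes every value after the key can be converted with int().

```python
lst = [['267-10-7633', '66', '85', '74', 0], ['709-40-8165', '71', '96', '34', 0]]

# key takes the first element, rest collects the remaining ones as a list
converted = {key: [int(x) for x in rest] for key, *rest in lst}
print(converted)
# {'267-10-7633': [66, 85, 74, 0], '709-40-8165': [71, 96, 34, 0]}
```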
A dict comprehension to compile the dictionary with a list comprehension to convert the strings to int:
>>> lst = [['267-10-7633', '66', '85', '74', 0], ['709-40-8165', '71', '96', '34', 0]]
>>> {l[0]: [int(x) for x in l[1:]] for l in lst}
{'267-10-7633': [66, 85, 74, 0], '709-40-8165': [71, 96, 34, 0]}
A simple and straightforward solution:
lst = [['267-10-7633', '66', '85', '74', 0], ['709-40-8165', '71', '96', '34', 0]]
# create an empty dict
new_dict = {}
# iterate through the list
for item in lst:
    # key is the first element in the inner list
    # value is the list of remaining elements
    key = item[0]
    value = item[1:]
    new_dict[key] = value
print(new_dict)
A dict comprehension with a nested list comprehension is suitable in this case:
{element[0]: [int(x) for x in element[1:]] for element in
 [['267-10-7633', '66', '85', '74', 0], ['709-40-8165', '71', '96', '34', 0]]}
A simple approach:
your_list = [['267-10-7633', '66', '85', '74', 0], ['709-40-8165', '71', '96', '34', 0]]
dictionary = {}
for item in your_list:
    dictionary[item[0]] = [int(i) for i in item[1:]]
print(dictionary)
With list and dict comprehension:
dictionary = {item[0]: [int(i) for i in item[1:]] for item in your_list}
print(dictionary)
In both cases, output:
{'267-10-7633': [66, 85, 74, 0], '709-40-8165': [71, 96, 34, 0]}
ll = [['267-10-7633', '66', '85', '74', 0], ['709-40-8165', '71', '96', '34', 0]]
mydict = {}
for item in ll:
    key, *values = item
    mydict[key] = values
print(mydict)
Here is what my dataframe looks like:
df = pd.DataFrame([
['01', 'aa', '1+', 1200],
['01', 'ab', '1+', 1500],
['01', 'jn', '1+', 1600],
['02', 'bb', '2', 2100],
['02', 'ji', '2', 785],
['03', 'oo', '2', 5234],
['04', 'hg', '5-', 1231],
['04', 'kf', '5-', 454],
['05', 'mn', '6', 45],
], columns=['faculty_id', 'sub_id', 'default_grade', 'sum'])
df
I want to group by faculty_id, ignore sub_id, aggregate sum, and assign one default_grade to each faculty_id. How can I do that? I know how to group by faculty_id and aggregate sum, but I'm not sure how to assign the default_grade to each faculty.
Thanks a lot!
You can apply different functions by column in a groupby using dictionary syntax.
df.groupby('faculty_id').agg({'default_grade': 'first', 'sum': 'sum'})
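A self-contained version of this, using the question's data, might look like the sketch below. It uses pandas named aggregation instead of the dictionary syntax so the output column can be renamed; the name total and the choice of 'first' for default_grade are assumptions (they presume the grade is constant within each faculty_id, as it is in the sample).

```python
import pandas as pd

df = pd.DataFrame([
    ['01', 'aa', '1+', 1200],
    ['01', 'ab', '1+', 1500],
    ['01', 'jn', '1+', 1600],
    ['02', 'bb', '2', 2100],
    ['02', 'ji', '2', 785],
    ['03', 'oo', '2', 5234],
    ['04', 'hg', '5-', 1231],
    ['04', 'kf', '5-', 454],
    ['05', 'mn', '6', 45],
], columns=['faculty_id', 'sub_id', 'default_grade', 'sum'])

# as_index=False keeps faculty_id as a regular column in the result.
result = df.groupby('faculty_id', as_index=False).agg(
    default_grade=('default_grade', 'first'),  # first grade seen per faculty
    total=('sum', 'sum'),                      # sum of the 'sum' column
)
print(result)
```

sub_id is dropped simply because it is not listed in the aggregation.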