Delete item from a list of Dictionaries - python

I have a quick question.
I have a long list of dictionaries that looks like this:
mydict = [{'id': '450118',
'redcap_event_name': 'preliminary_arm_1',
'redcap_repeat_instrument': '',
'redcap_repeat_instance': '',
'date_today': '2022-11-04',
'timestamp': '2022-11-04 10:49',
'doc_source': '1',
'hosp_id': '45',
'study_id': '18',
'participant_name': 'CHAR WA WAN',
'ipno': '141223',
'dob': '2020-06-30'},
{'id': '450118',
'redcap_event_name': 'preliminary_arm_1',
'redcap_repeat_instrument': '',
'redcap_repeat_instance': '',
'date_today': '2022-11-04',
'timestamp': '2022-11-04 10:49',
'doc_source': '1',
'hosp_id': '45',
'study_id': '01118',
'participant_name': 'CHARIT',
'ipno': '1413',
'dob': '2020-06-30'}]
Now I want to do a simple thing: delete these three keys from every dictionary: 'redcap_event_name', 'redcap_repeat_instrument', 'redcap_repeat_instance'.
I have tried this code, but it's not deleting anything:
for k in mydict:
    for j in k.keys():
        if j == 'preliminary_arm_1':
            del j
The result I want is the original list of dictionaries without the three keys mentioned above. Any help will be highly appreciated.

You can iterate over each dict and, for each one, over the keys you want to delete, removing every key with dict.pop.
del_keys = ['redcap_event_name','redcap_repeat_instrument','redcap_repeat_instance']
for dct in mydict:
    for k in del_keys:
        # pop with a default removes the key if it is present and does nothing otherwise
        dct.pop(k, None)
print(mydict)
Output:
[{'id': '450118',
'date_today': '2022-11-04',
'timestamp': '2022-11-04 10:49',
'doc_source': '1',
'hosp_id': '45',
'study_id': '18',
'participant_name': 'CHAR WA WAN',
'ipno': '141223',
'dob': '2020-06-30'},
{'id': '450118',
'date_today': '2022-11-04',
'timestamp': '2022-11-04 10:49',
'doc_source': '1',
'hosp_id': '45',
'study_id': '01118',
'participant_name': 'CHARIT',
'ipno': '1413',
'dob': '2020-06-30'}]

Maybe this helps; it builds new dictionaries instead of modifying the originals:
[{j: k[j] for j in k.keys() if j not in ['redcap_event_name','redcap_repeat_instrument','redcap_repeat_instance']}
 for k in mydict]
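For reference, the loop in the question fails for two reasons: it compares each key to 'preliminary_arm_1', which is a value rather than a key, and del j only unbinds the loop variable without touching the dictionary. Below is a minimal sketch of a non-mutating variant; the helper name drop_keys and the shortened sample records are assumptions for illustration, not part of the original answers.

```python
def drop_keys(records, unwanted):
    """Return new dicts without the unwanted keys; the input is left untouched."""
    unwanted = set(unwanted)  # set membership tests are fast
    return [{k: v for k, v in rec.items() if k not in unwanted} for rec in records]


# Shortened, hypothetical sample shaped like the data in the question.
sample = [
    {'id': '450118', 'redcap_event_name': 'preliminary_arm_1',
     'redcap_repeat_instrument': '', 'redcap_repeat_instance': '',
     'study_id': '18'},
    {'id': '450118', 'redcap_event_name': 'preliminary_arm_1',
     'redcap_repeat_instrument': '', 'redcap_repeat_instance': '',
     'study_id': '01118'},
]
del_keys = ['redcap_event_name', 'redcap_repeat_instrument', 'redcap_repeat_instance']

cleaned = drop_keys(sample, del_keys)
print(cleaned)
```

Because drop_keys returns a new list, the original records stay available if they are still needed elsewhere.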

Related

TypeError: string indices must be integers (working with pandas Dataframe)

Humbly asking again for the community's help.
I have a task in Data Analysis, to research the connections between different columns of the dataset given. For that sake I have to edit the columns I want to work with. The column I need contains data, which looks like a list of dictionaries, but it's actually a string. So I have to edit it to take 'name' values from those former "dictionaries".
The code below represents my magical rituals to take the "name" values from that string and save them in another column as a string containing only those "name" values collected in a list. After that I would apply the function to the whole column and group it by the unique combinations of those strings. (The maximum task was to split the "name" values into several additional columns and sort by all of them later; but a huge string in the source column (df['specializations']) can contain any number of "dictionaries", so I can't know in advance how many additional columns to create, and I gave up on that idea.)
Typical string with pseudo-list of dictionaries looks like that (the number of those "dictionaries" varies):
[{'id': '1.172', 'name': 'Beginner', 'profarea_id': '1', 'profarea_name': 'IT'}, {'id': '1.117', 'name': 'Testing', 'profarea_id': '1', 'profarea_name': 'IT'}, {'id': '15.93', 'name': 'IT', 'profarea_id': '15', 'profarea_name': 'Beginner'}]
import re
def get_names_values(df):
    for a in df['specializations']:
        for r in (("\'", "\""), ("[", ""), ("]", ""), ("}", "")):
            a = a.replace(*r)
        a = re.split("{", a)
        m = 0
        while m < len(a):
            if a[m] in ('', ': ', ', '):
                del a[m]
            m += 1
        a = "".join(a)
        a = re.split("\"", a)
        n = 0
        while n < len(a):
            if a[n] in ('', ': ', ', '):
                del a[n]
            n += 1
        nameslist = []
        for num in range(len(a)):
            if a[num] == 'name':
                nameslist.append(a[num+1])
    return str(nameslist)
df['specializations_names'] = df['specializations'].fillna('{}').apply(get_names_values)
df['specializations_names']
The problem arises at for a in df['specializations']:, which raises
TypeError: string indices must be integers. I checked that loop separately (with print(a)), and it gave me the proper result; I also tried it via:
for k in range(len(df)):
    a = df['specializations'][k]
and again it worked separately as I needed, but inside my function it raises TypeError.
I feel like giving up on the ['specializations'] column and researching some others instead; but I'm still curious what's wrong here and how to solve this problem.
Huge thanks in advance to all who try to advise.
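What you've encountered as a "string with pseudo-list of dictionaries" looks like the string representation of a Python list of dicts (it resembles JSON, but the single quotes mean it is not valid JSON). You may use eval() to convert it to an actual list of dicts and then work with it normally. Use eval() with caution, though: it executes whatever expression the string contains, so for untrusted data ast.literal_eval() is the safer choice. I tried to recreate that string and make it work: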
str_dicts = str([{'id': '1.172', 'name': 'Beginner', 'profarea_id': '1', 'profarea_name': 'IT'},
{'id': '1.117', 'name': 'Testing', 'profarea_id': '1', 'profarea_name': 'IT'},
{'id': '15.93', 'name': 'IT', 'profarea_id': '15', 'profarea_name': 'Beginner'}])
dicts = list(eval(str_dicts))
names = [d['name'] for d in dicts]
print(names)
[0]: ['Beginner', 'Testing', 'IT']
If your column is a Series of strings that each hold a list of dicts, you can use a list comprehension like this:
df['specializations_names'] = [[d['name'] for d in list(eval(row))]
for row in df['specializations']]
I tried to partially reproduce what you tried to do from what you provided:
import pandas as pd
str_dicts = str([{'id': '1.172', 'name': 'Beginner', 'profarea_id': '1', 'profarea_name': 'IT'},
{'id': '1.117', 'name': 'Testing', 'profarea_id': '1', 'profarea_name': 'IT'},
{'id': '15.93', 'name': 'IT', 'profarea_id': '15', 'profarea_name': 'Beginner'}])
df = pd.DataFrame({'specializations': [str_dicts, str_dicts, str_dicts]})
df['specializations_names'] = [[d['name'] for d in list(eval(row))]
for row in df['specializations']]
print(df)
Which resulted in:
| | specializations | specializations_names |
| 0 | [{'id': '1.172', 'name': 'Beginner', 'profarea_id': '1', 'profarea_name': 'IT'}, {'id': '1.117', 'name': 'Testing', 'profarea_id': '1', 'profarea_name': 'IT'}, {'id': '15.93', 'name': 'IT', 'profarea_id': '15', 'profarea_name': 'Beginner'}] | ['Beginner', 'Testing', 'IT'] |
| 1 | [{'id': '1.172', 'name': 'Beginner', 'profarea_id': '1', 'profarea_name': 'IT'}, {'id': '1.117', 'name': 'Testing', 'profarea_id': '1', 'profarea_name': 'IT'}, {'id': '15.93', 'name': 'IT', 'profarea_id': '15', 'profarea_name': 'Beginner'}] | ['Beginner', 'Testing', 'IT'] |
| 2 | [{'id': '1.172', 'name': 'Beginner', 'profarea_id': '1', 'profarea_name': 'IT'}, {'id': '1.117', 'name': 'Testing', 'profarea_id': '1', 'profarea_name': 'IT'}, {'id': '15.93', 'name': 'IT', 'profarea_id': '15', 'profarea_name': 'Beginner'}] | ['Beginner', 'Testing', 'IT'] |
In a real dataset, each of the len(df) rows can hold a string with a list of any number of dicts instead of the dummy string I used.
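The TypeError in the question has a simple cause: Series.apply calls the function once per cell, so inside get_names_values the argument named df is a single string, and df['specializations'] tries to index that string with another string. Below is a minimal sketch of a per-cell function, assuming every string cell is a valid Python literal. It swaps eval() for ast.literal_eval(); the helper name extract_names and the sample cells are hypothetical.

```python
import ast


def extract_names(cell):
    """Parse one cell (a string holding a list of dicts) and collect the 'name' values."""
    if not isinstance(cell, str):
        return []  # missing values such as None or NaN yield no names
    # literal_eval only accepts Python literals, so it cannot run arbitrary code
    return [d['name'] for d in ast.literal_eval(cell)]


# Hypothetical cells shaped like the column in the question, including a missing one.
raw = ("[{'id': '1.172', 'name': 'Beginner', 'profarea_id': '1', 'profarea_name': 'IT'}, "
       "{'id': '1.117', 'name': 'Testing', 'profarea_id': '1', 'profarea_name': 'IT'}]")
cells = [raw, None]

names = [extract_names(cell) for cell in cells]
print(names)
```

With a DataFrame, the same function could be passed to df['specializations'].apply(extract_names), which makes the fillna('{}') step from the question unnecessary.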

Flatten list of list into list for dictionary

I have a dictionary that looks like this:
{'data': [['748','','285','102','76024']]}
and I want to flatten the lists to look like this:
{'data': ['748','','285','102','76024']}
I have tried this from here:
[item for sublist in data.items() for item in sublist]
but it gives me:
['data',
[['748',
'',
'285',
'102',
'76024',
'88',
'3',
'89%831',
'77%',
'',
'68%632',
'19%177',
'13%120']]]
Judging by the title, each value in your dictionary may be a list of lists. With itertools.chain() you can merge multiple lists into one:
import itertools
data = {'data': [['748','','285','102','76024']]}
data1 = {'data': [['748','','285','102','76024'], ['12', '13', '14']]}
output = {k: list(itertools.chain(*v)) for k,v in data.items()}
output1 = {k: list(itertools.chain(*v)) for k,v in data1.items()}
Output:
# Output
{'data': ['748', '', '285', '102', '76024']}
# Output1
{'data': ['748', '', '285', '102', '76024', '12', '13', '14']}
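The attempt in the question iterates over data.items(), which yields (key, value) tuples, so the comprehension unpacks each tuple into the key and the still-nested list rather than flattening the inner lists. A sketch of an equivalent using itertools.chain.from_iterable, which avoids unpacking with *; the function name flatten_values is an assumption for illustration.

```python
from itertools import chain


def flatten_values(mapping):
    """Flatten one level of nesting in every value of the mapping."""
    return {key: list(chain.from_iterable(nested)) for key, nested in mapping.items()}


data = {'data': [['748', '', '285', '102', '76024'], ['12', '13', '14']]}
flat = flatten_values(data)
print(flat)
```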

Convert multiple lists into dictionary

I want to convert the following:
Input: Name;class;subject;grade
sam;4;maths;A
tom;5;science;B
kathy;8;biology;A
nancy;9;maths;B
output: [Name:sam,class:4,subject: maths, grade:A],[name:tom,class:5,subject:science,grade:B],[name: kathy,class:8,subject:biology,grade:B],[name:nancy,class:9,subject:maths,grade:B]
You can write a function that accepts a string in which the pieces of data are separated by a character such as : or ;.
Then you can use
string.split("the character you used")
to get the pieces of data as a list.
Finally, you can store these elements in a dictionary and append that dictionary to the list you want as your output.
This code from my Python shell should help you understand these operations better.
>>> input_string = "Tom:6:Maths"
>>> list_of_elements = input_string.split(":")
>>> container_dictionary = {"Name":list_of_elements[0], "class":list_of_elements[1], "subject":list_of_elements[2]}
>>> output_list = []
>>> output_list.append(container_dictionary)
>>> print(output_list)
[{'Name': 'Tom', 'class': '6', 'subject': 'Maths'}]
Basically, your input is just a chunk of CSV text with ; as the delimiter.
It can be as simple as:
input_text = '''<YOUR DATA>'''
lines = input_text.split('\n')
headers = lines[0].split(';')
output = [
    dict(zip(headers, line.split(';')))
    for line in lines[1:]
]
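A runnable sketch of the split-and-zip approach using the sample rows from the question; the variable name records and the blank-line filter are additions for illustration.

```python
input_text = '''Name;class;subject;grade
sam;4;maths;A
tom;5;science;B
kathy;8;biology;A
nancy;9;maths;B'''

# Drop empty lines so a trailing newline does not produce a bogus record.
lines = [line for line in input_text.split('\n') if line.strip()]
headers = lines[0].split(';')

# Pair each header with the field in the same position on every data line.
records = [dict(zip(headers, line.split(';'))) for line in lines[1:]]
print(records[0])
```

Every value stays a string here; converting class to an integer would be a separate step.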
Since the text is CSV, you can use the csv library.
>>> import csv
>>>
>>>
>>> foo = '''Name;class;subject;grade
... sam;4;maths;A
... tom;5;science;B
... kathy;8;biology;A
... nancy;9;maths;B'''
>>>
>>> reader = csv.DictReader(foo.splitlines(), delimiter=';')
>>> print([row for row in reader])
[{'Name': 'sam', 'class': '4', 'subject': 'maths', 'grade': 'A'}, {'Name': 'tom', 'class': '5', 'subject': 'science', 'grade': 'B'}, {'Name': 'kathy', 'class': '8', 'subject': 'biology', 'grade': 'A'}, {'Name': 'nancy', 'class': '9', 'subject': 'maths', 'grade': 'B'}]

How to sort data in the dictionary of list of dictionary in python?

Please help me. I have a dataset like this:
my_dict = { 'project_1' : [{'commit_number':'14','name':'john'},
{'commit_number':'10','name':'steve'}],
'project_2' : [{'commit_number':'12','name':'jack'},
{'commit_number':'15','name':'anna'},
{'commit_number':'11','name':'andy'}]
}
I need to sort the dataset based on the commit number in descending order and make it into a new list by ignoring the name of the project using python. The list expected will be like this:
ordered_list_of_dict = [{'commit_number':'15','name':'anna'},
{'commit_number':'14','name':'john'},
{'commit_number':'12','name':'jack'},
{'commit_number':'11','name':'andy'},
{'commit_number':'10','name':'steve'}]
Thank you so much for helping me.
1. Extract my_dict's values as a list of lists*
2. Join the sub-lists together (flatten dict_values) to form a flat list
3. Sort the elements by commit_number
*A list of lists on Python 2. On Python 3, a dict_values object is returned.
from itertools import chain
res = sorted(chain.from_iterable(my_dict.values()),
key=lambda x: x['commit_number'],
reverse=True)
[{'commit_number': '15', 'name': 'anna'},
{'commit_number': '14', 'name': 'john'},
{'commit_number': '12', 'name': 'jack'},
{'commit_number': '11', 'name': 'andy'},
{'commit_number': '10', 'name': 'steve'}]
On python2, you'd use dict.itervalues instead of dict.values to the same effect.
Coldspeed's answer is great as usual but as an alternative, you can use the following:
ordered_list_of_dict = sorted([x for y in my_dict.values() for x in y], key=lambda x: x['commit_number'], reverse=True)
which, when printed, gives:
print(ordered_list_of_dict)
# [{'commit_number': '15', 'name': 'anna'}, {'commit_number': '14', 'name': 'john'}, {'commit_number': '12', 'name': 'jack'}, {'commit_number': '11', 'name': 'andy'}, {'commit_number': '10', 'name': 'steve'}]
Note that in the list-comprehension you have the standard construct for flattening a list of lists:
[x for sublist in big_list for x in sublist]
I'll provide the less-pythonic and more reader-friendly answer.
First, iterate through key-value pairs in my_dict, and add each element of value to an empty list. This way you avoid having to flatten out a list of lists:
commits = []
for key, val in my_dict.items():
    for commit in val:
        commits.append(commit)
which gives this:
In [121]: commits
Out[121]:
[{'commit_number': '12', 'name': 'jack'},
{'commit_number': '15', 'name': 'anna'},
{'commit_number': '11', 'name': 'andy'},
{'commit_number': '14', 'name': 'john'},
{'commit_number': '10', 'name': 'steve'}]
Then sort it in descending order:
sorted(commits, reverse = True)
On Python 2 this sorts based on 'commit_number' even if you don't specify it, because that key comes alphabetically before 'name'. On Python 3 dicts cannot be ordered with <, so this call raises a TypeError and the key must be given explicitly. Specifying it is also better defensive coding, and this would be the fastest and cleanest way, to the best of my knowledge:
from operator import itemgetter
sorted(commits, key = itemgetter('commit_number'), reverse = True)
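One caveat applies to all the answers above: the commit numbers are strings, so they are compared character by character, which gives the expected order here only because every number has two digits. Below is a minimal sketch that sorts numerically instead, assuming every 'commit_number' is a valid integer string; the project_3 entry is hypothetical and added only to show the difference.

```python
from itertools import chain

my_dict = {
    'project_1': [{'commit_number': '14', 'name': 'john'},
                  {'commit_number': '10', 'name': 'steve'}],
    # Hypothetical extra project with a one-digit commit count.
    'project_3': [{'commit_number': '9', 'name': 'zoe'}],
}

commits = list(chain.from_iterable(my_dict.values()))

# String comparison: '9' sorts above '14' because '9' > '1' on the first character.
by_string = sorted(commits, key=lambda c: c['commit_number'], reverse=True)

# Numeric comparison: convert to int inside the key function.
by_number = sorted(commits, key=lambda c: int(c['commit_number']), reverse=True)

print([c['commit_number'] for c in by_string])
print([c['commit_number'] for c in by_number])
```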

List of values to dictionary

I'm trying to turn a list of values into a list of dictionaries based on my set of keys. I tried the following, but I'm losing all the other values because of the duplicate key names.
>>> values = ['XS ', '1', 'S ', '10', 'M ', '1', 'L ', '10', 'XL ', '10']
>>> keys = ['size', 'stock'] * (len(values) / 2)
>>> result = dict(zip(keys, values))
>>> print result
{'stock': '10', 'size': 'XL '}
What I'm trying to achieve is a list of dicts like the one below. How can I achieve this?
[{'stock': '10', 'size': 'XL '}, {'stock': '10', 'size': 'L'}, ......]
You can use a list comprehension like the following:
>>> values = ['XS ', '1', 'S ', '10', 'M ', '1', 'L ', '10', 'XL ', '10']
>>> [{'size':i, 'stock':j} for i, j in zip(values[0::2], values[1::2])]
[{'stock': '1', 'size': 'XS '}, {'stock': '10', 'size': 'S '}, {'stock': '1', 'size': 'M '}, {'stock': '10', 'size': 'L '}, {'stock': '10', 'size': 'XL '}]
Note that in this case you don't have to multiply the keys.
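If the keys are not fixed at two, the same idea generalises by slicing the flat list into chunks as long as the key list. A sketch, assuming len(values) is a multiple of len(keys); the variable name step is illustrative.

```python
values = ['XS ', '1', 'S ', '10', 'M ', '1', 'L ', '10', 'XL ', '10']
keys = ['size', 'stock']

step = len(keys)
# Take one chunk of `step` values per record and pair it with the keys.
result = [dict(zip(keys, values[i:i + step])) for i in range(0, len(values), step)]
print(result)
```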
Usually the point of using a dict is to associate unique keys with values. You were originally trying to associate size: ... and stock: ... for each item, but why not link the size directly to the stock? In that case you would simply do:
result = dict(zip(values[::2], values[1::2]))
or without needing slicing:
value_iter = iter(values)
result = dict(zip(value_iter, value_iter))
This grabs two elements from the list at a time.
This way you still know that a given key in the dict is the size and the associated value is the stock for that size.
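The reason zip(value_iter, value_iter) pairs neighbouring items is that both arguments are the same iterator, so each tuple consumes two consecutive elements. A small sketch of that behaviour; stripping the trailing spaces from the sizes is an optional extra that is not in the original answer, and the name stock_by_size is illustrative.

```python
values = ['XS ', '1', 'S ', '10', 'M ', '1', 'L ', '10', 'XL ', '10']

value_iter = iter(values)
# zip pulls from the same iterator twice per tuple: first the size, then the stock.
pairs = list(zip(value_iter, value_iter))
print(pairs[:2])

# Build the size -> stock mapping, trimming the padded size labels.
stock_by_size = {size.strip(): stock for size, stock in pairs}
print(stock_by_size)
```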
