I have a list of dictionaries that looks like this. It has been sorted so that, among rows with duplicate IDs, the one I want to keep comes first.
[
{'id': '23', 'type': 'car', 'price': '445'},
{'id': '23', 'type': 'car', 'price': '78'},
{'id': '23', 'type': 'car', 'price': '34'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '125', 'type': 'truck', 'price': '722'},
{'id': '125', 'type': 'truck', 'price': '100'},
{'id': '87', 'type': 'bike', 'price': '50'},
]
What is the simplest way to remove rows that have duplicate IDs but always keep the first one? In this instance the end result would look like this...
[
{'id': '23', 'type': 'car', 'price': '445'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '87', 'type': 'bike', 'price': '50'},
]
I know I can remove duplicates from a list by converting it to a set, like set(my_list), but in this instance it is duplicates by ID that I want to remove.
Since you already have the list sorted properly, a simple way to do this is to use itertools.groupby and grab the first element of each group in a list comprehension:
from itertools import groupby
l= [
{'id': '23', 'type': 'car', 'price': '445'},
{'id': '23', 'type': 'car', 'price': '78'},
{'id': '23', 'type': 'car', 'price': '34'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '125', 'type': 'truck', 'price': '722'},
{'id': '125', 'type': 'truck', 'price': '100'},
{'id': '87', 'type': 'bike', 'price': '50'},
]
[next(g) for k, g in groupby(l, key=lambda d: d['id'])]
# [{'id': '23', 'type': 'car', 'price': '445'},
# {'id': '125', 'type': 'truck', 'price': '998'},
# {'id': '87', 'type': 'bike', 'price': '50'}]
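One caveat worth spelling out: groupby only merges consecutive items with the same key, which is why the "already sorted" precondition matters. A minimal sketch, using a shortened, hypothetical version of the data in which a duplicate ID is not adjacent to its first occurrence:

```python
from itertools import groupby

# Hypothetical, shortened rows: the second '23' is NOT next to the first one.
rows = [
    {'id': '23', 'price': '445'},
    {'id': '125', 'price': '998'},
    {'id': '23', 'price': '78'},
]

# groupby starts a new group every time the key changes,
# so the non-adjacent '23' ends up in a group of its own.
firsts = [next(group) for _, group in groupby(rows, key=lambda d: d['id'])]
ids = [row['id'] for row in firsts]
print(ids)
```

If the duplicates are not guaranteed to be adjacent, a set-based or dict-based approach is probably the safer choice.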
I would probably convert to a pandas DataFrame and then use drop_duplicates:
import pandas as pd
data = [
{'id': '23', 'type': 'car', 'price': '445'},
{'id': '23', 'type': 'car', 'price': '78'},
{'id': '23', 'type': 'car', 'price': '34'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '125', 'type': 'truck', 'price': '722'},
{'id': '125', 'type': 'truck', 'price': '100'},
{'id': '87', 'type': 'bike', 'price': '50'},
]
df = pd.DataFrame(data)
df.drop_duplicates(subset=['id'], inplace=True)
print(df.to_dict('records'))
# Output
# [{'id': '23', 'type': 'car', 'price': '445'},
# {'id': '125', 'type': 'truck', 'price': '998'},
# {'id': '87', 'type': 'bike', 'price': '50'}]
Here's an answer that involves no external modules or unnecessary manipulation of the data:
data = [
{'id': '23', 'type': 'car', 'price': '445'},
{'id': '23', 'type': 'car', 'price': '78'},
{'id': '23', 'type': 'car', 'price': '34'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '125', 'type': 'truck', 'price': '722'},
{'id': '125', 'type': 'truck', 'price': '100'},
{'id': '87', 'type': 'bike', 'price': '50'},
]
seen = set()
result = [row for row in data if row['id'] not in seen and not seen.add(row['id'])]
print(result)
Result:
[{'id': '23', 'type': 'car', 'price': '445'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '87', 'type': 'bike', 'price': '50'}]
Note that the not seen.add(row['id']) part of the list comprehension is always True, because set.add returns None. It's just a way of recording, inside the comprehension, that an ID has now been seen.
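If relying on the return value of seen.add feels too clever, a similar "first one wins" effect can be sketched with dict.setdefault. This is an alternative I'm suggesting, not part of the answer above, and it assumes Python 3.7+ where dictionaries preserve insertion order:

```python
data = [
    {'id': '23', 'type': 'car', 'price': '445'},
    {'id': '23', 'type': 'car', 'price': '78'},
    {'id': '23', 'type': 'car', 'price': '34'},
    {'id': '125', 'type': 'truck', 'price': '998'},
    {'id': '125', 'type': 'truck', 'price': '722'},
    {'id': '125', 'type': 'truck', 'price': '100'},
    {'id': '87', 'type': 'bike', 'price': '50'},
]

first_by_id = {}
for row in data:
    # setdefault only stores the row if this id has not been stored yet,
    # so later duplicates are ignored.
    first_by_id.setdefault(row['id'], row)

result = list(first_by_id.values())
print(result)
```

Unlike groupby, this does not need the duplicates to be adjacent.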
Let's call the given list data.
unique_ids = []
result = []
for item in data:
    if item["id"] not in unique_ids:
        result.append(item)
        unique_ids.append(item["id"])
print(result)
The result will be,
[{'id': '23', 'type': 'car', 'price': '445'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '87', 'type': 'bike', 'price': '50'}]
Imagine I have the following list of dictionaries. For every record (row of data), I want to merge the dictionaries of subfields into a single dictionary, so that in the end I have a list of dictionaries, one per record.
Data = [{'Name': 'bob', 'age': '40'},
{'Name': 'tom', 'age': '30'},
{'Country': 'US', 'City': 'Boston'},
{'Country': 'US', 'City': 'New York'},
{'Email': 'bob#fake.com', 'Phone': 'bob phone'},
{'Email': 'tom#fake.com', 'Phone': 'none'}]
Output = [
{'Name': 'bob', 'age': '40', 'Country': 'US', 'City': 'Boston', 'Email': 'bob#fake.com', 'Phone': 'bob phone'},
{'Name': 'tom', 'age': '30', 'Country': 'US', 'City': 'New York', 'Email': 'tom#fake.com', 'Phone': 'none'}
]
Related: How do I merge a list of dicts into a single dict?
I understand you know which dictionary relates to Bob and which dictionary relates to Tom by their position: dictionaries at even positions relate to Bob, while dictionaries at odd positions relate to Tom.
You can check whether a number is odd or even using % 2:
Data = [{'Name': 'bob', 'age': '40'},
{'Name': 'tom', 'age': '30'},
{'Country': 'US', 'City': 'Boston'},
{'Country': 'US', 'City': 'New York'},
{'Email': 'bob#fake.com', 'Phone': 'bob phone'},
{'Email': 'tom#fake.com', 'Phone': 'none'}]
bob_dict = {}
tom_dict = {}
for i, d in enumerate(Data):
    if i % 2 == 0:
        bob_dict.update(d)
    else:
        tom_dict.update(d)
Output = [bob_dict, tom_dict]
Or alternatively:
Output = [{}, {}]
for i, d in enumerate(Data):
    Output[i % 2].update(d)
This second approach is not only shorter to write, it's also faster to execute and easier to scale if you have more than 2 people.
Splitting the list into more than 2 dictionaries
k = 4 # number of dictionaries you want
Data = [{'Name': 'Alice', 'age': '40'},
{'Name': 'Bob', 'age': '30'},
{'Name': 'Charlie', 'age': '30'},
{'Name': 'Diane', 'age': '30'},
{'Country': 'US', 'City': 'Boston'},
{'Country': 'US', 'City': 'New York'},
{'Country': 'UK', 'City': 'London'},
{'Country': 'UK', 'City': 'Oxford'},
{'Email': 'alice#fake.com', 'Phone': 'alice phone'},
{'Email': 'bob#fake.com', 'Phone': '12345'},
{'Email': 'charlie#fake.com', 'Phone': '0000000'},
{'Email': 'diane#fake.com', 'Phone': 'none'}]
Output = [{} for j in range(k)]
for i, d in enumerate(Data):
    Output[i % k].update(d)
# Output = [
# {'Name': 'Alice', 'age': '40', 'Country': 'US', 'City': 'Boston', 'Email': 'alice#fake.com', 'Phone': 'alice phone'},
# {'Name': 'Bob', 'age': '30', 'Country': 'US', 'City': 'New York', 'Email': 'bob#fake.com', 'Phone': '12345'},
# {'Name': 'Charlie', 'age': '30', 'Country': 'UK', 'City': 'London', 'Email': 'charlie#fake.com', 'Phone': '0000000'},
# {'Name': 'Diane', 'age': '30', 'Country': 'UK', 'City': 'Oxford', 'Email': 'diane#fake.com', 'Phone': 'none'}
#]
Additionally, instead of hardcoding k = 4:
If you know the number of fields but not the number of people, you can compute k by dividing the initial number of dictionaries by the number of dictionary types:
fields = ['Name', 'Country', 'Email']
assert(len(Data) % len(fields) == 0) # make sure Data is consistent with number of fields
k = len(Data) // len(fields)
Or alternatively, you can compute k by counting how many occurrences of the 'Name' field you have:
k = sum(1 for d in Data if 'Name' in d)
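Putting the pieces together, here is a minimal sketch that computes k from the 'Name' field and then merges by position. The function name merge_records and its key_field parameter are made up for this illustration, and it assumes the layout described above (all names first, then all countries, and so on, each block in the same person order):

```python
def merge_records(data, key_field='Name'):
    """Merge positional sub-dictionaries into one dictionary per record."""
    # Number of records = number of dictionaries carrying the key field.
    k = sum(1 for d in data if key_field in d)
    if k == 0 or len(data) % k != 0:
        raise ValueError("data does not split evenly into records")
    merged = [{} for _ in range(k)]
    for i, d in enumerate(data):
        # Dictionary i belongs to record i % k.
        merged[i % k].update(d)
    return merged


Data = [{'Name': 'bob', 'age': '40'},
        {'Name': 'tom', 'age': '30'},
        {'Country': 'US', 'City': 'Boston'},
        {'Country': 'US', 'City': 'New York'},
        {'Email': 'bob#fake.com', 'Phone': 'bob phone'},
        {'Email': 'tom#fake.com', 'Phone': 'none'}]

Output = merge_records(Data)
print(Output)
```

The length check is only a sanity check; it cannot detect blocks that are listed in a different person order.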
I have a pandas DataFrame with a column holding mixed values: the first element is a list (or array) and the other elements are not.
>>> df_3['integration-outbound:IntegrationEntity.integrationEntityDetails.supplier.forms.form.records.record']
0 [{'Internalid': '24348', 'isDelete': 'false', 'fields': {'field': [{'id': 'CATEGOR_LEVEL_1', 'value': 'MR'}, {'id': 'LOW_PRODSERV', 'value': 'RES'}, {'id': 'LOW_LEVEL_2', 'value': 'keylevel221'}, {'id': 'LOW_LEVEL_3', 'value': 'keylevel3127'}, {'id': 'LOW_LEVEL_4', 'value': 'keylevel4434'}, {'id': 'LOW_LEVEL_5', 'value': 'keylevel5545'}]}}, {'Internalid': '24349', 'isDelete': 'false', 'fields': {'field': [{'id': 'CATEGOR_LEVEL_1', 'value': 'MR'}, {'id': 'LOW_PRODSERV', 'value': 'RES'}, {'id': 'LOW_LEVEL_2', 'value': 'keylevel221'}, {'id': 'LOW_LEVEL_3', 'value': 'keylevel3125'}, {'id': 'LOW_LEVEL_4', 'value': 'keylevel4268'}, {'id': 'LOW_LEVEL_5', 'value': 'keylevel5418'}]}}, {'Internalid': '24350', 'isDelete': 'false', 'fields': {'field': [{'id': 'CATEGOR_LEVEL_1', 'value': 'MR'}, {'id': 'LOW_PRODSERV', 'value': 'RES'}, {'id': 'LOW_LEVEL_2', 'value': 'keylevel221'}, {'id': 'LOW_LEVEL_3', 'value': 'keylevel3122'}, {'id': 'LOW_LEVEL_4', 'value': 'keylevel425'}, {'id': 'LOW_LEVEL_5', 'value': 'keylevel5221'}]}}]
0 {'isDelete': 'false', 'fields': {'field': [{'id': 'S_EAST', 'value': 'N'}, {'id': 'W_EST', 'value': 'N'}, {'id': 'M_WEST', 'value': 'N'}, {'id': 'N_EAST', 'value': 'N'}, {'id': 'LOW_AREYOU_ASSET', 'value': '-1'}, {'id': 'LOW_SWART_PROG', 'value': '-1'}]}}
0 {'isDelete': 'false', 'fields': {'field': {'id': 'LOW_COD_CONDUCT', 'value': '-1'}}}
0 {'isDelete': 'false', 'fields': {'field': [{'id': 'LOW_SUPPLIER_TYPE', 'value': '2'}, {'id': 'LOW_DO_INT_BOTH', 'value': '1'}]}}
I want to explode this into multiple rows. The first row is a list and the other rows are not.
>>> type(df_3)
<class 'pandas.core.frame.DataFrame'>
>>> type(df_3['integration-outbound:IntegrationEntity.integrationEntityDetails.supplier.forms.form.records.record'])
<class 'pandas.core.series.Series'>
Expected output -
{'Internalid': '24348', 'isDelete': 'false', 'fields': {'field': [{'id': 'CATEGOR_LEVEL_1', 'value': 'MR'}, {'id': 'LOW_PRODSERV', 'value': 'RES'}, {'id': 'LOW_LEVEL_2', 'value': 'keylevel221'}, {'id': 'LOW_LEVEL_3', 'value': 'keylevel3127'}, {'id': 'LOW_LEVEL_4', 'value': 'keylevel4434'}, {'id': 'LOW_LEVEL_5', 'value': 'keylevel5545'}]}}
{'Internalid': '24349', 'isDelete': 'false', 'fields': {'field': [{'id': 'CATEGOR_LEVEL_1', 'value': 'MR'}, {'id': 'LOW_PRODSERV', 'value': 'RES'}, {'id': 'LOW_LEVEL_2', 'value': 'keylevel221'}, {'id': 'LOW_LEVEL_3', 'value': 'keylevel3125'}, {'id': 'LOW_LEVEL_4', 'value': 'keylevel4268'}, {'id': 'LOW_LEVEL_5', 'value': 'keylevel5418'}]}}
{'Internalid': '24350', 'isDelete': 'false', 'fields': {'field': [{'id': 'CATEGOR_LEVEL_1', 'value': 'MR'}, {'id': 'LOW_PRODSERV', 'value': 'RES'}, {'id': 'LOW_LEVEL_2', 'value': 'keylevel221'}, {'id': 'LOW_LEVEL_3', 'value': 'keylevel3122'}, {'id': 'LOW_LEVEL_4', 'value': 'keylevel425'}, {'id': 'LOW_LEVEL_5', 'value': 'keylevel5221'}]}}
{'isDelete': 'false', 'fields': {'field': [{'id': 'S_EAST', 'value': 'N'}, {'id': 'W_EST', 'value': 'N'}, {'id': 'M_WEST', 'value': 'N'}, {'id': 'N_EAST', 'value': 'N'}, {'id': 'LOW_AREYOU_ASSET', 'value': '-1'}, {'id': 'LOW_SWART_PROG', 'value': '-1'}]}}
{'isDelete': 'false', 'fields': {'field': {'id': 'LOW_COD_CONDUCT', 'value': '-1'}}}
{'isDelete': 'false', 'fields': {'field': [{'id': 'LOW_SUPPLIER_TYPE', 'value': '2'}, {'id': 'LOW_DO_INT_BOTH', 'value': '1'}]}}
I tried to explode this column:
>>> df_3.explode('integration-outbound:IntegrationEntity.integrationEntityDetails.supplier.forms.form.records.record')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib64/python3.6/site-packages/pandas/core/frame.py", line 6318, in explode
result = df[column].explode()
File "/usr/local/lib64/python3.6/site-packages/pandas/core/series.py", line 3504, in explode
values, counts = reshape.explode(np.asarray(self.array))
File "pandas/_libs/reshape.pyx", line 129, in pandas._libs.reshape.explode
KeyError: 0
I can loop through each row, check whether it is a list and handle it accordingly, but that doesn't seem right:
if str(type(df_3.loc[i,'{}'.format(c)])) == "<class 'list'>":
Is there any way we can use an explode function on this kind of data?
An alternative way, using pandas-read-xml:
from pandas_read_xml import flatten, fully_flatten
df = flatten(df)
I was able to do it, but the exploded rows all end up at the top of the DataFrame (which matters if there are more list-type objects in lower rows).
pd.concat((df.iloc[[type(item) == list for item in df['Column']]].explode('Column'),
df.iloc[[type(item) != list for item in df['Column']]]))
It essentially does what you've said: check whether the object is a list and, if so, explode it. Then concatenate the exploded rows with the rest of the data (i.e. the non-lists). Performance doesn't seem to suffer much on longer DataFrames.
Output:
Column
0 {'Internalid': '24348', 'isDelete': 'false', '...
0 {'Internalid': '24349', 'isDelete': 'false', '...
0 {'Internalid': '24350', 'isDelete': 'false', '...
1 {'isDelete': 'false', 'fields': {'field': [{'i...
2 {'isDelete': 'false', 'fields': {'field': {'id...
3 {'isDelete': 'false', 'fields': {'field': [{'i...
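If keeping the original row order matters, the same "explode only the lists" idea can be sketched in plain Python before anything goes into a DataFrame. This is an alternative to the concat approach above, not a pandas feature; explode_mixed is a made-up helper name and the values are shortened, hypothetical stand-ins for the real records:

```python
def explode_mixed(values):
    """Flatten one level: list items are spread out, other values are kept as-is."""
    rows = []
    for value in values:
        if isinstance(value, list):
            rows.extend(value)   # one output row per list element
        else:
            rows.append(value)   # non-list values pass through unchanged
    return rows


# Hypothetical column contents: a list sits between two plain dictionaries.
column = [
    {'isDelete': 'false', 'tag': 'a'},
    [{'Internalid': '24348'}, {'Internalid': '24349'}],
    {'isDelete': 'false', 'tag': 'b'},
]

flat = explode_mixed(column)
print(flat)
```

The exploded items stay where the list originally was, instead of being moved to the top.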
I have a column "data" which has JSON objects as values. I would like to add a key-value pair inside the nested JSON.
source = {'my_dict':[{'_id': 'SE-DATA-BB3A'},{'_id': 'SE-DATA-BB3E'},{'_id': 'SE-DATA-BB3F'}], 'data': [ {'bb3a_bmls':[{'name': 'WAG 01', 'id': '105F', 'state': 'available', 'nodes': 3,'volumes-': [{'state': 'available', 'id': '330172', 'name': 'q_-4144d4e'}, {'state': 'available', 'id': '275192', 'name': 'p_3089d821ae', }]}]}
, {'bb3b_bmls':[{'name': 'FEC 01', 'id': '382E', 'state': 'available', 'nodes': 4,'volumes': [{'state': 'unavailable', 'id': '830172', 'name': 'w_-4144d4e'}, {'state': 'unavailable', 'id': '223192', 'name': 'g_3089d821ae', }]}]}
, {'bb3c_bmls':[{'name': 'ASD 01', 'id': '303F', 'state': 'available', 'nodes': 6,'volumes': [{'state': 'unavailable', 'id': '930172', 'name': 'e_-4144d4e'}, {'state': 'unavailable', 'id': '245192', 'name': 'h_3089d821ae', }]}]}
] }
input_df = pd.DataFrame(source)
input_df looks like below:
Now I need to add the "my_dict" column values as the first element inside the nested JSON values of the "data" column.
My target DataFrame should look like below (I have highlighted the changes in bold).
I tried using dict.update() but it doesn't seem to help. I'm stuck here and don't know how to take this forward. I'd appreciate your help.
I don't see any benefit in putting it in a DataFrame. If you keep the original dictionary, the following loop will do:
my_dict=[{'_id': 'SE-DATA-BB3A'},{'_id': 'SE-DATA-BB3E'},{'_id': 'SE-DATA-BB3F'}]
data = [ {'bb3a_bmls':[{'name': 'WAG 01', 'id': '105F', 'state': 'available', 'nodes': 3,'volumes-': [{'state': 'available', 'id': '330172', 'name': 'q_-4144d4e'}, {'state': 'available', 'id': '275192', 'name': 'p_3089d821ae', }]}]}
, {'bb3b_bmls':[{'name': 'FEC 01', 'id': '382E', 'state': 'available', 'nodes': 4,'volumes': [{'state': 'unavailable', 'id': '830172', 'name': 'w_-4144d4e'}, {'state': 'unavailable', 'id': '223192', 'name': 'g_3089d821ae', }]}]}
, {'bb3c_bmls':[{'name': 'ASD 01', 'id': '303F', 'state': 'available', 'nodes': 6,'volumes': [{'state': 'unavailable', 'id': '930172', 'name': 'e_-4144d4e'}, {'state': 'unavailable', 'id': '245192', 'name': 'h_3089d821ae', }]}]}
]
for idx, val in enumerate(data):
    val[list(val.keys())[0]][0].update(my_dict[idx])
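One detail: dict.update puts '_id' at the end of each nested record. If it really has to come first, as the question asks, a new dictionary can be built instead. A minimal sketch with shortened, hypothetical records (the 'volumes' lists are left out for brevity), assuming Python 3.7+ insertion ordering:

```python
my_dict = [{'_id': 'SE-DATA-BB3A'}, {'_id': 'SE-DATA-BB3E'}]
data = [
    {'bb3a_bmls': [{'name': 'WAG 01', 'nodes': 3}]},
    {'bb3b_bmls': [{'name': 'FEC 01', 'nodes': 4}]},
]

result = []
for extra, entry in zip(my_dict, data):
    # {**extra, **record} lists the keys of `extra` first, then those of `record`.
    result.append({
        key: [{**extra, **record} for record in records]
        for key, records in entry.items()
    })

print(result)
```

This builds new dictionaries and leaves the original data untouched, which may or may not be what you want.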
def get_val(row):
    my_dict_val = row.loc['my_dict']
    dict_key = list(row['data'].keys())[0]
    if not list(row['data'].values())[0]:
        return row['data']
    data_dict = list(row['data'].values())[0][0]
    data_dict.update(my_dict_val)
    res = dict()
    res[dict_key] = []
    res[dict_key].append(data_dict)
    return res
input_df['data'] = input_df.apply(get_val, axis=1)
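The solution is as follows:
def update_data(row):
    data_dict = row['data']
    # Update each nested record rather than the outer dict: adding keys to
    # data_dict while iterating over it raises a RuntimeError.
    for records in data_dict.values():
        for record in records:
            record.update(row['my_dict'])
    return data_dict
input_df['data'] = input_df.apply(update_data, axis=1)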
I am looking to read one list that consists of column names and another list of lists that contains the data to be mapped to those columns. Each list in the list of lists is one row of data, to be pushed into the database later.
I've tried to use the following code to join these two lists:
dict(zip( column_names, data)) but I receive an error:
TypeError: unhashable type: 'list'
How would I join a list of lists and another list together to a dict?
column_names = ['id', 'first_name', 'last_name', 'city', 'dob']
data = [
['1', 'Mike', 'Walters', 'New York City', '1998-12-01'],
['2', 'Daniel', 'Strange', 'Baltimore', '1992-08-12'],
['3', 'Sarah', 'McNeal', 'Miami', '1990-05-05'],
['4', 'Steve', 'Breene', 'Philadelphia', '1988-02-06']
]
The result I'm seeking is:
dict_items = {{'id': '1', 'first_name': 'Mike', 'last_name': 'Walters',
'city': 'New York City', 'dob': '1998-12-01'},
{'id': '2', ...}}
Later I'm looking to push this dict of dicts to the database with SQLAlchemy.
You can create a list of dictionaries like this:
result = [dict(zip(column_names, row)) for row in data]
Note that the outer brackets are square, not curly as you specified.
zip on its own will not work in your case, because it pairs its input arguments element by element.
Zip Documentation
Demo:
>>> l1 = ["key01", "key02", "key03"]
>>> l2 = ["value01", "value02", "value03"]
>>> list(zip(l1, l2))
[('key01', 'value01'), ('key02', 'value02'), ('key03', 'value03')]
>>> dict(zip(l1, l2))
{'key01': 'value01', 'key02': 'value02', 'key03': 'value03'}
>>>
Use normal iteration and the list append method to create the final output:
Demo:
>>> list_data_items = []
>>> for item in data:
...     list_data_items.append(dict(zip(column_names, item)))
...
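Because zip pairs items one to one and stops at the shorter input, a row with missing values is truncated silently rather than rejected. A small sketch of an explicit length check; row_to_dict is a made-up helper name and short_row is a hypothetical incomplete row:

```python
column_names = ['id', 'first_name', 'last_name', 'city', 'dob']


def row_to_dict(names, row):
    # zip() stops at the shorter input, so compare the lengths explicitly.
    if len(names) != len(row):
        raise ValueError(f"expected {len(names)} values, got {len(row)}")
    return dict(zip(names, row))


short_row = ['5', 'Anna', 'Lee']  # hypothetical row with missing fields

# Plain zip silently drops the 'city' and 'dob' keys.
truncated = dict(zip(column_names, short_row))

try:
    row_to_dict(column_names, short_row)
    rejected = False
except ValueError:
    rejected = True

print(truncated, rejected)
```

Whether a short row should raise or be padded depends on what the database expects, so treat the check as one option.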
All the other answers above worked fine. Just for the sake of completeness you could also use pandas (and it might be convenient if your data is coming from say a csv file).
Just create a data frame with your data and then convert it to dict:
import pandas as pd
df = pd.DataFrame(data, columns=column_names)
df.to_dict(orient='records')
Two simple for-loops:
column_names = ['id', 'first_name', 'last_name', 'city', 'dob']
data = [
['1', 'Mike', 'Walters', 'New York City', '1998-12-01'],
['2', 'Daniel', 'Strange', 'Baltimore', '1992-08-12'],
['3', 'Sarah', 'McNeal', 'Miami', '1990-05-05'],
['4', 'Steve', 'Breene', 'Philadelphia', '1988-02-06']
]
result = []
for data_row in data:
    new_db_row = {}
    for i, data_value in enumerate(data_row):
        new_db_row[column_names[i]] = data_value
    result.append(new_db_row)
print(result)
The first for statement loops over all data rows.
The second uses enumerate to get both the index (i) and the data_value of each entry in the row. The index is used to look up the column name in the list column_names.
I hope this explanation does not make it more complicated.
The printed result follows:
[{'id': '1', 'first_name': 'Mike', 'last_name': 'Walters', 'city': 'New York City', 'dob': '1998-12-01'}, {'id': '2', 'first_name': 'Daniel', 'last_name': 'Strange', 'city': 'Baltimore', 'dob': '1992-08-12'}, {'id': '3', 'first_name': 'Sarah', 'last_name': 'McNeal', 'city': 'Miami', 'dob': '1990-05-05'}, {'id': '4', 'first_name': 'Steve', 'last_name': 'Breene', 'city': 'Philadelphia', 'dob': '1988-02-06'}]
Since you want to construct multiple dictionaries, you have to zip your column names with each list in data and pass the result to the dict constructor. Your result dict_items also needs to be a collection that can store unhashable types such as dictionaries. We cannot use a set for this (which you say you are seeking), but we can use a list (or a tuple).
Employ a simple list comprehension in order to build one dictionary for each sublist in data.
>>> [dict(zip(column_names, sublist)) for sublist in data]
[{'dob': '1998-12-01', 'city': 'New York City', 'first_name': 'Mike', 'last_name': 'Walters', 'id': '1'}, {'dob': '1992-08-12', 'city': 'Baltimore', 'first_name': 'Daniel', 'last_name': 'Strange', 'id': '2'}, {'dob': '1990-05-05', 'city': 'Miami', 'first_name': 'Sarah', 'last_name': 'McNeal', 'id': '3'}, {'dob': '1988-02-06', 'city': 'Philadelphia', 'first_name': 'Steve', 'last_name': 'Breene', 'id': '4'}]
I also assumed that {'id':'2'} in your expected result is a typo.
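To illustrate the point about sets: dictionaries are unhashable, so a set of dictionaries cannot be built. If a "dict of dicts" is really what is wanted, one option is to key each row by its id. A minimal sketch using the first two rows of the question's data; keying by 'id' is my assumption about what the outer keys should be:

```python
column_names = ['id', 'first_name', 'last_name', 'city', 'dob']
data = [
    ['1', 'Mike', 'Walters', 'New York City', '1998-12-01'],
    ['2', 'Daniel', 'Strange', 'Baltimore', '1992-08-12'],
]

rows = [dict(zip(column_names, row)) for row in data]

# A set cannot hold dictionaries: this raises a TypeError (unhashable type).
try:
    set(rows)
    set_error = None
except TypeError as exc:
    set_error = str(exc)

# A dict of dicts works, as long as the outer keys are hashable (here: the id).
rows_by_id = {row['id']: row for row in rows}
print(rows_by_id['2'])
```

For a bulk insert, the plain list of dictionaries is usually the more convenient shape.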
Using Pandas:
>>> column_names
['id', 'first_name', 'last_name', 'city', 'dob']
>>> data
[['1', 'Mike', 'Walters', 'New York City', '1998-12-01'], ['2', 'Daniel', 'Strange', 'Baltimore', '1992-08-12'], ['3', 'Sarah', 'McNeal', 'Miami', '1990-05-05'], ['4', 'Steve', 'Breene', 'Philadelphia', '1988-02-06']]
>>> import pandas as pd
>>> list(pd.DataFrame(data, columns=column_names).T.to_dict().values())
[{'dob': '1998-12-01', 'city': 'New York City', 'first_name': 'Mike', 'last_name': 'Walters', 'id': '1'}, {'dob': '1992-08-12', 'city': 'Baltimore', 'first_name': 'Daniel', 'last_name': 'Strange', 'id': '2'}, {'dob': '1990-05-05', 'city': 'Miami', 'first_name': 'Sarah', 'last_name': 'McNeal', 'id': '3'}, {'dob': '1988-02-06', 'city': 'Philadelphia', 'first_name': 'Steve', 'last_name': 'Breene', 'id': '4'}]
column_names = ['id', 'first_name', 'last_name', 'city', 'dob']
data = [
['1', 'Mike', 'Walters', 'New York City', '1998-12-01'],
['2', 'Daniel', 'Strange', 'Baltimore', '1992-08-12'],
['3', 'Sarah', 'McNeal', 'Miami', '1990-05-05'],
['4', 'Steve', 'Breene', 'Philadelphia', '1988-02-06']
]
destinationList = []
for value in data:
    destinationList.append(dict(zip(column_names, value)))
print(destinationList)
#
# zip(column_names,value)
# [('id', '1'), ('first_name', 'Mike'), ('last_name', 'Walters'), ('city', 'New York City'), ('dob', '1998-12-01')]
# dict(zip(column_names,value))
# {'last_name': 'Walters', 'dob': '1998-12-01','id': '1','first_name': 'Mike','city': 'New York City'}