Python: iterate through a list of dictionaries

I have the following list of dictionaries:
results = [
{'type': 'check_datatype',
'kwargs': {'table': 'cars', 'columns': ['car_id','index'], 'd_type': 'str'},
'datasource_path': '/cars_dataset_ok/',
'Result': False},
{'type': 'check_string_consistency',
'kwargs': {'table': 'cars', 'columns': ['car_id'], 'string_length': 6},
'datasource_path': '/cars_dataset_ok/',
'Result': False}
]
I want an output list like the one below, where the key and value fields come from the kwargs key of each dictionary in the list above:
id|key|value|index
[[1,table,cars,null],[1,columns,car_id,1],[1,columns,index,2],
[1,d_type,str,null],[2,table,cars,null],[2,columns,car_id,null],[2,string_length,6,null]]
Update: I now want one more column in the output, uniquehashcode. Dictionaries with the same keys and values should generate the same id or hash, so if the key-value pairs in two 'kwargs' dictionaries are identical, they should get the same hashcode. The output should look like this:
[[1,table,cars,null,uniquehashcode1],[1,columns,car_id,1,uniquehashcode1],[1,columns,index,2,uniquehashcode1],
[1,d_type,str,null,uniquehashcode1],[2,table,cars,null,uniquehashcode2],[2,columns,car_id,null,uniquehashcode2],[2,string_length,6,null,uniquehashcode2]]
Also, I don't want to insert anything into this table if a particular uniquehashcode already exists.
Update 2: I want to create a dataframe with the schema below. args_id should be the same for each unique pair of (kwargs, check_name). I want to run the above list of dictionaries every day, so across runs on different dates args_id should stay the same whenever a given (kwargs, check_name) pair appears again. I want to store this result in a dataframe every day and then write it to my Spark Delta table.
Type|time|args_id
check_datatype|2021-03-29|0
check_string_consistency|2021-03-29|1
check_datatype|2021-03-30|0
Until now, I was using the code below:
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F
# One single-element row per check, so the rows match the one-column schema
type_results = [[elt['type']] for elt in results]
checkColumns = ['type']
spark = SparkSession.builder.getOrCreate()
DF = spark.createDataFrame(data=type_results, schema=checkColumns)
DF = DF.withColumn("time", F.current_timestamp())
DF = DF.withColumn("args_id", F.row_number().over(Window.orderBy(F.monotonically_increasing_id())))

Probably you need:
results = [
{'type': 'check_datatype',
'kwargs': {'table': 'cars', 'columns': ['car_id','index'], 'd_type': 'str'},
'datasource_path': '/cars_dataset_ok/',
'Result': False},
{'type': 'check_string_consistency',
'kwargs': {'table': 'cars', 'columns': ['car_id'], 'string_length': 6},
'datasource_path': '/cars_dataset_ok/',
'Result': False}
]
result_list = []
for c, l in enumerate(results, start=1):
    for key, value in l['kwargs'].items():
        if isinstance(value, list):
            # A single-element list gets no positional index
            if len(value) == 1:
                result_list.append([str(c), key, value[0], 'null'])
                continue
            # enumerate gives the 1-based position and stays correct with duplicate items
            for pos, i in enumerate(value, start=1):
                result_list.append([str(c), key, i, str(pos)])
        else:
            result_list.append([str(c), key, value, 'null'])
print(result_list)
Output:
[['1', 'table', 'cars', 'null'], ['1', 'columns', 'car_id', '1'], ['1', 'columns', 'index', '2'], ['1', 'd_type', 'str', 'null'], ['2', 'table', 'cars', 'null'], ['2', 'columns', 'car_id', 'null'], ['2', 'string_length', 6, 'null']]
For the Update part, you can use the maps package (pip install maps):
import maps
results = [
{'type': 'check_datatype',
'kwargs': {'table': 'cars', 'columns': ['car_id','index'], 'd_type': 'str'},
'datasource_path': '/cars_dataset_ok/',
'Result': False},
{'type': 'check_string_consistency',
'kwargs': {'table': 'cars', 'columns': ['car_id'], 'string_length': 6},
'datasource_path': '/cars_dataset_ok/',
'Result': False},
{'type': 'check_string_consistency',
'kwargs': {'table': 'cars', 'columns': ['car_id'], 'string_length': 6},
'datasource_path': '/cars_dataset_ok/',
'Result': False}
]
result_list = []
for c, l in enumerate(results, start=1):
    # Freeze the nested kwargs so they become hashable.
    # Note: hash() of strings differs between interpreter runs unless PYTHONHASHSEED is fixed.
    h = hash(maps.FrozenMap.recurse(l['kwargs']))
    for key, value in l['kwargs'].items():
        if isinstance(value, list):
            if len(value) == 1:
                result_list.append([str(c), key, value[0], 'null', f'{h}-{c}'])
                continue
            for pos, i in enumerate(value, start=1):
                result_list.append([str(c), key, i, str(pos), f'{h}-{c}'])
        else:
            result_list.append([str(c), key, value, 'null', f'{h}-{c}'])
print(result_list)
Output:
[['1', 'table', 'cars', 'null', '-6654319495930648246-1'], ['1', 'columns', 'car_id', '1', '-6654319495930648246-1'], ['1', 'columns', 'index', '2', '-6654319495930648246-1'], ['1', 'd_type', 'str', 'null', '-6654319495930648246-1'], ['2', 'table', 'cars', 'null', '-3876605863049152209-2'], ['2', 'columns', 'car_id', 'null', '-3876605863049152209-2'], ['2', 'string_length', 6, 'null', '-3876605863049152209-2'], ['3', 'table', 'cars', 'null', '-3876605863049152209-3'], ['3', 'columns', 'car_id', 'null', '-3876605863049152209-3'], ['3', 'string_length', 6, 'null', '-3876605863049152209-3']]
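If the id has to stay the same across daily runs, the built-in hash() may not be enough, because string hashes are salted per interpreter process by default. A minimal sketch of one alternative, assuming the kwargs are JSON-serialisable: serialise them with sorted keys and take a SHA-256 digest. The function name stable_kwargs_id and its check_type parameter are made up for illustration.

```python
import hashlib
import json


def stable_kwargs_id(check_type, kwargs):
    """Return a hex digest that is identical for equal (type, kwargs) pairs.

    Hypothetical helper, not part of any library.
    """
    # sort_keys=True makes the serialisation independent of dict key order
    payload = json.dumps({"type": check_type, "kwargs": kwargs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = stable_kwargs_id("check_string_consistency",
                     {"table": "cars", "columns": ["car_id"], "string_length": 6})
b = stable_kwargs_id("check_string_consistency",
                     {"string_length": 6, "columns": ["car_id"], "table": "cars"})
c = stable_kwargs_id("check_datatype",
                     {"table": "cars", "columns": ["car_id", "index"], "d_type": "str"})

print(a == b)  # same content, different key order
print(a == c)  # different content
```

Because the digest depends only on the content, it can be compared against ids already stored in the table to decide whether a row should be skipped.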

results = [
{'type': 'check_datatype',
'kwargs': {'table': 'cars', 'columns': ['car_id','index'], 'd_type': 'str'},
'datasource_path': '/cars_dataset_ok/',
'Result': False},
{'type': 'check_string_consistency',
'kwargs': {'table': 'cars', 'columns': ['car_id'], 'string_length': 6},
'datasource_path': '/cars_dataset_ok/',
'Result': False}
]
for each in results:
    print(each['kwargs'])

Related

Iterate and extract several lists from a dictionary in Python

I have a dictionary like this:
dic = {'features': [{'type': 'Feature',
'geometry': {'geodesic': False,
'type': 'Point',
'coordinates': [33.44904857310912, 52.340950190985474]},
'id': '0',
'properties': {'a1': 1.313,
'a2': -0.028, 'a3': 0.0026, 'a4': -0.025...
'a40': -0.056 ...
{'type': 'Feature',
'geometry': {'geodesic': False,
'type': 'Point',
'coordinates': [33.817042613128294, 52.340950190985474]},
'id': '1',
'properties': {'a1': 1.319,
'a2': -0.026, 'a3': 0.003,'a4': -0.045, ...
'a40': -0.032 ......
Almost 1000 ids, e.g. 'id': '0', 'id': '1'...'id': '960'
I want to iterate through the dictionary and extract a separate list of values for each of 'a1', 'a2', ... 'a40'. Something like this:
list_a1 = [1.313, 1.319... ]
list_a2 = [-0.028, -0.026 ...]
How to get these lists using Python?
You can use something like this. Using setdefault makes it dynamic: every key found in properties gets its own list in the result, however many keys there are.
dic = {'features': [{'type': 'Feature',
'geometry': {'geodesic': False,
'type': 'Point',
'coordinates': [33.44904857310912, 52.340950190985474]},
'id': '0',
'properties': {'a1': 1.313,
'a2': -0.028,
'a3': 0.0026,
'a4': -0.025,
'a40': -0.056}},
{'type': 'Feature',
'geometry': {'geodesic': False,
'type': 'Point',
'coordinates': [33.817042613128294, 52.340950190985474]},
'id': '1',
'properties': {'a1': 1.319,
'a2': -0.026,
'a3': 0.003,
'a4': -0.045,
'a40': -0.032}}]}
separated_properties = {}
for feature in dic['features']:
    for key, val in feature['properties'].items():
        # Create the list on first sight of a key, then append to it
        separated_properties.setdefault(key, []).append(val)
print(separated_properties)
print('a1: ', separated_properties['a1'])
Output
{'a1': [1.313, 1.319],
'a2': [-0.028, -0.026],
'a3': [0.0026, 0.003],
'a4': [-0.025, -0.045],
'a40': [-0.056, -0.032]}
a1: [1.313, 1.319]
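The same grouping can also be written with collections.defaultdict, which creates the empty list automatically. This is only a sketch of an equivalent approach; the shortened features list below is stand-in data, not the full dictionary from the question.

```python
from collections import defaultdict

# Stand-in data with the same shape as dic['features'] (values shortened)
features = [
    {"id": "0", "properties": {"a1": 1.313, "a2": -0.028}},
    {"id": "1", "properties": {"a1": 1.319, "a2": -0.026}},
]

columns = defaultdict(list)
for feature in features:
    for key, val in feature["properties"].items():
        columns[key].append(val)  # a missing key starts as an empty list

list_a1 = columns["a1"]
print(list_a1)
```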

python convert dictionary to dataframe

I am trying to convert JSON to a dataframe, but I cannot get the result I want.
Here is the dictionary and the result I got:
{'20210df12820df1456-ssddsd': {'2': {'num': '2',
'product_name': 'apple',
'product_price': '20900'},
'order': {'add_info': None,
'basket_count': '2',
'deli_price': '2500',
'id': 'nhdd#abvc',
'is_member': 'MEMBER',
'mem_type': 'PERSON',
'order_date': '2021-01-28 20:14:56',
'ordernum': '20210df12820df1456-ssddsd',
'pay_price': '43100',
'reserve': '840',
'start_price': '43100',
'total_product_price': '41800',
'used_emoney': '0',
'used_reserve': '0'},
'pay_history': [{'add_price': '0',
'deli_price': '2500',
'discount_price': '-1200',
'order_price': '43100',
'pay_date': '2021-01-28 '
'20:15:14',
'pay_price': '43100',
'pay_type': 'creditcard',
'paymethod': 'C',
'total_price': '41800',
'used_emoney': '0',
'used_reserve': '0'}],
'payment': {'card_flag': '0000',
'card_partcancel_code': '00',
'card_state': 'Y',
'in_card_price': '43100',
'pay_date': '2021-01-28 20:15:14',
'pay_status': 'Y',
'paymethod': 'C',
'simple_pay': 'NPY'},
'product': {'1': {'num': '1',
'product_name': 'banana',
'product_price': '20900'}}}}
json_data = response.json()
result =json_data['list']
df = pd.DataFrame(result).transpose()
df.head()
I would like to split this dictionary into four dataframes, but that is not the result I get.
I expected something like the following:
order = df[['order']]
payment = df[['payment']]
pay_history = df[['pay_history']]
product = df[['product']]
Any ideas?
Since the data for each category (order, payment, pay_history, product) is organized differently, consider iterating through each category, adding any extra data you need (such as ordernum, for indexing), and collecting the rows in a separate list per category. You can then convert each list into a DataFrame:
import pandas as pd
json_data = {'20210df12820df1456-ssddsd': {'order': {'ordernum': '20210df12820df1456-ssddsd', 'order_date': '2021-01-28 20:14:56', 'is_member': 'MEMBER', 'start_price': '43100', 'pay_price': '43100', 'deli_price': '2500', 'total_product_price': '41800', 'basket_count': '2', 'id': 'nhdd#abvc', 'mem_type': 'PERSON', 'used_emoney': '0', 'used_reserve': '0', 'add_info': None, 'reserve': '840'}, 'payment': {'paymethod': 'C', 'pay_date': '2021-01-28 20:15:14', 'card_state': 'Y', 'pay_status': 'Y', 'simple_pay': 'NPY', 'card_flag': '0000', 'card_partcancel_code': '00', 'in_card_price': '43100'}, 'pay_history': [{'pay_date': '2021-01-28 20:15:14', 'pay_type': 'creditcard', 'total_price': '41800', 'deli_price': '2500', 'discount_price': '-1200', 'add_price': '0', 'order_price': '43100', 'pay_price': '43100', 'used_reserve': '0', 'used_emoney': '0', 'paymethod': 'C'}], 'product': {'1': {'num': '1', 'product_name': 'banana', 'product_price': '20900'}, '2': {'num': '2', 'product_name': 'apple', 'product_price': '20900'}}}}
order_data = []
payment_data = []
pay_history_data = []
product_data = []
for key in json_data:
    order_data.append(json_data[key]['order'])
    # Copy the payment dict and tag it with the order number
    payment = dict(json_data[key]['payment'])
    payment['ordernum'] = key
    payment_data.append(payment)
    # pay_history is a list of dicts; the key may be absent for some orders
    if 'pay_history' in json_data[key]:
        for p in json_data[key]['pay_history']:
            p_clone = dict(p)
            p_clone['ordernum'] = key
            pay_history_data.append(p_clone)
    # product is a dict of dicts keyed by product number
    if 'product' in json_data[key]:
        product = json_data[key]['product']
        for product_key in product:
            p_clone = dict(product[product_key])
            p_clone['ordernum'] = key
            product_data.append(p_clone)
order_df = pd.DataFrame(order_data)
payment_df = pd.DataFrame(payment_data)
pay_history_df = pd.DataFrame(pay_history_data)
product_df = pd.DataFrame(product_data)
Edit: If you get a KeyError while iterating through pay_history, some orders probably have no pay_history key in their JSON data. You can avoid this by checking that the key exists before iterating (if 'pay_history' in json_data[key]:). The same check can be done before iterating through product (if 'product' in json_data[key]:).
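An alternative to the explicit membership check is dict.get with an empty default, so a missing key simply yields nothing to iterate over. A small sketch with made-up order data (the orders dictionary and its keys "order-1" and "order-2" are hypothetical):

```python
# Hypothetical data: the second order has no pay_history key at all
orders = {
    "order-1": {"order": {"ordernum": "order-1"},
                "pay_history": [{"pay_price": "43100"}]},
    "order-2": {"order": {"ordernum": "order-2"}},
}

pay_history_rows = []
for ordernum, record in orders.items():
    # .get returns [] when the key is absent, so the inner loop is skipped
    for entry in record.get("pay_history", []):
        pay_history_rows.append({**entry, "ordernum": ordernum})

print(pay_history_rows)
```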
What you need is the pandas json_normalize function.
Here is some example code:
import pandas as pd
from pandas import json_normalize  # top-level since pandas 1.0; importing from pandas.io.json is deprecated
data = {'20210df12820df1456-ssddsd': {'order': {'ordernum': '20210df12820df1456-ssddsd', 'order_date': '2021-01-28 20:14:56', 'is_member': 'MEMBER',
'start_price': '43100', 'pay_price': '43100', 'deli_price': '2500', 'total_product_price': '41800',
'basket_count': '2', 'id': 'nhdd#abvc', 'mem_type': 'PERSON', 'used_emoney': '0', 'used_reserve': '0', 'add_info': None, 'reserve': '840'},
'payment': {'paymethod': 'C', 'pay_date': '2021-01-28 20:15:14', 'card_state': 'Y', 'pay_status': 'Y', 'simple_pay': 'NPY', 'card_flag': '0000', 'card_partcancel_code': '00', 'in_card_price': '43100'},
'pay_history': [{'pay_date': '2021-01-28 20:15:14', 'pay_type': 'creditcard', 'total_price': '41800', 'deli_price': '2500', 'discount_price': '-1200', 'add_price': '0', 'order_price': '43100', 'pay_price': '43100', 'used_reserve': '0', 'used_emoney': '0', 'paymethod': 'C'}],
'product': {'1': {'num': '1', 'product_name': 'banana', 'product_price': '20900'}}, '2': {'num': '2', 'product_name': 'apple', 'product_price': '20900'}}}
df = pd.DataFrame(data).transpose()
order = json_normalize(df['order'])
The result is a flat dataframe with one column per key of the order dictionary.

How to deserialize JSON coming from a DynamoDB stream

event = event = {'Records': [{'eventID': '2339bc590c21035b84f8cc602b12c1d2', 'eventName': 'INSERT', 'eventVersion': '1.1', 'eventSource': 'aws:dynamodb', 'awsRegion': 'us-east-1', 'dynamodb': {'ApproximateCreationDateTime': 1595908037.0, 'Keys': {'id': {'S': '9'}}, 'NewImage': {'last_name': {'S': 'Hus'}, 'id': {'S': '9'}, 'age': {'S': '95'}}, 'SequenceNumber': '3100000000035684810908', 'SizeBytes': 23, 'StreamViewType': 'NEW_IMAGE'}, 'eventSourceARN': 'arn:aws:dynamodb:us-east-1:656441365658:table/glossary/stream/2020-07-28T00:26:55.462'}, {'eventID': 'bbd4073256ef3182b3c00f13ead09501', 'eventName': 'MODIFY', 'eventVersion': '1.1', 'eventSource': 'aws:dynamodb', 'awsRegion': 'us-east-1', 'dynamodb': {'ApproximateCreationDateTime': 1595908037.0, 'Keys': {'id': {'S': '2'}}, 'NewImage': {'last_name': {'S': 'JJ'}, 'id': {'S': '2'}, 'age': {'S': '5'}}, 'SequenceNumber': '3200000000035684810954', 'SizeBytes': 21, 'StreamViewType': 'NEW_IMAGE'}, 'eventSourceARN': 'arn:aws:dynamodb:us-east-1:656441365658:table/glossary/stream/2020-07-28T00:26:55.462'}, {'eventID': 'a9c90c0c4a5a4b64d0314c4557e94e28', 'eventName': 'INSERT', 'eventVersion': '1.1', 'eventSource': 'aws:dynamodb', 'awsRegion': 'us-east-1', 'dynamodb': {'ApproximateCreationDateTime': 1595908037.0, 'Keys': {'id': {'S': '10'}}, 'NewImage': {'last_name': {'S': 'Hus'}, 'id': {'S': '10'}, 'age': {'S': '95'}}, 'SequenceNumber': '3300000000035684810956', 'SizeBytes': 25, 'StreamViewType': 'NEW_IMAGE'}, 'eventSourceARN': 'arn:aws:dynamodb:us-east-1:656441365658:table/glossary/stream/2020-07-28T00:26:55.462'}, {'eventID': '288f4a424992e5917af0350b53f754dc', 'eventName': 'MODIFY', 'eventVersion': '1.1', 'eventSource': 'aws:dynamodb', 'awsRegion': 'us-east-1', 'dynamodb': {'ApproximateCreationDateTime': 1595908037.0, 'Keys': {'id': {'S': '1'}}, 'NewImage': {'last_name': {'S': 'V'}, 'id': {'S': '1'}, 'age': {'S': '2'}}, 'SequenceNumber': '3400000000035684810957', 'SizeBytes': 20, 'StreamViewType': 'NEW_IMAGE'}, 
'eventSourceARN': 'arn:aws:dynamodb:us-east-1:656441365658:table/glossary/stream/2020-07-28T00:26:55.462'}]}
The event above comes from a DynamoDB stream. I need to extract some values from it.
My code is below, but it returns nothing:
def deserialize(event):
    data = {}
    data["M"] = event
    return extract_some(data)
def extract_some(event):
    for key, value in list(event.items()):
        if (key == "NULL"):
            return None
        if (key == "S" or key == "BOOL"):
            return value
for record in event['Records']:
    doc = deserialize(record['dynamodb']['NewImage'])
    print (doc)
Expected output:
{'last_name': 'Hus', 'id': '9', 'age': '95'}
{'last_name': 'JJ', 'id': '2', 'age': '5'}
{'last_name': 'Hus', 'id': '10', 'age': '95'}
{'last_name': 'V', 'id': '1', 'age': '2'}
Try this:
from pprint import pprint
result = []
for r in event['Records']:
    tmp = {}
    for k, v in r['dynamodb']['NewImage'].items():
        # Each attribute value is a one-key dict such as {'S': 'Hus'}
        if "S" in v.keys() or "BOOL" in v.keys():
            tmp[k] = v.get('S', v.get('BOOL', False))
        elif 'NULL' in v:
            tmp[k] = None
    result.append(tmp)
pprint(result)
[{'age': '95', 'id': '9', 'last_name': 'Hus'},
{'age': '5', 'id': '2', 'last_name': 'JJ'},
{'age': '95', 'id': '10', 'last_name': 'Hus'},
{'age': '2', 'id': '1', 'last_name': 'V'}]
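If the images can also contain numbers, nested maps, or lists, a small recursive converter covers those cases too. The sketch below is hand-written for illustration (the name from_dynamodb is made up) and handles only the S, BOOL, NULL, N, M and L type tags; boto3 also ships a TypeDeserializer class in boto3.dynamodb.types for the same job.

```python
from decimal import Decimal


def from_dynamodb(attr):
    """Convert one DynamoDB attribute value, e.g. {'S': 'Hus'}, to a Python value.

    Hypothetical helper; only a subset of the type tags is handled.
    """
    # Each attribute value is a dict with exactly one type tag
    (type_tag, value), = attr.items()
    if type_tag == "NULL":
        return None
    if type_tag in ("S", "BOOL"):
        return value
    if type_tag == "N":
        return Decimal(value)  # numbers arrive as strings
    if type_tag == "M":
        return {k: from_dynamodb(v) for k, v in value.items()}
    if type_tag == "L":
        return [from_dynamodb(v) for v in value]
    raise ValueError(f"unhandled DynamoDB type tag: {type_tag}")


new_image = {"last_name": {"S": "Hus"}, "id": {"S": "9"}, "age": {"S": "95"}}
doc = {k: from_dynamodb(v) for k, v in new_image.items()}
print(doc)  # {'last_name': 'Hus', 'id': '9', 'age': '95'}
```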

Finding missing value in JSON using python

I want to separate the records in my dataset that are complete from those that are not.
To do that, I want to add a flag such as 'completed' to the JSON, as in the example output below.
This is the data that I have:
data=[{'id': 'abc001',
'demo':{'gender':'1',
'job':'6',
'area':'3',
'study':'3'},
'ex_data':{'fam':'small',
'scholar':'2'}},
{'id': 'abc002',
'demo':{'gender':'1',
'edu':'6',
'qual':'3',
'living':'3'},
'ex_data':{'fam':'',
'scholar':''}},
{'id': 'abc003',
'demo':{'gender':'1',
'edu':'6',
'area':'3',
'sal':'3'}
'ex_data':{'fam':'big',
'scholar':NaN}}]
How can I add the flag and also detect NaN and NULL in the JSON?
Desired output:
Output=[{'id': 'abc001',
'completed':'yes',
'demo':{'gender':'1',
'job':'6',
'area':'3',
'study':'3'},
'ex_data':{'fam':'small',
'scholar':'2'}},
{'id': 'abc002',
'completed':'no',
'demo':{'gender':'1',
'edu':'6',
'qual':'3',
'living':'3'},
'ex_data':{'fam':'',
'scholar':''}},
{'id': 'abc003',
'completed':'no',
'demo':{'gender':'1',
'edu':'6',
'area':'3',
'sal':'3'}
'ex_data':{'fam':'big',
'scholar':NaN}}]
Something like this should work for you:
data = [
{
'id': 'abc001',
'demo': {
'gender': '1',
'job': '6',
'area': '3',
'study': '3'},
'ex_data': {'fam': 'small',
'scholar': '2'}
},
{
'id': 'abc002',
'demo': {
'gender': '1',
'edu': '6',
'qual': '3',
'living': '3'},
'ex_data': {'fam': '',
'scholar': ''}},
{
'id': 'abc003',
'demo': {
'gender': '1',
'edu': '6',
'area': '3',
'sal': '3'},
'ex_data': {'fam': 'big',
'scholar': None}
}
]
def browse_dict(dico):
    empty_values = 0
    for key in dico:
        # Count empty top-level values
        if dico[key] is None or dico[key] == "":
            empty_values += 1
        # Count empty values one level down
        if isinstance(dico[key], dict):
            for k in dico[key]:
                if dico[key][k] is None or dico[key][k] == "":
                    empty_values += 1
    if empty_values == 0:
        dico["completed"] = "yes"
    else:
        dico["completed"] = "no"
for d in data:
    browse_dict(d)
    print(d)
Output :
{'id': 'abc001', 'demo': {'gender': '1', 'job': '6', 'area': '3', 'study': '3'}, 'ex_data': {'fam': 'small', 'scholar': '2'}, 'completed': 'yes'}
{'id': 'abc002', 'demo': {'gender': '1', 'edu': '6', 'qual': '3', 'living': '3'}, 'ex_data': {'fam': '', 'scholar': ''}, 'completed': 'no'}
{'id': 'abc003', 'demo': {'gender': '1', 'edu': '6', 'area': '3', 'sal': '3'}, 'ex_data': {'fam': 'big', 'scholar': None}, 'completed': 'no'}
Note that I changed NaN to None, because what you are showing is most likely a Python dictionary rather than a JSON file, since you wrote data =
A bare NaN is not a defined name in Python, and a JSON null becomes None when it is loaded with the json module.
If you need to convert your JSON to a dictionary, refer to the json module documentation.
Also, please check your dictionary syntax: some of the commas separating entries are missing.
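If the data really does arrive as JSON text, Python's json module accepts a NaN literal by default and loads it as a float NaN, while null becomes None. A minimal sketch of a check that treats None, the empty string and NaN as missing (the helper name is_missing and the sample JSON string are made up for illustration):

```python
import json
import math


def is_missing(value):
    """True for None, an empty string, or a float NaN (hypothetical helper)."""
    if value is None or value == "":
        return True
    # NaN never compares equal to itself, so math.isnan is the reliable test
    return isinstance(value, float) and math.isnan(value)


record = json.loads('{"fam": "big", "scholar": NaN, "note": null, "area": ""}')
flags = {k: is_missing(v) for k, v in record.items()}
print(flags)
```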
You could try the following.
The input is:
data = [{'demo': {'gender': '1', 'job': '6', 'study': '3', 'area': '3'}, 'id': 'abc001', 'ex_data': {'scholar': '2', 'fam': 'small'}}, {'demo': {'living': '3', 'gender': '1', 'qual': '3', 'edu': '6'}, 'id': 'abc002', 'ex_data': {'scholar': '', 'fam': ''}}, {'demo': {'gender': '1', 'area': '3', 'sal': '3', 'edu': '6'}, 'id': 'abc003', 'ex_data': {'scholar': None, 'fam': 'big'}}]
Also, a bare NaN will not work in Python, so None is used instead.
for item in data:
    item["completed"] = 'yes'
    for key in item.keys():
        if isinstance(item[key], dict):
            # Any falsy nested value (None, '') marks the record incomplete
            for inner_key in item[key].keys():
                if (not item[key][inner_key]):
                    item["completed"] = "no"
                    break
        else:
            if (not item[key]):
                item["completed"] = "no"
                break
The Output will be
data = [{'demo': {'gender': '1', 'job': '6', 'study': '3', 'area': '3'}, 'completed': 'yes', 'id': 'abc001', 'ex_data': {'scholar': '2', 'fam': 'small'}}, {'demo': {'living': '3', 'edu': '6', 'qual': '3', 'gender': '1'}, 'completed': 'no', 'id': 'abc002', 'ex_data': {'scholar': '', 'fam': ''}}, {'demo': {'edu': '6', 'gender': '1', 'sal': '3', 'area': '3'}, 'completed': 'no', 'id': 'abc003', 'ex_data': {'scholar': None, 'fam': 'big'}}]

How to get/filter values in python3 json list dictionary response?

Below is the result I got from an API query.
[{'type':'book','title': 'example1', 'id': 12456, 'price': '8.20', 'qty': '12', 'status': 'available'},
{'type':'book','title': 'example2', 'id': 12457, 'price': '10.50', 'qty': '5', 'status': 'none'}]
How do I get only the title, price, and status key-value pairs?
The result should look like:
[{'title': 'example1', 'price': '8.20', 'status': 'available'},
{'title': 'example2', 'price': '10.50', 'status': 'none'}]
You can use a dictionary comprehension within a list comprehension:
L = [{'type':'book','title': 'example1', 'id': 12456, 'price': '8.20', 'qty': '12', 'status': 'available'},
{'type':'book','title': 'example2', 'id': 12457, 'price': '10.50', 'qty': '5', 'status': 'none'}]
keys = ['title', 'price', 'status']
res = [{k: d[k] for k in keys} for d in L]
print(res)
[{'price': '8.20', 'status': 'available', 'title': 'example1'},
{'price': '10.50', 'status': 'none', 'title': 'example2'}]
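Note that d[k] raises KeyError if any dictionary lacks one of the wanted keys. If the API may omit fields, a guard inside the comprehension skips them. A short sketch, where the second row is made-up data that deliberately has no 'status':

```python
# Hypothetical response in which one item is missing the 'status' field
rows = [
    {"type": "book", "title": "example1", "price": "8.20", "status": "available"},
    {"type": "book", "title": "example2", "price": "10.50"},
]
keys = ["title", "price", "status"]

# Keep only the wanted keys that are actually present in each dictionary
res = [{k: d[k] for k in keys if k in d} for d in rows]
print(res)
```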
