I am trying to do some automation, but got stuck at this:
ipreparray can look like either of these:
[{'rep': 'B'}, {'rep': 'L'}, {'rep': 'M'}, {'rep': 'H', 'Ips': ['147.56.24.35'], 'ipC': '2'}]
[{'rep': 'B', 'Ips': ['142.56.24.50'], 'ipC': '2'}, {'rep': 'L'}, {'rep': 'M'}, {'rep': 'H'}]
How can I extract only these key:value pairs?
{'rep': 'H', 'Ips': ['147.56.24.35'], 'ipC': '2'}
{'rep': 'B', 'Ips': ['142.56.24.50'], 'ipC': '2'}
i.e. the dicts that have the additional keys 'Ips' and 'ipC' alongside 'rep'.
Adding code here:
def get_traffic_stats(service, domain_name, date):
    try:
        query = 'domains/%s/trafficStats/%s' % (domain_name, date)
        traffic_stats = service.domains().trafficStats().get(name=query).execute()
        json_str = json.dumps(traffic_stats)
        resp_dict = json.loads(json_str)
        a = domain_name
        b = ','
        domrep = str(resp_dict.get('domainReputation'))
        spamratio = str(resp_dict.get('userReportedSpamRatio'))
        spf = str(resp_dict.get('spfSuccessRatio'))
        dkim = str(resp_dict.get('dkimSuccessRatio'))
        dmarc = str(resp_dict.get('dmarcSuccessRatio'))
        encrypt = str(resp_dict.get('inboundEncryptionRatio'))
        final = date + ',' + a + b + domrep + b + spamratio + b + spf + b + dkim + b + dmarc + b + encrypt
        domain = a + b + domrep
        for x, y in resp_dict.items():
            if x == "ipReputations":
                ipreparray = y
                print(ipreparray)
                for dic in ipreparray:
                    for val in dic.values():
                        print(val)
        return str(final)
    except Exception as err:
        print(err)
trafficStats = {'name': 'domains/account.net/trafficStats/20210718',
'ipReputations': [{'rep': 'B'},
{'rep': 'L'},
{'rep': 'M'},
{'rep': 'H', 'Ips': ['141.54.247.35-141.54.247.41', '141.54.247.44'], 'ipC': '8'}],
'domainReputation': 'HIGH',
'spfSuccessRatio': 1,
'dkimSuccessRatio': 1,
'dmarcSuccessRatio': 1,
'inboundEncryptionRatio': 1,
'deliveryErrors': [{'errorClass': 'TEMPORARY_ERROR', 'errorType': 'SUSPECTED_SPAM'}]}
Not a clever one-liner like the one in @Axe319's comment, but this is arguably more readable and might be a little easier to extend:
ipreparray1 = [{'rep': 'B'},
{'rep': 'L'},
{'rep': 'M'},
{'rep': 'H', 'Ips': ['147.56.24.35'], 'ipC': '2'}]
ipreparray2 = [{'rep': 'B', 'Ips': ['142.56.24.50'], 'ipC': '2'},
{'rep': 'L'},
{'rep': 'M'},
{'rep': 'H'}]
def extractor(ipreparray):
    for dct in ipreparray:
        if {'Ips', 'ipC'} <= dct.keys():
            return dct
print(extractor(ipreparray1)) # -> {'rep': 'H', 'Ips': ['147.56.24.35'], 'ipC': '2'}
print(extractor(ipreparray2)) # -> {'rep': 'B', 'Ips': ['142.56.24.50'], 'ipC': '2'}
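For comparison, the comprehension-style one-liner alluded to above (reconstructed here as a sketch, since @Axe319's original comment isn't shown) filters all matching dicts at once rather than returning only the first:

```python
ipreparray1 = [{'rep': 'B'}, {'rep': 'L'}, {'rep': 'M'},
               {'rep': 'H', 'Ips': ['147.56.24.35'], 'ipC': '2'}]

# keep every dict that carries both of the extra keys
matches = [d for d in ipreparray1 if {'Ips', 'ipC'} <= d.keys()]
print(matches)  # [{'rep': 'H', 'Ips': ['147.56.24.35'], 'ipC': '2'}]
```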
I have a single df that includes multiple json strings per row that need reading and normalizing.
I can read out the json info and normalize the columns by storing each row as a new dataframe in a list, which I have done with the code below.
However, I need to append the unique id from the original df (i.e. 'id': ['9clpa','g659am']), which is lost in my current code.
The expected output is a list of dataframes, one per id, that include the exploded json info plus an additional column containing the id (repeated for each row of the final df).
I hope that makes sense, any suggestions are very welcome. thanks so much
dataframe
df = pd.DataFrame(data={'id': ['9clpa','g659am'],'i2': [('{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}'),('{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}')]})
current code
out = {}
for i in range(len(df)):
    out[i] = pd.read_json(df.i2[i])
    out[i] = pd.json_normalize(out[i].q)
expected output
pd.DataFrame(data={'id': ['9clpa','9clpa'],'qi': ['01','02'], 'answers': ['{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}', '{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"}']})
pd.DataFrame(data={'id': ['g659am','g659am'],'qi': ['01','02'], 'answers': ['{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}', '{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"}']})
df = pd.DataFrame(data={'id': ['9clpa','g659am'],'i2': [('{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}'),('{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}')]})
out = {}
columns1 = ['id', 'qi', 'answers']
for i in range(len(df)):
    out[i] = pd.read_json(df.i2[i])
    out[i] = pd.json_normalize(out[i].q)
    df_new = pd.DataFrame(data=out[i], columns=columns1)
    df_new = df_new.assign(id=lambda x: df.id[i])
    display(df_new)
You can add a lambda function which will assign the value of 'id' to the new df that is formed.
Edit: You can change the position of the 'id' column in columns1 to define where you want it to appear when you create the dataframe.
You are just missing assigning the id to your dataframe after you normalize the columns:
out = {}
for i in range(len(df)):
    out[i] = pd.read_json(df.i2[i])
    out[i] = pd.json_normalize(out[i].q)
    out[i]['id'] = df.id[i]
    out[i] = out[i].loc[:, ['id', 'qi', 'answers']]
Output:
>>> out[0]
id qi answers
0 9clpa 01 [{'answer': 'M', 'value': '1'}, {'answer': 'F', 'value': '2'}, {'answer': 'G', 'value': '3'}, {'answer': 'V', 'value': '4'}]
1 9clpa 02 [{'answer': 'M', 'value': '1'}, {'answer': 'F', 'value': '2'}, {'answer': 'A', 'value': '3'}, {'answer': 'B', 'value': '4'}, {'answer': 'G', 'value': '5'}, {'answer': 'NC', 'value': '6'}, {'answer': 'O', 'value': '7'}]
You can use .json_normalize (doc here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html)
(from https://medium.com/swlh/converting-nested-json-structures-to-pandas-dataframes-e8106c59976e)
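Building on that, `json_normalize` with `record_path` can flatten each row's `q` list directly, and `assign` tags the result with the row's id. A sketch against a miniature (hypothetical) version of the question's dataframe:

```python
import json
import pandas as pd

# miniature stand-in for the question's dataframe
df = pd.DataFrame({
    'id': ['9clpa', 'g659am'],
    'i2': ['{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"}]}]}',
           '{"t":"unique428","q":[{"qi":"02","answers":[{"answer":"F","value":"2"}]}]}'],
})

# parse each JSON string, flatten its 'q' list, and tag rows with that row's id
out = [pd.json_normalize(json.loads(s), record_path='q').assign(id=row_id)
       for s, row_id in zip(df['i2'], df['id'])]
```

Each entry of `out` is one dataframe per id, with the id repeated on every exploded row.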
I have a bit of a tricky JSON I want to put into a dataframe.
{'A': {'name': 'A',
'left_foot': [{'toes': '5'}],
'right_foot': [{'toes': '4'}]},
'B': {'name': 'B',
'left_foot': [{'toes': '3'}],
'right_foot': [{'toes': '5'}]},
...
}
I don't need the first layer with A and B as it is part of name. There will always only be one left_foot and one right_foot.
The data I want is as follows:
name left_foot.toes right_foot.toes
0 A 5 4
1 B 3 5
Using this post I was able to get the feet and toes, but only if you specify data["A"]. Is there an easier way?
EDIT
I have something like this, but I need to specify "A" in the first line.
df = pd.json_normalize(tickers["A"]).pipe(
    lambda x: x.drop('left_foot', axis=1).join(
        x.left_foot.apply(lambda y: pd.Series(merge(y)))
    )
).rename(columns={"toes": "left_foot.toes"}).pipe(
    lambda x: x.drop('right_foot', axis=1).join(
        x.right_foot.apply(lambda y: pd.Series(merge(y)))
    )
).rename(columns={"toes": "right_foot.toes"})
Given your data, each top-level key (e.g. 'A' and 'B') is repeated as a value in 'name', so it is easier to use pandas.json_normalize on only the values of the dict.
The 'left_foot' and 'right_foot' columns need to be exploded to remove each dict from its list.
The final step converts the columns of dicts to a dataframe and joins it back to df
It's not necessarily less code, but this should be significantly faster than the multiple applies used in the current code.
See this timing analysis comparing apply pandas.Series to just using pandas.DataFrame to convert a column.
If there are issues because your dataframe has NaN (e.g. missing dicts or lists) in the columns to be exploded and converted to a dataframe, see How to json_normalize a column with NaNs
import pandas as pd
# test data
data = {'A': {'name': 'A', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]}, 'B': {'name': 'B', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]}, 'C': {'name': 'C', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]}, 'D': {'name': 'D', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]}}
# normalize data.values and explode the dicts out of the lists
df = pd.json_normalize(data.values()).apply(pd.Series.explode).reset_index(drop=True)
# display(df)
name left_foot right_foot
0 A {'toes': '5'} {'toes': '4'}
1 B {'toes': '3'} {'toes': '5'}
2 C {'toes': '5'} {'toes': '4'}
3 D {'toes': '3'} {'toes': '5'}
# extract the values from the dicts and create toe columns
df = df.join(pd.DataFrame(df.pop('left_foot').values.tolist())).rename(columns={'toes': 'lf_toes'})
df = df.join(pd.DataFrame(df.pop('right_foot').values.tolist())).rename(columns={'toes': 'rf_toes'})
# display(df)
name lf_toes rf_toes
0 A 5 4
1 B 3 5
2 C 5 4
3 D 3 5
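Since the question states there is always exactly one dict per foot, the `.str` accessor can also pull the values straight out of each single-element list, skipping the explode step. A sketch using the same test data (only two entries shown):

```python
import pandas as pd

data = {'A': {'name': 'A', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]},
        'B': {'name': 'B', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]}}

df = pd.json_normalize(list(data.values()))
# each list holds exactly one dict, so take element 0, then the 'toes' key
df['left_foot.toes'] = df.pop('left_foot').str[0].str['toes']
df['right_foot.toes'] = df.pop('right_foot').str[0].str['toes']
```

This relies on the one-dict-per-list guarantee; with variable-length lists the explode approach above is the safer choice.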
What I have:
a=[{'name':'a','vals':1,'required':'yes'},{'name':'b','vals':2},{'name':'d','vals':3}]
b=[{'name':'a','type':'car'},{'name':'b','type':'bike'},{'name':'c','type':'van'}]
What I tried:
[[i]+[j] for i in b for j in a if i['name']==j['name']]
What I got:
[[{'name': 'a', 'type': 'car'}, {'name': 'a', 'vals': 1}], [{'name': 'b', 'type': 'bike'}, {'name': 'b', 'vals': 2}]]
What I want:
[{'name': 'a', 'type': 'car','vals': 1},{'name': 'b', 'type': 'bike','vals': 2}]
Note:
I need to merge the dicts into one dict.
It should merge only those that have a common 'name' in both a and b.
I want a Python one-liner.
For Python 3, you can do this:
a=[{'name':'a','vals':1},{'name':'b','vals':2},{'name':'d','vals':3}]
b=[{'name':'a','type':'car'},{'name':'b','type':'bike'},{'name':'c','type':'van'}]
print([{**i,**j} for i in b for j in a if i['name']==j['name']])
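If the lists get long, the nested loop is O(len(a) * len(b)); indexing one list by 'name' first keeps the merge to a linear pass. A sketch (two lines rather than one):

```python
a = [{'name': 'a', 'vals': 1}, {'name': 'b', 'vals': 2}, {'name': 'd', 'vals': 3}]
b = [{'name': 'a', 'type': 'car'}, {'name': 'b', 'type': 'bike'}, {'name': 'c', 'type': 'van'}]

# index a by name, then merge only the names present in both lists
lookup = {d['name']: d for d in a}
merged = [{**i, **lookup[i['name']]} for i in b if i['name'] in lookup]
print(merged)  # [{'name': 'a', 'type': 'car', 'vals': 1}, {'name': 'b', 'type': 'bike', 'vals': 2}]
```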
I have a data frame (10 million rows) which looks like the following. For better understanding, I have simplified it.
user_id event_params
10 [{'key': 'x', 'value': '1'}, {'key': 'y', 'value': '3'}, {'key': 'z', 'value': '4'}]
11 [{'key': 'y', 'value': '5'}, {'key': 'z', 'value': '9'}]
12 [{'key': 'a', 'value': '5'}]
I want to make new columns from all the unique keys in the dataframe, with the values stored under their respective keys. The output should look like below:
user_id x y z a
10 1 3 4 NA
11 NA 5 9 NA
12 NA NA NA 5
Just collect the rows as dicts and build the dataframe in one go (note that DataFrame.append was deprecated and removed in pandas 2.0, and appending row by row is slow anyway). You can find more alternatives here.
import pandas as pd

data = [
    [12, [{'key': 'x', 'value': '1'}, {'key': 'y', 'value': '3'}, {'key': 'z', 'value': '4'}]],
    [13, [{'key': 'a', 'value': '5'}]]
]

records = []
for user_id, event_params in data:
    record = {e['key']: e['value'] for e in event_params}
    record['user_id'] = user_id
    records.append(record)

df = pd.DataFrame(records)
df
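For the dataframe shape actually shown in the question (a user_id column plus a list-of-dicts column), an explode-and-pivot sketch builds the wide table in a few vectorized steps; missing keys become NaN, matching the expected output:

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [10, 11, 12],
    'event_params': [
        [{'key': 'x', 'value': '1'}, {'key': 'y', 'value': '3'}, {'key': 'z', 'value': '4'}],
        [{'key': 'y', 'value': '5'}, {'key': 'z', 'value': '9'}],
        [{'key': 'a', 'value': '5'}],
    ],
})

# one row per dict, then spread the keys into columns
wide = (df.explode('event_params')
          .assign(key=lambda d: d['event_params'].str['key'],
                  value=lambda d: d['event_params'].str['value'])
          .pivot(index='user_id', columns='key', values='value')
          .reset_index())
```

This avoids Python-level loops entirely, which matters at the 10-million-row scale mentioned in the question.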