I am trying to do some automation, but got stuck at this:
ipreparray can look like either of these:
[{'rep': 'B'}, {'rep': 'L'}, {'rep': 'M'}, {'rep': 'H', 'Ips': ['147.56.24.35'], 'ipC': '2'}]
[{'rep': 'B', 'Ips': ['142.56.24.50'], 'ipC': '2'}, {'rep': 'L'}, {'rep': 'M'}, {'rep': 'H'}]
How can I extract only these key:value pairs?
{'rep': 'H', 'Ips': ['147.56.24.35'], 'ipC': '2'}
{'rep': 'B', 'Ips': ['142.56.24.50'], 'ipC': '2'}
i.e. the dicts that have the additional keys 'Ips' and 'ipC' alongside 'rep'.
Adding code here:
def get_traffic_stats(service, domain_name, date):
    try:
        query = 'domains/%s/trafficStats/%s' % (domain_name, date)
        traffic_stats = service.domains().trafficStats().get(name=query).execute()
        json_str = json.dumps(traffic_stats)
        resp_dict = json.loads(json_str)
        a = domain_name
        b = ','
        domrep = str(resp_dict.get('domainReputation'))
        spamratio = str(resp_dict.get('userReportedSpamRatio'))
        spf = str(resp_dict.get('spfSuccessRatio'))
        dkim = str(resp_dict.get('dkimSuccessRatio'))
        dmarc = str(resp_dict.get('dmarcSuccessRatio'))
        encrypt = str(resp_dict.get('inboundEncryptionRatio'))
        final = date + ',' + a + b + domrep + b + spamratio + b + spf + b + dkim + b + dmarc + b + encrypt
        domain = a + b + domrep
        for x, y in resp_dict.items():
            if x == "ipReputations":
                ipreparray = y
                print(ipreparray)
                for dic in ipreparray:
                    for val in dic.values():
                        print(val)
        return str(final)
    except Exception as err:
        print(err)
trafficStats = {'name': 'domains/account.net/trafficStats/20210718',
'ipReputations': [{'rep': 'B'},
{'rep': 'L'},
{'rep': 'M'},
{'rep': 'H', 'Ips': ['141.54.247.35-141.54.247.41', '141.54.247.44'], 'ipC': '8'}],
'domainReputation': 'HIGH',
'spfSuccessRatio': 1,
'dkimSuccessRatio': 1,
'dmarcSuccessRatio': 1,
'inboundEncryptionRatio': 1,
'deliveryErrors': [{'errorClass': 'TEMPORARY_ERROR', 'errorType': 'SUSPECTED_SPAM'}]}
Not a clever one-liner like the one in @Axe319's comment, but this is arguably more readable and might be a little easier to extend:
ipreparray1 = [{'rep': 'B'},
{'rep': 'L'},
{'rep': 'M'},
{'rep': 'H', 'Ips': ['147.56.24.35'], 'ipC': '2'}]
ipreparray2 = [{'rep': 'B', 'Ips': ['142.56.24.50'], 'ipC': '2'},
{'rep': 'L'},
{'rep': 'M'},
{'rep': 'H'}]
def extractor(ipreparray):
    for dct in ipreparray:
        if {'Ips', 'ipC'} <= dct.keys():
            return dct
print(extractor(ipreparray1)) # -> {'rep': 'H', 'Ips': ['147.56.24.35'], 'ipC': '2'}
print(extractor(ipreparray2)) # -> {'rep': 'B', 'Ips': ['142.56.24.50'], 'ipC': '2'}
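For comparison, the comprehension-style one-liner alluded to above (reconstructed here as a sketch, since @Axe319's original comment isn't shown) filters all matching dicts at once rather than returning only the first:

```python
ipreparray1 = [{'rep': 'B'}, {'rep': 'L'}, {'rep': 'M'},
               {'rep': 'H', 'Ips': ['147.56.24.35'], 'ipC': '2'}]

# keep every dict that carries both of the extra keys
matches = [d for d in ipreparray1 if {'Ips', 'ipC'} <= d.keys()]
print(matches)  # [{'rep': 'H', 'Ips': ['147.56.24.35'], 'ipC': '2'}]
```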
I have a single df that includes multiple json strings per row that need reading and normalizing.
I can read out the json info and normalize the columns by storing each row as a new dataframe in a list, which I have done with the code below.
However, I need to append the unique id from the original df (i.e. 'id': ['9clpa','g659am']), which is lost in my current code.
The expected output is a list of dataframes, one per id, that include the exploded json info plus an additional column containing the id (repeated for each row of the final df).
I hope that makes sense, any suggestions are very welcome. thanks so much
dataframe
df = pd.DataFrame(data={'id': ['9clpa','g659am'],'i2': [('{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}'),('{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}')]})
current code
out = {}
for i in range(len(df)):
    out[i] = pd.read_json(df.i2[i])
    out[i] = pd.json_normalize(out[i].q)
expected output
pd.DataFrame(data={'id': ['9clpa','9clpa'],'qi': ['01','02'], 'answers': ['{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}', '{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"}']})
pd.DataFrame(data={'id': ['g659am','g659am'],'qi': ['01','02'], 'answers': ['{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}', '{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"}']})
df = pd.DataFrame(data={'id': ['9clpa','g659am'],'i2': [('{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}'),('{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}')]})
out = {}
columns1 = ['id', 'qi', 'answers']
for i in range(len(df)):
    out[i] = pd.read_json(df.i2[i])
    out[i] = pd.json_normalize(out[i].q)
    df_new = pd.DataFrame(data=out[i], columns=columns1)
    df_new = df_new.assign(id=lambda x: df.id[i])
    display(df_new)
You can add a lambda function which will assign the value of 'id' to the new df that is formed.
Edit: You can change the position of the 'id' column in columns1 to define where you want it to appear when you create the dataframe.
You are just missing assigning the id to your dataframe after you normalize the columns:
out = {}
for i in range(len(df)):
    out[i] = pd.read_json(df.i2[i])
    out[i] = pd.json_normalize(out[i].q)
    out[i]['id'] = df.id[i]
    out[i] = out[i].loc[:, ['id', 'qi', 'answers']]
Output:
>>> out[0]
id qi answers
0 9clpa 01 [{'answer': 'M', 'value': '1'}, {'answer': 'F', 'value': '2'}, {'answer': 'G', 'value': '3'}, {'answer': 'V', 'value': '4'}]
1 9clpa 02 [{'answer': 'M', 'value': '1'}, {'answer': 'F', 'value': '2'}, {'answer': 'A', 'value': '3'}, {'answer': 'B', 'value': '4'}, {'answer': 'G', 'value': '5'}, {'answer': 'NC', 'value': '6'}, {'answer': 'O', 'value': '7'}]
You can use .json_normalize (doc here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html)
(from https://medium.com/swlh/converting-nested-json-structures-to-pandas-dataframes-e8106c59976e)
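Building on that, `json_normalize` with `record_path` can flatten each row's `q` list directly, and `assign` tags the result with the row's id. A sketch against a miniature (hypothetical) version of the question's dataframe:

```python
import json
import pandas as pd

# miniature stand-in for the question's dataframe
df = pd.DataFrame({
    'id': ['9clpa', 'g659am'],
    'i2': ['{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"}]}]}',
           '{"t":"unique428","q":[{"qi":"02","answers":[{"answer":"F","value":"2"}]}]}'],
})

# parse each JSON string, flatten its 'q' list, and tag rows with that row's id
out = [pd.json_normalize(json.loads(s), record_path='q').assign(id=row_id)
       for s, row_id in zip(df['i2'], df['id'])]
```

Each entry of `out` is one dataframe per id, with the id repeated on every exploded row.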
I have a bit of a tricky JSON I want to put into a dataframe.
{'A': {'name': 'A',
'left_foot': [{'toes': '5'}],
'right_foot': [{'toes': '4'}]},
'B': {'name': 'B',
'left_foot': [{'toes': '3'}],
'right_foot': [{'toes': '5'}]},
...
}
I don't need the first layer with A and B as it is part of name. There will always only be one left_foot and one right_foot.
The data I want is as follows:
name left_foot.toes right_foot.toes
0 A 5 4
1 B 3 5
Using this post I was able to get the feet and toes, but only if you specify data["A"]. Is there an easier way?
EDIT
I have something like this, but I need to specify "A" in the first line.
df = pd.json_normalize(tickers["A"]).pipe(
    lambda x: x.drop('left_foot', axis=1).join(
        x.left_foot.apply(lambda y: pd.Series(merge(y)))
    )
).rename(columns={"toes": "left_foot.toes"}).pipe(
    lambda x: x.drop('right_foot', axis=1).join(
        x.right_foot.apply(lambda y: pd.Series(merge(y)))
    )
).rename(columns={"toes": "right_foot.toes"})
Given your data, each top-level key (e.g. 'A' and 'B') is repeated as a value in 'name', so it is easier to use pandas.json_normalize on only the values of the dict.
The 'left_foot' and 'right_foot' columns need to be exploded to remove each dict from its list.
The final step converts the columns of dicts to a dataframe and joins it back to df
It's not necessarily less code, but this should be significantly faster than the multiple applies used in the current code.
See this timing analysis comparing apply pandas.Series to just using pandas.DataFrame to convert a column.
If there are issues because your dataframe has NaN (e.g. missing dicts or lists) in the columns to be exploded and converted to a dataframe, see How to json_normalize a column with NaNs
import pandas as pd
# test data
data = {'A': {'name': 'A', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]}, 'B': {'name': 'B', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]}, 'C': {'name': 'C', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]}, 'D': {'name': 'D', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]}}
# normalize data.values and explode the dicts out of the lists
df = pd.json_normalize(data.values()).apply(pd.Series.explode).reset_index(drop=True)
# display(df)
name left_foot right_foot
0 A {'toes': '5'} {'toes': '4'}
1 B {'toes': '3'} {'toes': '5'}
2 C {'toes': '5'} {'toes': '4'}
3 D {'toes': '3'} {'toes': '5'}
# extract the values from the dicts and create toe columns
df = df.join(pd.DataFrame(df.pop('left_foot').values.tolist())).rename(columns={'toes': 'lf_toes'})
df = df.join(pd.DataFrame(df.pop('right_foot').values.tolist())).rename(columns={'toes': 'rf_toes'})
# display(df)
name lf_toes rf_toes
0 A 5 4
1 B 3 5
2 C 5 4
3 D 3 5
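Since the question states there is always exactly one dict per foot, the `.str` accessor can also pull the values straight out of each single-element list, skipping the explode step. A sketch using the same test data (only two entries shown):

```python
import pandas as pd

data = {'A': {'name': 'A', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]},
        'B': {'name': 'B', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]}}

df = pd.json_normalize(list(data.values()))
# each list holds exactly one dict, so take element 0, then the 'toes' key
df['left_foot.toes'] = df.pop('left_foot').str[0].str['toes']
df['right_foot.toes'] = df.pop('right_foot').str[0].str['toes']
```

This relies on the one-dict-per-list guarantee; with variable-length lists the explode approach above is the safer choice.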
What I have:
a=[{'name':'a','vals':1,'required':'yes'},{'name':'b','vals':2},{'name':'d','vals':3}]
b=[{'name':'a','type':'car'},{'name':'b','type':'bike'},{'name':'c','type':'van'}]
What I tried:
[[i]+[j] for i in b for j in a if i['name']==j['name']]
What I got:
[[{'name': 'a', 'type': 'car'}, {'name': 'a', 'vals': 1}], [{'name': 'b', 'type': 'bike'}, {'name': 'b', 'vals': 2}]]
What I want:
[{'name': 'a', 'type': 'car','vals': 1},{'name': 'b', 'type': 'bike','vals': 2}]
Note:
I need to merge the dicts into one dict.
It should merge only those that have a common 'name' in both a and b.
I want a Python one-liner.
For Python 3, you can do this:
a=[{'name':'a','vals':1},{'name':'b','vals':2},{'name':'d','vals':3}]
b=[{'name':'a','type':'car'},{'name':'b','type':'bike'},{'name':'c','type':'van'}]
print([{**i,**j} for i in b for j in a if i['name']==j['name']])
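If the lists get long, the nested loop is O(len(a) * len(b)); indexing one list by 'name' first keeps the merge to a linear pass. A sketch (two lines rather than one):

```python
a = [{'name': 'a', 'vals': 1}, {'name': 'b', 'vals': 2}, {'name': 'd', 'vals': 3}]
b = [{'name': 'a', 'type': 'car'}, {'name': 'b', 'type': 'bike'}, {'name': 'c', 'type': 'van'}]

# index a by name, then merge only the names present in both lists
lookup = {d['name']: d for d in a}
merged = [{**i, **lookup[i['name']]} for i in b if i['name'] in lookup]
print(merged)  # [{'name': 'a', 'type': 'car', 'vals': 1}, {'name': 'b', 'type': 'bike', 'vals': 2}]
```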
I have a data frame (10 million rows) which looks like the following. For better understanding, I have simplified it.
user_id event_params
10 [{'key': 'x', 'value': '1'}, {'key': 'y', 'value': '3'}, {'key': 'z', 'value': '4'}]
11 [{'key': 'y', 'value': '5'}, {'key': 'z', 'value': '9'}]
12 [{'key': 'a', 'value': '5'}]
I want to make new columns from all the unique keys in the dataframe, with the values stored under their respective keys. The output should look like below:
user_id x y z a
10 1 3 4 NA
11 NA 5 9 NA
12 NA NA NA 5
Just collect the rows as dicts and build the dataframe in one go (note that DataFrame.append was deprecated and removed in pandas 2.0, and appending row by row is slow anyway). You can find more alternatives here.
import pandas as pd

data = [
    [12, [{'key': 'x', 'value': '1'}, {'key': 'y', 'value': '3'}, {'key': 'z', 'value': '4'}]],
    [13, [{'key': 'a', 'value': '5'}]]
]

records = []
for user_id, event_params in data:
    record = {e['key']: e['value'] for e in event_params}
    record['user_id'] = user_id
    records.append(record)

df = pd.DataFrame(records)
df
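For the dataframe shape actually shown in the question (a user_id column plus a list-of-dicts column), an explode-and-pivot sketch builds the wide table in a few vectorized steps; missing keys become NaN, matching the expected output:

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [10, 11, 12],
    'event_params': [
        [{'key': 'x', 'value': '1'}, {'key': 'y', 'value': '3'}, {'key': 'z', 'value': '4'}],
        [{'key': 'y', 'value': '5'}, {'key': 'z', 'value': '9'}],
        [{'key': 'a', 'value': '5'}],
    ],
})

# one row per dict, then spread the keys into columns
wide = (df.explode('event_params')
          .assign(key=lambda d: d['event_params'].str['key'],
                  value=lambda d: d['event_params'].str['value'])
          .pivot(index='user_id', columns='key', values='value')
          .reset_index())
```

This avoids Python-level loops entirely, which matters at the 10-million-row scale mentioned in the question.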