How to use for loop along with if inside lambda python? - python

I have a dataframe df that has a column tags. Each element of the column tags is a list of dictionaries and looks like this:
[
{
"id": "leena123",
"name": "LeenaShaw",
"slug": null,
"type": "UserTag",
"endIndex": 0,
"startIndex": 0
},
{
"id": "1234",
"name": "abc ltd.",
"slug": "5678",
"type": "StockTag",
"endIndex": 0,
"startIndex": 0
}
]
The list can have any number of elements.
Sample dataset:
0 some_data [{'id': 'leena123', 'name': 'leenaShaw', 'slug': None, 'type...
1 some data [{'id': '6', 'name': 'new', 'slug': None, 'type...
I want to create a list of all the ids from the tags column where the type is UserTag
sample output:
['leena123', 'saily639', ...]
I am trying with this :
list(df['tags'].apply(lambda x: d['name'] if any(d['type'] == 'UserTag' for d in x)))
but it doesn't work. Kindly help with this.

Use List Comprehension with df.apply:
df['id'] = df.tags.apply(lambda x: [i['id'] for i in x if i.get('type') == 'UserTag'])
Create a list from id column:
import itertools
l = df['id'].values.tolist()
output_id_list = list(itertools.chain(*l))
If you want to drop id column from df, do:
df.drop(columns='id', inplace=True)
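The lambda in the question fails for two reasons: a conditional expression needs an else branch, and d only exists inside the generator passed to any(). If only the flat list is needed, a single nested comprehension over the column avoids the temporary id column. This is a minimal sketch; the contents of the second row are assumed for illustration (only 'saily639' comes from the sample output).

```python
import pandas as pd

# Small stand-in for the question's dataframe; the second row is hypothetical.
df = pd.DataFrame({
    "text": ["some_data", "some data"],
    "tags": [
        [
            {"id": "leena123", "name": "LeenaShaw", "type": "UserTag"},
            {"id": "1234", "name": "abc ltd.", "type": "StockTag"},
        ],
        [
            {"id": "saily639", "name": "new", "type": "UserTag"},
        ],
    ],
})

# Outer loop walks the rows, inner loop walks the dicts in each row's list,
# and the trailing `if` keeps only the UserTag entries.
user_ids = [
    tag["id"]
    for tags in df["tags"]
    for tag in tags
    if tag.get("type") == "UserTag"
]
print(user_ids)
```

Using tag.get("type") rather than tag["type"] means a dictionary without a type key is skipped instead of raising KeyError.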

Related

Statistics on a list of dictionaries considering multiples keys

I have a list of dicts:
input = [{'name':'A', 'Status':'Passed','id':'x1'},
{'name':'A', 'Status':'Passed','id':'x2'},
{'name':'A','Status':'Failed','id':'x3'},
{'name':'B', 'Status':'Passed','id':'x4'},
{'name':'B', 'Status':'Passed','id':'x5'}]
I want an output like :
output = [{'name':'A', 'Passed':'2', 'Failed':'1', 'Total':'3', '%Pass':'66%'},
{'name':'B', 'Passed':'2', 'Failed':'0', 'Total':'2', '%Pass':'100%'},
{'name':'Total', 'Passed':'4', 'Failed':'1', 'Total':'5', '%Pass':'80%'}]
I started retrieving the different names by using a lookup:
lookup = {(d["name"]): d for d in input [::-1]}
names= [e for e in lookup.values()]
names= names[::-1]
and then tried a list comprehension, something like:
for name in names :
    name_passed = sum(["Passed" and "name" for d in input if 'Status' in d and name in d])
    name_faled = sum(["Failed" and "name" for d in input if 'Status' in d and name in d])
But I am not sure whether there is a smarter way. Would a simple loop comparing dict values be simpler?
Assuming your input entries will always be grouped according to the "name" key-value pair:
entries = [
{"name": "A", "Status": "Passed", "id": "x1"},
{"name": "A", "Status": "Passed", "id": "x2"},
{"name": "A", "Status": "Failed", "id": "x3"},
{"name": "B", "Status": "Passed", "id": "x4"},
{"name": "B", "Status": "Passed", "id": "x5"}
]
def to_grouped(entries):
    from itertools import groupby
    from operator import itemgetter
    for key, group_iter in groupby(entries, key=itemgetter("name")):
        group = list(group_iter)
        total = len(group)
        passed = sum(1 for entry in group if entry["Status"] == "Passed")
        failed = total - passed
        # Multiply before dividing so integer division does not lose precision
        perc_pass = 100 * passed // total
        yield {
            "name": key,
            "Passed": str(passed),
            "Failed": str(failed),
            "Total": str(total),
            "%Pass": f"{perc_pass:.0f}%"
        }
print(list(to_grouped(entries)))
Output:
[{'name': 'A', 'Passed': '2', 'Failed': '1', 'Total': '3', '%Pass': '66%'}, {'name': 'B', 'Passed': '2', 'Failed': '0', 'Total': '2', '%Pass': '100%'}]
This will not create the final entry you're looking for, which sums the statistics of all other entries. Though, that shouldn't be too hard to do.
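One possible way to add that final entry is to count per name and then sum the counters for the "Total" row. The sketch below is an alternative to the groupby version, not an extension of it; it uses collections.Counter so the entries do not need to be grouped by name beforehand. The helper names summarize and row are made up for this example.

```python
from collections import Counter

entries = [
    {"name": "A", "Status": "Passed", "id": "x1"},
    {"name": "A", "Status": "Passed", "id": "x2"},
    {"name": "A", "Status": "Failed", "id": "x3"},
    {"name": "B", "Status": "Passed", "id": "x4"},
    {"name": "B", "Status": "Passed", "id": "x5"},
]

def summarize(entries):
    passed = Counter()  # number of "Passed" entries per name
    total = Counter()   # number of entries per name
    for entry in entries:
        total[entry["name"]] += 1
        if entry["Status"] == "Passed":
            passed[entry["name"]] += 1

    def row(name, n_passed, n_total):
        # Integer percentage, rounded down; 0 when there is nothing to count.
        pct = 100 * n_passed // n_total if n_total else 0
        return {
            "name": name,
            "Passed": str(n_passed),
            "Failed": str(n_total - n_passed),
            "Total": str(n_total),
            "%Pass": f"{pct}%",
        }

    # One row per name (in first-seen order), then the overall row.
    rows = [row(name, passed[name], total[name]) for name in total]
    rows.append(row("Total", sum(passed.values()), sum(total.values())))
    return rows

rows = summarize(entries)
for r in rows:
    print(r)
```

Everything that is not "Passed" is counted as failed here, which matches the sample data but is an assumption if other statuses can occur.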

Remove duplicate dictionary from a list on the basis of key value by priority

Suppose I have the following type of list of dictionaries:
iterlist = [
{"Name": "Mike", "Type": "Admin"},
{"Name": "Mike", "Type": "Writer"},
{"Name": "Mike", "Type": "Reader"},
{"Name": "Zeke", "Type": "Writer"},
{"Name": "Zeke", "Type": "Reader"}
]
I want to remove duplicates of "Name" on the basis of "Type" by the following priority (Admin > Writer > Reader), so the end result should be:
iterlist = [
{"Name": "Mike", "Type": "Admin"},
{"Name": "Zeke", "Type": "Writer"}
]
I found a similar question but it removes duplicates for one explicit type of key-value: Link
Can someone please guide me on how to move forward with this?
This is a modified form of the solution suggested by @azro. Their solution and the other one do not take into account the priority you mentioned; you can handle that with a priority dict, as in the following code.
iterlist = [
{"Name": "Mike", "Type": "Writer"},
{"Name": "Mike", "Type": "Reader"},
{"Name": "Mike", "Type": "Admin"},
{"Name": "Zeke", "Type": "Reader"},
{"Name": "Zeke", "Type": "Writer"}
]
from itertools import groupby
# this is used to get the priority (lower index = higher priority)
priorities = {i:idx for idx, i in enumerate(['Admin', 'Writer', 'Reader'])}
sort_key = lambda x:(x['Name'], priorities[x['Type']])
groupby_key = lambda x:x['Name']
# after sorting, the first item of each group is the highest-priority one
result = [next(i[1]) for i in groupby(sorted(iterlist, key=sort_key), key=groupby_key)]
print(result)
Output
[{'Name': 'Mike', 'Type': 'Admin'}, {'Name': 'Zeke', 'Type': 'Writer'}]
You can also use pandas in the following way:
transform the list of dictionary to data frame:
import pandas as pd
df = pd.DataFrame(iterlist)
create a mapping dict:
m = {'Admin': 3, 'Writer': 2, 'Reader': 1}
create a priority column using map:
df['pri'] = df['Type'].map(m)
sort_values by pri and groupby by Name and get the first element only:
df = df.sort_values('pri', ascending=False).groupby('Name').first().reset_index()
drop the pri column and return to dictionary using to_dict:
df.drop('pri', axis='columns').to_dict(orient='records')
This will give you the following:
[{'Name': 'Mike', 'Type': 'Admin'}, {'Name': 'Zeke', 'Type': 'Writer'}]
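Here is a solution you can try out. It keeps the first entry seen for each name, so it relies on the list already being ordered by priority within each name (as in the question's original list):
unique = {}
for v in iterlist:
    # check if key exists, if not update to `unique` dict
    if not unique.get(v['Name']):
        unique[v['Name']] = v
print(unique.values())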
dict_values([{'Name': 'Mike', 'Type': 'Admin'}, {'Name': 'Zeke', 'Type': 'Writer'}])
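If the input order cannot be trusted, the same single pass can compare priorities before overwriting. This is a minimal sketch under the assumption that every Type appears in the priority table; PRIORITY and dedupe_by_priority are names chosen for this example.

```python
# Lower number = higher priority (Admin > Writer > Reader).
PRIORITY = {"Admin": 0, "Writer": 1, "Reader": 2}

def dedupe_by_priority(records):
    best = {}  # Name -> best record seen so far
    for rec in records:
        current = best.get(rec["Name"])
        # Keep the record if the name is new or it outranks the stored one.
        if current is None or PRIORITY[rec["Type"]] < PRIORITY[current["Type"]]:
            best[rec["Name"]] = rec
    return list(best.values())

# Deliberately not sorted by priority.
shuffled = [
    {"Name": "Mike", "Type": "Writer"},
    {"Name": "Mike", "Type": "Reader"},
    {"Name": "Mike", "Type": "Admin"},
    {"Name": "Zeke", "Type": "Reader"},
    {"Name": "Zeke", "Type": "Writer"},
]
result = dedupe_by_priority(shuffled)
print(result)
```

This runs in one pass without sorting, and names come out in the order they were first seen.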

Python - How to create a JSON nested file from a Pandas dataframe and group by?

I'm having some trouble creating an appropriate JSON format from a pandas dataframe. My dataframe looks like this (sorry for the CSV format):
first_date, second_date, id, type, codename, description, price
201901,201902,05555,111,01111,1,200.00
201901,201902,05555,111,023111,44,120.00
201901,201902,05555,111,14113,23,84.00
As you can see, the first four columns have repeated values, so I would like to split my columns into two groups to get this JSON file:
[
{
"report":
{
"first_date":201901,
"second_date": 201902,
"id":05555,
"type": 111
},
"features": [
{
"codename":01111,
"description": 1,
"price":200.00
},
{
"codename":023111,
"description": 44,
"price":120.00
},
{
"codename":14113,
"description": 23,
"price":84.00
}
]
}
]
So far I've tried to group by the first four columns, collect the last three into a list of dictionaries, and name the result:
import json
cols = ["codename","description","price"]
rep = (df.groupby(["first_date","second_date","id","type"])[cols]
       .apply(lambda x: x.to_dict('records'))
       .reset_index(name="features")
       .to_json(orient="records"))
output = json.dumps(json.loads(rep),indent=4)
And I get this as the output:
[
{
"first_date":201901,
"second_date": 201902,
"id":05555,
"type": 111,
"features": [
{
"codename":01111,
"description": 1,
"price":200.00
},
{
"codename":023111,
"description": 44,
"price":120.00
},
{
"codename":14113,
"description": 23,
"price":84.00
}
]
}
]
Can anyone guide me on how to rename and group the first group of columns? Or does anyone know another approach to this problem? I would like to do it this way since I have to repeat the same procedure with more groups of columns and, from what I've found while searching, this seems simpler than building the JSON with several for loops.
Any advice sure will be helpful! I've been searching a lot but this is my first approach to this type of output. Thanks in advance!!!
see if this works for you :
#get rid of whitespaces if any
df.columns = df.columns.str.strip()
#split into two sections
fixed = df.columns[:4]
varying = df.columns[4:]
#create dicts for both fixed and varying
features = df[varying].to_dict('records')
report = df[fixed].drop_duplicates().to_dict('records')[0]
#combine into a dict into a list :
fin = [{"report":report,"features":features}]
print(fin)
[{'report': {'first_date': 201901,
'second_date': 201902,
'id': 5555,
'type': 111},
'features': [{'codename': 1111, 'description': 1, 'price': 200.0},
{'codename': 23111, 'description': 44, 'price': 120.0},
{'codename': 14113, 'description': 23, 'price': 84.0}]}]
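The snippet above assumes the whole dataframe belongs to a single report. When several distinct combinations of the fixed columns are present, one option is to loop over a groupby and build one report/features pair per group. This is a sketch only: the third row with a different report is invented for illustration, and id and codename are stored as strings here on the assumption that the leading zeros should survive.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "first_date": [201901, 201901, 201902],
    "second_date": [201902, 201902, 201903],
    "id": ["05555", "05555", "07777"],        # strings keep the leading zero
    "type": [111, 111, 222],
    "codename": ["01111", "023111", "14113"],
    "description": [1, 44, 23],
    "price": [200.0, 120.0, 84.0],
})

report_cols = ["first_date", "second_date", "id", "type"]
feature_cols = ["codename", "description", "price"]

nested = []
# sort=False keeps the groups in order of first appearance.
for _, group in df.groupby(report_cols, sort=False):
    nested.append({
        # The report values are identical within a group, so the first row suffices.
        "report": group[report_cols].head(1).to_dict("records")[0],
        "features": group[feature_cols].to_dict("records"),
    })

print(json.dumps(nested, indent=4))
```

The same loop works for more column groups: add another list of column names and another key in the dictionary.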

Splitting a string in json using python

I have a simple Json file
input.json
[
{
"title": "Person",
"type": "object",
"required": "firstName",
"min_max": "200/600"
},
{
"title": "Person1",
"type": "object2",
"required": "firstName1",
"min_max": "230/630"
},
{
"title": "Person2",
"type": "object2",
"required": "firstName2",
"min_max": "201/601"
},
{
"title": "Person3",
"type": "object3",
"required": "firstName3",
"min_max": "2000/6000"
},
{
"title": "Person4",
"type": "object4",
"required": "firstName4",
"min_max": "null"
},
{
"title": "Person4",
"type": "object4",
"required": "firstName4",
"min_max": "1024 / 256"
},
{
"title": "Person4",
"type": "object4",
"required": "firstName4",
"min_max": "0"
}
]
I am trying to create a new json file with new data. I would like to split "min_max" into two different fields ie., min and max. Below is the code written in python.
import json
input=open('input.json', 'r')
output=open('test.json', 'w')
json_decode=json.load(input)
result = []
for item in json_decode:
    my_dict={}
    my_dict['title']=item.get('title')
    my_dict['min']=item.get('min_max')
    my_dict['max']=item.get('min_max')
    result.append(my_dict)
data=json.dumps(result, output)
output.write(data)
output.close()
How do I split the string into two different values. Also, is there any possibility of printing the json output in order.
Your example JSON file seems to be written wrong. It is not a list; it is just a single associative array (a dictionary, in Python). Additionally, you are not using json.dumps properly: it returns a string and does not accept a file object (json.dump(obj, fp) is the variant that writes to a file). I also figured it would be easier to just create the dictionary inline. And you are not actually splitting min_max.
Here's the correct input:
[{
"title": "Person",
"type": "object",
"required": "firstName",
"min_max": "20/60"
}]
Here's your new code:
import json
with open('input.json', 'r') as inp, open('test.json', 'w') as outp:
    json_decode=json.load(inp)
    result = []
    for temp in json_decode:
        minMax = temp["min_max"].split("/")
        result.append({
            "title":temp["title"],
            "min":minMax[0],
            "max":minMax[1]
        })
    data=json.dumps(result)
    outp.write(data)
Table + Python == Pandas
import pandas as pd
# Read old json to a dataframe
df = pd.read_json("input.json")
# Create two new columns based on min_max
# Removes empty spaces with strip()
# Returns [None,None] if length of split is not equal to 2
df['min'], df['max'] = (zip(*df['min_max'].apply
(lambda x: [i.strip() for i in x.split("/")]
if len(x.split("/"))== 2 else [None,None])))
# 'delete' (drop) min_max column
df.drop('min_max', axis=1, inplace=True)
# output to json again
df.to_json("test.json",orient='records')
Result:
[{'max': '600',
'min': '200',
'required': 'firstName',
'title': 'Person',
'type': 'object'},
{'max': '630',
'min': '230',
'required': 'firstName1',
'title': 'Person1',
'type': 'object2'},
{'max': '601',
'min': '201',
'required': 'firstName2',
'title': 'Person2',
'type': 'object2'},
{'max': '6000',
'min': '2000',
'required': 'firstName3',
'title': 'Person3',
'type': 'object3'},
{'max': None,
'min': None,
...
You can do something like this (js holds the JSON text as a string):
import json
nl=[]
for di in json.loads(js):
    min_,sep,max_=map(lambda s: s.strip(), di['min_max'].partition('/'))
    if sep=='/':
        del di['min_max']
        di['min']=min_
        di['max']=max_
    nl.append(di)
print(json.dumps(nl))
This keeps the "min_max" values that cannot be separated into two values unchanged.
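To make that behaviour concrete, here is a small self-contained sketch of the same partition idea applied to three of the question's values. The function name split_min_max and the trimmed-down sample records are assumptions made for the example.

```python
import json

def split_min_max(records):
    result = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is not modified
        # partition always returns three parts; sep is "" when no "/" is found
        low, sep, high = (part.strip() for part in rec["min_max"].partition("/"))
        if sep:
            del rec["min_max"]
            rec["min"] = low
            rec["max"] = high
        result.append(rec)
    return result

sample = [
    {"title": "Person", "min_max": "200/600"},
    {"title": "Person4", "min_max": "1024 / 256"},
    {"title": "Person4", "min_max": "null"},
]
converted = split_min_max(sample)
print(json.dumps(converted, indent=2))
```

On the ordering question: on Python 3.7+ json.dumps writes keys in dictionary insertion order, so building each dictionary in the desired order is enough; passing sort_keys=True gives alphabetical order instead.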

Write JSON values from array and nested array to single CSV

I have a JSON output, where I want to create a csv file, that contains two columns. The first column should contain the userId and the second column should contain the value of videoSeries. The output looks like this:
{
"start": 1490383076,
"stop": 1492975076,
"events": [
{
"time": 1491294219,
"customParameters": [
{
"group": "channelId",
"item": "dr3"
},
{
"group": "videoGenre",
"item": "unknown"
},
{
"group": "videoSeries",
"item": "min-mor-er-pink"
},
{
"group": "videoSlug",
"item": "min-mor-er-pink"
}
],
"userId": "cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16"
}
]
}
My csv should look like this:
--------------------------------------------------------------
User ID videoSeries
--------------------------------------------------------------
cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16 min-mor-er-pink
--------------------------------------------------------------
I have tried using ijson and pandas to get the desired output, but I am unable to get values from two different arrays into a single csv
import ijson
import pandas as pd
with open('MY JSON FILE', 'r') as f:
objects = ijson.items(f, 'events.item')
pandaReadable = list(objects)
df = pd.DataFrame(pandaReadable, columns=['userId', 'customParameters'])
df.to_csv('C:/Users/.../Desktop/output.csv', columns=['userId', 'customParameters'], index=False)
Try this approach:
d is a dictionary built from your JSON:
In [150]: d
Out[150]:
{'events': [{'customParameters': [{'group': 'channelId', 'item': 'dr3'},
{'group': 'videoGenre', 'item': 'unknown'},
{'group': 'videoSeries', 'item': 'min-mor-er-pink'},
{'group': 'videoSlug', 'item': 'min-mor-er-pink'}],
'time': 1491294219,
'userId': 'cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16'}],
'start': 1490383076,
'stop': 1492975076}
Solution:
In [153]: pd.io.json.json_normalize(d['events'], 'customParameters', ['userId']) \
...: .query("group in ['videoSeries']")[['userId','item']]
...:
Out[153]:
userId item
2 cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16 min-mor-er-pink
if you need to have videoSeries as a column name:
In [154]: pd.io.json.json_normalize(d['events'], 'customParameters', ['userId']) \
...: .query("group in ['videoSeries']")[['userId','item']] \
...: .rename(columns={'item':'videoSeries'})
...:
Out[154]:
userId videoSeries
2 cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16 min-mor-er-pink
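The session above stops at the dataframe; the CSV step the question asks for could look like the sketch below. It assumes pandas 1.0 or later, where the function is available at the top level as pd.json_normalize, and it writes to an in-memory buffer instead of a real path so it can be run as is.

```python
import io
import pandas as pd

d = {
    "start": 1490383076,
    "stop": 1492975076,
    "events": [
        {
            "time": 1491294219,
            "customParameters": [
                {"group": "channelId", "item": "dr3"},
                {"group": "videoGenre", "item": "unknown"},
                {"group": "videoSeries", "item": "min-mor-er-pink"},
                {"group": "videoSlug", "item": "min-mor-er-pink"},
            ],
            "userId": "cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16",
        }
    ],
}

# One row per customParameters entry, with the parent userId repeated on each.
flat = pd.json_normalize(d["events"], record_path="customParameters", meta=["userId"])

# Keep only the videoSeries rows and name the value column accordingly.
series = (
    flat.loc[flat["group"] == "videoSeries", ["userId", "item"]]
    .rename(columns={"item": "videoSeries"})
)

buffer = io.StringIO()
series.to_csv(buffer, index=False)  # pass a file path here to write to disk
csv_text = buffer.getvalue()
print(csv_text)
```

An event without a videoSeries entry simply produces no row with this approach; if such users must still appear in the CSV, a left merge against the list of userIds would be needed.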
