pandas split list like object - python

Hi I have this column of data named labels:
[{'id': 123456,
'name': John,
'age': 22,
'pet': None,
'gender': male,
'result': [{'id': 'vEo0PIYPEE',
'type': 'choices',
'value': {'choices': ['Same Person']},
'to_name': 'image',
'from_name': 'person_evaluation'}]}]
[{'id': 123457,
'name': May,
'age': 21,
'pet': None,
'gender': female,
'result': [{'id': zTHYuKIOQ',
'type': 'choices',
'value': {'choices': ['Different Person']},
'to_name': 'image',
'from_name': 'person_evaluation'}]}]
......
Not sure what type is this, and I would like to break this down, to extract the value [Same Person], the outcome should be something like this:
0 [Same Person]
1 [Different Person]
....
How should I achieve this?

Based on the limited data that you have provided, would this work?
df['labels_new'] = df['labels'].apply(lambda x: x[0].get('result')[0].get('value').get('choices'))
labels labels_new
0 [{'id': 123456, 'name': 'John', 'age': 22, 'pe... [Same Person]
1 [{'id': 123457, 'name': 'May', 'age': 21, 'pet... [Different Person]
You can use the following as well, but I find dict.get() to be more versatile (returning default values for example) and has better exception handling.
df['labels'].apply(lambda x: x[0]['result'][0]['value']['choices'])
You could consider using pd.json_normalize (read more here) but for the current state of your column that you have, its going to be a bit complex to extract the data with that, rather than simply using a lambda function

Related

pandas create a one row dataframe from nested dict

I know the "create pandas dataframe from nested dict" has a lot of entries here but I'm not found the answer that applies to my problem:
I have a dict like this:
{'id': 1,
'creator_user_id': {'id': 12170254,
'name': 'Nicolas',
'email': 'some_mail#some_email_provider.com',
'has_pic': 0,
'pic_hash': None,
'active_flag': True,
'value': 12170254}....,
and after reading with pandas look like this:
df = pd.DataFrame.from_dict(my_dict,orient='index')
print(df)
id 1
creator_user_id {'id': 12170254, 'name': 'Nicolas', 'email': '...
user_id {'id': 12264469, 'name': 'Daniela Giraldo G', ...
person_id {'active_flag': True, 'name': 'Cristina Cardoz...
org_id {'name': 'Cristina Cardozo', 'people_count': 1...
stage_id 2
title Cristina Cardozo
I would like to create a one-row dataframe where, for example, the nested creator_user_id column results in several columns that I after can name: creator_user_id_id, creator_user_id_name, etc.
thank you for your time!
Given you want one row, just use json_normalize()
pd.json_normalize({'id': 1,
'creator_user_id': {'id': 12170254,
'name': 'Nicolas',
'email': 'some_mail#some_email_provider.com',
'has_pic': 0,
'pic_hash': None,
'active_flag': True,
'value': 12170254}})

MongoDB complex aggregation pipeline of Age fields

I have a MongoDB collection with documents that may have zero or more of the following fields: DOB (date of birth), YOB (year of birth) and Age. These may contain integers, floats or strings, and they may or may not be mappable to real values. For instance:
{'_id': id1, 'Age': 25},
{'_id': id2, 'Age': 'unknown', 'DOB': 'xxxx-xx-xx'},
{'_id': id3, 'Age': '8', 'DOB': '1988/01/05'},
{'_id': id4, 'YOB': '1995.0'},
{'_id': id5, 'DOB': '5/8/1886'},
{'_id': id6, 'Age': 17, 'YOB': 2003},
{'_id': id7},
...
For each document in the database, I need to extract a single standardized field, Age_Standardized, with the following criteria:
is an integer
is > 12
is < 99
I also need to implement an order of preference in the event multiple of the fields have a viable value, as DOB then YOB then Age.
So, for instance, if Age = 17 but DOB = '1900', Age_Standardized = 17 because, even though DOB exists (and is preferred to Age), it produces an Age_Standardized outside of the viable range (13-98).
If YOB = '1998.0' and Age = '19' then Age_Standardized = 23 because, even though Age exists and is viable, YOB is preferred and also viable.
I need to implement all of this over a large collection, and I was hoping to do it within a single PyMongo aggregation framework. In the examples above, the output would be:
{'_id': id1, 'Age': 25, 'Age_Standardized': 25},
{'_id': id2, 'Age': 'unknown', 'DOB': 'xxxx-xx-xx'},
{'_id': id3, 'Age': '8', 'DOB': '1988/01/05', 'Age_Standardized': 33},
{'_id': id4, 'YOB': '1995.0', 'Age_Standardized': 26},
{'_id': id5, 'DOB': '5/8/1886'},
{'_id': id6, 'Age': 17, 'YOB': 2003, 'Age_Standardized': 18},
{'_id': id7},

Converting data frame into a nested dictionary

Below is my subsetted data frame, I am having a hard time trying to convert it into my desired output as I am fairly new to Python. Essentially I want to create a nested dictionary inside a list, with the column names as a value, and then another nested dictionary inside a list. Is this doable?
import pandas as pd
Sector Community Name
0 centre: 10901.0 park: 3238.0
1 northeast: 6958.0 heights: 1955.0
Desired output:
[{'column': 'Sector',
'value': [{'name': 'centre', 'value': 10901.0},
{'name': 'northeast', 'value': 6958.0}]},
{'column': 'Community Name',
'value': [{'name': 'park', 'value': 3238.0},
{'name': 'heights', 'value': 1955.0},
{'name': 'hill', 'value': 1454.0}]}]
From #sushanth's answer, I may add up to this solution. Assume that your dataframe variable is defined as df.
result = []
for header in list(df):
column_values = df[header].to_list()
result.append({
"column" : header,
"value" : [dict(zip(['name', 'value'], str(value).split(":"))) for value in column_values]
})
Using pandas in above case might be a overkill, Here is a solution using python inbuilt functions which you can give a try,
input_ = {"Sector": ["centre: 10901.0", "northeast: 6958.0"],
"Community Name": ["park: 3238.0", "heights: 1955.0"]}
result = []
for k, v in input_.items():
result.append({
"column" : k,
"value" : [dict(zip(['name', 'value'], vv.split(":"))) for vv in v]
})
print(result)
[{'column': 'Sector',
'value': [{'name': 'centre', 'value': ' 10901.0'},
{'name': 'northeast', 'value': ' 6958.0'}]},
{'column': 'Community Name',
'value': [{'name': 'park', 'value': ' 3238.0'},
{'name': 'heights', 'value': ' 1955.0'}]}]

Pandas - Extracting values from a Dataframe column

I have a Dataframe in the below format:
cust_id, cust_details
101, [{'self': 'https://website.com/rest/api/2/customFieldOption/1', 'value': 'Type-A', 'id': '1'},
{'self': 'https://website.com/rest/api/2/customFieldOption/2', 'value': 'Type-B', 'id': '2'},
{'self': 'https://website.com/rest/api/2/customFieldOption/3', 'value': 'Type-C', 'id': '3'},
{'self': 'https://website.com/rest/api/2/customFieldOption/4', 'value': 'Type-D', 'id': '4'}]
102, [{'self': 'https://website.com/rest/api/2/customFieldOption/5', 'value': 'Type-X', 'id': '5'},
{'self': 'https://website.com/rest/api/2/customFieldOption/6', 'value': 'Type-Y', 'id': '6'}]
I am trying to extract for every cust_id all cust_detail values
Expected output:
cust_id, new_value
101,Type-A, Type-B, Type-C, Type-D
102,Type-X, Type-Y
Easy answer:
df['new_value'] = df.cust_details.apply(lambda ds: [d['value'] for d in ds])
More complex, potentially better answer:
Rather than storing lists of dictionaries in the first place, I'd recommend making each dictionary a row in the original dataframe.
df = pd.concat([
df['cust_id'],
pd.DataFrame(
df['cust_details'].explode().values.tolist(),
index=df['cust_details'].explode().index
)
], axis=1)
If you need to group values by id, you can do so via standard groupby methods:
df.groupby('cust_id')['value'].apply(list)
This may seem more complex, but depending on your use case might save you effort in the long-run.

Append Dates in Chronological Order

This is the JSON:
[{'can_occur_before': False,
'categories': [{'id': 8, 'name': 'Airdrop'}],
'coins': [{'id': 'cashaa', 'name': 'Cashaa', 'symbol': 'CAS'}],
'created_date': '2018-05-26T03:34:05+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'Unsold Token Distribution',
'twitter_account': None,
'vote_count': 125},
{'can_occur_before': False,
'categories': [{'id': 4, 'name': 'Exchange'}],
'coins': [{'id': 'tron', 'name': 'TRON', 'symbol': 'TRX'}],
'created_date': '2018-06-04T03:54:59+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'Indodax Listing',
'twitter_account': '#PutraDwiJuliyan',
'vote_count': 75},
{'can_occur_before': False,
'categories': [{'id': 5, 'name': 'Conference'}],
'coins': [{'id': 'modum', 'name': 'Modum', 'symbol': 'MOD'}],
'created_date': '2018-05-26T03:18:03+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'SAPPHIRE NOW',
'twitter_account': None,
'vote_count': 27},
{'can_occur_before': False,
'categories': [{'id': 4, 'name': 'Exchange'}],
'coins': [{'id': 'apr-coin', 'name': 'APR Coin', 'symbol': 'APR'}],
'created_date': '2018-05-29T17:45:16+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'TopBTC Listing',
'twitter_account': '#cryptoalarm',
'vote_count': 23}]
I want to take all the date_events and append them to a list in chronological order. I currently have this code and am not sure how to order them chronologically.
date = []
for i in getevents:
date.append(i['date_event'][:10])
Thanks for any help !
Simple way is to compose a list and then apply sort() method
data = json.load(open('filename.json','r'))
dates = [item['date_event'] for i in data]
dates.sort()
Using your example data with field 'creation_date' ('date_event' values are all the same) we'll get:
['2018-05-26T03:18:03+01:00',
'2018-05-26T03:34:05+01:00',
'2018-05-29T17:45:16+01:00',
'2018-06-04T03:54:59+01:00']
First of all, all the date_event in your array of objects are all the same, so not much sense in sorting them.. Also your approach will not get you far, you need to convert the dates to native date/time objects so that you can sort them through a sorting function.
The easiest way to parse properly formatted Date/Times is to use dateutil.parse.parser, and sorting an existing list is done by list.sort() - I made a quick example on how to use these tools, also i changed the date_event values to showcase it: https://repl.it/repls/BogusSpecificRate
After you have decoded the JSON string (json.loads) and have a Python list to work with, you can proceed with sorting the list:
# Ascending
events.sort(key=lambda e: parser.parse(e['date_event']))
print([":".join([e['title'], e['date_event']]) for e in events])
# Descending
events.sort(key=lambda e: parser.parse(e['date_event']), reverse=True)
print([":".join([e['title'], e['date_event']]) for e in events])

Categories