How to change dataframe into a specific json format? - python

I want to convert the below dataframe into the JSON format shown underneath:
Dataframe:
  Party name       y
0  Adam  ABC    2.00
1  Adam  DEF    5.00
2  John  GHI   29.01
3  John  FMI  219.77
Desired Output:
{"Adam": [{"name": "ABC", "y": 2.0}, {"name": "DEF", "y": 5}],
"John": [{"name": "GHI", "y": 29.01}, {"name": "FMI", "y": 219.77}]}
I tried setting the Party column as the index and using df.to_json(orient="index"), but it fails due to duplicates in the index (the Party column). Can someone please help?
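For reference, a minimal reproduction of that failure (a sketch, building the dataframe shown above):
import pandas as pd

df = pd.DataFrame({'Party': ['Adam', 'Adam', 'John', 'John'],
                   'name': ['ABC', 'DEF', 'GHI', 'FMI'],
                   'y': [2.0, 5.0, 29.01, 219.77]})

# Raises: ValueError: DataFrame index must be unique for orient='index'.
df.set_index('Party').to_json(orient="index")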

Use a custom lambda function in GroupBy.apply:
j = (df.groupby('Party')[['name', 'y']]
       .apply(lambda x: x.to_dict('records'))
       .to_json(orient="index"))
print(j)
{"Adam":[{"name":"ABC","y":2.0},{"name":"DEF","y":5.0}],
"John":[{"name":"GHI","y":29.01},{"name":"FMI","y":219.77}]}

Related

pandas df explode and implode to remove specific dict from the list

I have a pandas dataframe with multiple columns. One of the columns, request_headers, holds a list of dictionaries, for example:
[{"name": "name1", "value": "value1"}, {"name": "name2", "value": "value2"}]
I would like to remove only those elements from that list which contain a specific name. For example, with:
blacklist = "name2"
I should get the same dataframe, with all the columns including request_headers, but its value (based on the example above) should be:
[{"name": "name1", "value": "value1"}]
How can I achieve it? I tried to explode first, then filter, but was not able to "implode" correctly.
Thanks,
Exploding is expensive; rather, use a list comprehension:
blacklist = "name2"
df['request_headers'] = [[d for d in l if 'name' in d and d['name'] != blacklist]
                         for l in df['request_headers']]
Output:
request_headers
0 [{'name': 'name1', 'value': 'value1'}]
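For completeness, the explode-then-implode route the question asks about also works, just slower on large frames (a sketch, assuming every list element is a dict with a "name" key):
# Explode to one row per dict, drop blacklisted dicts, then rebuild the lists.
s = df['request_headers'].explode()
s = s[s.map(lambda d: d.get('name') != blacklist)]
df['request_headers'] = s.groupby(level=0).agg(list)  # rows that empty out become NaN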
You can also use an .apply function:
blacklist = 'name2'
df['request_headers'] = df['request_headers'].apply(lambda x: [d for d in x if blacklist not in d.values()])
Alternatively, load the list of dictionaries into its own DataFrame and build a boolean mask on the name column:
df1 = pd.DataFrame([{"name": "name1", "value": "value1"}, {"name": "name2", "value": "value2"}])
blacklist = "name2"
col1 = df1.name.eq(blacklist)
df1.loc[col1]
out:
    name   value
1  name2  value2
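To actually drop the blacklisted entries with this mask, invert it and convert back to records (a short follow-on sketch):
# ~col1 keeps every row except the blacklisted name
df1.loc[~col1].to_dict('records')
# [{'name': 'name1', 'value': 'value1'}]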

Is there a way to transform all unique values into a new dataframe using a loop and at the same time create additional columns? [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 8 months ago.
My problem is that I have a dataframe like this:
## for demonstration
import pandas as pd

example = {
    "ID": [1, 1, 2, 2, 2, 3],
    "place": ["Maryland", "Maryland", "Washington", "Washington", "Washington", "Los Angeles"],
    "type": ["condition", "symptom", "condition", "condition", "sky", "condition"],
    "name": ["depression", "cough", "fatigue", "depression", "blue", "fever"],
}

# load into a DataFrame:
example = pd.DataFrame(example)
print(example)
And I want to sort it by unique ID so that it will be reorganized like this:
# for demonstration
import pandas as pd

result = {
    "ID": [1, 2, 3],
    "place": ["Maryland", "Washington", "Los Angeles"],
    "condition": ["depression", "fatigue", "fever"],
    "condition1": ["no", "depression", "no"],
    "symptom": ["cough", "no", "no"],
    "sky": ["no", "blue", "no"],
}

# load into a DataFrame:
result = pd.DataFrame(result)
print(result)
I tried to sort it like:
example.nunique()

df_names = dict()
for k, v in example.groupby('ID'):
    df_names[k] = v
However, this gives me back a dictionary and it is not organized the way it should be.
Is there a way to do it with a loop, so that for every unique ID a new column is created if there is condition, sky or another type? If there are several values of the same type, the next one becomes condition1. Could you please help me if you know a way to realize it?
This should give you the answer you need. It is a combination of cumcount() and pivot:
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3],
    "place": ["Maryland", "Maryland", "Washington", "Washington", "Washington", "Los Angeles"],
    "type": ["condition", "symptom", "condition", "condition", "sky", "condition"],
    "name": ["depression", "cough", "fatigue", "depression", "blue", "fever"],
})

# Number repeated types within each group so the pivoted columns are unique.
df['type'] = df['type'].astype(str) + '_' + df.groupby(['place', 'type']).cumcount().astype(str)
# One row per (ID, place), one column per numbered type.
df = df.pivot(index=['ID', 'place'], columns='type', values='name').reset_index()
df = df.fillna('no')
# Drop the '_0' suffix from types that occur only once per group.
df.columns = df.columns.str.replace('_0', '')
df = df[['ID', 'place', 'condition', 'condition_1', 'symptom', 'sky']]
df
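Run on the example data, the result should look roughly like:
   ID        place   condition condition_1 symptom   sky
0   1     Maryland  depression          no   cough    no
1   2   Washington     fatigue  depression      no  blue
2   3  Los Angeles       fever          no      no    no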

(python) How to convert a dictionary value into a pandas DataFrame

My problem is as follows:
I have a txt-file that holds nothing but a dictionary with one single key. The value to that one, single key is a huge list containing dictionaries as list entries. First key:value pair for comparison:
"data": [{"type": "utl", "id": "53150", "attributes": {"timestamp": "T13:00:00Z", "count": 0.0}}, [...etc.]
I tried the following method to convert the value of the single-keyed dictionary into a list by calling the .values() method and then using list():
list_variable = list(dict_variable.values())
But it seems that this just converts the value into a list with just one index: when I try to call index 0 the file crashes (the list is too big), and if I try to call index 1 I get a KeyError stating that the index is out of range. (My current idea is to first convert it into a list and THEN into a DataFrame.)
I'm a bloody beginner and have no idea what else I could try. What am I missing?
Thanks a lot in advance for your helpful comments!
Looks like JSON to me. Try using pandas.json_normalize:
d = {"data": [{"type": "utl", "id": "53150", "attributes": {"timestamp": "T13:00:00Z", "count": 0.0}}]}
pd.json_normalize(d['data'])
type id attributes.timestamp attributes.count
0 utl 53150 T13:00:00Z 0.0
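If the file content is valid JSON (i.e. the object keeps its surrounding braces), a safer route than regex-plus-eval is json.load (a sketch, assuming a hypothetical file data.txt holding the complete {"data": [...]} object):
import json
import pandas as pd

with open("data.txt") as f:
    payload = json.load(f)  # parses the whole file as one JSON object

df = pd.json_normalize(payload["data"])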
Does the code below help you?
test.txt
"data": [{"type": "utl", "id": "53150", "attributes": {"timestamp": "T13:00:00Z", "count": 0.0}}, {"type": "utl2", "id": "53151", "attributes": {"timestamp": "T12:00:00Z", "count": 1.0}}]
from re import findall
from pandas.io.json import json_normalize  # plain pandas.json_normalize in newer pandas

# Note: eval on file contents is unsafe for untrusted input; see the
# json.load approach above for a safer alternative.
with open("test.txt") as f:
    print(json_normalize(eval(findall("{.+}", f.read())[0])))
Output:
type id attributes.timestamp attributes.count
0 utl 53150 T13:00:00Z 0.0
1 utl2 53151 T12:00:00Z 1.0

Loading JSON data into pandas data frame and creating custom columns

Here is the example JSON I'm working with.
{
  ":#computed_region_amqz_jbr4": "587",
  ":#computed_region_d3gw_znnf": "18",
  ":#computed_region_nmsq_hqvv": "55",
  ":#computed_region_r6rf_p9et": "36",
  ":#computed_region_rayf_jjgk": "295",
  "arrests": "1",
  "county_code": "44",
  "county_code_text": "44",
  "county_name": "Mifflin",
  "fips_county_code": "087",
  "fips_state_code": "42",
  "incident_count": "1",
  "lat_long": {
    "type": "Point",
    "coordinates": [
      -77.620031,
      40.612749
    ]
  }
}
I have been able to pull out the columns I want, except I'm having trouble with "lat_long". So far my code looks like:
# PRINTS OUT SPECIFIED COLUMNS
col_titles = ['county_name', 'incident_count', 'lat_long']
df = df.reindex(columns=col_titles)
However, 'lat_long' is added to the data frame as such: {'type': 'Point', 'coordinates': [-75.71107, 4...
I thought once I figured out how to properly add the coordinates to the data frame, I would then create two separate columns, one for latitude and one for longitude.
Any help with this matter would be appreciated. Thank you.
If I haven't misunderstood your requirements, then you can try this way with json_normalize. This demo covers a single JSON object; you can use apply or a lambda for multiple records.
import pandas as pd
from pandas.io.json import json_normalize  # plain pandas.json_normalize in newer pandas

data = {":#computed_region_amqz_jbr4": "587", ":#computed_region_d3gw_znnf": "18", ":#computed_region_nmsq_hqvv": "55", ":#computed_region_r6rf_p9et": "36", ":#computed_region_rayf_jjgk": "295", "arrests": "1", "county_code": "44", "county_code_text": "44", "county_name": "Mifflin", "fips_county_code": "087", "fips_state_code": "42", "incident_count": "1", "lat_long": {"type": "Point", "coordinates": [-77.620031, 40.612749]}}
df = json_normalize(data)
df_modified = df[['county_name', 'incident_count', 'lat_long.type']].copy()
# GeoJSON Point coordinates are ordered [longitude, latitude].
df_modified['lng'] = df['lat_long.coordinates'][0][0]
df_modified['lat'] = df['lat_long.coordinates'][0][1]
print(df_modified)
Here is how you can do it as well:
df1 = pd.io.json.json_normalize(data)  # the raw dict from the answer above
# Remember the GeoJSON [longitude, latitude] order when naming the columns.
pd.concat([df1, df1['lat_long.coordinates'].apply(pd.Series)
                    .rename(columns={0: 'long', 1: 'lat'})], axis=1) \
  .drop(columns=['lat_long.coordinates', 'lat_long.type'])
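A lighter alternative to apply(pd.Series), which can be slow on long columns, is to build the coordinate frame from a plain list (a sketch, reusing df1 from above):
coords = pd.DataFrame(df1['lat_long.coordinates'].tolist(),
                      columns=['long', 'lat'], index=df1.index)
out = pd.concat([df1.drop(columns=['lat_long.coordinates', 'lat_long.type']), coords], axis=1)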

extracting values by keywords in a pandas column

I have a column whose entries are lists of dictionaries. I extracted only the values for the name key and saved them to a list. Since I need to run the column through a TfidfVectorizer, I need each entry to be a single string of words. My code is as follows.
import json

def transform(s, to_extract):
    return [obj[to_extract] for obj in json.loads(s)]

cols = ['genres', 'keywords']
for col in cols:
    lst = df[col]
    df[col] = list(map(lambda x: transform(x, to_extract='name'), lst))
    df[col] = [', '.join(x) for x in df[col]]
For testing, here are 2 rows:
data = {'genres': [[{"id": 851, "name": "dual identity"}, {"id": 2038, "name": "love of one's life"}],
                   [{"id": 5983, "name": "pizza boy"}, {"id": 8828, "name": "marvel comic"}]],
        'keywords': [[{"id": 9663, "name": "sequel"}, {"id": 9715, "name": "superhero"}],
                     [{"id": 14991, "name": "tentacle"}, {"id": 34079, "name": "death", "id": 163074, "name": "super villain"}]]}
df = pd.DataFrame(data)
I'm able to extract the necessary data and save it accordingly. However, I find the code too verbose, and I would like to know if there's a more pythonic way to achieve the same outcome?
The desired output of one row should be a string, delimited only by a comma, e.g. 'dual identity,love of one's life'.
Is this what you need?
df.applymap(lambda x : pd.DataFrame(x).name.tolist())
Out[278]:
genres keywords
0 [dual identity, love of one's life] [sequel, superhero]
1 [pizza boy, marvel comic] [tentacle, super villain]
Update
df.applymap(lambda x : pd.DataFrame(x).name.str.cat(sep=','))
Out[280]:
genres keywords
0 dual identity,love of one's life sequel,superhero
1 pizza boy,marvel comic tentacle,super villain
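Note that DataFrame.applymap is deprecated in recent pandas (2.1+) in favour of DataFrame.map. A dependency-free equivalent that skips the intermediate DataFrames could look like this (a sketch, assuming each cell is a list of dicts with a "name" key):
df[['genres', 'keywords']] = df[['genres', 'keywords']].apply(
    lambda col: col.map(lambda lst: ','.join(d['name'] for d in lst)))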
