How do I convert pandas dataframe to nested JSON object? - python

I have an SQL database that I need to fetch and convert to JSON. I am thinking that the first step to do that is to fetch the data from the database and load it as a dataframe, then convert the dataframe into JSON object.
Let's say I have the following dataframe.
df_school = pd.DataFrame({'id':[1,2,3,4], 'school_code': ['ABC', 'IJK', 'QRS', 'XYZ'], 'name': ['School A','School B', 'School C', 'School D'], 'type':['private', 'public', 'public', 'private']})
print(df_school)
I want to convert it to JSON with the following code.
import collections
import json

object_list = []
for idx, row in df_school.iterrows():
    d = collections.OrderedDict()
    d['id'] = row['id']
    d['school_code'] = row['school_code']
    d['name'] = row['name']
    d['type'] = row['type']
    object_list.append(d)
j = json.dumps(object_list)
output_path = 'school_objects.js'
# write the JSON text to the file and close it afterwards
with open(output_path, 'w') as f:
    f.write(j)
print(j)
But the result is a string. It only looks like JSON: when I try to access an item inside it, for example j[0], it prints [ rather than the first record.
I also tried another approach, by converting the result from SQL directly to JSON.
query = "Select * from school;"
df_school = pd.read_sql_query(query, connection)
json_school = df_school.head(10).to_json(orient='records')
But I still got a string.
How do I convert it to a real JSON object?

Given the provided df_school variable, we can just do j=df_school.to_json(orient='records') to turn it into a JSON formatted string.
Once we have j storing the JSON formatted string, if we want to do something with it, we first have to load the JSON into Python again using json.loads(j).
So if we do:
import json
j = df_school.to_json(orient='records')
# parse j into Python
loaded_json = json.loads(j)
print(loaded_json[0])
# print outputs: {'id': 1, 'name': 'School A', 'school_code': 'ABC', 'type': 'private'}
Hope this helps!

import pandas as pd
import json
df_school = pd.DataFrame({'id':[1,2,3,4], 'school_code': ['ABC', 'IJK', 'QRS', 'XYZ'], 'name': ['School A','School B', 'School C', 'School D'], 'type':['private', 'public', 'public', 'private']})
str_school = df_school.to_json(orient='records')
json_school = json.loads(str_school)
json_school[0]
{'id': 1, 'school_code': 'ABC', 'name': 'School A', 'type': 'private'}

JSON is a string encoding of objects.
Once you use json.dumps() or similar, you'll get a string.
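To make that concrete, here is a minimal sketch of the round trip using only the standard library. The two-record list is made-up sample data, not the asker's table.

```python
import json

# Made-up sample records (an assumption for illustration).
records = [{"id": 1, "name": "School A"}, {"id": 2, "name": "School B"}]

encoded = json.dumps(records)    # serialise: Python objects -> JSON text
print(type(encoded).__name__)    # the result is a str
print(encoded[0])                # indexing a str yields a character

decoded = json.loads(encoded)    # parse: JSON text -> Python objects
print(type(decoded).__name__)    # the result is a list of dicts
print(decoded[0]["name"])
```

So "a real JSON object" in Python terms is just the list or dict you get back from json.loads(); the string is what you send over the wire or write to a file.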

Try the code below; I hope it helps:
data = [{columns:df_school.iloc[i][columns] for columns in list(df_school.columns) } for i in range(df_school.shape[0]) ]
print(data)
print("***********************")
print(type(data[0]))
Output will be:
[{'id': 1, 'school_code': 'ABC', 'name': 'School A', 'type': 'private'},
{'id': 2, 'school_code': 'IJK', 'name': 'School B', 'type': 'public'},
{'id': 3, 'school_code': 'QRS', 'name': 'School C', 'type': 'public'},
{'id': 4, 'school_code': 'XYZ', 'name': 'School D', 'type': 'private'}]
*************************
<class 'dict'>

data={k:list(v.values()) for k,v in df_school.to_dict().items()}
{
'id': [1, 2, 3, 4],
'school_code': ['ABC', 'IJK', 'QRS', 'XYZ'],
'name': ['School A', 'School B', 'School C', 'School D'],
'type': ['private', 'public', 'public', 'private']
}
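If the goal is Python objects rather than JSON text, DataFrame.to_dict() may already be enough, with no JSON round trip. A minimal sketch, assuming pandas is installed and using a shortened two-row frame in place of the full df_school:

```python
import pandas as pd

# Shortened stand-in for df_school (an assumption for illustration).
df_school = pd.DataFrame({
    "id": [1, 2],
    "school_code": ["ABC", "IJK"],
})

records = df_school.to_dict(orient="records")  # one dict per row
columns = df_school.to_dict(orient="list")     # one list per column

print(records)
print(columns)
```

orient="records" gives the same shape as the json.loads() answers above, and orient="list" gives the column-wise shape of the last snippet.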

Related

Python parsing json from string

I need to parse json from a partial string I get back from a web service. I have the following snippet of code which is working fine but is extremely ugly. Is there a better or cleaner way to do this?
x = '"1":{"name":"item one","code":"1"},"2":{"name":"item two","code":"2"},"3":{"name":"item three","code":"3"}'
split = x.split('},')
index = 0
for s in split:
    split[index] = '{' + s + '}}'
    index += 1
joined = ','.join(split)
joined = '[' + joined[:-1] + ']'
j = json.loads(joined)
print(j)
Here is the result:
[{'1': {'name': 'item one', 'code': '1'}},
{'2': {'name': 'item two', 'code': '2'}},
{'3': {'name': 'item three', 'code': '3'}}]
You can use the following snippet:
>>> [dict([t]) for t in json.loads(f"{{{x}}}").items()]
[{'1': {'name': 'item one', 'code': '1'}},
{'2': {'name': 'item two', 'code': '2'}},
{'3': {'name': 'item three', 'code': '3'}}]
You can fix the inconsistency by hand (add the missing braces) and use json module to parse:
data = json.loads('{' + x + '}')
Then you can convert the parsed data to the desired representation:
[{item[0]: item[1]} for item in data.items()]
#[{'1': {'name': 'item one', 'code': '1'}},
# {'2': {'name': 'item two', 'code': '2'}},
# {'3': {'name': 'item three', 'code': '3'}}]
Otherwise, you will end up implementing your own JSON parser, which is not trivial.

Force a string to be interpreted as a list?

I've got a data object that was created by creating what I assume is a json object -
jsonobj = {}
jsonobj["recordnum"] = callpri
and then pushing that onto a list, as there is more than one of them -
myList = []
myList.append(jsonobj)
Then that gets passed back and forth between flask subroutines and Jinja2 templates, until it ends up a few steps later coming into the function where I need to access that data, and it comes in looking something like so -
techlist: [{'recordnum': '1', 'name': 'Person 1', 'phonenumber': '123-456-7890', 'email': 'person1#company.tld', 'maxnumtechs': 'ALL'}, {'recordnum': '2', 'name': 'Person 2', 'phonenumber': '098-765-4321', 'email': 'person2#company.tld', 'maxnumtechs': 'ALL'}, {'recordnum': '3', 'name': 'Person 3', 'phonenumber': '567-890-1234', 'email': 'person3#company.tld', 'maxnumtechs': 'ALL'}]
I tried a for tech in techlist: print(tech['recordnum']) type deal and got an error, so I started printing types for everything and it's all strings. The for tech in techlist is I think just splitting it all into words even, which is obviously not what I want at all.
I tried messing around with json.loads on techlist, but it complained about expecting an entry in double quotes or something along those lines. I'm totally stumped, and would really appreciate if someone could please tell me how to turn this string back into a list of dicts or a list of json objects, or whatever it takes for me to be able to iterate through the items and access specific fields.
Response to comments about it working right:
It's coming in as a string for me, and I think for the two of you it is working for, you're creating it as a list, so it would work correctly ... sadly, that's my problem, it's a string, not a list, so it's doing this -
(env) [me#box directory]$ cat test.py
techlist = "[{'recordnum': '1', 'name': 'Person 1', 'phonenumber': '123-456-7890', 'email': 'person1#company.tld', 'maxnumtechs': 'ALL'}, {'recordnum': '2', 'name': 'Person 2', 'phonenumber': '098-765-4321', 'email': 'person2#company.tld', 'maxnumtechs': 'ALL'}, {'recordnum': '3', 'name': 'Person 3', 'phonenumber': '567-890-1234', 'email': 'person3#company.tld', 'maxnumtechs': 'ALL'}]"
print(type(techlist))
for tech in techlist:
    print(type(tech))
    print(str(tech))
(env) [me#box directory]$
(env) [me#box directory]$
(env) [me#box directory]$ python test.py
<class 'str'>
<class 'str'>
[
<class 'str'>
{
<class 'str'>
'
<class 'str'>
r
<class 'str'>
e
<snip>
Update:
Trenton McKinney 's comment worked PERFECTLY, THANK YOU!! If you're so inclined as to post it as a answer I'll accept it as the solution. Thank you thank you thank you!!
Convert the string back to dict:
Use ast.literal_eval to evaluate the string
from ast import literal_eval
techlist = """[{'recordnum': '1', 'name': 'Person 1', 'phonenumber': '123-456-7890', 'email': 'person1#company.tld', 'maxnumtechs': 'ALL'},
{'recordnum': '2', 'name': 'Person 2', 'phonenumber': '098-765-4321', 'email': 'person2#company.tld', 'maxnumtechs': 'ALL'},
{'recordnum': '3', 'name': 'Person 3', 'phonenumber': '567-890-1234', 'email': 'person3#company.tld', 'maxnumtechs': 'ALL'}]"""
print(type(techlist))
>>> <class 'str'>
techlist = literal_eval(techlist)
print(type(techlist))
>>> <class 'list'>
print(techlist)
# output
[{'email': 'person1#company.tld',
'maxnumtechs': 'ALL',
'name': 'Person 1',
'phonenumber': '123-456-7890',
'recordnum': '1'},
{'email': 'person2#company.tld',
'maxnumtechs': 'ALL',
'name': 'Person 2',
'phonenumber': '098-765-4321',
'recordnum': '2'},
{'email': 'person3#company.tld',
'maxnumtechs': 'ALL',
'name': 'Person 3',
'phonenumber': '567-890-1234',
'recordnum': '3'}]
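For anyone wondering why json.loads complained: the string is a Python repr with single quotes, and JSON only accepts double-quoted strings. A small sketch of the difference, using a shortened two-record string as sample data (an assumption, not the full techlist):

```python
import json
from ast import literal_eval

# Shortened sample in Python-repr form, i.e. single quotes (illustrative only).
text = "[{'recordnum': '1', 'name': 'Person 1'}, {'recordnum': '2', 'name': 'Person 2'}]"

try:
    json.loads(text)
    parsed_as_json = True
except json.JSONDecodeError:
    parsed_as_json = False       # JSON requires double-quoted strings

techs = literal_eval(text)       # evaluates Python literals only, no arbitrary code
recordnums = [tech["recordnum"] for tech in techs]

as_json = json.dumps(techs)      # real JSON text, safe to pass on and json.loads later
print(parsed_as_json, recordnums)
```

If you control the code that produces the string, passing json.dumps(myList) instead of str(myList) would probably avoid the problem at the source.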
The top answers deal with the string itself; option 3 below uses pandas to work with the list of dictionaries.
Option 1
from csv import reader
import pandas as pd
data=[str]
df=pd.DataFrame( list(reader(data)))
print(df)
results = df[col].to_list()
Option 2
Otherwise, just split the string on a delimiter:
result = str.split(',')
Option 3 (I am pretty sure this what you want):
df = pd.DataFrame(techlist)
results = df['recordnum'].to_list()
Hopefully one of these answers is good enough, because your question is confusing.

Python sorting str price with two decimal points

My goal: Sort a list of Products (dict) first by Price, then by Name.
My problem: Str values with numbers in them aren't sorted properly (AKA "Human sorting" or "Natural Sorting").
I found this function from a similar question:
Python sorting list of dictionaries by multiple keys
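def multikeysort(items, columns):
    from functools import cmp_to_key
    from operator import itemgetter
    # Python 3 removed the built-in cmp(); this helper reproduces it.
    def cmp(a, b):
        return (a > b) - (a < b)
    # A leading '-' on a column name means descending order.
    comparers = [((itemgetter(col[1:].strip()), -1) if col.startswith('-') else
                  (itemgetter(col.strip()), 1)) for col in columns]
    def comparer(left, right):
        for fn, mult in comparers:
            result = cmp(fn(left), fn(right))
            if result:
                return mult * result
        return 0
    # sorted() no longer accepts cmp=, so wrap the comparer with cmp_to_key.
    return sorted(items, key=cmp_to_key(comparer))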
The problem is that my Prices are str type, like this:
products = [
{'name': 'Product 200', 'price': '3000.00'},
{'name': 'Product 4', 'price': '100.10'},
{'name': 'Product 15', 'price': '20.00'},
{'name': 'Product 1', 'price': '5.05'},
{'name': 'Product 2', 'price': '4.99'},
]
So they're getting sorted alphabetically, like this:
'100.10'
'20.00'
'3000.00'
'4.99'
'5.05'
Similarly, when I sort by name, I get this:
'Product 1'
'Product 15'
'Product 2'
'Product 200'
'Product 4'
The names should be listed in "human" order (1,2,15 instead of 1,15,2). Is it possible to fix this? I'm pretty new to python, so maybe I'm missing something vital. Thanks.
EDIT
More Info: I'm sending the list of products to a Django template, which requires the numbers to be properly formatted. If I float the prices and then un-float them, I have to iterate through the list of products twice, which seems like overkill.
Your sort function is overkill. Try this simple approach:
from pprint import pprint
products = [
{'name': 'Product 200', 'price': '3000.00'},
{'name': 'Product 4', 'price': '100.10'},
{'name': 'Product 15', 'price': '20.00'},
{'name': 'Product 1', 'price': '5.05'},
{'name': 'Product 2', 'price': '4.99'},
]
sorted_products = sorted(products, key=lambda x: (float(x['price']), x['name']))
pprint(sorted_products)
Result:
[{'name': 'Product 2', 'price': '4.99'},
{'name': 'Product 1', 'price': '5.05'},
{'name': 'Product 15', 'price': '20.00'},
{'name': 'Product 4', 'price': '100.10'},
{'name': 'Product 200', 'price': '3000.00'}]
The essence of my solution is to have the key function return a tuple of the sort conditions. Tuples always compare lexicographically, so the first item is the primary sort, the second is the secondary sort, and so on.
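A short sketch of that tuple behaviour, with made-up products chosen so that two prices tie. Note that the price strings in the dicts are never modified, so they stay formatted for the template; float() is only used inside the key.

```python
# Made-up sample data with a price tie (an assumption for illustration).
products = [
    {"name": "B", "price": "20.00"},
    {"name": "A", "price": "20.00"},
    {"name": "C", "price": "5.05"},
]

# Tuples compare element by element: price decides, name breaks ties.
assert (5.05, "C") < (20.0, "A") < (20.0, "B")

by_price_then_name = sorted(products, key=lambda p: (float(p["price"]), p["name"]))
names = [p["name"] for p in by_price_then_name]

# Negating a numeric component reverses only that part of the ordering.
by_price_desc = sorted(products, key=lambda p: (-float(p["price"]), p["name"]))
names_desc = [p["name"] for p in by_price_desc]

print(names, names_desc)
```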
I think your best bet is to parse the prices as floats (so you can sort them):
float("1.00")
# output: 1.0
Then output them with two decimal places:
"{:.2f}".format(1.0)
# output: "1.00"
Try casting the prices to floats for sorting; when you need to print two decimal places, you can format the output like so:
float_num = float("110.10")
print("{0:.2f}".format(float_num))  # prints 110.10
To break ties on equal prices using the integer at the end of the product name, the key function can return a tuple:
products = [
{'name': 'Product 200', 'price': '2.99'},
{'name': 'Product 4', 'price': '4.99'},
{'name': 'Product 15', 'price': '4.99'},
{'name': 'Product 1', 'price': '9.99'},
{'name': 'Product 2', 'price': '4.99'},
]
def key(x):
    p, i = x["name"].rsplit(None, 1)
    return float(x["price"]), p, int(i)
sorted_products = sorted(products, key=key)
Which would give you:
[{'name': 'Product 200', 'price': '2.99'},
{'name': 'Product 2', 'price': '4.99'},
{'name': 'Product 4', 'price': '4.99'},
{'name': 'Product 15', 'price': '4.99'},
{'name': 'Product 1', 'price': '9.99'}]
As opposed to:
[{'name': 'Product 200', 'price': '2.99'},
{'name': 'Product 15', 'price': '4.99'},
{'name': 'Product 2', 'price': '4.99'},
{'name': 'Product 4', 'price': '4.99'},
{'name': 'Product 1', 'price': '9.99'}]
using just float(x['price']), x['name']
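The rsplit approach assumes every name ends in an integer. If that is not guaranteed, a more general natural-sort key might look like the sketch below; natural_key is a hypothetical helper name, not something from the standard library.

```python
import re

def natural_key(text):
    """Split text into alternating str/int chunks so 'Product 15' sorts after 'Product 2'."""
    # re.split with a capture group keeps the digit runs; odd positions are always digits.
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", text)]

names = ["Product 1", "Product 15", "Product 2", "Product 200", "Product 4"]
natural_names = sorted(names, key=natural_key)

products = [
    {"name": "Product 200", "price": "2.99"},
    {"name": "Product 4", "price": "4.99"},
    {"name": "Product 15", "price": "4.99"},
    {"name": "Product 1", "price": "9.99"},
    {"name": "Product 2", "price": "4.99"},
]
# Price first, then the name in "human" order.
sorted_products = sorted(products, key=lambda p: (float(p["price"]), natural_key(p["name"])))
sorted_names = [p["name"] for p in sorted_products]

print(natural_names)
print(sorted_names)
```

Because the text and number chunks always alternate in the same positions, the keys should never end up comparing an int with a str.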

Add an item with common key to a dictionary in python

I have this:
items = {{'project':'Project 1','description':'Task description','time':1222222},
{'project':'Project 2','description':'Task description 2','time':1224322},
{'project':'Project 1','description':'Task description 3','time':13222152}}
And I need something like this:
resultitems = {
'project':'Project 1','pritems':{
{'description':'Task description','time':1222222},
{'description':'Task description 3','time':13222152}},
'project':'Project 2','pritems':{
{'description':'Task description 2','time':1224322}},
}
or simply the name of each project as a key
I've tried this approach:
resultitems = {}
resultitems['Project 2'] = {}
resultitems['Project 2'].update(..)
update() does not work, since it replaces the previous value.
In PHP, it was easy:
$resultitems['Project 2'][] = array(...)
but I can't find a way to do this in Python.
result_items = {
'house project': [{'task': 'cleaning', 'hours': 20}, {'task': 'painting', 'hours': 30}, etc.],
'garden project': [{'task': 'mowing the lawn', 'hours': 1, etc.
etc.
}
Your variable 'items' is not correct. If it is a list of dictionaries, it should be:
items = [{...}, {...}, {...}]
Please state where the data comes from, since that determines how you would fill in the desired dictionary. If you already have the data as in 'items' (i.e. a list of dictionaries), then here is how to convert it:
items = [{'project':'Project 1','description':'Task description','time':1222222},
{'project':'Project 2','description':'Task description 2','time':1224322},
{'project':'Project 1','description':'Task description 3','time':13222152}]
dct = {}
for e in items:
    # create the list the first time a project is seen
    if e['project'] not in dct:
        dct[e['project']] = []
    dct[e['project']].append(dict([(k, v) for k, v in e.items() if k != 'project']))
print(dct)
and output is:
{'Project 2': [{'description': 'Task description 2', 'time': 1224322}], 'Project 1': [{'description': 'Task description', 'time': 1222222}, {'description': 'Task description 3', 'time': 13222152}]}
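The closest Python equivalent of PHP's $arr[key][] = value is probably collections.defaultdict(list), or dict.setdefault on a plain dict. A minimal sketch using the same items as above:

```python
from collections import defaultdict

items = [
    {"project": "Project 1", "description": "Task description", "time": 1222222},
    {"project": "Project 2", "description": "Task description 2", "time": 1224322},
    {"project": "Project 1", "description": "Task description 3", "time": 13222152},
]

by_project = defaultdict(list)   # a missing key starts out as an empty list
for item in items:
    task = {k: v for k, v in item.items() if k != "project"}
    by_project[item["project"]].append(task)

# Same idea with a plain dict: setdefault returns the existing list or stores a new one.
descriptions = {}
for item in items:
    descriptions.setdefault(item["project"], []).append(item["description"])

print(dict(by_project))
print(descriptions)
```

Either form avoids the explicit "is the key there yet?" check.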
Finally, I used this:
newdata = {}
for data in result['data']:
    try:
        newdata[data['project']].append({"description":data['description'],"start":data['start'],"time":data['dur']})
    except KeyError:
        # first entry for this project: create its list, then append
        newdata[data['project']] = []
        newdata[data['project']].append({"description":data['description'],"start":data['start'],"time":data['dur']})
print(newdata)
And the result has been like this, and this is what I needed:
{
u'Project 1': [
{'start': u'2015-07-09T18:09:41-03:00', 'description': u'Task 1 name', 'time': 1432000},
{'start': u'2015-07-09T17:42:36-03:00', 'description': u'Task 2 name', 'time': 618000}
],
u'Project 2': [
{'start': u'2015-07-09T20:14:16-03:00', 'description': u'Other Task Name', 'time': 4424000}
],
u'Project 3': [
{'start': u'2015-07-09T22:29:51-03:00', 'description': u'another task name for pr3', 'time': 3697000},
{'start': u'2015-07-09T19:38:02-03:00', 'description': u'something more to do', 'time': 59000},
{'start': u'2015-07-09T19:11:49-03:00', 'description': u'Base tests', 'time': 0},
{'start': u'2015-07-09T19:11:29-03:00', 'description': u'Domain', 'time': 0}
],
u'Project something': [
{'start': u'2015-07-09T19:39:30-03:00', 'description': u'Study more', 'time': 2069000},
{'start': u'2015-07-09T15:46:39-03:00', 'description': u'Study more (2)', 'time': 3800000},
{'start': u'2015-07-09T11:46:00-03:00', 'description': u'check forms', 'time': 660000}
]
}
By the way, I was not asking about the structure itself; what I needed was some way to build a "something like this" structure in code.

Merging similar dictionaries in a list together

New to python here. I've been pulling my hair for hours and still can't figure this out.
I have a list of dictionaries:
[ {'FX0XST001.MID5': '195', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'},
{'FX0XST001.MID13': '4929', 'Name': 'Firmicutes', 'Taxonomy ID': '1239','Type': 'phylum'},
{'FX0XST001.MID6': '826', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'},
.
.
.
.
{'FX0XST001.MID6': '125', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'},
{'FX0XST001.MID25': '70', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'},
{'FX0XST001.MID40': '40', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'} ]
I want to merge the dictionaries in the list based on their Type, Name, and Taxonomy ID
[ {'FX0XST001.MID5': '195', 'FX0XST001.MID13': '4929', 'FX0XST001.MID6': '826', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'}
.
.
.
.
{'FX0XST001.MID6': '125', 'FX0XST001.MID25': '70', 'FX0XST001.MID40': '40', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'}]
I have the data structure setup like this because I need to write the data to CSV using csv.DictWriter later.
Would anyone kindly point me to the right direction?
You can use the groupby function for this:
http://docs.python.org/library/itertools.html#itertools.groupby
from itertools import groupby
keyfunc = lambda row: (row['Type'], row['Taxonomy ID'], row['Name'])
result = []
# groupby only groups adjacent rows, so sort by the same key first
data = sorted(data, key=keyfunc)
for k, g in groupby(data, keyfunc):
    # g is a one-shot iterator, so use only ONE of the two options below.
    # Option 1: merge the matching rows into one item so you end up with what you wanted
    item = {}
    for row in g:
        item.update(row)
    result.append(item)
    # Option 2: add the matched rows as subitems to a parent dictionary,
    # which might come in handy if you need to work with just the parts that are
    # different
    # item = {'Type': k[0], 'Taxonomy ID': k[1], 'Name': k[2], 'matches': []}
    # for row in g:
    #     del row['Type']
    #     del row['Taxonomy ID']
    #     del row['Name']
    #     item['matches'].append(row)
    # result.append(item)
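The sorted() call before groupby matters because groupby only merges rows that sit next to each other. A tiny sketch with made-up rows shows what happens without it:

```python
from itertools import groupby

# Made-up rows where the two "A" entries are not adjacent (illustrative only).
rows = [{"Name": "A", "v": 1}, {"Name": "B", "v": 2}, {"Name": "A", "v": 3}]

def name_of(row):
    return row["Name"]

# Without sorting, "A" shows up as two separate groups.
unsorted_keys = [k for k, _ in groupby(rows, name_of)]

# After sorting by the same key, each name forms exactly one group.
sorted_keys = [k for k, _ in groupby(sorted(rows, key=name_of), name_of)]

print(unsorted_keys, sorted_keys)
```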
Make some test data:
list_of_dicts = [
{"Taxonomy ID":1, "Name":"Bob", "Type":"M", "hair":"brown", "eyes":"green"},
{"Taxonomy ID":1, "Name":"Bob", "Type":"M", "height":"6'2''", "weight":200},
{"Taxonomy ID":2, "Name":"Alice", "Type":"F", "hair":"black", "eyes":"hazel"},
{"Taxonomy ID":2, "Name":"Alice", "Type":"F", "height":"5'7''", "weight":145}
]
I think this (below) is a neat trick using reduce that improves upon the other groupby solution.
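import itertools
from functools import reduce  # reduce is no longer a builtin in Python 3
def key_func(elem):
    return (elem["Taxonomy ID"], elem["Name"], elem["Type"])
# groupby only merges adjacent rows, so the input must already be ordered by key_func
output_list_of_dicts = [reduce((lambda x, y: x.update(y) or x), list(val)) for key, val in itertools.groupby(list_of_dicts, key_func)]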
Then print the output:
for elem in output_list_of_dicts:
    print(elem)
This prints:
{'eyes': 'green', 'Name': 'Bob', 'weight': 200, 'Taxonomy ID': 1, 'hair': 'brown', 'height': "6'2''", 'Type': 'M'}
{'eyes': 'hazel', 'Name': 'Alice', 'weight': 145, 'Taxonomy ID': 2, 'hair': 'black', 'height': "5'7''", 'Type': 'F'}
FYI, Python Pandas is far better for this sort of aggregation, especially when dealing with file I/O to .csv or .h5 files, than the itertools stuff.
Perhaps the easiest thing to do would be to create a new dictionary, indexed by a (Type, Name, Taxonomy ID) tuple, and iterate over your dictionary, storing values by (Type, Name, Taxonomy ID). Use a default dict to make this easier. For example:
from collections import defaultdict
grouped = defaultdict(dict)
# iterate over items and store:
for entry in list_of_dictionaries:
    grouped[(entry["Type"], entry["Name"], entry["Taxonomy ID"])].update(entry)
# now you have everything stored the way you want in values, and you don't
# need the dict anymore
grouped_entries = grouped.values()
This is a bit hackish, especially because you end up overwriting "Type", "Name", and "Taxonomy ID" every time you use update, but since your dict keys are variable, that might be the best you can do. This will get you at least close to what you need.
Even better would be to do this on your initial import and skip intermediate steps (unless you actually need to transform the data beforehand). Plus, if you could get at the only varying field, you could change the update to just: grouped[(type, name, taxonomy_id)][key] = value where key and value are something like: 'FX0XST001.MID5', '195'
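Since the question mentions csv.DictWriter, here is a rough sketch of the whole path from merging to CSV. It assumes a three-row subset of the question's data, writes to an in-memory buffer instead of a real file, and leaves missing sample columns blank; all three are choices made for the example, not requirements.

```python
import csv
import io
from collections import defaultdict

ID_FIELDS = ("Type", "Name", "Taxonomy ID")   # grouping columns from the question

# Three rows taken from the question's data (a subset, for illustration).
rows = [
    {"FX0XST001.MID5": "195", "Name": "Firmicutes", "Taxonomy ID": "1239", "Type": "phylum"},
    {"FX0XST001.MID13": "4929", "Name": "Firmicutes", "Taxonomy ID": "1239", "Type": "phylum"},
    {"FX0XST001.MID6": "125", "Name": "Acidobacteria", "Taxonomy ID": "57723", "Type": "phylum"},
]

grouped = defaultdict(dict)
for row in rows:
    key = tuple(row[field] for field in ID_FIELDS)
    grouped[key].update(row)
merged = list(grouped.values())

# Each merged row has a different set of sample columns, so collect them all.
sample_fields = sorted({k for row in merged for k in row if k not in ID_FIELDS})

buffer = io.StringIO()   # stands in for open(path, "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=list(ID_FIELDS) + sample_fields, restval="")
writer.writeheader()
writer.writerows(merged)
csv_text = buffer.getvalue()
print(csv_text)
```

DictWriter needs the full list of field names up front, which is why the sample columns are gathered in a separate pass; restval controls what is written for columns a row does not have.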
from itertools import groupby
data = [ {'FX0XST001.MID5': '195', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type':'phylum'},
{'FX0XST001.MID13': '4929', 'Name': 'Firmicutes', 'Taxonomy ID': '1239','Type': 'phylum'},
{'FX0XST001.MID6': '826', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'},
{'FX0XST001.MID6': '125', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'},
{'FX0XST001.MID25': '70', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'},
{'FX0XST001.MID40': '40', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'} ,]
kk = ('Name', 'Taxonomy ID', 'Type')
def key(item): return tuple(item[k] for k in kk)
result = []
data = sorted(data, key=key)
for k, g in groupby(data, key):
    # merge every dict in the group into a single dict
    result.append(dict((i, j) for d in g for i, j in d.items()))
print(result)
