JSON format to DataFrame in Python

I am reading API data from a cloud server in JSON format, as shown here.
How do I convert this JSON to a DataFrame, and how do I then store that DataFrame in a database?
The required output format is shown in table 2.

Here is an example:
import requests
import pandas as pd
response = requests.get("http://api.open-notify.org/astros.json")  # API source
data = response.json()["people"]  # pick the 'people' list from the JSON
df = pd.DataFrame(data)  # convert to a pandas DataFrame
print(df)
Let me know if it works.
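The conversion step can be tried offline. This is a minimal sketch that assumes the payload has a top-level `people` key holding a list of objects. The names and values below are placeholders, not real API output. With a live `requests` response, `response.json()["people"]` returns the same kind of list.

```python
import json
import pandas as pd

# Hypothetical payload shaped like the astros.json response; placeholder values only.
payload = '''
{"message": "success",
 "number": 2,
 "people": [{"name": "Alice", "craft": "ISS"},
            {"name": "Bob", "craft": "ISS"}]}
'''

data = json.loads(payload)["people"]  # a list of dicts, one per person
df = pd.DataFrame(data)               # each dict becomes a row; its keys become columns
print(df)
```

A list of flat dicts is the simplest input for `pd.DataFrame`. For nested objects, see `json_normalize` further down.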

You can try this one; tell me if it works!
import requests
import pandas as pd
response = requests.get('http://google.com')  # assuming the URL; replace with your API endpoint
res = response.json()  # parse the JSON body into Python objects
df = pd.DataFrame(res)
Good luck, mate!

Check out the docs here:
import pandas as pd
df = pd.read_json("http://some.com/blah.json")
As for storing it in a database, you will need some details about your database connection (docs here):
from sqlalchemy import create_engine
tablename = "my_tablename"
engine = create_engine("sqlite:///example.db")  # replace with your own SQLAlchemy connection URL
df.to_sql(name=tablename, con=engine)
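For a quick local test without a database server, `to_sql` also accepts a plain `sqlite3` connection. This sketch writes a small hypothetical DataFrame to an in-memory SQLite database and reads it back. The table name `my_tablename` is taken from the answer above. For other databases you would pass a SQLAlchemy engine instead.

```python
import sqlite3
import pandas as pd

# Hypothetical data standing in for the DataFrame built from the API response.
df = pd.DataFrame({"name": ["Alice", "Bob"], "craft": ["ISS", "ISS"]})

# An in-memory SQLite database stands in for "any database" here.
conn = sqlite3.connect(":memory:")
df.to_sql(name="my_tablename", con=conn, index=False, if_exists="replace")

# Read the table back to confirm the rows were written.
stored = pd.read_sql("SELECT * FROM my_tablename", conn)
print(stored)
conn.close()
```

`if_exists="replace"` drops and recreates the table on each run. `if_exists="append"` adds rows instead, which suits data that is fetched repeatedly.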

To help others as well, here is an example with nested JSON. I have tried to make it similar to the example you showed in your question.
import json
import pandas as pd
jsondata = '''
{
"source": { "id": "2480300" },
"time": "2013-07-02T16:32:30.152+02:00",
"type": "huawei_E3131SignalStrength",
"c8y_SignalStrength": {
"rssi": { "value": -53, "unit": "dBm" },
"ber": { "value": 0.14, "unit": "%" }
}
}
'''
data = json.loads(jsondata)  # parse the JSON string into a dict
df = pd.json_normalize(data)  # flatten nested objects into dotted column names
df
Result:
c8y_SignalStrength.ber.unit c8y_SignalStrength.ber.value c8y_SignalStrength.rssi.unit c8y_SignalStrength.rssi.value source.id time type
% 0.14 dBm -53 2480300 2013-07-02T16:32:30.152+02:00 huawei_E3131SignalStrength
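When the API returns a list of such objects, `pd.json_normalize` produces one row per object. Its `sep` parameter controls how nested keys are joined. This sketch uses two hypothetical measurements with the same nested layout as above, trimmed to a few fields. The second `rssi` value is made up for illustration.

```python
import pandas as pd

# Two hypothetical measurements with the same nested layout as the example above.
records = [
    {"source": {"id": "2480300"}, "type": "huawei_E3131SignalStrength",
     "c8y_SignalStrength": {"rssi": {"value": -53, "unit": "dBm"}}},
    {"source": {"id": "2480300"}, "type": "huawei_E3131SignalStrength",
     "c8y_SignalStrength": {"rssi": {"value": -61, "unit": "dBm"}}},
]

# One row per record; nested keys are joined with "_" instead of the default ".".
df = pd.json_normalize(records, sep="_")
print(df[["source_id", "c8y_SignalStrength_rssi_value"]])
```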

Related

DataFrame constructor not properly called when using data in HTML file

I would like to put some data from an HTML file into a pandas DataFrame, but I'm getting the error 'DataFrame constructor not properly called!'. My data has the following structure. It is the data between the square brackets after "lots" that I would like to put into a DataFrame, but I'm pretty confused as to what type of object this is.
html_doc = """<html><head><script>
"unrequired_data = [{"ID":XXX, "Name":XXX, "Price":100GBP, "description": null },
{"ID":XXX, "Name":XXX, "Price":150GBP, "description": null },
{"ID":XXX, "Name":XXX, "Price":150GBP, "description": null }]
"lots":[{"ID":123, "Name":ABC, "Price":100, "description": null },
{"ID":456, "Name":DEF, "Price":150, "description": null },
{"ID":789, "Name":GHI, "Price":150, "description": null }]
</script></head></html>"""
I have tried the following code
from bs4 import BeautifulSoup
import pandas as pd
soup = BeautifulSoup(html_doc)
df = pd.DataFrame("lots")
The output I would like to get would be in this format.
Your data is not valid JSON, so you need to fix it.
I would use:
from bs4 import BeautifulSoup
import pandas as pd
import json, re
soup = BeautifulSoup(html_doc, "html.parser")
# extract the script contents
script = soup.find("script").text.strip()
# get the first block that starts with "lots" (blocks are separated by blank lines)
data = next((s.split(':', maxsplit=1)[-1] for s in re.split('\n{2,}', script) if s.startswith('"lots"')), None)
# fix the JSON by quoting the bare values
if data:
    data = re.sub(r':\s*([^",}]+)\s*', r':"\1"', data)
    df = pd.DataFrame(json.loads(data))
    print(df)
Output:
ID Name Price description
0 123 ABC 100 null
1 456 DEF 150 null
2 789 GHI 150 null
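The `next(...)` line above only finds "lots" when blank lines separate the assignments in the script. In the snippet as pasted in the question there are none, so it would return None. This minimal sketch swaps in a regex search for the bracketed list after `"lots":`. It assumes the script text has already been extracted, for example with `soup.find("script").text`, and that the list contains no nested brackets. The quoting regex is a slightly tightened variant of the one above: the lazy match and trailing lookahead keep trailing spaces out of the values.

```python
import json
import re
import pandas as pd

# Script text as it might come out of soup.find("script").text (shortened sample).
script = '''"unrequired_data = [{"ID":XXX, "Name":XXX, "Price":100GBP, "description": null }]
"lots":[{"ID":123, "Name":ABC, "Price":100, "description": null },
{"ID":456, "Name":DEF, "Price":150, "description": null },
{"ID":789, "Name":GHI, "Price":150, "description": null }]'''

# Grab the list that follows "lots": (non-greedy, up to the first closing bracket).
match = re.search(r'"lots"\s*:\s*(\[.*?\])', script, re.S)
if match:
    raw = match.group(1)
    # Quote every bare value that follows a colon so the text becomes valid JSON.
    fixed = re.sub(r':\s*([^",}]+?)\s*(?=[,}])', r':"\1"', raw)
    df = pd.DataFrame(json.loads(fixed))
    df["Price"] = pd.to_numeric(df["Price"])  # values arrive as strings; convert prices
    print(df)
```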

Scraped json data want to output CSV file

So I have this data that I scraped:
[
{
"id": 4321069,
"points": 52535,
"name": "Dennis",
"avatar": "",
"leaderboardPosition": 1,
"rank": ""
},
{
"id": 9281450,
"points": 40930,
"name": "Dinh",
"avatar": "https://uploads-us-west-2.insided.com/koodo-en/icon/90x90/aeaf8cc1-65b2-4d07-a838-1f078bbd2b60.png",
"leaderboardPosition": 2,
"rank": ""
},
{
"id": 1087209,
"points": 26053,
"name": "Sophia",
"avatar": "https://uploads-us-west-2.insided.com/koodo-en/icon/90x90/c3e9ffb1-df72-46e8-9cd5-c66a000e98fa.png",
"leaderboardPosition": 3,
"rank": ""
And so on... it's a big leaderboard of 20 people.
Scraped with this code:
import json
import requests
import pandas as pd
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
# print for all time:
data = requests.get(url_all_time).json()
# for item in data:
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
for item in data:
    print(item['name'], item['points'])
And I want to be able to create a table that resembles this.
Every time I scrape data, I want to update the table with the number of points, with a new date stamp as the column header. So basically what I was thinking is index = usernames and header = date. The problem is, I can't even manage to make a CSV file with those NAME/POINTS columns.
The only thing I have succeeded in doing so far is writing ALL the data into a CSV file. I haven't been able to pick out just the data I want, the way the print command does.
EDIT: After reading what @Shijith posted, I succeeded in writing the data to a .csv file, but given what I have in mind (adding more data over time), I am wondering whether I should write the code with an index or without.
WITH
import pandas as pd
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
data = pd.read_json(url_all_time)
table = pd.DataFrame.from_records(data, index=['name'], columns=['points','name'])
table.to_csv('products.csv', index=True, encoding='utf-8')
WITHOUT
import pandas as pd
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
data = pd.read_json(url_all_time)
table = pd.DataFrame.from_records(data, columns=['points','name'])
table.to_csv('products.csv', index=False, encoding='utf-8')
Have you tried reading the JSON directly into a pandas DataFrame? From there it should be pretty easy to transform it the way you want: add a column for today's date and pivot it.
import pandas as pd
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
df = pd.read_json(url_all_time)
df['date'] = pd.Timestamp.today().strftime('%m-%d-%Y')
df.pivot(index='name', columns='date', values='points')
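To grow the table over time, one option is to join each new pivoted column onto the table saved from earlier runs. This is a minimal sketch with hypothetical numbers. `history` stands in for a previously saved CSV, and the fixed date strings replace `pd.Timestamp.today()` so the output is reproducible.

```python
import pandas as pd

# Hypothetical table from an earlier run: one row per user, one column per date.
history = pd.DataFrame(
    {"03-01-2021": [52000, 40500]},
    index=pd.Index(["Dennis", "Dinh"], name="name"),
)

# Hypothetical new scrape, shaped like the result of pd.read_json(url_all_time).
df = pd.DataFrame({"name": ["Dennis", "Dinh", "Sophia"], "points": [52535, 40930, 26053]})
df["date"] = "03-02-2021"  # in practice: pd.Timestamp.today().strftime('%m-%d-%Y')

new_col = df.pivot(index="name", columns="date", values="points")

# An outer join keeps users that appear in only one snapshot (missing values become NaN).
history = history.join(new_col, how="outer")
print(history)
# To persist: history.to_csv("points.csv"), then next run pd.read_csv("points.csv", index_col="name")
```

Keeping `name` as the index is what makes the join line up rows by user, which favours the WITH-index variant from the question.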

How to save API response to csv with Python

I'm getting facedetection data from an API in this form:
{"id":1,"ageMin":0,"ageMax":100,"faceConfidence":66.72220611572266,"emotion":"ANGRY","emotionConfidence":50.0'
b'2540969848633,"eyeglasses":false,"eyeglassesConfidence":50.38102722167969,"eyesOpen":true,"eyesOpenConfidence":50.20328140258789'
b',"gender":"Male","genderConfidence":50.462989807128906,"smile":false,"smileConfidence":50.15522384643555,"sunglasses":false,"sun'
b'glassesConfidence":50.446510314941406}]'
I'd like to save this to a CSV file like this:
id ageMin ageMax faceConfidence
1 0 100 66
... and so on.
I tried to do it this way:
response = requests.get(url, headers=headers)
with open('detections.csv', 'w') as f:
    writer = csv.writer(f)
    for item in response:
        writer.writerow(str(item))
That puts every character in its own cell. I've also tried to use item.id, but that gives an error: AttributeError: 'bytes' object has no attribute 'id'.
Could someone point me to the right direction?
Maybe overkill for a small task, but you can do the following.
Convert the JSON response into Python objects (don't forget to handle exceptions, etc.); here the payload is a list of dicts:
dic = response.json()
Create a DataFrame, for example using pandas:
df = pandas.DataFrame(dic)
Save it (tab-separated here), omitting the index:
df.to_csv('detections.csv', index=False, sep="\t")
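The original loop fails because iterating over a `requests` Response yields raw byte chunks, not parsed records. That is why each `item` is a `bytes` object. This sketch parses a shortened, hypothetical stand-in for `response.content` and writes only the columns from the question's example table. An in-memory buffer replaces the real file so the result can be checked.

```python
import io
import json
import pandas as pd

# Shortened stand-in for response.content (raw bytes); with a live response,
# response.json() returns the same Python list.
raw = b'[{"id":1,"ageMin":0,"ageMax":100,"faceConfidence":66.72220611572266,"emotion":"ANGRY"}]'

records = json.loads(raw)  # json.loads accepts bytes; the payload is a list of dicts
df = pd.DataFrame(records)

# Keep only the columns shown in the question's example table.
wanted = df[["id", "ageMin", "ageMax", "faceConfidence"]]

buf = io.StringIO()  # stands in for the file path 'detections.csv'
wanted.to_csv(buf, index=False)
print(buf.getvalue())
```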
You can do this relatively easily with the pandas and json libraries.
import pandas as pd
import json
response = """{
"id": 1,
"ageMin": 0,
"ageMax": 100,
"faceConfidence": 66.72220611572266,
"emotion": "ANGRY",
"emotionConfidence": 50.0,
"eyeglasses": false,
"eyeglassesConfidence": 50.38102722167969,
"eyesOpen": true,
"eyesOpenConfidence": 50.20328140258789,
"gender": "Male",
"genderConfidence": 50.462989807128906,
"smile": false,
"smileConfidence": 50.15522384643555,
"sunglasses": false,
"glassesConfidence":50.446510314941406
}"""
data = json.loads(response)  # parse the JSON string into a dict
df = pd.DataFrame({"data": data})  # one row per key, in a single "data" column
df.to_csv("response.csv")
This is the response formatted as CSV:
,data
ageMax,100
ageMin,0
emotion,ANGRY
emotionConfidence,50.0
eyeglasses,False
eyeglassesConfidence,50.38102722167969
eyesOpen,True
eyesOpenConfidence,50.20328140258789
faceConfidence,66.72220611572266
gender,Male
genderConfidence,50.462989807128906
glassesConfidence,50.446510314941406
id,1
smile,False
smileConfidence,50.15522384643555
sunglasses,False
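The layout above has one row per key. To get the layout the question asks for, with keys as columns and one row per detection, wrap the single dict in a list before building the DataFrame. This is a sketch using a trimmed copy of the response shown above.

```python
import json
import pandas as pd

# Trimmed copy of the response above: a single JSON object.
doc = '{"id": 1, "ageMin": 0, "ageMax": 100, "faceConfidence": 66.72220611572266}'
record = json.loads(doc)  # a single dict

# Wrapping the dict in a list turns its keys into columns and gives one row.
df = pd.DataFrame([record])
print(df.to_csv(index=False))
```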

Parsing values from JSON using Python

I want to get the value of consumptionSavings from the following JSON, stored in a .txt file:
{
"_id": "58edf905746de21c401a3dce",
"sites": [{
"ecms": [{
"consumptionSavings": 148,
"equipmentCost": 3455,
{
"energySource": "Electricity",
"consumptionReduction": {
"amount": 345435,
"unit": "MWh"
},
"projectDurationMonths": 36
}
}
}
]
]
}
I wrote the following code to extract the value of consumptionSavings;
import xlwings as xw
import pandas as pd
import json
data = json.load(open('data.txt'))
# Create a Pandas dataframe from the data.
df = pd.DataFrame({'data':[data["sites"]["ecms"]["consumptionSavings"]]})
wb = xw.Book('Values.xlsx')
ws = wb.sheets['Sheet1']
ws.range('C3').options(index=False).value = df
wb = xw.Book('Result.xlsx')
wb.save()
xw.apps[0].quit()
and it returns the following error:
TypeError: list indices must be integers or slices, not str
I am a bit confused about how that could be. Thank you.
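The error comes from the `[` after `"sites"` and `"ecms"`: those values are lists, and lists must be indexed by position before a key can be used. The JSON in the question is not well-formed, so this sketch assumes a hypothetical corrected version in which `sites` and `ecms` are lists of objects.

```python
import json

# Hypothetical well-formed version of the structure in the question.
text = '''
{"_id": "58edf905746de21c401a3dce",
 "sites": [{"ecms": [{"consumptionSavings": 148, "equipmentCost": 3455}]}]}
'''
data = json.loads(text)  # with the file: with open('data.txt') as f: data = json.load(f)

# "sites" and "ecms" are lists, so pick an element by position first.
savings = data["sites"][0]["ecms"][0]["consumptionSavings"]
print(savings)

# To collect the value from every ecm of every site instead of only the first:
all_savings = [ecm["consumptionSavings"] for site in data["sites"] for ecm in site["ecms"]]
print(all_savings)
```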

csv to json in python

Hey, so I have some hash IDs in a CSV file, like:
XbRPhe65YbC+xtgGQ8ukeZEr9xFOC4MEs9Z0wUidGSec=
XbRPhe65YbC+xtgGQ8uksrqSUJ/HhTPj1d2pL0/vuGrHM=
and I want to parse them in Python and wrap each one in some additional code, like:
{"id" :"XbRPshe65YbC+xtGQ8ukqR2u2btfNeNe2gtcs72QbxPA=", "timestamp":"20150831"},
and then wrap all of that in some JSON syntax. This is then sent as a POST request. The problem is that I cannot seem to produce valid JSON: everything seems to be ordered wrong, and I am getting extra backslashes (\).
import os
import pandas as pd
from pprint import pprint
df=pd.read_csv('test.csv',sep=',',header=None)
df[0] = '{"id" :"' + df[0].astype(str) + '", "timestamp":"20150831"}, '
df = df[:-1] # removes last comma
test = 'hello'
data =[ { "ids":[ df[0]],
"attributes":[
{
"name":"girl"
},
{
"name":"size"
}
]
}
]
json1 = data.to_json()
print(json1)
I agree that pandas doesn't seem to be the simplest tool for the job here. The built-in libraries will work great:
import csv
import json
with open('test.csv', newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    data = {
        # skip blank lines, which csv.reader returns as empty lists
        "ids": [{"id": row[0], "timestamp": "20150831"} for row in csvreader if row],
        "attributes": [
            {"name": "girl"},
            {"name": "size"}
        ]
    }
json1 = json.dumps(data)
print(json1)
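The extra backslashes in the question come from building each entry as a JSON-formatted string and then serializing it again. The inner quotes get escaped on the second pass. This sketch contrasts the two approaches on the sample hashes from the question, with an in-memory buffer standing in for `test.csv`.

```python
import csv
import io
import json

# Stand-in for test.csv holding the two sample hashes from the question.
csv_text = "XbRPhe65YbC+xtgGQ8ukeZEr9xFOC4MEs9Z0wUidGSec=\nXbRPhe65YbC+xtgGQ8uksrqSUJ/HhTPj1d2pL0/vuGrHM=\n"
rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]

# Pre-formatting entries as JSON *strings* and serializing again escapes the inner quotes.
as_strings = json.dumps(['{"id" :"' + r[0] + '"}' for r in rows])

# Building plain dicts and serializing once produces clean JSON.
as_objects = json.dumps([{"id": r[0], "timestamp": "20150831"} for r in rows])

print(as_strings)
print(as_objects)
```

If the payload is sent with `requests`, `requests.post(url, json=data)` serializes the dict itself, so `json.dumps` is not needed at all in that case.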
