I want to combine some meta information together with a Pandas DataFrame as a JSON string.
I can call df.to_json(orient='values') to get the DataFrame's data as an array, but how do I combine it with some additional data?
result = {
    'meta': {'some': 'meta info'},
    'data': [[dataframe.values], [list], [...]]
}
I could also ask: How do I merge a Python object (meta: {...}) into a serialised JSON string (df.to_json())?
You can always convert JSON into Python data.
import json
df_json = df.to_json(orient='values') # JSON
py_data = json.loads( df_json ) # Python data
result['extra_data'] = py_data # merge data
json_all = json.dumps( result ) # JSON again
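A minimal end-to-end sketch of that round trip, with a small example frame standing in for your df and result:
import json
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=('a', 'b', 'c'))
result = {'meta': {'some': 'meta info'}}

result['data'] = json.loads(df.to_json(orient='values'))  # nested lists of row values
print(json.dumps(result))
# {"meta": {"some": "meta info"}, "data": [[1, 2, 3], [4, 5, 6]]}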
EDIT:
I found a better solution: use pandas.json.dumps.
The standard json module has trouble with the NumPy numbers that appear in dictionaries produced by pandas.
import pandas as pd
result = { 'meta': {'some': 'meta info'} }
df = pd.DataFrame([[1,2,3], [.1,.2,.3]], columns=('a','b','c'))
#result['extra_data'] = df.to_dict() # as dictionary
result['extra_data'] = df
print(pd.json.dumps( result ))
Result:
{
    "extra_data":{
        "a":{"0":1.0,"1":0.1},
        "c":{"0":3.0,"1":0.3},
        "b":{"0":2.0,"1":0.2}
    },
    "meta":{"some":"meta info"}
}
or
import pandas as pd
result = { 'meta': {'some': 'meta info'} }
df = pd.DataFrame([[1,2,3], [.1,.2,.3]], columns=('a','b','c'))
df_dict = df.to_dict()
df_dict['extra_data'] = result
print(pd.json.dumps( df_dict ))
Result:
{
    "a":{"0":1.0,"1":0.1},
    "c":{"0":3.0,"1":0.3},
    "b":{"0":2.0,"1":0.2},
    "extra_data":{"meta":{"some":"meta info"}}
}
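Note that pd.json is gone in newer pandas releases, so on a current install you can get a similar result with the standard json module plus to_dict(), using a default= hook to catch any stray NumPy scalars. A rough sketch:
import json
import numpy as np
import pandas as pd

result = {'meta': {'some': 'meta info'}}
df = pd.DataFrame([[1, 2, 3], [.1, .2, .3]], columns=('a', 'b', 'c'))
result['extra_data'] = df.to_dict()

# json.dumps only calls default= for objects it cannot serialize itself;
# .item() turns a leftover NumPy scalar into a plain Python number.
print(json.dumps(result, default=lambda o: o.item() if isinstance(o, np.generic) else str(o)))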
Related
I have a JSON file that I need to convert into CSV. The JSON contains an object whose value is an array, and all the attributes I want are inside that array, but the code I am using only converts the first object into a single value. What I actually want is all of those attributes from each object in the array.
JSON file content
{
    "leads": [
        {
            "id": "31Y2V29CH0X82",
            "product_type": "prelist"
        },
        {
            "id": "2N649TAJBA50Z",
            "product_type": "prelist"
        }
    ],
    "has_next_page": true,
    "next_cursor": "2022-07-27T20:02:13.856000-07:00"
}
Python code
import pandas as pd
df = pd.read_json(r'C:\Users\Ron\Desktop\Test\Product_List.json')
df.to_csv(r'C:\Users\Ron\Desktop\Test\New_Products.csv', index=None)
The output I am getting and the output I want were shown as screenshots (omitted here); in short, I want the attributes from the leads array as CSV columns with headers.
I think you'll have to do this row by row.
data = {"leads": [{"id": "31Y2V29CH0X82", "product_type": "prelist"}, {"id": "2N649TAJBA50Z", "product_type": "prelist"}], "has_next_page": True,
"next_cursor": "2022-07-27T20:02:13.856000-07:00"}
headers = data.copy()
del headers['leads']
rows = []
for row in data['leads']:
    row.update( headers )
    rows.append( row )
import pandas as pd
df = pd.DataFrame( rows )
print(df)
Output:
id product_type has_next_page next_cursor
0 31Y2V29CH0X82 prelist True 2022-07-27T20:02:13.856000-07:00
1 2N649TAJBA50Z prelist True 2022-07-27T20:02:13.856000-07:00
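If your pandas is recent enough, pd.json_normalize can do the same flattening in one call instead of the loop; a sketch starting from the same raw data dict (before it has been modified by the loop above), with the CSV file name as a placeholder:
import pandas as pd

# record_path picks the array to expand; meta carries the top-level fields onto every row
df = pd.json_normalize(data, record_path='leads',
                       meta=['has_next_page', 'next_cursor'])
df.to_csv('New_Products.csv', index=False)
print(df)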
I am trying to convert a dictionary to JSON, and one of the dictionary values comes from dataframe.to_json, but I get some strange output, as shown below.
Here is the code
import json
import pandas as pd
my_dict = {}
my_dict["ClassName"] = "First class"
# get student list
df = pd.read_csv("./test.csv")
my_dict["StudentList"] = df.to_json(orient='records')
# output
with open("./output.json", 'w') as fp:
    json.dump(my_dict, fp, indent=4)
Here is the input file ./test.csv
Name,Age
Joe,20
Emily,22
John,21
Peter,23
Here is the output file ./output.json
{
    "ClassName": "First class",
    "StudentList": "[{\"Name\":\"Joe\",\"Age\":20},{\"Name\":\"Emily\",\"Age\":22},{\"Name\":\"John\",\"Age\":21},{\"Name\":\"Peter\",\"Age\":23}]"
}
Here is what I need:
{
    "ClassName": "First class",
    "StudentList": [{"Name":"Joe","Age":20},{"Name":"Emily","Age":22},{"Name":"John","Age":21},{"Name":"Peter","Age":23}]
}
Thanks for any help.
Use df.to_dict instead of df.to_json:
my_dict["StudentList"] = df.to_dict(orient='records')
to_json returns a string containing the JSON, while to_dict returns plain Python objects (a list of dicts here) that json.dump can serialize directly.
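Putting it together, a minimal sketch of the fix, assuming the same test.csv as above:
import json
import pandas as pd

my_dict = {"ClassName": "First class"}
df = pd.read_csv("./test.csv")
my_dict["StudentList"] = df.to_dict(orient='records')  # list of plain dicts, one per student

with open("./output.json", 'w') as fp:
    json.dump(my_dict, fp, indent=4)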
Example of the desired output:
{
    "id": "",
    "data": {
        "package": ""
    }
}
Here is the little script I have put together
import pandas as pd
df=pd.read_csv('example.csv')
df1=df[['request','text']]
dfnew=df1.rename(columns={'request':'id','text':'package'})
with open('something.json','w') as f:
    f.write(dfnew.to_json(orient='records',lines=True))
Output I receive after running the script
{"id":"","package":}
I'll start with a mock dfnew since the code above it does not affect your problem.
If Pandas does not have a built-in method to export exactly what you want, you can manually manipulate the JSON before dumping it to file:
import json
import pandas as pd

dfnew = pd.DataFrame({
    'id': [''],
    'package': ['']
})

with open('something.json', 'w') as f:
    jsonString = dfnew.to_json(orient='records', lines=True)
    jsonObject = json.loads(jsonString)
    package = jsonObject.pop('package')
    jsonObject['data'] = {
        'package': package
    }
    json.dump(jsonObject, f, indent=4)
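Alternatively, you can skip the to_json/loads round trip and build the nested records straight from the frame; a rough sketch along the same lines (same mock dfnew, single row assumed):
import json
import pandas as pd

dfnew = pd.DataFrame({'id': [''], 'package': ['']})

records = [
    {'id': row['id'], 'data': {'package': row['package']}}
    for row in dfnew.to_dict(orient='records')
]

with open('something.json', 'w') as f:
    json.dump(records[0], f, indent=4)  # single row, so write the one object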
I'm scraping data using the Twitter API. When I use the print command I can see all the data that I want, specifically as many rows of tweets and dates as I input.
However, when I format the data into a pandas DataFrame/CSV, it only contains the first row of results. I'm really confused about what to do and appreciate any help. Thanks :)
#importing key term and date of tweets from twitter archive
client_key = 'code'
client_secret = 'code'
import base64
key_secret = '{}:{}'.format(client_key, client_secret).encode('ascii')
b64_encoded_key = base64.b64encode(key_secret)
b64_encoded_key = b64_encoded_key.decode('ascii')
import requests
base_url = 'https://api.twitter.com/'
auth_url = '{}oauth2/token'.format(base_url)
auth_headers = {
'Authorization': 'Basic {}'.format(b64_encoded_key),
'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8'
}
auth_data = {
'grant_type': 'client_credentials'
}
auth_resp = requests.post(auth_url, headers=auth_headers, data=auth_data)
auth_resp.status_code
auth_resp.json().keys()
access_token = auth_resp.json()['access_token']
search_headers = {
'Authorization': 'Bearer {}'.format(access_token)
}
search_params = {
'q': 'Key Term',
'count': 5,
'start_time' : '2019-1-1',
'end_time' : '2019-2-1',
'place.fields' : 'USA',
'lang' : 'en'
}
search_url = '{}1.1/search/tweets.json'.format(base_url)
search_resp = requests.get(search_url, headers=search_headers, params=search_params)
tweet_data = search_resp.json()
import numpy as np
import pandas as pd
for x in tweet_data['statuses']:
    data = {'Date':[(x['created_at'])],'Text':[(x['text'])]}
    df = pd.DataFrame(data)
    df.to_csv("Tweet_data.csv")
    print(df)
Hey, before your loop define data = [], then inside your loop do data.append({…}).
What you have at the minute is a loop that, at every iteration, creates a dictionary and assigns it to a variable called "data", overwriting the previous "data" assignment.
Then you are writing a CSV with only one "data" row.
Hope that’s helpful!
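A sketch of what that change looks like for the end of your script (same names as your code, not tested against the live API):
import pandas as pd

data = []  # one dict per tweet, collected across iterations
for x in tweet_data['statuses']:
    data.append({'Date': x['created_at'], 'Text': x['text']})

df = pd.DataFrame(data)  # build the frame once, after the loop
df.to_csv("Tweet_data.csv", index=False)
print(df)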
I am reading API data from a cloud server in JSON format, as shown here.
How do I write code to convert the JSON into a DataFrame and store it in a database?
The required output format is shown in table 2.
Here is an example:
from requests import request
import json
import pandas as pd

response = request(url="http://api.open-notify.org/astros.json", method='get')  # API source
data = json.loads(response.text)['people']  # pick the 'people' records out of the JSON
pd.DataFrame(data)  # convert to a pandas DataFrame
let me know if it works.
You can try this one, tell me if it works!
import requests
import pandas as pd

response = requests.get('http://google.com')  # Assuming the url returns JSON
data = response.json()
df = pd.DataFrame(data)
Good luck mate!
check out the docs here
import pandas as pd
df = pd.read_json("http://some.com/blah.json")
and as for storing it to a database you will need to know some things about your database connection. docs here
tablename = "my_tablename"
connection_values = <your sql alchemy connection here>
df.to_sql(name=tablename, con=connection_values)
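For example, with SQLAlchemy the connection can just be an engine; a sketch assuming a local SQLite file (the connection string, table name, and URL are placeholders):
import pandas as pd
from sqlalchemy import create_engine

df = pd.read_json("http://some.com/blah.json")  # placeholder URL from above

engine = create_engine("sqlite:///my_database.db")  # hypothetical SQLite file
df.to_sql(name="my_tablename", con=engine, if_exists="replace", index=False)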
To help others as well, here is an example with nested JSON. I have tried to make it similar to the example you showed in your question.
import json
import pandas as pd

jsondata = '''
{
    "source": { "id": "2480300" },
    "time": "2013-07-02T16:32:30.152+02:00",
    "type": "huawei_E3131SignalStrength",
    "c8y_SignalStrength": {
        "rssi": { "value": -53, "unit": "dBm" },
        "ber": { "value": 0.14, "unit": "%" }
    }
}
'''

data = json.loads(jsondata)    # parse the JSON string
df = pd.json_normalize(data)   # flatten the nested structure into dotted column names
df
Result:
c8y_SignalStrength.ber.unit c8y_SignalStrength.ber.value c8y_SignalStrength.rssi.unit c8y_SignalStrength.rssi.value source.id time type
% 0.14 dBm -53 2480300 2013-07-02T16:32:30.152+02:00 huawei_E3131SignalStrength