I am trying to read some data using a REST API and write it to a database table. I have written the code below, but I am stuck with the flattened JSON. Can you please help with a way to convert the JSON to a DataFrame?
Code
import requests
import json
import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize in older pandas
from flatten_json import flatten

j_username = 'ABCD'
j_password = '12456'
query = '"id = 112233445566"'
print(query)
r = requests.get('Url?query=%s' % query, auth=(j_username, j_password))
print(r.json())
first_response = r.json()
string_data = json.dumps(first_response)
normalized_r = json_normalize(first_response)
r_flattened = flatten(first_response)
print(r_flattened)
r_flattened_str = json.dumps(r_flattened)
print(type(r_flattened))
The flattened JSON output is as below:
{
'data_0_user-35': u'Xyz',
'data_0_user-34': None,
'data_0_user-37': u'CC',
'data_0_user-36': None,
'data_0_user-31': u'Regular',
'data_0_user-33': None,
'data_0_user-32': None,
'data_0_target-rcyc_id': 0101,
'data_0_to-mail': None,
'data_0_closing-version': None,
'data_0_user-44': None,
'data_0_test-reference': None,
'data_0_request-server': None,
'data_0_target-rcyc_type': u'regular type',
'data_0_project': None,
'data_0_user-01': u'Application Name',
'data_0_user-02': None,
'data_0_user-03': None, .......
.......
......
..... }
The expected output is:
data_0_user-35   data_0_user-34   data_0_user-37   .........
Xyz              None             CC               ........
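For the conversion itself, note that a flattened dict is a single record, so wrapping it in a list gives exactly the one-row frame shown above. A minimal sketch with a stand-in payload, since the real response isn't shown:

import pandas as pd
from flatten_json import flatten

# Stand-in for r.json(); the real payload has the 'data' list shown above.
sample = {'data': [{'user-35': 'Xyz', 'user-37': 'CC',
                    'target-rcyc': {'type': 'regular type'}}]}

flat = flatten(sample)     # keys like 'data_0_user-35', 'data_0_target-rcyc_type'
df = pd.DataFrame([flat])  # one dict -> one row; keys become columns
print(df)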
I finally cracked this. This code reads the data from a REST API, converts it into a data frame, and eventually writes it to an Oracle database. Thanks to my friend and to some of the wonderful people in the community whose answers helped me get here.
import requests
import datetime as dt
import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize in older pandas
import cx_Oracle

date = dt.datetime.today().strftime("%Y-%m-%d")
date = "'%s'" % date
query2 = '"creation-time=%s"' % date
r = requests.get('url?query=%s' % query2,
                 auth=('!username', 'password#'))
response_data_json = r.json()
response_data_normalize = json_normalize(response_data_json['data'])
subset = response_data_normalize.loc[:, ['value1', 'value2']]
Counter = subset['value1'].max()
# max() may return a numpy scalar; tolist() converts it to a plain Python number
converted_value = getattr(Counter, "tolist", lambda x=Counter: x)()
frame = pd.DataFrame()
for i in range(2175, converted_value + 1):  # 2175 is just a reference number to start the comparison from... specific to my work
    id = '"id = %s"' % i
    r = requests.get('url?&query=%s' % id, auth=('!username', 'password#'))
    response_data_json1 = r.json()
    response_data_normalize1 = json_normalize(response_data_json1['data'])
    sub = response_data_normalize1.loc[:, ['value1', 'value2', 'value3', 'value4']]
    frame = pd.concat([frame, sub], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
con = cx_Oracle.connect('USERNAME', 'PASSWORD',
                        cx_Oracle.makedsn('HOSTNAME', PORTNUMBER, 'SERVICENAME'))
cur = con.cursor()
rows = [tuple(x) for x in frame.values]
print(rows)
cur.executemany('''INSERT INTO TABLENAME (Value1, Value2, Value3, Value4)
                   VALUES (:1, :2, :3, :4)''', rows)
con.commit()
cur.close()
con.close()
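A side note on the loop above: growing a DataFrame with concat (or the old append) inside the loop copies the accumulated data on every iteration. A common alternative is to collect the per-id frames in a list and concatenate once at the end; a minimal sketch with stand-in frames:

import pandas as pd

# Stand-in frames; in the real code these would be the per-id `sub` results.
chunks = [pd.DataFrame({'value1': [i], 'value2': [i * 2]}) for i in range(3)]

frame = pd.concat(chunks, ignore_index=True)  # one concatenation instead of N
print(frame)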
I built some code that extracts data from YouTube for a search query, and now I need to convert my output data into a pandas DataFrame so that I can later export it as a .csv.
But I am stuck on the issue that my pd.DataFrame returns only the first row of parsed data instead of the full array. Please help!
Expected: pandas returns the same number of rows as the maxResults I am searching for.
Actual: pandas returns only the first row of the parsed data, no matter how much data was found.
Scraping code:
api_key = "***"
from googleapiclient.discovery import build
from pprint import PrettyPrinter
from google.colab import files
youtube = build('youtube','v3',developerKey = api_key)
print(type(youtube))
pp = PrettyPrinter()
nextPageToken = ''
for x in range(1):
#while True:
request = youtube.search().list(
q='star wars',
part='id,snippet',
maxResults=3,
order="viewCount",
pageToken=nextPageToken,
type='video')
print(type(request))
res = request.execute()
pp.pprint(res)
if 'nextPageToken' in res:
nextPageToken = res['nextPageToken']
# else:
# break
ids = [item['id']['videoId'] for item in res['items']]
results = youtube.videos().list(id=ids, part='snippet').execute()
for result in results.get('items', []):
print(result ['id'])
print(result ['snippet']['channelTitle'])
print(result ['snippet']['title'])
print(result ['snippet']['description'])
Pandas Code:
data = {'Channel Title': [result['snippet']['channelTitle']],
        'Title': [result['snippet']['title']],
        'Description': [result['snippet']['description']]
        }
df = pd.DataFrame(data, columns=['Channel Title', 'Title', 'Description'])
#df3 = pd.concat([df], ignore_index = True)
#df3.reset_index()
df.head()
#print(df3)
IIUC~
This:
data = {'Channel Title': [result['snippet']['channelTitle']],
        'Title': [result['snippet']['title']],
        'Description': [result['snippet']['description']]
        }
Should be:
data = {'Channel Title': [result['snippet']['channelTitle'] for result in results['items']],
        'Title': [result['snippet']['title'] for result in results['items']],
        'Description': [result['snippet']['description'] for result in results['items']]
        }
Otherwise you're just using result from the last iteration of your for loop.
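If you keep the pagination loop, the same idea extends across pages: collect one dict per video as you go and build the DataFrame once at the end. A sketch with a stand-in response, since the real results comes from videos().list():

import pandas as pd

# Stand-in for one page's videos().list() response; the real loop fills this.
results = {'items': [{'snippet': {'channelTitle': 'Some Channel',
                                  'title': 'Some Video',
                                  'description': 'Some description'}}]}

rows = []
for result in results.get('items', []):  # run this inside the pagination loop
    rows.append({'Channel Title': result['snippet']['channelTitle'],
                 'Title': result['snippet']['title'],
                 'Description': result['snippet']['description']})

# after the loop finishes: one row per video across all pages
df = pd.DataFrame(rows, columns=['Channel Title', 'Title', 'Description'])
print(df)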
So this is somewhat of a continuation of a previous post of mine, except now I have API data to work with. I am trying to get the keys Type and Email as columns in a data frame to come up with a final number. My code:
jsp_full = []
for p in payloads:
    payload = {"payload": {"segmentId": p}}
    r = requests.post(url, headers=header, json=payload)
    #print(r, r.reason)
    time.sleep(r.elapsed.total_seconds())
    json_data = r.json() if r and r.status_code == 200 else None
    json_keys = json_data['payload']['supporters']
    json_package = []
    jsp_full.append(json_package)
    for row in json_keys:
        SID = row['supporterId']
        Handle = row['contacts']
        a_key = 'value'
        list_values = [a_list[a_key] for a_list in Handle]
        string = str(list_values).split(",")
        data = {
            'SupporterID': SID,
            'Email': strip_characters(string[-1]),
            'Type': labels(p)
        }
        json_package.append(data)
    t2 = round(time.perf_counter(), 2)
    b_key = "Email"
    e = len([b_list[b_key] for b_list in json_package])
    t = str(labels(p))
    #print(json_package)
    print(f'There are {e} emails in the {t} segment')
    print(f'Finished in {t2 - t1} seconds')
    excel = pd.DataFrame(json_package)
    excel.to_excel(r'C:\Users\am\Desktop\email parsing\{0} segment {1}.xlsx'.format(t, str(today)), sheet_name=t)
This part works all well and good. Each payload in the API represents a different segment of people, so I split them out into different files. However, I am at a point where I need to combine all records into a single data frame, hence why I append out to jsp_full, which is a list of lists of dictionaries.
Once I have that I would run the balance of my code which is like this:
S = pd.DataFrame(jsp_full[0], index={0})
Advocacy_Supporters = S.sort_values("Type").groupby("Type", as_index=False)["Email"].first()
print(Advocacy_Supporters['Email'].count())
print("The number of Unique Advocacy Supporters is :")
Advocacy_Supporters_Group = Advocacy_Supporters.groupby("Type")["Email"].nunique()
print(Advocacy_Supporters_Group)
Some sample data:
[{'SupporterID': '565f6a2f-c7fd-4f1b-bac2-e33976ef4306', 'Email': 'somebody#somewhere.edu', 'Type': 'd_Student Ambassadors'}, {'SupporterID': '7508dc12-7647-4e95-a8b8-bcb067861faf', 'Email': 'someoneelse#email.somewhere.edu', 'Type': 'd_Student Ambassadors'}, ...]
My desired output is a dataframe that looks like so:
SupporterID                           Email                            Type
565f6a2f-c7fd-4f1b-bac2-e33976ef4306  somebody#somewhere.edu           d_Student Ambassadors
7508dc12-7647-4e95-a8b8-bcb067861faf  someoneelse#email.somewhere.edu  d_Student Ambassadors
Any help is greatly appreciated!!
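One way to get the combined frame directly, since jsp_full is a list of lists of dictionaries, is to flatten it before constructing the DataFrame. A sketch with stand-in records, assuming the structure shown in the sample data:

import itertools
import pandas as pd

# Stand-in for jsp_full: one inner list of record dicts per segment.
jsp_full = [
    [{'SupporterID': 'a1', 'Email': 'somebody#somewhere.edu', 'Type': 'd_Student Ambassadors'}],
    [{'SupporterID': 'b2', 'Email': 'someoneelse#email.somewhere.edu', 'Type': 'a_Volunteers'}],
]

# chain.from_iterable flattens the list of lists into one stream of dicts
S = pd.DataFrame(list(itertools.chain.from_iterable(jsp_full)))
print(S)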
So because this code creates an Excel file for each segment, all I did was read the Excel files back in via a for loop, like so:
filesnames = ['e_S Donors', 'b_Contributors', 'c_Activists', 'd_Student Ambassadors', 'a_Volunteers', 'f_Offline Action Takers']
S = pd.DataFrame()
for i in filesnames:
    data = pd.read_excel(r'C:\Users\am\Desktop\email parsing\{0} segment {1}.xlsx'.format(i, str(today)), sheet_name=i, engine='openpyxl')
    S = pd.concat([S, data])  # DataFrame.append was removed in pandas 2.0
This did the trick, since the data was already in a format I wanted.
I have SQL output in a pandas DataFrame that I would like to first convert to a .hyper Tableau extract, and then publish to Tableau Server via the Extract API. When I run my code (below), I get the error 'module' object is not callable for tdefile = tableausdk.HyperExtract(outfilename). I believe my code is correct, but maybe the modules were installed incorrectly? Has anyone seen this error?
print("Importing modules...")
import pandas as pd
import pyodbc
import re
import numpy as np
import cx_Oracle
import smtplib
import schedule
import time
import win32com.client as win32
import tableauserverclient as TSC
import os
import tableausdk
from pandleau import *
from tableausdk import *
from tableausdk.HyperExtract import *
print("Done importing modules.")
server = x
db = y
conn_sql = pyodbc.connect(#fill in your connection data)
### sql query - change from getdate() - 4 to TD# ##
sql_1 = """
select
* from test
"""
df = pd.read_sql_query(sql_1, conn_sql)
df.head()
def job(df, outfilename):
if os.path.isfile(outfilename):
os.remove(outfilename)
os.remove('DataExtract.log')
try:
tdefile = tableausdk.HyperExtract(outfilename)
except:
#os.remove(outfilename)
os.system('del ' + outfilename)
os.system('del DataExtract.log')
tdefile = tableausdk.HyperExtract(outfilename)
# define the table definition
tableDef = tableausdk.TableDefinition()
# create a list of column names
colnames = df.columns
# create a list of column types
coltypes = df.dtypes
# create a dict for the field maps
# Define type maps
# Caveat: I am not including all of the possibilities here
fieldMap = {
'float64' : tde.Types.Type.DOUBLE,
'float32' : tde.Types.Type.DOUBLE,
'int64' : tde.Types.Type.DOUBLE,
'int32' : tde.Types.Type.DOUBLE,
'object': tde.Types.Type.UNICODE_STRING,
'bool' : tde.Types.Type.BOOLEAN,
'datetime64[ns]': tde.Types.Type.DATE,
}
# for each column, add the appropriate info the Table Definition
for i in range(0, len(colnames)):
cname = colnames[i] #header of column
coltype = coltypes[i] #pandas data type of column
ctype = fieldMap.get(str(coltype)) #get integer field type in Tableau Speak
tableDef.addColumn(cname, ctype)
# add the data to the table
with tdefile as extract:
table = extract.addTable("Extract", tableDef)
for r in range(0, df.shape[0]):
row = tde.Row(tableDef)
for c in range(0, len(coltypes)):
if df.iloc[r,c] is None:
row.setNull(c)
elif str(coltypes[c]) in ('float64', 'float32', 'int64', 'int32'):
try:
row.setDouble(c, df.iloc[r,c])
except:
row.setNull(c)
elif str(coltypes[c]) == 'object':
try:
row.setString(c, df.iloc[r,c])
except:
row.setNull(c)
elif str(coltypes[c]) == 'bool':
row.setBoolean(c, df.iloc[r,c])
elif str(coltypes[c]) == 'datetime64[ns]':
try:
row.setDate(c, df.iloc[r,c].year, df.iloc[r,c].month, df.iloc[r,c].day )
except:
row.setNull
else:
row.setNull(c)
# insert the row
table.insert(row)
tdefile.close()
#df_tableau = pandleau(df_1)
#df_tableau.set_spatial('SpatialDest', indicator=True)
#df_tableau.to_tableau('test.hyper', add_index=False)
job(df, 'test_1.hyper')
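For what it's worth, the error message itself points at the likely cause: 'module' object is not callable means tableausdk.HyperExtract is a module rather than a class, so calling it fails. In old Tableau SDK samples the extract class exposed by that module is named Extract, so a sketch of the likely fix (class name assumed, not verified against your install) is:

# Assumption: the Tableau SDK exposes an Extract class inside the
# HyperExtract module, mirroring tableausdk.Extract for .tde files.
from tableausdk.HyperExtract import Extract

tdefile = Extract('test_1.hyper')  # call the class, not the module

The tde alias used in the field map (tde.Types.Type.DOUBLE and friends) is never defined in the script either; it would presumably need to be imported or replaced with tableausdk.Types.Type.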
import pandas as pd
import urllib
import time
import sys
baseurl = "https://query.yahooapis.com/v1/public/yql?"
yql_bs_query = 'select * from yahoo.finance.historicaldata where symbol = "YHOO" and startDate = "2009-09-11" and endDate = "2010-03-10"'
yql_bs_url = baseurl + urllib.parse.urlencode({'q':yql_bs_query}) + "&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback="
bs_json = pd.io.json.read_json(yql_bs_url)
bs_json.values
YHOO = bs_json.values.tolist()
I am not able to convert this list into a DataFrame.
It is converting to a DataFrame, but the frame has only 1 column and 5 rows, since the form of the JSON is:
{u'query': {u'count': 124,
u'created': u'2017-01-26T05:44:52Z',
u'diagnostics': {u'build-version': u'2.0.84',
...
You just need to download the JSON separately, index in to get the quote data, and then convert that to a DataFrame:
# same code as above here:
import pandas as pd
import urllib
import time
import sys
baseurl = "https://query.yahooapis.com/v1/public/yql?"
yql_bs_query = 'select * from yahoo.finance.historicaldata where symbol = "YHOO" and startDate = "2009-09-11" and endDate = "2010-03-10"'
yql_bs_url = baseurl + urllib.parse.urlencode({'q':yql_bs_query}) + "&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback="
# now that you have the URL:
import requests
# download json data and convert to dict
data = requests.get(yql_bs_url).json()
# get quote data
quote = data["query"]["results"]["quote"]
# convert to dataframe
quote = pd.DataFrame.from_dict(quote)
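From there, typical cleanup is to parse the dates and use them as the index. The YQL endpoint has since been retired, so this sketch operates on a stand-in frame; the Date and Close fields are assumptions based on what YQL historical data returned:

import pandas as pd

# Stand-in for the quote DataFrame built above.
quote = pd.DataFrame({'Date': ['2009-09-11', '2009-09-14'],
                      'Close': [15.0, 15.2]})

quote['Date'] = pd.to_datetime(quote['Date'])  # parse the date strings
quote = quote.set_index('Date').sort_index()   # time-ordered index
print(quote.head())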
I am getting JIRA data using the following Python code.
How do I store the response for more than one key (my example shows only one key, but in general I get a lot of data) and print only the values corresponding to total, key, customfield_12830, and summary?
import requests
import json
import logging
import datetime
import base64
import urllib.parse

serverURL = 'https://jira-stability-tools.company.com/jira'
user = 'username'
password = 'password'
query = 'project = PROJECTNAME AND "Build Info" ~ BUILDNAME AND assignee=ASSIGNEENAME'
jql = '/rest/api/2/search?jql=%s' % urllib.parse.quote(query)
response = requests.get(serverURL + jql, verify=False, auth=(user, password))
print(response.json())
response.json() output:
http://pastebin.com/h8R4QMgB
From the link you pasted to pastebin and from the JSON that I saw, the response contains your issues as a list, each issue holding key, fields (which holds the custom fields), self, id, and expand.
You can simply iterate through this response and extract the values for the keys you want, like so:
data = response.json()
issues = data.get('issues', list())
x = list()
for issue in issues:
    temp = {
        'key': issue['key'],
        'customfield': issue['fields']['customfield_12830'],
        'total': issue['fields']['progress']['total']
    }
    x.append(temp)
print(x)
x is a list of dictionaries containing the data for the fields you mentioned. Let me know if I have been unclear somewhere or if this is not what you are looking for.
PS: It is always advisable to use dict.get('keyname', None) to get values, since you can supply a default if the key is not found. I didn't do that here because I just wanted to show the approach.
Update: In the comments you (OP) mentioned that this raises an AttributeError. Try this code:
data = response.json()
issues = data.get('issues', list())
x = list()
for issue in issues:
    temp = dict()
    key = issue.get('key', None)
    if key:
        temp['key'] = key
    fields = issue.get('fields', None)
    if fields:
        customfield = fields.get('customfield_12830', None)
        temp['customfield'] = customfield
        progress = fields.get('progress', None)
        if progress:
            total = progress.get('total', None)
            temp['total'] = total
    x.append(temp)
print(x)
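And since the rest of this thread is about getting JSON into a DataFrame: once x is built, a one-liner finishes the job. A minimal sketch with a stand-in list, since the live response isn't available here:

import pandas as pd

# Stand-in for the x built above from response.json()
x = [{'key': 'PROJ-1', 'customfield': 'build-123', 'total': 10},
     {'key': 'PROJ-2', 'customfield': 'build-124', 'total': 4}]

df = pd.DataFrame(x)  # one row per issue; dict keys become columns
print(df)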