Convert JSON to a dataframe in a for loop in Python

I'm trying to call an API and build a dataframe in a for loop from the returned JSON. I can create the first dataframe, but my loop only ever produces the first JSON -> dataframe. After a few days of struggling, I decided to ask for guidance from the experts here.
import requests
import json
import pandas as pd
# create an Empty DataFrame object
df = pd.DataFrame()
# api header
headers = {"Accept": "application/json","Authorization": "api_secret"}
#email for loops
email_list = ["abc@gmail.com", "xyz@gmail.com"]
#supposed to read 2 emails in the list and append each df but only reads the first one...#
for i in email_list:
    querystring = {"where": i}
    response = requests.request("GET", "https://example.com/api/2.0/export", headers=headers, params=querystring)
    with open('test.jsonl', 'w') as writefile:
        writefile.write(response.text)
    data = [json.loads(line) for line in open('test.jsonl', 'r')]
    FIELDS = ["event"]
    df = pd.json_normalize(data)[FIELDS]
    df = df.append(df)
I wonder if I need to change something in the df append, but I can't pinpoint what needs to change. Thank you so much in advance!

df = pd.json_normalize(data)[FIELDS]
df = df.append(df)
overwrites the dataframe each time. Instead, create a new one before appending:
df2 = pd.json_normalize(data)[FIELDS]
df = df.append(df2)
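Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on current pandas the usual pattern is to collect the per-email frames in a list and concatenate once at the end. A minimal sketch of that approach (the batches data below is made up, standing in for the API responses):

```python
import pandas as pd

# Made-up stand-in for the JSON lines returned for each email in email_list
batches = [
    [{"event": "open", "properties": {"email": "abc@gmail.com"}}],
    [{"event": "click", "properties": {"email": "xyz@gmail.com"}}],
]

FIELDS = ["event"]
frames = []
for records in batches:
    # Build a fresh frame per batch instead of reassigning df in place
    frames.append(pd.json_normalize(records)[FIELDS])

# One concat at the end replaces the per-iteration append
df = pd.concat(frames, ignore_index=True)
print(df["event"].tolist())  # ['open', 'click']
```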

Related

How do I get the groupby operation to work?

Can't get Pandas Groupby operation to work.
I suspect I need to convert the data to a pandas dataframe first? However, I can't seem to get that to work either.
import requests
import json
import pandas as pd
baseurl = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/datasets/githubposting.json"
response = requests.get(baseurl)
data = response.json()
print(data)
def get_number_of_jobs(technology):
    number_of_jobs = 0
    number_of_jobs = data.groupby('technology').sum().loc[technology, :][0]
    return technology, number_of_jobs
print(get_number_of_jobs('python'))
Thanks
data is a list of dictionaries, not a DataFrame, so it doesn't have groupby. You don't really need it anyway: you can build the DataFrame using the first entry of the JSON response as the column names, then look up 'Python' directly, since each technology is already a single row:
baseurl = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/datasets/githubposting.json"
response = requests.get(baseurl)
data = response.json()
df = pd.DataFrame(columns=list(data[0].values()), data=[d.values() for d in data[1:]])
number_of_jobs = df.loc[df['technology'] == 'Python', 'number of job posting'].iloc[0]
print(number_of_jobs) # 51

How to create a dataframe from urlopen (csv)

My code:
# parse json returned from the API to Pandas DF
openUrl = urlopen(url)
r = openUrl.read()
openUrl.close()
#d = json.loads(r.decode())
#df = pd.DataFrame(d, index=[0])
df = pd.DataFrame(r, index=[0])
The error:
ValueError: DataFrame constructor not properly called!
Help would be appreciated.
The DataFrame constructor requires an ndarray-like input (or a dict or other iterable), not raw bytes.
You can use pandas.read_csv if you want to directly input a csv and get a DataFrame.
Try printing r to see what is actually inside the response.
pandas.read_csv has a lot of option parameters to handle different types of csv, which of course depends on what you're getting from the url.
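For example, pandas.read_csv accepts a URL or any file-like object directly, so the urlopen/read steps aren't needed at all. A small sketch, using a StringIO to stand in for the remote CSV body:

```python
import io
import pandas as pd

# With a real endpoint you could simply write: df = pd.read_csv(url)
# Here a StringIO stands in for the body the server would return.
csv_text = "date,close\n2021-01-04,100.5\n2021-01-05,101.2\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 2)
```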
This snippet might help you.
import urllib.request
import pandas as pd
r = urllib.request.urlopen('HERE GOES YOUR LINK')
x = r.read()
print(type(x))
y = x.decode()  # decode the bytes; str(x) would keep the b'...' wrapper in the text
df = pd.DataFrame([y], columns=['string_values'])
print(df)

How to create a loop that appends new rows to CSV each time it loops?

I'm working with an API and trying to pull out a complete list of surveys by looping through every user's API token. My idea for the loop is that it reads each API token (stored in a list) one at a time, stores the data, converts it to a pandas dataframe, and then writes the data to a CSV file. So far I've created a script that successfully loops through the list of API tokens but just overwrites the CSV file each time. Here is my script currently:
apiToken = ["n0000000000001", "N0000000002"]
for x in apiToken:
    baseUrl = "https://group.qualtrics.com/API/v3/surveys"
    headers = {
        "x-api-token": x,
    }
    response = requests.get(baseUrl, headers=headers)
    surveys = response.text
    surveys2 = json.loads(response.text)
    surveys3 = surveys2["result"]["elements"]
    df = pd.DataFrame(surveys3)
    df.to_csv('survey_list.csv', index=False)
What changes do I need to make in order for the new rows to be appended on to the CSV rather than overwrite it?
Assuming your code is otherwise right, this should work: make a list of the separate dataframes and concat them together at the end. This works as long as all the DataFrames in the list (dfs) have the same column names.
apiToken = ["n0000000000001", "N0000000002"]
dfs = []
for x in apiToken:
    baseUrl = "https://group.qualtrics.com/API/v3/surveys"
    headers = {
        "x-api-token": x,
    }
    response = requests.get(baseUrl, headers=headers)
    surveys = response.text
    surveys2 = json.loads(response.text)
    surveys3 = surveys2["result"]["elements"]
    dfs.append(pd.DataFrame(surveys3))
final_df = pd.concat(dfs)
final_df.to_csv('survey_list.csv', index=False)
Using the csv package you can append files to your csv as follows:
import csv
...
with open(filename, 'a', newline='') as filecsv:
    writer = csv.writer(filecsv)
    line = ['value1', 'value2']  # whatever csv line you want to append
    writer.writerow(line)

Variable containing list of values to be saved in an excel using python

I'm new to Python and trying my luck.
I have a JSON from which I extract particular items into variables, and using a for loop I display the entire JSON data as output.
Basically, I want that entire console output in an Excel file, with the help of a dataframe (pandas) or any alternative way, which would be much appreciated.
import pandas as pd
import json
with open('i4.json', encoding='utf-8-sig') as f:
    data = json.load(f)
for ib in data['documents']:
    tit = ib['title']
    stat = ib['status']
    print(tit, stat)
    df = pd.DataFrame({'Title': [tit], 'Status': [stat]})
df.to_excel('fromSIM.xls', index=False)
The output is, for example:
title1 pass
title2 fail
The problem is that the Excel file is saved like this:
Title Status
title2 fail
Can anyone enlighten me on how to adjust the above code so that all of the output is saved in the Excel file, with each value on its own row?
The problem is that you are overwriting the data frame in each loop iteration. You should create the data frame outside the for loop, and only append the new rows to it inside the loop.
import pandas as pd
columns = ['Title', 'Status']
df_ = pd.DataFrame(columns=columns)
for ib in data['documents']:
    tit = ib['title']
    stat = ib['status']
    print(tit, stat)
    df_ = df_.append(pd.Series([tit, stat], index=df_.columns), ignore_index=True)
df_.to_excel('fromSIM.xls', index=False)
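On pandas 2.x the append method used above no longer exists, so the same idea is usually written by collecting plain dicts first and building the frame in one call. A sketch, with a made-up data dict standing in for the parsed i4.json:

```python
import pandas as pd

# Made-up stand-in for the loaded JSON file
data = {"documents": [
    {"title": "title1", "status": "pass"},
    {"title": "title2", "status": "fail"},
]}

# Gather one dict per row, then construct the DataFrame in a single call
rows = [{"Title": ib["title"], "Status": ib["status"]} for ib in data["documents"]]
df_ = pd.DataFrame(rows, columns=["Title", "Status"])
print(df_.to_string(index=False))
```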

Python loop through URL's

I'm attempting to take dates from a dataframe and loop through them within URLs. I've managed to print the URLs (1st code), but when I attempt to turn each URL's JSON into a dataframe (2nd code), I get this response:
AttributeError: 'str' object has no attribute 'json'
#1st code
import requests
import pandas as pd
df = pd.read_csv('NBADates.csv')
df.to_dict('series')
for row in df.loc[:, "Date"]:
    url = url_template.format(row=row)
    print(url)
Any ideas on what I'm doing wrong?
#2nd code
import requests
import csv
import pandas as pd
url_template = "https://stats.nba.com/stats/leaguedashptstats?College=&Conference=&Country=&DateFrom={row}&DateTo={row}&Division=&DraftPick=&DraftYear=&GameScope=&Height=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PerMode=Totals&PlayerExperience=&PlayerOrTeam=Player&PlayerPosition=&PtMeasureType=SpeedDistance&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="
df = pd.read_csv('NBADates.csv')
df.to_dict('series')
for row in df.loc[:, "Date"]:
    url = url_template.format(row=row)
    stats = url.json()['resultSets'][0]['rowSet']
    headers = url.json()['resultSets'][0]['headers']
    stats_df = pd.DataFrame(stats, columns=headers)
    # Append to the big dataframe
    lineup_df = lineup_df.append(stats_df, ignore_index=True)
lineup_df.to_csv("Stats.csv")
I think you forgot to request the URL. url is just a string, so it has no .json() method; you should send a request with it, and if the response is JSON, parse that.
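Concretely, the fix is response = requests.get(url, ...) followed by response.json(); calling .json() on the url string is what raises the AttributeError. The sketch below parses a made-up payload with the same resultSets shape (the real stats.nba.com endpoint also tends to require browser-like request headers such as User-Agent and Referer, which is an assumption worth checking):

```python
import json
import pandas as pd

# With the real endpoint this step would be:
#   response = requests.get(url, headers=browser_like_headers)
#   payload = response.json()
# Made-up payload mirroring the 'resultSets' structure:
payload = json.loads("""
{"resultSets": [{"headers": ["PLAYER_NAME", "DIST_MILES"],
                 "rowSet": [["Player A", 2.1], ["Player B", 1.8]]}]}
""")

stats = payload["resultSets"][0]["rowSet"]
headers = payload["resultSets"][0]["headers"]
stats_df = pd.DataFrame(stats, columns=headers)
print(stats_df.shape)  # (2, 2)
```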
