combining two pandas dataframes as single json output - python

I'm trying to combine two pandas dataframes into a single JSON output.
The json output below is the result from this code - df.to_json(orient = "split")
{
    "columns": [],
    "index": [],
    "data": [
        [
            "COMPANY ONE",
            "123 HAPPY PLACE",
            "GOTHAM CITY",
            "NJ",
            12345,
            "US",
            8675309,
            "",
            "",
            "",
            "",
            ""
        ],
        [.....]
    ]
}
A little background, I get the data from a csv file, and usually I have to separate the file in two parts, one good and the other bad. I've been using pandas for this process, which is great. So df contains the good data and say dfbad contains the bad data.
I used df.to_json(orient = "split") to output the good data, which I really like the structure of it. Now I want to do the same thing for the bad data, same structure, so something like this:
[{good}, {bad}]
I apologize in advance if the example above is not clear.
I tried this:
jsonify(good = df.to_json(orient = "split"), bad = dfbad.to_json(orient = "split"))
But I know this is not going to work, because the results for good and bad are turned into strings, which I don't want; I want to be able to access them as nested JSON.
data_dict = {}
data_dict['bad'] = dfbad.to_dict()
data_dict['good'] = df.to_dict()
return pd.json.dumps(data_dict)
This returns fine as JSON, but not with the structure that .to_json(orient = "split") produces, unless I customize it myself.
Can anybody help with this issue, or point me in another direction to solve it?
Thanks in advance!
UPDATE:
I found the solution, here is what I did:
good_json = df.to_json(orient="split")
bad_json = dfbad.to_json(orient="split")
return jsonify(bad = json.loads(bad_json), good = json.loads(good_json))
I added json.loads (you have to import it with import json), and it now returns proper JSON output. If you have other suggestions, please let me know; I'm open to learning more about Pandas.
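Outside of Flask, the same pattern works with plain json: parse each frame's "split" output with json.loads, then dump the combined dict once. A minimal sketch, with hypothetical two-column frames standing in for the real df and dfbad:

```python
import json
import pandas as pd

# Hypothetical stand-ins for the good/bad frames produced by the CSV split
# described above; the real code would use the existing df and dfbad.
df = pd.DataFrame({"name": ["COMPANY ONE"], "state": ["NJ"]})
dfbad = pd.DataFrame({"name": ["BAD CO"], "state": ["??"]})

# Parse each frame's "split" JSON back into a dict before combining, so
# "good" and "bad" end up as nested objects rather than escaped strings.
combined = {
    "good": json.loads(df.to_json(orient="split")),
    "bad": json.loads(dfbad.to_json(orient="split")),
}
print(json.dumps(combined, indent=2))
```

Serializing the combined dict in one json.dumps call is what keeps the nested structure intact; calling to_json twice and concatenating the strings would double-escape them.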

Related

How to reorder Data from yahoo finance(Python)?

I'm trying to write a Python script that allows me to get some items of a financial statement from Yahoo. I've tried the yahoofinancials library, but I can only get an entire page of data.
For instance, with this code:
from yahoofinancials import YahooFinancials
yahoo_financials = YahooFinancials('AAPL')
print(yahoo_financials.get_financial_stmts('annual', 'balance'))
I will get this:
{
"balanceSheetHistory": {
"AAPL": [
{
"2016-09-24": {
"otherCurrentLiab": 8080000000,
"otherCurrentAssets": 8283000000,
"goodWill": 5414000000,
"shortTermInvestments": 46671000000,
"longTermInvestments": 170430000000,
"cash": 20484000000,
"netTangibleAssets": 119629000000,
"totalAssets": 321686000000,
"otherLiab": 36074000000,
"totalStockholderEquity": 128249000000,
"inventory": 2132000000,
"retainedEarnings": 96364000000,
"intangibleAssets": 3206000000,
"totalCurrentAssets": 106869000000,
"otherStockholderEquity": 634000000,
"shortLongTermDebt": 11605000000,
"propertyPlantEquipment": 27010000000,
"deferredLongTermLiab": 2930000000,
"netReceivables": 29299000000,
"otherAssets": 8757000000,
"longTermDebt": 75427000000,
"totalLiab": 193437000000,
"commonStock": 31251000000,
"accountsPayable": 59321000000,
"totalCurrentLiabilities": 79006000000
}
}
]
}
}
I want to get every single element, such as "cash", and put it in a variable or an array with all these data, in order to get the single number.
So, for example, if I wanted "cash", I would have a variable or an array/list that lets me get the number (in this case 20484000000 for cash).
I hope I've made myself clear.
Does anyone know how to do it? Thank you.
Since the output is in JSON format, we can work with it as nested dictionaries.
from yahoofinancials import YahooFinancials
import json
yahoo_financials = YahooFinancials('AAPL')
w = yahoo_financials.get_financial_stmts('annual', 'balance')
print(w["balanceSheetHistory"]["AAPL"][2]['2019-09-28']['totalLiab'])
Change 'totalLiab' to get the data you want; to change '2019-09-28' you must also change the list index [2], since each year is a separate entry in the list.
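To avoid hard-coding the year index at all, one option is to loop over every entry. A sketch below uses a literal dict in the shape shown above; the 2015 entry and its figures are invented sample values, not real API data:

```python
# Sample data in the shape returned by get_financial_stmts above; the
# 2015-09-26 entry and its numbers are invented for illustration.
w = {
    "balanceSheetHistory": {
        "AAPL": [
            {"2016-09-24": {"cash": 20484000000, "totalLiab": 193437000000}},
            {"2015-09-26": {"cash": 21120000000, "totalLiab": 170990000000}},
        ]
    }
}

cash_by_date = {}
for entry in w["balanceSheetHistory"]["AAPL"]:
    for date, fields in entry.items():  # each entry maps one date to its fields
        cash_by_date[date] = fields["cash"]

print(cash_by_date["2016-09-24"])  # 20484000000
```

This way the date keys are discovered from the data itself, so no list index needs updating when the statement history grows.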

Trouble when storing API data in Python list

I'm struggling with my json data that I get from an API. I've gone into several api urls to grab my data, and I've stored it in an empty list. I then want to take out all fields that say "reputation" and I'm only interested in that number. See my code here:
import json
import requests

f = requests.get('my_api_url')
if f.ok:
    data = json.loads(f.content)

url_list = []  # the list stores a number of urls that I want to request data from
for items in data:
    url_list.append(items['details_url'])  # grab the urls that I want to enter

total_url = []  # stores all data from all urls here
for index in range(len(url_list)):
    url = requests.get(url_list[index])
    if url.ok:
        url_data = json.loads(url.content)
        total_url.append(url_data)

print(json.dumps(total_url, indent=2))  # only want to see if it's working
Thus far I'm happy and can enter all urls and get the data. It's in the next step I get trouble. The above code outputs the following json data for me:
[
[
{
"id": 316,
"name": "storabro",
"url": "https://storabro.net",
"customer": true,
"administrator": false,
"reputation": 568
}
],
[
{
"id": 541,
"name": "sega",
"url": "https://wedonthaveanyyet.com",
"customer": true,
"administrator": false,
"reputation": 45
},
{
"id": 90,
"name": "Villa",
"url": "https://brandvillas.co.uk",
"customer": true,
"administrator": false,
"reputation": 6
}
]
]
However, I only want to print out the reputation, and I cannot get it working. If I instead use print(total_url['reputation']) in my code, it doesn't work and says "TypeError: list indices must be integers or slices, not str", and if I try:
for s in total_url:
    print(s['reputation'])
I get the same TypeError.
Feels like I've tried everything but I can't find any answers on the web that can help me, but I understand I still have a lot to learn and that my error will be obvious to some people here. It seems very similar to other things I've done with Python, but this time I'm stuck. To clarify, I'm expecting an output similar to: [568, 45, 6]
Perhaps I used the wrong way to do this from the beginning and that's why it's not working all the way for me. Started to code with Python in October and it's still very new to me but I want to learn. Thank you all in advance!
It looks like your total_url is a list of lists, so you might write a function like:
def get_reputations(data):
    for url in data:
        for obj in url:
            print(obj.get('reputation'))

get_reputations(total_url)
# output:
# 568
# 45
# 6
If you'd rather not work with a list of lists in the first place, you can extend the list with each result instead of append in the expression used to construct total_url
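A sketch of that flattening approach: here `responses` stands in for the per-URL json.loads results collected in the loop from the question, trimmed to just the fields that matter.

```python
# Each API response is itself a list of objects, as in the question's output;
# the names and numbers below are copied from that sample.
responses = [
    [{"name": "storabro", "reputation": 568}],
    [{"name": "sega", "reputation": 45}, {"name": "Villa", "reputation": 6}],
]

total_url = []
for page in responses:
    total_url.extend(page)  # extend flattens the inner list; append would nest it

reputations = [obj["reputation"] for obj in total_url]
print(reputations)  # [568, 45, 6]
```

With a flat list, the single-loop `for s in total_url: print(s['reputation'])` from the question works as intended.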
You can also read the raw response and parse it with json.loads:
import json
from urllib.request import urlopen  # urllib2.urlopen on Python 2

def get_rep():
    response = urlopen(api_url)
    r = response.read().decode('utf-8')
    r_obj = json.loads(r)
    for item in r_obj['response']:
        print("Reputation: {}".format(item['reputation']))

Returning the minimum value in a JSON array

Good evening folks! I have been wracking my brain on this one for a good few hours now and could do with a little bit of a pointer in the right direction. I'm playing around with some API calls and trying to make a little project for myself.
The JSON data is stored in Arrays, and as such to get the information I want (it is from a Transport API) I have been making the following
x = apirequest
x = x.json()
for i in range(0, 4):
    print(x['routes'][i]['duration'])
    print(x['routes'][i]['departure_time'])
    print(x['routes'][i]['arrival_time'])
This will return the following
06:58:00
23:39
06:37
05:08:00
05:14
10:22
03:41:00
05:30
09:11
03:47:00
06:24
10:11
What I am trying to do is return only the shortest journeys - I could do it if it was a single-layer JSON string, but I am not too familiar with multi-level arrays. I can't return ['duration'] without utilising ['routes'] and a route indicator (in this case 0 through 3).
I can use an if statement to iterate through them easily enough, but there must be a way to accomplish it directly through the JSON that I am missing. I also thought about adding the results to a separate array and then filtering that - but there is a few other fields I want to grab from the data when I've cracked this part.
What I am finding as I learn is that I tend to do things a long winded way, often finding out my 10-15 line solutions on codewars are actually aimed at being done in 2-3 lines.
Example JSON data
{
"request_time": "2018-05-29T19:03:04+01:00",
"source": "Traveline southeast journey planning API",
"acknowledgements": "Traveline southeast",
"routes": [{
"duration": "06:58:00",
"route_parts": [{
"mode": "foot",
"from_point_name": "Corunna Court, Wrexham",
"to_point_name": "Wrexham General Rail Station",
"destination": "",
"line_name": "",
"duration": "00:36:00",
"departure_time": "23:39",
"arrival_time": "00:15"
}]
}]
}
Hope you can help steer me in the right direction!
Here's one solution using datetime.timedelta. Data from #fferri.
from datetime import timedelta

x = {'routes': [{'duration': '06:58:00', 'departure_time': '23:39', 'arrival_time': '06:37'},
                {'duration': '05:08:00', 'departure_time': '05:14', 'arrival_time': '10:22'},
                {'duration': '03:41:00', 'departure_time': '05:30', 'arrival_time': '09:11'},
                {'duration': '03:47:00', 'departure_time': '06:24', 'arrival_time': '10:11'}]}

def minimum_time(k):
    h, m, s = map(int, x['routes'][k]['duration'].split(':'))
    return timedelta(hours=h, minutes=m, seconds=s)

res = min(range(4), key=minimum_time)  # 2
You can then access the appropriate sub-dictionary via x['routes'][res].
Using min() with a key argument to indicate which field should be used for finding the minimum value:
x = {'routes': [
    {'duration': '06:58:00', 'departure_time': '23:39', 'arrival_time': '06:37'},
    {'duration': '05:08:00', 'departure_time': '05:14', 'arrival_time': '10:22'},
    {'duration': '03:41:00', 'departure_time': '05:30', 'arrival_time': '09:11'},
    {'duration': '03:47:00', 'departure_time': '06:24', 'arrival_time': '10:11'}
]}
best = min(x['routes'], key=lambda d: d['duration'])
# best = {'duration': '03:41:00', 'departure_time': '05:30', 'arrival_time': '09:11'}
The min(iterable, key=...) function is what you are looking for:
x = {'routes': [{'dur': 3, 'depart': 1, 'arrive': 4},
                {'dur': 2, 'depart': 2, 'arrive': 4}]}
min(x['routes'], key=lambda item: item['dur'])
Returns:
{'dur': 2, 'depart': 2, 'arrive': 4}
First, the fact that x is initialized from JSON isn't particularly relevant. It's a dict, and that's all that is important.
To answer your question, you just need the key attribute to min:
shortest = min(x['routes'], key=lambda d: d['duration'])
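A note on why the plain string comparison in these lambdas is safe here: the durations are zero-padded "HH:MM:SS" strings, so lexicographic order matches chronological order. A quick sketch, using the duration values from the question, with an explicit seconds conversion as the more robust alternative:

```python
durations = ["06:58:00", "05:08:00", "03:41:00", "03:47:00"]

# Zero-padded "HH:MM:SS" strings sort the same way as the times they encode,
# so plain min() on the strings already finds the shortest journey.
assert min(durations) == "03:41:00"

# Converting to seconds gives the same winner and stays correct even if the
# zero-padding ever changes.
def to_seconds(hms):
    h, m, s = map(int, hms.split(":"))
    return h * 3600 + m * 60 + s

print(min(durations, key=to_seconds))  # 03:41:00
```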

Json file to dictionary

I am using the yelp dataset and I want to parse the review JSON file into a dictionary. I tried loading it into a pandas DataFrame and then creating the dictionary, but because the file is too big it is time-consuming. I want to keep only the user_id and stars values. A line of the JSON file looks like this:
{
    "votes": {"funny": 0, "useful": 2, "cool": 1},
    "user_id": "Xqd0DzHaiyRqVH3WRG7hzg",
    "review_id": "15SdjuK7DmYqUAj6rjGowg",
    "stars": 5,
    "date": "2007-05-17",
    "text": "dr. goldberg offers everything i look for in a general practitioner. he's nice and easy to talk to without being patronizing; he's always on time in seeing his patients; he's affiliated with a top-notch hospital (nyu) which my parents have explained to me is very important in case something happens and you need surgery; and you can get referrals to see specialists without having to see him first. really, what more do you need? i'm sitting here trying to think of any complaints i have about him, but i'm really drawing a blank.",
    "type": "review",
    "business_id": "vcNAWiLM4dR7D2nwwJ7nCA"
}
How can I iterate over every 'field' (for the lack of a better word)? So far I can only iterate over each line.
EDIT
As requested pandas code :
Reading the JSON:
with open('yelp_academic_dataset_review.json') as f:
    df = pd.DataFrame(json.loads(line) for line in f)
Creating the dictionary:
ratings = {}  # renamed from `dict` so the built-in isn't shadowed
for i, row in df.iterrows():
    business_id = row['business_id']
    user_id = row['user_id']
    rating = row['stars']
    key = (business_id, user_id)
    ratings[key] = rating
You don't need to read this into a DataFrame. json.load() returns a dictionary. For example:
sample.json
{
"votes": {
"funny": 0,
"useful": 2,
"cool": 1
},
"user_id": "Xqd0DzHaiyRqVH3WRG7hzg",
"review_id": "15SdjuK7DmYqUAj6rjGowg",
"stars": 5,
"date": "2007-05-17",
"text": "dr. goldberg offers everything i look for in a general practitioner. he's nice and easy to talk to without being patronizing; he's always on time in seeing his patients; he's affiliated with a top-notch hospital (nyu) which my parents have explained to me is very important in case something happens and you need surgery; and you can get referrals to see specialists without having to see him first. really, what more do you need? i'm sitting here trying to think of any complaints i have about him, but i'm really drawing a blank.",
"type": "review",
"business_id": "vcNAWiLM4dR7D2nwwJ7nCA"
}
read_json.py
import json

with open('sample.json', 'r') as fh:
    result_dict = json.load(fh)

print(result_dict['user_id'])
print(result_dict['stars'])
output
Xqd0DzHaiyRqVH3WRG7hzg
5
With that output you can easily create a DataFrame.
There are several good discussions about parsing json as a stream on SO, but the gist is it's not possible natively, although some tools seem to attempt it.
In the interest of keeping your code simple and with minimal dependencies, you might see if reading the JSON directly into a dictionary, line by line, is a sufficient improvement.
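A minimal sketch of that line-by-line approach, skipping pandas entirely and keeping only the user_id and stars fields the question asks for. The io.StringIO buffer stands in for the real file; in practice you would open 'yelp_academic_dataset_review.json' instead, and the second review line is an invented example:

```python
import io
import json

# Two sample review lines standing in for the (large) Yelp file; the second
# line's ids and rating are made up for illustration.
sample = io.StringIO(
    '{"user_id": "Xqd0DzHaiyRqVH3WRG7hzg", "business_id": "vcNAWiLM4dR7D2nwwJ7nCA", "stars": 5}\n'
    '{"user_id": "another_user_id", "business_id": "another_business_id", "stars": 3}\n'
)

ratings = {}
for line in sample:                      # one JSON object per line
    review = json.loads(line)
    key = (review["business_id"], review["user_id"])
    ratings[key] = review["stars"]       # keep only what the question needs

print(len(ratings))  # 2
```

Because each line is parsed and discarded immediately, memory use stays flat regardless of file size, which is the main cost of the DataFrame route.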

Python - Count JSON elements before extracting data

I use an API which gives me a JSON file structured like this:
{
    "offset": 0,
    "results": [
        {
            "source_link": "http://www.example.com/1",
            "source_link/_title": "Title example 1",
            "source_link/_source": "/1",
            "source_link/_text": "Title example 1"
        },
        {
            "source_link": "http://www.example.com/2",
            "source_link/_title": "Title example 2",
            "source_link/_source": "/2",
            "source_link/_text": "Title example 2"
        },
        ...
And I use this code in Python to extract the data I need:
import json
import urllib2

u = urllib2.urlopen('myapiurl')
z = json.load(u)
u.close()

link = z['results'][1]['source_link']
title = z['results'][1]['source_link/_title']
The problem is that to use it I have to know the number of the element from which I'm extracting the data. My results can have different length every time, so what I want to do is to count the number of elements in results at first, so I would be able to set up a loop to extract data from each element.
To check the length of the results key:
len(z["results"])
But if you're just looping around them, a for loop is perfect:
for result in z["results"]:
    print(result["source_link"])
You don't need to know the length of the results; you are fine with a for loop:
for result in z['results']:
    # process the results here
Anyway, if you want to know the length of 'results': len(z['results'])
If you want to get the length, you can try:
len(z['results'])
But in Python, what we usually do is:
for i in z['results']:
    # do whatever you like with `i`
Hope this helps.
You don't need, or likely want, to count them in order to loop over them, you could do:
import json
import urllib2

u = urllib2.urlopen('myapiurl')
z = json.load(u)
u.close()

for result in z['results']:
    link = result['source_link']
    title = result['source_link/_title']
    # do something with link/title
Or you could do:
u = urllib2.urlopen('myapiurl')
z = json.load(u)
u.close()

links = [result['source_link'] for result in z['results']]
titles = [result['source_link/_title'] for result in z['results']]
# do something with the links/titles lists
Few pointers:
No need to know results's length to iterate it. You can use for result in z['results'].
lists start from 0.
If you do need the index take a look at enumerate.
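For that last pointer, a quick enumerate sketch over a shortened version of the results structure from the question:

```python
# Shortened version of the question's results structure, for illustration.
z = {"results": [
    {"source_link": "http://www.example.com/1"},
    {"source_link": "http://www.example.com/2"},
]}

# enumerate yields (index, item) pairs, so no len() or range() is needed.
indexed = [(i, result["source_link"]) for i, result in enumerate(z["results"])]
for i, link in indexed:
    print(i, link)
```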
Use this command to print the number of results on the terminal:
print(len(z['results']))
