Extract a value from a JSON file (Python) - python

Hi, I'm not an expert and this problem has kept me stuck for a long time; I hope that someone here can help me.
I would like to extract the value "interestExpense" from the following JSON file:
{'incomeBeforeTax': 17780000000,
'minorityInterest': 103000000,
'netIncome': 17937000000,
'sellingGeneralAdministrative': 5918000000,
'grossProfit': 16507000000,
'ebit': 10589000000,
'endDate': 1640908800,
'operatingIncome': 10589000000,
'interestExpense': -1803000000,
'incomeTaxExpense': -130000000,
'totalRevenue': 136341000000,
'totalOperatingExpenses': 125752000000,
'costOfRevenue': 119834000000,
'totalOtherIncomeExpenseNet': 7191000000,
'netIncomeFromContinuingOps': 17910000000,
'netIncomeApplicableToCommonShares': 17937000000}
In this case the result should be -130000000 as a string, but I'm trying to find a way to create a list (or an array) with all those numbers so that I can decide which one to pick. I have no idea how to manipulate this kind of data (JSON).
For example
print(list[0])
should return 17780000000 (the value associated with incomeBeforeTax).
Is this actually possible?
The output is generated from this code:
import json
import re
import requests
from bs4 import BeautifulSoup

# headers was not defined in the original snippet; Yahoo expects a browser-like User-Agent
headers = {'User-Agent': 'Mozilla/5.0'}

annual_is_stms = []
url_financials = 'https://finance.yahoo.com/quote/{}/financials?p{}'
stock = 'F'
response = requests.get(url_financials.format(stock, stock), headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
pattern = re.compile(r'\s--\sData\s--\s')
script_data = soup.find('script', text=pattern).contents[0]
script_data[:500]   # preview the start of the embedded script
script_data[-500:]  # preview the end of the embedded script
# strip the surrounding JavaScript so that only the JSON payload remains
start = script_data.find("context") - 2
json_data = json.loads(script_data[start:-12])
json_data['context']['dispatcher']['stores']['QuoteSummaryStore'].keys()
# all data relative to the financials
annual_is = json_data['context']['dispatcher']['stores']['QuoteSummaryStore']['incomeStatementHistory']['incomeStatementHistory']
for s in annual_is:
    statement = {}
    for key, val in s.items():
        try:
            statement[key] = val['raw']
        except (TypeError, KeyError):
            continue
    annual_is_stms.append(statement)
print(annual_is_stms[0])

If you are using Python, you need to import the json module and parse the string into a Python object:
import json
# some JSON:
x = '{ "name":"John", "age":30, "city":"New York"}'
# parse x:
y = json.loads(x)
# the result is a Python dictionary:
print(y["age"])
Regards
L.

Ok, so the output snippet you posted comes from this line:
print(annual_is_stms[0])
If you now want the: -1803000000 you should do:
print(annual_is_stms[0]['interestExpense'])
If you want the: -130000000 you should do:
print(annual_is_stms[0]['incomeTaxExpense'])
and if you want the: 17780000000 you should do:
print(annual_is_stms[0]['incomeBeforeTax'])
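If you really want positional access like list[0] in your example, you can turn the dictionary into lists first. A minimal sketch, assuming annual_is_stms[0] holds the dictionary you posted (dictionaries preserve insertion order in Python 3.7+):
statement = annual_is_stms[0]
keys = list(statement.keys())
values = list(statement.values())
print(values[0])            # 17780000000, the value associated with incomeBeforeTax
print(keys[0], values[0])   # incomeBeforeTax 17780000000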

Copy and paste this into Python.
data = {'incomeBeforeTax': 17780000000,
'minorityInterest': 103000000,
'netIncome': 17937000000,
'sellingGeneralAdministrative': 5918000000,
'grossProfit': 16507000000,
'ebit': 10589000000,
'endDate': 1640908800,
'operatingIncome': 10589000000,
'interestExpense': -1803000000,
'incomeTaxExpense': -130000000,
'totalRevenue': 136341000000,
'totalOperatingExpenses': 125752000000,
'costOfRevenue': 119834000000,
'totalOtherIncomeExpenseNet': 7191000000,
'netIncomeFromContinuingOps': 17910000000,
'netIncomeApplicableToCommonShares': 17937000000}
print(data['interestExpense'])
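And if the goal is simply to see every figure before deciding which one to pick, you can iterate over the dictionary; a small sketch using the same data variable:
for key, value in data.items():
    print(key, value)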

Related

Json from url - array in array

I want to take data from the API of a Polish website in JSON format, but I have a problem taking data from an array inside an array.
From "normal" JSON I can take data, but this JSON has a structure like 'krs_podmioty.id' => 'blabla', and I have a problem with the . (dot) and the array in array.
I am trying to get data from https://api-v3.mojepanstwo.pl/dane/krs_podmioty/10186.json?layers[]=dzialalnosci&layers[]=reprezentacja.
You can decode it on: http://freeonlinetools24.com/json-decode (and paste the text from the URL).
It's a public website and public data.
If you look at it, I want the data from this segment:
'krs_podmioty.person_id' => array ( 0 => '14439' .... 11 => '1233301' )
import urllib.request
import json
res = urllib.request.urlopen('https://api-v3.mojepanstwo.pl/dane/krs_podmioty/10186.json?layers[]=dzialalnosci&layers[]=reprezentacja')
res_body = res.read()
j = json.loads(res_body.decode("utf-8"))
for item in j['data']:
    ucmdbId = (item['krs_podmioty'])
    print('Id podmioty: '.format(ucmdbId))
exit(0)
Ideally, I need to print a list of all the "krs_podmioty.person_id" values.
Thank you very much!
import requests
import json
result = requests.get('https://api-v3.mojepanstwo.pl/dane/krs_podmioty/10186.json?layers[]=dzialalnosci&layers[]=reprezentacja').json()
ids = result['data']['krs_podmioty.person_id']
for id in ids:
    print('Id podmioty: ' + id)
Try this:
for item in j['data']['krs_podmioty.person_id']:
    ucmdbId = item
    print('Id podmioty: {0} '.format(ucmdbId))
j['data'] contains all of the objects in the 'data' array, on which you can look up the krs_podmioty.person_id key to get its corresponding value array.
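Putting the same idea back into the urllib version from the question, a rough sketch might look like this (the key name is taken from the question; untested against the live API):
import urllib.request
import json

res = urllib.request.urlopen('https://api-v3.mojepanstwo.pl/dane/krs_podmioty/10186.json?layers[]=dzialalnosci&layers[]=reprezentacja')
j = json.loads(res.read().decode("utf-8"))

for person_id in j['data']['krs_podmioty.person_id']:
    print('Id podmioty: {0}'.format(person_id))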

JSON.LOADS is picking only 2 resultset

I am trying to use JSON to search through the Google Maps API. I give the location "Plymouth" - in the Google Maps API it shows 6 results, but when I parse the JSON I get a length of only 2. I tried multiple cities too, but all I am getting is a result set of 2.
What is wrong below?
import urllib.request as UR
import urllib.parse as URP
import json
url = "http://maps.googleapis.com/maps/api/geocode/json?address=Plymouth&sensor=false"
uh = UR.urlopen(url)
data = uh.read()
count = 0
js1 = json.loads(data.decode('utf-8') )
print ("Length: ", len(js1))
for result in js1:
    location = js1["results"][count]["formatted_address"]
    lat = js1["results"][count]["geometry"]["location"]["lat"]
    lng = js1["results"][count]["geometry"]["location"]["lng"]
    count = count + 1
    print ('lat',lat,'lng',lng)
    print (location)
Simply replace for result in js1: with for result in js1['results']:
By the way, as posted in a comment in the question, no need to use a counter. You can rewrite your for loop as:
for result in js1['results']:
    location = result["formatted_address"]
    lat = result["geometry"]["location"]["lat"]
    lng = result["geometry"]["location"]["lng"]
    print('lat',lat,'lng',lng)
    print(location)
If you look at the JSON that comes in, you'll see that it's a single dict with two items ("results" and "status"). Add print('result:', result) to the top of your for loop and it will print result: status and result: results, because all you are iterating over is the keys of that outer dict. That's a general debugging trick in Python... if you aren't getting the stuff you want, put in a print statement to see what you got.
The results (not surprisingly) are in a list under js1["results"]. In your for loop, you ignore the variable you are iterating over and go back to the original js1 for its data. This is unnecessary and, in your case, it hid the error. Had you tried to reference the fields off of result, you would have gotten an error, and it may have been easier to see that result was "status", not the array you were after.
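For example, a quick inspection like this (just a sketch, assuming js1 is the parsed response) makes the structure obvious:
print(type(js1))         # <class 'dict'>
print(list(js1.keys()))  # the two outer keys: 'results' and 'status'
for result in js1:
    print('result:', result)   # prints only those keys, not the geocoding entries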
Now a few tweaks fix the problem:
import urllib.request as UR
import urllib.parse as URP
import json
url = "http://maps.googleapis.com/maps/api/geocode/json?address=Plymouth&sensor=false"
uh = UR.urlopen(url)
data = uh.read()
count = 0
js1 = json.loads(data.decode('utf-8') )
print ("Length: ", len(js1))
for result in js1["results"]:
location = result["formatted_address"]
lat = result["geometry"]["location"]["lat"]
lng = result["geometry"]["location"]["lng"]
count = count + 1
print ('lat',lat,'lng',lng)
print (location)

for loop adding same value together and make JSON format

test=[]
sites = sel.css(".info")
for site in sites:
    money = site.xpath("./h2[@class='money']/text()").extract()
    people = site.xpath("//p[@class='poeple']/text()").extract()
    test.append('{"money":'+str(money[0])+',"people":'+str(people[0])+'}')
My result test is:
['{"money":1,"people":23}',
'{"money":3,"people":21}',
'{"money":12,"people":82}',
'{"money":1,"people":54}' ]
I am stuck on two things:
One is that when I print the type of test, it is a string, so it is not really JSON format.
Two is that the money value of 1 is duplicated, so I need to add the people together.
The final format I want is:
[
{"money":1,"people":77},
{"money":3,"people":21},
{"money":12,"people":82},
]
How can I do this??
I'd collect the money entries in a dict and add up the people as values; the output to JSON should indeed be done using a json library (I've not tested the code, but it should give you an idea of how you can approach the problem):
money_map = {}
sites = sel.css(".info")
for site in sites:
    money = site.xpath("./h2[@class='money']/text()").extract()[0]
    people = int(site.xpath("//p[@class='poeple']/text()").extract()[0])
    if money not in money_map:
        money_map[money] = 0
    money_map[money] += people

import json
output = [{'money': key, 'people': value} for key, value in money_map.items()]
json_output = json.dumps(output)
basically this:
import json
foo = ['{"money":1,"people":23}',
'{"money":3,"people":21}',
'{"money":12,"people":82}',
'{"money":1,"people":54}' ]
bar = []
for i in foo:
    j = json.loads(i)  # string to json/dict
    # if j['money'] is not in bar:
    bar.append(j)
    # else:
    #     find index of duplicate and add j['people']
The above is an incomplete solution; you still have to implement the 'duplicate check and add' yourself.
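One way to finish that step, sketched with the sample strings from the question (merging duplicates through a dict keyed by money):
import json

foo = ['{"money":1,"people":23}',
       '{"money":3,"people":21}',
       '{"money":12,"people":82}',
       '{"money":1,"people":54}']

merged = {}
for i in foo:
    j = json.loads(i)  # string -> dict
    merged[j['money']] = merged.get(j['money'], 0) + j['people']

bar = [{'money': m, 'people': p} for m, p in merged.items()]
print(json.dumps(bar))
# [{"money": 1, "people": 77}, {"money": 3, "people": 21}, {"money": 12, "people": 82}]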

Python - Count JSON elements before extracting data

I use an API which gives me a JSON file structured like this:
{
offset: 0,
results: [
{
source_link: "http://www.example.com/1",
source_link/_title: "Title example 1",
source_link/_source: "/1",
source_link/_text: "Title example 1"
},
{
source_link: "http://www.example.com/2",
source_link/_title: "Title example 2",
source_link/_source: "/2",
source_link/_text: "Title example 2"
},
...
And I use this code in Python to extract the data I need:
import json
import urllib2
u = urllib2.urlopen('myapiurl')
z = json.load(u)
u.close()
link = z['results'][1]['source_link']
title = z['results'][1]['source_link/_title']
The problem is that to use it I have to know the number of the element from which I'm extracting the data. My results can have a different length every time, so what I want to do is count the number of elements in results first, so that I can set up a loop to extract data from each element.
To check the length of the results key:
len(z["results"])
But if you're just looping over them, a for loop is perfect:
for result in z["results"]:
    print(result["source_link"])
You don't need to know the length of the results; you are fine with a for loop:
for result in z['results']:
    # process the results here
Anyway, if you want to know the length of 'results': len(z['results'])
If you want to get the length, you can try:
len(z['results'])
But in Python, what we usually do is:
for i in z['results']:
    # do whatever you like with `i`
Hope this helps.
You don't need, or likely want, to count them in order to loop over them; you could do:
import json
import urllib2
u = urllib2.urlopen('myapiurl')
z = json.load(u)
u.close()
for result in z['results']:
    link = result['source_link']
    title = result['source_link/_title']
    # do something with link/title
Or you could do:
u = urllib2.urlopen('myapiurl')
z = json.load(u)
u.close()
link = [result['source_link'] for result in z['results']]
title = [result['source_link/_title'] for result in z['results']]
# do something with links/titles lists
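If the two lists then need to be walked in step, zip keeps them paired; a small sketch using the variables from the snippet above:
for l, t in zip(link, title):
    print(t, '->', l)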
A few pointers:
No need to know the length of results to iterate over it. You can use for result in z['results'].
Lists start from 0.
If you do need the index, take a look at enumerate, for example:
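A small sketch of enumerate with the structure from the question (field name taken from the posted JSON):
for i, result in enumerate(z['results']):
    print(i, result['source_link/_title'])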
Use this command to print the count on the terminal, and then you can check the number of results:
print(len(z['results']))

Parsing Multiple json elements in python

I'm trying to build a small script that will go through the Etsy API and retrieve certain information. The API returns 25 different listings, all in JSON, and I would appreciate it if someone could help me learn how to handle them one at a time.
Here is an example of the json I'm dealing with:
{"count":50100,"results":[{"listing_id":114179207,"state":"active"},{"listing_id":11344567,"state":"active"},
and so on.
Is there a simple way to handle only one of these listings at a time to minimize the amount of calls I must make to the API?
Here is some of the code of how I'm dealing with just one when I limit the results returned to 1:
r = requests.get('http://openapi.etsy.com/v2/listings/active?api_key=key&limit=1&offset='+str(offset_param)+'&category=Clothing')
raw_json = r.json()
encoded_json = json.dumps(raw_json)
dataObject = json.loads(encoded_json)
if dataObject["results"][0]["quantity"] > 1:
    if dataObject["results"][0]["listing_id"] not in already_done:
        already_done.append(dataObject["results"][0]["listing_id"])
        s = requests.get('http://openapi.etsy.com/v2/users/'+str(dataObject["results"][0]["user_id"])+'/profile?api_key=key')
        raw_json2 = s.json()
        encoded_json2 = json.dumps(raw_json2)
        dataObject2 = json.loads(encoded_json2)
        t = requests.get('http://openapi.etsy.com/v2/users/'+str(dataObject["results"][0]["user_id"])+'?api_key=key')
        raw_json3 = t.json()
        encoded_json3 = json.dumps(raw_json3)
        dataObject3 = json.loads(encoded_json3)
Seeing how the results field (or key) contains a list structure, you can simply iterate through it like the following:
json_str = { ...other key-values, "results": [{"listing_id":114179207,"state":"active"},{"listing_id":11344567,"state":"active"}, ...and so on] }
results = json_str['results'] # this gives you a list of dicts
# iterate through this list
for result in results:
    if result['state'] == 'active':
        do_something_with( result['listing_id'] )
    else:
        do_someotherthing_with( result['listing_id'] )  # or none at all
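To tie this back to the original goal of fewer API calls, a rough sketch of how that loop could plug into the code from the question (the URL, field names, offset_param and already_done all come from that snippet; the api_key placeholder is yours to fill in, and this is untested):
import requests

r = requests.get('http://openapi.etsy.com/v2/listings/active?api_key=key&limit=25&offset=' + str(offset_param) + '&category=Clothing')
data = r.json()  # r.json() already returns a dict, so no dumps/loads round trip is needed

for result in data['results']:
    if result['quantity'] > 1 and result['listing_id'] not in already_done:
        already_done.append(result['listing_id'])
        # fetch the user profile for result['user_id'] here, as in the original code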
