test = []
sites = sel.css(".info")
for site in sites:
    money = site.xpath("./h2[@class='money']/text()").extract()
    people = site.xpath("//p[@class='poeple']/text()").extract()
    test.append('{"money":'+str(money[0])+',"people":'+str(people[0])+'}')
The resulting list, test, is:
['{"money":1,"people":23}',
'{"money":3,"people":21}',
'{"money":12,"people":82}',
'{"money":1,"people":54}' ]
I am stuck on two things:
First, every element of test is a string, so the result is not real JSON.
Second, the money value 1 appears twice, so I need to add the people values for those entries together.
The final format I want is:
[
{"money":1,"people":77},
{"money":3,"people":21},
{"money":12,"people":82}
]
How can I do this??
I'd collect the money entries in a dict and add up the people counts as values. The JSON output should indeed be produced with a JSON library. I've not tested the code, but it should give you an idea of how to approach the problem:
money_map = {}
sites = sel.css(".info")
for site in sites:
    # convert to int so that "money" comes out as a number in the JSON
    money = int(site.xpath("./h2[@class='money']/text()").extract()[0])
    # the leading "." keeps the search relative to the current site
    people = int(site.xpath(".//p[@class='poeple']/text()").extract()[0])
    if money not in money_map:
        money_map[money] = 0
    money_map[money] += people
import json
output = [{'money': key, 'people': value} for key, value in money_map.items()]
json_output = json.dumps(output)
Basically this:
import json
foo = ['{"money":1,"people":23}',
       '{"money":3,"people":21}',
       '{"money":12,"people":82}',
       '{"money":1,"people":54}']
bar = []
for i in foo:
    j = json.loads(i)  # JSON string to dict
    # if j['money'] is not in bar:
    bar.append(j)
    # else:
    #     find index of duplicate and add j['people']
The above is an incomplete solution; you still have to implement the "duplicate check and add" step.
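One possible way to fill in the missing "duplicate check and add" step is sketched below. It keeps a dict keyed by the money value instead of searching bar for duplicates; the helper name merge_by_money is made up for this illustration, and the input is the sample list from the question.

```python
import json

# Sample input copied from the question: a list of JSON strings.
foo = ['{"money":1,"people":23}',
       '{"money":3,"people":21}',
       '{"money":12,"people":82}',
       '{"money":1,"people":54}']


def merge_by_money(rows):
    """Parse each JSON string and sum 'people' for rows sharing a 'money' value.

    merge_by_money is a hypothetical helper name, not part of any library.
    """
    totals = {}  # money -> summed people; dicts keep insertion order (Python 3.7+)
    for row in rows:
        record = json.loads(row)
        totals[record["money"]] = totals.get(record["money"], 0) + record["people"]
    return [{"money": money, "people": people} for money, people in totals.items()]


bar = merge_by_money(foo)
print(json.dumps(bar))
```

Because the lookup is by dict key, no index search for the duplicate is needed, and json.dumps at the end turns the list of dicts into a real JSON string.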
Hi, I'm not an expert, and this problem has kept me stuck for a long time. I hope someone here can help me.
I would like to extract the value "interestExpense" from the following JSON data:
{'incomeBeforeTax': 17780000000,
'minorityInterest': 103000000,
'netIncome': 17937000000,
'sellingGeneralAdministrative': 5918000000,
'grossProfit': 16507000000,
'ebit': 10589000000,
'endDate': 1640908800,
'operatingIncome': 10589000000,
'interestExpense': -1803000000,
'incomeTaxExpense': -130000000,
'totalRevenue': 136341000000,
'totalOperatingExpenses': 125752000000,
'costOfRevenue': 119834000000,
'totalOtherIncomeExpenseNet': 7191000000,
'netIncomeFromContinuingOps': 17910000000,
'netIncomeApplicableToCommonShares': 17937000000}
In this case the result should be -1803000000 as a string, but what I am really trying to do is build a list (or an array) of all these values so that I can decide which one to pick. I have no idea how to manipulate this kind of (JSON) data.
For example
print(list[0])
should return 17780000000 (the value associated with incomeBeforeTax).
Is this actually possible?
The output is generated by this code:
import json
import re
import requests
from bs4 import BeautifulSoup

annual_is_stms = []
url_financials = 'https://finance.yahoo.com/quote/{}/financials?p={}'
stock = 'F'
# headers (the request headers) is defined elsewhere in the script
response = requests.get(url_financials.format(stock, stock), headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
pattern = re.compile(r'\s--\sData\s--\s')
script_data = soup.find('script', text=pattern).contents[0]
script_data[:500]
script_data[-500:]
start = script_data.find("context") - 2
json_data = json.loads(script_data[start:-12])
json_data['context']['dispatcher']['stores']['QuoteSummaryStore'].keys()
# all data relative to financials
annual_is = json_data['context']['dispatcher']['stores']['QuoteSummaryStore']['incomeStatementHistory']['incomeStatementHistory']
for s in annual_is:
    statement = {}
    for key, val in s.items():
        try:
            statement[key] = val['raw']
        except TypeError:
            continue
        except KeyError:
            continue
    annual_is_stms.append(statement)
print(annual_is_stms[0])
If you are using Python, import the json module and parse the JSON string into a Python object:
import json
# some JSON:
x = '{ "name":"John", "age":30, "city":"New York"}'
# parse x:
y = json.loads(x)
# the result is a Python dictionary:
print(y["age"])
Regards
L.
Ok, so the output snippet you posted comes from this line:
print(annual_is_stms[0])
If you want -1803000000, use:
print(annual_is_stms[0]['interestExpense'])
If you want -130000000, use:
print(annual_is_stms[0]['incomeTaxExpense'])
And if you want 17780000000, use:
print(annual_is_stms[0]['incomeBeforeTax'])
Copy and paste this into Python.
data = {'incomeBeforeTax': 17780000000,
'minorityInterest': 103000000,
'netIncome': 17937000000,
'sellingGeneralAdministrative': 5918000000,
'grossProfit': 16507000000,
'ebit': 10589000000,
'endDate': 1640908800,
'operatingIncome': 10589000000,
'interestExpense': -1803000000,
'incomeTaxExpense': -130000000,
'totalRevenue': 136341000000,
'totalOperatingExpenses': 125752000000,
'costOfRevenue': 119834000000,
'totalOtherIncomeExpenseNet': 7191000000,
'netIncomeFromContinuingOps': 17910000000,
'netIncomeApplicableToCommonShares': 17937000000}
print(data['interestExpense'])
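The question also asks whether the values can be put into a list and picked by position. A minimal sketch of that, using a shortened copy of the dict above (only four of the keys are kept here to save space):

```python
# Shortened copy of the dict from the question (four of its keys).
data = {'incomeBeforeTax': 17780000000,
        'minorityInterest': 103000000,
        'interestExpense': -1803000000,
        'incomeTaxExpense': -130000000}

# Dicts preserve insertion order (Python 3.7+), so positions are stable.
values = list(data.values())
keys = list(data.keys())

print(keys[0], values[0])          # first key and its value

# Looking a value up by key is usually safer than relying on its position.
interest_as_text = str(data['interestExpense'])
print(interest_as_text)
```

Positional access works, but it breaks silently if the source ever changes the order or number of fields, so the key lookup is generally the more robust choice.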
I'm working with an API that gives me 61 items, which I add to a Discord embed in a for loop.
All of this is meant to go into a Discord bot using pagination from DiscordUtils, so I need to make one embed per 10 entries to avoid an overly long message (the 2000-character message limit).
The data I currently loop over is here: https://api.nepmia.fr/spc/ (I recommend using a JSON-parsing extension for your browser, or it will be a bit hard to read)
But what I want to create is something that looks like this: https://api.nepmia.fr/spc/formated/
That way I can iterate over each range in a different embed and then use pagination.
I use TinyDB to generate the JSON files shown above with this script:
import urllib.request, json
from shutil import copyfile
from termcolor import colored
from tinydb import TinyDB, Query
db = TinyDB("/home/nepmia/Myazu/db/db.json")
def api_get():
    print(colored("[Myazu]","cyan"), colored("Fetching WynncraftAPI...", "white"))
    count = 0  # defined before the try so the finally block can always read it
    try:
        with urllib.request.urlopen("https://api.wynncraft.com/public_api.php?action=guildStats&command=Spectral%20Cabbage") as u1:
            api_1 = json.loads(u1.read().decode())
        if members := api_1.get("members"):
            print(colored("[Myazu]","cyan"),
                  colored("Got expected answer, starting saving process.", "white"))
            for member in members:
                nick = member.get("name")
                ur2 = f"https://api.wynncraft.com/v2/player/{nick}/stats"
                u2 = urllib.request.urlopen(ur2)
                api_2 = json.loads(u2.read().decode())
                data = api_2.get("data")
                for item in data:
                    meta = item.get("meta")
                    playtime = meta.get("playtime")
                    print(colored("[Myazu]","cyan"),
                          colored("Saving playtime for player", "white"),
                          colored(f"{nick}...","green"))
                    db.insert({"username": nick, "playtime": playtime})
                    count += 1
        else:
            print(colored("[Myazu]","cyan"),
                  colored("Unexpected answer from WynncraftAPI [ERROR 1]", "white"))
    except Exception:
        print(colored("[Myazu]","cyan"),
              colored("Unhandled error in saving process [ERROR 2]", "white"))
    finally:
        print(colored("[Myazu]","cyan"),
              colored("Finished saving data for", "white"),
              colored(f"{count}", "green"),
              colored("players.", "white"))
But this only creates a flat structure like this: https://api.nepmia.fr/spc/
What I would like is something like this: https://api.nepmia.fr/spc/formated/
Thanks for your help!
PS: Sorry for your eyes I'm still new to Python so I know I don't do stuff really properly :s
To follow up from the comments, you shouldn't store items in your database in a format that is specific to how you want to return results from the database to a different API, as it will make it more difficult to query in other contexts, among other reasons.
If you want to paginate items from a database it's better to do that when you query it.
According to the docs, you can iterate over all documents in a TinyDB database just by iterating directly over the DB like:
for doc in db:
    ...
For any iterable you can use the enumerate function to associate an index to each item like:
for idx, doc in enumerate(db):
    ...
If you want the indices to start with 1 as in your examples you would just use idx + 1.
Finally, to paginate the results, you need some function that can return items from an iterable in fixed-size batches; many such chunking recipes are available. E.g. given a function chunked(iter, size) you could do:
pages = enumerate(chunked(enumerate(db), 10))
Then list(pages) gives a list of lists of tuples like [(page_num, [(player_num, player), ...]), ...].
The only difference between a list of lists and what you want is you seem to want a dictionary structure like
{'range1': {'1': {...}, '2': {...}, ...}, 'range2': {'11': {...}, ...}}
This is no different from a list of lists; the only difference is that you're using dictionary keys to give numerical indices to each item in a collection, rather than the indices being implicit in the list structure. There are many ways to go from a list of lists to this. The easiest, I think, is a (nested) dict comprehension:
{f'range{page_num + 1}': {str(player_num + 1): player for player_num, player in page}
for page_num, page in pages}
This will give output in exactly the format you want.
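The answer leaves chunked(iter, size) undefined. One possible implementation is sketched below using itertools.islice, with a plain list of made-up player dicts standing in for the TinyDB database so the snippet runs on its own; the player names and playtimes are illustrative only.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


# Stand-in for the TinyDB database: 23 made-up player records.
players = [{"username": f"player{n}", "playtime": n * 10} for n in range(1, 24)]

pages = enumerate(chunked(enumerate(players), 10))

# Nested dict comprehension from the answer, applied to the stand-in data.
formatted = {
    f"range{page_num + 1}": {str(player_num + 1): player for player_num, player in page}
    for page_num, page in pages
}

print(list(formatted))             # page keys
print(list(formatted["range3"]))   # player keys on the last, shorter page
```

With TinyDB the same code should work by passing db in place of players, since the answer notes that the database can be iterated directly.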
Thanks @Iguananaut for your valuable help.
In the end I built something similar to your solution, using a generator.
from math import ceil
import discord

def chunker(seq, size):
    # yield consecutive slices of seq, each at most `size` items long
    for i in range(0, len(seq), size):
        yield seq[i:i+size]

def embed_creator(embeds):
    pages = []
    current_page = None
    for i, chunk in enumerate(chunker(embeds, 10)):
        current_page = discord.Embed(
            title='**SPC** Last week online time',
            color=3903947)
        for elt in chunk:
            current_page.add_field(
                name=elt.get("username"),
                value=elt.get("play_output"),
                inline=False)
        current_page.set_footer(
            icon_url="https://cdn.discordapp.com/icons/513160124219523086/a_3dc65aae06b2cf7bddcb3c33d7a5ecef.gif?size=128",
            text=f"{i + 1} / {ceil(len(embeds) / 10)}"
        )
        pages.append(current_page)
        current_page = None
    return pages
Using embed_creator I generate a list named pages that I can simply use with DiscordUtils paginator.
Before anyone says I didn't search for an answer: I did, and even though I'm not a Python expert, I didn't find any explicit answer.
To be clear, I'd like to extract two pieces of information ("name" and "fame") from a specific "clan".
In the extracted JSON, the data is under [items], in entries [0], [1], [2], [3] and [4]. Each of those holds a dictionary with [standings]. My issue is at the next level: the clan I want can be at [0], [1], [2], [3] or [4]. I don't know how to filter for it, for example with something like "filter with tag = #9VL9L9YQ".
Here is my code:
data = json.loads(response)
for item in data["items"]:
    for p in item["standings"]:
        for q in p["clan"]["participants"]:
            if (p["clan"] = '#9VL9L9YQ'):
                print("%s %s" % (
                    q["name"],
                    q["fame"],
                ))
I know my line "if (p["clan"] = '#9VL9L9YQ'):" is not correct but this is what i'd like to do.
How the JSON file looks like:
Thanks for your help !
Reorder the logic a bit:
data = json.loads(response)
for item in data["items"]:
    for p in item["standings"]:
        clan = p["clan"]
        # check tag first:
        if clan["tag"] == '#9VL9L9YQ':  # remove extraneous )
            for q in clan["participants"]:
                print("%s %s" % (q["name"], q["fame"]))
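There is a syntax error in your code; simply correct it:
Replace:
if (p["clan"] = '#9VL9L9YQ'):
With:
if (p["clan"]["tag"] == '#9VL9L9YQ'):
Note: your syntax was almost correct; you made the small and common mistake of using a single "=" (assignment) where "==" (comparison) is needed. Also compare the clan's "tag" value rather than the whole p["clan"] dictionary, since a dictionary never equals a string.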
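To make the filtering concrete, here is a small self-contained sketch. The JSON layout (items, standings, clan, tag, participants) is inferred from the code in the question, the sample names and fame values are invented, and participants_for is a hypothetical helper name.

```python
import json

# Made-up response with the structure implied by the question's code.
response = json.dumps({"items": [{"standings": [
    {"clan": {"tag": "#AAAAAAA",
              "participants": [{"name": "other", "fame": 10}]}},
    {"clan": {"tag": "#9VL9L9YQ",
              "participants": [{"name": "alice", "fame": 1200},
                               {"name": "bob", "fame": 900}]}},
]}]})


def participants_for(data, tag):
    """Yield (name, fame) for every participant of the clan with this tag."""
    for item in data["items"]:
        for standing in item["standings"]:
            clan = standing["clan"]
            if clan["tag"] == tag:  # == compares; a single = would assign
                for member in clan["participants"]:
                    yield member["name"], member["fame"]


rows = list(participants_for(json.loads(response), "#9VL9L9YQ"))
for name, fame in rows:
    print("%s %s" % (name, fame))
```

Checking the tag before looping over participants means the position of the clan inside standings no longer matters.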
I am trying to use JSON to search through the Google Maps API. When I give the location "Plymouth", the Google Maps API shows 6 results, but when I parse the JSON I get a length of only 2. I tried multiple cities too, but I always get a length of 2.
What is wrong with the code below?
import urllib.request as UR
import urllib.parse as URP
import json
url = "http://maps.googleapis.com/maps/api/geocode/json?address=Plymouth&sensor=false"
uh = UR.urlopen(url)
data = uh.read()
count = 0
js1 = json.loads(data.decode('utf-8') )
print ("Length: ", len(js1))
for result in js1:
    location = js1["results"][count]["formatted_address"]
    lat = js1["results"][count]["geometry"]["location"]["lat"]
    lng = js1["results"][count]["geometry"]["location"]["lng"]
    count = count + 1
    print ('lat',lat,'lng',lng)
    print (location)
Simply replace for result in js1: with for result in js1['results']:
By the way, as posted in a comment in the question, no need to use a counter. You can rewrite your for loop as:
for result in js1['results']:
    location = result["formatted_address"]
    lat = result["geometry"]["location"]["lat"]
    lng = result["geometry"]["location"]["lng"]
    print('lat',lat,'lng',lng)
    print(location)
If you look at the JSON that comes in, you'll see that it's a single dict with two items ("results" and "status"). Add print('result:', result) to the top of your for loop and it will print result: status and result: results, because all you are iterating over is the keys of that outer dict. That's a general debugging trick in Python: if you aren't getting the stuff you want, put in a print statement to see what you got.
The results (not surprisingly) are in a list under js1["results"]. In your for loop, you ignore the variable you are iterating over and go back to the original js1 for its data. This is unnecessary, and in your case it hid the error. Had you tried to reference cities off of result, you would have gotten an error, and it might have been easier to see that result was "status", not the array you were after.
Now a few tweaks fix the problem:
import urllib.request as UR
import urllib.parse as URP
import json
url = "http://maps.googleapis.com/maps/api/geocode/json?address=Plymouth&sensor=false"
uh = UR.urlopen(url)
data = uh.read()
count = 0
js1 = json.loads(data.decode('utf-8') )
print ("Length: ", len(js1["results"]))  # number of results, not of top-level keys
for result in js1["results"]:
    location = result["formatted_address"]
    lat = result["geometry"]["location"]["lat"]
    lng = result["geometry"]["location"]["lng"]
    count = count + 1
    print ('lat',lat,'lng',lng)
    print (location)
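The "length of 2" can be reproduced without calling the API at all. The sketch below uses a made-up dict shaped like the response described above (a "results" list plus a "status" string); the addresses and coordinates are placeholders, not real geocoding output.

```python
# Made-up stand-in for the decoded response: same shape, placeholder values.
js1 = {
    "results": [
        {"formatted_address": "Plymouth (sample A)",
         "geometry": {"location": {"lat": 1.0, "lng": 2.0}}},
        {"formatted_address": "Plymouth (sample B)",
         "geometry": {"location": {"lat": 3.0, "lng": 4.0}}},
        {"formatted_address": "Plymouth (sample C)",
         "geometry": {"location": {"lat": 5.0, "lng": 6.0}}},
    ],
    "status": "OK",
}

# Iterating a dict yields its keys, so len(js1) counts the two top-level keys.
top_level_keys = [key for key in js1]
print(len(js1), top_level_keys)

# The actual hits live in the list stored under "results".
addresses = [result["formatted_address"] for result in js1["results"]]
print(len(js1["results"]), addresses)
```

However many results the list holds, len(js1) stays at 2 because it only ever sees the outer dict.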
I'm trying to build a small script that goes through the Etsy API and retrieves certain information. The API returns 25 different listings, all in JSON, and I would appreciate it if someone could help me learn how to handle them one at a time.
Here is an example of the json I'm dealing with:
{"count":50100,"results":[{"listing_id":114179207,"state":"active"},{"listing_id":11344567,"state":"active"},
and so on.
Is there a simple way to handle only one of these listings at a time to minimize the amount of calls I must make to the API?
Here is some of the code of how I'm dealing with just one when I limit the results returned to 1:
r = requests.get('http://openapi.etsy.com/v2/listings/active?api_key=key&limit=1&offset='+str(offset_param)+'&category=Clothing')
raw_json = r.json()
encoded_json = json.dumps(raw_json)
dataObject = json.loads(encoded_json)
if dataObject["results"][0]["quantity"] > 1:
    if dataObject["results"][0]["listing_id"] not in already_done:
        already_done.append(dataObject["results"][0]["listing_id"])
        s = requests.get('http://openapi.etsy.com/v2/users/'+str(dataObject["results"][0]["user_id"])+'/profile?api_key=key')
        raw_json2 = s.json()
        encoded_json2 = json.dumps(raw_json2)
        dataObject2 = json.loads(encoded_json2)
        t = requests.get('http://openapi.etsy.com/v2/users/'+str(dataObject["results"][0]["user_id"])+'?api_key=key')
        raw_json3 = t.json()
        encoded_json3 = json.dumps(raw_json3)
        dataObject3 = json.loads(encoded_json3)
Since the results field (or key) contains a list, you can simply iterate through it like this:
# json_str stands for the parsed response (a dict); further keys and entries are omitted here
json_str = {"count": 50100, "results": [{"listing_id":114179207,"state":"active"},{"listing_id":11344567,"state":"active"}]}
results = json_str['results']  # this gives you a list of dicts
# iterate through this list
for result in results:
    if result['state'] == 'active':
        do_something_with(result['listing_id'])  # placeholder for your own function
    else:
        do_someotherthing_with(result['listing_id'])  # or nothing at all
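Combining this with the already_done bookkeeping from the question, one page of results can be processed listing by listing after a single API call. This is only a sketch: new_listings is a hypothetical helper name, the payload is the truncated sample from the question, and the third entry is a duplicate added purely to show the skip.

```python
def new_listings(payload, already_done):
    """Yield each listing in payload["results"] that has not been seen yet."""
    for listing in payload["results"]:
        listing_id = listing["listing_id"]
        if listing_id in already_done:
            continue  # handled on an earlier page or call
        already_done.add(listing_id)
        yield listing


# Truncated sample from the question, plus one duplicate for illustration.
payload = {"count": 50100, "results": [
    {"listing_id": 114179207, "state": "active"},
    {"listing_id": 11344567, "state": "active"},
    {"listing_id": 114179207, "state": "active"},
]}

already_done = set()  # a set makes the membership test cheap
fresh_ids = [listing["listing_id"] for listing in new_listings(payload, already_done)]
print(fresh_ids)
```

In the real script, payload would be r.json() from a request with a larger limit, so one call covers a whole page instead of one call per listing.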