This program uses a website's API to scrape the latest sale. It works fine for products that have recent sales, but not for ones that don't have a sale within the last day.
In that case the array is [] and, as expected, I get IndexError: list index out of range.
Here is my code:
import requests
cybersole_url = 'https://www.botbroker.io/bots/6/chart?key_type=lifetime&days=1'
response = requests.get(cybersole_url)
response.raise_for_status()
if (response.json()[0][1] == None):
    cyber = "No recent sales."
else:
    cyber = "$" + str(response.json()[0][1])
How can I work around this error to get one of the two results in my if statement? I tried try and except, but it only ever ran the except branch, even when the array had objects in it.
import requests
cybersole_url = 'https://www.botbroker.io/bots/6/chart?key_type=lifetime&days=1'
response = requests.get(cybersole_url)
response.raise_for_status()
# Try to index the result, otherwise set result to None
try:
    result = response.json()[0][1]
except IndexError:
    result = None

cyber = 'No recent sales.' if not result else f'${result}'
Note that you might want to handle both layers of indexing: you are not only grabbing the element at [0], but also the element at [0][1].
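For example, a single except clause already covers a failure at either layer, and catching TypeError as well guards against the case where the first element isn't an indexable sequence (a minimal sketch, using the same endpoint as above):
data = response.json()
try:
    result = data[0][1]
except (IndexError, TypeError):
    # Empty outer list, missing inner element, or non-indexable first element
    result = None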
Hello fellows. As you can see in my code below, I got the minimum price value, but the problem is I can't get the rest of the data linked with the minimum price, such as ingame_name and status, especially status, for the minimum-price order.
For example, take this item URL:
https://api.warframe.market/v1/items/mirage_prime_systems/orders?include=item
As a result we get a lot of open orders from that JSON link. What I need to do here is get the minimum sell price from an online player, with all the basic info for that user.
Here is my code:
import requests
import json
item = input('put your item here please : ')
gg = item.replace(' ', '_')
response_url = requests.get('https://api.warframe.market/v1/items/' + gg + '/orders?include=item')
data = json.loads(response_url.text)
orders = data["payload"]["orders"]
min_price = min(o["platinum"] for o in orders if o["order_type"] == "sell")
print(min_price)
It seems I can't do it without your help, guys. I really appreciate it.
You can use the built-in min() with a custom key. As you need to apply some filtering to the initial data, there are two possible ways:
Filter the data before searching for the minimum (one possible method is shown in this answer);
Exploit the lexicographical comparison which Python uses to compare sequences (docs).
I find the second method better, as it makes it possible to iterate over all orders just once (which is more efficient).
To implement this, the lambda we pass to min() should return not just the "platinum" value, but also the boolean results of the inverted conditions.
Why inverted? Because we are searching for the minimum value, every value returned by our lambda is compared with the previous minimum. In Python, False equals 0 and True equals 1 (docs). Since 0 < 1, if any of the inverted conditions is True, the whole key compares greater, regardless of the "platinum" value.
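A quick sanity check of that tuple ordering (illustrative values only):
# An order matching both conditions (both False) always beats a
# non-matching one, no matter the price:
print((False, False, 30) < (True, False, 5))    # True
print((False, True, 30) < (False, False, 500))  # False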
Code:
import requests
from json import JSONDecodeError
item = input("Put your item here please: ")
try:
    response = requests.get("https://api.warframe.market/v1/items/" +
                            item.lower().replace(" ", "_") + "/orders",
                            params={"include": "item"})
    json_data = response.json()
    orders = json_data["payload"]["orders"]
except requests.RequestException as e:
    print("An error occurred during processing request:", e)
except JSONDecodeError:
    print("Server response is not valid json:", response.text)
except KeyError as e:
    print("Response doesn't contain", e, "key")
except Exception as e:
    print("Unexpected error:", e)
else:
    min_price_order = min(orders, key=lambda x: (x["order_type"] != "sell",
                                                 x["user"]["status"] != "ingame",
                                                 x["platinum"]))
    print(min_price_order)
Filter the list by order type and find the min using a lambda:
orders = [{'order_type':'sell','other':88,'platinum':4},
{'order_type':'sell','other':77,'platinum':9},
{'order_type':'buy','other':88,'platinum':4}]
min_price = min([o for o in orders if o["order_type"] == "sell"], key=lambda x: x['platinum'])
print(min_price)
Output:
{'order_type': 'sell', 'other': 88, 'platinum': 4}
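One caveat: min() raises ValueError if the filtered list is empty, i.e. if there are no sell orders at all. Since Python 3.4 you can pass default= to cover that case, roughly like this:
sell_orders = [o for o in orders if o["order_type"] == "sell"]
# default=None is returned instead of raising ValueError on an empty list
min_price = min(sell_orders, key=lambda x: x['platinum'], default=None)
print(min_price if min_price is not None else 'No sell orders found.')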
I am trying to scrape sales data from eBay with BeautifulSoup in Python for recently sold items and it works very well with the following code which finds all prices and all dates from sold items.
price = []
try:
    p = soup.find_all('span', class_='POSITIVE')
except:
    p = 'nan'

for x in p:
    x = str(x)
    x = x.replace(' ', '"')
    x = x.split('"')
    if '>Sold' in x:
        continue
    else:
        price.append(x)
Now I am running into a problem, though. As seen on the results page for this URL (https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2334524.m570.l1313&_nkw=babe+ruth+1933+goudey+149+psa+%281.5%29&_sacat=0&LH_TitleDesc=0&_osacat=0&_odkw=babe+ruth+1933+goudey+149+psa+1.5&LH_Complete=1&rt=nc&LH_Sold=1), eBay sometimes suggests other search results if there are not enough for a specific search query.
Because of that, my code not only finds the correct prices but also those of the suggested results below the warning. I tried to find out where the warning message is located so I could delete every listing found after it, but I cannot figure it out. I also thought I could search for the prices one by one, but even then I cannot figure out how to notice when the warning appears.
Is there any other way you guys can think of to solve this?
I am aware that this is really specific
You can scrape the number of results shown on the page and make a loop over the range of the results.
The code will be something like:
results = soup.find...  # locate the element that shows the result count
# You have to make the variable an int, so strip out everything extra
results = int(results)
for i in range(1, results):
    price[i] = str(price[i])
    price[i] = price[i].replace(' ', '"')
    price[i] = price[i].split('"')
    if '>Sold' in price[i]:
        continue
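For the count itself, something along these lines may work; the class name here is a guess (eBay's markup changes often), so inspect the live page first:
# Hypothetical selector: verify the real class name in the page source
count_el = soup.find('h1', class_='srp-controls__count-heading')
results = int(''.join(ch for ch in count_el.get_text() if ch.isdigit()))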
I'm trying to scrape more than 500 posts with the Reddit API, without PRAW. However, since I'm only allowed 100 posts at a time, I'm saving the scraped objects in an array called subreddit_content and will keep scraping until there are 500 posts in subreddit_content.
The code below gives me NameError: name 'subreddit_content_more' is not defined. If I instantiate subreddit_data_more = None before the while loop, I get TypeError: 'NoneType' object is not subscriptable. I've tried the same thing with a for loop but get the same results.
EDIT: updated code, the while loop now uses subreddit_data instead of subreddit_data_more, but now I'm getting TypeError: 'Response' object is not subscriptable despite converting subreddit_data to json.
subreddit_data = requests.get(f'https://api.reddit.com/r/{subreddit}/hot?limit=100', headers={'User-Agent': 'windows:requests (by /u/xxx)'})
subreddit_content = subreddit_data.json()['data']['children']
lastline_json = subreddit_content[-1]['data']['name']
while (len(subreddit_content) < 500):
    subreddit_data = requests.get(f'https://api.reddit.com/r/{subreddit}/hot?limit=100&after={lastline_json}', headers={'User-Agent': 'windows:requests (by /u/xxx)'})
    subreddit_content = subreddit_content.append(subreddit_data.json()['data']['children'])
    lastline_json = subreddit_data[-1]['data']['name']
    time.sleep(2.5)
EDIT2: using .extend instead of .append and removing the variable assignment in the loop did the trick. This is the snippet of working code (I also renamed my variables for readability, courtesy of Wups):
data = requests.get(f'https://api.reddit.com/r/{subreddit}/hot?limit=100', headers={'User-Agent': 'windows:requests (by /u/xxx)'})
content_list = data.json()['data']['children']
lastline_name = content_list[-1]['data']['name']
while (len(content_list) < 500):
    data = requests.get(f'https://api.reddit.com/r/{subreddit}/hot?limit=100&after={lastline_name}', headers={'User-Agent': 'windows:requests (by /u/xxx)'})
    content_list.extend(data.json()['data']['children'])
    lastline_name = content_list[-1]['data']['name']
    time.sleep(2)
You want to just add one list to another list, but you're doing it wrong. One way to do that is:
the_next_hundred_records = subreddit_data.json()['data']['children']
subreddit_content.extend(the_next_hundred_records)
Compare append and extend at https://docs.python.org/3/tutorial/datastructures.html.
What you did with append was add the full list of the next 100 as a single sub-list at position 101. Then, because list.append returns None, you set subreddit_content = None.
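To see the difference concretely (a minimal illustration):
a = [1, 2]
a.append([3, 4])      # a is now [1, 2, [3, 4]]: the whole list became one nested element
print(a.append([5]))  # prints None, because append mutates in place and returns None
b = [1, 2]
b.extend([3, 4])      # b is now [1, 2, 3, 4]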
Let's try some smaller numbers so you can see what's going on in the debugger. Here is your code, super simplified, except instead of doing requests to get a list from the subreddit, I just made a small list. Same thing, really. And I used multiples of ten instead of 100.
def do_query(start):
    return list(range(start, start + 10))

# content is initialized to a list by the first query
content = do_query(0)

while len(content) < 50:
    next_number = len(content)
    # there are a few valid ways to add to a list. Here's one.
    content.extend(do_query(next_number))

for x in content:
    print(x)
It would be better to use a generator, but maybe that's a later topic. Also, you might have problems if the subreddit actually has fewer than 500 records.
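For what it's worth, a generator version could look roughly like this (a sketch only, reusing the toy do_query helper from above):
def query_all(target=50, batch=10):
    # Fetch one batch at a time and hand the records out one by one
    fetched = 0
    while fetched < target:
        for record in do_query(fetched):
            yield record
        fetched += batch

for x in query_all():
    print(x)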
I'm scraping Tripadvisor with Scrapy (https://www.tripadvisor.com/Hotel_Review-g189541-d15051151-Reviews-CitizenM_Copenhagen_Radhuspladsen-Copenhagen_Zealand.html).
Two of the items I scrape are the count and radius of nearby attractions and the count and radius of nearby restaurants. This information is not always present (https://www.tripadvisor.com/Hotel_Review-g189541-d292667-Reviews-Strandmotellet_Greve-Copenhagen_Zealand.html). If it is not present, I get this error message: "IndexError: list index out of range" (https://pastebin.com/pphM8FSM).
I tried to write a try/except construction, without any success:
try:
    nearby_restaurants0_attractions1_distance = response.css("._1aFljvmJ::text").extract()
except IndexError:
    nearby_restaurants0_attractions1_distance = [None, None]

items["hotel_nearby_restaurants_distance"] = nearby_restaurants0_attractions1_distance[1]
items["hotel_nearby_attractions_distance"] = nearby_restaurants0_attractions1_distance[2]
Thanks a lot for your help!
List indices are zero-based, not one-based. If you are expecting a two-item list, you need to modify your last two lines to use [0] and [1] instead of [1] and [2]:
items["hotel_nearby_restaurants_distance"] = nearby_restaurants0_attractions1_distance[0]
items["hotel_nearby_attractions_distance"] = nearby_restaurants0_attractions1_distance[1]
I am not sure the IndexError was actually coming from the missing data, either. It might have just been hitting this bug even when the data was present. You may need to catch a different exception if the data is missing.
Answer for everybody who is interested:
Scrapy searches for items matching the selector, but if nothing can be found, extract() returns an empty list. So there is no error at that stage.
The IndexError occurs later, when the code fetches an element of the list that is obviously not present because Scrapy found nothing. [The pastebin also shows, in a line above the IndexError, that the problem was with items.]
nearby_restaurants0_attractions1_distance = response.css("._1aFljvmJ::text").extract()

try:
    items["hotel_nearby_restaurants_distance"] = nearby_restaurants0_attractions1_distance[1]
except IndexError:
    items["hotel_nearby_restaurants_distance"] = None

try:
    items["hotel_nearby_attractions_distance"] = nearby_restaurants0_attractions1_distance[2]
except IndexError:
    items["hotel_nearby_attractions_distance"] = None
I keep getting an error when I am using an if/else statement in Python. I want my script to check whether an index exists: if it does, run one piece of code; if not, run another. I get the error ValueError: 'Named Administrator' is not in list.
import requests
from bs4 import BeautifulSoup
url_3 = 'https://www.brightscope.com/form-5500/basic-info/107299/Orthopedic-Institute-Of-Pennsylvania/15801790/Orthopedic-Institute-Of-Pennsylvania-401k-Profit-Sharing-Plan/'
page = requests.get(url_3)
soup = BeautifulSoup(page.text, 'html.parser')
divs = [e.get_text() for e in soup.findAll('span')]
if divs.index('Named Administrator'):
    index = divs.index('Named Administrator')
    contact = divs[index + 1]
else:
    contact = '-'
Rather than calling index, do a membership (__contains__) test:
if 'Named Administrator' in divs:
and move forward only if 'Named Administrator' actually exists in the divs list, so you won't get the ValueError.
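Applied to your snippet, that is the same logic with just the condition swapped:
if 'Named Administrator' in divs:
    index = divs.index('Named Administrator')
    contact = divs[index + 1]
else:
    contact = '-'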
Another consideration is that a membership test on a list has O(N) time complexity, so if you are doing this for a large list, you should probably use a set instead:
{e.get_text() for e in soup.findAll('span')}
but as sets are unordered, you won't be able to use indexing.
So either think of something else that would work on a set as well, i.e. without needing to get the next value by indexing.
Or you can use a set for the membership test and the list for getting the next value. The cost here might be higher or lower depending on your actual context, and you can only find that out by profiling:
divs_list = [e.get_text() for e in soup.findAll('span')]
divs_set = set(divs_list)

if 'Named Administrator' in divs_set:
    index = divs_list.index('Named Administrator')
    contact = divs_list[index + 1]
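If you do this lookup many times, another option is to build a dict once so every lookup afterwards is O(1). This is only a sketch, and it assumes each label span is immediately followed by its value span, as on this page (duplicate labels would collide):
# Pair each span text with the text that follows it
pairs = dict(zip(divs_list, divs_list[1:]))
contact = pairs.get('Named Administrator', '-')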