I'm scraping Tripadvisor with Scrapy ( https://www.tripadvisor.com/Hotel_Review-g189541-d15051151-Reviews-CitizenM_Copenhagen_Radhuspladsen-Copenhagen_Zealand.html ).
One of the items I scrape is the count and radius of nearby attractions, as well as the count and radius of nearby restaurants. This information is not always present ( https://www.tripadvisor.com/Hotel_Review-g189541-d292667-Reviews-Strandmotellet_Greve-Copenhagen_Zealand.html ). If it is not present I get this error message: "IndexError: list index out of range" ( https://pastebin.com/pphM8FSM ).
I tried to write a try/except construction, without any success:
try:
    nearby_restaurants0_attractions1_distance = response.css("._1aFljvmJ::text").extract()
except IndexError:
    nearby_restaurants0_attractions1_distance = [None, None]
items["hotel_nearby_restaurants_distance"] = nearby_restaurants0_attractions1_distance[1]
items["hotel_nearby_attractions_distance"] = nearby_restaurants0_attractions1_distance[2]
Thanks a lot for your help!
List indices are zero-based, not one-based. If you are expecting a two-item list, you need to modify your last two lines to use [0] and [1] instead of [1] and [2]:
items["hotel_nearby_restaurants_distance"] = nearby_restaurants0_attractions1_distance[0]
items["hotel_nearby_attractions_distance"] = nearby_restaurants0_attractions1_distance[1]
I am not sure the IndexError was coming from the missing data, either. You might have been hitting this bug even when the data was present. You may need to catch a different exception for the case where the data is missing.
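A quick illustration of both points (a minimal sketch with made-up values): indexing past the end of a list raises IndexError, and so does any index into the empty list an unmatched selector extracts to.

pair = ["2 restaurants", "5 attractions"]  # a two-item list
print(pair[0])  # '2 restaurants' -- the first element is index 0
print(pair[1])  # '5 attractions' -- the last valid index is 1
empty = []      # what .extract() gives you when the selector matches nothing
empty[0]        # IndexError: list index out of range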
Answer for everybody who is interested:
Scrapy's .extract() does not raise when nothing matches the selector; it simply returns an empty list. So there is no IndexError at that stage.
The IndexError occurs later, when items fetches an element of the list by index, which obviously fails when the list came back empty. [The pastebin also shows, in a line above the IndexError, that the problem was with items.]
nearby_restaurants0_attractions1_distance = response.css("._1aFljvmJ::text").extract()

try:
    items["hotel_nearby_restaurants_distance"] = nearby_restaurants0_attractions1_distance[1]
except IndexError:
    items["hotel_nearby_restaurants_distance"] = None

try:
    items["hotel_nearby_attractions_distance"] = nearby_restaurants0_attractions1_distance[2]
except IndexError:
    items["hotel_nearby_attractions_distance"] = None
Related
I have a data frame with text entries, dataframe['text'], as well as a list of features to compute. Not all features work for all text entries, so I was trying to compute everything possible without manually checking which feature works for which entry. I wanted the loop to continue past the point where it errors:
with Processor('config.yaml', 'en') as doc_proc:
    try:
        for j in range(0, len(features)):
            for i in range(0, len(dataframe['text'])):
                doc = doc_proc.analyze(dataframe['text'][i], 'string')
                result = (doc.compute_features([features[j]]))
                dataframe.loc[dataframe.index[i], [features[j]]] = list(result.values())
    except:
        continue
but I got SyntaxError: unexpected EOF while parsing. The loop works without the try, so I understand the try is the cause, but I can't seem to find the correct way to fix the syntax.
Put the try/except inside the loop. Then it will resume with the next iteration.
with Processor('config.yaml', 'en') as doc_proc:
    for feature in features:
        for i in range(len(dataframe['text'])):
            try:
                doc = doc_proc.analyze(dataframe['text'][i], 'string')
                result = doc.compute_features([feature])
                dataframe.loc[dataframe.index[i], [feature]] = list(result.values())
            except Exception:
                pass
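If you would rather see which rows failed instead of skipping them silently, the same loop can log each exception (a sketch, using the same hypothetical Processor API as above):

import logging

with Processor('config.yaml', 'en') as doc_proc:
    for feature in features:
        for i in range(len(dataframe['text'])):
            try:
                doc = doc_proc.analyze(dataframe['text'][i], 'string')
                result = doc.compute_features([feature])
                dataframe.loc[dataframe.index[i], [feature]] = list(result.values())
            except Exception as exc:
                # record the failure instead of discarding it silently
                logging.warning("feature %r failed on row %d: %s", feature, i, exc)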
I'm trying to scrape more than 500 posts with the Reddit API, without PRAW. However, since I'm only allowed 100 posts at a time, I'm saving the scraped objects in a list called subreddit_content and will keep scraping until there are 500 posts in subreddit_content.
The code below gives me NameError: name 'subreddit_content_more' is not defined. If I instantiate subreddit_data_more = None before the while loop, I get TypeError: 'NoneType' object is not subscriptable. I've tried the same thing with a for loop but get the same results.
EDIT: updated the code, the while loop now uses subreddit_data instead of subreddit_data_more, but now I'm getting TypeError: 'Response' object is not subscriptable despite converting subreddit_data to JSON.
subreddit_data = requests.get(f'https://api.reddit.com/r/{subreddit}/hot?limit=100', headers={'User-Agent': 'windows:requests (by /u/xxx)'})
subreddit_content = subreddit_data.json()['data']['children']
lastline_json = subreddit_content[-1]['data']['name']

while len(subreddit_content) < 500:
    subreddit_data = requests.get(f'https://api.reddit.com/r/{subreddit}/hot?limit=100&after={lastline_json}', headers={'User-Agent': 'windows:requests (by /u/xxx)'})
    subreddit_content = subreddit_content.append(subreddit_data.json()['data']['children'])
    lastline_json = subreddit_data[-1]['data']['name']
    time.sleep(2.5)
EDIT2: Using .extend instead of .append and removing the variable assignment in the loop did the trick. This is the snippet of working code (variables also renamed for readability, courtesy of Wups):
data = requests.get(f'https://api.reddit.com/r/{subreddit}/hot?limit=100', headers={'User-Agent': 'windows:requests (by /u/xxx)'})
content_list = data.json()['data']['children']
lastline_name = content_list[-1]['data']['name']

while len(content_list) < 500:
    data = requests.get(f'https://api.reddit.com/r/{subreddit}/hot?limit=100&after={lastline_name}', headers={'User-Agent': 'windows:requests (by /u/xxx)'})
    content_list.extend(data.json()['data']['children'])
    lastline_name = content_list[-1]['data']['name']
    time.sleep(2)
You want to just add one list to another list, but you're doing it wrong. One way to do that is:
the_next_hundred_records = subreddit_data.json()['data']['children']
subreddit_content.extend(the_next_hundred_records)
Compare append and extend at https://docs.python.org/3/tutorial/datastructures.html.
What you did with append was add the full list of the next 100 records as a single nested sub-list at position 101. Then, because list.append returns None, you set subreddit_content = None.
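A quick demonstration of the difference with toy lists:

a = [1, 2]
a.append([3, 4])         # nests the whole list as a single element
print(a)                 # [1, 2, [3, 4]]

b = [1, 2]
b.extend([3, 4])         # adds the elements one by one
print(b)                 # [1, 2, 3, 4]

print(b.extend([5, 6]))  # None -- list methods mutate in place and return None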
Let's try some smaller numbers so you can see what's going on in the debugger. Here is your code, super simplified: instead of doing requests to get a list from the subreddit, I just make a small list. Same thing, really. And I used multiples of ten instead of 100.
def do_query(start):
    return list(range(start, start + 10))

# content is initialized to a list by the first query
content = do_query(0)

while len(content) < 50:
    next_number = len(content)
    # there are a few valid ways to add to a list. Here's one.
    content.extend(do_query(next_number))

for x in content:
    print(x)
It would be better to use a generator, but maybe that's a later topic. Also, you might have problems if the subreddit actually has fewer than 500 records.
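For the curious, a generator version might look like this (a sketch, not tested against the live API; the function name, the 2-second delay, and the empty-page check are my own choices). It also stops cleanly when the subreddit has fewer posts than you asked for:

import time
from itertools import islice

import requests

def fetch_posts(subreddit, user_agent, batch=100):
    """Yield posts one at a time, paging through the listing endpoint."""
    after = None
    while True:
        url = f'https://api.reddit.com/r/{subreddit}/hot?limit={batch}'
        if after:
            url += f'&after={after}'
        data = requests.get(url, headers={'User-Agent': user_agent}).json()
        children = data['data']['children']
        if not children:
            return  # the subreddit has run out of posts
        yield from children
        after = children[-1]['data']['name']
        time.sleep(2)  # stay under the rate limit

# take at most 500 posts, gracefully fewer if the subreddit is small
posts = list(islice(fetch_posts('python', 'windows:requests (by /u/xxx)'), 500))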
This program uses a website's API to scrape the latest sale. It works fine for products that have recent sales, but not for ones that don't have a sale within the last day.
In that case the array is [], and of course I get IndexError: list index out of range.
Here is my code:
import requests

cybersole_url = 'https://www.botbroker.io/bots/6/chart?key_type=lifetime&days=1'
response = requests.get(cybersole_url)
response.raise_for_status()

if response.json()[0][1] == None:
    cyber = "No recent sales."
else:
    cyber = "$" + str(response.json()[0][1])
How can I work around this error and get one of the two results in my if statement? I believe I tried try and except, but it only executed the except branch, even when the array had objects in it.
import requests

cybersole_url = 'https://www.botbroker.io/bots/6/chart?key_type=lifetime&days=1'
response = requests.get(cybersole_url)
response.raise_for_status()

# Try to index the result, otherwise set result=None
try:
    result = response.json()[0][1]
except IndexError:
    result = None

cyber = 'No recent sales.' if not result else f'${result}'
Note you might want to add another layer of try-catching since you not only want to grab the element at [0], but also the element at [0][1] – there are two layers of indexing here.
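If distinguishing the two failure modes matters, the indexing can be split into two guarded steps (a sketch; the intermediate variable names are my own):

data = response.json()
try:
    latest = data[0]        # fails when there are no sales at all
except IndexError:
    result = None
else:
    try:
        result = latest[1]  # fails when the entry is shorter than expected
    except IndexError:
        result = None

cyber = 'No recent sales.' if not result else f'${result}'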
I'm trying to check if an object has a skinCluster on it. My code is pretty basic. Here's an example:
import maya.cmds as cmds

cmds.select(d=True)
joint = cmds.joint()
skinnedSphere = cmds.polySphere(r=2)
notSkinnedSphere = cmds.polySphere(r=2)
skinTestList = [skinnedSphere, notSkinnedSphere]

# Bind the joint chain that contains joint1 to pPlane1
# and assign a dropoff of 4.5 to all the joints
cmds.skinCluster(joint, skinnedSphere, dr=4.5)

for obj in skinTestList:
    objHist = cmds.listHistory(obj, pdo=True)
    skinCluster = cmds.ls(objHist, type="skinCluster")
    if skinCluster == "":
        print(obj + " has NO skinCluster, skipping.")
    else:
        print(obj, skinCluster)
        #cmds.select(obj, d=True)
My issue is that even when it can't find a skinCluster, it still prints "obj, skincluster" rather than the message that no skinCluster was found.
I thought the skinCluster query returned a string, so if the string is empty it should print the error rather than "obj, skincluster".
Any help would be appreciated!
This is a classic Maya issue -- the problem is that Maya frequently wants to give you lists, not single items, even when you know the result ought to be a single item. This means you end up writing a bunch of code to either get one item from a one-item list or to avoid errors that come from trying to get an index into an empty list.
You've got the basics, it's the == "" which is messing you up:
for obj in skinTestList:
    objHist = cmds.listHistory(obj, pdo=True)
    skinCluster = cmds.ls(objHist, type="skinCluster") or [None]
    cluster = skinCluster[0]
    print(obj, cluster)
The or [None] guarantees that you'll always get a list with something in it, so it's safe to use [0] to get the single value. None is a good return value here because (as pointed out in the comments) you can write if cluster: and skip empty values.
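Putting that together with the skip logic from the question, the loop might look like this (a sketch, reusing the names from the snippets above):

for obj in skinTestList:
    objHist = cmds.listHistory(obj, pdo=True)
    # ls() returns [] when nothing matches; "or [None]" makes [0] always safe
    cluster = (cmds.ls(objHist, type="skinCluster") or [None])[0]
    if cluster:
        print(obj, cluster)
    else:
        print(f"{obj} has NO skinCluster, skipping.")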
I have the following piece of code, which calculates the number of page views for each user and adds that as a new column in the DataFrame:
def get_user_pageview(row, pageview_spiker_indexed):
    user_id = row['user_id']
    num_pageview = len(pageview_spiker_indexed.ix[user_id])
    # print(num_pageview)
    return num_pageview

userDF['num_pageview'] = userDF.apply(get_user_pageview, args=(pageview_spiker_indexed,), axis=1)
The issue is that I get the following error:
KeyError: ('a342bf9', u'occurred at index 188')
I have done a bit of debugging with print statements inside the function, and what is probably going wrong is that the function works fine until it reaches index 188. I cannot seem to fix the problem. Any clues on how I can fix this issue?
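For completeness, a guarded version of the function (a sketch, assuming the KeyError comes from a user_id that is missing from pageview_spiker_indexed; returning 0 for such users is my own choice):

def get_user_pageview(row, pageview_spiker_indexed):
    user_id = row['user_id']
    try:
        return len(pageview_spiker_indexed.ix[user_id])
    except KeyError:
        # this user_id has no pageview rows at all
        return 0

userDF['num_pageview'] = userDF.apply(get_user_pageview, args=(pageview_spiker_indexed,), axis=1)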