do variables need to be instantiated before while loop python - python

I'm trying to scrape more than 500 posts with the Reddit API, without PRAW. Since I'm only allowed 100 posts per request, I save the scraped objects in a list called subreddit_content and keep requesting until it holds 500 posts.
The code below gives me NameError: name 'subreddit_content_more' is not defined. If I instantiate subreddit_data_more = None before the while loop, I get TypeError: 'NoneType' object is not subscriptable. I've tried the same thing with a for loop but get the same results.
EDIT: updated code. The while loop now uses subreddit_data instead of subreddit_data_more, but I now get TypeError: 'Response' object is not subscriptable despite converting subreddit_data to JSON.
subreddit_data = requests.get(f'https://api.reddit.com/r/{subreddit}/hot?limit=100', headers={'User-Agent': 'windows:requests (by /u/xxx)'})
subreddit_content = subreddit_data.json()['data']['children']
lastline_json = subreddit_content[-1]['data']['name']
while (len(subreddit_content) < 500):
    subreddit_data = requests.get(f'https://api.reddit.com/r/{subreddit}/hot?limit=100&after={lastline_json}', headers={'User-Agent': 'windows:requests (by /u/xxx)'})
    subreddit_content = subreddit_content.append(subreddit_data.json()['data']['children'])
    lastline_json = subreddit_data[-1]['data']['name']
    time.sleep(2.5)
EDIT2: Using .extend instead of .append and removing the variable assignment in the loop did the trick. This is the working snippet (variables also renamed for readability, courtesy of Wups):
data = requests.get(f'https://api.reddit.com/r/{subreddit}/hot?limit=100', headers={'User-Agent': 'windows:requests (by /u/xxx)'})
content_list = data.json()['data']['children']
lastline_name = content_list[-1]['data']['name']
while (len(content_list) < 500):
    data = requests.get(f'https://api.reddit.com/r/{subreddit}/hot?limit=100&after={lastline_name}', headers={'User-Agent': 'windows:requests (by /u/xxx)'})
    content_list.extend(data.json()['data']['children'])
    lastline_name = content_list[-1]['data']['name']
    time.sleep(2)

You want to add the items of one list to another list, but append does not do that. One way to do it is:
the_next_hundred_records = subreddit_data.json()['data']['children']
subreddit_content.extend(the_next_hundred_records)
Compare append and extend at https://docs.python.org/3/tutorial/datastructures.html
What you did with append was add the full list of the next 100 as a single sub-list at position 101. Then, because list.append returns None, the assignment set subreddit_content to None.
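The difference is easy to see with small lists. This is only an illustrative sketch; first_page and next_page are made-up stand-ins for two batches of results, not real Reddit data.

```python
# Stand-in data: two "pages" of results (made-up values).
first_page = [1, 2, 3]
next_page = [4, 5, 6]

appended = list(first_page)
result = appended.append(next_page)  # nests the whole list as ONE element
print(appended)  # [1, 2, 3, [4, 5, 6]]
print(result)    # None, because append modifies the list in place

extended = list(first_page)
extended.extend(next_page)           # adds each element individually
print(extended)  # [1, 2, 3, 4, 5, 6]
```

Both methods change the list in place and return None, so neither result should be assigned back to the list variable.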
Let's try some smaller numbers so you can see what's going on in the debugger. Here is your code, super simplified, except instead of doing requests to get a list from subreddit, I just made a small list. Same thing, really. And I used multiples of ten instead of 100.
def do_query(start):
    return list(range(start, start+10))

# content is initialized to a list by the first query
content = do_query(0)
while len(content) < 50:
    next_number = len(content)
    # there are a few valid ways to add to a list. Here's one.
    content.extend(do_query(next_number))
for x in content:
    print(x)
It would be better to use a generator, but maybe that's a later topic. Also, you might have problems if the subreddit actually has fewer than 500 records, since the list would then never reach the length the loop is waiting for.
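A rough sketch of the generator idea, which also stops cleanly when the source runs out of records. fetch_page is a hypothetical stand-in for one API request, not a real Reddit or requests call: it simulates a source with only 25 records served 10 at a time, and it assumes the API hands back a cursor that is empty on the last page.

```python
from itertools import islice

def fetch_page(after):
    """Hypothetical stand-in for one API request.

    Returns (items, next_cursor); next_cursor is None when nothing is left.
    Simulates a source holding only 25 records, served 10 at a time.
    """
    start = after or 0
    items = list(range(start, min(start + 10, 25)))
    next_cursor = start + 10 if start + 10 < 25 else None
    return items, next_cursor

def iter_records():
    """Yield records one by one, stopping when the source runs dry."""
    after = None
    while True:
        items, after = fetch_page(after)
        yield from items
        if after is None or not items:
            break

# Ask for up to 50 records; only 25 exist, so the loop ends instead of spinning.
content = list(islice(iter_records(), 50))
print(len(content))  # 25
```

With a real API, the time.sleep between requests would go inside iter_records, right after each fetch.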

Related

Python : correct way to formulate try-except

I'm scraping Tripadvisor with Scrapy (https://www.tripadvisor.com/Hotel_Review-g189541-d15051151-Reviews-CitizenM_Copenhagen_Radhuspladsen-Copenhagen_Zealand.html).
One of the items I scrape is the count and radius of nearby attractions, as well as the count and radius of nearby restaurants. This information is not always present (https://www.tripadvisor.com/Hotel_Review-g189541-d292667-Reviews-Strandmotellet_Greve-Copenhagen_Zealand.html). If it is not present, I get this error message: "IndexError: list index out of range" (https://pastebin.com/pphM8FSM).
I tried to write a try-except construction without any success:
try:
    nearby_restaurants0_attractions1_distance = response.css("._1aFljvmJ::text").extract()
except IndexError:
    nearby_restaurants0_attractions1_distance = [None,None]
items["hotel_nearby_restaurants_distance"] = nearby_restaurants0_attractions1_distance[1]
items["hotel_nearby_attractions_distance"] = nearby_restaurants0_attractions1_distance[2]
Thanks a lot for your help!
List indices are zero-based, not one-based. If you are expecting a two-item list, you need to modify your last two lines to use [0] and [1] instead of [1] and [2]:
items["hotel_nearby_restaurants_distance"] = nearby_restaurants0_attractions1_distance[0]
items["hotel_nearby_attractions_distance"] = nearby_restaurants0_attractions1_distance[1]
I am not sure the IndexError was coming from when the data was missing, either. It might have just been hitting this bug even when the data was present. You may need to catch a different exception if the data is missing.
Answer for everybody who is interested:
Scrapy runs the selector, and if nothing matches, .extract() returns an empty list instead of raising. So there is no IndexError at that stage.
The IndexError occurs later, when items is assigned an element of the list that does not exist because the list is empty. [The pastebin also shows, in the line above the IndexError, that the problem was with items.]
nearby_restaurants0_attractions1_distance = response.css("._1aFljvmJ::text").extract()
try:
    items["hotel_nearby_restaurants_distance"] = nearby_restaurants0_attractions1_distance[1]
except IndexError:
    items["hotel_nearby_restaurants_distance"] = None
try:
    items["hotel_nearby_attractions_distance"] = nearby_restaurants0_attractions1_distance[2]
except IndexError:
    items["hotel_nearby_attractions_distance"] = None
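The two try/except blocks repeat the same pattern, so one option is to fold it into a small helper. This is only a sketch: safe_get is a hypothetical name, and scraped is a stand-in for what the selector produced on a page without the data (assumed here to be an empty list).

```python
def safe_get(seq, index, default=None):
    """Return seq[index], or default when that index does not exist."""
    try:
        return seq[index]
    except IndexError:
        return default

# Stand-in for the extracted list on a page where the data is missing.
scraped = []
items = {}
items["hotel_nearby_restaurants_distance"] = safe_get(scraped, 1)
items["hotel_nearby_attractions_distance"] = safe_get(scraped, 2)
print(items)
```

Catching only IndexError keeps unrelated errors visible instead of silently turning them into None.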

Single remove clause in while loop is removing two elements

I am writing a simple secret santa script that selects a "GiftReceiver" and a "GiftGiver" from a list. Two lists and an empty dataframe to be populated are produced:
import pandas as pd
import random
santaslist_receivers = ['Rudolf',
                        'Blitzen',
                        'Prancer',
                        'Dasher',
                        'Vixen',
                        'Comet'
                        ]
santaslist_givers = santaslist_receivers
finalDataFrame = pd.DataFrame(columns = ['GiftGiver','GiftReceiver'])
I then have a while loop that selects random elements from each list to pick a gift giver and receiver, then remove from the respective list:
while len(santaslist_receivers) > 0:
    print(len(santaslist_receivers))  # Used for testing.
    gift_receiver = random.choice(santaslist_receivers)
    santaslist_receivers.remove(gift_receiver)
    print(len(santaslist_receivers))  # Used for testing.
    gift_giver = random.choice(santaslist_givers)
    while gift_giver == gift_receiver:  # While loop ensures that gift_giver != gift_receiver
        gift_giver = random.choice(santaslist_givers)
    santaslist_givers.remove(gift_giver)
    dummyDF = pd.DataFrame({'GiftGiver':gift_giver,'GiftReceiver':gift_receiver}, index = [0])
    finalDataFrame = finalDataFrame.append(dummyDF)
The final dataframe only contains three elements instead of six:
print(finalDataFrame)
returns
  GiftGiver GiftReceiver
0    Dasher      Prancer
0     Comet        Vixen
0    Rudolf      Blitzen
I have inserted two print lines within the while loop to investigate. These print the length of the list santaslist_receivers before and after the removal of an element. The expected return is to see original list length on the first print, then minus 1 on the second print, then the same length again on the first print of the next iteration of the while loop, then so on. Specifically I expect:
6,5,5,4,4,3,3... and so on.
What is returned is
6,5,4,3,2,1
Which is consistent with the DataFrame having only 3 rows, but I do not see the cause of this.
What is the error in my code or my approach?
You can solve it by simply changing this line
santaslist_givers = santaslist_receivers
to
santaslist_givers = list(santaslist_receivers)
Python variables are essentially references, so santaslist_givers and santaslist_receivers both referred to the same list object in memory in your implementation. To make them independent, create a copy, for example with the list function.
For some extra information, see copy.deepcopy, which also copies nested objects.
You should make an explicit copy of your list here:
santaslist_givers = santaslist_receivers
There are multiple options for doing this, as explained in this question.
In this case I would recommend (if you have Python >= 3.3):
santaslist_givers = santaslist_receivers.copy()
If you are on an older version of Python, the typical way to do it is:
santaslist_givers = santaslist_receivers[:]
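A short sketch of the difference between a second name for the same list and a real copy. The three-name list is just a shortened stand-in for the one in the question.

```python
receivers = ['Rudolf', 'Blitzen', 'Prancer']

alias = receivers          # same list object under a second name
copied = receivers.copy()  # new list holding the same elements

receivers.remove('Rudolf')

print(alias is receivers)  # True
print(alias)               # ['Blitzen', 'Prancer']
print(copied)              # ['Rudolf', 'Blitzen', 'Prancer']
```

This is why each pass through the loop in the question shrank "both" lists: there was only one list, and two removals hit it per iteration.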

Python Stop a For-Loop at a special number?

I want to stop my for-loop at a certain point. I know about range(), but it doesn't help me because I am iterating over a list. Either it doesn't work with range() or I just don't know how.
I store this variable globally:
productAmount = 4
This is my method. Everything works fine. I had to remove some code; hopefully you can still follow it.
def amazonChecker(keyword):
    driver = webdriver.Chrome('./driver/chromedriver.exe')
    driver.get(url)
    titels = driver.find_elements_by_tag_name('h2')
    for titel in titels:
        counter =+ 1
        if counter < productAmount:
            print(titel.text)
    sleep(5)
    driver.close
Best regards
KaanDev
driver.find_elements_by_tag_name() returns a list of WebElements. You can use list slicing to make a copy of this list containing only the subset of items you specify.
For example... to print the text from only the first 4 h2 elements:
titles = driver.find_elements_by_tag_name('h2')
for title in titles[:4]:
print(title.text)
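Two other ways to stop after a fixed number of items, sketched with plain strings standing in for the h2 elements (the values and the name product_amount are made up for illustration). As a side note, counter =+ 1 in the question assigns +1 to counter on every pass; incrementing is written counter += 1.

```python
from itertools import islice

# Stand-in for the text of the h2 elements (made-up values).
titles = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']
product_amount = 4

# Option 1: islice works on any iterable, not only on lists.
first_four = list(islice(titles, product_amount))

# Option 2: count with enumerate and break out of the loop early.
seen = []
for counter, title in enumerate(titles, start=1):
    if counter > product_amount:
        break
    seen.append(title)

print(first_four)  # ['first', 'second', 'third', 'fourth']
print(seen)        # ['first', 'second', 'third', 'fourth']
```

Slicing is the shortest option for a list; break is useful when the stopping condition depends on more than a count.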

Maya Python skinCluster return type not string?

I'm trying to check if an object has a skinCluster on it. My code is pretty basic. Here's an example:
cmds.select(d=True)
joint = cmds.joint()
skinnedSphere = cmds.polySphere(r=2)
notSkinnedSphere = cmds.polySphere(r=2)
skinTestList = [skinnedSphere, notSkinnedSphere]
# Bind the joint chain that contains joint1 to pPlane1
# and assign a dropoff of 4.5 to all the joints
#
cmds.skinCluster( joint, skinnedSphere, dr=4.5)
for obj in skinTestList:
    objHist = cmds.listHistory(obj, pdo=True)
    skinCluster = cmds.ls(objHist, type="skinCluster")
    if skinCluster == "":
        print(obj + " has NO skinCluster, skipping.")
    else:
        print obj, skinCluster
        #cmds.select(obj, d=True)
My issue is that even if it can't find a skincluster, it still prints out the "obj, skincluster" rather than the error that it can't find a skinCluster.
I thought a skinCluster returns a string. So if the string is empty, it should print out the error rather than "obj, skincluster".
Any help would be appreciated!
This is a classic Maya issue -- the problem is that Maya frequently wants to give you lists, not single items, even when you know the result ought to be a single item. This means you end up writing a bunch of code to either get one item from a one-item list or to avoid errors that come from trying to get an index into an empty list.
You've got the basics, it's the == "" which is messing you up:
for obj in skinTestList:
    objHist = cmds.listHistory(obj, pdo=True)
    skinCluster = cmds.ls(objHist, type="skinCluster") or [None]
    cluster = skinCluster[0]
    print obj, cluster
The or [None] guarantees that you'll always get a list with something in it, so it's safe to use the [0] to get the single value. None is a good return value here because (as pointed out in the comments) you can then write if cluster: and skip empty values.
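The idiom can be tried outside Maya with a stand-in function. This is only a sketch in Python 3: find_clusters and fake_scene are hypothetical and merely imitate a command that returns a list of matches, which may be empty.

```python
def find_clusters(name):
    """Hypothetical stand-in for a Maya query that returns a list of matches."""
    fake_scene = {'skinnedSphere': ['skinCluster1']}
    return fake_scene.get(name, [])

for obj in ['skinnedSphere', 'notSkinnedSphere']:
    clusters = find_clusters(obj)
    # A list never equals a string, so the == "" test is always False.
    print(clusters == "")
    # Fall back to [None] so that indexing with [0] is always safe.
    cluster = (clusters or [None])[0]
    if cluster:
        print(obj, cluster)
    else:
        print(obj + " has NO skinCluster, skipping.")
```

The same fallback works whether the query hands back an empty list or None, since both are falsy.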

Nested "for" in Django view won't work

I want to generate a JSON-type object for an HttpResponse, and in order to build it I'm using a nested "for" structure. I wrote some code and tried it in my Python interpreter, but when I used it in my Django view it refuses to work correctly.
My structure is something like this:
tarifas = ['2.0A','2.0DHA','2.0DHSA']
terminos = ['Dia','Hora','GEN','NOC','VHC','COFGEN','COFNOC','COFVHC','PMHGEN','PMHNOC','PMHVHC','SAHGEN','SAHNOC','SAHVHC','FOMGEN','FOMNOC','FOMVHC','FOSGEN','FOSNOC','FOSVHC','INTGEN','INTNOC','INTVHC','PCAPGEN','PCAPNOC','PCAPVHC','TEUGEN','TEUNOC','TEUVHC']
data_json = {}
data_json['datos_TOT'] = []
data_json['datos_TEU'] = []
data_json['fecha'] = fecha
for i in range(3):
    data_json['datos_TOT'].append({})
    data_json['datos_TEU'].append({})
    data_json['datos_TOT'][i]['tarifa'] = tarifas[i]
    data_json['datos_TEU'][i]['tarifa'] = tarifas[i]
    for j in range(0,24):
        data_json['datos_TEU'][i]['values'] = []
        data_json['datos_TEU'][i]['values'].append({})
        data_json['datos_TEU'][i]['values'][j]['periodo'] = "{0}-{1}".format(j,j+1)
return HttpResponse(json.dumps(data_json), content_type="application/json")
In fact it has one more level of depth, but as the second one doesn't work I didn't include it here.
With this nested structure I expected a JSON object with (b-a) entries in the first level, each with (d-c) entries. But what I see is that the second loop only returns the last value! So if the "j" loop goes from 0 to 24 it will just return "23" and nothing more. It seems like it only runs one "lap".
Is there any limit on nesting loops in views? If there is, where could I place them? I'm trying to keep models.py free from logic.
Your problem is that you reset data_json['datos_TEU'][i]['values'] to an empty list at the beginning of every iteration of the j loop, so it will only ever have one element. Move that line to before the nested loop.
Note that your code could be written much more Pythonically:
for tarifa in tarifas:
    tot = {'tarifa': tarifa}
    data_json['datos_TOT'].append(tot)
    teu = {'tarifa': tarifa}
    values = []
    for j, termino in enumerate(terminos):
        value = {'termino': termino, 'periodo': "{0}-{1}".format(j,j+1)}
        values.append(value)
    teu['values'] = values
    data_json['datos_TEU'].append(teu)
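For comparison, here is a minimal sketch of the smallest fix described above: the 'values' list is created once per tarifa, before the inner loop. It keeps the 24 hourly periods from the question and leaves out Django, so it runs on its own; the entry variable is introduced only for this illustration.

```python
import json

tarifas = ['2.0A', '2.0DHA', '2.0DHSA']
data_json = {'datos_TEU': []}

for tarifa in tarifas:
    # Create the list once per tarifa, before the inner loop starts.
    entry = {'tarifa': tarifa, 'values': []}
    for j in range(24):
        entry['values'].append({'periodo': "{0}-{1}".format(j, j + 1)})
    data_json['datos_TEU'].append(entry)

print(len(data_json['datos_TEU'][0]['values']))  # 24
print(json.dumps(data_json['datos_TEU'][0]['values'][:2]))
```

In the view, the resulting dictionary would be passed to json.dumps and HttpResponse exactly as in the question.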
