I'm trying to skip the URLs in my search that are blacklisted. 'ltp_block' holds the data, which contains the different URLs.
p = re.compile('href="(.*?)" rel="nofollow"')
url = "http://www.****.**" + p.findall(current)[0]
r = requests.get(url)
The code above fetches each URL from 'ltp_block'. r.url is the URL for the current iteration of the loop.
for each_row in blacklist:
    if(re.findall('\\b'+each_row[0]+'\\b', r.url, flags=re.IGNORECASE) != []):
        print "found"
QUESTION - The 'for' loop above works only ONCE. When 'check' becomes 1, or the main loop moves on to another URL, this second 'for' loop is simply skipped like it doesn't exist. Why?
conn = sqlite3.connect('test.db')
c = conn.cursor()
blacklist = c.execute("SELECT `name` FROM `blacklist`")
check = 0
for row in ltp_block:
    p = re.compile('versan')
    current = ltp_block[check]
    if(p.findall(current) != []):
        p = re.compile('price=(.*?)&')
        ltp = p.findall(current)[0]
        del p
    else:
        p = re.compile('Gesa: (.*?) &')
        ltp = p.findall(current)[0]
        del p
    p = re.compile('href="(.*?)" rel="nofollow"')
    url = "http://www.****.**" + p.findall(current)[0]
    r = requests.get(url)
    for each_row in blacklist:
        if(re.findall('\\b'+each_row[0]+'\\b', r.url, flags=re.IGNORECASE) != []):
            print "found"
    check = check + 1
Answer -
I had to re-run blacklist = c.execute("SELECT name FROM blacklist") on every pass.
I moved it into the main 'for' loop, and everything is working now.
c.execute returns an iterator, and an iterator can only be iterated over once. As a simpler example, try this:
numbers = (x for x in xrange(10))  # a simple iterator
for number in numbers:
    print number
print "Repeat"
for number in numbers:
    print number
Only the first loop gives any output, as the iterator is exhausted and empty at the start of the second. Compare that with:
numbers = (x for x in xrange(10))
numbers = list(numbers)  # turn the iterator into a list
for number in numbers:
    print number
print "Repeat"
for number in numbers:
    print number
Which works as expected. In your case, you want:
blacklist = list(c.execute("SELECT `name` FROM `blacklist`"))
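The same effect can be reproduced with sqlite3 alone. The following is a minimal sketch written for Python 3; the in-memory database and the two sample rows are assumptions standing in for test.db, and fetchall() is used as an alternative to wrapping the cursor in list().

```python
import sqlite3

# Hypothetical in-memory database standing in for test.db
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE blacklist (name TEXT)")
c.executemany("INSERT INTO blacklist VALUES (?)", [("spam",), ("ads",)])

# The cursor is consumed by the first pass, so the second pass sees nothing
rows = c.execute("SELECT name FROM blacklist")
first_pass = [row[0] for row in rows]
second_pass = [row[0] for row in rows]

# fetchall() copies the rows into a list, which can be looped over repeatedly
blacklist = c.execute("SELECT name FROM blacklist").fetchall()
passes = [[row[0] for row in blacklist] for _ in range(2)]

print(first_pass, second_pass)
print(passes)
```

Either list(...) or fetchall() reads every row into memory once, which is fine for a short blacklist.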
The only way for a for loop to be "simply skipped like it doesn't exist" is if there is nothing to iterate over. If that is truly the case, the only explanation is that blacklist is empty. There simply is no other explanation if what you report is true.
Such a thing is very easy to prove. Add a print statement immediately before the loop, and print out the value of blacklist:
print "blacklist:", blacklist
for each_row in blacklist:
    ...
I am iterating through some web-scraped data and trying to put the results into one of two lists. In some cases the split used below raises an exception and the record is skipped, because there is no ' at ' in the entry. If I run the code for 10 records and this was the situation for one of those records then I would have 10 in a_list and 9 in b_list. I want to keep everything matched up and keep 10 records in both lists, putting a blank or some placeholder like "null" or 0 into the list wherever the exception occurred. After this, I want the script to continue doing its thing. Is there an easy way to achieve this?
a_list = []; b_list = [];
for i in range(1,11):
    try:
        a = driver.find_element_by_xpath('abc').text.split(' at ')[0]
        b = driver.find_element_by_xpath('abc').text.split(' at ')[1]
        a_list.append(a)
        b_list.append(b)
    except:
        continue
    i +=1
You can just append within the except block.
a_list = []; b_list = [];
for i in range(1,11):
    try:
        a = driver.find_element_by_xpath('abc').text.split(' at ')[0]
        b = driver.find_element_by_xpath('abc').text.split(' at ')[1]
        a_list.append(a)
        b_list.append(b)
    except:
        a_list.append(None)
        b_list.append(None)
Also, you do not need to increment the loop variable. The for loop does that for you.
Try this one:
for i in range(1, 11):
    try:
        element_splitted = driver.find_element_by_xpath('abc').text.split(' at ')
    except:  # No element found
        a_list.append(None)
        b_list.append(None)
        continue
    if len(element_splitted) > 1:  # it has ' at '
        a_list.append(element_splitted[0])
        b_list.append(element_splitted[1])
    else:
        a_list.append(None)
        b_list.append(None)
I haven't used Selenium before, but I guess this is how it works. First I check whether the element was found. Then I check whether it has " at " in it.
Additional notes:
No need to increment i.
Do not call driver.find_element_by_xpath('abc').text.split(' at ') twice.
Do not use a broad exception handler like I did --> check the documentation for the specific exception that would be raised.
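The "has ' at ' or not" check can also be done without any exception handling. Below is a minimal sketch for Python 3 that leaves Selenium out entirely; split_pair and the sample strings are hypothetical names introduced for illustration, and the element lookup would still need its own, narrow exception handler.

```python
def split_pair(text, sep=" at "):
    """Return (before, after) when sep occurs in text, else (None, None)."""
    before, found, after = text.partition(sep)
    if not found:
        return None, None
    return before, after


# Hypothetical scraped strings; the second one has no ' at '
scraped = ["Engineer at Acme", "Freelancer", "Chef at Bistro"]

a_list, b_list = [], []
for text in scraped:
    a, b = split_pair(text)
    a_list.append(a)
    b_list.append(b)

print(a_list)
print(b_list)
```

Both lists always grow together, so they stay the same length. Note that partition splits at the first occurrence of the separator only, so everything after the first ' at ' ends up in the second value.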
When I swap print for return, my function no longer gives back all the same values.
Here is my code with print():
def seo():
    sou = soup.findAll(class_ = 'rtf l-row')
    for x in sou:
        l = x.findAll('p')
        s = x.findAll('h4')
        for i in l:
            lolz = i.text
            print(lolz)
        for j in s:
            h = j.text
            print(h)
Here is the exact same code with return:
def seo():
    sou = soup.findAll(class_ = 'rtf l-row')
    for x in sou:
        l = x.findAll('p')
        s = x.findAll('h4')
        for i in l:
            lolz = i.text
            return lolz
        for j in s:
            h = j.text
            return h
When I use return, I only get back the first line of text. Thanks!
A return statement ends the function as soon as it is reached. A function can contain more than one return, but only the first one reached in a given call takes effect.
You have two return statements inside your seo function. The function reaches the first return statement on the first pass through the loop, and the rest of the code in the function never runs.
You should either break it into two different functions, or return a list or a dictionary so you can have several values returned in a single variable :)
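As a rough illustration of the "return a list" suggestion, here is a minimal Python 3 sketch. It does not use BeautifulSoup; the sections list of dictionaries and the name collect_text are assumptions standing in for the parsed page and the seo function.

```python
def collect_text(sections):
    """Gather every paragraph and heading instead of returning the first one."""
    paragraphs = []
    headings = []
    for section in sections:
        paragraphs.extend(section["p"])
        headings.extend(section["h4"])
    # A single return after the loops hands back everything that was collected
    return paragraphs, headings


# Hypothetical stand-in for the parsed page sections
sections = [
    {"p": ["first paragraph", "second paragraph"], "h4": ["Heading A"]},
    {"p": ["third paragraph"], "h4": ["Heading B"]},
]

paragraphs, headings = collect_text(sections)
print(paragraphs)
print(headings)
```

The caller can then loop over the two lists and print each item, which gives the same output as the print() version.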
As a Python beginner, I would like to extract the strings inside square brackets, ideally without importing any modules. If that isn't possible, importing is okay.
For example,
def find_tags(text):
    pass  # do some codes
x = find_tags('Hi[Pear]')
print(x)
it will return
1-Pear
if there are more than one brackets for example,
x = find_tags('[apple]and[orange]and[apple]again!')
print(x)
it will return
1-apple,2-orange,3-apple
I would greatly appreciate if someone could help me out thanks!
Here, I tried solving it. Here is my code :
bracket_string = '[apple]and[orange]and[apple]again!'
def find_tags(string1):
    start = False
    data = ''
    data_list = []
    for i in string1:
        if i == '[':
            start = True
        if i != ']' and start == True:
            if i != '[':
                data += i
        else:
            if data != '':
                data_list.append(data)
            data = ''
            start = False
    return(data_list)
x = find_tags(bracket_string)
print(x)
The function will return a list of items that were between brackets of a given string parameter.
Any advice will be appreciated.
If your pattern is consistent, like [sometext]sometext[sometext]..., you can implement your function like this:
import re
def find_tags(expression):
    r = re.findall(r'(\[[a-zA-Z]+\])', expression)
    return ",".join([str(index + 1) + "-" + item.replace("[", "").replace("]", "") for index, item in enumerate(r)])
By the way, you can also use a stack data structure (LIFO) to solve this problem.
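The stack idea mentioned above could look roughly like the following Python 3 sketch, with no imports. A stack is last in, first out: each '[' pushes its position, and each ']' pops the most recent one. This is one possible reading of the suggestion, not necessarily what the answerer had in mind.

```python
def find_tags(text):
    """Return tags as '1-apple,2-orange', using a stack of '[' positions."""
    open_positions = []  # stack: the most recent '[' is on top
    tags = []
    for index, char in enumerate(text):
        if char == "[":
            open_positions.append(index)
        elif char == "]" and open_positions:
            start = open_positions.pop()
            tags.append(text[start + 1:index])
    return ",".join("{}-{}".format(n, tag) for n, tag in enumerate(tags, start=1))


print(find_tags("Hi[Pear]"))
print(find_tags("[apple]and[orange]and[apple]again!"))
```

Unlike the regular expression, this version also copes with nested brackets, reporting an inner tag before the tag that encloses it. An unmatched ']' is ignored.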
You can solve this using a simple for loop over all characters of your text.
You have to remember whether you are inside or outside a tag. While inside, you add each letter to a temporary list; when you reach the end of a tag, you add the whole temporary list as a word to a return list.
You can solve the numbering by calling enumerate(iterable, start=1) on the list of words:
def find_tags(text):
    inside_tag = False
    tags = []  # list of all tag-words
    t = []  # list to collect all letters of a single tag
    for c in text:
        if not inside_tag:
            inside_tag = c == "["  # we are inside as soon as we encounter [
        elif c != "]":
            t.append(c)  # happens only if inside a tag and not tag ending
        else:
            tags.append(''.join(t))  # construct tag from t and set inside back to false
            inside_tag = False
            t = []  # clear temporary list
    if t:
        tags.append(''.join(t))  # in case we have leftover tag characters ( "[tag" )
    return list(enumerate(tags,start=1))  # create enumerated list
x = find_tags('[apple]and[orange]and[apple]again!')
# x is a list of tuples (number, tag):
for nr, tag in x:
    print("{}-{}".format(nr, tag), end = ", ")
Passing end = ", " makes each print call finish with a comma instead of a newline, which gives you your output (with one extra separator after the last item).
x looks like: [(1, 'apple'), (2, 'orange'), (3, 'apple')]
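If the separator after the last item is unwanted, the numbered pairs can be joined into one string first. A minimal Python 3 sketch, assuming the list of tuples shown above:

```python
# The (number, tag) pairs as returned by find_tags above
x = [(1, "apple"), (2, "orange"), (3, "apple")]

# join places the separator only between items, never after the last one
line = ",".join("{}-{}".format(nr, tag) for nr, tag in x)
print(line)
```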
Sometimes I get confused as to where to use the return statement. I get what it does, it's just that I don't get its placement properly.
Here's a short example of the same code.
Correct way:
def product_list(list_of_numbers):
    c = 1
    for e in list_of_numbers:
        c = c * e
    return c
Wrong way (which I did initially):
def product_list(list_of_numbers):
    c = 1
    for e in list_of_numbers:
        c = c * e
        return c
Can someone clarify what's the difference between the two and where should the return be when using a loop in a function?
return in a function means you are leaving the function immediately and returning to the place where you call it.
So you should use return only when you are certain that you want to exit the function immediately.
In your example, I think you don't want to exit the function until you get the final value of c, so you should place the return outside of the loop.
You're putting too much emphasis on the effect return has on the for loop. return applies to the function: it ends the function, and the for loop is cut short only as a consequence.
To control the for loop independently of the function, use break. In addition, you can have multiple return statements in a function, depending on what action should be taken in response to particular criteria (as in my_func1). Consider the following:
import random
def my_func1(my_list, entry):
    '''
    Search a list for a specific entry. When found, terminate search
    and return the list index immediately
    Return False if not found
    '''
    print "\n Starting func1"
    index = 0
    for item in my_list:
        if item != entry:
            print "Not found yet at index: {}".format(index)
            index += 1
        else:
            print "found item, at index {}".format(index)
            print "Terminating function AND loop at same time"
            return index
    print "########### ENTRY NOT IN LIST. RETURN FAlSE #############"
    return False
a = my_func1(['my', 'name', 'is', 'john'], 'is')
b = my_func1(['my', 'name', 'is', 'john'], 'harry')
def my_func2(my_list):
    ''' Iterate through a list
    For first 4 items in list, double them and save result to a list that will
    be returned, otherwise terminate the loop
    Also, return another list of random numbers
    '''
    print '\n starting func2'
    return_list = []
    for i in range(len(my_list)):
        if i < 4:
            print 'Value of i is {}'.format(i)
            return_list.append(my_list[i] * 2)
        else:
            print 'terminating for loop, but ** keep the function going **'
            break
    other_list = [random.randint(1, 10) for x in range(10)]
    print 'Returning both lists'
    return return_list, other_list
c = my_func2([x for x in range(10)])
I've got a loop that is supposed to select features and keep looping until it is no longer selecting new features:
arcpy.SelectLayerByLocation_management("antiRivStart","INTERSECT","polygon")
previousselectcount = -1
selectcount = arcpy.GetCount_management("StreamT_StreamO1")
while True:
    #selectCount = arcpy.GetCount_management("StreamT_StreamO1")
    mylist = []
    with arcpy.da.SearchCursor("antiRivStart","ORIG_FID") as mycursor:
        for feat in mycursor:
            mylist.append(feat[0])
    liststring = str(mylist)
    queryIn1 = liststring.replace('[','(')
    queryIn2 = queryIn1.replace(']',')')
    arcpy.SelectLayerByAttribute_management('StreamT_StreamO1',"ADD_TO_SELECTION",'OBJECTID IN '+ queryIn2 )
    arcpy.SelectLayerByLocation_management("antiRivStart","INTERSECT","StreamT_StreamO1","","ADD_TO_SELECTION")
    previousselectcount = selectcount
    selectcount = arcpy.GetCount_management("StreamT_StreamO1")
    print str(selectcount), str(previousselectcount)
    if selectcount == previousselectcount:
        break
By my reckoning, once it starts printing the same number twice it should stop, but it doesn't; it keeps printing "15548 15548" over and over again. Is it ignoring the break, or is the if condition not being met?
I've also tried with
while selectcount != previousselectcount:
but this gave me the same result
Variables in Python are dynamically typed. Just because you initialise previousselectcount as an integer doesn't mean it will still be one after previousselectcount = selectcount runs. You can feel free to get rid of that initialisation line.
If you replace:
selectcount = arcpy.GetCount_management("StreamT_StreamO1")
With:
selectcount = int(arcpy.GetCount_management("StreamT_StreamO1").getOutput(0))
Do this for both lines, and you'll be comparing the integer values instead of whatever the equality operator compares for the result objects that GetCount_management returns.
Even better, why not write a function to do it for you:
def GetCount():
    return int(arcpy.GetCount_management("StreamT_StreamO1").getOutput(0))
Save yourself repeating yourself.
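Why two counts can print identically and still compare as unequal is easy to reproduce without arcpy. In the Python 3 sketch below, FakeResult is a hypothetical stand-in class, not part of arcpy; it assumes the real result object behaves similarly, which matches the symptom described in the question.

```python
class FakeResult(object):
    """Hypothetical stand-in for a result object; not part of arcpy."""

    def __init__(self, count):
        self._count = count

    def getOutput(self, index):
        # Mimics returning the count as a string
        return str(self._count)

    def __str__(self):
        return str(self._count)


previous = FakeResult(15548)
current = FakeResult(15548)

# Both objects print the same text ...
same_text = str(previous) == str(current)
# ... but == on two distinct objects falls back to identity by default
same_object = previous == current
# Converting to int compares the actual numbers
same_number = int(previous.getOutput(0)) == int(current.getOutput(0))

print(same_text, same_object, same_number)
```

With plain integers on both sides, the selectcount == previousselectcount test in the loop can become true and the break is reached.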