Python: Stop a for-loop at a certain number? - python

I want to stop my for-loop at a certain point. I know about range(), but that doesn't help me here because I am iterating over a list; either it doesn't work with range() or I just don't know how.
Globally I store this variable:
productAmount = 4
This is my method. Everything works fine; I had to delete some code, but hopefully you can still follow it.
from selenium import webdriver
from time import sleep

def amazonChecker(keyword):
    driver = webdriver.Chrome('./driver/chromedriver.exe')
    driver.get(url)  # url comes from the deleted code
    titels = driver.find_elements_by_tag_name('h2')
    counter = 0
    for titel in titels:
        counter += 1
        if counter < productAmount:
            print(titel.text)
    sleep(5)
    driver.close()
Best regards
KaanDev

driver.find_elements_by_tag_name() returns a list of WebElements. You can use list slicing to make a copy of this list containing only the subset of items you specify.
For example... to print the text from only the first 4 h2 elements:
titles = driver.find_elements_by_tag_name('h2')
for title in titles[:4]:
    print(title.text)
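If you would rather keep the counter approach from the question, breaking out of the loop once the limit is reached works too; a minimal sketch using enumerate and the question's productAmount:
for counter, title in enumerate(titles, start=1):
    if counter > productAmount:
        break  # stop the loop once productAmount titles are printed
    print(title.text)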

Related

Problems while trying to extract part of id

So I tried to gather all of the ids from a site and extract the numbers from them. It looks like this on the site:
<div class="market_listing_row number_490159191836499" id="number_490159191836499">
<div class="market_listing_row number_490159191836499" id="number_490159191836499">
<div class="market_listing_row number_490159170836499" id="number_490159170836499">
So I located all of them using the XPath below and, to be sure, printed the length of the list (and all of the elements in it while testing, but I deleted that part of the code), so I know for sure it is working and collecting all 50 different elements from the site.
elements = driver.find_elements_by_xpath('//*[starts-with(@id, "number_") and not(contains(@id, "_name"))]')
print("List 2 length is:", len(elements))
But when I try to make a list of the numbers without the "number_" prefix, I have a problem. The variable id that I create with get_attribute("id") is just one id (number_490159170836499, for example) repeated 22 times (that is the length of the id, so it must have something to do with it). list_of_ids works as intended and I get 490159170836499 as the result, but it is only one element (I guess because there is only that one number, repeated). This is the code I used:
for x in elements:
    id = x.get_attribute("id")
    list_of_ids = re.findall(r"\d+", id)
I also used this code to print all of the ids on the site, so I know for sure that the elements list has all of them and that get_attribute is working:
for ii in elements:
    print(ii.get_attribute("id"))
To be clear, I did import re.
Another guess:
import re

ids = []
for x in elements:
    id = x.get_attribute("id")
    ids.append(re.search(r"\d+", id)[0])
print(ids)
You can use the split method as well:
for x in elements:
    id = x.get_attribute("id")
    a = id.split("_")[1]
    print(a)
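For what it's worth, either approach collapses to a list comprehension if you want all of the ids in one list; a sketch using the split variant on the same elements list:
list_of_ids = [x.get_attribute("id").split("_")[1] for x in elements]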

Do variables need to be instantiated before a while loop in Python?

I'm trying to scrape more than 500 posts with the Reddit API, without PRAW. However, since I'm only allowed 100 posts at a time, I'm saving the scraped objects in a list called subreddit_content and will keep scraping until there are 500 posts in subreddit_content.
The code below gives me NameError: name 'subreddit_content_more' is not defined. If I instantiate subreddit_data_more = None before the while loop, I get TypeError: 'NoneType' object is not subscriptable. I've tried the same thing with a for loop but get the same results.
EDIT: updated code; the while loop now uses subreddit_data instead of subreddit_data_more, but I'm now getting TypeError: 'Response' object is not subscriptable despite converting subreddit_data to JSON.
subreddit_data = requests.get(f'https://api.reddit.com/r/{subreddit}/hot?limit=100', headers={'User-Agent': 'windows:requests (by /u/xxx)'})
subreddit_content = subreddit_data.json()['data']['children']
lastline_json = subreddit_content[-1]['data']['name']

while (len(subreddit_content) < 500):
    subreddit_data = requests.get(f'https://api.reddit.com/r/{subreddit}/hot?limit=100&after={lastline_json}', headers={'User-Agent': 'windows:requests (by /u/xxx)'})
    subreddit_content = subreddit_content.append(subreddit_data.json()['data']['children'])
    lastline_json = subreddit_data[-1]['data']['name']
    time.sleep(2.5)
EDIT 2: Using .extend instead of .append and removing the variable assignment in the loop did the trick. This is the snippet of working code (I also renamed my variables for readability, courtesy of Wups):
data = requests.get(f'https://api.reddit.com/r/{subreddit}/hot?limit=100', headers={'User-Agent': 'windows:requests (by /u/xxx)'})
content_list = data.json()['data']['children']
lastline_name = content_list[-1]['data']['name']

while (len(content_list) < 500):
    data = requests.get(f'https://api.reddit.com/r/{subreddit}/hot?limit=100&after={lastline_name}', headers={'User-Agent': 'windows:requests (by /u/xxx)'})
    content_list.extend(data.json()['data']['children'])
    lastline_name = content_list[-1]['data']['name']
    time.sleep(2)
You want to just add one list to another list, but you're doing it wrong. One way to do that is:
the_next_hundred_records = subreddit_data.json()['data']['children']
subreddit_content.extend(the_next_hundred_records)
compare append and extend at https://docs.python.org/3/tutorial/datastructures.html
What you did with append was add the full list of the next 100 as a single sub-list at position 101. Then, because list.append returns None, you set subreddit_content = None.
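A quick illustration of the difference, runnable on its own:
a = [1, 2]
a.append([3, 4])      # a is now [1, 2, [3, 4]] - the whole list becomes one element
b = [1, 2]
b.extend([3, 4])      # b is now [1, 2, 3, 4]
print(a.append([5]))  # prints None - append mutates in place and returns nothing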
Let's try some smaller numbers so you can see what's going on in the debugger. Here is your code, super simplified, except instead of doing requests to get a list from subreddit, I just made a small list. Same thing, really. And I used multiples of ten instead of 100.
def do_query(start):
    return list(range(start, start + 10))

# content is initialized to a list by the first query
content = do_query(0)
while len(content) < 50:
    next_number = len(content)
    # there are a few valid ways to add to a list. Here's one.
    content.extend(do_query(next_number))

for x in content:
    print(x)
It would be better to use a generator, but maybe that's a later topic. Also, you might have problems if the subreddit actually has fewer than 500 records.
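For the curious, a generator version of the same toy example might look like this (a sketch reusing do_query from above, not the poster's code):
def query_all(limit):
    start = 0
    while start < limit:
        # yield each record as soon as its batch is fetched
        yield from do_query(start)
        start += 10

for x in query_all(50):
    print(x)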

Selenium - Iterating through groups of elements - Python

I'm trying to iterate over a number of elements returned by matching class names, which I have stored in a list users. print(len(users)) outputs 12, which is exactly how many there should be. This is my code:
def follow():
    time.sleep(2)
    # iterate here
    users = []
    users = browser.find_elements_by_class_name('wo9IH')
    print(len(users))
    for user in users:
        user_button = browser.find_element_by_css_selector('li.wo9IH div.Pkbci').click()
        # user_button = browser.find_element_by_xpath('//div[@class="Pkbci"]/button').click()
However, currently only index [0] is being .click()'d and the program terminates after this first click. What would be causing the iteration not to move on to the next element?
Resource: image - red shows what's being iterated through, blue is each button being .click()'d.
Try this: you can directly make a list of the buttons rather than of the li elements, and simply click every button whose text is "Follow":
browser.maximize_window()
users = browser.find_elements_by_xpath('//button[text()="Follow"]')
print(len(users))  # check: it must be 12
for user in users:
    browser.execute_script("arguments[0].click()", user)
    # user.click() would also click each button
Find all your CSS-selector elements as a list and then iterate that list to perform .click():
yourList = browser.find_elements_by_css_selector('li.wo9IH div.Pkbci')
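From there the loop itself would look something like this (a sketch completing the snippet above; it clicks each matched element in turn):
for item in yourList:
    item.click()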
users = browser.find_elements_by_class_name('wo9IH') returns a list of selenium.webdriver.remote.webelement.WebElement instances that can themselves be traversed.
In your implementation of the iteration, that fact about the items in the list is overlooked, and the entire page is searched by traversing the page source from the WebDriver instance (i.e. browser.find_element_by_css_selector), so the same first match on the page is clicked every time.
Here is how to go about getting the button in the matched WebElements:
for user_web_element in users:
    # The next line given that there is only a single <button>
    # in the screenshot for the matched WebElements.
    user_button = user_web_element.find_element_by_tag_name('button')
    user_button.click()
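If a matched li ever contains more than one button, scoping the per-element search with the CSS equivalent of the question's commented-out XPath is a safer variant (a sketch):
for user_web_element in users:
    # search inside this element only, not the whole page
    user_web_element.find_element_by_css_selector('div.Pkbci > button').click()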

(Python, Selenium WD) How to get a list from the browser and pass it to a loop?

I have this code:
lst = ["Appearence", "Logotype", "Catalog", "Product Groups", "Option Groups", "Manufacturers", "Suppliers",
       "Delivery Statuses", "Sold Out Statuses", "Quantity Units", "CSV Import/Export", "Countries", "Currencies", "Customers"]
for item in lst:
    wd.find_element_by_link_text(item).click()
    assert wd.title is not None
I don't want to write the list by hand. I want to get the list lst directly from the browser.
I use:
m = wd.find_elements_by_css_selector('li[id=app-]')
print(m[0].text)
Appearence
I don't know how to pass the list to a loop. See this screenshot of the browser: [screenshot]
Please help me understand how to build the list and pass it to a loop.
In your example the variable m will be a list of WebElements. You can take its length and iterate the CSS pseudo-class :nth-child() with a range:
m = wd.find_elements_by_css_selector('li#app-')
for elem in range(1, len(m) + 1):
    wd.find_element_by_css_selector('li#app-:nth-child({})'.format(elem)).click()
    assert wd.title is not None
The for loop iterates over a range of integers starting at 1 and ending with the length of the element list (+1 because the end of a range is not inclusive); then we click the nth child of the selector using the current number. .format(elem) replaces the {} placeholder in the string with elem, in this case the current integer.
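An alternative that avoids the :nth-child() arithmetic is to collect the link texts first and then re-find each link by text inside the loop, which also sidesteps stale-element errors after each click (a sketch, assuming the menu entries are link texts as in the question):
labels = [item.text for item in wd.find_elements_by_css_selector('li#app-')]
for label in labels:
    wd.find_element_by_link_text(label).click()
    assert wd.title is not None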

Looping through XPath variables

How can I increment the XPath variable value in a loop in a Python Selenium WebDriver script?
search_result1 = sel.find_element_by_xpath("//a[not((//div[contains(@class,'s')]//div[contains(@class,'kv')]//cite)[1])]|((//div[contains(@class,'s')]//div[contains(@class,'kv')]//cite)[1])").text
search_result2 = sel.find_element_by_xpath("//a[not((//div[contains(@class,'s')]//div[contains(@class,'kv')]//cite)[2])]|((//div[contains(@class,'s')]//div[contains(@class,'kv')]//cite)[2])").text
search_result3 = sel.find_element_by_xpath("//a[not((//div[contains(@class,'s')]//div[contains(@class,'kv')]//cite)[3])]|((//div[contains(@class,'s')]//div[contains(@class,'kv')]//cite)[3])").text
Why don't you create a list for storing the search results? Something like:
search_results = []
for i in range(1, 11):  # I am assuming 10 results on a page, so set your own range
    result = sel.find_element_by_xpath("//a[not((//div[contains(@class,'s')]//div[contains(@class,'kv')]//cite)[%s])]|((//div[contains(@class,'s')]//div[contains(@class,'kv')]//cite)[%s])" % (i, i)).text
    search_results.append(result)
This sample code will build a list of the 10 result values; you can take the idea from it and write your own. It's just a matter of automating the task. So:
search_results[0] will give you the first search result
search_results[1] will give you the second search result
...
search_results[9] will give you the 10th search result
@Alok Singh Mahor, I don't like hardcoding ranges. I guess a better approach is to iterate through the list of WebElements:
search_results = []
result_elements = sel.find_elements_by_xpath("//not/indexed/xpath/for/any/search/result")
for element in result_elements:
    search_result = element.text
    search_results.append(search_result)
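The same idea fits in a list comprehension, for what it's worth:
search_results = [element.text for element in result_elements]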
