I am trying to gather all of the ids from a site and extract the numbers from them.
This is what they look like on the site:
<div class="market_listing_row number_490159191836499" id="number_490159191836499">
<div class="market_listing_row number_490159191836499" id="number_490159191836499">
<div class="market_listing_row number_490159170836499" id="number_490159170836499">
I located all of them with the XPath below and, to be sure, printed the length of the list (while testing I also printed every element in it, but I have since deleted that code), so I know it works and collects all 50 different elements from the site.
elements = driver.find_elements_by_xpath('//*[starts-with(@id, "number_") and not(contains(@id, "_name")) ]')
print("List 2 length is:", len(elements))
But when I try to build a list of the numbers without the "number_" prefix, I run into a problem. The list I create from get_attribute("id") is just one id (number_490159170836499, for example) repeated 22 times (22 is the length of that id, so it must be related). list_of_ids works as intended and gives me 490159170836499, but it contains only that one element (I guess because only that one id is being repeated). This is the code I used:
for x in elements:
    id = x.get_attribute("id")
    list_of_ids = re.findall("\d+", id)
I also used this code to print every id on the site, so I know that the elements list contains all of them and that get_attribute is working.
for ii in elements:
    print(ii.get_attribute("id"))
To be clear I did import re
Another guess:
import re

ids = []
for x in elements:
    element_id = x.get_attribute("id")
    # keep only the first run of digits, e.g. "490159170836499"
    ids.append(re.search(r"\d+", element_id)[0])
print(ids)
You can use the split method as well.
for x in elements:
    element_id = x.get_attribute("id")
    a = element_id.split("_")[1]
    print(a)
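The likely reason the loop in the question ends up with a single number is that list_of_ids is reassigned on every pass, so only the last element's match survives. A minimal sketch of collecting every number, using plain strings in place of Selenium elements (raw_ids is a stand-in for the values that get_attribute("id") would return):

```python
import re

# Sample id strings copied from the question; in the real script these would
# come from x.get_attribute("id") for each located element.
raw_ids = [
    "number_490159191836499",
    "number_490159170836499",
]

# Build the result in one pass so that each id contributes one entry
# instead of overwriting the previous one.
numbers = [re.search(r"\d+", raw_id).group(0) for raw_id in raw_ids]

print(numbers)
```

The results stay as strings here; wrap the match in int() if numeric values are needed.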
Related
Clarification: The user needs to specify an ID from a displayed list of ID's, how can I simplify the process by numbering the ID's so that the user just has to specify a simple number instead of the whole ID
I have a list ID = ['ekjgnrg','lrkjnk','etc..'] that I need to interact with by inputting the ID later on. The issue is it's a little tedious to type it in. (Copy-pasting would solve this but it doesn't work on mobile due to the way the message with the list is displayed.)
How can I assign a number to each string in the list so that the user can enter '1' instead of 'ekjgnrg' for simplicity? The list gets longer and shorter occasionally, so how can I make it scale to the size of the list?
Not sure how you present to user, but, you can do something like:
ID = ['ekjgnrg', 'lrkjnk', 'etc..']
print('Choose one of:')
for n,item in enumerate(ID):
    print(f'{n}: {item}')
n = int(input('Enter ID number: '))
print(f'You choose number "{n}" which is "{ID[n]}".')
This really needs error checking, like gracefully handling if someone enters invalid data like "foo" or "1000"...
Results in:
Choose one of:
0: ekjgnrg
1: lrkjnk
2: etc..
Enter ID number: 1
You choose number "1" which is "lrkjnk".
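The error checking mentioned above could be sketched roughly like this. The helper name pick_by_number is made up for illustration, and returning None for bad input is just one possible convention; re-prompting in a loop would work as well.

```python
def pick_by_number(ids, raw_choice):
    """Return the item selected by raw_choice, or None if the input is invalid."""
    try:
        n = int(raw_choice)
    except ValueError:
        # Non-numeric input such as "foo"
        return None
    if 0 <= n < len(ids):
        return ids[n]
    # Out-of-range input such as "1000" (negative numbers are rejected too)
    return None


ID = ['ekjgnrg', 'lrkjnk', 'etc..']
print(pick_by_number(ID, '1'))     # valid index
print(pick_by_number(ID, 'foo'))   # not a number
print(pick_by_number(ID, '1000'))  # out of range
```

Because the range check uses len(ids), it keeps working when the list grows or shrinks.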
You can access the Nth item using my_list[n] (in your case my_list is ID).
I suggest reading Python - Access List Items - W3Schools to understand how to work with lists and other data structures in Python.
Is this what you mean? If not, please elaborate.
ID = ['ekjgnrg','lrkjnk','etc..']
print(ID)
needed_id = int(input("What item do you want from the above list?"))
needed_id -= 1 # since lists start at 0.
print(ID[needed_id])
when run:
['ekjgnrg', 'lrkjnk', 'etc..']
What item do you want from the above list?1
ekjgnrg
I have four lists:
user = [0,0,1,1]
names = ["jake","ryan","paul","david"]
disliked_index = [0,1]
ranked_names = ["paul","ryan","david","jake"]
List "user" holds a user's response to which names they like (1 if like, 0 if dislike) from list "names". disliked_index holds the list spots that user indicated 0 in the user list. ranked_names holds the names ranked from most popular to least popular based on the data set (multiple students). What I am trying to achieve is to return the most popular name that the user responded they didn't like. So that
mostpopular_unlikedname = "ryan"
So far what I have done is:
placement = []
for i in disliked_index:
    a = names[i]
    placement.append(a)
Where I now have a list that holds the names the user did not like.
placement = ["jake","ryan"]
Here my logic is to run a loop to check which names in the placement list appear in the ranked_names list and get added to the top_name list in the order from most popular to least.
top_name =[]
for i in range(len(ranked_names)):
    if ranked_names[i] == placement:
        top_name.append[i]
Nothing ends up being added to the top_name list. I am stuck on this part and wanted to see if this is an alright direction to continue or if I should try something else.
Any guidance would be appreciated, thanks!
You don't really need disliked_index list for this. Just do something along these lines:
dis_pos = []
for name, sentiment in zip(names,user):
    if sentiment == 0:
        dis_pos.append(ranked_names.index(name))
mostpopular_unlikedname = ranked_names[min(dis_pos)]
print(mostpopular_unlikedname)
Output:
ryan
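An alternative sketch, not from the answer above: since ranked_names is already ordered from most to least popular, it is enough to walk it and stop at the first disliked name. The None default for a user with no dislikes is an assumption about what the caller wants.

```python
user = [0, 0, 1, 1]
names = ["jake", "ryan", "paul", "david"]
ranked_names = ["paul", "ryan", "david", "jake"]

# Names the user marked with 0 (dislike)
disliked = {name for name, sentiment in zip(names, user) if sentiment == 0}

# ranked_names is ordered from most to least popular, so the first disliked
# name encountered is the most popular one; None if nothing is disliked.
mostpopular_unlikedname = next(
    (name for name in ranked_names if name in disliked), None
)
print(mostpopular_unlikedname)
```

This avoids calling ranked_names.index() for every disliked name and does not fail when the user dislikes nothing.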
There are multiple buttons on my page with similar hrefs. They differ only in id_invoices. I want to click one button on the page using XPath and an href that looks like:
href="/pl/payment/getinvoice/?id_invoices=461"
I can select all buttons using:
invoices = driver.find_elements_by_xpath("//a[contains(@href, '/payment/getinvoice/')]")
but I need to select only the button with the highest id_invoices. Can it be done? :)
What you can do is:
links = driver.find_elements_by_xpath("//a[contains(@href, '/payment/getinvoice/')]")
# Selenium returns elements rather than attribute nodes, so read href from each link
invoice_ids = [int(link.get_attribute("href").split("id_invoices=")[-1]) for link in links]
highest_id = max(invoice_ids)
driver.find_element_by_xpath("//a[contains(@href, '/payment/getinvoice/?id_invoices=" + str(highest_id) + "')]").click()
I don't know much about Python, so I am giving you a direction/algorithm to achieve the same:
Use getAttribute('href') on each element.
You will get the URLs as strings.
You need to split each string you get for the elements in the invoice list.
Split by = and pick up the last array value.
Now you need to typecast the string to int, as the last value after = will be a number.
Now you just need to pick the highest value.
Since you have an XPath that returns all the desired elements, you just need to grab the href attribute from each one, split the href by '=' to get the id (2nd part of string), find the largest id, and then use the id to find the element you want and click on it.
invoices = driver.find_elements_by_xpath("//a[contains(@href, '/payment/getinvoice/')]")
ids = []
for invoice in invoices:
    ids.append(invoice.get_attribute("href").split('=')[1])
results = list(map(int, ids))  # you can't do max on a list of strings, you won't get the right answer
highest_id = max(results)
driver.find_element_by_xpath("//a[@href='/pl/payment/getinvoice/?id_invoices=" + str(highest_id) + "']").click()
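The parsing step can be tried without a browser. Below is a minimal sketch that uses urllib.parse from the standard library instead of splitting on '='. Only the first href comes from the question; the other two are made-up examples, and invoice_id is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical href values; only the first one appears in the question.
hrefs = [
    "/pl/payment/getinvoice/?id_invoices=461",
    "/pl/payment/getinvoice/?id_invoices=98",
    "/pl/payment/getinvoice/?id_invoices=1203",
]


def invoice_id(href):
    """Return the id_invoices query parameter of href as an int."""
    query = parse_qs(urlparse(href).query)
    return int(query["id_invoices"][0])


# Compare numerically: as strings, "98" would sort above "1203".
highest = max(hrefs, key=invoice_id)
print(invoice_id(highest), highest)
```

Reading the parameter by name should keep working if the URL later gains other query parameters, which a plain split on '=' would not.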
Unfortunately I'm a beginner in XPath and not completely sure how it works. For a project of mine I'm looking for a way to parse 5 columns of a 9-column table. Here is what I have working so far:
url="".join(["http://www.basketball-reference.com/leagues/NBA_2011_games.html"])
#getting the columns 4-7
page=requests.get(url)
tree=html.fromstring(page.content)
# the //text() is because some of the entries are inside <a></a>s
data = tree.xpath('//table[@id="games"]/tbody/tr/td[position()>3 and position()<8]//text()')
My workaround idea is to build another list that contains only the first column and then combine the two in an extra step; however, that seems inelegant and unnecessary.
For the XPath, this is what I have tried so far:
//table[@id="games"]/tbody/tr/td[position() = 1]/text() | //table[@id="games"]/tbody/tr/td[position()>3 and position()<8]//text()
Somehow that doesn't include the first column (date) either. According to w3schools, | is the operator that combines two XPath expressions.
So here is my complete code right now. For now, the data is put into two lists.
In hopes that I didn't do anything too stupid, thank you for your help.
from lxml import html
import requests
year = 1952  # assumption: the season in the URL below; the original snippet never defines year
url="".join(["http://www.basketball-reference.com/leagues/NBA_1952_games.html"])
page=requests.get(url)
tree=html.fromstring(page.content)
reg_data = tree.xpath('//table[@id="games"]/tbody/tr/td[position() = 1]/text() | //table[@id="games"]/tbody/tr/td[position()>3 and position()<8]//text()')
po_data = tree.xpath('//table[@id="games_playoffs"]/tbody/tr/td[position() = 1]/text() | //table[@id="games_playoffs"]/tbody/tr/td[position()>3 and position()<8]//text()')
n=int(len(reg_data)/5)
if int(year) == 2016:
    for i in range(0,len(reg_data)):
        if len(reg_data[i])>3 and len(reg_data[i+1])>3:
            n = int((i)/5)
            break
games=[]
for i in range(0,n):
    games.append([])
    for j in range(0,5):
        games[i].append(reg_data[5*i+j])
po_games=[]
m=int(len(po_data)/5)
if year != 2016:
    for i in range(0,m):
        po_games.append([])
        for j in range(0,5):
            po_games[i].append(po_data[5*i+j])
print(games)
print(po_games)
It looks like a lot of the data is wrapped in link (a) tags so that when you are asking for text node children, you aren't finding any because you need to go one level deeper.
Instead of
/text()
do
//text()
The two slashes mean to select text() nodes which are descendants at ANY level.
You can also combine the entire expression into
//table[@id="games"]/tbody/tr/td[position() = 1 or (position()>3 and position()<8)]//text()
instead of having two expressions.
We can even shorten further to
//table[@id="games"]//td[position() = 1 or (position()>3 and position()<8)]//text()
but there is a risk to this expression, as it will pick up td elements which occur anywhere in the table (provided they are a 1st, 4th, 5th, 6th, or 7th column), not just in rows in the body. In your target this will work, however.
Note also that an expression like [position()=1] is not necessary. You can shorten it to [1]. You only need the position function if you need the position of a node other than the context node, or need to write a more complex selection like we have when needing more than just one specific index.
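Once a single expression returns all five columns as one flat list, the regrouping into rows can be sketched in plain Python. The helper name rows_of and the placeholder values are made up for illustration; real data would come from tree.xpath(...).

```python
def rows_of(flat, width=5):
    """Split a flat list into consecutive rows of `width` items.

    Trailing items that do not fill a whole row are dropped.
    """
    usable = len(flat) - len(flat) % width
    return [flat[i:i + width] for i in range(0, usable, width)]


# Made-up placeholder values standing in for the text nodes of two table rows.
flat = ["date1", "visitor1", "10", "home1", "12",
        "date2", "visitor2", "20", "home2", "18"]
games = rows_of(flat)
print(games)
```

This only lines up if every selected cell contributes exactly one text node. An empty cell contributes none and would shift all later rows, so for tables with gaps it is probably safer to loop over the tr elements and query the cells of each row separately.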
How can I increment the Xpath variable value in a loop in python for a selenium webdriver script ?
search_result1 = sel.find_element_by_xpath("//a[not((//div[contains(@class,'s')]//div[contains(@class,'kv')]//cite)[1])]|((//div[contains(@class,'s')]//div[contains(@class,'kv')]//cite)[1])").text
search_result2 = sel.find_element_by_xpath("//a[not((//div[contains(@class,'s')]//div[contains(@class,'kv')]//cite)[2])]|((//div[contains(@class,'s')]//div[contains(@class,'kv')]//cite)[2])").text
search_result3 = sel.find_element_by_xpath("//a[not((//div[contains(@class,'s')]//div[contains(@class,'kv')]//cite)[3])]|((//div[contains(@class,'s')]//div[contains(@class,'kv')]//cite)[3])").text
Why don't you create a list for storing the search results, similar to this:
search_results=[]
for i in range(1,11):  # I am assuming 10 results per page, so set your own range
    result=sel.find_element_by_xpath("//a[not((//div[contains(@class,'s')]//div[contains(@class,'kv')]//cite)[%s])]|((//div[contains(@class,'s')]//div[contains(@class,'kv')]//cite)[%s])"%(i,i)).text
    search_results.append(result)
This sample code creates a list of 10 result values. You can use the idea to write your own; it is just a matter of automating the task.
So
search_results[0] will give you the first search result
search_results[1] will give you the second search result
...
...
search_results[9] will give you the 10th search result
@Alok Singh Mahor, I don't like hardcoding ranges. I think a better approach is to iterate through the list of web elements:
search_results=[]
result_elements = sel.find_elements_by_xpath("//not/indexed/xpath/for/any/search/result")
for element in result_elements:
    search_result = element.text
    search_results.append(search_result)