Syntax error when using find_elements_by_xpath [closed] - python

Web scraping beginner here.
I am trying to get the amount of items on this webpage: https://www.asos.com/dk/maend/a-to-z-of-brands/nike/cat/?cid=4766&refine=attribute_10992:61388&nlid=mw|sko|shop+efter+brand
However, when I use the len() function, Python reports a syntax error.
from bs4 import BeautifulSoup
import requests
import selenium
from selenium.webdriver import Firefox

driver = Firefox()
url = "https://www.asos.com/dk/maend/a-to-z-of-brands/nike/cat/?cid=4766&refine=attribute_10992:61388&nlid=mw|sko|shop+efter+brand"
driver.get(url)
items = len(driver.find_elements_by_xpath(//*[@id="product-12257648"])
for item in range(items):
    price = item.find_element_by_xpath("/html/body/main/div/div/div/div[2]/div/div[1]/section/div/article[16]/a/p/span[1]")
    print(price)
It then outputs this error:
File "C:/Users/rasmu/PycharmProjects/du nu ffs/jsscrape.py", line 13
items = len(driver.find_elements_by_xpath(//*[#id="product-12257648"])
^
SyntaxError: invalid syntax
Process finished with exit code 1

Try this:
items = len(driver.find_elements_by_xpath("//*[@id='product-12257648']"))
You need to put the XPath in quotes so it is passed as a string; your original line is also missing a closing parenthesis.
If you want all prices, you can refactor your code like so:
from selenium import webdriver

# start driver, navigate to url
driver = webdriver.Firefox()
url = "https://www.asos.com/dk/maend/a-to-z-of-brands/nike/cat/?cid=4766&refine=attribute_10992:61388&nlid=mw|sko|shop+efter+brand"
driver.get(url)

# iterate product price elements
for item in driver.find_elements_by_xpath("//p[./span[@data-auto-id='productTilePrice']]"):
    # print price text of element
    print(item.text)

driver.close()
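Side note: the find_elements_by_* helpers were removed in Selenium 4, so on a current install the same loop uses the By locator instead. A minimal sketch with the same XPath:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get(url)  # same url as above

# Selenium 4 style: pass the locator strategy explicitly
for item in driver.find_elements(By.XPATH, "//p[./span[@data-auto-id='productTilePrice']]"):
    print(item.text)

driver.close()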

Related

I'm learning how to crawl a site in Python, but I don't know how to handle a tree structure [closed]

When I click each item on this site, https://dicom.innolitics.com/ciods (values like CR Image, Patient, Referenced Patient Sequence), I want to save the description shown in the layout on the right in a variable.
I'm trying to save the values by clicking the items on the left.
But I found that none of the values in the tree were crawled!
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
url = "https://dicom.innolitics.com/ciods"
driver.get(url)
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'tree-table')))
table_list = []
tree_table = driver.find_element(By.CLASS_NAME, 'tree-table')
tree_rows = tree_table.find_elements(By.TAG_NAME, 'tr')
for i, row in enumerate(tree_rows):
    row.click()
    td = row.find_element(By.TAG_NAME, 'td')
    a = td.find_element(By.CLASS_NAME, 'row-name')
    row_name = a.find_element(By.TAG_NAME, 'span').text
    print(f'Row {i+1} name: {row_name}')
driver.quit()
This is what I did.
I want to know how to crawl the values in the tree.
It would be even better if you could show me how to crawl the layout on the right :)
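There is no accepted answer here, but a minimal sketch of the click-and-read pattern could look like the following. Note that '.detail-pane' is an assumption, not the page's real class name; inspect the page to find the actual container of the right-hand description:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://dicom.innolitics.com/ciods")
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, 'tree-table')))

descriptions = {}
for row in driver.find_elements(By.CSS_SELECTOR, '.tree-table tr'):
    row.click()  # clicking a row updates the panel on the right
    name = row.find_element(By.CLASS_NAME, 'row-name').text
    # '.detail-pane' is a placeholder -- replace it with the real class of
    # the description container found via the browser's inspector; also,
    # clicking may expand the tree, so re-query rows if elements go stale
    detail = driver.find_element(By.CSS_SELECTOR, '.detail-pane').text
    descriptions[name] = detail
driver.quit()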

Web scraping with Selenium not capturing full text [closed]

I'm trying to mine quite a bit of text from a list of links using Selenium/Python.
In this example, I scrape only one of the pages and that successfully grabs the full text:
page = 'https://xxxxxx.net/xxxxx/September%202020/2020-09-24'
driver = webdriver.Firefox()
driver.get(page)
elements = driver.find_element_by_class_name('text').text
elements
Then, when I try to loop through the whole list of links (all the by-day links on this page: https://overrustlelogs.net/Destinygg%20chatlog/September%202020) using the same method that worked for the single page, it does not grab the full text:
for i in tqdm(chat_links):
    driver.get(i)
    #driver.implicitly_wait(200)
    elements = driver.find_element_by_class_name('text').text
    #elements = driver.find_element_by_xpath('/html/body/main/div[1]/div[1]').text
    #elements = elements.text
    temp = {'elements': elements}
    chat_text.append(temp)
driver.close()
chat_text
My thought is that maybe the pages don't get a chance to load fully, but the same call works on the single page. Also, the driver.get method is supposed to load the whole given page.
Any ideas? Thanks, much appreciated.
The page is lazy-loading; you need to scroll the page and collect the data into a list.
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://overrustlelogs.net/Destinygg%20chatlog/September%202020/2020-09-30")
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".text>span")))
height = driver.execute_script("return document.body.scrollHeight")
data = []
while True:
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
    time.sleep(1)
    for item in driver.find_elements_by_css_selector(".text>span"):
        if item.text in data:
            continue
        else:
            data.append(item.text)
    lastheight = driver.execute_script("return document.body.scrollHeight")
    if height == lastheight:
        break
    height = lastheight
print(data)
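The idea behind the loop: keep scrolling to the bottom and re-collecting the .text>span rows until document.body.scrollHeight stops growing, which is the signal that the lazy loader has no more rows to add.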

First program won't print price [closed]

I'm writing my first real life project for an Amazon Price Tracker. I got the idea from this video: https://www.youtube.com/watch?v=d_s-jygcJ1k&t=856s
Unfortunately, just a few lines in I get the "object of type 'method' has no len()" error and can't figure out what I forgot, since my code looks like the code of the guy in the video:
import bs4
import urllib.request
import smtplib
import time
url ='https://www.amazon.de/BASN-Ear-Kopfh%C3%B6rer-Ger%C3%A4uschunterdr%C3%BCckung-HiFi-Ohrh%C3%B6rer-Kopfh%C3%B6rer/dp/B07JLYHFC8/ref=sr_1_21?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=in+ear+mmcx&qid=1598721906&s=apparel&sr=1-21'
sauce = urllib.request.urlopen(url).read
soup = bs4.BeautifulSoup(sauce, "html.parser")
prices = soup.find(id="priceblock_ourprice").get_text()
prices = float(prices.replace("€", ""))
print(prices)
I get the error on the line starting with 'soup'. If anybody can help, I'd be thankful!
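The bug is urllib.request.urlopen(url).read without parentheses: that assigns the read method itself to sauce instead of calling it, so BeautifulSoup is handed a method object and fails when it tries to take its length. Call the method: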
import bs4
import urllib.request
import smtplib
import time
url ='https://www.amazon.de/BASN-Ear-Kopfh%C3%B6rer-Ger%C3%A4uschunterdr%C3%BCckung-HiFi-Ohrh%C3%B6rer-Kopfh%C3%B6rer/dp/B07JLYHFC8/ref=sr_1_21?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=in+ear+mmcx&qid=1598721906&s=apparel&sr=1-21'
sauce = urllib.request.urlopen(url).read()
soup = bs4.BeautifulSoup(sauce, "html.parser")
prices = soup.find(id="priceblock_ourprice").get_text()
prices = prices.replace("€", "")
print(prices)
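Note the fixed version also leaves the price as a string. A German Amazon price such as 29,99 € uses a comma as the decimal separator, so converting it would first need something like float(prices.replace(",", ".").strip()).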

how to pull out text from a div class using selenium headless [closed]

I'm trying to pull out the "0%" from the following div tag:
<div class="sem-report-header-td-diff ">0%</div>
my current code is:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument('--headless')
driver = webdriver.Chrome(executable_path='mypath/chrome.exe',
                          chrome_options=options)
url = 'https://www.semrush.com/info/burton.com'
driver.get(url)
driver.implicitly_wait(2)
change_elements = driver.find_elements_by_xpath(xpath='//div[@class="sem-report-header-td-diff "]')
Not sure what I'm doing wrong. This works with href tags, but it's not working for this.
As per the HTML you have shared, to extract the text 0% you need to use the method get_attribute("innerHTML"), and you can use either of the following solutions:
css_selector:
myText = driver.find_element_by_css_selector("div.sem-report-header-td-diff").get_attribute("innerHTML")
xpath:
myText = driver.find_element_by_xpath("//div[@class='sem-report-header-td-diff ']").get_attribute("innerHTML")
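For a leaf element like this one, .text (the rendered text) and get_attribute("innerHTML") (the raw markup inside the tag) both return 0%; the two only differ once the element contains child tags.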
First of all, it is "element", not "elements". And second, you didn't get the text; you just located the element.
So, here is the code:
element_text = driver.find_element_by_xpath("//div[@class='sem-report-header-td-diff ']").text
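If you do keep find_elements (plural), remember it returns a list, so you would iterate or index it, for example:

for el in driver.find_elements_by_xpath("//div[@class='sem-report-header-td-diff ']"):
    print(el.text)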

Web Scrape in Python [closed]

So I am trying to web scrape https://en.wikipedia.org/wiki/FIFA_World_Rankings and scrape the first table on the page, but it has not worked and I get the error 'NoneType' object is not callable.
Here is my code:
from bs4 import BeautifulSoup
import urllib2
soup = BeautifulSoup(urllib2.urlopen("https://en.wikipedia.org/wiki/FIFA_World_Rankings").read())
for row in soup('table', {'class': 'wikitable'})[0].tbody('tr'):
    tds = row('td')
    print tds[0].string, tds[1].string
I don't know much about HTML and I know very little about web scraping.
You are missing the findAll (or find_all, if you want to be Pythonic) function to search for all tags under an element. The error itself comes from .tbody: Wikipedia's served HTML contains no <tbody> tags (browsers insert those), so soup('table', ...)[0].tbody is None, and calling None('tr') raises 'NoneType' object is not callable.
You may also want to do a check on the data to make sure you don't get an IndexError, like so:
for row in soup('table', {'class': 'wikitable'})[0].findAll('tr'):
    tds = row.findAll('td')
    if len(tds) > 1:
        print tds[0].text, tds[1].text
And here's the output it gives:
Argentina 1532
Belgium 1352
Chile 1348
Colombia 1337
Germany 1309
Spain 1277
Brazil 1261
import requests
from bs4 import BeautifulSoup
request = requests.get("https://en.wikipedia.org/wiki/FIFA_World_Rankings")
sourceCode = BeautifulSoup(request.content)
tables = sourceCode.select('table.wikitable')
table = tables[0]
print table.get_text()
Also, if you want the results as a list:
texts = [text for text in table.stripped_strings]
This should work. You need to use find_all to look for tags. Also, in the Wiki article, team ranks are present in table rows 3-22, hence the if condition.
from bs4 import BeautifulSoup
import urllib2
soup = BeautifulSoup(urllib2.urlopen("https://en.wikipedia.org/wiki/FIFA_World_Rankings").read())
for i, row in enumerate(soup('table', {'class': 'wikitable'})[0].find_all('tr')):
    if i > 2 and i < 23:
        data = row.find_all('td')
        print i, data[0].text, data[1].text
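The answers above are Python 2 code (urllib2 and print statements). A roughly equivalent Python 3 sketch using requests instead of urllib2:

import requests
from bs4 import BeautifulSoup

response = requests.get("https://en.wikipedia.org/wiki/FIFA_World_Rankings")
soup = BeautifulSoup(response.content, "html.parser")
# same table lookup, with the IndexError guard from the first answer
for row in soup("table", {"class": "wikitable"})[0].find_all("tr"):
    tds = row.find_all("td")
    if len(tds) > 1:
        print(tds[0].get_text(strip=True), tds[1].get_text(strip=True))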
