I have a web bot that looks at a textbox and gathers information. Depending on what page the textbox is on, the HTML is just slightly different.
This chunk of code repeatedly hits me with the syntax error "unexpected EOF while parsing" as soon as it sees the except block. I've written similar scripts that have worked no problem but this one has been giving me issues.
text_box = '/html/body/div[7]/div[45]/div[37]/div/table[1]/tbody/tr/td[1]/div/div[1]/span'
try:
    title = driver.find_element(by=By.XPATH, value=text_box).text.split('\n')[0]
except NoSuchElementException:
    text_box = '/html/body/div[8]/div[45]/div[37]/div/table[1]/tbody/tr/td[1]/div/div[1]/span'
finally:
    title = driver.find_element(by=By.XPATH, value=text_box).text.split('\n')[0]
content = driver.find_element(by=By.XPATH, value=text_box).text.replace(post_title + "\n" + "\n", "")
Any thoughts are appreciated. Thank you.
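One way to handle a textbox whose XPath differs slightly between pages is to try each candidate path in order and stop at the first one that resolves. The sketch below is only an illustration, not the asker's code: `first_match` and its parameters are hypothetical names, the XPaths are shortened placeholders, and a plain dictionary stands in for the browser so the snippet runs without Selenium. With Selenium, `lookup` would presumably be something like `lambda xp: driver.find_element(By.XPATH, xp).text` and `not_found` would be `NoSuchElementException`.

```python
def first_match(lookup, candidates, not_found=LookupError):
    """Return lookup(c) for the first candidate c that does not raise not_found.

    Returns None when every candidate fails.
    """
    for candidate in candidates:
        try:
            return lookup(candidate)
        except not_found:
            continue  # this layout is not on the page; try the next path
    return None


# Stand-in for a page (assumption for the demo): only the div[8] variant exists.
fake_page = {"/html/body/div[8]/span": "Post title\n\nPost body"}
xpaths = ["/html/body/div[7]/span", "/html/body/div[8]/span"]

text = first_match(fake_page.__getitem__, xpaths)
title = text.split("\n")[0]
print(title)
```

Because the fallback lives in a loop rather than in except/finally branches, adding a third page layout only means adding one more entry to the list.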
I am trying to scrape certain information from a webpage [https://reality.idnes.cz/rk/detail/m-m-reality-holding-a-s/5a85b582a26e3a321d4f2700/]. I have a list of links and need to go through all of them. Each link contains the same information about a different company. However, a few of the companies didn't provide a phone number, for example, and when that happens the whole program terminates with an exception. This is my code:
for link in link_list:
    try:
        driver.get(', '.join(link))
        time.sleep(2)
        information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "h1.b-annot__title.mb-5")))
        title = driver.find_element_by_css_selector("h1.b-annot__title.mb-5")
        information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "span.btn__text")))
        offers = driver.find_element_by_css_selector("span.btn__text")
        information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "p.font-sm")))
        addresses = driver.find_element_by_css_selector("p.font-sm")
        information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "a.item-icon.measuring-data-layer")))
        phone_number = driver.find_element_by_css_selector("a.item-icon.measuring-data-layer")
        information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "a.item-icon")))
        email = driver.find_element_by_css_selector("a.item-icon")
        print(title.text, " ", offers.text, " ", addresses.text, " ", phone_number.text, " ", email.text)
        writer.writerow([title.text, offers.text, addresses.text, phone_number.text, email.text])
    except Exception:
        print(title.text, " ", offers.text, " ", addresses.text, " ", phone_number.text, " ", email.text)
        writer.writerow([title.text, offers.text, addresses.text, phone_number.text, email.text])
        continue
As you can see, I am trying to prevent the whole program from terminating. I added an except block with continue so the program wouldn't terminate, but I noticed that although the program keeps running, none of the information is scraped from the pages where the exception occurred. To avoid losing that data, I request the desired output again inside the except block, so that it gets printed and saved to the CSV file with the missing information as an empty slot.
The problem is that when I request the output inside the except block, the exception terminates the whole program again instead of printing what it has and moving on with "continue". Why does that happen? Why doesn't the program print the output and then follow the "continue" instead of terminating? How can I print the output the program did collect, without the missing information, and keep the program from terminating?
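When the code hits an exception, control jumps straight to the except block, and continue then moves on to the next loop iteration; it will NOT pick up where it left off inside the try block. Your except block also reads variables such as phone_number.text that the failed lookup never assigned (or that still refer to the previous page), so the handler itself raises, and nothing catches that second exception. That is why the program terminates.
See this answer: https://stackoverflow.com/a/19523054/1387701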
No, you cannot do that. That's just the way Python has its syntax. Once you exit a try-block because of an exception, there is no way back in.
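The behaviour described above can be reproduced without a browser. This is a minimal sketch with made-up stand-ins (`ok`, `missing`, and `collect` are hypothetical names, and `LookupError` plays the role of a Selenium lookup failure): once one step inside the try block raises, the remaining steps in that block are never run.

```python
def ok():
    return "value"


def missing():
    # Stand-in for a lookup that fails, e.g. a company without a phone number.
    raise LookupError("element not found")


def collect(fetchers):
    """Run every fetcher inside ONE try block and record what happened."""
    log = []
    try:
        for name, fetch in fetchers:
            fetch()
            log.append(name)
    except LookupError:
        # Execution resumes here; the rest of the try block is abandoned.
        log.append("handler ran")
    return log


log = collect([("title", ok), ("phone", missing), ("email", ok)])
print(log)
```

The "email" step never appears in the log, even though it would have succeeded, because it sits after the failing step in the same try block.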
So I think something like this might help with the problem you're seeing. As an example, for the case where the phone number is missing or not found:
for link in link_list:
    driver.get(', '.join(link))
    time.sleep(2)
    information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "h1.b-annot__title.mb-5")))
    title = driver.find_element_by_css_selector("h1.b-annot__title.mb-5")
    information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "span.btn__text")))
    offers = driver.find_element_by_css_selector("span.btn__text")
    information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "p.font-sm")))
    addresses = driver.find_element_by_css_selector("p.font-sm")
    try:
        information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "a.item-icon.measuring-data-layer")))
        phone_number = driver.find_element_by_css_selector("a.item-icon.measuring-data-layer")
    except Exception as e:
        print("Phone number exception: %s" % str(e))
        continue
    information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "a.item-icon")))
    email = driver.find_element_by_css_selector("a.item-icon")
    print(title.text, " ", offers.text, " ", addresses.text, " ", phone_number.text, " ", email.text)
    writer.writerow([title.text, offers.text, addresses.text, phone_number.text, email.text])
You could look for the specific exception (TimeoutException from the wait, or NoSuchElementException from the find element command) to be a bit more precise for each element.
Or you could check that find_elements_by_css_selector returns more than 0 elements before proceeding, but I'd prefer the try/except myself.
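If the goal is to keep the row and leave the missing field as an empty slot, rather than skipping the whole company, each lookup can get its own small try/except. The sketch below is an assumption-laden illustration: `text_or_blank` is a hypothetical helper, the dictionary stands in for a page, and `LookupError` stands in for the Selenium exceptions. With Selenium, `expected` would presumably be `(TimeoutException, NoSuchElementException)` and `fetch` a function that waits for the element and returns its `.text`.

```python
def text_or_blank(fetch, expected=(LookupError,)):
    """Return fetch(), or an empty string if one of the expected errors occurs."""
    try:
        return fetch()
    except expected:
        return ""


# Stand-in for one company page (assumption for the demo): no phone number listed.
page = {"title": "Example Realty", "email": "info@example.com"}
fields = ["title", "phone", "email"]

# Each field is fetched independently, so one failure cannot hide the others.
row = [text_or_blank(lambda name=name: page[name]) for name in fields]
print(row)
```

Only the listed exception types are swallowed; anything unexpected still surfaces, which makes real bugs easier to notice than with a blanket `except Exception`.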
I want to send WhatsApp messages using Selenium with Python.
I read my contact numbers from a CSV file and, in a loop, type each phone number into the contact search box on WhatsApp Web. (Some of my contacts have duplicate names, so I search by phone number instead of by name.) I then press Enter, also through Selenium, which opens the only matching chat so I can send the message.
The problem is that when the search returns no result, the messages are sent to the last person who was contacted, so that person receives a duplicate message.
How can I determine whether the search returned any result? Or, in this case, how can I know whether the number has WhatsApp or not?
Thanks
from selenium import webdriver
import time
import pandas as pd
import os
import xlrd
import autoit
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException

fileName = 'test.csv'
messages_excel = 'messages.xlsx'
# Raw string so the backslashes in the Windows path are taken literally.
driver = webdriver.Chrome(r'D:\python\chromedriver')
driver.get('https://web.whatsapp.com/')
input('after QR Code')
with open(fileName) as file:
    data = pd.read_csv(file)
    df = pd.DataFrame(data)
    msgdata = pd.read_excel(messages_excel, sheet_name=r'Sheet1')
    for index, row in df.iterrows():
        try:
            search_phone = int(row['phone'])
            search_box = driver.find_element_by_class_name('_2zCfw')
            search_box.send_keys(search_phone)
            time.sleep(2)
            search_box.send_keys(u'\ue007')
            for i in msgdata.index:
                try:
                    clipButton = driver.find_element_by_xpath('//*[@id="main"]/header/div[3]/div/div[2]/div/span')
                    clipButton.click()
                    time.sleep(1)
                    # To send Videos and Images.
                    mediaButton = driver.find_element_by_xpath(
                        '//*[@id="main"]/header/div[3]/div/div[2]/span/div/div/ul/li[1]/button')
                    mediaButton.click()
                    time.sleep(3)
                    image_path = os.getcwd() + "\\Media\\" + msgdata['photoName'][i]+'.jpg'
                    autoit.control_focus("Open", "Edit1")
                    autoit.control_set_text("Open", "Edit1", (image_path))
                    autoit.control_click("Open", "Button1")
                    time.sleep(1)
                    previewMsg = driver.find_element_by_class_name("_3u328").send_keys(u'\ue007')
                    time.sleep(3)
                    productName = str(msgdata['name'][i])
                    oldPrice = str(msgdata['oldqimat'][i])
                    newPrice = str(msgdata['newqimat'][i])
                    inventory = str(msgdata['inventory'][i])
                    msg_box = driver.find_element_by_xpath('//*[@id="main"]/footer/div[1]/div[2]/div/div[2]')
                    msg_box.send_keys("stocks")
                    msg_box.send_keys(Keys.SHIFT + '\ue007')
                    msg_box.send_keys(productName)
                    msg_box.send_keys(Keys.SHIFT + '\ue007')
                    if oldPrice != 'nan':
                        msg_box.send_keys("oldPrice : "+ oldPrice)
                        msg_box.send_keys(Keys.SHIFT + '\ue007')
                    if newPrice != 'nan':
                        msg_box.send_keys("newPrice : "+ newPrice)
                        msg_box.send_keys(Keys.SHIFT + '\ue007')
                    if inventory != 'nan':
                        msg_box.send_keys("inventory : "+ inventory)
                    time.sleep(1)
                    msg_box.send_keys(Keys.ENTER)
                    time.sleep(3)
                except NoSuchElementException:
                    continue
        except NoSuchElementException:
            continue
print("successfully Done")
when there is no result in searching number it's sending the messages to the last person that was sent to So the last person gets duplicate message
Im getting my contact numbers from a csv file So With a loop Im typing phone numbers in contact search box (WhatsApp web)
You could store the last number you contacted in a variable and check whether the current recipient matches that stored number.
A simple if/else should do the trick.
Code
last_contacted = None
for index, row in df.iterrows():
    search_phone = int(row['phone'])
    if search_phone == last_contacted:
        print("number already contacted")
        continue  # skip this row (a bare `next` would do nothing here)
    last_contacted = search_phone
    print(search_phone)
After you fill the search contact box string and send the Enter key, the “best match” contact name will be displayed at top of the right message panel.
Inspect that element and make sure it matches your search before continuing.
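The comparison itself could look roughly like the sketch below. It rests on a loud assumption: that the chat header shows the phone number (for a saved contact it may show a name instead, in which case this check does not apply). Locating the header element is left out because its selector is not given here. `digits_only` and `is_same_contact` are hypothetical helper names.

```python
import re


def digits_only(value):
    """Strip everything except digits, e.g. '+1 (555) 010-1234' -> '15550101234'."""
    return re.sub(r"\D", "", str(value))


def is_same_contact(searched, displayed):
    """True when the displayed chat title ends with the searched number's digits.

    Matching on the suffix tolerates a country code that appears only in the
    displayed title. An empty search never matches.
    """
    wanted = digits_only(searched)
    shown = digits_only(displayed)
    return bool(wanted) and shown.endswith(wanted)


print(is_same_contact(5550101234, "+1 (555) 010-1234"))
print(is_same_contact(5550101234, "+1 (555) 010-9999"))
```

In the sending loop, a False result would be the signal to skip the current row instead of typing the message into whichever chat is still open.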
I built a Selenium web scraper (see below for code). It works fine and normally takes 4-6 seconds per loop. However, if I use a different web browser to do something else, say check my email, the web scraper slows down (sometimes taking up to a couple of minutes per loop), and it also takes a long time to load my email (or whatever else I am trying to do on the internet).
Is there something wrong with my scraper? Or is it not possible to run a web scraper while also using the internet to do other things? Or...
Thanks!
counter = 36386
options = Options()
options.set_headless(True)
driver = webdriver.Firefox(options=options, executable_path = r'C:\Users\jajacobs\Downloads\geckodriver.exe')
while counter <= 50000:
    start_time = time.time()
    try:
        driver.get("url goes here")
        timeout = 20
        inputElement = driver.find_element_by_name("naics_lookup[companyName]")
        inputElement.send_keys(naics.iloc[counter, 1])
        inputElement = driver.find_element_by_name("naics_lookup[city]")
        inputElement.send_keys(naics.iloc[counter, 3])
        inputElement = driver.find_element_by_name("naics_lookup[state]")
        inputElement.send_keys(naics.iloc[counter, 2])
        inputElement.submit()
        print('Looking for NAICS code of company number ', counter)
        try:
            element_present = EC.presence_of_element_located((By.CLASS_NAME, 'results'))
            WebDriverWait(driver, timeout).until(element_present)
            print("element is ready")
            try:
                data = driver.find_element_by_class_name('results').text
                naics.at[counter, 'naics'] = re.findall(r"\D(\d{6})\D", data)[0]
                print(re.findall(r"\D(\d{6})\D", data)[0])
            except:
                print("No NAICS code")
                pass
        except:
            print("element did not load")
            pass
        list = [1000,2000,3000,4000,5000,6000,7000,8000,9000,10000,11000,12000,13000,
                14000,15000,16000,17000,18000,19000,20000,21000,22000,23000,24000,25000,
                25000,26000,27000,28000,29000,30000,31000,32000,33000,34000,35000,36000,
                37000,38000,39000,40000,41000,42000,43000,44000,45000,46000,47000,48000,
                49000,50000,]
        if counter in list:
            data_folder = Path('C:/Users/jajacobs/Documents/ipynb/')
            file_to_save = data_folder / ('naics' + str(counter) + '.csv')
            naics.to_csv(file_to_save)
        counter += 1
    except Exception as e:
        print(e)
        pass
    print("total time taken this loop: ", time.time() - start_time)
driver.close()
while var == 1:
    test_url = 'https://testurl.com'
    get_response = requests.get(url=test_url)
    parsed_json = json.loads(get_response.text)
    # chat_id is passed as a string; the literal 0815 is not a valid Python integer
    test = requests.get('https://api.telegram.org/botid/' + 'sendMessage', params=dict(chat_id='0815', text="test"))
    ausgabe = json.loads(test.text)
    print(ausgabe['result']['text'])
    time.sleep(3)
How do I add a try/except routine to this code? About once every two days I get an error on line 4, at json.loads(), and I can't reproduce it. I want the body of the while loop inside a "try:" block, with an except block that only triggers when an error occurs inside the loop. It would also be great if the while loop didn't stop on an error. How can I do this? Thank you very much for your help. (I started programming in Python just a week ago.)
If you just want to catch the error on the fourth line, wrap that line in a try/except; it will catch the error and show what happened.
while var == 1:
    test_url = 'https://testurl.com'
    get_response = requests.get(url=test_url)
    try:
        parsed_json = json.loads(get_response.text)
    except Exception as e:
        print(str(e))
        # Show the raw body that could not be parsed.
        print('error data is {}'.format(get_response.text))
    test = requests.get('https://api.telegram.org/botid/' + 'sendMessage', params=dict(chat_id='0815', text="test"))
    ausgabe = json.loads(test.text)
    print(ausgabe['result']['text'])
    time.sleep(3)
You can simply wrap the whole loop body in a try block:
while var == 1:
    try:
        test_url = 'https://testurl.com'
        get_response = requests.get(url=test_url)
        parsed_json = json.loads(get_response.text)
        test = requests.get('https://api.telegram.org/botid/' + 'sendMessage', params=dict(chat_id='0815', text="test"))
        ausgabe = json.loads(test.text)
        print(ausgabe['result']['text'])
        time.sleep(3)
    except Exception as e:
        print("an exception {} of type {} occurred".format(e, type(e).__name__))
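Either approach above can be narrowed to the one failure the question describes: a response body that is not valid JSON. The sketch below is a hedged illustration using only the standard library. `parse_or_none` is a hypothetical helper, and the hard-coded `responses` list stands in for what `requests.get(...).text` might return.

```python
import json


def parse_or_none(text):
    """Return the decoded JSON, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        # Log the start of the body so the rare failure can be diagnosed later.
        print("could not decode JSON: {}; body starts with {!r}".format(exc, text[:80]))
        return None


# Stand-in bodies (assumption for the demo): the second one is an HTML error page.
responses = [
    '{"result": {"text": "test"}}',
    "<html>502 Bad Gateway</html>",
    '{"result": {"text": "again"}}',
]

texts = []
for body in responses:
    parsed = parse_or_none(body)
    if parsed is None:
        continue  # skip this round; the loop keeps running
    texts.append(parsed["result"]["text"])

print(texts)
```

Printing the offending body is what makes an error that shows up once every two days diagnosable, since it usually reveals an HTML error page or an empty response.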
An exception occurs when my program can't find the element it's looking for. I want to log the event in the CSV, display a message that the error occurred, and continue. I have successfully logged the event in the CSV and displayed the message, but then my program jumps out of the loop and stops. How can I instruct Python to continue? Please check out my code.
sites = ['TCF00670','TCF00671','TCF00672','TCF00674','TCF00675','TCF00676','TCF00677']
with open('list4.csv','wb') as f:
    writer = csv.writer(f)
    try:
        for s in sites:
            adrs = "http://turnpikeshoes.com/shop/" + str(s)
            driver = webdriver.PhantomJS()
            driver.get(adrs)
            time.sleep(5)
            LongDsc = driver.find_element_by_class_name("productLongDescription").text
            print("Working.." + str(s))
            writer.writerows([[LongDsc]])
    except:
        writer.writerows(['Error'])
        print("Error Logged..")
        pass
    driver.quit()
    print("Complete.")
Just put the try/except block inside the loop. There is also no need for the pass statement at the end of the except block.
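# 'wb' is the Python 2 convention; on Python 3 use open('list4.csv', 'w', newline='')
with open('list4.csv','wb') as f:
    writer = csv.writer(f)
    for s in sites:
        try:
            adrs = "http://turnpikeshoes.com/shop/" + str(s)
            driver = webdriver.PhantomJS()
            driver.get(adrs)
            time.sleep(5)
            LongDsc = driver.find_element_by_class_name("productLongDescription").text
            print("Working.." + str(s))
            writer.writerows([[LongDsc]])
        except:
            # writerow with a one-item list writes a single "Error" cell;
            # writerows(['Error']) would split the word into one column per letter.
            writer.writerow(['Error'])
            print("Error Logged..")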
NOTE: It's generally bad practice to use a bare except without a specific exception class; at minimum, use except Exception: ...
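Putting both points together (try/except inside the loop, and a specific exception class), a browser-free sketch might look like this. It is written for Python 3 and makes several assumptions for the sake of a runnable demo: `log_descriptions` is a hypothetical function, a dictionary stands in for the shop pages, `LookupError` stands in for Selenium's `NoSuchElementException`, and an in-memory buffer replaces the CSV file.

```python
import csv
import io


def log_descriptions(sites, fetch, out):
    """Write one CSV row per site; a failed lookup is logged and the loop goes on."""
    writer = csv.writer(out)
    for site in sites:
        try:
            description = fetch(site)
        except LookupError:
            # Only the expected "not found" error is handled here.
            writer.writerow([site, "Error"])
            print("Error logged for", site)
            continue
        writer.writerow([site, description])


# Stand-in catalogue (assumption for the demo): TCF00671 has no description.
catalog = {"TCF00670": "Leather boot", "TCF00672": "Canvas sneaker"}
buffer = io.StringIO()
log_descriptions(["TCF00670", "TCF00671", "TCF00672"], catalog.__getitem__, buffer)
rows = buffer.getvalue().splitlines()
print(rows)
```

Writing the site code next to the error marker is a small design choice that makes it possible to tell afterwards which lookups failed.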