If Statement in Python while Scraping

I want to add an if statement to this script: if neither website_link nor phone is found, it should write N,N.
At the moment the script writes N if the website link is not found, and N if the phone is not found.
I also want to add this: if both the website and the phone are missing, write N,N.
Here is my code:
from selenium import webdriver
import csv
import pandas
import itertools
with open("sans.csv",'r') as s:
    s.read()
driver = webdriver.Firefox()
url = 'https://www.yelp.com/biz/konomama-san-francisco?osq=Restaurants'
driver.get(url)
website_link = driver.find_elements_by_css_selector('.text--offscreen__373c0__1SeFX+ .link-size--default__373c0__1skgq')
phone = driver.find_elements_by_css_selector('.text--offscreen__373c0__1SeFX+ .text-align--left__373c0__2pnx_')
items = len(website_link)
with open("sans.csv", 'a',encoding="utf-8") as s:
    for combination in itertools.zip_longest(website_link, phone):
        s.write(f'{combination[0].text if combination[0] else "N"}, {combination[1].text if combination[1] else "N"}\n')
driver.close()
print("Done")
Thanks!

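Adding else to the for loop will not do this on its own: the else block runs whenever the loop finishes without a break, so it would also write N, N after a normal run. zip_longest yields nothing when website_link and phone are both empty, so check for that case explicitly before the loop:
with open("sans.csv", 'a', encoding="utf-8") as s:
    if not website_link and not phone:
        # Neither selector matched anything, so the loop below writes no row
        s.write('N, N\n')
    for combination in itertools.zip_longest(website_link, phone):
        s.write(f'{combination[0].text if combination[0] else "N"}, {combination[1].text if combination[1] else "N"}\n')
For more info about the for-else structure see this answer.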

Maybe the asker wants to keep the iteration and then check whether website_link and phone are both empty. Note that find_elements_by_css_selector returns an empty list, not None, when nothing matches:
with open("sans.csv", 'a',encoding="utf-8") as s:
    for combination in itertools.zip_longest(website_link, phone):
        s.write(f'{combination[0].text if combination[0] else "N"}, {combination[1].text if combination[1] else "N"}\n')
    if not website_link and not phone:
        s.write('N,N\n')
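The case that needs its own branch is the one where both lists are empty, because zip_longest then produces no rows at all. Here is a minimal sketch of that idea, with plain strings standing in for the .text of the Selenium elements. The helper names build_rows and format_rows are hypothetical and not part of the original script:

```python
import itertools

def build_rows(website_links, phones, placeholder="N"):
    """Pair up websites and phones, substituting a placeholder for missing values."""
    if not website_links and not phones:
        # Nothing was scraped at all: emit a single placeholder row.
        return [(placeholder, placeholder)]
    # fillvalue pads the shorter list, so no per-item "if ... else" is needed.
    return list(itertools.zip_longest(website_links, phones, fillvalue=placeholder))

def format_rows(rows):
    """Render the rows as comma-separated lines, one per row."""
    return "".join(f"{site}, {phone}\n" for site, phone in rows)
```

In the script this could be used as s.write(format_rows(build_rows([e.text for e in website_link], [e.text for e in phone]))).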

Related

looping through several columns and rows from csv to fill a form

I have been trying to emulate examples posted earlier, yet got stuck.
I have a simple web form: Last name, name, email, password, confirm password.
I also have a .csv with 4 columns that correspond to the form:
   Last name  Name   Email            Password
0  Brown      Stan   brown@stan.com   12345678
1  White      Eagle  white@eagle.com  123456789
2  Dante      Aligr  adant@mail.au    98765432
So, all I want is to feed the 3 entries to the form and click "Sent" after each entry.
I copied code from here that seemed to pass, but I keep getting this:
  File "C:\Users\untitled2.py", line 43, in <module>
    last.send_keys(lname)
AttributeError: 'list' object has no attribute 'send_keys'
The code I tried:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
options = Options()
options.binary_location = FirefoxBinary(r"C:\Program Files\Mozilla Firefox\firefox.exe")
driver = webdriver.Firefox(executable_path=r'C:\WebDriver\bin\geckodriver.exe', firefox_options=options)
import time
import pandas as pd
reg = pd.read_csv('Form.csv', header=0, delimiter=';', sep=r'\s*;\s*')
print(reg)
lname = reg['Last name'].tolist()
name = reg['Name'].tolist()
mail = reg['Email'].tolist()
password = reg['Password'].tolist()
password_c = reg['Password'].tolist()
driver.get('url')
first = driver.find_elements_by_xpath("/html/body/div/div[3]/form/div[2]/div/div[2]/div[2]/div/div/div[2]/div/div[1]/div/div[1]/input")
last = driver.find_elements_by_xpath("/html/body/div/div[3]/form/div[2]/div/div[2]/div[1]/div/div/div[2]/div/div[1]/div/div[1]/input")
mail = driver.find_elements_by_xpath("/html/body/div/div[3]/form/div[2]/div/div[2]/div[3]/div/div/div[2]/div/div[1]/div/div[1]/input")
password = driver.find_elements_by_xpath("/html/body/div/div[3]/form/div[2]/div/div[2]/div[4]/div/div/div[2]/div/div[1]/div/div[1]/input")
password_confirm = driver.find_elements_by_xpath("/html/body/div/div[3]/form/div[2]/div/div[2]/div[4]/div/div/div[2]/div/div[1]/div/div[1]/input")
submit = driver.find_elements_by_xpath('/html/body/div/div[3]/form/div[2]/div/div[3]/div[1]/div/div/span/span')
for lname, name, mail, password, password_c in zip(lname, name, mail, password, password_c):
    last.send_keys(lname)
    time.sleep(1)
    first.send_keys(name)
    time.sleep(1)
    mail.send_keys(mail)
    time.sleep(1)
    password.send_keys(password)
    time.sleep(1)
    password_confirm.send_keys(password_c)
    time.sleep(1)
    submit.click()
    time.sleep(3)
Any nudge in the right direction will be highly appreciated, since I have seen plenty of examples of using lists with send_keys().
Thanks!

The error message indicates that you are calling send_keys() on a plain Python list.
According to the Selenium docs, find_elements_by_xpath does indeed return a list.
It's possible that you meant to use find_element_by_xpath (without the 's' after element), which returns a single element?
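To illustrate the difference, here is a rough sketch of the fill loop once each locator returns a single element. fill_form and FakeElement are hypothetical names introduced for this example. FakeElement merely stands in for what find_element_by_xpath returns, so the loop can be run without a browser:

```python
class FakeElement:
    """Stand-in for a Selenium WebElement; records what it receives."""

    def __init__(self):
        self.typed = []
        self.clicks = 0

    def send_keys(self, value):
        self.typed.append(value)

    def click(self):
        self.clicks += 1


def fill_form(fields, submit, rows):
    """Type each row into the fields (same order as the columns), then submit."""
    for row in rows:
        for element, value in zip(fields, row):
            element.send_keys(str(value))
        submit.click()


# With Selenium these would come from find_element_by_xpath (singular).
last, first, mail, submit = FakeElement(), FakeElement(), FakeElement(), FakeElement()
rows = [("Brown", "Stan", "brown@stan.com"), ("White", "Eagle", "white@eagle.com")]
fill_form([last, first, mail], submit, rows)
```

Note also that the code in the question reuses the names mail and password for both the CSV columns and the form elements, so the later assignment overwrites the earlier one. Keeping the row data and the elements under distinct names, as in the sketch, avoids that.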

Anyhow, the execution part should look like this (mail_ is assumed to be the e-mail column from the CSV, renamed so that it does not clash with the mail elements):
for lnames in lname:
    count = 0
for names in name:
    count_name = 0
for mails in mail_:
    count_mail = 0
for value in last:
    value.send_keys(lname[count])
    count +=1
for x in mail:
    x.send_keys(mail_[count_mail])
    count_mail +=1
for y in first:
    y.send_keys(name[count_name])
    count_name +=1
submit = driver.find_element_by_xpath('//*[@id="mG61Hd"]/div[2]/div/div[3]/div[1]/div/div/span/span')
submit.click()

Python Selenium accepting the "Before you continue" form on google maps in GoogleChrome

Yesterday I started to change my code a little.
One part turned out to be very tricky.
Its only purpose is to click the continue button so that the next click on the car symbol does not raise an error. As I learned, the error is raised because the form is in front of the button. (Apparently it sometimes works nevertheless.)
This code snippet worked perfectly, but changing it resulted in some errors I didn't see coming.
The code should work on its own if anyone wants to test it.
The variables *_strasse (street name, '+' separated),
*_hausnummer (house number),
*_plz (post code) and
*_stadt (city) worked for me.
try:
    #Getting the HTML
    link = f"https://www.google.de/maps/dir/{start_strasse}+{start_hausnummer},+{start_plz}+{start_stadt},+Deutschland/{end_strasse}+{end_hausnummer},+D-{end_plz}+{end_stadt},+Deutschland"
    driver.get(link)
    driver.implicitly_wait(6)
    #Wait for the button to appear
    try:
        elem = driver.find_element_by_xpath('//*[@id="introAgreeButton"]').click()
    except :
        continue
    #Find the button for routes by car
    list(filter(lambda x: x.get_attribute('jstcache') == '508', driver.find_elements_by_tag_name('button')))[0].click()
    #Parse the fastest times
    times = [ re.findall(r'[0-9]+', x.text[:-3]) for x in list(filter(lambda x: x.get_attribute('jstcache') == '265', driver.find_elements_by_tag_name('span')))]
    print('Elemente:', times )
    #Select the fastest time
    length = times[0][0]
    #Convert hh:mm format to minutes
    length = int(length[0]) if len(length) == 1 else 60*int(length[0]) + int(length[1])
    print(length)
except:
    #used for debugging
    #print(str(runde) + f'/{len(adresses.keys())}', link)
    raise
My new code looks like this, but it is not able to find the continue button by XPath.
try:
    link = f"https://www.google.de/maps/dir/{start_strasse}+{start_hausnummer},+{start_plz}+{start_stadt},+Deutschland/{end_strasse}+{end_hausnummer},+D-{end_plz}+{end_stadt},+Deutschland"
    driver.get(link)
    driver.implicitly_wait(6)
    elem = driver.find_element_by_xpath('//*[@id="introAgreeButton"]').click()
    #element = driver.find_element_by_css('div[class*="U26fgb"]')
    #driver.execute_script("arguments[0].click();", element)
    #list(filter(lambda x: x.get_attribute('jstcache') == '508', driver.find_elements_by_tag_name('button')))[0].click()
    driver.find_element_by_xpath("//button[@jstcache='508']").click()
    #times = [ re.findall(r'[0-9]+', x.text[:-3]) for x in list(filter(lambda x: x.get_attribute('jstcache') == '265', driver.find_elements_by_tag_name('span')))]
    #print('Elemente:', times )
    #length = re.findall(r'[0-9]+', list(filter(lambda x: x.get_attribute('jstcache') == '265', driver.find_elements_by_tag_name('span')))[0].text[:-3])
    #length = times[0][0]
    elem = driver.find_element_by_xpath("//*[@jstcache='265']")
    zeit = re.findall(r'[0-9]+', elem.text[:-3])
    length = int(zeit[0]) if len(zeit) == 1 else 60*int(zeit[0]) + int(zeit[1])
    verbindung = ','.join([start, end, str(length)]) + '\n'
    datei.write(verbindung)
    print(verbindung)
except:
    print(link)
    raise
I tested countless ideas from the web, including invoking JavaScript functions instead of clicking, and switching to an iframe, to the active element or to the active alert.
I think one of the ideas was actually right, but I implemented it wrong.
Apparently implicit waits worked much better for me than any explicit "wait until", which I found strange.
So I thought that the problem could already be somewhere in my code.
I appreciate all help and comments!
It's my first post here, so if I missed something please tell me!
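Independent of the button problem, the time conversion in the new code can be pulled out into a small function and checked without a browser. This is only a sketch: it assumes the route text looks like "25 Min." or "1 Std. 5 Min." (the exact wording Google Maps shows is an assumption here), and to_minutes is a hypothetical helper name:

```python
import re

def to_minutes(text):
    """Convert a duration such as '25 Min.' or '1 Std. 5 Min.' to minutes."""
    numbers = re.findall(r'[0-9]+', text)
    if not numbers:
        raise ValueError(f"no duration found in {text!r}")
    if len(numbers) == 1:
        # A single number is treated as minutes, as in the original expression.
        return int(numbers[0])
    # Two numbers are read as hours followed by minutes.
    return 60 * int(numbers[0]) + int(numbers[1])
```

Like the original expression, this treats a single number as minutes, so a text that contains hours only would be misread.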

Selenium Python driver.find.elements get attribute

def n_seguidores(self, username):
    driver = self.driver
    driver.get('https://www.instagram.com/'+ username +'/')
    time.sleep(3)
    user_botao = driver.find_elements_by_class_name('g47SY ')
    print_us = user_botao.get_attribute('title')
    print(print_us)
Please help me get the number of "following" from the HTML.

.find_elements_* returns a list, so you need to access the element by index.
There are 3 numbers with the same class name on the page, and the number of following that you mean is the third.
To get the number, use .text rather than .get_attribute('title').
Try the following code:
user_botao = driver.find_elements_by_class_name('g47SY ')
# index 2 is the third element
print_us = user_botao[2].text
print(print_us)
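Because the list can be shorter than expected if the page layout changes, a guarded lookup avoids an IndexError. A small sketch: text_at is a hypothetical helper, SimpleNamespace objects stand in for the elements returned by find_elements_by_class_name, and the order of the three counters is assumed:

```python
from types import SimpleNamespace

def text_at(elements, index, default=None):
    """Return the .text of elements[index], or default if the list is too short."""
    if 0 <= index < len(elements):
        return elements[index].text
    return default

# Stand-ins for the three counters with the same class name (order assumed).
counters = [SimpleNamespace(text="12"), SimpleNamespace(text="340"), SimpleNamespace(text="56")]
following = text_at(counters, 2)
missing = text_at(counters[:1], 2, default="N/A")
```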

Using Python and Selenium, How would I bypass 2 factor authentication?

This is my code:
from selenium import webdriver
import time
x = 0
#while x < 5:
browser = webdriver.Chrome('/Users/John Smith/Downloads/chromedriver.exe')
browser.get('https://kdp.amazon.com/en_US/')
x = 0
def Sign():
    browser.find_element_by_id('signinButton-announce')
    elem = browser.find_element_by_id('signinButton-announce')
    elem.click()
    email = browser.find_element_by_id('ap_email')
    email.send_keys(' ')
    password = browser.find_element_by_id('ap_password')
    password.send_keys(' ')
    SignIn = browser.find_element_by_id('signInSubmit')
    SignIn.click()
def Sign1():
    browser.find_element_by_id('signinButton-announce')
    elem = browser.find_element_by_id('signinButton-announce')
    elem.click()
    email = browser.find_element_by_id('ap_email')
    email.send_keys(' ')
    password = browser.find_element_by_id('ap_password')
    password.send_keys(' ')
    SignIn = browser.find_element_by_id('signInSubmit')
    SignIn.click()
browser.execute_script("window.open('https://kdp.amazon.com/en_US/', 'new window')")
Sign()
time.sleep(20)
browser.execute_script("window.open('https://kdp.amazon.com/en_US/', 'new window')")
if (browser.current_url == 'https://kdp.amazon.com/en_US/'):
    Sign1()
x += 1
I tried to solve the problem by opening another tab, hoping that if I entered the authentication code once, the site would stop asking for it afterwards. I can't find a way to solve this. Is there one?
I've heard of using an existing Chrome profile, but I could not find a way to actually run any commands with it. I got the Chrome profile to load, but the browser would not go to the URL I needed.

Parsing with placeholders

I am trying to scrape all the different variations of this webpage. For instance, the code that scrapes this webpage
http://www.virginiaequestrian.com/main.cfm?action=greenpages&sub=view&ID=11849
should be the same as the code I use to scrape this webpage
http://www.virginiaequestrian.com/main.cfm?action=greenpages&sub=view&ID=11849
def extract_contact(url):
    r=requests.get(url)
    soup=BeautifulSoup(r.content,'lxml')
    tbl=soup.findAll('table')[2]
    list=[]
    Contact=tbl.findAll('p')[0]
    for br in Contact.findAll('br'):
        next = br.nextSibling
        if not (next and isinstance(next,NavigableString)):
            continue
        next2 = next.nextSibling
        if next2 and isinstance(next2,Tag) and next2.name == 'br':
            text = re.sub(r'[\n\r\t\xa0]','',next).replace('Phone:','').strip()
            list.append(text)
    print list
    #Street=list.pop(0)
    #CityStateZip=list.pop(0)
    #Phone=list.pop(0)
    #City,StateZip= CityStateZip.split(',')
    #State,Zip= StateZip.split(' ')
    #ContactName = Contact.findAll('b')[1]
    #ContactEmail = Contact.findAll('a')[1]
    #Body=tbl.findAll('p')[1]
    #Website = Contact.findAll('a')[2]
    #Email = ContactEmail.text.strip()
    #ContactName = ContactName.text.strip()
    #Website = Website.text.strip()
    #Body = Body.text
    #Body = re.sub(r'[\n\r\t\xa0]','',Body).strip()
    #list.extend([Street,City,State,Zip,ContactName,Phone,Email,Website,Body])
    return list
I believe that for the code to work, print list has to return the same number of values, in the same order, for every page. Currently, the above script returns these values:
[u'2133 Craigs Store Road', u'Afton,VA 22920', u'434-882-3150']
[u'Alexandria,VA 22305']
To account for missing values and parse the pages consistently, I need print list to return something similar to this:
[u'2133 Craigs Store Road', u'Afton,VA 22920', u'434-882-3150']
['',u'Alexandria,VA 22305','']
This way I will be able to handle the values by position, as they will be in a consistent order. The problem is that I don't know how to accomplish this, as I am still very new to parsing. If anybody has any insight into how to solve the problem, I would be highly appreciative.

def extract_contact(url):
    r=requests.get(url)
    soup=BeautifulSoup(r.content,'lxml')
    tbl=soup.findAll('table')[2]
    list=[]
    Contact=tbl.findAll('p')[0]
    for br in Contact.findAll('br'):
        next = br.nextSibling
        if not (next and isinstance(next,NavigableString)):
            continue
        next2 = next.nextSibling
        if next2 and isinstance(next2,Tag) and next2.name == 'br':
            text = re.sub(r'[\n\r\t\xa0]','',next).replace('Phone:','').strip()
            list.append(text)
    Street=[s for s in list if ',' not in s and '-' not in s]
    CityStateZip=[s for s in list if ',' in s]
    Phone = [s for s in list if '-' in s]
    if not Street:
        Street=''
    else:
        Street=Street[0]
    if not CityStateZip:
        CityStateZip=''
    else:
        City,StateZip= CityStateZip[0].split(',')
        State,Zip= StateZip.split(' ')
    if not Phone:
        Phone=''
    else:
        Phone=Phone[0]
    list=[]
I figured out an alternative solution using substring checks and if statements. Since there are at most 3 values in the list, each with a defining characteristic, I realized that I could classify them by looking for special characters rather than relying on the position of the record.
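The same idea can be written as a function that always returns three slots in a fixed order, which is what the question asked for. A minimal sketch under the same heuristics (a comma marks the city line, a hyphen marks the phone number); normalize_contact is a hypothetical name:

```python
def normalize_contact(values):
    """Return [street, city_state_zip, phone], using '' for missing entries."""
    street = city_state_zip = phone = ''
    for value in values:
        if ',' in value:
            # e.g. 'Afton,VA 22920'
            city_state_zip = value
        elif '-' in value:
            # e.g. '434-882-3150'
            phone = value
        else:
            street = value
    return [street, city_state_zip, phone]
```

These heuristics are only as good as the data: a street address that contains a hyphen would be taken for a phone number, so the checks may need tightening for other pages.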
