I decided to try writing a simple web scraper script in Python. As a small challenge, I chose to create a script that can log me into Facebook and fetch the birthdays currently displayed in the sidebar. I have managed to write a script that logs me into my Facebook account, but I have no idea how to fetch the displayed birthdays.
This is my script:
from selenium import webdriver
from time import sleep
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
usr = 'EMAIL'
pwd = 'PASSWORD'
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://www.facebook.com/')
print ("Opened facebook")
sleep(1)
username_box = driver.find_element_by_id('email')
username_box.send_keys(usr)
print ("Email Id entered")
sleep(1)
password_box = driver.find_element_by_id('pass')
password_box.send_keys(pwd)
print ("Password entered")
login_box = driver.find_element_by_id('u_0_b')
login_box.click()
print ("Login Sucessfull")
print ("Fetched needed data")
input('Press anything to quit')
driver.quit()
print("Finished")
This is my first time creating a script of this type. My assumption is that I am supposed to traverse the children of the "jsc_c_3d" div element until I get to the displayed birthdays. However, the id of this element changes every time the page is refreshed. Can anyone tell me how this is done, or whether this is the right way to go about solving this problem?
The div for the birthdays, after inspecting elements:
<div class="" id="jsc_c_3d">
<div class="j83agx80 cbu4d94t ew0dbk1b irj2b8pg">
<div class="qzhwtbm6 knvmm38d"><span class="oi732d6d ik7dh3pa d2edcug0 qv66sw1b c1et5uql
a8c37x1j muag1w35 enqfppq2 jq4qci2q a3bd9o3v knj5qynh oo9gr5id hzawbc8m" dir="auto">
<strong>Bobi Mitrevski</strong>
and
<strong>Trajce Tusev</strong> have birthdays today.</span></div></div></div>
You are correct that you would need to traverse the inner elements of jsc_c_3d to extract the birthdays you want. However, automated scraping becomes a problem when the id value is dynamic and changes on every page load. In that case, a text parser such as bs4 (BeautifulSoup) will do the job.
With the bs4 approach you simply extract the relevant div tags from the DOM and then parse them for the required contents.
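A minimal sketch of that approach, assuming the page is already loaded in the Selenium driver from your script and that the birthday text matches the markup shown in the question:
from bs4 import BeautifulSoup

soup = BeautifulSoup(driver.page_source, 'html.parser')
# Look for the span whose text mentions birthdays, regardless of the dynamic id
for span in soup.find_all('span'):
    if 'birthdays today' in span.get_text():
        # Each name sits in its own <strong> tag
        names = [strong.get_text() for strong in span.find_all('strong')]
        print('Birthdays today:', names)
        break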
More generally, this problem is solvable using the Facebook API, which could be as simple as:
import facebook
token = 'a token' # token omitted here; this is the same token I use in https://developers.facebook.com/tools/explorer/
graph = facebook.GraphAPI(token)
args = {'fields': 'birthday,name'}
friends = graph.get_object("me/friends", **args)
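The returned object is the parsed JSON response, so (assuming the token carries the friends and birthday permissions) the names and birthdays can be read from its 'data' list:
for friend in friends['data']:
    print(friend.get('name'), friend.get('birthday'))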
I am trying to write a script to automate job applications on LinkedIn using Selenium and Python.
The steps are simple:
open the LinkedIn page, enter id and password, and log in
open https://linkedin.com/jobs, enter the search keyword and location, and click search (directly opening links like https://www.linkedin.com/jobs/search/?geoId=101452733&keywords=python&location=Australia gets stuck loading, probably due to some post information missing from the previous page)
the click opens the job search page, but this doesn't seem to update the driver, as it still searches on the previous page.
import selenium
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
import pandas as pd
import yaml
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")
url = "https://linkedin.com/"
driver.get(url)
content = driver.page_source
stream = open("details.yaml", 'r')
details = yaml.safe_load(stream)
def login():
    username = driver.find_element_by_id("session_key")
    password = driver.find_element_by_id("session_password")
    username.send_keys(details["login_details"]["id"])
    password.send_keys(details["login_details"]["password"])
    driver.find_element_by_class_name("sign-in-form__submit-button").click()

def get_experience():
    return "1%C22"
login()
jobs_url = 'https://www.linkedin.com/jobs/'
driver.get(jobs_url)
keyword = driver.find_element_by_xpath("//input[starts-with(@id, 'jobs-search-box-keyword-id-ember')]")
location = driver.find_element_by_xpath("//input[starts-with(@id, 'jobs-search-box-location-id-ember')]")
keyword.send_keys("python")
location.send_keys("Australia")
driver.find_element_by_xpath("//button[normalize-space()='Search']").click()
WebDriverWait(driver, 10)
# content = driver.page_source
# soup = BeautifulSoup(content)
# with open("a.html", 'w') as a:
# a.write(str(soup))
print(driver.current_url)
driver.current_url returns https://linkedin.com/jobs/ instead of https://www.linkedin.com/jobs/search/?geoId=101452733&keywords=python&location=Australia as it should. I have tried printing the page content to a file; it is indeed from the previous jobs page and not from the search page. I have also tried to find elements from the search page, like the experience filter and the Easy Apply button, but the search results in a not-found error.
I am not sure why this isn't working.
Any ideas? Thanks in advance.
UPDATE
It works if I directly open something like https://www.linkedin.com/jobs/search/?f_AL=True&f_E=2&keywords=python&location=Australia but not https://www.linkedin.com/jobs/search/?f_AL=True&f_E=1%2C2&keywords=python&location=Australia
The difference between these links is that one takes only one value for experience level while the other takes two. This means it's probably not a post-values issue.
You are getting and printing the current URL immediately after clicking the search button, before the page has changed with the response received from the server.
That is why it outputs https://linkedin.com/jobs/ instead of something like https://www.linkedin.com/jobs/search/?geoId=101452733&keywords=python&location=Australia.
WebDriverWait(driver, 10) or wait = WebDriverWait(driver, 20) will not cause any kind of delay the way time.sleep(10) does.
wait = WebDriverWait(driver, 20) only instantiates a wait object, an instance of the WebDriverWait class; on its own it waits for nothing.
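To actually block until the navigation has happened, the wait object has to be combined with an expected condition. A minimal sketch, assuming the search lands on a /jobs/search/ URL:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 20)
# Block until the URL reflects the search results page (or time out after 20 s)
wait.until(EC.url_contains('/jobs/search'))
print(driver.current_url)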
Hi guys, I am still learning Selenium and the basics of coding, so sorry for the noob question :)
I have a list of links in a CSV.
I want to open each link, do some action, and go to the next link from the CSV list, but if the website doesn't have the expected button I just want to move on to the next link from the CSV.
For now, when the page has those elements I can loop through, but sometimes the page doesn't have the elements and instead wants to redirect me to an external site.
I have a value like this on the page:
<input type="submit" name="submit" onclick="ga('send', 'event', 'external', 'click',
'logedin');" id="submit" value="redirect" class="uk-button uk-button-primary uk-button-large">
I would like Selenium, if it finds the text "You are leaving this site" or the value "redirect" on the page, to just move on to the next link from the list.
This is the first part, which works when all pages have the button and don't want to redirect me to the external site.
One extra nicety would be if I could record in the CSV, as a new column, whether the link was skipped or not.
from selenium import webdriver
import time
import pandas as pd
import config
from webdriver_manager.chrome import ChromeDriverManager
df = pd.read_csv("listofurls.csv")
df = pd.read_csv('listofurls.csv')
urls = df['link']
for url in urls:
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://www.website.com/login?redirect=%2F')
driver.maximize_window()
usernamebox = driver.find_element_by_id("Email_login")
usernamebox.send_keys(config.email)
passbox = driver.find_element_by_id("Password_login")
passbox.send_keys(config.password)
loginbutton = driver.find_element_by_xpath('//*[#id="Password_login"]')
loginbutton.submit()
time.sleep(3)
#data = {}
driver.get(url)
cookies = driver.find_element_by_xpath('//*[#id="__allow_ct_container"]/div/div/a')
cookies.click()
cv = driver.find_element_by_css_selector("input[type='radio'][value='293608']")
cv.click()
submit = driver.find_element_by_xpath('//*[#id="__submit"]')
submit.click()
#driver.close()
print('done')
An element having id __submit and containing the text foo can be found using the following XPath:
driver.find_element_by_xpath("//*[@id='__submit' and contains(text(), 'foo')]")
Have a look at this XPATH cheatsheet.
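Applied to your case, a hedged sketch of the skip logic: check each page for the leave-site text, record the outcome, and write it back to the CSV as a new column. The variable names follow your script; the column name skipped is my own choice:
from selenium.common.exceptions import NoSuchElementException

skipped = []
for url in urls:
    driver.get(url)
    try:
        # The redirect warning is present: note it and skip this link
        driver.find_element_by_xpath("//p[contains(text(), 'You are leaving this site')]")
        skipped.append(True)
        continue
    except NoSuchElementException:
        skipped.append(False)
        # ... do the usual actions here ...

df['skipped'] = skipped
df.to_csv('listofurls.csv', index=False)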
This is my code:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
# loading the webpage
browser = webdriver.Chrome()
browser.get("https://instagram.com")
time.sleep(1)
# finding essential requirements
user_name = browser.find_element_by_name("username")
password = browser.find_element_by_name("password")
login_button = browser.find_element_by_xpath("//button [@type = 'submit']")
# filling out the user name box
user_name.click()
user_name.clear()
user_name.send_keys("username")
# filling out the password box
password.click()
password.clear()
password.send_keys("password")
# clicking on the login button
login_button.click()
time.sleep(3)
# information save permission denial
not_now_button = browser.find_element_by_xpath("//button [@class = 'sqdOP yWX7d y3zKF ']")
not_now_button.click()
time.sleep(3)
# notification permission denial
not_now_button_2 = browser.find_element_by_xpath("//button [@class = 'aOOlW HoLwm ']")
not_now_button_2.click()
time.sleep(3)
# finding search box and searching + going to the page
search_box = browser.find_element_by_xpath('//input [@placeholder="Search"]')
search_box.send_keys("sb else's page")
time.sleep(3)
search_box.send_keys(Keys.RETURN)
search_box.send_keys(Keys.RETURN)
time.sleep(3)
# opening ((followers)) list
followers = browser.find_element_by_xpath('//a [@class="-nal3 "]')
followers.click()
time.sleep(10)
# following each follower
follower = browser.find_elements_by_xpath('//button [@class="sqdOP L3NKy y3zKF "]')
browser.close()
In this code, I simulate what a normal person does to follow another person.
I want to follow every follower of a page. I have thought about it all day long but couldn't come up with an algorithm.
I got some good ideas, but just realized I don't know how to scroll down to the end of the list to get the entire list of followers. Can you help? (If you don't follow me, try running the code and then extracting the list of followers.)
# following each follower
get the list of followers
for each follower - click 'follow' if possible
if the button text hasn't changed, it means you have reached the follow limit, or maybe you are banned
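For the scrolling part, a hedged sketch: repeatedly scroll the followers dialog until no new rows load. The div.isgrP selector for the scrollable dialog is an assumption based on Instagram's markup at the time and may well have changed:
import time

dialog = browser.find_element_by_css_selector('div.isgrP')  # assumed dialog class
last_count = 0
while True:
    # Scroll the dialog (not the page) to its bottom to trigger lazy loading
    browser.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', dialog)
    time.sleep(2)
    follower_buttons = browser.find_elements_by_xpath('//button [@class="sqdOP L3NKy y3zKF "]')
    if len(follower_buttons) == last_count:
        break  # no new rows appeared, we reached the end of the list
    last_count = len(follower_buttons)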
Also, be sure to limit your actions; Instagram has a follow limit (it used to be 30 per hour).
You can also get the followers directly through the Instagram API.
And don't forget to unfollow them, because unfollowing has limits too. The cap on total follows used to be 7500 (not sure whether that still holds).
First you need to get a list of the users that follow someone; then you just execute the same code in a loop. You can scrape the users either separately or within Selenium. Then run the code needed to follow a given person in, e.g., a for loop. Step 6: profit.
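A hedged sketch of that loop, reusing the button list collected by the question's find_elements_by_xpath call; the class string and the 'Follow' label are assumptions about Instagram's markup:
import time

buttons = browser.find_elements_by_xpath('//button [@class="sqdOP L3NKy y3zKF "]')
for button in buttons:
    if button.text == 'Follow':
        button.click()
        time.sleep(2)  # pace the clicks to stay under the rate limits
        if button.text == 'Follow':
            break  # text unchanged: follow limit reached or action blocked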
I have created a screen-scraping program using Selenium which prints out a few variables. I want to take the numbers it spits out and compare them to numbers in a text document, but I am unsure how to go about this. The text file will contain 3 numbers, which will be compared to the 3 numbers that have been screen-scraped.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
#The above imports the modules needed for this code to work
chrome_path = r"C:\Users\ashabandha\Downloads\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://signin.acellus.com/SignIn/index.html")
time.sleep(2)
username = driver.find_element_by_id("Name")
password = driver.find_element_by_id("Psswrd")
username.send_keys("my login")
password.send_keys("my password")
time.sleep(2)
driver.find_element_by_xpath("""//*[@id="loginform"]/table[2]/tbody/tr/td[2]/input""").click()
#The program has now signed in and is going to navigate to the progress tab
time.sleep(2)
driver.get("https://admin252.acellus.com/StudentFunctions/progress.html?ClassID=484")
time.sleep(2)
#now we are on the progress tab
posts = driver.find_elements_by_class_name("Object7069")
time.sleep(2)
for post in posts:
    print (post.text)
#this gives me the first class log
time.sleep(2)
driver.get("https://admin252.acellus.com/StudentFunctions/progress.html?ClassID=326")
#This gives me second class log
time.sleep(2)
posts = driver.find_elements_by_class_name("Object7069")
time.sleep(2)
for post in posts:
    print (post.text)
time.sleep(2)
driver.get("https://admin252.acellus.com/StudentFunctions/progress.html?ClassID=292")
posts = driver.find_elements_by_class_name("Object7069")
time.sleep(2)
for post in posts:
    print (post.text)
Save the Selenium output in a data structure, like a list or dictionary, then open the file, extract the info you want to compare it to, and apply whatever algorithm or expression you wish: https://www.python.org/doc/
Check out working with files.
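A minimal sketch of that comparison, assuming the scraped values are gathered from the posts loop above and that numbers.txt (a file name I made up) holds one number per line:
scraped = [post.text for post in posts]

with open('numbers.txt') as f:
    expected = [line.strip() for line in f]

for got, want in zip(scraped, expected):
    print(got, '==', want, ':', got == want)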
I know this question might seem quite straightforward, but I have tried every suggestion and none has worked.
I want to build a Python script that checks my school website to see if new grades have been put up. However, I cannot for the life of me figure out how to scrape it.
The website redirects to a different page for login. I have tried all the scripts and answers I could find, but I am lost.
I use Python 3; the website is of the form https://blah.schooldomate.state.edu.country/website/grades/summary.aspx
The username section contains the following:
<input class="txt" id="username" name="username" type="text" autocomplete="off" style="cursor: auto;">
The password field is the same, except that it contains an onfocus HTML attribute.
Once successfully authenticated, I am automatically redirected to the correct page.
I have tried:
using Python 2's cookielib and Mechanize
Using HTTPBasicAuth
Passing the information as a dict to a requests.get()
Trying out many different people's code, including answers I found on this site
You can try requests:
http://docs.python-requests.org/en/master/
From the website:
import requests
r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
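Basic auth will likely not help with a form-based login like this one; here is a hedged sketch with a requests.Session instead, so the login cookie persists across the redirect. The field names come from the inputs shown in the question, while the login URL is a placeholder you would need to take from your login page's form action:
import requests

with requests.Session() as s:
    # Placeholder login URL: replace with the form's actual action target
    s.post('https://blah.schooldomate.state.edu.country/website/login.aspx',
           data={'username': 'YOUR_USERNAME', 'password': 'YOUR_PASSWORD'})
    r = s.get('https://blah.schooldomate.state.edu.country/website/grades/summary.aspx')
    print(r.text)
Note that ASP.NET login pages usually require hidden fields such as __VIEWSTATE to be posted along with the credentials; those would have to be scraped from the login page first.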
Maybe you can use the Selenium library.
I'll leave you my code example:
from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def login():
    browser = webdriver.Firefox()
    browser.get("https://www.your_url.com")
    # Edit the XPATH of the login INPUT username
    xpath_username = "//input[@class='username']"
    # Edit the XPATH of the login INPUT password
    xpath_password = "//input[@class='password']"
    # This will write YOUR_USERNAME / YOUR_PASSWORD into the xpath (custom function)
    click_xpath(browser, xpath_username, "YOUR_USERNAME")
    click_xpath(browser, xpath_password, "YOUR_PASSWORD")
    # THEN SCRAPE WHAT YOU NEED

# Here is the custom function
# If NO input is given, it will only click on the element (on a button, for example)
def click_xpath(browser, xpath, input="", time_wait=10):
    try:
        browser.implicitly_wait(time_wait)
        wait = WebDriverWait(browser, time_wait)
        search = wait.until(EC.element_to_be_clickable((By.XPATH, xpath)))
        search.click()
        sleep(1)
        # Write in the element
        if input:
            search.send_keys(str(input) + Keys.RETURN)
        return search
    except Exception:
        # print("ERROR-click_xpath: " + xpath)
        return False