selenium facebook comment reply - python

I'm trying to reply to Facebook comments using Selenium and Python.
I've been able to select the field using
find_elements_by_css_selector(".UFIAddCommentInput")
but I can't post text using the send_keys method. Here's a simplified structure of the comment HTML on Facebook:
<div><input tabindex="-1" name="add_comment_text">
<div class="UFIAddCommentInput _1osb _5yk1"><div class="_5yk2">
<div class="_5yw9"><div class="_5ywb">
<div class="_3br6">Write a comment...</div></div>
<div class="_5ywa">
<div title="Write a comment..." role="combobox"
class="_54z"contenteditable="true">
<div data-contents="true">
<div class="_209g _2vxa">

It works perfectly fine. The only catch: clear the div every time before you begin to type a new comment.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
f = webdriver.Firefox()
f.get("https://facebook.com")
# Skipping the login and navigation to a particular post.
# The selector used below corresponds to the id of the post and
# the class of the div that contains the editable text part
e = f.find_element_by_css_selector("div#u_jsonp_8_q div._54-z")
e.send_keys("bob")
e.send_keys(Keys.ENTER)
e.clear()
e.send_keys("kevin")
e.send_keys(Keys.ENTER)
e.clear()

Related

How to target a specific div class within a specific parent div class, python selenium

I have two divs that look like this:
<div id="RightColumn">
<div class="profile-info">
<div class= "info">
</div>
<div class="title">
</div>
</div>
</div>
How do I target the internal div labelled "title"? It appears multiple times on the page but the one that I need to target is within "RightColumn".
Here is the code I tried:
mainDIV = driver.find_element_by_id("RightColumn")
targetDIV = mainDIV.find_element_by_xpath('//*[@class="title"]').text
Unfortunately the above code still pulls all title divs on the page vs the one I need within the mainDiv.
//div[@id='RightColumn']//child::div[@class='title']
This should get the job done:
first use the id RightColumn to target the outer div, then select the title-class div as its descendant.
This will select the first title div under this element:
mainDIV.find_element_by_xpath('.//div[@class="title"]')
However, this will select the first title div on the page:
mainDIV.find_element_by_xpath('//div[@class="title"]')
Try:
targetDIV = mainDIV.find_element_by_xpath('.//div[@class="title"]').text
Note: as of Selenium 4.0.0, the find_element_by_* methods are deprecated and should be replaced with find_element() (with By imported from selenium.webdriver.common.by):
targetDIV = mainDIV.find_element(By.XPATH, './/div[@class="title"]').text
Reference:
WebDriver API - find_element_by_xpath
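The scoping difference (leading dot vs. no dot) can be demonstrated offline with lxml, which implements the same XPath 1.0 semantics Selenium relies on. The HTML below is the snippet from the question, with hypothetical text added so the results are visible:

```python
from lxml import etree

# HTML from the question, with illustrative text added.
html = """<html><body>
<div id="RightColumn">
  <div class="profile-info">
    <div class="info"></div>
    <div class="title">inside RightColumn</div>
  </div>
</div>
<div class="title">outside RightColumn</div>
</body></html>"""

tree = etree.HTML(html)
main_div = tree.xpath("//div[@id='RightColumn']")[0]

# Leading dot: relative to main_div, like mainDIV.find_element(By.XPATH, './/...')
scoped = main_div.xpath(".//div[@class='title']")
# No leading dot: searches the whole document regardless of main_div
unscoped = main_div.xpath("//div[@class='title']")

print(scoped[0].text)   # inside RightColumn
print(len(unscoped))    # 2
```

The same trap exists in Selenium: an XPath without the leading dot searches the whole document even when called on a WebElement.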

How to scrape elements in Selenium/Python by calling different css selectors at the same time?

I am trying to select the title of posts that are loaded in a webpage by integrating multiple css selectors. See below my process:
Load relevant libraries
import time
from selenium import webdriver
from webdriver_manager.firefox import GeckoDriverManager
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
Then load the content I wish to analyse
options = Options()
options.set_preference("dom.push.enabled", False)
browser = webdriver.Firefox(options=options)
browser.get("https://medium.com/search")
browser.find_element_by_xpath("//input[@type='search']").send_keys("international development", Keys.ENTER)
time.sleep(5)
scrolls = 2
while True:
    scrolls -= 1
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(5)
    if scrolls < 0:
        break
Then to get the content for each selector separately, call for css_selector
titles = browser.find_elements_by_css_selector("h3[class^='graf']")
TitlesList = []
for names in titles:
    TitlesList.append(names.text)
times = browser.find_elements_by_css_selector("time[datetime^='2016']")
Times = []
for names in times:
    Times.append(names.text)
It all works so far... Now I am trying to bring them together, with the aim of identifying only the choices from 2016:
choices = browser.find_elements_by_css_selector("time[datetime^='2016'] and h3[class^='graf']")
browser.quit()
On this last snippet, I always get an empty list.
So I wonder: 1) how can I select elements by treating different CSS selectors as simultaneous conditions; 2) whether the syntax for combining conditions is the same when mixing approaches such as CSS selectors and XPaths; and 3) whether there is a way to get the text of elements identified by multiple CSS selectors, along the lines of:
[pair.text for pair in browser.find_elements_by_css_selector("h3[class^='graf']") if pair.text]
Thanks
Firstly, I think what you're trying to do is get any title whose post time is in 2016, right?
You're using the CSS selector "time[datetime^='2016'] and h3[class^='graf']", but this will not work because the syntax is invalid (and is not a CSS operator). Plus, these are two different elements: a single CSS selector cannot express a condition across two unrelated elements. In your case, to add a condition from another element, go through a common element, such as a shared parent.
I've checked the site; here's the HTML you need to look at (if you're trying to get the titles published in 2016). This is the minimal HTML that identifies what you need:
<div class="postArticle postArticle--short js-postArticle js-trackPostPresentation" data-post-id="d17220aecaa8"
data-source="search_post---------2">
<div class="u-clearfix u-marginBottom15 u-paddingTop5">
<div class="postMetaInline u-floatLeft u-sm-maxWidthFullWidth">
<div class="u-flexCenter">
<div class="postMetaInline postMetaInline-authorLockup ui-captionStrong u-flex1 u-noWrapWithEllipsis">
<div
class="ui-caption u-fontSize12 u-baseColor--textNormal u-textColorNormal js-postMetaInlineSupplemental">
<a class="link link--darken"
href="https://provocations.darkmatterlabs.org/reimagining-international-development-for-the-21st-century-d17220aecaa8?source=search_post---------2"
data-action="open-post"
data-action-value="https://provocations.darkmatterlabs.org/reimagining-international-development-for-the-21st-century-d17220aecaa8?source=search_post---------2"
data-action-source="preview-listing">
<time datetime="2016-09-05T13:55:05.811Z">Sep 5, 2016</time>
</a>
</div>
</div>
</div>
</div>
</div>
<div class="postArticle-content">
<a href="https://provocations.darkmatterlabs.org/reimagining-international-development-for-the-21st-century-d17220aecaa8?source=search_post---------2"
data-action="open-post" data-action-source="search_post---------2"
data-action-value="https://provocations.darkmatterlabs.org/reimagining-international-development-for-the-21st-century-d17220aecaa8?source=search_post---------2"
data-action-index="2" data-post-id="d17220aecaa8">
<section class="section section--body section--first section--last">
<div class="section-divider">
<hr class="section-divider">
</div>
<div class="section-content">
<div class="section-inner sectionLayout--insetColumn">
<h3 name="5910" id="5910" class="graf graf--h3 graf--leading graf--title">Reimagining
International Development for the 21st Century.</h3>
</div>
</div>
</section>
</a>
</div>
</div>
Both time and h3 sit inside a big div with a class of postArticle. The article contains the time published and the title, so it makes sense to get the whole article div that was published in 2016, right?
Using XPath is much more powerful and easier to write here:
This will get all article divs whose class contains postArticle--short: article_xpath = '//div[contains(@class, "postArticle--short")]'
This will get all time tags whose datetime attribute contains 2016: //time[contains(@datetime, "2016")]
Let's combine both of them: I want the article divs that contain a time tag whose datetime is in 2016:
article_2016_xpath = '//div[contains(@class, "postArticle--short")][.//time[contains(@datetime, "2016")]]'
article_element_list = driver.find_elements_by_xpath(article_2016_xpath)
# now let's get the title
for article in article_element_list:
    title = article.find_element_by_tag_name("h3").text
I haven't tested the code yet, only the xpath. You might need to adapt the code to work on your side.
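For what it's worth, the XPath itself can be sanity-checked offline with lxml (which shares Selenium's XPath 1.0 semantics) against a stripped-down version of the Medium HTML above; the second article below is a hypothetical 2019 post added for contrast:

```python
from lxml import etree

# Minimal stand-in for the Medium search results; the 2019 entry is invented.
html = """<html><body>
<div class="postArticle postArticle--short">
  <time datetime="2016-09-05T13:55:05.811Z">Sep 5, 2016</time>
  <h3 class="graf graf--h3 graf--title">Reimagining International Development for the 21st Century.</h3>
</div>
<div class="postArticle postArticle--short">
  <time datetime="2019-01-01T00:00:00.000Z">Jan 1, 2019</time>
  <h3 class="graf graf--h3 graf--title">Some 2019 post</h3>
</div>
</body></html>"""

article_2016_xpath = '//div[contains(@class, "postArticle--short")][.//time[contains(@datetime, "2016")]]'

tree = etree.HTML(html)
articles = tree.xpath(article_2016_xpath)
titles = [a.xpath(".//h3")[0].text for a in articles]
print(titles)  # ['Reimagining International Development for the 21st Century.']
```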
By the way, reaching for find_element... immediately is not a good idea; try using an explicit wait instead: https://selenium-python.readthedocs.io/waits.html
This helps you avoid brittle time.sleep() waits, improves your app's performance, and lets you handle errors cleanly.
Only use find_element... when you have already located an element and need to find a child inside it. For example, in this case I would find the articles with an explicit wait, and once each element is located, use find_element... to get the child h3.

Taking a certain part of the page with selenium

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver import ActionChains
import selenium.webdriver.common.keys
from bs4 import BeautifulSoup
import requests
import time
driver = webdriver.Chrome(executable_path="../drivers/chromedriver.exe")
driver.get("https://www.Here the address of the relevant website ends with aspx.com.aspx")
element=driver.find_element_by_id("ctl00_ContentPlaceHolder1_LB_SEKTOR")
drp=Select(element)
drp.select_by_index(0)
element1=driver.find_element_by_id("ctl00_ContentPlaceHolder1_Lb_Oran")
drp=Select(element1)
drp.select_by_index(41)
element2=driver.find_element_by_id("ctl00_ContentPlaceHolder1_LB_DONEM")
drp=Select(element2)
drp.select_by_index(1)
driver.find_element_by_id("ctl00_ContentPlaceHolder1_ImageButton1").click()
time.sleep(1)
print(driver.page_source)
With the last part of this code I can print the page source. But within that source I only need the following table part, which is generated by JavaScript. How can I extract this section and output the table as CSV?
Note: in the Selenium test I thought of pressing Ctrl+U while in Chrome, but I was not successful with this. The web page is interactive; some interactions are required to get the data I want, which is why I used Selenium.
<span id="ctl00_ContentPlaceHolder1_Label2" class="Georgia_10pt_Red"></span>
<div id="ctl00_ContentPlaceHolder1_Divtable">
<div id="table">
<layer name="table" top="0"><IMG height="2" src="../images/spacer.gif" width="2"><br>
<font face="arial" color="#000000" size="2"><b>Tablo Yükleniyor. Lütfen Bekleyiniz...</b></font><br>
</layer>
</div>
</div>
<script language=JavaScript> var theHlp='/yardim/matris.asp';var theTitle = 'Piya Deg';var theCaption='OtomoT (TL)';var lastmod = '';var h='<a class=hislink href=../Hisse/Hisealiz.aspx?HNO=';var e='<a class=hislink href=../endeks/endeksAnaliz.aspx?HNO=';var d='<center><font face=symbol size=1 color=#FF0000><b>ß</b></font></center>';var u='<center><font face=symbol size=1 color=#008000><b>İ</b></font></center>';var n='<center><font face=symbol size=1 color=#00A000><b>=</b></font></center>';var fr='<font color=#FF0000>';var fg='<font color=#008000>';var theFooter=new Array();var theCols = new Array();theCols[0] = new Array('cksart',4,50);theCols[1] = new Array('2018.12',1,60);theCols[2] = new Array('2019.03',1,60);theCols[3] = new Array('2019.06',1,60);theCols[4] = new Array('2019.09',1,60);theCols[5] = new Array('2019.12',1,60);theCols[6] = new Array('2020.03',1,60);var theRows = new Array();theRows[0] = new Array ('<b>'+h+'42>AHRT</B></a>','519,120,000.00','590,520,000.00','597,240,000.00','789,600,000.00','1,022,280,000.00','710,640,000.00');
theRows[1] = new Array ('<b>'+h+'427>SEEL</B></a>','954,800,000.00','983,400,000.00','1,201,200,000.00','1,716,000,000.00','2,094,400,000.00','-');
theRows[2] = new Array ('<b>'+h+'140>TOFO</B></a>','17,545,500,000.00','17,117,389,800.00','21,931,875,000.00','20,844,054,000.00','24,861,973,500.00','17,292,844,800.00');
theRows[3] = new Array ('<b>'+h+'183>MSO</B></a>','768,000,000.00','900,000,000.00','732,000,000.00','696,000,000.00','1,422,000,000.00','1,134,000,000.00');
theRows[4] = new Array ('<b>'+h+'237>KURT</B></a>','2,118,000,000.00','2,517,600,000.00','2,736,000,000.00','3,240,000,000.00','3,816,000,000.00','2,488,800,000.00');
theRows[5] = new Array ('<b>'+h+'668>GRTY</B></a>','517,500,000.00','500,250,000.00','445,050,000.00','552,000,000.00','737,150,000.00','-');
theRows[6] = new Array ('<b>'+h+'291>MEME</B></a>','8,450,000,000.00','8,555,000,000.00','9,650,000,000.00','10,140,000,000.00','13,430,000,000.00','8,225,000,000.00');
theRows[7] = new Array ('<b>'+h+'292>AMMI</B></a>','-','-','-','-','-','-');
theRows[8] = new Array ('<b>'+h+'426>GOTE</B></a>','1,862,578,100.00','1,638,428,300.00','1,689,662,540.00','2,307,675,560.00','2,956,642,600.00','2,121,951,440.00');
var thetable=new mytable();thetable.tableWidth=650;thetable.shownum=false;thetable.controlaccess=true;thetable.visCols=new Array(true,true,true,true,true);thetable.initsort=new Array(0,-1);thetable.inittable();thetable.refreshTable();</script></form>
<div style="clear: both; margin-top: 10px;">
<div style="background-color: Red; border: 2px solid Green; display: none">
TABLO-ALT</div>
<div id="Bannerctl00_SiteBannerControl2">
<div id="_bannerctl00_SiteBannerControl2">
<div id="Sayfabannerctl00_SiteBannerControl2" class="banner_Codex">
</div>
Please note that I've only used Selenium in Java, so I'll give you the most generic and language-agnostic answer I can. Keep in mind that Python's Selenium MAY provide a method to do this directly.
Steps:
Perform all Selenium interactions so the WebDriver actually has a VALID version of the page with all your content loaded
Extract the current contents of the whole page from Selenium
Load it into an HTML parsing library. I use jsoup in Java; I don't know offhand what the Python equivalent is. From this point on, Selenium no longer matters.
Use CSS selectors on your parser object to get the section you want
Convert that section to a String to print
If performance is a requirement, this approach may be a bit too expensive, as the contents are parsed twice: Selenium does it first, and your HTML parser does it again on the String extracted from Selenium.
ALTERNATIVE: if your target page uses AJAX, you may directly interact with the REST API that the JavaScript is accessing to fetch the data. I tend to follow this approach when doing serious web scraping, but sometimes it's not an option, so I use the approach above.
EDIT
Some more details based on questions in the comments:
You can use BeautifulSoup as the HTML parsing library.
To load a page in BeautifulSoup:
from bs4 import BeautifulSoup

html = "<html><head></head><body><div id=\"events-horizontal\">Hello world</div></body></html>"
soup = BeautifulSoup(html, "html.parser")
Then look at this answer to see how to extract the specific contents from your soup:
your_div = soup.select_one('div#events-horizontal')
That would give you the first div with events-horizontal id:
<div id="events-horizontal">Hello world</div>
BeautifulSoup code based on:
How to use CSS selectors to retrieve specific links lying in some class using BeautifulSoup?
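Since the table in this question lives inside a script block rather than in rendered HTML, an HTML parser alone won't reach the values. One pragmatic sketch (my own suggestion, not part of the answer above) is to pull the theRows arrays out of driver.page_source with a regular expression and write them to CSV; the sample below embeds two theRows lines from the question instead of a live page:

```python
import csv
import re

# Stand-in for driver.page_source, using two theRows lines from the question.
page_source = """theRows[0] = new Array ('<b>'+h+'42>AHRT</B></a>','519,120,000.00','590,520,000.00');
theRows[1] = new Array ('<b>'+h+'427>SEEL</B></a>','954,800,000.00','-');"""

rows = []
for m in re.finditer(r"theRows\[\d+\]\s*=\s*new Array\s*\((.*?)\);", page_source, re.S):
    cells = re.findall(r"'([^']*)'", m.group(1))
    # The ticker symbol is buried in markup like <b>'+h+'42>AHRT</B></a>
    sym = re.search(r">([A-Z]+)</B>", m.group(1))
    ticker = sym.group(1) if sym else cells[0]
    # Keep only the numeric cells (a lone '-' marks missing data)
    values = [c for c in cells if re.fullmatch(r"[\d,.]+|-", c)]
    rows.append([ticker] + values)

with open("table.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

print(rows)  # [['AHRT', '519,120,000.00', '590,520,000.00'], ['SEEL', '954,800,000.00', '-']]
```

The regex is tailored to the exact JavaScript shown above; if the site changes its generated script, the pattern would need adjusting.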

Extract Text From same class name(Python web scraping)

I'm a beginner in Python web scraping using BeautifulSoup. I was trying to scrape a real estate website, but each row carries different information in each column, and every column shares the same class name. So when I try to scrape each column's information I get the same result, because of the identical class names.
Link of the website I was trying to scrape.
Code From The HTML
<div class="lst-middle-section resale">
<div class="item-datapoint va-middle">
<div class="lst-sub-title stub text-ellipsis">Built Up Area</div>
<div class="lst-sub-value stub text-ellipsis">2294 sq.ft.</div>
</div>
<div class="item-datapoint va-middle">
<div class="lst-sub-title stub text-ellipsis">Avg. Price</div>
<div class="lst-sub-value stub text-ellipsis"><i class="icon-rupee"></i> 6.5k / sq.ft.</div>
</div>
<div class="item-datapoint va-middle">
<div class="lst-sub-title stub text-ellipsis">Possession Date</div>
<div class="lst-sub-value stub text-ellipsis">31st Dec, 2020</div>
</div>
Code I Tried!
for item in all:
    try:
        print(item.find('span', {'class': 'lst-price'}).getText())
        print(item.find('div', {'class': 'lst-heading'}).getText())
        print(item.find('div', {'class': 'item-datapoint va-middle'}).getText())
        print('')
    except AttributeError:
        pass
If I use the class 'item-datapoint va-middle' again, it shows the sq.ft. area rather than the avg. price or possession date.
Any solution? TIA!
Use find_elements_by_class_name instead of find_element_by_class_name.
find_elements_by_class_name("item-datapoint.va-middle")
You will get a list of elements.
Selenium docs: Locating Elements
Edit:
from selenium import webdriver
url = 'https://housing.com/in/buy/search?f=eyJiYXNlIjpbeyJ0eXBlIjoiUE9MWSIsInV1aWQiOiJhMWE1MjFmYjUzNDdjYT' \
'AxNWZlNyIsImxhYmVsIjoiQWhtZWRhYmFkIn1dLCJub25CYXNlQ291bnQiOjAsImV4cGVjdGVkUXVlcnkiOiIlMjBBaG1lZGFiYWQiL' \
'CJxdWVyeSI6IiBBaG1lZGFiYWQiLCJ2IjoyLCJzIjoiZCJ9'
driver = webdriver.Chrome()
driver.get(url)
fields = driver.find_elements_by_class_name("item-datapoint.va-middle")
for i, field in enumerate(fields):
    print(i, field.text)
driver.quit()
Now you can see the index in the list (fields) for every element.
Print the elements you want like this:
poss_date = fields[2].text
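If you'd rather stay with BeautifulSoup (as in the original question), a sketch that pairs each lst-sub-title with its lst-sub-value inside every item-datapoint block, using a trimmed version of the HTML from the question:

```python
from bs4 import BeautifulSoup

# Trimmed HTML from the question.
html = """<div class="lst-middle-section resale">
<div class="item-datapoint va-middle">
<div class="lst-sub-title stub text-ellipsis">Built Up Area</div>
<div class="lst-sub-value stub text-ellipsis">2294 sq.ft.</div>
</div>
<div class="item-datapoint va-middle">
<div class="lst-sub-title stub text-ellipsis">Possession Date</div>
<div class="lst-sub-value stub text-ellipsis">31st Dec, 2020</div>
</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")
data = {}
for point in soup.select("div.item-datapoint"):
    # Each datapoint holds exactly one title div and one value div.
    title = point.select_one("div.lst-sub-title").get_text(strip=True)
    value = point.select_one("div.lst-sub-value").get_text(strip=True)
    data[title] = value

print(data)  # {'Built Up Area': '2294 sq.ft.', 'Possession Date': '31st Dec, 2020'}
```

Because the title and value live under the same item-datapoint parent, scoping the search to each parent keeps the pairs aligned even though the class names repeat.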

Webdriver - Locate Input via Label (Python)

How do I locate an input field via its label using webdriver?
I'd like to test a certain web form which unfortunately uses dynamically generated
ids, so they're unsuitable as identifiers.
Yet, the labels associated with each web element strike me as suitable.
Unfortunately I was not able to do it with the few suggestions
offered on the web. There is one thread here at SO, but it did not
yield an accepted answer:
Selenium WebDriver Java - Clicking on element by label not working on certain labels
To solve this problem in Java, it is commonly suggested to locate the label as an anchor via its text content and then specify the XPath to the input element:
//label[contains(text(), 'TEXT_TO_FIND')]
I am not sure how to do this in python though.
My web element:
<div class="InputText">
<label for="INPUT">
<span>
LABEL TEXT
</span>
<span id="idd" class="Required" title="required">
*
</span>
</label>
<span class="Text">
<input id="INPUT" class="Text ColouredFocus" type="text" onchange="var wcall=wicketAjaxPost(';jsessionid= ... ;" maxlength="30" name="z1013400259" value=""></input>
</span>
<div class="RequiredLabel"> … </div>
<span> … </span>
</div>
Unfortunately I was not able to use CSS or XPath expressions
on the site; IDs and names always changed.
The only solution to my problem I found was a dirty one: parsing
the source code of the page and extracting the ids with string operations.
Certainly this is not the way WebDriver was intended to be used, but
it works robustly.
Code:
lines = []
for line in driver.page_source.splitlines():
    lines.append(line)
    if 'LABEL TEXT 1' in line:
        id_l1 = lines[-2].split('"', 2)[1]
You should start with the div and check that it contains a label with an appropriate span, then get the input element from the span tag with class Text:
//div[@class='InputText' and contains(label/span, 'TEXT_TO_FIND')]/span[@class='Text']/input
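This kind of XPath can be verified offline with lxml (which shares Selenium's XPath 1.0 semantics) against the snippet from the question, before wiring it into WebDriver:

```python
from lxml import etree

# Trimmed version of the question's HTML.
html = """<div class="InputText">
  <label for="INPUT">
    <span>LABEL TEXT</span>
    <span id="idd" class="Required" title="required">*</span>
  </label>
  <span class="Text">
    <input id="INPUT" class="Text ColouredFocus" type="text" name="z1013400259" value=""/>
  </span>
</div>"""

tree = etree.HTML(html)
xpath = "//div[@class='InputText' and contains(label/span, 'LABEL TEXT')]/span[@class='Text']/input"
inputs = tree.xpath(xpath)
print(inputs[0].get("id"))  # INPUT
```

In Selenium the same expression would be passed to driver.find_element (By.XPATH), and the dynamically generated id never needs to be known.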
