I'm filling in a form using Python and requests, but I'm blocked by the reCAPTCHA.
I need to send a g-recaptcha-response value, but I don't know how to get it.
Here is the website's HTML:
<div class="g-recaptcha" data-callback="checkoutAfterCaptcha" data-sitekey="6LeWwRkUAAAAAOBsau7xxxx-xxxxxxxxxx" data-size="invisible">
<div class="grecaptcha-badge" data-style="bottomright" style="width: 256px; height: 60px; transition: right 0.3s ease; position: fixed; bottom: 14px; right: -186px; box-shadow: gray 0px 0px 5px;">
<div class="grecaptcha-logo">
<iframe src="https://www.google.com/recaptcha/api2/anchor?ar=1&k=6LeWwRkUAAAAAOBsau7KpuC9AV-6J8mhw4AjC3Xz&co=aHR0cHM6Ly93d3cuc3VwcmVtZW5ld3lvcmsuY29tOjQ0Mw..&hl=fr&v=v1525674693836&size=invisible&cb=g8s5582r6zik" width="256" height="60"
role="presentation" frameborder="0" scrolling="no" sandbox="allow-forms allow-popups allow-same-origin allow-scripts allow-top-navigation allow-modals allow-popups-to-escape-sandbox" kwframeid="3">
</iframe>
</div>
<div class="grecaptcha-error">
</div>
<textarea id="g-recaptcha-response" name="g-recaptcha-response" class="g-recaptcha-response" style="width: 250px; height: 40px; border: 1px solid #c1c1c1; margin: 10px 25px; padding: 0px; resize: none; display: none; ">
</textarea>
</div>
</div>
Here is my code. I manage to get the data-sitekey, but I don't understand how to get the g-recaptcha-response:
page = c.get(link_checkout)
soup = BeautifulSoup(page.text, 'html.parser')
find_class = soup.find(class_='g-recaptcha')
get_captcha_token = find_class.get('data-sitekey')
print(get_captcha_token)
# try:
#     content = requests.post(
#         'https://www.google.com/recaptcha/api/siteverify',
#         data={
#             'secret': RECAPTCHA_SECRET,
#             'response': get_captcha_token,
#             'remoteip': ip
#         }
#     ).content
# except:
#     print("fail")
# print(get_captcha_token)
# "Referer" is the standard HTTP header name
c.post(url, data=payload_FORM, headers={"Referer": link_checkout})
page = c.get(link_checkout)
Thank you all for your help! This is the last problem I need to solve to finish my program, and there is very little about it on Google.
If you need more information, tell me in the comments and I will add it.
I have the following HTML on a website I'm trying to scrape:
<div class="test-section-container">
<div>
<span class="test-section-title">Section Title</span>
<div style="display: inline-block; padding: 0.05rem;"></div>
</div>
<div style="cursor: pointer; background-color: rgb(248, 248, 248); display: flex; line-height: 1.2; margin-bottom: 0.07rem;">
<div style="width: 0.5rem; flex-shrink: 0; background-color: rgb(245, 222, 136);"></div>
<div style="padding: 0.07rem; overflow: hidden;">
<div style="font-size: 0.18rem; text-overflow: ellipsis; overflow: hidden; white-space: nowrap;">Newsletter 1</div>
<div style="font-size: 0.13rem; color: rgb(102, 102, 102);">2021 11 8</div>
</div>
</div>
<div style="cursor: pointer; background-color: rgb(248, 248, 248); display: flex; line-height: 1.2; margin-bottom: 0.07rem;">
<div style="width: 0.5rem; flex-shrink: 0; background-color: rgb(221, 221, 221);"></div>
<div style="padding: 0.07rem; overflow: hidden;">
<div style="font-size: 0.18rem; text-overflow: ellipsis; overflow: hidden; white-space: nowrap;">Newsletter 2 </div>
<div style="font-size: 0.13rem; color: rgb(102, 102, 102);">2021 11 3</div>
</div>
</div>
This is the selenium/python code that I'm using:
driver.get("http://www.testwesbite.org/#/newsarticles")
results = driver.find_elements_by_class_name('test-section-container')
texts = []
for result in results:
    text = result.text
    texts.append(text)
    print(text)
This gives me an output of:
Newsletter 1
2021 11 8
Newsletter 2
2021 11 3
If I use the following code:
first_result = results[0]
first_result.click()
It does click into the first article, but results[1] gives me an out-of-bounds error.
How would I go about clicking on the second article?
As you have used driver.find_elements_by_class_name('test-section-container'), all of the following texts:
Newsletter 1
2021 11 8
Newsletter 2
2021 11 3
are within the results[0] element, and results[1] doesn't exist. Hence you get the out-of-bounds error.
Solution
To click on results[0] and results[1] individually, you can use:
driver.get("http://www.testwesbite.org/#/newsarticles")
results = driver.find_elements(By.CSS_SELECTOR, "div.test-section-container div[style*='nowrap']")
texts = []
for result in results:
    text = result.text
    texts.append(text)
    print(text)
Now you can click the individual items as:
first_result = results[0]
first_result.click()
and
second_result = results[1]
second_result.click()
Note: You have to add the following import:
from selenium.webdriver.common.by import By
This code can send a formatted email with information about an investment fund. But when there is more than one fund, I don't know how to send it.
I tried appending the funds to a list and then formatting it to send as HTML, but it's not working.
#"""Use pandas to format the table Guia de Fundos"""
import pandas as pd
import PySimpleGUI as sg
import yagmail
df = pd.read_csv("HERE GOES ONE SPREAD SHEET USING AS DB")
df = df.set_index('CNPJ', drop = False)
# Very basic window. Return values as a list
layout = [
[sg.Text('Entre os dados a seguir, Nome, E-mail')],
[sg.Text('CNPJ do Fundo', size=(25, 1)), sg.InputText('')],
[sg.Text('Nome do Cliente', size=(25, 1)), sg.InputText('')],
[sg.Text('E-mail do Cliente', size=(25, 1)), sg.InputText('')],
[sg.Submit(), sg.Cancel()]
]
window = sg.Window('Opção de Investimento').Layout(layout)
button, values = window.Read()
window.Close()
print(button, values[0], values[1], values[2])
#'TO TEST USE THIS CNPJ 18860180000147'
text_input = int(values[0])
Nome_cliente = values[1]
email_cliente = values[2]
df = df.loc[text_input]
print(df['FUNDO'], df['Aplicação Inicial'], df['Desde_Início'],
df['Liquidez_total_(CotizaçãoLiquidação)'], df['CVM'], df['Tributação'])
name = df['FUNDO']
rentabilidade = df['Desde_Início']
liquidez = df['Liquidez_total_(CotizaçãoLiquidação)']
categoria = df['CVM']
aplicacaomin = df['Aplicação Inicial']
sg.PopupScrolled("""
Nome do Cliente: {0}
E-mail: {1}
Fundo: {2}
Rentabilidade: {3}
Liquidez: {4}
Categoria: {5}
Aplicacao Minima: {6}""".format(Nome_cliente, email_cliente, name,
rentabilidade, liquidez, categoria, aplicacaomin),
title='Verificar', yes_no=True)
"""Get information from the table Guia de FUndos and send an email"""
yagmail.register('YOUR E-MAIL', 'PASSWORD')
yag = yagmail.SMTP('YOUR E-MAIL')
contents = ['''
<html>
<body align="center" style="-ms-text-size-adjust: 100%; -webkit-text-size-adjust: 100%; width: 100%; height: 100%; margin: 0; padding: 0; background-color: #06202e;">
<tr>
<img src="https://d335luupugsy2.cloudfront.net/cms/files/68608/1552053344/$aw5o0p8ghll" alt="" width="173" border="0" style="max-width: 1674px; border: 0; -ms-interpolation-mode: bicubic; height: auto; outline: none; text-decoration: none; vertical-align: bottom;" class="img-max">
</tr>
<table width="100%" border="0" cellspacing="0" cellpadding="0" style="-webkit-text-size-adjust:100%; -ms-text-size-adjust:100%; mso-table-lspace:0pt; mso-table-rspace:0pt; border-collapse:collapse !important;">
<p>
<li style="line-height: 150%;"><span style="font-size: 12px; color: #ffffff;">Ola {5}, conforme conversado segue opcao de investimento... </span></li>
</p>
<tr>
<td bgcolor="#072738" align="left" style="-webkit-text-size-adjust:100%; -ms-text-size-adjust:100%; mso-table-lspace:0pt; mso-table-rspace:0pt; font-size: 16px; line-height: 150%; font-family: Helvetica, Arial, sans-serif; color: #666666; padding: 9px 18px; word-break: break-word !important;" class="padding">
<p style="line-height: 150%;"><span style="font-size: 13px; color: #ffffff;"><strong> {0} (Lamina em Anexo)</strong></span></p>
<ul>
<li style="line-height: 150%;"><span style="font-size: 12px; color: #ffffff;"> Rentabilidade: {1} CDI (Acumulada)</span></li>
<li style="line-height: 150%;"><span style="font-size: 12px; color: #ffffff;">IR: respeita a tabela regressiva de renda fixa (come cotas)</span></li>
<li style="line-height: 150%;"><span style="font-size: 12px; color: #ffffff;">Liquidez: D+{2}</span></li>
<li style="line-height: 150%;"><span style="font-size: 12px; color: #ffffff;">Categoria: {3}</span></li>
<li style="line-height: 150%;"><span style="font-size: 12px; color: #ffffff;">Aplicação Minima: {4}</span></li>
</ul>
</td>
</tr>
</table>
</body>
</html>
'''.format(name, rentabilidade, liquidez, categoria, aplicacaomin, Nome_cliente)]
"""You can add attachments=filename, to send with the email."""
yag.send(email_cliente, 'SUBJECT', contents)
So, if anyone could help: I need to send an email with more than one fund. Today this code can only send one fund.
Example e-mail:
-Today
Hello ....
Fund1 info...
-Expected
Hello ....
Fund1 info...
Fund2 info...
Fund3 info...
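One possible way to get the expected layout, sketched under assumptions: build one HTML fragment per fund and join the fragments before placing them in the email body. The names FUND_BLOCK, build_body and funds below are hypothetical, and the sample values are placeholders rather than real fund data; in the script above, each dict would presumably be filled from a df.loc[...] row.

```python
# Hypothetical sketch: one HTML fragment per fund, joined into a single body.
FUND_BLOCK = (
    "<p><strong>{name} (Lamina em Anexo)</strong></p>"
    "<ul>"
    "<li>Rentabilidade: {rentabilidade} CDI (Acumulada)</li>"
    "<li>Liquidez: D+{liquidez}</li>"
    "<li>Categoria: {categoria}</li>"
    "<li>Aplicação Minima: {aplicacaomin}</li>"
    "</ul>"
)


def build_body(nome_cliente, funds):
    """Return an HTML body that greets the client and lists every fund."""
    greeting = (
        "<p>Ola {}, conforme conversado segue opcao de investimento...</p>"
    ).format(nome_cliente)
    # One formatted block per fund, in the order the funds were given.
    blocks = [FUND_BLOCK.format(**fund) for fund in funds]
    return "<html><body>" + greeting + "".join(blocks) + "</body></html>"


# Placeholder values for illustration only.
funds = [
    {"name": "Fund1", "rentabilidade": "100%", "liquidez": 1,
     "categoria": "A", "aplicacaomin": 1000},
    {"name": "Fund2", "rentabilidade": "110%", "liquidez": 30,
     "categoria": "B", "aplicacaomin": 5000},
]
body = build_body("Client", funds)
```

The resulting string could then be passed as the contents argument of yag.send(...) in place of the single-fund template.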
I'm trying to use the findChildren() function. I basically want all the <p> tags under a particular <h3> tag. I'm trying a small amount of code, but the set of children I'm getting back is empty. h3 returns the correct line (see the print(h3) comment), and print(type(children)) prints type: <class 'bs4.element.ResultSet'>. Please tell me what I'm doing wrong.
soup = BeautifulSoup(contents, 'html.parser')
h3 = soup.find('h3', text=re.compile('chapter', re.IGNORECASE))
print(h3) #result prints <h3 style="text-align: center;">CHAPTER ONE - STEPHANUS GRAYLAND</h3>
children = h3.findChildren('p')
print(type(children)) #returns type: <class 'bs4.element.ResultSet'>
I also tried h3.findChildren('p', Recursive=True) and children = h3.findChildren(Recursive=True), which also come back empty.
Here's the section of HTML I'm trying to grab:
<h3 style="text-align: center;">CHAPTER ONE - STEPHANUS GRAYLAND</h3>
<p dir="ltr" style="line-height: 1.15; margin-top: 0pt; margin-bottom: 0pt;">
<span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">Stephanus Grayland did not try to hide his smile of satisfaction . He had “eaten” lunch, but now, he sensed, he would truly </span>
<span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; font-style: italic; vertical-align: baseline; white-space: pre-wrap;">feast</span>
<span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">.</span>
</p>
<p></p>
Thanks to those who responded. My problem is that the <h3> and the <p>s below it are siblings, not parent/child. I think these posts are what I'm after code-wise, but my comment above remains: http://stackoverflow.com/questions/51571609/… and http://stackoverflow.com/questions/51852588/
In the sample you provided, the h3 node has no children. All of the p nodes are outside of that scope.
If you wrap your contents in a div (say), you can see that you're using the right technique:
>>> soup = BeautifulSoup('<div>' + contents + '</div>', 'html.parser')
>>> div = soup.find('div')
>>> div.findChildren('p')
[<p dir="ltr" style="line-height: 1.15; margin-top: 0pt; margin-bottom: 0pt;"><span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">Stephanus Grayland did not try to hide his smile of satisfaction . He had “eaten” lunch, but now, he sensed, he would truly </span><span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; font-style: italic; vertical-align: baseline; white-space: pre-wrap;">feast</span><span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">.</span></p>, <p> </p>]
>>>
Edit
As you mention in your comments above, the h3 and p nodes are siblings in the content you've supplied. I'm not sure it makes sense to have p elements that are children of h3, but if you did it would look like
<h3>
This content is within the h3 tag
<p>this is a child of h3</p>
<p>another child</p>
</h3>
<p>this is not a child of h3 as it is after the h3 close tag</p>
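To make the parent/child versus sibling distinction concrete, here is a minimal sketch using two made-up HTML strings (nested and flat are hypothetical samples, not the asker's page). It assumes the html.parser backend, which builds the tree as written.

```python
from bs4 import BeautifulSoup

# Hypothetical samples: <p> tags inside the <h3> versus after it.
nested = "<h3>Title<p>child one</p><p>child two</p></h3><p>after</p>"
flat = "<h3>Title</h3><p>first</p><p>second</p>"

nested_h3 = BeautifulSoup(nested, "html.parser").find("h3")
flat_h3 = BeautifulSoup(flat, "html.parser").find("h3")

# find_all() searches descendants only, so it sees the nested <p> tags...
nested_children = nested_h3.find_all("p")
# ...but finds nothing when the <p> tags come after the closing </h3>.
flat_children = flat_h3.find_all("p")
# Siblings are reached with find_next_siblings() instead.
flat_siblings = flat_h3.find_next_siblings("p")

print(len(nested_children), len(flat_children), len(flat_siblings))
```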
It's not really clear what the conditions for selecting p nodes in your example content should be. A simple soup.find_all('p') would return all of those tags, but I suspect you need to limit it in some way to prevent other content from being included. Can you elaborate? You possibly just want something like:
>>> soup = BeautifulSoup(contents, 'html.parser')
>>> h3 = soup.find('h3')
>>> h3.find_next_sibling('p')
<p dir="ltr" style="line-height: 1.15; margin-top: 0pt; margin-bottom: 0pt;">
<span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">Stephanus Grayland did not try to hide his smile of satisfaction . He had “eaten” lunch, but now, he sensed, he would truly </span>
<span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; font-style: italic; vertical-align: baseline; white-space: pre-wrap;">feast</span>
<span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">.</span>
</p>
Thank you for your patience. I had to figure out how to get the HTML structure, prettify the HTML, write it to a file to see the relationships better, etc. The pages I need to process (I didn't write them) have the structure below. After building the bs4 structure, I figured out that my desired content starts at the <article ...> tag and ends at the beginning of the next <script ...>code here</script> <h3>Comments</h3>. I'm not sure how to terminate a search between two different tags. I was able to grab EVERYTHING between an <h3> tag and the next <h3> tag, but that pulls in the <script> section, which I don't want. Thanks again for the continuing help! -Meghan
....
<div id="rt-main" class="sa3-mb9">
<div class="rt-container">
<div class="rt-grid-9 rt-push-3">
<div class="rt-block">
<div id="rt-mainbody">
<div class="component-content">
<article class="item-pageDarkening">
<h3 style="text-align: center;">CHAPTER ONE - STEPHANUS GRAYLAND</h3>
<p> </p>
<p style="line-height: 1.15; margin-top: 0pt; margin-bottom: 0pt;" dir="ltr"><span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; font-style: italic; vertical-align: baseline; white-space: pre-wrap;">text..</span></p>
<p> </p>
<p style="line-height: 1.15; margin-top: 0pt; margin-bottom: 0pt;" dir="ltr"><span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">text here</span><span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; font-style: italic; vertical-align: baseline; white-space: pre-wrap;"></span><span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">.</span></p>
<p> </p>
<p>dljlg</p>
<span></span>
<p>dljlg</p>
<span></span>
<p style="line-height: 1.15; margin-top: 0pt; margin-bottom: 0pt;" dir="ltr"><em><span style="font-size: 16px; font-family: 'arial black', 'avant garde'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;"> </span></em></p>
<script type='text/javascript'>
Komento.ready(function($) {
// declare master namespace variable for shared values
Komento.component = "com_content";
Komento.cid = "1211";
Komento.contentLink = "...";
Komento.sort = "latest";
Komento.loadedCount = parseInt(10);
Komento.totalCount = parseInt(56);
if( Komento.options.konfig.enable_shorten_link == 0 ) {
Komento.shortenLink = Komento.contentLink;
}
});
</script>
<div id="section-kmt" class="theme-kuro">
<script type="text/javascript">
Komento.require()
.library('dialog')
.script(
'komento.language',
'komento.common',
'komento.commentform'
)
.done(function($) {
if($('.commentForm').exists()) {
Komento.options.element.form = new Komento.Controller.CommentForm($('.commentForm'));
Komento.options.element.form.kmt = Komento.options.element;
}
});
</script>
<div id="kmt-form" class="commentForm kmt-form clearfix">
<a class="addCommentButton kmt-form-addbutton" href="javascript:void(0);"><b>Add comment</b></a>
<div class="formArea kmt-form-area hidden">
<h3 class="kmt-title">Leave your comments</h3>
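One way to stop the search at a different tag, sketched under assumptions: walk the siblings that follow the chapter <h3> and break at the first <script>. The html string below is a shortened, hypothetical stand-in for the structure above, and paragraphs_until_script is an illustrative helper name, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened, hypothetical stand-in for the page structure shown above.
html = """
<article class="item-pageDarkening">
<h3 style="text-align: center;">CHAPTER ONE - STEPHANUS GRAYLAND</h3>
<p>first paragraph</p>
<span></span>
<p>second paragraph</p>
<script type='text/javascript'>Komento.ready();</script>
<div id="section-kmt"><h3 class="kmt-title">Leave your comments</h3></div>
</article>
"""


def paragraphs_until_script(h3):
    """Collect <p> siblings after h3, stopping at the first <script> sibling."""
    collected = []
    for sibling in h3.next_siblings:
        if not isinstance(sibling, Tag):
            continue  # skip the whitespace text nodes between tags
        if sibling.name == "script":
            break  # everything from here on is comment-widget markup
        if sibling.name == "p":
            collected.append(sibling)
    return collected


soup = BeautifulSoup(html, "html.parser")
h3 = soup.find("article").find("h3")
chapter_text = [p.get_text(strip=True) for p in paragraphs_until_script(h3)]
print(chapter_text)
```

Because the loop only looks at direct siblings of the <h3>, the second <h3> nested inside the comment <div> is never reached.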
I want to print both CVE IDs, "CVE-2013-2566" and "CVE-2015-2808", listed under References, and "tcp 23", which corresponds to the Unencrypted Telnet Server finding, using Beautiful Soup. I couldn't think of the logic for that.
<div xmlns="" style="box-sizing: border-box; width: 100%; margin: 0 0 10px 0; padding: 5px 10px; background: #fdc431; font-weight: bold; font-size: 14px; line-height: 20px; color: #fff;">42263 - Unencrypted Telnet Server</div>
<div xmlns="" style="margin: 0 0 45px 0;">
<div class="details-header">Risk Factor<div class="clear"></div>
</div>
<div style="line-height: 20px; padding: 0 0 20px 0;">Medium<div class="clear"></div>
<div class="details-header">Plugin Information: <div class="clear"></div>
</div>
<div style="line-height: 20px; padding: 0 0 20px 0;">Published: 2009/10/27, Modified: 2015/10/21<div class="clear"></div>
</div>
<div class="details-header">**References**<div class="clear"></div>
</div>
<div id="idm8894160" style="display: block;" class="table-wrapper see-also">
<table cellpadding="0" cellspacing="0">
<thead><tr>
<th width="15%"></th>
<th width="85%"></th>
</tr></thead>
<tbody>
<tr class="">
<td class="#ffffff">CVE</td>
<td class="#ffffff">CVE-2013-2566</td>
</tr>
<tr class="">
<td class="#ffffff">CVE</td>
<td class="#ffffff">CVE-2015-2808</td>
</tr>
</tbody>
<div class="details-header">Plugin Output<div class="clear"></div>
</div>
<h2>tcp/23</h2>
This is what I have written; I am stuck where I have put the comments.
I am very much a beginner with bs4, so please bear with me. I have to submit a report tomorrow, so please help.
from bs4 import BeautifulSoup
import csv
import urllib.request as urllib2
with open(r"C:\Users\sourabhk076\Documents\CHIDRMUM_DR8016CHI1_CTSINWDB01_9xtqpj.html") as fp:
    soup = BeautifulSoup(fp.read(), 'html.parser')
f = csv.writer(open("Report.csv", "w"))
f.writerow(["Observation", "Port", "CVE-ID"])
medium = soup.find_all('div', attrs={'style':'box-sizing: border-box; width: 100%; margin: 0 0 10px 0; padding: 5px 10px; background: #fdc431; font-weight: bold; font-size: 14px; line-height: 20px; color: #fff;'})
####this will search for text "Unencrypted telnet server"####
for x in medium:
    port = x.find('h2')
    cve = x.find('div', class_='table-wrapper see-also').findAll('tr')
    ######## don't know what to do next #############
    obsv = x.text
    portd = port.text
    print([obsv,portd,cve])
Code:
from bs4 import BeautifulSoup
with open('/path/to/some.html') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')
service = soup.find('div', style='box-sizing: border-box; width: 100%; margin: 0 0 10px 0; padding: 5px 10px; background: #fdc431; font-weight: bold; font-size: 14px; line-height: 20px; color: #fff;').get_text(strip=True)
cve_ids = [cve_elem.text for cve_elem in soup.select('table > tbody > tr > td > a')]
protocol, port = soup.select_one('table > h2').text.split('/')
print('{}, {}/{}, CVE-IDs: {}'.format(service, protocol, port, cve_ids))
Output:
42263 - Unencrypted Telnet Server, tcp/23, CVE-IDs: ['CVE-2013-2566', 'CVE-2015-2808']
Notice usage of select() that works with CSS selectors. I also used >, which is a child combinator.
The child combinator (>) is placed between two CSS selectors. It
matches only those elements matched by the second selector that are
the children of elements matched by the first.
You can search your tags for child tags. So maybe something like:
tbody = cve.find("tbody")
for row in tbody.find_all("tr"):
    print(row.find_all("td")[1].text)
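As a rough sketch of the row-by-row approach, the snippet below runs against a simplified, well-formed copy of the report fragment in which the CVE IDs are plain text in the second cell, as in the HTML posted in the question. The title class on the first div is a made-up stand-in for the long inline style, and the real report may be structured differently.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical copy of the report fragment from the question.
html = """
<div class="title">42263 - Unencrypted Telnet Server</div>
<div id="idm8894160" class="table-wrapper see-also">
<table>
<tbody>
<tr class=""><td class="#ffffff">CVE</td><td class="#ffffff">CVE-2013-2566</td></tr>
<tr class=""><td class="#ffffff">CVE</td><td class="#ffffff">CVE-2015-2808</td></tr>
</tbody>
</table>
</div>
<h2>tcp/23</h2>
"""

soup = BeautifulSoup(html, "html.parser")
wrapper = soup.find("div", class_="see-also")

cve_ids = []
for row in wrapper.find_all("tr"):
    cells = row.find_all("td")
    # Keep only rows whose first cell is the "CVE" label.
    if len(cells) >= 2 and cells[0].get_text(strip=True) == "CVE":
        cve_ids.append(cells[1].get_text(strip=True))

# The port heading has the form "protocol/port".
protocol, port = soup.find("h2").get_text(strip=True).split("/")
print(cve_ids, protocol, port)
```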
https://niioa.immigration.gov.tw/NIA_OnlineApply_inter/visafreeApply/visafreeApplyForm.action
Something pops up after I select the first item, and I cannot handle the popup. I do not know what it is; it's not an alert, and I can't find a frame to switch to.
It's a Chinese website...
So I have pasted the elements that are loaded after I select the first item:
<div class="blockUI" style="display:none"></div>
<div class="blockUI blockOverlay" style="z-index: 1000; border: none; margin: 0px; padding: 0px; width: 100%; height: 100%; top: 0px; left: 0px; background-color: rgb(0, 0, 0); opacity: 0.6; cursor: wait; position: fixed;"></div>
<div class="blockUI blockMsg blockPage" style="z-index: 1011; position: fixed; padding: 0px; margin: 0px; width: 450px; top: 539.5px; left: 119.5px; text-align: center; color: rgb(0, 0, 0); border: 3px solid rgb(170, 170, 170); background-color: rgb(255, 255, 255); height: 140px; overflow: hidden;"><div id="showWarnMessage1" style="">
<table class="application" style="margin: 10px;">
<tbody><tr>
<td>
<p class="Prompt" style="text-align: center">注意</p>
<p>除香港居民持有BNO護照及澳門居民持有1999年前取得之葡萄牙護照外,持有外國護照,不適合辦理本許可。</p>
</td>
</tr>
</tbody></table>
<div>
<input class="btn" value="確認" type="button" onclick="$.unblockUI();">
</div>
</div></div>
This worked for me to get past the pop-up:
import os
from selenium import webdriver
chromedriver = "your_path"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
driver.implicitly_wait(15)
driver.get('https://niioa.immigration.gov.tw/NIA_OnlineApply_inter/visafreeApply/visafreeApplyForm.action')
driver.find_element_by_xpath('//*[@id="isHKMOVisaN"]').click()
And then this last line is what gets rid of the pop-up:
driver.find_element_by_xpath('//*[@id="showWarnMessage1"]/div/input').click()