https://niioa.immigration.gov.tw/NIA_OnlineApply_inter/visafreeApply/visafreeApplyForm.action
A dialog pops up after I select the first option, and I cannot dismiss it. I do not know what it is; it is not a JavaScript alert, and I can't find any frame to switch to with switch_to.frame.
It is a Chinese-language website,
so I have pasted the elements that are loaded after I select the first option:
<div class="blockUI" style="display:none"></div>
<div class="blockUI blockOverlay" style="z-index: 1000; border: none; margin: 0px; padding: 0px; width: 100%; height: 100%; top: 0px; left: 0px; background-color: rgb(0, 0, 0); opacity: 0.6; cursor: wait; position: fixed;"></div>
<div class="blockUI blockMsg blockPage" style="z-index: 1011; position: fixed; padding: 0px; margin: 0px; width: 450px; top: 539.5px; left: 119.5px; text-align: center; color: rgb(0, 0, 0); border: 3px solid rgb(170, 170, 170); background-color: rgb(255, 255, 255); height: 140px; overflow: hidden;"><div id="showWarnMessage1" style="">
<table class="application" style="margin: 10px;">
<tbody><tr>
<td>
<p class="Prompt" style="text-align: center">注意</p>
<p>除香港居民持有BNO護照及澳門居民持有1999年前取得之葡萄牙護照外,持有外國護照,不適合辦理本許可。</p>
</td>
</tr>
</tbody></table>
<div>
<input class="btn" value="確認" type="button" onclick="$.unblockUI();">
</div>
</div></div>
This worked for me to get past the pop-up:
import os
from selenium import webdriver

chromedriver = "your_path"  # path to the chromedriver executable
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
driver.implicitly_wait(15)
driver.get('https://niioa.immigration.gov.tw/NIA_OnlineApply_inter/visafreeApply/visafreeApplyForm.action')
driver.find_element_by_xpath('//*[@id="isHKMOVisaN"]').click()
And then this last line is what gets rid of the pop-up:
driver.find_element_by_xpath('//*[@id="showWarnMessage1"]/div/input').click()
I am getting an AttributeError. I have tried everything I could find, but I can't fix it.
The code only succeeds when the cursor is already in the text box.
Code:
browser.get('https://zeus.gist.ac.kr/sys/main/main.do')
browser.implicitly_wait(10)
iframe = browser.find_element_by_id("TOBE_JSP")
browser.switch_to.frame(iframe)
browser.find_element_by_id('mainframe_VFrameSet_HFrameSet_leftFrame_form_gridMenu_body_gridrow_7').click()
browser.implicitly_wait(10)
browser.find_element_by_id('mainframe_VFrameSet_HFrameSet_leftFrame_form_gridMenu_body_gridrow_12').click()
browser.implicitly_wait(10)
bodytemp = browser.find_element_by_xpath("~~~~/input").click()
bodytemp.send_keys("36.5")
element:
<div id="~~~~" style="position: absolute; overflow: hidden; background-color: transparent; left: 0px; top: 0px; width: 54px; height: 18px; cursor: text; user-select: initial;"><input id="mainframe_VFrameSet_HFrameSet_MDIFrameSet_ctxFrameSet_ctxFrame_PERS07^PERS07_08^005^AmcDailyTempRegE_form_div_sample_divMain_divForm_edtTemp_input" tabindex="-1" style="border: none; outline: none; position: absolute; overflow: hidden; background-color: transparent; left: 0px; top: 0px; width: 54px; height: 18px; cursor: text; font: 9pt NanumGothic; color: rgb(34, 34, 34); text-align: left; padding: 0px 1px;"></div>
<input id="~~~~_input" tabindex="-1" style="border: none; outline: none; position: absolute; overflow: hidden; background-color: transparent; left: 0px; top: 0px; width: 54px; height: 18px; cursor: text; font: 9pt NanumGothic; color: rgb(34, 34, 34); text-align: left; padding: 0px 1px;">
And I also tried:
bodytemp = browser.find_element_by_xpath("~~~~/input")
bodytemp.send_keys("36.5")
->selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable
(Session info: chrome=89.0.4389.114)
It occasionally works (maybe 1 run in 10).
And I also tried:
bodytemp = browser.find_element_by_xpath("~~~~/input")
bodytemp.clear()
bodytemp.send_keys("36.5")
-> AttributeError: 'NoneType' object has no attribute 'clear'
The element is visible; I have already checked.
Please help me. I can't sleep.
I solved it by clicking the input first and then calling send_keys on it:
browser.find_element_by_xpath("~~~~_input").click()
browser.find_element_by_xpath("~~~~_input").send_keys("36.5")
How can I get the URL from this output of Selenium in Python?
<div style="z-index: 999; overflow: hidden; background-position: 0px 0px; text-align: center; background-color: rgb(255, 255, 255); width: 480px; height: 672.172px; float: left; background-size: 1054px 1476px; display: none; border: 0px solid rgb(136, 136, 136); background-repeat: no-repeat; position: absolute; background-image: url("https://photo.venus.com/im/19230307.jpg?preset=zoom");" class="zoomWindow"> </div>
I got the above output from the following line of code:
driver.find_element_by_class_name('zoomWindowContainer')
First, get the style attribute:
div = driver.find_element_by_class_name('zoomWindow')
style = div.get_attribute("style")  # str
Then use a regex to find the URL in the style string:
import re
urls = re.findall(r"https?://.+\.jpg", style)  # list
print(urls[0])
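The regex above only matches URLs ending in .jpg and drops any query string. As a rough sketch of a slightly more general variant, the snippet below captures whatever sits inside the first url(...) of the style string. The helper name background_image_url and the shortened style string are made up for illustration; with Selenium, style would come from get_attribute("style") as shown above.

```python
import re

# Shortened stand-in for the style string shown in the question.
style = ('z-index: 999; display: none; position: absolute; '
         'background-image: url("https://photo.venus.com/im/19230307.jpg?preset=zoom");')


def background_image_url(style_text):
    """Return the URL inside the first url(...) of a CSS style string, or None."""
    # Optional quotes around the URL are skipped; the lazy group stops at the
    # first closing parenthesis.
    match = re.search(r'url\(\s*["\']?(.*?)["\']?\s*\)', style_text)
    return match.group(1) if match else None


print(background_image_url(style))
```

Unlike the .jpg pattern, this keeps the ?preset=zoom query string and returns None instead of raising an IndexError when no URL is present.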
I'm trying to use the findChildren() function. I basically want all the <p> tags under a particular <h3> tag. I'm trying a small amount of code, but the set of children I'm getting back is empty. h3 returns the correct line (see the print(h3) comment), and print(type(children)) prints <class 'bs4.element.ResultSet'>. Please tell me what I'm doing wrong.
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(contents, 'html.parser')
h3 = soup.find('h3', text=re.compile('chapter', re.IGNORECASE))
print(h3)  # prints <h3 style="text-align: center;">CHAPTER ONE - STEPHANUS GRAYLAND</h3>
children = h3.findChildren('p')
print(type(children))  # prints <class 'bs4.element.ResultSet'>
I also tried h3.findChildren('p', Recursive=True) and children = h3.findChildren(Recursive=True), both of which also come back empty.
Here's the section of HTML I'm trying to grab:
<h3 style="text-align: center;">CHAPTER ONE - STEPHANUS GRAYLAND</h3>
<p dir="ltr" style="line-height: 1.15; margin-top: 0pt; margin-bottom: 0pt;">
<span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">Stephanus Grayland did not try to hide his smile of satisfaction . He had “eaten” lunch, but now, he sensed, he would truly </span>
<span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; font-style: italic; vertical-align: baseline; white-space: pre-wrap;">feast</span>
<span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">.</span>
</p>
<p></p>
Thanks to those who responded. My problem is that <h3> and the sub <p>s are siblings not parent/child. I think these posts are what I'm after code-wise but my comment above remains. http://stackoverflow.com/questions/51571609/… and http://stackoverflow.com/questions/51852588/
In the sample you provided, the h3 node has no children. All of the p nodes are outside of that scope.
If you wrap your contents in a div (say), then you can see that you're using the right technique:
>>> soup = BeautifulSoup('<div>' + contents + '</div>', 'html.parser')
>>> div = soup.find('div')
>>> div.findChildren('p')
[<p dir="ltr" style="line-height: 1.15; margin-top: 0pt; margin-bottom: 0pt;"><span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">Stephanus Grayland did not try to hide his smile of satisfaction . He had “eaten” lunch, but now, he sensed, he would truly </span><span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; font-style: italic; vertical-align: baseline; white-space: pre-wrap;">feast</span><span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">.</span></p>, <p> </p>]
>>>
Edit
As you mention in your comments above, the h3 and p nodes are siblings in the content you've supplied. I'm not sure it makes sense to have p elements that are children of h3, but if you did it would look like
<h3>
This content is within the h3 tag
<p>this is a child of h3</p>
<p>another child</p>
</h3>
<p>this is not a child of h3 as it is after the h3 close tag</p>
It's not really clear what the conditions for selecting p nodes in your example content should be. A simple soup.find_all('p') would return all of those tags, but I suspect you need to limit it in some way to prevent other content from being included. Can you elaborate? You possibly just want something like:
>>> soup = BeautifulSoup(content, 'html.parser')
>>> h3 = soup.find('h3')
>>> h3.find_next_sibling('p')
<p dir="ltr" style="line-height: 1.15; margin-top: 0pt; margin-bottom: 0pt;">
<span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">Stephanus Grayland did not try to hide his smile of satisfaction . He had “eaten” lunch, but now, he sensed, he would truly </span>
<span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; font-style: italic; vertical-align: baseline; white-space: pre-wrap;">feast</span>
<span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">.</span>
</p>
Thank you for your patience. I had to figure out how to get the HTML structure, prettify it, and write it to a file to see the relationships better. The pages I need to process (I didn't write them) have the structure below. After building the bs4 structure, I figured out that my desired content starts at the <article ...> tag and ends where the next <script> ... </script> block begins, ahead of the comments section (<h3>Comments</h3>). I'm not sure how to end a search between two different tags. I was able to grab EVERYTHING between an <h3> tag and the next <h3> tag, but that also pulls in the <script> section, which I don't want. Thanks again for the continued help! -Meghan
....
<div id="rt-main" class="sa3-mb9">
<div class="rt-container">
<div class="rt-grid-9 rt-push-3">
<div class="rt-block">
<div id="rt-mainbody">
<div class="component-content">
<article class="item-pageDarkening">
<h3 style="text-align: center;">CHAPTER ONE - STEPHANUS GRAYLAND</h3>
<p> </p>
<p style="line-height: 1.15; margin-top: 0pt; margin-bottom: 0pt;" dir="ltr"><span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; font-style: italic; vertical-align: baseline; white-space: pre-wrap;">text.. </span></p>
<p> </p>
<p style="line-height: 1.15; margin-top: 0pt; margin-bottom: 0pt;" dir="ltr"><span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">text here</span><span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; font-style: italic; vertical-align: baseline; white-space: pre-wrap;"></span><span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">.</span></p>
<p> </p>
<p>dljlg</p>
<span></span>
<p>dljlg</p>
<span></span>
<p style="line-height: 1.15; margin-top: 0pt; margin-bottom: 0pt;" dir="ltr"><em><span style="font-size: 16px; font-family: 'arial black', 'avant garde'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;"> </span></em></p>
<script type='text/javascript'>
Komento.ready(function($) {
// declare master namespace variable for shared values
Komento.component = "com_content";
Komento.cid = "1211";
Komento.contentLink = "...";
Komento.sort = "latest";
Komento.loadedCount = parseInt(10);
Komento.totalCount = parseInt(56);
if( Komento.options.konfig.enable_shorten_link == 0 ) {
Komento.shortenLink = Komento.contentLink;
}
});
</script>
<div id="section-kmt" class="theme-kuro">
<script type="text/javascript">
Komento.require()
.library('dialog')
.script(
'komento.language',
'komento.common',
'komento.commentform'
)
.done(function($) {
if($('.commentForm').exists()) {
Komento.options.element.form = new Komento.Controller.CommentForm($('.commentForm'));
Komento.options.element.form.kmt = Komento.options.element;
}
});
</script>
<div id="kmt-form" class="commentForm kmt-form clearfix">
<a class="addCommentButton kmt-form-addbutton" href="javascript:void(0);"><b>Add comment</b></a>
<div class="formArea kmt-form-area hidden">
<h3 class="kmt-title">Leave your comments</h3>
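One possible way to stop at the <script> block, sketched under the assumption that the <script> tag is a direct sibling of the chapter <h3> inside the <article> (the flattened listing above does not make the nesting certain). The html string is a shortened, made-up stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the page structure shown above.
html = """
<article class="item-pageDarkening">
<h3 style="text-align: center;">CHAPTER ONE - STEPHANUS GRAYLAND</h3>
<p>first paragraph</p>
<span></span>
<p>second paragraph</p>
<script type="text/javascript">Komento.ready();</script>
<div id="section-kmt"><h3 class="kmt-title">Leave your comments</h3><p>a comment</p></div>
</article>
"""

soup = BeautifulSoup(html, "html.parser")
h3 = soup.find("h3")

paragraphs = []
# find_next_siblings() with no arguments yields the following sibling tags
# (not bare text nodes) in document order.
for sibling in h3.find_next_siblings():
    if sibling.name == "script":   # the comment widget starts here, so stop
        break
    if sibling.name == "p":
        paragraphs.append(sibling.get_text(strip=True))

print(paragraphs)
```

If the <script> turns out to be nested deeper than the paragraphs, the stop condition would need to test the enclosing element instead.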
I am extracting plaintext from HTML emails using BeautifulSoup. I've got everything working nicely except for one issue. My emails often include earlier replies below the message at the top, so with threaded emails I end up capturing the same text repeatedly. In most cases, I just want to get rid of everything after the first <div> tag I find. If I print soup.contents, it outputs the following:
p
None
p
None
p
None
p
None
p
None
p
None
p
None
p
None
p
None
p
None
p
None
div
None
meta
None
style
None
div
None
p
I am looking to return a BeautifulSoup object with everything past the first div tag removed.
HTML-wise, here's the before and after I'm going for:
Before:
<p> Hi Joe </p>
<p> I will be at the meeting tonight</p>
<p> Allison </p>
<div style='border-width: 1pt medium medium; border-style: solid none none; border-color: rgb(181, 196, 223) currentColor currentColor; font-family: "Arial","sans-serif";'>
<p style="margin: 2px 0px; padding: 0px; color: rgb(34, 34, 34); font-family: Arial; font-size: 10pt; background-color: rgb(255, 255, 255);">
<b>From: </b>John Doe <jdoe@example.com></p>
<p style="margin: 2px 0px; padding: 0px; color: rgb(34, 34, 34); font-family: Arial; font-size: 10pt; background-color: rgb(255, 255, 255);">
<b>Sent: </b>Wednesday, May 30, 2018 6:48 AM</p>
<p style="margin: 2px 0px; padding: 0px; color: rgb(34, 34, 34); font-family: Arial; font-size: 10pt; background-color: rgb(255, 255, 255);">
<b>To: </b>Allison <allison@example.com></p>
<p style="margin: 2px 0px; padding: 0px; color: rgb(34, 34, 34); font-family: Arial; font-size: 10pt; background-color: rgb(255, 255, 255);">
<b>Subject: </b>RE: meeting tonight</p>
<p style="margin: 2px 0px; padding: 0px; color: rgb(34, 34, 34); font-family: Arial; font-size: 10pt; background-color: rgb(255, 255, 255);">
</p>
</div>
<p>Will you be at the meeting tonight?</p>
After:
<p> Hi Joe </p>
<p> I will be at the meeting tonight</p>
<p> Allison </p>
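In BeautifulSoup4, you can use the find_all_next method to remove everything after the tag, and then remove the tag itself. find_all_next returns tags only, so this only works if the content that follows is wrapped in elements of its own; bare text sitting directly in the body is left behind. Note that clear() only empties a tag, so extract() is used here to actually remove it.
target = soup.find('div')
for e in target.find_all_next():
    e.extract()
target.extract()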
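A minimal, self-contained sketch of the same idea, assuming the first <div> sits at the top level of the parsed fragment, as in the sample above. It walks next_siblings rather than find_all_next(), so bare text nodes are removed as well, and it uses html.parser so that no <html>/<body> wrapper is added. The shortened html string is a made-up stand-in for the real email.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the email from the question.
html = """<p> Hi Joe </p>
<p> I will be at the meeting tonight</p>
<p> Allison </p>
<div><p><b>From: </b>John Doe</p></div>
<p>Will you be at the meeting tonight?</p>"""

soup = BeautifulSoup(html, "html.parser")
target = soup.find("div")

if target is not None:
    # Copy the siblings into a list first: removing nodes while iterating
    # the live generator would cut the traversal short.
    for node in list(target.next_siblings):
        node.extract()      # works for tags and for bare text nodes
    target.decompose()      # finally drop the <div> itself

print(str(soup).strip())
```

If the <div> is nested inside another element, content that follows its parent is not touched by this version.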
The easiest way in this case is to use re to remove everything from the first <div> tag onward:
s = """<p> Hi Joe </p>
<p> I will be at the meeting tonight</p>
<p> Allison </p>
<div style='border-width: 1pt medium medium; border-style: solid none none; border-color: rgb(181, 196, 223) currentColor currentColor; font-family: "Arial","sans-serif";'>
<p style="margin: 2px 0px; padding: 0px; color: rgb(34, 34, 34); font-family: Arial; font-size: 10pt; background-color: rgb(255, 255, 255);">
<b>From: </b>John Doe <jdoe@example.com></p>
<p style="margin: 2px 0px; padding: 0px; color: rgb(34, 34, 34); font-family: Arial; font-size: 10pt; background-color: rgb(255, 255, 255);">
<b>Sent: </b>Wednesday, May 30, 2018 6:48 AM</p>
<p style="margin: 2px 0px; padding: 0px; color: rgb(34, 34, 34); font-family: Arial; font-size: 10pt; background-color: rgb(255, 255, 255);">
<b>To: </b>Allison <allison@example.com></p>
<p style="margin: 2px 0px; padding: 0px; color: rgb(34, 34, 34); font-family: Arial; font-size: 10pt; background-color: rgb(255, 255, 255);">
<b>Subject: </b>RE: meeting tonight</p>
<p style="margin: 2px 0px; padding: 0px; color: rgb(34, 34, 34); font-family: Arial; font-size: 10pt; background-color: rgb(255, 255, 255);">
</p>
</div>
<p>Will you be at the meeting tonight?</p>"""
import re
new_s = re.sub(r'<div.*', '', s, flags=re.DOTALL).strip()
print(new_s)
Prints:
<p> Hi Joe </p>
<p> I will be at the meeting tonight</p>
<p> Allison </p>
Then you can feed this new string to BeautifulSoup:
from bs4 import BeautifulSoup
soup = BeautifulSoup(new_s, 'lxml')
print(soup.prettify())
Outputs:
<html>
<body>
<p>
Hi Joe
</p>
<p>
I will be at the meeting tonight
</p>
<p>
Allison
</p>
</body>
</html>
I'm filling in a form using requests and Python, but I'm blocked by the reCAPTCHA.
I need to send a g-recaptcha-response, but I don't know how to get it.
Here is the website's code:
<div class="g-recaptcha" data-callback="checkoutAfterCaptcha" data-sitekey="6LeWwRkUAAAAAOBsau7xxxx-xxxxxxxxxx" data-size="invisible">
<div class="grecaptcha-badge" data-style="bottomright" style="width: 256px; height: 60px; transition: right 0.3s ease; position: fixed; bottom: 14px; right: -186px; box-shadow: gray 0px 0px 5px;">
<div class="grecaptcha-logo">
<iframe src="https://www.google.com/recaptcha/api2/anchor?ar=1&k=6LeWwRkUAAAAAOBsau7KpuC9AV-6J8mhw4AjC3Xz&co=aHR0cHM6Ly93d3cuc3VwcmVtZW5ld3lvcmsuY29tOjQ0Mw..&hl=fr&v=v1525674693836&size=invisible&cb=g8s5582r6zik" width="256" height="60"
role="presentation" frameborder="0" scrolling="no" sandbox="allow-forms allow-popups allow-same-origin allow-scripts allow-top-navigation allow-modals allow-popups-to-escape-sandbox" kwframeid="3">
</iframe>
</div>
<div class="grecaptcha-error">
</div>
<textarea id="g-recaptcha-response" name="g-recaptcha-response" class="g-recaptcha-response" style="width: 250px; height: 40px; border: 1px solid #c1c1c1; margin: 10px 25px; padding: 0px; resize: none; display: none; ">
</textarea>
</div>
</div>
Here is my code. I manage to get the data-sitekey, but I don't understand how to get the g-recaptcha-response:
page = c.get(link_checkout)
soup = BeautifulSoup(page.text, 'html.parser')
find_class = soup.find(class_='g-recaptcha')
get_captcha_token = find_class.get('data-sitekey')
print (get_captcha_token)
# try:
# content = requests.post(
# 'https://www.google.com/recaptcha/api/siteverify',
# data={
# 'secret': RECAPTCHA_SECRET,
# 'response': get_captcha_token,
# 'remoteip': ip
# }
# ).content
# except:
# print ("fail")
# print (get_captcha_token)
c.post(url, data=payload_FORM, headers={"Referer": link_checkout})
page = c.get(link_checkout)
Thank you all for your help! This is the last problem I need to solve to finish my program, and there is very little about it on Google.
If you need more information, tell me in the comments and I will add it.