I'm very new to Python and web scraping, but I'm trying to learn as I read, and now I'm stuck. I have managed to use Python and BeautifulSoup to grab a kind of web form from a page that has a lot of checkboxes. What I'm trying to do is change some of these checkboxes from checked to unchecked, or the other way around, but I don't know where to go from here.
The output for a checkbox that's checked looks like this:
<div class="checkbox">
<label>
<input checked="checked" name="Permissions" type="checkbox" value="SeeDetailedInformation"/>
</label>
</div>
The output for a checkbox that's not checked looks like this:
<div class="checkbox">
<label>
<input name="Permissions" type="checkbox" value="AdjustCounters"/>
</label>
</div>
The question is: how do I change a checkbox to checked or unchecked, whichever I want, using requests.post or any other good method?
Any help, either code I can try or pointers to what I should read up on, is much appreciated. I have read a bit about Selenium and WebDriver, but I don't think that will do it for me, as I have 500+ pages/forms on different URLs to change. (Going from URL to URL isn't a problem; I just need some input on how to change the checkbox.)
You should look into Selenium and its use as a driver for your browser. BeautifulSoup is great for parsing documents, and I'm not familiar with its use for form completion, but Selenium can certainly do what you're looking for. I would check out this snippet of code too:
# driver is the Selenium WebDriver instance (e.g. webdriver.Firefox())
checkboxes = driver.find_elements_by_css_selector("div.checkbox input[type='checkbox']")
for i, checkbox in enumerate(checkboxes):
    # click only when the current state differs from the desired one
    if values[i] != checkbox.is_selected():
        checkbox.click()
where values is a list of booleans representing whether each checkbox should be selected or not.
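For completeness, here is a rough sketch of how that could run across many URLs; the URL list, the desired checkbox states, and the submit-button selector are assumptions, not something taken from your pages:

from selenium import webdriver

# hypothetical inputs: the URLs and desired checkbox states are placeholders
urls = ["https://example.com/form/1", "https://example.com/form/2"]
desired = {
    "https://example.com/form/1": [True, False, True],
    "https://example.com/form/2": [False, False, True],
}

driver = webdriver.Firefox()
for url in urls:
    driver.get(url)
    checkboxes = driver.find_elements_by_css_selector("div.checkbox input[type='checkbox']")
    for want, box in zip(desired[url], checkboxes):
        if want != box.is_selected():
            box.click()
    # the submit control is an assumption; adjust the selector to your form
    driver.find_element_by_css_selector("button[type='submit']").click()
driver.quit()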
I've perused SO for quite a while and cannot find an exact or even similar solution to my current problem. This is my first post on SO, so I apologize if my formatting is off.
The Problem -
I'm trying to find a button on a webpage to punch me into a time clock automatically. I am able to sign in and navigate to the correct page (the page seems to be dynamically loaded, as switching between tabs like "Time Management" or "Pay Period" does not change the URL).
Attempts to solve -
I've tried using direct and indirect XPaths, CSS selectors, IDs, classes, and names, and all have failed. Included below are my different attempts to find the button, as well as the snippet of HTML that contains the button.
Button - HTML
<td>
<a onclick="return OnEmpPunchClick2(this);" id="btnEMPPUNCH_PUNCH" class="timesheet button icon " href="javascript:__doPostBack('btnEMPPUNCH_PUNCH','')">
<span> Punch</span></a>
<input type="hidden" name="hdfEMPPUNCH_PUNCH" id="hdfEMPPUNCH_PUNCH" value="0">
</td>
Attempts - PYTHON - ALL FAIL TO FIND
#All these return: "Unable to locate element"
self.browser.find_element_by_id("btnEMPPUNCH_PUNCH")
self.browser.find_element_by_xpath("//a[@id='btnEMPPUNCH_PUNCH']")
self.browser.find_element_by_css_selector('#btnEMPPUNCH_PUNCH')
#I attempted a manual wait:
wait=WebDriverWait(self.browser,30)
button = wait.until(expected_conditions.element_to_be_clickable((By.CSS_SELECTOR,'#btnEMPPUNCH_PUNCH')))
#And even manually triggering the script:
self.browser.execute_script("javascript:__doPostBack('btnEMPPUNCH_PUNCH','')")
self.browser.execute_script("__doPostBack('btnEMPPUNCH_PUNCH','')")
#Returns Message: ReferenceError: __doPostBack is not defined
None of these work, and I cannot seem to figure out why that is. Any help will be greatly appreciated!
I'm currently learning Python, and as a project for myself I'm learning how to use Selenium to interact with websites. I have found the element through its id in the HTML, but I don't know how to reference the heading inside that element. In this case, I just want the string from <h4>.
<div class="estimate-box equal-width" id="estimate-box">
<a href="#worth" id="worthBox" style="text-decoration: none;">
<h5>Worth</h5>
<h4>$5.02</h4>
</a>
</div>
My question is, how do I get Python to extract just the text in <h4>? I'm sorry if I formatted this wrong; this is my first post. Thanks in advance!
Use the following XPath:
print(driver.find_element_by_xpath("//a[@id='worthBox']/h4").text)
Or the following CSS selector:
print(driver.find_element_by_css_selector("#worthBox>h4").text)
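Note that in newer Selenium releases (4.x and later) the find_element_by_* helpers have been removed; the equivalent calls go through the By class:

from selenium.webdriver.common.by import By

print(driver.find_element(By.XPATH, "//a[@id='worthBox']/h4").text)
print(driver.find_element(By.CSS_SELECTOR, "#worthBox > h4").text)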
I'm trying to write a small Python script that will take a list of URLs (which are mostly just a search bar with some other elements), "write" something in the search bar, and "press" Enter.
My goal is to get to the next page, where all the search results are, and look at the new URL.
The sites are different, so I can't just get the query parameter, since I don't know it and every site is different.
I was thinking about searching for the "input" element of the page (since the search bar is supposed to be the only input there) and sending something to it, then waiting for the new URL.
Is that possible? Is there a smarter way?
I'm trying to avoid using anything other than Python for now (Selenium, etc.).
I've searched every possible answer here and on the web, but nothing worked, so I was thinking about using the input element somehow...
Without Selenium and similar tools, you'll have to understand what actually happens when you click such a button.
I'll take an example from a famous site (hint: if you're reading this, you'll know which site I mean). In the HTML source I can see this (truncated) piece of code:
<form id="search" role="search" action=/search method="get" class="grid--cell fl-grow1 searchbar px12 js-searchbar " autocomplete="off">
<div class="ps-relative">
<input name="q"
type="text"
...
What it means is that when you submit the form, the browser hits the URL /search (which here means https://stackoverflow.com/search) with a GET request, wrapping all of the form's fields in the URL after a ?. If I searched for the term python, it would lead to https://stackoverflow.com/search?q=python.
Note that, depending on their content, the parameters may need to be URL-encoded.
If the form contains more input fields, you'll have to include them too, separated by & signs, like this: param1=value1&param2=value2&...
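Python's standard library can do the encoding and the joining for you; a small illustration (the field names here are just examples):

from urllib.parse import urlencode

# builds a properly escaped query string from a dict of form fields
print(urlencode({"q": "python web scraping", "page": 2}))
# -> q=python+web+scraping&page=2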
To sum up, looking only at the inputs won't be sufficient; you'll have to parse the forms.
Not knowing more about your data, I can't elaborate further, but I think you should be able to do something with that.
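As a rough sketch of that idea with requests and BeautifulSoup (the assumptions that the relevant form is the first one on the page and that its text input is the search box are mine; real sites will need per-site checking):

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def search(url, term):
    """Fetch a page, fill the first GET form's text input, and follow the request."""
    html = requests.get(url).text
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")
    if form is None or form.get("method", "get").lower() != "get":
        return None  # this sketch only handles GET forms

    action = urljoin(url, form.get("action", ""))
    params = {}
    for field in form.find_all("input"):
        name = field.get("name")
        if not name:
            continue
        if field.get("type", "text") == "text":
            params[name] = term  # assume the text input is the search box
        else:
            params[name] = field.get("value", "")  # keep hidden/default values

    response = requests.get(action, params=params)  # requests handles the URL-encoding
    return response.url  # the "new URL" after the search

# hypothetical usage:
# print(search("https://example.com", "python"))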
I have a little website where I want to fill in a form with the requests library. The problem is I can't get to the next site after filling in the form data and hitting the button (Enter does not work).
The important thing is that I can't do it via a clicking bot of some kind; this needs to run without graphics.
info = {'name': 'JohnJohn',
        'message': 'XXX',
        'sign': 'XXX',
        'step': '1'}
The first three entries (name, message, sign) are the text areas, and step is, I think, the button.
r = requests.get(url)
r = requests.post(url, data=info)
print(r.text)
The form data looks like this when I send a request manually via Chrome:
name:JohnJohn
message:XXX
sign:XXX
step:1
The button element looks like this:
<td colspan="2" style="text-align: center;">
<input name="step" type="hidden" value="1">
<button id="button" type="button" onclick="myClick();"
style="background-color: #ef4023; width: 80px; font-face: times; font-size: 14pt;">
Wyślij
</button>
</td>
The next site, if I do this manually, has the same address.
As you might see from the snippet you posted, clicking the button triggers some JavaScript code, namely a method called myClick().
It is not straightforward to click on this thing using Python's requests library. You might have more luck trying to find out what happens inside myClick(). My guess would be that at some point a POST request is made to an HTTP endpoint. If you can figure that out, you can translate it into your Python code.
If that does not work, another option would be to use something like Selenium/PhantomJS, which gives you a real, headless, scriptable browser. Using such a tool, you can actually have it fill out forms and click buttons. You can have a look at this SO answer, as it shows how to use Selenium+PhantomJS from Python.
Please make sure not to abuse such methods by spamming forums or [insert illegal or otherwise abusive activity here].
In such a situation, when you need to forge a scripted button's request, it may be easier not to guess at the JS logic but instead to perform a physical click and watch the network sniffer in Chrome DevTools, which shows you the plain request being made; that request, in turn, can easily be forged in Python.
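Once the request shows up in the network panel, replaying it usually takes only a few lines of requests code. A rough sketch, assuming DevTools reveals a plain form POST back to the same page (the endpoint is a placeholder, and the field names are simply the ones from the question; copy whatever DevTools actually shows, including any extra hidden fields):

import requests

url = "https://example.com/form"  # placeholder; use the endpoint shown in DevTools

# a session keeps any cookies set by the initial GET, which some sites require
with requests.Session() as session:
    session.get(url)
    response = session.post(url, data={
        "name": "JohnJohn",
        "message": "XXX",
        "sign": "XXX",
        "step": "1",
    })
    print(response.status_code)
    print(response.text[:500])  # the start of the page returned after the "click"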
Sort of like a bot. I have already checked out some sites such as Pyjamas and Scrapy; I know how to print data from websites, but I still don't know how to interact with buttons. Can somebody help me with some demonstrative code?
Let's say I have a form:
<form name="input" action="html_form_action.asp" method="get">
Username: <input type="text" name="user" />
<input type="submit" value="Submit" />
</form>
How do I identify the button to be clicked, so that Python can click it for me?
If anybody knows any sites with demonstrative code, I would be very pleased.
You can use mechanize for that. It provides an easy way for interacting with websites.
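A minimal sketch with mechanize, assuming the form above is reachable at some URL (the URL is a placeholder, and the form is selected by the name attribute shown in the HTML):

import mechanize

br = mechanize.Browser()
br.open("http://example.com/page-with-form.html")  # placeholder URL

br.select_form(name="input")   # the form's name attribute from the HTML above
br["user"] = "myusername"      # fill in the text field
response = br.submit()         # "clicks" the submit button

print(response.geturl())       # the URL the submission led to
print(response.read()[:500])   # start of the returned page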
If you’re looking to really simulate a browser, you might want to look at Selenium, which allows you to control a real web browser.
If the website you’re looking to interface with uses a lot of JavaScript (e.g. onclick handlers), it can be very handy.
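With Selenium, the same form could be driven like this (a sketch; the URL is a placeholder, and the submit button is located by its type since it has no name or id):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("http://example.com/page-with-form.html")  # placeholder URL

driver.find_element(By.NAME, "user").send_keys("myusername")
driver.find_element(By.CSS_SELECTOR, "input[type='submit']").click()

print(driver.current_url)  # the page the form submission led to
driver.quit()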