Almost all of the examples online for POST and GET requests in Python use the same URL (https://api.github.com/events or similar). I'd like a real, concrete example to understand how it works.
My aim is to download stock exchange data from this website:
https://www.abcbourse.com/download/historiques.aspx
By looking at the HTML source I found the download button:
type="submit" name="ctl00$BodyABC$Button1" value="Télécharger" id="ctl00_BodyABC_Button1"
and one of the checkboxes: id="ctl00_BodyABC_xcac40p" type="checkbox" name="ctl00$BodyABC$xcac40p" />
I have no idea how to write a script that says "check the box ***, press the download button".
Any help would be appreciated.
It's not simple to explain here in detail what I would do, but I can show you a way forward. Python's Selenium library makes it possible to build bots that perform actions on web pages. Through a findElement you can locate the checkbox and click it. I did a recent job using this library and I know it is a good fit for your problem.
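A minimal sketch of that approach, assuming Selenium and a Firefox driver are installed; the element ids come straight from the HTML you quoted:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://www.abcbourse.com/download/historiques.aspx")

# tick the checkbox from the question, then press the "Télécharger" (download) button
driver.find_element_by_id("ctl00_BodyABC_xcac40p").click()
driver.find_element_by_id("ctl00_BodyABC_Button1").click()

driver.quit()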
I'm trying to write a small python script that will get a list of URLs (which are mostly just a search bar with some other elements), "write" something in the search bar and "press" enter.
My goal is to get to the next page, where all the search results are, and look at the new URL
The sites are all different, so I can't just hard-code a query parameter; I don't know it in advance.
I was thinking about searching for the "input" part of the page (since the search bar is supposed to be the only input there), sending something to it, and then waiting for the new URL.
Is that possible? Is there a smarter way?
I'm trying to avoid using anything other than Python for now (Selenium, etc.).
I've searched every possible answer here and on the web, but nothing worked, so I was thinking about using the input element somehow...
Without Selenium and similar software, you'll have to understand what actually happens when you click such a button.
I'll take an example from a famous site (hint: if you're reading this, you know which site I mean). In the HTML source I can see this (truncated) piece of code:
<form id="search" role="search" action=/search method="get" class="grid--cell fl-grow1 searchbar px12 js-searchbar " autocomplete="off">
<div class="ps-relative">
<input name="q"
type="text"
...
What this means is that when you submit the form, the browser will hit the URL /search (which here means https://stackoverflow.com/search) with a GET request, wrapping all of the form's fields into the URL after a ?. If I searched for the term python, it would lead to http://stackoverflow.com/search?q=python.
Note that, depending on their content, the parameters may need to be URL-encoded.
If the form contains more input fields, you'll have to wrap them too, separated by & signs, like this: param1=value1&param2=value2&...
To sum up, searching only for inputs won't be sufficient; you'll have to parse the forms.
Not knowing more about your data, I cannot elaborate, but I think you might be able to do something with that.
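A rough sketch of the idea, assuming requests and BeautifulSoup are available; the form id and field name are the ones from the snippet above:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = "https://stackoverflow.com"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

form = soup.find("form", id="search")            # parse the form, not just the input
action = urljoin(url, form.get("action", ""))    # resolve the relative action URL
# collect every named input, so hidden fields get wrapped too
data = {i["name"]: i.get("value", "") for i in form.find_all("input") if i.get("name")}
data["q"] = "python"                             # fill in the search field

r = requests.get(action, params=data)            # method="get": fields go into the URL
print(r.url)                                     # e.g. https://stackoverflow.com/search?q=python

Note that requests.get(..., params=...) takes care of the ?, the & separators, and the URL encoding described above.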
If you go to the site, you'll notice an age confirmation window, which I want to bypass with Scrapy. I couldn't manage that, so I moved on to the Selenium webdriver, and now I'm using
driver.find_element_by_xpath('xpath').click()
to bypass the age confirmation window. Honestly, I don't want to go with the Selenium webdriver because of how slow it is. Is there any way to bypass that window?
I searched a lot on Stack Overflow and Google
but didn't find any answer that resolves my problem. If you have any link or idea for resolving it with Scrapy, it'd be appreciated. A single helpful comment will be up-voted!
To expand on Chillie's answer.
The age verification is irrelevant here. The data you are looking for is loaded via an AJAX request:
See the related question Can scrapy be used to scrape dynamic content from websites that are using AJAX? to understand how these requests work.
You need to figure out how the https://ns5bwtai8m-dsn.algolia.net/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20vanilla%20JavaScript%203.19.1&x-algolia-application-id=NS5BWTAI8M&x-algolia-api-key=e676b05f3844d3adf54a29732af6e43c URL works and how you can retrieve it in Scrapy.
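A hedged sketch of replicating that AJAX call directly in Scrapy. The exact JSON payload must be copied from the browser's Network tab; the indexName and params below are only placeholders:

import json
import scrapy

class AlgoliaSpider(scrapy.Spider):
    name = "algolia"

    def start_requests(self):
        # URL taken verbatim from the question; the query-string keys are public client keys
        url = ("https://ns5bwtai8m-dsn.algolia.net/1/indexes/*/queries"
               "?x-algolia-agent=Algolia%20for%20vanilla%20JavaScript%203.19.1"
               "&x-algolia-application-id=NS5BWTAI8M"
               "&x-algolia-api-key=e676b05f3844d3adf54a29732af6e43c")
        # placeholder payload: copy the real request body from the Network tab
        payload = {"requests": [{"indexName": "products", "params": "query="}]}
        yield scrapy.Request(url, method="POST", body=json.dumps(payload),
                             headers={"Content-Type": "application/json"},
                             callback=self.parse_api)

    def parse_api(self, response):
        data = json.loads(response.text)  # the product data comes back as JSON
        self.logger.info("got %d result set(s)", len(data.get("results", [])))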
But the age verification "window" is just a div that gets hidden when you press the button, not a real separate window:
<div class="age-check-modal" id="age-check-modal">
You can use the browser's Network tab in developer tools to see that no new info is uploaded or sent when you press the button. So everything is already loaded when you request a page. The "popup" is not even a popup, just an element whose display is changed to none when you click the button.
So Scrapy doesn't really care what's meant to be displayed, as long as all the HTML is loaded. If the elements are loaded, they are accessible. Or have you seen some information being unavailable without pressing the button?
You should inspect the HTML more to see what each website does; this might make your scraping tasks easier.
Edit: After inspecting the original html you can see the following:
<div class="products-list">
<div class="products-container-block">
<div class="products-container">
<div id="hits" class='row'>
</div>
</div>
</div>
</div>
You can also see a lot of JS script tags.
The browser element inspector shows us the following:
The ::before part gives away that this was manipulated by JS, as you cannot do this with simple CSS. See Granitosaurus' answer for details on this.
What this means is that you need to somehow execute the pages' arbitrary JS code. So you either need a solution that renders JS for Scrapy, or just use Selenium, as many do, and as you already have.
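For completeness, a small sketch of the Selenium route, assuming chromedriver is installed; the button locator inside #age-check-modal is a guess, so check it against the real page:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://the-site-from-the-question.example")  # placeholder URL
driver.find_element_by_css_selector("#age-check-modal button").click()  # guessed locator
html = driver.page_source  # now contains the JS-rendered product hits
driver.quit()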
I am attempting to click a button on a html page using Python and selenium web driver.
This is the source code of the page http://pastebin.com/112g1Gje.
I believe the relevant portion is at the end. I'm trying to click the button that says "Message"
Normally I would do something like:
driver.find_element_by_id("message-modal").click()
However, that doesn't work.
I have tried:
driver.find_element_by_id("message_label").click()
driver.execute_script('document.getElementByName(" Message ").click();')
driver.execute_script('document.getElementById("message-senderId").click();')
driver.execute_script('document.getElementById("message- label").addEventListener("submit", function())')
...etc.
None of them work.
For the stars, by the way, I had the same issue. They were hard to click, but I figured that part out. This worked:
driver.execute_script('document.getElementById("star_41094_4").checked = true;')
I think this page is switching up the numbers for the stars, so that number may not work right now. But that's a separate issue. Does anybody know?
EDIT: I have asked a moderator to delete this thread. I had a number of things wrong here. I am creating a new one.
Try
driver.find_element_by_xpath("//*[text()='Open Message Modal']").click()
Happy Coding :)
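If a plain click still fails (for example because the modal animation hasn't finished), a common fix is an explicit wait until the element is clickable; a minimal sketch, reusing the driver and XPath from above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)  # assumes `driver` already exists, as in the question
button = wait.until(EC.element_to_be_clickable((By.XPATH, "//*[text()='Open Message Modal']")))
button.click()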
I think you forgot to code a button that opens that message-modal. Better to create that button first, like:
<button class="btn" id="btn-message-modal" data-toggle="modal" data-target="#message-modal">Open Message Modal</button>
Then try this:
driver.find_element_by_id("btn-message-modal").click()
PS: message-modal is the id of the modal container; that is why nothing happens with this code:
driver.find_element_by_id("message-modal").click()
driver.find_element_by_class_name("btn").click() works instead.
There is this little website whose form I want to fill in with the requests library. The problem is I can't get to the next page after filling in the form data and hitting the button (Enter does not work).
The important thing is that I can't do it via a clicking bot of some kind; this needs to run without graphics.
info = {'name':'JohnJohn',
'message':'XXX',
'sign':"XXX",
'step':'1'}
The first three entries (name, message, sign) are the text areas; step is, I think, the button.
r = requests.get(url)
r = requests.post(url, data=info)
print(r.text)
The form data looks like this when I send a request via Chrome manually:
name:JohnJohn
message:XXX
sign:XXX
step:1
The button element looks like this:
<td colspan="2" style="text-align: center;">
<input name="step" type="hidden" value="1">
<button id="button" type="button" onclick="myClick();"
style="background-color: #ef4023; width: 80px; font-face: times; font-size: 14pt;">
Wyślij
</button>
</td>
The next page, if I do this manually, has the same address.
As you can see from the snippet you posted, clicking the button triggers some JavaScript code, namely a function called myClick().
It is not straightforward to click on this thing using Python's requests library. You might have more luck trying to find out what happens inside myClick(). My guess would be that at some point a POST request is made to an HTTP endpoint. If you can figure this out, you can translate it into your Python code.
If that does not work, another option would be to use something like Selenium/PhantomJS, which gives you a real, headless, scriptable browser. Using such a tool, you can actually have it fill out forms and click buttons. You can have a look at this SO answer, as it shows you how to use Selenium+PhantomJS from Python.
Please make sure not to abuse such methods by spamming forums or [insert illegal or otherwise abusive activity here].
In a situation like this, where you need to forge a scripted button's request, it may be easier not to guess at the JS logic but instead to perform a physical click and watch Chrome devtools' network sniffer, which shows you the plain request that was made and which, in turn, can easily be forged in Python.
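Once the real request shows up in the Network tab, it can be replayed directly. A sketch assuming it turns out to be a plain POST of the form fields (the field names come from the question; the URL is a placeholder for the endpoint you find in devtools):

import requests

info = {'name': 'JohnJohn', 'message': 'XXX', 'sign': 'XXX', 'step': '1'}
url = "http://example.com/endpoint-from-devtools"  # placeholder

with requests.Session() as s:
    s.get(url)                   # pick up any cookies the page sets first
    r = s.post(url, data=info)   # replay the captured request
    print(r.status_code, r.url)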
Here's the link: http://nikeplus.nike.com/plus/
The email/password form only shows when I click the "Log in" button. So how do I use Python to log into this website?
I tried twill and got the forms on the page, but they include only the search bar, so I'm not sure how to proceed.
While not a Python solution, I wrote a PHP class that actually lets you get the data from the Nike+ website: https://nikeplusphp.charanj.it
The class works by faking the login on the website and then making requests to the feeds. If you look through the code you'll find all the URLs needed to make the GET requests, and there is a method called _login() that should give you an idea of what parameters are posted.
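The same idea translates to Python with a session; a hedged sketch follows. The login URL, parameter names, and feed URL below are assumptions, so lift the real ones from _login() in the class above or from the browser's network tools:

import requests

session = requests.Session()

# hypothetical endpoint and field names; replace with the real ones
session.post("https://nikeplus.nike.com/plus/login",
             data={"email": "you@example.com", "password": "secret"})

# hypothetical feed URL; the session keeps the login cookies
feed = session.get("https://nikeplus.nike.com/plus/activity")
print(feed.status_code)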