I am trying to make a POST request to this website: http://archive.eso.org/wdb/wdb/asm/dimm_paranal/form
So far I have this:
import requests
import bs4
url = 'http://archive.eso.org/wdb/wdb/asm/dimm_paranal/form'
p = {'search': 'Search',
     'start_date': '2019-09-17..2019-09-18'}
post = requests.post(url,data=p)
When I analyse the text of the response, I only get the HTML of the form page, not the result of the query. How can I simulate the query?
Additional question: How can I check the checkboxes in the form?
The form has an action attribute; in this case it is /wdb/wdb/asm/dimm_paranal/query. Try sending the request there...
In devtools (Ctrl+Shift+I) you have a "Network" tab. Go there and see what is actually requested; check the data, response, headers and so on.
Another tool I would recommend is a program called Postman. You can create requests there, no need to code it.
Additional answer to your additional question: the checkboxes have no default value, so just send anything: 1, true, whatever. It should work, as in the sketch below.
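A minimal sketch of both points, posting to the form's action URL; the checkbox field name below (tab_fwhm) is an assumption, so inspect the form's HTML for the real names:

import requests

url = 'http://archive.eso.org/wdb/wdb/asm/dimm_paranal/query'  # the form's action, not .../form
p = {'search': 'Search',
     'start_date': '2019-09-17..2019-09-18',
     'tab_fwhm': '1'}  # assumed checkbox name; any value should tick it
post = requests.post(url, data=p)
print(post.text)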
Good evening,
I'm trying to write a program that extracts the sell price of certain stocks and shares from a website called hl.co.uk.
As you can imagine, you have to search for the stock whose sale price you want to see.
My code so far is as follows:
import requests
from bs4 import BeautifulSoup as soup
url = "https://www.hl.co.uk/shares"
page = requests.get(url)
parsed_html = soup(page.content, 'html.parser')
form = parsed_html.find('form', id="stock_search")
input_tag = form.find('input').get('name')
submit = form.find('input', id="stock_search_submit").get('alt')
post_data = {input_tag: "fgt", "alt": submit}
I have been able to extract the correct form tag and the input names I require, but the website has multiple forms on this page.
How can I submit a POST request to this website using the data I have in "post_data", targeted at that specific form, so it searches for the stock/share I want and gives me the next page?
Thanks in advance
Actually, when you submit the form from the homepage, it redirects you to the target page with a URL like this: "https://www.hl.co.uk/shares/search-for-investments?stock_search_input=abc&x=56&y=35&category_list=CEHGINOPW". So, in my opinion, instead of submitting the homepage form, you should call the target page directly with your own GET parameters; the URL you're supposed to call will look like this: https://www.hl.co.uk/shares/search-for-investments?stock_search_input=[your_keywords].
Hope this helped you
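A minimal sketch of that idea; whether the extra x, y and category_list parameters are required is an assumption you can test by dropping them:

import requests

params = {'stock_search_input': 'fgt'}  # your search keywords
page = requests.get('https://www.hl.co.uk/shares/search-for-investments', params=params)
print(page.url)  # should be the results page for the keywords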
This is a pretty general problem which you can solve with Google Chrome's devtools. Basically:
1- Navigate to the page where you have the form and its fields (in your case, the share search page).
2- Choose the XHR tab under the Network tab, which filters the list down to Fetch and XHR requests. These requests are generally sent after a form submission, and most of the time they return the resulting data as JSON.
3- Make sure you enable the Preserve log checkbox at the top left so the list doesn't refresh when the form is submitted.
4- Submit the form; you'll see a bunch of requests being made. Inspect them to hopefully find the one you're looking for.
In this case I found this URL endpoint, which returns the results in its response:
https://www.hl.co.uk/ajax/funds/fund-search/search?investment=&companyid=1324&sectorid=132&wealth=&unitTypePref=&tracker=&payment_frequency=&payment_type=&yield=&standard_ocf=&perf12m=&perf36m=&perf60m=&fund_size=&num_holdings=&start=0&rpp=20&lo=0&sort=fd.full_description&sort_dir=asc&
You can see all the query parameters here, such as companyid and sectorid; what you need to do is change those and make a request to the URL, as in the sketch below. Then you'll get the relevant information.
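For example, a hedged sketch of calling that endpoint directly; the companyid and sectorid values are just the ones from the URL above, and the JSON response is an assumption based on the XHR tab:

import requests

params = {'companyid': '1324', 'sectorid': '132',
          'start': '0', 'rpp': '20',
          'sort': 'fd.full_description', 'sort_dir': 'asc'}
resp = requests.get('https://www.hl.co.uk/ajax/funds/fund-search/search', params=params)
results = resp.json()  # assumed to be JSON, as seen in devtools
print(results)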
To retrieve those companyid and sectorid values, you can send a GET request to the page https://www.hl.co.uk/shares/search-for-investments?stock_search_input=ftg&x=17&y=23&category_list=CEHGINOPW which has those dropdowns, and filter the HTML to find the values.
See the BS4 documentation on finding tags inside HTML source: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find
I'm writing an AliExpress web scraper using Python and the Requests module along with BeautifulSoup. I had it working well, but I've run into a problem: I get redirected to a login page at random. My solution is simply to log in at the start of my session before scraping, but I don't know how to log in.
The login page (https://login.aliexpress.com) requires only the username and password, but when I enter them with my code and test whether I'm logged in by going to https://home.aliexpress.com/index.htm and looking at the HTML, it fails: I get redirected back to the login page.
My code after trying multiple solutions to no avail:
import requests
LOGIN_URL = "https://login.aliexpress.com/"
LOGIN_INFO = {
    "loginId": "myemail@email.com",
    "password": "mypassword"
}
with requests.Session() as sess:
    # go to login page
    sess.get(LOGIN_URL)
    # attempt to log in with my login info
    sess.post(LOGIN_URL, data=LOGIN_INFO)
    # go to 'My AliExpress' page to verify successful login
    success = sess.get("https://home.aliexpress.com/index.htm")
    # manually check html to see if I was sent to the login page again
    print(success.text)
This is pretty much what's left after my many failed attempts. Some of the things I've tried are:
Looking at the cookies after the sess.get(LOGIN_URL), I get the following (in key : value format), but I don't know what to do with them:
ali_apache_tracktmp :
ali_apache_track :
xman_f : t52Eyo+p3qf6E6fdmL5yJ81g2icRn+2PYjjrWYHlqlDyXAixo92Z5KHMZV8SCV7vP4ZjxEmuTQesVWkqxUi3SpFU1qbRyNRd+d0pIIKVhrIDri2oaWrt6A==
JSESSIONID : 30678741D7473C80BEB85825718FB1C6
acs_usuc_t : acs_rt=343aef98b0ca4ae79497e31b11c82c29&x_csrf=1b5g78e7fz2rt
xman_us_f : x_l=0
ali_apache_id : 23.76.146.14.1510893827939.187695.4
xman_t : PSIYMbKN2UyuejZBfmP9o5hdmQGoSB0UL0785LnRBxW0bdbdMmtW2A47hHbgTgD7TmFp7QVsOW4kXTsXMncy+iKisKfqagqb4yPxOVFdw+k=
Tried looking for a CSRF token, but only found the text after '_csrf=' in the acs_usuc_t cookie above. Tried using it and it didn't work.
Looked at the HTML form sent when you log in, but I don't know HTML and can only tell that it has a lot more fields than the ones I've seen other people use for other websites (Image of Form Data from Chrome here).
Changing the "myPassword" in my code to the text in the password2 field in the image above, and changing the "password" key to "password2" too.
Googled for a few hours but didn't find anything that would work.
At this point, I'm at my wits end, so any help on how to proceed would be very much appreciated. I'm not the best coder (still learning), don't know html outside of what I've learned from a few tutorials about scraping, and was hoping to figure it out myself, but hours later I still haven't solved it and realized I could really use the help.
I'm using python 3.5. If there's any more info needed, let me know. Brain is just about turned completely to mush after being stuck and awake for so long.
I have a suspicion this will not work the way you want it to.
Even after somehow getting past the login prompt, the following page presents a "slider verification" which, to my knowledge, requests is unable to do anything about (if there is a method, please let me know).
I have been trying to use cookies instead:
session = requests.Session()
cj = requests.cookies.RequestsCookieJar()
cj.set('KEY', 'VALUE')  # placeholder: one of the cookie name/value pairs
session.cookies = cj
# url, headers and proxies are defined elsewhere in the scraper
response = session.get(url, timeout=5, headers=headers, proxies=proxies)
Previously the scraper worked with just headers and proxies for a time, but recently it always prompts a login.
I have tried all the keys and values in the cookies as well, to no avail.
An idea would be to use selenium to log in and capture the cookies, then pass them to a requests session. AntoG has a solution for doing this:
https://stackoverflow.com/a/42114843
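A minimal sketch of that idea, assuming you complete the login (slider included) in the selenium-driven browser before copying the cookies:

import requests
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://login.aliexpress.com/')
# ... log in in the opened browser, solving the slider ...
session = requests.Session()
for c in driver.get_cookies():
    session.cookies.set(c['name'], c['value'])  # copy every browser cookie
driver.quit()
response = session.get('https://home.aliexpress.com/index.htm')
print(response.url)  # if this is not the login page, the cookies worked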
I'm using the Python library requests for this, but I can't seem to be able to log in to this website.
The URL is https://www.bet365affiliates.com/ui/pages/affiliates/, and I've been trying POST requests to https://www.bet365affiliates.com/Members/CMSitePages/SiteLogin.aspx?lng=1 with data such as "ctl00$MasterHeaderPlaceHolder$ctl00$passwordTextbox" and "ctl00$MasterHeaderPlaceHolder$ctl00$userNameTextbox", but I never seem to be able to get logged in.
Could someone more experienced check the page's source code and tell me what I am missing here?
The solution could be this. Note that you can do it without selenium: if you want to, first GET the main affiliate page, and from the response data fetch all the required information (which I gather via XPaths below). I just didn't have enough time to write it using requests alone.
To gather that information from the response data you can use an XML tree library; with the same XPath method, you can easily find all the required values.
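A hedged sketch of that requests-only approach, assuming the hidden fields expose their values in a value attribute (the XPaths mirror the ones used in the selenium code below):

import requests
from lxml import html

page = requests.get('https://www.bet365affiliates.com/ui/pages/affiliates/Affiliates.aspx')
tree = html.fromstring(page.content)
viewstate = tree.xpath('//*[@id="__VIEWSTATE"]/@value')[0]
eventvalidation = tree.xpath('//*[@id="__EVENTVALIDATION"]/@value')[0]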
import os
import requests
from selenium import webdriver

Password = 'YOURPASS'
Username = 'YOURUSERNAME'

browser = webdriver.Chrome(os.getcwd() + "/" + "Chromedriver.exe")
browser.get('https://www.bet365affiliates.com/ui/pages/affiliates/Affiliates.aspx')

# grab the hidden ASP.NET form fields the login POST needs
VIEWSTATE = browser.find_element_by_xpath('//*[@id="__VIEWSTATE"]').get_attribute('value')
SESSIONID = browser.find_element_by_xpath('//*[@id="CMSessionId"]').get_attribute('value')
PREVPAG = browser.find_element_by_xpath('//*[@id="__PREVIOUSPAGE"]').get_attribute('value')
EVENTVALIDATION = browser.find_element_by_xpath('//*[@id="__EVENTVALIDATION"]').get_attribute('value')

# copy the browser's cookies into a requests session
cookies = browser.get_cookies()
session = requests.session()
for cookie in cookies:
    print(cookie['name'])
    print(cookie['value'])
    session.cookies.set(cookie['name'], cookie['value'])

payload = {'ctl00_AjaxScriptManager_HiddenField': '',
           '__EVENTTARGET': 'ctl00$MasterHeaderPlaceHolder$ctl00$goButton',
           '__EVENTARGUMENT': '',
           '__VIEWSTATE': VIEWSTATE,
           '__PREVIOUSPAGE': PREVPAG,
           '__EVENTVALIDATION': EVENTVALIDATION,
           'txtPassword': Password,
           'txtUserName': Username,
           'CMSessionId': SESSIONID,
           'returnURL': '/ui/pages/affiliates/Affiliates.aspx',
           'ctl00$MasterHeaderPlaceHolder$ctl00$userNameTextbox': Username,
           'ctl00$MasterHeaderPlaceHolder$ctl00$passwordTextbox': Password,
           'ctl00$MasterHeaderPlaceHolder$ctl00$tempPasswordTextbox': 'Password'}

session.post('https://www.bet365affiliates.com/Members/CMSitePages/SiteLogin.aspx?lng=1', data=payload)
Did you inspect the HTTP request the browser uses to log you in?
You should replicate it.
FB
I'm not sure if such a thing is possible, but I am trying to submit to a form such as https://lambdaschool.com/contact using a POST request.
I currently have the following:
import requests
payload = {"name":"MyName","lastname":"MyLast","email":"someemail#gmail.com","message":"My message"}
r = requests.post('http://lambdaschool.com/contact',params=payload)
print(r.text)
But I get the following error:
<title>405 Method Not Allowed</title>
etc.
Is such a thing possible to submit using a POST request?
If it were that simple, you'd see a lot of bots attacking every login form ever.
That URL obviously doesn't accept POST requests. That doesn't mean the submit button is POSTing to that page (though clicking the button also gives that same error...).
You need to open the Chrome/Firefox dev tools, watch the request to see what happens on form submit, and replicate that data in Python.
Another option would be the mechanize or Selenium WebDriver libraries to simulate a browser and fill out the form; a sketch follows below.
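For instance, a hedged mechanize sketch; the form index and field names are assumptions to verify against the page source:

import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)  # the site's robots.txt may otherwise block the fetch
br.open('https://lambdaschool.com/contact')
br.select_form(nr=0)  # assumed: the contact form is the first form on the page
br['name'] = 'MyName'  # assumed field names; list them with br.form.controls
br['email'] = 'someemail@gmail.com'
response = br.submit()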
params is for query parameters. You either want data, for a form-encoded body, or json, for a JSON body.
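So the call from the question becomes, as a sketch (whether the endpoint wants a form-encoded body or JSON is something to confirm in devtools):

import requests

payload = {"name": "MyName", "lastname": "MyLast",
           "email": "someemail@gmail.com", "message": "My message"}
r = requests.post('https://lambdaschool.com/contact', data=payload)  # form-encoded body
# or, if the endpoint expects JSON:
# r = requests.post('https://lambdaschool.com/contact', json=payload)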
I think the url should be 'http://lambdaschool.com/contact-form'.
I am using Python requests to get information from the mobile website of the German railway company (https://mobile.bahn.de/bin/mobil/query.exe/dox).
For instance:
import requests
query = {'S':'Stuttgart Hbf', 'Z':'München Hbf'}
rsp = requests.get('https://mobile.bahn.de/bin/mobil/query.exe/dox', params=query)
which in this case gives the correct page.
However, using the following query:
query = {'S':'Cottbus', 'Z':'München Hbf'}
It gives a different response, in which the user is required to choose one of several options (the server is unsure about the starting station, since many stations begin with 'Cottbus').
Now, my question is: given this response, how can I choose one of the given options and then repeat the request without getting this error?
I tried looking at the cookies and using a session instead of a simple GET request, but nothing has worked so far.
I hope you can help me.
Thanks.
You can use BeautifulSoup to parse the response and get the options if there is a select element in the response:
import requests
from bs4 import BeautifulSoup
query = {'S': u'Cottbus', 'Z': u'München Hbf'}
rsp = requests.get('https://mobile.bahn.de/bin/mobil/query.exe/dox', params=query)
soup = BeautifulSoup(rsp.content, 'lxml')
# check if the page has a choice dropdown
if soup.find('select'):
    # get a list of (value, text) tuples that you will need to use in the next request
    options_value = [(option['value'], option.text) for option in soup.find_all('option')]
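From there, a possible follow-up, assuming the server accepts the resolved option value through the same 'S' parameter:

    # pick one of the options (here the first) and resend the query
    value, text = options_value[0]
    query['S'] = value  # assumed: the option value stands in for the station name
    rsp = requests.get('https://mobile.bahn.de/bin/mobil/query.exe/dox', params=query)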