I have a script that posts a form to a remote web page.
This web page sends a cookie, and I'd like to grab it and insert it in the headers when posting the form.
How could I do that?
Thank you
If you want to get cookies from a GET request and reuse them in a POST, you can use a Session as below:
import requests

session = requests.Session()
response = session.get(URL)                 # send GET to receive cookies
cookies = response.cookies.get_dict()       # grab cookies as a dict
post = session.post(URL, cookies=cookies)   # send POST with the grabbed cookies
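Note that a Session already persists cookies between requests on its own, so the explicit cookies= argument above is not strictly required; the POST would carry the cookies received from the GET automatically.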
Thanks a lot.
When I go to the website from a web browser I see this:
Cookie: __qca=P0-1275531178-1484650046149; _ga=GA1.2.1669629019.1484650088; optimizelyEndUserId=oeu1486719631164r0.37280650070167387; optimizelySegments=%7B%224541596561%22%3A%22ff%22%2C%224547081444%22%3A%22none%22%2C%224502945149%22%3A%22direct%22%2C%224547981473%22%3A%22false%22%7D; optimizelyBuckets=%7B%7D; IRF_1322=%7Bvisits%3A1%2Cuser%3A%7Btime%3A1486719631639%2Cref%3A%22direct%22%2Cpv%3A1%2Ccap%3A%7B%7D%2Cv%3A%7B%7D%7D%2Cvisit%3A%7Btime%3A1486719631639%2Cref%3A%22direct%22%2Cpv%3A1%2Ccap%3A%7B%7D%2Cv%3A%7B%7D%7D%2Clp%3A%22http%3A%2F%2Flastingmemoriesbylisa.zenfolio.com%2F%22%2Cdebug%3A0%2Ca%3A1486719631639%7D; zf_5y_visitor=s7fL-1Wogsx-E1mBu-WkZbl-5gcAAAAATer6DQw9WNnr; zf_10y_tz=120; zf_pat=256887569$labellevuephotography$$396676290$492530544; zf_lsc=NPUPdGc1+bbM3gpqJ/i3nH4o...0
but when I use my script I only have:
zf_5y_visitor=s7fL-1Wogsx-E1mBu-WkZbl-5gcAAAAATer6DQw9WNnr; zf_10y_tz=120;
Is it possible to get the other parts? At least the ones starting with "zf"?
I'm trying to scrape this website: https://gb4.typewriter.at/. I'm struggling with logging in. The weird thing is that the form gets filled out but apparently never submitted, and therefore I don't get any cookies.
Here is what I tried:
import requests

url = "https://gb4.typewriter.at/"
s = requests.Session()
s.get(url)  # initial GET so the session picks up the PHPSESSID cookie

payload = {
    "LoginForm[username]": "some_username",
    "LoginForm[pw]": "some_password",
}

r = s.post(
    f"{url}index.php?r=site/login",  # url already ends with a slash
    data=payload,
)
print(r.content)  # this prints the HTML with both forms filled in
print(r.cookies)  # gives me a PHPSESSID cookie, not the cookie I need; the site uses a different one (some random characters) as the authentication cookie
I also tried to send a post request with Postman, which worked completely fine and logged me in.
I would really appreciate some help!
I understand there are similar questions out there; however, I couldn't make this code work. Does anyone know how to log in and scrape the data from this website?
from bs4 import BeautifulSoup
import requests
# Start the session
session = requests.Session()
# Create the payload
payload = {'login':<USERNAME>,
'password':<PASSWORD>
}
# Post the payload to the site to log in
s = session.post("https://www.beeradvocate.com/community/login", data=payload)
# Navigate to the next page and scrape the data
s = session.get('https://www.beeradvocate.com/place/list/?c_id=AR&s_id=0&brewery=Y')
soup = BeautifulSoup(s.text, 'html.parser')
soup.find('div', class_='titleBar')
print(soup)
The process is different for almost every site; the best way to figure it out is to use your browser's request inspector (e.g. in Firefox) and look at how the site behaves when you try to log in.
For your website, when you click the login button a POST request is sent to https://www.beeradvocate.com/community/login/login; with a little bit of trial and error you should be able to replicate it.
Make sure you match the Content-Type and the request headers (specifically the cookies, in case you need auth tokens).
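As a rough sketch of that advice (the field names, any hidden token fields, and the exact headers are assumptions; copy the real ones from the inspected request), the flow could look like this:
from bs4 import BeautifulSoup
import requests

session = requests.Session()

# Load the login page first so the session picks up the initial cookies.
session.get("https://www.beeradvocate.com/community/login")

# Field names are assumptions; take the real ones (including any hidden
# token fields) from the POST request shown in the request inspector.
payload = {
    "login": "my_username",
    "password": "my_password",
}

resp = session.post("https://www.beeradvocate.com/community/login/login", data=payload)
print(resp.status_code)
print(session.cookies.get_dict())  # should now include the auth cookies

page = session.get("https://www.beeradvocate.com/place/list/?c_id=AR&s_id=0&brewery=Y")
soup = BeautifulSoup(page.text, "html.parser")
print(soup.find("div", class_="titleBar"))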
I have a problem with a simple authorization and upload API script.
When authorized, the client receives several cookies (in the browser), including a PHPSESSID cookie.
I use the requests.post method with form data for authorization:
r = requests.post(url, headers = self.headers, data = formData)
self.cookies = requests.utils.dict_from_cookiejar(r.cookies)
The headers are only used to set a custom User-Agent.
Authorization works fine (there is a logout link on the returned page).
Later, I try to upload data using the authorized session cookies:
r = requests.post(url, files = files, data = formData, headers = self.headers, cookies = self.cookies)
But the site rejects the request. If I compare the requests from the script and from Google Chrome (using Wireshark), there are no differences in the request body.
The only difference is that requests sends 2 cookies, while Google Chrome sends 7.
Update: I double-checked; the first request receives 7 cookies, but the post method just ignores half of them...
My mistake was that I was assigning the cookies from each subsequent API request to the session cookies dictionary. On every request after logging in, the stored cookies were 'reset' by the cookies of the incoming response; that was the problem. Since the auth cookies are only set by the login request, they were lost on the next request.
After each authorized request I now call update() instead of assigning:
self.cookies.update( requests.utils.dict_from_cookiejar(r.cookies) )
That solves my issue; the upload works fine!
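A minimal sketch of the difference, with placeholder URLs and form data (these are not the real API endpoints):
import requests

login_url = "https://example.com/api/login"    # placeholder URL
upload_url = "https://example.com/api/upload"  # placeholder URL

# Login request: store its cookies (including the auth cookies) in a dict.
r = requests.post(login_url, data={"user": "u", "pass": "p"})
cookies = requests.utils.dict_from_cookiejar(r.cookies)

# Later authorized request, sending the stored cookies.
r = requests.post(upload_url, data={"field": "value"}, cookies=cookies)

# Wrong: plain assignment replaces the dict with only this response's
# cookies, dropping the auth cookies received at login.
# cookies = requests.utils.dict_from_cookiejar(r.cookies)

# Right: merge the new cookies into the existing dict.
cookies.update(requests.utils.dict_from_cookiejar(r.cookies))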
I'm just studying the requests library (http://docs.python-requests.org/en/latest/)
and ran into a problem with fetching a page with cookies using requests.
for example:
url2= 'https://passport.baidu.com'
parsedCookies={'PTOKEN': '412f...', 'BDUSS': 'hnN2...', ...} #Sorry that the cookies value is replaced by ... for instance of privacy
req = requests.get(url2, cookies=parsedCookies)
text=req.text.encode('utf-8','ignore')
f=open('before.html','w')
f.write(text)
f.close()
req.close()
When I use the code above to fetch the page, it just saves the login page to 'before.html' instead of the logged-in page, which means I haven't actually logged in successfully.
But if I use urllib2 to fetch the page, it works as expected:
parsedCookies = "PTOKEN=412f...;BDUSS=hnN2...;..."  # different format but same content as the cookies above
req = urllib2.Request(url2)
req.add_header('Cookie', parsedCookies)
ret = urllib2.urlopen(req)
f=open('before_urllib2.html','w')
f.write(ret.read())
f.close()
ret.close()
When I use this code, it saves the logged-in page in before_urllib2.html.
--
Are there any mistakes in my code?
Any reply would be appreciated.
You can use a Session object to get what you want:
url2 = 'http://passport.baidu.com'
session = requests.Session()  # create a Session object
cookie = requests.utils.cookiejar_from_dict(parsedCookies)
session.cookies.update(cookie)  # set the cookies of the Session object
req = session.get(url2, headers=headers, allow_redirects=True)
If you use the requests.get function, it doesn't send your cookies on the redirected request. If you use Session().get instead, the session will maintain and send the cookies for all HTTP requests; that is exactly what the concept of a "session" means.
Let me try to elaborate on what happens here:
When I send the cookies to http://passport.baidu.com/center with the parameter allow_redirects set to False, the returned status code is 302 and one of the response headers is 'location': '/center?_t=1380462657' (this is a dynamic value generated by the server; replace it with whatever you get from the server):
url2 = 'http://passport.baidu.com/center'
req = requests.get(url2, cookies=parsedCookies, allow_redirects=False)
print req.status_code  # output: 302
print req.headers
But when I set the parameter allow_redirects to True, the server still ends up returning the login page instead of the target page (http://passport.baidu.com/center?_t=1380462657). The reason is that requests.get doesn't send the cookies on the redirected request, here http://passport.baidu.com/center?_t=1380462657, so we cannot log in successfully. That is why we need the Session object.
If I set url2 = http://passport.baidu.com/center?_t=1380462657 directly, it returns the page you want. So one workaround is to use the code above to get the dynamic location value, build the path to your account like http://passport.baidu.com/center?_t=1380462657, and then fetch the desired page:
url2 = 'http://passport.baidu.com' + req.headers.get('location')
req = requests.get(url2, cookies=parsedCookies, allow_redirects=True)
But this is cumbersome, so when dealing with cookies, the Session object does an excellent job for us!
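Putting it together, a minimal sketch (reusing the parsedCookies dict from the question) that lets a Session follow the redirect and then checks what happened:
import requests

session = requests.Session()
session.cookies.update(requests.utils.cookiejar_from_dict(parsedCookies))

req = session.get('http://passport.baidu.com/center', allow_redirects=True)
print(req.status_code)                             # final status code after redirects
print([resp.status_code for resp in req.history])  # intermediate responses, e.g. [302]
print(req.url)                                     # the final, redirected URL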
I am trying to scrape some selling data using the StubHub API. An example of this data can be seen here:
https://sell.stubhub.com/sellapi/event/4236070/section/null/seatmapdata
You'll notice that if you try to visit that URL without logging into stubhub.com, it won't work; you need to log in first.
Once I've signed in via my web browser, I open the URL I want to scrape in a new tab, then use the following command to retrieve the data:
r = requests.get('https://sell.stubhub.com/sellapi/event/4236070/section/null/seatmapdata')
However, once the browser session expires after ten minutes, I get this error:
<FormErrors>
<FormField>User Auth Check</FormField>
<ErrorMessage>
Either is not active or the session might have expired. Please login again.
</ErrorMessage>
I think I need to pass the session ID via a cookie to keep my authentication alive and well.
The Requests library documentation is pretty terrible for someone who has never done this sort of thing before, so I was hoping you folks might be able to help.
The example provided by Requests is:
s = requests.Session()
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get("http://httpbin.org/cookies")
print r.text
# '{"cookies": {"sessioncookie": "123456789"}}'
I honestly can't make heads or tails of that. How do I preserve cookies between POST requests?
I don't know how StubHub's API works, but generally it should look like this:
s = requests.Session()
data = {"login":"my_login", "password":"my_password"}
url = "http://example.net/login"
r = s.post(url, data=data)
Now your session contains the cookies provided by the login form. To access the cookies of this session, simply use
s.cookies
Any further requests made with this session will send these cookies automatically.
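Applied to the URL from the question, it might look like the sketch below; the login endpoint and form field names here are assumptions, so inspect StubHub's actual login request in your browser to get the real ones.
import requests

s = requests.Session()

# Assumed login endpoint and field names; replace them with the ones
# your browser's network inspector shows for the real login request.
login_url = "https://myaccount.stubhub.com/login"
payload = {"login": "my_login", "password": "my_password"}
s.post(login_url, data=payload)

# The session now holds whatever cookies the login response set and
# sends them automatically on subsequent requests.
r = s.get("https://sell.stubhub.com/sellapi/event/4236070/section/null/seatmapdata")
print(r.text)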