Generating a Cookie in Python Requests

I'm relatively new to Python so excuse any errors or misconceptions I may have. I've done hours and hours of research and have hit a stopping point.
I'm using the Requests library to pull data from a website that requires a login. I was initially successful logging in with a session.post(url, data=payload) followed by a session.get; I got a [200] response. But once I tried to view the JSON data that sits behind the login, I hit a [403] response. Long story short, I can make it work by logging in through a browser, inspecting the web elements to find the current session cookie, and then defining the headers in Requests so that session.get passes along that exact cookie.
My question is: is it possible to set/generate/find this cookie through Python after logging in? After logging in and out a few times, I can see that some components of the cookie stay the same while others change. The website I'm using is Garmin Connect.
Any and all help is appreciated.

If your issue is just logging in, you can use a Session object. It stores the cookies it receives and sends them back automatically on later requests, so it generally handles the cookies for you. Here is an example:
import requests

s = requests.Session()
# All cookies received will be stored in the session object
# and sent back automatically on subsequent requests.
s.post('http://www...', data=payload)  # payload = your login form data
s.get('http://www...')
Furthermore, with the requests library, you can get a cookie from a response, like this:
url = 'http://example.com/some/cookie/setting/url'
r = requests.get(url)
r.cookies
But you can also send cookies back to the server on subsequent requests, like this:
url = 'http://httpbin.org/cookies'
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies)
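Putting those pieces together for a login-protected JSON endpoint like the one in the question, here is a minimal sketch. All URLs and form field names below are placeholders; take the real ones from your browser's network tab:
import requests

LOGIN_URL = 'https://example.com/signin'    # placeholder
DATA_URL = 'https://example.com/data.json'  # placeholder
payload = {'username': 'me', 'password': 'secret'}  # field names vary by site

with requests.Session() as s:
    # The login response sets the session cookie inside s ...
    s.post(LOGIN_URL, data=payload)
    # ... and s sends it back automatically here.
    r = s.get(DATA_URL)
    r.raise_for_status()
    print(r.json())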
I hope this helps!
Reference: How to use cookies in Python Requests

Related

Scraping an internal web page

I have to scrape an internal web page of my organization. If I use Beautiful Soup I get
"Unauthorized access"
I don't want to put my username/password in the source code because it will be shared across colleagues.
If I open the same URL in Firefox it doesn't ask me to log in; the problem only appears when I make the same request from a Python script.
Is there a way to share the session used by Firefox with a Python script?
I think my authentication is tied to my PC, because if I log off and delete all cookies, I'm logged in automatically when I return. Do you know why this doesn't happen with my Python script?
When you use the browser to log in to your organization, you provide your credentials and the server returns a cookie tied to your organization's domain. This cookie has an expiration and lets you navigate your organization's site without having to log in again for as long as the cookie is valid.
You can read about cookies here:
https://en.wikipedia.org/wiki/HTTP_cookie
Your website scraper does not need to store your credentials. First delete your cookies, then, using your browser's developer tools (look at the network tab), you can:
1) Figure out whether your organization uses a separate auth endpoint
2) If it's not evident, ask the IT department
3) Use the auth endpoint to get a cookie by passing in credentials
4) See how this cookie is used by the site (look at the HTTP request/response headers)
5) Use this cookie to scrape the website (a sketch of steps 3-5 follows the list)
6) Share your code freely: anyone who needs to scrape the site can either pass in their own credentials, or use a curl command to get a valid cookie header
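A minimal sketch of steps 3-5, assuming a hypothetical auth endpoint and form field names (yours will differ; take them from the network tab):
import requests

# Hypothetical auth endpoint and form fields -- replace with whatever
# your organization's login flow actually uses.
AUTH_URL = 'https://intranet.example.org/auth/login'
PAGE_URL = 'https://intranet.example.org/reports'

def scrape(username, password):
    with requests.Session() as s:
        # Step 3: get a cookie by posting credentials to the auth endpoint.
        s.post(AUTH_URL, data={'user': username, 'pass': password})
        # Steps 4-5: the session replays that cookie on every request.
        return s.get(PAGE_URL).text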
1) After authenticating in your Firefox browser, make sure to get the cookie key/value.
2) Use that data in the code below:
from bs4 import BeautifulSoup
import requests
browser_cookies = {'your_cookie_key':'your_cookie_value'}
s = requests.Session()
r = s.get(your_url, cookies=browser_cookies)
bsoup = BeautifulSoup(r.text, 'lxml')
The requests.Session() is for persistence.
One more tip: you could also call your script like this:
python3 /path/to/script/script.py cookies_key cookies_value
Then get the two values with the sys module. The code will be:
import sys
browser_cookies = {sys.argv[1]:sys.argv[2]}
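Putting the two pieces together, the whole script might look like this (a sketch; your_url remains a placeholder as above):
import sys

import requests
from bs4 import BeautifulSoup

your_url = 'http://...'  # same placeholder as above

# The cookie key/value come from the command line, so they never
# live in the source code.
browser_cookies = {sys.argv[1]: sys.argv[2]}

s = requests.Session()
r = s.get(your_url, cookies=browser_cookies)
bsoup = BeautifulSoup(r.text, 'lxml')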

How to make HTTP POST on website that uses asp.net?

I'm using the Python Requests library for this, but I can't seem to log in to this website.
The URL is https://www.bet365affiliates.com/ui/pages/affiliates/, and I've been trying POST requests to https://www.bet365affiliates.com/Members/CMSitePages/SiteLogin.aspx?lng=1 with the data "ctl00$MasterHeaderPlaceHolder$ctl00$passwordTextbox", "ctl00$MasterHeaderPlaceHolder$ctl00$userNameTextbox", etc., but I never seem to get logged in.
Could someone more experienced check the page's source code and tell me what I am missing here?
Note that you could do this without Selenium: first GET the main affiliates page, then pull the required hidden fields (which I locate by XPath below) out of the response. I just didn't have time to write a fully requests-based version; a sketch of that variant follows the code below.
To extract the fields from the response data you could use an HTML/XML tree library such as lxml; the same XPath expressions will find all the required values.
import os

import requests
from selenium import webdriver

Password = 'YOURPASS'
Username = 'YOURUSERNAME'

browser = webdriver.Chrome(os.getcwd() + "/" + "Chromedriver.exe")
browser.get('https://www.bet365affiliates.com/ui/pages/affiliates/Affiliates.aspx')

# Read the hidden ASP.NET form fields out of the rendered page.
# find_element_by_xpath returns a WebElement, so take its value attribute.
VIEWSTATE = browser.find_element_by_xpath('//*[@id="__VIEWSTATE"]').get_attribute('value')
SESSIONID = browser.find_element_by_xpath('//*[@id="CMSessionId"]').get_attribute('value')
PREVPAGE = browser.find_element_by_xpath('//*[@id="__PREVIOUSPAGE"]').get_attribute('value')
EVENTVALIDATION = browser.find_element_by_xpath('//*[@id="__EVENTVALIDATION"]').get_attribute('value')

# Copy the browser's cookies into a requests session.
cookies = browser.get_cookies()
session = requests.session()
for cookie in cookies:
    print(cookie['name'])
    print(cookie['value'])
    session.cookies.set(cookie['name'], cookie['value'])

payload = {'ctl00_AjaxScriptManager_HiddenField': '',
           '__EVENTTARGET': 'ctl00$MasterHeaderPlaceHolder$ctl00$goButton',
           '__EVENTARGUMENT': '',
           '__VIEWSTATE': VIEWSTATE,
           '__PREVIOUSPAGE': PREVPAGE,
           '__EVENTVALIDATION': EVENTVALIDATION,
           'txtPassword': Password,
           'txtUserName': Username,
           'CMSessionId': SESSIONID,
           'returnURL': '/ui/pages/affiliates/Affiliates.aspx',
           'ctl00$MasterHeaderPlaceHolder$ctl00$userNameTextbox': Username,
           'ctl00$MasterHeaderPlaceHolder$ctl00$passwordTextbox': Password,
           'ctl00$MasterHeaderPlaceHolder$ctl00$tempPasswordTextbox': 'Password'}

session.post('https://www.bet365affiliates.com/Members/CMSitePages/SiteLogin.aspx?lng=1', data=payload)
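For completeness, the requests-only variant mentioned above might look roughly like this (an untested sketch; the element IDs are the ones targeted by the Selenium code above, and the payload and POST are reused unchanged):
import requests
from lxml import html

session = requests.session()
# Fetch the login page directly and parse the hidden ASP.NET fields
# out of the HTML instead of reading them from a Selenium browser.
page = session.get('https://www.bet365affiliates.com/ui/pages/affiliates/Affiliates.aspx')
tree = html.fromstring(page.content)

VIEWSTATE = tree.xpath('//*[@id="__VIEWSTATE"]/@value')[0]
SESSIONID = tree.xpath('//*[@id="CMSessionId"]/@value')[0]
PREVPAGE = tree.xpath('//*[@id="__PREVIOUSPAGE"]/@value')[0]
EVENTVALIDATION = tree.xpath('//*[@id="__EVENTVALIDATION"]/@value')[0]
# From here, build the same payload as above and session.post() it.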
Did you inspect the HTTP request the browser uses to log you in?
You should replicate it.
FB

Using Python Requests: Sessions, Cookies, and POST

I am trying to scrape some selling data using the StubHub API. An example of this data can be seen here:
https://sell.stubhub.com/sellapi/event/4236070/section/null/seatmapdata
You'll notice that if you try to visit that URL without logging in to stubhub.com, it won't work. You will need to log in first.
Once I've signed in via my web browser, I open the URL I want to scrape in a new tab, then use the following command to retrieve the data:
r = requests.get('https://sell.stubhub.com/sellapi/event/4236070/section/null/seatmapdata')
However, once the browser session expires after ten minutes, I get this error:
<FormErrors>
<FormField>User Auth Check</FormField>
<ErrorMessage>
Either is not active or the session might have expired. Please login again.
</ErrorMessage>
I think that I need to implement the session ID via cookie to keep my authentication alive and well.
The Requests library documentation is pretty terrible for someone who has never done this sort of thing before, so I was hoping you folks might be able to help.
The example provided by Requests is:
s = requests.Session()
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get("http://httpbin.org/cookies")
print r.text
# '{"cookies": {"sessioncookie": "123456789"}}'
I honestly can't make heads or tails of that. How do I preserve cookies between POST requests?
I don't know how StubHub's API works, but generally it should look like this:
s = requests.Session()
data = {"login":"my_login", "password":"my_password"}
url = "http://example.net/login"
r = s.post(url, data=data)
Now your session contains the cookies provided by the login form. To access the cookies of this session, simply use
s.cookies
Any further requests made through this session will send those cookies automatically.
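For example (the URL is a placeholder for whatever protected resource you need):
# The session replays the login cookies on every subsequent call.
r = s.get("http://example.net/protected-page")
print(r.status_code)  # 200 once the login above succeeded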

Python CookieJar saves cookie, but doesn't send it to website

I am trying to log in to a website using urllib2 and cookiejar. It saves the session id, but when I try to open another link that requires authentication, it says that I am not logged in. What am I doing wrong?
Here's the code, which fails for me:
import urllib
import urllib2
import cookielib
cookieJar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
# Gives a response saying that I logged in successfully
response = opener.open("http://site.com/login", "username=testuser&password=" + md5encode("testpassword"))
# Gives response saying that I am not logged in
response1 = opener.open("http://site.com/check")
Your implementation seems fine and should work: it should be sending the correct cookies. I suspect the site is simply not logging you in.
How can you tell that the cookies aren't being sent? Maybe the cookies you are receiving are not the ones that authenticate you.
Use response.info() to see the headers of the responses and check which cookies you are actually receiving.
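Continuing from the question's code, a quick way to dump what the jar actually holds after the login request:
# cookieJar is iterable; each entry is a cookielib Cookie object.
for cookie in cookieJar:
    print(cookie.name + '=' + cookie.value + ' for ' + cookie.domain)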
The site may not be logging you in because:
It checks the User-Agent header, which you are not setting; some sites only accept the major browsers in order to keep bots out.
It might expect a hidden form field that you are not sending.
One piece of advice:
from urllib import urlencode
# Use urlencode to encode your data
data = urlencode(dict(username='testuser', password=md5encode("testpassword")))
response = opener.open("http://site.com/login", data)
One more thing is strange here: you are MD5-hashing your password before sending it. Hashing is normally done by the server before comparing against the database. Sending the hash from the client only makes sense if site.com implements MD5 in JavaScript, which is a very rare case.
Check that: you may be providing the hashed form rather than the actual password, in which case the server computes an MD5 of your MD5 hash and the comparison fails. :)
I had a similar problem with my own test server, which worked fine with a browser but not with the urllib2.build_opener solution.
The problem seems to be in urllib2. As these answers suggest, it's easy to use the more powerful mechanize library instead of urllib2:
import cookielib
import mechanize

cookieJar = cookielib.CookieJar()
browser = mechanize.Browser()
browser.set_cookiejar(cookieJar)
opener = mechanize.build_opener(*browser.handlers)
And the opener will work as expected!

Cookie Problem in Python

I'm working on a simple HTML scraper for Hulu in Python 2.6 and am having problems logging in to my account. Here's my code so far:
import urllib
import urllib2
from cookielib import CookieJar
#make a cookie and redirect handlers
cookies = CookieJar()
cookie_handler= urllib2.HTTPCookieProcessor(cookies)
redirect_handler= urllib2.HTTPRedirectHandler()
opener = urllib2.build_opener(redirect_handler,cookie_handler)#make opener w/ handlers
#build the url
login_info = {'username':USER,'password':PASS}#USER and PASS are defined
data = urllib.urlencode(login_info)
req = urllib2.Request("http://www.hulu.com/account/authenticate",data)#make the request
test = opener.open(req) #open the page
print test.read() #print html results
The code compiles and runs, but all that prints is:
Login.onError("Please \074a href=\"/support/login_faq#cant_login\"\076enable cookies\074/a\076 and try again.");
I assume there is some error in how I'm handling cookies, but I just can't seem to spot it. I've heard Mechanize is a very useful module for this type of program, but as this seems to be the only speed bump left, I was hoping to find my bug.
What you're seeing is an AJAX response. The page is probably using JavaScript to set the cookie, which is defeating your attempt to authenticate.
The error message you are getting back could be misleading. For example, the server might be looking at the User-Agent and seeing that it's not one of the supported browsers, or checking HTTP_REFERER and expecting the request to come from the Hulu domain. My point is there are too many variables coming in with the request to keep guessing them one by one.
I recommend using an HTTP analyzer tool, e.g. Charles or the one in Firebug, to figure out exactly what (header fields, cookies, parameters) the client sends to the server when you do the Hulu login via a browser. This will give you the exact request that you need to construct in your Python code.
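Once the analyzer shows you what the browser sends, you can replay the relevant headers. For instance, to test the User-Agent and Referer hypotheses above (illustrative values only, not something Hulu is confirmed to check), continuing from the question's code:
# Set browser-like headers on the request before opening it.
req = urllib2.Request("http://www.hulu.com/account/authenticate", data)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20100101 Firefox/12.0')
req.add_header('Referer', 'http://www.hulu.com/')
test = opener.open(req)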
