Here is the code I am working with:
import requests

headers = {'Accept': '*/*',
           'Accept-Language': 'en-US,en;q=0.8',
           'Cookie': 'Cookie:PHPSESSID=vev1ekv3grqhh37e8leu1coob1',
           'Cache-Control': 'max-age=0',
           'Connection': 'keep-alive',
           'Proxy-Authorization': 'Basic ZWRjZ3Vlc3Q6ZWRjZ3Vlc3Q=',
           'If-Modified-Since': 'Fri, 13 Nov 2015 17:47:23 GMT',
           'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'}

with requests.Session() as c:
    url = 'http://172.31.13.135/tpo/spp/'
    c.get(url, headers=headers)
    payload = {'regno': 'myregno', 'password': 'mypassword'}
    c.post(url, data=payload, headers=headers)
    r = c.get('http://172.31.13.135/tpo/spp/home.php', headers=headers)
    print r.content
I get the following message when I run this script.
<script>
alert("Session timeout !");
window.location = "logout.php";
</script><script>
alert("Unauthorised Access!");
window.location = "index.php";
</script>
<!DOCTYPE html>
<html lang="en">
How do I deal with this "session timeout" issue?
Many thanks in advance.
It's really tough to answer when I can't visit the website you're scraping. So here's my guess:
1) Try removing the Cookie entry from your headers; you don't need it. requests.Session() will store the cookies the server sets when you visit http://172.31.13.135/tpo/spp/ for the first time, and will send them back automatically on later requests.
Your headers then become:
headers = {'Accept': '*/*',
           'Accept-Language': 'en-US,en;q=0.8',
           'Cache-Control': 'max-age=0',
           'Connection': 'keep-alive',
           'Proxy-Authorization': 'Basic ZWRjZ3Vlc3Q6ZWRjZ3Vlc3Q=',
           'If-Modified-Since': 'Fri, 13 Nov 2015 17:47:23 GMT',
           'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'}
2) Make sure the 'If-Modified-Since' header really is static with the value you have hard-coded. If the site expects it to change, set the date and time programmatically at request time.
3) I am not sure why you have 'Proxy-Authorization': 'Basic ZWRjZ3Vlc3Q6ZWRjZ3Vlc3Q=' in the headers; try it without that entry.
If you do need it, note that a Basic value is just base64-encoded username:password, so it stays valid as long as the proxy credentials themselves don't change.
Let me know if that helps.
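Putting 1) to 3) together, a minimal sketch of the login flow would look like this (same URLs and form fields as your script; whether the server accepts the slimmed-down headers is something you'd have to verify):

import requests

# Only the User-Agent is kept; the Session handles cookies by itself.
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'}

with requests.Session() as c:
    url = 'http://172.31.13.135/tpo/spp/'
    c.get(url, headers=headers)  # server sets PHPSESSID; the Session stores it
    payload = {'regno': 'myregno', 'password': 'mypassword'}
    c.post(url, data=payload, headers=headers)  # stored cookie is re-sent automatically
    r = c.get('http://172.31.13.135/tpo/spp/home.php', headers=headers)
    print r.content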
You could also pass a timeout argument to your get and post calls:
timeout = 10  # ten seconds

with requests.Session() as c:
    url = 'http://172.31.13.135/tpo/spp/'
    c.get(url, headers=headers, timeout=timeout)
    payload = {'regno': 'myregno', 'password': 'mypassword'}
    c.post(url, data=payload, headers=headers, timeout=timeout)
    r = c.get('http://172.31.13.135/tpo/spp/home.php', headers=headers, timeout=timeout)
    print r.content
This lets you tweak the maximum time you want to wait for a response.
For more about requests, see the docs.
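Note that timeout only limits how long the client waits for the server to respond; it is unrelated to the server-side "Session timeout!" message above. If you want finer control, requests also accepts a (connect, read) tuple:

# wait at most ~3 s to establish the connection, and 27 s for the response
r = c.get(url, headers=headers, timeout=(3.05, 27))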
Related
The website I am trying to log in to is https://realitysportsonline.com/RSOLanding.aspx. I can't seem to get the login to work, since the process is a little different from a typical site with a login-specific page. I don't get any errors, but the login action doesn't work, which then causes the request for main to redirect to the homepage.
import requests

url = "https://realitysportsonline.com/RSOLanding.aspx"
main = "https://realitysportsonline.com/SetLineup_Contracts.aspx?leagueId=3000&viewingTeam=1"

data = {"username": "", "password": "",
        "vc_btn3 vc_btn3-size-md vc_btn3-shape-rounded vc_btn3-style-3d vc_btn3-color-danger": "Log In"}

header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
          'Referer': 'https://realitysportsonline.com/RSOLanding.aspx',
          'Host': 'realitysportsonline.com',
          'Connection': 'keep-alive',
          'Accept-Language': 'en-US,en;q=0.5',
          'Accept-Encoding': 'gzip, deflate, br',
          'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8'}

s = requests.session()
s.get(url)
r = s.post(url, data, headers=header)
page = requests.get(main)
First of all, you create a session, but then (assuming your POST request worked) you request an authorised page without using the session you created.
You need to make the request with the s object you created like so:
page = s.get(main)
However, there were also a few issues with your POST request. You were making a request to the home page instead of the /Login route. You were also missing the Content-Type header.
import requests

url = "https://realitysportsonline.com/Services/AccountService.svc/Login"
main = "https://realitysportsonline.com/LeagueSetup.aspx?create=true"

payload = {"username": "", "password": ""}
headers = {
    'Content-Type': "text/json",
    'Cache-Control': "no-cache"
}

s = requests.session()
response = s.post(url, json=payload, headers=headers)
page = s.get(main)
PS: your main request URL redirects to the homepage, even with a valid session (at least for me).
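As a sanity check, it helps to confirm the login actually succeeded before requesting other pages. What the Login service returns on success is site-specific, so treat these checks as a starting point:

response = s.post(url, json=payload, headers=headers)
response.raise_for_status()      # raises if the request itself failed
print(response.text)             # inspect the service's reply by hand
print(s.cookies.get_dict())      # a session/auth cookie should show up here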
So I have this page I need to log in to constantly, but I can't even log in unless I put some cookies in the headers.
The problem is that if I use the cookies Postman gives me, they eventually expire and I have to replace them all in the code again.
# This makes the login
url = "https://www.APAGE.com"  # login url
payload = "user="+dude["user"]+"&password="+dude["password"]+"&action=login"
headers = {
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0",  # for some reason I need this or the page returns an error
    'Content-Type': "application/x-www-form-urlencoded",
    'X-Requested-With': "XMLHttpRequest",
    'Cookie': "ASP.NET_SessionId=<a usually expirable cookie>; __RequestVerificationToken_L0NvbnN1bHRhV2Vi0=<another expirable cookie>",  # I need THESE!
    'Cache-Control': "no-cache",
}
login = session.post(url, data=payload, headers=headers)  # makes the login
print "Login open"

cookie = session.cookies.get_dict()  # get the recursive cookie

# Here I am trying to grab the request cookies just after the login,
# so I can re-send them and they don't expire
print '================'
print login.request.headers
print '================'
print '\n\n\n'
cookie2 = login.headers.get('Set-Cookie')
print login.headers
print cookie2
print login.cookies.get_dict()

# Makes a GET request to move to the initial page
url = "www.APAGE-after-login.com"
headers = {
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0",
    'Referer': "www.APAGE-after-login.com",
    'Cookie': "ASP.NET_SessionId=<the cookie again>; __RequestVerificationToken_L0NvbnN1bHRhV2Vi0=<the other cookie>; .ASPXAUTH="+str(cookie['.ASPXAUTH']),  # here I need to re-send the .ASPXAUTH cookie after every request or the session expires
    'Upgrade-Insecure-Requests': "1",
    'Cache-Control': "no-cache",
}
moving = session.get(url, headers=headers)
cookie = session.cookies.get_dict()
I need help here to get those cookies so, when they change, I don't have to change entire sections of the code again and again.
Does anyone know how I can intercept those request-cookies so I can use them?
Thanks!
Edit: I already have session = requests.session() declared in the code, and I've already tried several solutions to the problem. The code works if I manually place the cookie in the headers, but the cookie expires in a couple of days. For some reason the requests library is not handling the cookies automatically...
If I use this header:
headers = {
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0",
    'Cookie': ".ASPXAUTH="+str(cookie['.ASPXAUTH']),
    'Cache-Control': "no-cache",
}
or any other variation, such as:
moving = session.get(url, headers=headers, cookies=cookie)  # the cookie I tried to get before
the login simply doesn't work; it returns an error page.
Thanks
Edit2:
for customer in customers:
    session = requests.session()

    # Create a folder
    if not os.path.exists("C:\\Users\\Desktop\\customers\\" + customer["dir"] + "/page"):
        os.makedirs("C:\\Users\\Desktop\\customers\\" + customer["dir"] + "/page", 0755)

    search_date = datetime.datetime.now().strftime("%d-%m-%Y-%H-%M-%S")
    search_date_end = (datetime.datetime.now() - timedelta(days=30)).strftime("%d/%m/%Y")
    search_date_begining = (datetime.datetime.now() - timedelta(days=30)).strftime("%d/%m/%Y")
    search_date_closing = (datetime.datetime.now() - timedelta(days=45)).strftime("%d/%m/%Y")
    search_date_closing = urllib.quote_plus(search_date_closing)
    search_date_begining = urllib.quote_plus(search_date_begining)
    search_date_end = urllib.quote_plus(search_date_end)
    print str(search_date_end)
    # Makes the login
    url = "www.ASITE.com/aunthenticate/APAGELogin"  # login
    payload = "user="+customer["user"]+"&password="+customer["pass"]+"&action=login"
    headers = {
        'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0",  # for some reason I need this or the login breaks
        'Content-Type': "application/x-www-form-urlencoded",
        'X-Requested-With': "XMLHttpRequest",
        'Cookie': "ASP.NET_SessionId=<some cookie>; __RequestVerificationToken_L0NvbnN1bHRhV2Vi0=<part1cookie>-<part2cookie>-<part3cookie>",  # I need these cookies to log in; for some reason I can't get them by any means
        'Cache-Control': "no-cache",
    }
    login = session.post(url, data=payload, headers=headers)  # open the login session on the page
    print "Login session open"

    cookie = session.cookies.get_dict()  # this only returns the recursive '.ASPXAUTH' cookie, which I need to get again after every request or the session expires

    print login.text
    # The response has only one line with some site data confirming the login.
    # If the login fails it returns an HTML page with the error message.

    # Here I try to get the request cookies rather than the response ones,
    # but the headers don't return any cookies at all
    print '================'
    print login.request.headers
    print '================'
    print '\n\n\n'
    cookie2 = login.headers.get('Set-Cookie')
    print login.headers
    print cookie2
    print login.cookies.get_dict()  # only the '.ASPXAUTH' cookie is returned, the one I already know how to get

    # Makes the GET request to the index page
    url = "www.ASITE/index/home"
    headers = {
        'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0",  # again I need to pass the User-Agent
        'Accept-Language': "pt-BR,pt;q=0.8,en-US;q=0.5,en;q=0.3",
        'Cookie': "ASP.NET_SessionId=<a cookie>; __RequestVerificationToken_L0NvbnN1bHRhV2Vi0=<other long cookie>; .ASPXAUTH="+str(cookie['.ASPXAUTH']),  # here I need to start passing the recursive cookie again with every request on the site
        'Upgrade-Insecure-Requests': "1",
        'Cache-Control': "no-cache",
    }
    moving = session.get(url, headers=headers)
    cookie = session.cookies.get_dict()  # get the '.ASPXAUTH' again
The problem here is that if I manually set the missing cookies, the code will work for a couple of days; but when they expire, or if another machine runs the code, I have to set them again by hand.
I have tried several things to get those two other cookies before making the requests; none actually worked, and for some reason the requests library is not handling them automatically as it should... I honestly don't know what to do anymore.
The code started working. The good news is the code is now getting the cookies the right way; the bad news is I have absolutely no idea how that happened.
The only thing I added was this piece of code (the catch is that I added it yesterday and it didn't work then... now it does):
headers = {
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0",
    'Content-Type': "application/x-www-form-urlencoded",
    'X-Requested-With': "XMLHttpRequest",
    'Cache-Control': "no-cache",
}
url = "www.asite.com/login"  # login page
login = session.get(url, headers=headers)  # login GET

print 'login.request.headers ================'
print login.request.headers
print '================'
print '\n\n\n'

cookie2 = login.headers.get('Set-Cookie')
print 'login.headers ============================='
print login.headers
headers = login.headers
print '\n\n\n'

print "login.headers.get('Set-Cookie') ================================"
print cookie2
print '\n\n\n'

print "login.cookies.get_dict() ========================="
test = login.cookies.get_dict()
print test
print '\n\n\n'
Yesterday login.cookies.get_dict() returned an empty dict or None, or, if placed after the login, returned only the recursive cookie... now it is working.
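My best guess at why it now works: this is consistent with requests' normal cookie handling. The priming GET lets the server's Set-Cookie headers land in the session's cookie jar, and every later request on the same Session re-sends them. A minimal sketch of that idea, using placeholder URLs like the ones above:

session = requests.session()

# Priming GET: ASP.NET_SessionId and the __RequestVerificationToken cookie
# are stored in session.cookies automatically.
session.get("https://www.asite.com/login", headers=headers)

# This POST already carries those cookies; no hand-written 'Cookie' header.
login = session.post("https://www.asite.com/aunthenticate/APAGELogin",
                     data=payload, headers=headers)

# After a successful login, .ASPXAUTH joins the jar and is re-sent as well.
print session.cookies.get_dict()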
I'm trying to log in to a webpage in Python 3 using requests and lxml. However, after sending a POST request to the login page, I can't access pages that are only available after login. What am I missing?
import requests
from lxml import html

session_requests = requests.session()

login_URL = 'https://www.voetbal.nl/inloggen'
r = session_requests.get(login_URL)

tree = html.fromstring(r.text)
form_build_id = list(set(tree.xpath("//input[@name='form_build_id']/@value")))[0]

payload = {
    'email': 'mom.soccer@mail.com',
    'password': 'testaccount',
    'form_build_id': form_build_id
}

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Content-Type': 'multipart/form-data; boundary=----WebKitFormBoundarymGk1EraI6yqTHktz',
    'Host': 'www.voetbal.nl',
    'Origin': 'https://www.voetbal.nl',
    'Referer': 'https://www.voetbal.nl/inloggen',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}

result = session_requests.post(
    login_URL,
    data=payload,
    headers=headers
)

pvc_url = 'https://www.voetbal.nl/club/BBCB10Z/overzicht'
result_pvc = session_requests.get(
    pvc_url,
    headers=headers
)
print(result_pvc.text)
The account in this sample is activated, but it is just a test-account which I created to put my question up here. Feel free to try it out.
Answer: there were multiple problems:
1) Payload: 'form_id': 'voetbal_login_login_form' was missing. Thanks @t.m.adam.
2) Cookies: the request cookies were missing. They seem to be static, so I added them manually, which worked. Thanks @match and @Patrick Doyle.
3) Headers: removed the 'Content-Type' line, which contained a dynamic part.
Login works like a charm now!
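For future readers, a minimal sketch of the corrected POST, using the same variable names as above (the manually added cookies are omitted here; on a fresh session they would normally arrive with the first GET):

payload = {
    'email': 'mom.soccer@mail.com',
    'password': 'testaccount',
    'form_build_id': form_build_id,
    'form_id': 'voetbal_login_login_form',  # the field that was missing
}
# No hand-written Content-Type: requests encodes `data=` as
# application/x-www-form-urlencoded and sets the header itself.
result = session_requests.post(login_URL, data=payload)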
I know there are tons of threads and videos on how to do this; I've gone through them all and am in need of a little advanced guidance.
I am trying to log into this webpage where I have an account so I can send a request to download a report.
First I send a GET request to the login page, then send the POST request, but when I print(resp.content) I get back the code for the login page. I do get a 200 status code, but I can't get to the index page; no matter what page I try to GET after the POST, it keeps redirecting me back to the login page.
Here are a couple of things I'm not sure I did correctly:
For the header I just put everything that was listed when I inspected the page.
Not sure if I need to do something with the cookies?
Below is my code:
import requests
import urllib.parse

url = 'https://myurl.com/login.php'
next_url = 'https://myurl.com/index.php'

username = 'myuser'
password = 'mypw'

headers = {
    'Host': 'url.myurl.com',
    'Connection': 'keep-alive',
    'Content-Length': '127',
    'Cache-Control': 'max-age=0',
    'Origin': 'https://url.myurl.com',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Referer': 'https://url.myurl.com/login.php?redirect=1',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.8',
    'Cookie': 'PHPSESSID=3rgtou3h0tpjfts77kuho4nnm3'
}

login_payload = {
    'XXX_login_name': username,
    'XXX_login_password': password,
}
login_payload = urllib.parse.urlencode(login_payload)

r = requests.Session()
r.get(url, headers=headers)
r.post(url, headers=headers, data=login_payload)
resp = r.get(next_url, headers=headers)
print(resp.content)
You don't need to send separate requests for authorization and file download; you can send a single POST that specifies the credentials. Also, in most cases you don't need to send headers. In general your code should look like the following:
import requests
from requests.auth import HTTPBasicAuth

url_to_download = "http://some_site/download?id=100500"
response = requests.post(url_to_download, auth=HTTPBasicAuth('your_login', 'your_password'))

# response.content is bytes, so open the file in binary mode
with open('C:\\path\\to\\save\\file', 'wb') as my_file:
    my_file.write(response.content)
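Note that HTTPBasicAuth only helps when the server really uses HTTP Basic authentication. A login.php form like the one in the question usually expects the credentials as POSTed form fields instead, which is what the next answer does.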
There are a few more fields in the form data to post:
import requests

data = {"redirect": "1",
        "XXX_login_name": "your_username",
        "XXX_login_password": "your_password",
        "XXX_actionSUBMITLOGIN": "Login",
        "XXX_login_php": "1"}

with requests.Session() as s:
    s.headers.update({"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36"})
    r1 = s.get("https://eym.sicomasp.com/login.php")
    s.headers["cookie"] = r1.headers["Set-Cookie"]
    pst = s.post("https://eym.sicomasp.com/login.php", data=data)
    print(pst.history)
print(pst.history)
You may get redirected to index.php automatically after the POST; you can check pst.history and pst.content to see exactly what is happening.
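For instance, something like this prints each redirect hop after the POST (using the pst variable from the snippet above):

for resp in pst.history:                 # one entry per redirect hop
    print(resp.status_code, resp.headers.get("Location"))
print(pst.url)  # the final URL; landing on index.php would suggest success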
So I figured out what my problem was, just in case anyone in the future has the same issue. I am sure different websites have different requirements, but in this case the Cookie header I was sending in the request was blocking it. What I did was grab my cookie from the headers AFTER I logged in, update my headers, and then send the request. This is what ended up working (also, the form data needs to be URL-encoded):
import requests
import urllib.parse

headers = {
    'Host': 'eym.sicomasp.com',
    'Content-Length': '62',
    'Origin': 'https://eym.sicomasp.com',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
    'Referer': 'https://eym.sicomasp.com/login.php?redirect=1',
    'Cookie': 'PHPSESSID=vdn4er761ash4sb765ud7jakl0; SICOMUSER=31+147234553'
}  # additional cookie information after logging in ^^^^

data = {
    'XXX_login_name': 'myuser',
    'XXX_login_password': 'mypw',
}
data = urllib.parse.urlencode(data)

with requests.Session() as s:
    s.headers.update(headers)
    resp = s.post('https://eym.sicomasp.com/index.php', data=data)
    print(resp.content)
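In principle the same thing should work without pasting the post-login cookie by hand: a GET on the login page first lets the Session pick up PHPSESSID on its own, and any cookie the server sets during the POST (such as SICOMUSER here) is captured too. I haven't verified that this site accepts a login without the extra cookie being present up front, so treat this as a sketch:

creds = {'XXX_login_name': 'myuser', 'XXX_login_password': 'mypw'}

with requests.Session() as s:
    s.headers.update({'User-Agent': headers['User-Agent']})
    s.get('https://eym.sicomasp.com/login.php')     # PHPSESSID enters s.cookies
    s.post('https://eym.sicomasp.com/login.php', data=creds)  # dict is URL-encoded by requests
    print(s.cookies.get_dict())                     # SICOMUSER should appear after login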
I am trying to log in to http://site24.way2sms.com/content/index.html.
This is the script I've written:
import urllib
import urllib2

url = 'http://site21.way2sms.com/content/index.html'

values = {'username': 'myusername',
          'password': 'mypassword'}

headers = {'Accept': '*/*',
           'Accept-Encoding': 'gzip, deflate, sdch',
           'Accept-Language': 'en-US,en;q=0.8',
           'Cache-Control': 'max-age=0',
           'Connection': 'keep-alive',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'If-Modified-Since': 'Fri, 13 Nov 2015 17:47:23 GMT',
           'Referer': 'https://packetforger.wordpress.com/2013/09/13/changing-user-agent-in-python-requests-and-requesocks-and-using-it-in-an-exploit/',
           'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'}

data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers=headers)
response = urllib2.urlopen(req)
the_page = response.read()
print the_page
I am getting the response from the website, but it's kind of encrypted or something, like:
��:�����G��ʯ#��C���G�X�*�6�?���ך��5�\���:�tF�D1�٫W��<�bnV+w\���q�����$�Q��͇���Aq`��m�*��Օ���)���)�
in my Ubuntu terminal. How can I fix this?
Am I being logged in correctly?
Please help.
The form on that page doesn't post back to the same URL; it posts to http://site21.way2sms.com/content/Login.action.
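Building on that, here is a minimal sketch using requests, keeping the username/password field names from the question (the site may expect different ones, so check the form). It also fixes the "encrypted" output: the script advertised 'Accept-Encoding: gzip, deflate, sdch' and urllib2 returns the compressed body as-is, whereas requests decompresses gzip/deflate automatically:

import requests

url = 'http://site21.way2sms.com/content/Login.action'  # the form's real action
values = {'username': 'myusername', 'password': 'mypassword'}  # field names assumed from the question

s = requests.Session()
s.headers['User-Agent'] = ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                           '(KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36')
r = s.post(url, data=values)
print r.text  # decompressed, readable HTML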