I am trying to log in to http://site24.way2sms.com/content/index.html. This is the script I've written:
import urllib
import urllib2
url = 'http://site21.way2sms.com/content/index.html'
values = {'username': 'myusername',
          'password': 'mypassword'}
headers = {'Accept': '*/*',
           'Accept-Encoding': 'gzip, deflate, sdch',
           'Accept-Language': 'en-US,en;q=0.8',
           'Cache-Control': 'max-age=0',
           'Connection': 'keep-alive',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'If-Modified-Since': 'Fri, 13 Nov 2015 17:47:23 GMT',
           'Referer': 'https://packetforger.wordpress.com/2013/09/13/changing-user-agent-in-python-requests-and-requesocks-and-using-it-in-an-exploit/',
           'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers=headers)
response = urllib2.urlopen(req)
the_page = response.read()
print the_page
I am getting a response from the website, but it looks encrypted or something, like:
��:�����G��ʯ#��C���G�X�*�6�?���ך��5�\���:�tF�D1�٫W��<�bnV+w\���q�����$�Q��͇���Aq`��m�*��Օ���)���)�
in my Ubuntu terminal. How can I fix this? Am I being logged in correctly? Please help.
The form on that page doesn't post back to the same URL; it posts to http://site21.way2sms.com/content/Login.action.
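Separately, that "encrypted" output is almost certainly just gzip-compressed HTML: the request advertises 'Accept-Encoding': 'gzip, deflate, sdch', and urllib2 does not decompress responses for you. Either remove that header or decompress the body yourself. A minimal Python 2 sketch, reusing response and the_page from the script above:
import gzip
import StringIO

# If the server honoured our Accept-Encoding and gzipped the body,
# decompress it before printing.
if response.info().get('Content-Encoding') == 'gzip':
    buf = StringIO.StringIO(the_page)
    the_page = gzip.GzipFile(fileobj=buf).read()
print the_page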
I'm trying to make a GET request to an API using HTTPBasicAuth.
I've tested the following in Postman, and received the correct response
URL:"https://someapi.data.io"
username:"username"
password:"password"
This returns the data I expect, and all is well.
When I try the same thing in Python, however, I get a 403 error back, alongside:
"error_type":"ACCESS DENIED","message":"Please confirm api-key, api-secret, and permission is correct."
Below is my code:
import requests
from requests.auth import HTTPBasicAuth
URL = 'https://someapi.data.io'
authBasic=HTTPBasicAuth(username='username', password='password')
r = requests.get(URL, auth = authBasic)
print(r)
I honestly can't tell why this isn't working, since the same username and password pass in Postman using HTTPBasicAuth.
You have not sent all the required parameters; Postman fills some of them in automatically for you. To reproduce the request in Python requests, specify all the required headers explicitly (Postman's code-snippet feature can show you exactly what it sends):
import requests

# Placeholder host and header values; copy the exact ones Postman sends.
headers = {
    'Host': 'sub.example.com',
    'User-Agent': 'Chrome v22.2 Linux Ubuntu',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'X-Requested-With': 'XMLHttpRequest'
}

url = 'https://sub.example.com'
response = requests.get(url, headers=headers)
It could be due to the fact that the User-Agent is not defined. Try the following:
import requests
from requests.auth import HTTPBasicAuth
URL = 'https://someapi.data.io'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'}
authBasic=HTTPBasicAuth(username='username', password='password')
r = requests.get(URL, auth = authBasic, headers=headers)
print(r)
I have implemented code to download bhav-copies for all the dates in the stock market. After scraping about two years of data, it seems like my IP got blocked.
This code doesn't work for me.
import urllib.request
url = 'https://www1.nseindia.com/content/historical/DERIVATIVES/2014/APR/fo01APR2014bhav.csv.zip'
response = urllib.request.urlopen(url)
It gives the following error:
urllib.error.HTTPError: HTTP Error 403: Forbidden
I would like to know how I can use some proxy to get the data. Any help would be really appreciated.
import urllib.request
proxy_host = '1.2.3.4:8080' # host and port of your proxy
url = 'https://www1.nseindia.com/content/historical/DERIVATIVES/2014/APR/fo01APR2014bhav.csv.zip'
req = urllib.request.Request(url)
req.set_proxy(proxy_host, 'http')
response = urllib.request.urlopen(req)
For more flexibility, you can use a ProxyHandler (see https://docs.python.org/3/library/urllib.request.html):
import urllib.request

proxy_handler = urllib.request.ProxyHandler({'http': '1.2.3.4:3128/'})
proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
urllib.request.install_opener(opener)  # now plain urlopen() goes through the proxy
This works fine:
import requests
headers = {
    'authority': 'www.nseindia.com',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
    'accept': '*/*',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.nseindia.com/content/',
    'accept-language': 'en-US,en;q=0.9,lb;q=0.8',
}
url = "https://www1.nseindia.com/content/historical/DERIVATIVES/2014/APR/fo01APR2014bhav.csv.zip"
r = requests.get(url, headers=headers)
with open("data.zip", "wb") as f:
    f.write(r.content)
If you have proxies:
proxy = {
    "http": "x.x.x.x:pppp",
    "https": "x.x.x.x:pppp",
}
r = requests.get(url, headers=headers, proxies=proxy)
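If the proxy itself requires authentication, requests also accepts credentials embedded in the proxy URL; a sketch with placeholder user and pass:
proxy = {
    "http": "http://user:pass@x.x.x.x:pppp",
    "https": "http://user:pass@x.x.x.x:pppp",
}
r = requests.get(url, headers=headers, proxies=proxy)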
You don't need to use proxies to download this file. The code below will work like a charm:
import urllib.request
url = 'https://www1.nseindia.com/content/historical/DERIVATIVES/2014/APR/fo01APR2014bhav.csv.zip'
req = urllib.request.Request(url)
# Add referer header to bypass "HTTP Error 403: Forbidden"
req.add_header('Referer', 'https://www.nseindia.com')
res = urllib.request.urlopen(req)
# Save it into file.zip
with open("file.zip", "wb") as f:
f.write(res.read())
If you want free proxies, visit https://free-proxy-list.net/, then follow pyd's answer at https://stackoverflow.com/a/63328368/8009647.
I'm trying to download data from Fragrantica.com using urlopen, but an error occurs ("HTTP Error 403: Forbidden") even after changing the user-agent and adding headers.
I have tried the code from here as well with no success (http://wolfprojects.altervista.org/changeua.php#problem).
Here is my code:
import urllib.request
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15'
url = "https://www.fragrantica.com/perfume/Tom-Ford/Tobacco-Vanille-1825.html"
headers = {'User-Agent': user_agent}
request = urllib.request.Request(url, None, headers)  # the assembled request
response = urllib.request.urlopen(request)
data = response.read()  # the data you need
This is the error I encounter: HTTPError: HTTP Error 403: Forbidden
You might need to specify more headers; try this:
import urllib.request
url = "https://www.fragrantica.com/perfume/Tom-Ford/Tobacco-Vanille-1825.html"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'Accept-Encoding': 'none',
           'Accept-Language': 'en-US,en;q=0.8',
           'Connection': 'keep-alive'}
request = urllib.request.Request(url=url, headers=headers)  # the assembled request
response = urllib.request.urlopen(request)
data = response.read()  # the data you need
For some reason, Python requests does not re-POST after encountering a redirect header.
import requests
proxies = {'http': 'http://127.0.0.1:8888'}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36',
    'content-type': 'application/x-www-form-urlencoded'}
# url, data and timeout are defined elsewhere in my script
r = requests.post(url, data, headers=headers, timeout=timeout, proxies=proxies, allow_redirects=True)
html = r.text
So it means I can't log in to any form that is behind a redirect. How can I solve this issue? Thank you!
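This is requests' standard behaviour rather than a bug: on a 301, 302 or 303 response it follows the redirect with a GET and drops the POST body, just as browsers do; only 307/308 redirects preserve the method. If the target really needs the POST repeated at the new location, disable automatic redirects and re-POST manually. A minimal sketch under that assumption, reusing url, data and headers from above:
# Send the POST without following redirects, then repeat it by hand
# against the Location header if the server redirected us.
r = requests.post(url, data, headers=headers, allow_redirects=False)
if r.is_redirect:
    r = requests.post(r.headers['Location'], data, headers=headers)
html = r.text
If the server sends a relative Location, join it against the original url with urllib.parse.urljoin first.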
I know there are tons of threads and videos on how to do this; I've gone through them all and am in need of a little advanced guidance.
I am trying to log into this webpage where I have an account so I can send a request to download a report.
First I send a GET request to the login page, then the POST request, but when I print(resp.content) I get the code for the login page back. I do get a 200 status code, but I can't get to the index page. No matter what page I try to get after the POST, it keeps redirecting me back to the login page.
Here are a couple of things I'm not sure I did correctly:
For the headers I just put everything that was listed when I inspected the page.
I'm not sure if I need to do something with the cookies?
Below is my code:
import requests
import urllib.parse
url = 'https://myurl.com/login.php'
next_url = 'https://myurl.com/index.php'
username = 'myuser'
password = 'mypw'
headers = {
    'Host': 'url.myurl.com',
    'Connection': 'keep-alive',
    'Content-Length': '127',
    'Cache-Control': 'max-age=0',
    'Origin': 'https://url.myurl.com',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Referer': 'https://url.myurl.com/login.php?redirect=1',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.8',
    'Cookie': 'PHPSESSID=3rgtou3h0tpjfts77kuho4nnm3'
}
login_payload = {
    'XXX_login_name': username,
    'XXX_login_password': password,
}
login_payload = urllib.parse.urlencode(login_payload)
r = requests.Session()
r.get(url, headers = headers)
r.post(url, headers = headers, data = login_payload)
resp = r.get(next_url, headers = headers)
print(resp.content)
You don't need to send separate requests for authorization and file download; a single POST that specifies the credentials is enough. In most cases you don't need to send headers either. In general, your code should look like the following:
import requests
from requests.auth import HTTPBasicAuth

url_to_download = "http://some_site/download?id=100500"
response = requests.post(url_to_download, auth=HTTPBasicAuth('your_login', 'your_password'))
# response.content is bytes, so open the file in binary mode
with open('C:\\path\\to\\save\\file', 'wb') as my_file:
    my_file.write(response.content)
There are a few more fields in the form data to post:
import requests

data = {"redirect": "1",
        "XXX_login_name": "your_username",
        "XXX_login_password": "your_password",
        "XXX_actionSUBMITLOGIN": "Login",
        "XXX_login_php": "1"}

with requests.Session() as s:
    s.headers.update({"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36"})
    r1 = s.get("https://eym.sicomasp.com/login.php")
    s.headers["cookie"] = r1.headers["Set-Cookie"]
    pst = s.post("https://eym.sicomasp.com/login.php", data=data)
    print(pst.history)
You may get redirected to index.php automatically after the POST; you can check pst.history and pst.content to see exactly what is happening.
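As an aside, a requests.Session already stores and resends cookies from Set-Cookie responses on its own, so the manual s.headers["cookie"] = r1.headers["Set-Cookie"] line above is usually unnecessary; a sketch of the same flow relying on the session's cookie jar:
with requests.Session() as s:
    s.headers.update({"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36"})
    s.get("https://eym.sicomasp.com/login.php")  # PHPSESSID is captured in s.cookies automatically
    pst = s.post("https://eym.sicomasp.com/login.php", data=data)
    print(pst.history)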
So I figured out what my problem was, just in case anyone in the future has the same issue. I am sure different websites have different requirements, but in this case the Cookie header I was sending in the request was blocking it. What I did was grab my cookie from the headers AFTER I logged in, update my headers, and then send the request. This is what ended up working (note that the form data also needs to be URL-encoded):
import requests
import urllib.parse
headers = {
    'Host': 'eym.sicomasp.com',
    'Content-Length': '62',
    'Origin': 'https://eym.sicomasp.com',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
    'Referer': 'https://eym.sicomasp.com/login.php?redirect=1',
    'Cookie': 'PHPSESSID=vdn4er761ash4sb765ud7jakl0; SICOMUSER=31+147234553'
}  # additional cookie information after logging in ^^^^
data = {
    'XXX_login_name': 'myuser',
    'XXX_login_password': 'mypw',
}
data = urllib.parse.urlencode(data)

with requests.Session() as s:
    s.headers.update(headers)
    resp = s.post('https://eym.sicomasp.com/index.php', data=data)
    print(resp.content)