I'm trying to make a request in Python 3 to a URL that should return JSON. Instead, it returns a sequence of bytes that I'm unable to convert. Why am I receiving this type of response, and how can I convert it into human-readable data?
Below is a snippet of my code:
headers = {}
headers['Host']= 'XXXXX' # hidden
headers['Connection']= 'keep-alive'
headers['Content-Length']= '122'
headers['Accept']= 'application/json, text/javascript, */*; q=0.01'
headers['Origin']= 'XXXXX' # hidden
headers['X-Requested-With']= 'XMLHttpRequest'
headers['User-Agent']= 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
headers['Content-Type']= 'application/json'
headers['Referer']= 'XXXXX' # hidden
headers['Accept-Encoding']= 'gzip, deflate, br'
headers['Accept-Language']= 'pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7'
headers['Cookie'] = 'XXXXX' # hidden
try:
    req = request.Request(url, post_data, headers)
    x = request.urlopen(req)
    print(x.read())
    print(x.info())
except Exception as e:
    print(e)
Below is the response received:
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03L\x8fAK\x031\x10\x85\xff\xca0\x07Q\x88\x899(\xb2\xd0\x93\xf4\xe2\xa1-z]\x90\xecf\xb6\x1b\xd9d\xca$-H\xe9\x7f7\x91\x8a^\x86\x997\xef\x1b\xde\x9c\xf1D\x92\x03\'\xec\xd0j\x8b\nI\x84\x05\xbb\xf3_\x13)g\xb7\xa7\xea\x88n\x99X"yx}\xdfn \x17\ti\xaf Q(3\t8\x11\xf7\xa5\x80\x87O\x1aK\x95\x8fq QW\x1bp5\x14\x8e\xaaV\x18g\'n,\x95\xe1i\xcaT\xe0\x01n\x07\xaa\xb7\t\xfa\xdfD\xab\x9a\xe7&R\x99\xd9\xaf\xd6Z\xeb\x1e\xef\x1aj\x8eYL\xae<\x99\x03\xc9\xf2hN\x94<\xcbG\x1bL\x8b\xa5\x0f\x11\x96\x90\x08\xec\xd3\xb3\xee\x13^\x14&\x17[\xfc\xb6}\xdb\xbd\xac\x7f\x1eS\xff\xfe\xda9\xc9\x04t\xd5G\xf6M\xb4\x8d\x0c\x1e\xbb{{\xf9\x06\x00\x00\xff\xff\x03\x00\xc4\xd9gg\'\x01\x00\x00'
Date: Wed, 26 Dec 2018 16:46:48 GMT
Server: Apache
Strict-Transport-Security: max-age=16070400
X-UA-Compatible: IE=Edge,chrome=1, IE=Edge,chrome=1
Content-Type: application/json; charset=utf-8
Vary: Accept-Encoding
Content-Encoding: gzip
X-Frame-Options: SAMEORIGIN
Connection: close
Transfer-Encoding: chunked
The body is gzip-compressed, as the response header Content-Encoding: gzip shows.
Decompress it and then parse the result with json.loads.
Example:
import zlib
# x is the response object returned by request.urlopen(req) in the question
decompressed_data = zlib.decompress(x.read(), 16 + zlib.MAX_WBITS)
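The decompress-then-parse steps can be sketched end to end as follows. This is a minimal illustration, assuming the body is gzip-compressed UTF-8 JSON as the response headers indicate; the sample payload is made up and stands in for the bytes that x.read() returns in the question.

```python
import gzip
import json
import zlib

# Made-up stand-in for the bytes returned by x.read() in the question.
raw = gzip.compress(json.dumps({"status": "ok", "items": [1, 2, 3]}).encode("utf-8"))

# 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
decompressed = zlib.decompress(raw, 16 + zlib.MAX_WBITS)

# The response declared charset=utf-8, so decode before parsing.
data = json.loads(decompressed.decode("utf-8"))
print(data)
```

gzip.decompress(raw) from the standard library should give the same bytes as the zlib call here.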
Another option: tell the server not to send compressed content. Remove gzip (and probably the other compression types) from the Accept-Encoding request header.
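A minimal sketch of that second option, assuming the server honours the header (some servers ignore it and compress anyway). The URL and body below are hypothetical placeholders for the values hidden in the question.

```python
from urllib import request

# Hypothetical URL and body, standing in for the hidden values in the question.
url = "https://example.com/api"
post_data = b'{"key": "value"}'

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    # "identity" asks the server for an uncompressed body.
    "Accept-Encoding": "identity",
}

# Building the Request does not touch the network; urlopen(req) would send it.
req = request.Request(url, post_data, headers)

# urllib stores header names capitalized as "Accept-encoding".
print(req.get_header("Accept-encoding"))
```

Leaving Content-Length out of the dict lets urllib compute it from post_data instead of relying on a hard-coded value.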
Try something like this (requests decompresses gzip responses for you):
import requests
r = requests.post('your URL',data=YourData)
r.json()
Related
I am trying to scrape https://www.foodhall.co.id/grand-indonesia/catalog .
I found the API https://api.foodhall.co.id/v1/catalog/productbycategoryv2, which the page above loads its products from. I checked the response headers via Inspect Element, and they are as follows:
HTTP/1.1 200 OK
Date: Mon, 16 Jan 2023 03:07:59 GMT
Server: Apache/2.4.41 (Ubuntu)
Set-Cookie: advanced-api=cigighcd1tcmdoj0eic643mogl; path=/; HttpOnly
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate
Pragma: no-cache
Access-Control-Allow-Origin: *
Content-Length: 2685
Keep-Alive: timeout=5, max=94
Connection: Keep-Alive
Content-Type: application/json; charset=UTF-8
What should I look for in the response headers so that I don't get an invalid authorization error?
Currently my code is as follows:
import requests
payload={
'store':'49',
'category_id':'',
'search':'',
'filter':"",
'tag':"",
'lang':'ID',
'page':'0'
}
headers={
'Authorization': 'Bearer 17485f41ae19fbba0f4edf3241c9f033bb1af4e1c843789acfc9cf5136d443ea1673838475',
'Connection': 'keep-alive',
'Content-Length': '57',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Host': 'api.foodhall.co.id',
'Origin': 'https://www.foodhall.co.id',
'Referer': 'https://www.foodhall.co.id/',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'
}
response = requests.post('https://api.foodhall.co.id/v1/catalog/productbycategoryv2', json=payload, headers=headers)
The response JSON is {'success': 0, 'message': 'invalid Authorization'}. I thought the Set-Cookie in the response requires an authorization, so I guess my next step is to figure out how to get the authorization code.
Can someone help me?
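One thing worth checking, offered as a guess rather than a confirmed fix: the copied headers declare a form-encoded body with Content-Length 57, while json=payload makes requests send a JSON body instead. The sketch below uses only the standard library to form-encode the same payload and measure it.

```python
from urllib.parse import urlencode

# Same fields as the payload in the question.
payload = {
    'store': '49',
    'category_id': '',
    'search': '',
    'filter': '',
    'tag': '',
    'lang': 'ID',
    'page': '0',
}

# Encode the payload the way an application/x-www-form-urlencoded body is built.
form_body = urlencode(payload)
print(form_body)
print(len(form_body))
```

The form-encoded body is 57 bytes long, which matches the Content-Length copied from the browser. If the API expects that format, passing data=payload instead of json=payload and dropping the hard-coded Content-Length and Host headers may be worth trying. The Bearer token may also simply have expired, in which case a fresh one has to be obtained first.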
# selenium-request.py
from seleniumwire import webdriver # Import from seleniumwire
# Create a new instance of the Chrome driver
driver = webdriver.Chrome()
driver.get('https://www.cmegroup.com/content/cmegroup/en/tools-information/advisorySearch/jcr:content/full-par/cmeadvisorysearch.advisorySearch.advisorynotices:Advisory%20Notices.-.2.12|07|2021.01|01|2008.json')
for request in driver.requests:
    if request.response:
        print(request.response.headers)
When I run that code I get the headers Selenium uses:
$ python selenium-request.py
Accept-Ranges: bytes
Access-Control-Allow-Origin: http://star-website.com
Content-Type: application/json
ETag: W/"36b8a-5d3d28ed9cc43"
Last-Modified: Thu, 23 Dec 2021 16:16:16 GMT
Referrer-Policy: no-referrer-when-downgrade
Server: Apache
ServerID: e1
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Accept-Encoding
Content-Encoding: gzip
Cache-Control: max-age=86400
Date: Thu, 23 Dec 2021 16:16:16 GMT
Content-Length: 46236
Connection: keep-alive
Content-Security-Policy: frame-ancestors 'self' *.cmegroup.com *.quikstrike.net commodex.co.il openexchange.community.cmegroup.com staging.tickertocker.com http://www.straitsfinancial.com www.straitsfinancial.com http://straitsfinancial.com https://www.home.saxo https://app.topsteptrader.com https://help.topsteptrader.com https://staging.topsteptrader.com https://blueeditsitecore.sys.dom https://bluesitecore.sys.dom https://sitecoredev.orange.saxobank.com https://sitecoredev-nocache.orange.saxobank.com https://sitecoredevedit.orange.tst2.dom http://star-website.com https://www.investing.com https://*.benzinga.com https://bz.zingbot.bz https://www.zingbot.bz https://gdcdyn.interactivebrokers.com https://www.interactivebrokers.com https://zingbot.bz https://www.zingbot.bz https://m.zingbot.bz https://bz.zingbot.bz https://dev.futuresfirstacademy.com https://uat.futuresfirstacademy.com https://futuresfirstacademy.com http://stage.barchart.com http://www.barchart.com https://www.infinityfutures.com https://kilofutures.com https://m.cqg.com https://mdemo.cqg.com *.chicago.cme.com:7822 https://uatm.cqg.com https://local.zingbot.bz https://www.gulfbondsukuk.org www.kgieworld.sg https://www.propex24.wpcomstaging.com https://www.propex24.com *.straitsfinancial.gate39tech.com us.straitsfinancial.com https://*.kapcoclients.com https://kapcoclients.com https://*.wallstreetbound.org https://wallstreetbound.org https://cofcointl.plateau.com https://rise.articulate.com https://members.tradeday.com http://blf-django.herokuapp.com https://www.bluelinefutures.com https://www.bluelinefutures.live https://www.bluelinefutures.trade https://login.chicago.cme.com https://loginnr.chicago.cme.com https://logincert.chicago.cme.com https://login-ny.chicago.cme.com https://ampfutures.com https://cme.ampfutures.com https://*.advantagefutures.com https://*.e-futures.com https://*.etrade.com https://*.gffbrokers.com https://infinityfutures-cn.com https://sweetfutures.com https://*.tradovate.com 
https://home.saxo https://*.tickmill.co.uk https://*.directa.it https://big.pt https://*.tradestation-international.com https://*.stonex.com http://tradinglesson.com https://tradinglesson.com *.ibroker.it *.ibroker.es *.cornertrader.ch *.whselfinvest.com *.banxbroker.de *.ameritrade.com *.sweetfutures.com *.danielstrading.com *.gainfutures.com *.futuresonline.com *.tdainc.com *.lsvp.com *.schwab.com *.schwab.co.uk *.us.global.schwab.com *.dev.schwab.com;
Set-Cookie: ak_bmsc=AB0A9701302106EABE2E195C6AC2A074~000000000000000000000000000000~YAAQLtERAvOZVN19AQAA7C8U6A7AWr7StAmiphZPltguFftPSOXgfa2NAq7Vts+40k7AdnPG55ULK1vyBRhPRdqWbtYml3JTC3RjHLu31l8kWBFvysYyuY2uz4GpkvmOWoBSN/Dl/2bQ9bEgbiYj3tCZ1o+wEvMfsiAWiJeMY3M1ozu6nyQz0JVpdvfsqun3z5wGhpJWhkjrJjeIyHvVdzx2uyIb1azRFlHT+nRCR6NHGoaMM/G2sI1DqPOXPB5btXjdncvB739c2Beh7RgWD/zvb78qpAJDUR1KOenDy1EwN2Bg8pqH1sxlsoVrl7i7r/pAOaWKfd4U1FKP7p730GfOp/m2VRBIdYgHDPHPvGeITPKrR/G22aR886r9Lerhug==; Domain=.cmegroup.com; Path=/; Expires=Thu, 23 Dec 2021 18:16:01 GMT; Max-Age=7185; HttpOnly
I copy these exact headers into a python dict and request as follows:
# python-request.py
import requests
headers = {
"Accept-Ranges": "bytes",
"Access-Control-Allow-Origin": "http://star-website.com",
"Content-Type": "application/json",
"ETag": 'W/"36b8a-5d3d28ed9cc43"',
"Last-Modified": "Thu, 23 Dec 2021 16:16:16 GMT",
"Referrer-Policy": "no-referrer-when-downgrade",
"Server": "Apache",
"ServerID": "e1",
"Strict-Transport-Security": "max-age=31536000; includeSubDomains",
"Vary": "Accept-Encoding",
"Content-Encoding": "gzip",
"Cache-Control": "max-age=86400",
"Date": "Thu, 23 Dec 2021 16:16:16 GMT",
"Content-Length": "46236",
"Connection": "keep-alive",
"Content-Security-Policy": "frame-ancestors 'self' *.cmegroup.com *.quikstrike.net commodex.co.il openexchange.community.cmegroup.com staging.tickertocker.com http://www.straitsfinancial.com www.straitsfinancial.com http://straitsfinancial.com https://www.home.saxo https://app.topsteptrader.com https://help.topsteptrader.com https://staging.topsteptrader.com https://blueeditsitecore.sys.dom https://bluesitecore.sys.dom https://sitecoredev.orange.saxobank.com https://sitecoredev-nocache.orange.saxobank.com https://sitecoredevedit.orange.tst2.dom http://star-website.com https://www.investing.com https://*.benzinga.com https://bz.zingbot.bz https://www.zingbot.bz https://gdcdyn.interactivebrokers.com https://www.interactivebrokers.com https://zingbot.bz https://www.zingbot.bz https://m.zingbot.bz https://bz.zingbot.bz https://dev.futuresfirstacademy.com https://uat.futuresfirstacademy.com https://futuresfirstacademy.com http://stage.barchart.com http://www.barchart.com https://www.infinityfutures.com https://kilofutures.com https://m.cqg.com https://mdemo.cqg.com *.chicago.cme.com:7822 https://uatm.cqg.com https://local.zingbot.bz https://www.gulfbondsukuk.org www.kgieworld.sg https://www.propex24.wpcomstaging.com https://www.propex24.com *.straitsfinancial.gate39tech.com us.straitsfinancial.com https://*.kapcoclients.com https://kapcoclients.com https://*.wallstreetbound.org https://wallstreetbound.org https://cofcointl.plateau.com https://rise.articulate.com https://members.tradeday.com http://blf-django.herokuapp.com https://www.bluelinefutures.com https://www.bluelinefutures.live https://www.bluelinefutures.trade https://login.chicago.cme.com https://loginnr.chicago.cme.com https://logincert.chicago.cme.com https://login-ny.chicago.cme.com https://ampfutures.com https://cme.ampfutures.com https://*.advantagefutures.com https://*.e-futures.com https://*.etrade.com https://*.gffbrokers.com https://infinityfutures-cn.com https://sweetfutures.com 
https://*.tradovate.com https://home.saxo https://*.tickmill.co.uk https://*.directa.it https://big.pt https://*.tradestation-international.com https://*.stonex.com http://tradinglesson.com https://tradinglesson.com *.ibroker.it *.ibroker.es *.cornertrader.ch *.whselfinvest.com *.banxbroker.de *.ameritrade.com *.sweetfutures.com *.danielstrading.com *.gainfutures.com *.futuresonline.com *.tdainc.com *.lsvp.com *.schwab.com *.schwab.co.uk *.us.global.schwab.com *.dev.schwab.com;",
"Set-Cookie": "ak_bmsc=AB0A9701302106EABE2E195C6AC2A074~000000000000000000000000000000~YAAQLtERAvOZVN19AQAA7C8U6A7AWr7StAmiphZPltguFftPSOXgfa2NAq7Vts+40k7AdnPG55ULK1vyBRhPRdqWbtYml3JTC3RjHLu31l8kWBFvysYyuY2uz4GpkvmOWoBSN/Dl/2bQ9bEgbiYj3tCZ1o+wEvMfsiAWiJeMY3M1ozu6nyQz0JVpdvfsqun3z5wGhpJWhkjrJjeIyHvVdzx2uyIb1azRFlHT+nRCR6NHGoaMM/G2sI1DqPOXPB5btXjdncvB739c2Beh7RgWD/zvb78qpAJDUR1KOenDy1EwN2Bg8pqH1sxlsoVrl7i7r/pAOaWKfd4U1FKP7p730GfOp/m2VRBIdYgHDPHPvGeITPKrR/G22aR886r9Lerhug==; Domain=.cmegroup.com; Path=/; Expires=Thu, 23 Dec 2021 18:16:01 GMT; Max-Age=7185; HttpOnly"
}
requests.get(
"https://www.cmegroup.com/content/cmegroup/en/tools-information/advisorySearch/jcr:content/full-par/cmeadvisorysearch.advisorySearch.advisorynotices:Advisory%20Notices.-.2.12|07|2021.01|01|2008.json",
headers=headers)
When I run this it just hangs indefinitely, so there is some issue with the request.
Apart from the headers, what is the difference between the requests made by python and Selenium - how could I identify the issue and hopefully get this working with the python requests library?
Update
I updated the code to get the request.headers instead:
Host: www.cmegroup.com
Connection: keep-alive
sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Linux"
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
... but the python requests script has the same result when using these headers, just hanging (or timing out if I set a timeout parameter).
Further update
Debug output is as follows:
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.cmegroup.com:443
send: b'GET /content/cmegroup/en/tools-information/advisorySearch/jcr:content/full-par/cmeadvisorysearch.advisorySearch.advisorynotices:Advisory%20Notices.-.2.12%7C07%7C2021.01%7C01%7C2008.json HTTP/1.1\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36\r\nAccept-Encoding: gzip, deflate, br\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\nConnection: keep-alive\r\nHost: www.cmegroup.com\r\nsec-ch-ua: " Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"\r\nsec-ch-ua-mobile: ?0\r\nsec-ch-ua-platform: Linux\r\nUpgrade-Insecure-Requests: 1\r\nSec-Fetch-Site: none\r\nSec-Fetch-Mode: navigate\r\nSec-Fetch-User: ?1\r\nSec-Fetch-Dest: document\r\nAccept-Language: en-US,en;q=0.9\r\n\r\n'
It looks like it only needs a compatible User-Agent header.
import requests
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0',
}
url = 'https://www.cmegroup.com/content/cmegroup/en/tools-information/advisorySearch/jcr:content/full-par/cmeadvisorysearch.advisorySearch.advisorynotices:Advisory%20Notices.-.2.12|07|2021.01|01|2008.json'
response = requests.get(url, headers=headers, timeout=30)
print(response.status_code) # Prints 200 (OK).
print(response.json()) # Prints the output as JSON. "item" key has 50 values in a list.
^ This snippet did the trick for me.
It looks like you are using the response headers, not the request headers.
Try
print(request.headers)
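If a captured header dict mixes both kinds, a small filter can drop the names that only make sense in a response before the dict is reused for a request. This is a rough sketch: the helper drop_response_headers and the RESPONSE_ONLY set are hypothetical names, and the set is illustrative rather than exhaustive.

```python
# Header names that servers send in responses; illustrative, not exhaustive.
RESPONSE_ONLY = {
    "accept-ranges", "access-control-allow-origin", "content-security-policy",
    "etag", "last-modified", "referrer-policy", "server", "set-cookie",
    "strict-transport-security", "vary",
}


def drop_response_headers(captured):
    """Return a copy of `captured` without response-only header names."""
    return {k: v for k, v in captured.items() if k.lower() not in RESPONSE_ONLY}


# A small mixed sample, using names that appear in the question.
captured = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept": "application/json",
    "ETag": 'W/"36b8a-5d3d28ed9cc43"',
    "Server": "Apache",
    "Set-Cookie": "ak_bmsc=...",
}

request_headers = drop_response_headers(captured)
print(request_headers)
```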
I'm trying to download icecast json status data from a server using python.
This is my code (after different attempts).
import urllib2  # Python 2

def checkStream(url):
    request = urllib2.Request(url)
    request.add_header("Connection", "keep-alive")
    request.add_header("Cache-Control", "max-age=0")
    request.add_header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
    request.add_header("Upgrade-Insecure-Requests", "1")
    request.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36")
    request.add_header("Accept-Encoding", "gzip, deflate, sdch")
    response = urllib2.urlopen(request)
    line = response.read()
    print line
    return

checkStream("http://108.168.175.149:10128/status-json.xsl")
The problem is that my response is printed like this:
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate
Pragma: no-cache
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: Origin, Accept, X-Requested-With, Content-Type
Access-Control-Allow-Methods: GET, OPTIONS, HEAD
{"icestats":{"admin":"icemaster#localhost","banned_IPs":0,"build":20141112090605,"host":"pro02.caster.fm","location":"Earth","outgoing_kbitrate":3799,"server_id":"Icecast 2.3.3-kh11","server_start":"05/Oct/2015:10:43:46 -0500","stream_kbytes_read":104422400,"stream_kbytes_sent":5123403693,"source":[{"audio_codecid":2,"audio_info":"ice-samplerate=44100;ice-bitrate=96;ice-channels=2","bitrate":96,"connected":33748,"genre":"Various","ice-bitrate":96,"ice-channels":2,"ice-samplerate":44100,"incoming_bitrate":95920,"listener_peak":153,"listeners":42,"listenurl":"http://pro02.caster.fm:10128/live","mpeg_channels":2,"mpeg_samplerate":44100,"outgoing_kbitrate":3883,"queue_size":358609,"se
The end of the JSON response is cut short by 272 bytes, which is exactly the size of the response headers that are returned inside the data.
If I open the link in Chrome, the response appears fine.
I also tested with the requests library, with no luck.
>>> import requests
>>> r = requests.get("http://108.168.175.149:10128/status-json.xsl")
>>> r.text
u'Expires: Thu, 19 Nov 1981 08:52:00 GMT\r\nCache-Control: no-store, no-cache, must-revalidate\r\nPragma: no-cache\r\nAccess-Control-Allow-Origin: *\r\nAccess-Control-Allow-Headers: Origin, Accept, X-Requested-With, Content-Type\r\nAccess-Control-Allow-Methods: GET, OPTIONS, HEAD\r\n\r\n{"icestats":{"admin":"icemaster#localhost","banned_IPs":0,"build":20141112090605,"host":"pro02.caster.fm","location":"Earth","outgoing_kbitrate":3844,"server_id":"Icecast 2.3.3-kh11","server_start":"05/Oct/2015:10:43:46 -0500","stream_kbytes_read":104438630,"stream_kbytes_sent":5124109510,"source":[{"audio_codecid":2,"audio_info":"ice-samplerate=44100;ice-bitrate=96;ice-channels=2","bitrate":96,"connected":35133,"genre":"Various","ice-bitrate":96,"ice-channels":2,"ice-samplerate":44100,"incoming_bitrate":95920,"listener_peak":153,"listeners":43,"listenurl":"http://pro02.caster.fm:10128/live","mpeg_channels":2,"mpeg_samplerate":44100,"outgoing_kbitrate":3837,"queue_size":164258,"se'
>>>
How can I retrieve the complete data?
The server you are requesting this from is running an ancient version of an Icecast fork.
This bug was fixed in mainline, and the fix was released long ago. I'd recommend upgrading the server (or asking the operator to upgrade it) to the latest official Icecast version from http://icecast.org
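Until the server is upgraded, the client can at least detect the symptom. The sketch below is only a diagnostic, assuming the leaked headers are separated from the JSON by a blank line as in the output shown in the question. The helper split_leaked_headers is a hypothetical name, and the missing bytes cannot be recovered on the client side.

```python
import json


def split_leaked_headers(text):
    """Split header lines that leaked into the body from the JSON that follows."""
    head, sep, body = text.partition("\r\n\r\n")
    if sep and not head.lstrip().startswith(("{", "[")):
        return head, body
    return "", text


# Shortened sample in the shape of the broken response from the question.
sample = (
    "Pragma: no-cache\r\nAccess-Control-Allow-Origin: *\r\n\r\n"
    '{"icestats":{"host":"pro02.caster.fm","listen'
)

leaked, body = split_leaked_headers(sample)

try:
    json.loads(body)
    complete = True
except ValueError:
    # The body was truncated by the server, so it is not valid JSON.
    complete = False

print(repr(leaked))
print(complete)
```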
I am trying to download a file using the Python requests module by logging in to the site first. I am able to log in, but when I send a GET request to download the file, it shows me the login page again.
Code:
login_url = 'https://seller.flipkart.com/login'
manifest_url = 'https://seller.flipkart.com/order_management/manifest.pdf'
username = 'username#gmail.com'
password = 'password'
params = {'sellerId':'seller_id'}
payload = {'authName':'flipkart',
'username':username,
'password':password}
ses = requests.Session()
ses.post(login_url, data=payload, headers={'Content-Type':'application/x-www-form-urlencoded','Connection':'keep-alive'})
response = ses.get(manifest_url, params=params, headers={'Content-Type':'application/pdf','Connection':'keep-alive'})
print response.status_code
print response.url
print response.content
Running this code returns the HTML of the login page as the content.
I used Fiddler and got the data below:
Request URL: https://seller.flipkart.com/order_management/manifest.pdf?sellerId=seller_id
Request Method: GET
sellerId: seller_id
# Request Headers
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36
Referer: https://seller.flipkart.com/order_management?sellerId=seller_id
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
# Response Headers
Server: nginx
Date: Wed, 30 Dec 2015 13:12:31 GMT
Content-Type: application/pdf
Content-Length: 3652
Connection: keep-alive
X-XSS-Protection: 1; mode=block
strict-transport-security: max-age=31536000; preload
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
Cache-Control: private, no-cache, no-store, must-revalidate
Expires: -1
Pragma: no-cache
X-Req-Id: REQ-14d7434a-e429-40e4-801f-6010d7c0b48c
X-Host-Id: 0008
content-disposition: attachment; filename=Manifest-seller_id-30-Dec-2015-18-42-30.pdf
vary: Accept-Encoding
How can I download the file?
Set stream=True and then write the contents to a file.
import re
# Send the request with stream=True
r = ses.get(manifest_url, ..., stream=True)
# Fetch the filename; re.findall returns a list, so take the first match
d = r.headers['content-disposition']
fname = re.findall("filename=(.+)", d)[0]
# Write the content to a file in chunks
with open(fname, 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:  # filter out keep-alive chunks
            f.write(chunk)
Docs.
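The filename extraction and the chunked write can be tried without a network connection. This is a sketch under assumptions: filename_from_disposition and write_chunks are hypothetical helpers, the header value is the one captured in the question, and the byte chunks stand in for what r.iter_content() would yield.

```python
import io
import re


def filename_from_disposition(value, default="download.bin"):
    """Pull the filename out of a Content-Disposition value (quoted or unquoted)."""
    match = re.search(r'filename="?([^";]+)"?', value)
    return match.group(1).strip() if match else default


def write_chunks(chunks, fileobj):
    """Write every non-empty chunk to fileobj and return the byte count."""
    written = 0
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            written += fileobj.write(chunk)
    return written


disposition = "attachment; filename=Manifest-seller_id-30-Dec-2015-18-42-30.pdf"
fname = filename_from_disposition(disposition)
print(fname)

# Stand-in for r.iter_content(chunk_size=1024); BytesIO replaces the real file.
buf = io.BytesIO()
total = write_chunks([b"%PDF-1.4 ", b"", b"sample"], buf)
print(total)
```

The regular expression does not handle the extended filename*= form of the header, so a default name is returned in that case.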
My objective is to access my home banking account to keep track of my expenses. The problem is that I always get a 302 status code, maybe because I sent the wrong user and key. Is the following a correct way to send the headers with a POST over SSL?
import httplib
host = 'www.bancoprovincia.bancainternet.com.ar'
conn = httplib.HTTPSConnection(host)
conn.set_debuglevel(1)
conn.putrequest("POST", "/eBanking/login")
header = {'accept-Language': 'es-ES,es;q=0.8,en;q=0.6',
          'accept-Encoding': 'gzip,deflate,sdch',
          'content-Type': 'application/x-www-form-urlencoded',
          'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
          'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36'}
for k, v in header.iteritems():
    conn.putheader(k, v)
user = '##'
passwd = '##'
conn.endheaders()
conn.send('usuario:'+user+'clave:'+passwd)
res = conn.getresponse()
print res.status
print res.getheaders()
Here is the response from the server:
send: 'POST /eBanking/login HTTP/1.1\r\nHost: www.bancoprovincia.bancainternet.com.ar\r\nAccept-Encoding: identity\r\ncontent-Type: application/x-www-form-urlencoded\r\naccept-Language: es-ES,es;q=0.8,en;q=0.6\r\naccept-Encoding: gzip,deflate,sdch\r\naccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r\nuser-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36\r\n\r\n'
send: 'usuario:##:clave##'
reply: 'HTTP/1.1 302 Movido tempor\xe1lmente\r\n'
header: Date: Wed, 12 Feb 2014 19:40:18 GMT
header: Server: Apache/2.2.17 (Red Hat Enterprise Web Server)
header: Set-Cookie: JSESSIONID=696BA6EB7C85B74476817A42C211ED29.tcc1; Path=/eBanking; Secure
header: Location: https://www.bancoprovincia.bancainternet.com.ar/eBanking/login/inicio.htm;jsessionid=696BA6EB7C85B74476817A42C211ED29.tcc1?login_error=1
header: Content-Length: 0
header: Connection: close
header: Content-Type: text/plain; charset=UTF-8
302
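See the Location header in the response?
A 302 status is a redirect: the server is telling you to request that URI instead. Note that the URI ends in login_error=1, which suggests the login itself was rejected.
For more info, see the HTTP status codes.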
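The Location value can be inspected with the standard library. The sketch below parses the URI from the debug output and also shows how a form body is normally encoded for the declared application/x-www-form-urlencoded content type. The field names usuario and clave are taken from the question's code; whether the server actually expects them is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Location header copied from the debug output in the question.
location = (
    "https://www.bancoprovincia.bancainternet.com.ar/eBanking/login/"
    "inicio.htm;jsessionid=696BA6EB7C85B74476817A42C211ED29.tcc1?login_error=1"
)

parsed = urlparse(location)
query = parse_qs(parsed.query)
print(query)

# A form-encoded body uses key=value pairs joined by "&", not "key:value".
# "##" is the placeholder used in the question, not a real credential.
body = urlencode({"usuario": "##", "clave": "##"})
print(body)
```

Sending a body built this way, together with a matching Content-Length header, is closer to what a browser form submission looks like than the concatenated string in the question.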