Python Download Zip file from URL with Authentication - python

First post here so go easy on me. I'm new to Python.
I'm trying to download a .zip file from a website that requires a login to access the link.
The zip file contains a .csv.
I've tried a few different methods but can't get the file to download.
This doesn't throw any errors, but the file doesn't appear.
Sorry, I can't give out the username and password.
Am I missing something? Do I need a different way to do it?
Thanks for your help.
UPDATE: still no joy. I managed to do it with VBA in Excel, but not in Python. Any ideas?
import os
from urllib.parse import urlparse
import requests
from requests.auth import HTTPBasicAuth
theurl= 'https://www.pencarrie.com/export/products.zip'
username = 'Secret'
password = 'Secret'
filename = os.path.basename(urlparse(theurl).path)
r=requests.get(theurl, auth=HTTPBasicAuth(username, password))
I've now switched to a different approach and am still getting the same result. The downloaded file is not complete, or is just the file created by the write call, even though the "downloaded successfully" message says it worked... HELP, PLEASE.
import requests
userid = '???????'
password = '??????'
site_url = 'https://www.pencarrie.com/marketing/website-options/data/enhanced-data'
file_url = 'https://www.pencarrie.com/export/products.zip'
o_file = 'products.zip'
# create session
s = requests.Session()
# GET request. This will generate cookie for you
s.get(site_url)
# login to site.
s.post(site_url, data={'_username': userid, '_password': password})
# Next thing will be to visit URL for file you would like to download.
r = s.get(file_url)
# Download file
with open(o_file, 'wb') as output:
    output.write(r.content)
print(f"requests:: File {o_file} downloaded successfully!")
# Close session once all work done
s.close()
I have it working in VBA, so I assume it's basic authentication, but the Python version just downloads HTML. I'm not sure, though.
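If the saved products.zip turns out to be a web page, the server most likely answered with a login or error page instead of the archive, and writing r.content to disk still "succeeds". A minimal sketch for checking what actually came back before saving, assuming you pass it the response body (r.content). The name describe_payload is hypothetical, and since it only looks at the leading bytes, treat it as a heuristic:

```python
def describe_payload(content: bytes) -> str:
    """Heuristically classify a downloaded body as 'zip', 'html' or 'unknown'."""
    # A regular zip archive begins with the local file header signature PK\x03\x04
    if content.startswith(b"PK\x03\x04"):
        return "zip"
    # HTML pages (e.g. a login form) usually begin with a doctype or an <html> tag
    head = content.lstrip()[:14].lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

For example, call print(describe_payload(r.content)) right after the download. If it prints html, printing r.text[:500] shows which page the server sent (often the login form), which means the login step did not take effect.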

requests.get does not save what it downloads to a file; it keeps it in the content attribute of the Response object.
You need to write it to a file yourself:
with open(filename, "wb") as f:
    f.write(r.content)
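Going a step further, here is a sketch that also confirms the response really is a zip archive and then pulls the .csv out of it with the standard zipfile module. It assumes the whole archive fits in memory (fine for a product export) and that you pass it r.content from the question's snippet; save_zip_and_extract_csv is a hypothetical helper name:

```python
import io
import zipfile
from pathlib import Path


def save_zip_and_extract_csv(content: bytes, zip_path: str, extract_dir: str) -> list[str]:
    """Write downloaded bytes to zip_path and extract any .csv members.

    Raises ValueError if the bytes are not a zip archive, e.g. when the
    server returned an HTML login page instead of the file.
    """
    if not zipfile.is_zipfile(io.BytesIO(content)):
        # Show the start of the body to help diagnose (often "<!DOCTYPE html>")
        preview = content[:80].decode("utf-8", errors="replace")
        raise ValueError(f"Response is not a zip file; it starts with: {preview!r}")
    Path(zip_path).write_bytes(content)
    with zipfile.ZipFile(zip_path) as zf:
        csv_names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        for name in csv_names:
            zf.extract(name, extract_dir)
    return csv_names
```

With the question's first snippet, you would call r.raise_for_status() and then save_zip_and_extract_csv(r.content, filename, "."). If it raises ValueError with an HTML preview, authentication did not succeed and the server sent a web page instead of the file.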

Related

How to send requests to a downloaded html file

I have a downloaded .html file and want to send a request to it to grab its content.
However, if I do the following:
import requests
html_file = "/user/some_html.html"
r = requests.get(html_file)
This gives the following error:
Invalid URL 'some_html.html': No schema supplied.
If I add a schema, I get the following error:
HTTPConnectionPool(host='some_html.html', port=80): Max retries exceeded with url:
How do I send a request to an HTML file after it has been downloaded?
You are accessing an HTML file in a local directory. requests.get() opens an HTTP connection (port 80 by default for http://) to fetch data from a web server, not from a local directory. To access a local file with get(), you would have to serve it from a local web server such as XAMPP or WAMP.
To read a file from a local directory, use open(); requests.get() is for fetching files over an HTTP connection, in simple terms, from the internet rather than from a local directory.
html_file = "/user/some_html.html"
# open() reads the file straight from disk; no HTTP request is involved
with open(html_file, "r") as t:
    for v in t.readlines():
        print(v, end="")
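If you specifically want request-style access to a local file, the standard library's urllib can do it through a file:// URL (requests does not ship an adapter for that scheme, hence the errors in the question). A small sketch, assuming the file is UTF-8 encoded; read_local_html is a hypothetical name:

```python
from pathlib import Path
from urllib.request import urlopen


def read_local_html(path: str) -> str:
    """Read a local HTML file through a file:// URL instead of HTTP."""
    # as_uri() needs an absolute path, e.g. file:///user/some_html.html
    uri = Path(path).resolve().as_uri()
    # urllib's built-in FileHandler answers file:// URLs without any server
    with urlopen(uri) as response:
        return response.read().decode("utf-8")
```

read_local_html("/user/some_html.html") returns the same text that open() would give you.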
You don't "send a request to an HTML file". Instead, you send a request to an HTTP server on the internet, which returns a response containing the contents of an HTML file.
The file itself knows nothing about "requests". If you have the file stored locally and want to do something with it, you can open it just like any other file.
If you are interested in learning more about the request/response model, I suggest you try something like
response = requests.get("http://stackoverflow.com")
You should also read about HTTP and requests and responses to better understand how this works.
You can do it by serving your HTML file from a local server.
If you use Visual Studio Code, you can install Live Server by Ritwick Dey.
Then you do as follows:
1 - Make the first request and save the html content into a .html file:
my_req.py
import requests
file_path = './'
file_name = 'my_file'
url = "https://www.qwant.com/"
response = requests.request("GET", url)
# Save the page so the local server can serve it in step 3
with open(file_path + file_name + '.html', 'w', encoding='utf-8') as w:
    w.write(response.text)
2 - With Live Server installed on Visual Studio Code, click on my_file.html and then click on Go Live.
3 - Now you can make a request to your local HTTP server:
second request
import requests
url = "http://127.0.0.1:5500/my_file.html"
response = requests.request("GET", url)
print(response.text)
And ta-da! Now do what you need to do.
While working on a crawler, I had a situation where the content displayed on the website differed from the content retrieved with response.text, so the XPaths were not the same as on the website. I needed to download the content into a local HTML file and work out the new XPaths to get the information I needed.
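If you don't use Visual Studio Code, Python's built-in http.server can play the role of Live Server. This sketch swaps in that module: it serves a directory on 127.0.0.1 from a background thread and then fetches a file from it, using urllib so there are no third-party dependencies. serve_directory and fetch are hypothetical helper names, and the port is picked by the OS rather than the 5500 used above:

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen


class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # silence the per-request log lines written to stderr


def serve_directory(directory: str) -> ThreadingHTTPServer:
    """Serve `directory` over HTTP on 127.0.0.1 in a background thread."""
    handler = functools.partial(QuietHandler, directory=directory)
    # Port 0 asks the OS for any free port
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server: ThreadingHTTPServer, file_name: str) -> str:
    """GET a file from the local server and return it as text."""
    host, port = server.server_address[:2]
    with urlopen(f"http://{host}:{port}/{file_name}") as response:
        return response.read().decode("utf-8")
```

From a terminal, running python -m http.server 5500 in the folder that contains my_file.html serves it at the same http://127.0.0.1:5500/my_file.html URL used above.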
You can try this:
from requests_html import HTML
with open("htmlfile.html") as htmlfile:
    sourcecode = htmlfile.read()
parsedHtml = HTML(html=sourcecode)
print(parsedHtml)
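If you would rather not install requests_html, the standard library's html.parser can parse a local file as well. A minimal sketch; what it extracts (the title text and link targets) is just an example, and LinkAndTitleParser and parse_html are hypothetical names:

```python
from html.parser import HTMLParser


class LinkAndTitleParser(HTMLParser):
    """Collect the <title> text and every <a href=...> value."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links = []  # href strings in document order
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_html(source: str) -> LinkAndTitleParser:
    parser = LinkAndTitleParser()
    parser.feed(source)
    parser.close()
    return parser
```

Usage: read the file with open("htmlfile.html", encoding="utf-8"), pass its text to parse_html, then inspect the .title and .links attributes of the result.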

Python requests - download image and write to file not working due to nature of URI and authentication procedure

I am writing a script that downloads Sentinel 2 products (satellite imagery) using sentinelsat Python API.
A product's description is structured as JSON and contains the parameter quicklook_url.
Example:
https://apihub.copernicus.eu/apihub/odata/v1/Products('862619d6-9b82-4fe0-b2bf-4e1c78296990')/Products('Quicklook')/$value
Any Sentinel API call requires credentials, and so do retrieving a product and opening the link stored in quicklook_url. When I open the example in my browser, I am asked to enter a username and password in order to get the quicklook image, named S2A_MSIL2A_20210625T065621_N0300_R063_T39NTJ_20210625T093748-ql.jpg.
Needless to say, I am just starting with the API, so I am probably missing something, but
requests.post(product_description['quicklook_url'], verify=False, auth=HTTPBasicAuth(username, password)).content
yields a 0 KB damaged file, and
requests.get(product_description['quicklook_url']).content
yields a 1 KB damaged file.
I have looked into requests.Session
session = requests.Session()
session.auth = (username, password)
auth = session.post('URL_FOR_LOGING_IN')
img = session.get(product_description['quicklook_url']).content
The problem is that I can't find the URL I need to POST to for session authentication. I am fairly sure the sentinelsat API does this, but my searching hasn't turned up anything useful.
I am currently looking into the SentinelAPI class. It has the download_quicklook() function, which I am using right now, but I am still curious how to do this without it.
I don't think you need to send a POST request. Basic authentication works by sending an Authorization header along with each request. The following should work:
session = requests.Session()
session.auth = (username, password)
img = session.get(product_description['quicklook_url']).content
I think your first attempt failed because it used POST.
requests.get(product_description['quicklook_url'], verify=False, auth=HTTPBasicAuth(username, password)).content
should also work.
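To make the point about headers concrete: Basic authentication is just an Authorization header carrying base64("username:password"), attached to every request, so there is no separate login URL to POST to. This sketch builds that header with the standard library; for ASCII credentials it matches what requests sends for you via HTTPBasicAuth or session.auth = (username, password). urllib.request is used only to keep the example dependency-free, and basic_auth_header and authorized_request are hypothetical names:

```python
import base64
from urllib.request import Request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the Authorization header used by HTTP Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def authorized_request(url: str, username: str, password: str) -> Request:
    """A GET request that carries the credentials up front, with no login step."""
    return Request(url, headers={"Authorization": basic_auth_header(username, password)})
```

For example, basic_auth_header("user", "pass") produces "Basic dXNlcjpwYXNz"; the server decodes it on every request, which is why no session login is needed.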

ServiceNow - How to use SOAP to download reports

I need to automate download of reports from serviceNow.
I've been able to automate it with Python, Selenium and win32com using the following method:
https://test.service-now.com/sys_report_template.do?CSV&jvar_report_id=92a....7aa
I use Selenium to access ServiceNow and change Firefox's default download option so the file is saved to a folder on a Windows machine.
However, since all of this may be moved to a Linux server, we would like to port it to SOAP or cURL.
I came across ServiceNow libraries for Python here.
I tried them out, and the following code works if I set the login, password and instance name as listed on the site, using the following from ServiceNow.py:
class Change(Base):
    __table__ = 'change_request.do'
and the following within the client-side script, as listed on the site:
# Fetch changes updated in the last 5 minutes
changes = chg.last_updated(minutes=5)
# Print the changes (client-side script)
for eachline in changes:
    print eachline
However, when I replace the URL with sys_report_template.do, I get this error:
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\SOAPpy\Parser.py", line 1080, in _parseSOAP
parser.parse(inpsrc)
File "C:\Python27\Lib\xml\sax\expatreader.py", line 107, in parse
xmlreader.IncrementalParser.parse(self, source)
File "C:\Python27\Lib\xml\sax\xmlreader.py", line 125, in parse
self.close()
File "C:\Python27\Lib\xml\sax\expatreader.py", line 220, in close
self.feed("", isFinal = 1)
File "C:\Python27\Lib\xml\sax\expatreader.py", line 214, in feed
self._err_handler.fatalError(exc)
File "C:\Python27\Lib\xml\sax\handler.py", line 38, in fatalError
raise exception
SAXParseException: <unknown>:1:0: no element found
Here is the relevant code:
from servicenow import ServiceNow
from servicenow import Connection
from servicenow.drivers import SOAP
# For SOAP connection
conn = SOAP.Auth(username='abc', password='def', instance='test')
rpt = ServiceNow.Base(conn)
rpt.__table__ = "sys_report_template.do?CSV"
#jvar_report_id replaced with .... to protect confidentiality
report = rpt.fetch_one({'jvar_report_id': '92a6760a......aas'})
for eachline in report:
    print eachline
So my question is: what can be done to make this work?
I looked on the web for resources and help, but didn't find any.
Any help is appreciated.
After much research, I was able to use the following method to get the report in CSV format from ServiceNow. I thought I would post it here in case anyone else runs into a similar issue.
import requests
import json
# Set the request parameters
url= 'https://myinstance.service-now.com/sys_report_template.do?CSV&jvar_report_id=929xxxxxxxxxxxxxxxxxxxx0c755'
user = 'my_username'
pwd = 'my_password'
# Set proper headers
headers = {"Accept":"application/json"}
# Do the HTTP request
response = requests.get(url, auth=(user, pwd), headers=headers )
response.raise_for_status()
print(response.text)
response.text now contains the report in CSV format.
Next, I need to figure out how to parse the response to extract the CSV data in the correct format.
Once that's done, I will post it here, but for now this answers my question.
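For the parsing step mentioned above, the standard csv module can turn response.text into rows directly. A sketch in Python 3 (the question itself uses Python 2.7), assuming the first line of the export holds the column headers; parse_report_csv is a hypothetical name:

```python
import csv
import io


def parse_report_csv(text: str) -> list[dict]:
    """Turn CSV text (e.g. response.text from the report URL) into a list of dicts.

    Assumes the first row holds the column names.
    """
    # csv readers need a file-like object; StringIO wraps the in-memory string
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)
```

rows = parse_report_csv(response.text) gives one dict per report row, keyed by column name, and quoted fields containing commas are handled by the csv module.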
I tried this and it's working as expected:
import requests
url = 'https://myinstance.service-now.com/sys_report_template.do?CSV&jvar_report_id=929xxxxxxxxxxxxxxxxxxxx0c755'
user = 'my_username'
pwd = 'my_password'
headers = {"Accept": "application/json"}
response = requests.get(url, auth=(user, pwd), headers=headers)
response.raise_for_status()
file_name = "abc.csv"
# Write the raw bytes so the CSV is saved exactly as ServiceNow sent it
with open(file_name, 'wb') as out_file:
    out_file.write(response.content)
del response

Receive attachment with urllib - Python

I am testing my web application by sending requests to it from Python. I am able to send requests, receive responses and parse the JSON. However, one option on the webpage is to download files. I send the download request and can confirm that the response headers contain what I expect (application/octet-stream and the appropriate filename), but the Content-Length is 0. If the length is 0, I assume the file was not actually sent. I am able to download files by other means, so I know my software works, but I am having trouble getting it to work with Python.
I build the request and then do:
f = urllib.request.urlopen(request)
f.body = f.read()
I expect the data to be in f.body, but it is empty (I see b'').
Is there a different way to access the file contents from an attachment in python?
This uses python-requests instead of urllib, since I'm more familiar with it.
import requests
url = "http://example.com/foobar.jpg"
#make request
r = requests.get(url)
attachment_data = r.content
#save to file
with open(r"C:/pictures/foobar.jpg", 'wb') as f:
    f.write(attachment_data)
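Since the question uses urllib, here is a rough urllib.request equivalent of the answer above: read() returns the attachment bytes, and the filename can be taken from the Content-Disposition header. If read() gives b'' while the server reports Content-Length: 0, the server really sent an empty body, so no change on the client side will help. download_attachment and filename_from_headers are hypothetical names:

```python
from email.message import Message
from typing import Optional, Tuple
from urllib.request import urlopen


def filename_from_headers(headers: Message) -> Optional[str]:
    """Return the filename from a Content-Disposition header, if any."""
    # urlopen() responses expose .headers as an http.client.HTTPMessage,
    # a subclass of email.message.Message, so get_filename() parses it
    return headers.get_filename()


def download_attachment(url: str) -> Tuple[Optional[str], bytes]:
    """Fetch url and return (filename, body bytes) for an attachment response."""
    with urlopen(url) as response:
        return filename_from_headers(response.headers), response.read()
```

The request object built in the question can be passed to urlopen in place of the plain URL string.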
Turns out I needed to throw some data into the file in order to have something in the body. I should've noticed this much sooner.

Help with Python urllib2 and openers - How to make only 1 remote file read

I am trying to download content from a content provider that charges me every time I access a document. The code I have written correctly downloads the content and saves it to a local file, but apparently it requests the file twice, and I am being double charged. I'm not sure where the file is being requested twice; here is my code:
import urllib2
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# Add the username and password.
password_mgr.add_password(None, top_level_url, username, password)
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
# create "opener" (OpenerDirector instance)
opener = urllib2.build_opener(handler)
# use the opener to fetch a URL
file_stream = opener.open(url)
# Open our local file for writing in binary mode, since documents may not be text
with open(directory + doc_name, "wb") as local_file:
    # Write to our local file
    local_file.write(file_stream.read())
I need to figure out how to read the content while only requesting the document once. Any help would be greatly appreciated.
Could it be that it requests the file twice but only downloads it once? The first request would be a normal GET (without an "Authorization" header), answered with HTTP 401 (Authorization Required), followed by the same request with the Authorization header. That is how urllib2's HTTPBasicAuthHandler behaves: it only sends credentials after the server challenges it with a 401.
If that's the case, you should talk to your content provider, since you only downloaded the document once.
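If the double charge does come from that 401 round trip, Python 3.5+ can send the credentials on the very first request. The sketch below uses urllib.request (the Python 3 successor of urllib2) with HTTPPasswordMgrWithPriorAuth; whether the provider bills the rejected first request is something only they can confirm. build_preemptive_opener is a hypothetical name:

```python
import urllib.request


def build_preemptive_opener(top_level_url: str, username: str, password: str):
    """Opener that sends Basic credentials on the first request (Python 3.5+)."""
    password_mgr = urllib.request.HTTPPasswordMgrWithPriorAuth()
    # is_authenticated=True makes the handler add the Authorization header
    # up front instead of waiting for a 401 challenge and then retrying
    password_mgr.add_password(None, top_level_url, username, password,
                              is_authenticated=True)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)
```

Usage: build the opener with the question's top_level_url, username and password, then read opener.open(url) and write it to a file opened with "wb", exactly as in the question.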
