Iam trying to find a way how to keep the cookie sessions while performing the urllib command to download files.
Here what I have so far.
session.get(url, data=login_data, verify=False)
f = urllib2.urlopen(url)
data = f.read()
with open('C:/Files/filename.docx', "wb") as code:
code.write(data)
url is URL
data are login details username and password
So when I run this, no success.
Does anyone have experience with that?
Related
I am trying to use requests to download an SSRS report. The following code will download an empty Excel file:
url = 'http://MY REPORT URL HERE/ReportServer?/REPORT NAME HERE&rs:Format=EXCELOPENXML'
s = requests.Session()
s.post(url, data={'_username': 'username, '_password': 'password'})
r = s.get(url)
output_file = r'C:\Saved Reports\File.xlsx'
downloaded_file = open(output_file, 'wb')
for chunk in r.iter_content(100000):
downloaded_file.write(chunk)
I have successfully used requests_ntlm to complete this task, but I am wondering why the above code is not working as intended. The Excel file turns out to be empty; I feel it is due to an issue with logging in and passing those cookies to the GET request.
I was able to get this to work, but for pdfs. I found the solution here
Here's a piece of my code snippet:
import requests
from requests_ntlm import HttpNtlmAuth
session = requests.Session()
session.auth = HttpNtlmAuth(domain+uid,pwd)
response = session.get(reporturl,stream=True)
print response.status_code
with open(outputlocation+mdcProp+'.pdf','wb') as pdf:
for chunk in response.iter_content(chunk_size=1024):
if chunk:
pdf.write(chunk)
session.close()
I'm working with the website 'musescore.com' that has many files in the '.mxl' format that I need to download automatically with Python.
Each file on the website has a unique ID number. Here's a link to an example file:
https://musescore.com/user/43726/scores/76643
The last number in the URL is the id number for this file. I have no idea where on the website the mxl file for score is located, but I know that to download the file, one must visit this url:
https://musescore.com/score/76643/download/mxl
This link is the same for every file, but with that file's particular ID number in it. As I understand it, this url executes code that downloads the file, and is not an actual path to the file.
Here's my code:
import requests
url = 'https://musescore.com/score/76643/download/mxl'
user = 'myusername'
password = 'mypassword'
r = requests.get(url, auth=(user, password), stream=True)
with open('file.mxl', 'wb') as f:
for chunk in r.iter_content(chunk_size=1024):
f.write(chunk)
This code downloads a webpage saying I need to sign in to download the file. It is supposed to download the mxl file for this score. This must mean I am improperly authenticating the website. How can I fix this?
By passing an auth parameter to get, you're attempting to utilize HTTP Basic Authentication, which is not what this particular site uses. You'll need to use an instance of request.Session to post to their login endpoint and maintain the cookie(s) that result from that process.
Additionally, this site utilizes a csrf token that you must first extract from the login page in order to include it with your post to the login endpoint.
Here is a working example, obviously you will need to change the username and password to your own:
import requests
from bs4 import BeautifulSoup
s = requests.Session()
r = s.get('https://musescore.com/user/login')
soup = BeautifulSoup(r.content, 'html.parser')
csrf = soup.find('input', {'name': '_csrf'})['value']
s.post('https://musescore.com/user/auth/login/process', data={
'username': 'herp#derp.biz',
'password': 'secret',
'_csrf': csrf,
'op': 'Log in'
})
r = s.get('https://musescore.com/score/76643/download/mxl')
print(f"Status: {r.status_code}")
print(f"Content-Type: {r.headers['content-type']}")
Result, with content type showing it is successfully downloading the file:
Status: 200
Content-Type: application/vnd.recordare.musicxml
Trying to download the following file:
https://e4ftl01.cr.usgs.gov/MOLA/MYD14A2.006/2017.10.24/MYD14A2.A2017297.h19v01.006.2017310142443.hdf
I first need to sign into the following site before doing so:
https://urs.earthdata.nasa.gov
After reviewing my browser's web console, I believe it's using a cookie to allow me to download the file. How can I do this using python? I find out how to retrieve the cookies:
import os, requests
username = 'user'
password = 'pwd'
url = 'https://urs.earthdata.nasa.gov'
r = requests.get(url, auth=(username,password))
cookies = r.cookies
How can I then use this to download the HDF file? I've tried the following but always receive 401 error.
url2 = "https://e4ftl01.cr.usgs.gov/MOLA/MYD14A2.006/2017.10.24/MYD14A2.A2017297.h19v01.006.2017310142443.hdf"
r2 = requests.get(url2, cookies=r.cookies)
Have you tried a simple basic authentification :
from requests.auth import HTTPBasicAuth
url2='https://e4ftl01.cr.usgs.gov/MOLA/MYD14A2.006/2017.10.24/MYD14A2.A2017297.h19v01.006.2017310142443.hdf'
requests.get(url2, auth=HTTPBasicAuth('user', 'pass'))
or read this example
To download a file using the Requests library with the browser cookies, you can use the next function:
import browser_cookie3
import requests
import shutil
import os
cj = browser_cookie3.brave()
def download_file(url, root_des_path='./'):
local_filename = url.split('/')[-1]
local_filename = os.path.join(root_des_path, local_filename)
# r = requests.get(link, cookies=cj)
with requests.get(url, cookies=cj, stream=True) as r:
with open(local_filename, 'wb') as f:
shutil.copyfileobj(r.raw, f)
return local_filename
a = download_file(link)
In this example, cj is the cookies of Brave browser ( you can use ffox or chrome). then, these cj are passed to Requests to download the file.
Note, you need to get "browser_cookie3" library
pip install browser-cookie3
I am trying to download a file over https using python requests. I wrote a sample code for this. When i run my code it doesnot download the pdf file given in link. Instead downloads the html code for the login page. I checked the response status code and it is giving 200. To download the file login is necessary. How to download the file?
My code:
import requests
import json
# Original File url = "https://seller.flipkart.com/order_management/manifest.pdf?sellerId=8k5wk7b2qk83iff7"
url = "https://seller.flipkart.com/order_management/manifest.pdf"
uname = "xxx#gmail.com"
pwd = "xxx"
pl1 = {'sellerId':'8k5wk7b2qk83i'}
payload = {uname:pwd}
ses = requests.Session()
res = ses.post(url, data=json.dumps(payload))
resp = ses.get(url, params = pl1)
print resp.status_code
print resp.content
I tried several solutions including Sending a POST request with my login creadentials using requests' session object then downloading file using same session object. but it didn't worked.
EDIT:
It still is returning the html for login page.
Have you tried to pass the auth param to the GET? something like this:
resp = requests.get(url, params=pl1, auth=(uname, pwd))
And you can write resp.content to a local file myfile.pdf
fd = open('myfile.pdf', 'wb')
fd.write(resp.content)
fd.close()
I need to download a CSV file, which works fine in browsers using:
http://www.ftse.com/objects/csv_to_csv.jsp?infoCode=100a&theseFilters=&csvAll=&theseColumns=Mw==&theseTitles=&tableTitle=FTSE%20100%20Index%20Constituents&dl=&p_encoded=1&e=.csv
The following code works for any other file (url) (with a fully qualified path), however with the above URL is downloads 800 bytes of gibberish.
def getFile(self,URL):
proxy_support = urllib2.ProxyHandler({'http': 'http://proxy.REMOVED.com:8080/'})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
response = urllib2.urlopen(URL)
print response.geturl()
newfile = response.read()
output = open("testFile.csv",'wb')
output.write(newfile)
output.close()
urllib2 uses httplib under the hood, so the best way to diagnose this is to turn on http connection debugging. Add this code before you access the url and you should get a nice summary of exactly what http traffic is being generated:
import httplib
httplib.HTTPConnection.debuglevel = 1