I am trying to download a file over https using python requests. I wrote a sample code for this. When i run my code it doesnot download the pdf file given in link. Instead downloads the html code for the login page. I checked the response status code and it is giving 200. To download the file login is necessary. How to download the file?
My code:
import requests
import json
# Original File url = "https://seller.flipkart.com/order_management/manifest.pdf?sellerId=8k5wk7b2qk83iff7"
url = "https://seller.flipkart.com/order_management/manifest.pdf"
uname = "xxx#gmail.com"
pwd = "xxx"
pl1 = {'sellerId':'8k5wk7b2qk83i'}
payload = {uname:pwd}
ses = requests.Session()
res = ses.post(url, data=json.dumps(payload))
resp = ses.get(url, params = pl1)
print resp.status_code
print resp.content
I tried several solutions including Sending a POST request with my login creadentials using requests' session object then downloading file using same session object. but it didn't worked.
EDIT:
It still is returning the html for login page.
Have you tried to pass the auth param to the GET? something like this:
resp = requests.get(url, params=pl1, auth=(uname, pwd))
And you can write resp.content to a local file myfile.pdf
fd = open('myfile.pdf', 'wb')
fd.write(resp.content)
fd.close()
Related
I'm trying to download a excel file using a python request.
Below is the actual payload pass by web browser when I download required excel file manually.
I'm using below python code to do the same via automation.
import requests
import json
s = requests.Session()
response = s.get(url)
if response.status_code!=200:
return 'loading error!'
collectionId = response.content['collectionId']
payload = {
"collectionId": collectionId
}
response = s.post(url, json=payload)
Here I'm getting status code 400.
Can someone please help me to set payload to match with below snapshot.
I am trying to use requests to download an SSRS report. The following code will download an empty Excel file:
url = 'http://MY REPORT URL HERE/ReportServer?/REPORT NAME HERE&rs:Format=EXCELOPENXML'
s = requests.Session()
s.post(url, data={'_username': 'username, '_password': 'password'})
r = s.get(url)
output_file = r'C:\Saved Reports\File.xlsx'
downloaded_file = open(output_file, 'wb')
for chunk in r.iter_content(100000):
downloaded_file.write(chunk)
I have successfully used requests_ntlm to complete this task, but I am wondering why the above code is not working as intended. The Excel file turns out to be empty; I feel it is due to an issue with logging in and passing those cookies to the GET request.
I was able to get this to work, but for pdfs. I found the solution here
Here's a piece of my code snippet:
import requests
from requests_ntlm import HttpNtlmAuth
session = requests.Session()
session.auth = HttpNtlmAuth(domain+uid,pwd)
response = session.get(reporturl,stream=True)
print response.status_code
with open(outputlocation+mdcProp+'.pdf','wb') as pdf:
for chunk in response.iter_content(chunk_size=1024):
if chunk:
pdf.write(chunk)
session.close()
I'm working with the website 'musescore.com' that has many files in the '.mxl' format that I need to download automatically with Python.
Each file on the website has a unique ID number. Here's a link to an example file:
https://musescore.com/user/43726/scores/76643
The last number in the URL is the id number for this file. I have no idea where on the website the mxl file for score is located, but I know that to download the file, one must visit this url:
https://musescore.com/score/76643/download/mxl
This link is the same for every file, but with that file's particular ID number in it. As I understand it, this url executes code that downloads the file, and is not an actual path to the file.
Here's my code:
import requests
url = 'https://musescore.com/score/76643/download/mxl'
user = 'myusername'
password = 'mypassword'
r = requests.get(url, auth=(user, password), stream=True)
with open('file.mxl', 'wb') as f:
for chunk in r.iter_content(chunk_size=1024):
f.write(chunk)
This code downloads a webpage saying I need to sign in to download the file. It is supposed to download the mxl file for this score. This must mean I am improperly authenticating the website. How can I fix this?
By passing an auth parameter to get, you're attempting to utilize HTTP Basic Authentication, which is not what this particular site uses. You'll need to use an instance of request.Session to post to their login endpoint and maintain the cookie(s) that result from that process.
Additionally, this site utilizes a csrf token that you must first extract from the login page in order to include it with your post to the login endpoint.
Here is a working example, obviously you will need to change the username and password to your own:
import requests
from bs4 import BeautifulSoup
s = requests.Session()
r = s.get('https://musescore.com/user/login')
soup = BeautifulSoup(r.content, 'html.parser')
csrf = soup.find('input', {'name': '_csrf'})['value']
s.post('https://musescore.com/user/auth/login/process', data={
'username': 'herp#derp.biz',
'password': 'secret',
'_csrf': csrf,
'op': 'Log in'
})
r = s.get('https://musescore.com/score/76643/download/mxl')
print(f"Status: {r.status_code}")
print(f"Content-Type: {r.headers['content-type']}")
Result, with content type showing it is successfully downloading the file:
Status: 200
Content-Type: application/vnd.recordare.musicxml
What I want to accomplish is to download an .xlsx file from a link like the below:
https://......./something.do?parameter=[parameter_value]
Please note that it's meaningless to show the exact link since it's an internal link.
The problem is that the download starts automatically if I open the link in a browser. But when I want to do it programmatically, I cannot get the exact link of the file.
I figured out that in http response header the content-disposition attribute contains the file name like this:
Content-Disposition: attachment; filename="ABCD.xlsx"
But I couldn't catch the file so far, only the html code of the site.
Currently my python code looks like this:
import requests
urlBase = 'link to the authetication page'
urlFile = 'https://......./something.do?parameter=[parameter_value]' //like the above link
user = 'username'
pw = 'password'
session = requests.Session()
session.auth = (user, pw)
auth = session.post(urlBase)
response = session.get(urlFile)
Response is currently showing the html code.
Thanks in advance
we are trying to pass the Virustotal API through our proxy which is getting denied. HTTP websites are accessible using the code but HTTPS is not going through. Request any of you to post few sample codes which would help us.
import postfile
host = "www.virustotal.com"
selector = "https://www.virustotal.com/vtapi/v2/file/scan"
fields = [("apikey", "-- YOUR API KEY --")]
file_to_send = open("test.txt", "rb").read()
files = [("file", "test.txt", file_to_send)]
json = postfile.post_multipart(host, selector, fields, files)
print json
Two ideas to try
1) without HTTPS you can post to http://www.virustotal.com/vtapi/v2/file/scan
2) try the python requests library
import requests
params = {'apikey': '-YOUR API KEY HERE-'}
files = {'file': ('myfile.exe', open('myfile.exe', 'rb'))}
response = requests.post('https://www.virustotal.com/vtapi/v2/file/scan', files=files, params=params)
json_response = response.json()