I am trying to parse a webpage and download a series of CSV files that are packaged in zip archives. When I click a link on the website, the download works without any trouble. However, whenever I paste the URL into my browser (e.g. example.com/file.zip), I get a 400 Bad Request error. I am not sure, but I suspect this happens because the link uses a download attribute.
The problem now is that I cannot download the zip files with urllib.request.urlretrieve either. My code is fairly simple:
# Look to a specific folder in my computer
# Compare the zip files in that folder to the zip files on the website
# What ever is on the website, but not on my local machine
# is added to a dictionary called remoteFiles
for remoteFile in remoteFiles:
    try:
        filename = ntpath.basename(remoteFile)
        urllib.request.urlretrieve(remoteFile, filename)
        print('finished downloading: ' + filename)
    except Exception as e:
        print('error with file: ' + filename)
        print(e)
Here is a PasteBin link to my full .py file. Whenever I run it I get an error:
HTTP Error 400: Bad Request
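One way to narrow down a 400 from urllib is to make the request look more like the one the browser sends when the link is clicked. The sketch below percent-encodes the URL path (an unencoded space can produce a 400) and sends User-Agent and Referer headers. Whether this server checks either header is an assumption, not something the question establishes, and build_request, fetch, and the header values are hypothetical names chosen for illustration.

```python
import shutil
import urllib.request
from urllib.parse import quote, urlsplit, urlunsplit


def build_request(url, referer):
    """Return a Request with an encoded path and browser-like headers."""
    parts = urlsplit(url)
    # Keep "/" and existing "%xx" escapes; encode spaces and other unsafe characters.
    safe_path = quote(parts.path, safe="/%")
    clean_url = urlunsplit(parts._replace(path=safe_path))
    headers = {
        # Assumption: the server rejects clients that omit these headers.
        "User-Agent": "Mozilla/5.0",
        "Referer": referer,
    }
    return urllib.request.Request(clean_url, headers=headers)


def fetch(url, referer, filename):
    """Download url to filename using the request built above."""
    with urllib.request.urlopen(build_request(url, referer)) as response:
        with open(filename, "wb") as out:
            shutil.copyfileobj(response, out)
```

Inside the loop from the question, fetch(remoteFile, page_url, filename) would take the place of urlretrieve, where page_url stands for the address of the page that contains the links.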
How to download specific files from .txt url?
I have a URL https://.storage.public.eu/opendata/files/open_data_files_access.txt (not real) that lists multiple files (around 5k in reality) that can be downloaded separately. However, I need to download only specific files, and do this with Python.
For instance, I have a list of names, each combining a folder name and a file name. How do I download only the files that are on the list? Let's say the list is:
files = ['folder1_file_1.jpg', 'folder1_file_2.jpg', 'folder1_file_3.jpg', 'folder1_file_4.jpg', 'folder1_file_10.jpg', 'folder2_file_2.jpg', 'folder2_file_3.jpg', 'folder3_file_1.jpg', 'folder3_file_3.jpg', 'folder3_file_4.jpg']
How do I download only the files in the list and save them in a specified directory?
I assume that the answer is somewhere here, but it does not work for me:
uurl = 'https://.storage.public.eu/opendata/files/open_data_files_access.txt'
from requests import get  # to make GET request
def download(url, file_name):
    # open in binary mode
    with open(file_name, "wb") as file:
        # get request
        response = get(url)
        # write to file
        file.write(response.content)
file_name = ['folder1_file_1.jpg', 'folder1_file_2.jpg', 'folder1_file_3.jpg', 'folder1_file_4.jpg', 'folder1_file_10.jpg', 'folder2_file_2.jpg', 'folder2_file_3.jpg', 'folder3_file_1.jpg', 'folder3_file_3.jpg', 'folder3_file_4.jpg']
download(uurl, file_name)
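The attempt above passes the whole list to open(), which expects a single path. A minimal sketch of one alternative follows, assuming the .txt file holds one download URL per line and that the last path segment of each URL is the file name from the list. Neither assumption is confirmed by the question, and select_urls and download_selected are hypothetical names. It uses urllib from the standard library rather than requests so that it has no extra dependencies.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlsplit


def select_urls(listing_text, wanted_names):
    """Return the URLs in listing_text whose file name is in wanted_names."""
    wanted = set(wanted_names)
    selected = []
    for line in listing_text.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        # Assumption: the file name is the last segment of the URL path.
        if posixpath.basename(urlsplit(url).path) in wanted:
            selected.append(url)
    return selected


def download_selected(listing_url, wanted_names, target_dir):
    """Fetch the listing, then download only the wanted files into target_dir."""
    with urllib.request.urlopen(listing_url) as response:
        listing_text = response.read().decode("utf-8")
    os.makedirs(target_dir, exist_ok=True)
    for url in select_urls(listing_text, wanted_names):
        name = posixpath.basename(urlsplit(url).path)
        urllib.request.urlretrieve(url, os.path.join(target_dir, name))
```

With the names from the question this would be called as download_selected(uurl, file_name, "downloads"), where "downloads" is a placeholder directory.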
I tried using wget:
url = "https://yts.lt/torrent/download/A4A68F25347C709B55ED2DF946507C413D636DCA"
wget.download(url, 'c:/path/')
The result was a file named A4A68F25347C709B55ED2DF946507C413D636DCA, without any extension.
Whereas when I put the link in the address bar and press Enter, a torrent file gets downloaded.
EDIT:
The answer must be generic, not specific to this case.
There must be a way to download .torrent files with their original names.
You can get the filename from the Content-Disposition header, e.g.:
import re, requests, traceback
try:
    url = "https://yts.lt/torrent/download/A4A68F25347C709B55ED2DF946507C413D636DCA"
    r = requests.get(url)
    d = r.headers['content-disposition']
    fname = re.findall('filename="(.+)"', d)
    if fname:
        with open(fname[0], 'wb') as f:
            f.write(r.content)
except:
    print(traceback.format_exc())
The code above is for Python 3. I don't have Python 2 installed, and I normally don't post code without testing it.
Have a look at https://stackoverflow.com/a/11783325/797495; the method is the same.
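The regular expression above only matches a quoted filename and writes nothing when the header is missing. As a hedged variation, the sketch below parses the header value with the standard library's email.message.Message, falls back to the last URL segment, and strips any directory part from the result. The function name filename_from_response is hypothetical, and the sketch does not decode RFC 5987 filename* parameters.

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def filename_from_response(content_disposition, url):
    """Pick a file name from a Content-Disposition value, else from the URL."""
    name = None
    if content_disposition:
        # Message parses both quoted and unquoted filename parameters.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
    if not name:
        # Fallback: last path segment of the URL.
        name = unquote(posixpath.basename(urlsplit(url).path))
    # Drop any directory part so the server cannot choose the save location.
    return posixpath.basename(name.replace("\\", "/"))
```

With requests this could be called as filename_from_response(r.headers.get('content-disposition'), url) before opening the output file.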
I found a way to get the torrent files downloaded with their original names, as if they had been downloaded by putting the link in the browser's address bar.
The solution consists of opening the user's browser from Python:
import webbrowser
url = "https://yts.lt/torrent/download/A4A68F25347C709B55ED2DF946507C413D636DCA"
webbrowser.open(url, new=0, autoraise=True)
Read more:
Call to operating system to open url?
However, there are downsides:
I don't get the option to choose the folder where the file is saved (unless I change it in the browser, and even then I cannot save torrents that match some criteria to a different path).
And of course, your browser goes insane opening all those links XD
I need to open the page automatically and download the file returned by the server.
I have some simple code that opens the page and downloads the content. I am also reading the headers so that I know the name of the returned file. Below is the code:
downloadPageRequest = self.reqSession.get(self.url_file, stream=True)
headers = downloadPageRequest.headers
if 'content-disposition' in headers:
    file_name = re.findall("filename=(.+)", headers['content-disposition'])
That's what I have so far. It returns an array with the filename, but now I am stuck and have no idea how to open and go through the returned Excel file.
This has to be done using requests, which is why I cannot use any other method (e.g. Selenium).
I will be thankful for your support.
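Since the request is made with stream=True, one possible next step is to write the body to disk chunk by chunk and then open the saved file. The sketch below is a minimal example of the saving step only. save_stream is a hypothetical helper, and it strips the double quotes that the filename=(.+) pattern can leave around the name.

```python
import os


def save_stream(chunks, directory, file_name):
    """Write an iterable of byte chunks to directory/file_name; return the path."""
    os.makedirs(directory, exist_ok=True)
    # Strip quotes captured by the regex and drop any directory part.
    clean_name = os.path.basename(file_name.strip().strip('"'))
    path = os.path.join(directory, clean_name)
    with open(path, "wb") as out:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                out.write(chunk)
    return path
```

With the code from the question this could be called as save_stream(downloadPageRequest.iter_content(chunk_size=8192), "downloads", file_name[0]), where "downloads" is a placeholder directory. The saved workbook can then be read with a spreadsheet library such as openpyxl; that choice is an assumption, since the question does not name one.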
I am new to Python.
I want to download data in code from this URL: "ftp://cddis.nasa.gov/gnss/products/ionex/". However, the files that I want have this format: "codgxxxx.xxx.Z".
All these files are inside the directory for each year.
How can I download just those files using Python?
Until now I have been using wget with this command: wget ftp://cddis.nasa.gov/gnss/products/ionex/2008/246/codg0246.07i.Z for each of the files, but that is too tedious.
Can anyone help me, please?
Thank you
Since you know the structure in the FTP server, this can be pretty easy to accomplish without having to use ftplib.
It would be cleaner to actually retrieve a directory listing from the server, as in this question.
(I don't seem to be able to connect to that NASA URL, though.)
I would recommend reading here for more on how to actually perform an FTP download.
But something like this may work. (full disclosure: I haven't tested it)
import urllib.error
import urllib.request
YEARS_TO_DOWNLOAD = 12
BASE_URL = "ftp://cddis.nasa.gov/gnss/products/ionex/"
FILE_PATTERN = "codg{}.{}.Z"
SAVE_DIR = "/home/your_name/nasa_ftp/"
year = 2006
three_digit_number = 0
for i in range(0, YEARS_TO_DOWNLOAD):
    # zfill is a str method, so convert the number to a string first
    target = FILE_PATTERN.format(str(year + i), str(three_digit_number).zfill(3))
    try:
        urllib.request.urlretrieve(BASE_URL + target, SAVE_DIR + target)
    except urllib.error.URLError as e:
        print("An error occurred trying to download {}.\nReason: {}".format(target, str(e)))
    else:
        print("{} -> {}".format(target, SAVE_DIR + target))
print("Download finished!")
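The directory-listing approach mentioned above can be sketched with ftplib and a filename pattern. This is untested against the real server, so treat it as a sketch: it assumes the server still accepts anonymous FTP logins and that one directory (for example gnss/products/ionex/2008/246) is handled per call. select_codg and download_codg are hypothetical names.

```python
import fnmatch
import os
import posixpath
from ftplib import FTP


def select_codg(names):
    """Keep only names whose file part looks like codgxxxx.xxx.Z."""
    return sorted(
        n for n in names
        if fnmatch.fnmatchcase(posixpath.basename(n), "codg????.???.Z")
    )


def download_codg(host, directory, save_dir):
    """Download every matching file from one FTP directory into save_dir."""
    os.makedirs(save_dir, exist_ok=True)
    with FTP(host) as ftp:
        ftp.login()  # anonymous login; assumption: the server allows it
        ftp.cwd(directory)
        for name in select_codg(ftp.nlst()):
            local_path = os.path.join(save_dir, posixpath.basename(name))
            with open(local_path, "wb") as out:
                ftp.retrbinary("RETR " + name, out.write)
            print("{} -> {}".format(name, local_path))
```

A call for the file from the question would look like download_codg("cddis.nasa.gov", "gnss/products/ionex/2008/246", SAVE_DIR); looping over years and days then replaces the hand-written wget commands.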
I am trying to upload a spreadsheet to SharePoint using the REST API.
The code that I am using to generate the URL and upload the file is:
import sys
import requests, os
from requests_ntlm import HttpNtlmAuth
sharePointUrl = 'https://Sharepoint.asr.ith.itl.com/Skt/patchboard'
folderUrl = '/Documents/Patch_automation_work_area'
fileName='/abc/asc/roj/skx/skx_val/rsingh/Patch/Excel.xlsm'
#Setting up the url for requesting a file upload
requestUrl = sharePointUrl + '/_api/web/getfolderbyserverrelativeurl(\'' + folderUrl + '\')/Files/addas(url=\'' + fileName + '\',overwrite=true)'
print(requestUrl)
Printing the generated URL gives this output:
https://Sharepoint.asr.ith.itl.com/Skt/patchboard/_api/web/getfolderbyserverrelativeurl('/Documents/Patch_automation_work_area')/Files/addas(url='/abc/asc/roj/skx/skx_val/rsingh/Patch/Excel.xlsm',overwrite=true)
So the complete URL is not generated for uploading the file, and I get a 404 error when accessing the link with the requests module in Python. Can somebody please explain why I am getting this error and how to generate a link for uploading the document?
EDIT
My upload link looks something like this:
https://sharepoint.asr.ith.itl.com/sites/SK/patchboard/_layouts/Upload.aspx?List={CE897D7B-8DC4-4F9C-AF4D-D41DB89DA6D3}&RootFolder=%2Fsites%2FSKX%2Fpatchboard%2FDocuments%2FPatch%5Fautomation%5Fwork%5Farea
This link brings me to a page where I need to browse to the complete path of the file; after giving the path, I can upload the document.
My file path is:
/abc/asc/roj/skx/skx_val/rsingh/Patch/Excel.xlsm
Now I want to concatenate this file path to the URL above so that a path for direct upload can be formed. Direct concatenation is not working; I think it cannot stand in for the browse option, and maybe that is why it is unable to put the file path in the desired location.
Can somebody tell me how to resolve it?
I have resolved the problem. Instead of using the URL from the browser, I used the base URL of the SharePoint site:
https://sharepoint.asr.ith.itl.com
and then added the path to the location in SharePoint where I wanted to upload the file:
sites/SK/patchboard/shared_documents/patch_work_area
This formed the complete link:
https://sharepoint.asr.ith.itl.com/sites/SK/patchboard/shared_documents/patch_work_area
Then I used this command:
curl --ntlm --user username:password --upload-file <filename> https://sharepoint.asr.ith.itl.com/sites/SK/patchboard/shared_documents/patch_work_area/<file_name to upload>
This worked for me.
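For completeness, a rough Python counterpart of that curl command is sketched below. curl --upload-file sends an HTTP PUT, so the sketch builds the same kind of URL (base URL, folder path, file name) and sends the file with requests.put and HttpNtlmAuth from requests_ntlm, the package imported in the question. The names build_upload_url and upload_file are hypothetical, and whether the server accepts a plain PUT from Python in the same way is an assumption.

```python
import posixpath
from urllib.parse import quote


def build_upload_url(base_url, folder_path, local_path):
    """Join base URL, SharePoint folder path, and the local file's name."""
    file_name = posixpath.basename(local_path)
    folder = folder_path.strip("/")
    return "{}/{}/{}".format(base_url.rstrip("/"), quote(folder), quote(file_name))


def upload_file(base_url, folder_path, local_path, username, password):
    """PUT the file with NTLM authentication, similar to curl --ntlm --upload-file."""
    # Imported here so build_upload_url works without these packages installed.
    import requests
    from requests_ntlm import HttpNtlmAuth

    url = build_upload_url(base_url, folder_path, local_path)
    with open(local_path, "rb") as handle:
        payload = handle.read()
    response = requests.put(url, data=payload, auth=HttpNtlmAuth(username, password))
    response.raise_for_status()
    return response.status_code
```

Only the file name, not the full local path, goes into the URL; putting the whole local path there is what the first attempt in the question did.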