If I'm downloading a file with a certain extension from a certain link, but want to save it with another extension (e.g. .doc instead of .bin), how would I go about doing this in Python?
It can be done in the following way:
1. Download the file in its default / original format.
2. Use pypandoc in a Python script to create a new file in the desired format from the original file.
3. Delete the original file.
These 3 steps can all be automated from a Python script.
https://pypi.org/project/pypandoc/
Example: convert a Markdown file to an reST file (remember to correct the URL):
import os
import requests
import pypandoc

# Download file
# TODO: Update URL
url = 'some_url/somefile.md'
r = requests.get(url)
orig_file = '/Users/user11508332/Downloads/somefile.md'
with open(orig_file, 'wb') as f:
    f.write(r.content)

# pypandoc file format conversion; outputfile makes pypandoc write the result to disk
new_file = os.path.splitext(orig_file)[0] + '.rst'
pypandoc.convert_file(orig_file, 'rst', outputfile=new_file)
# TODO: Place a check here to see if the new file got created

# Clean-up: Delete original file
# TODO: Place a check here to see if the old file still exists, in that case, proceed with deletion:
# os.remove(orig_file)
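If the downloaded bytes are already in the format you want and only the file name carries the wrong extension (the .bin versus .doc case in the question), no conversion step is needed: the extension is just part of the name you pass when writing the file. A minimal sketch, assuming that situation; the helper names with_extension and save_as are hypothetical:

```python
from pathlib import Path


def with_extension(path, new_ext):
    """Return the same path with its final extension replaced by new_ext.

    Hypothetical helper: only the name changes, the contents are untouched.
    """
    if not new_ext.startswith("."):
        new_ext = "." + new_ext
    return Path(path).with_suffix(new_ext)


def save_as(data, path, new_ext):
    """Write already-downloaded bytes under the requested extension."""
    target = with_extension(path, new_ext)
    target.write_bytes(data)
    return target
```

For example, save_as(r.content, 'somefile.bin', '.doc') would write the response body to somefile.doc. Renaming converts nothing, so this only helps when the extension offered by the server was misleading.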
Related
How to download specific files from .txt url?
I have a URL https://.storage.public.eu/opendata/files/open_data_files_access.txt (not real) that lists multiple files (around 5k in reality) which can be downloaded separately. However, I need to download only specific files, and do this with Python.
For instance, I have a list of folder names and file names. How do I download only the files that are on the list? Let's say the list is:
files = ['folder1_file_1.jpg', 'folder1_file_2.jpg', 'folder1_file_3.jpg', 'folder1_file_4.jpg', 'folder1_file_10.jpg', 'folder2_file_2.jpg', 'folder2_file_3.jpg', 'folder3_file_1.jpg', 'folder3_file_3.jpg', 'folder3_file_4.jpg']
How do I download only the files in the list and save them in a specified directory?
I assume that the answer is somewhere here, but it does not work for me:
uurl = 'https://.storage.public.eu/opendata/files/open_data_files_access.txt'
from requests import get # to make GET request
def download(url, file_name):
    # open in binary mode
    with open(file_name, "wb") as file:
        # get request
        response = get(url)
        # write to file
        file.write(response.content)
file_name = ['folder1_file_1.jpg', 'folder1_file_2.jpg', 'folder1_file_3.jpg', 'folder1_file_4.jpg', 'folder1_file_10.jpg', 'folder2_file_2.jpg', 'folder2_file_3.jpg', 'folder3_file_1.jpg', 'folder3_file_3.jpg', 'folder3_file_4.jpg']
download(uurl, file_name)
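One way to approach this, sketched under assumptions the question does not confirm: that the .txt file contains one download URL per line, and that each wanted name matches the last path segment of its URL. The helper names select_urls and download_selected are hypothetical. The sketch uses the standard library's urllib.request so that it is self-contained; requests.get(url).content would serve equally well in place of urlopen.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlopen


def select_urls(index_text, wanted):
    """Return the URLs from index_text whose file name is in wanted.

    Assumes one URL per line; blank lines are skipped.
    """
    wanted = set(wanted)
    selected = []
    for line in index_text.splitlines():
        url = line.strip()
        if not url:
            continue
        name = os.path.basename(urlsplit(url).path)
        if name in wanted:
            selected.append(url)
    return selected


def download_selected(index_url, wanted, target_dir):
    """Fetch the index, then download only the wanted files into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    with urlopen(index_url) as response:
        index_text = response.read().decode("utf-8")
    for url in select_urls(index_text, wanted):
        file_name = os.path.basename(urlsplit(url).path)
        target = os.path.join(target_dir, file_name)
        with urlopen(url) as response, open(target, "wb") as out:
            out.write(response.read())
```

The key difference from the attempt above is that the list of names is used to filter the index, and open() is called once per file rather than once with the whole list.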
Exactly as the title says, I have this code
from shareplum import Site
from shareplum import Office365
from shareplum.site import Version
authcookie = Office365('https://mysite.sharepoint.com/', username='username', password='password').GetCookies()
site = Site('https://mysite.sharepoint.com/sites/mysite/', version=Version.v2016, authcookie=authcookie)
folder = site.Folder('Shared Documents/Beta Testing')
file = folder.get_file('practice.xlsx')
with open("practice.xlsx", "wb") as fh:
    fh.write(file)
print('---')
folder.upload_file('xlsx', 'practice.xlsx')
Currently it downloads the file just fine, which is fantastic. However, I do not know how to reverse what I did when opening and downloading the file. Basically, I need to upload the file with the exact same name and in the exact same format as the one I downloaded (in this case xlsx), so that it overwrites the one on SharePoint with the updated document.
Your post indicates that you want to modify the file, so you will need some file handling for the downloaded file once it has been saved. After the modification is done, open the file in 'rb' mode and read it into a variable; that variable is the content argument when calling folder_obj.upload_file(content, name).
# this is your step to modify the file.
with open("practice.xlsx", "wb") as fh:
    # file modification stuff... pyxlsx?
    fh.write(file)
# open the file and read it into a variable as binary
with open("practice.xlsx", "rb") as file_obj:
    file_as_string = file_obj.read()
# upload the file including the file name and the variable (file_as_string)
folder.upload_file(file_as_string, 'practice.xlsx')
This has been working for me. If you want to change the name of the file to include a version, delete the old file by calling folder.delete_file("practice.xlsx").
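The read-then-upload step can be wrapped in a small helper. This is only a sketch: reupload is a hypothetical name, and it assumes folder is an object like the shareplum folder above, whose upload_file(content, name) takes the file's bytes followed by the target file name, as used in this answer.

```python
import os


def reupload(folder, local_path, remote_name=None):
    """Read local_path as bytes and upload it under remote_name.

    If remote_name is omitted, the local file's own name is reused, so the
    upload targets the same name that was downloaded.
    """
    if remote_name is None:
        remote_name = os.path.basename(local_path)
    with open(local_path, "rb") as file_obj:
        content = file_obj.read()
    folder.upload_file(content, remote_name)
    return remote_name
```

Calling reupload(folder, 'practice.xlsx') after editing the local copy would then send the updated bytes back under the original name.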
Can you try the below and see if it works?
with open("practice.xlsx", "wb") as fh:
    fh.write(file)
# fh.write() returns the number of bytes written, not the data, so pass the bytes themselves
folder.upload_file(file, 'practice.xlsx')
I need to open the page automatically and download the file returned by the server
I have some simple code that opens the page and downloads the content. I am also reading the headers so that I know the name of the returned file. Below is the code:
downloadPageRequest = self.reqSession.get(self.url_file, stream=True)
headers = downloadPageRequest.headers
if 'content-disposition' in headers:
    file_name = re.findall("filename=(.+)", headers['content-disposition'])
That's what I have so far. It returns a list containing the file name, but now I am stuck and have no idea how to open and go through the returned Excel file.
This has to be done using requests, which is why I cannot use any other method (e.g. Selenium).
I will be thankful for your support.
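One possible continuation, sketched under assumptions: the Content-Disposition header carries a plain filename=... parameter, and the response is a requests response obtained with stream=True, so it offers iter_content(). The helper names and the fallback name download.bin are hypothetical. Because re.findall returns a list, the sketch uses re.search and takes the single match instead.

```python
import re


def filename_from_content_disposition(header_value, default="download.bin"):
    """Extract the file name from a Content-Disposition header value.

    Handles the simple filename=... form, quoted or not; falls back to default.
    """
    match = re.search(r'filename="?([^";]+)"?', header_value or "")
    return match.group(1).strip() if match else default


def save_response(response, path, chunk_size=8192):
    """Write a streamed response to path chunk by chunk."""
    with open(path, "wb") as out:
        for chunk in response.iter_content(chunk_size=chunk_size):
            out.write(chunk)
    return path
```

Once the file is on disk it can be opened with an Excel reader of your choice; pandas.read_excel(path) is a common option if pandas and an Excel engine are installed.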
The link to the code is here (I didn't copy it here, to give the author credit):
I don't want it to name the file by date as it currently does, but to download the file as "finviz.csv" and overwrite it each day (with the scheduled task) to keep the data updated in my data system.
I've tried some tweaks, but I'm no developer and I don't have a clue how to do it.
Can you please help?
The comments in the code described it quite clearly:
# we're going to name the file by the date it was downloaded (e.g. 2012-3-18.csv)
fname = now.strftime("%Y-%m-%d")+".csv";
So just change the line to
fname = "finviz.csv";
And change the file existence check logic from:
# check if the file does not already exist
if not os.path.isfile(savepath+"/"+fname):
    # open a file to save the data to ("wb" means write binary mode)
    outfile = open(savepath+"/"+fname, "wb");
    # download the data from the url specified above
    infile = urllib2.urlopen(url);
    # read the downloaded data and write it to our output file
    outfile.write(infile.read());
    # close the output file once we're done
    outfile.close();
else:
    print "'"+fname+"' ALREADY EXISTS in the save directory '"+savepath+"'.";
to:
# open a file to save the data to ("wb" means write binary mode)
outfile = open(savepath+"/"+fname, "wb");
# download the data from the url specified above
infile = urllib2.urlopen(url);
# read the downloaded data and write it to our output file
outfile.write(infile.read());
# close the output file once we're done
outfile.close();
You have to change the line
fname = now.strftime("%Y-%m-%d")+".csv";
to
fname = "finviz.csv";
And you also need to delete this if (and its corresponding else):
if not os.path.isfile(savepath+"/"+fname):
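The original script targets Python 2 (urllib2, print statement). As a rough Python 3 equivalent of the fixed-name, always-overwrite behaviour described above, a sketch could look like this; download_to and its parameters are hypothetical names, and the URL is whatever the original script already uses.

```python
import os
from urllib.request import urlopen


def download_to(url, savepath, fname="finviz.csv"):
    """Download url and write it to savepath/fname, replacing any existing file."""
    target = os.path.join(savepath, fname)
    with urlopen(url) as infile:
        data = infile.read()
    # "wb" truncates an existing file, so no existence check is needed
    with open(target, "wb") as outfile:
        outfile.write(data)
    return target
```

Because the existence check is gone, running this from a daily scheduled task simply refreshes the same file each time.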
I'm working on a script that will automatically update an installed version of Calibre. Currently I have it downloading the latest portable version. I seem to be having trouble saving the zipfile. Currently my code is:
import urllib2
import re
import zipfile
#tell the user what is happening
print("Calibre is Updating")
#download the page
url = urllib2.urlopen ( "http://sourceforge.net/projects/calibre/files" ).read()
#determine current version
result = re.search('title="/[0-9.]*/([a-zA-Z\-]*-[0-9\.]*)', url).groups()[0][:-1]
#download file
download = "http://status.calibre-ebook.com/dist/portable/" + result
urllib2.urlopen( download )
#save
output = open('install.zip', 'w')
output.write(zipfile.ZipFile("install.zip", ""))
output.close()
You don't need to use zipfile.ZipFile for this (and the way you're using it, as well as urllib2.urlopen, has problems as well). Instead, you need to save the urlopen result in a variable, then read it and write that output to a .zip file. Try this code:
#download file
download = "http://status.calibre-ebook.com/dist/portable/" + result
request = urllib2.urlopen( download )
#save
output = open("install.zip", "wb")  # binary mode, since a zip file is not text
output.write(request.read())
output.close()
There is also a one-liner:
open('install.zip', 'wb').write(urllib.urlopen('http://status.calibre-ebook.com/dist/portable/' + result).read())
which is not memory-efficient, since it reads the whole download into memory before writing, but still works.
If you just want to download a file from the net, you can use urllib.urlretrieve:
Copy a network object denoted by a URL to a local file ...
Example using requests instead of urllib2:
import requests, re, urllib
print("Calibre is updating...")
content = requests.get("http://sourceforge.net/projects/calibre/files").content
# determine current version
v = re.search('title="/[0-9.]*/([a-zA-Z\-]*-[0-9\.]*)', content).groups()[0][:-1]
download_url = "http://status.calibre-ebook.com/dist/portable/{0}".format(v)
print("Downloading {0}".format(download_url))
urllib.urlretrieve(download_url, 'install.zip')
# file should be downloaded at this point
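The snippets in this thread are Python 2 (urllib2, urllib.urlretrieve). For Python 3, a memory-friendlier sketch is to stream the response to disk with shutil.copyfileobj instead of reading it all at once. The function names here are hypothetical; the URL would be built as in the code above.

```python
import shutil
from urllib.request import urlopen


def copy_stream(source, dest_path, chunk_size=64 * 1024):
    """Copy a binary file-like object to dest_path in chunks."""
    with open(dest_path, "wb") as output:  # "wb": a zip file is binary data
        shutil.copyfileobj(source, output, chunk_size)
    return dest_path


def download_file(url, dest_path):
    """Download url to dest_path without holding the whole file in memory."""
    with urlopen(url) as response:
        return copy_stream(response, dest_path)
```

With this, download_file(download_url, 'install.zip') would take the place of the urlretrieve call.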
Have you tried
output = open('install.zip', 'wb')  # note the "b" flag, which means "binary file"