I have written some code to read the contents from a specific URL:
import requests
import os

def read_doc(doc_ID):
    filename = doc_ID + ".txt"
    if not os.path.exists(filename):
        my_url = encode_url(doc_ID)  # this is a call to another function that would encode the url
        my_response = requests.get(my_url)
        if my_response.status_code == requests.codes.ok:
            return my_response.text
    return None
This checks if there's a file named doc_ID.txt (where doc_ID could be any name provided). If there's no such file, it reads the contents from a specific URL and returns them. What I would like to do is store those returned contents in a file called doc_ID.txt. That is, I would like to finish my function by creating that file in case it didn't exist at the beginning.
How can I do that? I tried this:
my_text = my_response.text
output = os.rename(my_text, filename)
return output
but then, the actual contents of the file would become the name of the file and I would get an error saying the filename is too long.
So the issue I think I'm seeing is that you want to put the contents of your request's response into the file, rather than naming the file with the contents. The code below should create a file with the filename you want, and insert the text from your response!
import requests
import os

def read_doc(doc_ID):
    filename = doc_ID + ".txt"
    if not os.path.exists(filename):
        my_url = encode_url(doc_ID)  # this is a call to another function that would encode the url
        my_response = requests.get(my_url)
        if my_response.status_code == requests.codes.ok:
            with open(filename, "w") as file:
                file.write(my_response.text)
            return my_response.text  # return the text itself, not the (already closed) file object
    return None
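A quick usage sketch (my addition; "example" is just a hypothetical doc_ID): the first call downloads and caches the text, and when the file already exists the function returns None, so you can fall back to the cached copy:

text = read_doc("example")
if text is None and os.path.exists("example.txt"):
    # the document was cached on an earlier run, so read the local copy
    with open("example.txt") as cached:
        text = cached.read()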
To write the response text to the file, you can simply use a Python file object: https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files
with open(filename, "w") as file:
    file.write(my_text)
I have this code for the server:
@app.route('/get', methods=['GET'])
def get():
    return send_file("token.jpg", attachment_filename="token.jpg", mimetype='image/jpg')
and this code for getting the response:
r = requests.get(url + '/get')
And I need to save the file from the response to the hard drive, but I can't use r.files. What do I need to do in this situation?
Assuming the GET request is valid, you can use Python's built-in function open to open a file in binary mode and write the returned content to disk. Example below.
import requests

file_content = requests.get('http://yoururl/get')
with open("sample_image.png", "wb") as save_file:
    save_file.write(file_content.content)
As you can see, to write the image to disk, we use open, and write the returned content to 'sample_image.png'. Since your server-side code seems to be returning only one file, the example above should work for you.
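If you want to guard against a failed request before writing anything, requests provides raise_for_status; a minimal sketch (the URL is the same placeholder as above):

import requests

resp = requests.get('http://yoururl/get')
resp.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx response instead of saving a bad file
with open("sample_image.png", "wb") as save_file:
    save_file.write(resp.content)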
You can set the stream parameter and extract the filename from the HTTP headers. Then the raw data from the undecoded body can be read and saved chunk by chunk.
import os
import re
import requests

resp = requests.get('http://127.0.0.1:5000/get', stream=True)
name = re.findall('filename=(.+)', resp.headers['Content-Disposition'])[0]
dest = os.path.join(os.path.expanduser('~'), name)

with open(dest, 'wb') as fp:
    while True:
        chunk = resp.raw.read(1024)
        if not chunk:
            break
        fp.write(chunk)
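As a variation on the loop above (not from the original answer), requests also offers iter_content, which yields the body in chunks and spares you the manual while loop; a minimal sketch with a placeholder filename:

import requests

resp = requests.get('http://127.0.0.1:5000/get', stream=True)
with open('downloaded_file', 'wb') as fp:  # 'downloaded_file' is a placeholder name
    for chunk in resp.iter_content(chunk_size=1024):
        fp.write(chunk)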
I'm trying to download a load of text in Python and want it all saved to a single file.
The code I'm currently using creates a separate file for each url. It loops through an archive of urls, requests the data and then saves it to its own file.
filename = archive[i]
urllib.request.urlretrieve(url, path + filename + ".pgn")
I've tried using the same filename for each url but it just overwrites the file.
Is there a way to loop through the archive and, rather than saving the data in its own separate file, add each block of text to a single file? Or do I need to just loop through all the files afterwards and concatenate them together?
Python's urlretrieve docs say that
If you wish to retrieve a resource via URL and store it in a temporary location, you can do so via the urlretrieve() function
so if you wish to append the retrieved data to one file, you have to use urlopen for that, like this:
import urllib.request

filename = "MY_FILE_PATH"

# ----------- inside your i loop -------------
with urllib.request.urlopen(url) as response:
    data = response.read()  # bytes

# "ab" opens in append-binary mode, so each retrieved block is added to the same file
with open(filename + ".pgn", "ab") as fp:
    fp.write(data)
Note that urlretrieve might become deprecated at some point in the future, so use urlopen instead.
import urllib.request
import shutil

...

# open one fixed output file in append-binary mode so every download goes into it
# ("combined.pgn" is just an example name)
with urllib.request.urlopen(url) as response, open("combined.pgn", 'ab') as out_file:
    shutil.copyfileobj(response, out_file)
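Putting that together as a complete loop, a sketch under the assumption that your URLs live in a list (the names here are hypothetical):

import urllib.request
import shutil

urls = ["http://example.com/game1.pgn", "http://example.com/game2.pgn"]  # hypothetical archive URLs

with open("combined.pgn", 'ab') as out_file:
    for url in urls:
        with urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, out_file)  # append each download to the single file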
I have a long list of .json files that I need to download to my computer. I want to download them as .json files (so no parsing or anything like that at this point).
I have some code that works for small files, but it is pretty buggy. Also it doesn't handle multiple links well.
Appreciate any advice to fix up this code:
import os
filename = 'test.json'
path = "C:/Users//Master"
fullpath = os.path.join(path, filename)
import urllib2
url = 'https://www.premierlife.com/secure/index.json'
response = urllib2.urlopen(url)
webContent = response.read()
f = open(fullpath, 'w')
f.write(webContent)
f.close
It's creating a blank file because the f.close at the end should be f.close().
I took your code and made a little function, then called it in a loop that goes through a .txt file called "list_of_urls.txt" with one URL per line (you can change the delimiter in the split function if you want to format it differently).
import os
import urllib2

def save_json(url):
    filename = url.replace('/', '').replace(':', '')  # strip / and : so the URL can be used as a filename
    path = "C:/Users/Master"
    fullpath = os.path.join(path, filename)
    response = urllib2.urlopen(url)
    webContent = response.read()
    f = open(fullpath, 'w')
    f.write(webContent)
    f.close()
And then the loop:
f = open('list_of_urls.txt')
p = f.read()
url_list = p.split('\n')  # here \n is the line-break delimiter, which can be changed
for url in url_list:
    save_json(url)
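Since you mentioned the code "doesn't handle multiple links well", here is a slightly hardened sketch of the same loop (my addition, not part of the answer above) that skips blank lines and survives a bad URL:

import urllib2

with open('list_of_urls.txt') as f:
    for line in f:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        try:
            save_json(url)
        except urllib2.URLError as e:
            print "failed to fetch %s: %s" % (url, e)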
I am running Python 3.x. I have been working on some code for fetching data on currency names around the world from a currency website. The code is as follows:
def _fetch_currencies():
    import urllib.request
    import json
    f = urllib.request.urlopen('http://openexchangerates.org/api/currencies.json')
    charset = f.info().get_param('charset', 'utf8')
    data = f.read()
    decoded = json.loads(data.decode(charset))
    dumps = json.dumps(decoded, indent=4)
    return dumps
I then need to save it as a file locally, but I'm having some issue and can't see where.
Here is the code for saving the currencies:
def save_currencies(_fetch_currencies, filename):
    sorted_currencies = sorted(decoded.items())
    with open(filename, 'w') as my_csv:
        csv_writer = csv.writer(my_csv, delimiter=',')
        csv_writer.writerows(sorted_currencies)
They just don't seem to work together, apart from when I remove the line dumps = json.dumps(decoded, indent=4), but I need that line to be able to print the file as text. How do I get around deleting this line and still be able to save and print? How do I also pick where it saves?
Any help will be great. Thank you very much to anyone and everyone who answers/reads this.
I may be mistaken, but your "decoded" variable should be declared as global in both functions.
I would actually have _fetch_currencies() return a dictionary, and then I would pass that dictionary on to save_currencies(currencies_decoded, filename). For example:
import csv  # needed for csv.writer below

def _fetch_currencies():
    import urllib.request
    import json
    f = urllib.request.urlopen('http://openexchangerates.org/api/currencies.json')
    charset = f.info().get_param('charset', 'utf8')
    data = f.read()
    decoded = json.loads(data.decode(charset))
    return decoded

def save_currencies(currencies_decoded, filename):
    sorted_currencies = sorted(currencies_decoded.items())
    with open(filename, 'w') as my_csv:
        csv_writer = csv.writer(my_csv, delimiter=',')
        csv_writer.writerows(sorted_currencies)

my_currencies_decoded = _fetch_currencies()
save_currencies(my_currencies_decoded, "filename.csv")
Furthermore, if you would like to save your csv file to a certain location in your filesystem, you can import os and use the os.path.join() function to build the full path, including the filename. For example, to save your .csv file to a location called "/Documents/Location/Here", you can do:
import os

def save_currencies(currencies_decoded, filename):
    sorted_currencies = sorted(currencies_decoded.items())
    # join the directory parts AND the filename, otherwise open() is handed a directory
    with open(os.path.join("Documents", "Location", "Here", filename), 'w') as my_csv:
        csv_writer = csv.writer(my_csv, delimiter=',')
        csv_writer.writerows(sorted_currencies)
You can also use a relative path, so that if you're already in directory "Documents", and you'd like to save a file in "/Documents/Location/Here", you can instead just say:
with open(os.path.join("Location", "Here", filename), 'w') as my_csv:
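A small usage sketch of that (os.path.expanduser is standard library; the folder names are just examples), anchoring the path at the home directory so it works regardless of the current directory:

import os

dest = os.path.join(os.path.expanduser("~"), "Documents", "Location", "Here", "filename.csv")
save_currencies(my_currencies_decoded, dest)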
import urllib2
import urllib
import json
import urlparse

def main():
    f = open("C:\Users\Stern Marketing\Desktop\dumpaday.txt", "r")
    if f.mode == 'r':
        item = f.read()
        for x in item:
            urlParts = urlparse.urlsplit(x)
            filename = urlParts.path.split('/')[-1]
            urllib.urlretrieve(item.strip(), filename)

if __name__ == "__main__":
    main()
Looks like the script is still not working properly, I'm really not sure why... :S
Getting lots of errors...
urllib.urlretrieve("x", "0001.jpg")
This will try to download from the (static) URL "x".
The URL you actually want to download from is within the variable x, so you should write your line to reference that variable:
urllib.urlretrieve(x, "0001.jpg")
Also, you probably want to change the target filename for each download, so you don’t keep on overwriting it.
Regarding your filename update:
urlparse.urlsplit is a function that takes a URL and splits it into multiple parts. Those parts are returned from the function, so you need to save them in some variable.
One part is the path, which is what contains the file name. The path itself is a string on which you can call the split method to separate it by the / character. As you are interested in only the last part—the filename—you can discard everything else:
url = 'http://www.dumpaday.com/wp-content/uploads/2013/12/funny-160.jpg'
urlParts = urlparse.urlsplit(url)
print(urlParts.path) # /wp-content/uploads/2013/12/funny-160.jpg
filename = urlParts.path.split('/')[-1]
print(filename) # funny-160.jpg
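As an aside (my addition, not part of the original answer), os.path.basename does the same last-segment extraction and reads a little clearer:

import os
import urlparse  # Python 2; in Python 3 this lives in urllib.parse

url = 'http://www.dumpaday.com/wp-content/uploads/2013/12/funny-160.jpg'
filename = os.path.basename(urlparse.urlsplit(url).path)
print(filename)  # funny-160.jpg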
It should work like this:
import urllib2
import urllib
import json
import urlparse

def main():
    with open("C:\Users\Stern Marketing\Desktop\dumpaday.txt", "r") as f:
        for x in f:
            urlParts = urlparse.urlsplit(x.strip())
            filename = urlParts.path.split('/')[-1]
            urllib.urlretrieve(x.strip(), filename)

if __name__ == "__main__":
    main()
The readlines method of file objects returns lines with a trailing newline character (\n).
Change your loop to the following:
# By the way, you don't need readlines at all. Iterating over a file yields its lines.
for x in fl:
    urllib.urlretrieve(x.strip(), "0001.jpg")
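As noted earlier, a fixed name like "0001.jpg" gets overwritten on every iteration; a sketch (my addition) that numbers the files with enumerate instead:

for i, x in enumerate(fl, start=1):
    urllib.urlretrieve(x.strip(), "%04d.jpg" % i)  # saves 0001.jpg, 0002.jpg, ...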
Here is a solution that loops over images indexed 160 to 169 (xrange's upper bound is exclusive). You can adjust as needed. This creates a URL from the base, opens it via urllib2 and saves it as a binary file.
import urllib2

base_url = "http://www.dumpaday.com/wp-content/uploads/2013/12/funny-{}.jpg"

for n in xrange(160, 170):  # 160 through 169
    url = base_url.format(n)
    f_save = "{}.jpg".format(n)
    req = urllib2.urlopen(url)
    with open(f_save, 'wb') as FOUT:
        FOUT.write(req.read())
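Since urllib2 and xrange exist only in Python 2, here is a minimal Python 3 sketch of the same loop (my adaptation, using urllib.request, not part of the original answer):

import urllib.request

base_url = "http://www.dumpaday.com/wp-content/uploads/2013/12/funny-{}.jpg"

for n in range(160, 170):
    url = base_url.format(n)
    # stream the image bytes straight into a numbered .jpg file
    with urllib.request.urlopen(url) as req, open("{}.jpg".format(n), 'wb') as fout:
        fout.write(req.read())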