I am using the following code to read the URLs in a text file and save the results in another text file:
import requests

with open('text.txt', 'r') as f:  # text file containing the URLs
    for url in f:
        f = requests.get(url)
        print(url)
        print(f.text, file=open("output.txt", "a"))  # output file
For some reason I am getting an {"error":"Permission denied"} message for each URL. I can paste the URL into the browser and get the correct response. I also tried the following code, and it worked fine on a single URL.
import requests
link = "http://vanhatpainetutkartat.maanmittauslaitos.fi/getFiles.php?path=W50%2F4%2F4524"
f = requests.get(link)
print(f.text, file=open("output11.txt", "a"))
The txt file contains the following URLs:
http://vanhatpainetutkartat.maanmittauslaitos.fi/getFiles.php?path=22_Topografikartta_20k%2F3%2F3742%2F374207
http://vanhatpainetutkartat.maanmittauslaitos.fi/getFiles.php?path=W50%2F4%2F4524
http://vanhatpainetutkartat.maanmittauslaitos.fi/getFiles.php?path=W50%2F4%2F4432
http://vanhatpainetutkartat.maanmittauslaitos.fi/getFiles.php?path=21_Peruskartta_20k%2F3%2F3341%2F334112
I assume I am missing something very simple...Any clues?
Many thanks
Each line has a trailing newline. Simply strip it:
for url in f:
    url = url.rstrip('\n')
    ...
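A minimal, self-contained illustration of why the raw line fails (the sample URLs here are made up):

```python
# Lines read from a file keep their trailing newline, so the URL sent to
# the server is "http://...\n" -- which this API rejects.
lines = ["http://example.com/a\n", "http://example.com/b\n"]

cleaned = [line.rstrip('\n') for line in lines]
print(cleaned)  # ['http://example.com/a', 'http://example.com/b']
```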
You have to use the content from the response. You can use this code in a loop:
import requests

download_url = "http://vanhatpainetutkartat.maanmittauslaitos.fi/getFiles.php?path=W50%2F4%2F4524"
response = requests.get(download_url, stream=True)
with open("document.txt", 'wb') as out_file:
    out_file.write(response.content)
print("Completed")
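If you run this in a loop, each pass overwrites document.txt. One way to give every download its own name is to derive it from the path query parameter of these getFiles.php URLs (a sketch; output_name is a hypothetical helper):

```python
from urllib.parse import urlparse, parse_qs

def output_name(url):
    # parse_qs also percent-decodes, so "W50%2F4%2F4524" becomes
    # "W50/4/4524"; replace the slashes to get a flat, filesystem-safe name.
    path = parse_qs(urlparse(url).query)['path'][0]
    return path.replace('/', '_') + '.txt'

print(output_name("http://vanhatpainetutkartat.maanmittauslaitos.fi/getFiles.php?path=W50%2F4%2F4524"))
# W50_4_4524.txt
```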
Related
I would like to download a file like this with Python: https://www.bbs.unibo.it/conferma/?var=FormScaricaBrochure&brochureid=61305
The problem is that it is not directly a link to the file; I only get the file id in the query string.
I tried this code:
import requests
remote_url = "https://www.bbs.unibo.it/conferma/"
r = requests.get(remote_url, params = {"var":"FormScaricaBrochure", "brochureid": 61305})
But only the HTML is returned. How can I get the attached pdf?
You can use this example to download the file using only the brochureid:
import requests

url = "https://www.bbs.unibo.it/wp-content/themes/bbs/brochure-download.php?post_id={brochureid}&presentazione=true"
brochureid = 61305

with open("file.pdf", "wb") as f_out:
    f_out.write(requests.get(url.format(brochureid=brochureid)).content)
This downloads the PDF to file.pdf.
I am able to save the response of a single URL, but I have a list of URLs in a .txt file and would like to save all the responses. How can I read the URLs from the .txt file and save the responses in Python?
This is what I currently have. Thanks!
import requests

data = requests.get('http://www.url.com')
with open('file.txt', 'w') as f:
    f.write(data.text)
You first want to read the URLs from a file ('infile.txt'), then iteratively send the requests and write the data to an outfile ('outfile.txt'):
import requests

with open('infile.txt', 'r') as f:
    urls = f.readlines()

datalist = []
for url in urls:
    data = requests.get(url.strip())  # strip the trailing newline
    datalist.append(data.text)

with open('outfile.txt', 'w') as f:
    for item in datalist:
        f.write("%s\n" % item)
I need to download a file from an external source. I am using Basic authentication to log in to the URL:
import requests

response = requests.get('<external url>', auth=('<username>', '<password>'))
data = response.json()
html = data['list'][0]['attachments'][0]['url']
print(html)

data = requests.get('<API URL to download the attachment>', auth=('<username>', '<password>'), stream=True)
print(data.content)
I am getting the output below:
<url to download the binary data>
\x00\x00\x13\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0f\xcb\x00\x00\x1e\x00\x1e\x00\xbe\x07\x00\x00.\xcf\x05\x00\x00\x00'
I expect to be able to download the Word document from that URL within the same session.
Working solution
import requests
import shutil

response = requests.get('<url>', auth=('<username>', '<password>'))
data = response.json()
html = data['list'][0]['attachments'][0]['url']
print(html)

data = requests.get('<url>', auth=('<username>', '<password>'), stream=True)
with open("C:/myfile.docx", 'wb') as f:
    data.raw.decode_content = True
    shutil.copyfileobj(data.raw, f)
I am able to download the file as it is.
When you want to download a file directly you can use shutil.copyfileobj():
https://docs.python.org/2/library/shutil.html#shutil.copyfileobj
You are already passing stream=True to requests, which is what you need to get a file-like object back. Just pass that as the source to copyfileobj().
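The same pattern works with any file-like source, which makes it easy to check without a network; here io.BytesIO stands in for response.raw:

```python
import io
import shutil

# copyfileobj streams from source to destination in chunks, so the
# whole payload never has to sit in memory at once.
source = io.BytesIO(b"binary document payload")
destination = io.BytesIO()
shutil.copyfileobj(source, destination)
print(destination.getvalue())  # b'binary document payload'
```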
I'm trying to open a csv file from a url but for some reason I get an error saying that there is an invalid mode or filename. I'm not sure what the issue is. Help?
url = "http://...."
data = open(url, "r")
read = csv.DictReader(data)
Download the stream, then process:
import csv
import urllib2

url = "http://httpbin.org/get"
response = urllib2.urlopen(url)
data = response.read()
read = csv.DictReader(data.splitlines())
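Note that csv.DictReader needs an iterable of lines; handing it one big string makes it iterate character by character. A quick local check with made-up data (Python 3 shown here):

```python
import csv
import io

# Two-row sample CSV held in memory; io.StringIO provides the
# line-by-line iteration that DictReader expects.
payload = "name,medals\nFinland,5\nNorway,8\n"
rows = list(csv.DictReader(io.StringIO(payload)))
print(rows[0])  # {'name': 'Finland', 'medals': '5'}
```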
I recommend pandas for this:
import pandas as pd

read = pd.read_csv("http://....", ...)

Please see the documentation.
You can do the following :
import csv
import urllib2

url = 'http://winterolympicsmedals.com/medals.csv'
response = urllib2.urlopen(url)
cr = csv.reader(response)

for row in cr:
    print row
Slightly tongue in cheek:
>>> import json
>>> for line in file('data.csv'):
...     print json.loads('[' + line + ']')
CSV is not a well-defined format; JSON is, so this will parse a certain type of CSV correctly every time.
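A quick local check of the trick (Python 3 syntax, made-up line) shows it works as long as every field is already a valid JSON literal; an unquoted string field, for example, would break it:

```python
import json

# Wrap one CSV line in brackets and let the JSON parser type the fields.
line = '1,"Helsinki",true'
row = json.loads('[' + line + ']')
print(row)  # [1, 'Helsinki', True]
```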
I am trying to automatically download files from a website using a list of URLs that I already have. The relevant part of my code looks like this:
for url in urls:
    if len(url) != 0:
        print url
Running this prints a list of urls as strings - as expected. However, when I add one new line as below:
for url in urls:
    if len(url) != 0:
        print url
        r = requests.get(url)
an error appears saying "Invalid URL u'Document Detail': No schema supplied." Before it breaks, it is supposed to print a URL. Previously, this printed the URL as expected; now it prints "Document Detail" instead. I'm not quite sure why this happens or how to resolve it.
Any help would be appreciated!
EDIT
urls = []
with open('filename.csv', 'rb') as f:
    reader = csv.reader(f)
    count = 0
    for row in reader:
        urls.append(row[34])
With reference to my comment, "Document Details" is the header of your csv. Skip it. Here's one way to do it.
urls = []
with open('filename.csv', 'rb') as f:
    read = f.readlines()
urls = [row.split(",")[34] for row in read[1:]]
It is also possible that the layout of your csv file has changed and the URL is no longer at index 34 (i.e. the 35th column, since indexing is zero-based).
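The header-skipping idea can also be written with the csv module, which copes with quoted fields that contain commas. A small self-contained sketch, using a hypothetical two-column sample instead of your 35-column file:

```python
import csv
import io

# Sample shaped like the question: a "Document Detail" header row,
# then data rows with the URL in one of the columns.
sample = ("Document Detail,URL\n"
          "report-1,http://example.com/a\n"
          "report-2,http://example.com/b\n")

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
urls = [row[1] for row in reader]
print(urls)  # ['http://example.com/a', 'http://example.com/b']
```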
Then you should convert url to a string explicitly:
for url in urls:
    if len(url) != 0:
        print str(url)
        r = requests.get(str(url))
And maybe you could show us a piece of your .csv file, please.