I am connecting to an API that returns a JSON object with several pieces of data, and I use that data to build an HTML page. I am having trouble downloading a local copy of the image with Python and including an image tag that links to it. When I run the code, I receive the error AttributeError: 'tuple' object has no attribute 'content'. I have the following code:
import urllib.request
import json
out = open('outfile.txt','w')
link = "https://api.nasa.gov/planetary/apod?api_key="
print(link)
resp = urllib.request.urlopen(link)
data = resp.read()
print(str(data, 'utf-8'))
returnJson = json.loads(data)
img_url = returnJson['url']
title = returnJson['title']
current_date = returnJson['date']
print(img_url)
print(title)
print(current_date)
resp = urllib.request.urlretrieve(img_url)
img_file_name = img_url.split('/')[-1]
with open(img_file_name, 'wb') as f:
    f.write(resp.content)
urllib.request.urlretrieve returns a tuple, which doesn't have a content attribute. Instead, it copies the content to a local file. Moreover, this function is legacy and may be deprecated in the future, according to the docs. I would recommend following the advice in the urllib.request docs, which is:
The Requests package is recommended for a higher-level HTTP client interface.
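For example, a minimal sketch of the same flow using requests (YOUR_API_KEY is a placeholder):
import requests

link = "https://api.nasa.gov/planetary/apod?api_key=YOUR_API_KEY"
resp = requests.get(link)
resp.raise_for_status()  # raise an exception on HTTP errors
data = resp.json()       # parse the JSON body directly

img_url = data['url']
img_file_name = img_url.split('/')[-1]

img = requests.get(img_url)  # fetch the image itself
img.raise_for_status()
with open(img_file_name, 'wb') as f:
    f.write(img.content)     # requests responses do have a .content attribute

# build the <img> tag for the generated HTML page
img_tag = '<img src="{}" alt="{}">'.format(img_file_name, data['title'])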
Firstly, your API key is in your question - you might want to edit that out so no one else uses it!
The error is in the last line of the sample you've given us:
f.write(resp.content)
At this point, resp is set to the return value of urllib.request.urlretrieve(img_url). However, urllib.request.urlretrieve actually returns a tuple: (filename, headers). filename is where the downloaded resource is stored on the system, and headers is the response headers for the request.
Modifying your code, I believe this is closer to what you want:
import os
# rest of your code here
(filename, headers) = urllib.request.urlretrieve(img_url)  # filename is the temp file urlretrieve wrote
img_file_name = img_url.split('/')[-1]
os.replace(filename, img_file_name)  # move the temp file to the name we want
EDIT: os.rename doesn't seem to like existing files, however os.replace does!
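Alternatively, since urlretrieve accepts an explicit filename as its second argument, you can skip the rename step entirely:
img_file_name = img_url.split('/')[-1]
urllib.request.urlretrieve(img_url, img_file_name)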
Related
Basically, my goal is to fetch the filename, the extension and the content of an image by its URL. And my function should work for both of these URLs:
easy case:
https://image.shutterstock.com/image-photo/bright-spring-view-cameo-island-260nw-1048185397.jpg
hard case (does not end with filename.extension):
https://images.unsplash.com/photo-1472214103451-9374bd1c798e?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
Currently, what I have looks like this:
from os.path import splitext, basename
from urllib.parse import urlparse

def get_filename_from_url(url):
    result = urlparse(url)
    filename, file_ext = splitext(basename(result.path))
    print(filename, file_ext)
This works fine for the easy case but apparently fails for the hard-case URL. I have a feeling that I can use Python's requests module to parse the headers and find the MIME type, and then use the mimetypes module's guess_extension functionality to extract the necessary data. So I went on to try this:
import requests
response = requests.get(url, stream=True)
I've seen the response headers described as the clue here, but the problem is that with the hard-case URL I get something strange in the response headers, and maybe my key issue is that I don't know the correct way to parse them to extract what I need.
I've tried a third approach using urlparse:
import os.path
from urllib.parse import urlparse

result = urlparse(url)
print(os.path.basename(result.path))  # 'photo-1472214103451-9374bd1c798e'
which yields the filename but, again, misses the extension.
The ideal solution would be to get the filename, file extension and file content in one go, preferably with a way to validate that the URL actually points to an image and not something else...
UPD:
The result[1] element in result = urllib.request.urlretrieve(url) seems to contain the Content-Type, but I can't figure out how to extract it correctly.
One way is to query the content type:
>>> from urllib.request import urlopen
>>> response = urlopen(url)
>>> response.info().get_content_type()
'image/jpeg'
or using urlretrieve as in your edit:
>>> response = urllib.request.urlretrieve(url)
>>> response[1].get_content_type()
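Putting it together, a sketch that returns filename, extension and content in one go, falling back to the Content-Type when the URL path has no extension (note that guess_extension may return '.jpe' for JPEGs on some Python versions):
import mimetypes
from os.path import splitext, basename
from urllib.parse import urlparse
from urllib.request import urlopen

def fetch_image(url):
    response = urlopen(url)
    content_type = response.info().get_content_type()
    if not content_type.startswith('image/'):
        raise ValueError('Not an image: ' + content_type)
    content = response.read()
    filename, file_ext = splitext(basename(urlparse(url).path))
    if not file_ext:
        # fall back to the MIME type from the response headers
        file_ext = mimetypes.guess_extension(content_type) or ''
    return filename, file_ext, content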
Please correct me if I am wrong, as I am a beginner in Python.
I have a web service URL which returns an XML file:
http://abc.tch.xyz.edu:000/patientlabtests/id/1345
I have a list of values, and I want to append each value to the URL, download the file for each value, and name each downloaded file after the value appended from the list.
It is possible to download one file at a time, but I have thousands of values in the list, and I was trying to write a function with a for loop and got stuck.
import requests

x = [1345, 7890, 4729]
for i in x:
    url = "http://abc.tch.xyz.edu:000/patientlabresults/id/{}".format(i)
    response = requests.get(url)
    # ****** Missing part of the code ********
    with open('.xml', 'wb') as file:
        file.write(response.content)
        file.close()
The downloaded files should be named like:
"1345patientlabresults.xml"
"7890patientlabresults.xml"
"4729patientlabresults.xml"
I know a part of the code is missing and I am unable to fill it in. I would really appreciate it if anyone could help me with this.
Accessing your web service URL does not seem to work. Check this:
import requests

x = [1345, 7890, 4729]
url2 = "http://abc.tch.xyz.edu:000/patientlabresults/id/"
for i in x:
    response = requests.get(url2 + str(i))  # i must be converted to a string
Note: when you use 'with' to open a file, you do not have to close the file, since it will be closed automatically:
with open(filename, mode) as file:
    file.write(data)
Since the URL you provided is not working, I am going to use a different one, and I hope you get the idea of how to write to a file using a custom name.
import requests

categories = ['fruit', 'car', 'dog']
url = "https://icanhazdadjoke.com/search?term="
for category in categories:
    response = requests.get(url + category)
    file_name = category + "_JOKES_2018"  # files will be saved as e.g. fruit_JOKES_2018.txt
    data = response.status_code           # storing the status code in the 'data' variable
    with open(file_name + ".txt", 'w+') as f:
        f.write(str(data))                # writing the status code of each request to the file
After running this code, the status code will be written in each of the files, and the files will be named as follows:
car_JOKES_2018.txt
dog_JOKES_2018.txt
fruit_JOKES_2018.txt
I hope this gives you an understanding of how to name the files and write to them.
I think you just want to create a path using str.format, as you (almost) are doing for the URL. Maybe something like the following:
import os.path
import requests

x = [1345, 7890, 4729]
for i in x:
    path = '{}patientlabresults.xml'.format(i)
    # ignore this file if we've already got it
    if os.path.exists(path):
        continue
    # try and get the file, raising an exception on failure
    url = 'http://abc.tch.xyz.edu:000/patientlabresults/id/{}'.format(i)
    res = requests.get(url)
    res.raise_for_status()
    # write the successful response out (in binary mode, since res.content is bytes)
    with open(path, 'wb') as fd:
        fd.write(res.content)
I've also added some error handling and better behaviour on re-running: files you already have are skipped.
With Python 3, I want to read an XML web page and save it to my local drive.
Also, if the file already exists, it must be overwritten.
I tested a script like this:
import urllib.request
xml = urllib.request.urlopen('URL')
data = xml.read()
file = open("file.xml","wb")
file.writelines(data)
file.close()
But I get an error:
TypeError: a bytes-like object is required, not 'int'
First suggestion: do what even the official urllib docs say and don't use urllib; use requests instead.
Your problem is that you use .writelines(), which expects an iterable of lines, not a bytes object (for once in Python, the error message is not very helpful). Use .write() instead:
import requests
resp = requests.get('URL')
with open('file.xml', 'wb') as foutput:
foutput.write(resp.content)
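Or, if you want to keep urllib, the minimal fix is to swap .writelines() for .write():
import urllib.request

xml = urllib.request.urlopen('URL')
data = xml.read()
with open("file.xml", "wb") as f:
    f.write(data)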
I found a solution:
from urllib.request import urlopen

xml = open("import.xml", "w")  # "w" truncates, so an existing file is overwritten
xml.write(urlopen('URL').read().decode('utf-8'))
xml.close()
Thanks for your help.
I am unable to download the converted file from the Zamzar API using a Python program, following the steps at https://developers.zamzar.com/docs, even though as far as I can tell I am using the code correctly, along with my API key. It only shows error code 20. I've wasted four hours on this error; can someone please help?
import requests
from requests.auth import HTTPBasicAuth

file_id = 291320
local_filename = 'afzal.txt'
api_key = 'my_key_of_zamzar_api'
endpoint = "https://sandbox.zamzar.com/v1/files/{}/content".format(file_id)
response = requests.get(endpoint, stream=True, auth=HTTPBasicAuth(api_key, ''))

try:
    with open(local_filename, 'wb') as f:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                f.flush()
    print("File downloaded")
except IOError:
    print("Error")
This is the code I am using to download the converted file.
This code easily converts files into different formats:
import requests
from requests.auth import HTTPBasicAuth

# ------------------------------------------------------------------------- #
api_key = 'Put_Your_API_KEY'         # your API key from developers.zamzar.com
source_file = "tmp/armash.pdf"       # source file path
target_file = "results/armash.txt"   # target file path and name
target_format = "txt"                # target format
# ------------------------------------------------------------------------- #

def check(job_id, api_key):
    check_endpoint = "https://sandbox.zamzar.com/v1/jobs/{}".format(job_id)
    response = requests.get(check_endpoint, auth=HTTPBasicAuth(api_key, ''))
    checked_data = response.json()
    value_list = checked_data['target_files']
    return value_list[0]['id']

def download(file_id, api_key, local_filename):
    download_endpoint = "https://sandbox.zamzar.com/v1/files/{}/content".format(file_id)
    download_response = requests.get(download_endpoint, stream=True, auth=HTTPBasicAuth(api_key, ''))
    try:
        with open(local_filename, 'wb') as f:
            for chunk in download_response.iter_content(chunk_size=1024):
                if chunk:
                    f.write(chunk)
                    f.flush()
        print("File downloaded")
    except IOError:
        print("Error")

endpoint = "https://sandbox.zamzar.com/v1/jobs"
file_content = {'source_file': open(source_file, 'rb')}
data_content = {'target_format': target_format}
res = requests.post(endpoint, data=data_content, files=file_content, auth=HTTPBasicAuth(api_key, ''))
data = res.json()
print(data)
print("=========== Job ID ============\n\n")
print(data['id'])

target_id = check(data['id'], api_key)
print("\n================= target_id ===========\n\n")
print(target_id)
download(target_id, api_key, target_file)
Hope this helps somebody!
I'm the lead developer for the Zamzar API.
So the Zamzar API docs contain a section on error codes (see https://developers.zamzar.com/docs#section-Error_codes). The relevant code for your error is:
{
  "message" : "API key was missing or invalid",
  "code" : 20
}
This can mean either that you did not specify an API key at all or that the API key used was invalid for the file you are attempting to download. It seems more likely to be the latter, since your code contains an api_key variable.
Looking at your code it's possible that you have used the job ID (291320) to try and download your file, when in fact you should be using a file ID.
Each conversion job can output 1 or more converted files, and you need to specify the file ID for the one you wish to grab. You can see a list of all converted file IDs for your job by querying /jobs/ID and looking at the target_files array. This is outlined in the API docs at https://developers.zamzar.com/docs#section-Download_the_converted_file
So if you change your code to use the file ID from the target_files array of your Job your download should spring into life.
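A rough sketch of that flow, reusing the sandbox endpoints and the job ID from your question:
import requests
from requests.auth import HTTPBasicAuth

api_key = 'my_key_of_zamzar_api'
job_id = 291320  # this is a *job* ID, not a file ID

# 1. Query the job and pull the converted file's ID out of target_files
job = requests.get("https://sandbox.zamzar.com/v1/jobs/{}".format(job_id),
                   auth=HTTPBasicAuth(api_key, '')).json()
file_id = job['target_files'][0]['id']

# 2. Download the content using the *file* ID
content_url = "https://sandbox.zamzar.com/v1/files/{}/content".format(file_id)
response = requests.get(content_url, stream=True, auth=HTTPBasicAuth(api_key, ''))
with open('afzal.txt', 'wb') as f:
    for chunk in response.iter_content(chunk_size=1024):
        f.write(chunk)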
I'm sorry you wasted time on this. Clearly, if it has reached S.O., our docs haven't done a good enough job of explaining this distinction, so we'll look at what we can do to make them clearer.
Happy converting!
I think this block of code is pretty close to being right, but something is throwing it off. I'm trying to loop through 10 URLs, download the contents of each to a text file, and make sure everything is structured orderly in a dataframe.
import pandas as pd

g = open("C:/Users/rshuell001/Desktop/MyData.txt", "w")
for i in range(0, 10):
    url = "http://www.pga.com/golf-courses/search?page={}&searchbox=Course%20Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0".format(i)
    df = pd.DataFrame.from_csv(url)
    print(df)
    g.write(str(df))
g.close()
The error that I get says:
CParserError: Error tokenizing data.
C error: Expected 1 fields in line 22, saw 2
I have no idea what that means. I only have 9 lines of code, so I don't know why it's mentioning a problem on line 22.
Can someone give me a push to get this working?
pandas.DataFrame.from_csv() takes a first argument which is either a path or a file-like handle, either of which should point to a valid CSV file.
You are providing it with a URL.
It seems that you want to use a different function: the top-level pandas.read_csv. This function will actually fetch the data for you from a valid URL, then parse it.
If for any reason you insist on using pandas.DataFrame.from_csv(), you will have to:
Get the text from the page.
Persist the text, or parts thereof, as a valid CSV file, or a file-like object.
Provide the path to the file, or the handler of the file-like, as the first argument to pandas.DataFrame.from_csv().
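For example, a minimal sketch using the top-level read_csv (the URL below is a hypothetical CSV endpoint):
import pandas as pd

# read_csv accepts a URL directly and fetches the data for you
url = "https://example.com/data.csv"
df = pd.read_csv(url)
print(df.head())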
I finally got it working. This is what I was trying to do all along.
import requests
from bs4 import BeautifulSoup
link = "http://www.pga.com/golf-courses/search?page=1&searchbox=Course%20Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0"
html = requests.get(link).text
soup = BeautifulSoup(html, "lxml")
res = soup.findAll("div", {"class": "views-field-nothing"})
for r in res:
    print("Address: " + r.find("span", {'class': 'field-content'}).text)