Error downloading a GIF image from a URL in Python

I want to save this GIF image to disk:
http://www.portaportese.it/telefono/es_2014043024395.gif
With all the code I have found around for downloading pictures, I end up with an error in the final saved picture such as:
GIF image was truncated or incomplete.
In a few words, the picture is not being saved correctly.
Is there anybody able to provide a correct solution that will download this picture to disk?
Every attempt returns an empty image. I tried this:
import urllib2
picture_page = "http://www.portaportese.it/telefono/es_2014043024395.gif"
opener1 = urllib2.build_opener()
page1 = opener1.open(picture_page)
my_picture = page1.read()
filename = "my_image.gif"
fout = open(filename, "wb")
fout.write(my_picture)
fout.close()

The problem does not lie with your Python code; the image that you are trying to download does not exist. If I use curl to place a HEAD request at that URL, you can see that no image is stored there: the response is plain text, not a GIF.
~ ❯❯❯ curl -I http://www.portaportese.it/telefono/es_2014043024395.gif
HTTP/1.1 200 OK
Date: Mon, 07 Jul 2014 19:35:05 GMT
Server: Apache/2.2.3 (Red Hat)
Connection: close
Content-Type: text/plain; charset=UTF-8
Compare that with this request to a known image source:
~ ❯❯❯ curl -I http://baconmockup.com/300/200
HTTP/1.1 200 OK
Date: Mon, 07 Jul 2014 19:35:42 GMT
Server: Apache/2.4.7 (Ubuntu)
X-Powered-By: PHP/5.5.9-1ubuntu4.2
Access-Control-Allow-Origin: *
Content-Length: 20564
Content-Disposition: inline; filename=brisket-300-200.jpg
Pragma: public
Cache-Control: public
Expires: Mon, 21 Jul 2014 19:35:43 GMT
Last-Modified: Mon, 20 Aug 2012 19:20:21 GMT
Vary: User-Agent
Content-Type: image/jpeg
If you change the URL in your code to a good image source, then it will work perfectly well.
import urllib2
picture_page = "http://baconmockup.com/300/200"
opener1 = urllib2.build_opener()
page1 = opener1.open(picture_page)
my_picture = page1.read()
filename = "my_image.gif"
fout = open(filename, "wb")
fout.write(my_picture)
fout.close()
I just ran this, and was given a picture of some tasty brisket.
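To guard against this kind of failure in general, you could check the response's Content-Type before writing anything to disk; a minimal sketch along the lines of the code above (treating anything under image/ as acceptable is just a reasonable heuristic, not part of the original answer):

import urllib2

picture_page = "http://baconmockup.com/300/200"

response = urllib2.urlopen(picture_page)
content_type = response.info().gettype()  # e.g. "image/jpeg" or "text/plain"

# only save the body if the server actually returned an image
if content_type.startswith("image/"):
    with open("my_image.gif", "wb") as fout:
        fout.write(response.read())
else:
    print "Not an image, got Content-Type: %s" % content_type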

Related

How to serve a PNG with Flask so it can be used as an attachment in Slack?

I am trying to send a PNG as an attachment, but it doesn't show up in Slack. I am setting the right parameters in the POST request, but Slack refuses to use the image I am providing.
I am using Flask to serve the static files:
from flask import Flask, make_response, send_file
app = Flask(__name__)

@app.route('/data/<path:path>')
def send_png(path):
    response = make_response(send_file("data/" + path))
    return response
When I call the URL in my browser, the file gets displayed without any issues. When I pass the URL to Slack as an attachment, the file doesn't show up.
When I pass the URL of an imgur image, the attachment does get displayed.
For that reason, I assume the issue lies somewhere in the content-type/file headers of the files Flask serves.
My file headers are:
HTTP/1.0 200 OK
Content-Length: 391777
Content-Type: image/png
Last-Modified: Fri, 02 Mar 2018 22:46:41 GMT
Cache-Control: public, max-age=43200
Expires: Sat, 03 Mar 2018 12:48:53 GMT
ETag: "1520030801.2465587-391777-4064615867"
Server: Werkzeug/0.11.11 Python/3.5.2
Date: Sat, 03 Mar 2018 00:48:53 GMT
Connection: keep-alive
I can also verify that Slack does request my attachment (it just doesn't display it, as said before):
[('User-Agent', 'Slackbot 1.0 (+https://api.slack.com/robots)'), ('X-Forwarded-For', '54.89.92.4'), ('Content-Type', ''), ('Accept-Encoding', 'gzip,deflate'), ('Accept', '*/*'), ('Host', 'XXXXXXXXXX'), ('Referer', 'https://slack.com'), ('Content-Length', ''), ('X-Forwarded-Proto', 'https')]
What about this? The image should go in files= if you want it sent as an attachment:
import io
import requests
from PIL import Image

img = Image.open('picture.png')
buf = io.BytesIO()
img.save(buf, format='PNG')  # re-encode the PIL image into an in-memory buffer
buf.seek(0)
r = requests.post("http://127.0.0.1", files={"file": ("picture.png", buf, "image/png")}, timeout=5)
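If the end goal is just to get the picture into a Slack channel, another option is to upload it straight to Slack's files.upload Web API method instead of serving it yourself. A minimal sketch, assuming a valid API token and channel ID (both are placeholders here):

import requests

SLACK_TOKEN = "xoxb-your-token"  # placeholder token
CHANNEL = "C0123456789"          # placeholder channel ID

with open("picture.png", "rb") as fh:
    r = requests.post(
        "https://slack.com/api/files.upload",
        data={"token": SLACK_TOKEN, "channels": CHANNEL, "title": "My PNG"},
        files={"file": ("picture.png", fh, "image/png")},
        timeout=5,
    )
print(r.json())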

CORS media confusion

I have an issue with CORS.
I am trying to access an MP3 file hosted on one server from a web page served by a different machine.
The file server is set up with https://gist.github.com/fxsjy/5465353.
When I access the path directly, it all works.
<httpProtocol>
<customHeaders>
<add name="Access-Control-Allow-Origin" value="*" />
<add name="Access-Control-Allow-Methods" value="POST, GET, OPTIONS" />
<add name="Access-Control-Allow-Headers" value="Content-type, Content-Length,Date,Last-Modified,Server" />
</customHeaders>
</httpProtocol>
I have the above in my web.config and the following in my HTML.
<audio id="myAudio"
controls="controls"
src="http://{A IP}:{A PORT}/{A File Path}/{A file}.mp3"
type="audio/mpeg">
Your user agent does not support the HTML5 Audio element.
</audio>
The page response headers are:
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/8.0
X-AspNetMvc-Version: 5.2
X-AspNet-Version: 4.0.30319
X-SourceFiles: =?UTF-8?B?QzpcVXNlcnNcc2Vhbi5oYW5zZm9yZFxEcm9wYm94XE11c2ljbzJcTXVzaWNvMlxNdXNpY28yXEhvbWVcVGVzdA==?=
X-Powered-By: ASP.NET
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST, GET, OPTIONS
Access-Control-Allow-Headers: Content-type, Content-Length,Date,Last-Modified,Server
Date: Tue, 23 Jun 2015 00:20:11 GMT
Content-Length: 1233
When I put the MP3 file locally it works; when I access it remotely I get:
MediaElementAudioSource outputs zeroes due to CORS access restrictions for http://{A IP}:{A PORT}/{A File Path}/{A file}.mp3
as an error in the Chrome console. Firefox loads without errors but does nothing.
Both requests to the MP3 are returned as 200.
The response headers are:
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/2.7.9
Date: Tue, 23 Jun 2015 00:20:12 GMT
Content-type: audio/mpeg
Content-Length: 2882414
Last-Modified: Mon, 07 Feb 2011 10:31:23 GMT
Can anyone see what I'm doing wrong? :(
Don't ask me why, but adding crossorigin="anonymous" to my audio tag fixes the issue...
<audio id="myAudio"
controls="controls"
src="TUNE.mp3"
type="audio/mpeg"
crossorigin="anonymous">
See https://bugzilla.mozilla.org/show_bug.cgi?id=937718
I also had to add the CORS header to the server response (a sketch of one way to do that with the Python file server follows).
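A minimal sketch of that, assuming the MP3 is served by the stock Python SimpleHTTPServer (the class name and port are illustrative):

import SimpleHTTPServer
import SocketServer

class CORSRequestHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def end_headers(self):
        # add the CORS header to every response before the headers are flushed
        self.send_header("Access-Control-Allow-Origin", "*")
        SimpleHTTPServer.SimpleHTTPRequestHandler.end_headers(self)

if __name__ == "__main__":
    PORT = 8000  # illustrative; use whatever port the file server runs on
    httpd = SocketServer.TCPServer(("", PORT), CORSRequestHandler)
    httpd.serve_forever()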

How to send huge files via an HTTP POST method in Python (upload large files)

I am trying to post a huge .ova file through an HTTP POST method in Python.
Response headers:
Pragma: no-cache
Date: Thu, 18 Jul 2013 11:17:13 GMT
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Apache-Coyote/1.1
Transfer-Encoding: chunked
Content-Language: en-US
Content-Type: application/json;charset=UTF-8
Cache-Control: no-cache, no-store, max-age=0
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Request headers:
Content-Type: application/json
Accept: application/json
xyzAPIVersion: 1.0
X-Requested-With: XMLHttpRequest
How can I send such a huge file (500 MB) through an HTTP POST to the REST API?
You could use the requests library:
import requests  # $ pip install requests

# `url` is your REST endpoint; passing the open file object lets
# requests stream the body instead of reading all 500 MB into memory.
with open("file.ova", "rb") as file:
    requests.post(url, data=file)
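If you want more control (for example a progress readout), requests also accepts a generator for data, in which case the body is sent with chunked transfer encoding. A sketch with an illustrative 1 MB chunk size, assuming the server accepts chunked uploads:

import os
import requests

def read_in_chunks(path, chunk_size=1024 * 1024):
    # yield the file in 1 MB chunks so the whole .ova never sits in memory
    total = os.path.getsize(path)
    sent = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            sent += len(chunk)
            print("sent %d of %d bytes" % (sent, total))
            yield chunk

requests.post(url, data=read_in_chunks("file.ova"))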

Why does urllib.urlopen(url) fail while urllib2.urlopen(url) works? What specifically about the server response is causing this?

I just want a better idea of what's going on here; I can of course "work around" the problem by using urllib2.
import urllib
import urllib2
url = "http://www.crutchfield.com/S-pqvJFyfA8KG/p_15410415/Dynamat-10415-Xtreme-Speaker-Kit.html"
# urllib2 works fine (foo.headers / foo.read() also behave)
foo = urllib2.urlopen(url)
# urllib throws errors though, what specifically is causing this?
bar = urllib.urlopen(url)
http://pae.st/AxDW/ shows this code in action with the exception/stack trace. foo.headers and foo.read() work fine.
stu#sente.cc ~ $: curl -I "http://www.crutchfield.com/S-pqvJFyfA8KG/p_15410415/Dynamat-10415-Xtreme-Speaker-Kit.html"
HTTP/1.1 302 Object Moved
Cache-Control: private
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
Location: /S-FSTWJcduy5w/p_15410415/Dynamat-10415-Xtreme-Speaker-Kit.html
Server: Microsoft-IIS/7.5
Set-Cookie: SESSIONID=FSTWJcduy5w; domain=.crutchfield.com; expires=Fri, 22-Feb-2013 22:06:43 GMT; path=/
Set-Cookie: SYSTEMID=0; domain=.crutchfield.com; expires=Fri, 22-Feb-2013 22:06:43 GMT; path=/
Set-Cookie: SESSIONDATE=02/23/2012 17:07:00; domain=.crutchfield.com; expires=Fri, 22-Feb-2013 22:06:43 GMT; path=/
X-AspNet-Version: 4.0.30319
HostName: cws105
Date: Thu, 23 Feb 2012 22:06:43 GMT
Thanks.
This server is both non-deterministic and sensitive to HTTP version. urllib2 is HTTP/1.1, urllib is HTTP/1.0. You can reproduce this by running curl --http1.0 -I "http://www.crutchfield.com/S-pqvJFyfA8KG/p_15410415/Dynamat-10415-Xtreme-Speaker-Kit.html"
a few times in a row. You should see the output curl: (52) Empty reply from server occasionally; that's the error urllib is reporting. (If you re-issue the request a bunch of times with urllib, it should succeed sometimes.)
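To see the same flakiness from Python rather than curl, you could retry the urllib call in a loop and count the failures; a small sketch (the ten attempts are an arbitrary choice):

import urllib

url = "http://www.crutchfield.com/S-pqvJFyfA8KG/p_15410415/Dynamat-10415-Xtreme-Speaker-Kit.html"

failures = 0
for attempt in range(10):
    try:
        urllib.urlopen(url).read()
    except IOError as e:
        # the empty HTTP/1.0 reply typically surfaces as an IOError here
        failures += 1
        print "attempt %d failed: %s" % (attempt, e)

print "%d of 10 attempts failed" % failures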
I solved the problem. I am simply using urllib2 instead of urllib now and everything works fine. Thank you all :)

Decoding response while opening a URL

I am using the following code to open a URL and retrieve its response:
import urllib2

def get_issue_report(query):
    request = urllib2.Request(query)
    response = urllib2.urlopen(request)
    response_headers = response.info()
    print response.read()
The response I get is as follows:
<?xml version='1.0' encoding='UTF-8'?><entry xmlns='http://www.w3.org/2005/Atom' xmlns:gd='http://schemas.google.com/g/2005' xmlns:issues='http://schemas.google.com/projecthosting/issues/2009' gd:etag='W/"DUUFQH47eCl7ImA9WxBbFEg."'><id>http://code.google.com/feeds/issues/p/chromium/issues/full/2</id><published>2008-08-30T16:00:21.000Z</published><updated>2010-03-13T05:13:31.000Z</updated><title>Testing if chromium id works</title><content type='html'><b>What steps will reproduce the problem?</b>
<b>1.</b>
<b>2.</b>
<b>3.</b>
<b>What is the expected output? What do you see instead?</b>
<b>Please use labels and text to provide additional information.</b>
</content><link rel='replies' type='application/atom+xml' href='http://code.google.com/feeds/issues/p/chromium/issues/2/comments/full'/><link rel='alternate' type='text/html' href='http://code.google.com/p/chromium/issues/detail?id=2'/><link rel='self' type='application/atom+xml' href='https://code.google.com/feeds/issues/p/chromium/issues/full/2'/><author><name>rah...#google.com</name><uri>/u/#VBJVRVdXDhZCVgJ%2FF3tbUV5SAw%3D%3D/</uri></author><issues:closedDate>2008-08-30T20:48:43.000Z</issues:closedDate><issues:id>2</issues:id><issues:label>Type-Bug</issues:label><issues:label>Priority-Medium</issues:label><issues:owner><issues:uri>/u/kuchhal#chromium.org/</issues:uri><issues:username>kuchhal#chromium.org</issues:username></issues:owner><issues:stars>4</issues:stars><issues:state>closed</issues:state><issues:status>Invalid</issues:status></entry>
I would like to get rid of the entity references like &lt;, &gt;, etc. I tried using
response.read().decode('utf-8')
but this doesn't help much.
Just in case, response.info() prints the following:
Content-Type: application/atom+xml; charset=UTF-8; type=entry
Expires: Fri, 01 Jul 2011 11:15:17 GMT
Date: Fri, 01 Jul 2011 11:15:17 GMT
Cache-Control: private, max-age=0, must-revalidate, no-transform
Vary: Accept, X-GData-Authorization, GData-Version
GData-Version: 1.0
ETag: W/"DUUFQH47eCl7ImA9WxBbFEg."
Last-Modified: Sat, 13 Mar 2010 05:13:31 GMT
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Server: GSE
Connection: close
Here's the URL : https://code.google.com/feeds/issues/p/chromium/issues/full/2
Sentinel has explained how you can decode entity references like &lt;, but there's a bit more to the problem than that.
The example you give suggests that you are reading an Atom feed. If you want to do this reliably in Python, then I recommend using Mark Pilgrim's Universal Feed Parser.
Here's how one would read the feed in your example:
>>> import feedparser
>>> d = feedparser.parse('http://code.google.com/feeds/issues/p/chromium/issues/full/2')
>>> len(d.entries)
1
>>> print d.entries[0].title
Testing if chromium id works
>>> print d.entries[0].description
<b>What steps will reproduce the problem?</b>
<b>1.</b>
<b>2.</b>
<b>3.</b>
<b>What is the expected output? What do you see instead?</b>
<b>Please use labels and text to provide additional information.</b>
Using feedparser is likely to be much more reliable and convenient than trying to do your own XML parsing, entity decoding, date parsing, HTML sanitization, and so on.
from HTMLParser import HTMLParser
import urllib2

query = "http://code.google.com/feeds/issues/p/chromium/issues/full/2"

def get_issue_report(query):
    request = urllib2.Request(query)
    response = urllib2.urlopen(request)
    response_headers = response.info()
    return response.read()

s = get_issue_report(query)
p = HTMLParser()
print p.unescape(s)
p.close()
Use xml.sax.saxutils.unescape():
http://docs.python.org/library/xml.sax.utils.html#module-xml.sax.saxutils
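For example, applied to the raw body returned by get_issue_report() in the snippet above (the extra entity mapping passed to unescape() is just an illustration):

from xml.sax.saxutils import unescape

body = get_issue_report(query)  # raw Atom XML from the earlier snippet
# &amp;, &lt; and &gt; are handled by default; other entities go in a dict
print unescape(body, {"&quot;": '"', "&apos;": "'"})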
