Python help - parse XML

Hey, I'm trying to parse the Yahoo Weather XML feed using Python, and this is the code:
#!/usr/bin/env python
import urllib
from xml.dom import minidom

WEATHER_URL = 'http://weather.yahooapis.com/forecastrss?w=55872649&u=c'
WEATHER_NS = 'http://xml.weather.yahoo.com/ns/rss/1.0'

# Fetch the RSS feed and parse it into a DOM tree
dom = minidom.parse(urllib.urlopen(WEATHER_URL))
# The yweather:condition element carries the current temperature
ycondition = dom.getElementsByTagNameNS(WEATHER_NS, 'condition')[0]
CURRENT_OUTDOOR_TEMP = ycondition.getAttribute('temp')
print(CURRENT_OUTDOOR_TEMP)
Why am I getting this error on IIS7?
Traceback (most recent call last):
  File "C:\inetpub\wwwroot\22.py", line 16, in <module>
    dom = minidom.parse(urllib.urlopen(WEATHER_URL))
  File "C:\Python27\lib\urllib.py", line 86, in urlopen
    return opener.open(url)
  File "C:\Python27\lib\urllib.py", line 207, in open
    return getattr(self, name)(url)
  File "C:\Python27\lib\urllib.py", line 344, in open_http
    h.endheaders(data)
  File "C:\Python27\lib\httplib.py", line 954, in endheaders
    self._send_output(message_body)
  File "C:\Python27\lib\httplib.py", line 814, in _send_output
    self.send(msg)
  File "C:\Python27\lib\httplib.py", line 776, in send
    self.connect()
  File "C:\Python27\lib\httplib.py", line 757, in connect
    self.timeout, self.source_address)
  File "C:\Python27\lib\socket.py", line 571, in create_connection
    raise err
IOError: [Errno socket error] [Errno 10061] No connection could be made because the target machine actively refused it
Please help...
thanks

It looks like your firewall is blocking access to weather.yahooapis.com.
Check your firewall logs.
Allow access to the domain weather.yahooapis.com.
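A quick way to confirm that from the IIS machine is a plain TCP connect (a minimal sketch; the host and port come from the question's URL, the timeout value is an arbitrary choice):
# Minimal connectivity check (Python 2): this raises the same
# "actively refused" error if a firewall blocks the host.
import socket

try:
    sock = socket.create_connection(('weather.yahooapis.com', 80), timeout=5)
    print "TCP connection OK"
    sock.close()
except socket.error as e:
    print "Connection failed:", e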

There could be several reasons:
You're behind a proxy, so your code should first go through the proxy and then call urlopen() (see the sketch after this list).
A firewall on your PC or gateway forbids connections to that site.
Your antivirus software is suspicious of your program (rare, but possible).
The website detected that you're a bot rather than a browser (for example from the User-Agent header) and closed the connection.
Make sure the server is not SSL-only.
Hope this helps you diagnose the problem.
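If the proxy turns out to be the culprit, here is a minimal sketch of pointing urllib2 at it before opening the URL; the proxy host and port are placeholders, not values from the question:
# Hypothetical proxy address; replace with your gateway's real host/port.
import urllib2

proxy = urllib2.ProxyHandler({'http': 'http://your.proxy.host:8080'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)

response = urllib2.urlopen('http://weather.yahooapis.com/forecastrss?w=55872649&u=c')
print response.read(200)  # print the first bytes just to confirm the request got through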

import json
import urllib
from Tkinter import *

# Ask Yahoo's public YQL endpoint for the forecast of "nagercoil, IND" as JSON
weather_api = urllib.urlopen('https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20weather.forecast%20where%20woeid%20in%20(select%20woeid%20from%20geo.places(1)%20where%20text%3D%22nagercoil%2C%20IND%22)&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys')
weather_json = json.load(weather_api)

# Pull the city, the textual forecast and the current temperature out of the payload
data_location = weather_json["query"]['results']['channel']['location']
place = data_location['city']
data_item = weather_json["query"]['results']['channel']['item']
fore_cast = data_item['forecast'][0]['text']
temp = data_item['condition']['temp']

# Show forecast, temperature and city in a small Tkinter window
root = Tk()
w0 = Label(root, fg='#FF6699', text=fore_cast, font=("Helvetica", 15))
w = Label(root, fg='blue', text=temp + u'\N{DEGREE SIGN}' + 'F', font=("Helvetica", 56))
w1 = Label(root, fg='#3399CC', text=place, font=("Helvetica", 15))
w0.pack()
w.pack()
w1.pack()
root.mainloop()
This is a simple weather forecast application using Python 2.7.
Pick the API from https://developer.yahoo.com/weather/, decode the JSON object (it contains a large amount of data), and then display the data with Tkinter for the final output.

Related

Connecting to a public FTP behind a corporate proxy

I'm trying to access a public FTP server from work, but I'm getting an error.
This is the code I use at home:
from ftplib import FTP
ftp = FTP('ftp.cetip.com.br')
ftp.login()
ftp.cwd('/MediaCDI')
ftp.quit()
It works fine at home, but at work I get this error:
Traceback (most recent call last):
File "C:\Users\TBMEPYG\Desktop\stack.py", line 3, in <module>
ftp = FTP('ftp.cetip.com.br')
File "C:\Users\TBMEPYG\AppData\Local\Continuum\Anaconda3\lib\ftplib.py", line 117, in __init__
self.connect(host)
File "C:\Users\TBMEPYG\AppData\Local\Continuum\Anaconda3\lib\ftplib.py", line 152, in connect
source_address=self.source_address)
File "C:\Users\TBMEPYG\AppData\Local\Continuum\Anaconda3\lib\socket.py", line 722, in create_connection
raise err
File "C:\Users\TBMEPYG\AppData\Local\Continuum\Anaconda3\lib\socket.py", line 713, in create_connection
sock.connect(sa)
[Finished in 21.3s]
When I was developing a scraper for an HTTP website, I solved this problem for HTTP using HTTPProxyAuth and requests. Just to illustrate, this was the code:
from requests.auth import HTTPProxyAuth
import requests
user = 'xxxx'
password = 'yyyy'
credenciais = HTTPProxyAuth(user, password)
params = {'Dt_Ref': data, 'TpInstFinanceiro': 'CRI', 'Tipo':'1','saida':'txt'}
proxy_access = {'http':'proxy.mywork/accelerated_pac_base.pac'}
url = 'http://www.anbima.com.br/reune/reune_down.asp'
r = requests.post(url, data = params, proxies = proxy_access , auth = credenciais)
Does anyone have any idea what I can do?
Thanks
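One direction worth trying (a hedged sketch, assuming the corporate proxy also accepts ftp:// URLs over HTTP, which many do; the proxy address and credentials are placeholders, not values from the question): Python 2's urllib can route an ftp:// URL through an HTTP proxy via its proxies argument.
# Hypothetical proxy URL with basic auth; adjust to your work proxy.
import urllib

proxies = {'ftp': 'http://user:password@proxy.example.com:8080'}
listing = urllib.urlopen('ftp://ftp.cetip.com.br/MediaCDI/', proxies=proxies).read()
print listing[:500]  # directory listing as returned by the proxy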

Downloading second file from ftp fails

I want to download multiple files from FTP in Python. My code works when I download just one file, but it does not work for more than one!
import urllib
urllib.urlretrieve('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/00/00/PMC1790863.tar.gz', 'file1.tar.gz')
urllib.urlretrieve('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/00/00/PMC2329613.tar.gz', 'file2.tar.gz')
The error says:
Traceback (most recent call last):
File "/home/ehsan/dev_center/bigADEVS-bknd/daemons/crawler/ftp_oa_crawler.py", line 3, in <module>
urllib.urlretrieve('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/00/00/PMC2329613.tar.gz', 'file2.tar.gz')
File "/usr/lib/python2.7/urllib.py", line 98, in urlretrieve
return opener.retrieve(url, filename, reporthook, data)
File "/usr/lib/python2.7/urllib.py", line 245, in retrieve
fp = self.open(url, data)
File "/usr/lib/python2.7/urllib.py", line 213, in open
return getattr(self, name)(url)
File "/usr/lib/python2.7/urllib.py", line 558, in open_ftp
(fp, retrlen) = self.ftpcache[key].retrfile(file, type)
File "/usr/lib/python2.7/urllib.py", line 906, in retrfile
conn, retrlen = self.ftp.ntransfercmd(cmd)
File "/usr/lib/python2.7/ftplib.py", line 334, in ntransfercmd
host, port = self.makepasv()
File "/usr/lib/python2.7/ftplib.py", line 312, in makepasv
host, port = parse227(self.sendcmd('PASV'))
File "/usr/lib/python2.7/ftplib.py", line 830, in parse227
raise error_reply, resp
IOError: [Errno ftp error] 200 Type set to I
What should I do?
It is a bug in urllib in Python 2.7, reported here. The reason behind it is explained here:
Now, when a user tries to download the same file or another file from the same directory, the key (host, port, dirs) remains the same, so open_ftp() skips ftp initialization. Because of this skipping, the previous FTP connection is reused and, when new commands are sent to the server, the server first sends the previous ACK. This causes a domino effect where each response gets delayed by one, and we get an exception from parse227().
A possible solution is to clear the cache that may have been built up by previous calls. You can call urllib.urlcleanup() between your urlretrieve calls to do this, as mentioned here.
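Applied to the code in the question, this becomes (a minimal sketch of the workaround described above):
import urllib

urllib.urlretrieve('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/00/00/PMC1790863.tar.gz', 'file1.tar.gz')
# drop the cached FTP connection so the next call starts a fresh one
urllib.urlcleanup()
urllib.urlretrieve('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/00/00/PMC2329613.tar.gz', 'file2.tar.gz')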
Hope this helps!

HTML Link parsing using BeautifulSoup

Here is my Python code, which I'm using to extract specific HTML from the page links I'm passing as a parameter. I'm using BeautifulSoup. This code sometimes works fine and sometimes it gets stuck!
import urllib
from bs4 import BeautifulSoup
rawHtml = ''
url = r'http://iasexamportal.com/civilservices/tag/voice-notes?page='
for i in range(1, 49):
    # iterate over the pages and accumulate their HTML
    sock = urllib.urlopen(url + str(i))
    html = sock.read()
    sock.close()
    rawHtml += html
    print i
Here I'm printing the loop variable to find out where it gets stuck. It shows me that it gets stuck randomly at any point in the loop.
soup = BeautifulSoup(rawHtml, 'html.parser')
t=''
for link in soup.find_all('a'):
    t += str(link.get('href')) + "</br>"
    #t += str(link) + "</br>"
f = open("Link.txt", 'w+')
f.write(t)
f.close()
What could be the possible issue? Is it a problem with the socket configuration or something else?
This is the error I got. I checked these links - python-gaierror-errno-11004, ioerror-errno-socket-error-errno-11004-getaddrinfo-failed - for a solution, but I didn't find them very helpful.
d:\python>python ext.py
Traceback (most recent call last):
File "ext.py", line 8, in <module>
sock = urllib.urlopen(url+ str(i))
File "d:\python\lib\urllib.py", line 87, in urlopen
return opener.open(url)
File "d:\python\lib\urllib.py", line 213, in open
return getattr(self, name)(url)
File "d:\python\lib\urllib.py", line 350, in open_http
h.endheaders(data)
File "d:\python\lib\httplib.py", line 1049, in endheaders
self._send_output(message_body)
File "d:\python\lib\httplib.py", line 893, in _send_output
self.send(msg)
File "d:\python\lib\httplib.py", line 855, in send
self.connect()
File "d:\python\lib\httplib.py", line 832, in connect
self.timeout, self.source_address)
File "d:\python\lib\socket.py", line 557, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno 11004] getaddrinfo failed
It runs perfectly fine on my personal laptop, but it gives this error on my office desktop. Also, my version of Python is 2.7. Hope this information helps.
Finally, guys... it worked! The same script worked when I checked on other PCs too, so the problem was probably the firewall or proxy settings of my office desktop, which were blocking this website.
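Independently of the firewall fix, a small hardening sketch (the timeout value and retry count are arbitrary choices, not from the question): setting a global socket timeout makes a stuck request raise an error instead of hanging forever, and a retry loop keeps the crawl going.
import socket
import urllib

socket.setdefaulttimeout(10)  # seconds; applies to the sockets urllib opens

def fetch(page_url, retries=3):
    # try a few times before giving up on a page
    for attempt in range(retries):
        try:
            sock = urllib.urlopen(page_url)
            try:
                return sock.read()
            finally:
                sock.close()
        except IOError as e:
            print "attempt %d failed: %s" % (attempt + 1, e)
    return ''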

python paramiko module error with callback

I'm trying to use the paramiko module to copy a (big) file over my local network, and to use the transfer progress to update a GtkProgressBar.
A part of my code is:
...
NetworkCopy.pbar.set_text("Copy of the file in the Pi...")
while gtk.events_pending():  # refresh the progress bar
    gtk.main_iteration()
self.connection(transferred, toBeTransferred)

def connection(self, transferred, toBeTransferred):
    sftp = self.sftp
    fichier_pc = self.fichier_pc
    chemin_pi = self.chemin_pi  # variable names are in French!
    fichier = self.fichier
    transferred = self.transferred
    toBeTransferred = self.toBeTransferred
    print "Transferred: {0}\tStill to send: {1}".format(transferred, toBeTransferred)
    sftp.put(fichier_pc, chemin_pi + fichier, callback=self.connection)
In the terminal, I can see
Transferred: 0 Still to send: 3762398252
for a while, but after 10s I have this error:
File "network_copier.py", line 158, in connection
sftp.put(fichier_pc, chemin_pi + fichier, callback=self.connection)
File "/usr/lib/python2.7/dist-packages/paramiko/sftp_client.py", line 615, in put
return self.putfo(fl, remotepath, os.stat(localpath).st_size, callback, confirm)
File "/usr/lib/python2.7/dist-packages/paramiko/sftp_client.py", line 577, in putfo
fr.close()
File "/usr/lib/python2.7/dist-packages/paramiko/sftp_file.py", line 67, in close
self._close(async=False)
File "/usr/lib/python2.7/dist-packages/paramiko/sftp_file.py", line 88, in _close
self.sftp._request(CMD_CLOSE, self.handle)
File "/usr/lib/python2.7/dist-packages/paramiko/sftp_client.py", line 689, in _request
return self._read_response(num)
File "/usr/lib/python2.7/dist-packages/paramiko/sftp_client.py", line 721, in _read_response
raise SSHException('Server connection dropped: %s' % (str(e),))
paramiko.SSHException: Server connection dropped:
I have version 1.12.2 of paramiko, from this PPA.
Thanks for your help
Edit: The solution is to use pexpect instead of paramiko. It works with big files.
See here
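For reference, the usual paramiko pattern is to call put() once and let the callback only report progress (calling put() again from inside the callback, as the question's code does, starts a new transfer recursively). A minimal sketch; the host, credentials and paths are placeholders:
import paramiko

def report_progress(transferred, total):
    # paramiko calls this repeatedly with (bytes sent so far, total bytes)
    print "Transferred: {0} / {1}".format(transferred, total)

transport = paramiko.Transport(('192.168.1.10', 22))
transport.connect(username='pi', password='raspberry')
sftp = paramiko.SFTPClient.from_transport(transport)
sftp.put('/home/user/big_file.img', '/home/pi/big_file.img', callback=report_progress)
sftp.close()
transport.close()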

Close urllib2 connection

I'm using urllib2 to load files from FTP and HTTP servers.
Some of the servers support only one connection per IP. The problem is that urllib2 does not close the connection instantly. Look at the example program:
from urllib2 import urlopen
from time import sleep
url = 'ftp://user:pass@host/big_file.ext'
def load_file(url):
    f = urlopen(url)
    loaded = 0
    while True:
        data = f.read(1024)
        if data == '':
            break
        loaded += len(data)
    f.close()
    #sleep(1)
    print('loaded {0}'.format(loaded))
load_file(url)
load_file(url)
The code loads two files (here the two files are the same) from an FTP server that supports only one connection. This will print the following log:
loaded 463675266
Traceback (most recent call last):
File "conection_test.py", line 20, in <module>
load_file(url)
File "conection_test.py", line 7, in load_file
f = urlopen(url)
File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.6/urllib2.py", line 391, in open
response = self._open(req, data)
File "/usr/lib/python2.6/urllib2.py", line 409, in _open
'_open', req)
File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib/python2.6/urllib2.py", line 1331, in ftp_open
fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
File "/usr/lib/python2.6/urllib2.py", line 1352, in connect_ftp
fw = ftpwrapper(user, passwd, host, port, dirs, timeout)
File "/usr/lib/python2.6/urllib.py", line 854, in __init__
self.init()
File "/usr/lib/python2.6/urllib.py", line 860, in init
self.ftp.connect(self.host, self.port, self.timeout)
File "/usr/lib/python2.6/ftplib.py", line 134, in connect
self.welcome = self.getresp()
File "/usr/lib/python2.6/ftplib.py", line 216, in getresp
raise error_temp, resp
urllib2.URLError: <urlopen error ftp error: 421 There are too many connections from your internet address.>
So the first file is loaded and the second fails because the first connection was not closed.
But when I use sleep(1) after f.close(), the error does not occur:
loaded 463675266
loaded 463675266
Is there any way to force the connection to close so that the second download does not fail?
The cause is indeed a file descriptor leak. We also found that with Jython the problem is much more obvious than with CPython.
A colleague proposed this solution:
fdurl = urllib2.urlopen(req, timeout=self.timeout)
realsock = fdurl.fp._sock.fp._sock  # we want to close the "real" socket later
req = urllib2.Request(url, header)
try:
    fdurl = urllib2.urlopen(req, timeout=self.timeout)
except urllib2.URLError, e:
    print "urlopen exception", e
realsock.close()
fdurl.close()
The fix is ugly, but does the job, no more "too many open connections".
Biggie: I think it's because the connection is not shut down with shutdown().
Note: close() releases the resource associated with a connection but does not necessarily close the connection immediately. If you want to close the connection in a timely fashion, call shutdown() before close().
You could try something like this before f.close():
import socket
f.fp._sock.fp._sock.shutdown(socket.SHUT_RDWR)
(And yes.. if that works, it's not Right(tm), but you'll know what the problem is.)
As of Python 2.7.1, urllib2 indeed leaks a file descriptor:
https://bugs.pypy.org/issue867
Alex Martelli answers a similar question. Read this: should I call close() after urllib.urlopen()?
In a nutshell:
import contextlib
with contextlib.closing(urllib.urlopen(u)) as x:
    # ...
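Applied to the question's load_file(), that suggestion looks roughly like this (a sketch; whether the FTP connection is released quickly enough for a one-connection-per-IP server still depends on the server):
import contextlib
from urllib2 import urlopen

def load_file(url):
    loaded = 0
    # closing() guarantees close() is called even if read() raises
    with contextlib.closing(urlopen(url)) as f:
        while True:
            data = f.read(1024)
            if not data:
                break
            loaded += len(data)
    print('loaded {0}'.format(loaded))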
