Downloading second file from ftp fails - python

I want to download multiple files from an FTP server in Python. My code works when I download just one file, but fails when I try more than one:
import urllib
urllib.urlretrieve('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/00/00/PMC1790863.tar.gz', 'file1.tar.gz')
urllib.urlretrieve('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/00/00/PMC2329613.tar.gz', 'file2.tar.gz')
An error say:
Traceback (most recent call last):
File "/home/ehsan/dev_center/bigADEVS-bknd/daemons/crawler/ftp_oa_crawler.py", line 3, in <module>
urllib.urlretrieve('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/00/00/PMC2329613.tar.gz', 'file2.tar.gz')
File "/usr/lib/python2.7/urllib.py", line 98, in urlretrieve
return opener.retrieve(url, filename, reporthook, data)
File "/usr/lib/python2.7/urllib.py", line 245, in retrieve
fp = self.open(url, data)
File "/usr/lib/python2.7/urllib.py", line 213, in open
return getattr(self, name)(url)
File "/usr/lib/python2.7/urllib.py", line 558, in open_ftp
(fp, retrlen) = self.ftpcache[key].retrfile(file, type)
File "/usr/lib/python2.7/urllib.py", line 906, in retrfile
conn, retrlen = self.ftp.ntransfercmd(cmd)
File "/usr/lib/python2.7/ftplib.py", line 334, in ntransfercmd
host, port = self.makepasv()
File "/usr/lib/python2.7/ftplib.py", line 312, in makepasv
host, port = parse227(self.sendcmd('PASV'))
File "/usr/lib/python2.7/ftplib.py", line 830, in parse227
raise error_reply, resp
IOError: [Errno ftp error] 200 Type set to I
What should I do?

This is a bug in urllib in Python 2.7. It is reported here, and the reason behind it is explained here:
Now, when a user tries to download the same file or another file from
same directory, the key (host, port, dirs) remains the same so
open_ftp() skips ftp initialization. Because of this skipping,
previous FTP connection is reused and when new commands are sent to
the server, server first sends the previous ACK. This causes a domino
effect and each response gets delayed by one and we get an exception
from parse227()
A possible solution is to clear the cache that may have been built up by previous calls. You can call urllib.urlcleanup() between your urlretrieve calls, as mentioned here.
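For example, a minimal sketch using the URLs from the question (the cleanup call simply discards urllib's cached FTP connection between the two downloads):
import urllib

urllib.urlretrieve('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/00/00/PMC1790863.tar.gz', 'file1.tar.gz')
urllib.urlcleanup()  # drop the cached FTP connection before starting the next download
urllib.urlretrieve('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/00/00/PMC2329613.tar.gz', 'file2.tar.gz')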
Hope this helps!

Related

Python: Uploading files with FTP_TLS - "550 The parameter is incorrect"

I'm trying to connect to an FTP server using TLS and upload a text file. The below code connects to the site just fine, but it's not uploading the file. Instead I'm getting the following error:
Traceback (most recent call last):
File "X:/HR & IT/Ryan/Python Scripts/ftps_connection_test.py", line 16, in <module>
ftps.storlines("STOR " + filename, open(filename,"r"))
File "C:\Python33\lib\ftplib.py", line 816, in storlines
with self.transfercmd(cmd) as conn:
File "C:\Python33\lib\ftplib.py", line 391, in transfercmd
return self.ntransfercmd(cmd, rest)[0]
File "C:\Python33\lib\ftplib.py", line 756, in ntransfercmd
conn, size = FTP.ntransfercmd(self, cmd, rest)
File "C:\Python33\lib\ftplib.py", line 357, in ntransfercmd
resp = self.sendcmd(cmd)
File "C:\Python33\lib\ftplib.py", line 264, in sendcmd
return self.getresp()
File "C:\Python33\lib\ftplib.py", line 238, in getresp
raise error_perm(resp)
ftplib.error_perm: 550 The parameter is incorrect.
There's probably something really basic I'm missing, my code is below and any help is much appreciated.
import os
from ftplib import FTP_TLS as f
# Open secure connection
ftps = f("ftp.foo.com")
ftps.login(username,password)
ftps.prot_p()
# Create the test txt file to upload
filename = r"c:\path\to\file"
testFile = open(filename,"w")
testFile.write("Test file with test text")
testFile.close()
# Transfer testFile
ftps.storlines("STOR " + filename, open(filename,"r"))
# Quit connection
ftps.quit()
I got the same error when trying to upload a file to an FTP server. In my case, the destination file name was not in a valid format. It was something like
data_20180411T12:00:12.3435Z.txt
I renamed it to something like
data_20180411T120012_3435Z.txt and then it worked.
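As a sketch of that kind of renaming in Python (which characters the server actually rejects is an assumption; here the colons are simply removed before the upload):
original_name = "data_20180411T12:00:12.3435Z.txt"
remote_name = original_name.replace(":", "")  # drop characters the server rejects in file names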
filename = r"c:\path\to\file"
is the absolute path to a local file. This same value is being passed in the STOR command, i.e.
ftps.storlines("STOR " + filename, open(filename,"r"))
attempts to perform a STOR c:\path\to\file operation. However, it is unlikely that this path exists on the remote server, and the ftplib.error_perm exception suggests that you don't have permission to write there (even if it does exist).
You could try this instead:
ftps.storlines("STOR " + os.path.basename(filename), open(filename,"r"))
which would issue a STOR file operation and upload the file to the default directory on the remote server. If you need to upload to a different path on the remote server, just add that to STOR.
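For example, a sketch that targets a specific remote directory (the /upload/ path is an assumption; use whatever directory actually exists on your server):
ftps.storlines("STOR /upload/" + os.path.basename(filename), open(filename, "r"))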

python paramiko module error with callback

I'm trying to use the paramiko module to copy a (big) file in my local network, and get the output to display a GtkProgressBar.
A part of my code is:
...
NetworkCopy.pbar.set_text("Copy of the file in the Pi...")
while gtk.events_pending():  # refresh the progress bar
    gtk.main_iteration()
self.connection(transferred, toBeTransferred)

def connection(self, transferred, toBeTransferred):
    sftp = self.sftp
    fichier_pc = self.fichier_pc
    chemin_pi = self.chemin_pi  # var names are in French!
    fichier = self.fichier
    transferred = self.transferred
    toBeTransferred = self.toBeTransferred
    print "Transferred: {0}\tStill to send: {1}".format(transferred, toBeTransferred)
    sftp.put(fichier_pc, chemin_pi + fichier, callback=self.connection)
In the terminal, I can see
Transferred: 0 Still to send: 3762398252
for a while, but after 10s I have this error:
File "network_copier.py", line 158, in connection
sftp.put(fichier_pc, chemin_pi + fichier, callback=self.connection)
File "/usr/lib/python2.7/dist-packages/paramiko/sftp_client.py", line 615, in put
return self.putfo(fl, remotepath, os.stat(localpath).st_size, callback, confirm)
File "/usr/lib/python2.7/dist-packages/paramiko/sftp_client.py", line 577, in putfo
fr.close()
File "/usr/lib/python2.7/dist-packages/paramiko/sftp_file.py", line 67, in close
self._close(async=False)
File "/usr/lib/python2.7/dist-packages/paramiko/sftp_file.py", line 88, in _close
self.sftp._request(CMD_CLOSE, self.handle)
File "/usr/lib/python2.7/dist-packages/paramiko/sftp_client.py", line 689, in _request
return self._read_response(num)
File "/usr/lib/python2.7/dist-packages/paramiko/sftp_client.py", line 721, in _read_response
raise SSHException('Server connection dropped: %s' % (str(e),))
paramiko.SSHException: Server connection dropped:
I have the 1.12.2 version of paramiko, from this ppa
Thanks for your help
Edit: The solution is to use pexpect instead of paramiko. It works with big files.
See here
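A minimal sketch of that approach (the host, remote path, and password are placeholders; pexpect drives an ordinary scp process and answers its password prompt):
import pexpect

# copy the big file to the Pi with scp; host, path and password are hypothetical
child = pexpect.spawn('scp /path/to/big_file pi@192.168.1.10:/home/pi/')
child.expect('password:')
child.sendline('raspberry')
child.expect(pexpect.EOF, timeout=None)  # block until the transfer finishes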

Python FTP download not working

I am trying to download a file from FTP using Python. I was able to successfully move into the directory but can't download the file.
The command I use is ftp.retrbinary('master.idx', open(fname,'wb').write)
The error is below. It looks like the command is looking for MASTER.IDX instead of master.idx.
The full path to the file I want to download is ftp://ftp.sec.gov/edgar/full-index/2011/QTR2/master.idx
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/ftplib.py", line 406, in retrbinary
conn = self.transfercmd(cmd, rest)
File "/usr/lib/python2.7/ftplib.py", line 368, in transfercmd
return self.ntransfercmd(cmd, rest)[0]
File "/usr/lib/python2.7/ftplib.py", line 331, in ntransfercmd
resp = self.sendcmd(cmd)
File "/usr/lib/python2.7/ftplib.py", line 244, in sendcmd
return self.getresp()
File "/usr/lib/python2.7/ftplib.py", line 219, in getresp
raise error_perm, resp
ftplib.error_perm: 500 MASTER.IDX not understood
I can't say why the name changes to uppercase. In any case, this is how I do it when using FTP; it may help you:
server = "URL.of.server"
directory = "directory/where/the/file/is"
filename = "nameoffile.txt"
from ftplib import FTP
ftp = FTP(server) #Set server address
ftp.login() # Connect to server
ftp.cwd(directory) # Move to the desired folder in server
ftp.retrbinary('RETR ' + filename,open(filename, 'wb').write) # Download file from server
ftp.close() # Close connection
I think the problem is the missing 'RETR ' prefix; if you don't include it, the server may not understand what you want to do.
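Adapted to the file from the question, a sketch might look like this (anonymous login is an assumption):
from ftplib import FTP

ftp = FTP('ftp.sec.gov')                # connect to the server
ftp.login()                             # anonymous login
ftp.cwd('edgar/full-index/2011/QTR2')   # move to the folder that holds the file
ftp.retrbinary('RETR master.idx', open('master.idx', 'wb').write)  # note the 'RETR ' prefix
ftp.close()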
Alternatively, use the wget module of Python instead. Here is an example snippet (note that wget.download expects a URL, here the one from the question):
import wget
fileurl = 'ftp://ftp.sec.gov/edgar/full-index/2011/QTR2/master.idx'
wget.download(fileurl)

Python ftplib - retrbinary fails with timeout for zero byte files

Using Python 2.6 and downloading files from an FTP server in passive mode, I found that retrbinary fails with a timeout if the source file is empty (0 bytes). Is this a bug or am I missing a configuration option?
ftp.retrbinary('RETR digital.conf', open('digital.conf','wb').write)
Downloading digital.conf
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "download.py", line 13, in run
ftp.retrbinary('RETR %s' % source, callback)
File "c:\Python26\lib\ftplib.py", line 398, in retrbinary
conn = self.transfercmd(cmd, rest)
File "c:\Python26\lib\ftplib.py", line 360, in transfercmd
return self.ntransfercmd(cmd, rest)[0]
File "c:\Python26\lib\ftplib.py", line 337, in ntransfercmd
resp = self.getresp()
File "c:\Python26\lib\ftplib.py", line 216, in getresp
raise error_temp, resp
ftplib.error_temp: 421 Timeout
Other non-zero byte files transfer fine.
Your session idle time is too long. You can pass a timeout when you instantiate ftplib's FTP object; otherwise, modify the FTP server software's configuration.
For example, if you use vsftpd, you can add the following configuration to vsftpd.conf:
idle_session_timeout=60000 # The default is 600 seconds
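On the client side, a sketch of passing a timeout when the FTP object is created (the host name and the 60-second value are assumptions):
from ftplib import FTP

ftp = FTP('ftp.example.com', timeout=60)  # socket-level timeout for this session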

Close urllib2 connection

I'm using urllib2 to load files from FTP and HTTP servers.
Some of the servers support only one connection per IP. The problem is that urllib2 does not close the connection instantly. Look at the example program:
from urllib2 import urlopen
from time import sleep
url = 'ftp://user:pass@host/big_file.ext'
def load_file(url):
    f = urlopen(url)
    loaded = 0
    while True:
        data = f.read(1024)
        if data == '':
            break
        loaded += len(data)
    f.close()
    #sleep(1)
    print('loaded {0}'.format(loaded))

load_file(url)
load_file(url)
The code loads two files (here the two files are the same) from an FTP server which supports only one connection. This will print the following log:
loaded 463675266
Traceback (most recent call last):
File "conection_test.py", line 20, in <module>
load_file(url)
File "conection_test.py", line 7, in load_file
f = urlopen(url)
File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.6/urllib2.py", line 391, in open
response = self._open(req, data)
File "/usr/lib/python2.6/urllib2.py", line 409, in _open
'_open', req)
File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib/python2.6/urllib2.py", line 1331, in ftp_open
fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
File "/usr/lib/python2.6/urllib2.py", line 1352, in connect_ftp
fw = ftpwrapper(user, passwd, host, port, dirs, timeout)
File "/usr/lib/python2.6/urllib.py", line 854, in __init__
self.init()
File "/usr/lib/python2.6/urllib.py", line 860, in init
self.ftp.connect(self.host, self.port, self.timeout)
File "/usr/lib/python2.6/ftplib.py", line 134, in connect
self.welcome = self.getresp()
File "/usr/lib/python2.6/ftplib.py", line 216, in getresp
raise error_temp, resp
urllib2.URLError: <urlopen error ftp error: 421 There are too many connections from your internet address.>
So the first file is loaded and the second fails because the first connection was not closed.
But when I use sleep(1) after f.close(), the error does not occur:
loaded 463675266
loaded 463675266
Is there any way to force close the connection so that the second download would not fail?
The cause is indeed a file descriptor leak. We also found that with Jython the problem is much more obvious than with CPython.
A colleague proposed this solution:
req = urllib2.Request(url, header)
try:
    fdurl = urllib2.urlopen(req, timeout=self.timeout)
except urllib2.URLError, e:
    print "urlopen exception", e
realsock = fdurl.fp._sock.fp._sock  # keep a reference to the "real" socket so we can close it later
# ... read from fdurl ...
realsock.close()
fdurl.close()
The fix is ugly, but it does the job: no more "too many open connections".
I think it's because the connection is not shut down with shutdown().
Note: close() releases the resource associated with a connection but does not necessarily close the connection immediately. If you want to close the connection in a timely fashion, call shutdown() before close().
You could try something like this before f.close():
import socket
f.fp._sock.fp._sock.shutdown(socket.SHUT_RDWR)
(And yes.. if that works, it's not Right(tm), but you'll know what the problem is.)
As of Python 2.7.1, urllib2 indeed leaks a file descriptor:
https://bugs.pypy.org/issue867
Alex Martelli answers a similar question. Read this: should I call close() after urllib.urlopen()?
In a nutshell:
import contextlib

with contextlib.closing(urllib.urlopen(u)) as x:
    # ...
