I've written some code to log on to an AS/400 FTP site, move to a certain directory, and locate the files I need to download. It works, but when there are MANY files to download I receive:
socket.error: [Errno 10054] An existing connection was
forcibly closed by the remote host
I log on and navigate to the appropriate directory successfully:
try:
    newSession = ftplib.FTP(URL, username, password)
    newSession.set_debuglevel(3)
    newSession.cwd("SOME DIRECTORY")
except ftplib.all_errors, e:
    print str(e).split(None, 1)
    sys.exit(0)
I grab a list of the files I need:
filesToDownload = filter(lambda x: "SOME_FILE_PREFIX" in x, newSession.nlst())
And here is where it dies (specifically the newSession.retrbinary('RETR '+f,tempFileVar.write) call):
for f in filesToDownload:
    newLocalFileName = f + ".edi"
    newLocalFilePath = os.path.join(directory, newLocalFileName)
    tempFileVar = open(newLocalFilePath, 'wb')
    newSession.retrbinary('RETR ' + f, tempFileVar.write)
    tempFileVar.close()
It downloads upwards of 85% of the files I need before I'm hit with Errno 10054, and I'm confused as to why it dies seemingly arbitrarily when it's so close to completion. My best guess right now is that I'm making too many requests to the FTP server while pulling these files.
Any advice or pointers would be awesome. I'm still trying to troubleshoot this.
There's no real answer to this, I suppose; it seems like the client's FTP server is at fault here, since it's incredibly unstable. The best I can do is a hacky workaround: catch the thrown socket error and resume where the previous session left off before being forcibly disconnected (see the sketch below). The client's IT team is finally looking into the problem on their end.
Sigh.
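For what it's worth, the workaround ended up looking roughly like the sketch below. It reuses the placeholder names from the snippets above (URL, username, password, directory, filesToDownload, "SOME DIRECTORY"), and there's no retry cap or back-off, so treat it as an outline rather than production code.

import ftplib
import os
import socket

remaining = list(filesToDownload)
while remaining:
    try:
        session = ftplib.FTP(URL, username, password)
        session.cwd("SOME DIRECTORY")
        while remaining:
            f = remaining[0]
            tempFileVar = open(os.path.join(directory, f + ".edi"), 'wb')
            try:
                session.retrbinary('RETR ' + f, tempFileVar.write)
            finally:
                tempFileVar.close()
            remaining.pop(0)  # drop the file only once it downloaded cleanly
        session.quit()
    except socket.error as e:
        print('Connection dropped (%s), reconnecting...' % e)
        # fall through and start a fresh session for whatever is left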
I am using Python 3.4 and pysftp (pysftp is supposed to work on 3.4).
Pysftp is a wrapper over paramiko.
I have no problem downloading files.
I can also upload small files.
When I upload files that take longer than a few seconds to complete, however, I get an error.
I monitored my internet connection: after about 3 seconds no more uploading is taking place,
and after ~5 minutes I receive an EOFError.
I also experimented with the paramiko module with the same results.
I can upload files using OpenSSH as well as FileZilla without a problem.
with pysftp.Connection(host="host", username="python",
                       password="pass", port=2222) as srv:
    print('server connected')
    srv.put(file_name)
I would like to be able to upload files larger than a few KB... what am I missing?
It seems that paramiko is not adjusting the window during file uploads. You can increase window_size manually:
import os
import pysftp

with pysftp.Connection(host="host", username="python",
                       password="pass", port=2222) as srv:
    print('server connected')
    channel = srv.sftp_client.get_channel()
    channel.lock.acquire()
    channel.out_window_size += os.stat(file_name).st_size
    channel.out_buffer_cv.notifyAll()
    channel.lock.release()
    srv.put(file_name)
It works for me, but sometimes it is not enough for large files, so I add some extra bytes (see below). I think some packets may be lost, and how many depends on the connection.
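Concretely, "extra bytes" just means padding the window-size line from the block above by an arbitrary safety margin, something like:

# Replaces the out_window_size line above; the 10% / 64 KB margin is an
# arbitrary guess on my part, not a value paramiko requires.
padding = max(os.stat(file_name).st_size // 10, 64 * 1024)
channel.out_window_size += os.stat(file_name).st_size + padding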
I have code that writes files to S3. The code was working fine:
conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket(BUCKET, validate=False)
k = Key(bucket)
k.key = self.filekey
k.set_metadata('Content-Type', 'text/javascript')
k.set_contents_from_string(json.dumps(self.output))
k.set_acl(FILE_ACL)
This was working just fine. Then I noticed I wasn't closing my connection so I added this line at the end:
conn.close()
Now, the file writes as before but, I'm seeing this error in my logs now
S3Connection instance has no attribute '_cache', unable to write file
Does anyone see what I'm doing wrong here, or know what's causing this? I noticed that none of the boto tutorials show people closing connections, but I know you should close your connections for I/O operations as a general rule...
EDIT
A note about this: when I comment out conn.close(), the error disappears.
I can't find that error message in the latest boto source code, so unfortunately I can't tell you what caused it. Recently, we had problems when we were NOT calling conn.close(), so there definitely is at least one case where you must close the connection. Here's my understanding of what's going on:
S3Connection (well, its parent class) handles almost all connectivity details transparently, and you shouldn't have to think about closing resource, reconnecting, etc.. This is why most tutorials and docs don't mention closing resources. In fact, I only know of one situation where you should close resources explicitly, which I describe at the bottom. Read on!
Under the covers, boto uses httplib. This client library supports HTTP 1.1 Keep-Alive, so it can and should keep the socket open so that it can perform multiple requests over the same connection.
AWS will close your connection (socket) for two reasons:
1. According to the boto source code, "AWS starts timing things out after three minutes." Presumably "things" means "idle connections."
2. According to Best Practices for Using Amazon S3, "S3 will accept up to 100 requests before it closes a connection (resulting in 'connection reset')."
Fortunately, boto works around the first case by recycling stale connections well before three minutes are up. Unfortunately, boto doesn't handle the second case quite so transparently:
When AWS closes a connection, your end of the connection goes into CLOSE_WAIT, which means that the socket is waiting for the application to execute close(). S3Connection handles connectivity details so transparently that you cannot actually do this directly! It's best to prevent it from happening in the first place.
So, circling back to the original question of when you need to close explicitly, if your application runs for a long time, keeps a reference to (reuses) a boto connection for a long time, and makes many boto S3 requests over that connection (thus triggering a "connection reset" on the socket by AWS), then you may find that more and more sockets are in CLOSE_WAIT. You can check for this condition on linux by calling netstat | grep CLOSE_WAIT. To prevent this, make an explicit call to boto's connection.close before you've made 100 requests. We make hundreds of thousands of S3 requests in a long running process, and we call connection.close after every, say, 80 requests.
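As a rough sketch of that pattern (the class name and the get() helper are placeholders, and the 80-request threshold is a judgment call rather than anything boto mandates):

from boto.s3.connection import S3Connection

class RecyclingS3Connection(object):
    """Sketch: close and rebuild the boto connection before AWS's
    ~100-requests-per-connection limit triggers a connection reset."""

    def __init__(self, access_key, secret_key, max_requests=80):
        self.access_key = access_key
        self.secret_key = secret_key
        self.max_requests = max_requests
        self.request_count = 0
        self.conn = S3Connection(access_key, secret_key)

    def get(self):
        # Counts boto calls, not HTTP requests, so it's only approximate:
        # a single boto call can issue more than one request.
        if self.request_count >= self.max_requests:
            self.conn.close()  # release the socket before AWS resets it
            self.conn = S3Connection(self.access_key, self.secret_key)
            self.request_count = 0
        self.request_count += 1
        return self.conn

# Usage: bucket = s3.get().get_bucket(BUCKET, validate=False)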
I am trying to make a very basic FTP client in Python, and within the first few lines of code I have already run into a problem.
My Code:
from ftplib import FTP
ftp = FTP('ftp.mysite.com')
With this code, and with countless different URLs tried, I always get the same error:
gaierror: [Errno 11004] getaddrinfo failed
I found myself here with this error trying to connect using the full path rather than just the hostname. Make sure you split that out and use cwd(path) after login().
For example:
ftp = FTP('ftp.ncdc.noaa.gov')
ftp.login()
ftp.cwd('pub/data/noaa/2013')
instead of:
# Doesn't work!!
ftp = FTP('ftp.ncdc.noaa.gov/pub/data/noaa')
ftp.login()
ftp.cwd('2013')
Kind of obvious in hindsight, but hopefully I help you notice your simple mistake!
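If you're starting from a full URL, one way to split the host from the path before handing them to ftplib is with the standard library's urlparse; this is just a sketch, reusing the NOAA example from above:

from ftplib import FTP

try:
    from urllib.parse import urlparse   # Python 3
except ImportError:
    from urlparse import urlparse       # Python 2

full_url = 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2013'
parsed = urlparse(full_url)

ftp = FTP(parsed.netloc)   # host only: 'ftp.ncdc.noaa.gov'
ftp.login()
ftp.cwd(parsed.path)       # path only: '/pub/data/noaa/2013'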
Actually, this means that your computer can't resolve the domain name that you gave it. A detailed error description is available here. Try a well-known working FTP server to test with (e.g. ftp.microsoft.com). Then try to open the FTP server you're trying to access with some FTP client.
At work, I have tools that send emails when they fail. I have a blacklist, which I put myself on, so errors I make don't send anything. But I recently needed to update this code, and in testing it I found that I can't send emails anymore.
Everyone else is able to send emails, just not me. I am able to use my regular email client (Outlook) and I'm able to ping the mail server. The IT guys aren't familiar with Python and, as far as they can tell, my profile settings look fine. Just the simple line server = smtplib.SMTP(smtpserver) fails with:
error: [Errno 10061] No connection could be made because the target machine actively refused it
For info's sake, I'm running Vista with Python 2.6, though nearly everyone else is running almost identical machines with no issues. We are all running the same security settings and the same anti-virus software, and disabling them still doesn't help.
Does anyone have any ideas on how to figure out why this is happening? Maybe there is some bizarre setting on my profile in the workgroup? I'm at a loss on how to figure this out. All my searching leads to other people who didn't start their server, or something like that, but that is not my case.
You might want to use Ethereal to see what's going on. It may give you more information.
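A quick way to narrow it down before breaking out a packet capture is a bare TCP connection to the mail server; this sketch assumes port 25 and a placeholder hostname, so adjust both for your setup:

import socket

smtpserver = 'mail.example.com'   # placeholder: use your actual mail server
port = 25                         # assumption: change if your server uses 587/465

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(10)
try:
    s.connect((smtpserver, port))
    print('TCP connect succeeded; the refusal is happening above the socket layer')
except socket.error as e:
    # Errno 10061 here as well points at a firewall, proxy, or server-side
    # block on this machine rather than anything smtplib is doing.
    print('TCP connect failed: %s' % e)
finally:
    s.close()

If the raw connect is refused too, the problem is in the network path or host configuration, which is exactly where a packet capture will help.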
I don't know if I'm doing something wrong, but I'm 100% sure it's the Python script that brings down my Internet connection.
I wrote a Python script to scrape header info for thousands of files, mainly the Content-Length from a HEAD request, to get the exact size of each file.
Sample code:
import urllib2

class HeadRequest(urllib2.Request):
    def get_method(self):
        return "HEAD"

response = urllib2.urlopen(HeadRequest("http://www.google.com"))
print response.info()
The thing is, after several hours of running, the script starts to throw "urlopen error timed out", and my Internet connection is down from then on. My Internet connection always comes back immediately after I close the script. At first I thought the connection might just be unstable, but after several runs it turned out to be the script's fault.
I don't know why; should this be considered a bug? Or has my ISP banned me for doing such things? (I already set the program to wait 10s between requests.)
BTW, I'm using a VPN; does that have something to do with this?
I'd guess that either your ISP or VPN provider is limiting you because of high-volume suspicious traffic, or your router or VPN tunnel is getting clogged up with half-open connections. Consumer internet is REALLY not intended for spider-type activities.
"the script starts to throw out urlopen error timed out"
We can't even begin to guess.
You need to gather data on your computer and include that data in your question.
Get another computer. Run your script. Is the other computer's internet access blocked also? Or does it still work?
If both computers are blocked, it's not your software, it's your provider. Update Your Question with this information, and how you got it.
If only the computer running the script is blocked, it's not your provider; it's your OS resources being exhausted. This is harder to diagnose because it could be memory, sockets or file descriptors. Usually it's sockets.
You need to find some ifconfig/ipconfig diagnostic software for your operating system. You need to update your question to state exactly what operating system you're using. You need to use this diagnostic software to see how many open sockets are cluttering up your system.
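If it does turn out to be sockets, one thing worth checking is whether each HEAD response gets closed before the next request; the posted snippet doesn't show the full loop, so this is a guess at what a leak-free version might look like:

import urllib2

class HeadRequest(urllib2.Request):
    def get_method(self):
        return "HEAD"

def head_content_length(url):
    """Fetch only the headers and make sure the socket is released."""
    response = urllib2.urlopen(HeadRequest(url), timeout=30)
    try:
        return response.info().getheader('Content-Length')
    finally:
        response.close()   # without this, each request can leave a socket behind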