Airflow worker - Connection broken: IncompleteRead(0 bytes read) - python

We are running the Airflow worker and the webserver/scheduler as Docker images on Kubernetes Engine on EC2.
We have a task using KubernetesPodOperator that is resource-intensive and runs every 15 minutes.
We got this error as an email from the airflow-worker:
Try 2 out of 3
Exception:
('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
Log: Link
Host: airflow-worker-deployment-123456789
Log file: /usr/local/airflow/logs/DAG_NAME/TASK_NAME/2019-03-14T10:50:00+00:00.log
Mark success: Link
Any idea what it could be?

So, better late than never:
it is caused by a known bug in KubernetesPodOperator.
To avoid this behavior, set the operator's get_logs parameter to False. The default value is True.
Details here:
https://issues.apache.org/jira/browse/AIRFLOW-3534
https://issues.apache.org/jira/browse/AIRFLOW-5571
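
For illustration, here is a minimal sketch of where the parameter goes. The task name, pod name, and image are hypothetical placeholders; the import path matches the Airflow 1.10-era contrib package that the linked tickets concern.

from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

# Hypothetical task definition; the relevant part is get_logs=False,
# which stops the operator from streaming pod logs back through the
# Kubernetes API and thereby sidesteps the IncompleteRead failure.
resource_intensive_task = KubernetesPodOperator(
    task_id="resource_intensive_task",    # hypothetical
    name="resource-intensive-pod",        # hypothetical
    namespace="default",
    image="my-registry/my-image:latest",  # hypothetical
    get_logs=False,                       # default is True
    dag=dag,                              # assumes a surrounding DAG object
)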

Well, it means that the program logic didn't get the data it expected to receive from some socket. This can mean anything from an intermittent network problem to the data simply not arriving in time and the logic not being programmed to wait for it. If the task is automatically retried, you may not even need to worry about intermittent problems.
If you wish to diagnose this further, you need to gather some diagnostic information. Problems are always diagnosed by the same scenario:
Identify the exact place in the program where the problem manifests itself.
Examine the program's state at that moment and find out which of the values is wrong.
Trace the erroneous value back to its origin.
The first can be identified from the stack trace and/or by searching the codebase for the relevant logic. The second is done by debugging or debug printing. The third is usually done by rerunning the program with breakpoints set earlier, at the step in the logic that produces the erroneous value; in your case, you can only do that very slowly by waiting for the problem to happen again, so you are forced to make educated guesses from the codebase.
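
As an illustration of the first two steps, one could wrap the suspect call so the failure site and the program's state both land in the logs. Everything below is hypothetical scaffolding, not code from the failing task:

import logging
import traceback

log = logging.getLogger(__name__)

def run_with_diagnostics(operation, *args, **kwargs):
    """Run a flaky operation, recording where and with what state it fails."""
    try:
        return operation(*args, **kwargs)
    except Exception as exc:
        # Step 1: the stack trace pins down where the problem manifests.
        log.error("Operation failed: %r\n%s", exc, traceback.format_exc())
        # Step 2: dump the inputs so the erroneous value can be examined.
        log.error("State at failure: args=%r kwargs=%r", args, kwargs)
        raise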

Related

Process finished with exit code -1073740791 (0xC0000409) pycharm error

I am trying to use fastText with PyCharm. Whenever I run the code below:
import fastText
model = fastText.train_unsupervised("data_parsed.txt")
model.save_model("model")
The process exits with this error:
Process finished with exit code -1073740791 (0xC0000409)
What causes this error and what can be done to avoid it?
Are you using a Windows system? 0xC0000409 means stack buffer overflow, as seen in this Windows help link.
Below is some advice, taken from this link, for solving similar types of issues.
STATUS_STACK_BUFFER_OVERRUN is a /GS exception. They are thrown when Windows detects 'tampering' of a security cookie protecting a return address. It is probable that you are writing something past the end of a buffer, or writing something to a pointer that is pointing to the wrong place. However it is also possible that you have some dodgy memory or otherwise faulty hardware that is tripping validation code.
One thing that you could try is to disable the /GS switch (project properties, look for C/C++ -> Code Generation -> Buffer Security Check) and recompile. Running the code again may well cause an error that you can trap and trace. I think /GS is designed not to give you any info for security reasons.
Another thing you could do is run the code as-is on a different PC and see if it fails there; if it doesn't, that may point to a hardware problem.
Other strategies are to reduce the size of the training file by removing some text, and to reduce the size of the vocabulary by running some text normalisation.
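
For example, if the crash stems from memory pressure during training, the documented minCount and dim options can shrink the vocabulary and the embedding matrix. The values below are illustrative starting points, not recommendations:

import fastText

# Raising minCount drops rare words from the vocabulary; a smaller dim
# shrinks the embedding matrix. Both reduce memory use during training.
model = fastText.train_unsupervised(
    "data_parsed.txt",
    minCount=10,  # ignore words seen fewer than 10 times (default is 5)
    dim=50,       # smaller embedding dimension (default is 100)
)
model.save_model("model")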

socket.timeout; Explanation?

I am building a port scanning program (irrelevant to the question, just explaining the background), and I know the IP of the host, but not which ports are open. Hence, the scan.
It is in the early stages of development, so the error handling is bad, but not bad enough to explain why Python behaves this way.
It tries to connect to, say, 123.456.7.8, port 1. Obviously that's a ridiculous port to be open, so it throws an error. The error is No Route to Host or some such, right? Wrong! It is instead Operation Timed Out.
Okay, let's increase the timeout in case my calculations were incorrect.
... All that did was rinse and repeat!
About 20 minutes later, the timeout is at 20 seconds, and it is still timing out. Really? Why does Python raise a timed-out error instead of No route to host or similar?
I need to distinguish between time outs and connection failures, because there is a difference between late and nowhere. This prevents me from doing so, creating an infinite loop of hurry up and wait.
Whatever shall I do? Wherever shall I go?
Python's socket module is a thin wrapper around your platform's socket API, so the issue is unrelated to Python itself.
You will not necessarily get a No Route to Host error. Moreover, it is common for a firewall to simply drop packets sent to a filtered port, which manifests as a timeout in your code. See Drop vs. Reject (ignore the conclusion but read the explanation of what is happening).
As a workaround, make multiple concurrent connections and set a fixed timeout, or use raw sockets and send the packets yourself (you could use scapy to investigate the behavior).
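
For the timeout-versus-refusal distinction specifically, here is a minimal sketch (the host and port are placeholders; it checks errno rather than exception subclasses so it works on both Python 2 and 3):

import errno
import socket

def probe(host, port, timeout=3.0):
    """Classify roughly what happened to a TCP connect attempt."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "open"
    except socket.timeout:
        # No reply at all: typically a firewall silently dropping packets.
        return "filtered (timed out)"
    except socket.error as exc:
        if exc.errno == errno.ECONNREFUSED:
            # An RST came back: the host is reachable, the port is closed.
            return "closed (refused)"
        if exc.errno == errno.EHOSTUNREACH:
            return "no route to host"
        return "error: %s" % exc
    finally:
        s.close()

print(probe("192.0.2.1", 1))  # placeholder address from the TEST-NET range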

Python socket wait

I was wondering if there is a way I can tell Python to wait until it gets a response from a server before it continues running.
I am writing a turn-based game. I make the first move, the move is sent to the server, and the server passes it on to the other computer. Here is the problem: as it is no longer my turn, I want my game to wait until it gets a response from the server (i.e., until the other player makes a move). But this line:
data = self.sock.recv(1024)
hangs because (I think) it's not getting anything immediately. So I want to know how I can make it wait for something to happen and then keep going.
Thanks in advance.
The socket programming howto is relevant to this question, specifically this part:
Now we come to the major stumbling block of sockets - send and recv operate on the
network buffers. They do not necessarily handle all the bytes you hand them (or expect
from them), because their major focus is handling the network buffers. In general, they
return when the associated network buffers have been filled (send) or emptied (recv).
They then tell you how many bytes they handled. It is your responsibility to call them
again until your message has been completely dealt with.
...
One complication to be aware of: if your conversational protocol allows multiple
messages to be sent back to back (without some kind of reply), and you pass recv an
arbitrary chunk size, you may end up reading the start of a following message. You’ll
need to put that aside and hold onto it, until it's needed.
Prefixing the message with its length (say, as 5 numeric characters) gets more complex,
because (believe it or not), you may not get all 5 characters in one recv. In playing
around, you’ll get away with it; but in high network loads, your code will very quickly
break unless you use two recv loops - the first to determine the length, the second to
get the data part of the message. Nasty. This is also when you’ll discover that send
does not always manage to get rid of everything in one pass. And despite having read
this, you will eventually get bit by it!
The main takeaways from this are:
you'll need to establish either a FIXED message size, OR you'll need to send the size of the message at the beginning of the message
when calling socket.recv, pass the number of bytes you actually want (and I'm guessing you don't actually want 1024 bytes). Then use loops, because you are not guaranteed to get everything you want in a single call.
That line, sock.recv(1024), blocks until some data arrives (up to 1024 bytes) or the OS detects a socket error; it is not guaranteed to return your whole message in one call. You need some way to know the message size; this is why HTTP messages include a Content-Length header.
You can set a timeout with socket.settimeout to abort the read entirely if the expected number of bytes doesn't arrive in time.
You can also explore Python's non-blocking sockets using setblocking(0).
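
Putting both takeaways together, here is a minimal sketch assuming the two sides agree on a 4-byte length prefix (the helper names are illustrative):

import struct

def recv_exact(sock, n):
    """Loop until exactly n bytes have been read, or the peer closes."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # empty result means the peer closed the connection
            raise EOFError("socket closed with %d bytes outstanding" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_message(sock):
    """Read the fixed-size length header first, then the payload."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)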

PySerial write timed out -- how much data went through?

I have two applications interacting over a TCP/IP connection; now I need them to be able to interact over a serial connection as well.
There are a few differences between socket IO and serial IO that make porting less trivial than I hoped for.
One of the differences concerns the semantics of send/write timeouts and the assumptions an application may make about the amount of data successfully passed down the connection. Knowing this amount, the application also knows what leftover data it needs to transmit later, should it choose to do so.
Socket.send
A call like socket.send(string) may produce the following results (a sketch of a sender handling these cases follows this list):
1. The entire string has been accepted by the TCP/IP stack, and the length of the string is returned.
2. A part of the string has been accepted by the TCP/IP stack, and the length of that part is returned. The application may transmit the rest of the string later.
3. A socket.timeout exception is raised if the socket is configured to use timeouts and the sender overwhelms the connection with data. This means (if I understand it correctly) that no bytes of the string have been accepted by the TCP/IP stack, and hence the application may try to send the entire string later.
4. A socket.error exception is raised because of some issue with the connection.
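
For illustration, a sketch of a sender written against these semantics (a hypothetical helper, not code from the application; the handling of case 3 mirrors the question's reading that a timeout means nothing was accepted):

import socket

def send_with_retry(sock, data):
    """Keep calling send() until the whole buffer is accepted.

    Returns whatever could not be sent, so the caller can retry it later.
    Case 4 (socket.error) is left to propagate to the caller.
    """
    remaining = data
    while remaining:
        try:
            sent = sock.send(remaining)  # cases 1 and 2: full or partial acceptance
        except socket.timeout:
            return remaining             # case 3: retry this remainder later
        remaining = remaining[sent:]
    return b""                           # everything was accepted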
PySerial.Serial.write
The PySerial API documentation says the following about Serial.write(string):
write(data)
Parameters: data – Data to send.
Returns: Number of bytes written.
Raises: SerialTimeoutException – in case a write timeout is configured for the port and the time is exceeded.
Changed in version 2.5: write returned None in previous versions.
This spec leaves a few questions uncertain to me:
1. In which circumstances may write(data) return fewer bytes written than the length of the data? Is that only possible in non-blocking mode (writeTimeout=0)?
2. If I use a positive writeTimeout and the SerialTimeoutException is raised, how do I know how many bytes went into the connection?
I also observe some behaviors of serial.write that I did not expect.
The test tries sending a long string over a slow connection. The sending port uses 9600,8,N,1 and no flow control. The receiving port is open too but no attempts to read data from it are being made.
If the writeTimeout is positive but not large enough, the sender expectedly gets the SerialTimeoutException.
If the writeTimeout is set large enough, the sender expectedly gets all data written successfully (the receiver does not care to read; neither do we).
If the writeTimeout is set to None, the sender unexpectedly gets the SerialTimeoutException instead of blocking until all data goes down the connection. Am I missing something?
I do not know if that behavior is typical.
In case that matters, I experiment with PySerial on Windows 7 64-bit using two USB-to-COM adapters connected via a null-modem cable; that setup seems to be operational as two instances of Tera Term can talk to each other over it.
It would be helpful to know if people handle serial write timeouts in any way other than aborting the connection and notifying the user of the problem.
Since I currently do not know the amount of data written before the timeout has occurred, I am thinking of a workaround using non-blocking writes and maintaining the socket-like timeout semantics myself above that level. I do not expect this to be a terrifically efficient solution (:-)), but luckily my applications exchange relatively infrequent and short messages so the performance should be within the acceptable range.
[EDITED]
A closer look at non-blocking serial writes
I wrote a simple program to see if I understand how the non-blocking write works:
import serial
p1 = serial.Serial("COM11") # My USB-to-COM adapters appear at these high port numbers
p2 = serial.Serial("COM12")
message = "Hello! " * 10
print "%d bytes in the whole message: %r" % (len(message), message)
p1.writeTimeout = 0 # enabling non-blocking mode
bytes_written = p1.write(message)
print "Written %d bytes of the message: %r" % (bytes_written, message[:bytes_written])
print "Receiving back %d bytes of the message" % len(message)
message_read_back = p2.read(len(message))
print "Received back %d bytes of the message: %r" % (len(message_read_back), message_read_back)
p1.close()
p2.close()
The output I get is this:
70 bytes in the whole message: 'Hello! Hello! Hello! Hello! Hello! Hello! Hello! Hello! Hello! Hello! '
Written 0 bytes of the message: ''
Receiving back 70 bytes of the message
Received back 70 bytes of the message: 'Hello! Hello! Hello! Hello! Hello! Hello! Hello! Hello! Hello! Hello! '
I am very confused: the sender thinks no data was sent yet the receiver got it all. I must be missing something very fundamental here...
Any comments / suggestions / questions are very welcome!
Since it isn't documented, let's look at the source code. I only looked at the POSIX and Win32 implementations, but it's pretty obvious that on at least those two platforms:
There are no circumstances when write(data) may return fewer bytes written than the length of the data, timeout or otherwise; it always either returns the full len(data), or raises an exception.
If you use a positive writeTimeout and the SerialTimeoutException is raised, there is no way at all to tell how many bytes were sent.
In particular, on POSIX, the number of bytes sent so far is only stored on a local variable that's lost as soon as the exception is raised; on Windows, it just does a single overlapped WriteFile and raises an exception for anything but a successful "wrote everything".
I assume that you care about at least one of those two platforms. (And if not, you're probably not writing cross-platform code, and can look at the one platform you do care about.) So, there is no direct solution to your problem.
If the workaround you described is acceptable, or a different one (like writing exactly one byte at a time—which is probably even less efficient, but maybe simpler), do that.
Alternatively, you will have to edit the write implementations you care about (whether you do this by forking the package and editing your fork, monkeypatching Serial.write at runtime, or just writing a serial_write function and calling serial_write(port, data) instead of port.write(data) in your script) to provide the information you want.
That doesn't look too hard. For example, in the POSIX version, you just have to stash len(data)-t somewhere before either of the raise writeTimeoutError lines. You could stick it in an attribute of the Serial object, or pass it as an extra argument to the exception constructor. (Of course if you're trying to write a cross-platform program, and you don't know all of the platforms well enough to write the appropriate implementations, that isn't likely to be a good answer.)
And really, given that it's not that hard to implement what you want, you might want to add a feature request (ideally with a patch) on the pyserial tracker.
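
For completeness, here is a rough sketch of the byte-at-a-time variant mentioned above, which keeps an accurate count without patching pyserial (the helper name is hypothetical; writeTimeout is the pyserial 2.x attribute name used elsewhere in this question):

import serial

def serial_write(port, data, timeout):
    """Write data one byte per call, returning how many bytes were accepted.

    Because each write() sends at most one byte, a SerialTimeoutException
    can only lose that one byte, so the running count stays meaningful.
    """
    port.writeTimeout = timeout  # per-byte write timeout
    written = 0
    for i in range(len(data)):
        try:
            port.write(data[i:i + 1])
        except serial.SerialTimeoutException:
            return written       # caller may retry data[written:] later
        written += 1
    return written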

Python select.select, select.poll: Corrupted Double-linked List

I have a rather large client-server network application, written in Python. I'm using select.poll to provide asynchronous capabilities. For the past six months, everything has worked fine. However, recently I changed some things and allowed the client to reliably log off from the server. It appeared at first glance that the client was never receiving the request and, furthermore, that it was blocking. When I killed the process, I received the following output:
*** glibc detected *** /usr/bin/python: corrupted double-linked list: 0x0a9fea60 ***
======= Backtrace: =========
/lib/i386-linux-gnu/libc.so.6(+0x6cbe1)[0xd96be1]
/lib/i386-linux-gnu/libc.so.6(+0x6fc1c)[0xd99c1c]
/lib/i386-linux-gnu/libc.so.6(__libc_malloc+0x63)[0xd9b1d3]
/usr/lib/i386-linux-gnu/libxcb.so.1(+0x8ff6)[0xb30ff6]
/usr/lib/i386-linux-gnu/libxcb.so.1(+0x706d)[0xb2f06d]
/usr/lib/i386-linux-gnu/libxcb.so.1(+0x75b5)[0xb2f5b5]
/usr/lib/i386-linux-gnu/libxcb.so.1(xcb_writev+0x67)[0xb2f667]
/usr/lib/i386-linux-gnu/libX11.so.6(_XSend+0x14b)[0x59b42b]
/usr/lib/i386-linux-gnu/libX11.so.6(_XFlush+0x39)[0x59b889]
/usr/lib/i386-linux-gnu/libX11.so.6(XFlush+0x31)[0x57ba81]
/usr/lib/libSDL-1.2.so.0(+0x34dfe)[0x16adfe]
/usr/lib/libSDL-1.2.so.0(+0x37998)[0x16d998]
/usr/lib/libSDL-1.2.so.0(+0x393db)[0x16f3db]
/usr/lib/libSDL-1.2.so.0(SDL_PumpEvents+0x3d)[0x140d7d]
/usr/lib/libSDL-1.2.so.0(SDL_PollEvent+0x17)[0x140db7]
/usr/lib/libSDL-1.2.so.0(SDL_EventState+0x58)[0x140f78]
/usr/lib/libSDL-1.2.so.0(SDL_JoystickEventState+0x5b)[0x16810b]
/usr/lib/python2.7/dist-packages/pygame/joystick.so(+0x196d)[0x55896d]
/usr/lib/python2.7/dist-packages/pygame/base.so(+0x178a)[0x56078a]
/usr/lib/python2.7/dist-packages/pygame/base.so(+0x17c7)[0x5607c7]
/usr/bin/python(PyEval_EvalFrameEx+0x4332)[0x80de822]
/usr/bin/python(PyEval_EvalCodeEx+0x127)[0x80e11e7]
/usr/bin/python[0x8105a61]
/usr/bin/python(PyObject_Call+0x4a)[0x80a464a]
/usr/bin/python(PyEval_CallObjectWithKeywords+0x44)[0x80da034]
/usr/bin/python(Py_Finalize+0xc7)[0x8070ee1]
/usr/bin/python(Py_Main+0xc66)[0x805c109]
/usr/bin/python(main+0x1b)[0x805b25b]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0xd40e37]
/usr/bin/python[0x81074ad]
followed by a memory map, which I'm not posting for the sake of brevity. I ran the code under pdb and found that the client was blocking on the call to pollingObject.poll(0), which shouldn't block. So I changed that call to select.select([socket], [], [], 0), still without success. I'm using PyGame, if that makes a difference, as I know it sometimes does. I'm completely lost here. I know that Python overrides malloc; could it have something to do with that?
I managed to fix it by implementing the network code in C and calling it from Python.
It looks to me like PyGame is checking for input events after the X connection has been closed, due to finalizers. Calling anything in Xlib with a Display * that's already been passed to XCloseDisplay means accessing already-freed memory, of course, and if that's what's going on it isn't surprising that glibc's heap becomes corrupted.
If my diagnosis is correct, you won't be able to truly fix it at the application level, but producing a minimal test case and submitting it to the PyGame developers might be productive.
