Unable to Poll for Binary Messages with `kafka-python`

I have a Kafka topic that is receiving binary data (raw packet capture data). I can confirm that data is indeed landing on the topic using the Kafka CLI tools; I receive multiple messages each second.
kafka-console-consumer.sh --zookeeper svr:2181 --topic test
But when I use kafka-python, I cannot retrieve any messages. The poll method simply returns no results.
(Pdb) consumer = kafka.KafkaConsumer("test", bootstrap_servers=["svr:9092"])
(Pdb) consumer.poll(5000)
{}
I have been able to use kafka-python to pull messages from a separate topic that contains just text strings.
I am curious whether kafka-python is internally dropping the messages because they are binary and fail some sort of validation. How can I dig deeper and see why no messages can be retrieved?

The problem was that the data sent to the topic was using snappy compression. All I had to do was install an additional module to handle snappy.
pip install python-snappy
Unfortunately, using the code I outlined in the question, it simply returns no data rather than telling me that the issue is related to compression.
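As a sanity check, kafka-python exposes the same helper that the assertion in the traceback below relies on, so you can verify snappy support directly. A minimal sketch (the kafka.codec import path is taken from the traceback):
from kafka.codec import has_snappy

# Should print True once python-snappy is installed; if it prints False,
# the new consumer silently returns nothing for snappy-compressed batches.
print(has_snappy())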
For comparison, I used the older consumer API which does correctly report the problem and led me to this solution.
>>> client = kafka.SimpleClient("svr:9092")
>>> consumer.close()
>>> consumer = kafka.SimpleConsumer(client, "group", "test")
>>> for message in consumer:
...     print(message)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/kafka/consumer/simple.py", line 353, in __iter__
    message = self.get_message(True, timeout)
  File "/usr/lib/python2.7/site-packages/kafka/consumer/simple.py", line 305, in get_message
    return self._get_message(block, timeout, get_partition_info)
  File "/usr/lib/python2.7/site-packages/kafka/consumer/simple.py", line 320, in _get_message
    self._fetch()
  File "/usr/lib/python2.7/site-packages/kafka/consumer/simple.py", line 379, in _fetch
    fail_on_error=False
  File "/usr/lib/python2.7/site-packages/kafka/client.py", line 665, in send_fetch_request
    KafkaProtocol.decode_fetch_response)
  File "/usr/lib/python2.7/site-packages/kafka/client.py", line 295, in _send_broker_aware_request
    for payload_response in decoder_fn(future.value):
  File "/usr/lib/python2.7/site-packages/kafka/protocol/legacy.py", line 212, in decode_fetch_response
    for partition, error, highwater_offset, messages in partitions
  File "/usr/lib/python2.7/site-packages/kafka/protocol/legacy.py", line 219, in decode_message_set
    inner_messages = message.decompress()
  File "/usr/lib/python2.7/site-packages/kafka/protocol/message.py", line 121, in decompress
    assert has_snappy(), 'Snappy decompression unsupported'
AssertionError: Snappy decompression unsupported

Related

Can't connect to osquery daemon using Python

I am trying to use osquery's evented tables from Python, but I am getting an exception. How can I use evented tables?
import osquery

if __name__ == "__main__":
    instance = osquery.ExtensionClient('\\.\pipe\osquery.em')
    instance.open()
    while True:
        client = instance.extension_client()
        results = client.query("SELECT * FROM ntfs_journal_events;")
        if results.response:
            print(results.response)
            break
    instance.connection = None
The error I am getting is:
Traceback (most recent call last):
  File "C:\Users\Yash\OneDrive - Incrux Technologies Private Limited\Desktop\Incrux\osquery3.py", line 11, in <module>
    results=client.query("SELECT * FROM ntfs_journal_events;")
  File "C:\Users\Yash\AppData\Local\Programs\Python\Python310\lib\site-packages\osquery\extensions\ExtensionManager.py", line 181, in query
    self.send_query(sql)
  File "C:\Users\Yash\AppData\Local\Programs\Python\Python310\lib\site-packages\osquery\extensions\ExtensionManager.py", line 190, in send_query
    self._oprot.trans.flush()
  File "C:\Users\Yash\AppData\Local\Programs\Python\Python310\lib\site-packages\thrift\transport\TTransport.py", line 179, in flush
    self.__trans.write(out)
  File "C:\Users\Yash\AppData\Local\Programs\Python\Python310\lib\site-packages\osquery\TPipe.py", line 126, in write
    raise TTransportException(
thrift.transport.TTransport.TTransportException: Called read on non-open pipe
"Called read on non-open pipe" sounds like osquery isn't listening on that pipe. Is osquery running? Are you sure that's the socket path?
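To rule out the transport first, a minimal sketch like this (reusing the pipe path from the question; ping() is the extension manager's Thrift health-check call) fails fast when nothing is listening:
import osquery

# Connectivity check: if osqueryd is not listening on this named pipe
# (or was started with a different --extensions_socket path), open() or
# ping() raises a TTransportException like the one above.
instance = osquery.ExtensionClient('\\.\pipe\osquery.em')
instance.open()
client = instance.extension_client()
print(client.ping())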

GUnicorn and shared dictionary on REST API: "Ran out of input" Error on high load

I am using a manager.dict to synchronize some data between multiple workers of an API served with GUnicorn (with Meinheld workers). While this works fine for a few concurrent queries, it breaks when I fire about 100 queries at the API simultaneously, and I get the following stack trace:
2020-07-16 12:35:38,972-app.api.my_resource-ERROR-140298393573184-on_post-175-Ran out of input
Traceback (most recent call last):
  File "/app/api/my_resource.py", line 163, in on_post
    results = self.do_something(a, b, c, **d)
  File "/app/user_data/data_lookup.py", line 39, in lookup_something
    return (a in self._shared_dict
  File "<string>", line 2, in __contains__
  File "/usr/local/lib/python3.6/multiprocessing/managers.py", line 757, in _callmethod
    kind, result = conn.recv()
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
EOFError: Ran out of input
2020-07-16 12:35:38,972-app.api.my_resource-ERROR-140298393573184-on_post-175-unpickling stack underflow
Traceback (most recent call last):
  File "/app/api/my_resource.py", line 163, in on_post
    results = self.do_something(a, b, c, **d)
  File "/app/user_data/data_lookup.py", line 39, in lookup_something
    return (a in self._shared_dict
  File "<string>", line 2, in __contains__
  File "/usr/local/lib/python3.6/multiprocessing/managers.py", line 757, in _callmethod
    kind, result = conn.recv()
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
_pickle.UnpicklingError: unpickling stack underflow
My API framework is Falcon. I have a dictionary containing user data that can be updated via POST requests. To keep the architecture simple, I chose Manager.dict() from the multiprocessing package to store the data. When handling other queries, some input is checked against the contents of this dictionary (if a in self._shared_dict: ...). This is where the above-mentioned errors occur.
Why is this problem happening? It seems to be tied to the manager.dict. Also, when I debug in PyCharm, the debugger sometimes fails to evaluate any variables and often just hangs indefinitely somewhere in multiprocessing code, waiting for data.
It seems to have something to do with the Meinheld workers. When I configure GUnicorn to use the default sync worker class, this error does not occur anymore. Hence, Python multiprocessing and the Meinheld package seem not to work well in my setting.
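For context, a minimal sketch of the pattern in question (names are placeholders): every membership test on a Manager.dict() is an IPC round trip to the manager process, and that connection is what got corrupted under load with the Meinheld workers.
from multiprocessing import Manager

if __name__ == "__main__":
    manager = Manager()
    shared_dict = manager.dict()  # a proxy; reads and writes go over a pipe/socket
    shared_dict["alice"] = {"plan": "pro"}
    # Each "in" test calls __contains__ on the proxy, which ends up in
    # managers._callmethod / conn.recv() -- the frames in the traceback above.
    print("alice" in shared_dict)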

Trying to connect to a CoAP resource with a Python library

So I'm trying to connect to a CoAP resource using the Python library https://github.com/chrysn/aiocoap. The library uses Python 3.4, which I have installed and set as the interpreter to use (I'm on Windows 7, by the way). Still, I'm getting this error message when executing the clientGET.py file, and the same happens for the server file.
C:\Python34\python.exe C:/Learning/PyCoap/aiocoap/clientGET.py
Traceback (most recent call last):
  File "C:/Learning/PyCoap/aiocoap/clientGET.py", line 34, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "C:\Python34\lib\asyncio\base_events.py", line 268, in run_until_complete
    return future.result()
  File "C:\Python34\lib\asyncio\futures.py", line 277, in result
    raise self._exception
  File "C:\Python34\lib\asyncio\tasks.py", line 236, in _step
    result = next(coro)
  File "C:/Learning/PyCoap/aiocoap/clientGET.py", line 20, in main
    protocol = yield from Context.create_client_context()
  File "C:\Learning\PyCoap\aiocoap\aiocoap\protocol.py", line 510, in create_client_context
    transport, protocol = yield from loop.create_datagram_endpoint(protofact, family=socket.AF_INET6)
  File "C:\Python34\lib\asyncio\base_events.py", line 675, in create_datagram_endpoint
    waiter)
  File "C:\Python34\lib\asyncio\selector_events.py", line 68, in _make_datagram_transport
    address, waiter, extra)
  File "C:\Python34\lib\asyncio\selector_events.py", line 911, in __init__
    super().__init__(loop, sock, protocol, extra)
  File "C:\Python34\lib\asyncio\selector_events.py", line 452, in __init__
    self._extra['sockname'] = sock.getsockname()
OSError: [WinError 10022] An invalid argument was supplied
Process finished with exit code 1
I didn't explore this in a real Python environment, as I don't have a Windows machine with Python 3.4 handy, but it seems to me that this could be a bug in asyncio: its UDP socket creation probably simply doesn't work on Windows. Do some experimenting at a lower level, look at what aiocoap is doing, and try to prove me wrong.
It's supposed to work; the documentation only mentions ProactorEventLoop as not supporting UDP.
The error condition is described in "Socket.error: Invalid Argument supplied".
aiocoap.protocol.Context.create_client_context() seems to be doing the right thing according to the asyncio documentation, but _SelectorTransport.__init__() always calls sock.getsockname() before any packets are sent, at which point the socket is not yet bound to an address (according to the linked SO question), so getsockname() fails on Windows.
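A lower-level repro sketch of that failure mode (run it on Windows; on Linux the same call simply returns ('::', 0, 0, 0)):
import socket

# _SelectorTransport.__init__ calls getsockname() on the still-unbound
# UDP socket; on Windows this raises "OSError: [WinError 10022] An
# invalid argument was supplied", matching the traceback above.
s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
print(s.getsockname())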
You might want to retry with a current version of Python and aiocoap (the current development version, after 0.4a1). aiocoap used not to support Windows at all, and still does not support all of CoAP on it, but it now uses a socket implementation that is aware of some limitations in the Windows socket API.

passing file instance as an argument to the celery task raises "ValueError: I/O operation on closed file"

I need to pass a file as an argument to a Celery task, but the file somehow arrives there closed. This only happens when I execute the task asynchronously.
Is this expected behavior?
views:
from engine.tasks import s3_upload_handler

def myfunc():
    f = open('/app/uploads/pic.jpg', 'rb')
    s3_upload_handler.apply_async(kwargs={"uploaded_file": f, "file_name": "test.jpg"})
tasks:
def s3_upload_handler(uploaded_file, file_name):
    ...  # some code for uploading to s3
traceback:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 437, in __protected_call__
    return self.run(*args, **kwargs)
  File "/app/photohosting/engine/tasks.py", line 34, in s3_upload_handler
    key.set_contents_from_file(uploaded_file)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 1217, in set_contents_from_file
    spos = fp.tell()
ValueError: I/O operation on closed file
flower logs:
kwargs {
    'file_name': 'test.jpg',
    'uploaded_file': <closed file '<uninitialized file>', mode '<uninitialized file>' at 0x7f6ab9e75e40>
}
Yes, of course the file arrives there closed. Asynchronous Celery tasks run in a completely separate process (moreover, they can even run on a different machine), so there is no way to pass an open file to them.
You should close the file in the process from which you call the task, pass its name (and, if you need it, the position within the file) to the task, and then reopen it inside the task.
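A minimal sketch of that suggestion, reusing the names from the question (the file_path kwarg is my rename, the shared_task decorator is assumed, and the S3 code stays elided):
# views.py: pass the path, not an open file object
from engine.tasks import s3_upload_handler

def myfunc():
    s3_upload_handler.apply_async(
        kwargs={"file_path": "/app/uploads/pic.jpg", "file_name": "test.jpg"})

# tasks.py: reopen the file inside the worker process
from celery import shared_task

@shared_task
def s3_upload_handler(file_path, file_name):
    with open(file_path, 'rb') as f:
        ...  # e.g. key.set_contents_from_file(f)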
Another way of doing this would be to read the file and transfer its contents as a binary blob over the wire. Of course, if the file is really large then what @Vasily says is better, but that won't work when the worker runs on a different machine (unless your file is on shared storage).

SUDS Exception Imported Schema Failed

I'm getting the error:
Exception: imported schema (http://www.w3.org/2001/XMLSchema) at (http://www.w3.org/2001/XMLSchema.xsd), failed
when passing a Doctor (constructed with ImportDoctor) to the suds Client constructor.
I'm working on two Windows machines. Both have the same version of suds installed, but only one of them raises the error above.
Could someone help me understand why this error arises, so I can figure out what's missing on the machine where it happens? Thanks in advance!
UPDATE: I don't know if this matters, but it's worth noting that the Windows machine raising the error is an Amazon Web Services instance. On my local machine everything works fine.
UPDATE: Here's some code I ran in the Python interpreter on the machine I mentioned. It shows in detail how the error arises...
>>> from suds.client import Client
>>> from suds.xsd.doctor import ImportDoctor, Import
>>> missing_import = Import("http://www.w3.org/2001/XMLSchema")
>>> missing_import.filter.add("http://tempuri.org/")
>>> doctor = ImportDoctor(missing_import)
>>> client = Client("http://etcfulfill.ebooks.com/Fulfillment.asmx?wsdl")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "suds\client.py", line 112, in __init__
    self.wsdl = reader.open(url)
  File "suds\reader.py", line 152, in open
    d = self.fn(url, self.options)
  File "suds\wsdl.py", line 159, in __init__
    self.build_schema()
  File "suds\wsdl.py", line 220, in build_schema
    self.schema = container.load(self.options)
  File "suds\xsd\schema.py", line 95, in load
    child.dereference()
  File "suds\xsd\schema.py", line 323, in dereference
    midx, deps = x.dependencies()
  File "suds\xsd\sxbasic.py", line 422, in dependencies
    raise TypeNotFound(self.ref)
suds.TypeNotFound: Type not found: '(schema, http://www.w3.org/2001/XMLSchema, )'
>>> client = Client("http://etcfulfill.ebooks.com/Fulfillment.asmx?wsdl", doctor=doctor)
No handlers could be found for logger "suds.xsd.sxbasic"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "suds\client.py", line 112, in __init__
    self.wsdl = reader.open(url)
  File "suds\reader.py", line 152, in open
    d = self.fn(url, self.options)
  File "suds\wsdl.py", line 159, in __init__
    self.build_schema()
  File "suds\wsdl.py", line 220, in build_schema
    self.schema = container.load(self.options)
  File "suds\xsd\schema.py", line 93, in load
    child.open_imports(options)
  File "suds\xsd\schema.py", line 305, in open_imports
    imported = imp.open(options)
  File "suds\xsd\sxbasic.py", line 542, in open
    result = self.download(options)
  File "suds\xsd\sxbasic.py", line 567, in download
    raise Exception(msg)
Exception: imported schema (http://www.w3.org/2001/XMLSchema) at (http://www.w3.org/2001/XMLSchema.xsd), failed
UPDATE:
I realized that suds always opens its connections on increasing TCP source ports, and when it reaches the maximum TCP port (65535) it simply starts over from the lowest available port, so there is no problem with that by itself.
The problem shows up when using suds' ImportDoctor, because it has to open an additional connection to the location from which the import is retrieved. For some reason, once the system reaches the maximum TCP port count, suds assumes that there is no TCP port available for the connection that fetches the import, and as a consequence it throws the exception:
Exception: imported schema (http://www.w3.org/2001/XMLSchema) at (http://www.w3.org/2001/XMLSchema.xsd), failed
I repeat: this only happens when suds has to open that extra connection to fetch the import. If ImportDoctor is not used, suds has no problem when the TCP port count reaches its maximum; it just starts over at the lowest available port.
Does anyone have a clue how to resolve this issue? I'd really appreciate the help!
I've figured out what the problem was. The schema that was missing from the WSDL I was trying to use with suds was:
http://www.w3.org/2001/XMLSchema
And the XSD file for this schema is at:
http://www.w3.org/2001/XMLSchema.xsd
So when I used suds' ImportDoctor to add this schema import, the w3.org domain sometimes denied my access (I don't really know why), and that's why this error was raised:
Exception: imported schema (http://www.w3.org/2001/XMLSchema) at (http://www.w3.org/2001/XMLSchema.xsd), failed
What did I do to solve the problem? I simply downloaded the schema to my machine and used suds' ImportDoctor to retrieve the import locally.
And that was it! A confusing bug, but solved.
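For reference, a sketch of that local-import setup (the file path is a placeholder; Import accepts an explicit location, so the schema is read from disk instead of from w3.org):
from suds.client import Client
from suds.xsd.doctor import ImportDoctor, Import

# Point the missing schema import at a local copy of XMLSchema.xsd so
# suds never needs to open the extra connection to w3.org.
missing_import = Import("http://www.w3.org/2001/XMLSchema",
                        location="file:///path/to/XMLSchema.xsd")
missing_import.filter.add("http://tempuri.org/")
doctor = ImportDoctor(missing_import)
client = Client("http://etcfulfill.ebooks.com/Fulfillment.asmx?wsdl",
                doctor=doctor)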
