Faust-Streaming Crashes with error "Partition is not assigned" - python

We recently switched from faust 1.10.4 to faust-streaming (0.6.9). Since then, we have seen the applications crashing with the exception below. The application has multiple layers, with aggregation and filtering of data at each stage. At each stage, the processor sends the message to a Kafka topic and the corresponding Faust agent consumes it. However, we have kept the partition count of the Kafka topic the same at each layer.
Cluster Size = 12
Topic & Table Partition count = 36
faust-streaming version = 0.6.9
kafka-python version = 2.0.2
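For reference, each stage in the pipeline looks roughly like the sketch below (a minimal sketch: the app id, topic names, and broker address are placeholders, not our actual configuration):

import faust

app = faust.App(
    'stage-1-processor',              # placeholder app id
    broker='kafka://localhost:9092',  # placeholder broker
    topic_partitions=36,
)

# Input topic for this stage and output topic feeding the next layer,
# both kept at the same partition count.
stage_input = app.topic('stage-1-events', partitions=36)
stage_output = app.topic('stage-2-events', partitions=36)

@app.agent(stage_input)
async def process(stream):
    async for event in stream:
        # Aggregation / filtering happens here, then the result is
        # forwarded to the next layer's topic.
        await stage_output.send(value=event)

The worker then crashes with the error below: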
[2021-07-29 10:05:23,761] [18808] [ERROR] [^---Fetcher]: Crashed reason=AssertionError('Partition is not assigned')
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/mode/services.py", line 802, in _execute_task
await task
File "/usr/local/lib/python3.8/site-packages/faust/transport/consumer.py", line 176, in _fetcher
await consumer._drain_messages(self)
File "/usr/local/lib/python3.8/site-packages/faust/transport/consumer.py", line 1104, in _drain_messages
async for tp, message in ait:
File "/usr/local/lib/python3.8/site-packages/faust/transport/consumer.py", line 714, in getmany
highwater_mark = self.highwater(tp)
File "/usr/local/lib/python3.8/site-packages/faust/transport/consumer.py", line 1367, in highwater
return self._thread.highwater(tp)
File "/usr/local/lib/python3.8/site-packages/faust/transport/drivers/aiokafka.py", line 923, in highwater
return self._ensure_consumer().highwater(tp)
File "/usr/local/lib/python3.8/site-packages/aiokafka/consumer/consumer.py", line 673, in highwater
assert self._subscription.is_assigned(partition), \
AssertionError: Partition is not assigned
[2021-07-29 10:05:23,764] [18808] [INFO] [^Worker]: Stopping...
[2021-07-29 10:05:23,765] [18808] [INFO] [^-App]: Stopping...
Please help us here.

Related

Airflow error while using pandas - Task received SIGTERM signal

I'm getting this SIGTERM error on Airflow 1.10.11 using the LocalExecutor.
[2020-09-21 10:26:51,210] {{taskinstance.py:955}} ERROR - Received SIGTERM. Terminating subprocesses.
The DAG task does the following:
reading some data from SQL Server (on Windows) into a pandas dataframe
and then writing it to a file (it never even gets to this part).
The strange thing is that if I limit the number of rows returned by the query (say TOP 100), the DAG succeeds.
If I run the Python code locally on my machine, it succeeds. I'm using pyodbc and SQLAlchemy. It fails on this line after only 20 or 30 seconds:
df_query_results = pd.read_sql(sql_query, engine)
Airflow log
[2020-09-21 10:26:51,210] {{helpers.py:325}} INFO - Sending Signals.SIGTERM to GPID xxx
[2020-09-21 10:26:51,210] {{taskinstance.py:955}} ERROR - Received SIGTERM. Terminating subprocesses.
[2020-09-21 10:26:51,804] {{taskinstance.py:1150}} ERROR - Task received SIGTERM signal
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/airflow/dags/operators/sql_to_avro.py", line 39, in execute
df_query_results = pd.read_sql(sql_query, engine)
File "/usr/local/lib64/python3.6/site-packages/pandas/io/sql.py", line 436, in read_sql
chunksize=chunksize,
File "/usr/local/lib64/python3.6/site-packages/pandas/io/sql.py", line 1231, in read_query
data = result.fetchall()
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1216, in fetchall
e, None, None, self.cursor, self.context
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1478, in _handle_dbapi_exception
util.reraise(*exc_info)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
raise value
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1211, in fetchall
l = self.process_rows(self._fetchall_impl())
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1161, in _fetchall_impl
return self.cursor.fetchall()
File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 957, in signal_handler
raise AirflowException("Task received SIGTERM signal")
airflow.exceptions.AirflowException: Task received SIGTERM signal
[2020-09-21 10:26:51,813] {{taskinstance.py:1194}} INFO - Marking task as FAILED.
EDIT:
I missed this earlier, but there is a warning message about the hostname.
WARNING - The recorded hostname da2mgrl001d1.mycompany.corp does not match this instance's hostname airflow-mycompany-dev.i.mct360.com
I had a Linux/network engineer help out. Unfortunately, I don't know the full details, but the fix was to change the hostname_callable setting in airflow.cfg to hostname_callable = socket:gethostname. It was previously set to socket:getfqdn.
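For reference, the change amounts to roughly this in airflow.cfg (a sketch; the [core] section placement assumes the Airflow 1.10 defaults):

[core]
# previously: hostname_callable = socket:getfqdn
hostname_callable = socket:gethostname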
Note: I found a couple different (maybe related?) questions where this was the resolution.
How to fix the error "AirflowException("Hostname of job runner does not match")"?
https://stackoverflow.com/a/59108743/220997

Is it possible to deploy a daml smart contract with bots to Hyperledger Fabric?

I deployed the quickstart tutorial based on the "daml-on-fabric" example https://github.com/hacera/daml-on-fabric, and after that I tried to deploy the ping-pong example from dazl https://github.com/digital-asset/dazl-client/tree/master/samples/ping-pong. The bots from the example work fine on the DAML ledger. However, when I try to deploy this example on Fabric, the bots are unable to send transactions. Everything works fine based on the README at https://github.com/hacera/daml-on-fabric/blob/master/README.md. The smart contract appears to be deployed on Fabric. The error occurs when I try to use the bots from the ping-pong Python files https://github.com/digital-asset/dazl-client/blob/master/samples/ping-pong/README.md
I receive this error:
[ ERROR] 2020-03-10 15:40:57,475 | dazl | A command submission failed!
Traceback (most recent call last):
File "/home/vasisiop/.local/share/virtualenvs/ping-pong-sDNeps76/lib/python3.7/site-packages/dazl/client/_party_client_impl.py", line 415, in main_writer
await submit_command_async(client, p, commands)
File "/home/vasisiop/anaconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/vasisiop/.local/share/virtualenvs/ping-pong-sDNeps76/lib/python3.7/site-packages/dazl/protocols/v1/grpc.py", line 42, in <lambda>
lambda: self.connection.command_service.SubmitAndWait(request))
File "/home/vasisiop/.local/share/virtualenvs/ping-pong-sDNeps76/lib/python3.7/site-packages/grpc/_channel.py", line 824, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/home/vasisiop/.local/share/virtualenvs/ping-pong-sDNeps76/lib/python3.7/site-packages/grpc/_channel.py", line 726, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Party not known on ledger"
debug_error_string = "{"created":"#1583847657.473821297","description":"Error received from peer ipv6:[::1]:6865","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Party not known on ledger","grpc_status":3}"
>
[ ERROR] 2020-03-10 15:40:57,476 | dazl | An event handler in a bot has thrown an exception!
Traceback (most recent call last):
File "/home/vasisiop/.local/share/virtualenvs/ping-pong-sDNeps76/lib/python3.7/site-packages/dazl/client/bots.py", line 157, in _handle_event
await handler.callback(new_event)
File "/home/vasisiop/.local/share/virtualenvs/ping-pong-sDNeps76/lib/python3.7/site-packages/dazl/client/_party_client_impl.py", line 415, in main_writer
await submit_command_async(client, p, commands)
File "/home/vasisiop/anaconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/vasisiop/.local/share/virtualenvs/ping-pong-sDNeps76/lib/python3.7/site-packages/dazl/protocols/v1/grpc.py", line 42, in <lambda>
lambda: self.connection.command_service.SubmitAndWait(request))
File "/home/vasisiop/.local/share/virtualenvs/ping-pong-sDNeps76/lib/python3.7/site-packages/grpc/_channel.py", line 824, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/home/vasisiop/.local/share/virtualenvs/ping-pong-sDNeps76/lib/python3.7/site-packages/grpc/_channel.py", line 726, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Party not known on ledger"
debug_error_string = "{"created":"#1583847657.473821297","description":"Error received from peer ipv6:[::1]:6865","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Party not known on ledger","grpc_status":3}"
From the error message it looks like the parties defined in the quick start example have not been allocated on the ledger, hence the "Party not known on ledger" error.
You can follow the steps in https://docs.daml.com/deploy/index.html using daml deploy --host= --port=, which will both upload the DARs and allocate the parties on the ledger.
You can also run just the party allocation command, daml ledger allocate-parties, which will allocate parties based on those defined in your daml.yaml.
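For example, the parties section of daml.yaml might look roughly like this (a sketch; the party names are placeholders and should match what your bots use):

parties:
  - Alice
  - Bob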

Why doesn't Heroku like my folder structure?

This Discord bot works perfectly fine when I run it locally on my PC, but when I push it to Heroku I get the following error log.
When I try to delete "Styles" (the part mentioned in the log below) from render.py, the bot goes online but is not working (of course).
heroku[worker.1]: Starting process with command python src/main.py
heroku[worker.1]: State changed from starting to up
heroku[worker.1]: Process exited with status 1
heroku[worker.1]: State changed from up to crashed
app[worker.1]: Traceback (most recent call last):
app[worker.1]: File "src/main.py", line 9, in <module>
app[worker.1]: from render import RenderStats
app[worker.1]: File "/app/src/render.py", line 6, in <module>
app[worker.1]: class RenderStats():
app[worker.1]: File "/app/src/render.py", line 29, in RenderStats
app[worker.1]: 'titles': ImageFont.truetype("fonts/MyriadPro-Bold.otf", 10),
app[worker.1]: File "/app/.heroku/python/lib/python3.6/site-packages/PIL/ImageFont.py", line 546, in truetype
app[worker.1]: return freetype(font)
app[worker.1]: File "/app/.heroku/python/lib/python3.6/site-packages/PIL/ImageFont.py", line 543, in freetype
app[worker.1]: return FreeTypeFont(font, size, index, encoding, layout_engine)
app[worker.1]: File "/app/.heroku/python/lib/python3.6/site-packages/PIL/ImageFont.py", line 161, in init
app[worker.1]: font, size, index, encoding, layout_engine=layout_engine
app[worker.1]: OSError: cannot open resource
Is the folder structure the problem?
Heroku isn't made for hosting Discord bots, but to answer your question: the issue might be Heroku's ephemeral filesystem. As Heroku explains here, even though you can ship static files (such as fonts) with your app, they might get deleted during Heroku's daily dyno cycling.
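If the fonts directory is actually included in the repo, another thing worth checking (a sketch, separate from the answer above, assuming fonts/ sits next to render.py) is that "fonts/MyriadPro-Bold.otf" is resolved relative to the working directory of python src/main.py rather than relative to render.py; an absolute lookup avoids that:

import os
from PIL import ImageFont

# Hypothetical: build an absolute path from render.py's location instead of
# relying on the process's current working directory.
FONT_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "fonts")
titles_font = ImageFont.truetype(os.path.join(FONT_DIR, "MyriadPro-Bold.otf"), 10)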

Google Pub/Sub Python Subscriber gets ALREADY_EXISTS after 10 minutes

I deployed a Python subscriber Sunday morning. The subscriber restarts every day. Starting today (3 days later), it experiences an ALREADY_EXISTS error 10-20 minutes after starting.
I've restarted it several times now. Every time, it runs fine, retrieves previous messages, and processes them correctly. Then about 10-20 minutes later it dies again. At the time it dies, it hasn't received any messages and nothing significant has happened.
Subscriber code
from google.cloud import pubsub

project_id = '***'
subscription = 'pop'
subscription_id = 'projects/{}/subscriptions/{}'.format(project_id, subscription)

def subscribe(callback):
    subscriber = pubsub.SubscriberClient()
    subscription = subscriber.subscribe(subscription_id)
    future = subscription.open(callback)
    future.result()
Error
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 54, in error_remapped_callable
return callable_(*args, **kwargs)
File "/usr/local/lib64/python3.6/site-packages/grpc/_channel.py", line 341, in _next
raise self
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.ABORTED, The operation was aborted.)>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "../monitors/bautopops.py", line 45, in <module>
main()
File "../monitors/bautopops.py", line 41, in main
gmail.start_pubsub_subscription(callback)
File "/home/ec2-user/bautopops/gmail.py", line 270, in start_pubsub_subscription
pubsub.subscribe(pubsub_callback)
File "/home/ec2-user/bautopops/pubsub.py", line 19, in subscribe
future.result()
File "/usr/local/lib/python3.6/site-packages/google/cloud/pubsub_v1/futures.py", line 103, in result
raise err
File "/usr/local/lib/python3.6/site-packages/google/cloud/pubsub_v1/subscriber/_consumer.py", line 336, in _blocking_consume
for response in responses:
File "/usr/local/lib/python3.6/site-packages/google/cloud/pubsub_v1/subscriber/_consumer.py", line 462, in _pausable_iterator
yield next(iterator)
File "/usr/local/lib64/python3.6/site-packages/grpc/_channel.py", line 347, in __next__
return self._next()
File "/usr/local/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 56, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 2, in raise_from
google.api_core.exceptions.Aborted: 409 The operation was aborted.
Guest worker-005 exited with error A process has ended with a probable error condition: process ended with exit code 1.
I found in the Google Pub/Sub Reference that error 409 means ALREADY_EXISTS:
The topic or subscription already exists. This is an error on creation operations.
This really doesn't help since I am not creating a subscription. I am only subscribing to one already created, as described in the Google Cloud Docs.
Lastly, I found this GitHub issue that reports the same error, but under totally different circumstances. It does not seem to be related to my problem.
Please help.

thrift with tornado example server raise Exception

I am running the official Thrift py:tornado demo; an exception is raised after the client closes the transport.
example: https://github.com/apache/thrift/tree/master/tutorial/py.tornado
Starting the server...
ping()
add(1, 1)
zip()
zip()
calculate(1, Work(comment=None, num1=1, num2=0, op=4))
calculate(1, Work(comment=None, num1=15, num2=10, op=2))
getStruct(1)
ERROR:thrift.TTornado:thrift exception in handle_stream
Traceback (most recent call last):
File "/Users/user/venv/py27/lib/python2.7/site-packages/thrift/TTornado.py", line 174, in handle_stream
frame = yield trans.readFrame()
File "/Users/user/venv/py27/lib/python2.7/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/Users/user/venv/py27/lib/python2.7/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "/Users/user/venv/py27/lib/python2.7/site-packages/tornado/gen.py", line 1014, in run
yielded = self.gen.throw(*exc_info)
File "/Users/user/venv/py27/lib/python2.7/site-packages/thrift/TTornado.py", line 141, in readFrame
raise gen.Return(frame)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 35, in __exit__
self.gen.throw(type, value, traceback)
File "/Users/user/venv/py27/lib/python2.7/site-packages/thrift/TTornado.py", line 125, in io_exception_context
message=str(e))
TTransportException: Stream is closed
Is there any way to avoid this error message, or how can I catch it?
Many things can cause a StreamClosedError.
Check the Thrift server side; an exception may have been raised there.
After testing, I found that Thrift's THttpServer cannot serve a Tornado client stream.
Also, when I upgraded my Tornado version from 4.4.3 to 4.5, the StreamClosedError disappeared.
client side Thrift version: 0.10.0
Tornado version: 4.5
client side Python version: 3.5.2
System version: Ubuntu 16.04
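If the goal is only to silence that log line when a client disconnects, one option (a sketch, separate from the fix above; it hides the message rather than changing the disconnect behaviour) is to raise the level of the thrift.TTornado logger seen in the traceback:

import logging

# Suppress the "thrift exception in handle_stream" messages that thrift.TTornado
# logs when a client simply closes its connection.
logging.getLogger('thrift.TTornado').setLevel(logging.CRITICAL)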
