After installing MySQL and working with it for almost a month, I now see the warning below every time I try to execute a query from a Python script. Any help handling this would be appreciated:
(MainThread) # _do_auth(): user: root
(MainThread) # _do_auth(): self._auth_plugin:
(MainThread) new_auth_plugin: caching_sha2_password
(MainThread) # request: b'\xa5\x0c/\x14v\t\x86O\xa8\x84\xc7\x93\x8c8\x1c\xa9\x8b#\xaf\xa1' size: 20
(MainThread) # server response packet: bytearray(b'\x07\x00\x00\x05\x00\x00\x00\x02\x00\x00\x00')
I ran the following command:
$ spark-submit --master yarn --deploy-mode cluster pi.py
The following log output prints continuously:
...
2021-12-23 06:07:50,158 INFO yarn.Client: Application report for application_1640239254568_0002 (state: ACCEPTED)
2021-12-23 06:07:51,162 INFO yarn.Client: Application report for application_1640239254568_0002 (state: ACCEPTED)
...
I checked the result through port 8088 (the container logs web UI), but there is nothing in stdout.
I was disappointed and tried to force the spark-submit operation to end, but suddenly new log output appeared, like below:
...
2021-12-23 06:09:06,694 INFO yarn.Client: Application report for application_1640239254568_0002 (state: RUNNING)
2021-12-23 06:09:06,695 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: master
ApplicationMaster RPC port: 40451
queue: default
start time: 1640239668020
final status: UNDEFINED
tracking URL: http://master2:8088/proxy/application_1640239254568_0002/
user: root
2021-12-23 06:09:07,707 INFO yarn.Client: Application report for application_1640239254568_0002 (state: RUNNING)
...
After some time, the error log below appeared:
...
2021-12-23 06:10:25,003 INFO retry.RetryInvocationHandler: java.io.EOFException: End of File Exception between local host is: "master/172.17.0.2"; destination host is: "master2":8032; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException, while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over rm2. Trying to failover immediately.
2021-12-23 06:10:25,003 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1
2021-12-23 06:10:25,004 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From master/172.17.0.2 to master:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over rm1 after 1 failover attempts. Trying to failover after sleeping for 18340ms.
2021-12-23 06:10:43,347 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
...
I understand that Spark releases the ResourceManager allocation after the job finishes, so it may be normal for the above error log to appear.
Q1. Is the above job behavior normal?
Q2. After the job finishes, where can I check the results? Can I check them in the container logs web UI?
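For reference, one way to pull the aggregated container logs (which in cluster mode include the driver's stdout) from the command line, assuming log aggregation is enabled:
$ yarn logs -applicationId application_1640239254568_0002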
IMPORTANT UPDATE: I re-ran the command and checked the status: SUCCEEDED. Why does the spark-submit operation sometimes succeed and sometimes stop in the middle?
I'm trying to figure out why the timer doesn't wait 300 seconds but instead raises an exception immediately. Can anybody help me resolve this issue? Thanks.
Here's my code:
import sys
import os
from java.lang import System
import getopt
import time as systime
from threading import Timer
import time
[...]
def _startServer(ServerName):
    cd('domainRuntime:/ServerLifeCycleRuntimes/'+ServerName);
    state=_serverstatus(ServerName);
    while (state!='RUNNING'):
        try:
            cmo.start();
            while (state!='RUNNING'):
                state=_serverstatus(ServerName);
                java.lang.Thread.sleep(1000);
            # watchdog: run timeout() after 300 seconds
            t = threading.Timer(300.0,timeout)
            t.start()
        except:
            print 'Error in getting current status of ' +ServerName+ '\n';
            print 'Please check logged in user has full access to complete the start operation on ' +ServerName+ '\n';
            print 'Timeout, NodeManager may be down. Checking... ' + '\n';
            os.popen('sh /home/oracle/scripts/logAnalytics.sh')
            systime.sleep(60)
            state=_serverstatus(ServerName);
[...]
and the result:
Initializing WebLogic Scripting Tool (WLST) ...
Welcome to WebLogic Server Administration Scripting Shell
Type help() for help on available commands
172.31.129.68:7001
Connecting to t3://172.31.129.68:7001 with userid weblogic ...
Successfully connected to Admin Server 'AdminServer' that belongs to domain 'cll5_domain'.
Warning: An insecure protocol was used to connect to the
server. To ensure on-the-wire security, the SSL port or
Admin port should be used instead.
Successfully connected to the domain
Location changed to domainRuntime tree. This is a read-only tree with DomainMBean as the root.
For more help, use help(domainRuntime)
Server Server_2 is :SHUTDOWN
Trying To Start Server:Server_2...
Server Server_2 is :SHUTDOWN
Server Server_2 is :SHUTDOWN
Error in getting current status of Server_2
Please check logged in user has full access to complete the start operation on Server_2
Timeout, NodeManager may be down. Checking...
Server Server_2 is :SHUTDOWN
Server Server_2 is :SHUTDOWN
Error in getting current status of Server_2
Please check logged in user has full access to complete the start operation on Server_2
Timeout, NodeManager may be down. Checking...
Server Server_2 is :RUNNING
Disconnected from weblogic server: AdminServer
Exiting WebLogic Scripting Tool.
(Note that the string "Server Server_2 is :SHUTDOWN" should be printed on screen many times before the next step starts. About 60 times, because there's a while loop that prints the server status every 5 seconds and the timer is 300 seconds.)
Environment: Oracle Linux 6, Oracle WebLogic 11
Any help is appreciated.
So I have a publisher which uses the schedule Python package to read data from a file every 5-10 minutes and publish each line to a queue.
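The publisher side looks roughly like this (a simplified sketch, not the exact code; the file path and queue name are placeholders):

import pika
import schedule
import time

def publish_lines():
    # Re-read the file and publish each line to the queue
    connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
    channel = connection.channel()
    with open('data.txt') as f:  # placeholder path
        for line in f:
            channel.basic_publish(exchange='',
                                  routing_key='universal_message_queue',  # placeholder name
                                  body=line.strip())
    connection.close()

schedule.every(5).minutes.do(publish_lines)

while True:
    schedule.run_pending()
    time.sleep(1)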
On the other side I have a consumer using something like:
self.connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
self.channel = self.connection.channel()
while True:
    method, properties, body = self.channel.basic_get(queue=conf.UNIVERSAL_MESSAGE_QUEUE, no_ack=False)
    if body is not None:
        self.assign_task(body=body)
        self.channel.basic_ack(delivery_tag=method.delivery_tag)
    else:
        self.logger.info('channel empty')
        self.move_to_done()
        time.sleep(5)
The assign_task function looks like:
def assign_task(self, body):
    # <do something with the message body>
For some reason after a while it throws the following error:
2017-08-03 15:27:43,756: ERROR: base_connection.py: _handle_error: 335: Socket Error: 10054
2017-08-03 15:27:43,756: WARNING: base_connection.py: _check_state_on_disconnect: 180: Socket closed when connection was open
2017-08-03 15:27:43,756: WARNING: connection.py: _on_disconnect: 1360: Disconnected from RabbitMQ at localhost:5672 (0): Not specified
Essentially the publisher and consumer are two different Python programs intended to run on a single machine with Windows Server 2012. Can the community help me understand what might be going wrong here?
The same code runs absolutely fine locally on my Windows machine.
Following is the output from my log file.
=ERROR REPORT==== 3-Aug-2017::15:06:48 ===
closing AMQP connection <0.617.0> ([::1]:53485 -> [::1]:5672):
missed heartbeats from client, timeout: 60s
The simple answer to this was to create a durable queue and set heartbeat_interval to 0.
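For illustration, a minimal sketch of that fix on the consumer side (the queue name is a placeholder; note that pika 1.0 later renamed heartbeat_interval to heartbeat):

import pika

# heartbeat_interval=0 disables AMQP heartbeats, so an idle consumer
# is no longer dropped by the broker for missing them
params = pika.ConnectionParameters(host='localhost', heartbeat_interval=0)
connection = pika.BlockingConnection(params)
channel = connection.channel()

# durable=True makes the queue itself survive a broker restart
channel.queue_declare(queue='universal_message_queue', durable=True)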
I have set heartbeat in Celery settings:
BROKER_HEARTBEAT = 10
I have also set this configuration value in RabbitMQ config:
'heartbeat' => '10',
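(For reference, if that 'heartbeat' => '10' line comes from a configuration management template, the classic rabbitmq.config expects it as an Erlang term, roughly:

[{rabbit, [{heartbeat, 10}]}].

with the value as an integer rather than a quoted string.)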
But somehow heartbeats are still disabled:
ubuntu#sync1:~$ sudo rabbitmqctl list_connections name timeout
Listing connections ...
some_address:37781 -> other_address:5672 0
some_address:37782 -> other_address:5672 0
...done.
Any ideas what I am doing wrong?
UPDATE:
So now I get:
ubuntu#sync1:/etc/puppet$ sudo rabbitmqctl list_connections name timeout
Listing connections ...
some_address:41281 -> other_address:5672 10
some_address:41282 -> other_address:5672 10
some_address:41562 -> other_address:5672 0
some_address:41563 -> other_address:5672 0
some_address:41564 -> other_address:5672 0
some_address:41565 -> other_address:5672 0
some_address:41566 -> other_address:5672 0
some_address:41567 -> other_address:5672 0
some_address:41568 -> other_address:5672 0
...done.
I have 3 servers:
RabbitMQ broker
RESTful API server
Remote Worker server
It appears the remote daemonized Celery workers send heartbeats correctly. The RESTful API server, which uses Celery to process tasks remotely, is not using heartbeats for some reason.
The heartbeat of the Celery worker is an application-level heartbeat, not the AMQP protocol's heartbeat.
Each worker periodically sends a heartbeat event message to the "celeryev" event exchange in the BROKER.
The heartbeat event is forwarded back to the worker, so the worker can know the health status of the BROKER.
If the number of lost heartbeats exceeds a threshold, the worker can reconnect to the BROKER.
For the rest of the details, you may check this page.
The section BROKER_FAILOVER_STRATEGY describes the actions you may take when dropped by a BROKER.
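For illustration, a minimal sketch of such a failover configuration (hostnames and credentials are placeholders):

# Several broker URLs; on connection loss the next one is tried
BROKER_URL = [
    'amqp://user:password@broker1:5672//',
    'amqp://user:password@broker2:5672//',
]
# How the next broker is chosen: 'round-robin' (the default) or 'shuffle'
BROKER_FAILOVER_STRATEGY = 'round-robin'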
Celery workers definitely support AMQP heartbeats. The configuration item BROKER_HEARTBEAT is used to define the heartbeat interval of the AMQP client (the Celery worker).
We can find the description of BROKER_HEARTBEAT in the Celery docs here!
Possible causes of the heartbeat not working:
1. Using the wrong transport, such as 'librabbitmq'.
As the Celery docs describe, only the 'pyamqp' transport supports BROKER_HEARTBEAT.
We need to check whether the librabbitmq package is installed,
or we can use the 'pyamqp' transport in the broker URL: 'pyamqp://userid:password@hostname:port/virtual_host' rather than 'amqp://userid:password@hostname:port/virtual_host' (see the sketch after this list).
2. No event is sent to the Celery worker during three heartbeat intervals after boot-up.
Check the code here to see how the heartbeat works!
drain_events will be called during worker boot-up, see the code here!
If no event is sent to the Celery worker, connection.heartbeat_check will not be called.
By the way, connection.heartbeat_check is defined here!
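For cause 1, a minimal configuration sketch (credentials and hostname are placeholders):

# Force the pure-Python 'pyamqp' transport so that BROKER_HEARTBEAT takes effect
BROKER_URL = 'pyamqp://userid:password@hostname:5672/virtual_host'
BROKER_HEARTBEAT = 10  # heartbeat interval in seconds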
I hope this helps someone who encounters the heartbeat issue.
So I am using RabbitMQ + Celery to create a simple RPC architecture. I have one RabbitMQ message broker and one remote worker which runs the Celery daemon.
There is a third server which exposes a thin RESTful API. When it receives an HTTP request, it sends a task to the remote worker, waits for the response and returns it.
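For context, the flow is essentially this (a simplified sketch, not the actual project code; the task name and payload are placeholders):

from celery import Celery

app = Celery('proj.sync_celery',
             broker='amqp://guest:guest@localhost:5672//',
             backend='amqp')

@app.task
def process(payload):
    # placeholder for the real work done on the remote worker
    return {'result': payload}

# On the API server: send the task and block until the worker replies
result = process.delay({'some': 'data'}).get(timeout=30)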
This works great most of the time. However, I have noticed that after longer inactivity (say 5 minutes with no incoming requests), the Celery worker behaves strangely. The first 3 tasks received after a longer inactivity return this error:
exchange.declare: connection closed unexpectedly
After three erroneous tasks it works again. If there are no tasks for a longer period of time, the same thing happens. Any idea?
My init script for the Celery worker:
# description "Celery worker using sync broker"
console log
start on runlevel [2345]
stop on runlevel [!2345]
setuid richard
setgid richard
script
chdir /usr/local/myproject/myproject
exec /usr/local/myproject/venv/bin/celery worker -n celery_worker_deamon.%h -A proj.sync_celery -Q sync_queue -l info --autoscale=10,3 --autoreload --purge
end script
respawn
My celery config:
# Synchronous blocking tasks
BROKER_URL_SYNC = 'amqp://guest:guest@localhost:5672//'
# Asynchronous non-blocking tasks
BROKER_URL_ASYNC = 'amqp://guest:guest@localhost:5672//'
#: Only add pickle to this list if your broker is secured
#: from unwanted access (see userguide/security.html)
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = 'UTC'
CELERY_ENABLE_UTC = True
CELERY_BACKEND = 'amqp'
# http://docs.celeryproject.org/en/latest/userguide/tasks.html#disable-rate-limits-if-they-re-not-used
CELERY_DISABLE_RATE_LIMITS = True
# http://docs.celeryproject.org/en/latest/userguide/routing.html
CELERY_DEFAULT_QUEUE = 'sync_queue'
CELERY_DEFAULT_EXCHANGE = "tasks"
CELERY_DEFAULT_EXCHANGE_TYPE = "topic"
CELERY_DEFAULT_ROUTING_KEY = "sync_task.default"
CELERY_QUEUES = {
'sync_queue': {
'binding_key':'sync_task.#',
},
'async_queue': {
'binding_key':'async_task.#',
},
}
Any ideas?
EDIT:
Ok, now it appears to happen randomly. I noticed this in RabbitMQ logs:
=WARNING REPORT==== 6-Jan-2014::17:31:54 ===
closing AMQP connection <0.295.0> (some_ip_address:36842 -> some_ip_address:5672):
connection_closed_abruptly
Is your RabbitMQ server or your Celery worker behind a load balancer by any chance? If yes, then the load balancer is closing the TCP connection after some period of inactivity. In which case, you will have to enable heartbeat from the client (worker) side. If you do, I would not recommend using the pure Python amqp lib for this. Instead, replace it with librabbitmq.
connection_closed_abruptly is caused when clients disconnect without the proper AMQP shutdown protocol:
channel.close(...)
Request a channel close.
This method indicates that the sender wants to close the channel.
This may be due to internal conditions (e.g. a forced shut-down) or due to
an error handling a specific method, i.e. an exception.
When a close is due to an exception, the sender provides the class and method id of
the method which caused the exception.
After sending this method, any received methods except Close and Close-OK MUST be discarded. The response to receiving a Close after sending Close must be to send Close-Ok.
channel.close-ok():
Confirm a channel close.
This method confirms a Channel.Close method and tells the recipient
that it is safe to release resources for the channel.
A peer that detects a socket closure without having received a
Channel.Close-Ok handshake method SHOULD log the error.
Here is an issue about that.
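In client code this means closing the channel and the connection explicitly instead of letting the process (or a load balancer) drop the socket. A minimal sketch with pika, chosen here purely for illustration:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
try:
    pass  # ... publish or consume here ...
finally:
    # The explicit Channel.Close / Connection.Close handshakes keep
    # connection_closed_abruptly out of the broker log
    channel.close()
    connection.close()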
Can you set your own custom configuration for BROKER_HEARTBEAT and BROKER_HEARTBEAT_CHECKRATE and check again? For example:
BROKER_HEARTBEAT = 10
BROKER_HEARTBEAT_CHECKRATE = 2.0
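With those values the client sends a heartbeat every 10 seconds, and missed heartbeats are checked every BROKER_HEARTBEAT / BROKER_HEARTBEAT_CHECKRATE = 5 seconds.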