How to handle AllServersUnavailable Exception - python

I wanted to run a simple write test against a single-node Cassandra instance (v1.1.10) to see how it handles constant writes and whether it can keep up with the write speed.
import random
import string
import sys
import uuid

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('testdb')
test_cf = ColumnFamily(pool, 'test')
test2_cf = ColumnFamily(pool, 'test2')
test3_cf = ColumnFamily(pool, 'test3')
test_batch = test_cf.batch(queue_size=1000)
test2_batch = test2_cf.batch(queue_size=1000)
test3_batch = test3_cf.batch(queue_size=1000)

chars = string.ascii_uppercase
counter = 0
while True:
    counter += 1
    uid = uuid.uuid1()
    junk = ''.join(random.choice(chars) for x in range(50))
    test_batch.insert(uid, {'junk': junk})
    test2_batch.insert(uid, {'junk': junk})
    test3_batch.insert(uid, {'junk': junk})
    sys.stdout.write(str(counter) + '\n')
pool.dispose()
The code keeps crashing after a long run (when the counter reaches around 10M+) with the following message:
pycassa.pool.AllServersUnavailable: An attempt was made to connect to each of the servers twice, but none of the attempts succeeded. The last failure was timeout: timed out
I set queue_size=100, which didn't help. I also fired up the cqlsh -3 console to truncate the table after the script crashed and got the following error:
Unable to complete request: one or more nodes were unavailable.
Tailing /var/log/cassandra/system.log shows no errors, only INFO lines about Compaction, FlushWriter, and so on. What am I doing wrong?

I've had this problem too - as Tyler Hobbs suggested in his comment, the node is likely overloaded (it was for me). A simple fix I've used is to back off and let the node catch up. I've rewritten your loop to catch the error, sleep for a while, and try again. I've run this against a single-node cluster and it works a treat - pausing for a minute and backing off no more than five times in a row. No data is lost unless the error is thrown five times in a row (in which case you probably want to fail hard rather than return to the loop).
import time
import pycassa

# (pool, batches, chars and counter set up as in the question)
while True:
    counter += 1
    uid = uuid.uuid1()
    junk = ''.join(random.choice(chars) for x in range(50))
    tryCount = 5  # 5 is probably unnecessarily high
    while tryCount > 0:
        try:
            test_batch.insert(uid, {'junk': junk})
            test2_batch.insert(uid, {'junk': junk})
            test3_batch.insert(uid, {'junk': junk})
            tryCount = -1  # success: drop out of the retry loop
        except pycassa.pool.AllServersUnavailable as e:
            print "Trying to insert [" + str(uid) + "] but got error " + str(e) + " (attempt " + str(tryCount) + "). Backing off for a minute to let Cassandra settle down"
            time.sleep(60)  # a delay of 60s is probably unnecessarily high
            tryCount = tryCount - 1
    sys.stdout.write(str(counter) + '\n')
I've added a complete gist here
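If you want to reuse the back-off elsewhere, you could pull it into a small helper. This is only a sketch: the function name, arguments and defaults are mine, and it assumes AllServersUnavailable is the only error worth retrying.
import time
import pycassa

def insert_with_backoff(batch, key, columns, tries=5, pause=60):
    # Try the insert up to `tries` times, sleeping `pause` seconds between
    # attempts so the node can catch up. Returns True on success.
    for attempt in range(tries):
        try:
            batch.insert(key, columns)
            return True
        except pycassa.pool.AllServersUnavailable:
            time.sleep(pause)
    return False  # caller decides whether to fail hard or skip the row
In the loop above you would call it once per batch and fail hard (or log and skip) when it returns False.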

Related

Picking up where program left off after error encountered

I am running a program row-wise on a pandas DataFrame, and it takes a long time to run.
The problem is, the VPN connection to the database can suddenly be lost, so I lose all my progress.
Currently, what I am doing is splitting the large dataframe into smaller chunks (500 rows at a time), and running the program on each chunk in a for loop. The result of the processing of each chunk is saved to my hard drive.
However, the chunks are still 500 rows each, so I can still lose a lot of progress when the connection is lost. Plus, I have to manually check to see where I got up to and adjust the code to pick up where the connection was lost.
What is the best way to write the code to "remember" which row the program is up to and pick up exactly where it left off once I re-establish the connection?
Current:
import numpy as np
import pandas as pd

size = 500
list_of_dfs = np.split(large_df, range(size, len(large_df), size))
together_list = []
for count, chunk in enumerate(list_of_dfs):
    # process the chunk and save the result to disk
    chunk_processed = process_chunk(chunk)
    chunk_processed.to_csv(f"processed_{count}.csv")
    together_list.append(chunk_processed)
# merge the processed chunks together into one DataFrame
all_chunks_together = pd.concat(together_list)
Thanks in advance
You could use the existing csv files to remember where to pick up:
import os

import numpy as np
import pandas as pd

size = 500
list_of_dfs = np.split(large_df, range(size, len(large_df), size))
together_list = []
for count, chunk in enumerate(list_of_dfs):
    csv_file = f"processed_{count}.csv"
    if os.path.isfile(csv_file):
        # this chunk was already processed in an earlier run: reload it
        chunk_processed = pd.read_csv(csv_file, index_col=0)
    else:
        chunk_processed = process_chunk(chunk)
        chunk_processed.to_csv(csv_file)
    together_list.append(chunk_processed)
# merge the processed chunks together into one DataFrame
all_chunks_together = pd.concat(together_list)
You would still have to restart your program manually every time it loses the connection. To avoid this, you could catch the exception (assuming you're getting one on connection loss) and continue, as in this example:
import random
random.seed(64)
l = []
while len(l) < 3:
    try:
        l = []
        for n in range(3):
            l.append(n)
            x = 1 / random.randint(0, 1)  # div by 0 error with 50% probability
    except:
        print("error, trying again")
print(l)
which yields
error, trying again
error, trying again
error, trying again
error, trying again
error, trying again
error, trying again
error, trying again
[0, 1, 2]
The downside of this approach is that you may re-read the csv files quite often. But assuming that is fast enough and you can wait, it may be fine - at least there is no manual work left to do.
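Combining the two ideas for your actual chunk loop might look roughly like this. It is only a sketch: process_chunk and large_df come from your code, and catching a broad Exception with a fixed delay is an assumption, since I don't know what your driver raises when the VPN drops.
import os
import time

import numpy as np
import pandas as pd

size = 500
list_of_dfs = np.split(large_df, range(size, len(large_df), size))

while True:
    try:
        together_list = []
        for count, chunk in enumerate(list_of_dfs):
            csv_file = f"processed_{count}.csv"
            if os.path.isfile(csv_file):
                # already done in a previous pass: reuse the saved result
                chunk_processed = pd.read_csv(csv_file, index_col=0)
            else:
                chunk_processed = process_chunk(chunk)
                chunk_processed.to_csv(csv_file)
            together_list.append(chunk_processed)
        break  # every chunk succeeded
    except Exception as e:  # ideally the specific connection error
        print(f"connection lost ({e}), retrying in 60 s")
        time.sleep(60)

all_chunks_together = pd.concat(together_list)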

Lists not resetting after loop has been executed

I am trying to create a loop that resets all the data inside of it on every iteration, but somewhere the data is not resetting even though I am initializing the values inside the loop.
Here is my code:
import time, sys, string
import serial

ser = serial.Serial("/dev/ttyUSB0", baudrate=9600,
                    parity=serial.PARITY_NONE,
                    stopbits=serial.STOPBITS_ONE,
                    bytesize=serial.EIGHTBITS
                    )
print(ser.name)
ser.write(b'U')
#print ('test1')
time.sleep(2)
b = 0
while (b <= 2):
    time.sleep(0.25)
    ser.write(b'R')
    #print ('test2')
    d = []  # establishes an empty list to iterate through all the incoming data
    i = 0   # beginning of the buffer
    time.sleep(0.5)
    while (i <= 11):
        d.append(str(ord(ser.read())))  # adding each incoming byte to the list
        #print (d[i])
        #print ('^ is the ' + str(i) + 'th term')  # these two lines help establish where the useful information begins for the lidar
        i += 1
    # establishing a better way to write the data out / gd = good distance
    gd = []
    a = 0
    for a in range(8):
        gd.append(str(chr(int(d[a + 4]))))
    print(gd)
    print(gd[0] + gd[1] + gd[2] + gd[3] + gd[4] + gd[5] + gd[6] + gd[7] + ' mm')
    ser.flush()
    b += 1
The reason I do d[a+4] is that the first few bytes of information are nonsense, so I need to start from that byte every time.
The program works on the first loop and prints correctly; however, on subsequent loops it starts from different points when I print the values again. I am unsure if I am missing something and would love a second opinion.
My outputs are:
D = 0609 mm
\r:\rD = 0 mm
mm \r:\rD mm
so it's wrapping around the lists I'm creating somewhere, and since it seems to be picking up the string from the print statement, I wonder if that has something to do with the issue.
I cannot add a comment, so here is a little suggestion. At the end of your main loop, after
b+=1
add the following line
d, gd = [], []
if they are no longer useful after the loop has ended (which I suspect is the case). It'll reset any values both variables hold, and you'll be able to start from empty lists again.
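As a sketch of where that line goes, the skeleton of your outer loop would be (the serial reading is elided here):
b = 0
while b <= 2:
    d = []   # raw values read back from the sensor this pass
    gd = []  # decoded distance characters
    # ... read from the serial port and fill d and gd as in your code ...
    b += 1
    d, gd = [], []  # suggested reset: start the next pass from empty lists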

Results won't be written into the txt file

Even though I'm calling file.flush() at the end, no data is written to the txt file.
# some essential code to connect to the server
while True:
    try:
        # do some stuff
        try:
            gaze_positions = filtered_surface['gaze_on_srf']
            for gaze_pos in gaze_positions:
                norm_gp_x, norm_gp_y = gaze_pos['norm_pos']
                if (0 <= norm_gp_x <= 1 and 0 <= norm_gp_y <= 1):
                    with open('/the/path/to/the/file.txt', 'w') as file:
                        file.write('[' + norm_gp_x + ', ' + norm_gp_y + ']')
                        file.flush()
                    print(norm_gp_x, norm_gp_y)
        except:
            pass
    except KeyboardInterrupt:
        break
What am I doing wrong? Obviously I'm missing something, but I can't figure out what it is. Another odd thing: there's no output at all from print(norm_gp_x, norm_gp_y). If I comment out the with open ... block, I do get the output.
got it:
First
if (0 <= norm_gp_x <= 1 and 0 <= norm_gp_y <= 1):
then:
file.write('[' + norm_gp_x + ', ' + norm_gp_y + ']')
So you're concatenating strings and numbers. This raises an exception, and since you used a universal except: pass construct, the code silently skips every iteration (note that this bare except also catches the KeyboardInterrupt you're trying to catch at a higher level, so that doesn't work either).
Never use that construct. If you want to protect against a specific exception (e.g. IOError), use:
try:
    ...  # the I/O you want to protect
except IOError as e:
    print("Warning: got exception {}".format(e))
so your exception handling is 1) focused and 2) verbose. Only ignore exceptions once you know which ones you actually want to ignore, and ignore them selectively (see Catch multiple exceptions in one line (except block)).
So the fix for your write is:
file.write('[{},{}]'.format(norm_gp_x, norm_gp_y))
or using the list representation since you're trying to mimic it:
file.write(str([norm_gp_x, norm_gp_y]))
Aside: your other issue is that you should use append mode
with open('/the/path/to/the/file.txt', 'a') as file:
or move your open statement before the loop; otherwise you'll only get the last line in the file (a classic), since w mode truncates the file on opening. And you can drop flush, since exiting the with context closes the file.
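Putting those pieces together, one way to write the inner block is a small helper like this. It is a sketch: the function name is mine, the trailing newline is an addition so each sample lands on its own line, and the (KeyError, IOError) tuple is a guess at the errors worth catching.
def log_gaze_positions(filtered_surface, path='/the/path/to/the/file.txt'):
    try:
        for gaze_pos in filtered_surface['gaze_on_srf']:
            norm_gp_x, norm_gp_y = gaze_pos['norm_pos']
            if 0 <= norm_gp_x <= 1 and 0 <= norm_gp_y <= 1:
                # append mode keeps earlier samples; the context manager
                # closes (and therefore flushes) the file on exit
                with open(path, 'a') as file:
                    file.write('[{}, {}]\n'.format(norm_gp_x, norm_gp_y))
                print(norm_gp_x, norm_gp_y)
    except (KeyError, IOError) as e:
        print("Warning: got exception {}".format(e))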

Stuff isn't appending to my list

I'm trying to create a simulation where there are two printers and I find the average wait time for each. I'm using a class for the printer and the task in my program. Basically, I'm adding the wait time from each simulation to a list and calculating the average time. My issue is that I'm getting a division by 0 error, so nothing is being appended. When I try it with 1 printer (which is essentially the same thing) I have no issues. Here is the code I have for the second printer. I'm using a queue for this.
if printers == 2:
    for currentSecond in range(numSeconds):
        if newPrintTask():
            task = Task(currentSecond, minSize, maxSize)
            printQueue.enqueue(task)
        if (not labPrinter1.busy()) and (not labPrinter2.busy()) and \
                (not printQueue.is_empty()):
            nexttask = printQueue.dequeue()
            waitingtimes.append(nexttask.waitTime(currentSecond))
            labPrinter1.startNext(nexttask)
        elif (not labPrinter1.busy()) and (labPrinter2.busy()) and \
                (not printQueue.is_empty()):
            nexttask = printQueue.dequeue()
            waitingtimes.append(nexttask.waitTime(currentSecond))
            labPrinter1.startNext(nexttask)
        elif (not labPrinter2.busy()) and (labPrinter1.busy()) and \
                (not printQueue.is_empty()):
            nexttask = printQueue.dequeue()
            waitingtimes.append(nexttask.waitTime(currentSecond))
            labPrinter2.startNext(nexttask)
        labPrinter1.tick()
        labPrinter2.tick()
        averageWait = sum(waitingtimes) / len(waitingtimes)
        outfile.write("Average Wait %6.2f secs %3d tasks remaining."
                      % (averageWait, printQueue.size()))
Any assistance would be great!
Edit: I should mention that this happens no matter the values. I could have a page range of 99-100 and a PPM of 1 and I still get a division by zero.
I think your problem stems from an empty waitingtimes on the first iteration or so. If there is no print job in the queue, and there has never been a waiting time inserted, you are going to reach the bottom of the loop with waitingtimes==[] (empty), and then do:
sum(waitingtimes) / len(waitingtimes)
Which will be
sum([]) / len([])
Which is
0 / 0
The easiest way to deal with this would just be to check for it, or catch it:
if not waitingtimes:
    averageWait = 0
else:
    averageWait = sum(waitingtimes) / len(waitingtimes)
Or:
try:
    averageWait = sum(waitingtimes) / len(waitingtimes)
except ZeroDivisionError:
    averageWait = 0
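As a side note (not part of the original answer), the standard library's statistics.mean (Python 3.4+) raises StatisticsError rather than ZeroDivisionError on empty input, so the same try/except pattern works with it:
import statistics

try:
    averageWait = statistics.mean(waitingtimes)
except statistics.StatisticsError:  # raised when waitingtimes is empty
    averageWait = 0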

Python: how to interrupt, then return to while loop, without goto?

I'm running a simple PID control program in Python: basically an infinite while loop that reads from sensors, calculates the appropriate control signal, and outputs diagnostic info to the terminal.
However, sometimes while watching the diagnostic info, I'd like to change the PID coefficients - which are essentially some constants used by the loop - by breaking from the loop, accepting user input, then returning to the very same loop. I'd like to do this an arbitrary number of times.
With 'goto' this would be simple and just what I want. Can someone give me some Python pseudo-code to do this? I can't really think of how to do it. I can interrupt the loop with a Ctrl+C exception handler, but then I can't get back to the main loop.
There must be some very simple way to do this but I can't think of it. Thoughts?
Snippets from my code:
while True:
    t0 = get_temp_deg_c(thermocouple1)
    print "Hose Temperature = " + str(t0) + " deg C"
    t1 = get_temp_deg_c(thermocouple2)
    print "Valve Temperature = " + str(t1) + " deg C"
    # write temps to file
    fi.write(str(t0))
    fi.write(" " + str(t1) + "\n")
    error = setpoint - t0
    print "Setpoint = " + str(setpoint) + " deg C"
    print "Error = " + str(error) + " deg C"
    percent_error = error/setpoint*100
    print "Percent error = " + str(percent_error) + " %"
    duty_out = p.GenOut(percent_error)
    print "PID controller duty output: " + str(duty_out) + " %"
    # clamp the output
    if duty_out > 100:
        duty_out = 100
    if duty_out < 0:
        duty_out = 0
    PWM.set_duty_cycle(PWM_pin, duty_out)
    # do we need to increment the setpoint?
    if( (setpoint - setpoint_precision) ... # omitted logic here
    # Here we return to the top
As long as you're okay with restarting "from the top" after each interrupt (as opposed to returning to the exact point in the loop when the signal was raised, which is a much harder problem):
while True:
    try:
        controller.main_loop()
    except KeyboardInterrupt:
        controller.set_coefficients()
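A sketch of what that controller object could look like, keeping the main_loop / set_coefficients names from above; the attribute names and prompts are mine, and the loop body is left as a placeholder for your existing code.
class PIDController:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd

    def main_loop(self):
        while True:
            # your existing loop body: read the thermocouples, compute the
            # duty cycle from self.kp / self.ki / self.kd, drive the PWM pin
            ...

    def set_coefficients(self):
        # runs after Ctrl+C; when it returns, the outer loop re-enters main_loop()
        self.kp = float(input("kp? "))  # use raw_input() on Python 2
        self.ki = float(input("ki? "))
        self.kd = float(input("kd? "))

controller = PIDController(1.0, 0.1, 0.05)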
In case you don't want a separate thread for IO, generators may be used to preserve the state of your loop across KeyboardInterrupts.
some_parameter = 1

def do_pid_stuff():
    while True:
        sensor_data1 = 'data'
        sensor_data2 = 'data'
        sensor_data3 = 'data'
        yield 'based on sensor_data1 ' * some_parameter
        yield 'based on sensor_data2 ' * some_parameter
        yield 'based on sensor_data3 ' * some_parameter

stuff = do_pid_stuff()
while True:
    try:
        for control_signal in stuff:
            print(control_signal)
    except KeyboardInterrupt:
        some_parameter = int(input())
So the main loop will continue with the new parameters from the last executed yield. This would, however, require rewriting your loop. It should probably be split into a generator that gives you sensor data and a function that actually does stuff based on the sensor values, as sketched below.
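A rough sketch of that split, reusing the names from your snippet (get_temp_deg_c, thermocouple1/2, p, PWM, setpoint); note that, as with the toy example above, an interrupt that lands while the generator itself is executing will close it.
def read_sensors():
    # generator: yields one (hose_temp, valve_temp) pair per cycle
    while True:
        yield get_temp_deg_c(thermocouple1), get_temp_deg_c(thermocouple2)

def compute_duty(t0, setpoint, pid):
    # plain function: turns a reading into a clamped duty cycle
    percent_error = (setpoint - t0) / setpoint * 100
    return max(0, min(100, pid.GenOut(percent_error)))

readings = read_sensors()
while True:
    try:
        for t0, t1 in readings:
            PWM.set_duty_cycle(PWM_pin, compute_duty(t0, setpoint, p))
    except KeyboardInterrupt:
        setpoint = float(input("new setpoint? "))  # or update the PID gains instead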
You already have a few ways to interact with your loop; I'd like to point out another one: select(). Using select() with a timeout, you can wait for user input; if none arrives before the timeout, you fall back into the normal loop and interact with your hardware there. A minimal sketch follows the notes below.
Notes:
Here's the documentation for select, but consider the warning at the top and look at the selectors module instead.
This solution, like the one using a keyboard interrupt, will stop interacting with the hardware while parameters are being changed. If that isn't acceptable, using a background thread is necessary.
Using select() is more generally applicable; you could also wait for network traffic, for example.
Your hardware will not be serviced as often as possible, but at a fixed interval. On the upside, you also won't be using a full CPU core.
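Here is the minimal sketch mentioned above. It assumes a Unix-like system (select() on sys.stdin does not work on Windows); the coefficient names and the 0.5 s timeout are made up, and the loop body is a placeholder for your existing code.
import select
import sys

kp, ki, kd = 1.0, 0.1, 0.05  # current PID coefficients (made-up values)

while True:
    # ... read sensors, compute and apply the control signal, print diagnostics ...

    # wait up to 0.5 s for a line on stdin; fall through if nothing arrives
    readable, _, _ = select.select([sys.stdin], [], [], 0.5)
    if readable:
        line = sys.stdin.readline().strip()
        try:
            kp, ki, kd = (float(v) for v in line.split())
            print("new coefficients:", kp, ki, kd)
        except ValueError:
            print("expected three numbers, e.g. '1.0 0.1 0.05'")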
