I have a Python script that needs to be running all the time. Sometimes it can run for a whole day, sometimes it only runs for an hour or so.
import time
import RPi.GPIO as GPIO
import fdb

con = fdb.connect(dsn='10.100.2.213/3050:/home/trainee2/Desktop/sms', user='sysdba', password='trainee')  # connect to the database
cur = con.cursor()  # initialize the cursor

pinnen = [21, 20, 25, 24, 23, 18, 26, 19, 13, 6, 27, 17]  # the GPIO pins we use; they are the same on all Pis and must stay in this order
status = [0] * 12        # here we'll save the status of each pin
ids = []                 # the person IDs, one per rack slot
controlepin = [2] * 12   # the same as the status array, but one step behind, so we know where a change happened and can send it

def getPersonIDs():  # here we get the IDs
    cur.execute("SELECT first 12 A.F_US_ID FROM T_RACK_SLOTS a order by F_RS_ID;")  # this is where the query changes for each Pi
    for (person_id,) in cur:
        ids.append(person_id)  # append all the IDs to the list

GPIO.setmode(GPIO.BCM)  # initialize GPIO
getPersonIDs()  # get the IDs we need
for p in range(12):
    GPIO.setup(pinnen[p], GPIO.IN)  # set up all the pins to read data

while True:  # this will repeat endlessly
    for e in range(12):
        if ids[e]:  # only if there is an ID for this slot (only necessary on Pi 3 when there are not enough users)
            status[e] = GPIO.input(pinnen[e])  # read the GPIO: 0 is dark, 1 is light
            if status[e] != controlepin[e]:  # if something changed
                id = ids[e]
                if id != '':  # if the ID is not empty
                    if status[e] == 1:  # no cell phone present
                        # careful: status 1 sends 0 and status 0 sends 1, so it makes sense in the database
                        cur.execute("INSERT INTO T_ENTRIES (F_US_ID, F_EN_STATE) values (?, 0)", (id,))
                    else:
                        cur.execute("INSERT INTO T_ENTRIES (F_US_ID, F_EN_STATE) values (?, 1)", (id,))
                    con.commit()  # commit the query
                controlepin[e] = status[e]  # save the change so we don't spam the database
    time.sleep(1)  # sleep for one second so the endless loop doesn't hog the CPU
The script is used for a cellphone rack: through LDRs I can detect whether a cellphone is present, and I then send that data to a Firebird database. The scripts run on my Raspberry Pis.
Could it be that the script just stops if the connection is lost for a few seconds? Is there a way to make sure the queries are always sent?
Could it be that the script just stops if the connection is lost for a few seconds?
More than that: the script IS stopping for every Firebird command, including con.commit(), and it only continues when Firebird has processed the command/query.
So, not knowing much about the Python libraries involved, I would still give you some advice.
1) Use parameters and prepared queries as much as you can.
if status[e] == 1:  # if there is no cell phone present
    # careful: status 1 sends 0 and status 0 sends 1, so it makes sense in the database
    cur.execute("INSERT INTO T_ENTRIES (F_US_ID, F_EN_STATE) values (?, 0)", (id,))
else:
    cur.execute("INSERT INTO T_ENTRIES (F_US_ID, F_EN_STATE) values (?, 1)", (id,))
That is not the best idea: you force the Firebird engine to parse the query text and build the query again and again, which is a waste of time and resources.
The correct approach is to make a single INSERT INTO T_ENTRIES (F_US_ID, F_EN_STATE) VALUES (?, ?) query, prepare it once before the loop, and then run the already prepared query many times, changing only the parameters.
Granted, I do not know how to prepare queries in the Python library, but I think you will find examples.
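With the fdb driver, a minimal sketch of that might look like this (I believe Cursor.prep() is the relevant call, and send_state() is just a hypothetical helper; check the fdb documentation for your version):

import fdb

con = fdb.connect(dsn='10.100.2.213/3050:/home/trainee2/Desktop/sms',
                  user='sysdba', password='trainee')
cur = con.cursor()

# prepare once, before the loop
insert_entry = cur.prep("INSERT INTO T_ENTRIES (F_US_ID, F_EN_STATE) VALUES (?, ?)")

def send_state(person_id, state):
    # inside the polling loop only the parameters change
    cur.execute(insert_entry, (person_id, state))
    con.commit()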
2) Do not use an SQL server to save every single data element you get. That is a known malpractice that people were already being warned against a decade ago, especially with a lazy, versioned engine like InterBase/Firebird.
The thing is, with every statement you run, Firebird checks some internal statistics and sometimes decides the time has come to do housekeeping.
For example, your SELECT statement can trigger garbage collection: Firebird might stop to scan the whole table, find orphaned obsolete row versions and clear them away. Your INSERT statement can likewise trigger index recreation: if Firebird decides the B-tree of an index has become too lopsided, it drops it and builds a new balanced tree, reading the whole table (and yes, reading the table may in turn provoke GC on top of the tree rebuild).
More than that, let us step away from Firebird specifics: what would you do if Firebird crashes? It is a program, and like every program it may have bugs. Or suppose you run out of disk space and Firebird can no longer insert anything into the database - where would your hardware sensor data end up then? Wouldn't it simply be lost?
See #7 at http://www.ibase.ru/dontdoit/ - "Do not issue commit after every single row". #24 at https://www.ibase.ru/45-ways-to-improve-firebird-performance-russian/ suggests committing packets of about a thousand rows as a sweet spot between too many transactions and too much uncommitted data. Also check #9, #10, #16, #17 and #44 at the last link. Those articles are in Russian; http://www.translate.ru usually handles them better than Google or Microsoft translation, especially if you set the vocabulary to computers.
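In terms of the script above, that "commit in packets" advice would look roughly like this (a sketch; incoming_rows() is a hypothetical source of readings and insert_entry is the prepared statement from the earlier sketch):

BATCH = 1000          # roughly the "packet" size the article suggests
pending = 0
for person_id, state in incoming_rows():          # incoming_rows() is hypothetical
    cur.execute(insert_entry, (person_id, state)) # the prepared INSERT from above
    pending += 1
    if pending >= BATCH:
        con.commit()                              # one commit per packet, not per row
        pending = 0
if pending:
    con.commit()                                  # flush the last partial packet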
The overall structure of your software, I believe, has to be split into two services:
Service #1: query data from the hardware sensors and save it to a plain, dumb binary flat file. Since this is the most simplistic format there can be, performance and reliability are maximized.
Service #2: take the finished binary files and insert them into the SQL database in bulk-insert mode.
So, for example, you set a threshold of 10240 rows.
Service #1 creates the file "Phones_1_Processing" with a well-defined binary format. It also creates and opens a "Phones_2_Processing" file, but keeps it at zero length. Then it keeps adding rows to "Phones_1_Processing" for a while. It might also flush the OS file buffers after every row, or every 100 rows, or whatever gives the best balance between reliability and performance.
When the threshold is met, service #1 switches to recording incoming data into the already created and opened "Phones_2_Processing" file. This can be done instantly, by changing one file-handle variable in your program.
Then service #1 closes and renames "Phones_1_Processing" to "Phones_1_Complete".
Then service #1 creates a new empty file, "Phones_3_Processing", and keeps it open at zero length. Now it is back in state 1: ready to instantly switch its recording to the new file when the current one is full.
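A rough Python sketch of that rotation scheme, where the record layout, file names and threshold are placeholders only:

import os
import struct

RECORD = struct.Struct('<qii')   # e.g. timestamp, slot id, state
THRESHOLD = 10240                # rows per file

class RotatingWriter:
    def __init__(self, prefix='Phones'):
        self.prefix = prefix
        self.index = 1
        self.rows = 0
        self.current = open(self._name(self.index), 'wb')    # file being written
        self.next = open(self._name(self.index + 1), 'wb')   # pre-created, zero length

    def _name(self, i):
        return '%s_%d_Processing' % (self.prefix, i)

    def write(self, timestamp, slot, state):
        self.current.write(RECORD.pack(timestamp, slot, state))
        self.current.flush()         # or flush every N rows, whatever balances speed and safety
        self.rows += 1
        if self.rows >= THRESHOLD:
            self._rotate()

    def _rotate(self):
        self.current.close()
        os.rename(self._name(self.index),
                  '%s_%d_Complete' % (self.prefix, self.index))
        self.current, self.rows = self.next, 0
        self.index += 1
        self.next = open(self._name(self.index + 1), 'wb')   # keep one spare file ready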
The key point here is that this service should only do the simplest and fastest operations possible, since any random delay means your realtime-generated data is lost and can never be recovered. By the way, how do you disable garbage collection in Python so that it does not suddenly "stop the world"? Okay, I am only half joking, but the point stands: GC is a random, non-deterministic slowdown of your system, and it is badly compatible with regular, unbuffered hardware data acquisition. The primary acquisition of unbuffered data is best done by the simplest, most predictable service you can write; GC is a good global optimization, but the price is that it tends to generate sudden local no-service spikes.
While all this happens in service #1, we have another one.
Service #2 keeps monitoring the folder where you save the primary data. It subscribes to "some file was renamed" events and ignores the others. Which facility to use? Ask a Linux guru: inotify, dnotify, FAM, Gamin - anything of that kind that suits you.
When service #2 is woken up with "a file was renamed and xxxx is its new name", it checks whether the new file name ends with "_Complete". If it does not, it was a bogus event.
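One way to wire that up from Python is the third-party watchdog package (a sketch; inotify-based tools work similarly, and the folder path and the bulk_insert() helper are placeholders for the insert step described next):

import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class CompletedFileHandler(FileSystemEventHandler):
    def on_moved(self, event):                       # fired when a file is renamed
        if event.dest_path.endswith('_Complete'):
            bulk_insert(event.dest_path)             # hand the file to the insert step

observer = Observer()
observer.schedule(CompletedFileHandler(), '/data/phones', recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)                                # the handler runs on the observer thread
finally:
    observer.stop()
    observer.join()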
When the event is for a new "Phone_...._Complete" file, it is time to bulk-insert it into Firebird. Google for "Firebird bulk insert", for example http://www.firebirdfaq.org/faq209/
Service #2 renames "Phone_1_Complete" to "Phone_1_Inserting", so the state of the data packet is persisted (as the file name).
Service #2 attaches this file to the Firebird database as an EXTERNAL TABLE.
Service #2 proceeds with the bulk insert as described above: it deactivates the indexes, opens a no-auto-undo transaction and keeps pumping rows from the external table into the destination table. If the service or the server crashes here, you still have a consistent state: the transaction is rolled back, and the file name shows that the packet is still pending insertion.
When all the rows are pumped (frankly, if Python can write the binary file in the right layout, the transfer itself is a single INSERT ... SELECT), you commit the transaction, drop the external table (detaching Firebird from the file), rename "Phone_1_Inserting" to "Phone_1_Done" (persisting its changed state), and then delete it.
Then service #2 checks whether new "_Complete" files are already waiting in the folder; if not, it goes back to step 1 and sleeps until the next FAM event wakes it.
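The Firebird side of that could look roughly like this with fdb (a sketch; it assumes ExternalFileAccess is enabled in firebird.conf and that service #1 wrote fixed-width records matching the illustrative CHAR columns):

cur.execute("""
    CREATE TABLE EXT_ENTRIES EXTERNAL FILE '/data/phones/Phone_1_Inserting' (
        F_US_ID    CHAR(8),
        F_EN_STATE CHAR(1)
    )""")
con.commit()

cur.execute("""
    INSERT INTO T_ENTRIES (F_US_ID, F_EN_STATE)
    SELECT F_US_ID, F_EN_STATE FROM EXT_ENTRIES""")
con.commit()

cur.execute("DROP TABLE EXT_ENTRIES")   # detach Firebird from the external file
con.commit()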
All in all, you should DECOUPLE your services.
https://en.wikipedia.org/wiki/Coupling_%28computer_programming%29
One service's sole responsibility is to be ready, without even a tiny pause, to receive and save the incoming data flow; the other service's responsibility is to transfer the saved data into the SQL database for ease of processing, and for that one it is no big deal if it is sometimes delayed by a few seconds, as long as it loses no data in the end.
Related
I have a Python script which reads files into tables in MySQL. The program runs automatically every now and then. However, I'm afraid of two things:
There might come a time when the program stops running because it can't connect to the MySQL server. A lot of processes depend on these tables, so if the tables are not up to date the rest of my process will also stop working.
A file might sneak into the process which does not have the expected content. After the script has finished running, every value of column X must have 12 rows; if it does not, the files did not have the right content in them.
My question is: is there something I can do to tackle this before it happens? For example, send an e-mail to myself so I'm notified if the connection fails, run the program on another server, or get an alert if a certain value does NOT have 12 rows?
I'm very eager to know how you handle these situations.
I have a very simple connection made like this:

import mysql.connector

mydb = mysql.connector.connect(
    host='localhost',
    user='root',
    passwd='*****.',
    database='my_database'
)
The event you are talking about is very unlikely to happen; the only situation where I could see it happening is when your database server runs out of memory. For that, you can set aside a couple of minutes every two or three days to check how much memory is left on the server.
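That said, if you want to be notified automatically rather than checking by hand, a rough sketch along these lines is possible (the SMTP host, addresses, table name and the 12-row check query are placeholders, not part of your setup):

import smtplib
from email.message import EmailMessage

import mysql.connector
from mysql.connector import Error

def notify(subject, body):
    msg = EmailMessage()
    msg['Subject'] = subject
    msg['From'] = 'loader@example.com'
    msg['To'] = 'me@example.com'
    msg.set_content(body)
    with smtplib.SMTP('localhost') as smtp:
        smtp.send_message(msg)

try:
    mydb = mysql.connector.connect(host='localhost', user='root',
                                   passwd='*****', database='my_database')
    cur = mydb.cursor()
    # ... load the files here ...
    cur.execute("SELECT x, COUNT(*) FROM my_table GROUP BY x HAVING COUNT(*) <> 12")
    bad = cur.fetchall()
    if bad:
        notify('Load check failed', 'Values of X without exactly 12 rows: %r' % (bad,))
except Error as exc:
    notify('MySQL connection or load failed', str(exc))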
I need to know the status of a data transfer job (a flow inside a process group): whether it has completed, failed, or is still running. I want to do this using nipyapi for a web application.
I have a process group in NiFi, inside which I have the NiFi flow. I am scheduling the process group using nipyapi:
nipyapi.canvas.schedule_process_group(id, True)
Now I want to monitor the status of the process group using nipyapi. By status I specifically want to know whether it's still running, failed or completed.
NiFi does not really have a concept of a job that can be checked for completion. Once you start all the components in a process group, they run indefinitely until someone stops them.
The concept of being "done" or "complete" really depends on what your data flow is doing. For example, if your first processor is, say, GetFile, then once that processor is running it will monitor the directory for files until someone stops it. While it is running, it has no way of knowing whether there will ever be more files, or whether it has already seen every file that will ever be dropped into the directory. Only whoever, or whatever, is putting the files there knows that.
To determine failure you need to do something in your data flow to capture the failures. Most processors have a failure relationship, so you would need to route these somewhere and take some action to track the failures.
I think I found a good solution for this problem. This is how I solved it.
I have a MySQL database which keeps track of all the files that are to be transferred. The table has two columns: one for the filename (let's say it is unique) and a flag for whether the file has been transferred (true or false).
We have three sections of processors.
First: listSFTP and putMySQL
Second: getSFTP and putHDFS
Third: listHDFS and putMySQL
The first section is responsible for listing the files on SFTP. It gets all the files and, for each one, adds a row to MySQL with the filename 'X' and 'false' for not transferred yet:
insert into NifiTest.Jobs values('${filename}', 0);
The third section does the same thing for HDFS: it either inserts with transferred = true, or updates the row if one already exists with the same filename:
insert into NifiTest.Jobs values('${filename}', 1) on duplicate key update TRANSFERRED = 1;
The second section does nothing but send the file to HDFS.
Now, to check when the data transfer job is finished:
You start the entire process group together. When you query the database and every row has TRANSFERRED = 1, the job is finished.
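A sketch of that completion check from Python might look like this (connection details and the process group name are placeholders; the nipyapi calls are the same ones used above):

import time

import mysql.connector
import nipyapi

conn = mysql.connector.connect(host='localhost', user='nifi',
                               passwd='*****', database='NifiTest')

def job_finished():
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM NifiTest.Jobs WHERE TRANSFERRED = 0")
    (remaining,) = cur.fetchone()
    cur.close()
    conn.commit()          # end the read transaction so the next poll sees fresh rows
    return remaining == 0

pg = nipyapi.canvas.get_process_group('my-transfer-flow')
nipyapi.canvas.schedule_process_group(pg.id, True)       # start it, as in the question
while not job_finished():
    time.sleep(10)
nipyapi.canvas.schedule_process_group(pg.id, False)      # job done, stop the group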
It might feel like there are cases where this could fail, but when you carefully think through all the situations you will see that it handles them.
Let me know if I am wrong or if this solution can be improved.
You can achieve this without a database by using the NiFi process group variable registry APIs. Create a custom processor which, as the last processor in the flow, sets a variable in the process group, say is_complete = true. Then you can monitor that variable with nipyapi.
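A rough sketch of the monitoring side, assuming nipyapi's variable registry helpers (the group name and variable name are placeholders, and the exact attribute path on the returned object may differ between nipyapi versions):

import time
import nipyapi

pg = nipyapi.canvas.get_process_group('my-transfer-flow')
while True:
    registry = nipyapi.canvas.get_variable_registry(pg)
    variables = {v.variable.name: v.variable.value
                 for v in registry.variable_registry.variables}
    if variables.get('is_complete') == 'true':
        print('flow finished')
        break
    time.sleep(10)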
I have developed a basic proxy tester in Python. Proxy IPs and ports, as well as their date_of_last_test (e.g. 31/12/2011 10:10:10) and result_of_last_test (OK or KO), are stored in a single SQLite table. (I realize I could store far more detail about the test results and keep a history/statistics, but this simple model suits my needs.)
Here is the simplified code of the tester's main loop, where I loop over the proxies and update their status:
while True:
    # STEP 1: select
    myCursor.execute("SELECT * from proxy ORDER BY date_of_last_test ASC;")
    row = myCursor.fetchone()

    # STEP 2: update
    if isProxyWorking(row['ip'], row['port']):  # this test can last a few seconds
        updateRow(row['ip'], row['port'], 'OK')
    else:
        updateRow(row['ip'], row['port'], 'KO')
My code works fine when run as a single process. Now, I would like to be able to run many processes of the program, using the same SQLite database file.
The problem with the current code is the lack of a locking mechanism that would prevent several processes from testing the same proxy.
What would be the cleanest way to put a lock on the row at STEP 1 / SELECT time, so that the next process doing a SELECT gets the next row?
In other words, I'd like to avoid the following situation:
Let's say it's 10 PM, and the DB contains 2 proxies:
Proxy A tested for the last time at 8 PM and proxy B tested at 9 PM.
I start two processes of the tester to update their statuses:
10:00 - Process 1 gets the "oldest" proxy to test: A
10:01 - Process 2 gets the "oldest" proxy to test: !!! A !!! (here I'd like Process 2 to get proxy B because A is already being tested, though not yet updated in the DB)
10:10 - Testing of A by Process 1 is over, its status is updated in the DB
10:11 - Testing of A by Process 2 is over, its status is updated (!!! AGAIN !!!) in the DB
There is no actual error or exception in this case, just a waste of time that I want to avoid.
SQLite only allows one process to update anything in the database at a time. From the FAQ:
Multiple processes can have the same database open at the same time. Multiple processes can be doing a SELECT at the same time. But only one process can be making changes to the database at any moment in time.
and
When SQLite tries to access a file that is locked by another process, the default behavior is to return SQLITE_BUSY. You can adjust this behavior from C code using the sqlite3_busy_handler() or sqlite3_busy_timeout() API functions.
So if there are only a few updates this will work; otherwise you need to change to a more capable database.
So there is only one lock, and it is on the whole database.
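For what it's worth, in Python's sqlite3 module the busy timeout mentioned in the FAQ maps to the timeout argument of connect(); a small sketch, with the file name as a placeholder:

import sqlite3

# `timeout` is how long to wait on a locked database before raising "database is locked"
conn = sqlite3.connect('proxies.db', timeout=10.0)
conn.row_factory = sqlite3.Row   # so row['ip'] / row['port'] work as in the question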
In Python, is there a way to get notified that a specific table in a MySQL database has changed?
It's theoretically possible but I wouldn't recommend it:
Essentially you would have a trigger on the table that calls a UDF, which communicates with your Python app in some way.
Pitfalls include: what happens if there's an error?
What if it blocks? Anything that happens inside a trigger should ideally be near-instant.
What if it's inside a transaction that gets rolled back?
I'm sure there are many other problems that I haven't thought of as well.
A better way, if possible, is to have your data access layer notify the rest of your app. If you're looking to detect when a program outside your control modifies the database, then you may be out of luck.
Another way that's less ideal, but in my opinion still better than calling another program from within a trigger, is to have triggers maintain some kind of "LastModified" table. Then in your app you just check whether that datetime is greater than it was when you last checked.
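A rough sketch of that polling approach (it assumes a last_modified table kept up to date by your triggers; all names and credentials are placeholders):

import time
import mysql.connector

conn = mysql.connector.connect(host='localhost', user='app',
                               passwd='*****', database='mydb')
last_seen = None
while True:
    cur = conn.cursor()
    cur.execute("SELECT changed_at FROM last_modified WHERE table_name = %s",
                ('orders',))
    row = cur.fetchone()
    cur.close()
    conn.commit()   # end the read transaction so the next poll sees fresh data
    if row and (last_seen is None or row[0] > last_seen):
        last_seen = row[0]
        print('table changed at', last_seen)   # react to the change here
    time.sleep(5)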
If by "changed" you mean that a row has been updated, deleted or inserted, then there is a workaround.
You can create a trigger in MySQL:
DELIMITER $$

CREATE TRIGGER ai_tablename_each AFTER INSERT ON tablename FOR EACH ROW
BEGIN
  DECLARE exec_result INTEGER;
  SET exec_result = sys_exec(CONCAT('my_cmd '
                                   ,'insert on table tablename '
                                   ,',id=', NEW.id));
  IF exec_result = 0 THEN
    INSERT INTO table_external_result (id, tablename, result)
    VALUES (null, 'tablename', 0);
  END IF;
END$$

DELIMITER ;
This will call the executable script my_cmd on the server (see sys_exec for more info) with some parameters.
my_cmd can be a Python program or anything else you can execute from the command line using the user account that MySQL runs under.
You'd have to create a trigger for every change (INSERT/UPDATE/DELETE) that you want your program to be notified of, and for each table.
You'd also need to find some way of linking your running Python program to the command-line utility that you call via sys_exec().
Not recommended
This sort of behaviour is not recommended because it is likely to:
slow MySQL down;
make it hang/timeout if my_cmd does not return;
if you are using transactions, you will be notified before the transaction ends;
I'm not sure if you'll get notified of a delete if the transaction rolls back;
It's an ugly design
Links
sys_exec: http://www.mysqludf.org/lib_mysqludf_sys/index.php
Yes, this may not be standard SQL, but PostgreSQL supports it with LISTEN and NOTIFY since around version 9.x:
http://www.postgresql.org/docs/9.0/static/sql-notify.html
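With psycopg2 the listening side looks roughly like this (a sketch; it assumes something on the server, e.g. a trigger, issues NOTIFY on the channel, and the connection string and channel name are placeholders):

import select
import psycopg2
import psycopg2.extensions

conn = psycopg2.connect('dbname=mydb user=app')
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()
cur.execute('LISTEN table_changed;')

while True:
    if select.select([conn], [], [], 5) == ([], [], []):
        continue                        # timed out, nothing happened
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        print('got NOTIFY:', notify.channel, notify.payload)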
Not possible with standard SQL functionality.
It might not be a bad idea to try using a network monitor instead of a MySQL trigger, extending a network monitor like this one:
http://sourceforge.net/projects/pynetmontool/
Then write a script that waits for activity on port 3306 (or whatever port your MySQL server listens on) and checks the database when the network activity meets certain filter conditions.
It's a very high-level idea that you'll have to research further, but you avoid the DB trigger problems and you won't have to write a cron job that runs every second.
I use this Python code to output the number of Things every 5 seconds:
import time
# Thing is assumed to be a Django model imported from the app's models

def my_count():
    while True:
        print "Number of Things: %d" % Thing.objects.count()
        time.sleep(5)

my_count()
If another process creates a new Thing while my_count() is running, my_count() keeps printing the same number, even though it has now changed in the database. (But if I kill my_count() and restart it, it displays the new Thing count.)
Things are stored in a MySQL InnoDB database, and this code runs on Ubuntu.
Why won't my_count() display the new Thing.objects.count() without being restarted?
Because the Python DB API is in AUTOCOMMIT=OFF mode by default, and (at least for MySQLdb) at the REPEATABLE READ isolation level. This means that behind the scenes there is an ongoing database transaction (InnoDB is a transactional engine) in which the first access to a given row (or maybe even table, I'm not sure) fixes the "view" of that resource for the rest of the transaction.
To prevent this behaviour, you have to 'refresh' the current transaction:
import time
from django.db import transaction

@transaction.autocommit
def my_count():
    while True:
        transaction.commit()
        print "Number of Things: %d" % Thing.objects.count()
        time.sleep(5)
Note that the transaction.autocommit decorator is only for entering transaction management mode (this could also be done manually using the transaction.enter_transaction_management / transaction.leave_transaction_management functions).
One more thing to be aware of: Django's autocommit is not the same autocommit you have in the database - it is completely independent. But that is out of scope for this question.
Edited on 22/01/2012
Here is a "twin answer" to a similar question.