Why are duplicate UUIDs being generated from Python on GCP?

I am facing this weird issue. Some (5%) of my celery tasks are silently being dropped.
Doing some digging in the celery logs, I found that in some cases the same task ID gets generated for different tasks. Naturally, any new task overwrites an existing task with the same task ID, causing the old task to be silently dropped (if it wasn't executed yet).
In a span of 1.5 hours, the same UUID was generated 3 times. I did some random sampling and this turned out to be the case on the same machine, within a short span (1-2 hours). The server generates around 1 million UUIDs a day: a minuscule 7-digit number compared to the roughly 38-digit number of possible UUIDs.
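A quick back-of-the-envelope birthday-bound check (a sketch; 2**122 is the number of random bits in a version-4 UUID) confirms how implausible an honest collision is:

    from math import expm1

    n = 1_000_000       # UUIDs generated per day
    space = 2 ** 122    # distinct values a random version-4 UUID can take

    # Birthday approximation: P(collision) ~= 1 - exp(-n^2 / (2 * space)).
    # expm1 keeps precision for tiny exponents where 1 - exp(-x) rounds to 0.
    p = -expm1(-(n * n) / (2 * space))
    print(f"{p:.3e}")   # ~9.4e-26 -- effectively impossible by chance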
I am running Python 3.6 and Celery 4.4.2 on a Linux VM.
Celery uses Python's uuid.uuid4: Reference
I'm not sure how to proceed from here. Is there a bug in a version of Python (or the Linux kernel), some configuration issue, or a hardware/VM bug? All scenarios seem very unlikely.
Update:
The VM is a standard Google Cloud Platform Compute Engine instance running Ubuntu 18.04 LTS.

I couldn't figure out why, but I implemented a workaround.
I monkey-patched uuid.uuid4. For some reason I was unable to do the same for celery.utils.uuid or kombu.utils.uuid.
I made a very simple generator that concatenates the monotonic clock time and a hash of the hostname, and builds a UUID from them:
import socket
import time
import uuid

def __my_uuid_generator():
    time_hex = float.hex(time.monotonic())[4:-4]  # the 13 mantissa chars of the hex float
    host = hex(abs(hash(socket.gethostname())))[2:]  # up to 16 chars; varies per process
    hashed = bytes(f'{time_hex}{host}', 'ascii').hex()[:32]  # always a 32-char hex string
    return uuid.UUID(hashed)

# Monkey patch uuid4, because https://stackoverflow.com/q/62312607/1396264. Sigh!
uuid.uuid4 = __my_uuid_generator
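For reference, if the root cause turns out to be forked workers replaying inherited RNG state (an assumption; nothing above confirms it), a simpler patch that draws fresh kernel entropy on every call would look like this:

    import os
    import uuid

    def __urandom_uuid():
        # os.urandom reads the kernel CSPRNG on each call, so forked workers
        # cannot inherit and replay the same generator state.
        return uuid.UUID(bytes=os.urandom(16), version=4)

    uuid.uuid4 = __urandom_uuid

This is essentially what the stdlib uuid4 already does, so it only helps if something in the stack replaced or cached that entropy source.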

Related

Run Python job every x minutes

I have a small Python script that basically connects to a SQL Server (Microsoft) database and gets users from there, and then syncs them to another MySQL database. Basically I'm just running queries to check if the user exists, and if not, adding that user to the MySQL database.
The script usually would take around 1 min to sync. I require the script to do its work every 5 mins (for example) exactly once (one sync per 5 mins).
How would be the best way to go about building this?
I have some test data for the users, but on the real site there are a lot more users, so I can't guarantee the script takes 1 min to execute; it might even take 20 mins. However, having an interval of, say, 15 mins between executions would be ideal for the problem...
Update:
I have the connection params for the SQL Server Windows db, so I'm using a small Ubuntu server to sync between the two databases located on different servers. So let's say db1 (Windows) and db2 (Linux) are the database servers; I'm using s1 (the Python server) and the pymssql and mysql Python modules to sync.
Regards
I am not sure cron is right for the job. It seems to me that if you have it run every 15 minutes but sometimes a synch takes 20 minutes you could have multiple processes running at once and possibly collide.
If the driving force is a constant wait time between the variable execution times then you might need a continuously running process with a wait.
import time

def main():
    loopInt = 0
    while loopInt < 10000:
        synchDatabase()
        loopInt += 1
        print("call #" + str(loopInt))
        time.sleep(300)  # sleep 5 minutes

main()
(Obviously not continuous, but long-running.) You can change the while condition to True to make it continuous (and comment out loopInt += 1).
Edited to add: please see the note in the comments about monitoring the process, as you don't want the script to hang or crash without you being aware of it.
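If you do end up on cron despite the overlap concern above, a non-blocking file lock keeps two syncs from running at once. A minimal sketch (Python 3, Linux; the lock path is hypothetical):

    import fcntl
    import sys

    def run_exclusive(lock_path, job):
        # Hold an exclusive, non-blocking lock for the duration of the job;
        # if a previous run still holds it, skip this cycle entirely.
        with open(lock_path, 'w') as lock_file:
            try:
                fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except BlockingIOError:
                sys.exit("previous sync still running, skipping this run")
            job()

    run_exclusive('/tmp/user_sync.lock', synchDatabase)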
You might want to use a system that handles queues, for example RabbitMQ, and use Celery as the Python interface to implement it. With Celery, you can add tasks (like the execution of a script) to a queue, or run a schedule that'll perform a task after a given interval (just like cron).
Get started http://celery.readthedocs.org/en/latest/
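A minimal sketch of that idea (Celery 4+ syntax; the Redis broker URL and the synch_database body are assumptions, not from the question):

    # tasks.py
    from celery import Celery

    app = Celery('sync', broker='redis://localhost:6379/0')

    @app.task
    def synch_database():
        pass  # connect to both databases and copy over any missing users

    app.conf.beat_schedule = {
        'sync-every-5-minutes': {
            'task': 'tasks.synch_database',
            'schedule': 300.0,  # seconds between runs
        },
    }

Start it with celery -A tasks worker --beat --concurrency=1, so the beat scheduler ticks once per interval and the single worker process keeps runs from overlapping.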

PyMongo dropping connections to mongos randomly

I have a MongoDB cluster (2.6.3) with three mongos processes and two replica sets, with no sharding enabled.
In particular, I have 7 hosts (all are Ubuntu Server 14.04):
host1: mongos + client application
host2: mongos + client application
host3: mongos + client application
host4: RS1_Primary (or RS1_Secondary) and RS2_Arbitrer
host5: RS1_Secondary (or RS1_Primary)
host6: RS2_Primary (or RS2_Secondary) and RS1_Arbitrer
host7: RS2_Secondary (or RS2_Primary)
The client application here is a Zato cluster with four gunicorn workers running on each server; each worker accesses MongoDB through two pymongo.MongoClient instances.
These MongoClient objects are created as follows:
MongoClient(mongo_hosts, read_preference=ReadPreference.SECONDARY_PREFERRED, w=0, max_pool_size=25)
MongoClient(mongo_hosts, read_preference=ReadPreference.SECONDARY_PREFERRED, w=0, max_pool_size=10)
where this mongo_hosts is: 'host1:27017,host2:27017,host2:27017' in all servers.
So, in total, I have 12 MongoClient instances with max_pool_size=25 (4 in each server) and 12 others with max_pool_size=10 (also 4 in each server)
And my problem is:
When the Zato clusters are started and begin receiving requests (up to 10 rq/sec each, balanced using a simple round robin), a bunch of new connections are created, and around 15-20 are then kept permanently open over time in each mongos.
However, at some random point and with no apparent cause, a couple of connections are suddenly dropped at the same time in all three mongos and then the total number of connections keeps changing randomly until it stabilizes again after some minutes (from 5 to 10).
And while this happens, even though I see no slow queries in MongoDB logs (neither in mongos nor in mongod) the performance of the platform is severely reduced.
I have been isolating the problem and already tried to:
change the connection string to 'localhost:27017' in each MongoClient to see if the problem was in only one of the clients. The problem persisted, and it keeps affecting the three mongos at the same time, so it looks like something on the server side.
add log traces to make sure that the performance is lost inside MongoClient. The result is that running a simple find query in MongoClient is clearly seen to last more than one second in the client side, while usually it's less than 10ms. However, as I said before, I see no slow queries at all in MongoDB logs (default profiling level: 100ms).
monitor the platform activity to see if there's a load increase when this happens. There's none, and indeed it can even happen during low load periods.
monitor other variables in the servers, such as cpu usage or disk activity. I found nothing suspicious at all.
So, the questions at the end are:
Has anyone seen something similar (connections being dropped in PyMongo)?
What else can I look at to debug the problem?
Possible solution: MongoClient allows the definition of a max_pool_size, but I haven't found any reference to a min_pool_size. Is it possible to define one? Perhaps making the number of connections static would fix my performance problems.
Note about MongoDB version: I am currently running MongoDB 2.6.3 but I already had this problem before upgrading from 2.6.1, so it's nothing introduced in the last version.
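On the min_pool_size question: the PyMongo releases of that era had no such option, but modern PyMongo (3.x and later) does expose minPoolSize, so on a newer driver the pool floor could be pinned like this (a sketch, not a tested fix for the 2.x setup above):

    from pymongo import MongoClient

    # minPoolSize keeps at least this many connections open per server even
    # when idle, which avoids the cost of reopening them after a drop.
    client = MongoClient('mongodb://host1:27017,host2:27017,host3:27017',
                         maxPoolSize=25, minPoolSize=10,
                         readPreference='secondaryPreferred', w=0)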

Django aggregate function returns incorrect value

This is a very weird error that sometimes happens in our production environment. I use a very simple aggregate (MAX) query to determine the maximum value of a given column in a table:
last_uid = SomeEntity.objects.all().aggregate(Max('uid_mail'))['uid_mail__max'] or 0
The issue is that in very rare cases the value returned by this query is incorrect. It has happened maybe 3 or 4 times in 3 years, always with quite disastrous consequences (for instance, yesterday it returned 22 instead of 1722, and the error caused a very large number of mails to be sent, leading our hosting provider to suspend our site).
The SQL query issued is straightforward:
DEBUG util (0.002) SELECT MAX(finanzas_enviosiirecibido.uid_mail) AS uid_mail__max FROM finanzas_enviosiirecibido; args=()
I am absolutely certain the returned value was incorrect: we log the value we obtain from this query, and these are the values we got:
On 2014-06-13 03:34:06:
DEBUG siidte lastUid: 1721
On 2014-06-14 05:39:29 (the next day):
DEBUG siidte lastUid: 22
And I am also absolutely certain that the rows with values > 22 were not deleted in the meantime. (As a matter of fact, after that last run there were two rows instead of one for each uid value; there is no uniqueness constraint on that uid column, but I verified in previous backups that there was only one row per uid.)
So I suspect there is a bug in the stack somewhere between Django and the bare metal, but I have no idea where. This is the setting:
Django 1.5.1
Python 2.7.3
Database: MySQL 5.5 (5.5.22-0ubuntu1)
OS: Ubuntu 12.04.4 LTS (x86) running in a KVM virtual machine.
The physical server itself is probably irrelevant, as the problem has occurred with the VM on two different servers (one Intel Xeon and one Core i5).
Has somebody experienced something similar? And found a solution?
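One defensive pattern (not from the original post; SomeEntity and the logger are stand-ins) is to compute the maximum a second way and log any disagreement before acting on it:

    import logging
    from django.db.models import Max

    logger = logging.getLogger(__name__)

    def safe_last_uid():
        agg = SomeEntity.objects.aggregate(Max('uid_mail'))['uid_mail__max'] or 0
        # Cross-check via ORDER BY ... LIMIT 1; any mismatch points at the bug.
        top = list(SomeEntity.objects.order_by('-uid_mail')
                                     .values_list('uid_mail', flat=True)[:1])
        ordered = top[0] if top else 0
        if agg != ordered:
            logger.error("uid_mail max mismatch: aggregate=%s, ordered=%s",
                         agg, ordered)
        return max(agg, ordered)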

Why does running Django's session cleanup command kill my machine's resources?

I have a one-year-old production site configured with the django.contrib.sessions.backends.cached_db backend and a MySQL database backend. The reason I chose cached_db is a mix of security and read performance.
The problem is that the cleanup command, responsible for deleting all expired sessions, was never executed, resulting in a session table with 2.3 GB of data, 6 million rows, and 500 MB of index.
When I try to run the ./manage.py cleanup command (in Django 1.3), or ./manage.py clearsessions (its Django 1.5 equivalent), the process never ends (or my patience doesn't last the 3 hours).
The code that Django uses to do this is:
Session.objects.filter(expire_date__lt=timezone.now()).delete()
At first impression I thought that was normal because the table has 6M rows, but after inspecting the system monitor I discovered that all the memory and CPU were being used by the python process, not mysqld, exhausting my machine's resources. I think something is terribly wrong with this command's code. It seems that Python iterates over all the expired session rows it finds and deletes them one by one. In this case, refactoring the code to issue a raw DELETE FROM command would resolve my problem and help the Django community, right? But if that is the case, the QuerySet delete command is acting weirdly and is unoptimized, in my opinion. Am I right?
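A minimal sketch of that raw DELETE idea (the default django_session table is assumed; older Django versions may also need an explicit transaction commit after a raw cursor write):

    from django.db import connection

    def clear_expired_sessions():
        # One bulk DELETE executed inside MySQL, instead of fetching ~6M
        # session rows into Python and deleting them one by one.
        cursor = connection.cursor()
        cursor.execute("DELETE FROM django_session WHERE expire_date < NOW()")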

Redis Crash Windows Server 2003 R2

I'm running Redis 2.0.2, 32-bit, from the Cygwin compilation here: http://code.google.com/p/servicestack/wiki/RedisWindowsDownload
I am running it from the terminal. It works great for about 24 hours and then it crashes: no errors, it just closes. My config file has the defaults except:
# save 900 1
# save 300 10
# save 60 10000
appendonly no
appendfsync no
I tried using a newer version of Redis, redis-2.2.5 win32, from here: https://github.com/dmajkic/redis/downloads
However, while I can run this one, it throws an 'unpacking too many values' error when tasks are added onto it with Celery 2.2.6.
I haven't run it long enough to see if it experiences the same crash that 2.0.2 has after roughly 24 hours.
Also, I run a Redis FLUSHDB at 1am every day, but the crash can happen at any time of day, normally around 24 hours after the last crash.
Any thoughts?
Thanks!
Additions:
Sorry, I forgot to mention that Twisted is polling data every 20 seconds and storing it in Redis, which translates to close to 700 thousand records a day, or 4-5 GB of RAM used. There is no problem with Twisted; I just thought it might be relevant to the question.
Follow-up question:
Thanks Dhaivat Pandya!
Are there key-value databases that are more supportive of the Windows environment?
Redis is not supposed to work with Windows, and the projects that try to make it work with Windows all have numerous bugs that make them unstable.
