Use cache under mod_wsgi for Python web framework

The code is like this (I'm using Flask and Flask-Cache, but this might be a general problem):
@cache.memoize(500000)
def big_foo(a, b):
    return a + b + random.randrange(0, 1000)
If I run it in a Python interpreter, I can always get the same result by calling big_foo(1, 2).
But if I add this function to the application, serve it with mod_wsgi in daemon mode, and then make requests from a browser (big_foo is called within the view function handling that request), I find the result is not the same each time.
I think the results are different each time because mod_wsgi uses multiple processes to launch the app. Each process might have its own cache, and the cache can't be shared between processes.
Is my guess right? If so, how can I set up one and only one cache that is accessible globally? If not, what is wrong with my code?
The following is the config used for Flask-Cache:
UPLOADS_FOLDER = "/mnt/Storage/software/x/temp/"

class RadarConfig(object):
    UPLOADS_FOLDER = UPLOADS_FOLDER
    ALLOWED_EXTENSIONS = set(['bed'])
    SECRET_KEY = "tiananmen"
    DEBUG = True
    CACHE_TYPE = 'simple'
    CACHE_DEFAULT_TIMEOUT = 5000000
    BASIS_PATH = "/mnt/Storage/software/x/NMF_RESULT//p_NMF_Nimfa_NMF_Run_30632__metasites_all"
    COEF_PATH = "/mnt/Storage/software/x/NMF_RESULT/MCF7/p_NMF_Nimfa_NMF_Run_30632__metasample_all"
    MASK_PATH = "/mnt/Storage/software/x/NMF_RESULT/dhsHG19.bed"

Here's your problem: CACHE_TYPE = 'simple'. From the SimpleCache documentation:
Simple memory cache for single process environments. This class exists mainly for the development server and is not 100% thread safe.
For production, better-suited backends are memcached, redis, and filesystem, since they're designed to work in concurrent, multi-process environments.
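For example, a minimal sketch of how the RadarConfig above could switch to a backend that is shared across mod_wsgi processes (the Redis host/port and cache directory below are assumptions, not values from the original config):
class RadarConfig(object):
    # ... other settings unchanged ...
    CACHE_TYPE = 'redis'                 # or 'memcached' / 'filesystem'
    CACHE_REDIS_HOST = 'localhost'       # assumed Redis location
    CACHE_REDIS_PORT = 6379
    CACHE_DEFAULT_TIMEOUT = 5000000

    # Filesystem variant, if Redis/memcached are not available:
    # CACHE_TYPE = 'filesystem'
    # CACHE_DIR = '/tmp/radar-cache'     # assumed path
With any of these, every mod_wsgi process talks to the same cache store, so cache.memoize returns the same value regardless of which process handles the request.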

Related

Datastore delay on creating entities with put()

I am developing an application using the Cloud Datastore Emulator (2.1.0) and the google-cloud-ndb Python library (1.6).
I find that there is an intermittent delay on entities being retrievable via a query.
For example, if I create an entity like this:
my_entity = MyEntity(foo='bar')
my_entity.put()
get_my_entity = MyEntity.query().filter(MyEntity.foo == 'bar').get()
print(get_my_entity.foo)
it will fail intermittently because the get() method returns None.
This only happens on about 1 in 10 calls.
To demonstrate, I've created this script (also available with a ready-to-run docker-compose setup on GitHub):
import random

from google.cloud import ndb
from google.auth.credentials import AnonymousCredentials

client = ndb.Client(
    credentials=AnonymousCredentials(),
    project='local-dev',
)


class SampleModel(ndb.Model):
    """Sample model."""

    some_val = ndb.StringProperty()


for x in range(1, 1000):
    print(f'Attempt {x}')
    with client.context():
        random_text = str(random.randint(0, 9999999999))
        new_model = SampleModel(some_val=random_text)
        new_model.put()
        retrieved_model = SampleModel.query().filter(
            SampleModel.some_val == random_text
        ).get()
        print(f'Model Text: {retrieved_model.some_val}')
What would be the correct way to avoid this intermittent failure? Is there a way to ensure the entity is always available after the put() call?
Update
I can confirm that this is only an issue with the Datastore emulator. When testing on App Engine with Firestore in Datastore mode, entities are available immediately after calling put().
The issue turned out to be related to the emulator trying to replicate eventual consistency.
Unlike relational databases, Datastore does not guarantee that the data will be available immediately after it's posted. This is because there are often replication and indexing delays.
For things like unit tests, this can be resolved by passing --consistency=1.0 to the datastore start command, as documented here.
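If changing the emulator's start flags is not an option, a test-side retry loop can also absorb the simulated eventual consistency. This is only a sketch of that alternative (the helper name get_with_retry is made up here), not something from the original post:
import time

def get_with_retry(query, attempts=10, delay=0.2):
    """Re-run query.get() until an entity shows up or we give up."""
    for _ in range(attempts):
        result = query.get()
        if result is not None:
            return result
        time.sleep(delay)
    return None

# Hypothetical usage inside the loop above:
# retrieved_model = get_with_retry(
#     SampleModel.query().filter(SampleModel.some_val == random_text))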

Connecting to local dockerized Perforce server in Python unittests

I maintain a Python tool that runs automation against a Perforce server. For obvious reasons, parts of my test suite (which are unittest.TestCase classes run with Pytest) require a live server. Until now I've been using a remote testing server, but I'd like to move that into my local environment, and make server initialization part of my pre-test setup.
I'm experimenting with dockerization as a solution, but I get strange connection errors when trying to run Perforce commands against the server in my test code. Here's my test server code (using a custom Docker image, a Singleton metaclass based on https://stackoverflow.com/a/6798042, and with the P4Python library installed):
class P4TestServer(metaclass=Singleton):
    def __init__(self, conf_file='conf/p4testserver.conf'):
        self.docker_client = docker.from_env()
        self.config = P4TestServerConfig.load_config(conf_file)
        self.server_container = None
        try:
            self.server_container = self.docker_client.containers.get('perforce')
        except docker.errors.NotFound:
            self.server_container = self.docker_client.containers.run(
                'perforce-server',
                detach=True,
                environment={
                    'P4USER': self.config.p4superuser,
                    'P4PORT': self.config.p4port,
                    'P4PASSWD': self.config.p4superpasswd,
                    'NAME': self.config.p4name
                },
                name='perforce',
                ports={
                    '1667/tcp': 1667
                },
                remove=True
            )
        self.p4 = P4()
        self.p4.port = self.config.p4port
        self.p4.user = self.config.p4superuser
        self.p4.password = self.config.p4superpasswd
And here's my test code:
class TestSystemP4TestServer(unittest.TestCase):
    def test_server_connection(self):
        testserver = P4TestServer()
        with testserver.p4.connect():
            info = testserver.p4.run_info()
            self.assertIsNotNone(info)
So this is the part that's getting to me: the first time I run that test (i.e. when it has to start the container), it fails with the following error:
E P4.P4Exception: [P4#run] Errors during command execution( "p4 info" )
E
E [Error]: 'TCP receive failed.\nread: socket: Connection reset by peer'
But on subsequent runs, when the container is already running, it passes. What's frustrating is that I can't otherwise reproduce this error. If I run that test code in any other context, including:
in a Python interpreter, or
in a debugger stopped just before the testserver.p4.run_info() invocation,
the code completes as expected regardless of whether the container was already running.
All I can think at this point is that there's something unique about the pytest environment that's tripping me up, but I'm at a loss for even how to begin diagnosing. Any thoughts?
I had a similar issue recently where I would start a Postgres container and then immediately run a Python script to set up the database as per my app's requirements.
I had to introduce a sleep command between the two steps, and that resolved the issue.
Ideally you should check that the start sequence of the Docker container has finished before trying to use it, but for my local development use case, sleeping for 5 seconds was a good enough workaround.
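A rough sketch of that readiness check for the Perforce case above, polling until p4 info succeeds instead of sleeping a fixed amount (the function name and timeout values here are made up):
import time
from P4 import P4, P4Exception

def wait_for_server(port, user, timeout=30.0, interval=0.5):
    """Return True once `p4 info` succeeds, False if the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        p4 = P4()
        p4.port = port
        p4.user = user
        try:
            with p4.connect():
                p4.run_info()
            return True
        except P4Exception:
            time.sleep(interval)
    return False
P4TestServer.__init__ could call something like this right after containers.run(), so the first test run no longer races the server start.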

Latency with group in pymongo in tests

Good day.
I have faced the following issue using pymongo==2.1.1 with Python 2.7 and MongoDB 2.4.8.
I have tried to find a solution using Google and Stack Overflow but failed.
What's the issue?
I have the following function:
from bson.code import Code

def read(groupped_by=None):
    reducer = Code("""
        function(obj, prev){
            prev.count++;
        }
        """)
    client = Connection('localhost', 27017)
    db = client.urlstats_database
    results = db.http_requests.group(key={k: 1 for k in groupped_by},
                                     condition={},
                                     initial={"count": 0},
                                     reduce=reducer)
    groupped_by = list(groupped_by) + ['count']
    result = [tuple(res[col] for col in groupped_by) for res in results]
    return sorted(result)
Then I am trying to write a test for this function:
class UrlstatsViewsTestCase(TestCase):
    test_data = {'data%s' % i: 'data%s' % i for i in range(6)}

    def test_one_criterium(self):
        client = Connection('localhost', 27017)
        db = client.urlstats_database
        for column in self.test_data:
            db.http_requests.remove()
            db.http_requests.insert(self.test_data)
            response = read([column])
            self.assertEqual(response, [(self.test_data[column], 1)])
This test sometimes fails, as I understand it, because of latency: as far as I can see, the response still contains data that should have been removed.
If I add a delay after the remove, the test passes every time.
Is there any proper way to test such functionality?
Thanks in advance.
A few questions regarding your environment / code:
What version of pymongo are you using?
If you are using any of the newer versions that have MongoClient, is there any specific reason you are using Connection instead of MongoClient?
The reason I ask the second question is that Connection provides fire-and-forget behaviour for the operations you are doing, while MongoClient works in safe mode by default and is also the preferred approach since MongoDB 2.2+.
The behaviour you see is strongly suggestive of Connection usage rather than MongoClient. With Connection, your remove is sent to the server, and the moment it is sent from the client side, your program execution moves on to the next step, which is to add new entries. Depending on latency and how long the remove operation takes, these steps will conflict, as you have already noticed in your test case.
Can you change to MongoClient and see if that helps with your test code?
Additional Ref: pymongo: MongoClient or Connection
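A minimal sketch of that suggestion (assuming pymongo 2.2+ is installed): MongoClient acknowledges writes by default, so the remove and insert below are confirmed by the server before the next statement runs.
from pymongo import MongoClient

test_data = {'data0': 'data0'}          # stand-in for the fixture above

client = MongoClient('localhost', 27017)
db = client.urlstats_database
db.http_requests.remove()               # acknowledged before returning
db.http_requests.insert(test_data)      # likewise acknowledged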
Thanks, all.
There is no MongoClient class in the version of pymongo I use, so I was forced to find out what exactly differs.
As soon as I upgrade to 2.2+ I will test whether everything is OK with MongoClient. But as for the Connection class, one can use a write concern to control this latency.
In older versions, one should create the connection with the corresponding arguments.
I have tried these two: journal=True, safe=True (the journal write concern can't be used in non-safe mode).
j or journal: Block until write operations have been committed to the journal. Ignored if the server is running without journaling. Implies safe=True.
I think this makes performance worse, but for automatic tests this should be OK.
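In other words, a short sketch of the write-concern variant described above, for pymongo versions that only have Connection:
from pymongo import Connection

# safe=True makes writes wait for server acknowledgement;
# journal=True additionally waits for the journal commit.
client = Connection('localhost', 27017, safe=True, journal=True)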

Python w/ Celery and RabbitMQ Does Not Behave Normally

I'm using Django 1.4.3 with Python 2.7, Celery 3.0.1 and django-celery 3.0.17 on Ubuntu 13.04.
I have some tasks set up to run time-consuming processes. If I set them up to queue with Celery, they do not behave properly. If I run them without queuing them, everything behaves perfectly. Any thoughts as to why this would be the case?
To provide some motivation for my problem: I need to clone company contracts. Each contract has multiple offers associated with it. Each offer has multiple offer fields. Each offer field has multiple values. I need to clone everything.
Here is an example of what I'm doing.
def clone_contract(self, contract_id, contract_name):
    old_contract = models.Contract.objects.get(pk=contract_id)
    contract_dict = dict()
    for attr in old_contract._meta.fields:
        contract_dict[attr.name] = getattr(old_contract, attr.name)
    del contract_dict['id']
    contract_dict['name'] = contract_name
    new_contract = contracts_models.Contract(**contract_dict)
    new_contract.save()
    contracts_tasks.clone_offers.delay(new_contract, old_contract)


@task(name='Clone Offers')
def clone_offers(new_contract, old_contract):
    for offer in old_contract.offer_set.all():
        offer_dict = dict()
        for attr in offer._meta.fields:
            offer_dict[attr.name] = getattr(offer, attr.name)
        del offer_dict['id']
        del offer_dict['contract']
        offer_dict['contract_id'] = new_contract.pk
        new_offer = contracts_models.Offer(**offer_dict)
        new_offer.save()
        clone_offer_fields(new_offer, offer)


def clone_offer_fields(new_offer, old_offer):
    offer_fields = models.OfferField.objects.filter(offer=old_offer)
    for offer_field in offer_fields:
        initial = dict()
        for attr in offer_field._meta.fields:
            initial[attr.name] = getattr(offer_field, attr.name)
        initial['offer'] = new_offer
        del initial['id']
        new_offer_field = contracts_models.OfferField(**initial)
        new_offer_field.save()
        model = models.OfferFieldValue
        values = model.objects.filter(**{'field': offer_field})
        clone_model(new_offer_field, model, 'field', values)


def clone_model(new_obj, model, fk_name, values):
    for value in values:
        initial = dict()
        for attr in value._meta.fields:
            initial[attr.name] = getattr(value, attr.name)
        del initial['id']
        initial[fk_name] = new_obj
        new_value = model(**initial)
        new_value.save()
From what I've observed, clone_offers works but clone_offer_fields does not - again, only if clone_offers gets called as clone_offers.delay(). If I call clone_offers without .delay() (not queuing it), everything works perfectly.
Unfortunately I'm unable to log from queued tasks (nothing seems to be written to the log file), so I can't troubleshoot within the code.
Is there an issue with calling functions within a queued task? I'm pretty sure I've done this before with no problems. (Edit: answered below.)
Any suggestions would be greatly appreciated.
EDIT1:
I decided to test this by throwing all the methods together. I was 99% sure this wouldn't be the problem but thought it'd be better to check to make sure. There is no difference if I use a single massive method.
The problem involved Celery hijacking the default logging. I implemented the solution given in Django Celery Logging Best Practice.
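The linked answer boils down to not letting Celery take over the root logger and using a per-task logger. A sketch of that approach (not the poster's exact code) might look like this:
from celery import task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@task(name='Clone Offers')
def clone_offers(new_contract, old_contract):
    logger.info('Cloning offers for contract %s', new_contract.pk)
    # ... cloning logic as above ...

# In the Django settings, to stop Celery replacing the root logger
# (an assumption about the fix, in line with the linked best-practice answer):
# CELERYD_HIJACK_ROOT_LOGGER = False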

Programmatically detect system-proxy settings on Windows XP with Python

I develop a critical application used by a multi-national company. Users in offices all around the globe need to be able to install this application.
The application is actually a plugin to Excel, and we have an automatic installer based on Setuptools' easy_install that ensures that all of a project's dependencies are automatically installed or updated any time a user switches on their Excel. It all works very elegantly, as users are seldom aware of the installation, which occurs entirely in the background.
Unfortunately we are expanding and opening new offices which all have different proxy settings. These settings seem to change from day to day, so we cannot keep up with the outsourced security guys who change stuff without telling us. It sucks, but we just have to work around it.
I want to programmatically detect the system-wide proxy settings on the Windows workstations our users run:
Everybody in the organization runs Windows XP and Internet Explorer. I've verified that everybody can download our stuff from IE without problems regardless of where they are in the world.
So all I need to do is detect which proxy settings IE is using and make Setuptools use those settings. Theoretically all of this information should be in the Registry, but is there a better way to find it that is guaranteed not to change when people upgrade IE? For example, is there a Windows API call I can use to discover the proxy settings?
In summary:
We use Python 2.4.4 on Windows XP
We need to detect the Internet Explorer proxy settings (e.g. host, port and Proxy type)
I'm going to use this information to dynamically re-configure easy_install so that it can download the egg files via the proxy.
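Worth noting as a starting point (not from the original question): on Windows, the standard library's urllib.getproxies() already reads the same Internet Settings registry keys that IE uses when no proxy environment variables are set, so a first attempt could be as simple as the sketch below. It only handles a statically configured proxy; it does not evaluate PAC files.
import urllib

# Returns e.g. {'http': 'http://proxyhost:8080', ...} when a static
# proxy is configured in IE.
proxies = urllib.getproxies()
print proxies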
UPDATE0:
I forgot one important detail: each site has an auto-config "pac" file.
There's a key in Windows\CurrentVersion\InternetSettings\AutoConfigURL which points to an HTTP document on a local server which contains what looks like a JavaScript file.
The pac script is basically a series of nested if-statements which compare URLs against regexes and then eventually return the hostname of the chosen proxy server. The script is a single JavaScript function called FindProxyForURL(url, host).
The challenge is therefore to find out, for any given server, which proxy to use. The only 100% guaranteed way to do this is to look up the pac file and call the JavaScript function from Python.
Any suggestions? Is there a more elegant way to do this?
Here's a sample that should create a green bullet (proxy enabled) or a red one (proxy disabled) in your systray.
It shows how to read and write in the Windows registry.
It uses GTK.
#!/usr/bin/env python
import gobject
import gtk
from _winreg import *

class ProxyNotifier:
    def __init__(self):
        self.trayIcon = gtk.StatusIcon()
        self.updateIcon()
        # set callback on right click to on_right_click
        self.trayIcon.connect('popup-menu', self.on_right_click)
        gobject.timeout_add(1000, self.checkStatus)

    def isProxyEnabled(self):
        aReg = ConnectRegistry(None, HKEY_CURRENT_USER)
        aKey = OpenKey(aReg, r"Software\Microsoft\Windows\CurrentVersion\Internet Settings")
        subCount, valueCount, lastModified = QueryInfoKey(aKey)
        for i in range(valueCount):
            try:
                n, v, t = EnumValue(aKey, i)
                if n == 'ProxyEnable':
                    return v and True or False
            except EnvironmentError:
                break
        CloseKey(aKey)

    def invertProxyEnableState(self):
        aReg = ConnectRegistry(None, HKEY_CURRENT_USER)
        aKey = OpenKey(aReg, r"Software\Microsoft\Windows\CurrentVersion\Internet Settings", 0, KEY_WRITE)
        if self.isProxyEnabled():
            val = 0
        else:
            val = 1
        try:
            SetValueEx(aKey, "ProxyEnable", 0, REG_DWORD, val)
        except EnvironmentError:
            print "Encountered problems writing into the Registry..."
        CloseKey(aKey)

    def updateIcon(self):
        if self.isProxyEnabled():
            icon = gtk.STOCK_YES
        else:
            icon = gtk.STOCK_NO
        self.trayIcon.set_from_stock(icon)

    def checkStatus(self):
        self.updateIcon()
        return True

    def on_right_click(self, data, event_button, event_time):
        self.invertProxyEnableState()
        self.updateIcon()

if __name__ == '__main__':
    proxyNotifier = ProxyNotifier()
    gtk.main()
As far as I know, in a Windows environment, if no proxy environment variables are set, proxy settings are obtained from the registry's Internet Settings section.
Isn't that enough?
Or you can get some useful info from the registry:
HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\ProxyServer
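For instance, a short sketch of reading that value with the same _winreg calls used above (it assumes a static proxy is configured; the value is absent or irrelevant when only a PAC file is set):
from _winreg import (ConnectRegistry, OpenKey, QueryValueEx, CloseKey,
                     HKEY_CURRENT_USER)

def get_proxy_server():
    """Return the ProxyServer string, e.g. 'proxyhost:8080', or None."""
    reg = ConnectRegistry(None, HKEY_CURRENT_USER)
    key = OpenKey(reg, r"Software\Microsoft\Windows\CurrentVersion\Internet Settings")
    try:
        value, valtype = QueryValueEx(key, 'ProxyServer')
    except WindowsError:
        value = None
    CloseKey(key)
    return value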
Edit:
Sorry, I didn't know how to format source code in a comment, so I repost it here.
>>> import win32com.client
>>> js = win32com.client.Dispatch('MSScriptControl.ScriptControl')
>>> js.Language = 'JavaScript'
>>> js.AddCode('function add(a, b) {return a+b;}')
>>> js.Run('add', 1, 2)
3
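Building on that snippet, here is a rough, untested sketch of fetching the PAC file from the AutoConfigURL and evaluating its FindProxyForURL through the same ScriptControl. Note that real PAC files also call helper functions such as isInNet and shExpMatch, which are not defined in a bare ScriptControl environment and would need JavaScript stubs added via AddCode before this works:
import urllib2
import win32com.client

def proxy_for(url, host, pac_url):
    """Ask the site's PAC script which proxy to use, e.g. 'PROXY proxy:8080'."""
    pac_source = urllib2.urlopen(pac_url).read()
    js = win32com.client.Dispatch('MSScriptControl.ScriptControl')
    js.Language = 'JavaScript'
    js.AddCode(pac_source)
    return js.Run('FindProxyForURL', url, host)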
