I'm currently running Python suds against a WSDL file and its corresponding 50+ XSD files. The following call to Client takes about 90 seconds:
from suds.client import Client
url = 'http://localhost:7080/webservices/WebServiceTestBean?wsdl'
client = Client(url)
After I run the last line above I get a Client instance, but creating it takes a long time. Does caching work with Python objects, or is it restricted to primitives like strings and integers?
Here's what I want to do in code (the syntax is wrong, but it conveys the idea):
from suds.client import Client
if 'current_client' in cache:
    client = cache.get('current_client')
else:
    url = 'http://localhost:7080/webservices/WebServiceTestBean?wsdl'
    client = Client(url)
    cache.put('current_client', client)
suds caches WSDL and XSD files for a day by default so that each instantiation of a Client object doesn't require a separate URL request.
90 seconds seems like a really long time. Is that time spent waiting on the WSDL response, or parsing it? If parsing is the bottleneck, the built-in caching isn't going to help much.
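If it would help to narrow that down, here's a rough timing sketch that separates the raw fetch from Client construction (urllib2 is the Python 2-era standard library module, which matches the suds era; the URL is the one from the question):

import time
import urllib2

from suds.client import Client

url = 'http://localhost:7080/webservices/WebServiceTestBean?wsdl'

t0 = time.time()
urllib2.urlopen(url).read()  # raw WSDL fetch only; imported XSDs are fetched during Client()
print('fetch: %.1fs' % (time.time() - t0))

t0 = time.time()
client = Client(url)  # fetch + parse + schema build
print('Client(): %.1fs' % (time.time() - t0))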
I've done something like this before, but instead of the singleton pattern, I just used a module-level global dictionary. It's the singleton pattern without all the class noise.
Something like this:
from suds.client import Client
_clients = {}

def get_client(name):
    # url_for_name(name) stands in for however you map a name to its WSDL URL.
    if name not in _clients:
        _clients[name] = Client(url_for_name(name))
    return _clients[name]
If I understand your problem correctly, you don't want to create a new Client() each time; you want to put it in a cache so you can retrieve it later. I think you're overcomplicating things, though, and I'd suggest the singleton pattern: it lets you create only one instance of the client, and every time you ask for a new instance it simply returns the one that was already created.
Here is an example of what I'm suggesting:
class MyClient(Client):
    _instance = None

    # Note: __init__ still runs on every MyClient(...) call, so guard any
    # expensive setup there if it must only happen once.
    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super(MyClient, cls).__new__(cls)
        return cls._instance
N.B.: I wanted to use the Borg pattern, which is like a singleton but nicer, but I couldn't figure out how to share the state without calling the superclass __init__ (which is what takes all the time). If someone knows how to express this with the Borg pattern that would be great, but I don't think it can be useful in this case.
Hope this helps.
suds >= 0.3.5 r473 provides some URL caching. By default, HTTP GETs such as fetching the WSDL and imported XSDs are cached.
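If the time really is going to repeated downloads rather than parsing, you can also lengthen the cache duration. A minimal sketch, assuming the suds.cache.ObjectCache API (check it against your suds version):

from suds.client import Client
from suds.cache import ObjectCache

# Keep cached documents for 10 days instead of the default 1.
cache = ObjectCache(days=10)
client = Client('http://localhost:7080/webservices/WebServiceTestBean?wsdl',
                cache=cache)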
In my system I'm thinking about migrating an RPC service to gRPC. (Previously I was using messagepack-rpc, but that doesn't matter here, except maybe for the fact that it didn't require a schema.)
gRPC has many advantages, it's well supported and documented, so it seems like an obvious choice. One thing I'm trying to get a better understanding of is, how do people write their server code without unnecessary duplication?
Let me whip up an example. First, we have a class that acts as a system controller. It contains all the domain knowledge of what my system needs to do, and as such, we want to be good developers and keep implementation details of the RPC technology out of it. So, it looks something like this:
class SystemController:
    def __init__(self, args):
        # Do some system initializing
        pass

    def do_action1(self, str_arg, int_arg):
        """Does action #1. Requires a string argument and
        an integer argument."""
        run_some_code()

    def query_sensor17(self, sensor_name):
        """Queries sensors, requires a sensor name"""
        return get_sensor_value(sensor_name)
In essence, this class will have several dozen API calls that I can use to query or control my system.
Now, in order to use gRPC I'm going to have to describe all of these APIs again for the protobufs. Maybe it'll look like this:
syntax = "proto3";
package mysystemcontroller;
service SystemControllerServer {
rpc do_action1(Action1Request) returns (Action1Reply) {}
rpc query_sensor17(QuerySensor17Request) returns (QuerySensor17Reply) {}
}
message Action1Request {
string str_arg = 1;
int32 int_arg = 2;
}
message Action1Reply {
}
message QuerySensor17Request {
string sensor_name = 1;
}
message QuerySensor17Reply {
string sensor_value = 1;
}
Part of me is a little unhappy about having to describe all of the system controller's APIs again, but to be fair, I don't have all the type information in my Python, and this gives me type-safe RPC calls.
But now, I have to write a third file containing the actual server code:
class SystemControllerServer(mysystemcontroller_grpc.SystemControllerServerServicer):
    def __init__(self):
        self._sc = SystemController()

    def do_action1(self, request, context):
        self._sc.do_action1(request.str_arg, request.int_arg)
        return mysystemcontroller.Action1Reply()

    def query_sensor17(self, request, context):
        value = self._sc.query_sensor17(request.sensor_name)
        return mysystemcontroller.QuerySensor17Reply(sensor_value=value)
At this point, I have to ask: Is this what everyone does, or do people use custom generation code for the server, or is there some neat Python-introspection that can save some of the trouble?
I do realize that not necessarily all of the server methods are going to be identical, but likely, for the most part, there will be plenty of similarity between server methods.
Put differently, there is so much overlap in information between the three files (domain knowledge implementation, proto file, and server implementation) that I'm curious if people have smarter ways than having to manually keep three separate files in sync when something changes.
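For illustration, the kind of introspection I have in mind would look roughly like this: generate pass-through servicer methods from one small RPC-name-to-reply-class table. This is only a sketch under my own assumptions: the request message fields line up with the controller method's keyword arguments, the controller methods return either None or a dict of reply fields, and SystemController is importable from wherever the domain code lives (the module name below is made up).

from controller import SystemController  # hypothetical module name

import mysystemcontroller
import mysystemcontroller_grpc


def _passthrough(method_name, reply_cls):
    # Build one servicer method: copy every request field into kwargs,
    # call the controller method, then wrap its (dict or None) result
    # in the matching reply message.
    def handler(self, request, context):
        kwargs = {f.name: getattr(request, f.name) for f in request.DESCRIPTOR.fields}
        result = getattr(self._sc, method_name)(**kwargs)
        return reply_cls(**(result or {}))
    return handler


class SystemControllerServer(mysystemcontroller_grpc.SystemControllerServerServicer):
    def __init__(self):
        self._sc = SystemController()


# The only table to keep in sync with the .proto: RPC name -> reply class.
_RPCS = {
    "do_action1": mysystemcontroller.Action1Reply,
    "query_sensor17": mysystemcontroller.QuerySensor17Reply,
}

for _name, _reply_cls in _RPCS.items():
    setattr(SystemControllerServer, _name, _passthrough(_name, _reply_cls))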
I'm wondering where the best place is to instantiate a boto3 S3 client so that it can be reused for the duration of a request in Django.
I have a django model with a computed property that returns a signed s3 url:
@property
def url(self):
    client = boto3.client('s3')
    params = {
        'Bucket': settings.BUCKET,
        'Key': self.frame.s3_key,
        'VersionId': self.key,
    }
    return client.generate_presigned_url('get_object', Params=params)
The object is serialized as json and returned in a list that can contain 100's of these objects.
Even though boto3.client('s3') does not perform any network requests when instantiated, I've found that it is slow.
Placing S3_CLIENT = boto3.client('s3') into settings.py and then using that instead of instantiating a new client per object reduced the response time by ~3x with 100 results. However, I know it is bad practice to put global variables in settings.py.
My question is: where should I instantiate this client so that it can be reused, at least at the request level?
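For concreteness, the level of reuse I'm after would look something like this (the module name and function are mine, just for illustration): a process-wide factory that every request, and every object in the list, can share.

# myapp/aws.py (hypothetical module)
from functools import lru_cache

import boto3


@lru_cache(maxsize=None)
def s3_client():
    # Built once per process, shared by every request after that.
    return boto3.client('s3')

The url property would then call s3_client() instead of boto3.client('s3').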
If the client lives in an AWS Lambda function, go with a global. Lambda reuses execution environments, and a global client lets you take advantage of that, with cost and performance savings:
Take advantage of execution environment reuse to improve the performance of your function. Initialize SDK clients and database connections outside of the function handler
https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html
Otherwise I think this is a stylistic choice dependent on your app.
If your client will never change, a global seems like a safe way to do it. The drawback is that, since it's effectively a constant, you shouldn't change it at runtime. This has consequences; for example, it makes swapping the Session hard. You could use a singleton, but the code would become more verbose.
If you instantiate clients everywhere, you run the risk of making a client() call-signature change a large effort: e.g. if you need to pass client('s3', verify=True), you'd have to add verify=True everywhere, which is a pain. It's unlikely you'd need to do this, though. The only param you're likely to override is config, which you can pass through the session using set_default_config.
You could make it its own module, eg
foo.bar.aws.clients
session = None
ecs_client = None
eks_client = None

def init_session(new_session):
    # Rebind the module-level names, not function locals.
    global session, ecs_client, eks_client
    session = new_session
    ecs_client = session.client('ecs')
    eks_client = session.client('eks')
You can call init_session from an appropriate place, or have defaults and an import hook to auto-instantiate. This file will get larger as you use more clients, but at least the ugliness is contained. You could also do a hack like
def init_session(s):
    global session
    session = s
    clients = ['ecs', 'iam', 'eks']  # …and whichever others you need
    for c in clients:
        globals()[f'{c}_client'] = session.client(c)
The problem is the indirection this hack adds: e.g. IntelliJ isn't smart enough to figure out where your clients came from and will say you are using an undefined variable.
My best approach has been to use functools.partial, freezing all the constant values such as the bucket and other metadata in a partial and then just passing in the variable data. However, boto3 is still slow as hell at creating the signed URLs; compared to simple string formatting it is ~100x slower.
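A minimal sketch of that partial approach (settings.BUCKET comes from the question; the module layout and function name are mine):

from functools import partial

import boto3
from django.conf import settings

_S3 = boto3.client('s3')  # created once at import time

# Freeze the client and the operation name; only Params varies per object.
_presign = partial(_S3.generate_presigned_url, 'get_object')


def presigned_url(s3_key, version_id):
    return _presign(Params={
        'Bucket': settings.BUCKET,
        'Key': s3_key,
        'VersionId': version_id,
    })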
My question is quite hard to describe, so I will focus on explaining the situation. Let's say I have two different entities, which may run on different machines; call the first one Manager and the second one Generator. The Manager is the only one that can be called by the user.
The Manager has a method called getVM(scenario_Id), which takes the ID of a scenario as a parameter and retrieves from the database the BLOB corresponding to that ID. This BLOB is actually an XML structure that I need to send to the Generator. Both machines run Flask.
On another machine, I have my Generator with a generateVM() method, which will create a VM according to the XML structure it receives. We will not talk about how the VM is created from the XML.
Currently I have this:
Manager
# This method will be called by the user
@app.route("/getVM/<int:scId>", methods=['GET'])
def getVM(scId):
    xmlContent = db.getXML(scId)  # So here is what I want to send
    generatorAddr = sgAdd + "/generateVM"  # sgAdd is declared in Initialize() [IP of the Generator]
    # Here, how should I put my data?
    # How can I transmit xmlContent?
    testReturn = urlopen(generatorAddr).read()
    return json.dumps(testReturn)
Generator
# This method will be called by the Manager
@app.route("/generateVM", methods=['POST'])
def generateVM():
    # Retrieve the XML content...
    return "Whatever"
So as you can see, I am stuck on how to transmit the data itself (the XML structure), and then how to handle it on the other end. If you have any strategy, hint, tip, or clue on how I should proceed, please feel free to answer. Maybe there are some things I do not really understand about Flask, so feel free to correct anything wrong I said.
Best regards and thank you
Unless I'm missing something, couldn't you just transmit it in the body of a POST request? Isn't that how your generateVM method is set up?
import urllib

@app.route("/getVM/<int:scId>", methods=['GET'])
def getVM(scId):
    xmlContent = db.getXML(scId)
    generatorAddr = sgAdd + "/generateVM"
    # Send the XML as a urlencoded form field in the POST body.
    data_str = urllib.urlencode({'xml': xmlContent})
    testReturn = urllib.urlopen(generatorAddr, data=data_str).read()
    return json.dumps(testReturn)
http://docs.python.org/2/library/urllib.html#urllib.urlopen
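And on the Generator side, reading the posted XML back out of the form body would look roughly like this (the 'xml' field name matches the urlencode call above):

from flask import Flask, request

app = Flask(__name__)


@app.route("/generateVM", methods=['POST'])
def generateVM():
    xmlContent = request.form['xml']  # the urlencoded field sent by the Manager
    # ... parse xmlContent and build the VM from it ...
    return "Whatever"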
I noticed a strange behaviour today: it seems that, in the following example, the config.CLIENT variable stays persistent across requests. Even if the view gets passed an entirely different client_key, the query that fetches the client is only executed once (per many requests), and then config.CLIENT stays assigned.
It does not seem to be a database caching issue.
It happens with mod_python as well as with the test server (the variable is reassigned when the test server is restarted).
What am I missing here?
# views.py
from my_app import config

def get_client(client_key=None):
    if config.CLIENT is None:
        config.CLIENT = get_object_or_404(Client, key__exact=client_key, is_active__exact=True)
    return config.CLIENT

def some_view(request, client_key):
    client = get_client(client_key)
    ...
    return some_response
# config.py
CLIENT = None
Multiple requests are processed by the same process, and global variables like your CLIENT live as long as the process does. You shouldn't rely on global variables when processing requests: use local ones when you only need to keep a value while building the response, or put the data into the database when something must persist across multiple requests.
If you need to keep some value for the duration of the request, you can either add it to thread locals (there are examples of this that add user info to thread locals) or simply pass it as a variable into other functions.
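A minimal sketch of the thread-locals approach (names are illustrative, not Django API):

import threading

_local = threading.local()


def set_current_client(client):
    # Call this once per request, e.g. at the top of the view.
    _local.client = client


def get_current_client():
    # Safe to call from code that never sees the request object.
    return getattr(_local, 'client', None)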
OK, just to make it slightly clearer (and in response to the comment by Felix), I'm posting the code that does what I needed. The whole problem arose from a fundamental misunderstanding on my part and I'm sorry for any confusion I might have caused.
import config

# This will be called once per request/view
def init_client(client_key):
    config.CLIENT = get_object_or_404(Client, key__exact=client_key, is_active__exact=True)

# This might be called from other modules that are unaware of requests, views etc.
def get_client():
    return config.CLIENT
I love CherryPy's API for sessions, except for one detail. Instead of saying cherrypy.session["spam"] I'd like to be able to just say session["spam"].
Unfortunately, I can't simply have a global from cherrypy import session in one of my modules, because the cherrypy.session object isn't created until the first time a page request is made. Is there some way to get CherryPy to initialize its session object immediately instead of on the first page request?
I have two ugly alternatives if the answer is no:
First, I can do something like this
from threading import Thread
from time import sleep

import cherrypy

def import_session():
    global session
    while not hasattr(cherrypy, "session"):
        sleep(0.1)
    session = cherrypy.session

Thread(target=import_session).start()
This feels like a big kludge, but I really hate writing cherrypy.session["spam"] every time, so to me it's worth it.
My second solution is to do something like
class SessionKludge:
    def __getitem__(self, name):
        return cherrypy.session[name]

    def __setitem__(self, name, val):
        cherrypy.session[name] = val

session = SessionKludge()
but this feels like an even bigger kludge and I'd need to do more work to implement the other dictionary functions such as .get
So I'd definitely prefer a simple way to initialize the object myself. Does anyone know how to do this?
For CherryPy 3.1, you would need to find the right subclass of Session, run its 'setup' classmethod, and then set cherrypy.session to a ThreadLocalProxy. That all happens in cherrypy.lib.sessions.init, in the following chunks:
# Find the storage class and call setup (first time only).
storage_class = storage_type.title() + 'Session'
storage_class = globals()[storage_class]
if not hasattr(cherrypy, "session"):
    if hasattr(storage_class, "setup"):
        storage_class.setup(**kwargs)

# Create cherrypy.session which will proxy to cherrypy.serving.session
if not hasattr(cherrypy, "session"):
    cherrypy.session = cherrypy._ThreadLocalProxy('session')
Reducing (replace FileSession with the subclass you want):
FileSession.setup(**kwargs)
cherrypy.session = cherrypy._ThreadLocalProxy('session')
The "kwargs" consist of "timeout", "clean_freq", and any subclass-specific entries from tools.sessions.* config.