I'm trying to export all of the data connected to a User instance to a CSV file. In order to do so, I need to get it from the database first. Using something like
data = SomeModel.objects.filter(owner=user)
on every possible model seems very inefficient, so I want to use prefetch_related(). My question is: is there any way to prefetch, at once, the instances of every model that has a foreign key pointing at my User?
Actually, you don't need to "prefetch everything" in order to create a CSV file (or anything else), and you really don't want to. Python's CSV support is designed to work row by row, and that's what you want to do here: in a loop, read one row at a time from the database and write it one row at a time to the file.
Remember that Django is lazy. Functions like filter() specify what the filtering is going to be, but nothing really starts happening until you begin to iterate over the actual collection. That's when Django builds the query, submits it to the SQL engine, and starts retrieving the data that's returned ... one row at a time.
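For illustration, here is a minimal sketch of that row-by-row pattern, assuming a model named SomeModel with an owner foreign key; the column names are placeholders:

    import csv

    def export_user_data(user, path):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "created", "value"])  # placeholder columns
            # iterator() streams rows from the database instead of caching
            # the whole result set in memory
            for obj in SomeModel.objects.filter(owner=user).iterator():
                writer.writerow([obj.pk, obj.created, obj.value])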
Let the SQL engine, Python and the operating system take care of "efficiency." They're really good at that sort of thing.
Related
I'm looking for ideas on how to improve a report that takes up to 30 minutes to process on the server. I'm currently working with Django and MySQL, but I'm open to solutions that require changing the language or the SQL database.
The report I'm talking about reads multiple Excel files and inserts all of their rows into a table (the report table) of roughly 12K to 15K records; the table has around 50 columns. This part doesn't take that much time.
Once I have all the records in the report table, I start applying multiple phases of business logic, so I end up with something like this:
def create_report():
    business_logic_1()
    business_logic_2()
    business_logic_3()
    business_logic_4()
Each business_logic_X function does something very similar: it starts with ReportModel.objects.all(), then applies multiple calculations (checking dates, quantities, etc.) and updates the records. Since it's a 12K-record table, this quickly adds time to the complete report.
The reason I'm running the functions separately rather than doing all of the processing in one pass is that the logic from the first function needs to be completed before the logic in the next functions works (e.g. the first function finds all related records and applies the same status to all of them).
The first thing I know could be optimized is to somehow cache the objects.all() result instead of calling it in each function, but I'm not sure how to pass it to the next function without saving the records first.
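A rough sketch of that caching idea (assuming Django 2.2+ for bulk_update(); the field names below are placeholders for whatever each phase modifies): fetch the rows once, pass the same Python objects through each phase, and write the accumulated changes back in one batch.

    def create_report():
        records = list(ReportModel.objects.all())  # one query instead of four
        business_logic_1(records)
        business_logic_2(records)
        business_logic_3(records)
        business_logic_4(records)
        # write all accumulated changes back in batches
        ReportModel.objects.bulk_update(records, ["status", "quantity"], batch_size=500)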
I already optimized the report a bit by using update_fields in the save() calls inside those functions, and that saved some time.
My question is, is there a better approach to this kind of problem? Is Django/MySQL the right stack for this?
What takes the time is the business logic you're doing in Django: it makes several round trips between the database and the application.
It sounds like several tables are involved, so I would suggest writing your query in raw SQL and only bringing the results into the application if you actually need them there.
The ORM has a raw() method that you can use. Or you could drop down to an even lower level and interface with your database directly.
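For example, something along these lines (the SQL, table and column names are only illustrative; note that raw() requires the primary key column to be included in the SELECT):

    rows = ReportModel.objects.raw(
        "SELECT id, status, quantity FROM report_table WHERE quantity > %s",
        [0],
    )
    for row in rows:
        print(row.id, row.status)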
Without seeing more of what you do, I can't give any more specific advice.
In my app I have a mixin that defines two fields, start_date and end_date. I've added this mixin to all the table declarations that require these fields.
I've also defined a function that returns the filter conditions to test that a timestamp (e.g. now) is >= start_date and < end_date. Currently I'm manually adding these filters whenever I need to query a table with these fields.
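A condensed sketch of that pattern (the names are illustrative):

    from datetime import datetime
    from sqlalchemy import Column, DateTime, and_

    class DateRangeMixin:
        start_date = Column(DateTime, nullable=False)
        end_date = Column(DateTime, nullable=False)

        @classmethod
        def active_filter(cls, at=None):
            # timestamp must be >= start_date and < end_date
            at = at or datetime.utcnow()
            return and_(cls.start_date <= at, cls.end_date > at)

    # usage: session.query(Lesson).filter(Lesson.active_filter()).all()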
However, sometimes my colleagues or I forget to add the filters, and I wonder whether it is possible to automatically extend any query on such a table, for example with an additional function in the mixin that SQLAlchemy invokes whenever it "compiles" the statement. I'm using "compile" only as an example here; I don't actually know when or how best to hook in.
Any idea how to achieve this?
If it works for SELECT, does it also work for INSERT and UPDATE?
Thanks a lot for your help,
Juergen
Take a look at this example. You can change the criteria expressed in the private method to refer to your start and end dates.
Note that this query will be less efficient because it overrides the get method to bypass the identity map.
I'm not sure what the enable_assertions(False) call does; I'd recommend understanding that before proceeding.
I tried extending Query but had a hard time. Eventually (and unfortunately) I moved back to my previous approach of little helper functions that return filters and applying them to queries.
I still wish I could find an approach that automatically adds certain filters whenever a table (Base) has certain columns.
Juergen
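For reference, newer SQLAlchemy versions (1.4+) provide with_loader_criteria together with the do_orm_execute session event, which can inject such conditions into every ORM SELECT automatically. This is only a hedged sketch that reuses the mixin name from the example above, and it does not cover INSERT or UPDATE:

    from datetime import datetime
    from sqlalchemy import event, and_
    from sqlalchemy.orm import Session, with_loader_criteria

    @event.listens_for(Session, "do_orm_execute")
    def _add_date_range_criteria(execute_state):
        # applies only to ORM SELECT statements
        if execute_state.is_select:
            now = datetime.utcnow()
            execute_state.statement = execute_state.statement.options(
                with_loader_criteria(
                    DateRangeMixin,
                    lambda cls: and_(cls.start_date <= now, cls.end_date > now),
                    include_aliases=True,
                )
            )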
I am working heavily with a database using Python, and I am trying to write code that actually makes my life easier.
Most of the time, I need to run a query and process the results; most of the time I get the same fields from the same table, so my idea was to collect the various results in an object to process later.
I am using SQLAlchemy for the DB interaction. From what I can read, there is no direct way to just say "dump the result of this query into an object" so that I can access the various fields like
print(object.fieldA)
print(object.fieldB)
and so on. I tried dumping the results to JSON, but even that requires parsing, and it is not as straightforward as I hoped.
So at this point, is there anything else that I can try? Or should I write a custom object that mimics the DB structure and parse the results with for loops to put the data in the right place? I was hoping to find a way to do this automatically, but so far it seems that the only way to get something close to what I am looking for is to use JSON.
EDIT:
Found some info about serialization and SQLAlchemy's ability to read a table and reproduce a sort of 1:1 copy of it in an object, but I am not sure that this will actually work with a query.
Found that the best way is to actually use a custom object.
You can use reflection through SQLAlchemy to extrapolate the structure, but if you are dealing with a small database with few tables, you can simply create the object that will host the data yourself. This gives you control over the object and what you can put in it.
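A small sketch of that idea, using a dataclass as the container (assumes Python 3.7+; the field and table names are made up and should mirror the columns you actually select):

    from dataclasses import dataclass

    @dataclass
    class RowData:
        fieldA: str
        fieldB: int

    results = [
        RowData(fieldA=row.fieldA, fieldB=row.fieldB)
        for row in session.query(MyTable.fieldA, MyTable.fieldB)
    ]
    print(results[0].fieldA)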
There are obviously other ways, but since nobody posted anything, I assume they are either too easy to be worth mentioning, or too hard and specific to each case.
I have a SQLAlchemy-based tool for selectively copying data between two different databases for testing purposes. I use the merge() function to take model objects from one session and store them in another session. I'd like to be able to store the source objects in some intermediate form and then merge() them at some later point in time.
It seems like there are a few options to accomplish this:
Exporting DELETE/INSERT SQL statements. Seems pretty straightforward; I think I can get SQLAlchemy to give me the INSERT statements, and maybe even the DELETEs.
Exporting the data to a SQLite database file with the same (or a similar) schema, which could then be read in as a source at a later point in time.
Serializing the data in some manner and then reading it back into memory for the merge. I don't know if SQLAlchemy has something like this built in or not, and I'm not sure what the challenges would be in rolling it myself (a rough sketch of this idea follows below).
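For the third option, a hedged sketch (SourceSession, TargetSession and MyModel are placeholders; SQLAlchemy mapped instances can generally be pickled once their loaded attributes are populated):

    import pickle

    objs = SourceSession().query(MyModel).all()
    payload = pickle.dumps(objs)      # store this blob wherever convenient

    # ... later ...
    restored = pickle.loads(payload)
    target = TargetSession()
    for obj in restored:
        target.merge(obj)             # same merge() the tool already uses
    target.commit()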
Has anyone tackled this problem before? If so, what was your solution?
EDIT: I found a tool built on top of SQLAlchemy called dataset that provides the freeze functionality I'm looking for, but there seems to be no corresponding thaw functionality for restoring the data.
I haven't used it before, but the dogpile caching techniques described in the documentation might be what you want. This allows you to query to and from a cache using the SQLAlchemy API:
http://docs.sqlalchemy.org/en/rel_0_9/orm/examples.html#module-examples.dogpile_caching
I would like to have a map data type for one of my entity types in my Python Google App Engine application. I think what I need is essentially the Python dict datatype, where I can create a list of key-value mappings. I don't see any obvious way to do this with the datatypes App Engine provides.
The reason I'd like to do this is that I have a User entity, and I'd like to track within that user a mapping of lesson IDs to values that represent the user's status with each lesson. I'd like to do this without creating a whole new entity (something like UserLessonStatus) that references the User and has to be queried, since I often want to iterate through all the lesson statuses. Maybe it is better done that way, in which case I'd appreciate opinions confirming that this is how it's best done. Otherwise, if someone knows a good way to create a mapping within my User entity itself, that'd be great.
One solution I considered is using two ListProperties in conjunction: when adding an object, append the key to one list and the value to the other; when locating, find the index of the key in one list and use that index in the other; when removing, find the index in one, use it to remove from both; and so forth.
You're probably better off using another kind, as you suggest. If you do want to store it all in the one entity, though, you have several options: parallel lists, as you describe, are one; you could also simply pickle a Python dictionary, assuming you don't want to query on it.
You may want to check out the ndb project, which supports nested entities and would also be a viable solution.
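As a hedged sketch of both suggestions using ndb (the kind and property names are illustrative):

    from google.appengine.ext import ndb

    class LessonStatus(ndb.Model):
        lesson_id = ndb.StringProperty()
        status = ndb.StringProperty()

    class User(ndb.Model):
        # option 1: an opaque pickled dict, not queryable
        lesson_status_map = ndb.PickleProperty()
        # option 2: nested entities, repeated and queryable
        lesson_statuses = ndb.StructuredProperty(LessonStatus, repeated=True)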