I'm having problems generating a CSV file from my data.
I have a list of about 6,000 objects. These objects are like custom forms,
so each object has many fields associated with it: for example, a request could consist of a name, an age, a phone number, etc. In this case every request has the same fields, so each form has a one-to-many relationship with these fields.
I grab this custom form and a few standard things from it as a ValuesQuerySet, then grab all of the fields associated with the form and add them to each row. Finally I use DictWriter to write the data to a CSV. This all works, but the performance is awful: processing my 6,000 records, each with about 15 custom fields, takes about 90 seconds, which triggers a timeout error. I need some way to make this faster.
My code looks something like this:
forms = SomeForm.objects.filter(created_by=user).values('id', 'value1', 'value2', 'etc...')
for form in forms:
    id = form['id']
    # one extra query per form -- this is the expensive part
    fields = Fields.objects.filter(Form_id=id).values('value', 'field__label')
    for field in fields:
        label = field['field__label']
        value = field['value']
        form[label] = value

writer = csv.DictWriter(response, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(forms)
Some of the code is removed, but this is the gist of it.
If you can think of any way to speed this up, or an alternative way to handle it, I would be very grateful.
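The slow part is the query inside the loop: one extra database round trip per form, so 6,000 forms means 6,000 extra queries. A rough sketch of a single-query alternative, reusing the model and field names from the snippet above:

from collections import defaultdict

forms = list(SomeForm.objects.filter(created_by=user).values('id', 'value1', 'value2'))

# one query for all field rows instead of one query per form
field_rows = Fields.objects.filter(
    Form_id__in=[f['id'] for f in forms]
).values('Form_id', 'value', 'field__label')

# group the field rows by their parent form
fields_by_form = defaultdict(dict)
for row in field_rows:
    fields_by_form[row['Form_id']][row['field__label']] = row['value']

for form in forms:
    form.update(fields_by_form[form['id']])

writer = csv.DictWriter(response, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(forms)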
Yeah, I ended up taking cdvv7788's advice and making this a Celery task. Thanks, guys.
I'm working on a program using Kivy and Python, and I will be saving some data to a JSON file, say 'items.json'.
The thing is, I intend to retrieve the data from the store and use it to build
a list of buttons in my app. Here is an example:
store = JsonStore('items.json')
store.put('infinix', name='infinix', category='gadgets')
store.put('wrist watch', name='wrist watch', category='outfits')
store.put('t-shirt', name='t-shirt', category='outfits')
This works well, but my problem is in retrieving the data.
I would like to get it back in the same order I entered it into the store.
For example, if I do
store.keys()
I would like it to return
['infinix', 'wrist watch', 't-shirt']
which is the same order I entered the data.
Currently, whenever I try to retrieve the data, the order is mixed up.
Is there a way to achieve what I need?
Any help is greatly appreciated.
The simplest option would seem to be adding an extra storage key containing a list of your items in the correct order. You can then read this key first and load the items in that order.
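A minimal sketch of that idea, using an extra '_order' key (the name is arbitrary):

from kivy.storage.jsonstore import JsonStore

store = JsonStore('items.json')
store.put('infinix', name='infinix', category='gadgets')
store.put('wrist watch', name='wrist watch', category='outfits')
store.put('t-shirt', name='t-shirt', category='outfits')

# keep the insertion order in a dedicated key
store.put('_order', keys=['infinix', 'wrist watch', 't-shirt'])

# load the items back in the original order
for key in store.get('_order')['keys']:
    item = store.get(key)
    print(item['name'], item['category'])

Remember to rewrite the '_order' entry whenever you add or remove an item.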
I'm trying to get all the fields and values from a specific issue. My code:
authenticated_jira = JIRA(options={'server': self.jira_server}, basic_auth=(self.jira_username, self.jira_password))
issue = authenticated_jira.issue(self.id)
print issue.fields()
Instead of returning the list of fields it returns:
<jira.resources.PropertyHolder object at 0x108431390>
authenticated_jira = JIRA(options={'server': self.jira_server}, basic_auth=(self.jira_username, self.jira_password))
issue = authenticated_jira.issue(self.id)
for field_name in issue.raw['fields']:
print "Field:", field_name, "Value:", issue.raw['fields'][field_name]
Depending on the field type, you sometimes get a dictionary as the value, and then you have to dig into it to find the actual value you want.
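For example, a status field usually comes back as a nested dictionary (the field names here follow the standard JIRA REST schema, but check your own instance):

fields = issue.raw['fields']
status = fields['status']                 # a dict, not a plain value
print "Status:", status['name']
assignee = fields['assignee']             # None when the issue is unassigned
if assignee is not None:
    print "Assignee:", assignee['displayName']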
Found it using:
print self.issue_object.raw
which returns the raw JSON as a dictionary that you can iterate over and manipulate.
You can use issue.raw['fields']['desired_field'], but this is an indirect way of accessing field values, because what you get back is not consistent: sometimes lists of strings, sometimes plain strings, and sometimes bare values with no key to access them by, so you have to iterate, count positions, and then parse to get the value, which is unreliable.
The best way is to use issue.fields.customfield_#. That way you don't have to do any parsing through the .raw fields.
Almost everything has a custom field associated with it. You can pull issues from the REST API to find custom field numbers, or note that some of the fields you get from .raw carry a custom field id that looks like "customfield_11111", and that's what you'll use.
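For example (customfield_11111 is a placeholder; substitute the id from your own instance):

issue = authenticated_jira.issue(self.id)
print issue.fields.summary              # built-in fields by name
print issue.fields.customfield_11111    # custom fields by id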
Using the answer from @kobi-k but dumping in a nicer format, I used the following code:
import json
with open("fields.txt", "w") as f:
    json.dump(issue.raw, f, indent=4)
It dumped all the fields to a file named "fields.txt".
I have a database with a bunch of regular documents that look something like this (example from the wiki):
{
    "_id": "some_doc_id",
    "_rev": "D1C946B7",
    "Subject": "I like Plankton",
    "Author": "Rusty",
    "PostedDate": "2006-08-15T17:30:12-04:00",
    "Tags": ["plankton", "baseball", "decisions"],
    "Body": "I decided today that I don't like baseball. I like plankton."
}
I'm working in Python with couchdb-python and I want to know if it's possible to add a field to each document. For example, if I wanted to have a "Location" field or something like that.
Thanks!
Regarding IDs
Every document in CouchDB has an id, whether you set it or not. Once the document is stored you can access it through the doc._id field.
If you want to set your own ids, you'll have to assign the id value to doc._id. If you don't set it, CouchDB will assign a uuid.
If you want to update a document, you need to make sure you have the same id and a valid revision. Say you are working from a blog post and the user adds the Location: then the URL of the post may be a good id to use, since you'd be able to access the document instantly.
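A small sketch of that with couchdb-python, using a made-up URL as the id (the database name 'blog' is also made up):

import couchdb

couch = couchdb.Server()
db = couch['blog']

doc = {
    '_id': 'http://example.com/posts/i-like-plankton',
    'Subject': 'I like Plankton',
    'Author': 'Rusty',
}
db.save(doc)    # stored under the URL id

# later: direct lookup by id, no view needed
doc = db['http://example.com/posts/i-like-plankton']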
So what's a revision
In your code snippet above you have the doc._rev element. This is the identifier of the revision. If you save a document with an id that already exists, CouchDB requires you to prove, via that revision, that you are updating the current version of the document and not overwriting someone else's changes.
So how do I update a document
If you have the id of your document, you can just access each document by using the db.get(id) function. You can then update the document like this:
doc = db.get(id)
doc['Location'] = "On a couch"
db.save(doc)
I have an example where I store weather forecast data, which I update approximately every 2 hours. A separate process enriches the same documents with data from a different provider, based on characteristics of tweets on the day.
It looks something like this:
doc = db.get(id)
doc_with_loc = GetLocationInformationFromOtherProvider(doc) # takes about 40 seconds.
doc_with_loc["_rev"] = doc["_rev"]
db.save(doc_with_loc) # this will fail if the weather process has updated the document in the meantime
If you have concurrent processes, the _rev can become invalid, so you need a failsafe; e.g., this could do:
from couchdb.http import ResourceConflict

doc = db.get(id)
doc_with_loc = GetLocationInformationFromAltProvider(doc)
update_outstanding = True
while update_outstanding:
    doc = db.get(id)  # re-fetch to pick up the latest revision
    doc_with_loc['_rev'] = doc['_rev']
    try:
        db.save(doc_with_loc)
        update_outstanding = False
    except ResourceConflict:
        pass  # another process updated the doc first; retry with the new _rev
So how do I get the Ids?
One option, suggested above, is to actively set the id so you can retrieve it later, i.e. if a user sets a given location that is attached to a URL, use the URL. But you may not know which document you want to update, or you may even want a process that finds all the documents that don't have a location and assigns one.
You'll most likely be using a view for this. Views have a mapper and a reducer; you'll use the first one and can forget about the second. A view with a mapper does the following:
It returns a simplified/transformed way of looking at your data. You can return multiple values per document or skip some entirely. It gives the data you emit a key, and if you query it with include_docs=true it will also hand you the document (with _id and _rev alongside).
The simplest view is the default view db.view('_all_docs'); it returns all documents, and you probably don't want to update all of them. Note that views are themselves stored as documents once you define them.
The next step up is a view that only returns items of the right document type. I tend to have a _type="article" field in my database; think of it as marking that a document belongs to a certain table, if you had stored them in a relational database.
Finally, you can filter for elements that have no location, giving you a view you can iterate over for all the docs that still need a location, so you can fix them in a separate process. The best documentation on writing views can be found here.
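A sketch of that last kind of view with couchdb-python (the design doc name, the _type value, and the lookup_location helper are all made up):

design = {
    '_id': '_design/articles',
    'views': {
        'needs_location': {
            'map': '''
                function(doc) {
                    if (doc._type == "article" && !doc.Location) {
                        emit(doc._id, null);
                    }
                }
            '''
        }
    }
}
db.save(design)

# iterate over every article that still lacks a Location
for row in db.view('articles/needs_location', include_docs=True):
    doc = row.doc
    doc['Location'] = lookup_location(doc)   # hypothetical helper
    db.save(doc)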
I'm writing a method to update several fields on multiple instances in my database. For now, I'm trying to get it to work for just one.
My user uploads a CSV file with all the information to change (including the pk). I've written the function that parses that information, and it all works fine. I can even assign the data to an item, and if I print it from that function, it comes out correctly. However, when I save the updates (using item.save()), nothing seems to change in the database.
Here's a very stripped-down version of the method. I really don't know why it isn't working; I've done something very similar elsewhere (getting data through a form, setting the field, calling save, and then displaying the changed information), and I've used a very similar CSV-uploading technique to create new entries.
Small piece of relevant code:
reader = csv.reader(f)
for row in reader:
    pk = row[0]
    print pk
    item = POObject.objects.get(pk=pk)
    p2 = item.purchase2
    print item.purchase.requested_start_date
    print p2.requested_start_date
    requested_start_date = row[6]
    requested_start_date = datetime.datetime.strptime(requested_start_date, "%d %b %y")
    print requested_start_date
    p2.requested_start_date = requested_start_date
    p2.save()
    print p2.requested_start_date
    item.purchase2 = p2
    item.save()
    print item.purchase.requested_start_date
    return pk
Obviously I have lots of prints in there to find where things went wrong. Basically what I find is that item looks fine locally, but if I query the server again after saving, i.e. doing item2 = POObject.objects.get(pk=pk), it won't have any updates. Does anyone have any idea why save() isn't doing anything?
UPDATE:
The mystery continues.
If I update a field that isn't part of an FK relation (say, a text field or something), everything works fine. However, what I really need to do is update an item and then set that item as the FK on the main item in question. I'm not sure why this isn't working the normal way (updating the related item, saving it, and then setting the FK to that new, updated item).
Whoa. I feel a little ashamed that I didn't figure this out. I had forgotten exactly how I had designed one of my models: there was another object within it that needed to be updated, and I wasn't saving it.
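The general rule, for anyone who hits the same thing: Django's save() never cascades to related objects, so every instance you mutate needs its own save() call. A tiny sketch continuing the snippet above (p2.schedule is a made-up name for the forgotten nested object):

inner = p2.schedule                        # hypothetical nested object
inner.requested_start_date = requested_start_date
inner.save()                               # without this, the change is silently lost
p2.save()
item.purchase2 = p2
item.save()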
Even with everything I know about the App Engine datastore, I don't know the answer to this. I'm trying to avoid having to write and run all the code it would take to figure it out, hoping someone already knows the answer.
I have code like:
class AddlInfo(db.Model):
    user = db.ReferenceProperty(User)
    otherstuff = db.ListProperty(db.Key, indexed=False)
And create the record with:
info = AddlInfo(user=user)
info.put()
To get this object I can do something like:
# This seems excessively wordy (even though wordiness doesn't directly translate into slowness)
info = AddlInfo.all().filter('user =', user).fetch(1)
or I could do something like:
class AddlInfo(db.Model):
    # str(user.key()) is the key_name of this record
    otherstuff = db.ListProperty(db.Key, indexed=False)
Creation looks like:
info = AddlInfo(key_name=str(user.key()))
info.put()
And then get the info with:
info = AddlInfo.get_by_key_name(str(user.key()))
I don't need the ReferenceProperty in AddlInfo (I got there using the user object in the first place). Which is faster/less resource-intensive?
==================
Part of why I was doing it this way is that otherstuff could be a list of 100+ keys and I only need them sometimes (probably less than 50% of the time). I was trying to make things more efficient by not loading those 100+ keys on every request.
Between those two options, the second is marginally cheaper, because you're deriving the key rather than looking it up in a remote index.
As Wooble said, it's cheaper still to just keep everything on one entity. Consider an Expando if you just need a way to store a bunch of optional, ad-hoc properties.
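A minimal sketch of the Expando idea (the property names are made up):

from google.appengine.ext import db

class AddlInfo(db.Expando):
    pass

info = AddlInfo(key_name=str(user.key()))
info.favorite_color = 'green'    # ad-hoc properties, no declaration needed
info.shoe_size = 10.5
info.put()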
The second approach is the better one, with one modification: There's no need to use the whole key of the user as the key name of this entity - just use the same key name as the User record.
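In code, assuming the User entity was itself created with a key name:

info = AddlInfo(key_name=user.key().name())
info.put()

# and later, given the same user:
info = AddlInfo.get_by_key_name(user.key().name())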