How to make a model in django without predefined attributes? - python

I am a little bit confused about Django models, as the title says, and I cannot find any posts about this on Google.
Let's say that normally we would have a model like this:
class something(models.Model):
    fName = models.TextField()
    lName = models.TextField()
    ......
    lastThings = models.TextField()
However, I don't want a model like this. I want a model with no predefined attributes; in other words, I could put anything into it. My thought is: can I use a loop or something similar to create such a model?
class someModel(models.Model):
    for i in numberOfModelField:
        field[j] = i
        j+=1
This is table A to read:
A B C
1 2 3
2 3 4
This is table B to read:
A B C D E F G G G
1 2 3 4 5 6 7 8 9
...............
4 5 3 2 4 5 6 4 3
And so on: many different kinds of table may need to be read.
Therefore, I want a single model that fits all of these cases. I am not sure this is clear enough to explain my confusion. Thank you

To expand on my comment (put as an answer so I can format the code decently).
from django.db import models

class something(models.Model):
    sheet_name = models.TextField()
    row = models.TextField()
    col = models.TextField()
    cell_value = models.TextField()

    class Meta:
        # unique_together takes field names as strings
        unique_together = [['sheet_name', 'row', 'col']]
Once you have the values in this format, you can do what you want with them. If you know the first row always contains the headers, you could define a header table keyed on sheet_name and col that maps each column to a header_name, or you could just read the headers from this table.
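A minimal sketch of getting a table into that cell format and reading the headers straight back out of it, assuming each sheet arrives as a list of rows with the headers in the first row. The helper names `table_to_cells` and `headers_for` are hypothetical, and all values are stored as strings because every field in the model above is a TextField:

```python
def table_to_cells(sheet_name, rows):
    """Flatten a table (list of rows) into (sheet_name, row, col, cell_value) tuples."""
    cells = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            # Everything is stringified to match the TextFields in the model.
            cells.append((sheet_name, str(r), str(c), str(value)))
    return cells


def headers_for(cells, sheet_name):
    """Map column index -> header text, taken from row '0' of one sheet."""
    return {col: value for (sheet, row, col, value) in cells
            if sheet == sheet_name and row == "0"}


table_a = [["A", "B", "C"], [1, 2, 3], [2, 3, 4]]
cells = table_to_cells("table_a", table_a)
print(len(cells))                     # 9
print(headers_for(cells, "table_a"))  # {'0': 'A', '1': 'B', '2': 'C'}
```

Because columns are keyed by position rather than by header text, a table like table B with the repeated "G G G" headers still fits. Each tuple could then be turned into a `something(...)` instance and saved in one go with `bulk_create`.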
There are probably better ways of handling this, and I'm still not sure of your use case. If this is loading data temporarily to use in other processes, then this should be fine. If it's to populate a new database for use indefinitely, then you need to spend more time defining the actual tables, though this process would be OK as an intermediate staging area just to get the data out of Excel.

Related

Prevent duplicate objects in python

I have a class definition like this:
from collections import defaultdict

class companyInCountry:
    def __init__(self,name,country):
        self.name = name
        self.country = country
        self.amountOwed = defaultdict(int)
And I'm looping through a table that has, say, 6 rows:
COMPANY COUNTRY GROSS NET
companyA UK 50 40
companyA DE 20 15
companyA UK 10 5
companyA FR 20 10
companyB DE 35 25
companyB DE 10 5
What I want at the end of looping through this table is to end up with many company/territory specific objects, e.g.
object1.name = 'companyA'
object1.country = 'UK'
object1.amountOwed['GROSS'] = 60
object1.amountOwed['NET'] = 45
But what I'm struggling to visualise is the best way to prevent objects being created that have duplicate company/country combinations (e.g. that would happen for the first time on row 3 in my data). Is there some data type or declaration I can include inside my init def that will ignore duplicates? Or do I need to manually check for the existence of similar objects before calling companyInCountry(name,country) to initialise a new instance?
The simplest way to do this would be to maintain a set of (company, country) tuples which can be consulted before creating a new object. If the pair already exists, skip it; otherwise create the object and add the new pair to the set. Something like:
pairs = set()
for row in table:
    if (row.company, row.country) in pairs:
        continue
    pairs.add((row.company, row.country))
    company = CompanyInCountry(row.company, row.country)
    # do something with company
If you want a more object-oriented solution, delegate creation of companies to a collection class that performs the necessary checks before creation.
class CompanyCollection:
    def __init__(self):
        # A list to hold the companies - could also be a dict.
        self._companies = []
        self._keys = set()

    def add_company(self, row):
        key = (row.company, row.country)
        if key in self._keys:
            return  # this company/country pair already exists
        self._keys.add(key)  # remember the pair so later rows are skipped
        self._companies.append(CompanyInCountry(*key))

    # Define methods for accessing the companies,
    # or whatever you want
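Skipping duplicates alone does not add row 3's amounts to the existing companyA/UK object, which the question also asks for. Taking up the "could also be a dict" idea, a minimal sketch keys a dict on the (company, country) pair, creates the object only the first time a pair is seen, and always accumulates the amounts. The `aggregate` function and the (company, country, gross, net) tuple shape of each row are assumptions for illustration:

```python
from collections import defaultdict


class CompanyInCountry:
    def __init__(self, name, country):
        self.name = name
        self.country = country
        self.amountOwed = defaultdict(int)


def aggregate(rows):
    """Return {(company, country): CompanyInCountry}, summing GROSS and NET per pair."""
    companies = {}
    for company, country, gross, net in rows:
        key = (company, country)
        if key not in companies:  # first time this pair appears: create it
            companies[key] = CompanyInCountry(company, country)
        obj = companies[key]      # existing or newly created object
        obj.amountOwed['GROSS'] += gross
        obj.amountOwed['NET'] += net
    return companies


rows = [
    ("companyA", "UK", 50, 40),
    ("companyA", "DE", 20, 15),
    ("companyA", "UK", 10, 5),
    ("companyA", "FR", 20, 10),
    ("companyB", "DE", 35, 25),
    ("companyB", "DE", 10, 5),
]
companies = aggregate(rows)
uk = companies[("companyA", "UK")]
print(len(companies), uk.amountOwed['GROSS'], uk.amountOwed['NET'])  # 4 60 45
```

The dict lookup does the same job as the `pairs` set, but it also hands back the existing object, so no second structure is needed to find it.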

Aggregate and calculate median of arrayfield in django queryset

I'm wondering if this is possible in a more efficient way.
I have a dataset in PostgreSQL that is structured like this:
Year, Sitename, Array (length = 4500)
For example:
1982, DANC, array([2,3,4,5,6,7,...])
1982, ANCH, array([5,6,4,3,5,7,...])
1983, DANC, array([3,3,4,6,3,6,...])
1983, ANCH, array([8,8,5,4,3,2,...])
What I want to do is add up the arrays (across rows) by year.
E.g.,
1982 1982 1982
DANC ANCH TOT
2 5 7
3 6 9
4 4 8
5 3 8
6 5 11
7 7 14
... ... ...
My Django model looks like this:
class Abundance(models.Model):
    abundance_id = models.AutoField(primary_key=True)
    site = models.ForeignKey('Site')
    season = models.SmallIntegerField()
    samples = ArrayField(models.DecimalField(blank=True, decimal_places=3, max_digits=30))

    def __unicode__(self):
        return self.site
The following code in my views.py works:
import numpy as np
import bottleneck as bn
...
def testview(request):
    s = ["ACUN","BRDM"]
    quants = []
    medians = []
    for yr in range(1982,2015):
        X = Abundance.objects.values_list('samples').filter(site__site_id__in = s).filter(season = yr)
        h = np.matrix(np.array(X,dtype=float))
        i = h.sum(axis=0)
        m = bn.median(i)
        up = np.percentile(i,95)
        down = np.percentile(i,5)
        qlist = [yr, round(down,3), round(up,3)]
        mlist = [yr, round(m,3)]
        quants.append(qlist)
        medians.append(mlist)
    return JsonResponse({'quants':quants, 'medians':medians})
However, the above code is very slow - especially when drawing many sites. I have tried playing with .aggregate() but I've not found a good solution.
Thanks in advance
You can probably use .aggregate() to push some of the load down to Postgres, but I think one of the bigger speed problems here is the DecimalField. It offers the highest precision, but Decimal is also one of the more expensive types for Python to move in and out of the database.
That said, I'm not sure there's a quick way to get the percentiles out of the DB call, but the sums and medians you can easily push down to the DB via the Django ORM. You can probably push the others (percentiles, etc.) down as well, but then you'll be delving into custom aggregates for Django (https://docs.djangoproject.com/en/1.9/ref/models/expressions/#creating-your-own-aggregate-functions). If you're going to go that far, it might be worth checking out something like aldjemy (https://github.com/Deepwalker/aldjemy/) and converting the entire query over to SQLAlchemy so you have maximum control over it.
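If the work stays in Python, a rough sketch of the same per-year computation follows. It assumes the rows are fetched in a single pass, e.g. one `values_list('season', 'samples')` query, instead of one query per year, and that each Decimal array is converted to float once. `yearly_stats` is a hypothetical name, and this sketch swaps bottleneck's `bn.median` for `np.median` and drops `np.matrix`:

```python
from collections import defaultdict
from decimal import Decimal

import numpy as np


def yearly_stats(rows):
    """Sum the sample arrays of all sites per year, then summarise each year.

    rows: iterable of (season, samples) pairs, e.g. the result of one
    Abundance.objects.filter(...).values_list('season', 'samples') query
    (an assumption; the original view issues one query per year).
    """
    by_year = defaultdict(list)
    for season, samples in rows:
        # Convert the Decimal values to float once, as they come in.
        by_year[season].append(np.asarray(samples, dtype=float))

    quants, medians = [], []
    for yr in sorted(by_year):
        totals = np.sum(by_year[yr], axis=0)  # element-wise sum across sites
        down = float(np.percentile(totals, 5))
        up = float(np.percentile(totals, 95))
        quants.append([yr, round(down, 3), round(up, 3)])
        medians.append([yr, round(float(np.median(totals)), 3)])
    return quants, medians


# Toy data: 3 values per row instead of 4500.
rows = [
    (1982, [Decimal("2"), Decimal("3"), Decimal("4")]),  # DANC
    (1982, [Decimal("5"), Decimal("6"), Decimal("4")]),  # ANCH
    (1983, [Decimal("1"), Decimal("2"), Decimal("3")]),
]
quants, medians = yearly_stats(rows)
print(medians)  # [[1982, 8.0], [1983, 2.0]]
```

The returned lists have the same shape as `quants` and `medians` in the view above, so they could be passed to `JsonResponse` unchanged.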

Iterating through form data

I have a QueryDict object in Django as follows:
{'ratingname': ['Beginner', 'Professional'], 'sportname': ['2', '3']}
where the mapping is:
2 Beginner
3 Professional
and 2, 3 are the primary key values of the sport table in models.py:
class Sport(models.Model):
    name = models.CharField(unique=True, max_length=255)

class SomeTable(models.Model):
    sport = models.ForeignKey(Sport)
    rating = models.CharField(max_length=255, null=True)
My question here is, how do I iterate through ratingname such that I can save it as
st = SomeTable(sport=sportValue, rating=ratingValue)
st.save()
I have tried the following:
ratings = dict['ratingname']
sports = dict['sportname']
for s,i in enumerate(sports):
    sport = Sport.objects.get(pk=sports[int(s[1])])
    rate = SomeTable(sport=sport, rating=ratings[int(s)])
    rate.save()
However, this creates a wrong entry in the tables. For example, with the above given values it creates the following object in my table:
id: 1
sport: 2
rating: 'g'
How do I solve this issue, or is there a better way to do this?
There are a couple of problems here. The main one is that QueryDicts return only the last value when accessed with ['sportname'] or the like. To get the list of values, use getlist('sportname'), as documented here:
https://docs.djangoproject.com/en/1.7/ref/request-response/#django.http.QueryDict.getlist
Your enumerate is off, too - enumerate yields the index first, which your code assigns to s. So s[1] will throw an exception. There's a better way to iterate through two sequences in step, though - zip.
ratings = query_dict.getlist('ratingname')  # don't reuse built in names like dict
sports = query_dict.getlist('sportname')
for rating, sport_pk in zip(ratings, sports):
    sport = Sport.objects.get(pk=int(sport_pk))
    rate = SomeTable(sport=sport, rating=rating)
    rate.save()
You could also look into using a ModelForm based on your SomeTable model.
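One more caveat: `zip` silently stops at the shorter list, so a form that posts mismatched lists would drop data without any error. A small sketch of a pairing step with an explicit length check follows; `pair_sports_and_ratings` is a hypothetical helper that works on the plain lists `getlist()` returns, and the last line shows why `s[1]` failed in the question's loop:

```python
def pair_sports_and_ratings(sport_pks, ratings):
    """Pair each sport primary key (as int) with its rating, in order.

    Both arguments are plain lists, e.g. from QueryDict.getlist().
    """
    if len(sport_pks) != len(ratings):
        # zip() would silently truncate; fail loudly instead.
        raise ValueError("sportname and ratingname lists differ in length")
    return [(int(pk), rating) for pk, rating in zip(sport_pks, ratings)]


sports = ['2', '3']
ratings = ['Beginner', 'Professional']
print(pair_sports_and_ratings(sports, ratings))
# [(2, 'Beginner'), (3, 'Professional')]

# enumerate() yields (index, item), so the first name it binds is an int index:
print(list(enumerate(sports)))  # [(0, '2'), (1, '3')]
```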
You may use zip:
ratings = query_dict.getlist('ratingname')  # getlist() returns every value; ['ratingname'] returns only the last
sports = query_dict.getlist('sportname')
for rating, sport_id in zip(ratings, sports):
    sport = Sport.objects.get(pk=int(sport_id))
    rate = SomeTable(sport=sport, rating=rating)
    rate.save()

How to divide a dbf table to two or more dbf tables by using python

I have a dbf table. I want to automatically divide this table into two or more tables using Python. The main problem is that this table consists of several groups of lines, and each group is separated from the previous one by an empty line. So I need to save each group to a new dbf table. I think this could be solved with some function from the Arcpy package plus for and while loops, but I can't work it out. My real source dbf table is more complex, but I have attached a simple example for better understanding. Sorry for my poor English.
Source dbf table:
ID NAME TEAM
1 A 1
2 B 2
3 C 1
4
5 D 2
6 E 3
I want to get dbf1:
ID NAME TEAM
1 A 1
2 B 2
3 C 1
I want to get dbf2:
ID NAME TEAM
1 D 2
2 E 3
Using my dbf package it could look something like this (untested):
import dbf
source_dbf = '/path/to/big/dbf_file.dbf'
base_name = '/path/to/smaller/dbf_%03d'
sdbf = dbf.Table(source_dbf)
i = 1
ddbf = sdbf.new(base_name % i)
sdbf.open()
ddbf.open()
for record in sdbf:
    if not record.name:  # assuming if 'name' is empty, all are empty
        ddbf.close()
        i += 1
        ddbf = sdbf.new(base_name % i)
        ddbf.open()  # the new table must be opened before appending to it
        continue
    ddbf.append(record)
ddbf.close()
sdbf.close()
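The splitting logic itself does not depend on dbf or Arcpy. A pure-Python sketch, assuming the rows have already been read into dicts keyed by field name (how they are read is not shown), groups them at each blank row and also renumbers ID from 1 inside each group, which the desired dbf1/dbf2 output does but the loop above does not. `split_on_blank` is a hypothetical name:

```python
def split_on_blank(rows, key="NAME"):
    """Split dict rows into groups at every row whose `key` field is empty.

    Each group gets its IDs renumbered from 1, as in the desired dbf1/dbf2.
    """
    groups, current = [], []
    for row in rows:
        if not row.get(key):        # blank separator row
            if current:
                groups.append(current)
            current = []
            continue
        current.append(dict(row))   # copy so the input rows are not modified
    if current:
        groups.append(current)

    for group in groups:
        for new_id, row in enumerate(group, start=1):
            row["ID"] = new_id
    return groups


source = [
    {"ID": 1, "NAME": "A", "TEAM": 1},
    {"ID": 2, "NAME": "B", "TEAM": 2},
    {"ID": 3, "NAME": "C", "TEAM": 1},
    {"ID": 4, "NAME": "", "TEAM": None},
    {"ID": 5, "NAME": "D", "TEAM": 2},
    {"ID": 6, "NAME": "E", "TEAM": 3},
]
for n, group in enumerate(split_on_blank(source), start=1):
    print(n, [(r["ID"], r["NAME"], r["TEAM"]) for r in group])
# 1 [(1, 'A', 1), (2, 'B', 2), (3, 'C', 1)]
# 2 [(1, 'D', 2), (2, 'E', 3)]
```

Each resulting group could then be written to its own table with whichever dbf or Arcpy writer you use.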

django annotation and filtering

Hopefully this result set is explanatory enough:
title text total_score already_voted
------------- ------------ ----------- -------------
BP Oil spi... Recently i... 5 0
J-Lo back ... Celebrity ... 7 1
Don't Stop... If there w... 9 0
Australian... The electi... 2 1
My models file describes article (author, text, title) and vote (caster, date, score). I can get the first three columns just fine with the following:
articles = Article.objects.all().annotate(total_score=Sum('vote__score'))
but calculating the 4th column, which is a boolean value describing whether the current logged in user has placed any of the votes in column 3, is a bit beyond me at the moment! Hopefully there's something that doesn't require raw sql for this one.
Cheers,
Dave
--Trindaz on Fedang #django
I cannot think of a way to include the boolean condition. Perhaps others can answer that better.
How about thinking a bit differently? If you don't mind executing two queries you can filter your articles based on whether the currently logged in user has voted on them or not. Something like this:
all_articles = Article.objects.all()
# annotate before filtering, so total_score sums every vote, not just this user's
articles_user_has_voted_on = all_articles.annotate(total_score=Sum('vote__score')).filter(vote__caster=request.user)
other_articles = all_articles.exclude(vote__caster=request.user).annotate(total_score=Sum('vote__score'))
Update
After some experiments I was able to figure out how to add a boolean condition for a column in the same model (Article in this case) but not for a column in another table (Vote.caster).
If Article had a caster column:
Article.objects.all().extra(select={'already_voted': 'caster_id = %s'}, select_params=(request.user.id,))
In the present state this can be applied for the Vote model:
Vote.objects.all().extra(select={'already_voted': 'caster_id = %s'}, select_params=(request.user.id,))
