I currently have a function that writes one to four entries into a database every 12 hours. When certain conditions are met, the function is called again to write another one to four entries based on the previous ones. Since time isn't the only factor, I have to check whether the conditions are met, and because the entries all live in the same table I have to differentiate them by the time they were posted (there is a DateTimeField in the code).
How could I achieve this? Is there a function built in in django that I just couldn't find? Or would I have to take a look at a rather complicated solution.
As a sketch, I'd expect something like this:
latest = []
allData = myManyToManyField.objects.filter(externalId=2)  # filter, not get, to get an iterable
for data in allData:
    if data.Timestamp.checkIfLatest():  # checkIfLatest returns True/False
        latest.append(data)
Or even better, something like this (although I don't think that's implemented):
latest = myManyToManyField.objects.get.latest.filter(externalId=2)
The Django documentation is very good, especially with regard to querysets and model-layer functions; it's usually the first place you should look. It sounds like you want .latest(), but it's hard to tell with your requirements regarding conditions.
latest_entry = m2mfield.objects.latest('mydatefield')
if latest_entry.somefield:
    # do something
Or perhaps you wanted:
latest_entry = m2mfield.objects.filter(somefield=True).latest('mydatefield')
You might also be interested in order_by(), which orders the rows by a field you specify; you could then iterate over the results until you find one that matches a condition, as in the sketch below.
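For example, a minimal sketch (the field and condition names follow the examples above and are assumptions, not a real API):
# Walk the rows newest-first and stop at the first one that
# satisfies the condition; somefield stands in for the real check.
for entry in m2mfield.objects.order_by('-mydatefield'):
    if entry.somefield:
        # do something with the newest matching entry
        break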
But without more information on what these conditions are, it's hard to be more specific.
It's just a thought, but you could keep an epoch-time field (the current time of the entry) in the database as a primary key, then compare it against the previous entries to differentiate them.
I have JSON with data about some products, which I have already converted into a flat table with pandas, so now I have a few columns of data. I manually selected some products and put them into one group. I sorted them by name, for example, but it's more complicated than that: there are also some features and requirements that need to be checked.
What I want is to create a script that will group my products the same way as those few groups I created manually, based on my own judgment.
I'm totally new to machine learning, but I have read about this and watched some tutorials, and I haven't seen this type of case.
I saw that if I use a KNN classifier, for example, I have to supply every group that exists as input, and it will then assign a single product to one of those groups. My case must be more complicated, I guess, since I want the script to create those groups on its own, in a way similar to my manual selection.
I was thinking about unsupervised machine learning, but that doesn't look like the solution because I have my own data that I want to provide; it seems like I need some kind of hybrid with supervised machine learning.
import pandas as pd
from pandas import json_normalize
from sklearn import preprocessing

# Flatten the JSON into a table and label-encode the product names.
data = pd.read_json('recent.json')['results']
data = json_normalize(data)
le = preprocessing.LabelEncoder()
product_name = le.fit_transform(data['name'])
Just some code to show what I have done so far.
I don't know if that makes sense what I want, I already made attempt to this problem in normal way without machine learning just by If and loop things, but I wish I could do that also in "smarter" way
The code above shows nothing about the grouping itself. If each product entry contains feature fields, you can cluster the data with an unsupervised method such as hierarchical clustering.
I have to put in input every group that exists
No: you just define a metric between two entries, and the method builds the classes (or an entire dendrogram) from that, so you can pick classes out of the dendrogram as you want. If you look at each node there, it contains the common features of the items in its class, so it gives an automatic description of each class.
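As a concrete illustration, here is a minimal sketch with scipy, building on the data frame from the question's snippet; the feature columns are assumptions you would replace with your own encoded fields:
from scipy.cluster.hierarchy import linkage, fcluster

# Assumed numeric feature columns from the flattened product table.
features = data[['feature_a', 'feature_b']].to_numpy()

# Build the dendrogram (Ward linkage on Euclidean distances).
tree = linkage(features, method='ward')

# Cut the dendrogram into, say, five groups.
data['group'] = fcluster(tree, t=5, criterion='maxclust')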
I have a Download model. I would like to store how often each instance is being downloaded.
I could just store a count, but this seems brutally simplistic: old files will seem more popular. I could average against the upload date, but then newer files almost always win. I don't necessarily want to rank things, but I'd like to show meaningful data to the people managing the files, something like:
Last 24 hours
Last 7 days
Last 30 days
Last 365 days
Total
Is there a field, method, or some sort of generic add-on that I can add to my model to store rate so that I have counts for something like that?
Edit: It occurs to me that doing "last x" means somehow storing every download event with a datetime until it falls out of the largest window (after which it only counts toward the total). I'm open to more efficient, slightly compromised solutions that involve less churn.
You have not stated how many records/downloads you will be handling, but a simple way of doing this is to create a DownloadCount model. Maybe something like:
from django.db import models

class DownloadCount(models.Model):
    date = models.DateField(auto_now_add=True)
    # on_delete is required on ForeignKey in modern Django
    item_downloaded = models.ForeignKey(Downloads, on_delete=models.CASCADE)
You will then be able to produce metrics based on the date field. Something like:
DownloadCount.objects.filter(
    date__range=(start_date, end_date),
    item_downloaded=download,  # a Download instance
).count()
Obviously it would be highly advantageous to cache such results.
If you wanted to make the results a little more accurate, you could also add a GenericIPAddressField to the model, making DownloadCount records unique per IP address and date.
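For the specific windows listed in the question, a minimal sketch on top of this model; download stands for a Download instance and the window lengths are taken from the question:
from datetime import timedelta
from django.utils import timezone

def window_count(download, days):
    # Count this item's downloads inside the trailing window.
    since = timezone.now().date() - timedelta(days=days)
    return DownloadCount.objects.filter(
        item_downloaded=download,
        date__gte=since,
    ).count()

stats = {
    'Last 24 hours': window_count(download, 1),
    'Last 7 days': window_count(download, 7),
    'Last 30 days': window_count(download, 30),
    'Last 365 days': window_count(download, 365),
    'Total': DownloadCount.objects.filter(item_downloaded=download).count(),
}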
I'm not sure if this has been answered before, I didn't get anything on a quick search.
My table is built in a random order, but thereafter it is modified very rarely. I do frequent selects from the table and in each select I need to order the query by the same column. Now is there a way to sort a table permanently by a column so that it does not need to be done again for each select?
You can add an index on the column you want to sort by. The engine can then read the rows in index order instead of sorting them for every select, as in the sketch below.
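A minimal sketch in SQLAlchemy terms (model and column names are borrowed from the answer below; index=True makes create_all() emit a CREATE INDEX on that column):
from sqlalchemy import Column, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Employee(Base):
    __tablename__ = 'employee'
    id = Column(Integer, primary_key=True)
    rank_or_whatever = Column(Integer, index=True)  # indexed sort column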
You can define the ordering in one place and re-use it for every query:
def base_query(session, what_for):
    return session.query(what_for).order_by(what_for.rank_or_whatever)
Expand that as needed, then for all but very complex queries you can use that like so:
some_query = base_query(session(), Employee).filter(Employee.feet > 3)
The resulting query will be ordered by Employee.rank_or_whatever. If you are always querying for the same model, you won't have to pass it as an argument, of course.
EDIT: A "permanent" order on your table, observed by the engine without an ORDER BY, would be an implementation feature specific to your RDBMS, and merely a convenience. Internally it makes no sense to coerce a DBMS into storing data a particular way, since retrieving data in a specific order is accomplished easily and efficiently with an INDEX; forcing a specific physical order would probably decrease overall performance.
I'm using SQLAlchemy (being relatively new both to it and SQL) and I want to get a list of all comments posted to a set of things, but I'm only interested in comments that have been posted since a certain date, and the date is different for each thing:
To clarify, here's what I'm doing now: I begin with a dictionary that maps the ID code of each thing I'm interested in to the date I'm interested in for that thing. I do a quick list comprehension to get a list of just the codes (thingCodes) and then do this query:
things = meta.Session.query(Thing) \
    .filter(Thing.objType.in_(['fooType', 'barType'])) \
    .filter(Thing.data.any(and_(Data.key == 'thingCode',
                                Data.value.in_(thingCodes)))) \
    .all()
which returns a list of the thing objects (I do need those in addition to the comments). I then iterate through this list, and for each thing do a separate query:
comms = meta.Session.query(Thing) \
    .filter_by(objType='comment') \
    .filter(Thing.data.any(wc('thingCode', code))) \
    .filter(Thing.date >= date) \
    .order_by(Thing.date.desc()).all()
This works, but it seems horribly inefficient to be doing all these queries separately. So I have two questions:
a) Rather than running the second query n times for an n-length list of things, is there a way I could do it in a single query while still returning a separate set of results for each ID (presumably as a dictionary mapping IDs to lists)? I suppose I could do a single .in_(listOfIds) query to get one list of all the comments I want and then iterate through it and build the dictionary manually (sketched after these questions), but I have a feeling there's a way to use JOINs for this.
b) Am I over-optimizing here? Would I be better off with the second approach I just mentioned? And is it even that important that I roll them all into a single transaction? The bulk of my experience is with Neo4j, which is pretty good at transparently batching many small transactions into larger ones - does SQL/SQLAlchemy have similar functionality, or is it definitely in my interest to minimize the number of queries?
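Here is a sketch of what I mean in (a), reusing the names from above; get_thing_code and date_by_code are hypothetical stand-ins for however the thing code is read off a comment and for my code-to-date dictionary:
from collections import defaultdict

# One query for all matching comments, newest first.
all_comms = meta.Session.query(Thing) \
    .filter_by(objType='comment') \
    .filter(Thing.data.any(and_(Data.key == 'thingCode',
                                Data.value.in_(thingCodes)))) \
    .order_by(Thing.date.desc()) \
    .all()

# Group in Python, dropping comments older than each thing's cutoff.
comms_by_code = defaultdict(list)
for comm in all_comms:
    code = get_thing_code(comm)           # hypothetical accessor
    if comm.date >= date_by_code[code]:   # hypothetical code->date dict
        comms_by_code[code].append(comm)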
I have a very simple problem: I need to create a model that represents an element of an ordered list. The model can be implemented like this:
class Item(models.Model):
    data = models.TextField()
    order = models.IntegerField()
or like this:
class Item(models.Model):
    data = models.TextField()
    # null=True lets the last element terminate the chain;
    # on_delete is required in modern Django
    next = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)
Which way is preferred? What drawbacks does each solution have?
Essentially, the second solution you propose is a linked list, and linked lists implemented at the database level are usually not a good idea: to retrieve a list of n elements, you need n database accesses (or complicated queries). Retrieving a list in O(n) queries is awfully inefficient.
In regular code, linked lists are used to get better insert performance than arrays (no need to move all the elements around). In your database, shifting all the elements is not that complicated and takes only two queries:
UPDATE item SET "order" = "order" + 1 WHERE "order" >= 3;
INSERT INTO item ("order", ...) VALUES (3, ...);
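In Django terms, the same shift-and-insert is a short sketch with F() expressions, using the Item model from solution #1:
from django.db.models import F

# Make room at position 3, then insert the new row there.
Item.objects.filter(order__gte=3).update(order=F('order') + 1)
Item.objects.create(data='new element', order=3)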
I remember seeing a reusable app that implemented all of this plus a nice admin interface, but I can't find it right now...
To summarize: definitely use solution #1, and stay away from solution #2 unless you have a very good reason not to!
That depends on what you want to do.
The first one makes it easy to fetch all the data in the correct order with a single query.
The second one makes it easier to insert an element between two existing elements (with the first, you'd have to renumber a lot of items if the order values are sequential).
I'd use the first one, because it fits a database table better, which is how Django stores model data under the hood.
There is another solution.
class Item(models.Model):
    data = models.TextField()
You can just pickle or marshal a Python list into the data field and then load it back up. This is good for updating and reading, but not for searching, e.g. fetching all lists that contain a specific item.
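A minimal sketch of that approach, using json instead of pickle so the stored text stays readable:
import json

# Store the whole ordered list in a single row.
item = Item.objects.create(data=json.dumps(['first', 'second', 'third']))

# Load it back, modify, and save again.
ordered = json.loads(item.data)
ordered.insert(1, 'new element')
item.data = json.dumps(ordered)
item.save()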