I'm learning Django and to practice I'm currently developing a clone page of YTS, it's a movie torrents repository*.
As of right now, I scrapped all the movies in the website and have them on a single db table called Movie with all the basic information of each movie (I'm planning on adding one more for Genre).
Every few days YTS will post new movies and I want my clone-web to automatically add them to the database. I'm currently stuck on deciding how to do this:
I was planning on comparing the movie id of the last movie in my db against the last movie in the YTS db each time the user enters the website, but that'd mean make a request to YTS every time my page loads, it'd also mean some very slow code should be executed inside my index() views method.
Another strategy would be to query the last time my db was updated (new entries were introduced) and if it's let's say bigger than a day then request new movies to YTS. Problem with this is I don't seem to find any method to query the time of last db updates. Does it even exist such method?
I could also set a cron job to update the information but I'm having problems to make changes from a separated Python function (I import django.db and such but the interpreter refuses to execute django db instructions).
So, all in all, what's the best strategy to update my database from a third party service/website without bothering the user with loading times? How do you set such updates in non-intrusive way to the user? How do you generally do it?
* I know a torrents website borders the illegal and I'm not intended, in any way, to make my project available to the public
I think you should choose definetely the third alternative, a cron job to update the database regularly seems the best option.
You don' t need to use a seperate python function, you can schedule a task with celery, which can be easily integrated with django using django-celery
The simplest way would be to write a custom management command and run it periodically from a cron job.
Related
We have a requirement where we want to show zendesk tickets updated with the data from PostgreSQL database , We are using Python as the scripting language and planning to use this API "http://docs.facetoe.com.au/zenpy.html" for this.
The idea is to help the service team to gather and see all the information in the Zendesk itself.There are additional data in the database which we want to show it in the tickets either as comments or a table structure with the details from other tickets which is raised by this user(We are taking the email address of the user for this).
There is no application at our DWH, So mostly google reference shows the integration between zendesk and some other applications and not much references about updating the tickets from the database via Python or other scripting languages.
So is it possible to pass the data from our DWH to be appeared in the zendesk tickets?
Can anyone helps/suggest me on how to achieve/start on this.
It is possible to update tickets from anywhere using python and some codding.
Your problem can be solved in different ways.
The first one, a little simpler:
You make a simple python app and launch it with cron. App architecture will be like this:
Main process periodically track new tickets in Zendesk using search request. If relevant to database ticket is found (you need some metrics to understand is it relevant ticket) you main process makes a post via ticket.update with information from database. And make a special tag on ticket, to understand that it was already updated.
This is easy to write, but if you database data will be updated it will not be updated in ticket.
The second option is to make private app on zendesk side with backend on you side.
So in this case when your staff member opens some ticket app will request backend to display current data from database, relevant to this ticket. In this case you will see actual information everytime, but will get some database requests on every ticket open case.
To make first script you will need:
zenpy, sqlalchemy and 1-2 days codding.
To make second option you will need:
zenpy, sqlalchemy, flask, front-end interface.
I'm looking to post new records on a user triggered basis (i.e. workflow). I've spent the last couple of days reasearching the best way to approach this and so far I've come up with the following ideas:
(1) Utilize Django signals to check for conditions on a field change, and then post data originating from my Django app.
(2) Utilize JS/AJAX on the front-end to post data to the app based upon a user changing certain fields.
(3) Utilize a prebuilt workflow app like http://viewflow.io/, again based upon changes triggers by users.
Of the three above options, is there a best practice? Are there any other options I'm not considering for how to take this workflow based approach to post new records?
The second approach of monitoring the changes in the front end and then calling a backend view to update go database would be a better approach because processing on the backend or any other site would put the processing on the server which would slow down the site whereas second approach is more of a client side solution thereby keeping server relieved.
I do not think there will be a data loss, you are just trying to monitor a change, as soon as it changes your view will update the database, you can also use cookies or sessions to keep appending values as a list and update the database when site closes. Also django gives https errors you could put proper try and except conditions in that case as well. Anyways cookies would be a good approach I think
For anyone that finds this post I ended up deciding to take the Signals route. Essentially I'm utilizing Signals to track when users change a fields, and based on the field that changes I'm performing certain actions on the database.
For testing purposes this has been working well. When I reach production with this project I'll try to update this post with any challenges I run into.
Example:
#receiver(pre_save, sender=subTaskChecklist)
def do_something_if_changed(sender, instance, **kwargs):
try:
obj = sender.objects.get(pk=instance.pk) #define obj as "old" before change values
except sender.DoesNotExist:
pass
else:
previous_Value = obj.FieldToTrack
new_Value = instance.FieldToTrack #instance represents the "new" after change object
DoSomethingWithChangedField(new_Value)
Scenario:
Developing a question answer app.
Here are different users can answer the questions.
Each question may have several fields to response (2 or 3 yes/No checkboxes) and any user can update any of those any time.
Problem:
I need to keep a log (with time and user name) in a different log table every time the records got any changes.
The log table is just a look alike of the original model (e.g. ChangeLogModel) just with 2 extra fields as logDate and ChangingUser.
This will help me to check the log and find the status of the question in any specific date.
Possible Solutions:
Using signals (...Not used to with signals, lack of detailed tutorials, documentation is not detailed too)
making the backup before doing any ".save()" (... Have o idea how to do that)
Install any external app (...Trying to avoid installing any app)
Summary:
Basically What I am asking for is a log table where the 'state' of the original record/row/tuple would be saved to another table (i.e. logTable) prior to hit the "form.save()" trigger.
So, every time the record got updated so the LogTable will get a new row with a datestamp.
You could use an django package for audit and history, any of those in this overview for example.
I had success using django-simple-history.
I think that the best way is just to do it straight forward. You can save the user's answer and right after that the log, wrap it with database transaction and rollback if something goes wrong.
Btw if the logs table has the same fields like the original model you might consider using foreign key or inheritance, depends on your program logic.
I have a basic personal project website that I am looking to learn some web dev fundamentals with and database (SQL) fundamentals as well (If SQL is even the right technology to use??).
I have the basic skeleton up and running but as I am new to this, I want to make sure I am doing it in the most efficient and "correct" way possible.
Currently the site has a main index (landing) page and from there the user can select one of a few subpages. For the sake of understanding, each of these sub pages represents a different surf break and they each display relevant info about that particular break i.e. wave height, wind, tide.
As I have already been able to successfully scrape this data, my main questions revolve around how would I go about inserting this data into a database for future use (historical graphs, trends)? How would I ensure data is added to this database in a continuous manner (once/day)? How would I use data that was scraped from an earlier time, say at noon, to be displayed/used at 12:05 PM rather than scraping it again?
Any other tips, guidance, or resources you can point me to are much appreciated.
This kind of data is called time series. There are specialized database engines for time series, but with a not-extreme volume of observations - (timestamp, wave heigh, wind, tide, which break it is) tuples - a SQL database will be perfectly fine.
Try to model your data as a table in Postgres or MySQL. Start by making a table and manually inserting some fake data in a GUI client for your database. When it looks right, you have your schema. The corresponding CREATE TABLE statement is your DDL. You should be able to write SELECT queries against your table that yield the data you want to show on your webapp. If these queries are awkward, it's a sign that your schema needs revision. Save your DDL. It's (sort of) part of your source code. I imagine two tables: a listing of surf breaks, and a listing of observations. Each row in the listing of observations would reference the listing of surf breaks. If you're on a Mac, Sequel Pro is a decent tool for playing around with a MySQL database, and playing around is probably the best way to learn to use one.
Next, try to insert data to the table from a Python script. Starting with fake data is fine, but mold your Python script to read from your upstream source (the result of scraping) and insert into the table. What does your scraping code output? Is it a function you can call? A CSV you can read? That'll dictate how this script works.
It'll help if this import script is idempotent: you can run it multiple times and it won't make a mess by inserting duplicate rows. It'll also help if this is incremental: once your dataset grows large, it will be very expensive to recompute the whole thing. Try to deal with importing a specific interval at a time. A command-line tool is fine. You can specify the interval as a command-line argument, or figure out out from the current time.
The general problem here, loading data from one system into another on a regular schedule, is called ETL. You have a very simple case of it, and can use very simple tools, but if you want to read about it, that's what it's called. If instead you could get a continuous stream of observations - say, straight from the sensors - you would have a streaming ingestion problem.
You can use the Linux subsystem cron to make this script run on a schedule. You'll want to know whether it ran successfully - this opens a whole other can of worms about monitoring and alerting. There are various open-source systems that will let you emit metrics from your programs, basically a "hey, this happened" tick, see these metrics plotted on graphs, and ask to be emailed/texted/paged if something is happening too frequently or too infrequently. (These systems are, incidentally, one of the main applications of time-series databases). Don't get bogged down with this upfront, but keep it in mind. Statsd, Grafana, and Prometheus are some names to get you started Googling in this direction. You could also simply have your script send an email on success or failure, but people tend to start ignoring such emails.
You'll have written some functions to interact with your database engine. Extract these in a Python module. This forms the basis of your Data Access Layer. Reuse it in your Flask application. This will be easiest if you keep all this stuff in the same Git repository. You can use your chosen database engine's Python client directly, or you can use an abstraction layer like SQLAlchemy. This decision is controversial and people will have opinions, but just pick one. Whatever database API you pick, please learn what a SQL injection attack is and how to use user-supplied data in queries without opening yourself up to SQL injection. Your database API's documentation should cover the latter.
The / page of your Flask application will be based on a SQL query like SELECT * FROM surf_breaks. Render a link to the break-specific page for each one.
You'll have another page like /breaks/n where n identifies a surf break (an integer that increments as you insert surf break rows is customary). This page will be based on a query like SELECT * FROM observations WHERE surf_break_id = n. In each case, you'll call functions in your Data Access Layer for a list of rows, and then in a template, iterate through those rows and render some HTML. There are various Javascript and Python graphing libraries you can feed this list of rows into and get graphs out of (client side or server side). If you're interested in something like a week-over-week change, you should be able to express that in one SQL query and get that dataset directly from the database engine.
For performance, try not to get in a situation where more than one SQL query happens during a page load. By default, you'll be doing some unnecessary work by going back to the database and recomputing the page every time someone requests it. If this becomes a problem, you can add a reverse proxy cache in front of your Flask app. In your case this is easy, since nothing users do to the app cause its content to change. Simply invalidate the cache when you import new data.
I am working on a project which requires me to create a table of every user who registers on the website using the username of that user. The columns in the table are same for every user.
While researching I found this Django dynamic model fields. I am not sure how to use django-mutant to accomplish this. Also, is there any way I could do this without using any external apps?
PS : The backend that I am using is Mysql
An interesting question, which might be of wider interest.
Creating one table per user is a maintenance nightmare. You should instead define a single table to hold all users' data, and then use the database's capabilities to retrieve only those rows pertaining to the user of interest (after checking permissions if necessary, since it is not a good idea to give any user unrestricted access to another user's data without specific permissions having been set).
Adopting your proposed solution requires that you construct SQL statements containing the relevant user's table name. Successive queries to the database will mostly be different, and this will slow the work down because every SQL statement has to be “prepared” (the syntax has to be checked, the names of table and columns has to be verified, the requesting user's permission to access the named resources has to be authorized, and so on).
By using a single table (model) the same queries can be used repeatedly, with parameters used to vary specific data values (in this case the name of the user whose data is being sought). Your database work will move along faster, you will only need a single model to describe all users' data, and database management will not be a nightmare.
A further advantage is that Django (which you appear to be using) has an extensive user-based permission model, and can easily be used to authenticate user login (once you know how). These advantages are so compelling I hope you will recant from your heresy and decide you can get away with a single table (and, if you planning to use standard Django logins, a relationship with the User model that comes as a central part of any Django project).
Please feel free to ask more questions as you proceed. It seems you are new to database work, and so I have tried to present an appropriate level of detail. There are many pitfalls such as this if you cannot access knowledgable advice. People on SO will help you.
This page shows how to create a model and install table to database on the fly. So, you could use type('table_with_username', (models.Model,), attrs) to create a model and use django.core.management to install it to the database.