I have coded a Python Flask application, and I run two instances of it on two different web servers. Every time an HTTP request is made to the application, a new connection to my MySQL database is created. In my application, there is a table of accounts, and each account has an integer balance. The problem is, I discovered that if a user quickly makes multiple requests to a URL route that subtracts from an account's balance, they can end up with a negative balance because the two requests are executed at the same time.
Basically my question is, how can I make it so that when I am running a query on a row's balance column, that row will be untouchable by any other connections, of which there may be many? I would like to leave all other rows in the table editable, I would just like to kind of "lock" a single row so that users can't "double spend" their balance, etc., without locking the entire table for performance's sake.
Or perhaps I am approaching this completely wrong -- sorry if I am, I'm quite new to this.
I did ask a question on a different account when I first started coding this application regarding the proper way to do this, but it got downvoted to hell, and now I'm in this mess.
I have looked into MySQL transactions, but I don't understand them very well, and I don't think that I should be using them here, or am I wrong?
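From the little I've read, I suspect the answer involves SELECT ... FOR UPDATE inside a transaction, but I'm not sure if this is right. Here is a rough sketch of what I mean (assuming the mysql-connector-python driver and my accounts table; please correct me if this is the wrong approach):

def withdraw(conn, account_id, amount):
    # With mysql-connector-python, autocommit is off by default, so the first
    # execute opens a transaction. FOR UPDATE locks only this account's row
    # until COMMIT or ROLLBACK, leaving every other row editable.
    cur = conn.cursor()
    cur.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE",
                (account_id,))
    (balance,) = cur.fetchone()
    if balance < amount:
        conn.rollback()   # not enough funds; release the lock
        return False
    cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s",
                (amount, account_id))
    conn.commit()         # releases the row lock
    return True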
Thank you!
Related
I'm a very novice web developer and I am currently building a website from scratch. I have most of the frontend part set up, but I am really struggling with the backend and databases.
The point of the website is to display a graph with class completion status (for each class, it will display what percentage is complete/incomplete, and how many total users there are). It will retrieve this data from a CSV file on an SFTP server. The issue I am having is that when I try to directly access the data, it loads incredibly slowly.
Here is the code I am using to retrieve the data:
import pandas

# Per-course running totals; index i tracks the current course
Courses = ['']
Total = [0]
Compl = [0]
i = 0

csvreal = pandas.read_csv(file)   # `file` is the CSV pulled from the SFTP server
for index, row in csvreal.iterrows():
    string = csvreal.loc[[index]].to_string(index=False, header=False)
    # First column is the course name; start a new bucket when it changes
    if Courses[i] != string.split(' ')[0]:
        i += 1
        Courses.append(string.split(' ')[0])
        Total.append(0)
        Compl.append(0)
    # Third column (completion date) is non-empty when the user has completed it
    if len(string.split(' ')[2]) > 3:
        Compl[i] += 1
    Total[i] += 1
To explain it a little bit, the CSV file has the roster information, i.e. each row has a course name, user name, completion date, and course code. The course name is the first column, which is why you see string.split(' ')[0] in the code, as it is the first part of the string. If the user has not completed the course, the third column (completion date) is empty, which is why the code checks whether it is longer than 3 characters: if it is, the user has completed it.
This takes entirely too long to compute. About 30 seconds with around 7,000 entries. Recently the CSV size was increased to something like 36,000.
I was advised to setup a database using SQL and have a nightly cronjob to parse the data and have the website retrieve the data from the database, instead of the CSV.
Any advice on where to even begin, or how to do this would be greatly appreciated.
Before I recommend using a database, how fast is the connection to the SFTP server you are getting the data from? Would it be faster to host the file on the local machine? If this isn't the issue, see below.
Yes, in this case a database would speed up your computation and retrieval time. You need to set up a SQL database, have a way to put data into it, and then retrieve it. I have included resources at the bottom that will help you familiarize yourself with SQL. Knowledge of a server-side language, such as the Python you are already using (or PHP), will be needed in order to interact with and manipulate the database.
Using SQL will be much simpler for you. For example, you needed to check whether a cell is empty. In SQL, this can be done with:
SELECT * FROM table WHERE some_col IS NULL OR some_col = '';
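To give you an idea of what the nightly cron job could look like, here is a very rough sketch in Python (assuming the mysql-connector-python driver and a table named completions with guessed column names; the credentials are placeholders, so adapt everything to your real setup):

import pandas
import mysql.connector

# A sketch only: table and column names are guesses, credentials are placeholders
conn = mysql.connector.connect(user="...", password="...", database="training")
cur = conn.cursor()

df = pandas.read_csv("roster.csv")   # the file you download from the SFTP server
for _, row in df.iterrows():
    # columns: course name, user name, completion date, course code
    completed = bool(pandas.notna(row.iloc[2]))   # empty date means not completed
    cur.execute(
        "INSERT INTO completions (course_name, user_name, completed) "
        "VALUES (%s, %s, %s)",
        (row.iloc[0], row.iloc[1], completed),
    )
conn.commit()
conn.close()

With the data in a table like that, the page itself only needs one aggregate query, for example SELECT course_name, COUNT(*), SUM(completed) FROM completions GROUP BY course_name, instead of looping over 36,000 rows in Python on every request.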
https://www.khanacademy.org/computing/computer-programming/sql
https://www.w3schools.com/sql/
https://www.guru99.com/introduction-to-database-sql.html
I'm new to Flask and web development in general. I have a Flask web application that uses SQLAlchemy. Is it OK to put session.rollback() at the beginning of the app in order to keep it running even after a transaction fails?
I had a problem with my website: it stopped working after I attempted to delete records from one table. The error log showed that the deletion failed because entries in another table still reference these records through a foreign key. The error log suggested using session.rollback() to roll back this change, so I put it at the beginning of my app, just after binding my database and creating the session, and my website worked. That gave me the hint to leave that line there. Is my move right, safe and OK? Can anyone tell me what the correct thing to do is if this endangers the functionality or logic of my website in any way?
I'd say that you are, by definition, cargo cult coding, and you should try to determine why you're getting these errors in the first place instead of just including a bit of code for a reason you don't understand.
The problem you're describing is the result of using foreign keys to ensure data integrity in your database. Typically SQLAlchemy will nullify all of the dependent foreign keys, but since I don't know anything about your setup I can't explain why it didn't. It is perhaps a difference between databases.
One massive problem with putting the rollback at the beginning of a route (or of the entire global app) is that you might roll back data you didn't want to. You haven't provided an MCVE, so no one can really help you debug your problem.
Cargo cult coding in circumstances like this is understandable, but it's never a good practice. To solve this problem, investigate the cascades in SQLAlchemy. Also, fire up your actual SQL db interface and look at the data's structure, and set SQLALCHEMY_ECHO = 1 in your config file to see what's actually getting emitted.
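For instance, a cascade declaration might look like the following sketch (plain SQLAlchemy 1.4+ with made-up Parent/Child models; swap in your own tables):

from sqlalchemy import Column, Integer, ForeignKey
from sqlalchemy.orm import relationship, declarative_base

Base = declarative_base()

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    # Delete (or orphan) the children automatically when a Parent is deleted,
    # instead of tripping the foreign-key constraint
    children = relationship("Child", cascade="all, delete-orphan")

class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))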
Good luck!
You should not use the rollback at the beginning, but when a database operation fails.
The error is due to an integrity constraint in your database. Some rows in your table are being referenced by another table, so you have to remove the referencing rows first.
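Something along these lines, for example (a sketch; session is the SQLAlchemy session you already create, and record is whichever object you are deleting):

from sqlalchemy.exc import IntegrityError

def delete_record(session, record):
    """Roll back only when this particular operation fails, instead of
    rolling back blindly at application start-up."""
    try:
        session.delete(record)
        session.commit()
    except IntegrityError:
        session.rollback()
        # remove or re-point the rows that still reference this record,
        # then retry the delete
        raise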
I'm learning Django, and to practice I'm currently developing a clone of YTS, which is a movie torrents repository*.
As of right now, I have scraped all the movies on the website and have them in a single db table called Movie, with all the basic information about each movie (I'm planning on adding one more table for Genre).
Every few days YTS will post new movies and I want my clone-web to automatically add them to the database. I'm currently stuck on deciding how to do this:
I was planning on comparing the movie id of the last movie in my db against the last movie in the YTS db each time a user enters the website, but that would mean making a request to YTS every time my page loads, and it would also mean running some very slow code inside my index() view method.
Another strategy would be to query the last time my db was updated (i.e. when new entries were introduced) and, if it was, let's say, more than a day ago, request new movies from YTS. The problem with this is that I can't seem to find any method to query the time of the last db update. Does such a method even exist?
I could also set up a cron job to update the information, but I'm having problems making changes from a separate Python function (I import django.db and such, but the interpreter refuses to execute Django db instructions).
So, all in all, what's the best strategy for updating my database from a third-party service/website without bothering the user with loading times? How do you set up such updates in a way that's non-intrusive to the user? How do you generally do it?
* I know a torrents website borders on the illegal, and I do not intend, in any way, to make my project available to the public.
I think you should definitely choose the third alternative; a cron job to update the database regularly seems the best option.
You don't need to use a separate Python function; you can schedule a task with Celery, which can be easily integrated with Django using django-celery.
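A sketch of such a task (this assumes Celery is already wired into the project; the YTS endpoint, the JSON field names and the Movie fields are guesses you will need to check):

# movies/tasks.py
import requests
from celery import shared_task

from .models import Movie   # the Movie table described in the question

@shared_task
def fetch_new_movies():
    # hypothetical call to YTS's public listing endpoint
    resp = requests.get("https://yts.mx/api/v2/list_movies.json",
                        params={"sort_by": "date_added", "limit": 50})
    for item in resp.json()["data"]["movies"]:
        # get_or_create keeps the task safe to run as often as you like
        Movie.objects.get_or_create(title=item["title"],
                                    defaults={"year": item["year"]})

You can then have Celery beat run it on whatever schedule you like, so none of this happens inside a view.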
The simplest way would be to write a custom management command and run it periodically from a cron job.
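For example (a sketch; update_movies and scrape_latest_movies are made-up names standing in for the scraping code you already have):

# movies/management/commands/update_movies.py
from django.core.management.base import BaseCommand

from movies.models import Movie
from movies.scraper import scrape_latest_movies   # your existing scraping code

class Command(BaseCommand):
    help = "Fetch newly released movies from YTS and add them to the database"

    def handle(self, *args, **options):
        added = 0
        for data in scrape_latest_movies():
            _, created = Movie.objects.get_or_create(
                title=data["title"], defaults={"year": data.get("year")})
            added += int(created)
        self.stdout.write(self.style.SUCCESS("Added %d new movies" % added))

A crontab entry such as 0 3 * * * /path/to/venv/bin/python /path/to/project/manage.py update_movies then runs it nightly, and your index() view never has to talk to YTS.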
I have a basic personal project website that I am using to learn some web dev fundamentals and database (SQL) fundamentals as well (if SQL is even the right technology to use?).
I have the basic skeleton up and running but as I am new to this, I want to make sure I am doing it in the most efficient and "correct" way possible.
Currently the site has a main index (landing) page and from there the user can select one of a few subpages. For the sake of understanding, each of these sub pages represents a different surf break and they each display relevant info about that particular break i.e. wave height, wind, tide.
As I have already been able to successfully scrape this data, my main questions revolve around how would I go about inserting this data into a database for future use (historical graphs, trends)? How would I ensure data is added to this database in a continuous manner (once/day)? How would I use data that was scraped from an earlier time, say at noon, to be displayed/used at 12:05 PM rather than scraping it again?
Any other tips, guidance, or resources you can point me to are much appreciated.
This kind of data is called time-series data. There are specialized database engines for time series, but with a not-extreme volume of observations - (timestamp, wave height, wind, tide, which break it is) tuples - a SQL database will be perfectly fine.
Try to model your data as a table in Postgres or MySQL. Start by making a table and manually inserting some fake data in a GUI client for your database. When it looks right, you have your schema. The corresponding CREATE TABLE statement is your DDL. You should be able to write SELECT queries against your table that yield the data you want to show on your webapp. If these queries are awkward, it's a sign that your schema needs revision. Save your DDL. It's (sort of) part of your source code.

I imagine two tables: a listing of surf breaks, and a listing of observations. Each row in the listing of observations would reference the listing of surf breaks. If you're on a Mac, Sequel Pro is a decent tool for playing around with a MySQL database, and playing around is probably the best way to learn to use one.
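For instance, a first pass at that two-table schema could look something like this sketch (column names are guesses based on the wave height/wind/tide data you mentioned, created here from Python with the mysql-connector-python driver; credentials are placeholders):

import mysql.connector

conn = mysql.connector.connect(user="...", password="...", database="surf")
cur = conn.cursor()

# One row per surf break
cur.execute("""
    CREATE TABLE IF NOT EXISTS surf_breaks (
        id   INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(100) NOT NULL
    )""")

# One row per observation, referencing the break it belongs to
cur.execute("""
    CREATE TABLE IF NOT EXISTS observations (
        id            INT AUTO_INCREMENT PRIMARY KEY,
        surf_break_id INT NOT NULL,
        observed_at   DATETIME NOT NULL,
        wave_height_m DECIMAL(4,2),
        wind_kts      DECIMAL(5,2),
        tide_m        DECIMAL(4,2),
        UNIQUE (surf_break_id, observed_at),
        FOREIGN KEY (surf_break_id) REFERENCES surf_breaks (id)
    )""")
conn.commit()
conn.close()

The UNIQUE constraint on (surf_break_id, observed_at) will also make the import step described below easier to keep idempotent.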
Next, try to insert data into the table from a Python script. Starting with fake data is fine, but mold your Python script to read from your upstream source (the result of scraping) and insert into the table. What does your scraping code output? Is it a function you can call? A CSV you can read? That'll dictate how this script works.
It'll help if this import script is idempotent: you can run it multiple times and it won't make a mess by inserting duplicate rows. It'll also help if it is incremental: once your dataset grows large, it will be very expensive to recompute the whole thing. Try to deal with importing a specific interval at a time. A command-line tool is fine. You can specify the interval as a command-line argument, or figure it out from the current time.
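A sketch of such an import step, under the assumptions above (mysql-connector-python and the guessed observations table; the list of tuples is whatever your scraping code produces):

def import_observations(conn, rows):
    """Insert scraped (surf_break_id, observed_at, wave_height_m, wind_kts, tide_m)
    tuples; INSERT IGNORE plus the UNIQUE constraint makes re-runs harmless."""
    cur = conn.cursor()
    cur.executemany(
        "INSERT IGNORE INTO observations "
        "(surf_break_id, observed_at, wave_height_m, wind_kts, tide_m) "
        "VALUES (%s, %s, %s, %s, %s)",
        rows,
    )
    conn.commit()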
The general problem here, loading data from one system into another on a regular schedule, is called ETL. You have a very simple case of it, and can use very simple tools, but if you want to read about it, that's what it's called. If instead you could get a continuous stream of observations - say, straight from the sensors - you would have a streaming ingestion problem.
You can use cron on Linux to make this script run on a schedule. You'll want to know whether it ran successfully - this opens a whole other can of worms about monitoring and alerting. There are various open-source systems that will let you emit metrics from your programs, basically a "hey, this happened" tick, see these metrics plotted on graphs, and ask to be emailed/texted/paged if something is happening too frequently or too infrequently. (These systems are, incidentally, one of the main applications of time-series databases.) Don't get bogged down with this upfront, but keep it in mind. Statsd, Grafana, and Prometheus are some names to get you started Googling in this direction. You could also simply have your script send an email on success or failure, but people tend to start ignoring such emails.
You'll have written some functions to interact with your database engine. Extract these into a Python module. This forms the basis of your Data Access Layer. Reuse it in your Flask application. This will be easiest if you keep all this stuff in the same Git repository. You can use your chosen database engine's Python client directly, or you can use an abstraction layer like SQLAlchemy. This decision is controversial and people will have opinions, but just pick one. Whatever database API you pick, please learn what a SQL injection attack is and how to use user-supplied data in queries without opening yourself up to SQL injection. Your database API's documentation should cover the latter.
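A data-access function in that spirit might look like this sketch (mysql-connector-python again; the %s placeholder is what keeps user-supplied values out of SQL-injection territory):

def observations_for_break(conn, break_id):
    """Return all observations for one surf break, newest first."""
    cur = conn.cursor(dictionary=True)
    # Let the driver bind break_id rather than formatting it into the string,
    # so a malicious value cannot change the structure of the query.
    cur.execute(
        "SELECT observed_at, wave_height_m, wind_kts, tide_m "
        "FROM observations WHERE surf_break_id = %s "
        "ORDER BY observed_at DESC",
        (break_id,),
    )
    return cur.fetchall()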
The / page of your Flask application will be based on a SQL query like SELECT * FROM surf_breaks. Render a link to the break-specific page for each one.
You'll have another page like /breaks/n where n identifies a surf break (an integer that increments as you insert surf break rows is customary). This page will be based on a query like SELECT * FROM observations WHERE surf_break_id = n. In each case, you'll call functions in your Data Access Layer for a list of rows, and then in a template, iterate through those rows and render some HTML. There are various Javascript and Python graphing libraries you can feed this list of rows into and get graphs out of (client side or server side). If you're interested in something like a week-over-week change, you should be able to express that in one SQL query and get that dataset directly from the database engine.
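Put together, the two pages might look roughly like this (a sketch assuming a hypothetical dal module containing the data-access functions described above, plus templates you would still write):

from flask import Flask, render_template

from dal import get_conn, list_surf_breaks, observations_for_break  # hypothetical module

app = Flask(__name__)

@app.route("/")
def index():
    # SELECT * FROM surf_breaks, rendered as a list of links
    breaks = list_surf_breaks(get_conn())
    return render_template("index.html", breaks=breaks)

@app.route("/breaks/<int:break_id>")
def break_detail(break_id):
    # SELECT * FROM observations WHERE surf_break_id = ...
    rows = observations_for_break(get_conn(), break_id)
    return render_template("break.html", observations=rows)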
For performance, try not to get in a situation where more than one SQL query happens during a page load. By default, you'll be doing some unnecessary work by going back to the database and recomputing the page every time someone requests it. If this becomes a problem, you can add a reverse proxy cache in front of your Flask app. In your case this is easy, since nothing users do to the app causes its content to change. Simply invalidate the cache when you import new data.
Some devices are asynchronously storing values on a common remote MySQL database server.
I would like to write a supervisor app in Python (and possibly SQLAlchemy) to recognize external INSERT events on the database and act upon the newly inserted rows' data. This is to avoid a long manual check to see whether every table is being updated regularly or whether a logger has crashed.
Can somebody just tell me where to search online for this kind of info and, even better, for an example?
EDIT
I already read all tables periodically, using a datetime primary key (date_time): I load the last row of each table and compare it to the previous value:
SELECT * FROM table ORDER BY date_time DESC LIMIT 1
but it looks very cumbersome and doesn't guarantee that I don't lose some rows between successive database checks.
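Simplified, my current check is something like this (table names come from my own logger configuration, not from users):

import time

def poll_forever(conn, tables, interval=60):
    last_seen = {}
    cur = conn.cursor()
    while True:
        for table in tables:
            cur.execute(
                "SELECT * FROM `%s` ORDER BY date_time DESC LIMIT 1" % table)
            row = cur.fetchone()
            if row is not None and row[0] != last_seen.get(table):
                last_seen[table] = row[0]   # date_time is the primary key
                # act on the new row here; anything inserted between two
                # polls other than the newest row is still missed
        time.sleep(interval)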
The engine is an old version of InnoDB that I cannot upgrade: I cannot use the UPDATE_TIME field in the information schema because it simply doesn't work.
To reword my question:
How can I listen for any database event with a daemon-like Python application (a sleeping thread) that wakes up only when something happens?
I also want to avoid SQL triggers, because they would be just too heavy to manage: there are hundreds of tables, and they are added/removed very often depending on the active loggers.
I had a look at SQLAlchemy, but every reference I could find, if I haven't misunderstood it, is about decorators that act on INSERTs made by SQLAlchemy itself. I didn't find anything about external changes to the database.
About the example request: I am not interested in copy-and-paste, because first I want to understand how stuff works. I prefer (even incomplete) examples, because the SQLAlchemy documentation is far too deep for my level of knowledge and I simply cannot put the pieces together.