I'm working on a project that stores all of its data as Python files (code). Authenticated users can upload data through a web page, which directly adds and changes code on the production server. This causes trouble every time I want to git pull changes from the repo to the production server, since the code added by users directly on production is untracked.
I wonder if anyone has other ideas. I know this is ill-designed, but it's what I inherited, and implementing a database would require a lot of effort since all the current code is built around Python files. I suspect the original authors didn't know much about databases, and it only works because there is relatively little data.
The only two solutions I can think of are:
use a database instead of storing all the data as code
git add/commit/push all changes on the production server every time a user uploads data
Details added:
There is a 'data' folder of Python files, each file storing the information of one book. For instance, latin_stories.py has a dictionary variable text, an integer variable section, a string variable language, etc. When users upload a CSV file in a certain format, a program automatically adds and changes the Python files in the data folder, directly on the production server.
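To make the setup concrete, here is a minimal sketch of how such a data module might be generated from an uploaded CSV. The variable names text, section, and language come from the question; the CSV layout and the render_book_module helper are assumptions for illustration:

```python
import csv
import io

def render_book_module(text, section, language):
    """Render Python source for a data module in the style described above.
    (Variable names come from the question; everything else is illustrative.)"""
    lines = [
        "text = {!r}".format(text),
        "section = {!r}".format(section),
        "language = {!r}".format(language),
    ]
    return "\n".join(lines) + "\n"

# Hypothetical upload: one CSV row per section of the book.
csv_data = "section,latin\n1,Arma virumque cano\n2,Italiam fato profugus\n"
rows = list(csv.DictReader(io.StringIO(csv_data)))
text = {int(r["section"]): r["latin"] for r in rows}
module_source = render_book_module(text, section=len(text), language="latin")
# Writing `module_source` to data/latin_stories.py reproduces the described setup.
```

This is exactly the pattern that makes the git workflow painful: every upload mutates tracked-adjacent source files on production.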
Related
I'm finally at the stage of deploying my Django web app on Heroku. The web app retrieves and performs financial analysis on a subset of public companies. Most of the content of the web app is stored in .csv and .xlsx files. There's a master .xlsx file that I need to open and manually update on a daily basis. Then I run scripts which, based on info in that .xlsx file, retrieve financial data, news, etc., and store the data in .csv files. Then my Django views and HTML templates refer to those files (and not to a SQL/Postgres database). This is my setup in a nutshell.
Now I'm wondering what is the best way to make this run smoothly in production.
Shall I store these .xlsx and .csv files on AWS S3 and have Django access and update them from there? I assume that this way I can easily open and edit the master .xlsx file anytime. Is that a good idea, and do I run into any performance or security issues?
Is it better to convert all this data into a Postgres database on Heroku? Is there a good guideline for how I can technically do that? In that scenario, wouldn't it be much more challenging for me to edit the data in the master .xlsx file?
Are there any better ways you would suggest to handle this?
I'd highly appreciate any advice on this matter.
You need to make a trade-off between ease of use (accessing/updating the source XLSX file) and maintainability (storing the data safely and efficiently).
Option #1 is more convenient if you need to quickly open and change the file in your Excel/Numbers application. On the other hand, your application needs to access physical files to perform its logic and render the views.
BTW, some time ago I created a repository, Heroku Files, to present some options for working with external files.
Option #2 is typically better from a design point of view: the data is organised in the database and can be queried and manipulated more efficiently. The challenge in this case is that you need a way to view/edit the data, and this normally requires more development (creating new screens, etc.).
Involving a database is normally the preferred approach, as you can scale up to large datasets without problems (which is not the case with files).
On the other hand, if the XLSX file stays small and you only need simple (quick) updates, your current architecture can work.
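To make option #2 concrete, here is a minimal sketch of loading rows from the master spreadsheet into a relational table. It uses the stdlib sqlite3 and csv modules purely for illustration; on Heroku you would point the same logic at Postgres (via psycopg2 or the Django ORM) and read the .xlsx with a library such as openpyxl. The table and column names are made up:

```python
import csv
import io
import sqlite3

# A hypothetical slice of the master file, exported as CSV for illustration.
master_csv = "ticker,name\nAAPL,Apple Inc.\nMSFT,Microsoft Corp.\n"

conn = sqlite3.connect(":memory:")  # stand-in for the Heroku Postgres database
conn.execute("CREATE TABLE company (ticker TEXT PRIMARY KEY, name TEXT)")

rows = [(r["ticker"], r["name"]) for r in csv.DictReader(io.StringIO(master_csv))]
# Upsert so re-running the daily import updates rows instead of duplicating them.
conn.executemany(
    "INSERT INTO company (ticker, name) VALUES (?, ?) "
    "ON CONFLICT(ticker) DO UPDATE SET name = excluded.name",
    rows,
)
conn.commit()

names = dict(conn.execute("SELECT ticker, name FROM company"))
```

The upsert is the key design point: the daily manual-update workflow becomes "re-run the import", and views query the table instead of opening files.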
I have made a Python Flask app that I can use to manipulate a scoreboard. The app is hosted on heroku.com. The scoreboard is stored in a JSON file. At first I just kept the JSON file in the Git repo that Heroku makes for you, but then I found out that every couple of hours Heroku resets the filesystem to your last commit. So any changes I had made to scoreboard.json would be lost.
So I came to the conclusion that I needed to use an actual database hosting site to host my scoreboard.json. I have chosen mLab for this.
What command sends a complete copy of a file in mLab back to the server so I can make changes to it, and what command replaces the old file with the new one?
You're looking for a Python MongoDB driver. According to https://docs.mongodb.com/ecosystem/drivers/python/:
PyMongo is the recommended way to work with MongoDB from Python.
Check out the tutorial on using PyMongo, specifically inserting and getting documents.
That being said, you may want to consider splitting up the scoreboard data into smaller units. For example, having one document per player/team might be easier to manage.
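As a sketch of that idea, the code below splits a scoreboard dict into one document per player. The PyMongo calls shown in the comments (insert_many, update_one, find_one) are the real driver API, but the collection name, field names, and MLAB_URI are made up for illustration:

```python
# Hypothetical scoreboard as it might look in scoreboard.json.
scoreboard = {"alice": 12, "bob": 7, "carol": 19}

def to_player_docs(scoreboard):
    """Split the single scoreboard blob into one document per player,
    which is easier to update atomically in MongoDB."""
    return [{"player": name, "score": score} for name, score in scoreboard.items()]

docs = to_player_docs(scoreboard)

# With PyMongo the documents would be stored and updated roughly like so:
#   from pymongo import MongoClient
#   coll = MongoClient(MLAB_URI).get_default_database()["scores"]
#   coll.insert_many(docs)                                      # initial load
#   coll.update_one({"player": "bob"}, {"$inc": {"score": 3}})  # bump one score
#   coll.find_one({"player": "bob"})                            # read it back
```

The per-player layout means a score change touches one small document instead of rewriting the whole JSON blob.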
Good luck!
I have a server which generates some data on a daily basis. I am using D3 to visualise the data (d3.csv("path")).
The problem is I can only access the files if they are under my static_dir in the project.
However, if I put them there, they do eventually get cached and I stop seeing the updates, which is fine for css and js files but not for the underlying data.
Is there a way to put these files maybe in a different folder and prevent caching on them? Under what path will I be able to access them?
Or, alternatively, how would it be advisable to structure my project differently in order to avoid this problem in the first place? At the moment, I have a separate process, independent from the server, that generates the data and stores it in the given folder.
Many thanks,
Tony
When accessing the files, you can always add ?t=RANDOM to the request URL in order to get "new" data every time.
Because each URL is unique, the browser (and any intermediate cache) treats it as a new resource and won't serve a stale copy, while the server simply ignores the extra query parameter.
To get a fresh value on every request you can use Date.now():
url = "myfile.csv?t="+Date.now()
My production database currently contains 4k MyModels (loaded from the dev database last year). I started working on this project again, and now have 270k MyModels (including the original 4k). I want to export this new data dump to my production database. What will happen to the 4k MyModels already there (doing a simple dumpdata/loaddata)? How will the records be overwritten?
After you dump your data into a file (for example with mysqldump -u root -p your_database_name > DumpDevDatabase.sql), cd into the folder containing the dump file and run
mysql -u root -p your_database_name < DumpDevDatabase.sql
NOTE:
Bear in mind that you will be recreating the production database every time you load your dump into it, and that is a bad thing.
You should not do this; it should work the other way around. The production database needs to be isolated from these operations: you should dump data from production into your development database, so you can work with real data there.
In that case, when you dump data from production to development, you again need to create a new database to load the data into.
You can use tools like MySQL Workbench, or pgAdmin if you use PostgreSQL; they make it easier to work with your database.
I'm still not sure why you want to do this, but I strongly advise you not to overwrite your production database.
I am trying to serve up some user-uploaded files with Flask, and have an odd problem, or at least one that I couldn't turn up any solutions for by searching. I need the files to retain their original filenames after being uploaded, so they will have the same name when the user downloads them.

Originally I did not want to deal with databases at all, and solved the problem of filename conflicts by storing each file in a randomly named folder and just pointing to that location for the download. However, stuff came up later that required me to use a database to store some info about the files, but I still kept my old method of handling filename conflicts. I have a model for my files now, and storing the name would be as simple as adding another field, so that shouldn't be a big problem.

I decided, pretty foolishly after I had written the implementation, on using Amazon S3 to store the files. Apparently S3 does not deal with folders the way a traditional filesystem does, and I do not want to deal with the surely convoluted task of figuring out how to create folders programmatically on S3. In retrospect, this was a stupid way of dealing with the problem in the first place, when stuff like SQLAlchemy exists that makes databases easy as pie.

Anyway, I need a way to store multiple files with the same name on S3, without using folders. I thought of renaming the files with a random UUID after they are uploaded, and then, when they are downloaded (the user visits a page and presses a download button, so I need not have the filename in the URL), telling the browser to save the file under its original name retrieved from the database. Is there a way to implement this in Python with Flask? When it is deployed I am planning on having the web server handle the serving of files; will it be possible to do something like this with the server? Or is there a smarter solution?
I'm stupid. Right in the Flask API docs it says you can pass the parameter attachment_filename to send_from_directory if it differs from the filename on the filesystem.