Django Server - How to prevent caching on csv files?

Django Server - How to prevent caching on csv files? - python

I have a server which generates some data on a daily basis. I am using D3 to visualise the data (d3.csv("path")).
The problem is I can only access the files if they are under my static_dir in the project.
However, if I put them there, they do eventually get cached and I stop seeing the updates, which is fine for css and js files but not for the underlying data.
Is there a way to put these files maybe in a different folder and prevent caching on them? Under what path will I be able to access them?
Or alternatively how would it be advisable to structure my project differently in order to maybe avoid this operation in the first place. Atm, I have a seperate process that generates the data and stores it the given folder which is independent from the server.
Many thanks,
Tony

When accessing the files you can always add ?t=RANDOM to the request in order to get a "new" data all the time.
Because the request (on the server-side) is "new" - there will be no cache, and from the client side it doesn't really matter.
To get a new random you can use Date.now():
url = "myfile.csv?t="+Date.now()

Related

Users change codes directly on the production server

I'm working on a project that has all its data written as python files(codes). Users with authentication could upload data through a web page, and this will directly add and change codes on the production server. This is causing trouble every time I want to git pull changes from the git repo to the production server since the codes added by users directly on production are untracked.
I wonder if anyone has some other ideas. I know this is ill-designed but this is what I got from the beginning, and implementing a database will require a lot of effort since all the current codes are designed for python files.I guess it's because the people who wrote this didn't know much about databases and it works only because there is relatively little data.
The only two solutions I could think of is
use a database instead of having all datas being codes
git add/commit/push all changes on the production server every time the user upload data
Details added:
There is a 'data' folder of python files, with each file storing the information of a book. For instance, latin_stories.py has a dictionary variable text, an integer variable section, a string variable language, etc. When users upload a csv file according to a certain format, a program will automatically add and change the python files in the data folder, directly on the production server.

Best practice in updating Django web app content when deployed on Heroku

I'm finally at the stage of deploying my Django web-app on Heroku. The web-app retrieves and performs financial analysis on a subset of public companies. Most of the content of the web-app is stored in .csv and .xslx files. There's a master .xslx file that I need to open and manually update on a daily basis. Then I run scripts which based on info in that .xsls file retrieve financial data, news, etc and store the data in .csv files. Then my Django views and html templates are referring to those files (and not to a sql/postgress database). This is my setup in a nutshell.
Now I'm wondering what is the best way to make this run smoothly in production.
Shall I store these .xslx and .csv files on AWS S3 and have Django access them and update them from there? I assume that this way I can easily open and edit the master .xslx file anytime. Is that a good idea and do I run any performance or security issues?
Is it better to convert all these data into a postgress database on Heroku? Is there a good guideline in terms of how I can technically do that? In that scenario wouldn't it be much more challenging for me to edit the data in the master .xsls file?
Are there any better ways you would suggest to handle this?
I'd highly appreciate any advice on this matter.

You need to perform a trade-off between easy of use (access/update the source XSLX file) and maintainability (storing safely and efficiently the data).
Option #1 is more convenient if you need to quickly open and change the file using your Excel/Numbers application. On the other hand your application needs to access physical files to perform the logic and render the views.
BTW some time ago I have created a repository Heroku Files to present some options in terms of using external files.
Option #2 is typically better from a design point of view: the data is organised in the database and can be more efficiently queried and manipulated. The challenge in this case is that you need a way to view/edit the data, and this normally requires more development (creating new screens, etc..)
Involving a database is normally the preferred approach as you can scale up to large dataset without problems (which is not the case with files).
On the other hand if the XLS file stays small and you only need simple (quick) updates your current architecture can work.

Displaying arbitrary nested filesystem directories/file contents on site using Django

I have a directory structure on the filesystem on my server. It can contain an arbitrary level of nested subdirectories, themselves which eventually end up containing “reports”/“queries” which are basically just xml files containing data. I'd like to display this directory structure as a nested collapsible menu list view on my site with the "reports" being the leaves in the menu list view and displaying the data from those "reports" in another area of the page once clicked on. In each of the menu headings I’d like to display the directory name (actually, I have already configured a file that maps the directory names to a more user friendly display name).
The structure of the data on disk looks like this (where the categories and subcategories are directories on the filesystem and the queries represent the files whose information I want to display):
My overall question is:
What’s the best way to read the filesystem directory structure into a python data structure so that I can eventually display it on a website using collapsible nested menus to represent the directory structure.
Since the data is being collected from the filesystem to be displayed on a page request, I’m not sure how to use django templates (if I even should) to generate these collapsible menus since the html being generated for the list structure is read on a page request and can and often does change on the filesystem. I'm not sure how to get a static Django template to work with something like that.
What is the best front-end framework out there that integrates well with Django, and preferably supports nested menus or blinds to an arbitrary depth that would allow me to display these menus and eventually the output of the reports files?
Maybe I should I use a django template to display the initial top level of the directory structure and use ajax to grab the next (and next after that) sub-level of data when a user tries to drill down into a category/directory? Ideally, I’d like to avoid the latency of making a request and waiting for a response every time a list item is clicked so I’d like to build the entire page on the server before sending it to the client. But again, I’m not sure if it’s even possible to use django templates to display arbitrary data like this.
Any suggestions and advice are much appreciated, thanks.
PS: I think I can just construct the html programmatically to display the directory structure and send that to the client but I'd like to avoid doing it that way if possible.

Just a suggestion but I would consider keeping a database table of all of the entries with their disk path, file size, file name, etc. Fill that with a scheduled job. Then search in the dbms against those paths/names.
That allows you to show a little bit more information to a user, and makes everything on the django/front end side use tooling that you'd normally expect to work with.
Take the file system out of the equation and think about what the user really wants to be able to do, odds are they don't care about how it is stored on the server.

How to create files at runtime on python app-engine: live or localhost

The question is simply as stated in the title. But the reason is probably important for a good answer.
I need to generate static html files at runtime. Generating the files is not the problem. Using jinja2 and webapp2 I can painlessly generate dynamic html files at runtime and serve them to my users. My problem is that generating these files is expensive. So I want to be able to save them as static html files; so that, when user tries to access them, I just serve the static html if it exists. If it does not exist, then I can create the file and then serve it.
Again I am already using jinja2 to create my string (i.e. file content). So my question is, how do I save the file to a path that my app.yaml file can map to?
Also if you are thinking memcache I am already using that. It is not sufficient storage. So while it’s a buffer, it’s not as good as having static html files.
Some back story:
If I can generate the files in localhost, that'd be even better. The thing is the site has a huge number of pages. But the structure is common to all the pages. So we store the relevant data in the datastore and use jinja2 to inflate the different pages. But because of heavy usage, memcache is not able to keep up. So now it's looking more cost-effective to create static html pages from the data stored in the datastore. Creating those pages by hand is obscene and error prone. So if we could generate the html pages as usual (i.e. jinja2 template and datastore) and then have an automated system for creating the html pages, it would be awesome. We could do the app.yaml part manually.
So after I do
template = jinja_environment.get_template(‘template.html')
content = template.render(template_values)
What do I do to save the file to say ./relevant/path/filename.html

IfI understand your question properly, the best way to do this would be to store your static files in Google Cloud Storage (GCS). Easy to read and write to GCS. And easy to delete the file when you no longer need it.

You can't write to files in App Engine -- see https://cloud.google.com/appengine/kb/java?csw=1#writefile for explanation.
Why don't you store the post-rendered page content back in datastore? I'm assuming the "expensive" part of the approach you're using now is the building of the templated content from the data. Once you've done that, instead of writing it off to local files, you could either put it as an additional property on the original data entity or you could create an entirely new model just for the rendered pages.
Essentially this would be a durable datastore implementation instead of memcache, if the issue is that pages are getting evicted too often from memcache.
A single fetch (directly from the key, since you can use keynames that you can create based on the page you're fetching, so you don't have to even do a query) is lightweight. Then, if you're still using memcache, then you'd check memcache first; if not there, you fetch the entity from the datastore and check to see if the rendered property contains data. If not, you use the data in the entity to render it with Jinja, store it back off in the rendered property, put the entity, store it off in memcache, and return the rendered content back to the client.

Is there a way to tell a browser to download a file as a different name than as it exists on disk?

I am trying to serve up some user uploaded files with Flask, and have an odd problem, or at least one that I couldn't turn up any solutions for by searching. I need the files to retain their original filenames after being uploaded, so they will have the same name when the user downloads them. Originally I did not want to deal with databases at all, and solved the problem of filename conflicts by storing each file in a randomly named folder, and just pointing to that location for the download. However, stuff came up later that required me to use a database to store some info about the files, but I still kept my old method of handling filename conflicts. I have a model for my files now and storing the name would be as simple as just adding another field, so that shouldn't be a big problem. I decided, pretty foolishly after I had written the implmentation, on using Amazon S3 to store the files. Apparently S3 does not deal with folders in the way a traditional filesystem does, and I do not want to deal with the surely convoluted task of figuring out how to create folders programatically on S3, and in retrospect, this was a stupid way of dealing with this problem in the first place, when stuff like SQLalchemy exists that makes databases easy as pie. Anyway, I need a way to store multiple files with the same name on s3, without using folders. I thought of just renaming the files with a random UUID after they are uploaded, and then when they are downloaded (the user visits a page and presses a download button so I need not have the filename in the URL), telling the browser to save the file as its original name retrieved from the database. Is there a way to implement this in Python w/Flask? When it is deployed I am planning on having the web server handle the serving of files, will it be possible to do something like this with the server? Or is there a smarter solution?

I'm stupid. Right in the Flask API docs it says you can include the parameter attachment_filename in send_from_directory if it differs from the filename in the filesystem.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.