I downloaded nltk data into the data directory in my Flask app. The views reside in a blueprint in another directory on the same level as the data directory. In the view I'm trying to set the path to the data, but it doesn't work.
nltk.data.path.append('../nltk_data/')
This doesn't work. If I use the whole path, it does work.
nltk.data.path.append('/home/username/myapp/app/nltk_data/')
Why does the first form not work? How can I refer to the location of the data correctly?
In Python (and most languages), the location of a code file within a package has nothing to do with the working directory of the running program. All relative paths are resolved against the current working directory, not against the file they are written in. So you would use the relative path nltk_data/ even from a blueprint (when the app directory is the working directory), or you would use the absolute path and leave no ambiguity.
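A small sketch of that behaviour, using only the standard library. The temporary directory and the project/app/nltk_data layout are made up for illustration; the point is just that the same relative string resolves differently depending on the working directory.

```python
import os
import tempfile

# Hypothetical layout for illustration: <project>/app/nltk_data
project = os.path.realpath(tempfile.mkdtemp())
data_dir = os.path.join(project, "app", "nltk_data")
os.makedirs(data_dir)

original_cwd = os.getcwd()
try:
    # The same relative path means different things depending on the cwd.
    os.chdir(project)
    from_project = os.path.abspath("nltk_data")

    os.chdir(os.path.join(project, "app"))
    from_app = os.path.abspath("nltk_data")
finally:
    os.chdir(original_cwd)

print(from_project == data_dir)  # False: resolved against <project>
print(from_app == data_dir)      # True: resolved against <project>/app
```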
The root_path attribute on an app (or blueprint) refers to the package directory for the app (or blueprint). Join your relative path to that to get the absolute path.
resource_path = os.path.join(app.root_path, 'nltk_data')
There's probably no reason to be appending this folder every time you call a view. I'm not familiar with nltk specifically, but there's probably a way to structure this so you set up the data path once when you create your app.
project    /    app    /    blueprint
                       /    data
                            ^ join with root_path to get here
                ^ app.root_path always points here, no matter where cwd is
^ current working directory
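One way the "set it up once" idea could look, as a rough sketch. The helper name register_data_dir is hypothetical, and a plain list and a hard-coded directory stand in for nltk.data.path and app.root_path so the snippet runs without Flask or nltk installed.

```python
import os

def register_data_dir(search_path, root_path, dirname="nltk_data"):
    """Append <root_path>/<dirname> to search_path once and return the path."""
    resource_path = os.path.join(root_path, dirname)
    if resource_path not in search_path:
        search_path.append(resource_path)
    return resource_path

# Stand-ins: in a real app these would be nltk.data.path and app.root_path.
fake_search_path = []
fake_root_path = os.path.join(os.sep, "home", "username", "myapp", "app")

register_data_dir(fake_search_path, fake_root_path)
register_data_dir(fake_search_path, fake_root_path)  # second call is a no-op
print(fake_search_path)
```

Calling something like this right after the app object is created, rather than inside a view, means the path is added once per process instead of once per request.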
The Flask tutorial (http://flask.pocoo.org/docs/0.12/tutorial/setup/#tutorial-setup) has the following in flaskr.py:
app = Flask(__name__) # create the application instance :)
app.config.from_object(__name__) # load config from this file , flaskr.py
# Load default config and override config from an environment variable
app.config.update(dict(
    DATABASE=os.path.join(app.root_path, 'flaskr.db'),
    SECRET_KEY='development key',
    USERNAME='admin',
    PASSWORD='default'
))
app.config.from_envvar('FLASKR_SETTINGS', silent=True)
The explanation for the line
DATABASE=os.path.join(app.root_path, 'flaskr.db'),
is :
Operating systems know the concept of a current working directory for each process. Unfortunately, you cannot depend on this in web applications because you might have more than one application in the same process.
For this reason the app.root_path attribute can be used to get the path to the application. Together with the os.path module, files can then easily be found. In this example, we place the database right next to it.
Can anyone explain it with example as I am unable to understand the explanation?
I understand it like this: in a normal program, you would first do os.chdir(PATH) and then just open flaskr.db. But Flask may run several applications (apps) in one process, each in a separate thread, and they all share the same current directory, because the current directory is per-process, not per-thread. To be safe, you should always work with absolute paths and use app.root_path.
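Here is a minimal illustration of that reasoning, without Flask. The two temporary directories are hypothetical stand-ins for the root_path of two applications in the same process, and database_path is an illustrative helper that mirrors the os.path.join call from the tutorial.

```python
import os
import tempfile

def database_path(root_path):
    # Mirrors os.path.join(app.root_path, 'flaskr.db') from the tutorial.
    return os.path.join(root_path, "flaskr.db")

# Two hypothetical application roots living in the same process.
root_a = tempfile.mkdtemp(prefix="app_a_")
root_b = tempfile.mkdtemp(prefix="app_b_")
for root, content in ((root_a, "data for A"), (root_b, "data for B")):
    with open(database_path(root), "w") as f:
        f.write(content)

# There is only one working directory for the whole process, so it cannot
# point at both roots at once. Absolute paths do not care where it points.
with open(database_path(root_a)) as f:
    contents_a = f.read()
with open(database_path(root_b)) as f:
    contents_b = f.read()

print(contents_a)
print(contents_b)
```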
I have simple question.
I have a python module "dev1.py" that needs a file "dev1_blob"
If I have everything in one directory.
my_app loads the dev1 like
from dev1 import func1
it works fine.
I want to move dev1.py to a directory "./dev_files" with an __init__.py in it. I can then import dev1.py as
from dev_files.dev1 import func1
However when func1 runs to access the "device_blob" -- it barfs as:
resource not found ..
This is so basic that I believe I am missing something.
I can't figure out why great minds of python want everything to refer to __file__ (cwd) and force me to modify dev1.py based on where it's being run from. i.e. in dev1.py refer to the file as: 'dev_files/device_blob'
I can make it work this way, but it's purely absurd way of writing code.
Is there a simple way to access a file next to the module files or in the tree below?
Relative pathing is one of Python's larger flaws.
For this use case, you might be able to call open('../dev_files/device_blob') to go back a dir first.
My general solution is to have a "project.py" file containing an absolute path to the project directory. Then I call open(os.path.join(PROJECT_DIR, 'dev_files', 'device_blob')).
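A common alternative to a hard-coded project directory is to derive the path from the module's own location. The sketch below assumes that approach; path_next_to is a hypothetical helper name, and the module location is invented so the snippet runs on its own.

```python
import os

def path_next_to(module_file, *parts):
    """Return an absolute path built from the directory containing module_file."""
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, *parts)

# Inside dev_files/dev1.py the call would be path_next_to(__file__, "device_blob").
# A made-up module location is used here so the snippet runs on its own.
example_module = os.path.join(os.sep, "srv", "my_app", "dev_files", "dev1.py")
blob_path = path_next_to(example_module, "device_blob")
print(blob_path)
```

With this, dev1.py finds device_blob regardless of which directory my_app is started from, and the module does not need editing when it moves.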
I have my webapp written in Python running on Google App Engine; so far only testing in my localhost. My script needs to read text files located in various directories. So in my Python script I would simply use os.chdir("./"+subdirectory_name) and then start opening and reading the files and using the text for analysis. However I cannot do that when trying to move it over to App Engine.
I assume I need to make some sort of changes to app.yaml but I'm stuck after that. Can anyone walk me through this somewhat thoroughly?
From the answer below, I was able to fix this part, I have a followup question about the app.yaml structure though. I have my folder set up like the following, and I'm wondering what I need to put in my app.yaml file to make it work when I deploy the project (it works perfectly on localhost right now).
Main_Folder:
    python_my_app.py
    app.yaml
    text_file1
    text_file2
    text_file3
    subfolder1_with_10_text_files_inside
    subfolder2_with_10_text_files_inside
    subfolder3_with_10_text_files_inside
...how do I specify this structure in my app.yaml and do I need to change anything in my code if it works right now on localhost?
You don't need to change your working directory at all to read files. Use absolute paths instead.
Use the current file location as the starting point, and resolve all relative paths from there:
import os.path
HERE = os.path.dirname(os.path.abspath(__file__))
somefile = os.path.abspath(os.path.join(HERE, 'subfolder1_with_10_text_files_inside/somefile.txt'))
If you want to be able to read static files, do remember to set the application_readable flag on these, as Google will otherwise only upload them to their content delivery network, not to the app engine.
You can package your text files inside your application and then do something like this:
path = os.path.join(os.path.dirname(__file__), 'subdir', 'file')
file = open(path)
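To replace the os.chdir loop from the question, something along these lines might work. read_text_files is a hypothetical helper, and the temporary directory stands in for the application directory (which in the app itself would come from os.path.dirname(__file__)).

```python
import os
import tempfile

def read_text_files(base_dir, subdirectory_name):
    """Return {filename: text} for every file in base_dir/subdirectory_name."""
    folder = os.path.join(base_dir, subdirectory_name)
    texts = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            with open(path) as f:
                texts[name] = f.read()
    return texts

# Demo with a temp dir standing in for the application directory.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "subfolder1"))
with open(os.path.join(base, "subfolder1", "a.txt"), "w") as f:
    f.write("hello")

texts = read_text_files(base, "subfolder1")
print(texts)  # {'a.txt': 'hello'}
```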
db: mysql
lang: python
framework: django
Operating System: Linux (ubuntu)
The project I am currently working on has a directory tree structure comprising folders, sub-folders, and files.
root path: /var/Testcases
Folder1
    subFolder2
        file1
        file2
Folder2
    subFolder2
        File1
I can store/map the file paths of each 'file' in a database, but I also need to know the directory and subdirectory names, so I need a mechanism to update the stored structure every time it changes. I've looked at other solutions that 'walk' or crawl the tree and have even implemented them, but are there any suggestions on how to intelligently store directory structures that change frequently? The parent folder, the sub-folder, and the file name must each be stored in a separate column. Would MongoDB be something to look into (I'm talking outside my domain)?
I could run a cron job every few seconds to walk the directory and update my table to reflect the changes, but there must be some other route. I hope I have provided enough information. I'm open to exploring various trial-and-error solutions. I've heard great things about MongoDB, and that's why I threw it out there.
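For reference, the 'walk' approach mentioned above could be sketched as follows. It assumes files always sit exactly two levels below the root, as in the example tree; scan_tree is a hypothetical name, and a temporary directory stands in for /var/Testcases. Each tuple maps to the three columns described.

```python
import os
import tempfile

def scan_tree(root):
    """Yield (parent_folder, sub_folder, file_name) for files two levels below root."""
    for dirpath, dirnames, filenames in os.walk(root):
        parts = os.path.relpath(dirpath, root).split(os.sep)
        if len(parts) != 2:
            continue  # only <root>/<parent>/<sub> directories hold files here
        for file_name in filenames:
            yield (parts[0], parts[1], file_name)

# Rebuild the example tree in a temp dir standing in for /var/Testcases.
root = tempfile.mkdtemp()
layout = [("Folder1", "subFolder2", "file1"),
          ("Folder1", "subFolder2", "file2"),
          ("Folder2", "subFolder2", "File1")]
for parent, sub, name in layout:
    os.makedirs(os.path.join(root, parent, sub), exist_ok=True)
    open(os.path.join(root, parent, sub, name), "w").close()

rows = sorted(scan_tree(root))
for row in rows:
    print(row)
```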
I have created a simple GAE app based on the default template. I want to add an external module like short_url. How do I do this? The directions that I have found so far are confusing and GAE doesn't seem to use PYTHONPATH for obvious reasons I guess.
Simply place the short_url.py file in your app's directory.
Sample App Engine project:
myapp/
    app.yaml
    index.yaml
    main.py
    short_url.py
    views.py
And in views.py (or wherever), you can then import like so:
import short_url
For more complex projects, perhaps a better method is to create a directory especially for dependencies; say lib:
myapp/
    lib/
        __init__.py
        short_url.py
    app.yaml
    index.yaml
    main.py
    views.py
from lib import short_url
Edit #2:
Apologies, I should have mentioned this earlier. You need to modify your path; thanks to Nick Johnson for the following fix.
Ensure that this code is run before starting up your app; something like this:
import os
import sys

def fix_path():
    # credit: Nick Johnson of Google
    sys.path.append(os.path.join(os.path.dirname(__file__), 'lib'))

def main():
    url_map = [ ('/', views.IndexHandler),] # etc.
    app = webapp.WSGIApplication(url_map, debug=False)
    wsgiref.handlers.CGIHandler().run(app)

if __name__ == "__main__":
    fix_path()
    main()
Edit3:
To get this code to run before all other imports, you can put the path managing code in a file of its own in your app's base directory (Python recognizes everything in that directory without any path modifications).
And then you'd just ensure that this import
import fix_path
...is listed before all other imports in your main.py file.
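A standalone fix_path.py along those lines might look roughly like this. The add_lib_dir helper is a hypothetical name; it takes the base directory as an argument so the snippet can run anywhere, and it skips the append if the entry is already present.

```python
import os
import sys

def add_lib_dir(base_dir, lib_name="lib"):
    """Put base_dir/lib_name on sys.path (once) and return the path."""
    lib_dir = os.path.join(os.path.abspath(base_dir), lib_name)
    if lib_dir not in sys.path:
        sys.path.append(lib_dir)
    return lib_dir

# In a real fix_path.py, base_dir would be os.path.dirname(__file__).
# The current directory is used here so the snippet runs on its own.
lib_dir = add_lib_dir(os.getcwd())
```

Because the call sits at module level, a plain import of the module is enough to apply the path change, and importing it a second time has no further effect.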
Here's a link to full, working example in case my explanation wasn't clear.
i will second the answers given by @Adam Bernier and @S.Mark, although adam's explains things in a bit more detail. in general, you can add any pure Python module/package to your App Engine directory and use it as-is, as long as it doesn't try to work outside of the sandbox, i.e., it cannot create files, cannot open network sockets, etc.
also keep in mind the hard limits:
maximum total number of files (app files and static files): 3,000
maximum size of an application file: 10 megabytes
maximum size of a static file: 10 megabytes
maximum total size of all application and static files: 150 megabytes
UPDATE (Oct 2011): most of these numbers have been increased to:
maximum total number of files (app files and static files): 10,000
maximum size of an application file: 32MB
maximum size of a static file: 32MB
UPDATE (Jun 2012): the last limit was bumped up to:
maximum total size of all application and static files: 1GB
You can import python packages as ZIPs. This allows you to avoid the maximum file count.
The app engine docs address this.
python 2.5: zipimport is supported.
python 2.7: zipimport is not supported, but Python 2.7 can natively import from .zip files.
This is how I import boto.
import sys
sys.path.insert(0, 'boto.zip')
import boto #pylint: disable=F0401
from boto import connect_fps #pylint: disable=F0401
The cons of this technique include having to manually re-archive many packages.
For example, boto.zip decompresses into the "boto" subdirectory, with the "boto" module inside of it (as another subdirectory).
So to import boto naturally you may have to do from boto import boto, but this can cause weirdness with a lack of __init__.py.
To solve this, simply decompress, and archive the boto subfolder manually as boto.zip, and place that in your application folder.
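The archive layout that imports cleanly can be illustrated with a throwaway package. This is only a sketch: demo_zip_pkg is a made-up name, and the archive is built on the fly so that the package directory, with its __init__.py, sits at the root of the zip.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a tiny archive; "demo_zip_pkg" is a made-up package name.
archive = os.path.join(tempfile.mkdtemp(), "demo_zip_pkg.zip")
with zipfile.ZipFile(archive, "w") as zf:
    # The package directory sits at the root of the archive, with its __init__.py.
    zf.writestr("demo_zip_pkg/__init__.py", "GREETING = 'hello from zip'\n")

# Putting the archive itself on sys.path lets Python import from inside it.
sys.path.insert(0, archive)
importlib.invalidate_caches()
demo_zip_pkg = importlib.import_module("demo_zip_pkg")
print(demo_zip_pkg.GREETING)  # hello from zip
```

If the package were nested one level deeper inside the archive, the top-level import would not find it, which is the situation described above for the stock boto archive.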
Since that url_shortener program is written in Python, you could just include it in your source code and import it like any other Python module.