Change directory vs. full path - Python

I was wondering what the best practices are on working with paths in the following scenario:
I can either choose to change the current directory to the desired folder and then generate a file using only the file name, or just use the full path directly.
Here is the code where I set the current directory with os.chdir():
import os
import time

a = time.clock()
for year in range(start, end):
    os.chdir("C:/CO2/%s" % year)
    with open("Table.csv", 'r') as file:
        content = file.read()
b = time.clock()
b - a
Out[55]: 0.002037443263361638
And that is slower than when using the full path directly:
a = time.clock()
for year in range(start, end):
    with open("C:/CO2/%s/Table.csv" % year, 'r') as file:
        content = file.read()
b = time.clock()
b - a
Out[56]: 0.0014569102613677387
I still doubt, though, whether using the full path is good practice. Are both methods cross-platform? Should I be using os.path instead of %s?

What's the use case for the code in question? Is it a script invoked on the command line by a user? If so, I would usually take the path as a command-line argument (sys.argv), as a command-line option (argparse), or using some sort of configuration file.
Or is the file path part of a more general-purpose module? In that case, I might think about wrapping the path and related code in a class (class FooBar). Users of the module could pass in the needed file path information when instantiating a FooBar. If users tended to use the same path over and over, I would again lean toward a strategy based on a configuration file.
Either way, the file path would be separate from the code -- at least for real software projects.
If we're talking about a one-off script with very few users and almost zero likelihood of future evolution or code re-use, it does not matter too much what you do.
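A minimal sketch of the command-line route, with an illustrative option name and the question's folder layout (the argument list passed to parse_args is for demonstration only; a real script would call it with no arguments to read sys.argv):

```python
import argparse
import os

parser = argparse.ArgumentParser(description="Read yearly CO2 tables")
# "--data-dir" is a hypothetical option name; pick whatever fits your script
parser.add_argument("--data-dir", default="C:/CO2", help="root folder of the data")

# The explicit list here is only for demonstration purposes
args = parser.parse_args(["--data-dir", "/data/co2"])

paths = [os.path.join(args.data_dir, str(year), "Table.csv")
         for year in range(2000, 2003)]
```

This keeps the file path out of the code entirely: the user decides where the data lives at invocation time.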

As @lutz-horn said, a hardcoded path isn't a good idea for any code except single-run scripts.
As for design, choose the method that is more explicit and simpler for further development, and don't optimize your code until run time becomes an issue.
In this particular case, I'd prefer the second way. There is no need to chdir when you're only reading files with a consistent name; an explicit chdir is worth it when you're writing many files with different naming schemes.
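As a cross-platform alternative to % formatting, here is a sketch using pathlib (Python 3.4+), which joins components with the correct separator on each platform; the root folder and year range are the question's examples:

```python
from pathlib import Path

root = Path("C:/CO2")  # the question's example root
# The "/" operator builds child paths; str(year) is needed because
# path components must be strings
paths = [root / str(year) / "Table.csv" for year in range(2000, 2003)]
```

Each entry can then be passed straight to open() without worrying about separators.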


How can the root directory in a python chunk be specified?

Setting the root directory in a python-chunk with the following line results in an error, while in an ordinary r-chunk it works just fine:
knitr::opts_knit$set(root.dir ="..")
Optimally, the following options should exist for each knitr-chunk:
- directory to find code to be imported / executed
- directory to find files / dependencies that are needed for code execution
- directory to save any code output
Does something similar exist?
What it looks like here is that you have told it to look for python code:
```{python}
knitr::opts_knit$set(root.dir ="..")
```
When you run this in RStudio, it will give you an error:
Error: invalid syntax (, line 1)
You fed it python code instead. This makes sense, as the call knitr::opts_knit$set means to look in the knitr package for opts_knit$set and set it to…. This doesn't exist in python… yet. The python interpreter does not recognize it as python code and returns a syntax error, whereas when you run it as an R chunk, it knows to look into the knitr package. Error handling can be a huge issue to deal with; it makes more sense to handle error categories than to account for every type of error. If you want to control the settings for a code chunk, you would do so in the parentheses, i.e.:
```{python, args }
codeHere
```
I have not seen args for any language other than R, but that does not mean they don't exist; I have just not seen them. I also do not believe that this will fix your issue. You could try some of the following ideas:
Write your python in a separate file and link to it. This lets you take full advantage of the language and use things like the os import, since even python has its own ways of navigating the various operating systems. This may be helpful if you are just running quick scripts and not loading or running python programs.
# OS module
import os

# Your OS name
print(os.name)

# Get the present working directory (PWD)
os.getcwd()

# Change your directory
os.chdir("path")
You could try using the reticulate library within an R chunk and load your python that way. Another thought is that you could try:
library(reticulate)
use_python("path")
Knitr looks in the same directory as your markdown file to find other files if needed, just like almost any other IDE.
At one point in time knitr would not accept R's setwd() command, so trying to call setwd() may not do you any good.
It may not be the best idea to compute paths relative to what's being executed; if possible, they should be determined relative to where the user calls the code from.
This site may help.
The author of the knitr package is very active and involved. Good luck with what you are doing!

How to ignore hidden files when using os.stat() results in Python?

I'm trying to get the time of last modification (os.stat.st_mtime) of a particular directory. My issue is I have added a few metadata files that are hidden (they start with .). If I use os.stat(directory).st_mtime I get the date at which I updated the metadata file, not the date that a non-hidden file was modified in the directory. I would like to get the most recent time of modification for all of the other files in the directory other than the hidden metadata files.
I figure it's possible to write my own function, something along the lines of:
modified_times = []
for name in os.listdir(folder):
    if not name.startswith('.'):
        modified_times.append(os.path.getmtime(os.path.join(folder, name)))
last_time = max(modified_times)
However, is it possible to do this natively in python? Or do I need to write my own function like the pseudocode above (or something like this question)?
Your desired outcome is impossible. The most recent modification time of all non-hidden files doesn't necessarily correspond to the virtual "last modified time of a directory ignoring hidden files". The problem is that directories are modified when files are moved in and out of them, but the file timestamps aren't changed (the file was moved, but not modified). So your proposed solution is at best a heuristic; you can hope it's correct, but there is no way to be sure.
In any event, no, there is no built-in that provides this heuristic. The concept of hidden vs. non-hidden files is OS and file system dependent, and Python provides no built-in API that cares about the distinction. If you want to make a "last_modified_guess" function, you'll have to write it yourself (I recommend basing it on os.scandir for efficiency).
Something as simple as:
last_time = max(entry.stat().st_mtime for entry in os.scandir(somedir) if not entry.name.startswith('.'))
would get you the most recent last modified time (in seconds since the epoch) of your non-hidden directory entries.
Update: On further reflection, the glob module does include a concept of a . prefix meaning "hidden", so you could use glob.glob/glob.iglob on os.path.join(somedir, '*') to have it filter out the "hidden" files for you. That said, by doing so you give up some of the potential benefits of os.scandir (free or cached stat results, free type checks, etc.), so if all you need is "hidden" filtering, it's not worth giving those up to avoid a simple .startswith('.') check.
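A minimal sketch of the glob variant, assuming "hidden" means a leading dot (and keeping in mind the caveat above that this is only a heuristic for a directory's true modification history):

```python
import glob
import os

def last_modified_ignoring_hidden(somedir):
    """Most recent st_mtime among the non-hidden entries of somedir.

    glob's '*' pattern does not match names starting with '.',
    so hidden files are excluded without an explicit check.
    """
    paths = glob.glob(os.path.join(somedir, '*'))
    return max(os.path.getmtime(p) for p in paths)
```

Note that max() will raise ValueError if the directory contains only hidden files, which a caller may want to handle.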

best way to handle the input data in python

I have some input data that is user-configurable, so I do not want to hard-code it: things like the data path, result path, etc.
Can you please suggest the best way to handle this data? Should I keep it in an Excel or text file and read it at run time? Or is there a better way to handle it?
Thanks
There are a lot of ways to do it.
Configuration file
You can store configuration in a separate file in YAML, JSON, INI or any other format. There are a lot of tools and libraries for parsing and loading such configurations; take a look at this article. Such an approach is good for rarely changed configuration like service credentials, but it's not great for configuration that changes very often.
Environment variables
Also, you can store configuration inside environment variables. Take a look at py-env-config. You can hard-code default configuration values but allow the user to override them using environment variables.
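A sketch of that override pattern; the variable name and default path are hypothetical:

```python
import os

def get_data_path(var="MYAPP_DATA_PATH", default="/default/data"):
    # "MYAPP_DATA_PATH" is a hypothetical variable name: the hard-coded
    # default is used unless the user exports the variable to override it.
    return os.environ.get(var, default)
```

The user sets the variable once (e.g. in a shell profile) and every later run picks it up automatically.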
Script arguments
If you are writing a script, you can always pass all configuration as command-line arguments/options; see the manuals. Such an approach is good for configs that change very often (almost every script execution).
EDIT
I'd suggest using a configuration file for these constants.
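As a sketch of that recommendation, assuming a JSON config file with hypothetical keys for the paths the question mentions:

```python
import json
import os

# Illustrative defaults; the config file overrides them key by key
DEFAULTS = {"data_path": "./data", "result_path": "./results"}

def load_config(path="config.json"):
    # Start from the defaults and merge in whatever the file defines
    cfg = dict(DEFAULTS)
    if os.path.exists(path):
        with open(path) as f:
            cfg.update(json.load(f))
    return cfg
```

The user edits config.json to change paths; missing keys and a missing file both fall back to the defaults, so the script always has a usable configuration.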

python modules accessing files in "local" directory

I have a simple question.
I have a python module "dev1.py" that needs a file "dev1_blob".
If I have everything in one directory, my_app loads dev1 like:
from dev1 import func1
and it works fine.
I want to move dev1.py to a directory "./dev_files" with an __init__.py in it. I can then load dev1.py as:
from dev_files.dev1 import func1
However, when func1 runs to access "device_blob", it barfs with:
resource not found ..
This is so basic that I believe I am missing something.
I can't figure out why the great minds of python want everything to refer to __file__ (cwd) and force me to modify dev1.py based on where it's being run from, i.e. in dev1.py refer to the file as 'dev_files/device_blob'.
I can make it work this way, but it's a purely absurd way of writing code.
Is there a simple way to access a file next to the module files or in the tree below?
Relative pathing is one of Python's larger flaws.
For this use case, you might be able to call open('../dev_files/device_blob') to go back a dir first.
My general solution is to have a "project.py" file containing an absolute path to the project directory. Then I call open(os.path.join(PROJECT_DIR, 'dev_files', 'device_blob')).
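A sketch of that pattern as a small helper; the function name is illustrative, and passing a module's __file__ anchors the path to the module's own location instead of the current working directory:

```python
import os

def resource_path(module_file, *parts):
    # Build a path relative to the directory containing module_file.
    # Inside dev1.py one would call resource_path(__file__, "device_blob")
    # so the blob is found no matter where the program is run from.
    return os.path.join(os.path.dirname(os.path.abspath(module_file)), *parts)
```

Because the path is derived from the module file rather than os.getcwd(), moving the package or launching the app from another directory no longer breaks the lookup.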

Best way to accept permanent input from a user?

I have a script with a global variable that must be set by the user once and for all; the variable is a string containing a pathname, and the script needs it every time it runs. I don't want to prompt the user for this pathname each time.
Currently, I am considering asking the user to set an environment variable permanently, by adding it to his /etc/profile or .bash_profile, and accessing it with the os.environ dictionary. The other option would be to have a config file, ask the user to edit the relevant line, and then use configparser to read it.
Is there a recommended method for doing this?
Use the Python ConfigParser module, or configparser in Python 3.
It follows the standard *.ini format and allows you to store information from one run to the next in an easily readable format. The format is essentially self-documenting because you can name your keys in the file, and you can add comments to the configuration file too.
It also provides more flexibility than the environment-variable method, because it is easier to modify a configuration file, and the file can easily be passed from one computer to the next along with your script, regardless of platform or other environment settings.
Your use case is exactly what configuration files are intended for, and you could accomplish your task with only a handful of lines of code:
import ConfigParser  # Python 2.x; in Python 3, use: import configparser

cfg_parser = ConfigParser.ConfigParser()
if cfg_parser.read('config_file_name.ini'):
    path = cfg_parser.get('SECTION_NAME', 'path')
else:
    print("No config file found")
This gives you your path, and all you have to ask your user to do is edit one line of a text file instead of making any system changes.
Additionally, this gives you a lot of room to expand in the future. If you ever want more options added to your script, modifying a configuration file is a lot easier than coming up with new environment variables.
Lastly, the ConfigParser library allows you to edit configuration files programmatically as well. You could add a command line option (perhaps with argparse) that allows your user to specify a path, and have your script automagically write its own config file with the path. Now your user never has to touch the configuration file manually, and will never have to add the path on the command line again either. Even better, if the path ever changes, your user can just run it with the command line path option again and voila, the old path in the config file is overwritten and the new one is saved.
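A sketch of that round trip, assuming Python 3's configparser and the section/option names used earlier in this answer (both illustrative):

```python
import argparse
import configparser

def save_path(path, ini_file="config_file_name.ini"):
    # Persist the user-supplied path so later runs can read it back
    cfg = configparser.ConfigParser()
    cfg.read(ini_file)  # keep any existing settings
    if not cfg.has_section("SECTION_NAME"):
        cfg.add_section("SECTION_NAME")
    cfg.set("SECTION_NAME", "path", path)
    with open(ini_file, "w") as f:
        cfg.write(f)

parser = argparse.ArgumentParser()
parser.add_argument("--path", help="pathname to remember for future runs")
```

When the user passes --path, the script calls save_path(args.path); otherwise it reads the stored value back from the file, so the path survives between runs and can be overwritten at any time.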
I would definitely recommend the configuration file approach due to its flexibility and user-friendliness.
