Primer needed in python pathnames

Primer needed in python pathnames - python

I am a very novice coder, and Python is my first (and, practically speaking, only) language. I am charged as part of a research job with manipulating a collection of data analysis scripts, first by getting them to run on my computer. I was able to do this, essentially by removing all lines of coding identifying paths, and running the scripts through a Jupyter terminal opened in the directory where the relevant modules and CSV files live so the script knows where to look (I know that Python defaults to the location of the terminal).
Here are the particular blocks of code whose function I don't understand
import sys
sys.path.append('C:\Users\Ben\Documents\TRACMIP_Project\mymodules/')
import altdata as altdata
I have replaced the pathname in the original code with the path name leading to the directory where the module is; the file containing all the CSV files that end up being referenced here is also in mymodules.
This works depending on where I open the terminal, but the only way I can get it to work consistently is by opening the terminal in mymodules, which is fine for now but won't work when I need to work by accessing the server remotely. I need to understand better precisely what is being done here, and how it relates to the location of the terminal (all the documentation I've found is overly technical for my knowledge level).
Here is another segment I don't understand
import os.path
csvfile = 'csv/' + model +'_' + exp + '.csv'
if os.path.isfile(csvfile): # csv file exists
hcsvfile = open(csvfile )
I get here that it's looking for the CSV file, but I'm not sure how. I'm also not sure why then on some occasions depending on where I open the terminal it's able to find the module but not the CSV files.
I would love an explanation of what I've presented, but more generally I would like information (or a link to information) explaining paths and how they work in scripts in modules, as well as what are ways of manipulating them. Thanks.

sys.path
This is simple list of directories where python will look for modules and packages (.py and dirs with __init__.py file, look at modules tutorial). Extending this list will allow you to load modules (custom libs, etc.) from non default locations (usually you need to change it in runtime, for static dirs you can modify startup script to add needed enviroment variables).
os.path
This module implements some useful functions on pathnames.
... and allows you to find out if file exists, is it link, dir, etc.
Why you failed loading *.csv?
Because sys.path responsible for module loading and only for this. When you use relative path:
csvfile = 'csv/' + model +'_' + exp + '.csv'
open() will look in current working directory
file is either a string or bytes object giving the pathname (absolute or relative to the current working directory)...
You need to use absolute paths by constucting them with os.path module.

I agree with cdarke's comment that you are probably running into an issue with backslashes. Replacing the line with:
sys.path.append(r'C:\Users\Ben\Documents\TRACMIP_Project\mymodules')
will likely solve your problem. Details below.
In general, Python treats paths as if they're relative to the current directory (where your terminal is running). When you feed it an absolute path-- which is a path that includes the root directory, like the C:\ in C:\Users\Ben\Documents\TRACMIP_Project\mymodules-- then Python doesn't care about the working directory anymore, it just looks where you tell it to look.
Backslashes are used to make special characters within strings, such as line breaks (\n) and tabs (\t). The snag you've hit is that Python paths are strings first, paths second. So the \U, \B, \D, \T and \m in your path are getting misinterpreted as special characters and messing up Python's path interpretation. If you prefix the string with 'r', Python will ignore the special characters meaning of the backslash and just interpret it as a literal backslash (what you want).
The reason it still works if you run the script from the mymodules directory is because Python automatically looks in the working directory for files when asked. sys.path.append(path) is telling the computer to include that directory when it looks for commands, so that you can use files in that directory no matter where you're running the script. The faulty path will still get added, but its meaningless. There is no directory where you point it, so there's nothing to find there.
As for path manipulation in general, the "safest" way is to use the function in os.path, which are platform-independent and will give the correct output whether you're working in a Windows or a Unix environment (usually).
EDIT: Forgot to cover the second part. Since Python paths are strings, you can build them using string operations. That's what is happening with the line
csvfile = 'csv/' + model +'_' + exp + '.csv'
Presumably model and exp are strings that appear in the filenames in the csv/ folder. With model = "foo" and exp = "bar", you'd get csv/foo_bar.csv which is a relative path to a file (that is, relative to your working directory). The code makes sure a file actually exists at that path and then opens it. Assuming the csv/ folder is in the same path as you added in sys.path.append, this path should work regardless of where you run the file, but I'm not 100% certain on that. EDIT: outoftime pointed out that sys.path.append only works for modules, not opening files, so you'll need to either expand csv/ into an absolute path or always run in its parent directory.
Also, I think Python is smart enough to not care about the direction of slashes in paths, but you should probably not mix them. All backslashes or all forward slashes only. os.path.join will normalize them for you. I'd probably change the line to
csvfile = os.path.join('csv\', model + '_' + exp + '.csv')
for consistency's sake.

Related

Python - File Path not found if script run from another directory

I'm trying to run a script that works without issue when I run using in console, but causes issue if I try to run it from another directory (via IPython %run <script.py>)
The issue comes from this line, where it references a folder called "Pickles".
with open('Pickles/'+name+'_'+date.strftime('%y-%b-%d'),'rb') as f:
obj = pickle.load(f)
In Console:
python script.py <---works!
In running IPython (Jupyter) in another folder, it causes a FileNotFound exception.
How can I make any path references within my scripts more robust, without putting the whole extended path?
Thanks in advance!

Since running in the console the way you show works, the Pickles directory must be in the same directory as the script. You can make use of this fact so that you don't have to hard code the location of the Pickles directory, but also don't have to worry about setting the "current working directory" to be the directory containing Pickles, which is what your current code requires you to do.
Here's how to make your code work no matter where you run it from:
with open(os.path.join(os.path.dirname(__file__), 'Pickles', name + '_' + date.strftime('%y-%b-%d')), 'rb') as f:
obj = pickle.load(f)
os.path.dirname(__file__) provides the path to the directory containing the script that is currently running.
Generally speaking, it's a good practice to always fully specify the locations of things you interact with in the filesystem. A common way to do this as shown here.
UPDATE: I updated my answer to be more correct by not assuming a specific path separator character. I had chosen to use '/' only because the original code in the question already did this. It is also the case that the code given in the original question, and the code I gave originally, will work fine on Windows. The open() function will accept either type of path separator and will do the right thing on Windows.

You have to use absolute paths. Also to be cross platform use join:
First get the path of your script using the variable __file__
Get the directory of this file with os.path.dirname(__file__)
Get your relative path with os.path.join(os.path.dirname(__file__), "Pickles", f"{name}_{date.strftime('%y-%b-%d')}")
it gives you:
with open(os.path.join(os.path.dirname(__file__), "Pickles", f"{name}_{date.strftime('%y-%b-%d')}"), 'rb') as f:
obj = pickle.load(f)

How to guess a file extension with python?

I have recently tried to make a python code which takes a path of a file without an extension and determine what extension it has.
I was looking for something like the example below. In the example the extension is exe (but the code doesn't know that yet).
path = 'C:\\MyPath\\Example'
#takes the path above and guesses the programs extension:
extension = guess_extension(path)
#adds the extension to the path:
fullPath = path+extension
print(fullPath)
Output:
C:\MyPath\Example.exe
If you know a python module that would do that (or something similar), please list it below.
I have tried to use filetype (filetype.guess()) and mimetypes (mimetypes.guess_extension()) modules, but they would both return value of none.
I have also tried to use answers from many questions like this one, but that still didn't work.

It sounds like the built in glob module (glob docs) might be what you're looking for. This module provides Unix style pattern expansion functionality within Python.
In the following example the incomplete path variable has the str .* appended to it when passed to glob.glob. This essentially tells glob.glob to return a list of valid paths found within the host system that start the same as path, followed by a period (designating a file extension), with the asterisk matching any and all characters following those from path + '.'.
import glob
path = r'C:\Program Files\Firefox Developer Edition\minidump-analyzer'
full = glob.glob(path+'.*')
print(full[0])
Output: C:\Program Files\Firefox Developer Edition\minidump-analyzer.exe
It is worth noting that the above is just an illustration of how glob could be leveraged as part of a solution to your question. Proper handling of unexpected inputs, edge cases, exceptions etc. should be implemented as required by the needs of your program.

Weird python file path behavior

I have this folder structure, within edi_standards.py I want to open csv/transaction_groups.csv
But the code only works when I access it like this os.path.join('standards', 'csv', 'transaction_groups.csv')
What I think it should be is os.path.join('csv', 'transaction_groups.csv') since both edi_standards.py and csv/ are on the same level in the same folder standards/
This is the output of printing __file__ in case you doubt what I say:
>>> print(__file__)
~/edi_parser/standards/edi_standards.py

when you're running a python file, the python interpreter does not change the current directory to the directory of the file you're running.
In your case, you're probably running (from ~/edi_parser):
standards/edi_standards.py
For this you have to hack something using __file__, taking the dirname and building the relative path of your resource file:
os.path.join(os.path.dirname(__file__),"csv","transaction_groups.csv")
Anyway, it's good practice not to rely on the current directory to open resource files. This method works whatever the current directory is.

I do agree with Answer of Jean-Francois above,
I would like to mention that os.path.join does not consider the absolute path of your current working directory as the first argument
For example consider below code
>>> os.path.join('Functions','hello')
'Functions/hello'
See another example
>>> os.path.join('Functions','hello','/home/naseer/Python','hai')
'/home/naseer/Python/hai'
Official Documentation
states that whenever we have given a absolute path as a argument to the os.path.join then all previous path arguments are discarded and joining continues from the absolute path argument.
The point I would like to highlight is we shouldn't expect that the function os.path.join will work with relative path. So You have to submit absolute path to be able to properly locate your file.

How to process files from one subfolder to another in each directory using Python?

I have a basic file/folder structure on the Desktop where the "Test" folder contains "Folder 1", which in turn contains 2 subfolders:
An "Original files" subfolder which contains shapefiles (.shp).
A "Processed files" subfolder which is empty.
I am attempting to write a script which looks into each parent folder (Folder 1, Folder 2 etc) and if it finds an Original Files subfolder, it will run a function and output the results into the Processed files subfolder.
I made a simple diagram to showcase this where if Folder 1 contains the relevant subfolders then the function will run; if Folder 2 does not contain the subfolders then it's simply ignored:
I looked into the following posts but having some trouble:
python glob issues with directory with [] in name
Getting a list of all subdirectories in the current directory
How to list all files of a directory?
The following is the script which seems to run happily, annoying thing is that it doesn't produce an error so this real noob can't see where the problem is:
import os, sys
from os.path import expanduser
home = expanduser("~")
for subFolders, files in os.walk(home + "\Test\\" + "\*Original\\"):
if filename.endswith('.shp'):
output = home + "\Test\\" + "\*Processed\\" + filename
# do_some_function, output

I guess you mixed something up in your os.walk()-loop.
I just created a simple structure as shown in your question and used this code to get what you're looking for:
root_dir = '/path/to/your/test_dir'
original_dir = 'Original files'
processed_dir = 'Processed files'
for path, subdirs, files in os.walk(root_dir):
if original_dir in path:
for file in files:
if file.endswith('shp'):
print('original dir: \t' + path)
print('original file: \t' + path + os.path.sep + file)
print('processed dir: \t' + os.path.sep.join(path.split(os.path.sep)[:-1]) + os.path.sep + processed_dir)
print('processed file: ' + os.path.sep.join(path.split(os.path.sep)[:-1]) + os.path.sep + processed_dir + os.path.sep + file)
print('')
I'd suggest to only use wildcards in a directory-crawling script if you are REALLY sure what your directory tree looks like. I'd rather use the full names of the folders to search for, as in my script.
Update: Paths
Whenever you use paths, take care of your path separators - the slashes.
On windows systems, the backslash is used for that:
C:\any\path\you\name
Most other systems use a normal, forward slash:
/the/path/you/want
In python, a forward slash could be used directly, without any problem:
path_var = '/the/path/you/want'
...as opposed to backslashes. A backslash is a special character in python strings. For example, it's used for the newline-command: \n
To clarify that you don't want to use it as a special character, but as a backslash itself, you either have to "escape" it, using another backslash: '\\'. That makes a windows path look like this:
path_var = 'C:\\any\\path\\you\\name'
...or you could mark the string as a "raw" string (or "literal string") with a proceeding r. Note that by doing that, you can't use special characters in that string anymore.
path_var = r'C:\any\path\you\name'
In your comment, you used the example root_dir = home + "\Test\\". The backslash in this string is used as a special character there, so python tries to make sense out of the backslash and the following character: \T. I'm not sure if that has any meaning in python, but \t would be converted to a tab-stop. Either way - that will not resolve to the path you want to use.
I'm wondering why your other example works. In "C:\Users\me\Test\\", the \U and \m should lead to similar errors. And you also mixed single and double backslashes.
That said...
When you take care of your OS path separators and trying around with new paths now, also note that python does a lot of path-concerning things for you. For example, if your script reads a directory, as os.walk() does, on my windows system the separators are already processed as double backslashes. There's no need for me to check that - it's usually just hardcoded strings, where you'll have to take care.
And finally: The Python os.path module provides a lot of methods to handle paths, seperators and so on. For example, os.path.sep (and os.sep, too) wil be converted in the correct seperator for the system python is running on. You can also build paths using os.path.join().
And finally: The home-directory
You use expanduser("~") to get the home-path of the current user. That should work fine, but if you're using an old python version, there could be a bug - see: expanduser("~") on Windows looks for HOME first
So check if that home-path is resolved correct, and then build your paths using the power of the os-module :-)
Hope that helps!

how to launch an exe with a variable path, special characters and arguements

I want to copy an installer file from a location where one of the folder names changes as per the build number
This works for defining the path where the last folder name changes
import glob
import os
dirname = "z:\\zzinstall\\*.install"
filespec = "setup.exe"
print glob.glob (os.path.join (dirname, filespec))
# the print is how I'm verifying the path is correct
['z:\\zzinstall\\35115.install\\setup.exe'
The problem I have is that I can't get the setup.exe to launch due to the arguments needed
I need to launch setup.exe with, for example
setup.exe /S /z"
There are numerous other arguments that need to be passed with double quotes, slashes and whitespaces. Due to the documentation provided which is inconsistent, I have to test via trial and error. There are even instances that state I need to use a "" after a switch!
So how can I do this?
Ideally I'd like to pass the entrire path, including the file I need to glob or
I'd like to declare the result of the path with glob as a variable then concatenate with setup.exe and the arguements. That did not work, the error list can't be combined with string is returned.
Basically anything that works, so far I've failed because of my inability to handle the filename that varies and the obscene amount of whitespaces and special characters in the arguements.
The following link is related howevers does not include a clear answer for my specific question
link text
The response provided below does not answer the question nor does the link I provided, that's why I'm asking this question. I will rephrase in case I'm not understood.
I have a file that I need to copy at random times. The file is prependedned with unique, unpredicatable number e.g. a build number. Note this is a windows system.
For this example I will cite the same folder/file structure.
The build server creates a build any time in a 4 hour range. The path to the build server folder is Z:\data\builds*.install\setup.exe
Note the wildcard in the path. This means the folder name is prepended with a random(yes, random) string of 8 digits then a dot. then "install". So, the path at one time may be Z:\data\builds\12345678.install\setup.exe or it could be Z:\data\builds\66666666.install\setup.exe This is one, major portion of this problem. Note, I did not design this build numbering system. I've never seen anything like this my years as a QA engineer.
So to deal with the first issue I plan on using a glob.
import glob
import os
dirname = "Z:\\data\\builds\\*.install"
filespec = "setup.exe"
instlpath = glob.glob (os.path.join (dirname, filespec))
print instlpath # this is the test,printsthe accurate path to launch an install, problem #is I have to add arguements
OK so I thought I could use path that I defined as instlpath, concatnenate it and execute.
when it try and use prinnt to test
print instlpath + [" /S /z" ]
I get
['Z:\builds\install\12343333.install\setup.exe', ' /S /z']
I need
Z:\builds\install\12343333.install\setup.exe /S /z" #yes, I need the whitespace as #well and amy also need a z""
Why are all of the installs called setup.exe and not uniquely named? No freaking idea!
Thank You,
Surfdork

The related question you linked to does contain a relatively clear answer to your problem:
import subprocess
subprocess.call(['z:/zzinstall/35115.install/setup.exe', '/S', '/z', ''])
So you don't need to concatenate the path of setup.exe and its arguments. The arguments you specify in the list are passed directly to the program and not processed by the shell. For an empty string, which would be "" in a shell command, use an empty python string.
See also http://docs.python.org/library/subprocess.html#subprocess.call

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.