list of jpeg files in nested subdirectories - python

I use the following Python code to get a list of JPG files in nested subdirectories of a parent directory.
import glob2,os
all_header_files = glob2.glob(os.path.join('Path/to/parent/directory','/**/*.jpg'))
However, I get nothing. When I cd into the parent directory and use the following Python code instead, I do get the list of JPEG files.
import glob2
all_header_files = glob2.glob('./**/*.jpg')
How can I get the same result with the absolute path (the first version)?

You have an extra slash.
os.path.join inserts the path separators for you, so think of the call like this to get the correct directory:
join('Path/to/parent directory' , '**/*.jpg')
Even more accurately,
parent = os.path.join('Path', 'to', 'parent directory')
os.path.join(parent, '**/*.jpg')
If you are trying to use your Home directory, see os.path.expanduser
In [10]: import os, glob
In [11]: glob.glob(os.path.join('~', 'Downloads', "**/*.sh"))
Out[11]: []
In [12]: glob.glob(os.path.expanduser(os.path.join('~', 'Downloads', "**/*.sh")))
Out[12]:
['/Users/name/Downloads/dir/script.sh']

You should not join with a leading slash on the second component, as you'll end up at the root. You can debug by printing out the resulting path before passing it to glob.
Try to change your code like this (note the dot):
import glob2,os
all_header_files = glob2.glob(os.path.join('Path/to/parent directory','./**/*.jpg'))

os.path.join() joins paths in an intelligent way.
os.path.join('Path/to/anything','/**/*.jpg')
resolves to '/**/*.jpg' because '/**/*.jpg' is an absolute path, so everything before it is discarded.
Change the '/**/*.jpg' to '**/*.jpg' and it should work.
In cases like this, I recommend always trying out the result of a function in the Python command line. At least, that is how I found the issue here.
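To try this out yourself, here is a minimal sketch. It uses posixpath (the POSIX flavour of os.path, importable on any platform) so the results do not depend on your OS. The directory name is just the placeholder from the question.

```python
import posixpath  # POSIX flavour of os.path; behaves the same on every OS

parent = 'Path/to/parent directory'  # placeholder path from the question

# A leading slash makes the second component absolute, so `parent` is discarded.
broken = posixpath.join(parent, '/**/*.jpg')

# Without the leading slash, the components are joined as intended.
fixed = posixpath.join(parent, '**/*.jpg')

print(broken)  # /**/*.jpg
print(fixed)   # Path/to/parent directory/**/*.jpg
```

Printing the joined pattern before handing it to glob is usually enough to spot this kind of mistake.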

The problem with the code you have posted lies in the use of os.path.join.
In the documentation it says for os.path.join(path, *paths):
If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
In your case, the component /**/*.jpg is an absolute path because it starts with a /. Consequently, your initial input Path/to/parent/directory is discarded by the call to join. (https://docs.python.org/3.5/library/os.path.html#os.path.join)
I tested the join locally with Python 3: os.path.join(any_path, "/**/*.pdf") returns the string '/**/*.pdf'.
The fix for this error is:
import glob2,os
all_header_files = glob2.glob(os.path.join('Path/to/parent directory','**/*.jpg'))
This returns the path 'Path/to/parent directory/**/*.jpg'
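As an aside, since Python 3.5 the standard library's glob also understands ** when called with recursive=True, so glob2 may not be needed at all. The following is a minimal sketch under that assumption. The directory tree and file names are made up for the demonstration and created in a temporary folder.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as parent:
    # Hypothetical layout: parent/top.jpg and parent/a/b/deep.jpg
    nested = os.path.join(parent, 'a', 'b')
    os.makedirs(nested)
    for name in (os.path.join(parent, 'top.jpg'), os.path.join(nested, 'deep.jpg')):
        open(name, 'w').close()

    # No leading slash on '**', so `parent` is kept by os.path.join.
    pattern = os.path.join(parent, '**', '*.jpg')

    # recursive=True lets '**' match zero or more directory levels.
    found = sorted(glob.glob(pattern, recursive=True))
    names = [os.path.basename(p) for p in found]
    print(names)
```

Because the pattern starts with an absolute parent directory, every returned path is absolute as well.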

Related

Convert a relative path (mp3) from a master file path (playlist) using python pathlib

I have three files
My python file running in an unimportant different folder: C:\DD\CC\BB\AA\code.py
A playlist file "C:\ZZ\XX\Playlist.pls" which points to ..\..\mp3\song.mp3
The C:\mp3\song.mp3 file.
What I want is to get the location of the mp3 as an absolute path. But every attempt I make gives me a path relative to wherever the code.py file is.
import pathlib
plMaster = pathlib.Path(r"C:\ZZ\XX\Playlist.pls")
plSlave = pathlib.Path(r"..\..\mp3\song.mp3")
I have tried plSlave.absolute(), which gives me "C:\DD\CC\BB\AA\..\..\mp3\song.mp3"
Using relative_to doesn't work. I feel like I am doing such an easy task but I must be missing something because I can't find any function that lets me set the reference to compute the relative path.
Note: I have already parsed the pls file and extracted the string r"..\..\mp3\song.mp3". I just need to get the path "C:\mp3\song.mp3", knowing that it is relative to the pls file (not relative to code.py).
If you're using a Windows version of Python, this is fairly easy. You can join the directory of plMaster (plMaster.parent) with the relative path of plSlave, then resolve the path using resolve(). You can use strict=False to force the resolve even if the path components aren't found.
This worked for me:
>>> plMaster = pathlib.Path(r"C:\ZZ\XX\Playlist.pls")
>>> plSlave = pathlib.Path(r"..\..\mp3\song.mp3")
>>> plMaster.parent.joinpath(plSlave).resolve(strict=False)
WindowsPath('C:/mp3/song.mp3')
If you're on a Unix version of Python, using Windows paths, I couldn't get this to work no matter what I tried, even using pathlib.PureWindowsPath().
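One possible workaround on non-Windows systems, offered as a sketch rather than a tested recipe for every case, is the ntpath module. It is the Windows flavour of os.path and only manipulates strings, so it can be imported on any OS. The variable names below are illustrative.

```python
import ntpath  # Windows flavour of os.path; pure string handling, works on any OS

pl_master = r"C:\ZZ\XX\Playlist.pls"
pl_slave = r"..\..\mp3\song.mp3"

# Join the playlist's folder with the relative entry, then collapse the '..' parts.
joined = ntpath.join(ntpath.dirname(pl_master), pl_slave)
resolved = ntpath.normpath(joined)

print(resolved)  # C:\mp3\song.mp3
```

Note that normpath is purely lexical: it never touches the filesystem, so it neither checks that the file exists nor resolves symbolic links.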
Might well be a better method here, but you can use pathlib.Path.parents and pathlib.Path.parts to extract some useful info here and get where you are going
import pathlib
new_relative_path = r"..\..\mp3\song.mp3"  # however you got this from reading your .pls file or whatever
pls_path = pathlib.Path(r'C:\ZZ\XX\Playlist.pls')
relative_save = pathlib.Path(new_relative_path)
n = relative_save.parts.count('..')  # number of '..' segments (assumed to all be leading)
new_path = pls_path.parents[n].joinpath(*relative_save.parts[n:])
The key thing here is that parents[0] is already the folder containing the playlist, so going up n more levels from there means taking parents[n]. You then append whatever remains of the new relative path after stripping the '..' segments from its beginning.
Whilst I was waiting for other answers, I managed to figure it out by ditching pathlib and using os instead.
import os
plMaster = r"C:\ZZ\XX\Playlist.pls"
plSlave = r"..\..\mp3\song.mp3"
os.chdir(os.path.dirname(plMaster))
os.path.abspath(plSlave)

Relative path in Python

I'm writing some python code to generate the relative path. Situation need to be considered:
Under the same folder. I want "." or ".\"; both of them are OK for me.
Other folder. I want something like ".\xxx\" and "..\xxx\xxx\"
os.path.relpath() will generate the relative path, but without .\ at the beginning and \ at the end. We can add \ at the end by using os.path.join(dirname, ""). But I can't figure out how to add ".\" at the beginning without affecting the first case, where they are under the same folder, or the "..\xxx\xxx\" case.
This gives you a path built relative to the script's directory:
import os
dir = os.path.dirname(__file__)
filename = os.path.join(dir,'Path')
The relpath() function produces the ".." syntax when given an appropriate base to start from (its second parameter). For instance, suppose you are writing something like a script generator that produces code using relative paths. If the working directory is passed as the second parameter to relpath(), as shown below, and you want to reference another file in your project under a directory one level up and two deep, you'll get "../blah/blah". To prefix paths in the same folder, simply join with ".", which produces a path with the correct OS-specific separator.
print(os.path.relpath("/foo/bar/blah/blah", "/foo/bar/baz"))
>>> ../blah/blah
print(os.path.join('.', 'blah'))
>>> ./blah
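Putting those pieces together, one way to get the format asked for in the question might look like the sketch below. The helper name dotted_relpath and the folder names are hypothetical, and it uses whatever separator the current OS uses.

```python
import os

def dotted_relpath(target, start):
    """Relative path from `start` to `target`, with a leading '.' or '..'
    component and a trailing separator."""
    rel = os.path.relpath(target, start)
    if rel == os.curdir:
        # Same folder: relpath returns '.', so only the trailing separator is added.
        return os.curdir + os.sep
    if rel.split(os.sep)[0] != os.pardir:
        # Path below `start`: prefix it with '.'; paths starting with '..' are left alone.
        rel = os.path.join(os.curdir, rel)
    # Joining with an empty string appends the trailing separator.
    return os.path.join(rel, "")

base = os.path.join("project", "src")  # hypothetical folders
same = dotted_relpath(base, base)
child = dotted_relpath(os.path.join(base, "xxx"), base)
sibling = dotted_relpath(os.path.join("project", "xxx", "yyy"), base)
print(same, child, sibling)
```

Checking the first path component, rather than using startswith(".."), avoids misclassifying a folder whose name merely begins with two dots.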

How to resolve relative paths in python?

I have Directory structure like this
projectfolder/fold1/fold2/fold3/script.py
Now I'm passing script.py a command-line argument: the path of a file located at
fold1/fold_temp/myfile.txt
So basically I want to be able to give path in this way
../../fold_temp/myfile.txt
>>python somepath/pythonfile.py -input ../../fold_temp/myfile.txt
The problem is that I might be given either a full path or a relative path, so I need to detect which one it is and build the absolute path accordingly.
I already have knowledge of functions related to path.
Question 1
Question 2
Reference questions are giving partial answer but I don't know how to build full path using the functions provided in them.
try os.path.abspath, it should do what you want ;)
Basically it converts any given path to an absolute path you can work with, so you do not need to distinguish between relative and absolute paths, just normalize any of them with this function.
Example:
from os.path import abspath
filename = abspath('../../fold_temp/myfile.txt')
print(filename)
It will output the absolute path to your file.
EDIT:
If you are using Python 3.4 or newer you may also use the resolve() method of pathlib.Path. Be aware that this will return a Path object and not a string. If you need a string you can still use str() to convert it to a string.
Example:
from pathlib import Path
filename = Path('../../fold_temp/myfile.txt').resolve()
print(filename)
A practical example:
sys.argv[0] gives you the name of the current script.
os.path.dirname() gives you its directory name, which may be relative.
Thus, the next line gives you the absolute directory of the currently executing script:
import os, sys
cwd = os.path.abspath(os.path.dirname(sys.argv[0]))
Personally, I always use this instead of os.getcwd() since it gives me the script's absolute directory, independently of the directory from which the script was called.
For Python 3, you can use pathlib's resolve functionality to resolve symlinks and .. components.
You need a Path object; however, it is very simple to convert between str and Path.
I recommend that anyone using Python 3 drop os.path and its messy long function names and stick to pathlib Path objects.
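As a rough illustration of the pathlib route, the sketch below accepts either kind of input. The helper name to_absolute and the base-directory parameter are assumptions made for the example, and a temporary directory stands in for the real project folder.

```python
from pathlib import Path
import tempfile

def to_absolute(user_path, base):
    """Return an absolute Path for `user_path`; relative input is
    interpreted against `base`."""
    # The `/` operator behaves like os.path.join: if `user_path` is already
    # absolute, `base` is discarded, so no explicit is-absolute check is needed.
    return (Path(base) / user_path).resolve()

with tempfile.TemporaryDirectory() as base:
    from_relative = to_absolute('fold_temp/myfile.txt', base)
    # Feeding the absolute result back in, with an unrelated base, changes nothing.
    from_absolute = to_absolute(str(from_relative), 'some/other/base')
    expected = Path(base).resolve() / 'fold_temp' / 'myfile.txt'

print(from_relative == from_absolute)  # True
```

To interpret relative input against the directory the script was launched from, pass Path.cwd() as the base.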
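import os

# Directory containing this script, made absolute so the result below is absolute too
base_dir = os.path.dirname(os.path.abspath(__file__))
path = input()
if os.path.isabs(path):
    print("input path is absolute")
else:
    # Interpret relative input against the script's directory
    path = os.path.join(base_dir, path)
    print("absolute path is %s" % path)
Use os.path.isabs to check whether the input path is absolute or relative. If it is relative, use os.path.join to convert it to an absolute path.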

Extracting penultimate folder name from path

Does anyone know a clever way to extract the penultimate folder name from a given path?
eg folderA/folderB/folderC/folderD
-> I want to know what the name of folderC is, I don't know the names of the other folders and there may be a variable number of directories before folderC but it's always the 2nd to last folder.
Everything I come up with seems too cumbersome (e.g. getting the name of folderD using basename and normpath, removing this from the path string, and then getting folderC).
cheers, -m
There isn't a good way to skip directly to portions within a path in a single call, but what you want can be easily done like so:
>>> os.path.basename(os.path.dirname('test/splitting/folders'))
'splitting'
Alternatively, if you know you'll always be on a filesystem with '/' delineated paths, you can just use regular old split() to get there directly:
>>> 'test/splitting/folders'.split('/')[-2]
'splitting'
This is a bit more fragile, though. The dirname+basename combo works with or without a file at the end of the path, whereas with the split version you have to alter the index.
yep, there sure is:
>>> import os.path
>>> os.path.basename(os.path.dirname("folderA/folderB/folderC/folderD"))
'folderC'
That is, we find the 'parent directory' of the named path, and then extract the filename of the resulting path from that.
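For comparison, the same idea can be sketched with pathlib. PurePosixPath is used here so that '/' is treated as the separator on every platform, and the helper name penultimate_folder is made up for the example.

```python
from pathlib import PurePosixPath

def penultimate_folder(path):
    """Name of the second-to-last component of a '/'-separated path."""
    # .parent drops the last component; .name is the last component of what remains.
    return PurePosixPath(path).parent.name

print(penultimate_folder('folderA/folderB/folderC/folderD'))   # folderC
print(penultimate_folder('folderA/folderB/folderC/folderD/'))  # folderC
```

One difference worth knowing: pathlib ignores a trailing slash, while os.path.basename(os.path.dirname(...)) would return 'folderD' for the second input.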

New folder that is created inside the current directory

I have a Python program that creates some files while it runs. I want the program to detect the current directory and then create a folder inside it, so that the created files are put in that folder.
I tried this:
current_directory = os.getcwd()
final_directory = os.path.join(current_directory, r'/new_folder')
if not os.path.exists(final_directory):
    os.makedirs(final_directory)
But it doesn't give me what I wanted. It seems that the second line is not working as I wanted. Can anybody help me to solve the problem?
I think the problem is in r'/new_folder': the leading slash refers to the root directory.
Try it with:
current_directory = os.getcwd()
final_directory = os.path.join(current_directory, r'new_folder')
if not os.path.exists(final_directory):
    os.makedirs(final_directory)
That should work.
One thing to note is that (per the os.path.join documentation) if an absolute path is provided as one of the arguments, the other elements are thrown away. For instance (on Linux):
In [1]: import os.path
In [2]: os.path.join('first_part', 'second_part')
Out[2]: 'first_part/second_part'
In [3]: os.path.join('first_part', r'/second_part')
Out[3]: '/second_part'
And on Windows:
>>> import os.path
>>> os.path.join('first_part', 'second_part')
'first_part\\second_part'
>>> os.path.join('first_part', '/second_part')
'/second_part'
Since you include a leading / in your join argument, it is interpreted as an absolute path, and the preceding components are ignored. Remove the / from the beginning of the second argument for the join to behave as expected. You don't need to include the / because os.path.join uses os.sep implicitly, ensuring that the proper separator is used (note the difference in the outputs above for os.path.join('first_part', 'second_part')).
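On Python 3.2 and later, the existence check can also be folded into the call with exist_ok=True. A minimal sketch, where a temporary directory stands in for os.getcwd() so the example does not litter the real working directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as current_directory:  # stands in for os.getcwd()
    # No leading slash, so the folder is created inside current_directory.
    final_directory = os.path.join(current_directory, 'new_folder')

    os.makedirs(final_directory, exist_ok=True)
    os.makedirs(final_directory, exist_ok=True)  # already exists: no error raised

    created = os.path.isdir(final_directory)
    inside = os.path.dirname(final_directory) == current_directory

print(created, inside)  # True True
```

This also avoids the small race between checking os.path.exists and calling os.makedirs.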
