Python cp regex: copy an unknown dir

Currently in a program I have, I use:
os.system('cp /run/user/1000/gvfs/*/store_00010001/DCIM/100ND610/* /home/$USER/Pictures/$(date +%F)/')
But I want to use shutil.copy to do the copying for me. I have one problem, though. In Bash I can use a path that contains a wildcard like * or ?, e.g. /run/user/1000/gvfs/*/store_000?0001/, but how do I do that for a path in Python?
I'm using Python 3.3.

The following is a clean way to use glob when you are expecting only one match:
import glob
unknown_dir = "/run/user/1000/gvfs/gphoto*/"
matches = glob.glob(unknown_dir)
if len(matches) == 0:
    raise Exception('No matches')
if len(matches) > 1:
    raise Exception('Too many matches')
known_dir = matches[0]
If you are expecting zero or more matches, you can simply iterate over the list of matches and handle each one in turn:
import glob
unknown_dir = "/run/user/1000/gvfs/gphoto*/"
for match in glob.glob(unknown_dir):
    print(match)  # use match here

@eryksun answered my question by recommending glob. For future readers of this question, here is how I used it:
import glob
unknown_dir = "/run/user/1000/gvfs/gphoto*/"
known_dir = str(glob.glob(unknown_dir)).strip("[]'")
I need the path as a string, hence the str() and strip(). Note that this only yields a usable path when there is exactly one match; indexing the list, as in glob.glob(unknown_dir)[0], gets the string more directly.
I'll now be able to use shutil.copy. Thank you, @eryksun!
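Putting glob and shutil.copy together, the original cp command could be replaced by something like the sketch below. The helper name copy_matching is made up for illustration, and it assumes that only regular files should be copied (sub-directories are skipped).

```python
import glob
import os
import shutil


def copy_matching(pattern, dest_dir):
    """Copy every regular file matching a glob pattern into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)  # like `mkdir -p`
    copied = []
    for src in sorted(glob.glob(pattern)):
        if os.path.isfile(src):  # assumption: skip sub-directories
            copied.append(shutil.copy(src, dest_dir))
    return copied


# Hypothetical usage mirroring the shell command in the question:
#   import datetime
#   dest = os.path.join(os.path.expanduser("~"), "Pictures",
#                       datetime.date.today().isoformat())  # like $(date +%F)
#   copy_matching("/run/user/1000/gvfs/*/store_00010001/DCIM/100ND610/*", dest)
```

Unlike the strip() trick, this works for any number of matches, including none.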

Related

Getting file names without file extensions with glob

I'm searching for .txt files only
from glob import glob
result = glob('*.txt')
>>> result
['text1.txt','text2.txt','text3.txt']
but I'd like result without the file extensions
>>> result
['text1','text2','text3']
Is there a regex pattern that I can use with glob to exclude the file extensions from the output, or do I have to use a list comprehension on result?

There is no way to do that with glob(). You need to take the list it returns and build a new one that holds the names without the extension:
import os
from glob import glob
[os.path.splitext(val)[0] for val in glob('*.txt')]
os.path.splitext(val) splits each file name into a (name, extension) pair. The [0] just selects the name.
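On Python 3.4 or later, pathlib can do the globbing and the extension stripping in one pass. The following is only a sketch; the function name stems and its default arguments are assumptions made for the example.

```python
from pathlib import Path


def stems(directory=".", pattern="*.txt"):
    """Return the names of matching files with the final extension removed."""
    return sorted(p.stem for p in Path(directory).glob(pattern))
```

Path.stem drops only the last suffix, so a file called text2.txt.txt would come back as text2.txt.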

Since you’re trying to split off a filename extension, not split an arbitrary string, it makes more sense to use os.path.splitext (or the pathlib module). While it’s true that it makes no practical difference on the only platforms that currently matter (Windows and *nix), it’s still conceptually clearer what you’re doing. (And if you later start using path-like objects instead of strings, it will continue to work unchanged, to boot.)
So:
paths = [os.path.splitext(path)[0] for path in paths]
Meanwhile, if this really offends you for some reason, what glob does under the covers is just call fnmatch to turn your glob expression into a regular expression and then apply that to all of the filenames. So, you can replace it by writing the regex yourself and using a capture group:
import os
import re
dirpath = '.'  # directory to search
rtxt = re.compile(r'(.*)\.txt$')  # anchored so that 'a.txt.bak' is not matched
files = (rtxt.match(file) for file in os.listdir(dirpath))
files = [match.group(1) for match in files if match]
This way, you’re not doing a listcomp on top of the one that’s already in glob; you’re doing one instead of the one that’s already in glob. I’m not sure if that’s a useful win or not, but since you seem to be interested in eliminating a listcomp…
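To see the glob-to-regex translation mentioned above, fnmatch.translate() can be called directly. This is just an illustration with made-up file names; the exact regex text it produces differs between Python versions, so it is not shown here.

```python
import fnmatch
import re

# fnmatch.translate() returns regular-expression source for a glob pattern.
txt_regex = re.compile(fnmatch.translate("*.txt"))

names = ["text1.txt", "text2.txt.bak", "readme.md"]  # made-up sample names
matching = [n for n in names if txt_regex.match(n)]
print(matching)  # ['text1.txt']
```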

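This glob selects only files without an extension: **/*/!(*.*)
Note that this is Bash extended-glob syntax (it needs shopt -s extglob globstar); Python's glob module does not support the !(pattern) form.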

Use index slicing:
result = [i[:-4] for i in result]

Another way using rsplit:
>>> result = ['text1.txt','text2.txt.txt','text3.txt']
>>> [x.rsplit('.txt', 1)[0] for x in result]
['text1', 'text2.txt', 'text3']
You could do as a list-comprehension:
result = [x.rsplit(".txt", 1)[0] for x in glob('*.txt')]

Use str.split
>>> result = [r.split('.')[0] for r in glob('*.txt')]
>>> result
['text1', 'text2', 'text3']
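The string-based answers above agree on simple names but diverge once a name contains more than one dot. A small comparison, using made-up file names, shows the difference:

```python
import os.path

names = ["text1.txt", "archive.tar.txt", "v1.2.notes.txt"]  # made-up samples

by_split = [n.split(".")[0] for n in names]            # cuts at the FIRST dot
by_rsplit = [n.rsplit(".txt", 1)[0] for n in names]    # cuts at the last '.txt'
by_slice = [n[:-4] for n in names]                     # drops the last 4 characters
by_splitext = [os.path.splitext(n)[0] for n in names]  # drops the last extension

print(by_split)     # ['text1', 'archive', 'v1']
print(by_splitext)  # ['text1', 'archive.tar', 'v1.2.notes']
```

Here rsplit, slicing and splitext all give the second result; only split('.')[0] throws away everything after the first dot.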

Grab part of filename with Python

Newbie here.
I've just been working with Python/coding for a few days, but I want to create a script that grabs parts of filenames corresponding to a certain pattern, and outputs it to a textfile.
So in my case, let's say I have four .pdf like this:
aaa_ID_8423.pdf
bbbb_ID_8852.pdf
ccccc_ID_7413.pdf
dddddd_ID_4421.pdf
(Note that they are of variable length.)
I want the script to go through these filenames, grab the string after "ID_" and before the filename extension.
Can you point me in the direction to which Python modules and possibly guides that could assist me?

Here's a simple solution using the re module as mentioned in other answers.
# Libraries
import re
# Example filenames. Use glob as described below to grab your pdf filenames
file_list = ['name_ID_123.pdf','name2_ID_456.pdf'] # glob.glob("*.pdf")
for fname in file_list:
    res = re.findall(r"ID_(\d+)\.pdf", fname)
    if not res:
        continue
    print(res[0])  # You can append the result to a list
And below should be your output. You should be able to adapt this to other patterns.
# Output
123
456
Good luck!

Here's another alternative, using re.split(), which is probably closer to the spirit of exactly what you're trying to do (although solutions with re.match() and re.search(), among others, are just as valid, useful, and instructive):
>>> import re
>>> re.split("[_.]", "dddddd_ID_4421.pdf")[-2]
'4421'
>>>

If the numbers are variable length, you'll want the regex module "re"
import re
# create and compile a regex pattern
pattern = re.compile(r"_([0-9]+)\.[^\.]+$")
pattern.search("abc_ID_8423.pdf").group(1)
Out[23]: '8423'
Regex is generally used to match variable strings. The regex I just wrote says:
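Find an underscore ("_"), followed by one or more digits ("[0-9]+"), followed by a period and then one or more non-period characters running to the end of the string ("\.[^\.]+$"). In other words, the digits must sit directly before the file extension.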
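The same pattern can be written with re.VERBOSE so that each piece carries its own comment. This is a sketch of the pattern described above, wrapped in a hypothetical helper called trailing_number:

```python
import re

pattern = re.compile(r"""
    _            # a literal underscore
    ([0-9]+)     # group 1: one or more digits
    \.           # a literal period
    [^.]+        # one or more characters that are not a period
    $            # end of the string
""", re.VERBOSE)


def trailing_number(filename):
    """Return the digits that sit right before the extension, or None."""
    m = pattern.search(filename)
    return m.group(1) if m else None


print(trailing_number("abc_ID_8423.pdf"))  # 8423
```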

You can use the os module in python and do a listdir to get a list of filenames present in that path like so:
import os
filenames = os.listdir(path)
Now you can iterate over the filenames list and look for the pattern which you need using regular expressions:
import re
for filename in filenames:
    m = re.search(r'(?<=ID_)\w+', filename)
    if m:
        print(m.group(0))
The above snippet prints the part of the filename that follows ID_. Because \w does not match a period, the match stops before the extension, so for your example it prints 8423, 8852 and so on, without the .pdf part.

You probably want to use glob, which is a Python module for file globbing. From the Python documentation, the usage is as follows:
>>> import glob
>>> glob.glob('./[0-9].*')
['./1.gif', './2.txt']
>>> glob.glob('*.gif')
['1.gif', 'card.gif']
>>> glob.glob('?.gif')
['1.gif']
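Combining the pieces from the answers above (glob to list the files, re to pull out the ID, and a plain text file for the output) might look like the following sketch. The function names and the output path are hypothetical, and the pattern assumes the ID is made of digits only.

```python
import glob
import re

ID_PATTERN = re.compile(r"ID_(\d+)\.pdf$")  # assumption: IDs are digits only


def extract_ids(filenames):
    """Return the digits after 'ID_' for every name that matches."""
    ids = []
    for name in filenames:
        m = ID_PATTERN.search(name)
        if m:
            ids.append(m.group(1))
    return ids


def write_ids(out_path, pattern="*.pdf"):
    """Write one ID per line for the files matched by the glob pattern."""
    ids = extract_ids(sorted(glob.glob(pattern)))
    with open(out_path, "w") as fh:
        for file_id in ids:
            fh.write(file_id + "\n")
    return ids
```

Calling write_ids("ids.txt") in the directory holding the four example PDFs would then write 8423, 8852, 7413 and 4421, one per line.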

Pythonic way to extract the file name and its parent directory from the full file path?

We have the full path of the file:
/dir1/dir2/dir3/sample_file.tgz
Basically, I would like to end up with this string:
dir3/sample_file.tgz
We can solve it with a regex, or with .split("/") and then joining the last two items in the list, but I am wondering if we can do this more elegantly with os.path.dirname() or something like that?

import os
full_filename = "/path/to/file.txt"
fname = os.path.basename(full_filename)
onedir = os.path.join(os.path.basename(os.path.dirname(full_filename)), os.path.basename(full_filename))
No one ever said os.path was pretty to use, but it ought to do the correct thing regardless of platform.
If you're on Python 3.4 or higher, there's also pathlib:
import os
import pathlib
p = pathlib.Path("/foo/bar/baz/txt")
onedir = os.path.join(*p.parts[-2:])
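A variant that stays entirely within pathlib is sketched below. It uses PurePosixPath so that the result is the same on every platform, which assumes the input is a "/"-separated path like the one in the question; the helper name is made up.

```python
from pathlib import PurePosixPath


def parent_and_name(full_path):
    """Return 'parent_dir/file_name' for a POSIX-style path string."""
    p = PurePosixPath(full_path)
    return str(PurePosixPath(p.parent.name, p.name))


print(parent_and_name("/dir1/dir2/dir3/sample_file.tgz"))  # dir3/sample_file.tgz
```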

Get specific parts of a file path in Python

I have a path string like
'/path/eds/vs/accescontrol.dat/d=12520/file1.dat'
Q1: How can I get only accescontrol.dat from the path?
Q2: How can I get only /path/eds/vs/accescontrol.dat from the path?

import re
url = '/path/eds/vs/accescontrol.dat/d=12520/file1.dat'
match = re.search(r'^(.+/([^/]+\.dat))[^$]', url)
print(match.group(1))
# Outputs /path/eds/vs/accescontrol.dat
print(match.group(2))
# Outputs accescontrol.dat
I edited this to answer both questions (the other regex answer only answers the first of the two). The print() calls work in both Python 2 and Python 3.

You could use regular expressions:
import re
path = '/path/eds/vs/accescontrol.dat/d=12520/file1.dat'
ma = re.search(r'/([^/]+\.dat)/d=', path)
print(ma.group(1))  # accescontrol.dat

A simple solution is to use .split():
Q1:
path = '/path/eds/vs/accescontrol.dat/d=12520/file1.dat'
[x for x in path.split('/') if x.endswith('.dat')]
gives:
['accescontrol.dat', 'file1.dat']
The first element is the answer to Q1. A similar trick will answer Q2.
For more advanced file path manipulation I would recommend reading about os.path:
https://docs.python.org/2/library/os.path.html#module-os.path

I would recommend separating each level of folder/file into strings in a list.
path = '/path/eds/vs/accescontrol.dat/d=12520/file1.dat'.split("/")
This makes path = ['', 'path', 'eds', 'vs', 'accescontrol.dat', 'd=12520', 'file1.dat'] (the leading empty string comes from the initial "/").
Then from there, you can access each of the different parts.
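Building on the split-into-parts idea, both Q1 and Q2 can be answered without a regex by walking the components and stopping at the first one that ends in .dat. The sketch below uses a hypothetical helper name and assumes a "/"-separated path, so it relies on PurePosixPath.

```python
from pathlib import PurePosixPath


def first_component_with_suffix(path, suffix=".dat"):
    """Return (component, path_up_to_component) for the first match, or None."""
    parts = PurePosixPath(path).parts
    for i, part in enumerate(parts):
        if part.endswith(suffix):
            return part, str(PurePosixPath(*parts[:i + 1]))
    return None


name, prefix = first_component_with_suffix(
    "/path/eds/vs/accescontrol.dat/d=12520/file1.dat")
print(name)    # accescontrol.dat
print(prefix)  # /path/eds/vs/accescontrol.dat
```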

Why not do it this way, with pathlib?
from pathlib import Path
h = r'C:\Users\dj\Pictures\Saved Pictures'
path = Path(h)
print(path.parts)
On Windows this prints:
('C:\\', 'Users', 'dj', 'Pictures', 'Saved Pictures')

python:extract certain part of string

I have a string from which I would like to extract a certain part. The string looks like this:
E:/test/my_code/content/dir/disp_temp_2.hgx
This is the path to a specific file with the extension hgx.
I would like to capture exactly "disp_temp_2". The problem is that the strip function does not work correctly for me, as there are many '/' characters. Another problem is that the location above will differ from machine to machine.
Is there any method to capture exactly the string between the last '/' and the '.'?
My code looks like:
path = path.split('.')
... but now I cannot split on the last '/'.
Any ideas how to do this?
Thanks

Use the os.path module:
import os.path
filename = "E:/test/my_code/content/dir/disp_temp_2.hgx"
name = os.path.basename(filename).split('.')[0]

Python comes with the os.path module, which gives you much better tools for handling paths and filenames:
>>> import os.path
>>> p = "E:/test/my_code/content/dir/disp_temp_2.hgx"
>>> head, tail = os.path.split(p)
>>> tail
'disp_temp_2.hgx'
>>> os.path.splitext(tail)
('disp_temp_2', '.hgx')

Standard libs are cool:
>>> from os import path
>>> f = "E:/test/my_code/content/dir/disp_temp_2.hgx"
>>> path.split(f)[1].rsplit('.', 1)[0]
'disp_temp_2'

Try this:
path=path.rsplit('/',1)[1].split('.')[0]

path = path.split('/')[-1].split('.')[0] works.

You can split on the '/' first and then on the '.':
path = path.split('/')[-1].split('.')[0]
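The split('.')[0] variants above cut at the first dot, which is fine for disp_temp_2.hgx but not for a name with several dots. A sketch that removes only the final extension, using os.path as the earlier answers recommend (the helper name bare_name and the second sample path are made up):

```python
import os.path


def bare_name(path):
    """Return the file name without its directory and final extension."""
    return os.path.splitext(os.path.basename(path))[0]


print(bare_name("E:/test/my_code/content/dir/disp_temp_2.hgx"))  # disp_temp_2
print(bare_name("/tmp/report.v2.hgx"))                           # report.v2
```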
