Python: extract a certain part of a string

I have a string from which I would like to extract a certain part. The string looks like this:
E:/test/my_code/content/dir/disp_temp_2.hgx
This is the path on a machine to a specific file with the extension hgx.
I would like to capture exactly "disp_temp_2". I tried the strip function, but it does not work correctly for me because there are many '/' characters. Another problem is that the location above will differ from computer to computer.
Is there a method to capture exactly the string between the last '/' and the '.'?
My code looks like this:
path = path.split('.')
... but now I cannot split on the last '/'.
Any ideas how to do this?
Thanks

Use the os.path module:
import os.path
filename = "E:/test/my_code/content/dir/disp_temp_2.hgx"
name = os.path.basename(filename).split('.')[0]

Python comes with the os.path module, which gives you much better tools for handling paths and filenames:
>>> import os.path
>>> p = "E:/test/my_code/content/dir/disp_temp_2.hgx"
>>> head, tail = os.path.split(p)
>>> tail
'disp_temp_2.hgx'
>>> os.path.splitext(tail)
('disp_temp_2', '.hgx')
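The two calls above can be combined into one small helper. This is only a minimal sketch; the function name file_stem is made up for illustration and is not part of os.path.

```python
import os.path

def file_stem(path):
    """Return the file name without its directory and without its last extension."""
    tail = os.path.basename(path)          # e.g. 'disp_temp_2.hgx'
    stem, _ext = os.path.splitext(tail)    # e.g. ('disp_temp_2', '.hgx')
    return stem

print(file_stem("E:/test/my_code/content/dir/disp_temp_2.hgx"))  # disp_temp_2
```

Because splitext only removes the last extension, a name such as archive.tar.gz would come back as archive.tar.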

Standard libs are cool:
>>> from os import path
>>> f = "E:/test/my_code/content/dir/disp_temp_2.hgx"
>>> path.split(f)[1].rsplit('.', 1)[0]
'disp_temp_2'

Try this:
path=path.rsplit('/',1)[1].split('.')[0]

path = path.split('/')[-1].split('.')[0] works.

You can also split on the '/' first:
path = path.split('/')[-1].split('.')[0]
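One thing to watch with the split-based one-liners: split('.')[0] cuts at the first dot, while rsplit('.', 1)[0] cuts at the last one. The sketch below uses a made-up file name with several dots (not from the question) to show the difference.

```python
# Hypothetical path whose file name contains more than one dot.
path = "E:/test/my_code/content/dir/disp.temp.2.hgx"

name = path.rsplit('/', 1)[-1]          # 'disp.temp.2.hgx'
up_to_first_dot = name.split('.')[0]    # cuts at the first '.'
up_to_last_dot = name.rsplit('.', 1)[0] # cuts at the last '.'

print(up_to_first_dot)  # disp
print(up_to_last_dot)   # disp.temp.2
```

For the original disp_temp_2.hgx both variants give the same result, since the name contains only one dot.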

Related

How to replace multiple forward slashes in a directory by a single slash?

My path:
'/home//user////document/test.jpg'
I want this to be converted into:
'/home/user/document/test.jpg'
How to do this?
Use os.path.abspath or normpath to canonicalise the path:
>>> import os.path
>>> os.path.abspath('/home//user////document/test.jpg')
'/home/user/document/test.jpg'
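A short sketch of the normpath variant. Unlike abspath, normpath does not prepend the current working directory to a relative path. The posixpath module is used here only so that the result is the same on every platform; on Linux or macOS, os.path is the same module. The relative path is a made-up example.

```python
import posixpath

absolute = posixpath.normpath('/home//user////document/test.jpg')
relative = posixpath.normpath('docs//img///test.jpg')  # hypothetical relative path

print(absolute)  # /home/user/document/test.jpg
print(relative)  # docs/img/test.jpg
```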
Solution:
This code snippet should solve your issue:
import re
x = '/home//user////document/test.jpg'
re.sub('/+','/', x)
Output:
'/home/user/document/test.jpg'
This solution is very simple and uses a regex.
You can use the 're' module of the Python standard library:
import re
old_path = '/home//user////document/test.jpg'
converted_path = re.sub('/+', '/', old_path)
I'm sorry not to speak English fluently ;)
Instantiating a pathlib.Path object from your string will remove redundant slashes automatically for you:
from pathlib import Path
path = Path('/home//user////document/test.jpg')
print(path)
# /home/user/document/test.jpg
I think the easiest way is to replace '//' with '/' twice (two passes are enough for runs of up to four slashes, as in this example):
a = '/home//user////document/test.jpg'
a.replace('//', '/').replace('//', '/')
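If the number of repeated slashes is not known in advance, the same replace call can be repeated until nothing is left to collapse. A minimal sketch; the function name collapse_slashes is made up for illustration.

```python
def collapse_slashes(p):
    """Repeat the '//' -> '/' replacement until no double slash remains."""
    while '//' in p:
        p = p.replace('//', '/')
    return p

print(collapse_slashes('/home//user////document/test.jpg'))  # /home/user/document/test.jpg
```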

Grab part of filename with Python

Newbie here.
I've just been working with Python/coding for a few days, but I want to create a script that grabs parts of filenames corresponding to a certain pattern, and outputs it to a textfile.
So in my case, let's say I have four .pdf like this:
aaa_ID_8423.pdf
bbbb_ID_8852.pdf
ccccc_ID_7413.pdf
dddddd_ID_4421.pdf
(Note that they are of variable length.)
I want the script to go through these filenames, grab the string after "ID_" and before the filename extension.
Can you point me toward Python modules, and possibly guides, that could assist me?
Here's a simple solution using the re module as mentioned in other answers.
# Libraries
import re
# Example filenames. Use glob as described below to grab your pdf filenames
file_list = ['name_ID_123.pdf', 'name2_ID_456.pdf']  # glob.glob("*.pdf")
for fname in file_list:
    res = re.findall(r"ID_(\d+)\.pdf", fname)
    if not res:
        continue
    print(res[0])  # You can append the result to a list
And below should be your output. You should be able to adapt this to other patterns.
# Output
123
456
Good luck!
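The question also asks for the results to end up in a text file. The sketch below shows one way that might look. It assumes the part after "ID_" is always digits, and the names extract_ids and write_ids are made up for illustration.

```python
import re

# Assumption: the ID is a run of digits directly before the ".pdf" extension.
ID_PATTERN = re.compile(r"ID_(\d+)\.pdf$")

def extract_ids(filenames):
    """Return the digits after 'ID_' for every filename that matches."""
    ids = []
    for fname in filenames:
        match = ID_PATTERN.search(fname)
        if match:
            ids.append(match.group(1))
    return ids

def write_ids(filenames, out_path):
    """Write one extracted ID per line to the text file at out_path."""
    with open(out_path, "w") as out:
        for file_id in extract_ids(filenames):
            out.write(file_id + "\n")

names = ["aaa_ID_8423.pdf", "bbbb_ID_8852.pdf", "ccccc_ID_7413.pdf", "dddddd_ID_4421.pdf"]
print(extract_ids(names))  # ['8423', '8852', '7413', '4421']
```

With real files, something like write_ids(glob.glob("*.pdf"), "ids.txt") would do the whole job, where ids.txt is just a placeholder output name.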
Here's another alternative, using re.split(), which is probably closer to the spirit of exactly what you're trying to do (although solutions with re.match() and re.search(), among others, are just as valid, useful, and instructive):
>>> import re
>>> re.split("[_.]", "dddddd_ID_4421.pdf")[-2]
'4421'
>>>
If the numbers are variable length, you'll want the regex module "re"
import re
# create and compile a regex pattern
pattern = re.compile(r"_([0-9]+)\.[^\.]+$")
pattern.search("abc_ID_8423.pdf").group(1)
# '8423'
Regular expressions are generally used to match strings with variable parts. The regex above says:
Find an underscore ("_"), followed by one or more digits ("[0-9]+", captured as group 1), followed by a period and the file extension at the end of the string ("\.[^\.]+$").
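The same pattern can be written with re.VERBOSE so that each piece carries its own comment. This is just a restatement of the regex above for readability, not a different technique.

```python
import re

pattern = re.compile(r"""
    _           # a literal underscore
    ([0-9]+)    # one or more digits, captured as group 1
    \.          # a literal period
    [^.]+       # the extension: one or more characters that are not a period
    $           # end of the string
""", re.VERBOSE)

result = pattern.search("abc_ID_8423.pdf").group(1)
print(result)  # 8423
```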
You can use the os module and call os.listdir to get a list of the filenames present in a directory, like so:
import os
filenames = os.listdir(path)
Now you can iterate over the filenames list and look for the pattern you need using regular expressions:
import re
for filename in filenames:
    m = re.search(r'(?<=ID_)\w+', filename)
    if m:  # skip filenames that do not contain "ID_"
        print(m.group(0))
The snippet above prints the part of the filename that follows ID_. Because \w does not match a period, the match stops before the extension, so for your example it prints 8423, 8852, etc. without the .pdf part.
You probably want to use glob, which is a Python module for file globbing. From the Python documentation, the usage is as follows:
>>> import glob
>>> glob.glob('./[0-9].*')
['./1.gif', './2.txt']
>>> glob.glob('*.gif')
['1.gif', 'card.gif']
>>> glob.glob('?.gif')
['1.gif']
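glob matches against files that actually exist on disk. To try the same shell-style wildcards on a plain list of names, the fnmatch module from the standard library can be used. A small sketch with made-up file names:

```python
import fnmatch

# Hypothetical directory listing; no files need to exist for this to run.
names = ['aaa_ID_8423.pdf', 'notes.txt', 'bbbb_ID_8852.pdf']

pdf_names = fnmatch.filter(names, '*_ID_*.pdf')
print(pdf_names)  # ['aaa_ID_8423.pdf', 'bbbb_ID_8852.pdf']
```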

Get specific parts of a file path in Python

I have a path string like
'/path/eds/vs/accescontrol.dat/d=12520/file1.dat'
Q1: How can I get only accescontrol.dat from the path.
Q2: How can I get only /path/eds/vs/accescontrol.dat from the path.
import re
url = '/path/eds/vs/accescontrol.dat/d=12520/file1.dat'
match = re.search(r'^(.+/([^/]+\.dat))[^$]', url)
print(match.group(1))
# Outputs /path/eds/vs/accescontrol.dat
print(match.group(2))
# Outputs accescontrol.dat
I edited this to work in Python 2 and to answer both questions (the earlier regex answer only answers the first of the two).
You could use regular expressions:
import re
path = '/path/eds/vs/accescontrol.dat/d=12520/file1.dat'
ma = re.search(r'/([^/]+\.dat)/d=', path)
print(ma.group(1))  # accescontrol.dat
A simple solution is to use .split():
Q1:
s = '/path/eds/vs/accescontrol.dat/d=12520/file1.dat'
[x for x in s.split('/') if x.endswith('.dat')]
gives:
['accescontrol.dat','file1.dat']
A similar trick will answer Q2.
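One way the "similar trick" for Q2 might look: find the first component that ends in .dat and join everything up to and including it. This is a sketch that assumes the wanted component is the first .dat entry in the path.

```python
path = '/path/eds/vs/accescontrol.dat/d=12520/file1.dat'
parts = path.split('/')   # the first element is '' because of the leading '/'

# Index of the first component ending in '.dat' (assumed to be the one wanted).
idx = next(i for i, part in enumerate(parts) if part.endswith('.dat'))

q1 = parts[idx]                  # just the component
q2 = '/'.join(parts[:idx + 1])   # the path up to and including it

print(q1)  # accescontrol.dat
print(q2)  # /path/eds/vs/accescontrol.dat
```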
For more advanced file path manipulation I would recommend reading about os.path
https://docs.python.org/2/library/os.path.html#module-os.path
I would recommend separating each level of folder/file into strings in a list.
path = '/path/eds/vs/accescontrol.dat/d=12520/file1.dat'.split("/")
This makes path = ['', 'path', 'eds', 'vs', 'accescontrol.dat', 'd=12520', 'file1.dat'] (the empty first element comes from the leading '/').
Then from there, you can access each of the different parts.
Why not do it this way, with pathlib?
from pathlib import Path
h = r'C:\Users\dj\Pictures\Saved Pictures'
path = Path(h)
print(path.parts)
# On Windows this prints:
# ('C:\\', 'Users', 'dj', 'Pictures', 'Saved Pictures')
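The same idea applied to the path from the question. PurePosixPath is used so the result does not depend on the operating system. The sketch assumes the wanted component always sits two levels above the file, which holds for the example path but may not hold in general.

```python
from pathlib import PurePosixPath

p = PurePosixPath('/path/eds/vs/accescontrol.dat/d=12520/file1.dat')

print(p.parts)
# ('/', 'path', 'eds', 'vs', 'accescontrol.dat', 'd=12520', 'file1.dat')

target = p.parents[1]   # parents[0] is .../d=12520, parents[1] is one level higher
print(target.name)      # accescontrol.dat             (Q1)
print(str(target))      # /path/eds/vs/accescontrol.dat (Q2)
```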

Remove unnecessary directories in path name constructed with os.path.join

In Python, when I print a directory path constructed with os.path.join, I get something like this:
rep/rep2/../rep1
Is there a way to get only this :
rep/rep1
Yes, os.path.normpath() collapses redundant separators and up-references.
os.path.realpath() converts the path to a canonical path, which includes eliminating '..' components, but it also resolves symbolic links.
See https://docs.python.org/2/library/os.path.html.
Use os.path.relpath:
>>> import os
>>> os.path.relpath("rep/rep2/../rep1", start="")
'rep/rep1'
Or os.path.normpath:
>>> import os
>>> os.path.normpath("rep/rep2/../rep1")
'rep/rep1'
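Putting the join and the normalisation together, as a small sketch. posixpath is used only to keep the separators identical on every platform; with os.path on Windows the result would use backslashes.

```python
import posixpath

joined = posixpath.join('rep', 'rep2', '..', 'rep1')
cleaned = posixpath.normpath(joined)

print(joined)   # rep/rep2/../rep1
print(cleaned)  # rep/rep1
```

Note that normpath works purely on the string, so if rep2 were a symbolic link the cleaned path could point somewhere else than the original one.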

Get Filename Without Extension in Python

If I have a filename like one of these:
1.1.1.1.1.jpg
1.1.jpg
1.jpg
How could I get only the filename, without the extension? Would a regex be appropriate?
In most cases, you shouldn't use a regex for that.
os.path.splitext(filename)[0]
This will also handle a filename like .bashrc correctly by keeping the whole name.
>>> import os
>>> os.path.splitext("1.1.1.1.1.jpg")
('1.1.1.1.1', '.jpg')
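A few more cases, to show how splitext treats leading dots and multiple extensions. The names other than the ones from the question are made up for illustration.

```python
import os.path

for name in ['1.1.1.1.1.jpg', '1.jpg', '.bashrc', 'archive.tar.gz', 'README']:
    print(name, '->', os.path.splitext(name))

# 1.1.1.1.1.jpg -> ('1.1.1.1.1', '.jpg')
# 1.jpg -> ('1', '.jpg')
# .bashrc -> ('.bashrc', '')
# archive.tar.gz -> ('archive.tar', '.gz')
# README -> ('README', '')
```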
You can use the stem attribute of a pathlib.Path object to get the file name without its extension.
Here is an example:
from pathlib import Path
p = Path(r"\\some_directory\subdirectory\my_file.txt")
print(p.stem)
# my_file
If I had to do this with a regex, I'd do it like this:
import re
s = re.sub(r'\.jpg$', '', s)
No need for regex. os.path.splitext is your friend:
os.path.splitext('1.1.1.jpg')
# ('1.1.1', '.jpg')
You can also use string slicing:
>>> "1.1.1.1.1.jpg"[:-len(".jpg")]
'1.1.1.1.1'
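Plain slicing removes the last four characters whether or not the name really ends in ".jpg". A slightly safer sketch checks first; the helper name strip_suffix is made up for illustration.

```python
def strip_suffix(name, suffix):
    """Remove suffix from the end of name only if it is actually there."""
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name

print(strip_suffix("1.1.1.1.1.jpg", ".jpg"))  # 1.1.1.1.1
print(strip_suffix("notes.txt", ".jpg"))      # notes.txt
```

On Python 3.9 and later, the built-in str.removesuffix does the same thing.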
