Get all characters after a certain character?

Let's say I have a list of strings like this:
list1 = [
"filename1.txt",
"file2.py",
"fileexample.tiff"
]
How would I be able to grab the extension (the '.' and everything after it), if it's not too much to ask, using a "for i in" loop, and have the results come back in a list like this: ['.txt', '.py', '.tiff']

If you are dealing with file paths, then you should use the os.path module:
import os.path
list1 = ["filename1.txt", "file2.py", "fileexample.tiff"]
print([os.path.splitext(f)[1] for f in list1])
prints
['.txt', '.py', '.tiff']

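import os
for i in list1:
    fileName, fileExtension = os.path.splitext(i)
    print(fileExtension)
A second option:
[i.split('.')[1] for i in list1]
Note that this second version drops the dot (['txt', 'py', 'tiff']) and takes the text after the first dot, not the last one.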
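If you want the explicit "for i in" loop from the question, with the results collected in a list instead of printed, a minimal sketch building on os.path.splitext could look like this (the name extensions is just illustrative):

```python
import os.path

list1 = ["filename1.txt", "file2.py", "fileexample.tiff"]

extensions = []  # collects one extension per file name
for i in list1:
    # splitext returns (root, ext); ext keeps the leading dot, e.g. '.txt'
    root, ext = os.path.splitext(i)
    extensions.append(ext)

print(extensions)  # ['.txt', '.py', '.tiff']
```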

list(map(lambda s: s.rsplit(".", 1)[-1], list1))
is probably how I would do it.
It splits each item exactly once, from the right, on a period and keeps whatever is on the right-hand side (without the dot). In Python 3, map returns an iterator, hence the list() call.
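The three approaches above behave differently on names with several dots or with none. A small comparison sketch (the sample names are made up for illustration):

```python
import os.path

samples = ["archive.tar.gz", "README", ".bashrc"]

results = {}
for name in samples:
    parts = name.split('.')
    # split('.')[1] is the text after the FIRST dot; guarded here because
    # a name without a dot would otherwise raise IndexError
    first = parts[1] if len(parts) > 1 else None
    # rsplit('.', 1)[-1] is the text after the LAST dot, or the whole name if there is no dot
    last = name.rsplit('.', 1)[-1]
    # splitext keeps the dot, returns '' when there is no extension,
    # and treats leading dots as part of the name (so '.bashrc' has no extension)
    ext = os.path.splitext(name)[1]
    results[name] = (first, last, ext)
    print(name, first, last, repr(ext))
```

For "archive.tar.gz" this gives 'tar', 'gz' and '.gz'; for "README", rsplit returns the whole name while splitext returns ''.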

Related

Multiple non-nested if conditions in list comprehension without a terminal else

(Note: before jumping to the duplicate if-else questions, please see below for why many of them did not suit my case.)
I want to learn how to use a list comprehension to combine these two code blocks into one:
filenameslist.extend(
    [f[:-4] for f in filenames if (
        f.endswith('.mp3') or
        f.endswith('.wma') or
        f.endswith('.aac') or
        f.endswith('.ogg') or
        f.endswith('.m4a')
    )])
filenameslist.extend(
    [f[:-5] for f in filenames if (
        f.endswith('.opus')
    )])
I have tried to achieve it with the following code after going through many answers here on SO. However, they don't work for me. Please have a look at what I have right now:
filenameslist.extend(
    [(f[:-4] if (
        f.endswith('.mp3') or
        f.endswith('.wma') or
        f.endswith('.aac') or
        f.endswith('.ogg') or
        f.endswith('.m4a')
    ) else (f[:-5] if f.endswith('.opus') else '')) for f in filenames])
The unnecessary else '' at the end adds an entry '' to my list, which I don't need. Removing the else, or using else pass, results in a syntax error.
I can delete the '' entry from the list manually, but the point is to learn how to do this in one step with a list comprehension. I am using Python 3.8.

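There is no way, in the expression part of your list comprehension, to say something like "do not produce an item in this case" (when the extension is not in your list of allowed extensions). The expression before "for" always yields exactly one item per iteration; only the trailing "if" clause can filter items out.
So you have to repeat your test in that clause: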
filenames = ['test.mp3', 'something.opus', 'dontcare.wav']
l = [
    f[:-5] if f.endswith('.opus') else f[:-4]
    for f in filenames
    if (
        f.endswith('.mp3') or
        f.endswith('.wma') or
        f.endswith('.aac') or
        f.endswith('.ogg') or
        f.endswith('.m4a') or
        f.endswith('.opus')
    )
]
print(l)
Note that you can use os.path.splitext to ease your work:
import os
filenames = ['test.mp3', 'something.opus', 'dontcare.wav']
l = [
    os.path.splitext(f)[0]
    for f in filenames
    if os.path.splitext(f)[1] in ['.mp3', '.wma', '.aac', '.ogg', '.m4a', '.opus']
]
print(l)
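Since the question mentions Python 3.8, an assignment expression (the := "walrus" operator, added in 3.8) can avoid calling os.path.splitext twice per name. A minimal sketch; the names allowed and stems are illustrative:

```python
import os.path

filenames = ['test.mp3', 'something.opus', 'dontcare.wav']
allowed = ('.mp3', '.wma', '.aac', '.ogg', '.m4a', '.opus')

# For each f, the if clause runs first and binds parts = (root, ext);
# the output expression then reuses parts[0] without a second splitext call.
stems = [parts[0] for f in filenames
         if (parts := os.path.splitext(f))[1] in allowed]

print(stems)  # ['test', 'something']
```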

Use the Path objects' built-in properties instead of parsing the names yourself:
from pathlib import Path
filenames = Path('/some/folder/').glob('*')
allowed_suffixes = ['.mp3', '.wma', '.aac', '.ogg', '.m4a', '.opus']
file_stems = set(f.stem for f in filenames if f.suffix in allowed_suffixes)
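You can use a list instead of a set, of course (a set also collapses duplicate stems such as song.mp3 and song.opus). This looks cleaner than parsing the names in a convoluted list comprehension. If you want to retain the files' full paths, use the following instead (glob() returns a generator that can only be consumed once, so call it again before reusing filenames):
file_stems = set(f.parent / f.stem for f in filenames if f.suffix in allowed_suffixes)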
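If you already have the names as plain strings (as in the question) rather than from a directory listing, the same stem and suffix properties are available on pathlib.PurePath, which only parses the string and never touches the file system. A sketch with made-up sample names:

```python
from pathlib import PurePath

filenames = ['test.mp3', 'something.opus', 'dontcare.wav', 'archive.tar.gz']
allowed_suffixes = {'.mp3', '.wma', '.aac', '.ogg', '.m4a', '.opus'}

# PurePath parses the string only; the files do not need to exist.
paths = [PurePath(f) for f in filenames]
file_stems = [p.stem for p in paths if p.suffix in allowed_suffixes]

print(file_stems)  # ['test', 'something']
```

Note that suffix is only the last extension: PurePath('archive.tar.gz').suffix is '.gz' and its stem is 'archive.tar'.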

The str.endswith method can optionally take a tuple of suffixes, so you can simply do:
allowed_suffixes = '.mp3', '.wma', '.aac', '.ogg', '.m4a', '.opus'
filenameslist.extend(f[:f.rfind('.')] for f in filenames if f.endswith(allowed_suffixes))

You can use rpartition like below:
filenameslist.extend([fn.rpartition('.')[0] for fn in filenames if fn[fn.rfind('.'):] in suffixes])
Example:
suffixes = ['.mp3', '.wma', '.aac', '.ogg', '.m4a', '.opus', '.wav']
filenames = ['test.mp3', 'something.opus', 'dontcare.wav', 'lara']
[fn.rpartition('.')[0] for fn in filenames if fn[fn.rfind('.'):] in suffixes]
Output:
['test', 'something', 'dontcare']

Remove different substrings from list of strings

I have a list of strings with two different prefixes that I would like to remove.
example_list=[
'/test1/test2/test3/ABCD_1',
'/test1/test2/test3/ABCD_2',
'/test1/test2/test3/ABCD_3',
'/test1/test4/test5/test6/ABCD_4',
'/test1/test4/test5/test6/ABCD_5',
'/test1/test4/test5/test6/ABCD_6',
'/test1/test4/test5/test6/ABCD_7']
I would like the new list to look like:
example_list=[
'ABCD_1',
'ABCD_2',
'ABCD_3',
'ABCD_4',
'ABCD_5',
'ABCD_6',
'ABCD_7']
I was trying something like this, but keep running into errors.
for i in example_list:
    if i.startswith('/test1/test2/test3/'):
        i = i[19:]
    else:
        i = i[25:]

example_list = [path.split('/')[-1] for path in example_list]
Output:
['ABCD_1', 'ABCD_2', 'ABCD_3', 'ABCD_4', 'ABCD_5', 'ABCD_6', 'ABCD_7']

Given that these are all file system paths, I suggest you use pathlib:
from pathlib import Path
example_list = [
'/test1/test2/test3/ABCD_1',
'/test1/test2/test3/ABCD_2',
'/test1/test2/test3/ABCD_3',
'/test1/test4/test5/test6/ABCD_4',
'/test1/test4/test5/test6/ABCD_5',
'/test1/test4/test5/test6/ABCD_6',
'/test1/test4/test5/test6/ABCD_7']
res = [Path(item).name for item in example_list]
print(res) # ['ABCD_1', 'ABCD_2', 'ABCD_3', 'ABCD_4', 'ABCD_5', 'ABCD_6', 'ABCD_7']

If every name has the same length (here six characters, as in 'ABCD_1'), you can just use negative indexing:
new_list = []
for i in example_list:
    j = i[-6:]
    new_list.append(j)
print(new_list)
Output will be
['ABCD_1', 'ABCD_2', 'ABCD_3', 'ABCD_4', 'ABCD_5', 'ABCD_6', 'ABCD_7']
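The startswith idea from the question also works once the results go into a new list; rebinding the loop variable i never changes example_list. A sketch that strips whichever known prefix matches, on a shortened copy of the question's list (the helper name strip_known_prefix is hypothetical):

```python
example_list = [
    '/test1/test2/test3/ABCD_1',
    '/test1/test4/test5/test6/ABCD_4',
]
prefixes = ('/test1/test2/test3/', '/test1/test4/test5/test6/')

def strip_known_prefix(path, prefixes):
    """Return path without the first matching prefix, or unchanged if none match."""
    for prefix in prefixes:
        if path.startswith(prefix):
            # slice by the prefix length instead of hard-coding 19 or 25
            return path[len(prefix):]
    return path

new_list = [strip_known_prefix(p, prefixes) for p in example_list]
print(new_list)  # ['ABCD_1', 'ABCD_4']
```

On Python 3.9+, str.removeprefix does the same job for a single prefix.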

How to form a glob that works for a wild char or exact match?

I am using a statement such as:
input_stuff = '1,2,3'
glob(folder+'['+ input_stuff + ']'+'*')
to list files that begin with 1, 2 or 3; this lists files such as 1-my-file, 2-my-file and 3-my-file.
This doesn't work if exact file names are given:
input_stuff = '1-my-file, 2-my-file, 3-my-file'
glob(folder+'['+ input_stuff + ']'+'*')
The error is: sre_constants.error: bad character range
It is even worse for:
input_stuff = '1-my-'
glob(folder+'['+ input_stuff + ']'+'*')
It prints everything in the folder, such as 3-my-file, etc.
Is there a glob statement that will print files for both
input_stuff = '1,2,3'
or
input_stuff = '1-my-file, 2-my-file, 3-my-file'
?

A bracket expression in a glob pattern matches a single character from a set, not one string from a list of strings.
Your first expression, input_stuff = '1,2,3', is equivalent to '[123,]' and will also match a name starting with a comma.
Your second expression contains '-', which is used to denote character ranges like '0-9A-F', hence the error you get.
It is better to drop glob altogether, split input_stuff and use listdir.
import re, os
input_stuff = '1-my-file, 2-my-file, 3-my-file'
folder = '.'
prefixes = re.split(r'\s*,\s*', input_stuff)  # split on commas with optional spaces
prefixes = tuple(prefixes)  # str.startswith accepts a str or a tuple, not a list
file_names = os.listdir(folder)
filtered_names = [os.path.join(folder, fname) for fname in file_names
                  if fname.startswith(prefixes)]
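If you would rather keep glob, one option is to build a separate pattern for each comma-separated prefix and escape it with glob.escape (Python 3.4+), so characters like '-' or '[' are matched literally and only the trailing '*' is a wildcard. A minimal sketch; glob_prefixes is a hypothetical helper name:

```python
import glob
import os
import re

def glob_prefixes(folder, input_stuff):
    """Return sorted paths in folder whose names start with any comma-separated prefix."""
    prefixes = [p for p in re.split(r'\s*,\s*', input_stuff.strip()) if p]
    matches = set()  # avoids duplicates when prefixes overlap, e.g. '1' and '1-my-file'
    for prefix in prefixes:
        # escape folder and prefix so only the appended '*' acts as a wildcard
        pattern = os.path.join(glob.escape(folder), glob.escape(prefix) + '*')
        matches.update(glob.glob(pattern))
    return sorted(matches)
```

The same call then covers both inputs from the question, e.g. glob_prefixes(folder, '1,2,3') and glob_prefixes(folder, '1-my-file, 2-my-file, 3-my-file').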

You can use the following:
input_stuff = '1,2,3'
glob(folder+'['+input_stuff+']-my-file*')
EDIT: Since you said in your comment that you can't hardcode "-my-file", you can do something like:
input_stuff = '1,2,3'
name = "-my-file"
print(glob.glob(folder+'['+input_stuff+']'+name+'*'))
and then just change the name variable when you need to. This still assumes each prefix in input_stuff is a single character.

finding a file name from a substring

I have a list of my filenames that I've saved as follows:
filelist = os.listdir(mypath)
Now, suppose one of my files is something like "KRAS_P01446_3GFT_SOMETHING_SOMETHING.txt".
However, all I know ahead of time is that I have a file called "KRAS_P01446_3GFT_*". How can I get the full file name from filelist using just "KRAS_P01446_3GFT_*"?
As a simpler example, I've made the following:
mylist = ["hi_there", "bye_there","hello_there"]
Suppose I had the string "hi". How would I make it return mylist[0] = "hi_there".
Thanks!

In the first example, you could just use the glob module:
import glob
import os
print('\n'.join(glob.iglob(os.path.join(mypath, "KRAS_P01446_3GFT_*"))))
Do this instead of os.listdir.
The second example seems tenuously related to the first (X-Y problem?), but here's an implementation:
mylist = ["hi_there", "bye_there","hello_there"]
print('\n'.join(s for s in mylist if s.startswith("hi")))

If you mean "give me all filenames starting with some prefix", then this is simple:
[fname for fname in mylist if fname.startswith('hi')]
If you mean something more complex, for example patterns like "some_*_file" matching "some_good_file" and "some_bad_file", then look at the re (regular expression) module.
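For shell-style wildcards specifically, the standard-library fnmatch module can apply a pattern like "KRAS_P01446_3GFT_*" to a list you already have in memory, such as the result of os.listdir. A sketch with made-up sample names:

```python
import fnmatch

filelist = ["KRAS_P01446_3GFT_SOMETHING_SOMETHING.txt", "OTHER_FILE.txt"]

# fnmatch.filter returns the names matching a shell-style pattern, in their original order.
matches = fnmatch.filter(filelist, "KRAS_P01446_3GFT_*")
print(matches)  # ['KRAS_P01446_3GFT_SOMETHING_SOMETHING.txt']

# A wildcard in the middle works too.
middle = fnmatch.filter(["some_good_file", "some_bad_file", "other"], "some_*_file")
print(middle)  # ['some_good_file', 'some_bad_file']
```

fnmatch.filter applies os.path.normcase to the names, so matching is case-insensitive on Windows.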

mylist = ["hi_there", "bye_there","hello_there"]
partial = "hi"
[fullname for fullname in mylist if fullname.startswith(partial)]

If the list is not very big, you can do a per item check like this.
def findMatchingFile(fileList, stringToMatch):
    listOfMatchingFiles = []
    for file in fileList:
        if file.startswith(stringToMatch):
            listOfMatchingFiles.append(file)
    return listOfMatchingFiles
There are more "pythonic" ways of doing this, but I prefer this one as it is more readable.

Why are these strings escaping from my regular expression in python?

In my code, I load up an entire folder into a list and then try to get rid of every file in the list except the .mp3 files.
import os
import re
path = '/home/user/mp3/'
dirList = os.listdir(path)
dirList.sort()
i = 0
for names in dirList:
    match = re.search(r'\.mp3', names)
    if match:
        i = i+1
    else:
        dirList.remove(names)
print(dirList)
print(i)
After I run the file, the code does get rid of some files in the list but keeps these two specifically:
['00. Various Artists - Indie Rock Playlist October 2008.m3u', '00. Various Artists - Indie Rock Playlist October 2008.pls']
I can't understand what's going on: why are those two specifically escaping my search?

You are modifying your list inside a loop over that same list. Each remove() shifts the following elements one position to the left, so the element right after a removed one is never examined. You should loop over a copy of the list instead (for name in dirList[:]:), or build a new list:
i = 0
modifiedDirList = []
for name in dirList:
    match = re.search(r'\.mp3', name)
    if match:
        i += 1
        modifiedDirList.append(name)
print(modifiedDirList)
Or even better, use a list comprehension:
dirList = [name for name in sorted(os.listdir(path))
if re.search(r'\.mp3', name)]
The same thing, without a regular expression:
dirList = [name for name in sorted(os.listdir(path))
if name.endswith('.mp3')]

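Maybe you should use the glob module; here is your entire script (note that glob returns paths including the folder, not bare names):
>>> import glob, os.path
>>> mp3s = sorted(glob.glob(os.path.join('/home/user/mp3/', '*.mp3')))
>>> print(mp3s)
>>> print(len(mp3s))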

As soon as you call dirList.remove(names), the original iterator doesn't do what you want. If you iterate over a copy of the list, it will work as expected:
for names in dirList[:]:
....
Alternatively, you can use list comprehensions to construct the right list:
dirList = [name for name in dirList if re.search(r'\.mp3', name)]
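A small demonstration of the skipping behaviour described in these answers, using made-up names: removing an item shifts the next one into a slot the loop has already visited, so that item is never tested.

```python
# Iterating over the list that is being modified: 'b.pls' is skipped.
buggy = ['a.m3u', 'b.pls', 'c.mp3']
for n in buggy:
    if not n.endswith('.mp3'):
        buggy.remove(n)
print(buggy)  # ['b.pls', 'c.mp3']

# Iterating over a copy: removals from the original don't disturb the loop.
fixed = ['a.m3u', 'b.pls', 'c.mp3']
for n in fixed[:]:
    if not n.endswith('.mp3'):
        fixed.remove(n)
print(fixed)  # ['c.mp3']
```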

