python open file matching pattern excluding substring

I need to open some files inside a folder in Python.
Say I have the following files in the folder:
text_pbs.fna
text_pdom_fo_oo.fna
text_pdom_fo_oo_aa.fna
text_pdom_fo_oo.ali
text_pdom_ba_ar.fna
text_pdom_ba_ar_aa.fna
text_pdom_ba_ar.ali
text_pdom_ba_az.fna
text_pdom_ba_az_aa.fna
text_pdom_ba_az.ali
I want to open:
text_pdom_fo_oo.fna
text_pdom_ba_ar.fna
text_pdom_ba_az.fna
only.
I tried glob:
glob.glob('*_pdom_*[^aa].fna')
but it doesn't work.
I would appreciate it if someone could point out the problem with the pattern above. Is there any other workaround for this?

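In glob patterns, the negation character inside brackets is !, not ^; a ^ in that position is matched literally. Try this code:
import glob
glob.glob('*_pdom_*[!aa].fna')
which gives (the order may vary):
['text_pdom_fo_oo.fna','text_pdom_ba_ar.fna','text_pdom_ba_az.fna']
Note that [!aa] is a single-character class, equivalent to [!a]: it only requires that the character just before .fna is not an a. That is enough to exclude the _aa files here, but it is not a general "exclude this substring" test.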
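If I read fnmatch's translation rules correctly, the original [^aa] pattern does not merely fail: since ^ is taken literally, the class means "^ or a", so the pattern selects the _aa files, the opposite of what was wanted. The sketch below checks both patterns against the question's file names with fnmatch.fnmatchcase (glob relies on the same matching rules), and shows a more explicit alternative that globs every *_pdom_*.fna file and then drops the _aa ones with str.endswith. The helper names matches and pdom_fna_without_aa are hypothetical.

```python
import glob
import os
from fnmatch import fnmatchcase

# File names from the question, used only to demonstrate the patterns.
NAMES = [
    "text_pbs.fna",
    "text_pdom_fo_oo.fna", "text_pdom_fo_oo_aa.fna", "text_pdom_fo_oo.ali",
    "text_pdom_ba_ar.fna", "text_pdom_ba_ar_aa.fna", "text_pdom_ba_ar.ali",
    "text_pdom_ba_az.fna", "text_pdom_ba_az_aa.fna", "text_pdom_ba_az.ali",
]


def matches(pattern, names=NAMES):
    """Return the names matching a glob-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


def pdom_fna_without_aa(folder="."):
    """Glob every *_pdom_*.fna file in folder, then drop names ending in _aa.fna."""
    pattern = os.path.join(glob.escape(folder), "*_pdom_*.fna")
    return sorted(
        path for path in glob.glob(pattern)
        if not os.path.basename(path).endswith("_aa.fna")
    )


# '^' inside [...] is literal, so [^aa] means "'^' or 'a'": it picks the _aa files.
print(matches("*_pdom_*[^aa].fna"))
# '!' negates the class, so [!aa] means "any character except 'a'".
print(matches("*_pdom_*[!aa].fna"))
```

Filtering in Python after a broad glob keeps the exclusion rule readable and easy to extend, for example to several suffixes with endswith(("_aa.fna", "_bb.fna")).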

Related

Python Convert Windows File path in a variable

I am given a variable that contains a Windows file path, and I then have to read that file. The problem is that the path contains escape characters, and I can't seem to get rid of them. I checked os.path and pathlib, but both expect correctly formatted text already, which I can't seem to construct.
Here is an example. Please note that fPath is given, so I can't prefix it with r to make it a raw string.
# this is given; I can't turn it into a raw string with r
fPath = "P:\python\t\temp.txt"
file = open(fPath, "r")
for line in file:
    print(line)
How can I convert fPath, using some function or method, from:
"P:\python\t\temp.txt"
to
"P:/python/t/temp.txt"
I've also tried .replace("\","/"), which doesn't work.
I'm using Python 3.7.

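You can use os.path.abspath() to normalize it:
import os
print(os.path.abspath("P:\python\t\temp.txt"))
Note, however, that abspath() normalizes separators to the current platform's style (backslashes on Windows), so it will not produce forward slashes there, and it cannot turn a tab that Python has already created from \t back into a backslash and a t. See the os.path documentation.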

I've solved it.
The issue lies with how Python interprets the string literal: \t and the other escape sequences are not stored as a backslash followed by a letter; the interpreter turns them into non-printing characters (here, a tab) when it parses the literal.
I got a bit lucky: someone else had already faced the same problem and solved it with a brute-force method:
http://code.activestate.com/recipes/65211/
I just had to find it.
After that I have a raw string without escape characters, and only need to run a simple replace() on it to get a workable path.
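The brute-force idea can be sketched roughly as follows. This is not the ActiveState recipe itself, only a minimal illustration that assumes the path was mangled by the standard single-character escapes (\a, \b, \f, \n, \r, \t, \v): each of those control characters is mapped back to a backslash plus its letter, and then every backslash is replaced. The helper names undo_escapes and to_forward_slashes are hypothetical. Escapes such as \x41 or octal \101 cannot be recovered this way, because they turn into ordinary printable characters.

```python
# Hypothetical mapping from control characters back to the escape
# sequences that most likely produced them in a string literal.
_ESCAPE_SEQUENCES = {
    "\a": "\\a",
    "\b": "\\b",
    "\f": "\\f",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    "\v": "\\v",
}


def undo_escapes(text: str) -> str:
    """Replace control characters with the two-character escapes they came from."""
    return "".join(_ESCAPE_SEQUENCES.get(ch, ch) for ch in text)


def to_forward_slashes(path: str) -> str:
    """Undo the escapes, then turn every backslash into a forward slash."""
    return undo_escapes(path).replace("\\", "/")


# "P:\\python\t\temp.txt" is the same string as the question's literal:
# \p is not a recognized escape (its backslash survives), while \t becomes a tab.
print(to_forward_slashes("P:\\python\t\temp.txt"))  # P:/python/t/temp.txt
```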

You can use the Path class from the pathlib module:
from pathlib import Path
docs_folder = Path("some_folder/some_folder/")
text_file = docs_folder / "some_file.txt"
f = open(text_file)

If you would like to use replace() instead, escape the backslash:
fPath.replace("\\", "/")

With Python 3.4 or later, the Path class from the pathlib module offers a method called as_posix(), which returns the path with forward slashes as separators. For example, on Windows, if you build a Path object via p = pathlib.Path('C:\\Windows\\SysWOW64\\regedit.exe'), then p.as_posix() yields C:/Windows/SysWOW64/regedit.exe. (On other platforms, pathlib.PureWindowsPath gives the same result, because a plain Path there does not treat backslashes as separators.) The drive letter is kept, so to obtain a complete *nix-style path you need to convert it manually.
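A minimal sketch of both steps, assuming the input is an ordinary drive-letter path. PureWindowsPath parses backslashes as separators regardless of the operating system the script runs on. The drive-letter conversion (to a WSL-style /mnt/c/... layout) is only one possible convention, so the helper windows_to_mount_path and its mount_root parameter are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_posix(path_str: str) -> str:
    """Forward-slash form of a Windows path; works the same on any OS."""
    return PureWindowsPath(path_str).as_posix()


def windows_to_mount_path(path_str: str, mount_root: str = "/mnt") -> str:
    """Hypothetical drive-letter conversion, e.g. C:\\x -> /mnt/c/x (WSL-style)."""
    p = PureWindowsPath(path_str)
    if not (len(p.drive) == 2 and p.drive[1] == ":"):
        raise ValueError(f"no drive letter in {path_str!r}")
    drive = p.drive[0].lower()
    # p.parts[0] is the anchor such as 'C:\\'; the rest are folders and the file name.
    return str(PurePosixPath(mount_root, drive, *p.parts[1:]))


print(windows_to_posix("C:\\Windows\\SysWOW64\\regedit.exe"))       # C:/Windows/SysWOW64/regedit.exe
print(windows_to_mount_path("C:\\Windows\\SysWOW64\\regedit.exe"))  # /mnt/c/Windows/SysWOW64/regedit.exe
```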

I came across a similar problem with Windows file paths. This is what works for me:
parts = input().split('\\')
path = '/'.join(parts)
This turned the input
"D:\test.txt"
into
"D:/test.txt"
This works because input() returns exactly the characters that were typed, so no escape processing happens. When Python displays such a string, it shows each backslash doubled ('\\'), but the string itself contains single backslashes. A path has exactly one backslash between folder names, so splitting on '\\' lists the folders in order, and join() reassembles them with forward slashes.
Hopefully this helps!

Finding latest file in a folder using python

I've searched for an answer to this, but the suggested answers still gave me an error message, and I wasn't allowed to ask there, so I had to open a new question. So here it goes...
I need my python script to use the latest file in a folder.
I tried several things; currently the code looks like this:
import glob
import os
list_of_files = glob.glob('/my/path/*.csv')
latest_file = max(list_of_files, key=os.path.getmtime)
But the code fails with the following error:
ValueError: max() arg is an empty sequence
Does anyone have an idea why?

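The code works as long as the list is not empty, but here it evidently is: max() raises this error when glob.glob() matched nothing, for example because the path or the pattern is wrong. So first check that the list isn't empty, by printing it or something similar.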
I tested this code and it worked fine:
import os
import glob
mypath = "C:/Users/<Your username>/Downloads/*.*"
print(min(glob.glob(mypath), key=os.path.getmtime))
print(max(glob.glob(mypath), key=os.path.getmtime))
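A slightly more defensive version might look like the sketch below. It assumes you would rather get None than an exception when nothing matches; max() has accepted a default argument since Python 3.4. The function name latest_file and the /my/path/*.csv pattern are placeholders taken from the question.

```python
import glob
import os
from typing import Optional


def latest_file(pattern: str) -> Optional[str]:
    """Return the most recently modified file matching pattern, or None if nothing matches."""
    # Keep regular files only, so a matching directory can't win the comparison.
    candidates = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    # default=None avoids "ValueError: max() arg is an empty sequence".
    return max(candidates, key=os.path.getmtime, default=None)


if __name__ == "__main__":
    newest = latest_file("/my/path/*.csv")
    if newest is None:
        print("No file matched; check the path and the pattern.")
    else:
        print(newest)
```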

glob.glob() does not match files whose names start with a dot (hidden files on Unix-like systems) unless the pattern itself starts with a dot.
So if you want to match such files, you need something like this (assuming a directory that contains .picture.png):
import glob
glob.glob('.p*')  # assuming you're already in the directory
It is also a good idea to check how many files matched, for example with len() on the returned list, before operating on them.
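To get hidden and non-hidden matches in one call, one option is to run the pattern twice, once as given and once with a leading dot, and merge the results. This is a minimal sketch; glob_with_hidden is a hypothetical helper. On Python 3.11 and later, glob.glob() also accepts include_hidden=True, which may be simpler if you can rely on that version.

```python
import glob
import os


def glob_with_hidden(folder: str, name_pattern: str) -> list:
    """Match name_pattern in folder, including names that start with a dot."""
    base = glob.escape(folder)  # treat the folder path literally
    visible = glob.glob(os.path.join(base, name_pattern))
    # A leading '.' in the pattern is what lets glob match dot-files.
    hidden = glob.glob(os.path.join(base, "." + name_pattern))
    return sorted(set(visible + hidden))


matched = glob_with_hidden(".", "*.png")
print(len(matched), "file(s) matched")
```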

Python Match Portion of File Name

I am trying to match file names within a folder using Python so that I can run a secondary process on the files that match. My file names begin differently but share a common string from some point on, as below:
3322_VEGETATION_AREA_2009_09
3322_VEGETATION_LINE_2009_09
4522_VEGETATION_POINT_2009_09
4422_VEGETATION_AREA_2009_09
8722_VEGETATION_LINE_2009_09
2522_VEGETATION_POINT_2009_09
4222_VEGETATION_AREA_2009_09
3522_VEGETATION_LINE_2009_09
3622_VEGETATION_POINT_2009_09
Would regex be the right approach to matching those files after the first underscore or am I overthinking this?

import glob
files = glob.glob("*VEGETATION*")
should do the trick: it finds all files in the current directory whose names contain "VEGETATION" anywhere.
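If the goal is to match files on the part after the first underscore, str.partition is enough and no regex is needed. The sketch below groups the question's names by that suffix; group_by_suffix is a hypothetical helper, and the names are assumed to have no extension, exactly as listed (with real files, the extension would become part of the suffix unless you strip it first, e.g. with os.path.splitext).

```python
from collections import defaultdict


def group_by_suffix(names):
    """Group names by everything after the first underscore."""
    groups = defaultdict(list)
    for name in names:
        _prefix, sep, suffix = name.partition("_")
        if sep:  # skip names that contain no underscore at all
            groups[suffix].append(name)
    return dict(groups)


names = [
    "3322_VEGETATION_AREA_2009_09", "3322_VEGETATION_LINE_2009_09",
    "4522_VEGETATION_POINT_2009_09", "4422_VEGETATION_AREA_2009_09",
    "8722_VEGETATION_LINE_2009_09", "2522_VEGETATION_POINT_2009_09",
    "4222_VEGETATION_AREA_2009_09", "3522_VEGETATION_LINE_2009_09",
    "3622_VEGETATION_POINT_2009_09",
]
for suffix, members in sorted(group_by_suffix(names).items()):
    print(suffix, members)
```

Each group can then be handed to the secondary process as a unit.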

How to delete files using the syntax '*' with python3?

There are some files named like percentxxxx.csv, percentyyyy.csv in the directory. I want to delete the files whose names begin with percent.
I found that the os.remove function may help, but I don't know how to solve the problem.
Are there any other functions that can delete files using the syntax percent*.csv?
The following is my method:
import os
system_dir = os.getcwd()
for fname in os.listdir(system_dir):
    # print(fname)
    if fname.startswith('percent'):
        os.remove(os.path.join(system_dir, fname))
I mainly want to know whether there is an easier method, for example using the * syntax.

Use glob:
import glob
import os
for path in glob.glob("percent*.csv"):
    os.remove(path)
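The same idea with pathlib, as a minimal sketch: Path.glob() does the pattern matching and Path.unlink() deletes each file. The helper delete_matching and its dry_run flag are hypothetical; the dry run is there so you can see what would be removed before actually deleting anything, and the is_file() check skips any directory that happens to match.

```python
from pathlib import Path


def delete_matching(folder, pattern="percent*.csv", dry_run=False):
    """Delete regular files in folder matching pattern; return their names, sorted."""
    matched = sorted(p for p in Path(folder).glob(pattern) if p.is_file())
    if not dry_run:
        for path in matched:
            path.unlink()
    return [p.name for p in matched]


print(delete_matching(".", dry_run=True))  # preview only, nothing is deleted
```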

Find 'all' files in a directory, not all files found

Using Python, I'm trying to find all files in /sys and match a certain file. The problem is that not all files are being found. It's not a matter of access: I know that Python can read and write the file, which I've tested manually using open("file_path", "w") and write(). I just want to know whether there is some trick to locating files that I'm missing here:
import os, re
for roots, dirs, files in os.walk('/sys'):
    match = re.search(r'\S+/rq_affinity', roots)
    if match:
        print(match.group())
I've already tried writing every file found by os.walk() to a file and then using the shell and grep to check whether the file I'm looking for is there, so the problem isn't with the matching.
FIXED search:
import os, re
for roots, dirs, files in os.walk('/sys'):
    for file in files:
        match = re.search(r'\S+/rq_affinity', os.path.join(roots, file))
        if match:
            print(match.group())

rq_affinity is a file, isn't it? Then it shows up in files, not in roots.
Also, the entries under /sys/dev/block are symlinks, so you need to tell os.walk to follow them with followlinks=True.
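One caveat: the os.walk documentation warns that followlinks=True can recurse forever if a link points back to one of its own parent directories, and /sys contains many links that lead into the same device directories. The sketch below, a hypothetical helper find_files_named, follows links but remembers each directory's real path and prunes any directory it has already visited. It also compares the file name directly instead of using a regex. For example, you might call it as find_files_named("/sys/dev/block", "rq_affinity").

```python
import os


def find_files_named(top, target, follow_links=True):
    """Walk top and return paths of files named target, visiting each real directory once."""
    found = []
    seen = set()
    for root, dirs, files in os.walk(top, followlinks=follow_links):
        real = os.path.realpath(root)
        if real in seen:
            # Already reached through another link: prune to avoid loops and duplicates.
            dirs[:] = []
            continue
        seen.add(real)
        if target in files:
            found.append(os.path.join(root, target))
    return found
```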

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
