Iterate over first n files of file list - python

file_list is the path to a folder containing a list of files.
I want to perform a certain action on the files inside file_list. To perform the action on all the files, here is the Python code:
import os
for filename in os.listdir(file_list):
    print(filename)
What if I only want to perform the action on the first n files? How do I modify the code? I am open to totally new code to do the task.
I am using Python v3.6.

import os
for filename in os.listdir(file_list)[:n]:
    print(filename)
Is it suitable for you?
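One caveat with the slice: os.listdir returns entries in arbitrary order and also lists subdirectories. Here is a minimal sketch, assuming "first" means first in sorted name order and that only regular files should count. The helper name first_n_files is hypothetical.

```python
import os
from itertools import islice

def first_n_files(folder, n):
    """Return the names of the first n regular files in folder, sorted by name."""
    with os.scandir(folder) as entries:
        # Keep regular files only; subdirectories are skipped.
        names = sorted(entry.name for entry in entries if entry.is_file())
    # islice stops after n items and copes with folders holding fewer than n files.
    return list(islice(names, n))
```

It could then be used as: for filename in first_n_files(file_list, n): print(filename)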

Related

Extract a list of files with a certain criteria within subdirectory of zip archive in python

I want to access some .jp2 image files inside a zip file and create a list of their paths. The zip file contains a directory folder named S2A_MSIL2A_20170420T103021_N0204_R108_T32UNB_20170420T103454.SAFE and I am currently reading the files using glob, after having extracted the folder.
I don't want to have to extract the contents of the zip file first. I read that I cannot use glob within a zip directory, nor can I use wildcards to access files within it, so I am wondering what my options are, apart from extracting to a temporary directory.
The way I am currently getting the list is this:
dirr = r'C:\path-to-folder\S2A_MSIL2A_20170420T103021_N0204_R108_T32UNB_20170420T103454.SAFE'
jp2_files = glob.glob(dirr + '/**/IMG_DATA/**/R60m/*B??_??m.jp2', recursive=True)
There are additional different .jp2 files in the directory, for which reason I am using the glob wildcards to filter the ones I need.
I am hoping to make this work so that I can automate it for many different zip directories. Any help is highly appreciated.
I made it work with zipfile and fnmatch:
from zipfile import ZipFile
import fnmatch
zip_path = 'path_to_zip.zip'  # path to the zip archive
with ZipFile(zip_path, 'r') as zipObj:
    # list every member of the archive without extracting anything
    file_list = zipObj.namelist()
pattern = '*/R60m/*B???60m.jp2'
# keep only the member names that match the pattern
filtered_list = []
for file in file_list:
    if fnmatch.fnmatch(file, pattern):
        filtered_list.append(file)
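To automate this for many archives, the same idea can be wrapped in a function. This is only a sketch, and the function name list_matching_members is hypothetical. Note that in fnmatch a * also matches path separators, which is why a single leading * can cover all the nested folders.

```python
import fnmatch
from zipfile import ZipFile

def list_matching_members(zip_path, pattern):
    """Return the archive member names that match a shell-style pattern."""
    with ZipFile(zip_path, "r") as archive:
        # namelist() reads the archive's directory; nothing is extracted.
        # fnmatch.filter applies the same test as calling fnmatch() per name.
        return fnmatch.filter(archive.namelist(), pattern)
```

Calling list_matching_members(zip_path, '*/R60m/*B???60m.jp2') for each archive would return one list of member paths per zip file.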

Python: read folders that are not ZIP folders

Currently I read folders individually like so:
input_location = r'path\202206'
I would like to read in all the folders at once and not read in the zipped files.
Essentially, the logic I'd like to perform is input_location = r'path\202206' + r'path\202207' + r'path\202207' + r'path\202208'
I can't just do input_location = r'path\ as it may read in those zip files I do not want.
Is there a way to read in the folders without reading in the zip files? Or explicitly list the folder names in one variable (input_location)?
IIUC: Collecting all directories from a directory can be done using a simple list comprehension as follows.
The glob library is nice, as it returns the full filepath, whereas a function such as os.listdir() only returns the filenames.
import os
from glob import glob
dirs = [f for f in glob('/path/to/files/*') if os.path.isdir(f)]
Output:
['/path/to/files/202207',
'/path/to/files/202206',
'/path/to/files/202208',
'/path/to/files/202209']
Then, your script can iterate over the list of directories as required.
For completeness, the directory content is as follows:
202206
202206.zip
202207
202207.zip
202208
202208.zip
202209
202209.zip
If you use pathlib from Python's standard library, you can get all entries in the folder and check whether each entry is a folder.
from pathlib import Path
for entry in Path('/path/to/folder').glob('*'):
    if entry.is_dir():
        print(entry)
An os-based approach. Notice that os.listdir returns the contents of the directory as basenames, not as paths.
import os
def my_dirs(wd):
    # list wd itself (not the current directory) and join each name back onto wd
    return list(filter(os.path.isdir, (os.path.join(wd, f) for f in os.listdir(wd))))
working_dir = '.'  # placeholder: replace with your path
print(*my_dirs(working_dir), sep='\n')
Remark: to make your program platform-independent, always use tools such as os.path.join or os.sep for path manipulation.
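Once the directories are separated from the zip files, the "iterate over the directories" step might look like the sketch below. The function name iter_files_in_subdirs is hypothetical, and reading only one level deep inside each folder is an assumption about what "read in the folders" means here.

```python
from pathlib import Path

def iter_files_in_subdirs(parent):
    """Yield every regular file found directly inside each subfolder of parent."""
    for entry in sorted(Path(parent).iterdir()):
        if not entry.is_dir():
            continue  # skips 202206.zip and any other plain file
        for child in sorted(entry.iterdir()):
            if child.is_file():
                yield child
```

A loop such as for path in iter_files_in_subdirs(r'path'): ... would then visit the files of 202206, 202207 and so on, without touching the .zip files.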

Extracting all file names in python

I have an application that converts from one photo format to another when I enter the following in cmd.exe: "AppConverter.exe" "file.tiff" "file.jpeg"
But since I don't want to type this every time I want a photo converted, I would like a script that converts all files in the folder. So far I have this:
def start(self):
    for root, dirs, files in os.walk("C:\\Users\\x\\Desktop\\converter"):
        for file in files:
            if file.endswith(".tiff"):
                subprocess.run(['AppConverter.exe', '.tiff', '.jpeg'])
So how do I get the names of all the files and pass them to subprocess? I am thinking of taking the basename (without extension) of every file and appending .tiff and .jpeg to it, but I'm at a loss as to how to do it.
I think the fastest way would be to use the glob module for wildcard expressions:
import glob
import subprocess
for file in glob.glob("*.tiff"):
    subprocess.run(['AppConverter.exe', file, file[:-5] + '.jpeg'])
    # file will be like 'test.tiff'
    # file[:-5] will be 'test' (we remove the last 5 characters, i.e. '.tiff')
    # we add '.jpeg' to our extension-less string
All of this information is in the post I linked in the comments on your original question.
You could try looking into os.path.splitext(). That allows you to split the file name into a tuple containing the basename and extension. That might help...
https://docs.python.org/3/library/os.path.html
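A small sketch of the os.path.splitext idea, which avoids counting characters by hand. The helper name conversion_command and its default arguments are hypothetical; only the AppConverter.exe argument order comes from the question.

```python
import os

def conversion_command(src, new_ext=".jpeg", exe="AppConverter.exe"):
    """Build the argument list for converting src to a file with new_ext."""
    # splitext splits at the last dot: 'photo.tiff' -> ('photo', '.tiff')
    base, _ext = os.path.splitext(src)
    return [exe, src, base + new_ext]
```

Inside a loop over the .tiff files, subprocess.run(conversion_command(file)) would then run the converter once per file.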

Concatenating fasta files from different folders

I have a large number of fasta files (these are just text files) in different subfolders. What I need is a way to search through the directories for files that have the same name and concatenate these into a file with the name of the input files. I can't do this manually as I have 10000+ genes that I need to do this for.
So far I have the following Python code that looks through one of the directories and then uses those file names to search through the other directories. This returns a list that has the full path for each file.
import os
from os.path import join, abspath
path = '/directoryforfilelist/' #Directory for source list
listing = os.listdir(path)
for x in listing:
    for root, dirs, files in os.walk('/rootdirectorytosearch/'):
        if x in files:
            pathlist = abspath(join(root,x))
Where I am stuck is how to concatenate the files it returns that have the same name. The results from this script look like this.
/directory1/file1.fasta
/directory2/file1.fasta
/directory3/file1.fasta
/directory1/file2.fasta
/directory2/file2.fasta
/directory3/file2.fasta
In this case I would need the end result to be two files named file1.fasta and file2.fasta that contain the text from each of the same named files.
Any leads on where to go from here would be appreciated. While I did this part in Python, any way that gets the job done is fine with me. This is being run on a Mac, if that matters.
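Not tested, but here's roughly what I'd do:
from itertools import groupby
import os
def conc_by_name(names):
    # groupby only merges adjacent items, so sort by file name first
    names = sorted(names, key=os.path.basename)
    # group on the bare file name; os.path.split would return a (head, tail) tuple
    for tail, group in groupby(names, key=os.path.basename):
        with open(tail, 'w') as out:
            for name in group:
                with open(name) as f:
                    out.writelines(f)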
This will create the files (file1.fasta and file2.fasta in your example) in the current folder.
For each file in your list, open the target file in append mode, read each line of the source file, and write it to the target file.
This assumes that the target folder is empty to start with and is not inside /rootdirectorytosearch.
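The append-mode approach could be sketched as follows. The function name concatenate_by_name and the target_dir parameter are hypothetical, and shutil.copyfileobj is swapped in for the line-by-line copy described above (it copies in chunks, which gives the same result for text files).

```python
import os
import shutil

def concatenate_by_name(paths, target_dir):
    """Append each source file to target_dir/<basename of the source>."""
    for src in paths:
        target = os.path.join(target_dir, os.path.basename(src))
        # Mode 'ab' creates the target on first use and appends afterwards.
        with open(src, "rb") as fin, open(target, "ab") as fout:
            shutil.copyfileobj(fin, fout)
```

Because the targets are opened in append mode, running this twice on the same target folder would duplicate the content, hence the assumption that the folder starts out empty.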

Running a python script on all the files in a directory

I have a Python script that reads through a text csv file and creates a playlist file. However I can only do one at a time, like:
python playlist.py foo.csv foolist.txt
However, I have a directory of files that need to be made into a playlist, with different names, and sometimes a different number of files.
So far I have looked at creating a txt file with a list of all the names of the files in the directory and then looping through each line of that; however, I know there must be an easier way to do it.
for f in *.csv; do
    python playlist.py "$f" "${f%.csv}list.txt"
done
Will that do the trick? This will put foo.csv in foolist.txt and abc.csv in abclist.txt.
Or do you want them all in the same file?
Just use a for loop with the asterisk glob, making sure you quote things appropriately for spaces in filenames:
for file in *.csv; do
    python playlist.py "$file" >> outputfile.txt;
done
Is it a single directory, or nested?
Example:
topfile.csv
topdir
--dir1
----file1.csv
----file2.txt
--dir2
----file3.csv
----file4.csv
For nested, you can use os.walk(topdir) to get all the files and dirs recursively within a directory.
You could set up your script to accept dirs or files:
python playlist.py topfile.csv topdir
import sys
import os
def main():
    files_toprocess = set()
    paths = sys.argv[1:]
    for p in paths:
        if os.path.isfile(p) and p.endswith('.csv'):
            files_toprocess.add(p)
        elif os.path.isdir(p):
            for root, dirs, files in os.walk(p):
                files_toprocess.update([os.path.join(root, f)
                                        for f in files if f.endswith('.csv')])
If you have a directory name, you can use os.listdir:
os.listdir(dirname)
If you want to select only a certain type of file, e.g. only csv files, you could use the glob module.
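A sketch of the glob suggestion in Python, mirroring the shell loop's foo.csv to foolist.txt naming. The function name playlist_jobs is hypothetical, and it only builds the (input, output) pairs; how playlist.py is then called on each pair is left open.

```python
import glob
import os

def playlist_jobs(dirname):
    """Return (csv_path, output_path) pairs, e.g. foo.csv -> foolist.txt."""
    jobs = []
    # glob returns matches in arbitrary order, so sort for a stable result.
    for csv_path in sorted(glob.glob(os.path.join(dirname, "*.csv"))):
        base, _ext = os.path.splitext(csv_path)
        jobs.append((csv_path, base + "list.txt"))
    return jobs
```

Each pair could be passed to the existing script, for example with subprocess.run(['python', 'playlist.py', csv_path, output_path]).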
