Merging files within a subfolder for a large batch of subfolders - python

I am analyzing some data in a bioinformatics pipeline (qiime). I am trying to use a cat command to merge two files within a subfolder - I need to do this for 330 files, but am having trouble with the command string.
My current string:
cat AdapterRemoval/*.fastq/output_paired.collapsed AdapterRemoval/*.fastq/output_paired.collapsed.truncated > AdapterRemoval/*.fastq/mergedfile.fastq
This is the command I am using. The * is meant to make it look in every .fastq folder for the files output_paired.collapsed and output_paired.collapsed.truncated, merge those two files into one mergedfile.fastq, and place it in the same folder as the original files.
For instance:
cat AdapterRemoval/C1.fastq/output_paired.collapsed AdapterRemoval/C1.fastq/output_paired.collapsed.truncated > AdapterRemoval/C1.fastq/mergedfile.fastq
So the two files found within the AdapterRemoval/C1.fastq subfolder would be merged and the merged file placed in that same subfolder.
In fact, when I type it out like this, using a single file path with a specific folder ID, it works. But when I put the * in place of the subfolder name, I get an error saying there is no such file or directory as AdapterRemoval/*.fastq/mergedfile.fastq.
Does anyone know what I might be doing wrong? Any help would be much appreciated!
Thank you,
Sarah
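
One likely explanation is that the shell expands each * before cat ever runs: the two input patterns expand to the matching files from every folder at once, while the redirect target AdapterRemoval/*.fastq/mergedfile.fastq matches nothing that exists yet and is therefore taken literally. Since the question is tagged python, here is a minimal sketch of a per-folder loop. The function name merge_pairs and its parameters are made up for illustration, and it assumes every sample folder ends in .fastq and holds both input files.

```python
import shutil
from pathlib import Path

# File names taken from the question; adjust if yours differ.
PARTS = ("output_paired.collapsed", "output_paired.collapsed.truncated")


def merge_pairs(root, parts=PARTS, merged_name="mergedfile.fastq"):
    """Concatenate `parts` inside every *.fastq subfolder of `root`.

    Returns the list of merged files that were written.
    """
    written = []
    for folder in sorted(Path(root).glob("*.fastq")):
        if not folder.is_dir():
            continue
        sources = [folder / name for name in parts]
        # Skip folders where one of the inputs is missing.
        if not all(src.is_file() for src in sources):
            continue
        target = folder / merged_name
        with open(target, "wb") as out:
            for src in sources:
                with open(src, "rb") as handle:
                    # Stream the bytes so large FASTQ files are not loaded whole.
                    shutil.copyfileobj(handle, out)
        written.append(target)
    return written


# Example call (assumes the AdapterRemoval folder is in the working directory):
# merge_pairs("AdapterRemoval")
```

Each folder is handled on its own, so the merged file always lands next to the two inputs it came from.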

Related

How do I copy subfolders into another location

I'm making a program to back up files and folders to a destination.
The problem I'm currently facing is that when folders are nested inside other folders, with files at each level, I can't sync them to the destination.
e.g.:
The source contains folder 1 and file 2. Folder 1 contains folder 2, folder 2 contains folder 3 and files etc...
The backup only contains folder 1 and file 2.
If the backup doesn't exist I simply use shutil.copytree(path, path_backup), but in the case where I need to sync, I can't get the files and folders across, or at least I'm not seeing a way to do it. I have walked the directory with for path, dir, files in os.walk(directory) and even used what someone suggested in another post:
    def walk_folder(target_path, path_backup):
        for files in os.scandir(target_path):
            if os.path.isfile(files):
                file_name = os.path.abspath(files)
                print(file_name)
                os.makedirs(path_backup)
            elif os.path.isdir(files):
                walk_folder(files, path_backup)
Is there a way to build the directories in the backup folder from the ground up and then add the files alongside, or is the only option to delete the whole folder and use shutil.copytree(path, path_backup) again?
With makedirs, all I get is an error saying it can't create the folder because it already exists. That is understandable, since it is trying to write in the source folder and not in the backup. Is there a way to rewrite the path so that the source part is replaced by the backup part?
If any more code is needed feel free to ask!
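
A minimal sketch of one way to map each source path onto the backup, assuming a one-way sync where files are copied when they are missing or older in the backup. The function name sync_tree is hypothetical. The key step is os.path.relpath, which gives the path of the current folder relative to the source root so it can be re-joined under the backup root. On Python 3.8 or later, shutil.copytree(path, path_backup, dirs_exist_ok=True) is a simpler option if re-copying everything is acceptable.

```python
import os
import shutil


def sync_tree(source, backup):
    """Copy new or updated files from `source` into `backup`.

    Folders missing in the backup are created; nothing is deleted.
    Returns the list of destination paths that were written.
    """
    copied = []
    for current, dirs, files in os.walk(source):
        # Path of the current folder relative to the source root ("." at the top).
        relative = os.path.relpath(current, source)
        target_dir = os.path.normpath(os.path.join(backup, relative))
        # exist_ok avoids the "folder already exists" error.
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(current, name)
            dst = os.path.join(target_dir, name)
            # Copy when the file is missing or the source is newer.
            if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
                shutil.copy2(src, dst)
                copied.append(dst)
    return copied
```

This only ever adds or refreshes files; removing files from the backup that no longer exist in the source would need a second pass in the other direction.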

How to selectively zip files from specified folders

I want to selectively zip files from some folders but couldn't find a good way.
The source folder structure:
C:/temp/x86/file1.dll
C:/temp/x86/file2.dll
C:/temp/x86/file3.txt
And
C:/temp/x64/file1.dll
C:/temp/x64/file2.dll
C:/temp/x64/file4.dll
My requirement:
Zip file1.dll from C:/temp/x86 and from C:/temp/x64 so that the two copies end up under outputFolder/x86 and outputFolder/x64 respectively. The zip file can be named example.zip.
That is, after example.zip is unzipped, the output structure should be as follows:
outputFolder/x86/file1.dll
And
outputFolder/x64/file1.dll
One solution is to manually copy the files into the destination folder, then zip them.
But I want to avoid the copy, because in my actual code there are dozens of files and they are big.
How can I achieve that? Thanks all very much!
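
A minimal sketch of one approach, assuming the standard-library zipfile module is acceptable: ZipFile.write takes an arcname argument that sets the path stored inside the archive, so the files can be read from where they are without an intermediate copy. The function name zip_selected and the wanted mapping are hypothetical names for illustration.

```python
import zipfile
from pathlib import Path


def zip_selected(zip_path, source_root, wanted, prefix="outputFolder"):
    """Write selected files into `zip_path` under `prefix/<subfolder>/`.

    `wanted` maps a subfolder name (e.g. "x86") to the file names to include.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for subfolder, names in wanted.items():
            for name in names:
                src = Path(source_root) / subfolder / name
                # arcname controls the path the file gets inside the archive.
                archive.write(src, arcname=f"{prefix}/{subfolder}/{name}")


# Example call using the paths from the question:
# zip_selected("C:/outputFolder/example.zip", "C:/temp",
#              {"x86": ["file1.dll"], "x64": ["file1.dll"]})
```

Unzipping the result then gives outputFolder/x86/file1.dll and outputFolder/x64/file1.dll.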

os.walk isn't showing all the files in the given path

I'm trying to make my own backup program. To do so, I need to give it a directory and get every file in it, however deep in the subdirectories, so that I can copy them. I wrote a script, but it doesn't give me all the files in that directory. I used Documents as a test: my list has 3600 items, but the number of files should be 17000. Why isn't os.walk showing everything?
    import os
    data = []
    for mdir, dirs, files in os.walk('C:/Users/Name/Documents'):
        data.append(files)
    print(data)
    print(len(data))
Use data.extend(files) instead of data.append(files).
files is a list of files in a directory. It looks like ["a.txt", "b.html"] and so on. If you use append, you end up with data looking like
[..., ["a.txt", "b.html"]]
whereas I suspect you're after
[..., "a.txt", "b.html"]
Using extend will provide the second behaviour.
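
To make the difference concrete, here is a small sketch. With append, len(data) counts one entry per directory visited, which would explain a figure like 3600 instead of 17000. The helper list_files is a hypothetical name, and it stores full paths rather than bare file names, on the assumption that a backup program needs to know where each file lives.

```python
import os

# append adds the whole list as ONE element; extend adds each item.
nested = []
nested.append(["a.txt", "b.html"])
flat = []
flat.extend(["a.txt", "b.html"])
print(len(nested), len(flat))  # 1 2


def list_files(root):
    """Return the full path of every file below `root`."""
    paths = []
    for folder, dirs, files in os.walk(root):
        # Join with the folder so files in different folders stay distinguishable.
        paths.extend(os.path.join(folder, name) for name in files)
    return paths
```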

Copy file, rename it, iterate and repeat

I use os.rename to rename files and move them about, but I am failing at the following task.
I have a main folder containing sub-folders with the structure below.
The main folder is "Back". It contains sub-folders named with letters and numbers, e.g. A01, A02, B01, B02, etc. Inside each of those folders is a set of files, one of which is called "rad", so the file paths look something like this:
Back/A01/rad
/A02/rad
/B01/rad
.../rad
I have another sub-folder called "rads" inside the main "Back"
Back/rads
What I want to do is copy only the rad files from each of the folders in Back into the folder "rads", and name each copy after the folder it came from,
e.g. rad_A01, rad_A02, rad_B01, etc.
I couldn't figure out how to step through the folder names when I move the files.
os.rename ("Back//rad, Back/rads/rad_")
I thought of making a list of all the file names and then doing something like for x in list: os.rename(), but I didn't know how to tell Python to name each file according to the subfolder it came from, as the names are not a continuous series.
Any help is appreciated.
Thanks in advance
    import os
    for subdir, dirs, files in os.walk('./Back/'):
        for file in files:
            filepath = subdir+os.sep+file
            if filepath.endswith("rad.txt"):
                par_dir = os.path.split(os.path.dirname(filepath))[1]
                os.system('cp '+filepath+' ./Back/rads/rad_'+par_dir)
Save this Python file beside the Back directory and it should work fine!
The code iterates over every file in each subdirectory of Back, picks out the files named rad.txt, appends the name of the parent directory, and copies the file to the rads folder.
Note: I saved the rad files with a .txt extension; change it to match yours.
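
As a variation, here is a sketch that avoids shelling out to cp, so it should also work on Windows and with paths that contain spaces. It uses shutil.copy2 and pathlib instead. The function name collect_rads and its parameters are hypothetical, and it assumes the file is called exactly "rad" with no extension, as in the question.

```python
import shutil
from pathlib import Path


def collect_rads(back, filename="rad", dest_name="rads"):
    """Copy <back>/<sub>/<filename> to <back>/<dest_name>/rad_<sub>.

    Returns the list of copies that were created.
    """
    back = Path(back)
    dest = back / dest_name
    dest.mkdir(exist_ok=True)
    copied = []
    for folder in sorted(back.iterdir()):
        # Only look at sub-folders, and never at the destination itself.
        if not folder.is_dir() or folder == dest:
            continue
        src = folder / filename
        if src.is_file():
            # The sub-folder name (A01, B02, ...) becomes the suffix.
            target = dest / f"rad_{folder.name}"
            shutil.copy2(src, target)
            copied.append(target)
    return copied


# Example call (assumes the Back folder is in the working directory):
# collect_rads("Back")
```

Because the new name is built from folder.name, it does not matter that the sub-folder names are not a continuous series.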

how to loop through certain directories in a directory structure in python?

I have the following directory structure
As you can see in the pics, there are a lot of .0 files in different directories. This directory structure exists for 36 folders (Human_C1 to Human_C36), and each Human_C[num] folder has a 1_image_contours folder, which has a contours folder with all the related .0 files.
These .0 files contain some coordinates (x, y). I wish to loop through all these files, take the data in them, and put it in an Excel sheet (I am using pandas for this).
The problem is: how do I loop through only this set of files and no others? (There can be .0 files in the contour_image folders as well.)
Thanks in advance
Since your structure is not recursive I would recommend this:
    import glob
    zero_files_list = glob.glob("spinux/generated/Human_C*/*/contours/*.0")
    for f in zero_files_list:
        print("do something with "+f)
Run it from the parent directory of spinux or you'll have no match!
It will expand the pattern for the fixed directory tree above, just as if you used ls or echo in a linux shell.
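
A small sketch of the same pattern wrapped in a function, so that it no longer depends on the current working directory. The function name find_contour_files and the root parameter are hypothetical. Since the pattern names the contours folder literally, .0 files in a contour_image folder are not matched. Reading the coordinates is left out here because the layout of the .0 files isn't shown.

```python
import glob
import os


def find_contour_files(root="."):
    """Return the .0 files in every Human_C*/<any folder>/contours below `root`.

    `root` is the directory that contains the spinux folder.
    """
    pattern = os.path.join(
        root, "spinux", "generated", "Human_C*", "*", "contours", "*.0"
    )
    # glob returns matches in arbitrary order, so sort for a stable result.
    return sorted(glob.glob(pattern))


# Example loop over the matches:
# for path in find_contour_files("."):
#     print("do something with " + path)
```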
