The purpose of this code is:
Read a csv file which contains a column for a list of file names
here is the csv file:
https://drive.google.com/open?id=0B5bJvxM9TZkhVGI5dkdLVzAyNTA
Then check a specific folder to see whether those files exist
If a file in the folder is not in the list, delete it
here is the code:
import pandas as pd
import os.path
data = pd.read_csv('data.csv')
names = data['title']
path = "C:\\Users\\Sayed\\Desktop\\Economic Data"
for file in os.listdir(path):
    os.path.exists(file)
    print(file)
    file = os.path.join(path, file)
    fileName = os.path.splitext(file)
    if fileName not in names:
        print('error')
        os.remove(file)
I modified the first code, and this is the new code. I get no error, but it simply deletes all the files in the directory.
os.chdir does not return anything, so assigning the result to path means that path has None, which causes the error.
Since you're using pandas, here's a little trick to speed this up using pd.Series.isin.
root = r"C:\Users\Sayed\Desktop\Economic Data"  # raw string so the backslashes are kept literally
files = os.listdir(root)
for f in data.loc[~data['title'].isin(files), 'title'].tolist():
    try:
        os.remove(os.path.join(root, f))
    except OSError:
        pass
Added a try-except check in accordance with EAFP (since I'm not doing an os.path.exists check here). Alternatively, you could add a filter based on existence using pd.Series.apply:
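# Check existence relative to root, not the current working directory
m = ~data['title'].isin(files) & data['title'].apply(lambda t: os.path.exists(os.path.join(root, t)))
for f in data.loc[m, 'title'].tolist():
    os.remove(os.path.join(root, f))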
Your path is the return value of the os.chdir() call, which is None.
You want to set path to the string representing the directory, so leave the chdir call out.
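For the goal stated in the question (delete the files in the folder that are not in the list), a minimal sketch could look like the following. It loops over the files on disk rather than over the CSV titles. It assumes the title column holds file names without their extension, which the question's use of os.path.splitext suggests but does not confirm. It also swaps pandas for the standard-library csv module so that it runs without third-party packages, and delete_unlisted is a hypothetical helper name.

```python
import csv
import os


def delete_unlisted(csv_path, folder, column="title"):
    """Delete files in `folder` whose name (minus extension) is not in the CSV column.

    Returns the sorted list of deleted file names.
    """
    # Assumption: the column holds file names WITHOUT their extension.
    with open(csv_path, newline="") as handle:
        wanted = {row[column].strip() for row in csv.DictReader(handle)}

    deleted = []
    for name in os.listdir(folder):
        full_path = os.path.join(folder, name)
        if not os.path.isfile(full_path):
            continue  # leave sub-folders alone
        stem, _ext = os.path.splitext(name)  # splitext returns a (stem, ext) tuple
        if stem not in wanted:
            os.remove(full_path)
            deleted.append(name)
    return sorted(deleted)
```

Try it on a copy of the folder first, since the deletion cannot be undone.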
I have one folder that contains 5 sub-folders.
Each sub-folder contains 'x.txt', 'y.txt' and 'z.txt' files, and this repeats in every sub-folder.
Now I need to read and print only the 'y.txt' file from each sub-folder.
My problem is that I'm unable to read and print the 'y.txt' files. Can you tell me how to solve this problem?
Below is the code I have written for reading the y.txt file:
import os, sys
import pandas as pd
file_path = ('/Users/Naga/Desktop/Python/Data')
for root, dirs, files in os.walk(file_path):
    for name in files:
        print(os.path.join(root, name))
        pd.read_csv('TextInformation.txt',delimiter=";", names = ['Name', 'Value'])
error :File TextInformation.txt does not exist: 'TextInformation.txt'
You could also try the following approach to fetch all y.txt files from your subdirectories:
import glob
import pandas as pd
# get all y.txt files from all subdirectories
all_files = glob.glob('/Users/Naga/Desktop/Python/Data/*/y.txt')
for file in all_files:
    # same delimiter as in the question's read_csv call
    data_from_this_file = pd.read_csv(file, sep=";", names = ['Name', 'Value'])
    # do something with the data
Subsequently, you can apply your code to all the files in the list all_files. The great thing about glob is that you can use wildcards (*). With them you don't need the names of the subdirectories (you can even use a wildcard within the file name, e.g. *y.txt). Also see the documentation on glob.
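If the y.txt files might sit more than one level below the main folder, glob's `**` pattern together with recursive=True covers any depth. The following is a small sketch of that idea; find_named_files is a hypothetical helper name, and glob.escape is used so that special characters in the base folder name are not treated as wildcards.

```python
import glob
import os


def find_named_files(base_dir, filename="y.txt"):
    """Return sorted paths of every `filename` found at any depth under `base_dir`."""
    # "**" matches zero or more directory levels when recursive=True
    pattern = os.path.join(glob.escape(base_dir), "**", filename)
    return sorted(glob.glob(pattern, recursive=True))
```

Each returned path can then be passed to pd.read_csv in the same way as in the loop above.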
Your issue is that you forgot to add the parent path of the 'y.txt' file. I suggest this code; I hope it helps.
import os
pth = '/Users/Naga/Desktop/Python/Data'
list_sub = os.listdir(pth)
filename = 'TextInformation.txt'
for sub in list_sub:
    # Build the full path: parent folder + sub-folder + file name
    with open(os.path.join(pth, sub, filename), 'r') as f:
        TextInfo = f.read()
    print(TextInfo)
Here is a little code for you. You can adapt it any way you like, but it should work for you.
import os
path_to_main_folder = '/Users/Naga/Desktop/Python/Data'  # path taken from the question
for dirPath, foldersInDir, fileNames in os.walk(path_to_main_folder):
    for file in fileNames:  # an empty list simply skips this loop
        if file.endswith('y.txt'):
            loc = os.path.join(dirPath, file)
            with open(loc) as y_txt:
                y = y_txt.read()
            print(y)
But keep in mind that path_to_main_folder is the path of the folder that contains the subfolders.
I've tried to write some code which will rename some files in a folder - essentially, they're listed as xxx_(a).bmp whereas they need to be xxx_a.bmp, where a runs from 1 to 2000.
I've used the built-in os.rename function inside a loop to get the right numbers, but this gives me FileNotFoundError: [WinError 2] The system cannot find the file specified: 'Z:/AAA/BBB/xxx_(1).bmp' -> 'Z:/AAA/BBB/xxx_1.bmp'.
I've included the code I've written below if anyone could point me in the right direction. I've checked that I'm working in the right directory and it gives me the directory I'm expecting so I'm not sure why it can't find the files.
import os
n = 2000
folder = r"Z:/AAA/BBB/"
os.chdir(folder)
saved_path = os.getcwd()
print("CWD is" + saved_path)
for i in range(1,n):
    old_file = os.path.join(folder, "xxx_(" + str(i) + ").bmp")
    new_file = os.path.join(folder, "xxx_" +str(i)+ ".bmp")
    os.rename(old_file, new_file)
print('renamed files')
The problem is os.rename doesn't create a new directory if the new name is a filename in a directory that does not currently exist.
In order to create the directory first, you can do the following in Python3:
os.makedirs(dirname, exist_ok=True)
In this case dirname can contain created or not-yet-created subdirectories.
As an alternative, one may use os.renames, which handles new and intermediate directories.
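As a sketch of the makedirs-then-rename idea: the helper below creates the target's parent folders before calling os.rename. Note that os.rename also raises FileNotFoundError when the source file itself is missing, so the sketch checks for that first to tell the two cases apart. The name rename_creating_dirs is hypothetical.

```python
import os


def rename_creating_dirs(old_path, new_path):
    """Rename old_path to new_path, creating new_path's parent folders first."""
    if not os.path.exists(old_path):
        # os.rename would raise FileNotFoundError here as well; this message is clearer
        raise FileNotFoundError(f"source file is missing: {old_path}")
    parent = os.path.dirname(new_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    os.rename(old_path, new_path)
```

If the error message names the source file, printing os.listdir(folder) is a quick way to compare the real file names with the ones the loop builds.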
Try iterating files inside the directory and processing the files that meet your criteria.
from pathlib import Path
import re
folder = Path("Z:/AAA/BBB/")
for f in folder.iterdir():
    if '(' in f.name:
        new_name = f.stem.replace('(', '').replace(')', '')
        # using regex (raw string so the backslashes reach the regex engine)
        # new_name = re.sub(r'\(([^)]+)\)', r'\1', f.stem)
        extension = f.suffix
        new_path = f.with_name(new_name + extension)
        f.rename(new_path)
I have two folders with images; call them A and B. A contains 100 files and B has only 80. Corresponding files in the two folders have the same name. I want to save into folder C only the 80 files from A that have a counterpart in B.
Here is part of my code. However, it throws this error:
Required argument 'img' (pos 2) not found.
path1= '/home/vplab/Kitty/Saliency Dataset/PiCANet-Implementation/TrainSet/images'
path_mask= '/home/vplab/Kitty/Saliency Dataset/PiCANet-Implementation/TrainSet/masks'
save_path = '/home/vplab/Kitty/Saliency Dataset/PiCANet-Implementation/TrainSet/exp'
for file in os.listdir(path1):
    for file1 in os.listdir(path_mask):
        img_name = file[:-4]
        mask_name =file1[:-4]
        if img_name == mask_name:
            cv2.imwrite(os.path.join(save_path,img_name))
Your issue here is that cv2.imwrite(os.path.join(save_path,img_name)) is called with only a file name: the second argument, the image data (img), is missing when you try to perform the copy. That's what the error is telling you.
However, your current approach includes a nested for loop which will give poor performance. If you only want to know the files that the directories have in common, you can create a set of the file names in each directory and find the intersection. Then you just need to iterate through the common files and copy them over (as said in the comments, there's no need for cv2 here - they may be images but they're just regular files that can be copied).
import os
from shutil import copyfile
dir_1 = 'A'
dir_2 = 'B'
output_dir = 'C'
files_1 = os.listdir(dir_1)
files_2 = os.listdir(dir_2)
# Find the common files between both
common_files = set(files_1).intersection(files_2)
# Copy the common files over.
for file in common_files:
    copyfile(os.path.join(dir_1, file),
             os.path.join(output_dir, file))
If the reason that you are stripping the last characters from the files in os.listdir is because the files have the same name but different extensions, you only need to make two small modifications (where here I'm assuming the extension is .png that needs to be added back later):
files_1 = [item[:-4] for item in os.listdir(dir_1)]
files_2 = [item[:-4] for item in os.listdir(dir_2)]
And:
for file in common_files:
    file = file + '.png' # Add the extension back on to the file name
    copyfile(os.path.join(dir_1, file),
             os.path.join(output_dir, file))
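When the extensions differ between the two folders, or are not all four characters long, slicing with [:-4] becomes fragile. A hedged alternative is to compare the names with os.path.splitext and keep each source file's own extension. The sketch below assumes that matching files share the same name before the extension; copy_matching_by_stem is a hypothetical helper name.

```python
import os
import shutil


def copy_matching_by_stem(dir_1, dir_2, output_dir):
    """Copy files from dir_1 whose stem also appears in dir_2; return copied names."""
    # Set of names without extension found in the second folder
    stems_2 = {os.path.splitext(name)[0] for name in os.listdir(dir_2)}
    os.makedirs(output_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(dir_1)):
        source = os.path.join(dir_1, name)
        if not os.path.isfile(source):
            continue  # skip sub-folders
        stem, _ext = os.path.splitext(name)
        if stem in stems_2:
            shutil.copyfile(source, os.path.join(output_dir, name))
            copied.append(name)
    return copied
```

Because the set lookup replaces the nested loop, each folder is listed only once.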
The built-in any() function returns True if any element of an iterable is true; otherwise it returns False. shutil.copy copies the file src to the file or directory dst.
import os
import shutil
def read_file(folderName,folderPath):
    ''' Return the list of file names in a folder '''
    path = folderPath+folderName
    return [file for file in os.listdir(path)]
def save_file(sourceFolderName,destFolderName,folderPath,fileName):
    ''' Copy a file into the destination folder '''
    try:
        source_path = folderPath+sourceFolderName+"/"+fileName
        dest_path = folderPath+destFolderName+"/"+fileName
        shutil.copy(source_path, dest_path)
    except Exception as e:
        print(e)
base_path = '/home/vplab/Kitty/Saliency Dataset/PiCANet-Implementation/TrainSet/'
folder_images_files = read_file('images',base_path)
folder_masks_file = read_file('masks',base_path)
for file_1 in folder_images_files:
    # Check whether the file from folder A also exists in folder B
    if any(file_1 == file_2 for file_2 in folder_masks_file):
        save_file("images","exp",base_path,file_1)
I have been working this challenge for about a day or so. I've looked at multiple questions and answers asked on SO and tried to 'MacGyver' the code used for my purpose, but still having issues.
I have a directory (let's call it "src\") with hundreds of files (.txt and .xml). Each .txt file has an associated .xml file (let's call them a pair). Example:
src\text-001.txt
src\text-001.xml
src\text-002.txt
src\text-002.xml
src\text-003.txt
src\text-003.xml
Here's an example of how I would like it to turn out so each pair of files are placed into a single unique folder:
src\text-001\text-001.txt
src\text-001\text-001.xml
src\text-002\text-002.txt
src\text-002\text-002.xml
src\text-003\text-003.txt
src\text-003\text-003.xml
What I'd like to do is create an associated folder for each pair and then move each pair of files into its respective folder using Python. I've already tried working from code I found (thanks to a post from Nov '12 by Sethdd), but am having trouble figuring out how to use the move function to grab pairs of files. Here's where I'm at:
import os
import shutil
srcpath = "PATH_TO_SOURCE"
srcfiles = os.listdir(srcpath)
destpath = "PATH_TO_DEST"
# grabs the name of the file before extension and uses as the dest folder name
destdirs = list(set([filename[0:9] for filename in srcfiles]))
def create(dirname, destpath):
    full_path = os.path.join(destpath, dirname)
    os.mkdir(full_path)
    return full_path
def move(filename, dirpath):
    shutil.move(os.path.join(srcpath, filename)
                ,dirpath)
# create destination directories and store their names along with full paths
targets = [
    (folder, create(folder, destpath)) for folder in destdirs
]
for dirname, full_path in targets:
    for filename in srcfile:
        if dirname == filename[0:9]:
            move(filename, full_path)
I feel like it should be easy, but Python isn't something I work with every day and it's been a while since my scripting days... Any help would be greatly appreciated!
Thanks,
WK2EcoD
Use the glob module to iterate over all of the 'txt' files. From that you can parse the names, create the folders and copy the files.
The process should be as simple as it appears to you as a human.
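for file_name in os.listdir(srcpath):
    src = os.path.join(srcpath, file_name)
    if not os.path.isfile(src):
        continue  # skip folders, including ones created on an earlier run
    dir_path = os.path.join(destpath, file_name[:9])
    os.makedirs(dir_path, exist_ok=True)  # if dir doesn't exist, create it
    shutil.move(src, dir_path)  # move file_name to dir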
You're doing a lot of intermediate work that seems to be confusing you.
Also, insert some simple print statements to track data flow and execution flow. It appears that you have no tracing output so far.
You can do it with the os module. For every file in the directory, check whether the associated folder exists, create it if needed and then move the file. See the code below:
import os
SRC = 'path-to-src'
for fname in os.listdir(SRC):
    filename, file_extension = os.path.splitext(fname)
    # os.path.splitext keeps the leading dot in the extension
    if file_extension not in ['.xml', '.txt']:
        continue
    folder_path = os.path.join(SRC, filename)
    if not os.path.exists(folder_path):
        os.mkdir(folder_path)
    os.rename(
        os.path.join(SRC, fname),
        os.path.join(folder_path, fname)
    )
My approach would be:
Find the pairs that I want to move (do nothing with files without a pair)
Create a directory for every pair
Move the pair to the directory
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import os, shutil
import re
def getPairs(files):
    pairs = []
    file_re = re.compile(r'^(.*)\.(.*)$')
    for f in files:
        match = file_re.match(f)
        if match:
            (name, ext) = match.groups()
            if ext == 'txt' and name + '.xml' in files:
                pairs.append(name)
    return pairs
def movePairsToDir(pairs):
    for name in pairs:
        os.mkdir(name)
        shutil.move(name+'.txt', name)
        shutil.move(name+'.xml', name)
files = os.listdir()
pairs = getPairs(files)
movePairsToDir(pairs)
NOTE: This script works when called inside the directory with the pairs.
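To lift the restriction in the note, the same three steps can take the source directory as an argument instead of relying on the current working directory. This is only a sketch using pathlib; move_pairs is a hypothetical name, and files without a partner are left where they are, as in the approach above.

```python
from pathlib import Path
import shutil


def move_pairs(src_dir):
    """Move each <name>.txt/<name>.xml pair in src_dir into src_dir/<name>/."""
    src = Path(src_dir)
    moved = []
    # sorted() materialises the listing before any file is moved
    for txt_file in sorted(src.glob("*.txt")):
        xml_file = txt_file.with_suffix(".xml")
        if not xml_file.is_file():
            continue  # no partner, leave the file where it is
        target = src / txt_file.stem
        target.mkdir(exist_ok=True)
        shutil.move(str(txt_file), str(target / txt_file.name))
        shutil.move(str(xml_file), str(target / xml_file.name))
        moved.append(txt_file.stem)
    return moved
```

The return value lists the pair names that were moved, which doubles as simple tracing output.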
Basically, the problem I'm having is trying to open multiple files in a for loop. The filename has this format:
filename = 'mms1_fgm_srvy_l2_20160104_v4.18.0.cdf'
Here '20160104' is the date, which I know how to update in the loop. The problem is that the '18' near the end isn't constant for every file, and unlike the dates, I don't know how it changes. I was wondering if there is a way to update the number and check whether the file exists in my directory.
As always, any help would be greatly appreciated. Thanks.
You can use the glob.glob() function with a suitable filename pattern to get a list of files (that exist) which match the pattern.
For example:
import glob
pattern = 'mms1_fgm_srvy_l2_*_v4.*.0.cdf'
for filename in glob.glob(pattern):
    with open(filename) as file:
        process(file)
import os
BASE_NAME = 'mms1_fgm_srvy_l2_20160104_v4.{}.0'
EXT = '.cdf'
attempts = int(input('Check file up to: '))
for num in range(attempts):
    file_name = BASE_NAME.format(num) + EXT
    if os.path.isfile(file_name):
        # open file here
        print("Opened File")
    else:
        print("File does not exist")
This checks whether the file exists. If it does, you can load it and save it however you want; otherwise it prints that the file doesn't exist.
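Since the date is already known inside the loop, the two ideas can be combined: fix the date and let a wildcard stand in for the unknown number. The sketch below assumes, as in the question, that only the middle number of the version varies. The name find_versions and the default template are illustrative and not part of any library.

```python
import glob
import os


def find_versions(directory, date, template="mms1_fgm_srvy_l2_{date}_v4.*.0.cdf"):
    """Return sorted paths of files for `date`, whatever the middle version number is."""
    # glob.escape keeps special characters in the directory name from acting as wildcards
    pattern = os.path.join(glob.escape(directory), template.format(date=date))
    return sorted(glob.glob(pattern))
```

An empty list means no file exists for that date, so no separate existence check is needed.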