NotADirectoryError when converting multiple xml files to .csv - python

CODE
import os
import pandas as pd
df = pd.DataFrame()
xml_file_path = "/Users/ruzi/Desktop/top1000_complete 2"
csv_file_path = "/Users/ruzi/Desktop/xml.csv"
if os.path.isdir(xml_file_path):
    for e in os.listdir(xml_file_path):
        new_path = xml_file_path + "/" + str(e)
        if str(e) != '.DS_Store' and os.path.isdir(new_path):
            for e1 in os.listdir(new_path):
                next_new_path = new_path + "/" + str(e1)
                if str(e1) != '.DS_Store' and os.path.isfile(next_new_path):
                    for e2 in os.listdir(next_new_path):
                        third_new_path = new_path + "/" + str(e1)
                        if str(e2) != '.DS_Store' and os.path.isfile(third_new_path):
                            data_frame = pd.read_xml(third_new_path)
                            df=df.append(data_frame)
                            data_frame = pd.DataFrame()
# Convert Into CSV
df.to_csv(csv_file_path, index=None)
ERROR MESSAGE
/Library/Frameworks/Python.framework/Versions/3.8/bin/python3 /Users/ruzi/Documents/pythonProject/main.py
Traceback (most recent call last):
  File "/Users/ruzi/Documents/pythonProject/main.py", line 14, in <module>
    for e2 in os.listdir(next_new_path):
NotADirectoryError: [Errno 20] Not a directory: '/Users/ruzi/Desktop/top1000_complete 2/P04-3022/citing_sentences_annotated.json'
Process finished with exit code 1
[FILE LOCATION][1]
[FOLDER 1][2]
[FOLDER 2][3]
##images
[1]: https://i.stack.imgur.com/UEeki.png
[2]: https://i.stack.imgur.com/2CmnR.png
[3]: https://i.stack.imgur.com/g5fVU.png

for e1 in os.listdir(new_path):
    next_new_path = new_path + "/" + str(e1)
    if str(e1) != '.DS_Store' and os.path.isfile(next_new_path):
        # At this point, next_new_path is a file, not a dir
        for e2 in os.listdir(next_new_path):
Consider this file tree:
dir1/
    subdir1/
    subdir2/
    file1
    file2
e1 is any entry in new_path, file or directory (e.g. subdir1 or file1).
next_new_path is the same entry with its parent prepended (e.g. dir1/subdir1 or dir1/file1).
You then check that next_new_path is a file (not a directory), so you exclude dir1/subdir1 and keep only dir1/file1.
Finally you call listdir on it, which fails because it is a file, and that is exactly what the error message says.
In 2021, I'd recommend using pathlib rather than os.path.
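As a rough sketch of the pathlib suggestion (not the asker's final code), the nested loops could collapse into a single recursive glob. The helper name `find_xml_files` is made up for illustration, and it assumes the files of interest end in `.xml`:

```python
from pathlib import Path


def find_xml_files(root):
    """Return every *.xml file below root, at any depth, in sorted order."""
    root = Path(root)
    if not root.is_dir():
        return []
    # rglob only yields paths that match the pattern, so .DS_Store and
    # .json files are skipped without an explicit check; is_file() guards
    # against a directory that happens to be named like "something.xml".
    return sorted(p for p in root.rglob("*.xml") if p.is_file())


if __name__ == "__main__":
    for xml_path in find_xml_files("/Users/ruzi/Desktop/top1000_complete 2"):
        print(xml_path)
```

Each returned path could then be passed to `pd.read_xml`, with the resulting frames combined in one `pd.concat` call, since `DataFrame.append` was removed in pandas 2.0.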

Related

How am I getting this error when looping through files in a directory? "OSError: [Errno 2] No such file or directory"

I have thousands of files I need to delete in a directory. I want to keep the first ten (alphabetically/numerically) that match the conditions. For example, I want to keep 'part-of-file-name-abc00000.filetype' but not 'part-of-file-name-abc42422.filetype'. Below is the code I'm using to do so:
import os
i = 0
for f in os.listdir('/dir/dir'):
    if 'part-of-file-name' in f:
        i = i + 1
        if i > 10:
            os.remove(f)
    else:
        os.remove(f)
print("Files found: " + str(i))
print("Files removed: " + str(i - 10))
This is the error I'm getting:
  File "delete_data_files.py", line 11, in <module>
    os.remove(f)
OSError: [Errno 2] No such file or directory: 'part-of-file-name-i-want-then-other-parts.filetype'
This makes no sense to me. The file obviously exists; otherwise, I would not be reading the entire file name in the error.
import os
i = 0
path = './dir/dir'
for f in os.listdir(path):
    print("file path", f)
    f = os.path.join(path, f)  # full path, so os.remove can find the file
    if 'part-of-file-name' in f:
        i = i + 1
        if i > 10:
            os.remove(f)
    else:
        os.remove(f)
print("Files found: " + str(i))
print("Files removed: " + str(i - 10))
Output:
file path part-of-file-name.txt
Files found: 1
Files removed: -9
You are providing file name only. You need to provide full path for removing the files.
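A minimal sketch of that fix, assuming the goal is to keep the first ten matching files in sorted order and delete only the other matching files (files that do not match are left alone here, unlike in the question's code). The function name `prune_matching` and its parameters are illustrative:

```python
import os


def prune_matching(directory, marker, keep=10):
    """Delete all but the first `keep` files whose name contains `marker`."""
    # os.listdir returns bare names, so sort them and rebuild full paths.
    matches = sorted(name for name in os.listdir(directory) if marker in name)
    removed = []
    for name in matches[keep:]:
        full_path = os.path.join(directory, name)  # a path os.remove can resolve
        if os.path.isfile(full_path):
            os.remove(full_path)
            removed.append(name)
    return removed
```

Returning the removed names also makes the "Files removed" count straightforward: it is `len(removed)` rather than `i - 10`, which goes negative when fewer than ten files match.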
Check that your directory path is correct. A relative path such as
os.listdir('./dir/dir')
only works if the directories sit inside the folder you run the script from. You can check whether the path exists with
import os
path = './dir/dir'
print(os.path.exists(path))
# True
# True means the path exists; False means the path points to a location that does not exist

I have to run my code twice to get desired outcome, why?

I'm finding that I have to run my code twice for the desired output and I'm not sure why. It's also printing a long string of letters in the shell that aren't needed. I'd just like it to be a bit cleaner.
The code creates folders with subfolders, based on files names, then moves the files into specific subfolders.
Filename example is "A123456-20190101-A01.mp3"
import os
import shutil
path = "/Volumes/ADATA UFD/For script/Files"
file_names = [file for file in os.listdir(path) if
              os.path.isfile(os.path.join(path, file))]
file_map = {'O':'1-Original','P':'2-PreservationMaster','M':'3-Mezzanine','T':'4-Presentation','A':'5-Access','R':'6-Reference'}
parent_folders = set(file_name.rsplit('-', 1)[0] for file_name in file_names)
sub_folders = ['1-Original','2-PreservationMaster','3-Mezzanine','4-Presentation','5-Access','6-Reference']
for folder in parent_folders:
    folder_path = os.path.join(path, folder)
    try:
        os.mkdir(folder_path)
    except:
        print('folder already exist:', folder_path)
    for folders in sub_folders:
        try:
            folders_path = os.path.join(folder_path, folders)
            os.mkdir(folders_path)
        except:
            print('folder already exists:', folders_path)
for file_name in file_names:
    parent_folder = file_name.rsplit('-', 1)[0]
    ext = file_name[19]
    print(ext)
    dest = os.path.join(path, parent_folder, file_map[ext.upper()], file_name)
    src = os.path.join(path, file_name)
    try:
        shutil.move(src, dest)
    except Exception as e:
        print(e)
I'm getting this error message:
Traceback (most recent call last):
  File "/Volumes/ADATA UFD/For script/MoveFilesToPreservationBundleTest3.py", line 30, in <module>
    dest = os.path.join(path, parent_folder, file_map[ext.upper()], file_name)
builtins.KeyError: '0'
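The `KeyError: '0'` shows that `file_name[19]` picked up a digit rather than one of the letters in `file_map`, which can happen whenever a name is longer or shorter than expected (in the example filename, index 19 is the last digit of `A01`). One possible way around the fixed index, sketched under the assumption that the type letter is always the first character after the last hyphen; `type_folder` is a hypothetical helper, not part of the original script:

```python
FILE_MAP = {
    'O': '1-Original', 'P': '2-PreservationMaster', 'M': '3-Mezzanine',
    'T': '4-Presentation', 'A': '5-Access', 'R': '6-Reference',
}


def type_folder(file_name):
    """Return the subfolder for file_name, or None if the letter is unknown."""
    if '-' not in file_name:
        return None
    # "A123456-20190101-A01.mp3" -> last part "A01.mp3" -> letter "A"
    last_part = file_name.rsplit('-', 1)[1]
    return FILE_MAP.get(last_part[:1].upper())


print(type_folder("A123456-20190101-A01.mp3"))  # 5-Access
```

Returning `None` lets the calling loop skip files such as `.DS_Store` instead of stopping with an exception.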

How do I rename multiple files in Python, using part of the existing name?

I have a few hundred .mp4 files in a directory. When originally created their names were set as "ExampleEventName - Day 1", "ExampleEventName - Day 2" etc. thus they are not in chronological order.
I need a script to modify each of their names by taking the last 5 characters in the corresponding string and add it to the front of the name so that File Explorer will arrange them properly.
I tried using the os module .listdir() and .rename() functions, inside a for loop. Depending on my input I get either a FileNotFoundError or a TypeError:List object is not callable.
import os
os.chdir("E:\\New folder(3)\\New folder\\New folder")
for i in os.listdir("E:\\New folder(3)\\New folder\\New folder"):
    os.rename(i, i[:5] +i)
Traceback (most recent call last):
  File "C:/Python Projects/Alex_I/venv/Alex_OS.py", line 15, in <module>
    os.rename(path + i, path + i[:6] +i)
FileNotFoundError: [WinError 2] The system cannot find the file specified:
import os, shutil
file_list = os.listdir("E:\\New folder(3)\\New folder\\New folder")
for file_name in file_list("E:\\New folder(3)\\New folder\\New folder"):
    dst = "!#" + " " + str(file_name) #!# meant as an experiment
    src = "E:\\New folder(3)\\New folder\\New folder" + file_name
    dst = "E:\\New folder(3)\\New folder\\New folder" + file_name
    os.rename(src, dst)
    file_name +=1
Traceback (most recent call last):
  File "C:/Python Projects/Alex_I/venv/Alex_OS.py", line 14, in <module>
    for file_name in file_list("E:\\New folder(3)\\New folder\\New folder"):
TypeError: 'list' object is not callable
Another approach, not based on a fixed length (5 characters for the suffix):
import glob
import os
# For testing, I created dummy files. Taking the last 5 characters breaks once day numbers reach two digits.
# for i in range(1, 99):
#     with open("mymusic/ExampleEventName - Day {}.mp4".format(i), "w+") as f:
#         f.flush()
# Instead, split the name at the "-" in "- Day X"
files = sorted(glob.glob("mymusic/*"))
for mp4 in files:
    # split the path into head (directory) and tail (filename)
    head, tail = os.path.split(mp4)
    basename, ext = os.path.splitext(tail)
    print(head, tail, basename)
    num = [int(s) for s in basename.split() if s.isdigit()][0]  # extract the day number
    # drop "- Day X" and build the new filename with the number in front
    newfile = os.path.join(head, "{}{}{}".format(num, basename.rsplit("-")[0][:-1], ext))
    print(newfile)
    os.rename(mp4, newfile)
You have several problems:
You're trying to increment a value that should not be incremented. Also, file_list is already a list, so it cannot be called with arguments.
When you use the syntax
for x in y:
you do not have to increment anything. The loop simply iterates through the list until no items are left.
So leave out the increment and iterate over file_list directly.
import os
folder = "E:\\New folder(3)\\New folder\\New folder"
file_list = os.listdir(folder)
for file_name in file_list:  # file_list is a list, so iterate over it instead of calling it
    src = os.path.join(folder, file_name)  # join adds the missing path separator
    dst = os.path.join(folder, "!# " + file_name)  # "!#" prefix meant as an experiment
    os.rename(src, dst)
    # file_name += 1 removed: the for loop advances on its own
Now your solution should work.
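For the original goal of sorting by day, here is a sketch that avoids fixed-length slices: it pulls the day number out with a regular expression and puts it, zero-padded, at the front of the name. The helper `day_prefixed_name`, the `Day NNN - ` prefix format, and the padding width are all assumptions for illustration:

```python
import re


def day_prefixed_name(file_name, width=3):
    """Return file_name prefixed with its zero-padded day number."""
    match = re.search(r"Day (\d+)", file_name)
    if match is None:
        return file_name  # no day number found, leave the name unchanged
    day = match.group(1).zfill(width)  # "2" -> "002" so names sort in order
    return "Day {} - {}".format(day, file_name)


print(day_prefixed_name("ExampleEventName - Day 2.mp4"))
# Day 002 - ExampleEventName - Day 2.mp4
```

The result can be passed to `os.rename`, using `os.path.join(folder, ...)` to build both the source and the destination path.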

Rename duplicate files using Python

I'm trying to write a Python script that renames all duplicate file names recursively (i.e. inside all directories)
I already searched the web and Stack Overflow but I couldn't find any answer...
Here's my code:
#!/usr/bin/env python3.6
import os
import glob
path = os.getcwd()
file_list = []
duplicates={}
# remove filename duplicates
for file_path in glob.glob(path + "/**/*.c", recursive=True):
    file = file_path.rsplit('/', 1)[1]
    if file not in file_list:
        file_list.append(file)
    else:
        if file in duplicates.keys():
            duplicates[file] += 1
            lista = []
            lista.append(file_path)
            os.rename(file_path, file_path.rsplit('/', 1)[:-1] + '/' + str(duplicates[file]) + file)
        else:
            duplicates[file] = 1
            os.rename(file_path, file_path.rsplit('/', 1)[:-1] + '/' + str(duplicates[file]) + file)
And this is the error I'm getting:
Traceback (most recent call last):
  File "/home/andre/Development/scripts/removeDuplicates.py", line 22, in <module>
    os.rename(file_path, file_path.rsplit('/', 1)[:-1] + '/' + str(duplicates[file]) + file)
TypeError: can only concatenate list (not "str") to list
I know why I'm getting this error, but my question is: Is there a more clever way to do this? I'd also like to rename all duplicate directory names, but I still didn't figure it out...
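One possible tidier approach, sketched with pathlib and a counter per file name. It assumes, as in the question, that each duplicate gets a running number prefixed to its name; `rename_duplicates` is an illustrative name, and a file is skipped if the prefixed name is already taken. Duplicate directory names are not handled here.

```python
from collections import defaultdict
from pathlib import Path


def rename_duplicates(root, pattern="*.c"):
    """Prefix repeated file names below root with a running number."""
    seen = defaultdict(int)  # file name -> how many times it has appeared
    renamed = []
    # sorted() collects all paths first, so renaming cannot disturb the walk.
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        count = seen[path.name]
        seen[path.name] += 1
        if count == 0:
            continue  # the first occurrence keeps its name
        target = path.with_name("{}{}".format(count, path.name))
        if not target.exists():
            path.rename(target)
            renamed.append(target)
    return renamed
```

Using `path.name` and `path.with_name` replaces the `rsplit('/', 1)` slicing, which is where the list-plus-string `TypeError` came from.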

Python: Print file names and their directory based on their file size

I want to print filenames and their directory if their filesize is more than a certain amount. I wrote one and set the bar 1KB, but it doesn't work even if there are plenty of files larger than 1KB.
import os, shutil
def deleteFiles(folder):
    folder = os.path.abspath(folder)
    for foldername, subfolders, filenames in os.walk(folder):
        for filename in filenames:
            if os.path.getsize(filename) > 1000:
                print(filename + ' is inside: ' + foldername)
deleteFiles('C:\\Cyber\\Downloads')
And I got nothing!
Then I ran the code in the interactive shell and got the following error:
Traceback (most recent call last):
  File "<pyshell#14>", line 3, in <module>
    if os.path.getsize(filename) > 100:
  File "C:\Users\Cyber\Downloads\lib\genericpath.py", line 50, in getsize
    return os.stat(filename).st_size
FileNotFoundError:
How can I fix my code?
os.walk yields bare file names, and os cannot find a file from its name alone unless it is in the current working directory, so you have to rebuild the full path. Replace
if os.path.getsize(filename) > 1000:
with
if os.path.getsize(os.path.join(foldername, filename)) > 1000:
and it should work.
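Put together, the corrected function might look like the sketch below. It returns the matches instead of printing them so the result is easy to check; the name `find_large_files` and the `min_size` parameter are illustrative, not from the question:

```python
import os


def find_large_files(folder, min_size=1000):
    """Return (filename, foldername) pairs for files larger than min_size bytes."""
    results = []
    for foldername, subfolders, filenames in os.walk(os.path.abspath(folder)):
        for filename in filenames:
            # os.walk yields bare names; join with foldername to get a real path.
            full_path = os.path.join(foldername, filename)
            if os.path.getsize(full_path) > min_size:
                results.append((filename, foldername))
    return results
```

Printing each pair as `filename + ' is inside: ' + foldername` reproduces the output the question was after.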
Replace:
deleteFiles('C:\\Cyber\\Downloads')
with
import os
a = 'c:' # removed slash
b = 'Cyber' # removed slash
c = 'Downloads'
path = os.path.join(a + os.sep, b, c)
deleteFiles(path)
