I have about 500 Excel files named data_1, data_2, ... data_500.
However, not all of the files are there; for example, data_3 is not in the folder.
I want to import all available files into a DataFrame.
However, my code below stops when it hits a file name that is not in the folder, say data_3.
Can you please help me skip these missing files?
Thank you,
HN
import pandas as pd
for i in range(500):
    filename = 'data_' + str(i) + '.xlsx'
    output = pd.read_excel('PATH' + filename)
The key is to check whether the full path is in the list returned by glob.glob:
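import glob
import os
import pandas as pd
folder = r'D:\Python...'  # folder path as elided in the original post
# list the folder once instead of calling glob.glob on every iteration
existing = set(glob.glob(os.path.join(folder, '*')))
for i in xlx_file_list:  # the file numbers to try, e.g. range(1, 501)
    filename = 'Excel_Sample' + str(i) + '.xlsx'
    full_path = os.path.join(folder, filename)
    if full_path not in existing:
        print(filename, 'not in folder')
        continue
    outfile = pd.read_excel(full_path, sheet_name='data_sheet')
    print(outfile)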
Hi, in your sample PATH is probably meant to be a variable, not a string; 'PATH' + filename cannot work.
I suggest using os.path.join() to compose file paths rather than string concatenation.
There are two ways to solve this problem:
Generate all names and see if the file exists:
import os
import pandas as pd
for i in range(1, 501):  # data_1 ... data_500
    filename = 'data_' + str(i) + '.xlsx'
    if os.path.exists(filename):
        output = pd.read_excel(filename)
or generate only the correct filename list:
import glob
import pandas as pd
for filename in glob.glob('data_*.xlsx'):
    output = pd.read_excel(filename)
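Both loops reassign output on every pass, so only the last file is kept. A minimal sketch of collecting the available files first, assuming they all sit in one folder and are named data_<n>.xlsx (the helper name available_data_files is hypothetical). It also sorts by the number, because glob returns names in arbitrary order and plain string sorting would put data_10 before data_2:

```python
import glob
import os
import re


def available_data_files(folder, prefix="data_", ext=".xlsx"):
    """Return the paths of prefix<n>ext files present in folder, sorted by n."""
    pattern = os.path.join(glob.escape(folder), prefix + "*" + ext)
    # Only accept names of the exact form data_<digits>.xlsx
    name_re = re.compile(re.escape(prefix) + r"(\d+)" + re.escape(ext))
    numbered = []
    for path in glob.glob(pattern):
        match = name_re.fullmatch(os.path.basename(path))
        if match:  # skips names such as data_backup.xlsx
            numbered.append((int(match.group(1)), path))
    # Sort numerically so data_2 comes before data_10
    return [path for _, path in sorted(numbered)]
```

Missing numbers such as data_3 simply never appear in the list. To end up with one DataFrame, you could then read and stack the files, e.g. df = pd.concat([pd.read_excel(p) for p in available_data_files(folder)], ignore_index=True), where folder is your own directory.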
My initial code is here:
import pandas as pd
import os
directory_in_str = input('\n\nEnter the name of the folder you would like to use. If there are spaces, replace with underscores: ')
directory_in_str = directory_in_str.strip()  # str.strip() returns a new string; it does not modify in place
directory = os.fsencode(directory_in_str)
user = input('\nEnter your first initial and last name as one word (ex: username): ')
user = user.strip()
path1 = '/Users/'
path2 = '/Desktop/DataScience/'
dspath = path1 + user + path2
slash = '/'
for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith(".xls") or filename.endswith(".xlsx"):
        print(directory)
        pathname = dspath + directory_in_str + slash + filename
        print(filename)
        #Global = pd.read_excel(pathname, sheet_name=0)
        Stats = pd.read_excel(pathname, sheet_name=1)
        listorder = ['1', '2', '3']
        Stats = Stats.reindex(columns=listorder)
        Stats.to_excel(filename, sheet_name='Statistics', index=False)
        continue
    else:
        continue
I've included the print(filename) statement to ensure that the correct path is being used. However, the print statements run twice.
This is the output:
b'testrearrange'
Testname.xlsx
b'testrearrange'
~$Testname.xlsx
Why are the two characters '~$' added? The error originates from the line
Stats = pd.read_excel(pathname, sheet_name=1)
with the error
ValueError: File is not a recognized excel file
Does anyone know how to fix this?
I think the files starting with "~$" are temporary files that Excel creates when you open a workbook. One option is to close the file in Excel, which deletes the temporary file. The other option is to change the way you list the files to be read so that it ignores files starting with ~. I like to use glob for this:
from glob import glob
path = "C:/Users/Wolf/[!~]*.xls*"
files = glob(path)
for file in files:
    print("Do your thing here")
I've got a bunch of files in a folder, all with the same extension:
[movie] Happy feet 2021-04-01 22-00-00.avi
[movie] Dogs Life 2021-06-01 22-03-50.avi
etc
I would like to rename them by removing the start and the end, so they become:
Happy feet.avi
Dogs Life.avi
etc
Here is what I have so far:
import shutil
import os
source_dir = r"C:\\Users\\ED\\Desktop\\testsource"
target_dir = r"C:\\Users\\ED\\Desktop\\testdest"
data = os.listdir(source_dir)
print(data)
new_data = [file[8:].split(' 2021')[0] + '.txt' for file in data]
print(new_data)
for file in data:
    os.replace(data, [file[8:].split(' 2021')[0] + '.txt')
for file_name in data:
    shutil.move(os.path.join(source_dir, file_name), target_dir)
I'm having trouble with the renaming step (the os.replace() call) after printing the new names.
You can use (tested):
from glob import glob
import re
import shutil
import os
src = "C:/Users/ED/Desktop/testsource"
dst = "C:/Users/ED/Desktop/testdest"
for f in glob(f"{src}/**"):
    fn = os.path.basename(f)
    new_fn = re.sub(r"^\[.*?\](.*?) \d{4}-\d{2}-\d{2} \d{2}-\d{2}-\d{2}(\..*?)$", r"\1\2", fn).strip()
    shutil.move(f, f"{dst}/{new_fn}")
For Python 2 (which has no f-strings):
for f in glob(src+"/**"):
    fn = os.path.basename(f)
    new_fn = re.sub(r"^\[.*?\](.*?) \d{4}-\d{2}-\d{2} \d{2}-\d{2}-\d{2}(\..*?)$", r"\1\2", fn).strip()
    shutil.move(f, dst+"/"+new_fn)
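The renaming rule can be checked on its own before anything is moved. This is a sketch of a slightly stricter variant of the regex above, with named groups and comments; clean_name and NAME_RE are hypothetical names. Names that do not fit the pattern are returned unchanged:

```python
import re

# "[tag] Title YYYY-MM-DD HH-MM-SS.ext" -> "Title.ext"
NAME_RE = re.compile(
    r"^\[[^\]]*\]\s*"                          # leading "[movie] "
    r"(?P<title>.*?)"                          # the title to keep
    r"\s+\d{4}-\d{2}-\d{2} \d{2}-\d{2}-\d{2}"  # " 2021-04-01 22-00-00"
    r"(?P<ext>\.[^.]+)$"                       # ".avi"
)


def clean_name(filename):
    """Return the shortened name, or filename unchanged if it doesn't fit the pattern."""
    match = NAME_RE.match(filename)
    if match is None:
        return filename
    return match.group("title") + match.group("ext")


print(clean_name("[movie] Happy feet 2021-04-01 22-00-00.avi"))  # Happy feet.avi
print(clean_name("[movie] Dogs Life 2021-06-01 22-03-50.avi"))   # Dogs Life.avi
```

Running it over os.listdir(src) and printing old and new names is a cheap dry run before calling shutil.move.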
If you're using a raw string (r"..."), you don't need to escape the backslashes: in r"C:\\Users" both backslashes are kept, so the path ends up with doubled separators.
So it should be:
source_dir = r"C:\Users\ED\Desktop\testsource"
target_dir = r"C:\Users\ED\Desktop\testdest"
So, there are some issues with the code.
In the for loop with os.replace(), you are passing the entire data list as src instead of a single filename.
Instead, I use file, the current item of the loop.
Also, in os.replace(data, [file[8:].split(' 2021')[0] + '.txt'), the destination is wrapped in a list (whose bracket is never closed), but os.replace() expects a path string, so I changed it to a string.
One last thing: you need to pass full file paths to os.replace() unless the files are in the current working directory.
I didn't touch the shutil.move() call. Let me know if this works.
import os
source_dir = r"C:\Users\ED\Desktop\testsource"
target_dir = r"C:\Users\ED\Desktop\testdest"
data = os.listdir(source_dir)
print(data)
# drop the 8-character "[movie] " prefix and everything from " 2021" on, but keep the original extension (.avi)
new_data = [file[8:].split(' 2021')[0] + os.path.splitext(file)[1] for file in data]
print(new_data)
for file, new_name in zip(data, new_data):
    # pass full paths: bare names would be resolved against the current working directory
    os.replace(os.path.join(source_dir, file), os.path.join(source_dir, new_name))
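The asker also wanted the files to end up in target_dir. A minimal sketch that renames and moves in one step with shutil.move, using the answer's slicing rule (it assumes every name starts with the 8-character "[movie] " prefix and contains " 2021"), keeps the original extension, and refuses to overwrite a file that already exists in the target folder. rename_and_move and its parameters are hypothetical:

```python
import os
import shutil


def rename_and_move(source_dir, target_dir, prefix_len=8, cut=" 2021"):
    """Move each file from source_dir to target_dir under its shortened name.

    Drops the first prefix_len characters ("[movie] " is 8) and everything
    from `cut` onward, then re-attaches the original extension.
    Files already present in target_dir are never overwritten.
    """
    os.makedirs(target_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, name)
        if not os.path.isfile(src):
            continue  # skip sub-folders
        stem, ext = os.path.splitext(name)
        new_name = stem[prefix_len:].split(cut)[0] + ext
        dst = os.path.join(target_dir, new_name)
        if os.path.exists(dst):
            print("skipping", name, "->", new_name, "(target exists)")
            continue
        shutil.move(src, dst)
        moved.append(new_name)
    return moved
```

Checking for an existing target matters because os.replace() and shutil.move() silently overwrite an existing file on most platforms.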
I have a folder with images that are currently named with timestamps. I want to rename all the images in the directory so they are named 'captured(x).jpg' where x is the image number in the directory.
I have been trying to implement different suggestions from this site and others, with no luck. Here is my code:
import os
path = '/home/pi/images/'
i = 0
for filename in os.listdir(path):
    os.rename(filename, 'captured' + str(i) + '.jpg')
    i = i + 1
I keep getting an error saying "No such file or directory" for the os.rename line.
The names returned by os.listdir() do not include the directory path.
import os
path = '/home/pi/images/'
i = 0
for filename in os.listdir(path):
    os.rename(os.path.join(path, filename), os.path.join(path, 'captured' + str(i) + '.jpg'))
    i = i + 1
os.rename() resolves relative names against the current working directory, not against the folder you listed. You are giving it only the file names, so it can't locate the files.
Prepend the folder path to each filename to get the full path:
import os
path = 'G:/ftest'
i = 0
for filename in os.listdir(path):
    os.rename(path + '/' + filename, path + '/captured' + str(i) + '.jpg')
    i = i + 1
Two suggestions:
Use glob. It gives you more fine-grained control over which files and directories you iterate over.
Use enumerate instead of counting iterations manually.
Example:
import glob
import os
path = '/home/pi/images/'
for i, filename in enumerate(glob.glob(path + '*.jpg')):
    os.rename(filename, os.path.join(path, 'captured' + str(i) + '.jpg'))
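glob.glob() returns files in arbitrary order, so the numbering may not follow the timestamps. A minimal sketch that sorts the names first (assuming the timestamp names are formatted so that they sort chronologically, e.g. 2021-01-01_09-00.jpg) and renames in two passes, so an existing captured3.jpg is never overwritten before it has been renumbered. rename_sequential and the temporary .renaming_<i> names are hypothetical:

```python
import glob
import os


def rename_sequential(folder, prefix="captured", ext=".jpg"):
    """Rename every *.jpg in folder to captured0.jpg, captured1.jpg, ... in sorted name order."""
    files = sorted(glob.glob(os.path.join(glob.escape(folder), "*" + ext)))
    # Pass 1: move everything to temporary dot-names that cannot clash with
    # the final captured<i>.jpg names.
    temp_paths = []
    for i, path in enumerate(files):
        temp = os.path.join(folder, ".renaming_" + str(i) + ext)
        os.rename(path, temp)
        temp_paths.append(temp)
    # Pass 2: give each file its final sequential name.
    new_paths = []
    for i, temp in enumerate(temp_paths):
        new = os.path.join(folder, prefix + str(i) + ext)
        os.rename(temp, new)
        new_paths.append(new)
    return new_paths
```

On case-sensitive file systems the pattern *.jpg does not match .JPG files; adjust ext if your camera uses upper case.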
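This will work:
import glob  # the standard-library glob is enough here; the original used the third-party glob2
import os
def rename(f_path, new_name):
    # collect all .ma files in the folder
    filelist = glob.glob(os.path.join(f_path, "*.ma"))
    count = 0
    for file in filelist:
        print("File Count : ", count)
        filename = os.path.split(file)
        print(filename)
        new_filename = os.path.join(f_path, new_name + str(count + 1) + ".ma")
        # glob already returns the full path, so rename it directly
        os.rename(file, new_filename)
        print(new_filename)
        count = count + 1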
The function takes two arguments: the folder that contains the files, and the base name for the renamed files (they become <new_name>1.ma, <new_name>2.ma, and so on).
Let's say I have n files in a directory named file_1.txt, file_2.txt, file_3.txt, ... file_n.txt. I would like to import them into Python one at a time, do some computation on each, and then store the results in n corresponding output files: file_1_o.txt, file_2_o.txt, ... file_n_o.txt.
I've figured out how to import multiple files:
import glob
import numpy as np
path = r'home\...\CurrentDirectory'
allFiles = glob.glob(path + '/*.txt')
for file in allFiles:
    # do something to file
    ...
    ...
    np.savetxt(file, ) ???
I'm not sure how to insert _o (or any other string) after the filename so that the output file is file_1_o.txt.
Can you use the following snippet to build the output filename?
parts = in_filename.split(".")
out_filename = parts[0] + "_o." + parts[1]
where I assumed in_filename is of the form "file_1.txt".
Of course, it would probably be better to put "_o." (the suffix before the extension) in a variable, so that you can change it in just one place.
In your case, that means:
import glob
import numpy as np
path = r'home\...\CurrentDirectory'
allFiles = glob.glob(path + '/*.txt')
for file in allFiles:
    # do something to file
    ...
    parts = file.split(".")
    out_filename = parts[0] + "_o." + parts[1]
    np.savetxt(out_filename, )  # add the data to save as the second argument
Note that glob.glob(path + '/*.txt') already returns paths that start with path, so out_filename is already a full path here. If you instead start from bare filenames (for example from os.listdir()), build the full path before passing it to np.savetxt, with something like
np.savetxt(os.path.join(path, out_filename), )
(this also needs import os).
If you want to do it in one line, with the suffix in a variable as mentioned above, you could write something like:
hh = "_o."  # suffix variable
# ...
# inside your loop:
for file in allFiles:
    out_filename = hh.join(file.split("."))
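This does the same thing by joining the split list with the suffix, as @NathanAck mentioned in his answer. Both versions assume the path contains exactly one dot; if a folder name or the file name contains more than one, the suffix ends up in the wrong place or is inserted several times.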
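os.path.splitext() avoids the dot problem because it splits only at the last dot of the final path component. A minimal sketch with a hypothetical helper output_name:

```python
import os


def output_name(path, suffix="_o"):
    """Insert suffix before the extension, e.g. file_1.txt -> file_1_o.txt."""
    # splitext splits at the last dot of the final path component only
    root, ext = os.path.splitext(path)
    return root + suffix + ext


print(output_name("file_1.txt"))                           # file_1_o.txt
print(output_name(os.path.join("my.data", "file_2.txt")))  # my.data/file_2_o.txt on POSIX
```

Inside the glob loop this becomes np.savetxt(output_name(file), ...) with your data as the second argument; the directory is preserved because glob already returned a full path.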
import os
# put the path to the files here
filePath = "C:/stack/codes/"
theFiles = os.listdir(filePath)
for file in theFiles:
    # add the path name before the file
    file = filePath + str(file)
    # "with" closes the file automatically
    with open(file, 'r') as fileToRead:
        fileData = fileToRead.read()
    # DO WORK ON SPECIFIC FILE HERE
    # access the file through the fileData variable
    fileData = fileData + "\nAdd text or do some other operations"
    # change the file name to add _o
    fileVar = file.split(".")
    newFileName = "_o.".join(fileVar)
    # write the modified data in fileData to the file with _o added
    with open(newFileName, 'w') as fileToWrite:
        fileToWrite.write(fileData)
I keep getting a 'Too many open files' error when doing something like this:
# read file names
file_names = []
for file_name in os.listdir(path):
    if '.json' not in file_name: continue
    file_names.append(file_name)
# process file names...
# iter files
for file_name in file_names:
    # load file into DF
    file_path = path + '/' + file_name
    df = pandas.read_json(file_path)
    # process the data, etc...
    # not real var names, just for illustration purposes...
    json_arr_1 = ...
    json_arr_2 = ...
    # save DF1 to new file
    df_1 = pandas.DataFrame(data=json_arr_1)
    file_name2 = os.getcwd() + '/db/' + folder_name + '/' + file_name
    df_1.to_json(file_name2, orient='records')
    # save DF2 to new file
    df_2 = pandas.DataFrame(data=json_arr_2)
    file_name3 = os.getcwd() + '/db/other/' + folder_name + '/' + file_name
    df_2.to_json(file_name3, orient='records')
The DataFrame documentation doesn't mention having to open or close files, and I don't think listdir keeps open file handles (it should just return a list of strings).
Where am I going wrong?
It seems like a system issue, not a pandas issue.
You might need to raise the system's limit on the number of open files.
How to increase the limit:
https://easyengine.io/tutorials/linux/increase-open-files-limit/
The Q&A "IOError: [Errno 24] Too many open files" discusses ulimit and the open-file limit.
This Q&A discusses why the number of open files is limited on Linux:
https://unix.stackexchange.com/questions/36841/why-is-number-of-open-files-limited-in-linux
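On Linux or macOS the limit can also be inspected from Python with the standard resource module (not available on Windows). A minimal sketch: open_fd_count() reads /proc/self/fd, which exists on Linux only, so you can check whether the number of open descriptors keeps growing as the loop runs (which would point to a leak rather than a limit that is too low). raise_nofile_soft_limit() raises the soft limit towards the hard limit, which an unprivileged process is allowed to do. Both helper names and the default target of 4096 are assumptions:

```python
import os
import resource  # Unix only; not available on Windows


def open_fd_count():
    """Number of file descriptors this process has open (Linux /proc only, else None)."""
    fd_dir = "/proc/self/fd"
    if not os.path.isdir(fd_dir):
        return None
    return len(os.listdir(fd_dir))


def raise_nofile_soft_limit(target=4096):
    """Raise the soft RLIMIT_NOFILE to min(target, hard); return the (soft, hard) pair now in effect."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    # Only ever raise the limit; an unlimited soft limit is left alone.
    if soft != resource.RLIM_INFINITY and soft < new_soft:
        # May raise ValueError if the kernel rejects the value (e.g. above macOS's per-process maximum).
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


print("open descriptors:", open_fd_count())
print("soft/hard limit:", raise_nofile_soft_limit())
```

Calling open_fd_count() at the top of each iteration of the file loop is a cheap way to tell whether descriptors are leaking before you change any system-wide setting with ulimit.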