I get an OSError when I try to write to the output directory or put a prefix such as 'cal_' or 'edit_' in front of "i". When I added a suffix with df.to_csv(i + '_edit.csv'), the result was "filename.csv_edit".
So the files were saved in the input directory, and I couldn't add any prefix or suffix. How can I fix this error?
import pandas as pd
import glob
PathIn = r'C:\Users\input'
PathOut = r'C:\Users\output'
filenames = glob.glob(PathIn + "/*.csv")
file_list = []
for i in filenames:
    df = pd.read_csv(i)
    file_list.append(df)
    df.columns = df.columns.str.replace('[','')
    df.columns = df.columns.str.replace(']','')
    df.to_csv(i + '.csv')
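Try this. It should work, and it contains the full code you need.
import os
import pandas as pd
PathIn = r'C:\Users\input'
PathOut = r'C:\Users\output'
file_list = []
for name in os.listdir(PathIn):
    if name.endswith(".csv"):
        # Build the input path with os.path.join instead of a hand-written backslash
        df = pd.read_csv(os.path.join(PathIn, name))
        file_list.append(df)
        # regex=False treats '[' and ']' as literal characters
        df.columns = df.columns.str.replace('[', '', regex=False)
        df.columns = df.columns.str.replace(']', '', regex=False)
        # Write into the output directory; a prefix can be added here, e.g. 'cal_' + name
        df.to_csv(os.path.join(PathOut, name))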
The value of i in filenames is the absolute path of the csv file you are reading.
So if you have 3 csv files in your input directory, your filenames list will look like this:
['C:\Users\input\file1.csv',
'C:\Users\input\file2.csv',
'C:\Users\input\file3.csv']
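When you add a prefix in front of one of these elements, you get something like 'cal_C:\Users\input\file1.csv', which is not a valid path. Likewise, appending a suffix to the full path puts it after the existing '.csv' extension.
You need to extract the filename of the input file and join it with PathOut so that the result is a valid path.
You can fetch the filenames in any directory as below: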
import os
filenames = []
for entry in os.listdir(PathIn):
    # Keep regular files whose name ends with .csv
    if os.path.isfile(os.path.join(PathIn, entry)) and entry.endswith(".csv"):
        filenames.append(entry)
Now you can iterate over this list and do the same operations as before. Because the list holds bare filenames, join each one with PathIn when reading it, and join it with PathOut when saving the final df to the output directory.
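As a minimal sketch of that last step, the helper below builds an output path from an input path, with an optional prefix and suffix placed around the name rather than after the extension. The function name build_output_path and its parameters are made up for illustration; they are not part of pandas or of the original code.

```python
import os

def build_output_path(input_path, out_dir, prefix="", suffix=""):
    """Return out_dir/<prefix><stem><suffix><ext> for the given input file.

    Hypothetical helper: the name and parameters are assumptions.
    """
    base = os.path.basename(input_path)   # e.g. 'file1.csv'
    stem, ext = os.path.splitext(base)    # e.g. ('file1', '.csv')
    return os.path.join(out_dir, prefix + stem + suffix + ext)
```

Inside the loop from the question, this could be used as df.to_csv(build_output_path(i, PathOut, prefix='cal_'), index=False), assuming PathOut already exists.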
Related
I have a folder that includes folders and these folders include many csv files. I want to import and concatenate all of them in Python.
Let's say main folder: /main
subfolders: /main/main_1
csv: /main/main_1/first.csv
path='/main'
df_list = []
for file in os.listdir(path):
    df = pd.read_csv(file)
    df_list.append(df)
final_df = df.append(df for df in df_list)
What about this:
import pandas as pd
from pathlib import Path
directory = "path/to/root_dir"
# Read each CSV file in dir "path/to/root_dir"
dfs = []
for file in Path(directory).glob("**/*.csv"):
    dfs.append(pd.read_csv(file))
# Put the dataframes to a single dataframe
df = pd.concat(dfs)
Change path/to/root_dir to wherever your CSV files are.
I found a way to concatenate all of them, but I'm not satisfied with it because it takes too much time.
import glob
import os
import pandas as pd
path = "/main"
folders = []
directory = os.path.join(path)
for root, dirs, files in os.walk(directory):
    folders.append(root)
del folders[0]  # drop the top-level folder itself
final = []
for folder in folders:
    df = pd.concat(map(pd.read_csv, glob.glob(os.path.join(folder, "*.csv"))))
    final.append(df)
os.listdir returns bare names without the folder, so remember to add main back to the path:
df = pd.read_csv(path + "/" + file)
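A rough sketch of collecting full paths with os.walk, so that files inside subfolders such as /main/main_1 are found and each path can be passed straight to pd.read_csv. The function name find_csv_files is hypothetical.

```python
import os

def find_csv_files(root):
    """Return the full paths of all .csv files under root, sorted.

    Hypothetical helper; os.walk visits root and every subfolder.
    """
    paths = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".csv"):
                # Join with the folder so the path is valid from anywhere
                paths.append(os.path.join(folder, name))
    return sorted(paths)
```

With pandas imported as pd, the result could then be combined in one call, for example pd.concat(map(pd.read_csv, find_csv_files('/main')), ignore_index=True).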
I am trying to open multiple files with pandas into a dataframe.
Only files with a prefix ~$ show an error of
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\x15Microso'
Here are two of the file paths from my list:
bulk_uploads /~$0730-0731.xlsx',
bulk_uploads /0701-0702.xlsx'
The one without the prefix opens perfectly fine, and I am not sure why the other one throws an error.
Here is the code I am trying:
import pandas as pd
import glob
path = 'bulk_uploads ' # use your path
all_files = glob.glob(path + "/*.xlsx")
li = []
for filename in all_files:
    df = pd.read_excel(filename, sheet_name = 1)
    df['Date'] = str(filename)[:-4]
    li.append(df)
# frame = pd.concat(li, axis=0, ignore_index=True)
Is there either a way to change any files that have this prefix so they lose it, or another way around it?
It looks like they are files which I have previously opened (I have no files currently open).
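import pandas as pd
import glob
import os
import re
path = 'bulk_uploads ' # use your path
all_files = glob.glob(path + "/*.xlsx")
li = []
# Match the literal '~$' prefix; '$' is escaped because it is a regex anchor
special = re.compile(r'~\$')  # add more special characters if any
for filename in all_files:
    if special.search(filename):
        os.remove(filename)  # deletes the temporary file from disk
    else:
        df = pd.read_excel(filename, sheet_name = 1)
        df['Date'] = str(filename)[:-4]
        li.append(df)
Can you give this a try and see if it works?
It seems that your folder contains temporary files. Excel creates a '~$' file next to a workbook while it is open, and the file can be left behind if Excel does not close cleanly.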
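If you would rather not delete anything, a gentler variant is to skip the '~$' files when building the list. This is only a sketch; list_workbooks is a made-up name, and it assumes the temporary files always start with '~$'.

```python
import glob
import os

def list_workbooks(folder):
    """Return .xlsx paths in folder, skipping '~$' temporary files.

    Hypothetical helper: nothing is removed from disk.
    """
    paths = glob.glob(os.path.join(folder, "*.xlsx"))
    # Check the file name only, not the whole path
    return sorted(p for p in paths
                  if not os.path.basename(p).startswith("~$"))
```

The loop in the question could then iterate over list_workbooks(path) instead of all_files.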
I have the following dataframe:
import pandas as pd
df_Sensor = pd.DataFrame({'Name_Archive': ['SENSOR_1', 'SENSOR_250'],
                          'Type': ['Analog', 'Dig'],
                          'Value': [199, 0]})
I have two folders on my desktop. First the "Archive" folder that has the csv files with names:
SENSOR_1.csv
SENSOR_198.csv
SENSOR_250.csv
(its location is: C:/Users/User/Desktop/Archive)
The other folder is called "Archive_Select", this folder is empty.
(its location is: C:/Users/User/Desktop/Archive_Select)
I would like to go through the 'Name_Archive' column of the dataframe and copy each file named there from the Archive folder into the new folder (Archive_Select).
For example, since the 'Name_Archive' column contains the names of 2 of the files in the 'Archive' folder, when I open the 'Archive_Select' folder I would like only these files to appear:
SENSOR_1.csv #and
SENSOR_250.csv
I tried to use the glob function, but I don't know how to do what I wanted:
import glob
All_Archives = glob.glob("C:/Users/Usuario/Desktop/Archive/*.csv")
for i in range(0, len(df_Sensor)):
    for j in range(0, len(All_Archive)):
        if(df_Sensor['Name_Archive'].iloc[i] == All_Archive.iloc[j]):
            df_Sensor.to_csv("C:/Users/Usuario/Desktop/Archive_Select.csv")
First define your source and destination directories:
import pandas as pd
from shutil import copyfile
# Raw strings keep the backslashes literal; in a normal string "\U" starts a Unicode escape
src_directory = r"C:\Users\User\Desktop\Archive"
dest_directory = r"C:\Users\User\Desktop\Archive_Select"
Then loop over the file names in the dataframe and copy each file:
for fileName in df_Sensor['Name_Archive']:
    src_file = src_directory + "\\" + fileName + ".csv"
    dest_file = dest_directory + "\\" + fileName + ".csv"
    copyfile(src_file, dest_file)
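copyfile raises an error as soon as one name in the column has no matching file. As a sketch of a more forgiving version, the function below copies what it finds and returns the names it could not find. copy_selected and its parameters are assumptions for illustration, not an existing API.

```python
import os
import shutil

def copy_selected(names, src_dir, dest_dir, ext=".csv"):
    """Copy <name><ext> from src_dir to dest_dir; return names not found.

    Hypothetical helper built on os.path and shutil.copy2.
    """
    os.makedirs(dest_dir, exist_ok=True)  # create the target folder if needed
    missing = []
    for name in names:
        src = os.path.join(src_dir, name + ext)
        if os.path.isfile(src):
            shutil.copy2(src, os.path.join(dest_dir, name + ext))
        else:
            missing.append(name)
    return missing
```

For the dataframe in the question, the call would be copy_selected(df_Sensor['Name_Archive'], src_directory, dest_directory).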
I have a folder with images that are currently named with some random image numbers. I want to rename all the images in the directory with a specific column of an excel sheet which I put on a data frame.
Note: The random image number is the last part of the excel value. For example, if an image name in the folder is 'img1097', then in the excel sheet I can find the image in the column as 'First_Second_img1097'. So in the folder I have images with names like 'img1098', 'img1099' etc., and in the data frame I have values like the following:
#my_dataframe
**Column**
First_Second_img1097
Table_Chairs_img1098
Cat_Image_img1099
I have been trying to implement different options with no luck. Here is my code:
import os,re
path = r'myfolder\user\allimages\Main\target'
files = os.listdir(path)
for index, file in enumerate(files):
    tempname = file.split('.')[0] #no extensions
    for i in tempname :
        for picture_name in my_dataframe['Column']:
            if temp_file_name == picture_name:
                os.rename(os.path.join(path, file),
                          os.path.join(path,str(index) + '.jpg'))
I'm looking for a way to rename all the matching images with the cell values of the dataframe's "Column" column, so 'img1097' should be renamed to 'First_Second_img1097'. I hope this makes it clear.
This will solve your problem:
import os
path = r'myfolder\user\allimages\Main\target'
pathlist = []
for root, dirs, files in os.walk(path):
    for name in files:
        pathlist.append(os.path.join(root, name))
for old_path in pathlist:
    folder, file_name = os.path.split(old_path)
    tempname, ext = os.path.splitext(file_name)
    new_tempname = tempname.replace(" ", "") ## makes img 1097 to img1097
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for index, row in my_dataframe["Column"].items():
        new_row = row.split('_')[-1] ## Gives img1097
        if new_row == new_tempname: ## Now the matching will work
            # Rename inside the file's own folder and keep its extension
            os.rename(old_path, os.path.join(folder, str(row) + ext))
            break
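The nested loop compares every file against every row. A sketch of the same matching with a dictionary lookup is below, assuming the part after the last underscore is unique for each row. rename_by_suffix is a hypothetical name, and it only looks at one folder rather than walking subfolders.

```python
import os

def rename_by_suffix(folder, new_names):
    """Rename files whose stem equals the last '_' part of a new name.

    Hypothetical helper; returns the new file names, sorted.
    """
    # Map 'img1097' -> 'First_Second_img1097'
    lookup = {name.split("_")[-1]: name for name in new_names}
    renamed = []
    for file_name in os.listdir(folder):
        stem, ext = os.path.splitext(file_name)
        target = lookup.get(stem.replace(" ", ""))
        if target is not None:
            os.rename(os.path.join(folder, file_name),
                      os.path.join(folder, target + ext))
            renamed.append(target + ext)
    return sorted(renamed)
```

With the dataframe from the question, the call would be rename_by_suffix(path, my_dataframe['Column']).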
To rename all image files in your directory, you can try the following python program. Just run the python script and the files will be automatically renamed corresponding to excel cell values:
import pandas as pd
import xlrd
import os
import glob
# Your directory
mydir = (os.getcwd()).replace('\\','/') + '/'
# Get data of cells from excel
df=pd.read_excel(r''+ mydir +'test.xlsx')
for i in range(len(df['Name'])):
    print( df['Name'][i])
number_of_cell=len(df['Name'])
# Get the list of specified files in a directory
filelist=os.listdir(mydir)
for file in filelist[:]: # filelist[:] makes a copy of filelist.
    if not(file.endswith('.png') or file.endswith('.jpg')):
        filelist.remove(file)
print (filelist)
number_of_files=len(filelist)
# Rename all files in directory with a value of cell in excel
for i in range(number_of_files):
    os.rename(mydir+ filelist[i], mydir+ df['Name'][i] + '.' + os.path.splitext(filelist[i])[1][1:])
print ('Success')
==== EDITED ====
import pandas as pd
import xlrd
import os
import glob
# Your directory
mydir = (os.getcwd()).replace('\\','/') + '/'
# Get data of cells from excel
df=pd.read_excel(r''+ mydir +'Book1.xlsx')
for i in range(len(df['New_Image_Name'])):
    print( df['New_Image_Name'][i])
number_of_cell=len(df['New_Image_Name'])
# Get the list of specified files in a directory
filelist=os.listdir(mydir)
for file in filelist[:]: # filelist[:] makes a copy of filelist.
    if not(file.endswith('.png') or file.endswith('.jpg')):
        filelist.remove(file)
print (filelist)
number_of_files=len(filelist)
# Rename all files in directory with a value of cell in excel
# A file is renamed with a cell value if the last '_' part of the file name matches the last '_' part of the cell value
for i in range(number_of_files):
    file_key = filelist[i].split('_')[-1].split('.')[0]
    for n in range(number_of_cell):
        cell_key = df['New_Image_Name'][n].split('_')[-1].split('.')[0]
        if cell_key == file_key:
            os.rename(mydir+ filelist[i], mydir+ df['New_Image_Name'][n] + '.' + os.path.splitext(filelist[i])[1][1:])
            break
print ('Success')
Hope this can help you. Have a nice day.
I have about 10 CSV files that I'd like to append into one file. My thought was to assign the file names to numbered data_file variables and then append them in a while loop, but I'm having trouble updating the file to the next numbered data_file in my loop. I keep getting errors related to "data_file does not exist" and "cannot concatenate 'str' and 'int' objects". I'm not even sure if this is a realistic approach to my problem. Any help would be appreciated.
import pandas as pd
path = '//pathname'
data_file1= path + 'filename1.csv'
data_file2= path + 'filename2.csv'
data_file3= path + 'filename3.csv'
data_file4= path + 'filename4.csv'
data_file5= path + 'filename5.csv'
data_file6= path + 'filename6.csv'
data_file7= path + 'filename7.csv'
df = pd.read_csv(data_file1)
x = 2
while x < 8:
    data_file = 'data file' + str(x)
    tmdDF = pd.read_csv(data_file)
    df = df.append(tmpDF)
    x += x + 1
Not quite sure what you're doing in terms of constructing that string data_file within the loop. You can't address variables using a string of their name. Also as noted by Paulo, you're not incrementing the indices correctly either. Try the following code but note that for the purposes of merely concatenating csv files, you certainly do not need pandas.
import pandas
filenames = ["filename1.csv", "filename2.csv", ...] # Fill in remaining files.
# DataFrame.append was removed in pandas 2.0, so read every file and concatenate once
df = pandas.concat([pandas.read_csv(filename) for filename in filenames])
# df is now a dataframe of all the csv's in filenames appended together
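Since the files are numbered, the list does not have to be typed by hand. The sketch below generates it with range, which also avoids the counter problem in the question: there, x += x + 1 takes x from 2 to 5 to 11, so only two files are ever attempted. numbered_paths is a hypothetical helper, and the folder and stem values are placeholders modelled on the question.

```python
def numbered_paths(folder, stem, first, last, ext=".csv"):
    """Return folder + stem + n + ext for n from first to last inclusive.

    Hypothetical helper; folder is expected to end with a separator.
    """
    return [f"{folder}{stem}{n}{ext}" for n in range(first, last + 1)]

# Placeholder values modelled on data_file1 .. data_file7 in the question
paths = numbered_paths("//pathname/", "filename", 1, 7)
```

The resulting list can be used as filenames in the snippet above.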
You can use fileinput for this:
import fileinput
path = '//pathname'
files = [path + 'filename' + str(i) + '.csv' for i in range(1,8)]
with open('output.csv', 'w') as output, fileinput.input(files) as fh:
    for line in fh:
        # Skip the header line of every file except the first one
        if fileinput.isfirstline() and fileinput.lineno() != 1:
            continue
        output.write(line)