Pandas - Move output columns into a new sheet - python

I have many dataframes as txt files that I'm converting into xlsx. For each file, I want to take my output columns and move them into a new sheet called "Analyzed Data". I'm not sure how to do this with ExcelWriter:
writer = pd.ExcelWriter('filepath', engine = 'xlsxwriter')
df.to_excel(writer, sheet_name = ' Data Analyzed')
writer.save()
My understanding is that this requires my file to be xlsx, that I have to write the filepath separately for each xlsx file, and I'm not sure how to select only my output columns as the ones to move to the new sheet. Each file has a different number of columns with different column names. My code is below:
import os
import pandas as pd
path = r'C:\Users\Me\1Test'
filelist = []
for root, dirs, files in os.walk(path):
    for f in files:
        if not f.endswith('.txt'):
            continue
        filelist.append(os.path.join(root, f))
for f in filelist:
    df = pd.read_table(f)
    col = df.iloc[:, :-3]
    df['Average'] = col.mean(axis=1)
    out = (df.join(df.drop(df.columns[[-3, -1]], axis=1)
                     .sub(df[df.columns[-3]], axis=0)
                     .add_suffix(' - Background')))
    out.to_excel(f.replace('txt', 'xlsx'), 'Sheet1')
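For the question itself, a minimal sketch of one approach, assuming the computed output columns are 'Average' plus everything ending in ' - Background' (so they can be selected by name): open one ExcelWriter per converted file and write the selected columns to a second sheet. The with block saves the file on exit, so no writer.save() call is needed.
for f in filelist:
    df = pd.read_table(f)
    col = df.iloc[:, :-3]
    df['Average'] = col.mean(axis=1)
    out = (df.join(df.drop(df.columns[[-3, -1]], axis=1)
                     .sub(df[df.columns[-3]], axis=0)
                     .add_suffix(' - Background')))
    # Assumption: the "output" columns are 'Average' and the ' - Background' columns.
    analyzed = out[['Average'] + [c for c in out.columns if c.endswith(' - Background')]]
    with pd.ExcelWriter(f.replace('.txt', '.xlsx'), engine='xlsxwriter') as writer:
        out.to_excel(writer, sheet_name='Sheet1', index=False)
        analyzed.to_excel(writer, sheet_name='Analyzed Data', index=False)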

Related

How to keep the top 500 rows of each CSV in a loop (Python) and overwrite each file

I am trying to read more than 100 CSV files in Python and keep the TOP 500 rows of each (they each have more than 550,000 rows). So far I know how to do that, but I need to save each modified file in the loop under its own filename in CSV format. Normally I can output the concatenated dataframe to one big CSV file, but this time I need to basically truncate each CSV file to keep only the top 500 rows and save each.
This is the code I have so far:
import pandas as pd
import glob
FolderName = str(input("What's the name of the folder are you comparing? "))
path = str(input('Enter full path of the folder: '))
#r'C:\Users\si\Documents\UST\AST' # use your path
all_files = glob.glob(path + "/*.csv")
#list1 = []
d = {}
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0, nrows=500)
    #list1.append(df)
    d[filename] = df.columns
#frame = pd.concat(list1, axis=0, ignore_index=True)
frame = pd.DataFrame.from_dict(d, orient='index')
output_path = r'C:\Users\si\Downloads\New\{}_header.xlsx'.format(FolderName)
frame.to_excel(output_path)
DataFrames can write CSVs as well as read them, so just create the DataFrame and call to_csv with the same filename.
import pandas as pd
import glob
FolderName = str(input("What's the name of the folder are you comparing? "))
path = input('Enter full path of the folder: ')
all_files = glob.glob(path + "/*.csv")
for filename in all_files:
    # index=False keeps the overwritten file's column layout unchanged
    pd.read_csv(filename, index_col=None, header=0, nrows=500).to_csv(filename, index=False)

Extract single column from multiple CSVs and save to new CSV

I would like to read out a specific column from over 100 CSV files to create a new CSV file. The source column's header will be renamed with the filename the column is extracted from.
I can get the individual columns, but I have been unable to rename each column's header without the ".csv" extension:
import os
import pandas as pd
folder = "C:/Users/Doc/Data"
files = os.scandir(folder)
E2080 = []
with os.scandir(folder) as files:
    for file in files:
        #print(file)
        df = pd.read_csv(file, index_col=None)
        dist = {file: (df['lnt_dist'])}
        E = pd.DataFrame(dist)
        E2080.append(E)
dist = pd.concat(E2080, ignore_index=False, axis=1)
dist.head()
dist.to_csv('E2080', index=False)
This is the final code that worked for me (see output 1):
E2080 = []
with os.scandir(folder) as files:
    for file in files:
        #print(file)
        df = pd.read_csv(file, index_col=None)
        dist = {file: (df['lnt_dist'])}
        E = pd.DataFrame(dist)
        E_1 = E.rename(columns={file: file.name.split('.')[0]})  # rename df header while dropping the ext **[.csv]** and the `os.scandir` attribute `<DirEntry>`
        E2080.append(E_1)
dist = pd.concat(E2080, ignore_index=False, axis=1)
#dist.head()
dist.to_csv('E2080.csv', index=False)
You should use file.name instead of file to get the name as a string.
And with a string you can use .split(".") to get the name without the extension.
for file in os.scandir(folder):
    print(file.name, '=>', file.name.split(".")[0])
Or you could use pathlib.Path instead of os.scandir(), which gives you more convenient attributes such as .stem.
for file in pathlib.Path('test').iterdir():
    print(file.name, '=>', file.stem)
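Putting that together with the loop from the question, a minimal sketch using pathlib (assuming every CSV in the folder has an 'lnt_dist' column) could look like:
import pathlib
import pandas as pd

folder = pathlib.Path("C:/Users/Doc/Data")  # folder from the question
columns = []
for file in folder.iterdir():
    if file.suffix != '.csv':
        continue
    df = pd.read_csv(file, index_col=None)
    # .stem is the filename without the extension, so it becomes the column header
    columns.append(df['lnt_dist'].rename(file.stem))
dist = pd.concat(columns, axis=1)
dist.to_csv('E2080.csv', index=False)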

Read from multiple excel and write to one file

I am trying to read data from multiple xls files and write it to one single file.
My code below is writing only the first file. Not sure what I am missing.
import glob
import os
import pandas as pd

def list_files(dir):
    r = []
    for root, dirs, files in os.walk(dir):
        for name in files:
            r.append(os.path.join(root, name))
    return r

files = list_files("C:\\Users\\12345\\BOFS")
for file in files:
    df = pd.read_excel(file)
    new_header = df.iloc[1]
    df = df[2:]
    df.columns = new_header
    with pd.ExcelWriter("C:\\Users\\12345\\Test\\Test.xls", mode='a') as writer:
        df.to_excel(writer, index=False, header=True)
Documentation says:
ExcelWriter can also be used to append to an existing Excel file:
with pd.ExcelWriter('output.xlsx', mode='a') as writer:
    df.to_excel(writer, sheet_name='Sheet_name_3')
And that probably replaces the given sheet.
But you could use pd.concat(<dataframes>) to concatenate dataframes and write all data at once in a single sheet.
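A minimal sketch of that concat approach, reusing the list_files helper and the header handling from the question (writing to .xlsx rather than .xls is an assumption here):
import pandas as pd

frames = []
for file in list_files("C:\\Users\\12345\\BOFS"):
    df = pd.read_excel(file)
    new_header = df.iloc[1]   # same header handling as in the question
    df = df[2:]
    df.columns = new_header
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)
combined.to_excel("C:\\Users\\12345\\Test\\Test.xlsx", index=False)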
I tested this piece of code; hopefully it works in your case.
import glob, os
import pandas as pd

os.chdir("D:/Data Science/stackoverflow")
all_data = pd.DataFrame()  # accumulator for all files
for file in glob.glob("*.xlsx"):
    df = pd.read_excel(file)
    all_data = all_data.append(df, ignore_index=True)

# now save the data frame
writer = pd.ExcelWriter('output.xlsx')
all_data.to_excel(writer, 'sheet1')
writer.save()

Multiple tabs in single excel

I am using the below code to create a single Excel file with multiple tabs based on the CSV files present in the path. I have two files in my path, but instead of getting two tabs in a single Excel file I am getting one blank tab. Please help me fix this code.
import os
import glob
import xlsxwriter
import csv
import pandas

path = '/axp/buanalytics/csgsn/dev/GSN/VGEN_Files/Test/Tulu/VG/Data/'
flist = [os.path.basename(x) for x in glob.glob(os.getcwd() + '/axp/buanalytics/csgsn/dev/GSN/VGEN_Files/Test/Tulu/VG/Data/*.csv')]
workbook = xlsxwriter.Workbook('/axp/buanalytics/csgsn/dev/GSN/VGEN_Files/Test/Tulu/VG/Data/split_book.xlsx')
for sh in flist:
    worksheet = workbook.add_worksheet(sh)
    with open(sh, 'rb') as f:
        reader = csv.reader(f)
        for r, row in enumerate(reader):
            for c, col in enumerate(row):
                worksheet.write(r, c, col)
workbook.close()
Three problems:
1) flist = [os.path.basename(x) for x in glob.glob(os.getcwd() + '/axp/buanalytics/csgsn/dev/GSN/VGEN_Files/Test/Tulu/VG/Data/*.csv')]
Assuming that os.getcwd() is the same as your path, you will end up with the pathname twice. This means that flist will be empty. Since you have gone through the trouble of setting path, why not just
flist = [os.path.basename(x) for x in glob.glob(path + '*.csv')]
2) Same as above
workbook = xlsxwriter.Workbook(path + 'split_book.xlsx')
3) The file should be opened as a text file
with open(sh, 'r') as f
Try that and your program should work. You don't need pandas for this - is that for later?
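For reference, a minimal sketch of the script with those three fixes applied (plus, as an extra assumption, opening each CSV via its full path so the script does not depend on the current working directory):
import os
import glob
import csv
import xlsxwriter

path = '/axp/buanalytics/csgsn/dev/GSN/VGEN_Files/Test/Tulu/VG/Data/'
flist = [os.path.basename(x) for x in glob.glob(path + '*.csv')]
workbook = xlsxwriter.Workbook(path + 'split_book.xlsx')
for sh in flist:
    worksheet = workbook.add_worksheet(sh)
    with open(os.path.join(path, sh), 'r') as f:  # text mode, full path
        reader = csv.reader(f)
        for r, row in enumerate(reader):
            for c, col in enumerate(row):
                worksheet.write(r, c, col)
workbook.close()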
Read the files using pandas and combine all of them
import os
import pandas as pd

csv_names = [files for files in os.listdir("Your Directory/")]  # get names of csv files in directory "Your Directory/"
writer = pd.ExcelWriter('Multiple Workbooks.xlsx', engine='xlsxwriter')
for files in csv_names:
    df = pd.read_csv(os.path.join("Your Directory", files))  # read csv file
    filename = files[:-4]  # remove ".csv" from filename
    df.to_excel(writer, sheet_name=filename)  # add to workbook
writer.save()
In short, you can write each dataframe to its own tab using:
writer = pd.ExcelWriter('Multiple Workbooks.xlsx', engine='xlsxwriter')
df1.to_excel(writer, sheet_name="SheetName")
df2.to_excel(writer, sheet_name="SheetName2")

Df export all into one larger file

I have a df reading in multiple .xlsx files. I have manipulated what I need in the files and the exported output is exactly what I want. However, I need the data to export into one larger two-column file rather than multiple individual files.
Any help is appreciated. I haven't been able to figure the problem out on my own.
import os
import glob
import pandas as pd
folder = input('Enter the folder name: ')
os.chdir('C:/Users/PCTR261010/Desktop/' + folder)
FileList = glob.glob('*.xlsx')
for fname in FileList:
    df = pd.read_excel(fname).assign(New=os.path.basename('mpcc_' + (fname.split('-', 1)[0]).split('#', 1)[1]))
    df1 = df[['New', '<ID>']]
    writer = pd.ExcelWriter('ParttoMPCC_Import.xlsx', engine='xlsxwriter')
    df1.to_excel(writer, sheet_name='Import', index=False, header=False)
    writer.save()
You can append the desired columns into a single DataFrame and write that DataFrame to an Excel file. The code below should do the job.
import os
import glob
import pandas as pd
folder = input('Enter the folder name: ')
os.chdir('C:/Users/PCTR261010/Desktop/' + folder)
FileList = glob.glob('*.xlsx')
df1 = pd.DataFrame() # create an empty df
for fname in FileList:
    df = pd.read_excel(fname).assign(New=os.path.basename('mpcc_' + (fname.split('-', 1)[0]).split('#', 1)[1]))
    df1 = df1.append(df[['New', '<ID>']])  # append columns data to the df1
writer = pd.ExcelWriter('ParttoMPCC_Import.xlsx', engine='xlsxwriter')
df1.to_excel(writer, sheet_name='Import', index=False, header=False)
writer.save()
You can use pd.concat as follows:
data = []
for fname in FileList:
    df = pd.read_excel(fname).assign(New=os.path.basename('mpcc_' + (fname.split('-', 1)[0]).split('#', 1)[1]))
    df1 = df[['New', '<ID>']]
    data.append(df1)
writer = pd.ExcelWriter('ParttoMPCC_Import.xlsx', engine='xlsxwriter')
df = pd.concat(data)
df.to_excel(writer, sheet_name='Import', index=False, header=False)
writer.save()