I want to store all the data from the for loop in an Excel file, but currently only the last output is stored:
import os
import pandas as pd
import openpyxl
import pyperclip
outputFile = 'outputData.xlsx'
workbook = openpyxl.load_workbook(os.getcwd() + '/sourceData.xlsx')
sheet = workbook["Sheet1"]
for i in range(2, sheet.max_row + 1):
    <I do some ops to copy the data>
    data = pyperclip.paste() # Want this data to be stored in the output Excel file; every cell has a different input, so the output differs too
    df = pd.DataFrame({'Address':[mapData]})
    df2 = pd.DataFrame()
    df2 = df2.append(df, ignore_index=True, sort=False)
writer = pd.ExcelWriter(outputFile)
df2.to_excel(writer,'Sheet1',index=False)
writer.save()
Just move the first initialization of df2 out of the loop:
...
# define df2 here, just once
df2 = pd.DataFrame()
for i in range(2, sheet.max_row + 1):
    ... # your operations
    df = pd.DataFrame({'Address':[mapData]})
    # append df to df2 immediately after df is generated
    df2 = df2.append(df, ignore_index=True, sort=False)
# save as before
writer = pd.ExcelWriter(outputFile)
df2.to_excel(writer,'Sheet1',index=False)
writer.save()
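On recent pandas versions this pattern needs a small change: DataFrame.append was removed in pandas 2.0 (as was ExcelWriter.save()), so a common alternative is to collect the rows in a plain list and build the dataframe once after the loop. A minimal sketch, assuming the per-row values are already available as strings; the helper name build_address_frame and the sample values are made up for illustration:

```python
import pandas as pd


def build_address_frame(values):
    """Collect one row per value, then build the DataFrame in a single step."""
    rows = []
    for value in values:
        # In the real loop this would be the text obtained for each source row.
        rows.append({'Address': value})
    return pd.DataFrame(rows, columns=['Address'])


# Hypothetical sample data standing in for the clipboard contents.
df2 = build_address_frame(['1 Main St', '2 Side Rd', '3 High St'])
print(len(df2))
```

The result can then be written with df2.to_excel(outputFile, sheet_name='Sheet1', index=False), which opens and saves the file in one call.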
Related
I want to read a list of CSV files, for example exon_kipan.00001.csv, exon_kipan.00002.csv, exon_kipan.00003.csv, and exon_kipan.00004.csv (24 files in total), and then perform a series of operations using pandas before concatenating the dataframes.
For a single file, I would do:
df= pd.read_csv("exon_kipan.csv", sep="\t", index_col=0, low_memory=False)
df= df[df.columns[::3]]
df= df.T
del df[df.columns[0]]
df.index = df.index.str.upper()
df= df.sort_index()
df.index = ['-'.join( s.split('-')[:4]) for s in df.index.tolist() ]
df.rename_axis(None, axis=1, inplace=True)
However, now I want to read, manipulate, and concatenate multiple files.
filename = '/work/exon_kipan.{}.csv'
df_dict = {}
exon_clin_list = []
for i in range(1, 25):
    df_dict[i] = pd.read_csv(filename, sep="\t", index_col=0, low_memory=False)
    df_dict[i] = df_dict[i][df_dict[i].columns[::3]]
    df_dict[i] = df_dict[i].T
    del df_dict[i][df_dict[i].columns[0]]
    df_dict[i].index = df_dict[i].index.str.upper()
    df_dict[i] = df_dict[i].sort_index()
    df_dict[i].index = ['-'.join( s.split('-')[:4]) for s in df_dict[i].index.tolist() ]
    df_dict[i].rename_axis(None, axis=1, inplace=True)
    exon_clin_list.append(df_dict[i])
exon_clin = pd.concat(exon_clin_list)
My code raised:
FileNotFoundError: [Errno 2] No such file or directory: '/work/exon_kipan.{}.csv'
You have to use the format method of str:
filename = '/work/exon_kipan.{:05}.csv' # <- don't forget to modify here
...
for i in range(1, 25):
    df_dict[i] = pd.read_csv(filename.format(i), ...)
Test:
filename = '/work/exon_kipan.{:05}.csv'
for i in range(1, 25):
    print(filename.format(i))
# Output
/work/exon_kipan.00001.csv
/work/exon_kipan.00002.csv
/work/exon_kipan.00003.csv
/work/exon_kipan.00004.csv
/work/exon_kipan.00005.csv
/work/exon_kipan.00006.csv
/work/exon_kipan.00007.csv
/work/exon_kipan.00008.csv
/work/exon_kipan.00009.csv
/work/exon_kipan.00010.csv
/work/exon_kipan.00011.csv
/work/exon_kipan.00012.csv
/work/exon_kipan.00013.csv
/work/exon_kipan.00014.csv
/work/exon_kipan.00015.csv
/work/exon_kipan.00016.csv
/work/exon_kipan.00017.csv
/work/exon_kipan.00018.csv
/work/exon_kipan.00019.csv
/work/exon_kipan.00020.csv
/work/exon_kipan.00021.csv
/work/exon_kipan.00022.csv
/work/exon_kipan.00023.csv
/work/exon_kipan.00024.csv
Maybe something like this will work:
import glob
import os
import pandas as pd
# write a function that reads a file, does some processing, and returns a dataframe
def read_file_and_do_some_actions(filename):
    df = pd.read_csv(filename, index_col=None, header=0)
    #############################
    # do some processing
    #############################
    return df
path = r'/home/tester/inputdata/exon_kipan'
# the pattern must be relative: os.path.join discards path if the second argument starts with "/"
all_files = glob.glob(os.path.join(path, "exon_kipan.*.csv"))
# for each file in all_files, call read_file_and_do_some_actions, then concatenate all the dataframes into one
df = pd.concat((read_file_and_do_some_actions(f) for f in all_files), ignore_index=True)
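A detail worth knowing when building the glob pattern: os.path.join drops everything before a component that is itself an absolute path, and glob.glob returns matches in arbitrary order. The sketch below illustrates both with throwaway files in a temporary directory; the file names only mimic the ones in the question.

```python
import glob
import os
import tempfile

# os.path.join discards earlier components when a later one is absolute.
joined = os.path.join('/home/tester/inputdata', '/work/exon_kipan.*.csv')
print(joined)

with tempfile.TemporaryDirectory() as tmp:
    # Create three empty files, deliberately out of order.
    for n in (3, 1, 2):
        open(os.path.join(tmp, 'exon_kipan.{:05}.csv'.format(n)), 'w').close()
    # sorted() gives a deterministic, numerically increasing order here
    # because the numbers are zero-padded to the same width.
    pattern = os.path.join(tmp, 'exon_kipan.*.csv')
    names = [os.path.basename(p) for p in sorted(glob.glob(pattern))]

print(names)
```

Sorting the matches keeps the concatenated dataframe in file order, which matters if row order is meaningful downstream.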
I have to append data to a CSV file. The problem is that instead of appending, I am overwriting the data and cannot retain the old data. Example:
finalDf = pd.DataFrame(columns=['sourcez', 'tergetz', 'TMP'])
df = pd.DataFrame()
df["sourcez"] = ["str(source_Path)"]
df["tergetz"] = ["str(target_path)"]
df["TMP"] = ["total_matching_points"]
finalDf = finalDf.append(df)
finalDf.to_csv('Testing.csv', index=False)
Now, if I add a new value:
finalDf = pd.DataFrame(columns=['sourcez', 'tergetz', 'TMP'])
df = pd.DataFrame()
df["sourcez"] = ["str(source_Path)_New"]
df["tergetz"] = ["str(target_path)_New"]
df["TMP"] = ["total_matching_points_New"]
finalDf = finalDf.append(df)
finalDf.to_csv('Testing.csv', index=False)
It keeps only the latest data in the CSV; instead, I want both sets of data to be kept in the CSV. Any idea?
I created a new CSV from a pandas dataframe, and I want to append values to it instead of overwriting it.
I have tried:
finalDf = pd.DataFrame(columns=['sourcez', 'tergetz', 'TMP'])
df = pd.DataFrame()
df["sourcez"] = ["str(source_Path)"]
df["tergetz"] = ["str(target_path)"]
df["TMP"] = ["total_matching_points"]
finalDf = finalDf.append(df)
finalDf.to_csv('Testing.csv', index=False, mode='a+')
But the problem is that the header is repeated in the CSV:
sourcez,tergetz,TMP
str(source_Path),str(target_path),total_matching_points
sourcez,tergetz,TMP
str(source_Path)_New,str(target_path)_New,total_matching_points_New
How can I remove the repeated header sourcez,tergetz,TMP?
I would really appreciate some help.
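One common way to avoid the repeated header is to write it only when the file does not exist yet, and to append without a header afterwards. A minimal sketch, assuming each run produces one small dataframe; the helper name append_rows and the temporary file path are illustrative and not from the original post:

```python
import os
import tempfile

import pandas as pd


def append_rows(df, csv_path):
    """Append df to csv_path, writing the header only for a new file."""
    write_header = not os.path.exists(csv_path)
    df.to_csv(csv_path, mode='a', header=write_header, index=False)


# Demonstration in a temporary directory (stand-in for 'Testing.csv').
csv_path = os.path.join(tempfile.mkdtemp(), 'Testing.csv')
columns = ['sourcez', 'tergetz', 'TMP']
append_rows(pd.DataFrame([['a', 'b', 1]], columns=columns), csv_path)
append_rows(pd.DataFrame([['a_New', 'b_New', 2]], columns=columns), csv_path)

with open(csv_path) as handle:
    lines = handle.read().splitlines()
print(lines)
```

Note that the check is only on file existence, so an existing but empty file would end up without a header.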
I'm trying to use a loop to create sheets and add data to those sheets on every iteration. The position of my data is correct, but pandas ExcelWriter creates a new sheet instead of appending to the one created the first time the loop runs.
I'm a beginner, and right now function comes before form, so forgive me.
My code:
import pandas as pd
# initial files for dataframes
excel_file = 'output.xlsx'
setup_file = 'setup.xlsx'
# write to excel
output_filename = 'output_final.xlsx'
df = pd.read_excel(excel_file) # create dataframe of entire sheet
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_').str.replace('(', '').str.replace(')', '') # clean dataframe titles
df_setup = pd.read_excel(setup_file)
df_setup.columns = df_setup.columns.str.strip().str.lower().str.replace(' ', '_').str.replace('(', '').str.replace(')', '') # clean dataframe titles
df_2 = pd.merge(df, df_setup) # Merge data with setup to have krymp size for each wire in dataframe
df_2['wirelabel'] = "'" + df_2['cable'] + "_" + df_2['function_code'] + "-" + df_2['terminal_strip'] + ":" + df_2['terminal'] # creates column for the wirelabel by appending columns with set delimiters. #TODO: delimiters to be by inputs.
df_2.sort_values(by=['switchboard']) # sort so we get proper order
switchboard_unique = df.switchboard.unique().tolist() # create variable containing unique switchboards for printing to excel sheets
def createsheets(output_filename, sheetname, row_start, column_start, df_towrite):
    with pd.ExcelWriter(output_filename, engine='openpyxl', mode='a') as writer:
        df_towrite.to_excel(writer, sheet_name=sheetname, columns=['wirelabel'], startrow=row_start, startcol=column_start, index=False, header=False)
        writer.save()
        writer.close()
def sorter():
    for s in switchboard_unique:
        df_3 = df_2.loc[df_2['switchboard'] == s]
        krymp_unique = df_3.krymp.unique().tolist()
        krymp_unique.sort()
        # print(krymp_unique)
        column_start = 0
        row_start = 0
        for k in krymp_unique:
            df_3.loc[df_3['krymp'] == k]
            # print(k)
            # print(s)
            # print(df_3['wirelabel'])
            createsheets(output_filename, s, row_start, column_start, df_3)
            column_start = column_start + 1
sorter()
Current behavior:
If the sheet name is "sheet", my script creates sheet1, sheet2, sheet3, etc.
[image: picture of current behavior]
Wanted behavior:
Create a sheet for each item in "df_3", and put the data into columns according to the position calculated in column_start. The positioning in my code works; the data just goes to the wrong sheet.
[image: picture of wanted behavior]
I hope it's clear what I'm trying to accomplish, and all help is appreciated.
I tried all the example code I have found regarding writing to Excel.
I know my code is not a work of art, but I will update this post with the answer to my own question for the sake of completeness, in case anyone stumbles on this post.
It turns out I misunderstood the capabilities of the "append" mode in pandas' "pd.ExcelWriter". Mode 'a' only adds sheets to an existing workbook; by itself it does not append data to a sheet that already exists.
Realizing this, I changed my code to build a dataframe for the entire sheet (df_sheet) and then call the "createsheets" function once per sheet. The first version wrote my data column by column.
"Final" code:
import pandas as pd
# initial files for dataframes
excel_file = 'output.xlsx'
setup_file = 'setup.xlsx'
# write to excel
output_filename = 'output_final.xlsx'
column_name = 0
df = pd.read_excel(excel_file) # create dataframe of entire sheet
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_').str.replace('(', '').str.replace(')', '') # clean dataframe titles
df_setup = pd.read_excel(setup_file)
df_setup.columns = df_setup.columns.str.strip().str.lower().str.replace(' ', '_').str.replace('(', '').str.replace(')', '') # clean dataframe titles
df_2 = pd.merge(df, df_setup) # Merge data with setup to have krymp size for each wire in dataframe
df_2['wirelabel'] = "'" + df_2['cable'] + "_" + df_2['function_code'] + "-" + df_2['terminal_strip'] + ":" + df_2['terminal'] # creates column for the wirelabel by appending columns with set delimiters. #TODO: delimiters to be by inputs.
df_2.sort_values(by=['switchboard']) # sort so we get proper order
switchboard_unique = df.switchboard.unique().tolist() # create variable containing unique switchboards for printing to excel sheets
def createsheets(output_filename, sheetname, df_towrite):
    with pd.ExcelWriter(output_filename, engine='openpyxl', mode='a') as writer:
        df_towrite.to_excel(writer, sheet_name=sheetname, index=False, header=True)
def to_csv_file(output_filename, df_towrite):
    df_towrite.to_csv(output_filename, mode='w', index=False)
def sorter():
    for s in switchboard_unique:
        df_3 = df_2.loc[df_2['switchboard'] == s]
        krymp_unique = df_3.krymp.unique().tolist()
        krymp_unique.sort()
        column_start = 0
        row_start = 0
        df_sheet = pd.DataFrame([])
        for k in krymp_unique:
            df_5 = df_3.loc[df_3['krymp'] == k]
            df_4 = df_5.filter(['wirelabel'])
            column_name = "krymp " + str(k) + " Tavle: " + str(s)
            df_4 = df_4.rename(columns={"wirelabel": column_name})
            df_4 = df_4.reset_index(drop=True)
            df_sheet = pd.concat([df_sheet, df_4], axis=1)
            column_start = column_start + 1
            row_start = row_start + len(df_5.index) + 1
        createsheets(output_filename, s, df_sheet)
        to_csv_file(s + ".csv", df_sheet)
sorter()
Thank you.
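The column-by-column assembly in the final code relies on pd.concat with axis=1, which aligns rows on the index and pads shorter columns with NaN. A minimal sketch of just that step, with made-up wire labels and krymp sizes standing in for the real data; the columns are collected in a list and concatenated once, which is a small variation on the loop above:

```python
import pandas as pd

# Hypothetical wire labels grouped by krymp size (stand-ins for the real data).
groups = {
    3: ['W1', 'W2', 'W3'],
    5: ['W4'],
}

columns = []
for size, labels in sorted(groups.items()):
    # Each single-column frame gets a fresh 0..n-1 index, which is what
    # reset_index(drop=True) achieves in the code above.
    columns.append(pd.DataFrame({'krymp ' + str(size): labels}))

# Side-by-side concatenation; rows are matched by index position.
df_sheet = pd.concat(columns, axis=1)
print(df_sheet.shape)
```

Without resetting the index, each filtered column would keep its original row labels and the columns would not line up at the top of the sheet.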
My Python is rudimentary. What I want to do is take the first dataframe, search for a unique number, and create a new df in the same formatted template; then use the same unique number to search through the second df and create a new df pertinent to that unique number in the specified format; and finally merge all the looped data on top of each other.
This is the code:
#function
def multiple_dfs(df_list, sheets, file_name, spaces):
    writer = pd.ExcelWriter(file_name,engine='xlsxwriter')
    row = 0
    for i in uniqueIR:
        dftopi = df_out[df_out['Invoice Reference Number'] == i]
        df2 = df_out_fin[df_out_fin['Invoice Reference Number'] == i]
        df3 = df2.drop(columns = ['Invoice Reference Number'])
        for dataframe in df_list:
            dataframe.to_excel(writer,sheet_name=sheets,startrow=row , startcol=0, index = False)
            row = row + len(dataframe.index) + spaces
    writer.save()
# list of dataframes
dfs = [dftopi,df3]
# run function
multiple_dfs(dfs, 'Validation', 'test1.xlsx', 1)
This is what I want:
[image: table output]
I figured out a solution, in case anyone in the future is wondering:
writer = pd.ExcelWriter('test3.xlsx', engine = 'xlsxwriter')
dflist = []
for i in uniqueIR:
    dftopi = df_out[df_out['Invoice Reference Number'] == i]
    df2 = df_out_fin[df_out_fin['Invoice Reference Number'] == i]
    df3 = df2.drop(columns = ['Invoice Reference Number'])
    dftopi.to_excel(writer, sheet_name = 'Top Half' + str(i), index = False)
    df3.to_excel(writer, sheet_name = 'Bottom Half' + str(i), index = False)
    dflist.append(dftopi)
    dflist.append(df3)
writer.save()
def multiple_dfs(df_list, sheets, file_name, spaces):
    writer = pd.ExcelWriter(file_name,engine='xlsxwriter')
    row = 0
    for dataframe in df_list:
        dataframe.to_excel(writer,sheet_name=sheets,startrow=row , startcol=0, index = False)
        row = row + len(dataframe.index) + spaces
    writer.save()
multiple_dfs(dflist, 'Validation', 'test4.xlsx', 1)
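The stacking logic in multiple_dfs boils down to computing a start row for each dataframe. One detail: to_excel also writes a header row by default, so advancing by len(dataframe.index) + spaces leaves spaces - 1 empty rows between tables. A small sketch of the offset arithmetic that counts the header row as well, using a hypothetical helper start_rows that is not part of the original post:

```python
import pandas as pd


def start_rows(frames, spaces, header=True):
    """Return the startrow for each frame when stacked on one sheet."""
    rows = []
    row = 0
    for frame in frames:
        rows.append(row)
        # Each table occupies its data rows plus one header row (if written),
        # followed by the requested number of blank rows.
        row += len(frame.index) + (1 if header else 0) + spaces
    return rows


top = pd.DataFrame({'a': [1, 2]})
bottom = pd.DataFrame({'a': [3, 4, 5]})
print(start_rows([top, bottom, top], spaces=1))
```

Each value can be passed as startrow to to_excel for the corresponding dataframe; whether to count the header row depends on how much spacing is wanted.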
I have been struggling with this code all day. During each run of the loop, a table is read from a different MS Word file. The table is copied to a dataframe and then it is copied to a row in an Excel file.
With each subsequent run of the for-loop, the Excel row is incremented so the new dataframe can be written to a new row, but after the script runs, only one row contains a dataframe.
When I print(tfile), I get the following: ('CIV-ASCS-016_TRS.docx', 'CIV-ASCS-018_TRS .docx', 'CIV-ASCS-020_TRS.docx', 'CIV-ASCS-021_TRS .docx'). This shows that the loop ran 4 times, based on the 4 files in the directory. I set the initial row pos to 0 outside of the for-loop.
Note: I am not showing any lines of code with regards to importing the necessary libraries.
files = glob('*.docx')
pos = 1
for i, wfile in enumerate(files[:1]):
    document = Document(wfile)
    table = document.tables[0]
    data = []
    keys = {}
    for j, row in enumerate(table.rows):
        text = (cell.text for cell in row.cells)
        if j == 0:
            keys = tuple(text)
            continue
        row_data = dict(zip(keys, text))
        data.append(row_data)
    tfile = tuple(files)
    df = pd.DataFrame(data)
    df.loc[-1] = [wfile, 'Test Case ID']
    df.index = df.index + 1 # shifting index
    df = df.sort_index() # sorting by index
    df1 = df.rename(index=str, columns={"Test Case ID": "TC Attributes"})
    df21 = df1.drop(columns = ['TC Attributes'])
    df3 = df21.T
    # read the existing sheets so that openpyxl won't create a new one later
    book = load_workbook('test.xlsx')
    writer = pd.ExcelWriter('test.xlsx', engine='openpyxl')
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    df3.to_excel(writer, 'sheet7', header = False, index = False, \
                 startrow = pos)
    pos += 1
    writer.save()
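One thing that may be worth checking in the loop above: files[:1] is a slice containing only the first element, so enumerate(files[:1]) iterates once no matter how many files glob returned, whereas tfile = tuple(files) always shows the full list. A quick illustration with made-up file names:

```python
files = ['a.docx', 'b.docx', 'c.docx', 'd.docx']  # hypothetical file names

# Slicing with [:1] keeps only the first element.
visited_sliced = [name for _, name in enumerate(files[:1])]
# Iterating over the full list visits every file.
visited_all = [name for _, name in enumerate(files)]

print(len(visited_sliced), len(visited_all))
```

If the slice was left over from testing, looping over files directly would let pos advance once per document.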