Excel file overwritten instead of concatenated - Python - Pandas

I'm trying to concatenate all the Excel files in a directory, and all the worksheets in them, into one file using the script below. It kind of works, but the output file c.xlsx is overwritten on every pass of the loop, so only the last Excel file ends up concatenated. I'm not sure why.
import pandas as pd
import os
import ntpath
import glob

dir_path = os.path.dirname(os.path.realpath(__file__))
os.chdir(dir_path)
cdf = None
for excel_names in glob.glob('*.xlsx'):
    print(excel_names)
    df = pd.read_excel(excel_names, sheet_name=None, ignore_index=True)
    cdf = pd.concat(df.values())
    cdf.to_excel("c.xlsx", header=False, index=False)

The idea is to create a list of DataFrames in a list comprehension, but because read_excel with sheet_name=None returns an (ordered) dict of DataFrames, one per sheet, it is necessary to concat per file in the loop and then concat again for one big final DataFrame. (Note that ignore_index is not a valid read_excel argument; pass it to concat instead.)
cdf = [pd.read_excel(excel_names, sheet_name=None).values()
       for excel_names in glob.glob('files/*.xlsx')]
df = pd.concat([pd.concat(x) for x in cdf], ignore_index=True)
#print (df)
df.to_excel("c.xlsx", index=False)
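For context, sheet_name=None makes read_excel return a dict of DataFrames, one per sheet, which is why two concats are needed. A minimal sketch with that dict built by hand (toy data, no Excel files involved):

```python
import pandas as pd

# pd.read_excel(..., sheet_name=None) returns a mapping {sheet name -> DataFrame}.
# Simulate that return value for one workbook with two sheets:
sheets = {
    "Sheet1": pd.DataFrame({"a": [1, 2]}),
    "Sheet2": pd.DataFrame({"a": [3, 4]}),
}
per_file = pd.concat(sheets.values())          # first concat: merge the sheets of one file
final = pd.concat([per_file, per_file], ignore_index=True)  # second concat: merge the files
print(per_file.shape)  # (4, 1)
print(final.shape)     # (8, 1)
```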

I just tested the code below. It merges data from all the Excel files in a folder into one single Excel file.
import pandas as pd
import glob

# Collect each file's data in a list, then concatenate once at the end.
# (DataFrame.append was removed in pandas 2.0; pd.concat is the idiomatic way.)
frames = []
for f in glob.glob("C:\\your_path\\*.xlsx"):
    frames.append(pd.read_excel(f))
all_data = pd.concat(frames, ignore_index=True)
print(all_data)
print(all_data.shape)
all_data.to_excel("C:\\your_path\\final.xlsx", sheet_name='Sheet1')

I got it working using the script below, which builds on @ryguy72's answer but handles all worksheets as well as the header row.
import pandas as pd
import glob

frames = []
for f in glob.glob("my_path/*.xlsx"):
    # sheet_name=None reads every worksheet into a dict of DataFrames
    sheets = pd.read_excel(f, sheet_name=None)
    frames.append(pd.concat(sheets.values()))
all_data = pd.concat(frames, ignore_index=True)
print(all_data)
print(all_data.shape)
all_data.to_excel("my_path/final.xlsx", sheet_name='Sheet1')

Related

Column with end of file name python

I have code that merges all the txt files from a directory into a DataFrame:
import pandas as pd
import os
import glob

diretorio = r"F:\PROJETOS\LOTE45\ARQUIVOS\RISK\RISK_CUSTOM_FUND_N1"
files = [pd.read_csv(file, delimiter='\t')
         for file in glob.glob(os.path.join(diretorio, "*.txt"))]
df = pd.concat(files, ignore_index=True)
df
That gives this table as a result. I need to add a date column to it, but the date is only available at the end of the filename.
How can I get the date from the end of the filename and put it into the DataFrame? I have no idea how to do this.
Assuming the file naming pattern is constant, you can parse the end of the filename on every iteration of the loop this way:
from datetime import datetime

files = []
for file in glob.glob(os.path.join(diretorio, "*.txt")):
    df_f = pd.read_csv(file, delimiter='\t')
    # Assumes the 8 characters before ".txt" hold the date as ddmmyyyy;
    # a slice of [-11:-4] would grab only 7 characters, which %d%m%Y cannot parse.
    df_f['date'] = datetime.strptime(file[-12:-4], "%d%m%Y")
    files.append(df_f)
df = pd.concat(files, ignore_index=True)
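For illustration, assuming a filename that ends in ddmmyyyy.txt, the 8 characters before the extension can be parsed like this (the filename here is invented):

```python
from datetime import datetime

# Hypothetical filename ending in ddmmyyyy.txt
filename = "RISK_CUSTOM_FUND_31122019.txt"
# The 8 characters before ".txt" hold the date.
date = datetime.strptime(filename[-12:-4], "%d%m%Y")
print(date)  # 2019-12-31 00:00:00
```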
import pandas as pd
import os

diretorio = "F:/PROJETOS/LOTE45/ARQUIVOS/RISK/RISK_CUSTOM_FUND_N1/"
files = []
for filename in os.listdir(diretorio):
    if filename.endswith(".txt"):
        df = pd.read_csv(diretorio + filename, sep='\t')
        # Everything after the last underscore, minus the extension, is the date
        df['Date'] = filename.split('.')[0].split("_")[-1]
        files.append(df)
df = pd.concat(files, ignore_index=True)
print(df)

Using Pandas to convert CSV data to numbers when concatenating into excel

I am writing a small program to concatenate a load of measurements from multiple csv files into one Excel file. I have pretty much all of the program written and working; the only thing I'm struggling with is getting the data from the csv files to turn into numbers automatically when the DataFrame places them into the Excel file.
The code I have looks like this:
import pandas as pd
import os
import glob

os.chdir(r"directoryname")
retval = os.getcwd()
print("Directory changed to %s" % retval)
files = glob.glob(r"directoryname\datafiles*csv")
print(files)
files.sort(key=lambda x: os.path.getmtime(x))

writer = pd.ExcelWriter('test.xlsx')
df = pd.read_csv("datafile.csv", index_col=False)
df = df.iloc[0:41, 1]
df.to_excel(writer, 'sheetname', startrow=0, startcol=1, index=False)

i = 0  # column offset; missing in the original, which raised a NameError
for f in files:
    i += 1
    df = pd.read_csv(f, index_col=False)
    df = df.iloc[0:41, 2]
    df.to_excel(writer, 'sheetname', startrow=0, startcol=1 + i, index=False)
Thanks in advance
Do you mean:
df.loc[:, 'measurements'] = df.loc[:, 'measurements'].astype(float)
When you read the DataFrame you can cast all of your columns like that, for example.
A different solution is to cast the columns while reading the csv, using the dtype argument (see the documentation).
EXAMPLE
df = pd.read_csv(os.path.join(savepath, 'test.csv'), sep=";",
                 dtype={'ID': 'Int64', 'STATUS': 'object'},
                 encoding='utf-8')
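Another option, when the measurement columns arrive as strings, is pd.to_numeric. A small sketch with toy data (the column name is made up):

```python
import pandas as pd

# Measurements read from CSV often arrive as strings (dtype object).
df = pd.DataFrame({"measurements": ["1.5", "2.0", "3.25"]})
print(df["measurements"].dtype)  # object

# Convert the column to a numeric dtype before writing to Excel.
df["measurements"] = pd.to_numeric(df["measurements"])
print(df["measurements"].dtype)  # float64
print(df["measurements"].sum())  # 6.75
```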

Python Pandas join a few files

I import a few xlsx files into a pandas DataFrame. It works fine, but my problem is that all the data is copied under each other (so 10 Excel files with 100 lines each gives me 1000 lines).
I need a DataFrame with 100 lines and 10 columns, so each file is copied next to the others, not below.
Any ideas how to do this?
import os
import pandas as pd

os.chdir('C:/Users/folder/')
path = ('C:/Users/folder/')
files = os.listdir(path)
allNames = pd.DataFrame()
for f in files:
    info = pd.read_excel(f, 'Sheet1')
    allNames = allNames.append(info)
writer = pd.ExcelWriter('Output.xlsx')
allNames.to_excel(writer, 'Copy')
writer.save()
You can feed your spreadsheets as a list of DataFrames directly to pd.concat():
import os
import pandas as pd

os.chdir('C:/Users/folder/')
path = ('C:/Users/folder/')
files = os.listdir(path)
allNames = pd.concat([pd.read_excel(f, 'Sheet1') for f in files], axis=1)
# Use the writer as a context manager so the file is saved on exit
# (ExcelWriter.save was removed in pandas 2.0)
with pd.ExcelWriter('Output.xlsx') as writer:
    allNames.to_excel(writer, 'Copy')
Instead of stacking the tables vertically like this:
allNames = allNames.append(info)
You'll want to concatenate them horizontally like this:
allNames = pd.concat([allNames , info], axis=1)
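The difference between the two is easy to see with two small frames (toy data):

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2, 3]})
b = pd.DataFrame({"y": [4, 5, 6]})

stacked = pd.concat([a, b])               # axis=0 (default): rows pile up
side_by_side = pd.concat([a, b], axis=1)  # axis=1: columns sit next to each other
print(stacked.shape)       # (6, 2)
print(side_by_side.shape)  # (3, 2)
```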

Join multiple excel files into one excel file via python

I need to combine all the Excel files in my directory into one Excel file.
For example, I have 3 Excel files:
file 1:
file 2:
file 3:
I need to concatenate them and get the output shown above, but instead they were appended one after another.
This is my code:
import pandas as pd
import glob

all_data = pd.DataFrame()
for f in glob.glob('C:/Users/test-results/FinalResult/05-01-2019/*.xlsx'):
    df = pd.read_excel(f)
    all_data = all_data.append(df, ignore_index=True)
writer = pd.ExcelWriter('mycollected_data.xlsx', engine='xlsxwriter')
all_data.to_excel(writer, sheet_name='Sheet1')
writer.save()
During my search, all I found was how to append DataFrames, as shown in my code; I didn't figure out how to use join.
You could use
files = glob.glob('C:/Users/test-results/FinalResult/05-01-2019/*.xlsx')
dfs = (pd.read_excel(f, index_col=0) for f in files)
all_data = pd.concat(dfs, axis=1)
Try this:
all_data = pd.concat([all_data, df], axis=1)
or, if the files share a key column, merge on it instead:
all_data = all_data.merge(df, on=['first_column_name'], how='left')
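A quick sketch of the merge variant, assuming the files share a key column (the column names here are made up):

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "val_a": [10, 20, 30]})
right = pd.DataFrame({"id": [1, 2, 3], "val_b": [40, 50, 60]})

# A left join on the shared key keeps every row of `left`
# and brings in the matching columns from `right`.
merged = left.merge(right, on="id", how="left")
print(merged.shape)          # (3, 3)
print(list(merged.columns))  # ['id', 'val_a', 'val_b']
```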

Importing multiple csv files into pandas and concatenating into one DataFrame does not work fully

Please help me find a solution to a problem with importing data from multiple csv files into one DataFrame in Python.
The code is:
import pandas as pd
import os
import glob

path = r'my_full_path'
os.chdir(path)
results = pd.DataFrame()
for counter, current_file in enumerate(glob.glob("*.csv")):
    namedf = pd.read_csv(current_file, header=None, sep=",", delim_whitespace=True)
    results = pd.concat([results, namedf], join='outer')
results.to_csv('Result.csv', index=None, header=None, sep=",")
The problem is that some of the data moves into new rows instead of new columns as required.
What is wrong in my code?
P.S.: I found questions about importing multiple csv files into a DataFrame, for example here: Import multiple csv files into pandas and concatenate into one DataFrame, but the solution doesn't solve my issue :-(
It was solved by passing the right parser arguments through to pd.read_csv: read_csv() -> append the DataFrames to a list -> concat:
def get_merged_files(files_list, **kwargs):
    dataframes = []
    for file in files_list:
        df = pd.read_csv(file, **kwargs)
        dataframes.append(df)
    return pd.concat(dataframes, axis=1)
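For completeness, a self-contained sketch of how such a helper might be used; the function body is repeated (lightly simplified) so the example runs on its own, and the file names are invented:

```python
import os
import tempfile
import pandas as pd

def get_merged_files(files_list, **kwargs):
    # Read each file, collect the DataFrames, concatenate side by side.
    dataframes = []
    for file in files_list:
        dataframes.append(pd.read_csv(file, **kwargs))
    return pd.concat(dataframes, axis=1)

# Write two small CSVs to a temporary directory, then merge them.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, col in [("a.csv", "x"), ("b.csv", "y")]:
        path = os.path.join(tmp, name)
        pd.DataFrame({col: [1, 2]}).to_csv(path, index=False)
        paths.append(path)
    merged = get_merged_files(paths)
    print(merged.shape)  # (2, 2)
```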
You can try using this:
import pandas as pd
import os

files = [file for file in os.listdir('./Your_Folder')]  # Here is where all the files are located.
all_csv_files = pd.DataFrame()
for file in files:
    df = pd.read_csv("./Your_Folder/" + file)
    all_csv_files = pd.concat([all_csv_files, df])
all_csv_files.to_csv("All_CSV_Files_Concat.csv", index=False)
