Merge Sheets of an Excel Workbook using Python

I have been trying to merge the sheets of an Excel file using Python. I managed to append them, but merging them side by side has turned out to be tricky for me. Any help is welcome.
Here is the code I tried:
import pandas as pd
import numpy as np
import glob
import os, collections, csv
from os.path import basename
f = pd.ExcelFile('E:/internship/All/A.xlsx')
n1 = len(f.sheet_names)
print(n1)
data = pd.read_excel(f, sheet_name='Sheet1', header=None)
for j in range(1, int(n1) + 1):
    data1 = pd.read_excel(f, sheet_name='Sheet' + str(j), header=None)
    data = pd.merge(data, data1, how='outer')
print(data)
data.to_excel('Final.xlsx', index=False)
But when this program runs, it stacks the sheets on top of each other instead of merging them side by side, as in the pictures below:
(Image: the result I want)
(Image: the result my program gives)
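A minimal sketch of two possible fixes, assuming the sheets should either sit next to each other row by row, or be joined on a column they share. In the code above, `pd.merge` is called without `on`, so it joins on every column label the two frames have in common; with `header=None` those labels are just 0, 1, 2, ..., so an outer merge most likely just collects the distinct rows, which looks like appending. The loop also starts at 1, so Sheet1 is merged with itself. The helper names and the `key` column below are hypothetical assumptions about your data.

```python
from functools import reduce

import pandas as pd


def combine_side_by_side(sheets: dict) -> pd.DataFrame:
    """Place the sheets next to each other, aligning rows by position."""
    frames = [df.reset_index(drop=True) for df in sheets.values()]
    # axis=1 concatenates column-wise; ignore_index=True renumbers the columns 0..n-1
    # so sheets read with header=None do not end up with clashing column labels.
    return pd.concat(frames, axis=1, ignore_index=True)


def merge_on_key(sheets: dict, key) -> pd.DataFrame:
    """Outer-join all sheets on a shared column (`key` is an assumption, e.g. an ID column)."""
    # Other column names that appear in both frames get pandas' default _x/_y suffixes.
    return reduce(lambda left, right: pd.merge(left, right, on=key, how="outer"), sheets.values())


def combine_workbook(path: str, out_path: str) -> None:
    """Hypothetical helper: read every sheet once and write them side by side."""
    # sheet_name=None loads all sheets as {sheet name: DataFrame}, so Sheet1 is read only once
    sheets = pd.read_excel(path, sheet_name=None, header=None)
    combine_side_by_side(sheets).to_excel(out_path, index=False, header=False)
```

For example, `combine_workbook('E:/internship/All/A.xlsx', 'Final.xlsx')`. Sheets of different lengths are padded with NaN at the bottom.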

Related

How can I use Python multiprocessing to speed up reading Excel files?

I am fairly new to Python and multiprocessing. I am writing a program that combines several Excel files into one. I want to use multiprocessing to read all the files in parallel so I can combine them afterwards. I have tried different approaches, but none of them worked. Any recommendations are welcome. Thank you!
Here is my full code, without multiprocessing:
from concurrent.futures import process
import os
import queue
import time
import pandas as pd
import multiprocessing as mp
from joblib import Parallel, delayed
# Convert the CSV file to Excel (forward slashes avoid invalid backslash escapes)
read_csv = pd.read_csv('Data/SampleCollectionID.csv')
# ExcelWriter.save() was removed in pandas 2.0; the with-block saves and closes the file
with pd.ExcelWriter('Data/converted_CSV_file.xlsx') as excel_writer:
    read_csv.to_excel(excel_writer, index=False)
# Combine the files side by side
filenames = ['Data/converted_CSV_file.xlsx', 'Data/LabData.xlsx']
frame = pd.DataFrame()
for i in filenames:
    df = pd.read_excel(i)
    # keep only the first occurrence of each duplicated column label
    df = df.loc[:, ~df.columns.duplicated()].copy()
    frame = pd.concat([frame, df], axis=1)
frame.to_excel('combinedData.xlsx', index=False)
I had the same challenge and the code below worked nicely for me:
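import glob
import os
import pandas as pd
from joblib import Parallel, delayed
my_path = "."  # folder containing the .xlsx files; set this to your own folder
files = glob.glob(os.path.join(my_path, "*.xlsx"))
def loop(file):
    return pd.read_excel(file)
# n_jobs=-1 uses all CPU cores; verbose=10 prints progress messages
df = Parallel(n_jobs=-1, verbose=10)(delayed(loop)(file) for file in files)
df = pd.concat(df, ignore_index=True)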
This piece of code was adapted from this excellent post: https://towardsdatascience.com/read-excel-files-with-python-1000x-faster-407d07ad0ed8
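For comparison, a sketch of the same idea using the standard library's `concurrent.futures.ProcessPoolExecutor` instead of joblib (the function names are hypothetical). Processes are used rather than threads because parsing .xlsx files is mostly CPU-bound Python work, which the GIL would largely serialize across threads. The executor class is a parameter so it can be swapped, for example for `ThreadPoolExecutor`. Whether this is actually faster depends on the number and size of the files; for just two files, the cost of starting processes may outweigh the gain.

```python
import glob
import os
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, Iterable, Optional, Type

import pandas as pd


def read_in_parallel(
    paths: Iterable[str],
    reader: Callable[[str], pd.DataFrame] = pd.read_excel,
    executor_cls: Type[Executor] = ProcessPoolExecutor,
    max_workers: Optional[int] = None,
) -> pd.DataFrame:
    """Read every file with `reader` in a pool of workers and stack the results."""
    paths = list(paths)
    with executor_cls(max_workers=max_workers) as pool:
        # map() returns results in input order, so rows come out file by file
        frames = list(pool.map(reader, paths))
    # pd.concat raises ValueError if `paths` was empty
    return pd.concat(frames, ignore_index=True)


def combine_folder(folder: str, out_path: str) -> None:
    """Hypothetical helper: read all .xlsx files in `folder` and write one workbook."""
    files = sorted(glob.glob(os.path.join(folder, "*.xlsx")))
    read_in_parallel(files).to_excel(out_path, index=False)
```

On Windows and macOS, new processes are started with the "spawn" method, so the call (for example `combine_folder("Data", "combinedData.xlsx")`) should sit under `if __name__ == "__main__":` in your script.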

How to add the filename as a column to every file in a directory (Python)

Hi there, Stack Overflow community,
I have several CSV files in a folder, and I need to add a column containing the first 8 characters of each file's name. After this step, I want to save the DataFrame, including the new column, back to the same file.
I get the right output, but the changes are not saved to the CSV file :/
Maybe someone has an idea for me. Thanks a lot!
import pandas as pd
import glob, os
import fnmatch
files = glob.glob(r'path\*.csv')
for fp in files:
    df = pd.concat([pd.read_csv(fp).assign(date=os.path.basename(fp).split('.')[0][:8])])
    #for i in df('date'):
    #Decoder problem
    print(df)
Use df.to_csv to write the DataFrame back to the same file, like this:
for fp in files:
    df = pd.concat([pd.read_csv(fp).assign(date=os.path.basename(fp).split('.')[0][:8])])
    df.to_csv(fp, index=False)  # index=False if you don't want to save the index as a new column in the csv
By the way, I think this simpler version should also work and is more readable:
for fp in files:
    df = pd.read_csv(fp)
    df['date'] = os.path.basename(fp).split('.')[0][:8]
    df.to_csv(fp, index=False)
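A slightly more defensive version of the loop above, written as a function (the name `add_filename_column` and its parameters are hypothetical). It uses `os.path.splitext`, which strips only the final extension, instead of `split('.')[0]`, which cuts at the first dot; the two differ only when a dot appears within the first 8 characters of a file name. The files are overwritten in place, so keep a backup; running it twice overwrites the same column rather than adding a second one.

```python
import glob
import os

import pandas as pd


def add_filename_column(folder: str, column: str = "date", n_chars: int = 8) -> list:
    """Add the first `n_chars` characters of each CSV's file name as `column`, in place."""
    updated = []
    for fp in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        stem = os.path.splitext(os.path.basename(fp))[0]  # file name without its extension
        df = pd.read_csv(fp)
        df[column] = stem[:n_chars]  # the scalar is broadcast to every row
        df.to_csv(fp, index=False)   # overwrite the original file, without the index
        updated.append(fp)
    return updated
```

When reading the files back, pandas may parse a prefix like 20230115 as a number; pass `dtype={"date": str}` to `pd.read_csv` if you need it as text.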

Concatenating Excel and CSV files

I've been asked to compile data files into one Excel spreadsheet using Python, but the files are a mix of Excel files and CSVs. I'm trying to use the following code:
import glob, os
import shutil
import pandas as pd
# file names containing "Light", excluding those containing "all" or ending in "Untitled"
par = set(glob.glob("*Light*")) - set(glob.glob("*all*")) - set(glob.glob("*Untitled"))
df = pd.DataFrame()
for file in par:
    print(file)
    df = pd.concat([df, pd.read_csv(file)])  # only handles CSV files
Is there a way to read files in more than one format (i.e. both xlsx and csv) and combine them with pd.concat, instead of handling only one format or the other?
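`pd.concat` only combines DataFrames that are already loaded; each file still has to be read with `pd.read_csv` or `pd.read_excel` depending on its type. A minimal sketch, assuming the file extension reliably tells you the format (`READERS`, `read_any` and `concat_mixed` are hypothetical names), picks the reader from the extension:

```python
import os
from typing import Callable, Dict, Iterable

import pandas as pd

# Map each (lower-case) file extension to the pandas reader that understands it.
READERS: Dict[str, Callable[..., pd.DataFrame]] = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,  # needs openpyxl
    ".xls": pd.read_excel,   # needs xlrd
}


def read_any(path: str) -> pd.DataFrame:
    """Read a CSV or Excel file, choosing the reader from the extension."""
    ext = os.path.splitext(path)[1].lower()
    try:
        reader = READERS[ext]
    except KeyError:
        raise ValueError(f"Unsupported file type: {path}") from None
    return reader(path)


def concat_mixed(paths: Iterable[str]) -> pd.DataFrame:
    """Read every file, whatever its format, and stack them into one DataFrame."""
    frames = [read_any(p) for p in sorted(paths)]
    return pd.concat(frames, ignore_index=True)
```

You could then call `concat_mixed(par).to_excel("combined.xlsx", index=False)` on the filtered set of names. Files with different columns are still stacked, with NaN where a column is missing.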

Converting all .xls files to .xlsx

Hello, I am having trouble converting all my .xls files to .xlsx. The other challenge is that each .xls file has multiple sheets, and I have a lot of files to convert. Can someone help me with a solution?
import glob
import pandas as pd
import os
from pandas import ExcelWriter
_list_of_xls_files = glob.glob(r'C:\Users\enter_your_pc_username_here\Documents\*xls')
for _xls_file in _list_of_xls_files:
    # sheet_name=None reads every sheet into a {sheet name: DataFrame} dict (reading .xls requires xlrd)
    df = pd.read_excel(_xls_file, sheet_name=None)
    _list_of_tabs_inside_xls_file = df.keys()
    with ExcelWriter(str(_xls_file).replace('.xls', '.xlsx')) as writer:
        for n, _sheet_name in enumerate(_list_of_tabs_inside_xls_file):
            # index=False avoids writing the row index as an extra column
            df[_sheet_name].to_excel(writer, sheet_name='sheet%s' % n, index=False)
Source: Using Pandas to pd.read_excel() for multiple worksheets of the same workbook
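A sketch of the same conversion with `pathlib`, assuming you want to keep the original sheet names rather than renaming them to sheet0, sheet1, ... (`xlsx_path_for` and `convert_xls_folder` are hypothetical names). `Path.with_suffix` swaps only the final extension, whereas `str.replace('.xls', '.xlsx')` would also change a `.xls` appearing elsewhere in the path, such as in a folder name. Because the data passes through DataFrames, only cell values are carried over; formatting, formulas and charts are not.

```python
from pathlib import Path

import pandas as pd


def xlsx_path_for(xls_path) -> Path:
    """Replace only the final .xls suffix with .xlsx."""
    return Path(xls_path).with_suffix(".xlsx")


def convert_xls_folder(folder) -> list:
    """Convert every .xls workbook in `folder` to .xlsx, keeping all sheets and their names."""
    written = []
    for xls in sorted(Path(folder).glob("*.xls")):
        # sheet_name=None returns {sheet name: DataFrame}; reading .xls needs the xlrd package
        sheets = pd.read_excel(xls, sheet_name=None)
        target = xlsx_path_for(xls)
        # writing .xlsx needs openpyxl or xlsxwriter
        with pd.ExcelWriter(target) as writer:
            for name, df in sheets.items():
                df.to_excel(writer, sheet_name=name, index=False)
        written.append(target)
    return written
```

For example, `convert_xls_folder(r"C:\Users\enter_your_pc_username_here\Documents")` returns the paths of the new .xlsx files.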

Pandas: create an Excel file with the data formatted as a table

I have a .csv file that I convert to an Excel file using the following Python script. To make the output useful, I need the data to sit inside an actual Excel table (as created with Insert > Table). Is this possible in Python? I feel like it should be relatively easy, but I can't find anything online.
The idea is that the script takes the CSV file, converts it to .xlsx with a table embedded on Sheet1, and then moves it to the correct folder.
import os
import shutil
import pandas as pd
src = r"C:\Users\xxxx\Python\filename.csv"
src2 = r"C:\Users\xxxx\Python\filename.xlsx"
read_file = pd.read_csv(src)  # convert to Excel
read_file.to_excel(src2, index=False, header=True)
dest = r"C:\Users\xxxx\Python\repository"
destination = shutil.copy2(src2, dest)
Edit: I got sidetracked by the original MWE.
This should work, using xlsxwriter:
import pandas as pd
import xlsxwriter
#Dummy data
my_data={"list1":[1,2,3,4], "list2":"a b c d".split()}
df1=pd.DataFrame(my_data)
df1.to_csv("myfile.csv", index=False)
df2=pd.read_csv("myfile.csv")
#List of column name dictionaries
headers=[{"header" : i} for i in list(df2.columns)]
#Create and propagate workbook
workbook=xlsxwriter.Workbook('output.xlsx')
worksheet1=workbook.add_worksheet()
worksheet1.add_table(0, 0, len(df2), len(df2.columns)-1, {"columns":headers, "data":df2.values.tolist()})
workbook.close()
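An alternative that lets pandas write the cells and then wraps them in a table, following the pattern used in the XlsxWriter documentation for pandas output (`table_spec` and `csv_to_excel_table` are hypothetical names; this assumes xlsxwriter is installed). Writing the data from row 1 with `header=False` leaves row 0 free, because `add_table` writes the header row itself from the `columns` option. Header names are converted to strings because Excel table headers are text.

```python
from typing import Any, Dict, Tuple

import pandas as pd


def table_spec(df: pd.DataFrame) -> Tuple[int, int, int, int, Dict[str, Any]]:
    """Return (first_row, first_col, last_row, last_col, options) for worksheet.add_table."""
    columns = [{"header": str(c)} for c in df.columns]
    # Row 0 holds the header and rows 1..len(df) hold the data (zero-based, inclusive).
    return 0, 0, len(df), len(df.columns) - 1, {"columns": columns}


def csv_to_excel_table(csv_path: str, xlsx_path: str, sheet_name: str = "Sheet1") -> None:
    """Write the CSV's rows to xlsx_path and wrap them in an Excel table."""
    df = pd.read_csv(csv_path)
    with pd.ExcelWriter(xlsx_path, engine="xlsxwriter") as writer:
        # Start at row 1 without a header; add_table fills in the header row.
        df.to_excel(writer, sheet_name=sheet_name, startrow=1, header=False, index=False)
        worksheet = writer.sheets[sheet_name]
        worksheet.add_table(*table_spec(df))
```

With the script from the question, `csv_to_excel_table(src, src2)` would replace the `read_csv`/`to_excel` pair, and the `shutil.copy2` step stays the same.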
