I have been trying to merge the sheets of an Excel file using Python. I managed to append them, but merging is proving trickier. Any help is welcome.
Here is the code I tried:
import pandas as pd
f = pd.ExcelFile('E:/internship/All/A.xlsx')
n1 = len(f.sheet_names)
print(n1)
data = pd.read_excel(f, sheet_name='Sheet1', header=None)
# Sheet1 is already loaded above, so the loop starts at Sheet2
for j in range(2, n1 + 1):
    data1 = pd.read_excel(f, sheet_name='Sheet' + str(j), header=None)
    data = pd.merge(data, data1, how='outer')
print(data)
data.to_excel('Final.xlsx', index=False)
But when this program runs, it appends the sheets one below the other instead of merging them, as in the pictures below:
[Image: the result I want]
[Image: the result my program gives]
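A likely explanation, offered tentatively: with header=None every sheet gets the integer column labels 0, 1, 2, ..., and pd.merge called without on= joins on all shared columns, so an outer merge of rows that don't match on every column simply collects them as extra rows. The sketch below assumes column 0 holds a key shared by all sheets (an assumption, since the exact layout wanted isn't stated in the text) and merges on that column only. The dummy sheets dict is a hypothetical stand-in for the real workbook.

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for sheets read with header=None.
# Assumption: column 0 is a key (e.g. an ID) present in every sheet.
sheets = {
    "Sheet1": pd.DataFrame({0: ["a", "b", "c"], 1: [1, 2, 3]}),
    "Sheet2": pd.DataFrame({0: ["a", "b", "d"], 1: [10, 20, 40]}),
}

# Prefix the non-key columns with the sheet name so they stay distinct after merging
renamed = [
    df.rename(columns={c: f"{name}_{c}" for c in df.columns if c != 0})
    for name, df in sheets.items()
]

# Outer-merge the sheets side by side, matching rows on the key column only
merged = reduce(lambda left, right: pd.merge(left, right, on=0, how="outer"), renamed)
print(merged)
```

For the real file, pd.read_excel('E:/internship/All/A.xlsx', sheet_name=None, header=None) returns a dict of the same shape (sheet name to DataFrame), so it can replace the dummy sheets, and merged.to_excel('Final.xlsx', index=False) writes the result. If the sheets share no key and should just be lined up row by row, pd.concat(list(sheets.values()), axis=1) is the simpler alternative.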
I am fairly new to Python and multiprocessing. I've been trying to write a program that combines Excel files into one, and I want to use multiprocessing to read all the files so I can combine them afterwards. I've tried different approaches, but none of them worked. Any recommendations are welcome. Thank you!
Here is my full code with no multiprocessing:
import pandas as pd
# Convert the CSV file to Excel
read_csv = pd.read_csv('Data/SampleCollectionID.csv')
# ExcelWriter.save() was removed in pandas 2.0; the with-block closes (and saves) the file
with pd.ExcelWriter('Data/converted_CSV_file.xlsx') as excel_writer:
    read_csv.to_excel(excel_writer, index=False)
# Combine the files side by side
filenames = ['Data/converted_CSV_file.xlsx', 'Data/LabData.xlsx']
frame = pd.DataFrame()
for i in filenames:
    df = pd.read_excel(i)
    # Keep only the first occurrence of any duplicated column name
    df = df.loc[:, ~df.columns.duplicated()].copy()
    frame = pd.concat([frame, df], axis=1)
frame.to_excel('combinedData.xlsx', index=False)
I had the same challenge and the code below worked nicely for me:
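import glob
import os
import pandas as pd
from joblib import Parallel, delayed
my_path = r"path\to\your\folder"  # placeholder: folder containing the Excel files
files = glob.glob(os.path.join(my_path, "*.xlsx"))
def loop(file):
    # Each worker reads one workbook into a DataFrame
    return pd.read_excel(file)
# n_jobs=-1 uses all CPU cores; verbose=10 prints progress messages
df = Parallel(n_jobs=-1, verbose=10)(delayed(loop)(file) for file in files)
df = pd.concat(df, ignore_index=True)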
This piece of code was adapted from this excellent post: https://towardsdatascience.com/read-excel-files-with-python-1000x-faster-407d07ad0ed8
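For comparison, here is a sketch of the same idea using only the standard library's concurrent.futures instead of joblib. read_all and its parameters are hypothetical names. ProcessPoolExecutor is the default because parsing Excel files is CPU-bound; executor_cls lets you swap in ThreadPoolExecutor if needed. The if __name__ == "__main__" guard matters on Windows, where worker processes re-import the main script.

```python
from concurrent.futures import ProcessPoolExecutor

import pandas as pd


def read_all(files, reader=pd.read_excel, executor_cls=ProcessPoolExecutor,
             max_workers=None, axis=0):
    """Read every file in a worker pool, then concatenate the results."""
    files = list(files)
    if not files:
        return pd.DataFrame()
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as `files`
        frames = list(pool.map(reader, files))
    if axis == 0:
        # Stack rows and renumber the index
        return pd.concat(frames, ignore_index=True)
    # Place the files side by side, keeping their column labels
    return pd.concat(frames, axis=1)


if __name__ == "__main__":
    import glob

    # Hypothetical folder; adjust to where the workbooks live
    files = glob.glob("Data/*.xlsx")
    print(read_all(files))
```

Passing axis=1 reproduces the side-by-side layout of the question's loop (pd.concat(..., axis=1)), while the default stacks the rows as in the joblib answer.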
Hi there, Stack Overflow community,
I have several CSV files in a folder, and I need to add a column to each CSV containing the first 8 characters of its filename. After this step, I want to save the DataFrame, including the new column, back to the same file.
I get the right output, but the changes are not saved to the CSV file :/
Maybe someone has some inspiration for me. Thanks a lot!
import glob
import os
import pandas as pd
files = glob.glob(r'path\*.csv')
for fp in files:
    df = pd.concat([pd.read_csv(fp).assign(date=os.path.basename(fp).split('.')[0][:8])])
    # for i in df('date'):
    # Decoder problem
    print(df)
Use df.to_csv to write each DataFrame back to its file, like this:
for fp in files:
    df = pd.concat([pd.read_csv(fp).assign(date=os.path.basename(fp).split('.')[0][:8])])
    df.to_csv(fp, index=False)  # index=False if you don't want to save the index as a new column in the CSV
BTW, I think this also works and is more readable:
for fp in files:
    df = pd.read_csv(fp)
    df['date'] = os.path.basename(fp).split('.')[0][:8]
    df.to_csv(fp, index=False)
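The same steps can also be wrapped in a small function that reads, adds the column, and writes back in place. add_filename_prefix_column, n_chars and column are hypothetical names, and os.path.splitext is used here to strip the extension instead of split('.'); this is a sketch, not the answer's exact code.

```python
import os

import pandas as pd


def add_filename_prefix_column(path, n_chars=8, column="date"):
    """Add a column holding the first n_chars of the file's base name, then save in place."""
    df = pd.read_csv(path)
    # splitext removes only the final extension (".csv")
    stem = os.path.splitext(os.path.basename(path))[0]
    df[column] = stem[:n_chars]
    # Overwrite the original file without writing the index as an extra column
    df.to_csv(path, index=False)
    return df
```

One thing to watch: an all-digit prefix such as 20240115 is parsed back as an integer by pd.read_csv, so pass dtype={"date": str} when reading the files later if the value should stay a string.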
I've been asked to compile data files into one Excel spreadsheet using Python, but they are a mix of Excel files and CSVs. I'm trying to use the following code:
import glob
import pandas as pd
# Files whose names contain "Light", excluding the "all" and "Untitled" files
par = set(glob.glob("*Light*")) - set(glob.glob("*all*")) - set(glob.glob("*Untitled"))
df = pd.DataFrame()
for file in par:
    print(file)
    # read_csv only handles the CSV files, and read_excel only the Excel ones
    df = pd.concat([df, pd.read_csv(file)])
Is there a way I can use pd.concat to combine files in more than one format (i.e. both xlsx and csv), instead of one or the other?
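pd.concat only combines DataFrames that are already loaded, so one approach is to pick the reader for each file by its extension and concatenate the results. A minimal sketch; READERS, read_any and combine are hypothetical names, and reading .xls files additionally requires the xlrd package.

```python
import os

import pandas as pd

# Map each supported extension to the pandas reader that handles it
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,
    ".xls": pd.read_excel,
}


def read_any(path):
    """Read a CSV or Excel file into a DataFrame, choosing the reader by extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in READERS:
        raise ValueError(f"Unsupported file type: {path}")
    return READERS[ext](path)


def combine(paths):
    """Stack the rows of every file into a single DataFrame."""
    frames = [read_any(p) for p in paths]
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

With the par set from the question, combine(sorted(par)) returns one DataFrame that can then be written with to_excel. Any file matched by the glob patterns that is neither CSV nor Excel raises a ValueError instead of being silently skipped.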
Hello, I am having trouble converting all my .xls files to .xlsx. The other challenge is that each .xls file has multiple sheets, and I have a lot of files to convert. Can someone help me with a solution?
import glob
import os
import pandas as pd
from pandas import ExcelWriter
# Reading .xls files requires the xlrd package; writing .xlsx uses openpyxl
_list_of_xls_files = glob.glob(r'C:\Users\enter_your_pc_username_here\Documents\*.xls')
for _xls_file in _list_of_xls_files:
    # sheet_name=None loads every sheet into a dict {sheet name: DataFrame}
    df = pd.read_excel(_xls_file, sheet_name=None)
    _xlsx_file = os.path.splitext(_xls_file)[0] + '.xlsx'
    with ExcelWriter(_xlsx_file) as writer:
        for _sheet_name, _sheet_df in df.items():
            # Keep the original sheet names and skip the pandas index column
            _sheet_df.to_excel(writer, sheet_name=_sheet_name, index=False)
Source: Using Pandas to pd.read_excel() for multiple worksheets of the same workbook
I have a .csv file that I am converting to .xlsx using the following Python script. To make this useful, I need the Excel file to hold the data in an actual table (as created with Insert > Table). Is this possible in Python? I feel like it should be relatively easy, but I can't find anything online.
The idea is that the Python script takes the CSV file, converts it to .xlsx with a table embedded on Sheet1, and then moves it to the correct folder.
import shutil
import pandas as pd
src = r"C:\Users\xxxx\Python\filename.csv"
src2 = r"C:\Users\xxxx\Python\filename.xlsx"
# Read the CSV and write it out as .xlsx
read_file = pd.read_csv(src)
read_file.to_excel(src2, index=False, header=True)
dest = r"C:\Users\xxxx\Python\repository"
# copy2 copies the file (keeping metadata); use shutil.move to move it instead
destination = shutil.copy2(src2, dest)
Edit: I got sidetracked by the original MWE.
This should work, using xlsxwriter:
import pandas as pd
import xlsxwriter
# Dummy data
my_data = {"list1": [1, 2, 3, 4], "list2": "a b c d".split()}
df1 = pd.DataFrame(my_data)
df1.to_csv("myfile.csv", index=False)
df2 = pd.read_csv("myfile.csv")
# List of column-name dictionaries for the table header
headers = [{"header": i} for i in list(df2.columns)]
# Create the workbook and add the data as an Excel table
workbook = xlsxwriter.Workbook('output.xlsx')
worksheet1 = workbook.add_worksheet()
worksheet1.add_table(0, 0, len(df2), len(df2.columns) - 1, {"columns": headers, "data": df2.values.tolist()})
workbook.close()
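If you'd rather let pandas write the cells, the same kind of table can be added through pd.ExcelWriter with engine="xlsxwriter", similar to the pandas example in the XlsxWriter documentation: write the data starting at row 1 without a header, then overlay a table whose header row is row 0. A sketch, assuming xlsxwriter is installed; output_table.xlsx is a hypothetical filename, and in the question's script df would come from pd.read_csv(src).

```python
import pandas as pd

# Dummy data; in the original script this would be pd.read_csv(src)
df = pd.DataFrame({"list1": [1, 2, 3, 4], "list2": list("abcd")})

# Requires the xlsxwriter package as the ExcelWriter engine
with pd.ExcelWriter("output_table.xlsx", engine="xlsxwriter") as writer:
    # Leave row 0 free: the table supplies its own header row
    df.to_excel(writer, sheet_name="Sheet1", startrow=1, header=False, index=False)
    worksheet = writer.sheets["Sheet1"]
    max_row, max_col = df.shape
    # The table spans the header row (0) plus max_row data rows
    worksheet.add_table(0, 0, max_row, max_col - 1,
                        {"columns": [{"header": str(c)} for c in df.columns]})
```

The resulting file can then be copied or moved to the repository folder with shutil as in the question.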