Hello, I am having trouble converting all my .xls files to .xlsx. Another challenge is that each .xls file has multiple sheets, and I have a lot of files to convert. Can someone help me with a solution?
import glob
import os
import pandas as pd
# Adjust the folder to wherever the .xls files live
_list_of_xls_files = glob.glob(r'C:\Users\enter_your_pc_username_here\Documents\*.xls')
for _xls_file in _list_of_xls_files:
    # sheet_name=None loads every sheet into a dict of {sheet name: DataFrame}
    df = pd.read_excel(_xls_file, sheet_name=None)
    _list_of_tabs_inside_xls_file = df.keys()
    # Swap only the file extension; str.replace could also hit '.xls' elsewhere in the path
    _xlsx_file = os.path.splitext(_xls_file)[0] + '.xlsx'
    with pd.ExcelWriter(_xlsx_file) as writer:
        for n, _sheet_name in enumerate(_list_of_tabs_inside_xls_file):
            df[_sheet_name].to_excel(writer, sheet_name='sheet%s' % n)
Source: Using Pandas to pd.read_excel() for multiple worksheets of the same workbook
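If the original tab names should survive the conversion, a variant of the loop above could look like the following. This is only a sketch, not the answerer's code: the helper names `xlsx_target`, `convert_workbook` and `convert_folder` are hypothetical, and it assumes an engine that can read .xls (such as xlrd) and one that can write .xlsx (such as openpyxl) are installed.

```python
from pathlib import Path

import pandas as pd


def xlsx_target(xls_path):
    """Return the .xlsx path that sits next to the given .xls file."""
    # with_suffix only touches the final extension, so a folder or file
    # name that happens to contain ".xls" elsewhere is left alone.
    return Path(xls_path).with_suffix(".xlsx")


def convert_workbook(xls_path):
    """Copy every sheet of one .xls workbook into a new .xlsx workbook."""
    # sheet_name=None returns a dict of {sheet name: DataFrame}.
    sheets = pd.read_excel(xls_path, sheet_name=None)
    target = xlsx_target(xls_path)
    with pd.ExcelWriter(target) as writer:
        for name, frame in sheets.items():
            # Reuse the original sheet name instead of "sheet0", "sheet1", ...
            frame.to_excel(writer, sheet_name=name, index=False)
    return target


def convert_folder(folder):
    """Convert every .xls file directly inside folder; return the new paths."""
    return [convert_workbook(path) for path in sorted(Path(folder).glob("*.xls"))]
```

Calling `convert_folder(...)` on the Documents folder from the answer would then write one .xlsx next to each .xls. Note that `index=False` drops the row index that the loop above writes, and only cell values are carried over, not formatting or macros.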
Related
I am splitting an .xlsm file (with multiple sheets) into CSV files, one per sheet. I want to save only the sheets whose names contain the keyword "Robot" or "Auto". How can I do that? Currently it saves every sheet as a CSV file. Here is the code I am using:
import pandas as pd
xl = pd.ExcelFile('Sample_File.xlsm')
for sheet in xl.sheet_names:
    df = pd.read_excel(xl, sheet_name=sheet)
    # Write the frame that was just read
    df.to_csv(f"{sheet}.csv", index=False)
Can you try this?
import pandas as pd
import re
xl = pd.ExcelFile('Sample_File.xlsm')
for sheet in xl.sheet_names:
    # Only export sheets whose name contains "Robot" or "Auto"
    if re.search('Robot|Auto', sheet):
        df = pd.read_excel(xl, sheet_name=sheet)
        df.to_csv(f"{sheet}.csv", index=False)
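The same filter can also be written without a regular expression, which makes it easy to ignore case as well. A small sketch, with a hypothetical helper name `sheets_matching`; it works on plain strings, so it can be fed `xl.sheet_names` from the code above. Unlike the regex version, it matches case-insensitively, which is an assumption about what is wanted.

```python
def sheets_matching(sheet_names, keywords=("Robot", "Auto")):
    """Return the sheet names that contain any keyword, ignoring case."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        name
        for name in sheet_names
        if any(keyword in name.lower() for keyword in lowered)
    ]


# Example with made-up sheet names:
names = ["Robot_1", "Summary", "AutoRun", "robot-test", "Notes"]
print(sheets_matching(names))
```

The loop then becomes `for sheet in sheets_matching(xl.sheet_names):` with the read and `to_csv` calls unchanged.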
I've been asked to compile data files into one Excel spreadsheet using Python, but they are all either Excel files or CSVs. I'm trying to use the following code:
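import glob, os
import shutil
import pandas as pd
# Files whose names match "*Light*", minus those matching "*all*" or "*Untitled"
par = set(glob.glob("*Light*")) - set(glob.glob("*all*")) - set(glob.glob("*Untitled"))
par
df = pd.DataFrame()
for file in par:
    print(file)
    # pd.read is not a pandas function; this is the call being asked about
    df = pd.concat([df, pd.read(file)])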
Is there a way I can use the pd.concat function to read files in more than one format (i.e. both xlsx and csv), instead of one or the other?
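pandas has no single reader that accepts both formats, so one option is to pick the reader from the file extension and hand the resulting frames to `pd.concat`. A minimal sketch, assuming every file has a header row and compatible columns; `read_any` and `combine` are hypothetical helper names, and reading Excel files additionally needs a suitable engine installed.

```python
from pathlib import Path

import pandas as pd


def read_any(path):
    """Read a CSV or Excel file into a DataFrame, chosen by file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xls", ".xlsx", ".xlsm"}:
        # Reads the first sheet only; pass sheet_name=... for another one.
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {path}")


def combine(paths):
    """Stack the contents of all given files into one DataFrame."""
    # Building a list and concatenating once avoids re-copying the growing
    # frame on every loop iteration.
    frames = [read_any(path) for path in paths]
    return pd.concat(frames, ignore_index=True)
```

With the set from the question, `combine(par).to_excel("combined.xlsx", index=False)` would write everything into one sheet (the output file name is made up here).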
I have been trying to merge the sheets of an Excel file using Python. I managed to append them, but merging is proving tricky. Any help is welcome.
Following is the code that I tried:
import pandas as pd
import numpy as np
import glob
import os, collections, csv
from os.path import basename
f = pd.ExcelFile('E:/internship/All/A.xlsx')
n1 = len(f.sheet_names)
print(n1)
data = pd.read_excel(f, sheet_name='Sheet1', header=None)
# Sheet1 is already loaded above, so start merging from Sheet2
for j in range(2, n1 + 1):
    data1 = pd.read_excel(f, sheet_name='Sheet' + str(j), header=None)
    data = pd.merge(data, data1, how='outer')
print(data)
data.to_excel('Final.xlsx', index=False)
But when this program runs, it seems to stack the sheets one below the other instead of merging them, as in the pictures below:
[Image: result that I want]
[Image: result that my program is giving]
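One likely reason for the stacking: with `header=None` every sheet gets the column labels 0, 1, 2, ..., and `pd.merge` without `on=` joins on all column labels the two frames share. An outer join on every column keeps each non-identical row separately, which looks like an append. If the sheets are meant to line up side by side on a shared key, something like the following may be closer to the goal. It is a sketch under assumptions: the wanted result is not visible here, the sheets are assumed to have a header row, and the key column name `id` and the helper `merge_on_key` are hypothetical.

```python
from functools import reduce

import pandas as pd


def merge_on_key(frames, key):
    """Outer-merge a list of DataFrames side by side on one shared column."""
    return reduce(
        lambda left, right: pd.merge(left, right, on=key, how="outer"),
        frames,
    )


# Toy frames standing in for two sheets read with a header row.
sheet1 = pd.DataFrame({"id": [1, 2], "height": [10, 20]})
sheet2 = pd.DataFrame({"id": [2, 3], "weight": [5, 6]})
merged = merge_on_key([sheet1, sheet2], key="id")
print(merged)
```

For a real workbook, the frames could come from `pd.read_excel(f, sheet_name=None).values()`, which returns every sheet at once.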
I have a bunch of DAT files that I need to convert to XLS files using Python. Should I use the CSV library to do this or is there a better way?
I'd use pandas.
import pandas as pd
df = pd.read_table('DATA.DAT')
df.to_excel('DATA.xlsx')
And of course you can set up a loop to go through all your files. Something along these lines, maybe:
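import glob
import os
import pandas as pd
os.chdir("C:\\FILEPATH\\")
for file in glob.glob("*.DAT"):
    # Show which file is being converted
    print(file)
    df = pd.read_table(file)
    # Swap only the extension; file.replace('DAT', 'xlsx') would also rewrite "DAT" inside the name
    file1 = os.path.splitext(file)[0] + '.xlsx'
    df.to_excel(file1)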
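# pandas 1.3+ passes engine options via engine_kwargs; older versions took options={...} directly
writer = pd.ExcelWriter('pandas_example.xlsx',
                        engine='xlsxwriter',
                        engine_kwargs={'options': {'strings_to_urls': False}})
or you can use:
df.to_excel('example.xlsx')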
Reading data (just 20,000 numbers) from an .xlsx file takes forever:
import pandas as pd
xlsxfile = pd.ExcelFile("myfile.xlsx")
data = xlsxfile.parse('Sheet1', index_col = None, header = None)
takes about 9 seconds.
If I save the same file in csv format it takes ~25ms:
import pandas as pd
csvfile = "myfile.csv"
data = pd.read_csv(csvfile, index_col = None, header = None)
Is this an issue of openpyxl or am I missing something? Are there any alternatives?
xlrd has support for .xlsx files, and this answer suggests that at least the beta version of xlrd with .xlsx support was quicker than openpyxl.
The current stable version of pandas (0.11.0) uses openpyxl for .xlsx files, but this has been changed for the next release. If you want to give it a go, you can download the dev version from GitHub.
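To check whether a different engine or file format actually helps on a given machine, the two reads can be timed side by side. A small sketch using only the standard library; `timed` is a hypothetical helper, and the workload below is a stand-in so the snippet runs without any data file.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; with pandas this could be
# timed(pd.read_excel, "myfile.xlsx", header=None).
total, seconds = timed(sum, range(1_000_000))
print(f"sum={total} took {seconds:.4f}s")
```

Running it once per reader (for example `pd.read_excel` and `pd.read_csv` on the same data) gives numbers comparable to the 9 s and 25 ms quoted in the question.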