I'm trying to save my output to an Excel file, but some of the values have '=' at the beginning of the string.
So on export, Excel interprets them as formulas, and instead of strings I get #NAME? errors in Excel.
I need to save only some columns as text, as I have dates and numerics in other columns, and they should be saved as is.
I've already tried to convert them with the .astype() function, but with no result.
def create_excel(datadir, filename, data):
    df = col_type_converter(filename, pd.DataFrame(data))
    filepath = os.path.join(datadir, filename + '.xlsx')
    writer = pd.ExcelWriter(filepath, engine='xlsxwriter')
    df.to_excel(writer, index=False)
    writer.save()
    return filepath

def col_type_converter(name, dataframe):
    df = dataframe
    if name == 'flights':
        df['departure_station'] = df['departure_station'].astype(str)
        df['arrival_station'] = df['arrival_station'].astype(str)
        return df
    return df
When I'm importing from CSV using the built-in Excel importer, I can make it import values as text.
Is there any way to tell pandas how I want these columns to be treated?
nvm, you can just pass xlsxwriter options through pandas:
writer = pd.ExcelWriter(filepath, engine='xlsxwriter', options={'strings_to_formulas': False})
https://xlsxwriter.readthedocs.io/working_with_pandas.html#passing-xlsxwriter-constructor-options-to-pandas
https://xlsxwriter.readthedocs.io/worksheet.html#worksheetwrite
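A note for newer pandas versions: passing `options=` directly to `pd.ExcelWriter` was deprecated in pandas 1.3 and, as far as I know, removed in 2.0. The same XlsxWriter option can be passed through `engine_kwargs` instead. A minimal sketch, assuming pandas 1.3 or later with XlsxWriter installed; the function name `write_without_formulas` is made up for illustration:

```python
import pandas as pd


def write_without_formulas(df, filepath):
    """Write df to an .xlsx file, keeping strings that start with '=' as text."""
    with pd.ExcelWriter(
        filepath,
        engine="xlsxwriter",
        # Forwarded to xlsxwriter.Workbook(filename, options=...)
        engine_kwargs={"options": {"strings_to_formulas": False}},
    ) as writer:
        df.to_excel(writer, index=False)
    # Leaving the with-block saves and closes the file.
    return filepath
```

Note that the option applies to the whole workbook, not to selected columns. Dates and numbers are unaffected because they are not strings.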
Related
I am trying to turn a series of Excel Sheets into .txt files. The data I'm working with has some specific formatting I want to keep (decimal places and scientific notation specifically), but I can't seem to get it to work. Am I missing something with .format? The code below works for the most part (except for the final 3 lines, the ones I'm working on).
import pandas as pd
file_names = ["Example.xlsx"]
for xl_file in file_names:
    xl = pd.ExcelFile("Example.xlsx")
    sheet_names = xl.sheet_names
    for k in range(len(sheet_names)):
        txt_name = xl_file.split(".")[0] + str(sheet_names[k])+".txt"
        df = pd.read_excel("Example.xlsx", sheet_name = sheet_names[k])
        with open(txt_name, 'w', encoding="utf-8") as outfile:
            df.to_string(outfile, index=False)
col0 = [0]
df0 = pd.read_excel("Example.xlsx", usecols=col0)
"El": "{:<}".format(df0)'''
I am writing a small program to concatenate a load of measurements from multiple CSV files into one Excel file. I have pretty much all of the program written and working; the only thing I'm struggling with is getting the data from the CSV files to be stored as numbers when the DataFrame is written to the Excel file.
The code I have looks like this:
from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd
import os
import csv
import glob
os.chdir(r"directoryname")
retval = os.getcwd()
print ("Directory changed to %s" % retval)
files = glob.glob(r"directoryname\datafiles*csv")
print(files)
files.sort(key=lambda x: os.path.getmtime(x))
writer = pd.ExcelWriter('test.xlsx')
df = pd.read_csv("datafile.csv", index_col=False)
df = df.iloc[0:41, 1]
df.to_excel(writer, 'sheetname', startrow =0, startcol=1, index=False)
i = 0  # column offset counter; must be initialised before the loop
for f in files:
    i+=1
    df = pd.read_csv(f, index_col=False)
    df = df.iloc[0:41,2]
    df.to_excel(writer, 'sheetname', startrow=0, startcol=1+i, index=False)
Thanks in advance
Do you mean:
df.loc[:,'measurements'] = df.loc[:,'measurements'].astype(float)
After reading the DataFrame, you can cast each column this way.
Another option is to cast the columns while reading the CSV, using the dtype argument (see the documentation).
EXAMPLE
df = pd.read_csv(os.path.join(savepath, 'test.csv'), sep=";", dtype={
    'ID': 'Int64', 'STATUS': 'object'}, encoding='utf-8')
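To make the two options concrete, here is a small self-contained sketch. The CSV content and the column name `measurement` are invented for illustration; `pd.to_numeric` with `errors="coerce"` is an extra option for columns that contain stray non-numeric text and is not part of the answer above.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for one of the measurement files.
csv_text = "ID;STATUS;measurement\n1;ok;0.5\n2;bad;1.25\n"

# Option 1: declare the column types while reading.
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    dtype={"ID": "Int64", "STATUS": "object", "measurement": "float64"},
)

# Option 2: convert after reading; unparseable entries become NaN
# instead of raising an error.
raw = pd.Series(["1.5", "2", "n/a"])
clean = pd.to_numeric(raw, errors="coerce")
```

Columns that already hold a numeric dtype when `to_excel` is called are written as numbers rather than text.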
I am trying to convert a large number of Excel documents to CSV using Python, and the sheet I am converting from each document can be called "Pivot", "PVT", "pivot", or "pvt". The way I am doing so right now seems to work, but I was wondering if there is a quicker way, as this takes a long time to go through my Excel files. Is there a way to accomplish the same thing in a single pd.read_excel line, using an OR operator to specify multiple variations of the sheet name?
for f in glob.glob("../Test/Drawsheet*.xlsx"):
    try:
        data_xlsx = pd.read_excel(f, 'PVT', index_col=None)
    except:
        try:
            data_xlsx = pd.read_excel(f, 'pvt', index_col=None)
        except:
            try:
                data_xlsx = pd.read_excel(f, 'pivot', index_col=None)
            except:
                try:
                    data_xlsx = pd.read_excel(f, 'Pivot', index_col=None)
                except:
                    continue
    data_xlsx.to_csv('csvfile' + str(counter) + '.csv', encoding='utf-8')
    counter += 1
Your problem isn't so much about finding the correct special syntax for pd.read_excel but rather about knowing which sheet to read from. Pandas has an ExcelFile class that encapsulates some basic info about an Excel file. The class has a sheet_names property that tells you which sheets are in the file. (Unfortunately, documentation on this class is a bit hard to find, so I can't give you a link.)
valid_sheet_names = ['PVT', 'pvt', 'pivot', 'Pivot']
for f in glob.iglob('../Test/Drawsheet*.xlsx'):
    file = pd.ExcelFile(f)
    sheet_name = None
    for name in file.sheet_names:
        if name in valid_sheet_names:
            sheet_name = name
            break
    if sheet_name is None:
        continue
    data_xlsx = pd.read_excel(f, sheet_name, index_col=None)
    ...
However, this is not strictly equivalent to your code, as it does not do two things:
Cascade to the next sheet name if read_excel fails to load the chosen sheet into a data frame
Apply a priority ranking to the sheet names (PVT first, then pvt, then pivot, etc.)
I'll leave it to you to handle these two points as your program requires.
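For what it's worth, one possible way to cover both points is sketched below. The helper names `rank_sheets` and `read_first_loadable` are made up for this illustration, and the priority order simply mirrors the nesting in the question.

```python
PREFERRED = ("PVT", "pvt", "pivot", "Pivot")  # highest priority first


def rank_sheets(sheet_names, preferred=PREFERRED):
    """Return the sheet names that are present, in priority order."""
    available = set(sheet_names)
    return [name for name in preferred if name in available]


def read_first_loadable(excel_file, candidates):
    """Try each candidate sheet in order and return (name, frame).

    excel_file is expected to behave like pd.ExcelFile, i.e. offer
    parse(sheet_name, ...). Returns (None, None) if no sheet loads.
    """
    for name in candidates:
        try:
            return name, excel_file.parse(name, index_col=None)
        except Exception:
            # This sheet could not be loaded; fall through to the next one.
            continue
    return None, None
```

Inside the loop over files this would be used roughly as `xl = pd.ExcelFile(f)` followed by `name, data_xlsx = read_first_loadable(xl, rank_sheets(xl.sheet_names))`, skipping the file when `name` is None. Since the workbook is opened once per file, no read is attempted against sheet names that do not exist.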
In a pandas DataFrame I have some "cells" with values and some that need to contain Excel formulas. I have read that I can create formulas with
link = 'HYPERLINK("#Groups!A' + str(someInt) + '"; "LINKTEXT")'
xlwt.Formula(link)
and store them in the DataFrame.
When I try to save my DataFrame as an xlsx file with
writer = pd.ExcelWriter("pandas" + str(fileCounter) + ".xlsx", engine = "xlsxwriter")
df.to_excel(writer, sheet_name = "Paths", index = False)
# insert more sheets here
writer.save()
I get the error:
TypeError: Unsupported type <class 'xlwt.ExcelFormula.Formula'> in write()
So I tried writing my formula as a string to my DataFrame, but Excel wants to restore the file content and then fills all formula cells with 0's.
Edit: I managed to get it working with regular strings, but I would nevertheless be interested in a solution for xlwt formulas.
So my question is: how do I save DataFrames with formulas to xlsx files?
Since you are using xlsxwriter, strings are parsed as formulas by default ("strings_to_formulas: Enable the worksheet.write() method to convert strings to formulas. The default is True"), so you can simply specify formulas as strings in your dataframe.
Example of a formula column which references other columns in your dataframe:
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
writer = pd.ExcelWriter("foo.xlsx", engine="xlsxwriter")
df["product"] = None
# Relative R1C1 references, so the formula does not depend on column letters
df["product"] = (
    '=INDIRECT("R[0]C[%s]", 0)+INDIRECT("R[0]C[%s]", 0)'
    % (
        df.columns.get_loc("col1") - df.columns.get_loc("product"),
        df.columns.get_loc("col2") - df.columns.get_loc("product"),
    )
)
df.to_excel(writer, index=False)
writer.close()  # writer.save() in older pandas versions
Produces the following output:
After writing the df using table.to_excel(writer, sheet_name=...), I use write_formula() as in this example (edited to add the full loop). To write all the formulas in your dataframe, read each formula in your dataframe.
# replace the right side below with reading the formula from your dataframe
# e.g., formula_to_write = df.loc[...]
rows = table.shape[0]
for row_num in range(1 + startrow, rows + startrow + 1):
    formula_to_write = '=I{} * (1 - AM{})'.format(row_num+1, row_num+1)
    worksheet.write_formula(row_num, col, formula_to_write)
Later in the code (I seem to recall one of these might be redundant, but I haven't looked it up):
writer.save()
workbook.close()
Documentation is here.
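For reference, a self-contained sketch of the same idea. The data, the sheet name, the `net` column, and the function name `write_net_price_sheet` are all invented for illustration, and it assumes a pandas version where `ExcelWriter` works as a context manager and exposes `writer.sheets`, with XlsxWriter installed.

```python
import pandas as pd


def write_net_price_sheet(path):
    # Hypothetical data: prices land in column A, discounts in column B.
    table = pd.DataFrame({"price": [10.0, 20.0], "discount": [0.1, 0.25]})
    startrow = 0
    formula_col = table.shape[1]  # first free column to the right (C)

    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        table.to_excel(writer, sheet_name="Sheet1", startrow=startrow, index=False)
        worksheet = writer.sheets["Sheet1"]
        worksheet.write(startrow, formula_col, "net")  # header for the formula column
        # Data rows start one row below the header; Excel rows are 1-based.
        for row_num in range(1 + startrow, table.shape[0] + startrow + 1):
            excel_row = row_num + 1
            worksheet.write_formula(
                row_num, formula_col, f"=A{excel_row}*(1-B{excel_row})"
            )
    # Leaving the with-block saves and closes the workbook.
    return path
```

With the context manager, no separate save or close call is needed.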
Save the file as usual; just remember to write the formula as a string.
You can also use f-strings to build the formula from variables.
writer = pd.ExcelWriter(FILE_PATH ,mode='a', if_sheet_exists='overlay')
col_Q_index = 3
best_formula = f'=max(L1,N98,Q{col_Q_index})'
formula_df = pd.DataFrame([[best_formula]])
formula_df.to_excel(writer, sheet_name=SHEET_NAME, startrow=i, startcol=17, index=False, header=False)
writer.save()
I know that you can specify data types when reading excels using pd.read_excel (as outlined here). Can you do the same using pd.ExcelFile?
I have the following code:
if ".xls" in name:
    xl = pd.ExcelFile(path + "\\" + name, )
    for sheet in xl.sheet_names:
        xl_parsed = xl.parse(sheet)
When parsing the sheet, some of the values in the columns are displayed in scientific notation. I don't know the column names before loading so I need to import everything as string. Ideally I would like to be able to do something like xl_parsed = xl.parse(sheet, dtype = str). Any suggestions?
If you would prefer a cleaner solution, I used the following:
excel = pd.ExcelFile(path)
for sheet in excel.sheet_names:
    columns = excel.parse(sheet).columns
    converters = {column: str for column in columns}
    data = excel.parse(sheet, converters=converters)
I went with roganjosh's suggestion - open the excel first, get column names and then pass as converter.
if ".xls" in name:
    xl = pd.ExcelFile(path)
    sheetCounter = 1
    for sheet in xl.sheet_names:
        ### Force to read as string ###
        column_list = []
        df_column = pd.read_excel(path, sheetCounter - 1).columns
        for i in df_column:
            column_list.append(i)
        converter = {col: str for col in column_list}
        ##################
        xl_parsed = xl.parse(sheet, converters=converter)
        sheetCounter = sheetCounter + 1
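Both versions above parse every sheet twice in full. A possible refinement, sketched here under the assumption that the installed pandas supports `nrows` in `ExcelFile.parse` and that a suitable Excel engine (e.g. openpyxl) is available, is to read only the header row on the first pass. The function name `read_all_sheets_as_text` is made up for illustration.

```python
import pandas as pd


def read_all_sheets_as_text(path):
    """Return {sheet name: DataFrame}, with every column converted via str."""
    frames = {}
    with pd.ExcelFile(path) as excel:
        for sheet in excel.sheet_names:
            # First pass: header only, just to learn the column names.
            header = excel.parse(sheet, nrows=0).columns
            converters = {column: str for column in header}
            # Second pass: full read, applying str to every cell.
            frames[sheet] = excel.parse(sheet, converters=converters)
    return frames
```

`pd.read_excel` also accepts `dtype=str`; I believe `ExcelFile.parse` forwards that keyword as well, but treat that as an assumption to check against your pandas version.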