I created a DataFrame and saved it with df.to_excel('test.xlsx', index=False). I can see that my Python code generated the Excel file in a local directory, but the problem is that I can't open it with Excel.
I also tried adding the parameter engine='xlsxwriter', i.e.
df.to_excel('test.xlsx', index=False, engine='xlsxwriter'), but that didn't work either.
import pandas as pd
import numpy as np
df = pd.read_csv('123.tsv', sep='\t')
df['M'] = df['M'].astype(str)
m = df.M.str.split(',', expand=True).values.ravel()
df = df.dropna()
df = df[~df.M.str.contains("#")]
df = df.drop_duplicates()
df.to_excel('123.xlsx', index=False, engine='xlsxwriter')
Expected outcome: I just want to open 123.xlsx in Excel.
Actual result:
Excel cannot open the file '123.xlsx' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file. (Mac Excel 2016)
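Excel shows this message when the bytes on disk are not a valid .xlsx package. An .xlsx file is a ZIP archive that must contain a `[Content_Types].xml` part, so a quick standard-library check like the one below can tell whether the script produced a real workbook or something else (for example an empty or truncated file). This is a diagnostic sketch; the function name `looks_like_xlsx` is hypothetical.

```python
import zipfile
from pathlib import Path


def looks_like_xlsx(path):
    """Return True if `path` is a ZIP package containing the part every .xlsx needs."""
    path = Path(path)
    if not path.is_file() or path.stat().st_size == 0:
        return False  # missing or empty file
    if not zipfile.is_zipfile(path):
        return False  # not a ZIP archive, so it cannot be an Office Open XML workbook
    with zipfile.ZipFile(path) as zf:
        return "[Content_Types].xml" in zf.namelist()
```

For example, `looks_like_xlsx('123.xlsx')` returning False would point at the writing step rather than at Excel.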
I'm responding some time later, but this may help someone.
You can try using an ExcelWriter, making sure to call .close(), which is what actually saves the file. As the documentation says: "The writer should be used as a context manager. Otherwise, call close() to save and close any opened file handles."
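import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]})  # example data
writer = pd.ExcelWriter('your_filename.xlsx')
df.to_excel(writer, sheet_name='your_sheet_name')
writer.close()  # close() writes the file; writer.save() was removed in pandas 2.0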
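The same idea written in the context-manager form that the quoted documentation recommends: leaving the `with` block closes the writer, so the workbook is complete on disk afterwards. A minimal sketch; the file and sheet names are placeholders, and it assumes openpyxl is installed (engine='openpyxl' is set explicitly only to make the example predictable).

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})  # example data

# Exiting the with-block closes the writer, which writes the workbook to disk.
with pd.ExcelWriter("your_filename.xlsx", engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="your_sheet_name", index=False)

# Read it back to confirm the file is a valid workbook.
round_trip = pd.read_excel("your_filename.xlsx", sheet_name="your_sheet_name")
print(round_trip)
```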
This is a late reply, but I'm having this problem only when saving to OneDrive (e.g. c:\users\me\onedrive\folder\whatever.xlsx). If I save to a non-OneDrive location (e.g. c:\work\whatever.xlsx), it works. This is the code I'm using:
import pandas as pd
# sheet1 and sheet2 are the DataFrames to save
with pd.ExcelWriter("c:\\work\\whatever.xlsx") as writer:
    sheet1.to_excel(writer, sheet_name="sheet1")
    sheet2.to_excel(writer, sheet_name="sheet2")
A final writer.close() is not needed: exiting the with block already closes the writer and saves the file.
It should work as expected. I tried your program with the following sample input (tab-separated):
M L
foo 123
bar 456
baz 789
and was able to open the output file 123.xlsx.
Can you try this simple input and see if it works?
It was an Excel issue, which went away after I updated Excel. Please close this question, Stack Overflow team. Thank you.
I'm new to Python and trying to write into a merged cell in Excel. I can see the data that is already stored in this cell/row, so I know it's there. However, when I try to overwrite it, nothing happens.
I have also tried changing the index and header settings, but nothing seems to work.
import pandas as pd
from openpyxl import load_workbook
file = 'file here'  # placeholder: path to the workbook
# Read the Excel file into a pandas DataFrame
df = pd.read_excel(file, sheet_name='Sheet1')
print(df.iloc[8, 2])
# Make the changes to the DataFrame
df.iloc[8, 2] = "Bob Smith"
# Load the workbook
book = load_workbook(file)
writer = pd.ExcelWriter(file, engine='openpyxl')
writer.book = book  # older pandas only; assigning writer.book was removed in pandas 2.0
# Write the DataFrame to the first sheet
df.to_excel(writer, index=False)
# Save the changes to the Excel file
writer.save()  # older pandas only; use writer.close() in pandas 2.0+
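import pandas as pd
file = "C:/Users/OneDrive/Bureau/draftExcel.xlsx"
df = pd.read_excel(file, sheet_name='sheet1')
df.iat[5, 0] = 'cell is updated'  # row position 5, column position 0
print(df)  # first check in the terminal that the cell content was updated
# mode='a' with if_sheet_exists='replace' (pandas >= 1.3) rewrites only 'sheet1' and keeps
# the workbook's other sheets; the default mode 'w' would recreate the whole file
with pd.ExcelWriter(file, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
    df.to_excel(writer, sheet_name='sheet1', index=False)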
Since you didn't show your code, I made an example based on your explanation; I hope it helps.
Instead of .iloc, I used .iat, which gets or sets a single value in your DataFrame by integer position (row index and column index rather than labels).
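To illustrate the difference: `.iat` and `.iloc` both address cells by integer position, while `.at` and `.loc` use labels; `.iat` and `.at` are limited to a single scalar, which is exactly the case of updating one cell. A small sketch with made-up data:

```python
import pandas as pd

# Made-up data; the index labels are deliberately not 0..n-1.
df = pd.DataFrame({"name": ["Ann", "Ben", "Cy"], "age": [31, 25, 40]}, index=[10, 20, 30])

df.iat[1, 0] = "Bob"      # row position 1, column position 0
df.at[30, "age"] = 41     # row label 30, column label "age"
print(df.iloc[1, 0])      # positional read
print(df.loc[30, "age"])  # label-based read
```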
Remember that the Excel file must be closed while you edit it with Python; if it is open, you will get an error.
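For the merged cell in the question specifically, it may be simpler to skip pandas and edit the cell with openpyxl, which leaves the rest of the sheet (other values, merges, formatting) in place. In a merged range only the top-left cell holds the value; the other cells are read-only `MergedCell` objects. The sketch below builds a throwaway workbook so it runs on its own; the file name and the range `C10:E10` are hypothetical stand-ins for the real file (with a header row, `df.iloc[8, 2]` corresponds to Excel cell C10).

```python
from openpyxl import Workbook, load_workbook

path = "merged_demo.xlsx"  # hypothetical file name

# Build a small workbook with a merged range so the example is self-contained.
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws["C10"] = "old value"
ws.merge_cells("C10:E10")
wb.save(path)

# Edit: write to the top-left cell of the merged range, then save.
wb = load_workbook(path)
ws = wb["Sheet1"]
ws["C10"] = "Bob Smith"  # D10 and E10 are MergedCell objects and cannot be assigned
wb.save(path)

print(load_workbook(path)["Sheet1"]["C10"].value)
```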
import pandas as pd
import openpyxl
df = pd.read_excel('file.xlsx', sheet_name='Лист1')  # 'Лист1' is "Sheet1" in a Russian-language Excel
with pd.ExcelWriter('file.xlsx', engine="openpyxl", mode="a") as writer:
    df.to_excel(writer, index=False, sheet_name='processing')
This code reads one sheet of an Excel file and copies all the data to an adjacent sheet, but when I inspect the sheets afterwards, some of the information is lost in both the original sheet and the new one. Each time the code runs, a different amount of information is overwritten.
How is this possible? If the code clearly says to read the first sheet, what makes it change? The same question applies to writing the new sheet.
The screenshot below shows three different categories with symptoms:
Versions: pandas 1.4.4, openpyxl 3.0.10.
For example, it works well in PyCharm but not in Jupyter Notebook.
My suspicion is that the code somehow mishandles the characters in the text (emoji) and the text with tags. Originally there were many columns, but only one is problematic: the one in the example. But I am an amateur and am only guessing. I've run out of ideas.
Original file
Please help. Best regards.
My friend suggested this code, but it doesn't work either.
from html import unescape
from unicodedata import normalize
import pandas as pd
df = pd.read_excel(
    'file.xlsx',
    engine='openpyxl',
    converters={
        # Unescape HTML entities and normalize Unicode; leave non-strings (e.g. empty cells) as they are
        'Описание - Description': lambda x: normalize('NFKC', unescape(x)) if isinstance(x, str) else x
    }
)
with pd.ExcelWriter('file.xlsx', engine="openpyxl", mode="a") as writer:
    df.to_excel(writer, index=False, sheet_name='processing')
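One thing that may be relevant: in append mode, pandas has openpyxl load the entire workbook and save it again, and the openpyxl documentation notes that it does not read every item an Excel file can contain (images and charts, for example, are lost when such a file is opened and saved under the same name). A way to rule that out, sketched below on the assumption that a separate output file is acceptable, is to leave the original untouched and write the original sheet plus a 'processing' copy to a new file. Both file names are hypothetical, and the first step only creates sample input (with an emoji and tags) so the snippet runs on its own; it is not claimed to be the cause of the data loss.

```python
import pandas as pd

src = "sample_input.xlsx"    # hypothetical input; the source file is never re-saved below
dst = "file_processed.xlsx"  # hypothetical output file

# Sample input so the sketch is self-contained; the text includes tags and an emoji.
pd.DataFrame(
    {"Описание - Description": ["<b>Hello</b> 👋", "plain text", None]}
).to_excel(src, sheet_name="Лист1", index=False, engine="openpyxl")

# Read every sheet as a {sheet name: DataFrame} dict without modifying the source.
sheets = pd.read_excel(src, sheet_name=None, engine="openpyxl")
sheets["processing"] = sheets["Лист1"].copy()

# Write all sheets to a brand-new workbook.
with pd.ExcelWriter(dst, engine="openpyxl") as writer:
    for name, frame in sheets.items():
        frame.to_excel(writer, sheet_name=name, index=False)
```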
I have a 140 MB Excel file that I need to analyze with pandas. The problem is that reading it as .xlsx takes Python 5 minutes. When I manually saved the file as CSV, Python opened and read it in about a second! There are various solutions from 2012-2014, but they don't really work for me with Python 3.
Can somebody suggest a fast way to convert 'C:\master_file.xlsx' to 'C:\master_file.csv'?
There is a project called "rows" that aims to be very Pythonic about dealing with tabular data. It relies on openpyxl for xlsx, though. I don't know whether it will be faster than pandas, but anyway:
$ pip install rows openpyxl
And:
import rows
data = rows.import_from_xlsx("my_file.xlsx")
rows.export_to_csv(data, open("my_file.csv", "wb"))
I faced the same problem as you. Pandas and openpyxl didn't work for me.
I came across this solution, and it worked great for me:
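import os
import win32com.client
your_file_path = os.path.abspath("your_file.xlsx")  # placeholder: the workbook to convert
csv_path = os.path.abspath("new_file.csv")  # absolute path; a bare name goes to Excel's default folder
xl = win32com.client.Dispatch("Excel.Application")
xl.DisplayAlerts = False  # don't show prompts such as "file already exists"
wb = xl.Workbooks.Open(Filename=your_file_path, ReadOnly=1)
wb.SaveAs(Filename=csv_path, FileFormat=6)  # 6 = xlCSV
wb.Close(False)  # close without saving changes to the workbook
xl.Application.Quit()
wb = None
xl = None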
This converts the file to CSV using Excel itself, so it requires Windows with Excel installed, and it saves only the active sheet. None of the other approaches I tried worked.
Use read-only mode in openpyxl. Something like the following should work.
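import csv
from openpyxl import load_workbook
# read_only=True streams rows instead of loading the whole workbook into memory
wb = load_workbook("myfile.xlsx", read_only=True)
ws = wb['sheetname']
# In Python 3 the csv module needs a text-mode file opened with newline=''
with open("myfile.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for row in ws:
        values = (cell.value for cell in row)
        writer.writerow(values)
wb.close()  # read-only workbooks keep the file open until closed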
The fastest way that comes to mind:
pandas.read_excel
pandas.DataFrame.to_csv
As an added benefit, you can clean up the data before saving it to CSV.
import pandas as pd
df = pd.read_excel(r'C:\master_file.xlsx', header=0)  # add sheet_name='<your sheet>' to pick a sheet
df.to_csv(r'C:\master_file.csv', index=False, quotechar="'")
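If the workbook has several sheets, `pd.read_excel(..., sheet_name=None)` returns a dict of all of them, so each one can be written to its own CSV. A sketch; the sample workbook and the output naming scheme are made up for illustration, and the .xlsx still has to be parsed once.

```python
from pathlib import Path

import pandas as pd

# Sample workbook with two sheets so the sketch runs on its own.
src = Path("master_file_sample.xlsx")  # hypothetical file name
with pd.ExcelWriter(src, engine="openpyxl") as writer:
    pd.DataFrame({"id": [1, 2], "value": [3.5, 4.25]}).to_excel(writer, sheet_name="data", index=False)
    pd.DataFrame({"note": ["a", "b"]}).to_excel(writer, sheet_name="notes", index=False)

# sheet_name=None loads every sheet into a {sheet name: DataFrame} dict.
sheets = pd.read_excel(src, sheet_name=None, engine="openpyxl")
for name, frame in sheets.items():
    out = src.with_name(f"{src.stem}_{name}.csv")  # e.g. master_file_sample_data.csv
    frame.to_csv(out, index=False)
    print("wrote", out)
```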
At some point, processing lots of data simply takes lots of time; that's a fact of life. It's still worth looking for options if it's a problem, though.
I get this error when I try to save my DataFrame to an Excel file named pandas_simple.xlsx.
Below is my error:
This is my code:
import pandas as pd
df = pd.DataFrame({'Car': [101, 20, 350, 20, 15, 320, 454]})
writer = pd.ExcelWriter('pandas_simple.xlsx')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()
writer.close()
Can anyone share some ideas?
This error can also occur if the same file (pandas_simple.xlsx in this case) is already open in Excel. In that case, Python does not have permission to overwrite the file. Closing the Excel file and re-running the script should resolve the issue.
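In that situation the failure shows up as a PermissionError when the file is opened for writing (on Windows, Excel locks the workbooks it has open). A small sketch that turns it into a clearer message; the helper name `save_excel` is hypothetical, and engine='openpyxl' assumes openpyxl is installed.

```python
import pandas as pd


def save_excel(df, path, **kwargs):
    """Write df to path, explaining the usual cause of a PermissionError."""
    try:
        df.to_excel(path, **kwargs)
    except PermissionError as exc:
        raise PermissionError(
            f"Cannot write {path!r}: close it in Excel (or pick a folder you can "
            f"write to) and try again."
        ) from exc


df = pd.DataFrame({"Car": [101, 20, 350, 20, 15, 320, 454]})
save_excel(df, "pandas_simple.xlsx", sheet_name="Sheet1", engine="openpyxl")
```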
You are trying to write to a folder that requires administrator rights. Change:
writer = pd.ExcelWriter("pandas_simple.xlsx")
to:
writer = pd.ExcelWriter("C:\\...\\pandas_simple.xlsx")
using the full path to a folder you can write to, and you should not have a problem.
The documentation of pandas.DataFrame.to_excel says that the first argument can be a string representing the file path. In your case, I would drop all the lines with writer and just try:
df.to_excel('pandas_simple.xlsx')
That should write pandas_simple.xlsx to your current working directory. If that doesn't work, try providing the full path (e.g. C:\\Users\\John\\pandas_simple.xlsx). Also make sure that you are not writing to a directory that needs administrator rights.
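To see exactly where a relative name like 'pandas_simple.xlsx' will end up, you can resolve it against the current working directory before writing. A sketch using pathlib; the folder is whatever `Path.cwd()` happens to be when the script runs, and engine='openpyxl' assumes openpyxl is installed.

```python
from pathlib import Path

import pandas as pd

target = Path("pandas_simple.xlsx").resolve()  # absolute path inside the current working directory
print("current working directory:", Path.cwd())
print("writing to:", target)

df = pd.DataFrame({"Car": [101, 20, 350, 20, 15, 320, 454]})
df.to_excel(target, engine="openpyxl")  # to_excel also accepts a path-like object
```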
What if the path is correct?
Try closing the .xlsx file in Excel and running the code again; that worked for me, and it should work for you too.
I am attaching my code snippet for reference:
import pandas as pd
file='C:/Users/Aladahalli/Desktop/Book1.xlsx'
xls = pd.ExcelFile(file)
df = pd.read_excel(xls, sheet_name='Sheet1')
# Create a column named Final that stores the concatenated columns
df["Final"] = df["Name"] + " " + df["Rank/Designation"] + " " + df["PS"]
print(df.head())
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('C:/Users/Aladahalli/Desktop/Final.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Close the Pandas Excel writer, which writes the Excel file to disk
# (writer.save() is not needed and was removed in pandas 2.0)
writer.close()
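One caveat with the `+` concatenation above: it only works if all three columns hold strings, so a numeric "PS" column (an assumption about the data) would raise a TypeError. Converting to str first avoids that. A sketch with made-up rows:

```python
import pandas as pd

# Made-up rows; "PS" is numeric here to show why converting to str helps.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi"],
    "Rank/Designation": ["SI", "ASI"],
    "PS": [101, 202],
})

cols = ["Name", "Rank/Designation", "PS"]
# astype(str) makes every value a string; note that missing values become "nan".
df["Final"] = df[cols].astype(str).agg(" ".join, axis=1)
print(df["Final"].tolist())
```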
Make sure the file you are trying to write to isn't open.
Is there a way to have pandas read only the values from Excel and not the formulas? It reads formula cells as NaN unless I open the Excel file and save it manually before running the code. I am just using the basic pandas read_excel function:
import pandas as pd
df = pd.read_excel(filename, sheet_name="Sheet1")  # the old sheetname= keyword was removed from pandas
This reads the values if I have opened and saved the file in Excel before running the code. But if my code updates a sheet and I then run this again without first saving the file in Excel, the formula cells come back as NaN instead of their values. Does anyone know a workaround so that pandas reads just the values?
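A likely explanation, assuming the sheet is updated with openpyxl: an .xlsx stores each formula together with the last value Excel calculated for it, and pandas reads that cached value. openpyxl does not calculate formulas, so files it writes contain the formula but no cached result until Excel opens and saves them. The sketch below reproduces this with a throwaway file (the file name is hypothetical).

```python
import pandas as pd
from openpyxl import Workbook, load_workbook

path = "formula_demo.xlsx"  # hypothetical file name

wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws.append(["a", "b", "total"])
ws.append([1, 2, "=A2+B2"])  # openpyxl stores the formula but computes no result
wb.save(path)

# pandas reads cached results only, so the never-calculated formula comes back as NaN.
df = pd.read_excel(path, sheet_name="Sheet1", engine="openpyxl")
print(df)

# openpyxl shows both views: the formula text, and the (missing) cached value.
print(load_workbook(path)["Sheet1"]["C2"].value)                  # =A2+B2
print(load_workbook(path, data_only=True)["Sheet1"]["C2"].value)  # None
```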
That is strange. Normally pandas reads values, not formulas. The problem is likely in your Excel files: probably your formulas point to other files, or they return a value that pandas sees as NaN.
In the first case, the sheet needs to be updated and there is nothing pandas can do about that (but read on).
In the second case, you could solve it by setting explicit NaN values in read_excel:
pd.read_excel(path, sheet_name="Sheet1", na_values=[...])  # replace [...] with your NA identifiers
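For that second case, `na_values` lists extra strings that should be read as missing. A sketch, assuming purely for illustration that the sheet uses '-' and 'n.a.' as placeholders (the file name is hypothetical):

```python
import pandas as pd

path = "na_demo.xlsx"  # hypothetical file name
pd.DataFrame({"price": [10, "-", 12, "n.a."]}).to_excel(
    path, sheet_name="Sheet1", index=False, engine="openpyxl"
)

# Strings listed in na_values are read as NaN, so the column becomes numeric.
df = pd.read_excel(path, sheet_name="Sheet1", na_values=["-", "n.a."], engine="openpyxl")
print(df["price"].tolist())
```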
As for the first case, as a workaround that makes your work easier, you can automate what you are doing by hand with xlwings:
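import pandas as pd
import xlwings as xl
def df_from_excel(path):
    # Open the workbook in a hidden Excel instance and save it, so Excel
    # recalculates and stores the formula results, then read it with pandas.
    app = xl.App(visible=False)
    book = app.books.open(path)
    book.save()
    app.kill()
    return pd.read_excel(path)
df = df_from_excel("path/to/your_file.xlsx")  # placeholder path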
If you want to keep the original file with its formulas untouched, save the workbook to a different location instead (book.save(different_location)). You can then delete the temporary files with shutil.
I had this problem and resolved it by moving a chart below the first row I was reading. It looks like the position of charts can cause problems.
You can use xlrd to read the values.
First, refresh the workbook in Excel so that the values you updated with Python are recalculated. You can use the function below:
import os
import win32com.client
import xlrd
file = "myxl.xls"
def refresh_file(file):
    # Open the workbook in Excel, refresh all data connections, wait for
    # asynchronous queries to finish, save, and quit Excel.
    xlapp = win32com.client.DispatchEx("Excel.Application")
    path = os.path.abspath(file)
    wb = xlapp.Workbooks.Open(path)
    wb.RefreshAll()
    xlapp.CalculateUntilAsyncQueriesDone()
    wb.Save()
    xlapp.Quit()
After the refresh, you can start reading the content.
refresh_file(file)
workbook = xlrd.open_workbook(file)
worksheet = workbook.sheet_by_index(0)
for rowid in range(worksheet.nrows):
    row = worksheet.row(rowid)
    for colid, cell in enumerate(row):
        print(cell.value)
You can loop through the data however you need and apply conditions while reading it, which gives you a lot more flexibility.
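Note that xlrd 2.0 and later only read the legacy .xls format, so for .xlsx files the same kind of loop can be written with openpyxl instead (a swapped-in library, not the one the answer above uses). `data_only=True` returns the values Excel last calculated rather than formula strings. The sample file and the condition are made up for illustration.

```python
from openpyxl import Workbook, load_workbook

path = "loop_demo.xlsx"  # hypothetical file name
wb = Workbook()
wb.active.append(["item", "qty"])
for row in [["apple", 3], ["pear", 0], ["plum", 7]]:
    wb.active.append(row)
wb.save(path)

# data_only=True yields cached values instead of formula strings.
ws = load_workbook(path, data_only=True).worksheets[0]
in_stock = []
for item, qty in ws.iter_rows(min_row=2, values_only=True):  # skip the header row
    if qty and qty > 0:  # example condition applied while reading
        in_stock.append(item)
print(in_stock)
```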