Pandas partially removes contents of cells in random order - python

import pandas as pd
import openpyxl
df = pd.read_excel('file.xlsx', sheet_name='Лист1')
with pd.ExcelWriter('file.xlsx', engine="openpyxl", mode="a") as writer:
    df.to_excel(writer, index=False, sheet_name='processing')
This code reads one sheet of an Excel file and copies all the data to an adjacent sheet. But when I inspect the sheets afterwards, some of the information is missing both in the original sheet and in the new sheet. Each time the code is executed, a different amount of information is lost.
But how is this possible? If the code clearly only reads the first sheet, what makes it change? The same question applies to writing the new sheet.
The screenshot below shows three different categories with symptoms:
[screenshot]
Versions: pandas 1.4.4, openpyxl 3.0.10.
For example, it works well in PyCharm, but it doesn't work in Jupyter Notebook.
My suspicion is that the code somehow reacts incorrectly to characters in the text (emoji) and to the tags in it. The initial version had many columns, but only one is problematic: the one in the example. But I am an amateur, and these are only my assumptions. I've run out of ideas.
Original file
Help me please. Best Regards.
My friend suggested this code, but it doesn't work either.
from html import unescape
from unicodedata import normalize
import pandas as pd
df = pd.read_excel(
    'file.xlsx',
    engine='openpyxl',
    converters={
        'Описание - Description': lambda x: normalize('NFKC', unescape(x))
    }
)
with pd.ExcelWriter('file.xlsx', engine="openpyxl", mode="a") as writer:
    df.to_excel(writer, index=False, sheet_name='processing')
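One detail that may explain why the original sheet changes at all: with mode="a" and the openpyxl engine, pandas loads the whole workbook and saves it again, so every sheet is rewritten, not only the new one. Below is a minimal sketch of one way to sidestep that, assuming it is acceptable to put the copy in a second file. The helper name copy_sheet_to_new_file, the file names, and the sample data are made up for illustration, and the sketch does not diagnose the data loss itself.

```python
import os
import tempfile

import pandas as pd


def copy_sheet_to_new_file(src_path, dst_path, src_sheet, dst_sheet="processing"):
    """Read one sheet and write it to a different workbook (hypothetical helper)."""
    df = pd.read_excel(src_path, sheet_name=src_sheet, engine="openpyxl")
    # mode="w" creates dst_path from scratch; src_path is only ever read.
    with pd.ExcelWriter(dst_path, engine="openpyxl", mode="w") as writer:
        df.to_excel(writer, index=False, sheet_name=dst_sheet)
    return df


# Demo with made-up data, including an emoji and tag-like text.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source.xlsx")
    dst = os.path.join(tmp, "copy.xlsx")
    sample = pd.DataFrame({"Description": ["plain text", "<b>bold</b> 😀"]})
    sample.to_excel(src, index=False, sheet_name="Sheet1")

    copy_sheet_to_new_file(src, dst, src_sheet="Sheet1")
    round_trip = pd.read_excel(dst, sheet_name="processing")
```

If the copy comes out complete this way while the in-place append does not, that would point at the rewrite of the source workbook rather than at reading it.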

Related

Trouble writing to Excel

I am new to Python and trying to write into a merged cell within Excel. I can see the data that is already stored within this cell/row, so I know it's there. However, when I try to overwrite it, nothing happens.
I have tried messing with the index and header as well but nothing seems to work.
import pandas as pd
from openpyxl import load_workbook
# Read the excel file into a pandas DataFrame
df = pd.read_excel('file here', sheet_name='Sheet1')
print(df.iloc[8, 2])
# Make the changes to the DataFrame
df.iloc[8, 2] = "Bob Smith"
# Load the workbook
book = load_workbook('file here')
writer = pd.ExcelWriter('file here', engine='openpyxl')
writer.book = book
# Write the DataFrame to the first sheet
df.to_excel(writer, index=False)
# Save the changes to the Excel file
writer.save()
import pandas as pd
from openpyxl import load_workbook
file = "C:/Users/OneDrive/Bureau/draftExcel.xlsx"
df = pd.read_excel(file, sheet_name='sheet1')
df.iat[5, 0] = 'cell is updated'
print(df)  # to check first in the terminal if the content of the cell is updated
book = load_workbook(file)
writer = pd.ExcelWriter(file, engine='openpyxl')
df.to_excel(writer, sheet_name='sheet1', index=False)
writer.close()
I put together an example from what you explained because you didn't show your code, so I hope it is helpful.
Instead of .iloc I used .iat, so you can update a specific cell in your DataFrame using the column index instead of the column label.
Remember that the Excel file you are working on must be closed in Excel while you edit it with Python; if it is open, you will get an error.
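Since the question is about a merged cell, another option is to skip the DataFrame round trip and set the cell with openpyxl directly. In openpyxl the value of a merged range lives in its top-left cell. The sketch below assumes the target is C10 (row 8, column 2 of the DataFrame when row 1 of the sheet holds the header) and that the merged range is C10:E10. Both are guesses for illustration, as is the temporary file.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "merged.xlsx")

    # Build a small stand-in workbook with a merged range (made-up layout).
    wb = Workbook()
    ws = wb.active
    ws["C10"] = "Old Name"
    ws.merge_cells("C10:E10")
    wb.save(path)

    # Reopen it and overwrite the value through the top-left cell of the range.
    wb = load_workbook(path)
    ws = wb.active
    ws["C10"] = "Bob Smith"
    wb.save(path)

    # Read back to confirm the value changed and the merge is still there.
    check = load_workbook(path).active
    updated_value = check["C10"].value
    merged_ranges = [r.coord for r in check.merged_cells.ranges]
```

Writing through openpyxl touches only that cell, so the rest of the sheet layout is not regenerated from a DataFrame.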

Python: Write a dataframe to an already existing excel which contains a sheet with images

I have been working on this for too long now. I have an Excel file with one sheet (sheetname = 'abc') with images in it, and I want a Python script that writes a dataframe to a second, separate sheet (sheetname = 'def') in the same Excel file. Can anybody provide me with some example code? Every time I try to write the dataframe, the first sheet with the images gets emptied.
This is what I tried:
import numpy as np
import pandas as pd
from openpyxl import load_workbook
book = load_workbook('filename_of_file_with_pictures_in_it.xlsx')
writer = pd.ExcelWriter('filename_of_file_with_pictures_in_it.xlsx', engine = 'openpyxl')
writer.book = book
x1 = np.random.randn(100, 2)
df = pd.DataFrame(x1)
df.to_excel(writer, sheet_name = 'def')
writer.save()
book.close()
It saves the random numbers in the sheet with the name 'def', but the first sheet 'abc' now becomes empty.
What goes wrong here? Hopefully somebody can help me with this.
Interesting question! With openpyxl you can easily add values and keep the formulas, but you cannot retain the graphs. Even with the latest version (2.5.4), graphs do not stay. So I decided to address the issue with xlwings:
import numpy as np
import xlwings as xw
wb = xw.Book(r"filename_of_file_with_pictures_in_it.xlsx")
sht = wb.sheets.add('SheetMod')
sht.range('A1').value = np.random.randn(100, 2)
wb.save(r"path_new_file.xlsx")
With this snippet I managed to insert the random set of values and save a new copy of the modified xlsx. When you run the command, the Excel file opens automatically and shows the new sheet, without changing the existing ones (graphs and formulas included). Make sure you install all the dependencies xlwings needs to run on your system. Hope this helps!
You'll need to use an Excel 'reader' like openpyxl or similar in combination with pandas for this. pandas' to_excel function is write-only, so it does not care what is inside the file when you open it.
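For comparison, here is a minimal sketch of adding a sheet through pandas' append mode, which loads the existing workbook with openpyxl before writing. It assumes pandas 1.3 or later (for the if_sheet_exists parameter) and uses a made-up stand-in workbook without images. Whether images on the existing sheet survive the openpyxl round trip is not something this sketch checks.

```python
import os
import tempfile

import numpy as np
import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "book.xlsx")

    # Stand-in for the existing workbook: one sheet 'abc' with some cell values.
    pd.DataFrame({"kept": [1, 2, 3]}).to_excel(path, sheet_name="abc", index=False)

    # mode="a" opens the existing file instead of starting an empty workbook.
    df = pd.DataFrame(np.random.randn(100, 2))
    with pd.ExcelWriter(
        path, engine="openpyxl", mode="a", if_sheet_exists="replace"
    ) as writer:
        df.to_excel(writer, sheet_name="def")

    # Read every sheet back to see what the file contains now.
    sheets = pd.read_excel(path, sheet_name=None)
    sheet_names = list(sheets)
    kept_values = sheets["abc"]["kept"].tolist()
    new_shape = sheets["def"].shape
```

The cell values on 'abc' are still present afterwards, which is the difference from opening the writer in the default write mode.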

saving a dataframe to csv file (python)

I am trying to restructure the way my precipitation data is organized in an Excel file. To do this, I've written the following code:
import pandas as pd
df = pd.read_excel('El Jem_Souassi.xlsx', sheet_name=None, header=None)
data = df["El Jem"]
T = []
for column in range(1, 56):
    liste = data[column].tolist()
    for row in range(1, len(liste)):
        liste[row] = str(liste[row])
        if liste[row] != 'nan':
            T.append(liste[row])
result = pd.DataFrame(T)
result
This code works fine and through Jupyter I can see that the result is good
screenshot
However, I am facing a problem when attempting to save this dataframe to a csv file.
result.to_csv("output.csv")
The resulting file contains the vertical index column and it seems I am unable to call for a specific cell.
(Hopefully, someone can help me with this problem)
Many thanks !!
It's all in the docs.
You are interested in skipping the index column, so do:
result.to_csv("output.csv", index=False)
If you also want to skip the header add:
result.to_csv("output.csv", index=False, header=False)
I don't know what your input data looks like (it is a good idea to make it available in your question). But note that currently you can obtain the same results just by doing:
import pandas as pd
df = pd.DataFrame([0]*16)
df.to_csv('results.csv', index=False, header=False)
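To see the effect of the two options side by side, here is a small sketch that writes to a string instead of a file. The three sample values are made up and stand in for the list T from the question.

```python
import pandas as pd

# Made-up stand-in for the DataFrame built from the list T.
result = pd.DataFrame(["1.5", "2.0", "0.3"])

# Default: the index is written as the first column, plus a header row.
with_index = result.to_csv()

# index=False and header=False leave only the values.
values_only = result.to_csv(index=False, header=False)

# A single cell can still be read from the DataFrame by position.
second_value = result.iat[1, 0]

print(with_index)
print(values_only)
```

Calling to_csv() without a path returns the CSV text, which is convenient for checking the layout before writing the real file.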

How to write to an Excel sheet without exporting a dataframe first?

I am trying to write some text to a specific sheet in an Excel file. I export a number of pandas dataframes to the other tabs, but in this one I need only some text - basically some comments explaining how the other tabs were calculated.
I have tried this but it doesn't work:
import pandas as pd
writer=pd.ExcelWriter('myfile.xlsx')
writer.sheets['mytab'].write(1,1,'This is a test')
writer.close()
I have tried adding writer.book.add_worksheet('mytab') and
ws=writer.sheets['mytab']
ws.write(1,1,'This is a test')
but in all cases I am getting: KeyError: 'mytab'.
The only solution I have found is to write an empty dataframe to the tab before writing my text to the same tab:
emptydf=pd.DataFrame()
emptydf['x']=[None]
emptydf.to_excel(writer, sheet_name='mytab', header=False, index=False)
I could of course create a workbook instance, as in the example on the documentation of xlsxwriter: http://xlsxwriter.readthedocs.io/worksheet.html
However, my problem is that I already have a pd.ExcelWriter instance, which is used in the rest of my code to create the other excel sheets.
I even tried passing a workbook instance to to_excel(), but it doesn't work:
workbook = xlsxwriter.Workbook('filename.xlsx')
emptydf.to_excel(workbook,'mytab',header=False, index=False)
Is there any alternative to my solution of exporting an empty dataframe - which seems as unpythonic as it can get?
You mentioned that you tried the add_worksheet() method of the writer.book object. For me it works and does what you want. Below is a reproducible example that ran successfully.
import pandas as pd
print(pd.__version__)
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
workbook = writer.book
ws = workbook.add_worksheet('mytab')
ws.write(1,1,'This is a test')
writer.close()
Thought I'd also mention that I'm using pandas 0.18.1.
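The same idea can be sketched with the openpyxl engine, where writer.book is an openpyxl Workbook and the method is create_sheet() rather than add_worksheet(). This is a swapped-in engine, not the xlsxwriter setup from the answer above; the sheet name 'data', the sample frame, and the temporary path are made up for illustration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile.xlsx")

    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        # A regular DataFrame export, as in the rest of the asker's code.
        pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="data", index=False)

        # Add a text-only sheet through the same writer, no DataFrame needed.
        notes = writer.book.create_sheet("mytab")
        # openpyxl is 1-indexed: row=2, column=2 is cell B2,
        # the same cell as xlsxwriter's zero-indexed write(1, 1, ...).
        notes.cell(row=2, column=2, value="This is a test")

    # Read the file back to confirm both sheets were saved.
    wb = load_workbook(path)
    sheet_names = wb.sheetnames
    note_text = wb["mytab"]["B2"].value
```

Either way, the single ExcelWriter instance stays in charge of saving, so the other sheets written with to_excel() end up in the same file.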

pandas read excel values not formulas

Is there a way to have pandas read in only the values from Excel and not the formulas? It reads the formulas in as NaN unless I go in and manually save the Excel file before running the code. I am just working with the basic read_excel function of pandas:
import pandas as pd
df = pd.read_excel(filename, sheet_name="Sheet1")
This will read the values if I have gone in and saved the file prior to running the code. But after running the code to update a new sheet, if I don't go in and save the file and then try to run this again, it will read the formulas as NaN instead of just the values. Is there a workaround that anyone knows of that will just read values from Excel with pandas?
That is strange. The normal behaviour of pandas is to read values, not formulas. Likely, the problem is in your Excel files. Probably your formulas point to other files, or they return a value that pandas sees as NaN.
In the first case, the sheet needs to be updated and there is nothing pandas can do about that (but read on).
In the second case, you could solve it by setting explicit NaN values in read_excel:
pd.read_excel(path, sheet_name="Sheet1", na_values = [your na identifiers])
As for the first case, and as a workaround solution to make your work easier, you can automate what you are doing by hand using xlwings:
import pandas as pd
import xlwings as xl
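def df_from_excel(path):
    # Open the file in a hidden Excel instance and save it,
    # so that the formula results are stored in the file.
    app = xl.App(visible=False)
    book = app.books.open(path)
    book.save()
    app.kill()
    return pd.read_excel(path)
df = df_from_excel("path to your file")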
If you want to keep those formulas in your excel file just save the file in a different location (book.save(different location)). Then you can get rid of the temporary files with shutil.
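Another possible cause, offered as an assumption about the asker's setup: if the file was last written by openpyxl (for example through pandas), formula cells carry no stored result until Excel opens and saves the file, and pandas reads the stored results. The sketch below reproduces that with a made-up two-cell sheet and a temporary file.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "formulas.xlsx")

    # Write a formula with openpyxl; openpyxl never evaluates it.
    wb = Workbook()
    ws = wb.active
    ws.title = "Sheet1"
    ws.append(["x", "total"])
    ws.append([2, "=A2*10"])
    wb.save(path)

    # Default load: the formula text itself.
    formula_text = load_workbook(path)["Sheet1"]["B2"].value
    # data_only=True: the result stored by Excel, which does not exist yet.
    cached_value = load_workbook(path, data_only=True)["Sheet1"]["B2"].value

    # pandas reads the stored results, so the column comes back empty.
    df = pd.read_excel(path, sheet_name="Sheet1")
    total_is_nan = bool(df["total"].isna().all())
```

That would match the symptom in the question: NaN right after the script writes the file, real values once the file has been opened and saved in Excel.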
I had this problem and resolved it by moving a graph below the first row I was reading. It looks like the position of graphs may cause problems.
You can use xlrd to read the values.
First you should refresh your Excel sheet if you are also updating the values automatically with Python. You can use the function below:
file = "myxl.xls"
import xlrd
import win32com.client
import os
def refresh_file(file):
    # Open the workbook in Excel, recalculate, save, and quit
    xlapp = win32com.client.DispatchEx("Excel.Application")
    path = os.path.abspath(file)
    wb = xlapp.Workbooks.Open(path)
    wb.RefreshAll()
    xlapp.CalculateUntilAsyncQueriesDone()
    wb.Save()
    xlapp.Quit()
After the file refresh, you can start reading the content.
workbook = xlrd.open_workbook(file)
worksheet = workbook.sheet_by_index(0)
for rowid in range(worksheet.nrows):
    row = worksheet.row(rowid)
    for colid, cell in enumerate(row):
        print(cell.value)
You can loop through the data however you need and add conditions while reading it, which gives you a lot more flexibility.
