I'm wondering how to update a single cell in an Excel spreadsheet with pandas in a Python script. I don't want any of the other cells in the file to be overwritten, just the one cell I'm trying to update. I tried using .at[], .iat[], and .loc[], but my Excel spreadsheet does not update. Deprecated methods like .set_value() don't work either. What am I doing wrong?
import pandas as pd
tp = pd.read_excel("testbook.xlsx", sheet_name = "Sheet1")
tp.at[1, 'A'] = 10
I might suggest using xlwings for this operation, as it might be easier than reading and writing a sheet in pandas dataframes. The example below changes the value of "A1".
import xlwings as xw
sheet = xw.Book("testbook.xlsx").sheets("Sheet1")
sheet.range("A1").value = "hello world"
Also note that xlwings ships with the Anaconda distribution, if you're using that. API reference: https://docs.xlwings.org/en/stable/api.html
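For context, the snippet in the question only changes the in-memory DataFrame; nothing is written back to testbook.xlsx. If Excel itself is not available for xlwings, a rough alternative is openpyxl, sketched below. The helper name set_cell and the throwaway demo workbook are made up for illustration. One caveat: openpyxl does not read every item in a file, so things like shapes can be lost when an existing workbook is loaded and saved under the same name.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def set_cell(path, sheet_name, coordinate, value):
    """Overwrite a single cell and save the workbook in place (hypothetical helper)."""
    wb = load_workbook(path)            # formulas are kept as formulas by default
    wb[sheet_name][coordinate] = value  # only this cell is touched
    wb.save(path)


# Demo on a throwaway workbook so the sketch runs on its own
demo_path = os.path.join(tempfile.mkdtemp(), "testbook.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws["A1"] = "old"
ws["B1"] = "untouched"
wb.save(demo_path)

set_cell(demo_path, "Sheet1", "A1", 10)

check = load_workbook(demo_path)["Sheet1"]
print(check["A1"].value, check["B1"].value)
```

Try it on a copy of the real file first, since anything openpyxl cannot represent will not survive the save.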
Related
I'd like to read the autofilter rules from an excel sheet in python.
Suppose this kind of input:
original input
Then I filter one column with Excel's autofilter, for example:
filtered input
Is there a way to retrieve the applied autofilter rule in python?
Currently, the only option I know of is to set the autofilter via xlwings:
import xlwings as xw
# Open the workbook
workbook = xw.Book(r"C:\Users\Desktop\Example.xlsx")
# Set Autofilter
workbook.sheets[0].api.Range("A1:D4").AutoFilter(4,"Yes")
But is there an "inverse" function that reads the applied rule back?
Another library such as pandas, openpyxl, or xlsxwriter would also be fine.
With xlwings you should be able to do whatever VBA can do, so it's usually the better choice for this type of query.
You should be able to read the applied filter from Criteria1, as shown below:
import xlwings as xw
# Open the workbook
workbook = xw.Book(r"C:\Users\Desktop\Example.xlsx")
# Set Autofilter
workbook.sheets[0].api.Range("A1:D4").AutoFilter(4,"Yes")
for count, item in enumerate(workbook.sheets[0].api.AutoFilter.Filters, 1):
    if item.On:
        print(f'{count}, {item.Criteria1}')
Output would be
4, =Yes
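Since the question also allows openpyxl, here is a minimal sketch of reading filter rules without starting Excel. It assumes the rule is stored in the file as a plain value filter; rules stored as custom filters (operators such as "greater than") sit under each filter column's customFilters attribute and are not handled here. The function name value_filters is invented for this example, and the demo sheet is built in memory rather than loaded from Example.xlsx.

```python
from openpyxl import Workbook


def value_filters(ws):
    """Map each filtered column offset (zero-based within the filter range)
    to the list of values selected for it."""
    rules = {}
    for fc in ws.auto_filter.filterColumn:
        if fc.filters is not None:  # plain "show these values" filter
            rules[fc.colId] = list(fc.filters.filter)
    return rules


# Demo: recreate the rule from the xlwings example on an in-memory sheet
wb = Workbook()
ws = wb.active
ws.auto_filter.ref = "A1:D4"
ws.auto_filter.add_filter_column(3, ["Yes"])  # offset 3 = fourth column of the range

print(ws.auto_filter.ref, value_filters(ws))
```

For a file on disk, the same function could be called on a sheet obtained through openpyxl.load_workbook. Note that the column offset is zero-based here, whereas the Field argument in the xlwings/VBA call is one-based.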
Suppose my data.xlsx's first sheet contains some computed columns.
I'm trying to pull out a pd.DataFrame of that sheet that holds the computed values.
But try as I may, I cannot achieve this.
Fails:
🔸
# > pip install openpyxl
import pandas as pd
df_nodal = pd.read_excel('data.xlsx', 'firstSheetName')
# NOTE: Adding `, engine='openpyxl'` makes no difference
df_nodal.head()
This gives NaN in all calculated fields.
🔸
xl = pd.ExcelFile(f'data.xlsx')
df = xl.parse('firstSheetName')
df.head()
Same.
🔸
how to read xlsx as pandas dataframe with formulas as strings
from openpyxl import load_workbook
wb = load_workbook(filename = f'data.xlsx')
ws = wb['mySheetName']
df = pd.DataFrame(ws.values)
df.head()
Now this is giving the formulae: =H2, =H3 etc. in the cells.
An attempt to 'type-convert' these columns failed:
df[12][2:].astype(float)
# ValueError: could not convert string to float: '=H3'
🔸 How to force pandas to evaluate formulas of xlsx and not read them as NaN? might offer a solution, which involves saving and reloading the .xlsx. However I can't get it working. That syntax appears invalid.
import pandas as pd, xlwings as xw
def df_from_excel(path):
    book = xw.Book(path)
    book.save()
    return pd.read_excel(path, header=0)
df = df_from_excel('nodal0.xlsx')
This gives XlwingsError: Make sure to have "appscript" and "psutil", dependencies of xlwings, installed.
And pip install appscript psutil says they're both already installed.
Note: Same idea here: Pandas read_excel with formulas and get values
🔸🔸🔸
I'm trying to find a way for it to render into a dataframe, which will then contain numeric values.
Is there any way to do it?
🔸🔸🔸
EDIT:
Here's what I'm dealing with:
The raw .xlsx is shown below. I've double-clicked a calculated cell revealing the underlying =H2.
Notice the corresponding cell of the dataframe (generated from this .xlsx) is showing NaN
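One avenue not tried above is openpyxl's data_only=True flag on load_workbook, which returns the value that was last cached in the file instead of the formula string. It can only return what is stored: if the file was last written by a tool that does not calculate formulas, there is no cached value to read. The sketch below shows both behaviours on a throwaway workbook written by openpyxl itself; the file name and cell addresses are illustrative only.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Throwaway workbook written by openpyxl: the formula has never been calculated
path = os.path.join(tempfile.mkdtemp(), "data.xlsx")
wb = Workbook()
ws = wb.active
ws["H2"] = 5
ws["M2"] = "=H2"
wb.save(path)

# Default load: formula cells come back as formula strings
as_formula = load_workbook(path).active["M2"].value

# data_only=True: formula cells come back as their cached value, if one exists
as_value = load_workbook(path, data_only=True).active["M2"].value

print(repr(as_formula), repr(as_value))
```

If the second value comes back empty for the real data.xlsx as well, the file probably has no cached results, and opening and saving it once in Excel (by hand or via xlwings) would be needed before any reader can see numbers.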
I am trying to get the values of the cells in row 11, columns B and C. See the screenshot for clarification.
I tried the following code using the xlrd package, but it does not print anything.
import xlrd
path = "C:/myfilepath/data.xlsx"
workbook = xlrd.open_workbook(path)
sheet = workbook.sheet_by_index(0)
sheet.cell_value(10,1)
sheet.cell_value(10,2)
I am not able to output the value of particular merged cells using the xlrd package in Python.
The above code should print the cell value, i.e. PCHGFT001KS.
I don't know how xlrd works, but I do know how the lovely openpyxl works. You should use openpyxl! It's a robust tool for working with .xlsx files (NOT .xls).
import openpyxl
excel = "C:/myfilepath/data.xlsx"  # path from the question
wb = openpyxl.load_workbook(excel)
ws = wb[wb.sheetnames[0]]  # first sheet
print(ws['B11'].value)
Extra:
If you want to unmerge those blocks you can do the following.
for merged_range in list(ws.merged_cells.ranges):  # iterate over a copy: unmerging changes the collection
    ws.unmerge_cells(merged_range.coord)
wb.save(excel)
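If the merged blocks should stay merged, note that openpyxl keeps the value only in the top-left cell of a merged range; the other cells of the range read as empty. A small lookup helper along these lines may help. The name merged_value is made up for this sketch, and the demo sheet is built in memory with the value from the question.

```python
from openpyxl import Workbook


def merged_value(ws, row, column):
    """Return the value shown at (row, column), following merged ranges
    back to their top-left cell (hypothetical helper)."""
    for rng in ws.merged_cells.ranges:
        if rng.min_row <= row <= rng.max_row and rng.min_col <= column <= rng.max_col:
            return ws.cell(row=rng.min_row, column=rng.min_col).value
    return ws.cell(row=row, column=column).value


# Demo: B11:C11 merged, value stored in B11
wb = Workbook()
ws = wb.active
ws["B11"] = "PCHGFT001KS"
ws.merge_cells("B11:C11")

print(ws["C11"].value, merged_value(ws, 11, 3))
```

The linear scan over merged ranges is fine for a few lookups; for many lookups on a heavily merged sheet, building a dictionary of covered coordinates once would be the more sensible design.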
Is there a way to have pandas read in only the values from Excel and not the formulas? It reads the formulas in as NaN unless I go in and manually save the Excel file before running the code. I am just working with the basic read_excel function of pandas:
import pandas as pd
df = pd.read_excel(filename, sheet_name="Sheet1")
This will read the values if I have gone in and saved the file prior to running the code. But after running the code to update a new sheet, if I don't go in and save the file and then try to run this again, it will read the formulas as NaN instead of just the values. Is there a workaround that anyone knows of that will just read values from Excel with pandas?
That is strange. The normal behaviour of pandas is to read values, not formulas. Likely, the problem is in your Excel files: either your formulas point to other files, or they return a value that pandas sees as NaN.
In the first case, the sheet needs to be updated and there is nothing pandas can do about that (but read on).
In the second case, you could solve it by setting explicit NaN values in read_excel:
pd.read_excel(path, sheet_name="Sheet1", na_values=[your na identifiers])
As for the first case, as a workaround to make your work easier, you can automate what you are doing by hand using xlwings:
import pandas as pd
import xlwings as xl
def df_from_excel(path):
    app = xl.App(visible=False)
    book = app.books.open(path)
    book.save()
    app.kill()
    return pd.read_excel(path)
df = df_from_excel("path/to/your/file.xlsx")  # placeholder path
If you want to keep those formulas in your Excel file, just save the file to a different location (book.save(different_location)) and read from that copy. You can then get rid of the temporary files with shutil.
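Some background on why re-saving helps: inside an .xlsx file, each formula cell stores the formula text and, optionally, the last computed result. Readers such as pandas pick up that cached result, so a formula cell whose cache was never filled (for example because the file was written by a library that does not calculate) comes back empty. The sketch below parses a hand-written fragment of sheet XML with the standard library to show the two cases; the fragment and the function name formula_and_cache are illustrative, not taken from a real file.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet parts of an .xlsx file
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hand-written fragment: M2 has a cached result (<v>), M3 does not
SHEET_XML = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="2">
      <c r="H2"><v>5</v></c>
      <c r="M2"><f>H2</f><v>5</v></c>
    </row>
    <row r="3">
      <c r="H3"><v>7</v></c>
      <c r="M3"><f>H3</f></c>
    </row>
  </sheetData>
</worksheet>"""


def formula_and_cache(sheet_xml):
    """Return {cell reference: (formula text or None, cached value or None)}."""
    root = ET.fromstring(sheet_xml)
    cells = {}
    for c in root.iterfind(".//m:c", NS):
        cells[c.get("r")] = (c.findtext("m:f", None, NS), c.findtext("m:v", None, NS))
    return cells


cells = formula_and_cache(SHEET_XML)
print(cells["M2"], cells["M3"])
```

Opening and saving the workbook in Excel, which is what the xlwings function above automates, fills in the missing cached values.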
I had this problem and resolved it by moving a graph below the first row I was reading. It looks like the position of graphs may cause problems.
You can use xlrd to read the values.
First, refresh the workbook if you are also updating its values automatically with Python. You can use the function below:
import os
import xlrd
import win32com.client
file = "myxl.xls"
def refresh_file(file):
    # Open the workbook in a separate Excel instance, recalculate, and save
    xlapp = win32com.client.DispatchEx("Excel.Application")
    path = os.path.abspath(file)
    wb = xlapp.Workbooks.Open(path)
    wb.RefreshAll()
    xlapp.CalculateUntilAsyncQueriesDone()
    wb.Save()
    xlapp.Quit()
refresh_file(file)
After the file refresh, you can start reading the content.
workbook = xlrd.open_workbook(file)
worksheet = workbook.sheet_by_index(0)
for rowid in range(worksheet.nrows):
    row = worksheet.row(rowid)
    for colid, cell in enumerate(row):
        print(cell.value)
You can loop through the data however you need and add conditions while reading it, which gives you a lot more flexibility.
I've got to read an .xlsx file every 10 minutes in Python.
What is the most efficient way to do this?
I've tried using xlrd, but it doesn't read .xlsx. According to the documentation it does, but I can't get it to work: I keep getting "Unsupported format, or corrupt file" exceptions.
What is the best way to read xlsx?
I need to read comments in cells too.
xlrd hasn't yet released a version that reads xlsx. Until then, Eric Gazoni built a package called openpyxl, which reads xlsx files and does limited writing of them.
Use openpyxl. Some basic examples:
import openpyxl
# Open Workbook
wb = openpyxl.load_workbook(filename='example.xlsx', data_only=True)
# Get All Sheets
a_sheet_names = wb.sheetnames
print(a_sheet_names)
# Get Sheet Object by names
o_sheet = wb["Sheet1"]
print(o_sheet)
# Get Cell Values
o_cell = o_sheet['A1']
print(o_cell.value)
o_cell = o_sheet.cell(row=2, column=1)
print(o_cell.value)
o_cell = o_sheet['H1']
print(o_cell.value)
# Sheet Maximum filled Rows and columns
print(o_sheet.max_row)
print(o_sheet.max_column)
There are multiple ways to read XLSX files using Python. Two are illustrated below. Both require openpyxl; if you want to parse into pandas directly, install pandas as well, e.g. pip install pandas openpyxl
Option 1: pandas direct
Primary use case: load just the data for further processing.
Using the read_excel() function in pandas would be your best choice. Note that pandas should fall back to openpyxl automatically, but in the event of format issues it's best to specify the engine directly.
import pandas as pd
df_pd = pd.read_excel("path/file_name.xlsx", engine="openpyxl")
Option 2: openpyxl direct
Primary use case: getting or editing specific Excel document elements such as comments (requested by OP), formatting properties or formulas.
To extract comments, call load_workbook() and then read the comment attribute of each cell:
from openpyxl import load_workbook
wb = load_workbook(filename = "path/file_name.xlsx")
ws = wb.active
ws["A1"].comment # <- loop through row & columns to extract all comments
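The loop hinted at in the last comment could look roughly like this. The function name collect_comments is invented for the sketch, and the demo builds a small sheet in memory instead of loading path/file_name.xlsx, so the comment text and author are placeholders.

```python
from openpyxl import Workbook
from openpyxl.comments import Comment


def collect_comments(ws):
    """Return {cell coordinate: comment text} for every commented cell."""
    found = {}
    for row in ws.iter_rows():
        for cell in row:
            if cell.comment is not None:
                found[cell.coordinate] = cell.comment.text
    return found


# Demo sheet with one commented cell
wb = Workbook()
ws = wb.active
ws["A1"] = "value"
ws["A1"].comment = Comment("check this figure", "reviewer")
ws["B2"] = "no comment here"

print(collect_comments(ws))
```

With a real file, ws would come from load_workbook(...) as in the snippet above; as far as I know, comments are not available when the workbook is opened in read-only mode, so the default mode is assumed here.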