I am trying to read the computed value of each cell when the cells reference one another in a chain. An example may help to illustrate.
Example:
A      B               C
=B2    ='I am' & C2    'Peter
Example 2 - in terms of numbers:
A      B         C     D
=B2    =C2*D2    12    56
So I want to get the concatenated string 'I am Peter', or 672 (from 12*56), when I read cell A2.
Code I tried:
from openpyxl import load_workbook
import pandas as pd
wb = load_workbook(filename = 'new.xlsx')
sheet_names = wb.get_sheet_names()
name = sheet_names[0]
sheet_ranges = wb[name]
df = pd.DataFrame(sheet_ranges.values)
print(df)
The formula cells come out as 'NaN'.
Any suggestions on how to achieve this? Thanks!
If you want the actual values of the cells, you have to pass data_only=True to load_workbook:
wb = load_workbook(filename = 'new.xlsx', data_only=True)
See also: "Read Excel cell value and not the formula computing it - openpyxl".
Since you are using pandas anyway, it is easier to read the file directly:
import pandas as pd
df = pd.read_excel('new.xlsx')
print(df)
This reads the first sheet by default (a different one can be selected with sheet_name) and returns the cell values rather than the formulas.
openpyxl supports either the formula or the value of the formula. You can select which using the data_only parameter when loading a workbook.
You can change your code like below:
from openpyxl import load_workbook
import pandas as pd
wb = load_workbook(filename='new.xlsx', data_only=True)
sheet_names = wb.sheetnames  # get_sheet_names() is deprecated in newer openpyxl versions
name = sheet_names[0]
sheet_ranges = wb[name]
df = pd.DataFrame(sheet_ranges.values)
print(df)
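One caveat with data_only=True is worth spelling out: openpyxl never evaluates formulas itself. It only returns the result that Excel (or another spreadsheet application) cached in the file the last time it was saved. The sketch below builds a small workbook in memory using the numbers from Example 2 (the exact cell layout is an assumption for illustration) and shows that a file written by openpyxl alone has no cached result, so the formula cell reads back as None.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a tiny workbook in memory; the layout is an assumption for illustration.
wb = Workbook()
ws = wb.active
ws["C2"] = 12
ws["D2"] = 56
ws["B2"] = "=C2*D2"
ws["A2"] = "=B2"

buffer = BytesIO()
wb.save(buffer)

# Default load: formula cells come back as the formula text.
buffer.seek(0)
formula_view = load_workbook(buffer).active
print(formula_view["A2"].value)  # =B2

# data_only=True: cached results are returned. This file was never opened
# in Excel, so no result was cached and the value is None.
buffer.seek(0)
value_view = load_workbook(buffer, data_only=True).active
print(value_view["A2"].value)  # None
```

If data_only=True gives None (or NaN after conversion to a DataFrame), opening the file in Excel and saving it again should populate the cached values.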
Related
I want to read a specific cell, H6, from an Excel sheet. I tried it like this:
import pandas as pd
excel_file = './docs/fruit.xlsx'
df = pd.read_excel(excel_file,'Overzicht')
sheet = df.active
x1 = sheet['H6'].value
print(x1)
But then I get this error:
File "C:\Python310\lib\site-packages\pandas\core\generic.py", line 5575, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'active'
So my question is: how do I read a specific cell from a sheet in an Excel file?
Thank you
OK, I tried with openpyxl:
import openpyxl
path = "./docs/fruit.xlsx"
wb_obj = openpyxl.load_workbook(path)
sheet_obj = wb_obj.active
cell_obj = sheet_obj.cell(row = 6, column = 9)
print(cell_obj.value)
But then the formula is printed. Like this:
=(H6*1000)/F6/G6
and not the value: 93
You can do this with openpyxl directly or with pandas (which uses openpyxl behind the scenes for .xlsx files).
Using Openpyxl
You will need to pass data_only=True when you open the file. Also, make sure you know the row and column numbers. To read the data in H6, the row is 6 and the column is 8 (H is the eighth column).
import openpyxl
path = "./docs/Schoolfruit.xlsx"
wb_obj = openpyxl.load_workbook(path, data_only=True)
sheet_obj = wb_obj.active ## Or use sheet_obj = wb_obj['Sheet1'] if you know sheet name
val = sheet_obj.cell(row = 6, column = 8).value
print(val)
Using Pandas
The other option is to use pandas read_excel(), which reads the whole sheet into a dataframe. You can then use iloc[] or at[] to read the specific cell. Note that this is probably the less efficient solution if you need to read just one cell.
Another point to note: once the data is in a dataframe, row 1 of the sheet is treated as the header and row 2 becomes index 0, so row 6 is at index 4 instead of 6. Similarly, the first column is now 0 and not 1, so column H is at index 7, which changes the position to [4,7].
import pandas as pd
path = "./docs/Schoolfruit.xlsx"
df = pd.read_excel(path, 'Sheet1')
print(df.iloc[4,7])
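The index arithmetic described above can be sketched as a small helper. The function name cell_ref_to_iloc and its header_rows parameter are hypothetical, written only for illustration; they are not part of pandas or openpyxl.

```python
import re


def cell_ref_to_iloc(cell_ref, header_rows=1):
    """Convert an Excel reference such as 'H6' to a zero-based (row, col) pair.

    header_rows is the number of sheet rows that do not end up in the data,
    for example 1 when read_excel() uses the first row as the header.
    """
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", cell_ref)
    if match is None:
        raise ValueError(f"not a cell reference: {cell_ref!r}")
    letters, row_text = match.groups()

    # Column letters are base 26 with A=1, so 'H' is 8 and 'AF' is 32.
    col_number = 0
    for letter in letters.upper():
        col_number = col_number * 26 + (ord(letter) - ord("A") + 1)

    # Excel rows start at 1 and the header rows are not part of the data.
    row_pos = int(row_text) - 1 - header_rows
    if row_pos < 0:
        raise ValueError(f"{cell_ref!r} lies inside the header rows")
    return row_pos, col_number - 1


print(cell_ref_to_iloc("H6"))                 # (4, 7)
print(cell_ref_to_iloc("H6", header_rows=0))  # (5, 7)
```

With the dataframe from the answer, df.iloc[cell_ref_to_iloc("H6")] would then address the same cell as df.iloc[4, 7].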
I found a solution and hope it works for you.
import pandas as pd
excel_file = './docs/Schoolfruit.xlsx'
df = pd.read_excel(excel_file, sheet_name='active' ,header=None, skiprows=1)
print(df[7][4])
7: column H (columns are numbered from 0)
4: row 6 (the first row is skipped and the index starts at 0)
I have a list of values and want to write it into a column of an Excel file. The cells I want to change are in sheet 6.
My code so far is below. The best I could do was to first set AF6:AF22 to a fixed value of 5, hoping I could then change it to use a list.
Is there a simple way to set the values of AF6:AF22 from a list?
Something as simple as ws['AF6:AF22'] = l?
from openpyxl import Workbook
import pandas as pd
from openpyxl import load_workbook
l = list(range(5))
FilePath = 'excel_file.xlsx'
wb = load_workbook(FilePath)
ws = wb.worksheets
sheet_number = 6
for sheet_number in ws.iter_cols('AF6:AF22'):
    for cell in sheet_number:
        cell.value = 5
Option 1
Hi - I am adding a faster way here. This is probably better as it avoids the for loop and updating cells one at a time.
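import pandas as pd
l = list(range(17))  # The list - you can replace l with whatever you need
# mode='a' appends to the existing file; if_sheet_exists='overlay' (pandas 1.4+) writes into the existing sheet instead of raising an error
with pd.ExcelWriter('excel_file.xlsx', mode='a', engine='openpyxl', if_sheet_exists='overlay') as writer:
    # startrow and startcol are zero-based: startrow=5, startcol=31 is cell AF6
    pd.DataFrame(l).to_excel(writer, sheet_name='Sheet6', startrow=5, startcol=31, index=False, header=False)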
Option 2
You can use the below code to do what you need. Added comments, so you get an understanding of my logic...
from openpyxl import Workbook
import pandas as pd
from openpyxl import load_workbook
l = list(range(17)) #The list - You can replace l with whatever you need
FilePath = 'excel_file.xlsx'
wb = load_workbook(FilePath)
ws = wb.worksheets[5] #Worksheet 5 is the 6th sheet as numbering starts from zero
for i in range(6, 23):  # Row numbers 6 through 22
    ws.cell(row=i, column=32).value = l[i-6]  # Write to cell in AF = column 32
wb.save("excel_file.xlsx")
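Note that AF6:AF22 is 17 cells, while the list in the question, list(range(5)), has only 5 items, so indexing with l[i-6] over the full range would raise an IndexError. A minimal sketch of one way to avoid the mismatch is to derive the target cells from the list itself. The helper name plan_column_writes is hypothetical and not part of openpyxl.

```python
def plan_column_writes(column_letter, first_row, values):
    """Pair each value with its target cell address, going down one column."""
    return [
        (f"{column_letter}{first_row + offset}", value)
        for offset, value in enumerate(values)
    ]


values = list(range(17))
plan = plan_column_writes("AF", 6, values)
print(plan[0], plan[-1])  # ('AF6', 0) ('AF22', 16)

# With an openpyxl worksheet `ws` loaded as in Option 2, the plan could be
# applied like this before saving the workbook:
# for address, value in plan:
#     ws[address] = value
```

Because the addresses are generated from the list, a shorter list simply fills fewer cells instead of failing partway through.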
I was wondering if I can return all rows of an Excel sheet in which a given column contains a specific string, and then export them to a CSV file.
I was considering openpyxl but am not getting very far.
If my Excel looks like that:
Sample
I would e.g. search for ID2 and return all rows
ID2,1,ping
ID2,2,pong
from openpyxl import Workbook
import openpyxl
file = "test.xlsx"
wb = openpyxl.load_workbook(file, read_only=True)
ws = wb.active
for row in ws.iter_rows("A"):
    for cell in row:
        if cell.value == "ID2":
            print(ws.cell(row=cell.row, column=1,2,3).value)
Can anyone help me?
Try using pandas: pd.read_excel() to load the sheet and DataFrame.to_csv() to write the result, for example:
import pandas as pd
df = pd.read_excel('/file/path/excel.xlsx')
df_filtered = df[df['id_column'] == 'ID1'] # returns df with only rows where 'id_column' is 'ID1'
df_filtered.to_csv('/file/path/output.csv', index=False) # index=False leaves out the row index column
This exports a CSV containing only the rows where your 'id_column' is equal to 'ID1'.
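For comparison, the same filter-then-export idea can be sketched without pandas, using only the standard library csv module. The sample rows are made up for illustration (only the two ID2 rows come from the question), and filter_rows is a hypothetical helper. With openpyxl, the rows could be supplied by ws.iter_rows(values_only=True), which yields one tuple of cell values per row.

```python
import csv
import io


def filter_rows(rows, key, column=0):
    """Yield the rows whose value at position `column` equals `key`."""
    for row in rows:
        if row[column] == key:
            yield row


# Made-up sample rows; only the two ID2 rows appear in the question.
rows = [
    ("ID1", 1, "foo"),
    ("ID2", 1, "ping"),
    ("ID2", 2, "pong"),
    ("ID3", 1, "bar"),
]

# StringIO stands in for open("output.csv", "w", newline="").
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(filter_rows(rows, "ID2"))
print(out.getvalue())
# ID2,1,ping
# ID2,2,pong
```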
I am working on a project to filter all rows in an Excel file that contain yesterday's date and append the filtered rows, together with the column names, to a new workbook. I have searched and tried many things but have not managed to get this working. Here is the input file:
Court #        Received Date    column b    column c
502419/2020    01/30/2020       xxx         xxx
Here is the code that I tried:
import xlrd
sheet_data = []
wb = xlrd.open_workbook('path to input file')
sh = wb.sheet_by_index(0)
i = 0
for i in range (sh.nrows):
    if i != sh.row_values.str.find('02/25/2020'):
        i += 1
    else:
        sheet_data.append(i)
The error produced is: 'function' object has no attribute 'str'. I changed it to use the contains function, which produces the same result.
Any help is much appreciated. Thanks guys!
I spent way too long coming up with a solution. Google's results were pretty poor and all gave the same result. What you needed was openpyxl.
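import pandas as pd
from datetime import datetime, timedelta
# requires openpyxl (for .xlsx) or xlrd (for .xls)
file = 'FILENAME.xlsx'
df = pd.read_excel(file, index_col=None)
yest = datetime.today() - timedelta(days=1)
yest = yest.strftime('%m/%d/%Y')  # four-digit year, matching dates such as 01/30/2020
# .str.contains assumes the column holds text; na=False treats empty cells as non-matches
df2: pd.DataFrame = df[df['COLUMN A'].str.contains(yest, na=False)]
df2.to_excel('filtered.xlsx', index=False)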
Using just openpyxl you can try this:
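import openpyxl
from datetime import datetime, timedelta
wb1 = openpyxl.load_workbook('Book1.xlsx')
wb2 = openpyxl.workbook.Workbook(write_only=True)
sh1 = wb1[wb1.sheetnames[0]]
sh2 = wb2.create_sheet('sheet1')
# Copy the header row as plain values rather than Cell objects
sh2.append([cell.value for cell in next(sh1.iter_rows(min_row=1, max_row=1))])
# Assumes the dates are stored as text such as 01/30/2020 in the first column
yest = (datetime.today() - timedelta(days=1)).strftime('%m/%d/%Y')
# Start at row 2 so the header is not tested or copied a second time
for row in sh1.iter_rows(min_row=2):
    # str() guards against empty (None) cells
    if yest in str(row[0].value):
        sh2.append([cell.value for cell in row])
wb2.save("filtered.xlsx")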
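A note on the date format used for the comparison: the strftime pattern decides whether the year is written with two or four digits, and the sample data (01/30/2020) uses four. The sketch below uses a fixed date so the output is reproducible; the helper name yesterday_text is hypothetical.

```python
from datetime import date, timedelta


def yesterday_text(today, fmt="%m/%d/%Y"):
    """Format the day before `today`; %Y is the four-digit year."""
    return (today - timedelta(days=1)).strftime(fmt)


fixed_today = date(2020, 2, 26)  # fixed date so the output is reproducible
print(yesterday_text(fixed_today))              # 02/25/2020
print(yesterday_text(fixed_today, "%m/%d/%y"))  # 02/25/20

# A substring test with the two-digit form also matches other years
# that start with the same two digits:
print("02/25/20" in "02/25/2021")  # True
```

In real use, date.today() would be passed instead of the fixed date. If the column holds real Excel dates rather than text, the cells are read as datetime values and a string comparison would not match them.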
I am trying to split only the merged cells in an Excel file (with multiple sheets) that look like this:
Please note that there are partially/fully empty rows. These rows are not merged.
Using openpyxl, I found the merged cell ranges in each sheet with this code:
from openpyxl import load_workbook
wb2 = load_workbook('Example.xlsx')
sheets = wb2.sheetnames ##['Sheet1', 'Sheet2']
for i,sheet in enumerate(sheets):
    ws = wb2[sheets[i]]
    print(ws.merged_cell_ranges)
The print output:
['B3:B9', 'B13:B14', 'A3:A9', 'A13:A14', 'B20:B22', 'A20:A22']
['B5:B9', 'A12:A14', 'B12:B14', 'A17:A18', 'B17:B18', 'A27:A28', 'B27:B28', 'A20:A22', 'B20:B22', 'A3:A4', 'B3:B4', 'A5:A9']
Since I found the merged cell ranges, I need to split the ranges and fill in the corresponding rows like this:
How can I split like this using openpyxl? I am new to using this module. Any feedback is greatly appreciated!
You need to use the unmerge function. Example:
ws.unmerge_cells(start_row=2,start_column=1,end_row=2,end_column=4)
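When you call unmerge_cells, sheet.merged_cells.ranges is modified, so don't iterate over sheet.merged_cells.ranges directly in the for loop. Copy the range coordinates into a list first, then unmerge each range and fill its cells with the value of the top-left cell: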
from openpyxl.workbook import Workbook
from openpyxl import load_workbook
from openpyxl.utils.cell import range_boundaries
wb = load_workbook(filename = 'tmp.xlsx')
for st_name in wb.sheetnames:
    st = wb[st_name]
    mcr_coord_list = [mcr.coord for mcr in st.merged_cells.ranges]
    for mcr in mcr_coord_list:
        min_col, min_row, max_col, max_row = range_boundaries(mcr)
        top_left_cell_value = st.cell(row=min_row, column=min_col).value
        st.unmerge_cells(mcr)
        for row in st.iter_rows(min_col=min_col, min_row=min_row, max_col=max_col, max_row=max_row):
            for cell in row:
                cell.value = top_left_cell_value
wb.save('merged_tmp.xlsx')
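To try the approach above without an input file, the same logic can be wrapped in a function and run against a workbook built in memory. This is only a sketch: the function name unmerge_and_fill and the sample values are made up for illustration.

```python
from openpyxl import Workbook
from openpyxl.utils.cell import range_boundaries


def unmerge_and_fill(ws):
    """Unmerge every merged range and copy its top-left value into each cell."""
    # Copy the coordinates first: unmerge_cells changes ws.merged_cells.ranges.
    for coord in [rng.coord for rng in ws.merged_cells.ranges]:
        min_col, min_row, max_col, max_row = range_boundaries(coord)
        top_left_value = ws.cell(row=min_row, column=min_col).value
        ws.unmerge_cells(coord)
        for row in ws.iter_rows(min_col=min_col, min_row=min_row,
                                max_col=max_col, max_row=max_row):
            for cell in row:
                cell.value = top_left_value


# Made-up sample: one merged block of three cells in column A.
wb = Workbook()
ws = wb.active
ws["A1"] = "group 1"
ws.merge_cells("A1:A3")
ws["B1"] = "not merged"

unmerge_and_fill(ws)
column_a = [ws.cell(row=r, column=1).value for r in range(1, 4)]
print(column_a)  # ['group 1', 'group 1', 'group 1']
```

Cells outside the merged ranges, such as B1 here, are left untouched, which matches the requirement that unmerged empty rows stay as they are.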