Replacing row names in Excel using Python

I want to replace the names of the rows in my Excel sheet. Whatever the row names may be, I have to replace them with:
Street
City
State
Zip
I am able to read the row names. Can anybody help me replace the names I read? Here is my code. Thanks.
import xlrd
workbook = xlrd.open_workbook('Path to Excel File')
sheet = workbook.sheet_by_index(0)
print(sheet)
for value in sheet.row_values(0):
    print(value)

from openpyxl import load_workbook
wb = load_workbook('filename.xlsx')
ws = wb['sheetname']
ws.cell(row=1,column=1).value = 'Street'
ws.cell(row=1,column=2).value = 'City'
ws.cell(row=1,column=3).value = 'State'
ws.cell(row=1,column=4).value = 'Zip'
wb.save('filename.xlsx')
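The four assignments above can also be written as a loop, which helps when the list of names grows. This is a minimal sketch, assuming the names belong in row 1 starting at column A; the helper name set_header_row and the sample data are hypothetical, and the workbook is built in memory so the snippet runs without an existing file.

```python
from openpyxl import Workbook


def set_header_row(ws, names, row=1):
    """Overwrite the cells of `row` with `names`, starting at column A."""
    for col, name in enumerate(names, start=1):
        ws.cell(row=row, column=col, value=name)


# Hypothetical sample sheet standing in for the real file.
wb = Workbook()
ws = wb.active
ws.append(["addr", "town", "st", "postcode"])
ws.append(["1 Main St", "Springfield", "IL", "62701"])

set_header_row(ws, ["Street", "City", "State", "Zip"])
headers = [cell.value for cell in ws[1]]
print(headers)
```

With a real file, load it with load_workbook, call the helper, and then call wb.save as in the answer above.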

Related

Creating and naming column in openpyxl

Background: the following code snippet iterates over all worksheets in a workbook and writes a formula to the last column of each:
import openpyxl
filename = 'filename.xlsx'
filename_output = 'filename_output.xlsx'
wb = openpyxl.load_workbook(filename)
for sheet in wb.worksheets:
    sheet.insert_cols(sheet.max_column)
    for row in sheet.iter_rows():
        row[-1].value = "=SUMIFS(J:J,M:M,#M:M)"
wb.save(filename_output)
Question: I cannot find documentation on how to name the column. Does anyone know how to achieve this?
Context: I want this column (in each worksheet) to be called 'Calculation'.
To get the last column, you can use sheet.max_column. Once you have updated the formulas, you can use sheet.cell(1,col).value = "Calc" to update the header. Updated code below...
import openpyxl
filename = 'filename.xlsx'
filename_output = 'filename_output.xlsx'
wb = openpyxl.load_workbook(filename)
for sheet in wb.worksheets:
    sheet.insert_cols(sheet.max_column)
    for row in sheet.iter_rows():
        row[-1].value = "=SUMIFS(J:J,M:M,#M:M)"
    sheet.cell(1, sheet.max_column).value = "Calculation"  # add this line inside the sheet loop
wb.save(filename_output)
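One thing to watch: as far as I can tell, insert_cols(idx) inserts the new column before idx, so the code above shifts the old last column one place to the right and then overwrites it with the formula. If the goal is a brand new column after the existing data, a sketch like the following may be closer to what is wanted. The helper name add_named_column, the sample data, and the =SUM(B:B) formula are all hypothetical placeholders.

```python
from openpyxl import Workbook


def add_named_column(ws, header, formula):
    """Append a column after the existing data: header in row 1, formula below."""
    new_col = ws.max_column + 1   # first free column to the right of the data
    last_row = ws.max_row         # capture before writing anything
    ws.cell(row=1, column=new_col, value=header)
    for row in range(2, last_row + 1):
        ws.cell(row=row, column=new_col, value=formula)
    return new_col


# Hypothetical in-memory sheet so the example runs without a file.
wb = Workbook()
ws = wb.active
ws.append(["Item", "Qty"])
ws.append(["apple", 3])
ws.append(["pear", 5])

col = add_named_column(ws, "Calculation", "=SUM(B:B)")
print(col, ws.cell(row=1, column=col).value)
```

The existing columns are left untouched, and the header is written only once per sheet rather than being overwritten after the formula loop.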

How can we get file name, sheet name, max rows, and max columns for all Excel files in a folder?

I am trying to get the file name, sheet name, max rows, and max columns of each sheet in each Excel file. I did some research today on how to use Python to take an inventory of Excel files in a folder. I put together the code below, and it seems to get me the file name and sheet name, but it gets stuck on the rows and columns. As far as I know, the rows and columns are strings, right? I'm trying to accommodate that requirement, but something seems to be off. Can someone tell me what's wrong here?
import openpyxl
import glob
import pandas as pd
inventory = []
all_data = pd.DataFrame()
path = '\\Users\\ryans\\OneDrive\\Desktop\\sample\\*.xlsx'
for f in glob.glob(path):
    print(f)
    inventory.append(f)
    theFile = openpyxl.load_workbook(f)
    sheetnames = theFile.active
    for sheet in sheetnames:
        print(sheet)
        inventory.append(sheet)
        row_count = str(sheet.max_row)
        col_count = str(sheet.max_col)
        inventory.append(row_count)
        inventory.append(col_count)
print(inventory)
To iterate over the worksheets in a workbook, use for sheet in theFile.worksheets. Your current attempt iterates over the rows of the active sheet, because theFile.active returns a single worksheet.
sheet.max_col is also not a valid attribute; use sheet.max_column instead.
So your working code is now:
import openpyxl
import glob
inventory = []
path = '\\Users\\ryans\\OneDrive\\Desktop\\sample\\*.xlsx'
for f in glob.glob(path):
    # print(f)
    inventory.append(f)
    theFile = openpyxl.load_workbook(f)
    for sheet in theFile.worksheets:
        # print(sheet)
        inventory.append(sheet.title)  # store the sheet name, not the Worksheet object
        row_count = str(sheet.max_row)
        col_count = str(sheet.max_column)
        inventory.append(row_count)
        inventory.append(col_count)
print(inventory)
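The flat list above mixes file names, sheet names, and counts in one sequence, which can be awkward to read back. A possible alternative is one tuple per worksheet. This is only a sketch: the helper name inventory_rows and the sample workbook are hypothetical, and the workbook is created in memory instead of being loaded from the folder.

```python
from openpyxl import Workbook


def inventory_rows(filename, wb):
    """Return one (file, sheet name, max_row, max_column) tuple per worksheet."""
    return [(filename, ws.title, ws.max_row, ws.max_column) for ws in wb.worksheets]


# Hypothetical workbook with one populated sheet and one empty sheet.
wb = Workbook()
orders = wb.active
orders.title = "Orders"
orders.append(["id", "total"])
orders.append([1, 9.5])
wb.create_sheet("Empty")

rows = inventory_rows("demo.xlsx", wb)
for row in rows:
    print(row)
```

In the glob loop, each loaded workbook could be passed through the same helper and the results collected with inventory.extend(...). Note that max_row and max_column are integers, so no str() conversion is needed, and an empty sheet reports 1 for both.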

Can't read xlsx file with pandas

I am trying to read an .xlsx file as a DataFrame. The file has two worksheets, but when I try to read it, it returns an empty worksheet. Even though I specified sheet_name, it reports that there is no worksheet with the name I provided.
I have tried several methods, but they all return [].
from openpyxl import load_workbook
workbook = load_workbook(filename="filename.xlsx", read_only=True, data_only=True)
print(workbook.sheetnames)

xl = pd.read_excel('filename.xlsx', engine='openpyxl')
xl.sheet_names
If you need a list of the sheet names:
import pandas as pd
xl = pd.ExcelFile('filename.xlsx')
xl.sheet_names
# to read from a specific sheet
xl.parse('sheetname')
If you know the sheet name, just use:
pd.read_excel('filename.xlsx', sheet_name='sheetname')
With pandas (see pandas.read_excel):
import pandas as pd
df = pd.read_excel(
    io='filename.xlsx',
    sheet_name='your sheet name',
    engine='openpyxl'
)
With openpyxl (see "Read an existing workbook" and "Converting a worksheet to a Dataframe" in the openpyxl documentation):
from openpyxl import load_workbook
import pandas as pd
wb = load_workbook(
    filename='filename.xlsx',
    data_only=True
)
sheet_names = wb.sheetnames  # list available sheet names in workbook
ws = wb['your sheet name']
df = pd.DataFrame(ws.values)  # use ws.values to get cell values rather than Cell objects
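When the first row of the worksheet holds the column names, the openpyxl route usually needs one extra step to promote that row to the DataFrame header. The following is a small sketch under that assumption; the sample data is hypothetical and the workbook is built in memory so it runs without filename.xlsx.

```python
import pandas as pd
from openpyxl import Workbook

# Hypothetical worksheet whose first row is the header.
wb = Workbook()
ws = wb.active
ws.append(["Street", "City"])
ws.append(["1 Main St", "Springfield"])
ws.append(["2 Oak Ave", "Shelbyville"])

rows = ws.values              # generator yielding one tuple of values per row
columns = list(next(rows))    # consume the first row as the column labels
df = pd.DataFrame(list(rows), columns=columns)
print(df)
```

pd.read_excel does this header handling by default, so the manual step only matters when going through openpyxl directly.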
Thanks everyone, I found the problem. It was because of excel.

Reading values from Excel workbook in every worksheet in particular column

I'd like to read the values from column B in every worksheet within my workbook.
After a fair amount of reading and playing around I can return the cell names of the cells I want the values from, but I can't figure out how to get the values.
from openpyxl import load_workbook
wb = load_workbook(r"C:/Users/username/Documents/test.xlsx")
for sheet in wb.worksheets:
    for row in range(2, sheet.max_row + 1):
        for column in "B":
            cell_name = "{}{}".format(column, row)
            print(cell_name)
This is returning the cell names (i.e. B2, B3) that have values in column B in every worksheet.
According to the documentation https://openpyxl.readthedocs.io/en/stable/usage.html you can access cell values as:
sheet['B5'].value
Replace B5 with the cell(s) you need.
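import xlrd
# note: xlrd 2.0 and later only read .xls files; use openpyxl for .xlsx
loc = ("foo.xlsx")  # excel file name
wb = xlrd.open_workbook(loc)
# sheet = wb.sheet_by_index(0)
for sheet in wb.sheets():
    for i in range(sheet.nrows):
        print(sheet.cell_value(i, 1))  # column index 1 is column B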
Edit: I edited my answer to read all sheets in excel file.
Just adjust the range as needed:
from openpyxl import load_workbook
wb = load_workbook('')
for sheet in wb:
    for i in range(1, 50):
        if sheet['B' + str(i)].value:
            print(sheet['B' + str(i)].value)
A better version:
from openpyxl import load_workbook
wb = load_workbook('')
for sheet in wb:
    for cell in sheet['B']:
        print(cell.value)
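If the values are needed for further processing rather than printing, the same idea can collect them per sheet. This sketch assumes a header in row 1 and data from row 2 onward, as in the question; the helper name column_values and the sample sheets are hypothetical. It uses iter_rows with values_only=True, which yields plain values instead of Cell objects.

```python
from openpyxl import Workbook


def column_values(wb, column=2, first_row=2):
    """Map each sheet title to the non-empty values found in one column."""
    result = {}
    for ws in wb.worksheets:
        rows = ws.iter_rows(min_row=first_row, min_col=column,
                            max_col=column, values_only=True)
        # Each row is a 1-tuple because only one column is requested.
        result[ws.title] = [value for (value,) in rows if value is not None]
    return result


# Hypothetical two-sheet workbook built in memory.
wb = Workbook()
jan = wb.active
jan.title = "Jan"
jan.append(["name", "score"])
jan.append(["x", 10])
jan.append(["y", None])
jan.append(["z", 30])
feb = wb.create_sheet("Feb")
feb.append(["name", "score"])
feb.append(["q", 7])

values = column_values(wb)
print(values)
```

Column 2 corresponds to column B; pass a different column number to read another column.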

How to split merged Excel cells with Python?

I am trying to split only the merged cells in an Excel file (with multiple sheets).
Please note that there are partially/fully empty rows. These rows are not merged.
Using openpyxl, I found the merged cell ranges in each sheet with this code:
from openpyxl import load_workbook
wb2 = load_workbook('Example.xlsx')
sheets = wb2.sheetnames  # ['Sheet1', 'Sheet2']
for i, sheet in enumerate(sheets):
    ws = wb2[sheets[i]]
    print(ws.merged_cell_ranges)
The print output:
['B3:B9', 'B13:B14', 'A3:A9', 'A13:A14', 'B20:B22', 'A20:A22']
['B5:B9', 'A12:A14', 'B12:B14', 'A17:A18', 'B17:B18', 'A27:A28', 'B27:B28', 'A20:A22', 'B20:B22', 'A3:A4', 'B3:B4', 'A5:A9']
Now that I have the merged cell ranges, I need to split them and fill every row of each range with the value of the merged cell.
How can I do this split using openpyxl? I am new to using this module. Any feedback is greatly appreciated!
You need to use the unmerge_cells method. Example:
ws.unmerge_cells(start_row=2,start_column=1,end_row=2,end_column=4)
When you call unmerge_cells, sheet.merged_cells.ranges is modified, so don't iterate over sheet.merged_cells.ranges directly in the for loop; copy the range coordinates into a list first.
from openpyxl import load_workbook
from openpyxl.utils.cell import range_boundaries
wb = load_workbook(filename='tmp.xlsx')
for st_name in wb.sheetnames:
    st = wb[st_name]
    # copy the coordinates first, because unmerge_cells modifies merged_cells.ranges
    mcr_coord_list = [mcr.coord for mcr in st.merged_cells.ranges]
    for mcr in mcr_coord_list:
        min_col, min_row, max_col, max_row = range_boundaries(mcr)
        top_left_cell_value = st.cell(row=min_row, column=min_col).value
        st.unmerge_cells(mcr)
        for row in st.iter_rows(min_col=min_col, min_row=min_row, max_col=max_col, max_row=max_row):
            for cell in row:
                cell.value = top_left_cell_value
wb.save('merged_tmp.xlsx')
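To check the approach without touching a real file, the same steps can be run on a tiny in-memory sheet. This is a sketch only: the merged range A1:A3 and the value "North" are made-up sample data. It also shows that range_boundaries returns columns before rows, which is why the unpacking order above is min_col, min_row, max_col, max_row.

```python
from openpyxl import Workbook
from openpyxl.utils.cell import range_boundaries

# One of the ranges from the question: column B (2), rows 3 to 9.
bounds = range_boundaries("B3:B9")
print(bounds)

# Hypothetical sheet with a single merged range.
wb = Workbook()
ws = wb.active
ws["A1"] = "North"
ws.merge_cells("A1:A3")

# Snapshot the coordinates, because unmerge_cells changes merged_cells.ranges.
for coord in [rng.coord for rng in ws.merged_cells.ranges]:
    min_col, min_row, max_col, max_row = range_boundaries(coord)
    value = ws.cell(row=min_row, column=min_col).value
    ws.unmerge_cells(coord)
    for row in ws.iter_rows(min_col=min_col, min_row=min_row,
                            max_col=max_col, max_row=max_row):
        for cell in row:
            cell.value = value

filled = [ws.cell(row=r, column=1).value for r in range(1, 4)]
print(filled)
```

After the loop, no merged ranges remain on the sheet and every cell of the former range carries the top-left value.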
