Add new rows in each iteration in the Excel file - python

I want to increase the number of rows on each iteration and write the new calculated values in there without deleting the old ones.
For example this program
import xlsxwriter
for i in range(5):
    a=i*i
    workbook = xlsxwriter.Workbook('test.xlsx')
    worksheet = workbook.add_worksheet()
    worksheet.write(i, 0, a)
    workbook.close()
I found this library on the internet https://xlsxwriter.readthedocs.io/examples.html. Is this library good or is there a better one?
Unfortunately, only the value from the last iteration is displayed in the Excel file.
How can I make it so that all values are displayed to me?
Thank You
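One way to keep every value is to create the workbook once, before the loop, and close it once, after the loop. In the snippet above a new Workbook('test.xlsx') is created on every iteration, so each close() overwrites the file written by the previous one. The following is only a minimal sketch of that idea; the helper names write_squares and save_squares are made up for this example.

```python
def write_squares(worksheet, count):
    """Write i*i into column A, one row per iteration (rows are 0-based)."""
    for i in range(count):
        worksheet.write(i, 0, i * i)


def save_squares(path, count=5):
    """Create the workbook once, fill it, and close it once."""
    import xlsxwriter  # third-party; imported here so write_squares has no dependency

    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    write_squares(worksheet, count)
    workbook.close()  # the file is written to disk here
```

Calling save_squares('test.xlsx') should leave 0, 1, 4, 9, 16 in cells A1 to A5. Note that XlsxWriter can only create new files; to add rows to a file that already exists, a library that can also read workbooks, such as openpyxl, is needed.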

Related

Read cell value, NOT the formula in Excel sheet using Python

I am trying to read only the cell value in an Excel spreadsheet using Python's openpyxl, but I am only able to read the formulas.
I have already come across countless questions on Stack Overflow that ask this, and they all say to set the flag data_only=True like this:
wb = openpyxl.load_workbook(reference_filename, data_only=True)
ws = wb.worksheets[0]
cell_value = ws.cell(7, 1).value
print(cell_value)
However, this still prints only the formula. Why?
I just need the value that is in the cell.
The openpyxl documentation (https://openpyxl.readthedocs.io/en/latest/usage.html?#read-an-existing-workbook)
notes that...
data_only controls whether cells with formulae have either the formula
(default) or the value stored the last time Excel read the sheet.
If the worksheet hasn't been opened by Excel previously, it may not have the last-calculated values stored and therefore openpyxl may not be able to extract it.
Do you have to use openpyxl?
Check this out using pandas:
import pandas as pd
ws = pd.read_excel('try.xlsx', sheet_name=0, header=None)
cell_value = ws.iat[6, 0]  # row 7, column A (pandas positions are 0-based)
print(cell_value)
For me, this gives the result, not the formula.
The problem could be in the Excel file. If the cell you are trying to read is set to 'Show Formulas', openpyxl will read the formula instead of the value. In Excel, go to Formulas -> Formula Auditing, uncheck Show Formulas, save the file, and run the Python program again.

Deleting rows from a large file using openpyxl

I'm working with openpyxl on a .xlsx file which has around 10K products, of which some are "regular items" and some are products that need to be ordered when required. For the project I'm doing, I would like to delete all of the rows containing the items that need to be ordered.
I tested this with a small sample of the actual workbook and the code worked the way I wanted. However, when I tried it on the actual workbook with 10K rows, it seems to take forever to delete those rows (it has been running for nearly an hour now).
Here's the code that I used:
wb = openpyxl.load_workbook('prod.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')
def clean_workbook():
    for row in sheet:
        for cell in row:
            if cell.value == 'ordered':
                sheet.delete_rows(cell.row)
Is there a faster way of doing this with some tweaks to my code? Or is there a better way to read just the regular stock from the workbook without deleting the unwanted items?
Deleting rows in loops can be slow because openpyxl has to update all the cells below the row being deleted. Therefore, you should do this as little as possible. One way is to collect a list of row numbers, check for contiguous groups and then delete using this list from the bottom.
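The bottom-up idea can be sketched as follows. This is a minimal illustration, not the answerer's code: the helper names contiguous_runs and delete_matching_rows are hypothetical, and it assumes the marker value always sits in one known column. It relies only on the worksheet's iter_rows() and delete_rows(idx, amount) methods.

```python
def contiguous_runs(row_numbers):
    """Group row numbers into (first_row, count) runs of adjacent rows."""
    runs = []
    for row in sorted(set(row_numbers)):
        if runs and row == runs[-1][0] + runs[-1][1]:
            # row directly follows the current run, so extend it
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((row, 1))
    return runs


def delete_matching_rows(sheet, column, unwanted):
    """Delete every row whose cell in `column` (1-based) equals `unwanted`."""
    rows = sheet.iter_rows(min_col=column, max_col=column, values_only=True)
    matches = [
        number for number, row in enumerate(rows, start=1) if row[0] == unwanted
    ]
    # Work from the bottom so earlier deletions do not shift
    # the rows that are still waiting to be deleted.
    for first_row, count in reversed(contiguous_runs(matches)):
        sheet.delete_rows(first_row, count)
    return len(matches)
```

With a real workbook this would be called as delete_matching_rows(wb['Sheet1'], column, 'ordered'), followed by wb.save(...). Each run of adjacent rows costs one delete_rows() call instead of one call per row.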
A better approach might be to loop through ws.values and write to a new worksheet filtering out the relevant rows. Copy any other relevant data such as formatting, etc. Then you can delete the original worksheet and rename the new one.
ws1 = wb['My Sheet']
ws2 = wb.create_sheet('My Sheet New')
for row in ws1.values:
    if row[x] == "ordered": # we can assume this is always the same column
        continue
    ws2.append(row)
del wb["My Sheet"]
ws2.title = "My Sheet"
For more sophisticated filtering you will probably want to load the values into a Pandas dataframe, make the changes and then write to a new sheet.
You can open the workbook in read-only mode and import all of its content into a list; modifying a list is always a lot faster than working in Excel. After you modify the list, make a new worksheet and write the list back to Excel. I did it this way with my 100k-item Excel file.

How to append data to the last row (every time) of an Excel file?

I am looking for a way to append data from a Python program to an excel sheet. For this, I chose the openpyxl library to save this data.
My problem is how to put new data in the last row of the sheet without losing the current data. I looked in the documentation but did not find an answer.
I do not know whether this library has a method to add new data or whether I need to write my own logic for this task.
The last row of the sheet can be found using the max_row property:
from openpyxl import load_workbook
myFileName=r'C:\DemoFile.xlsx'
#load the workbook, and put the sheet into a variable
wb = load_workbook(filename=myFileName)
ws = wb['Sheet1']
#max_row is a sheet property that gives the number of the last row in a sheet.
newRowLocation = ws.max_row +1
#write to the cell you want, specifying row and column, and value :-)
ws.cell(column=1,row=newRowLocation, value="aha! a new entry at the end")
wb.save(filename=myFileName)
wb.close()
What you're looking for is the Worksheet.append method:
Appends a group of values at the bottom of the current sheet.
If it’s a list: all values are added in order, starting from the first column
If it’s a dict: values are assigned to the columns indicated by the keys (numbers or letters)
So no need to check for the last row. Just use this method to always add the data at the end.
ws.append(["some", "test", "data"])
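As a rough sketch of how append() fits into a load, append, save cycle (the helper names append_rows and append_to_file are assumptions made for this example, not part of the answer):

```python
def append_rows(ws, rows):
    """Append each row below the existing data; return the new last row number."""
    for row in rows:
        # list/tuple: values fill columns A, B, C, ...
        # dict: keys are column letters or 1-based column numbers
        ws.append(row)
    return ws.max_row


def append_to_file(path, sheet_name, rows):
    """Load an existing workbook, append rows to one sheet, and save it back."""
    from openpyxl import load_workbook  # third-party

    wb = load_workbook(path)
    last_row = append_rows(wb[sheet_name], rows)
    wb.save(path)
    return last_row
```

For example, append_to_file(myFileName, 'Sheet1', [['some', 'test', 'data'], {'A': 'first', 'C': 'third'}]) would add two rows at the bottom of Sheet1 and save the file under the same name.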

How to pull last cell in column using openpyxl in python

I created a small program that writes to an Excel file. I have another program that needs to read the last entry (in column A) every day. Since new data is imported into the Excel file every day, the cell that I need to capture changes.
I'm looking to see if there is a way for me to grab the last cell in Column A using openpyxl in python?
I don't have much experience with this, so I wasn't sure where to start.
import openpyxl
wb = openpyxl.load_workbook('text.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')
from https://openpyxl.readthedocs.io/en/stable/tutorial.html
Try this. It should get the entire column A and take the value of the last entry:
sheet['A'][-1].value
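If other columns extend further down than column A, the last cell of sheet['A'] may be empty. A small helper, sketched here under the assumption that empty cells hold None (the name last_value_in_column is made up for this example), walks the column from the bottom and returns the first value it finds:

```python
def last_value_in_column(sheet, column="A"):
    """Return the value of the lowest non-empty cell in `column`, or None."""
    # sheet[column] is a tuple of cells covering the whole used height of the
    # sheet, so trailing cells can be empty when another column is longer.
    for cell in reversed(sheet[column]):
        if cell.value is not None:
            return cell.value
    return None
```

It would be called as last_value_in_column(sheet) with the sheet object loaded in the question.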

Moving Data From One Workbook To Another With Openpyxl

I'm trying to create a new workbook with data from an already existing workbook. The existing workbook is extremely large so I have it loaded as a read-only workbook.
Because of this, I need to iterate through the rows but I can't seem to figure out how to do this AND get data into the new workbook.
Along with this, the data is from column A and is only put into the new workbook if the cell in column B says "IL".
for row in existing_sheet.iter_rows(min_col=2, max_col=2):
    for cell in row:
        print("CHECKING IF IT IS IN IL")
        if "IL" in str(cell.value):
            currSheet.cell(row=counter, column=1).value = existing_sheet.cell(row=counter, column=41).value
I keep getting deprecation warnings and the program is going much slower than I think it should be.
When I simply do a print statement to see the cell value, it goes through all 40,000 rows in just a few minutes.
My current code takes hours, if not longer.
existing_sheet.cell(row=counter, column=41).value
This is what is slowing everything down. In read-only mode every call to iter_rows() or cell() will force openpyxl to parse the worksheet again. Instead, make each row wide enough to include the 41st column and read both cells from the row you are already iterating over. Starting at column 1, column B is row[1] and the 41st cell is row[40].
for row in ws1.iter_rows(min_col=1, max_col=41):
    if "IL" in str(row[1].value):
        ws2.cell(row=row[1].row, column=1).value = row[40].value
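A variation on the loop above, sketched under a few assumptions: column B (2) holds the text to match and column 41 holds the value to copy, as in the question, and the matches are packed into consecutive rows of the new sheet rather than kept at their original row numbers. The function name copy_matching and its parameters are hypothetical. Asking iter_rows() for values_only=True yields plain tuples of values instead of cell objects, and append() avoids addressing target cells one by one.

```python
def copy_matching(src_sheet, dst_sheet, match_col=2, value_col=41, needle="IL"):
    """Copy the value in `value_col` for every row whose `match_col` contains `needle`.

    Column numbers are 1-based, as in openpyxl. Returns the number of rows copied.
    """
    copied = 0
    last_col = max(match_col, value_col)
    # A single pass over the source sheet: no cell() lookups inside the loop.
    for row in src_sheet.iter_rows(min_col=1, max_col=last_col, values_only=True):
        if needle in str(row[match_col - 1]):
            dst_sheet.append([row[value_col - 1]])
            copied += 1
    return copied
```

With the names from the question this would be called as copy_matching(existing_sheet, currSheet), followed by saving the new workbook.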
