I tried using Pandas but am getting errors that I don't understand quite yet when trying to import the file. I am able, however, to import the file easily using openpyxl.
I have a very large sheet that has header data. It also contains a table near the top that I'll need to transpose, and the main table starts on row 39. I am able to import the sheet and then run:
sheet_obj = wb_obj.active
sheet_obj.delete_rows(1,39)
I want to be able to write the new object to a new sheet (which I called "test") in the same workbook so that I can check what I'm deleting. (Eventually, I will be exporting this to MySQL, but I want to see the contents of the table as a debugging aid.) I am unable to figure out how to write the sheet_obj to the other sheet. I'm sure this is simple...
Two questions:
How do I write a sheet object to a NEW sheet?
Is there a simple way to transpose the object? (I saw that pandas DataFrames have a .T attribute. Does openpyxl have something similar?)
THANK YOU SO much! I'm very new to python and learning all of this on the fly.
Sincerely,
Rob
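For what it's worth, here is a minimal sketch of both steps using only openpyxl. The helper names (copy_without_header, write_transposed) and the tiny stand-in workbook are made up for illustration; copy_worksheet, delete_rows, iter_rows and append are existing openpyxl calls. Note that delete_rows(idx, amount) takes a start row and a count, so delete_rows(1, 39) also removes row 39; if the main table's first row should survive, the count would presumably be 38.

```python
from openpyxl import Workbook


def copy_without_header(wb, source_title, new_title, header_rows):
    """Copy a sheet inside the same workbook, then drop the top rows of the copy."""
    copy = wb.copy_worksheet(wb[source_title])
    copy.title = new_title
    copy.delete_rows(1, header_rows)  # (start row, number of rows to delete)
    return copy


def write_transposed(wb, source_title, new_title):
    """Write the values of one sheet, transposed, into a new sheet."""
    rows = list(wb[source_title].iter_rows(values_only=True))
    target = wb.create_sheet(new_title)
    for column in zip(*rows):  # zip(*rows) swaps rows and columns
        target.append(column)
    return target


# Tiny stand-in workbook (hypothetical data, not the real sheet)
wb = Workbook()
ws = wb.active
ws.title = "raw"
ws.append(["report header", None])
ws.append(["name", "qty"])
ws.append(["apple", 3])

test_sheet = copy_without_header(wb, "raw", "test", header_rows=1)
flipped = write_transposed(wb, "test", "test_T")
```

The original sheet stays untouched, so the "test" copy can be inspected in Excel after wb.save(...) without losing anything.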
Related
I'm trying to generate an Excel file without deleting the user's configuration.
For example, you can create a new view here and save it.
But when I read the Excel file with pandas or anything else and then generate the Excel file again, the view is deleted.
Is there a way to create the view in Python again, or to avoid deleting the view?
I looked into some other libraries like openpyxl and XlsxWriter, but I didn't find any option that can do this.
Openpyxl has the functionality to use Sheet Views. I've never used it, so I can't give you specifics. In theory it would allow you to read and rebuild a Sheet View.
Pandas doesn't include that functionality as far as I know. What it does have is the recent ability (and it's also in openpyxl) to append to an existing Excel workbook instead of overwriting.
If you have a Sheet View pointing at a particular sheet, and are adding/editing a different sheet, you could use this and it shouldn't impact the sheet view.
If you are editing the sheet the sheet view is pointing at, then you would need to rebuild the view using Openpyxl (but you could still write to it initially with Pandas if that is easier for you).
The code for appending in Pandas is:
# use ExcelWriter rather than calling to_excel directly in order to get access to the append and replace options
with pd.ExcelWriter("data.xlsx", engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    df.to_excel(writer, sheet_name="My Data", index=False)
If you are using openpyxl directly, then workbook.create_sheet(sheet_name) will append a new sheet to an existing workbook.
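A rough sketch of the openpyxl route, assuming the workbook already exists on disk. The helper name add_sheet and the throwaway demo file are made up for illustration. I have not checked whether a saved sheet view survives an openpyxl load/save round trip, so it is worth testing on a copy of the real file first.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def add_sheet(path, sheet_name, rows):
    """Add a new sheet with the given rows to an existing workbook file."""
    wb = load_workbook(path)
    ws = wb.create_sheet(sheet_name)  # placed after the existing sheets
    for row in rows:
        ws.append(row)
    wb.save(path)


# Demo on a throwaway file standing in for data.xlsx
demo_path = os.path.join(tempfile.mkdtemp(), "data.xlsx")
first = Workbook()
first.active.title = "Existing"
first.active["A1"] = "keep me"
first.save(demo_path)

add_sheet(demo_path, "My Data", [["Name", "ID"], ["Ann", 1]])
result = load_workbook(demo_path)
```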
You may find that you have to use win32com, a module which gives you access to some of the functionality that VBA has in Excel. The documentation for Views seems scarce though; all I could easily find were these two:
https://learn.microsoft.com/en-us/office/vba/api/excel.window.sheetviews
https://learn.microsoft.com/en-us/office/vba/api/excel.sheetviews
I am fairly new to programming and wrote a simple Python code to concatenate many Excel sheets into one big one to make processing it faster.
The Excel sheets contain a name and are then processed by a web application to compare the entries with existing data in a database. Entries that are in the Excel file and the database are then displayed. So far, so good.
I have written the following code:
import openpyxl as xl
import glob
import pandas as pd
inputpath = "C:/Users/Me/myinputfolder"
filenames = glob.glob(inputpath + "/*.xlsx")
files = []
# I only need to use the first two columns
fields = [0, 1]
# create new data frame
output = pd.DataFrame(columns=['Name', 'ID(*)', 'Alternative Name', 'Version', 'Nationality', 'State', 'City'])
for file in filenames:
    files.append(pd.read_excel(file, sheet_name="Data", usecols=fields,
                               names=["Name", "ID(*)"], skiprows=[i for i in range(1, 5)]))
for excl_file in files:
    output = pd.concat([output, excl_file], ignore_index=True)
outpath = "C:/Users/Me/outputfolde/Output.xlsx"
output.to_excel(outpath, index=False, sheet_name="Data")
This code runs just fine. It takes all xlsx files from a specified folder and generates a new xlsx with the data I want. The only problem occurs when I upload it to the web application. The sheet uploads just fine and no errors are displayed, but it seems like the application cannot read any data from the sheet. Once I manually clear all formatting in the Output.xlsx file, it works.
I was wondering if there is a way to implement the "clear formatting" in my code as well.
Here is what I added to the end of my previous code:
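# reopen the file that pandas just wrote (xl is openpyxl, imported above)
wb = xl.load_workbook(outpath)
ws = wb.worksheets[0]
for row in ws.iter_rows():
    for cell in row:
        cell.style = 'Normal'
wb.save(outpath)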
The result looks fine, just like a file with manually cleared formatting, but the application still cannot read it. Does anyone have any idea why this still does not work?
Unfortunately, I do not have any information on how the web application reads/processes the data and the person in charge unfortunately does not reply. I would appreciate any ideas/suggestions to solve my issue.
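One thing that might be worth trying, purely as a guess since nothing is known about how the application parses the file: skip to_excel altogether and write the values with openpyxl, so that no header styling is created in the first place (pandas writes header cells in bold with borders) and empty cells stay truly empty. The helper name write_plain_xlsx and the two-row demo frame below are made up for illustration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook


def write_plain_xlsx(df, path, sheet_name="Data"):
    """Write a DataFrame as bare values: no header styling, NaN left as empty cells."""
    wb = Workbook()
    ws = wb.active
    ws.title = sheet_name
    ws.append([str(c) for c in df.columns])
    for row in df.itertuples(index=False, name=None):
        # pd.isna catches NaN/None so they are not written as values
        ws.append([None if pd.isna(v) else v for v in row])
    wb.save(path)


# Hypothetical stand-in for the concatenated output frame
demo = pd.DataFrame({"Name": ["Ann", "Bob"], "ID(*)": [1.0, None]})
demo_out = os.path.join(tempfile.mkdtemp(), "Output.xlsx")
write_plain_xlsx(demo, demo_out)
check = load_workbook(demo_out)["Data"]
```

If the application still rejects the file, the cause is probably not cell formatting, and comparing the unzipped XML of a working and a non-working file would be the next thing to look at.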
I have created a Python script that parses some data and uses it to generate a few Excel workbook files (xlsx). (I am not very knowledgeable about Excel, by the way.)
The original workbook I am reading the data from has a main sheet used to fill in lots of info, which is then distributed across many other sheets that use formulas to calculate some results.
My Python script does the processing I want by automatically filling in this main sheet, which then produces the results in the other sheets. I then save the workbook and everything looks fine.
Now, I want to split those sheets into individual workbooks without including the main sheet in any of them. This appears to be pretty challenging.
I first tried using the data_only argument of load_workbook, but I quickly discovered that in order to get the values rather than the formulas (or just None), I'd have to manually open and save each of the files in Excel so that the cached values are created. That won't do in my case, since I want the whole thing to be automated.
Among other things, the one that came closer is this piece of code:
import pandas as pd
import xlwings as xw
workbook = xw.Book('generatedFiles/generated.xlsx')
sheet1 = workbook.sheets['PRIM'].used_range.value
df = pd.DataFrame(sheet1)
df.to_excel('generatedFiles/fixed-generated.xlsx', index=False, header=False)
This does indeed generate a spreadsheet with the values, using its pandas dataframe, but the problem is that it doesn't preserve any information about the types of the values. So for example, an integer is being treated as a string when saved.
This breaks the processing done by an external parser that I feed these files to.
Any ideas on how to fix that? What would be the best way to go about something like this?
Thanks !
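One possible direction, sketched under assumptions: skip the DataFrame and write the values cell by cell with openpyxl, which stores each Python value with its own type. The helper names normalise and write_values are made up, and turning whole-number floats such as 3.0 back into ints is only a guess at what the external parser expects (xlwings returns every numeric cell as a float). The demo list stands in for what used_range.value returns; with xlwings, used_range.options(ndim=2).value should guarantee a list of rows even for a single row or cell.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def normalise(value):
    """Turn floats with no fractional part (e.g. 3.0) back into ints (assumption)."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


def write_values(rows, path, sheet_name="PRIM"):
    """Write a list of rows to a new workbook, keeping each value's Python type."""
    wb = Workbook()
    ws = wb.active
    ws.title = sheet_name
    for row in rows:
        ws.append([normalise(v) for v in row])
    wb.save(path)


# Stand-in for the nested list returned by xlwings
values = [["id", "score"], [1.0, 2.5], ["7", None]]
fixed_path = os.path.join(tempfile.mkdtemp(), "fixed-generated.xlsx")
write_values(values, fixed_path)
fixed = load_workbook(fixed_path)["PRIM"]
```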
I am new to Python. I need to write data to, read data from, and run a few VBA macros within an Excel workbook. I would rather the workbook never be opened (non-graphical), but I'll take a "remote control" approach if that's what it takes. I installed openpyxl and tried to load the complicated xlsm workbook, and it complained that "Data Validation Extension" is not supported. I was able to read a value from a cell anyway, but not able to write a new value into a cell. And when I saved the new workbook, it was half the size it should be, and Excel couldn't open it because it was "corrupt".
Is there a more robust way to do this? Maybe I just need a couple of load options?
I used: wb = load_workbook(filename = 'myBook.xlsm')
Thanks in advance for any help :)
You can try the pandas library: https://pandas.pydata.org/
Import the library and read the workbook:
import pandas as pd
data_frame = pd.read_excel("myBook.xlsm")
pandas has lots of helpful methods.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
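Note that read_excel only reads data; it does not write back into the macro workbook. If you stay with openpyxl, the load option that matters for xlsm files is keep_vba=True: without it the VBA project is dropped on save, which would explain both the smaller file and Excel refusing to open it under the .xlsm extension. A minimal sketch follows; the helper name set_cell_keep_macros is made up, and the demo file is an empty stand-in with no real macros in it. openpyxl cannot run macros, so that part still needs Excel itself (for example through xlwings or win32com), and the "Data Validation extension" warning means that particular feature is not carried over on save.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def set_cell_keep_macros(path, sheet_name, cell, value):
    """Write one cell into an .xlsm file while keeping its VBA project."""
    wb = load_workbook(path, keep_vba=True)  # keep_vba preserves the macro parts
    wb[sheet_name][cell] = value
    wb.save(path)


# Stand-in file (no real macros inside, just the .xlsm extension)
book_path = os.path.join(tempfile.mkdtemp(), "myBook.xlsm")
stub = Workbook()
stub.active.title = "Sheet1"
stub.save(book_path)

set_cell_keep_macros(book_path, "Sheet1", "B2", 42)
reloaded = load_workbook(book_path)
```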
I went through some trial and error using online documentation (xlrd, etc.) but have had no luck finding an approach that accomplishes all of the below:
I have 1 xlsx workbook with several sheets and each sheet has several cells with static values and formulae. I want to read these sheets into pandas, generate new dataframes and then update the same workbook. However, I want to update only certain cells and in a way that retains the formula of the corresponding cells after the update. I want to update the same workbook and not create a new excel file.
What's the most reliable way to accomplish this? Any guidelines would be much appreciated. Thank You.
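One approach that may fit, sketched with openpyxl: load the workbook with the default data_only=False so formulas come in as text, overwrite only the cells you name, and save back to the same path. The helper name update_cells and the three-cell demo workbook are made up for illustration. Two caveats: openpyxl does not recalculate anything, so the formula results refresh only when Excel next opens the file, and pandas.read_excel returns the cached values rather than the formulas.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def update_cells(path, sheet_name, updates):
    """Overwrite selected cells in place; untouched cells keep their formulas."""
    wb = load_workbook(path)  # data_only=False by default, so formulas load as text
    ws = wb[sheet_name]
    for ref, value in updates.items():
        ws[ref] = value
    wb.save(path)  # same path: the existing workbook is updated


# Stand-in workbook with two static values and one formula
book = os.path.join(tempfile.mkdtemp(), "book.xlsx")
seed = Workbook()
seed.active.title = "Sheet1"
seed.active["A1"] = 1
seed.active["A2"] = 2
seed.active["A3"] = "=A1+A2"
seed.save(book)

update_cells(book, "Sheet1", {"A1": 10})
after = load_workbook(book)["Sheet1"]
```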