I've been looking for ages to find a suitable module to interact with excel, which needs to do the following:
Check a column of cells for an "incorrect" value and change it
Check for empty cells, and if so, replace it
Check a cell value is consistent with the contents of another cell(for example, if called Datasheet, the code in another cell = DS)and if not, change it.
I've looked at openpxyl but I am running Python 3 and I can only seem to find it working for 2.
I've seen a few others but they seem to be mainly focusing creating a new spreadsheet and simple writing/reading.
The Pandas library is amazing to work with excel files. It can read excel files easily and you then have access to a lot of tools. You can do all the operations you mentionned above. You can also save your result in the excel format
Related
I plan to make an educational web game. I have thousands of trivia questions I need to write down in a way that can be easily transferred out and automatically organized based on their column, at a later date.
I was suggested to use google sheets so I can later export as a .csv, and that should be easy to work with for a developer. When i exported a .csv and opened it in Panda python the a column was cut off and 1 column was used as a 'header', not just a normal entry https://imgur.com/a/olcpVO8. This obviously wont work and seems to be an issue.
Should I just leave the first row and column empty and work around the issue? I don't want to write thousands of sets only to find out I did this the wrong way. Can anyone give any insight into whether this is my best option and how I should best format it?
I have to write Questions(1), Answers(4), Explanations(1) per entry
I hope this makes sense, thanks for your time.
I tried doing this and have no issue at all using the exported CSV from Google Sheets, using the same data as in your example.
In my opinion, whatever software you're using in your second screenshot is your issue, it seems like its removing numbers from the first row because that should be your header row. Check around in your software for options like, "First column contains headers" or "Use row 1 as Header" and make sure these aren't being used.
We have a rather complicated Excel based VBA Tool that shall be replaced by a proper Database and Python based application step by step.
There will be time of the transition between were the not yet completely ready Python tool and the already existing VBA solution will coexist.
To allow interoperability the Python tool must be able to export the database values into the Excel VBA Tool keeping it intact. Meaning that not only all VBA codes have to work as expected but also Shapes, Special Formats etc, Checkboxes etc. have to work after the export.
Currently a simple:
from openpyxl import load_workbook
wb = load_workbook(r'Tool.xlsm', keep_vba=True)
# Write some data i.e. (not required to destroy the file)
wb["SomeSheet!SomeCell"] = "SomeValue"
wb.save(r"Tool_filled.xlsm")
will destroy the file, i.e. shapes won't work, checkboxes neither. (The resulting file is only 5 MB from originally 8 MB, showing that something went quite wrong).
Is there a way to only modify only the data of an ExcelSheet keeping everything else intact/untouched?
As far I know an Excel Sheet are only zipped .xml files. So it should be possible to edit only the related sheets? Correct?
Is there a more comfortable way as writing everything from scratch to only modify the data of an existing Excel file?
Note: The solution has to work in Linux, so simple remote Excel calls are not an option.
I got an excel file that has four sheets. One sheet, sheet 4. contains data in simple CSV and the others read the data of this sheet and make different calculations and graphs. In my python application I would like to open the excel file, open sheet 4, and replace the data. I know you technically can't open and edit excel however you like with Python, due to the complex file structure of XLS (previous relevant answer), but is there a work around for this specific case? Remember the only thing I want to do is to open the data sheet, write to it, and ignore the others...
Note: Previous answers to relevant questions have suggested using the copy function in xlutils. But that doesn't work in this case, as the rest of the sheets are rather complex. The graphs, for example, can't be preserved with the copy function.
I used to use pyExcelerator. It did certainly a good job, but I'm not sure if it is maintained.
https://pypi.python.org/pypi/pyExcelerator/
hth.
Two-headed question here guys,
First, I've been trying to do some searching for a way to read .xlsx files in python. Does xlrd read .xlsx files now? If not, what's the recommended way to read/write to such a file?
Second, I have two files with similar information. One primary field with scoping subfields (like coordinates(the primary field) -> city -> state -> country). In the older file, the information is given an ID number while the newer file (with records deleted/added) does not have these ID's. In python, I'd 1) open the two files 2) check the primary field of the older file against the primary field of the newer file and merge their information to a new file if they match. Given that its not too big of a file, I don't mind the O(n^2) complexity. My question is this: is there a well-defined way to do this in VBA or excel? Everything I think of using excel's library seems too slow and I'm not excellent with VBA.
I frequently access excel files through python and xlrd, python and the Excel COM object. For this job, xlrd won't work because it does not support the xlsx format. But no matter, both approaches are overkill for what you are looking for. Simple Excel formulas will deliver what you want, specifically VLOOKUP.
VLOOKUP "looks for a value in the lefmost column of a table, and then returns a value in the same row from the column you specify".
Some advice on VLOOKUP, First, if you want to match on multiple cells, create a "key" cell which concatenates the cells you are interested in (in both workbooks). Second, make sure to set the last argument to VLOOKUP as FALSE because you will only want exact matches.
Regarding performance, excel formulas are often very fast.
Read the help file on VLOOKUP and ask further questions here.
Late edit (from Mark Baker's answer): There is now a python solution for xlsx. Openpyxl was created this year by Eric Gazoni to read and write Excel's xlsx format.
I only heard about this project this morning, so I've not had an opportunity to look at it, and have no idea what it's like; but take a look at Eric' Gazoni's openpyxl project. The code can be found on bitbucket. The driving force behind this was the ability to read/write xlsx files from Python.
Try http://www.python-excel.org/
My mistake - I missed the .xlsx detail.
I guess it's a question of what's easier: finding or writing a library that handles .xlsx format natively OR save all the Excel spreadsheets as .xls and get on with it with the libraries that merely handle the older format.
Adding on the answer of Steven Rubalski:
You might want to be able to have your lookup value in any other than the leftmost column. In those cases the Index and Match functions come in handy.
See: http://www.mrexcel.com/articles/excel-vlookup-index-match.php
Python communicating with EXCEL... i need to find a way so that I can find/search a row for given column datas. Now, i m scanning entire rows one by one... It would be useful, If there is some functions like FIND/SEARCH/REPLACE .... I dont see these features in pyExcelerator or xlrd modules.. I dont want to use win32com modules! it makes my tool windows based!
FIND/SEARCH Excel rows through Python.... Any idea, anybody?
#John Fouhy: [I'm the maintainer of xlwt, and author of xlrd]
The spreadsheet-reading part of pyExcelerator was so severely deprecated that it vanished completely out of xlwt. To read any XLS files created by Excel 2.0 up to 11.0 (Excel 2003) or compatible software, using Python 2.1+, use xlrd
That "simple optimi[sz]ation" isn't needed with xlrd:
import xlrd
book = xlrd.open_workbook("foo.xls")
sheet = book.sheet_by_number(0) # alternatively: sheet_by_name("Budget")
for row_index in xrange(sheet.nrows):
for col_index in xrange(sheet.ncols):
"Now, i m scanning entire rows one by one"
What's wrong with that? "search" -- in a spreadsheet context -- is really complicated. Search values? Search formulas? Search down rows then across columns? Search specific columns only? Search specific rows only?
A spreadsheet isn't simple text -- simple text processing design patterns don't apply.
Spreadsheet search is hard and you're doing it correctly. There's nothing better because it's hard.
You can't. Those tools don't offer search capabilities. You must iterate over the data in a loop and search yourself. Sorry.
With pyExcelerator you can do a simple optimization by finding the maximum row and column indices first (and storing them), so that you iterate over (row, i) for i in range(maxcol+1) instead of iterating over all the dictionary keys. That may be the best you get, unless you want to go through and build up a dictionary mapping value to set of keys.
Incidentally, if you're using pyExcelerator to write spreadsheets, be aware that it has some bugs. I've encountered one involving writing integers between 230 and 232 (or thereabouts). The original author is apparently hard to contact these days, so xlwt is a fork that fixes the (known) bugs. For writing spreadsheets, it's a drop-in replacement for pyExcelerator; you could do import xlwt as pyExcelerator and change nothing else. It doesn't read spreadsheets, though.