Merging .xlsx file with Python - python

Two-part question here, guys.
First, I've been searching for a way to read .xlsx files in Python. Does xlrd read .xlsx files now? If not, what's the recommended way to read from and write to such a file?
Second, I have two files with similar information: one primary field with scoping subfields (for example, coordinates (the primary field) -> city -> state -> country). In the older file, each record is given an ID number, while the newer file (with records deleted/added) does not have these IDs. In Python, I'd 1) open the two files and 2) check the primary field of the older file against the primary field of the newer file, merging their information into a new file if they match. Given that it's not too big a file, I don't mind the O(n^2) complexity. My question is this: is there a well-defined way to do this in VBA or Excel? Everything I think of using Excel's library seems too slow, and I'm not excellent with VBA.

I frequently access Excel files through Python and xlrd, or through Python and the Excel COM object. For this job, xlrd won't work because it does not support the xlsx format. But no matter, both approaches are overkill for what you are looking for. Simple Excel formulas will deliver what you want, specifically VLOOKUP.
VLOOKUP "looks for a value in the leftmost column of a table, and then returns a value in the same row from the column you specify".
Some advice on VLOOKUP: First, if you want to match on multiple cells, create a "key" cell which concatenates the cells you are interested in (in both workbooks). Second, make sure to set the last argument to VLOOKUP as FALSE because you will only want exact matches.
Regarding performance, Excel formulas are often very fast.
Read the help file on VLOOKUP and ask further questions here.
Late edit (from Mark Baker's answer): There is now a python solution for xlsx. Openpyxl was created this year by Eric Gazoni to read and write Excel's xlsx format.
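If you do go the Python route, the matching step described in the question could be sketched roughly as below. This is only an illustration under assumptions: the helper names, the file names in the usage note, and the column layout (primary field in the first column, ID in the last column of the older file) are all made up here. A dictionary keyed on the primary field does the same job as an exact-match VLOOKUP and avoids the O(n^2) scan.

```python
def merge_ids(old_rows, new_rows, key_col=0, id_col=-1):
    """Attach the ID from old_rows to every row in new_rows with the same key.

    Rows are plain sequences of cell values. key_col and id_col are
    hypothetical positions; adjust them to the real sheet layout.
    """
    # Map primary field -> ID, built once from the older file.
    ids_by_key = {row[key_col]: row[id_col] for row in old_rows}
    merged = []
    for row in new_rows:
        key = row[key_col]
        if key in ids_by_key:  # exact match only, like VLOOKUP(..., FALSE)
            merged.append(list(row) + [ids_by_key[key]])
    return merged


def read_rows(path):
    """Read all rows of the first worksheet of an .xlsx file with openpyxl."""
    from openpyxl import load_workbook  # imported lazily; requires openpyxl
    sheet = load_workbook(path, read_only=True).worksheets[0]
    return [list(row) for row in sheet.iter_rows(values_only=True)]


def write_rows(path, rows):
    """Write rows to a new .xlsx file with openpyxl."""
    from openpyxl import Workbook
    book = Workbook()
    for row in rows:
        book.active.append(row)
    book.save(path)
```

With files named, say, old.xlsx and new.xlsx (hypothetical names), the whole job would be write_rows("merged.xlsx", merge_ids(read_rows("old.xlsx"), read_rows("new.xlsx"))). To match on several cells, build the key by concatenating them first, just as with the "key" cell suggested for VLOOKUP.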

I only heard about this project this morning, so I've not had an opportunity to look at it and have no idea what it's like, but take a look at Eric Gazoni's openpyxl project. The code can be found on bitbucket. The driving force behind this was the ability to read/write xlsx files from Python.

Try http://www.python-excel.org/
My mistake - I missed the .xlsx detail.
I guess it's a question of what's easier: finding or writing a library that handles the .xlsx format natively, OR saving all the Excel spreadsheets as .xls and getting on with it using the libraries that only handle the older format.

Adding to Steven Rubalski's answer:
You might want your lookup value to be in a column other than the leftmost one. In those cases the INDEX and MATCH functions come in handy.
See: http://www.mrexcel.com/articles/excel-vlookup-index-match.php
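For readers more comfortable with code than with formulas, the idea behind INDEX/MATCH can be mimicked in a few lines of Python. This is an illustrative sketch only, not how Excel implements it; the function name and the sample table are made up.

```python
def index_match(table, lookup_value, match_col, return_col):
    """Return the value in return_col from the first row whose match_col equals lookup_value.

    Mirrors =INDEX(return_range, MATCH(lookup_value, match_range, 0)):
    MATCH finds the row position, INDEX reads another column at that position.
    Returns None when nothing matches (Excel would show #N/A).
    """
    for row in table:
        if row[match_col] == lookup_value:
            return row[return_col]
    return None


# Sample data: the ID sits to the LEFT of the lookup column,
# which is the case VLOOKUP cannot handle.
table = [
    (101, "48.85,2.35", "Paris"),
    (102, "41.90,12.50", "Rome"),
]
found_id = index_match(table, "41.90,12.50", match_col=1, return_col=0)
```

Because the match column and the return column are chosen independently, the returned column can sit on either side of the lookup column.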

Related

Is it possible to use Python XlsxWriter to write a stock data type to a cell?

I have an interest in utilizing the stock quote feature of Excel while also creating the file using the xlsxwriter library in Python. I am familiar with how to write and format text using xlsxwriter, but I do not see any option to create a file with certain cells already set to have a stock data type. To be clear, the link from Microsoft below basically summarizes the manual process I'm looking to have taken care of in the Excel sheet before an actual user ever opens up the file.
https://support.microsoft.com/en-us/office/get-a-stock-quote-e5af3212-e024-4d4c-bea0-623cf07fbc54
I am open to other python based solutions to this issue if the general consensus is that xlsxwriter doesn't support this feature. I really appreciate any advice here.
I am the author of XlsxWriter. I just looked into this and these aren't regular Excel formulas. They have a lot of additional metadata and richdata helper files associated with them, and even the company names aren't standard string types. So unfortunately these aren't, and probably won't be, supported.

Python 3 and Excel, Finding complex module to use

I've been looking for ages to find a suitable module to interact with excel, which needs to do the following:
Check a column of cells for an "incorrect" value and change it
Check for empty cells and, if any are found, fill them
Check that a cell value is consistent with the contents of another cell (for example, if one cell says Datasheet, the code in another cell should be DS) and, if not, change it.
I've looked at openpyxl, but I am running Python 3 and I can only seem to find it working for Python 2.
I've seen a few others, but they seem to focus mainly on creating a new spreadsheet and simple writing/reading.
The pandas library is great for working with Excel files. It can read Excel files easily, and you then have access to a lot of tools. You can do all the operations you mentioned above. You can also save your result in the Excel format.
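As a rough sketch of what those three checks might look like, assuming made-up column names (Type, Code, Status) and an in-memory table standing in for the result of pd.read_excel:

```python
import pandas as pd

# Stand-in for pd.read_excel("input.xlsx"); the column names are hypothetical.
df = pd.DataFrame({
    "Type": ["Datasheet", "Datasheet", None, "Manual"],
    "Code": ["DS", "XX", "DS", "MN"],
    "Status": ["ok", "n/a", "ok", "ok"],
})

# 1) Replace an "incorrect" value in a column.
df["Status"] = df["Status"].replace("n/a", "unknown")

# 2) Fill empty cells with a default.
df["Type"] = df["Type"].fillna("Datasheet")

# 3) Make Code consistent with Type wherever a mapping is defined.
expected = df["Type"].map({"Datasheet": "DS", "Manual": "MN"})
df["Code"] = expected.fillna(df["Code"])

# df.to_excel("output.xlsx", index=False) would save the result.
```

The mapping in step 3 is an assumption about how the codes relate to the names; rows whose Type has no entry in the mapping keep their existing Code.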

Automatic input from text file in excel

My problem is rather simple: I have an Excel sheet that does calculations and creates a graph based on the values of two cells in the sheet. I also have two lists of inputs in text files. I would like to loop through those text files, add the values to the Excel sheet, refresh the sheet, and print the resulting graph to a PDF file or an Excel file named something like 'input1 - input2.xlsx'.
My programming knowledge is limited. I am decent with Python and have looked into Python libraries that work with Excel, such as openpyxl, but most of those don't seem to work for me for various reasons. Openpyxl deletes the graphs when opening an Excel file; XlsxWriter can only write files, not read from them; and xlwings won't work for me.
Should I use python, which I'm familiar with, or would VBA work for this kind of problem? Have any of you ever done something of the sort?
Thanks in advance
As a more transitional approach to what m. wasowski wrote above, I'd suggest you do the following.
Install the pandas package, and see how easy it is to load a file using read_excel. Then, read 10 Minutes to Pandas, and manipulate the data.
You state that the Excel sheet is complex. In general, the more complex it is, the more this approach will eventually simplify it. But you don't have to switch everything immediately. You can still do parts in Excel and parts in pandas.
I think you should consider win32com for Excel operations in Python instead of openpyxl or XlsxWriter.
Because win32com drives Excel itself, you can read and write files, create charts, and format workbooks without the limitations of those libraries.
For creating the chart you could also consider matplotlib; after creating the chart you can save it to a PDF file as well.
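A minimal sketch of the win32com route, under several assumptions: it runs only on Windows with Excel and the pywin32 package installed, the input cells B1 and B2 are placeholders, the two input lists are paired line by line, and the helper names are invented for this example.

```python
import os


def read_inputs(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def output_name(first, second, extension="pdf"):
    """Build a file name such as 'input1 - input2.pdf'."""
    return "{} - {}.{}".format(first, second, extension)


def export_graphs(workbook_path, inputs_a, inputs_b, out_dir):
    """Feed each pair of inputs to Excel and export the workbook as PDF.

    Windows only; needs Excel and the pywin32 package. The cell addresses
    B1 and B2 are placeholders for the two input cells.
    """
    import win32com.client  # imported lazily so the helpers above work anywhere

    excel = win32com.client.Dispatch("Excel.Application")
    workbook = excel.Workbooks.Open(os.path.abspath(workbook_path))
    try:
        sheet = workbook.Worksheets(1)
        for first, second in zip(inputs_a, inputs_b):
            sheet.Range("B1").Value = first
            sheet.Range("B2").Value = second
            excel.Calculate()  # recalculate so the chart picks up the new inputs
            target = os.path.join(os.path.abspath(out_dir), output_name(first, second))
            workbook.ExportAsFixedFormat(0, target)  # 0 is xlTypePDF
    finally:
        workbook.Close(SaveChanges=False)
        excel.Quit()
```

The workbook is closed without saving, so the original file and its graph stay untouched; only the exported PDFs are written. If every combination of the two lists is wanted rather than line-by-line pairs, itertools.product could replace zip.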

How do I write to one sheet in an already existing excel sheet in Python?

I have an Excel file that has four sheets. One sheet, sheet 4, contains data in simple CSV form, and the others read the data from this sheet and make different calculations and graphs. In my Python application I would like to open the Excel file, open sheet 4, and replace the data. I know you technically can't open and edit Excel files however you like with Python, due to the complex file structure of XLS (previous relevant answer), but is there a workaround for this specific case? Remember, the only thing I want to do is to open the data sheet, write to it, and ignore the others...
Note: Previous answers to relevant questions have suggested using the copy function in xlutils. But that doesn't work in this case, as the rest of the sheets are rather complex. The graphs, for example, can't be preserved with the copy function.
I used to use pyExcelerator. It certainly did a good job, but I'm not sure if it is maintained.
https://pypi.python.org/pypi/pyExcelerator/
hth.

pyExcelerator or xlrd - How to FIND/SEARCH a row for the given few column data?

Python communicating with EXCEL... i need to find a way so that I can find/search a row for given column datas. Now, i m scanning entire rows one by one... It would be useful, If there is some functions like FIND/SEARCH/REPLACE .... I dont see these features in pyExcelerator or xlrd modules.. I dont want to use win32com modules! it makes my tool windows based!
FIND/SEARCH Excel rows through Python.... Any idea, anybody?
@John Fouhy: [I'm the maintainer of xlwt, and author of xlrd]
The spreadsheet-reading part of pyExcelerator was so severely deprecated that it vanished completely out of xlwt. To read any XLS files created by Excel 2.0 up to 11.0 (Excel 2003) or compatible software, using Python 2.1+, use xlrd.
That "simple optimi[sz]ation" isn't needed with xlrd:
import xlrd
book = xlrd.open_workbook("foo.xls")
sheet = book.sheet_by_index(0)  # alternatively: book.sheet_by_name("Budget")
for row_index in range(sheet.nrows):
    for col_index in range(sheet.ncols):
        value = sheet.cell_value(row_index, col_index)  # inspect or compare each cell here
"Now, i m scanning entire rows one by one"
What's wrong with that? "search" -- in a spreadsheet context -- is really complicated. Search values? Search formulas? Search down rows then across columns? Search specific columns only? Search specific rows only?
A spreadsheet isn't simple text -- simple text processing design patterns don't apply.
Spreadsheet search is hard and you're doing it correctly. There's nothing better because it's hard.
You can't. Those tools don't offer search capabilities. You must iterate over the data in a loop and search yourself. Sorry.
With pyExcelerator you can do a simple optimization by finding the maximum row and column indices first (and storing them), so that you iterate over (row, i) for i in range(maxcol+1) instead of iterating over all the dictionary keys. That may be the best you get, unless you want to go through and build up a dictionary mapping value to set of keys.
Incidentally, if you're using pyExcelerator to write spreadsheets, be aware that it has some bugs. I've encountered one involving writing integers between 2**30 and 2**32 (or thereabouts). The original author is apparently hard to contact these days, so xlwt is a fork that fixes the (known) bugs. For writing spreadsheets, it's a drop-in replacement for pyExcelerator; you could do import xlwt as pyExcelerator and change nothing else. It doesn't read spreadsheets, though.
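The "dictionary mapping value to set of keys" idea mentioned above could look something like the following. It is a sketch rather than a tested recipe for either library: it works on plain lists of row values, and the function names and sample data are invented for illustration.

```python
from collections import defaultdict


def build_value_index(rows):
    """Map each non-empty cell value to the set of (row, col) positions holding it."""
    index = defaultdict(set)
    for row_index, row in enumerate(rows):
        for col_index, value in enumerate(row):
            if value not in ("", None):
                index[value].add((row_index, col_index))
    return index


def find_rows(index, wanted):
    """Return the sorted row numbers where column c holds wanted[c] for every c in wanted."""
    row_sets = [
        {r for r, c in index.get(value, ()) if c == col}
        for col, value in wanted.items()
    ]
    return sorted(set.intersection(*row_sets)) if row_sets else []


# With xlrd the rows could come from:
#   rows = [sheet.row_values(i) for i in range(sheet.nrows)]
rows = [
    ["alice", "paris", 30.0],
    ["bob", "rome", 25.0],
    ["alice", "rome", 41.0],
]
index = build_value_index(rows)
matches = find_rows(index, {0: "alice", 1: "rome"})
```

Building the index still scans every cell once, so it only pays off when the same sheet is searched repeatedly.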
