Excel Big Data Calculation (PCA...) - python

For my internship, I have to do some calculations on data stored in Excel.
I am supposed to aggregate market data (50 assets over 15 years) and run a Principal Component Analysis (PCA) on the aggregated data.
For the moment, I keep the market data in a worksheet and save it as tab-separated text (like CSV, but with tabs instead of commas). Then I read it with R and use a powerful package to do the PCA. Finally, R writes another tab-separated text file, which I read back into Excel. I now have the data and the results in Excel and can plot everything I want.
The problem is that the process is not automated enough for my colleagues.
They want a button in Excel that launches the PCA when clicked.
I've tried to install an Excel add-in (RExcel) that lets you use R functions directly in Excel. It doesn't work (a server problem) and is poorly documented. So I'm looking for other ways to do heavy calculations directly in Excel. There seem to be similar packages for using Python in Excel, and I've also heard about other powerful languages that work with Excel. The problem is that I can't install what I want on my computer (yes, I have to call someone from IT for every package I want to install...), so the R attempt has already taken me 2-3 days. This is also why I'm looking for a simple solution: my colleagues won't have 2-3 days to install an Excel package just to use my macro...
So I'm asking: what would be the easiest way to do a PCA directly in Excel, using tools from other languages?
Many thanks in advance

You can use the very handy executable Rscript to launch your R scripts automatically.
Within VBA, create a macro containing something like this:
retVal = Shell(MY_RSCRIPT_BAT, vbNormalFocus) ' VBA code here
I assume you know how to call a VBA macro from a button.
MY_RSCRIPT_BAT is the path to a .bat file containing something like the following, where R_PATH is the folder that holds Rscript.exe and DEMO_PATH is the folder that holds your script:
@echo off
C:
PATH R_PATH;%path%
cd DEMO_PATH
Rscript your_pca_script.R
exit
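If installing R packages is the bottleneck, the script launched by the .bat file does not have to be R. As a minimal sketch, assuming Python and NumPy are available, the PCA step of the tab-separated round trip could look like this. The file layout (one header row of asset names, one column per asset) and the function names are assumptions, not part of the original setup. Each column is standardised, so this is a PCA on the correlation matrix, computed with an SVD.

```python
import io

import numpy as np


def pca_from_tsv(text, n_components=2):
    """PCA on tab-separated text: header row = asset names, other rows = observations."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    names = lines[0].split("\t")
    data = np.array([[float(v) for v in line.split("\t")] for line in lines[1:]])
    # Standardise each column (assumes no constant column, which would divide by zero)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # SVD of the standardised data: rows of vt are the principal axes
    # (eigenvectors of the correlation matrix)
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    variance = s**2 / (len(data) - 1)
    ratio = variance / variance.sum()  # share of total variance per component
    return names, ratio[:n_components], vt[:n_components]


def results_to_tsv(names, ratio, loadings):
    """Format the results as tab-separated text that Excel can open."""
    out = io.StringIO()
    out.write("component\texplained_ratio\t" + "\t".join(names) + "\n")
    for i, (r, row) in enumerate(zip(ratio, loadings), start=1):
        out.write(f"PC{i}\t{r:.6f}\t" + "\t".join(f"{x:.6f}" for x in row) + "\n")
    return out.getvalue()
```

Saved as a Python script, it could be launched from the same kind of .bat file with a line such as python your_pca_script.py in place of the Rscript line (with Python's folder on PATH instead of R_PATH).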

Related

Automating Excel reporting and graphs - Python XlsxWriter/xlwings or Ruby axlsx/win32ole

I want to create a program that automates Excel reporting, including various coloured graphs. The program needs to be able to read an Excel dataset. Based on this dataset, it then has to create report pages and graphs and export them to an Excel file as well as a PDF file.
I have done some research, and it seems this is possible using Python with pandas and XlsxWriter or xlwings, as well as with the Ruby libraries axlsx or win32ole.
Which is the more user-friendly and easier-to-learn alternative? What are the advantages and disadvantages? Are there other options I should consider (I would like to avoid VBA, as this is how the reports are currently produced)?
Any responses and comments are appreciated. Thank you!
If you already have VBA that works for your project, then translating it to Ruby + WIN32OLE is probably your quickest path to working code. Anything you can do in VBA is doable in Ruby (if you find something you can't do, post here to ask for help).
I prefer working with Excel via OLE since I know the file produced by Excel will work anywhere I open it. I haven't used axlsx but I'm sure it's a fine project; I just wouldn't trust that it would produce working Excel files every time.

Modifying, recalculating and extracting results from Excel in Python

I would like to try and make a program which does the following, preferably in Python, but if it needs to be C# or other, that's OK.
Writes data to an excel spreadsheet
Makes Excel recalculate formulas etc. in the modified spreadsheet
Extracts the results back out
I have used things like openpyxl before, but it obviously can't do step 2. Is there a way of recalculating the spreadsheet without opening it in Excel? Any thoughts greatly appreciated.
Thanks,
Jack
You need a way to automate Excel itself. Excel exposes a COM automation interface that you can use for what you need. For Python, install the PyWin32 library, which provides the win32com module.
See also:
Excel Python API
Automation Excel from Python
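A minimal sketch of the three steps through COM, assuming Windows with Excel installed and pywin32; the function name, sheet name and cell addresses are hypothetical. DispatchEx starts a separate Excel instance, so quitting it at the end does not close workbooks the user already has open.

```python
import os


def recalc_with_excel(path, sheet, inputs, output_cells):
    """Write inputs, let Excel recalculate, read results (Windows + Excel + pywin32)."""
    import win32com.client  # from pywin32; imported here so this module loads anywhere

    excel = win32com.client.DispatchEx("Excel.Application")  # new, separate instance
    excel.Visible = False
    excel.DisplayAlerts = False
    wb = excel.Workbooks.Open(os.path.abspath(path))  # Excel expects an absolute path
    try:
        ws = wb.Worksheets(sheet)
        for address, value in inputs.items():
            ws.Range(address).Value = value  # 1. write data
        excel.CalculateFull()  # 2. force a full recalculation
        return {a: ws.Range(a).Value for a in output_cells}  # 3. read results
    finally:
        wb.Close(False)  # close without saving the modified inputs
        excel.Quit()


# Hypothetical usage:
# recalc_with_excel("model.xlsx", "Sheet1", {"A1": 5, "A2": 7}, ["B1"])
```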
If you don't necessarily have to work with Excel and just need a spreadsheet you can drive from Python, you might want to look at http://manns.github.io/pyspread/.
You could use pandas to read in the data, use Python to recalculate and then write the new files.
For pandas it's something like:
import os
import pandas as pd
# Import the Excel file and parse one sheet into a DataFrame
xls = pd.ExcelFile(os.path.join('path_to_file', 'file.xlsx'))
df = xls.parse('nyc-condominium-dataset', index_col='property_id', na_values=['NA'])
So it's not difficult. Here is the link to pandas.
Have fun!
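To sketch the "recalculate in Python" step, assuming pandas: a spreadsheet formula such as =B2*C2 filled down a column becomes a single vectorised column operation. The column names and file names are hypothetical, and reading or writing .xlsx files needs an engine such as openpyxl installed.

```python
import pandas as pd


def recalculate(df):
    """Recompute a derived column, like an Excel formula =B2*C2 filled down."""
    out = df.copy()  # leave the data read from the file untouched
    out["total"] = out["price"] * out["quantity"]
    return out


# Hypothetical usage:
# df = pd.read_excel("file.xlsx", sheet_name="Sheet1")
# recalculate(df).to_excel("file_recalculated.xlsx", index=False)
```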

Is it possible create/edit an excel macro from python?

I'm currently working on a project that requires me to write a (seemingly endless) series of macros to reproduce pivot tables designed by one of our Analysts. The details change, but the code is largely the same from example to example.
I would like to programmatically generate the VBA code based on a handful of options and then add the macro to the given worksheet. Is it possible to create an Excel macro with Python using win32com or something else? Is it possible to create an Excel macro from another Excel macro? Any other ideas?
Project background: I have a Python script that does the following:
pulls Google Analytics data
does various analysis
writes the results to excel
triggers a pre-written macro to turn the data into a beautifully formatted pivot table
emails it off to people who probably don't actually read it
Yes you can create an Excel macro with another Excel macro. For that to work, you need to tell Excel to Trust Access to the VBA Project Object model. (The setting is found in Macro Options in the Trust Center.) I don't know if you can do it from Python, but if you can, you probably also need to tell Excel it is ok.
For ease of coding, if you are doing this in Excel, add a reference to Microsoft Visual Basic for Applications Extensibility.
You might also want to check out MZ-Tools 3.0; I find it very helpful for adding default/common code to a project.
On the other hand, your project sounds ripe for code reuse. If the common pivot table code is in one class/module, it is really easy to copy it from one open Excel project to another. (Just click and drag in the Project Explorer window.) You can also export it out to a text file and import it back in to another project later.
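On the Python side of the question, a minimal sketch, assuming pywin32 and that "Trust access to the VBA project object model" is enabled, is to fill a VBA template from a few options and inject it through the VBProject object model. The template, option names and helper functions are hypothetical; the generated macro only sets up one row field and one summed data field, and the workbook must be saved as .xlsm to keep the macro.

```python
PIVOT_TEMPLATE = """Sub {macro_name}()
    ' Generated by Python: pivot table over {source}
    Dim pc As PivotCache
    Dim pt As PivotTable
    Set pc = ActiveWorkbook.PivotCaches.Create(SourceType:=xlDatabase, SourceData:="{source}")
    Set pt = pc.CreatePivotTable(TableDestination:="{destination}", TableName:="{macro_name}")
    pt.PivotFields("{row_field}").Orientation = xlRowField
    pt.AddDataField pt.PivotFields("{value_field}"), "Sum of {value_field}", xlSum
End Sub
"""


def make_pivot_macro(macro_name, source, destination, row_field, value_field):
    """Fill the VBA template with one set of options."""
    return PIVOT_TEMPLATE.format(macro_name=macro_name, source=source,
                                 destination=destination, row_field=row_field,
                                 value_field=value_field)


def add_macro(workbook, vba_code):
    """Add a standard module containing vba_code to a workbook opened via win32com."""
    module = workbook.VBProject.VBComponents.Add(1)  # 1 = vbext_ct_StdModule
    module.CodeModule.AddFromString(vba_code)
    return module.Name


# Hypothetical usage with an Excel instance from win32com.client.Dispatch:
# wb = excel.Workbooks.Open(r"C:\reports\report.xlsm")
# add_macro(wb, make_pivot_macro("PivotByRegion", "Data!R1C1:R500C4",
#                                "Report!R3C1", "Region", "Sessions"))
# excel.Run("PivotByRegion")
```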

Excel Python API

Does anyone know of a way of accessing MS Excel from Python? Specifically I am looking to create new sheets and fill them with data, including formulae.
Preferably I would like to do this on Linux if possible, but I can do it in a VM if there is no other way.
xlwt and xlrd can read and write Excel files, without using Excel itself:
http://www.python-excel.org/
This is long after the original question, but the latest answer pushed it to the top of the feed again. Others might benefit from my experience using Python and Excel.
I use Excel and Python quite a bit. Instead of using the xlrd and xlwt modules directly, I normally use pandas. I think pandas uses these modules under the hood, but I find it much easier to create and read spreadsheets through the framework pandas provides. The pandas DataFrame structure is very "spreadsheet-like" and makes life a lot easier, in my opinion.
The other option that I use (not a direct answer to your problem) is DataNitro. It allows you to use Python directly within Excel. It's a different use case: you would use it where you would normally have to write VBA code in Excel.
There is a Python library, openpyxl, to read/write Excel 2007 xlsx/xlsm files: http://pythonhosted.org/openpyxl/
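For the original question (new sheets, data and formulae, on Linux), a minimal sketch with openpyxl; the sheet names, columns and function name are hypothetical. openpyxl stores formulae as text and does not evaluate them: Excel (or LibreOffice) computes them when the file is opened.

```python
from openpyxl import Workbook


def build_workbook(path, rows):
    """Create a workbook with a data sheet containing formulas and a summary sheet."""
    wb = Workbook()
    summary = wb.active  # the default sheet
    summary.title = "Summary"
    data = wb.create_sheet("Data")
    data.append(["item", "price", "quantity", "total"])
    for r, (item, price, qty) in enumerate(rows, start=2):  # data starts on row 2
        data.append([item, price, qty, f"=B{r}*C{r}"])  # formula stored as text
    summary["A1"] = "Grand total"
    summary["B1"] = f"=SUM(Data!D2:D{len(rows) + 1})"
    wb.save(path)
```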
I wrote a Python class that allows working with Excel via its COM interface on Windows: http://sourceforge.net/projects/excelcomforpython/
The class uses win32com to interact with Excel. You can use the class directly or use it as an example. Many options are implemented, such as array formulas, conditional formatting, charts, etc.
It's surely possible through the Excel object model via COM: just use the win32com modules for Python. I can't remember more, but I once controlled Media Player through COM from Python, and it was a piece of cake.
It's actually very simple: you can launch almost any program from another program, as long as that program gives you a way to reach the command prompt. In the case of Excel, create a user-defined function by pressing Alt+F11 to open the VBA editor, inserting a module and pasting the following code.
Function call_cmd()
Shell "CMD /C Notepad", vbNormalFocus
End Function
Now press Ctrl+S, go back to Excel, select a cell and enter the function =call_cmd(). Here I ran Notepad. In the same way, you can find where python.exe is installed and run it. If you want to pass inputs to Python, save the cells as a CSV file in a local directory and read that file from your Python script (for example with the csv module).
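On the Python side, a minimal sketch of the CSV hand-off, assuming Excel has exported numeric cells to a CSV file and the macro calls something like Shell "CMD /C python C:\path\script.py in.csv out.csv"; the script name, file names and the row-sum calculation are hypothetical. Shell returns immediately without waiting, so the VBA code has to wait until the output file exists before reading it back.

```python
import csv
import sys


def process(in_path, out_path):
    """Read numeric rows from a CSV exported by Excel, append each row's sum, write a new CSV."""
    # utf-8-sig also accepts the byte-order mark that "CSV UTF-8" exports may start with
    with open(in_path, newline="", encoding="utf-8-sig") as f:
        rows = [[float(x) for x in row] for row in csv.reader(f) if row]
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row + [sum(row)])
    return len(rows)


if __name__ == "__main__" and len(sys.argv) == 3:
    process(sys.argv[1], sys.argv[2])
```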

Importing Excel sheets, including formulae, into Django

I have an Excel spreadsheet with calculations I would like to use in a Django web application. I do not need to present the spreadsheet as it appears in Excel. I only want to use the formulae embedded in it. What is the best way to do this?
You can control Excel with Python via COM. See this thread: Driving Excel from Python in Windows
It might be a challenge to get this to work reliably as part of a Django app.
In addition to the COM solution, xlrd is cross-platform. That might be more suitable, since I believe Linux is still the most common deployment environment for django. It's also a lighter-weight solution than pyUno.
I think the only thing you can do is use some python/excel mechanism (the only one I could find was this: http://www.python-excel.org/; the tutorial makes me think it might be doable) to read and write from an excel spreadsheet.
You would write to certain cells that would be used by the spreadsheet formulas and then read the results from the formulas from other cells.
Django per-se has nothing to help you with this.
I'll retag your question to include python so that, maybe, someone with Python-excel experience can comment...
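If the aim is to port the formulae into the Django app rather than have Excel evaluate them, a minimal sketch assuming openpyxl (which reads .xlsx files) lists every formula cell so each one can be reimplemented in Python; the function name is hypothetical. With the default data_only=False, openpyxl returns the formula text; with data_only=True it returns the values Excel last cached instead.

```python
from openpyxl import load_workbook


def extract_formulas(path):
    """Return {"Sheet!A1": "=formula"} for every plain formula cell in an .xlsx file."""
    wb = load_workbook(path)  # data_only=False (default): keep formulas, not cached values
    formulas = {}
    for ws in wb.worksheets:
        for row in ws.iter_rows():
            for cell in row:
                # Plain formulas come back as strings starting with "="
                if isinstance(cell.value, str) and cell.value.startswith("="):
                    formulas[f"{ws.title}!{cell.coordinate}"] = cell.value
    return formulas
```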
Do you need Excel to calculate the results? Maybe you could open the spreadsheet in OpenOffice and use a PyUNO macro, which is somewhat "native" Python.
A different approach would be to write a macro that translates the formulas into more Python-friendly code: if you want Excel itself to perform the calculations, you can easily end up with a very slow process.
