What I need to know is, can I get Python to read a spreadsheet (preferably Microsoft Excel), then parse the information and input it into an equation?
It's for a horse-racing program, where the information for several horses will be in one Excel spreadsheet, in different rows or columns. I need to know if I can run a calculation for each of those horses separately and then calculate a score for the given horse.
My suggestion is:
Save the Excel file as a CSV (comma-separated values) file, which is a plain-text format and much easier to work with.
Use Python's built-in csv module to work with the data in CSV format.
You can work with Excel files directly in Python (the Excel 2003 format is supported via the third-party modules xlwt and xlrd), but this is much harder than working with CSV.
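A minimal sketch of the CSV route, assuming the sheet has been exported with one horse per row. The column names (name, wins, races, avg_speed) and the scoring formula are made up for illustration; substitute your own.

```python
import csv
import io

# Hypothetical sample data standing in for a CSV exported from Excel.
# The column names and the scoring formula are invented for illustration.
SAMPLE_CSV = """name,wins,races,avg_speed
Thunder,5,10,38.5
Comet,2,8,36.0
"""

def score(row):
    """Toy formula: win rate weighted by average speed."""
    wins = int(row["wins"])
    races = int(row["races"])
    avg_speed = float(row["avg_speed"])
    return (wins / races) * avg_speed

def score_horses(csv_file):
    """Return {horse name: score} for every data row in the file."""
    reader = csv.DictReader(csv_file)
    return {row["name"]: score(row) for row in reader}

scores = score_horses(io.StringIO(SAMPLE_CSV))
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

With a real export you would pass an open file instead of the in-memory sample, for example open("horses.csv", newline=""), where horses.csv is a placeholder name.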
OpenPyXL ("A Python library to read/write Excel 2007 xlsx/xlsm files") has a very nice and Pythonic API.
Use the xlrd package. It's on PyPI, so you can just run easy_install xlrd.
You can export the spreadsheet as a .csv and read it in as a text file, then process it. I have a niggling feeling there might even be a CSV-parsing Python library.
AFAIK there isn't a .xls parser, although I might be wrong.
EDIT: I was wrong: http://www.python-excel.org/
Related
I have 20+ Excel files in Japanese. Most are in the Microsoft Excel 2007+ format and a few are in the Microsoft Excel OOXML file type. I would like to convert these files to CSV and load them into Snowflake, but before converting, I was wondering whether there is any library or pre-built function in Python that I can use to determine which delimiter and escape character would work best for a particular file. Please also note that a few of the Excel files contain multiple sheets.
Thanks in advance for your time and efforts!
I don't really know what you mean by "right delimiter". If you want to detect which one is used, there is a library called detect_delimiter. If you want to choose a new delimiter yourself, the best approach is probably to pick one that is unlikely to appear inside the data (% for example), so the data is not split the wrong way. You can always load the data into a pandas DataFrame and then write it back out as CSV once you have explored which option works best in your case.
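As an alternative to detect_delimiter, the standard library's csv.Sniffer can guess the delimiter of existing text. This is a different technique from the one named above, shown only as a sketch; the sample text and the candidate list are assumptions. As far as I know, the Sniffer reports a delimiter and a quote character but does not try to infer an escape character.

```python
import csv

def detect_dialect(sample_text, candidates=",;\t|"):
    """Guess the delimiter and quote character from a sample of text."""
    dialect = csv.Sniffer().sniff(sample_text, delimiters=candidates)
    return dialect.delimiter, dialect.quotechar

# Hypothetical sample; in practice, read the first few KB of the file.
sample = "id;name;city\n1;Sato;Tokyo\n2;Suzuki;Osaka\n3;Tanaka;Kyoto\n"
delimiter, quotechar = detect_dialect(sample)
print(repr(delimiter), repr(quotechar))
```

The Sniffer raises csv.Error when it cannot decide, so real code should be ready to fall back to a delimiter you choose yourself.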
Morning,
I have dynamic data which is updated either daily, weekly or monthly in Excel (this is the only API link). However, for use in Python, is it better to keep the data stored in Excel or to transfer it to SQLite and access it from there?
Or is there a more efficient way of managing this process?
thanks
It depends on what you really need (see below regarding formulae). The KISS (keep it simple, stupid) way is often the right one.
Some Python libraries, such as xlrd (reading) and xlwt (writing), can handle Excel files:
http://www.python-excel.org/
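But xlwt and xlrd can't evaluate formulae. If you need to work with formulae, try openpyxl (http://openpyxl.readthedocs.org/en/2.5/). Note that openpyxl reads and writes formulae, and can read the values Excel last cached for them, but it does not calculate them itself.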
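If you go the SQLite route raised in the question, the standard library's sqlite3 module is enough. A rough sketch, assuming the spreadsheet rows have already been read into tuples; the table name, columns, and sample values are hypothetical.

```python
import sqlite3

# Hypothetical rows, as they might come out of a spreadsheet reader.
rows = [
    ("2024-01-01", "widgets", 12.5),
    ("2024-01-02", "widgets", 14.0),
    ("2024-01-02", "gadgets", 7.25),
]

conn = sqlite3.connect(":memory:")  # use a file path for persistent storage
conn.execute(
    "CREATE TABLE IF NOT EXISTS readings ("
    " day TEXT, item TEXT, value REAL,"
    " PRIMARY KEY (day, item))"
)
# INSERT OR REPLACE lets a periodic refresh overwrite rows it has seen before.
conn.executemany("INSERT OR REPLACE INTO readings VALUES (?, ?, ?)", rows)
conn.commit()

# Query the stored data: total value per item.
totals = dict(conn.execute(
    "SELECT item, SUM(value) FROM readings GROUP BY item ORDER BY item"
))
print(totals)
```

The primary key on (day, item) is a design choice for this sketch: it makes re-importing the same period idempotent instead of duplicating rows.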
My problem is rather simple: I have an Excel sheet that does calculations and creates a graph based on the values of two cells in the sheet. I also have two lists of inputs in text files. I would like to loop through those text files, add the values to the Excel sheet, refresh the sheet, and print the resulting graph to a PDF file or an Excel file named something like 'input1 - input2.xlsx'.
My programming knowledge is limited. I am decent with Python and have looked into Python libraries that work with Excel, such as openpyxl, but most of those don't seem to work for me for various reasons: openpyxl deletes the graphs when opening an Excel file; XlsxWriter can only write files, not read from them; and xlwings won't work for me.
Should I use Python, which I'm familiar with, or would VBA work for this kind of problem? Have any of you ever done something of the sort?
Thanks in advance
As a more transitional approach to what m. wasowski wrote above, I'd suggest you do the following.
Install the pandas package, and see how easy it is to load a file using read_excel. Then, read 10 Minutes to Pandas, and manipulate the data.
You state that the Excel sheet is complex. In general, the more complex it is, the more this approach will eventually simplify it. But you don't have to switch everything immediately. You can still do parts in Excel and parts in pandas.
I think you should consider win32com for Excel operations in Python instead of openpyxl or XlsxWriter.
You can read and write Excel files, create charts, and format workbooks using win32com without any limitation.
For creating the chart you could also consider matplotlib; after creating the chart, you can save it to a PDF file as well.
I would like to try and make a program which does the following, preferably in Python, but if it needs to be C# or other, that's OK.
1. Writes data to an excel spreadsheet
2. Makes Excel recalculate formulas etc. in the modified spreadsheet
3. Extracts the results back out
I have used things like openpyxl before, but this obviously can't do step 2. Is there a way of recalculating the spreadsheet without opening it in Excel? Any thoughts greatly appreciated.
Thanks,
Jack
You need some sort of UI automation with which you can control a UI application such as Excel. Excel probably exposes some COM interface that you should be able to use for what you need. Python has the PyWin32 library which you should install, after which you'll have the win32com module available.
See also:
Excel Python API
Automation Excel from Python
If you don't necessarily have to work with Excel specifically and just need to do spreadsheet work using Python, you might want to look at http://manns.github.io/pyspread/.
You could use pandas to read in the data, use Python to recalculate, and then write the new files.
For pandas it's something like:
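import pandas as pd
# Open the Excel workbook
xls = pd.ExcelFile('path_to_file' + '/' + 'file.xlsx')
# Parse one sheet into a DataFrame, indexed by property_id, treating 'NA' as missing
df = xls.parse('nyc-condominium-dataset', index_col='property_id', na_values=['NA'])
So it's not difficult. Here is the link to pandas.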
Have fun!
I need to combine several tab-separated value (TSV) files into an Excel 2007 (XLSX) spreadsheet, preferably using Python. There is not much cleverness needed in combining them - just copying each TSV file onto a separate sheet in Excel will do. Of course, the data needs to be split into columns and rows same as Excel does when I manually copy-paste the data into the UI.
I've had a look at the raw XML file Excel 2007 generates and it's huge and complex, so writing that from scratch doesn't seem realistic. Are there any libraries available for this?
Looks like xlwt may serve your needs -- you can read each TSV file with Python's standard library csv module (which DOES do tab-separated as well as comma-separated etc, don't worry!-) and use xlwt (maybe via this cheatsheet;-) to create an XLS file, make sheets in it, build each sheet from the data you read via csv, etc. Not sure about XLSX vs plain XLS support but maybe the XLS might be enough...?
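The reading half of that suggestion can be sketched with the standard library alone. The in-memory files and sheet names below are hypothetical stand-ins for real TSV files; writing the rows into a workbook is left to whichever Excel library you choose.

```python
import csv
import io

def read_tsv(tsv_file):
    """Return the rows of a tab-separated file as lists of strings."""
    return list(csv.reader(tsv_file, delimiter="\t"))

# Hypothetical in-memory files standing in for real TSV files on disk.
sources = {
    "sheet_a": io.StringIO("x\ty\n1\t2\n"),
    "sheet_b": io.StringIO("name\tnote\nfoo\thas, a comma\n"),
}

# One entry per future worksheet: sheet name -> list of rows.
sheets = {name: read_tsv(f) for name, f in sources.items()}
for name, rows in sheets.items():
    print(name, rows)
```

Because the delimiter is a tab, commas inside a field stay in that field, which matches what Excel does when you paste tab-separated text.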
The best python module for directly creating Excel files is xlwt, but it doesn't support XLSX.
As I see it, your options are:
1. If you only have "several", you could just do it by hand.
2. Use pythonwin to control Excel through COM. This requires you to run the code on a Windows machine with Excel 2007 installed.
3. Use python to do some preprocessing on the TSV to produce a format that will make step (1) easier. I'm not sure if Excel reads TSV, but it will certainly read CSV files directly.
Note that Excel 2007 will quite happily read "legacy" XLS files (those written by Excel 97-2003 and by xlwt). You need XLSX files because .....?
If you want to go with the defaults that Excel will choose when deciding whether each piece of your data is a number, a date, or some text, use pythonwin to drive Excel 2007. If the data is in a fixed layout such that other than a possible heading row, each column contains data that is all of one known type, consider using xlwt.
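For the fixed-layout case, the type handling can be done in plain Python before any cell is written. A small sketch, assuming one known type per column and an optional heading row; the column layout and sample row are invented for illustration.

```python
from datetime import date

# Hypothetical fixed layout: one converter per column, in column order.
CONVERTERS = (str, int, float, date.fromisoformat)

def convert_rows(rows, converters, has_heading=True):
    """Convert string cells to typed values, leaving any heading row as text."""
    start = 1 if has_heading else 0
    typed = [list(row) for row in rows[:start]]
    for row in rows[start:]:
        typed.append([convert(cell) for convert, cell in zip(converters, row)])
    return typed

rows = [
    ["name", "count", "price", "day"],
    ["bolt", "3", "0.25", "2009-06-01"],
]
typed = convert_rows(rows, CONVERTERS)
print(typed[1])
```

Explicit converters avoid guessing: a value that does not fit its column's type raises ValueError instead of silently ending up as text.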
You may wish to approach xlwt via http://www.python-excel.org which contains an up-to-date tutorial for xlrd, xlwt, and xlutils.