I have spreadsheet data in OpenOffice Calc that I would like to send to a python file.
The spreadsheet contains strings that use some odd fonts, and OpenOffice displays them properly. I can't find a way to get the font information into a .csv export; I would be just as happy to find a python method of extracting data from a spreadsheet file.
Any ideas?
CSV files don't store font metadata. Instead, try converting the spreadsheet to XLS (or another format of your choice).
Then you can use Python libraries that support that format, such as those listed at:
http://www.python-excel.org/
Related
I am relatively new to python and I am trying to read information from an excel sheet to generate a graph. So far I am using the most current version of the xlrd library (0.9.4) in a nested for loop to grab the value from each cell. However, I am unsure how to access the formatting information for each cell
For example, if a cell were formatted to display as currency in the excel file, using the standard sheet.cell(row, column).value from xlrd would only return 5.0 instead of $5.00
I found here that you can set the formatting_info parameter to true when opening the workbook in order to see some of the format information, however I am primarily using excel 2013 and my excel sheets are being saved by default as .xlsx files. According to this issue on GitHub, support for formatting_info has not yet been implemented for .xlsx files.
Is there any way around using the formatting_info flag, or any other way that I can detect when a format, currency specifically, has been used in order to reflect that in my graphs? I am aware that it is possible to convert .xlsx files to .xls files such as shown here, but I am concerned about information/formatting loss.
I have a very large (> 2 million rows) csv file that is being generated and viewed in an internal web service. The problem is that when users of this system want to export this csv to run custom queries, they open these files in excel. Excel is formatting the numbers the best it can, but there are some requests to have the data in xlsx format with filters and whatnot.
The question boils down to: Using python2.7, how can I read a large csv file (>2 million rows) into excel (or multiple excel files) and control the formatting? (dates, numbers, autofilters, etc)
I am open to python and internal excel solutions.
Without more information about the data types in the csv, or your exact issue with EXCEL properly handling those data types, it's hard to give you an exact answer.
However, I recommend looking at this module (https://xlsxwriter.readthedocs.org/), which can be used in Python to create xlsx files. I haven't used it, but it seems to have more features than you need.
That is especially useful if you need to split the data between multiple files or workbooks. It also looks like you can pre-create the filters and have total control over the formatting.
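To illustrate the splitting idea, here is a minimal sketch, assuming Python 3 and that XlsxWriter is installed. A single .xlsx worksheet holds at most 1,048,576 rows, so a CSV with more than 2 million rows has to be spread over several sheets or workbooks. The names chunk_rows and write_chunk and the sample columns are made up for illustration; only the xlsxwriter calls themselves come from the library.

```python
import csv
import io
from itertools import islice

EXCEL_MAX_ROWS = 1048576  # row limit of one .xlsx worksheet


def chunk_rows(reader, rows_per_chunk=EXCEL_MAX_ROWS - 1):
    """Yield (header, rows) pairs; each chunk leaves one row free for the header."""
    header = next(reader, None)
    if header is None:  # empty input: nothing to yield
        return
    while True:
        rows = list(islice(reader, rows_per_chunk))
        if not rows:
            return
        yield header, rows


def write_chunk(path, header, rows):
    """Write one chunk to its own workbook with an autofilter on the header row."""
    import xlsxwriter  # third-party; imported here so chunk_rows works without it

    # constant_memory keeps memory use flat as long as rows are written in order;
    # strings_to_numbers stores numeric-looking CSV strings as numbers.
    workbook = xlsxwriter.Workbook(
        path, {"constant_memory": True, "strings_to_numbers": True}
    )
    sheet = workbook.add_worksheet()
    sheet.write_row(0, 0, header)
    for r, row in enumerate(rows, start=1):
        sheet.write_row(r, 0, row)
    sheet.autofilter(0, 0, len(rows), len(header) - 1)
    workbook.close()


# Small in-memory demo of the chunking step (hypothetical data, no files written).
sample = io.StringIO("id,amount\n1,2.50\n2,3.75\n3,1.00\n")
chunks = list(chunk_rows(csv.reader(sample), rows_per_chunk=2))
```

With a real file you would loop over chunk_rows(csv.reader(f)) and call write_chunk once per chunk with a different output name each time. Date and number display formats could be added with workbook.add_format and sheet.set_column, depending on what the columns actually contain.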
I have created .xls using xlwt module.
Now I want to convert this newly created xls file to PDF. Can anyone please tell me how I can do this through Python scripting? I am using Python 2.6.
There are at least two possibilities here: either you read the .xls back in as cells and generate a PDF, or maybe you want to print the .xls to .pdf from the program that just created the .xls.
In the first case you might want to consider writing the .pdf directly (you have the data to make the .xls in the program at some point). I have a grid class that can be written out in HTML format, or to .xls (using xlwt), .xlsm (using openpyxl) and to .pdf (using reportlab).
You will need to use the second option if you write formulas and similar content in your .xls and want to have the calculated results in your .pdf. In that case use subprocess to call Microsoft Excel or OpenOffice/LibreOffice Calc with the right command-line parameters for printing/converting the file to PDF.
E.g. for LibreOffice 4.0 the commandline would be:
scalc --convert-to pdf yourfile.xls
which will result in a yourfile.pdf
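Calling that from Python could look roughly like the sketch below. The function names are made up, and the binary name is an assumption: it defaults to scalc as in the command above, but on many installs the executable is soffice or libreoffice instead. The --headless and --outdir switches are added so the conversion runs without a GUI and the output location is explicit.

```python
import os
import subprocess


def build_convert_command(xls_path, out_dir, binary="scalc"):
    """Return the argument list for a headless spreadsheet-to-PDF conversion."""
    return [binary, "--headless", "--convert-to", "pdf", "--outdir", out_dir, xls_path]


def convert_to_pdf(xls_path, out_dir=None, binary="scalc"):
    """Run the conversion and return the path where the PDF is expected."""
    if out_dir is None:
        # Default to the directory that holds the input file.
        out_dir = os.path.dirname(os.path.abspath(xls_path))
    # check_call raises CalledProcessError if the office binary exits non-zero.
    subprocess.check_call(build_convert_command(xls_path, out_dir, binary))
    base = os.path.splitext(os.path.basename(xls_path))[0]
    return os.path.join(out_dir, base + ".pdf")
```

Typical use would be convert_to_pdf("yourfile.xls") right after saving the workbook with xlwt. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.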
What I need to know is, can I get Python to read a spreadsheet (preferably Microsoft Excel), then parse the information and input it into an equation?
It's for a horse-racing program, where the information for several horses will be in one excel spreadsheet, in different rows or columns. I need to know if I can run a calculation for each of those horses separately and then calculate a score for the given horse.
My suggestion is:
Save the Excel file as a csv comma separated value file, which is a plain text format and much easier to work with.
Use Python's built-in csv module to work with the data in csv format.
You can work with Excel files directly in Python (Excel 2003 format supported via the third party modules xlwt, xlrd) but this is much harder than working with CSV.
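A minimal sketch of the CSV route, using only the standard library. The column names (name, wins, starts, avg_finish) and the scoring formula are invented for illustration; substitute whatever your spreadsheet and your equation actually use.

```python
import csv
import io


def score_horse(row):
    """Hypothetical formula: win rate (as a percentage) minus average finishing position."""
    wins = float(row["wins"])
    starts = float(row["starts"])
    avg_finish = float(row["avg_finish"])
    win_rate = wins / starts if starts else 0.0
    return 100.0 * win_rate - avg_finish


def score_all(csv_file):
    """Return {horse name: score} for every data row of an open CSV file."""
    reader = csv.DictReader(csv_file)  # first row is used as column names
    return {row["name"]: score_horse(row) for row in reader}


# In-memory stand-in for a file exported from Excel (made-up data).
sample = io.StringIO(
    "name,wins,starts,avg_finish\n"
    "Comet,3,10,4.0\n"
    "Dasher,5,10,2.5\n"
)
scores = score_all(sample)
best = max(scores, key=scores.get)  # horse with the highest score
```

With a real export you would pass an open file instead of the StringIO, for example open("horses.csv", newline="") on Python 3. Each row is scored independently, so one horse per row is the easiest layout to work with.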
OpenPyXL ("A Python library to read/write Excel 2007 xlsx/xlsm files") has a very nice and Pythonic API.
Use the xlrd package. It's on PyPI, so you can just easy_install xlrd
You can export the spreadsheet as a .csv and read it in as a text file, then process it. I have a niggling feeling there might even be a CSV parsing python library.
AFAIK there isn't a .xls parser, although I might be wrong.
EDIT: I was wrong: http://www.python-excel.org/
I need to combine several tab-separated value (TSV) files into an Excel 2007 (XLSX) spreadsheet, preferably using Python. There is not much cleverness needed in combining them - just copying each TSV file onto a separate sheet in Excel will do. Of course, the data needs to be split into columns and rows same as Excel does when I manually copy-paste the data into the UI.
I've had a look at the raw XML file Excel 2007 generates and it's huge and complex, so writing that from scratch doesn't seem realistic. Are there any libraries available for this?
Looks like xlwt may serve your needs -- you can read each TSV file with Python's standard library csv module (which DOES do tab-separated as well as comma-separated etc, don't worry!-) and use xlwt (maybe via this cheatsheet;-) to create an XLS file, make sheets in it, build each sheet from the data you read via csv, etc. Not sure about XLSX vs plain XLS support but maybe the XLS might be enough...?
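Roughly, that approach could be sketched as follows, assuming xlwt is installed. The helper names read_tsv and write_workbook are made up; the xlwt calls used (Workbook, add_sheet, write, save) are the library's basic API. Note that this produces a legacy .xls file, not XLSX, and that an .xls sheet is limited to 65,536 rows.

```python
import csv
import io


def read_tsv(tsv_file):
    """Return the rows of an open tab-separated file as lists of strings."""
    return list(csv.reader(tsv_file, delimiter="\t"))


def write_workbook(sheets, out_path):
    """Write {sheet name: rows} to a legacy .xls file, one sheet per entry."""
    import xlwt  # third-party; imported here so read_tsv works without it

    book = xlwt.Workbook()
    for name, rows in sheets.items():
        sheet = book.add_sheet(name[:31])  # Excel limits sheet names to 31 characters
        for r, row in enumerate(rows):
            for c, value in enumerate(row):
                sheet.write(r, c, value)  # values are written as text cells
    book.save(out_path)


# In-memory stand-in for one TSV file (made-up data).
sample = io.StringIO("horse\tscore\nComet\t26.0\n")
rows = read_tsv(sample)
```

For real input you would build the sheets dict by calling read_tsv on each open file and then call write_workbook(sheets, "combined.xls"). Because the csv module returns strings, numeric columns need an explicit float() conversion before writing if Excel should treat them as numbers.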
The best python module for directly creating Excel files is xlwt, but it doesn't support XLSX.
As I see it, your options are:
1. If you only have "several", you could just do it by hand.
2. Use pythonwin to control Excel through COM. This requires you to run the code on a Windows machine with Excel 2007 installed.
3. Use python to do some preprocessing on the TSV to produce a format that will make step (1) easier. I'm not sure if Excel reads TSV, but it will certainly read CSV files directly.
Note that Excel 2007 will quite happily read "legacy" XLS files (those written by Excel 97-2003 and by xlwt). You need XLSX files because .....?
If you want to go with the defaults that Excel will choose when deciding whether each piece of your data is a number, a date, or some text, use pythonwin to drive Excel 2007. If the data is in a fixed layout such that other than a possible heading row, each column contains data that is all of one known type, consider using xlwt.
You may wish to approach xlwt via http://www.python-excel.org which contains an up-to-date tutorial for xlrd, xlwt, and xlutils.