I've been scouring the net for a Python library or tool that can convert an Excel file to/from ODS format, but haven't been able to find anything.
I need the ability to input and output data in either format. We don't need to worry about merged cells, formulas or anything non-straightforward.
If you have LibreOffice installed, you can write a Python wrapper that executes its headless mode:
$ /usr/bin/libreoffice --headless --invisible --convert-to ods /home/cwgem/Downloads/QTL_Sample_data.xls
convert /home/cwgem/Downloads/QTL_Sample_data.xls -> /home/cwgem/QTL_Sample_data.ods using OpenDocument Spreadsheet Flat XML
$ /usr/bin/libreoffice --headless --invisible --convert-to xls /home/cwgem/QTL_Sample_data.ods
convert /home/cwgem/QTL_Sample_data.ods -> /home/cwgem/QTL_Sample_data.xls using
This would be a bit easier than trying to do it through the library route.
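If you want to drive this from Python, a minimal subprocess wrapper might look like the sketch below (it assumes libreoffice is on your PATH; the file names are placeholders):
import subprocess

def convert_spreadsheet(src_path, target_format, outdir="."):
    # Run LibreOffice in headless mode; target_format is e.g. "ods" or "xls"
    subprocess.check_call([
        "libreoffice", "--headless", "--invisible",
        "--convert-to", target_format, "--outdir", outdir, src_path,
    ])

convert_spreadsheet("QTL_Sample_data.xls", "ods")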
I succeeded in converting an xlsx file to an ods file with this method:
pip install pyexcel-xlsx
pip install pyexcel-ods3
Then use the following code:
from pyexcel_ods3 import save_data
from pyexcel_xlsx import get_data

dataXlsx = get_data("file.xlsx")  # ordered dict mapping sheet name -> list of rows
save_data("file.ods", dataXlsx)
Note: the colors and styling of the xlsx file are removed in the output ods file, so it is not a complete success.
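The reverse direction works the same way, since both packages expose get_data and save_data. A minimal sketch with placeholder file names:
from pyexcel_ods3 import get_data
from pyexcel_xlsx import save_data

dataOds = get_data("file.ods")
save_data("file.xlsx", dataOds)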
I would like to convert tables in PDF to Excel. I realize Adobe Acrobat Pro has this functionality - I would just like to program this because I have many files.
Subhobroto's reply in this post explains how to do this in python (just subbing xls for word), but it's for a Windows version of python. What would be the analogous way to connect to Adobe Acrobat using Python3 on a Mac?
from win32com.client import Dispatch  # Windows-only COM binding

def acrobat_extract_text(f_path, f_path_out, f_basename, f_ext):
    avDoc = Dispatch("AcroExch.AVDoc")  # this is the line I need to sub with a method from another module
    # Open the input file (as a pdf)
    ret = avDoc.Open(f_path, f_path)
    assert(ret)  # FIXME: Documentation says "-1 if the file was opened successfully, 0 otherwise", but this is a bool in practice?
I found several questions that were similar to mine, but none of the answers came close to what I need.
Specifications: I'm working with Python 3 and do not have MS Word. My development machine runs OS X and my cloud machine runs Linux/Ubuntu.
I'm using python-docx to extract values from a .doc file that is sent to me nightly. However, python-docx only works with .docx files, so I need to convert the file to that extension first.
So, I've got a .doc file that I need to convert to .docx. This script might have to run in the cloud so I can't install any kind of Office or Office-like software. Can this be done?
Since you are working with Linux/Ubuntu, you can use LibreOffice's built-in converter.
SYNTAX
lowriter --convert-to docx *.doc
Example
lowriter --convert-to docx testdoc.doc
The wildcard form converts every .doc file in the directory to .docx and saves the output in the same folder.
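To automate this from Python, you could shell out to lowriter with subprocess. A minimal sketch, assuming lowriter is on your PATH:
import subprocess

# Convert a single .doc file; LibreOffice writes testdoc.docx to the current directory
subprocess.check_call(['lowriter', '--convert-to', 'docx', 'testdoc.doc'])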
You could use unoconv - Universal Office Converter. Convert between any document format supported by LibreOffice/OpenOffice.
unoconv -d document --format=docx *.doc
Or from Python:
import subprocess
subprocess.call(['unoconv', '-d', 'document', '--format=docx', filename])
Aspose.Words Cloud SDK for Python can convert DOC to DOCX. The package can open, generate, edit, split, merge, compare and convert a Word document in Python on any platform without depending on MS Word.
It is a paid product, but the free plan provides 150 free monthly API calls.
P.S: I'm developer evangelist at Aspose.
# Import module
import asposewordscloud
import asposewordscloud.models.requests
from shutil import copyfile
# Get your credentials from https://dashboard.aspose.cloud (free registration is required).
words_api = asposewordscloud.WordsApi(app_sid='xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx',app_key='xxxxxxxxxxxxxxxxxxxxxxxxx')
words_api.api_client.configuration.host = 'https://api.aspose.cloud'
filename = 'C:/Temp/02_pages.doc'
dest_name = 'C:/Temp/02_pages.docx'
# Convert DOC to DOCX
request = asposewordscloud.models.requests.ConvertDocumentRequest(document=open(filename, 'rb'), format='docx')
result = words_api.convert_document(request)
copyfile(result, dest_name)
The offline Aspose.Words for Python package can also do the conversion locally:
import os
import aspose.words as aw

path1 = "doc file path"    # folder containing the input .doc file
path2 = "path to save converted file"
file = "example.doc"       # hypothetical input file name; the original snippet left this undefined

file2 = file.rsplit('.', 1)[0] + '.docx'
filename1 = os.path.join(path2, file2)
filename = os.path.join(path1, file)

doc = aw.Document(filename)
doc.save(filename1)
First you will need to be using Windows. If that is an acceptable requirement, then please read on.
Next you need to install the Microsoft Office Compatibility Pack.
Now download and install the Microsoft Office Migration Planning Manager.
To run the tool you need to create a .ini file that controls the program. An example .ini file and further information are available in this blog post.
There is more detailed information from Microsoft here.
I have been provided with an xlsb file full of data. I want to process the data using Python. I can convert it to csv using Excel or OpenOffice, but I would like the whole process to be more automated. Any ideas?
Update: I took a look at this question and used the first answer:
import subprocess
subprocess.call("cscript XlsToCsv.vbs data.xlsb data.csv", shell=False)
The issue is that the file contains Greek letters, so the encoding is not preserved. Opening the csv with Notepad++ it looks as it should, but when I try to insert it into a database it comes out like this: ���. Opening the file as csv, just to read it, the text is displayed like this:
\xc2\xc5\xcb instead of ΒΕΛ.
I realize it's an encoding issue, but is it possible to retain the original encoding when converting the xlsb file to csv?
I've encountered this same problem and using pyxlsb does it for me:
from pyxlsb import open_workbook

with open_workbook('HugeDataFile.xlsb') as wb:
    for sheetname in wb.sheets:
        with wb.get_sheet(sheetname) as sheet:
            for row in sheet.rows():
                values = [r.v for r in row]                  # retrieving cell contents
                csv_line = ','.join(str(v) for v in values)  # str() guards against numeric/empty cells; or do your thing
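If you want an actual CSV file on disk, letting the standard csv module handle quoting is safer than joining strings by hand. A minimal sketch building on the snippet above (file name is a placeholder; get_sheet also accepts a 1-based index):
import csv
from pyxlsb import open_workbook

with open_workbook('HugeDataFile.xlsb') as wb, \
        open('HugeDataFile.csv', 'w', newline='', encoding='utf-8') as out:
    writer = csv.writer(out)
    with wb.get_sheet(1) as sheet:  # first sheet
        for row in sheet.rows():
            writer.writerow([r.v for r in row])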
The most popular Excel Python packages, openpyxl and xlrd, have no support for the xlsb format (bug tracker entries: openpyxl, xlrd).
So I'm afraid there is no native Python way =/. However, since you are using Windows, it should be easy to script the task with external tools.
I would suggest taking a look at Convert XLS to XLSB Programatically?. You mention Python in the title, but the body of the question does not imply you are strongly coupled to it, so you could go the pure C# way.
If you feel really comfortable only with Python, one of the answers there suggests a command line tool under the fancy name of Convert-XLSB. You could script it as an external tool from Python with subprocess.
I know this is not a good answer, but I don't think there is better/easier way as of now.
In my previous experience, I handled converting xlsb using the libreoffice command line utility.
In Ruby I just execute a system command calling libreoffice to convert the xlsb file to csv:
`libreoffice --headless --convert-to csv your_xlsb_file.xlsb --outdir /path/csv`
and to change the encoding I use the iconv command line tool, again from Ruby:
`iconv -f ISO-8859-1 -t UTF-8 your_csv_file.csv > new_file_csv.csv`
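The same two steps translate directly to Python. A sketch with placeholder paths (for the Greek example above, the source encoding is likely ISO-8859-7 rather than ISO-8859-1):
import subprocess

# Step 1: xlsb -> csv via LibreOffice's headless converter
subprocess.check_call(['libreoffice', '--headless', '--convert-to', 'csv',
                       'your_xlsb_file.xlsb', '--outdir', '/path/csv'])

# Step 2: re-encode the result to UTF-8, as iconv does above
with open('/path/csv/your_xlsb_file.csv', encoding='ISO-8859-1') as src:
    text = src.read()
with open('/path/csv/new_file_csv.csv', 'w', encoding='utf-8') as dst:
    dst.write(text)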
I also looked at the problem and the following worked for me: first opening the file in Excel via Python, then saving it to a different file. A bit of a workaround, but I like it more than the other solutions. In the example I use file format 6, which is CSV, but you can also use other ones.
import win32com.client

excel = win32com.client.Dispatch("Excel.Application")
excel.DisplayAlerts = False
excel.Visible = False
doc = excel.Workbooks.Open("C:/users/A295998/Python/#TA1PROG3.xlsb")
doc.SaveAs(Filename="C:\\users\\A295998\\Python\\test5.csv", FileFormat=6)  # 6 = xlCSV
doc.Close()
excel.Quit()
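If the Greek characters from the question still come out garbled with FileFormat=6 (legacy CSV in the ANSI codepage), Excel 2016 and later also expose a UTF-8 CSV format, xlCSVUTF8 (FileFormat=62), which should sidestep the codepage issue:
doc.SaveAs(Filename="C:\\users\\A295998\\Python\\test5.csv", FileFormat=62)  # 62 = xlCSVUTF8, Excel 2016+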
XLSB is a binary format and I don't think you'll be able to parse it with current Python tools and packages. If you still want to somehow automate the process with Python you can do what the others have told you and script that Windows CLI tool: call the .exe from the command line with subprocess, passing the files you want to convert.
E.g. with a script similar to this one you could convert all the .xlsb files that you place in the "xlsb" folder to .csv format...
├── xlsb
│ ├── file1.xlsb
│ ├── file2.xlsb
│ └── file3.xlsb
└── xlsb_to_csv.py
xlsb_to_csv.py
#!/usr/bin/env python
import os
import subprocess

files = os.listdir('./xlsb')
for f in files:
    subprocess.call("ConvertXLS.EXE " + os.path.join("xlsb", f) + " --arguments", shell=True)
Note: the Windows command is pseudocode... I use a similar approach to batch-convert stuff on headless Windows servers for testing purposes. You just have to figure out the exe location and the Windows command...
Hope it helps... good luck!
I think you can do this using pyuno. This blog entry shows how to convert xls files to csv, and since OpenOffice has supported xlsb files since version 3.2, this code might just work for you. You will have to go through the hassle of setting up the pyuno environment, though.
The script you reference seems to use the ActiveX interface to Excel and save via its Workbook.SaveAs method.
According to the MSDN documentation, this method has a TextCodepage argument which may be helpful.
Sidenote: you can rewrite the VB script in Python; see this question.
I am trying to use the appdailysales.py module to download our iPhone apps' daily sales reports. I am a .NET developer, so I tried running this using IronPython in a C# solution with the following code:
using IronPython.Hosting;
var ipy = Python.CreateRuntime();
dynamic appSales = ipy.UseFile("appdailysales.py");
appSales.main();
Because I didn't have gzip, I took out the references to that module. I was going to use the GZipStream C# class to decompress the file (Apple provides their downloads as .gz files). So, I commented out lines 75 and 429-435.
I have tried executing appdailysales.py in my C# solution, directly from IronPython and using Python 2.7 (installed ActivePython last night); all with the same results: When I try to open the .gz file using 7zip, I get the following error:
CRC Failed ... file is broken
When I try using the GZipStream class I get:
The CRC in GZip footer does not match the CRC calculated from the decompressed data
If I download the .gz file manually, I can decompress the file just fine using 7Zip or GZipStream.
I am fluent in C#, but new to Python. Any help you can provide would be much appreciated.
Thanks for your time.
Looks like line 444 is the problem. Here are lines 444-446:
downloadFile = open(filename, 'w')
downloadFile.write(filebuffer)
downloadFile.close()
At this stage, IF you have deleted lines 429-435 OR selected not to unzip, then filebuffer refers to the raw gzipped stream that you got from the web. The output file is opened in TEXT mode, and you are on Windows, so every \n in the BINARY gzipped stream will be converted to \r\n -- CORRUPTION, like the error message said.
So: for the module to be used portably on both Windows and other platforms, the open mode must be "wb" (b for binary). If the gunzipped result file is also a binary file, "wb" can be hardcoded in the open call. However if the gunzipped file is a text file (meant to be capable of being opened in a text editor), then you need just "w" for that purpose, and you should set a variable mode to either "wb" or "w" as appropriate, and use mode in the open call.
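Concretely, assuming the downloaded gzipped data should be written as raw bytes, the corrected version of lines 444-446 is:
downloadFile = open(filename, 'wb')  # 'b' = binary mode: no \n -> \r\n translation on Windows
downloadFile.write(filebuffer)
downloadFile.close()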
Big question: I understand why you removed the gzip references for IronPython usage. Did you remove those lines for Python 2.7? Or did you run it under Python 2.7 with those lines still in, but set options.unzipFile to False?
I've got a bunch of FoxPro (VFP9) DBF files on my Ubuntu system, is there a library to open these in Python? I only need to read them, and would preferably have access to the memo fields too.
Update: Thanks @cnu, I used Yusdi Santoso's dbf.py and it works nicely. One gotcha: the memo file name extension must be lower case, i.e. .fpt, not .FPT, which was how the filename came over from Windows.
I prefer dbfpy. It supports both reading and writing of .DBF files and can cope with most variations of the format. It's the only implementation I have found that could both read and write the legacy DBF files of some older systems I have worked with.
I was able to read a DBF file (with associated BAK, CDX, FBT, TBK files) using the dbf package from PyPI: http://pypi.python.org/pypi/dbf. I am new to Python and know nothing about DBF files, but it worked easily to read a DBF file from my girlfriend's business (created with a music store POS application called AIMsi).
After installing the dbf package (I used aptitude and installed dbf version 0.88, I think), the following Python code worked:
from dbf import *

test = Table("testfile.dbf")
for record in test:
    print record           # Python 2 syntax
    x = raw_input("")      # to pause between showing records
That's all I know for now, but hopefully it's a useful start for someone else who finds this question!
April 21, 2012 SJK Edit: Per Ethan Furman's comment, I should point out that I actually don't know which of the data files were necessary, besides the DBF file. The first time I ran the script, with only the DBF available, it complained of a missing support file. So, I just copied over the BAK, CDX, FPT (not FBT as I said before edit), TBK files and then it worked.
If you're still checking this, I have a GPL FoxPro-to-PostgreSQL converter at https://github.com/kstrauser/pgdbf . We use it to routinely copy our tables into PostgreSQL for fast reporting.
You can try this recipe on ActiveState.
There is also a DBFReader module which you can try.
For support for memo fields, check out http://groups.google.com/group/python-dbase
It currently supports dBase III and Visual FoxPro 6.0 db files... not sure if the file layout changed in VFP 9 or not...
It's 2016 now and I had to fiddle with the dbf package to get it to work. Here is a Python 3 version to just export a dbf file to a csv:
import dbf
d = dbf.Table('mydbf.dbf')
d.open()
dbf.export(d, filename='mydf_exported.csv', format='csv', header=True)
I had some unicode error at first, but got around that by turning off memos.
import dbf
d = dbf.Table('mydbf.dbf', ignore_memos=True)
d.open()
dbf.export(d, filename='mydf_exported.csv', format='csv', header=True)