BadDataError when converting .DBF to .csv

I am trying to convert a .DBF file to .csv using Python 3, with the dbf library (https://pypi.python.org/pypi/dbf):
import dbf
def dbf_to_csv(dbf_file_name, csv_file_name):
    # Open the table without loading memo fields, then export it as CSV
    dbf_file = dbf.Table(dbf_file_name, ignore_memos=True)
    dbf_file.open()
    dbf.export(dbf_file, filename=csv_file_name, format='csv', header=True)
The DBF file I am using can be opened in Excel and appears to be fine. However, when I run the above method I get an error on the dbf.export line above:
dbf.ver_33.BadDataError: record data is not the correct length (should be 1442, not 1438)
The dbf file opens fine in Excel, however, I need to automate this conversion. What should I be doing differently to get this method to create a pdf from a .DBF file?

If the file is opening correctly in Excel, then I suggest you use Excel to do the conversion for you to csv format as follows:
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(r"input.dbf")
excel.DisplayAlerts = False
wb.DoNotPromptForConvert = True
wb.CheckCompatibility = False
wb.SaveAs(r"output.csv", FileFormat=6, ConflictResolution=2)
excel.Application.Quit()
Do not forget to add full paths to the required files. Note, FileFormat=6 tells Excel to save the file in CSV format.
To export the workbook as a PDF, you could use:
wb.ExportAsFixedFormat(0, r"output.pdf")
If you do not already have win32com installed you should be able to use the following:
pip install pypiwin32
This solution is suitable for Windows installations only.
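If Excel is not available, it may help to first check what the file header itself claims. The sketch below is only a diagnostic, not part of the dbf library: it assumes the commonly documented dBASE III / FoxPro layout (record count at bytes 4-7, header length at bytes 8-9, record length at bytes 10-11, then 32-byte field descriptors ending with 0x0D), and the function name summarize_dbf_header is made up for illustration. A mismatch between the declared and implied record lengths could explain an error like "should be 1442, not 1438".

```python
import struct

def summarize_dbf_header(data):
    """Compare the record length declared in a .DBF header with the
    length implied by its field descriptors (assumed dBASE III layout)."""
    # Bytes 4-7: record count, 8-9: header length, 10-11: record length,
    # all stored as little-endian unsigned integers.
    records, header_len, record_len = struct.unpack_from("<IHH", data, 4)

    field_total = 0
    offset = 32  # field descriptors follow the 32-byte fixed header
    # Each descriptor is 32 bytes; a 0x0D byte ends the descriptor list.
    while (offset < header_len
           and offset + 32 <= len(data)
           and data[offset] != 0x0D):
        field_total += data[offset + 16]  # byte 16 holds the field length
        offset += 32

    return {
        "records": records,
        "header_length": header_len,
        "declared_record_length": record_len,
        # Every record starts with a 1-byte deletion flag.
        "implied_record_length": 1 + field_total,
    }
```

To use it, read the file in binary mode, for example data = open("input.dbf", "rb").read(), and compare the two record lengths in the returned dictionary. Very long character fields in some FoxPro files may store part of their length elsewhere, so treat a mismatch as a hint rather than proof.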

Related

Using Python convert excel sheet to different format

I have an Excel file.
I want to automatically open it and save it in HTML format.
The code below is not working as intended: the converted HTML is not readable.
Kindly let me know how this can be done.
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
excel.Visible = True
wb = excel.Workbooks.Open(r"Report.xlsx")
wb.Saveas("Report.html")
wb.Close()
excel.Quit()

You could use pandas to open the Excel file and then pandas will export to HTML and other formats.
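A minimal sketch of that idea, assuming pandas is installed together with an Excel engine such as openpyxl for .xlsx files. The function names frame_to_html and excel_to_html are made up for illustration. Note that this exports the cell data as a plain HTML table and does not carry over the workbook's formatting.

```python
import pandas as pd

def frame_to_html(df, html_path=None):
    """Render a DataFrame as an HTML table and optionally write it to a file."""
    html = df.to_html(index=False)  # index=False leaves out the row numbers
    if html_path is not None:
        with open(html_path, "w", encoding="utf-8") as fh:
            fh.write(html)
    return html

def excel_to_html(xlsx_path, html_path, sheet_name=0):
    """Read one worksheet (the first by default) and save it as HTML."""
    # read_excel needs an Excel engine (for example openpyxl) to be installed
    df = pd.read_excel(xlsx_path, sheet_name=sheet_name)
    return frame_to_html(df, html_path)
```

For the file in the question, the call would look like excel_to_html("Report.xlsx", "Report.html").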

Python: read .xls with formula inside

I need to transform an input file (.xls) with formula into an .xlsx file that has only the value/data of the formula.
- openpyxl can't read .xls files, but it has the "data only" flag when reading a file.
- xlrd etc. can read .xls files, but can't read them with a "data only" flag like openpyxl can.
When I try to convert the .xls file to .xlsx in Python so that I can open it with openpyxl afterwards, all the values that come from formulas become "0".
Does anyone know how I can deal with this issue?

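You can use xlwings:
import pandas as pd
import xlwings as xl
def df_from_excel(path):
    # Open and re-save the workbook in Excel so the computed values are stored in the file
    app = xl.App(visible=False)
    book = app.books.open(path)
    book.save()
    app.kill()
    return pd.read_excel(path)
df_from_excel('path to xls')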

How to convert Excel SpreadsheetML to .xlsx using python

I'm new to Python. I have an XML file that I can open easily in Excel, and I want to convert it to .xlsx or any compatible format so that I can read it with the openpyxl module. Is there any way to do this in Python? Any advice would be appreciated, thank you.

I had the same problem. This answer helped me:
Attempting to Parse an XLS (XML) File Using Python
You can save your Excel-styled XML as .xlsx with the Workbook.SaveAs method using win32com (Windows only) and then read it in with pandas.read_excel:
import win32com.client
import pandas as pd
original_file = "Your_downloaded_file.xml"
output = "Your_converted_file.xlsx"
xlApp = win32com.client.Dispatch("Excel.Application")
xlWbk = xlApp.Workbooks.Open(original_file)
xlWbk.SaveAs(output, 51)
xlWbk.Close(True)
xlApp.Quit()
output_df = pd.read_excel(output)
print(output_df.columns.ravel())

Error (little-endian) reading a XLS file with python

I download an XLS file from the web using Selenium.
I tried many options I found on Stack Overflow and other websites to read the XLS file:
import pandas as pd
df = pd.read_excel('test.xls') # Read XLS file
Expected "little-endian" marker, found b'\xff\xfe'
And
df = pd.ExcelFile('test.xls').parse('Sheet1') # Read XLS file
Expected "little-endian" marker, found b'\xff\xfe'
And again
from xlrd import open_workbook
book = open_workbook('test.xls')
CompDocError: Expected "little-endian" marker, found b'\xff\xfe'
I have tried different encodings: utf-8, ANSI, utf_16_be, utf16
I have even tried to get the encoding of the file from notepad or other applications.
Type of file : Microsoft Excel 97-2003 Worksheet (.xls)
I can open the file with Excel without any issue.
What's frustrating is that if I open the file with Excel and just press save, I can then read the file with any of the previous Python commands.
I would be really grateful if someone could provide me other ideas I could try. I need to open this file with a python script only.
Thanks,
Max
Solution (somewhat messy but simple) that could potentially work for any type of Excel file:
Call Excel from Python to open and save the file. Excel "cleans up" the file, and then Python is able to read it with any Excel-reading function.
Solution inspired by @Serge Ballesta's and @John Y's comments.
## Open a file in Excel and save it to correct the encoding error
import win32com.client
import pandas
downloadpath="c:\\firefox_downloads\\"
filename="myfile.xls"
xl=win32com.client.Dispatch("Excel.Application")
xl.Application.DisplayAlerts = False # disables Excel pop up message (for saving the file)
wb = xl.Workbooks.Open(Filename=downloadpath+filename)
wb.SaveAs(downloadpath+filename)
wb.Close() # the parentheses are needed to actually call the method
xl.Application.DisplayAlerts = True # enables Excel pop up message for saving the file
df = pandas.ExcelFile(downloadpath+filename).parse('Sheet1') # Read the re-saved XLS file
Thank you all!
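To see why the original download is rejected, it may help to look at the raw header bytes. Binary .xls workbooks live inside a compound file (OLE2) container that starts with an 8-byte signature and stores a byte-order field at offset 28, normally the bytes FE FF. The sketch below assumes the error comes from that field holding FF FE instead, which is what the message suggests. The function name inspect_xls_header is made up for illustration.

```python
# Signature that opens every compound file (OLE2) container
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
# The byte-order value 0xFFFE, stored little-endian, appears as FE FF
EXPECTED_BYTE_ORDER = b"\xfe\xff"

def inspect_xls_header(data):
    """Report the container signature and byte-order field of a .xls file."""
    byte_order = data[28:30]  # two bytes at offset 28 of the header
    return {
        "has_cfb_signature": data[:8] == CFB_SIGNATURE,
        "byte_order_field": byte_order,
        "byte_order_ok": byte_order == EXPECTED_BYTE_ORDER,
    }
```

Reading the first 512 bytes is enough, for example inspect_xls_header(open("test.xls", "rb").read(512)). If only the byte-order field is wrong, that would be consistent with Excel tolerating the file and writing a conforming header when it saves.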

What does pd mean? pandas is made for data science. In my opinion, you should use openpyxl (reads and writes only xlsx) or xlwt/xlrd (read xls and write only xls).
from xlrd import open_workbook
book = open_workbook(<math file>)
sheet =....
There are several examples of this on the Internet.

Lost all format when editing data on existing excel file using Openpyxl

I'm new to Python, and I tried to write data into an existing Excel file using openpyxl. My Excel file has somewhat complicated formatting. After running my simple code, I checked the file and all the formatting was lost.
This is my code:
import openpyxl
xfile = openpyxl.load_workbook('sample.xlsx')
sheet = xfile.get_sheet_by_name('Sheet1')
sheet['A1'] = 'hello world'
xfile.save('sample.xlsx')
Please help me figure out how to fix it, or suggest an alternative library that can work well with "xlsx" file.
