Python Win32, how to save an XLS as a CSV?


I'm loading up a .xlsx with win32com and would like to save the results as a csv when I'm done.
myworkbook.SaveAs('results.csv')
gives me an xlsx file with a csv extension. How do I save as an actual CSV?

I think that if you add the type after the filename, it should work. (Can't test right now.)
I think the type for CSV (DOS) is 24.
myworkbook.SaveAs('results.csv', 24)

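Here are the docs for SaveAs:
http://msdn.microsoft.com/en-us/library/bb214129.aspx
from win32com.client import constants as c
# constants is only populated for early-bound objects, e.g. Excel started via win32com.client.gencache.EnsureDispatch("Excel.Application")
myWorkBook.SaveAs('results.csv', c.xlCSV)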

You have to specify the type after the filename.
For CSV the following formats are available:
xlCSV = 6 # Comma-separated values
xlCSVMac = 22 # Macintosh CSV
xlCSVMSDOS = 24 # MS-DOS CSV
xlCSVWindows = 23 # Windows CSV
The available file formats are listed in Excel's XlFileFormat enumeration, and the method itself is described in the Workbook.SaveAs reference. Even though there are no Python examples there, the parameters and values should be the same.
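Putting the answers above together, a minimal sketch might look like the following. The helper names `save_workbook_as_csv` and `convert_xlsx_to_csv` and the file names are hypothetical. The absolute path is there because Excel tends to resolve a bare file name against its own default folder rather than the script's working directory. Keep in mind that a CSV save only writes the active sheet. The COM part needs Windows, pywin32 and an installed Excel, so `win32com` is imported inside the function that uses it.

```python
import os

XL_CSV = 6  # xlCSV; alternatives: 22 = xlCSVMac, 23 = xlCSVWindows, 24 = xlCSVMSDOS


def save_workbook_as_csv(workbook, csv_path, file_format=XL_CSV):
    """Save the active sheet of an already-open Excel workbook as a real CSV file."""
    # Hand Excel an absolute path so the file lands where we expect it.
    full_path = os.path.abspath(csv_path)
    # The second positional argument of SaveAs is the FileFormat.
    workbook.SaveAs(full_path, file_format)
    return full_path


def convert_xlsx_to_csv(xlsx_path, csv_path):
    """Open an .xlsx in Excel through COM and re-save it as CSV (Windows + Excel only)."""
    import win32com.client  # pywin32; imported here so the helper above works without it

    excel = win32com.client.Dispatch("Excel.Application")
    excel.DisplayAlerts = False  # suppress overwrite and "features may be lost" prompts
    try:
        workbook = excel.Workbooks.Open(os.path.abspath(xlsx_path))
        saved = save_workbook_as_csv(workbook, csv_path)
        workbook.Close(False)  # the CSV is already on disk, so don't save again
        return saved
    finally:
        excel.Quit()
```

Usage on a Windows machine with Excel would be something like `convert_xlsx_to_csv("input.xlsx", "results.csv")`.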

I have not used this library but it might be worth giving a shot:
http://pypi.python.org/pypi/ooxml

Related

Python Pandas.to_csv unable to export columns with semicolon(;) as one column

I have a dataframe with 3 columns, but one of the columns contains data separated by semicolons (;). When I export the dataframe to a CSV and open it in Excel, the data keeps getting split into the following format:
import pandas as pd
my_dict = { 'name' : ["a", "b"],
'age' : [20,27],
'tag': ["Login Location;Visit Location;Appointment Location", "Login Location;Visit Location;Appointment Location"]}
df=pd.DataFrame(my_dict)
df.to_csv('output.csv',index=False)
print('done')
I would like to have the output in excel to be:
where the data in the tag column is intact. I've tried adding sep=',' or delimiter=',' but it still gives me the same output.
Thank you in advance,
John

Thank you @Alex and @joao for your input; it guided me in the right direction. I was able to get the output I needed by forcing Excel to use , as the separator. By default, Excel was using tab as the delimiter, which is why it showed an incorrect format. Here's the link on forcing Excel to use a comma as the list separator: https://superuser.com/questions/606272/how-to-get-excel-to-interpret-the-comma-as-a-default-delimiter-in-csv-files

Excel does some things based on the fact that your file has a .csv suffix, probably using ; as the default delimiter, as suggested in the comments.
One workaround is to use the .txt suffix instead:
df.to_csv('output.txt',index=False)
then open the file in Excel, and in the Text Import Wizard choose "Delimited" with comma as the separator.
Do not pick the file from the list of recently opened files if it's there; that won't work. You really need to do File/Open and then browse to the directory to find your .txt file.
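To confirm that the file itself is fine and the splitting happens only in Excel, a quick check is to parse pandas' output with a standard CSV reader. This sketch uses the question's data; calling `to_csv` with no path returns the CSV text instead of writing a file. Each row should come back with exactly three fields, with the semicolons kept inside the `tag` value.

```python
import csv
import io

import pandas as pd

# Same data as in the question: the "tag" values contain semicolons.
df = pd.DataFrame({
    "name": ["a", "b"],
    "age": [20, 27],
    "tag": ["Login Location;Visit Location;Appointment Location"] * 2,
})

# With no path argument, to_csv returns the CSV text as a string.
text = df.to_csv(index=False)

# Parse it back with a comma-delimited reader: semicolons do not split fields.
rows = list(csv.reader(io.StringIO(text)))
for row in rows:
    print(len(row), row)
```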

Python Openpyxl VLOOKUP from other file

I have two files, let's say 't1.xlsx' and 't2.xlsx'.
What I want to do is use the VLOOKUP function inside t1 with the data from t2.
I tried pasting:
sheet["O2"].value = "=VLOOKUP(C:C;'C:\\Users\\KKK\\Desktop\\sheets\\excellent\\[t2.xlsx]baza'!$A$2:$AI$10480;25;0)"
where baza is a sheet name, but when I try to open the file, Excel says it cannot be opened because of an error and offers to repair it.
The rest of the code:
import openpyxl
wb = openpyxl.load_workbook('t1.xlsx')
sheets = wb.sheetnames  # replaces the deprecated wb.get_sheet_names()
sheet = wb['Sheet1']  # replaces the deprecated wb.get_sheet_by_name('Sheet1')
# ... the VLOOKUP assignment shown above goes here ...
wb.save("t1.xlsx")

With more complicated formulae you should always check the syntax in the XML, because formulae are often stored differently from how they appear in Excel. This is covered in the documentation. You might be okay simply using a comma as the separator, but I suspect you'll also have to change the path of the file and use a Python raw string (the r prefix).
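Here is a minimal sketch of the separator fix. It assumes that openpyxl writes the formula text unchanged, so the formula has to use Excel's English syntax with commas, not the locale's semicolons. To keep it self-contained, the lookup table sits on a sheet named `baza` inside the same workbook, standing in for t2.xlsx. The external-workbook path is deliberately left out because, as noted above, its stored form may differ. The file name `t1_demo.xlsx`, the sample data, the `C2` lookup cell and the two-column range are all hypothetical. openpyxl does not calculate the result; Excel does that when the file is opened.

```python
from openpyxl import Workbook, load_workbook

wb = Workbook()
sheet = wb.active
sheet.title = "Sheet1"

# Hypothetical lookup table on a sheet named "baza" (stands in for t2.xlsx).
baza = wb.create_sheet("baza")
baza.append(["key", "value"])
baza.append(["x", 10])
baza.append(["y", 20])

sheet["C2"] = "y"
# Commas between arguments, not the semicolons used in some Excel locales.
sheet["O2"] = "=VLOOKUP(C2,baza!$A$2:$B$3,2,0)"

wb.save("t1_demo.xlsx")

# Reloading without data_only=True returns the stored formula text.
print(load_workbook("t1_demo.xlsx")["Sheet1"]["O2"].value)
```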

How to open a .data file extension

I am working on a side project where the data is provided in a .data file. How do I open a .data file to see what the data looks like, and how do I read from a .data file programmatically in Python? I'm on Mac OS X.
NOTE: The data I am working with is from one of the KDD Cup challenges.

Try opening the file in a text editor such as Notepad or Gedit to check the delimiters (.data files are often plain text files too). Once you have confirmed the delimiter, you can use the read_csv function from the pandas library in Python.
import pandas as pd
file_path = "~/AI/datasets/wine/wine.data"
# above .data file is comma delimited
wine_data = pd.read_csv(file_path, delimiter=",")

It depends heavily on what is in it. It could be a binary file or it could be a text file.
If it is a text file then you can open it in the same way you open any file (f=open(filename,"r"))
If it is a binary file you can just add a "b" to the open command (open(filename,"rb")). There is an example here:
Reading binary file in Python and looping over each byte
Depending on the type of data in there, you might want to try passing it through a csv reader (csv python module) or an xml parsing library (an example of which is lxml)
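One way to tell the two cases apart before choosing a reader is a simple heuristic: text files almost never contain NUL bytes, while many binary formats do. The sketch below applies that heuristic (an assumption, not a guarantee) to the first few kilobytes, then previews either the first lines of text or the first raw bytes. The function names `looks_binary` and `preview` are hypothetical.

```python
def looks_binary(path, sample_size=8192):
    """Heuristic: treat the file as binary if its first bytes contain a NUL byte."""
    with open(path, "rb") as f:
        chunk = f.read(sample_size)
    return b"\x00" in chunk


def preview(path, n_lines=5):
    """Print the first few lines of a text file, or the first raw bytes of a binary one."""
    if looks_binary(path):
        with open(path, "rb") as f:
            print(f.read(64))  # show raw bytes instead of garbled text
    else:
        with open(path, "r", errors="replace") as f:
            # zip with range stops after n_lines without reading the whole file
            for _, line in zip(range(n_lines), f):
                print(line, end="")
```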
Based on the information above and the challenge page, the format is:
Data Format
The datasets use a format similar to the text export format of relational databases:
One header line with the variable names
One line per instance
Tab-separated values
Missing values appear as consecutive tabs
Therefore see this answer:
parsing a tab-separated file in Python
I would advise processing one line at a time rather than loading the whole file, but if you have the RAM, why not...
I suspect it doesn't open in Sublime because the file is huge, but that is just a guess.
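Following the format description and the one-line-at-a-time advice, a minimal sketch with the standard csv module could look like this. It assumes the header line holds the variable names and that values contain no quoting, hence `quoting=csv.QUOTE_NONE`. Consecutive tabs produce empty strings, which are mapped to `None` here. The generator name `iter_tab_rows` is hypothetical.

```python
import csv


def iter_tab_rows(path):
    """Yield one dict per instance from a tab-separated file with a header line."""
    with open(path, newline="") as f:
        # QUOTE_NONE: treat quote characters as ordinary data, not field delimiters
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:  # rows are read lazily, one line at a time
            # Consecutive tabs give empty strings; map them to None as "missing"
            yield {key: (value if value != "" else None) for key, value in row.items()}
```

Because it is a generator, `for row in iter_tab_rows("file.data"): ...` never holds more than one line in memory.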

To get a quick overview of what the file may contain, you could run strings or cat in a terminal, for example:
$ strings file.data
or
$ cat -v file.data
If you forget to pass the -v option to cat and the file is binary, the output can garble your terminal, in which case you need to reset it:
$ reset

I was just dealing with this issue myself, so I thought I would share my answer. I have a .data file and was unable to open it by simply right-clicking it. macOS recommended opening it with Xcode, so I tried that, but it did not work.
Next I tried opening it with a program named "Brackets", a text editor primarily used for HTML and CSS. Brackets did work.
I also tried PyCharm, as I am a Python programmer. PyCharm worked as well, and I was also able to read from the file with the following code:
with open("processed-1.cleveland.data", "r") as inf:
    # iterate over the file line by line; the with block closes it afterwards
    for line in inf:
        print(line, end="")

It works for me.
import pandas as pd
file_path = "your_file.data"  # hypothetical path; point this at your .data file
your_data = pd.read_csv(file_path, sep=',')
print(your_data.head())
In other words, just treat it as a CSV file if the values are separated by ','.
Solution from @mustious.

How to rename files and change the file type as well?

I have a list of .dbf files that I want to convert to .csv files. By hand I open them in Excel and re-save them as .csv, but this takes too much time.
Now I have made a script that changes the file name, but when I open the result it is still a .dbf file (although it is called .csv). How can I rename the files in such a way that the file type also changes?
My script does this (the dbf and csv file names are listed in a separate csv file):
import shutil
# lstRename holds (dbf_name, csv_name) pairs read from that separate csv file
for dbf_name, csv_name in lstRename:
    shutil.copyfile(dbf_name, csv_name)

Changing the name of a file (and the extension is just part of the complete name) has absolutely no effect on the contents of the file. You need to somehow convert the contents from one format to the other.
Using my dbf module and python it is quite simple:
import dbf
IN = 'some_file.dbf'
OUT = 'new_name.csv'
dbf.Table(IN).export(filename=OUT)
This will create a .csv file that is actually in csv format.
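If installing a package is not an option, the conversion can also be sketched with only the standard library, assuming a plain dBase III style file. That layout has a 32-byte header holding the record count, header length and record length; 32-byte field descriptors ending in a 0x0D byte; and fixed-width records whose first byte is `*` when deleted. This sketch ignores field types, memo files and code pages (it decodes with a guessed `latin-1`), so treat it as an illustration of converting the contents, not a replacement for the dbf module. The function name `dbf_to_csv` is hypothetical.

```python
import csv
import struct


def dbf_to_csv(dbf_path, csv_path, encoding="latin-1"):
    """Minimal dBase III reader: copy every non-deleted record into a CSV file."""
    with open(dbf_path, "rb") as f:
        header = f.read(32)
        # Bytes 4-11: record count (uint32), header length and record length (uint16)
        num_records, header_len, record_len = struct.unpack("<IHH", header[4:12])

        fields = []  # (name, width) for each column
        while True:
            desc = f.read(32)
            if not desc or desc[0] == 0x0D:  # 0x0D terminates the field list
                break
            name = desc[:11].split(b"\x00")[0].decode("ascii")
            fields.append((name, desc[16]))  # byte 16 holds the field width

        f.seek(header_len)  # records start right after the header
        with open(csv_path, "w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            writer.writerow([name for name, _ in fields])
            for _ in range(num_records):
                record = f.read(record_len)
                if len(record) < record_len:
                    break  # truncated file
                if record[:1] == b"*":  # '*' marks a deleted record
                    continue
                values, pos = [], 1  # byte 0 is the deletion flag
                for _, width in fields:
                    values.append(record[pos:pos + width].decode(encoding).strip())
                    pos += width
                writer.writerow(values)
```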

If you have ever used VB or looked into VBA, you can write a simple Excel macro that opens each file and saves it as CSV under a new name.
Use the macro recorder to record yourself doing it once, then edit the resulting script.
I have now created an application that automates this. It's called xlsto (look for the xls.exe release file). It lets you pick a folder and convert all xls files to csv (or any other type).

You need a converter.
Search Google for dbf2csv.

It depends on what you want to do. It seems like you want to convert files to other types. There are many converters out there, but a computer alone doesn't know every file type, so you will need to download some software for that. If all you want to do is change the file extension
(e.g. .png, .doc, .wav), you can configure your computer so that you can edit both the name and the extension. I hope this helped in some way :)

Download the dbfpy library from http://sourceforge.net/projects/dbfpy/?source=dlp (the code below is Python 2):
import csv, glob, os
from dbfpy import dbf
entrada = raw_input("enter your dbf folder :: ")
lisDbf = glob.glob(os.path.join(entrada, "*.dbf"))
for db in lisDbf:
    print db
    try:
        dbfFile = dbf.Dbf(open(db, 'rb'))  # DBF is a binary format
        csvFile = csv.writer(open(db[:-3] + "csv", 'wb'))  # 'wb' is correct for csv in Python 2
        headers = range(len(dbfFile.fieldNames))
        allRows = []
        for row in dbfFile:
            rows = []
            for num in headers:
                rows.append(row[num])
            allRows.append(rows)
        csvFile.writerow(dbfFile.fieldNames)
        for row in allRows:
            print row
            csvFile.writerow(row)
    except Exception, e:
        print e

It might be that the new file name is actually "xzy.csv.dbf". In C# I usually put quotes around the file name, which forces the OS to use the name exactly as given. Try entering "xzy.csv" in quotes.

Get the inputs from Excel and use those inputs in python script

How can I get inputs from an Excel file and use them in a Python script?

Take a look at xlrd (note that xlrd 2.0 and later only read the older .xls format).
This is the best reference I found for learning how to use it: http://www.dev-explorer.com/articles/excel-spreadsheets-and-python

Not sure if this is exactly what you're talking about, but:
If you have a very simple Excel file (i.e. basically just one table filled with string values, nothing fancy), and all you want to do is basic processing, then I'd suggest just converting it to a CSV (comma-separated values) file. This can be done with "Save As..." in Excel, selecting CSV.
This is just a file with the same data as the spreadsheet, except represented as lines of comma-separated values:
cell A:1, cell A:2, cell A:3
cell B:1, cell B:2, cell B:3
This is then very easy to parse using standard Python functions (i.e., readlines to get each line of the file, then split each line on ",").
This is of course only helpful in some situations, like when you get a log from a program and want to quickly run a Python script that handles it.
Note: As was pointed out in the comments, splitting the string on "," is actually not very good, since you run into all sorts of problems. Better to use the csv module (which another answer here teaches how to use).
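A small illustration of why splitting on "," breaks: as soon as a cell contains a comma, Excel quotes that field when exporting, and a naive split cuts it in two, while `csv.reader` respects the quotes. The sample data is made up.

```python
import csv
import io

# Hypothetical exported sheet in which one cell contains a comma (Excel quotes it).
data = 'name,comment\nAlice,"likes apples, pears"\n'

naive = [line.split(",") for line in data.splitlines()]
parsed = list(csv.reader(io.StringIO(data)))

print(naive[1])   # ['Alice', '"likes apples', ' pears"']
print(parsed[1])  # ['Alice', 'likes apples, pears']
```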

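import win32com.client
Excel = win32com.client.Dispatch("Excel.Application")
Excel.Workbooks.Open(r"C:\path\to\file.xlsx")  # hypothetical path; use an absolute path
Cells = Excel.ActiveWorkbook.ActiveSheet.Cells
row, column = 1, 1  # example: cell A1
Input = 42  # example value
Cells(row, column).Value = Input   # write a value into the cell
Output = Cells(row, column).Value  # read the value back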

If you can save it as a CSV file with headers:
Attrib1, Attrib2, Attrib3
value1.1, value1.2, value1.3
value2.1, ...
then I would highly recommend looking at the built-in csv module.
With it you can do things like:
import csv
with open("csvFile.csv", newline="") as f:
    # skipinitialspace drops the blank after each comma, so keys are "Attrib2", not " Attrib2"
    csvFile = csv.DictReader(f, skipinitialspace=True)
    for row in csvFile:
        print(row['Attrib1'], row['Attrib2'])
