Excel fails to open Python-generated CSV files - python

I have many Python scripts that output CSV files. It is occasionally convenient to open these files in Excel. After installing OS X Mavericks, Excel no longer opens these files properly: Excel doesn't parse the files and it duplicates the rows of the file until it runs out of memory. Specifically, when Excel attempts to open the file, a prompt appears that reads: "File not loaded completely."
Example of code I'm using to generate the CSV files:
import csv
with open('csv_test.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerow([1,2,3])
    writer.writerow([4,5,6])
Even the simple file generated by the above code fails to load in Excel. However, if I open the CSV file in a text editor and copy/paste the text into Excel, parse it with text to columns, and then save as CSV from Excel, then I can reopen the CSV file in Excel without issue. Do I need to pass an additional parameter in my scripts to make Excel parse the CSV files the same way it used to? Or is there some setting I can change in OS X Mavericks or Excel? Thanks.

I may have had a similar problem: the error message "SYLK: File format is not valid" appeared when opening a CSV file generated by Python. The cause is rather funny: the first two characters of the file must not be an uppercase I and D (ID). Also see "SYLK: File format is not valid" error message when you open file.
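A minimal sketch of one way to avoid the "ID" prefix, assuming Python 3 and assuming that lowercasing the first header is acceptable for your data. The helper name safe_header and the sample columns are hypothetical, not part of the original scripts:

```python
import csv
import io

def safe_header(header):
    """Return a copy of header whose first field does not start with uppercase 'ID'."""
    header = list(header)
    if header and str(header[0]).startswith("ID"):
        # Lowercase only the leading "ID" so the file no longer begins with those two bytes
        header[0] = "id" + str(header[0])[2:]
    return header

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(safe_header(["ID", "name"]))  # hypothetical column names
writer.writerow([1, "alice"])
csv_text = buf.getvalue()
```

Only the first field of the first row matters here, because the check described above applies to the very first characters of the file.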

Possible solution 1: use a *.txt extension instead of *.csv. In this case Excel (at least Excel 2010) will show you an import data wizard where you can specify delimiters, character encoding, field types, etc.
UPD: Solution 2:
The Python "csv" module has a "dialect" feature. For example, the following modification of your code generates a valid CSV file in my environment (Python 2.7, Excel 2010, Windows 7, locale with ";" as the list delimiter):
import csv
with open('csv_test2.csv', 'wb') as f:
    # Pass the delimiter to this writer instead of mutating the shared csv.excel class
    writer = csv.writer(f, dialect=csv.excel, delimiter=';')
    writer.writerow([1,2,3])
    writer.writerow([4,5,6])
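For readers on Python 3, a rough equivalent might look like the sketch below. It assumes the same ";" locale delimiter and writes to a temporary directory purely for illustration; in Python 3 the file is opened in text mode with newline='' rather than 'wb':

```python
import csv
import os
import tempfile

# Temporary location used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "csv_test2.csv")

# newline="" lets the csv module control line endings itself
with open(path, "w", newline="") as f:
    writer = csv.writer(f, dialect="excel", delimiter=";")
    writer.writerow([1, 2, 3])
    writer.writerow([4, 5, 6])

# Read the file back without newline translation to inspect what was written
with open(path, newline="") as f:
    content = f.read()
```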

Related

Python : read excel in python with default module

I have Python 2.6.6 and I don't have permission to install new modules like pandas, xlrd, or xlwt.
I want to read an Excel file using Python. Is it possible to do this with only the default modules that ship with Python?
Based on this link: http://davis.lbl.gov/Manuals/PYTHON/library/csv.html
you should be able to use the reader and writer functions by importing csv. You can also define different delimiters:
import csv
FileReader = csv.reader(open('FileName.csv', 'rb'), delimiter=' ',quotechar='|')
Moreover, you can map the data you are reading into a dict using DictReader.
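As a small illustration of DictReader, assuming Python 3 and an in-memory sample in place of a real file (the column names are made up for the example):

```python
import csv
import io

# In-memory stand-in for a CSV file whose first line is a header row
sample = io.StringIO("name,score\nalice,90\nbob,85\n")

# DictReader uses the header row as keys; every value is read as a string
rows = [dict(row) for row in csv.DictReader(sample)]
```

Each entry in rows is a dictionary keyed by column name, so a value can be looked up as rows[0]["name"].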
1) If you are using an IDE such as PyCharm, you can drag and drop the file into your workspace or project and open it by name --> open("myfile.csv"). It is read as a plain CSV (comma-separated values) text file.
2) Otherwise, use the complete path to the file, such as open("User/Desktop/myfile.csv")
In both cases use --> with open(.....) as f:
    for x in f:
        # write some code here
In this case x is one line (row) of the file as a string, not a single cell; split it on the delimiter or pass f to csv.reader to get the individual cells.

How to open a .data file extension

I am working on a side project where the data is provided in a .data file. How do I open a .data file to see what the data looks like, and how do I read from a .data file programmatically in Python? I am on Mac OS X.
NOTE: The Data I am working with is for one of the KDD cup challenges
Kindly try using Notepad or Gedit to check delimiters in the file (.data files are text files too). After you have confirmed this, then you can use the read_csv method in the Pandas library in python.
import pandas as pd
file_path = "~/AI/datasets/wine/wine.data"
# above .data file is comma delimited
wine_data = pd.read_csv(file_path, delimiter=",")
It depends entirely on what is in it. It could be a binary file or it could be a text file.
If it is a text file then you can open it in the same way you open any file (f=open(filename,"r"))
If it is a binary file you can just add a "b" to the open command (open(filename,"rb")). There is an example here:
Reading binary file in Python and looping over each byte
Depending on the type of data in there, you might want to try passing it through a csv reader (csv python module) or an xml parsing library (an example of which is lxml)
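If you are unsure which case applies, one rough heuristic (an assumption on my part, not a definitive test) is to read the first bytes in binary mode and look for NUL bytes, which rarely occur in text files. The helper name looks_binary and the sample files are hypothetical:

```python
import os
import tempfile

def looks_binary(path, sample_size=1024):
    """Heuristic: treat the file as binary if its first bytes contain a NUL byte."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    return b"\x00" in sample

# Two small sample files created only to demonstrate the helper
tmp_dir = tempfile.mkdtemp()
text_path = os.path.join(tmp_dir, "text.data")
binary_path = os.path.join(tmp_dir, "binary.data")
with open(text_path, "w") as f:
    f.write("1,2,3\n")
with open(binary_path, "wb") as f:
    f.write(b"\x00\x01\x02\xff")

text_result = looks_binary(text_path)
binary_result = looks_binary(binary_path)
```

This can misclassify some files (for example UTF-16 text contains NUL bytes), so treat the result as a hint only.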
After further info from above and looking at the page, the format is:
Data Format
The datasets use a format similar as that of the text export format from relational databases:
One header lines with the variables names
One line per instance
Separator tabulation between the values
There are missing values (consecutive tabulations)
Therefore see this answer:
parsing a tab-separated file in Python
I would advise trying to process one line at a time rather than loading the whole file, but if you have the ram why not...
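A sketch of processing such a file one line at a time with the csv module, assuming Python 3. The in-memory sample and variable names stand in for the real file, and mapping empty fields to None is my own choice for representing missing values:

```python
import csv
import io

# Stand-in for the real file: header line, tab separators, consecutive tabs for missing values
sample = io.StringIO("var1\tvar2\tvar3\n1\t\t3\n4\t5\t\n")

reader = csv.reader(sample, delimiter="\t")
header = next(reader)  # first line holds the variable names

records = []
for row in reader:  # rows are read lazily, one line at a time
    # An empty string between two tabs marks a missing value
    records.append({name: (value if value != "" else None)
                    for name, value in zip(header, row)})
```

With a real file, replace the StringIO object with open(filename, newline="") and handle each row inside the loop instead of collecting everything in a list.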
I suspect it doesn't open in Sublime because the file is huge, but that is just a guess.
To get a quick overview of what the file may contain, you could do this in a terminal using strings or cat, for example:
$ strings file.data
or
$ cat -v file.data
If you forget to pass the -v option to cat and the file is binary, you could mess up your terminal and need to reset it:
$ reset
I was just dealing with this issue myself, so I thought I would share my answer. I have a .data file and was unable to open it by simply right-clicking it. macOS recommended that I open it with Xcode, so I tried that, but it did not work.
Next I tried opening it with a program named "Brackets". It is a text editor primarily used for HTML and CSS. Brackets did work.
I also tried PyCharm, as I am a Python programmer. PyCharm worked as well, and I was also able to read from the file using the following lines of code:
# Open the file in text mode and close it automatically when done
with open("processed-1.cleveland.data", "r") as inf:
    for line in inf:
        print(line, end="")
It works for me.
import pandas as pd
# define your file path here
your_data = pd.read_csv(file_path, sep=',')
your_data.head()
In other words, just treat it as a CSV file if it is separated with ','.
This follows the solution from mustious.

How to convert a CSV to xlsx file in python

I am trying to convert a CSV to the xlsx file format because I have code that is meant to read an Excel file, but I ended up getting a CSV. Is there a way to convert a CSV file to a temporary Excel file and keep it from being destroyed until the reading process is done? I have tried using openpyxl, but it ends up not working and throws an error saying it's not a good zip file. I even tried converting the CSV to text and then storing it in a dictionary, but writing it to Excel using the xlrd package did not work either. I was wondering if there is a way do it in a cc
Seems like you open the file in text mode. Try this to open file
open('sample.csv', "rt", encoding="utf8")
or
open('sample.csv', "rt", encoding="ascii")
depending on the encoding of the file

how to convert Excel file to CSV and prevent UTF-8 encoding

I have 5 Excel files that have to be compiled into one csv file that can be uploaded to our website for our affiliated stores database. Until now we've had someone manually cut and paste the rows of each file into one master csv file in Excel then they upload that file to the website.
I've been trying to use Python to consolidate the files so the user would just have to run the Python script that would do this for her. The problem is that the Excel files are encoded in Shift-JIS and when I use CSV writer in Python they get converted to UTF-8. The website we upload them to will only accept files in Shift-JIS, so I have to keep all of this data in Shift-JIS.
Since DOS automatically defaults to ascii encoding, I first have to run this:
import codecs, sys, xlrd, csv
reload(sys)
sys.setdefaultencoding('shift_jis')
Here is a sample of the code for one of the Excel files, which has data on 2 separate worksheets:
with xlrd.open_workbook('Circle.xls') as wb:
    for sheet in wb.sheets():
        fn = 'store-'
        print "Converting files.."
        with open(fn + sheet.name + ".csv","wb") as f:
            c = csv.writer(f,dialect="excel")
            for r in range(sheet.nrows):
                c.writerow(sheet.row_values(r))
The conversion runs until it finds a UTF-8 character that doesn't exist in shift-JIS, then it errors out.
Is there a way to convert from Excel to a csv purely in shift-JIS?
(If my question has a flaw, please ask me to edit it before marking it down! I will edit it!)
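One possible direction, sketched under the assumption that Python 3 is available (the code above is Python 2): open the output file with encoding="shift_jis" so the csv module writes Shift-JIS directly, and choose an errors policy for characters that have no Shift-JIS equivalent. The sample rows and file name below are made up, and errors="replace" is only one option; it silently substitutes "?" for such characters:

```python
import csv
import os
import tempfile

# Made-up rows: the first cell is encodable in Shift-JIS, "é" in the second is not
rows = [["店舗", "Café"]]

path = os.path.join(tempfile.mkdtemp(), "store-sample.csv")

# errors="replace" substitutes "?" for characters missing from Shift-JIS
with open(path, "w", encoding="shift_jis", errors="replace", newline="") as f:
    csv.writer(f, dialect="excel").writerows(rows)

# Read the raw bytes back and decode them to confirm the file is valid Shift-JIS
with open(path, "rb") as f:
    raw = f.read()
decoded = raw.decode("shift_jis")
```

Leaving errors at its default ("strict") would instead raise UnicodeEncodeError on the first unencodable character, which may be preferable if silent data loss is not acceptable.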

Csv blank rows problem with Excel

I have a csv file which contains rows from a sqlite3 database. I wrote the rows to the csv file using python.
When I open the CSV file with MS Excel, a blank row appears below every row, but the file looks fine in Notepad (without any blank rows).
Does anyone know why this is happening and how I can fix it?
Edit: I used the strip() function for all the attributes before writing a row.
Thanks.
You're using open('file.csv', 'w'); try open('file.csv', 'wb') instead.
In Python 2, the csv module requires output files to be opened in binary mode. In Python 3, open the file with open('file.csv', 'w', newline='') instead.
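The effect can be reproduced on any platform with the sketch below, which assumes Python 3 and uses io.TextIOWrapper with newline="\r\n" to imitate Windows text-mode newline translation. The helper name write_rows is hypothetical:

```python
import csv
import io

def write_rows(newline):
    """Write one CSV row through a text wrapper and return the raw bytes produced."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    csv.writer(text).writerow([1, 2])
    text.flush()
    return raw.getvalue()

# newline="\r\n" imitates Windows text mode: every "\n" written becomes "\r\n"
windows_text_mode = write_rows("\r\n")

# newline="" disables translation, so the csv module's own "\r\n" is written unchanged
recommended = write_rows("")
```

The csv writer already ends each row with "\r\n", so translating the "\n" again yields "\r\r\n", which is the likely source of the extra blank row in Excel.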
The first thing that comes to mind (just an idea) is that you might have used "\r\n" as the row delimiter (which is shown as one line break in Notepad), but Excel expects only "\n" or only "\r", and so it interprets this as two line breaks.
