I have Python 2.6.6 version and I don't have access to install new modules like pandas,xlrd,xlwt.
I want to read Excel using Python . Is it possible to read Excel using Python with default modules present in Python.
Based on this link: http://davis.lbl.gov/Manuals/PYTHON/library/csv.html
you should be able to use reader and writer commands by importing csv. You can also define different delimiters:
import csv
FileReader = csv.reader(open('FileName.csv', 'rb'), delimiter=' ',quotechar='|')
Moreover, you can map the data you are reading into a dict using DictReader.
1) if you are using an IDE ex. PyCharm then you can drag an drop the file in your workspace or project and open it like this --> open("myfile.csv") it will be opened as a csv (comma seperated value)
2) otherwise, use --> the file complete path such as open("User/Desktop/myfile.csv")
in both cases use --> with open(.....) as f:
for x in f:
write some code here..
and in this case x represents an element or cell in your excel file
Related
So I have a list of lists.
For example:
l = [[1,2,3],['a','b','c'],['a,b','d,c','a,e']]
I am having trouble exporting this properly because I need the two letters in the same cell, so if I use commas as a delimiter, it separates them. I have attached a photo of what i would want it to look like. I have already tried a few things but I am not sure how to do this. Any help would be great.
Export it as CSV first, which you can then open and save as a normal Excel workbook.
You use the built-in csv module to do that:
import csv
with open(file_name, 'w') as f:
fc = csv.writer(f, lineterminator='\n')
fc.writerows(l)
I am working on side stuff where the data provided is in a .data file. How do I open a .data file to see what the data looks like and also how do I read from a .data file programmatically through python? I have Mac OSX
NOTE: The Data I am working with is for one of the KDD cup challenges
Kindly try using Notepad or Gedit to check delimiters in the file (.data files are text files too). After you have confirmed this, then you can use the read_csv method in the Pandas library in python.
import pandas as pd
file_path = "~/AI/datasets/wine/wine.data"
# above .data file is comma delimited
wine_data = pd.read_csv(file_path, delimiter=",")
It vastly depends on what is in it. It could be a binary file or it could be a text file.
If it is a text file then you can open it in the same way you open any file (f=open(filename,"r"))
If it is a binary file you can just add a "b" to the open command (open(filename,"rb")). There is an example here:
Reading binary file in Python and looping over each byte
Depending on the type of data in there, you might want to try passing it through a csv reader (csv python module) or an xml parsing library (an example of which is lxml)
After further into from above and looking at the page the format is:
Data Format
The datasets use a format similar as that of the text export format from relational databases:
One header lines with the variables names
One line per instance
Separator tabulation between the values
There are missing values (consecutive tabulations)
Therefore see this answer:
parsing a tab-separated file in Python
I would advise trying to process one line at a time rather than loading the whole file, but if you have the ram why not...
I suspect it doesnt open in sublime because the file is huge, but that is just a guess.
To get a quick overview of what the file may content you could do this within a terminal, using strings or cat, for example:
$ strings file.data
or
$ cat -v file.data
In case you forget to pass the -v option to cat and if is a binary file you could mess your terminal and therefore need to reset it:
$ reset
I was just dealing with this issue myself so I thought I would share my answer. I have a .data file and was unable to open it by simply right clicking it. MACOS recommended I open it using Xcode so I tried it but it did not work.
Next I tried open it using a program named "Brackets". It is a text editing program primarily used for HTML and CSS. Brackets did work.
I also tried PyCharm as I am a Python Programmer. Pycharm worked as well and I was also able to read from the file using the following lines of code:
inf = open("processed-1.cleveland.data", "r")
lines = inf.readlines()
for line in lines:
print(line, end="")
It works for me.
import pandas as pd
# define your file path here
your_data = pd.read_csv(file_path, sep=',')
your_data.head()
I mean that just take it as a csv file if it is seprated with ','.
solution from #mustious.
I have many Python scripts that output CSV files. It is occasionally convenient to open these files in Excel. After installing OS X Mavericks, Excel no longer opens these files properly: Excel doesn't parse the files and it duplicates the rows of the file until it runs out of memory. Specifically, when Excel attempts to open the file, a prompt appears that reads: "File not loaded completely."
Example of code I'm using to generate the CSV files:
import csv
with open('csv_test.csv', 'wb') as f:
writer = csv.writer(f)
writer.writerow([1,2,3])
writer.writerow([4,5,6])
Even the simple file generated by the above code fails to load in Excel. However, if I open the CSV file in a text editor and copy/paste the text into Excel, parse it with text to columns, and then save as CSV from Excel, then I can reopen the CSV file in Excel without issue. Do I need to pass an additional parameter in my scripts to make Excel parse the CSV files the same way it used to? Or is there some setting I can change in OS X Mavericks or Excel? Thanks.
Maybe I had the similar problem, the error message "SYLK: File format is not valid" when open python autogenerated csv file. The solution is really funny. The first two characters must not be I and D in uppercase (ID). Also see "SYLK: File format is not valid" error message when you open file.
Possible solution1: use *.txt instead of *.csv. In this case Excel (at least, 2010) will show you an import data wizard where you can specify delimiters, character encoding, field types, etc.
UPD: Solution2:
The python "csv" module has a "dialect" feature. For example, the following modification of your code generates valid csv file for my environment (Python 2.7, Excel 2010, Windows7, locale with ";" list delimiters):
import csv
with open('csv_test2.csv', 'wb') as f:
csv.excel.delimiter=';'
writer = csv.writer(f, dialect=csv.excel)
writer.writerow([1,2,3])
writer.writerow([4,5,6])
I have a list with .dbf files which I want to change to .csv files. By hand I open them in excel and re-save them as .csv, but this takes too much time.
Now I made a script which changes the file name, but when I open it, it is still a .dbf file type (although it is called .csv). How can I rename the files in such a way that the file type also changes?
My script uses (the dbf and csv file name are listed in a seperate csv file):
IN = dbffile name
OUT = csvfile name
for output_line in lstRename:
shutil.copyfile(IN,OUT)
Changing the name of a file (and the extension is just part of the complete name) has absolutely no effect on the contents of the file. You need to somehow convert the contents from one format to the other.
Using my dbf module and python it is quite simple:
import dbf
IN = 'some_file.dbf'
OUT = 'new_name.csv'
dbf.Table(IN).export(filename=OUT)
This will create a .csv file that is actually in csv format.
If you have ever used VB or looked into VBA, you can write a simple excel script to open each file, save it as csv and then save it with a new name.
Use the macro recorder to record you once doing it yourself and then edit the resulting script.
I have now created a application that automates this. Its called xlsto (look for the xls.exe release file). It allows you to pick a folder and convert all xls files to csv (or any other type).
You need a converter
Search for dbf2csv in google.
It depends what you want to do. It seems like you want to convert files to other types. There are many converters out there, but a computer alone doesn't know every file type. For that you will need to download some software. If all you want to do is change the file extension,
(ex. .png, .doc, .wav) then you can set your computer to be able to change both the name and the extension. I hoped I helped in some way :)
descargar libreria dbfpy desde http://sourceforge.net/projects/dbfpy/?source=dlp
import csv,glob
from dbfpy import dbf
entrada = raw_input(" entresucarpetadbf ::")
lisDbf = glob.glob(entrada + "\\*dbf")
for db in lisDbf:
print db
try:
dbfFile = dbf.Dbf(open(db,'r'))
csvFile = csv.writer(open(db[:-3] + "csv", 'wb'))
headers = range(len(dbfFile.fieldNames))
allRows = []
for row in dbfFile:
rows = []
for num in headers:
rows.append(row[num])
allRows.append(rows)
csvFile.writerow(dbfFile.fieldNames)
for row in allRows:
print row
csvFile.writerow(row)
except Exception,e:
print e
It might be that the new file name is "xzy.csv.dbf". Usually in C# I put quotes in the filename. This forces the OS to change the filename. Try something like "Xzy.csv" in quotes.
[Please note that this is a different question from the already answered How to replace a column using Python’s built-in .csv writer module?]
I need to do a find and replace (specific to one column of URLs) in a huge Excel .csv file. Since I'm in the beginning stages of trying to teach myself a scripting language, I figured I'd try to implement the solution in python.
I'm having trouble when I try to write back to a .csv file after making a change to the contents of an entry. I've read the official csv module documentation about how to use the writer, but there isn't an example that covers this case. Specifically, I am trying to get the read, replace, and write operations accomplished in one loop. However, one cannot use the same 'row' reference in both the for loop's argument and as the parameter for writer.writerow(). So, once I've made the change in the for loop, how should I write back to the file?
edit: I implemented the suggestions from S. Lott and Jimmy, still the same result
edit #2: I added the "rb" and "wb" to the open() functions, per S. Lott's suggestion
import csv
#filename = 'C:/Documents and Settings/username/My Documents/PALTemplateData.xls'
csvfile = open("PALTemplateData.csv","rb")
csvout = open("PALTemplateDataOUT.csv","wb")
reader = csv.reader(csvfile)
writer = csv.writer(csvout)
changed = 0;
for row in reader:
row[-1] = row[-1].replace('/?', '?')
writer.writerow(row) #this is the line that's causing issues
changed=changed+1
print('Total URLs changed:', changed)
edit: For your reference, this is the new full traceback from the interpreter:
Traceback (most recent call last):
File "C:\Documents and Settings\g41092\My Documents\palScript.py", line 13, in <module>
for row in reader:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
You cannot read and write the same file.
source = open("PALTemplateData.csv","rb")
reader = csv.reader(source , dialect)
target = open("AnotherFile.csv","wb")
writer = csv.writer(target , dialect)
The normal approach to ALL file manipulation is to create a modified COPY of the original file. Don't try to update files in place. It's just a bad plan.
Edit
In the lines
source = open("PALTemplateData.csv","rb")
target = open("AnotherFile.csv","wb")
The "rb" and "wb" are absolutely required. Every time you ignore those, you open the file for reading in the wrong format.
You must use "rb" to read a .CSV file. There is no choice with Python 2.x. With Python 3.x, you can omit this, but use "r" explicitly to make it clear.
You must use "wb" to write a .CSV file. There is no choice with Python 2.x. With Python 3.x, you must use "w".
Edit
It appears you are using Python3. You'll need to drop the "b" from "rb" and "wb".
Read this: http://docs.python.org/3.0/library/functions.html#open
Opening csv files as binary is just wrong. CSV are normal text files so You need to open them with
source = open("PALTemplateData.csv","r")
target = open("AnotherFile.csv","w")
The error
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
comes because You are opening them in binary mode.
When I was opening excel csv's with python, I used something like:
try: # checking if file exists
f = csv.reader(open(filepath, "r", encoding="cp1250"), delimiter=";", quotechar='"')
except IOError:
f = []
for record in f:
# do something with record
and it worked rather fast (I was opening two about 10MB each csv files, though I did this with python 2.6, not the 3.0 version).
There are few working modules for working with excel csv files from within python - pyExcelerator is one of them.
the problem is you're trying to write to the same file you're reading from. write to a different file and then rename it after deleting the original.