I haven't been able to apply re.sub to a CSV file.
My expression is doing its job, but the writerow call is where I'm stuck.
Input (left) and desired re.sub output (right):
"A1","Address2" "A1","Address2"
0138,"DEERFIELD AVE" 0138,"DEERFIELD"
0490,"REMMINGTON COURT" 0490,"REMMINGTON"
2039,"SANDHILL DR" 2039,"SANDHILL"
import csv
import re
with open('aa_street.txt', 'rb') as f:
    reader = csv.reader(f)
    read=csv.reader(f)
    for row in read:
        row_one = re.sub('\s+(DR|COURT|AVE|)\s*$', ' ', row[1])
        row_zero = row[0]
        print row_one
    for row in reader:
        print writerow([row[0],row[1]])
Perhaps something like this is what you need?
#!/usr/local/cpython-3.3/bin/python
# "A1","Address2" "A1","Address2"
# 0138,"DEERFIELD AVE" 0138,"DEERFIELD"
# 0490,"REMMINGTON COURT" 0490,"REMMINGTON"
# 2039,"SANDHILL DR" 2039,"SANDHILL"
import re
import csv
# newline='' lets the csv module handle line endings itself (Python 3)
with open('aa_street.txt', 'r', newline='') as infile, open('actual-output', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        row_zero = row[0]
        # Strip a trailing DR, COURT or AVE (and surrounding whitespace) from the street name
        row_one = re.sub(r'\s+(DR|COURT|AVE|)\s*$', '', row[1])
        writer.writerow([row_zero, row_one])
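To check the substitution without touching any files, here is a minimal sketch that runs it over an in-memory copy of the sample rows. SAMPLE, SUFFIX and strip_suffix are names made up for this illustration, and the pattern drops the empty alternative from the original expression, on the assumption that only the three listed suffixes should be removed.

```python
import csv
import io
import re

# Hypothetical in-memory copy of the sample rows from the question.
SAMPLE = (
    '"A1","Address2"\n'
    '0138,"DEERFIELD AVE"\n'
    '0490,"REMMINGTON COURT"\n'
    '2039,"SANDHILL DR"\n'
)

# Assumption: only DR, COURT and AVE at the end of the field are removed.
SUFFIX = re.compile(r'\s+(DR|COURT|AVE)\s*$')


def strip_suffix(rows):
    """Yield [id, street] rows with a trailing street-type suffix removed."""
    for row in rows:
        yield [row[0], SUFFIX.sub('', row[1])]


cleaned = list(strip_suffix(csv.reader(io.StringIO(SAMPLE))))
for row in cleaned:
    print(row)
```

The header row passes through unchanged because "Address2" contains no whitespace for the pattern to match.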
A file is an iterator—you iterate over it once, and then it's empty.
A csv.reader is also an iterator.
In general, if you want to reuse an iterator, there are three ways to do it:
1. Re-generate the iterator (and, if its source was an iterator, re-generate that as well, and so on up the chain). In this case, that means opening the file again.
2. Use itertools.tee.
3. Copy the iterator into a sequence and reuse that instead.
In the special case of files, you can fake #1 by using f.seek(0). Some other iterators have similar behavior. But in general, you shouldn't rely on this.
Anyway, the last one is the easiest, so let's just see how that works:
reader = list(csv.reader(f))
read = reader
Now you've got a list of all of the rows in the file. You can copy it, loop over it, loop over the copy, close the file, and loop over the copy again; it's still there.
Of course, the downside is that you need enough memory to hold the whole file (plus, you can't start processing the first line until you've finished reading the last one). If that's a problem, you need to either reorganize your code so it only needs one pass, or re-open (or seek) the file.
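A small sketch of the exhaustion problem, followed by the list and itertools.tee options described above. It assumes Python 3 and uses io.StringIO with made-up data in place of a real file; DATA and the variable names are hypothetical.

```python
import csv
import io
import itertools

DATA = "a,1\nb,2\nc,3\n"  # hypothetical sample data

f = io.StringIO(DATA)
reader = csv.reader(f)
first_pass = [row[0] for row in reader]
second_pass = [row[0] for row in reader]  # reader is already exhausted

# Copy the rows into a list and loop over it as often as needed.
f.seek(0)
rows = list(csv.reader(f))
ids = [row[0] for row in rows]
values = [row[1] for row in rows]

# itertools.tee gives two independent iterators over one source.
f.seek(0)
left, right = itertools.tee(csv.reader(f))
tee_ids = [row[0] for row in left]
tee_values = [row[1] for row in right]

print(first_pass, second_pass)
print(ids, values)
```

Note that when one tee branch is consumed completely before the other, tee has to buffer every row internally, so it saves no memory compared with a list in that usage.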
I've looked at all the threads I can find, but I still can't figure it out. I'm having all sorts of issues.
The first issue is that I can't change the items in a list to lowercase, so I have to convert the list to a string first. Once I do that, I can't append the string back into the list without creating a nested list. Why can't I simply change a list to lowercase, delete the contents of the CSV, then write the lowercase list back in?
Here is my latest attempt, but I've tried many things.
with open(teacherDD, 'r+') as f:
    read = csv.reader(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
    for row in read:
        copyRow = row.copy()
        # print(copyRow)
        del row[:]
        # print(row)
        getLowerStr = str(copyRow).lower()
        # appendLower = row.append(getLowerStr)
        # print(getLowerStr)
        print(row)
        f.write(getLowerStr)
    f.close()
If you just want to convert to lowercase, why use the csv reader?
You can use the fileinput module to edit lines in place.
Python 3.x:
import fileinput
for line in fileinput.input("test.txt", inplace=1):
    print(line.lower(), end='')
Python 2.x:
import fileinput
import sys
for line in fileinput.input("test.txt", inplace=1):
    sys.stdout.write(line.lower())
One cool feature of this module is that, when you open a file with inplace=1, anything written to standard output with print or sys.stdout.write is redirected into the file.
Sample input:
UPPER,CASE,ROW
Output:
upper,case,row
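If the csv module is still wanted, for example because some cells are quoted, the list can be lowercased cell by cell with a list comprehension instead of converting the whole row to a string. This is only a sketch under the assumption of Python 3; it works on an in-memory sample (SAMPLE is made up) rather than on the asker's file.

```python
import csv
import io

# Hypothetical sample; the second row has a quoted cell containing a comma.
SAMPLE = 'UPPER,CASE,ROW\nMixed,"Quoted, Cell",X\n'

# Lowercase every cell of every row; each row stays a flat list of strings.
rows = [[cell.lower() for cell in row]
        for row in csv.reader(io.StringIO(SAMPLE))]

# Write the lowercased rows back out in CSV form.
out = io.StringIO()
csv.writer(out, lineterminator='\n').writerows(rows)
result = out.getvalue()
print(result, end='')
```

With a real file, the same two steps would be to read all rows first, then reopen the file in write mode and call writerows(rows).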
I have looked at previous answers to this question, but in each of those cases the askers were doing something specific with the file. The problem occurs for me even when I am not.
I have a .csv file of 27,204 rows. When I open the python interpreter:
python
import csv
o = open('btc_usd1hour.csv','r')
p = csv.reader(o)
for row in p:
    print(row)
I then only see roughly the last third of the document displayed to me.
Try this; it works for me:
import csv
name = 'btc_usd1hour.csv'  # file name taken from the question
with open(name) as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row)
reference:
https://docs.python.org/3.6/library/csv.html#csv.DictReader
Try the following code:
import csv
fname = 'btc_usd1hour.csv'
with open(fname, newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
It is difficult to tell what the problem is without a sample of the file. My guess is that the problem goes away if you add newline='' when opening the file.
Use the with construct to close the file automatically. Use the name f for a file object when no further explanation is needed. Store the file name in fname to make future modifications easier (and to make it easy to copy the code fragment into your later programs).
olisch may be right that the console just scrolled so fast you could not see the result. You can write the result to another text file like this:
with open(fname, newline='') as fin, \
     open('output.txt', 'w') as fout:
    reader = csv.reader(fin)
    for row in reader:
        fout.write(repr(row) + '\n')
The repr function converts the row list into its string representation. For a list, print produces that same text, so the file will contain what you would otherwise see on screen.
Maybe your scrollback buffer is just too short to show the whole list?
In general your csv.reader call should work fine, unless your 27k rows are so extremely long that you hit some 64-bit boundary, which would be quite uncommon.
Counting the rows the reader yields, for example with len(list(p)), might be interesting to see.
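A file object does not support len(), so one way to check how many rows the reader really yields is to count them. The sketch below assumes Python 3 and uses generated in-memory data instead of btc_usd1hour.csv; count_rows and the column names are made up for the example.

```python
import csv
import io


def count_rows(lines):
    """Count the rows csv.reader yields without keeping them in memory."""
    return sum(1 for _ in csv.reader(lines))


# Hypothetical stand-in for the real file: one header plus 27,203 data rows.
sample = io.StringIO(
    "hour,price\n" + "".join(f"{i},{i * 2}\n" for i in range(27203))
)
total = count_rows(sample)
print(total)
```

If the count matches the expected number of rows, the reader saw everything and the missing output is a display issue such as a short scrollback buffer.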
This:
import csv
with open('original.csv', 'rb') as inp, open('new.csv', 'wb') as out:
    writer = csv.writer(out)
    for row in csv.reader(inp):
        if row[2] != "0":
            writer.writerow(row)
os.remove('original.csv')
os.rename('new.csv', 'original.csv')
lets you delete certain rows of a CSV file.
Is there a more pythonic way to delete some rows of a CSV file, in-place? (instead of creating a file, deleting the original, renaming, etc.)
There isn't a more Pythonic way: you can't delete stuff in the middle of a file. Write out a new file with the stuff you want, and then rename it.
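The write-then-rename approach can be wrapped in a small helper. This is a sketch, not a definitive implementation: it assumes Python 3, and drop_rows and its keep parameter are hypothetical names. It writes to a temporary file in the same directory and then swaps it in with os.replace.

```python
import csv
import os
import tempfile


def drop_rows(path, keep):
    """Rewrite the CSV at `path`, keeping only rows where keep(row) is true."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, newline='') as src, \
         tempfile.NamedTemporaryFile('w', newline='', dir=directory,
                                     delete=False) as tmp:
        writer = csv.writer(tmp)
        writer.writerows(row for row in csv.reader(src) if keep(row))
    # Both files are closed here; swap the filtered copy into place.
    os.replace(tmp.name, path)


if __name__ == '__main__':
    # Small demonstration on a throwaway file.
    demo = os.path.join(tempfile.mkdtemp(), 'original.csv')
    with open(demo, 'w', newline='') as f:
        csv.writer(f).writerows([['a', 'b', '0'], ['c', 'd', '1']])
    drop_rows(demo, lambda row: row[2] != "0")
    with open(demo, newline='') as f:
        print(list(csv.reader(f)))
```

Keeping the temporary file in the same directory means the final rename stays on one filesystem, so the original is either fully replaced or left untouched.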
I noticed that your code does not import the os module, even though you're using it. Regardless, here's a method of doing what you need it to do without using that module.
This will open the file in read mode first to get the data, then in write mode to overwrite it. Note that you need to pass csv.reader(f) to list(); otherwise data would just be a lazy reader tied to the open file, and you could not read anything from it once the file is closed. list() actually copies the rows into memory.
import csv
with open("original.csv", "rb") as f:
    data = list(csv.reader(f))
with open("original.csv", "wb") as f:
    writer = csv.writer(f)
    for row in data:
        if row[2] != "0":
            writer.writerow(row)
I'm new to python.
I have a list with 19188 rows that I want to save as a csv.
When I write the list's rows to the csv, it does not have the last rows (it stops at 19112).
Do you have any idea what might cause this?
Here is how I write to the csv:
mycsvfile = open('file.csv', 'w')
thedatawriter = csv.writer(mycsvfile, lineterminator = '\n')
list = []
#list creation code
thedatawriter.writerows(list)
Each row of list has 4 string elements.
Another piece of information:
If I create a list that contains only the last elements that are missing and add them to the csv file, it kind of works (it is added, but twice...).
mycsvfile = open('file.csv', 'w')
thedatawriter = csv.writer(mycsvfile, lineterminator = '\n')
list = []
#list creation code
thedatawriter.writerows(list)
list_end = []
#list_end creation code
thedatawriter.writerows(list_end)
If I try to add the list_end alone, it doesn't seem to be working. I'm thinking there might be a csv writing parameter that I got wrong.
Another piece of information:
If I open the file with ", newline=''" added, then more rows are written to it (though still not all):
mycsvfile = open('file.csv', 'w', newline='')
There must be a simple mistake in the way I open or write to the csv (or in the dialect?)
Thanks for your help!
I found my answer! I was not closing the file handle before the script ended, which left some rows unwritten.
Here is the fix:
with open('file.csv', 'w', newline='') as mycsvfile:
    thedatawriter = csv.writer(mycsvfile, lineterminator = '\n')
    thedatawriter.writerows(list)
See: Writing to CSV from list, write.row seems to stop in a strange place
Close the filehandle before the script ends. Closing the filehandle
will also flush any strings waiting to be written. If you don't flush
and the script ends, some output may never get written.
Using the with open(...) as f syntax is useful because it will close
the file for you when Python leaves the with-suite. With with, you'll
never omit closing a file again.
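A quick way to confirm that nothing is lost is to write the rows inside a with block and count them after reading the file back. The sketch assumes Python 3; the row contents and the temporary path are made up, and only the row count of 19188 comes from the question.

```python
import csv
import os
import tempfile

# Hypothetical data: 19188 rows of 4 string elements each.
rows = [[str(i), 'a', 'b', 'c'] for i in range(19188)]
path = os.path.join(tempfile.mkdtemp(), 'file.csv')

# The file is flushed and closed when the with block ends.
with open(path, 'w', newline='') as mycsvfile:
    csv.writer(mycsvfile, lineterminator='\n').writerows(rows)

# Read the file back and count what actually reached the disk.
with open(path, newline='') as check:
    written = sum(1 for _ in csv.reader(check))
print(written)
```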
I want to read two columns of a CSV file separately, but with the code below Python shows only the first column and nothing for the second, even though the second column also has lots of rows in the CSV file.
import csv
import pprint
f = open("arachnid.csv", 'r')
read = csv.DictReader(f)
for i in range(3):
    read.next()
for i in read:
    pprint.pprint(i["binomialAuthority_label"])
for i in read:
    pprint.pprint(i["rdf-schema#label"])
The reason is that DictReader, used this way, is an iterator. Once you have iterated over it, you cannot simply iterate over it again.
If you want to keep your logic as is, you can call seek(0) on your file object to reset its position:
f.seek(0)
The next time you iterate over your DictReader object, it will yield rows again. Note that the header line then comes back as the first data row, because the reader has already consumed its field names. So the part of your code of interest would be this:
for i in read:
    pprint.pprint(i["binomialAuthority_label"])
# This is where you set your seek(0) before the second loop
f.seek(0)
for i in read:
    pprint.pprint(i['rdf-schema#label'])
Your DictReader instance gets exhausted after your first for i in read: loop, so when you try to do your second loop, there is nothing to iterate over.
Once you've iterated over the CSV the first time, seek your file back to the start, create a new DictReader instance, and start again. If you don't create a new DictReader instance, you'll need to skip the header line manually.
f = open(filename)
read = csv.DictReader(f)
for i in read:
    print i
f.seek(0)
read = csv.DictReader(f)
for i in read:
    print i
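When the file fits in memory, another option is to read it once and pull both columns out of the stored rows, which avoids seeking altogether. This is a sketch assuming Python 3; the sample values are invented, and only the two column names come from the question.

```python
import csv
import io

# Hypothetical sample standing in for arachnid.csv.
SAMPLE = (
    "binomialAuthority_label,rdf-schema#label\n"
    "Linnaeus,Araneus\n"
    "Clerck,Argiope\n"
)

# A single pass over the file; the rows are kept as a list of dicts.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Each column can now be extracted as often as needed.
authorities = [row["binomialAuthority_label"] for row in rows]
labels = [row["rdf-schema#label"] for row in rows]

print(authorities)
print(labels)
```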