I have looked at previous answers to this question, but in each of those scenarios the questioners were asking about something specific they were doing with the file, but the problem occurs for me even when I am not.
I have a .csv file of 27,204 rows. When I open the python interpreter:
python
import csv
o = open('btc_usd1hour.csv','r')
p = csv.reader(o)
for row in p:
print(row)
I then only see roughly the last third of the document displayed to me.
Try so, at me works:
with open(name) as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
print(row)
reference:
https://docs.python.org/3.6/library/csv.html#csv.DictReader
Try the following code
import csv
fname = 'btc_usd1hour.csv'
with open(fname, newline='') as f:
reader = csv.reader(f)
for row in reader:
print(row)
It is difficult to tell what is the problem without having the sample. I guess the problem would be removed if you add that newline='' for opening the file.
Use the with construct to close the file automatically. Use the f name for a file object when no further explanation is needed. Store the file name to fname to make future modifications easier (and also for easy copying the code fragment for your later programs).
olisch may be right that the console just scrolled so fast you could not see the result. You can write the result to another text file like this:
with open(fname, newline='') as fin,\
open('output.txt', 'w') as fout:
reader = csv.reader(fin)
for row in reader:
fout.write(repr(row) + '\n')
The repr function converts the row list into its string representation. The print calls that function internally, so you will have the same result that you otherwise observe on screen.
maybe your scrollback buffer is just to short to see the whole list?
In general your csv.reader call should be working fine, except your 27k rows aren't extremly long so that you might be able to hit any 64bit boundaries, which would be quite uncommon.
len(o) might be interesting to see.
Related
I have seen similar posts to this but they all seem to be print statements (viewing the cleaned data) rather than overwriting the original csv with the cleaned data so I am stuck. When I tried to write back to the csv myself, it just deleted everything in the file. Here is the format of the csv:
30;"unemployed";"married";"primary";"no";1787;"no";"no";"cellular";19;"oct";79;1;-1;0;"unknown";"no"
33;"services";"married";"secondary";"no";4747;"yes";"cellular";11;"may";110;1;339;2;"failure";"no"
35;"management";"single";"tertiary";"no";1470;"yes";"no";"cellular";12;"apr"185;1;330;1;"failure";"no"
It is delimited by semicolons, which is fine, but all text is wrapped in quotes and I only want to remove the quotes and write back to the file. Here is the code I reverted back to that successfully reads the file, removes all quotes, and then prints the results:
import csv
f = open("bank.csv", 'r')
try:
for row in csv.reader(f, delimiter=';', skipinitialspace=True):
print(' '.join(row))
finally:
f.close()
Any help on properly writing back to the csv would be appreciated, thanks!
See here: Python CSV: Remove quotes from value
I've done this basically two different ways, depending on the size of the csv.
You can read the entire csv into a python object (list), do some things and then
overwrite the other existing file with the cleaned version
As in the link above, you can use one reader and one writer, Create a new file, and write line by-line as you clean the input from the csv reader, delete the original csv and rename the new one to replace the old file.
In my opinion option #2 is vastly preferable as it avoids the possibility of data loss if your script has an error part way through writing. It also will have lower memory usage.
Finally: It may be possible to open a file as read/write, and iterate line-by-line overwriting as you go: But that will leave you open to half of your file having quotes, and half not if your script crashes part way through.
You could do something like this. Read it in, and write using quoting=csv.QUOTE_NONE
import csv
f = open("bank.csv", 'r')
inputCSV = []
try:
for row in csv.reader(f, delimiter=';', skipinitialspace=True):
inputCSV.append(row)
finally:
f.close()
with open('bank.csv', 'w', newline='') as csvfile:
csvwriter = csv.writer(csvfile, delimiter=';')
for row in inputCSV:
csvwriter.writerow(row)
I wanted to read some input from the csv file and then modify the input and replace it with the new value. For this purpose, I first read the value but then I'm stuck at this point as I want to modify all the values present in the file.
So is it possible to open the file in r mode in one for loop and then immediately in w mode in another loop to enter the modified data?
If there is a simpler way to do this please help me out
Thank you.
Yes, you can open the same file in different modes in the same program. Just be sure not to do it at the same time. For example, this is perfectly valid:
with open("data.csv") as f:
# read data into a data structure (list, dictionary, etc.)
# process lines here if you can do it line by line
# process data here as needed (replacing your values etc.)
# now open the same filename again for writing
# the main thing is that the file has been previously closed
# (after the previous `with` block finishes, python will auto close the file)
with open("data.csv", "w") as f:
# write to f here
As others have pointed out in the comments, reading and writing on the same file handle at the same time is generally a bad idea and won't work as you expect (unless for some very specific use case).
You can do open("data.csv", "rw"), this allows you to read and write at the same time.
Just like others have mentioned, modifying the same file as both input and output without any backup method is such a terrible idea, especially in a condensed file like most .csv files, which is normally more complicated than a single .Txt based file, but if you insisted you can do with the following:
import csv
file path = 'some.csv'
with open('some.csv', 'rw', newline='') as csvfile:
read_file = csv.reader(csvfile)
write_file = csv.writer(csvfile)
Note that code above will trigger an error with a message ValueError: must have exactly one of create/read/write/append mode.
For safety, I preferred to split it into two different files
import csv
in_path = 'some.csv'
out_path = 'Out.csv'
with open(in_path, 'r', newline='') as inputFile, open(out_path, 'w', newline='') as writerFile:
read_file = csv.reader(inputFile)
write_file = csv.writer(writerFile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
for row in read_file:
# your modifying input data code here
........
This:
import csv
with open('original.csv', 'rb') as inp, open('new.csv', 'wb') as out:
writer = csv.writer(out)
for row in csv.reader(inp):
if row[2] != "0":
writer.writerow(row)
os.remove('original.csv')
os.rename('new.csv', 'original.csv')
allows to delete certain rows of a CSV.
Is there a more pythonic way to delete some rows of a CSV file, in-place? (instead of creating a file, deleting the original, renaming, etc.)
There isn't a more Pythonic way: you can't delete stuff in the middle of a file. Write out a new file with the stuff you want, and then rename it.
I noticed that your code does not import the os module, even though you're using it. Regardless, here's a method of doing what you need it to do without using that module.
This will open in read mode first to get the data, then write mode to overwrite. Note that you need to pass the csv.reader(f) statement to the list() function or else the data variable will simply point to the memory address of the CSV file and you won't be able to do anything with the content once it's closed. list() will actually copy the information for you.
import csv
with open("original.csv", "rb") as f:
data = list(csv.reader(f))
with open("original.csv", "wb") as f:
writer = csv.writer(f)
for row in data:
if row[2] != "0":
writer.writerow(row)
I'm new to python.
I have a list with 19188 rows that I want to save as a csv.
When I write the list's rows to the csv, it does not have the last rows (it stops at 19112).
Do you have any idea what might cause this?
Here is how I write to the csv:
mycsvfile = open('file.csv', 'w')
thedatawriter = csv.writer(mycsvfile, lineterminator = '\n')
list = []
#list creation code
thedatawriter.writerows(list)
Each row of list has 4 string elements.
Another piece of information:
If I create a list that contains only the last elements that are missing and add them to the csv file, it kind of works (it is added, but twice...).
mycsvfile = open('file.csv', 'w')
thedatawriter = csv.writer(mycsvfile, lineterminator = '\n')
list = []
#list creation code
thedatawriter.writerows(list)
list_end = []
#list_end creation code
thedatawriter.writerows(list_end)
If I try to add the list_end alone, it doesn't seem to be working. I'm thinking there might be a csv writing parameter that I got wrong.
Another piece of information:
If I open the file adding ", newline=''", then it write more rows to it (though not all)
mycsvfile = open('file.csv', 'w', newline='')
There must be a simple mistake in the way I open or write to the csv (or in the dialect?)
Thanks for your help!
I found my answer! I was not closing the filehandle before script end which left unwritten rows.
Here is the fix:
with open('file.csv', 'w', newline='') as mycsvfile:
thedatawriter = csv.writer(mycsvfile, lineterminator = '\n')
thedatawriter.writerows(list)
See: Writing to CSV from list, write.row seems to stop in a strange place
Close the filehandle before the script ends. Closing the filehandle
will also flush any strings waiting to be written. If you don't flush
and the script ends, some output may never get written.
Using the with open(...) as f syntax is useful because it will close
the file for you when Python leaves the with-suite. With with, you'll
never omit closing a file again.
I haven't been able to re.sub a csv file.
My expression is doing it's job but the writerow is where I'm stuck.
re.sub out
"A1","Address2" "A1","Address2"
0138,"DEERFIELD AVE" 0138,"DEERFIELD"
0490,"REMMINGTON COURT" 0490,"REMMINGTON"
2039,"SANDHILL DR" 2039,"SANDHILL"
import csv
import re
with open('aa_street.txt', 'rb') as f:
reader = csv.reader(f)
read=csv.reader(f)
for row in read:
row_one = re.sub('\s+(DR|COURT|AVE|)\s*$', ' ', row[1])
row_zero = row[0]
print row_one
for row in reader:
print writerow([row[0],row[1]])
Perhaps something like this is what you need?
#!/usr/local/cpython-3.3/bin/python
# "A1","Address2" "A1","Address2"
# 0138,"DEERFIELD AVE" 0138,"DEERFIELD"
# 0490,"REMMINGTON COURT" 0490,"REMMINGTON"
# 2039,"SANDHILL DR" 2039,"SANDHILL"
import re
import csv
with open('aa_street.txt', 'r') as infile, open('actual-output', 'w') as outfile:
reader = csv.reader(infile)
writer = csv.writer(outfile)
for row in reader:
row_zero = row[0]
row_one = re.sub('\s+(DR|COURT|AVE|)\s*$', '', row[1])
writer.writerow([row_zero, row_one])
A file is an iterator—you iterate over it once, and then it's empty.
A csv.reader is also an iterator.
In general, if you want to reuse an iterator, there are three ways to do it:
Re-generate the iterator (and, if its source was an iterator, re-generate that as well, as so on up the chain)—in this case, that means open the file again.
Use itertools.tee.
Copy the iterator into a sequence and reuse that instead.
In the special case of files, you can fake #1 by using f.seek(0). Some other iterators have similar behavior. But in general, you shouldn't rely on this.
Anyway, the last one is the easiest, so let's just see how that works:
reader = list(csv.reader(f))
read = reader
Now you've got a list of all of the rows in the file. You can copy it, loop over it, loop over the copy, close the file, loop over the copy again, it's still there.
Of course the down side it that you need enough memory to put the whole thing in memory (plus, you can't start processing the first line until you've finished reading the last one). If that's a problem, you need to either reorganize your code so it only needs one pass, or re-open (or seek) the file.