Delete some rows of a CSV, in-place - python

This:
import csv
with open('original.csv', 'rb') as inp, open('new.csv', 'wb') as out:
writer = csv.writer(out)
for row in csv.reader(inp):
if row[2] != "0":
writer.writerow(row)
os.remove('original.csv')
os.rename('new.csv', 'original.csv')
allows to delete certain rows of a CSV.
Is there a more pythonic way to delete some rows of a CSV file, in-place? (instead of creating a file, deleting the original, renaming, etc.)

There isn't a more Pythonic way: you can't delete stuff in the middle of a file. Write out a new file with the stuff you want, and then rename it.

I noticed that your code does not import the os module, even though you're using it. Regardless, here's a method of doing what you need it to do without using that module.
This will open in read mode first to get the data, then write mode to overwrite. Note that you need to pass the csv.reader(f) statement to the list() function or else the data variable will simply point to the memory address of the CSV file and you won't be able to do anything with the content once it's closed. list() will actually copy the information for you.
import csv
with open("original.csv", "rb") as f:
data = list(csv.reader(f))
with open("original.csv", "wb") as f:
writer = csv.writer(f)
for row in data:
if row[2] != "0":
writer.writerow(row)

Related

Python: csv to pickle representation, back to csv messes with file content

I am trying to pickle a csv file and then turn its pickled representation back into a csv file.
This is the code I came up with:
from pathlib import Path
import pickle, csv
csvFilePath = Path('/path/to/file.csv')
pathToSaveTo = Path('/path/to/newFile.csv')
csvFile = open(csvFilePath, 'r')
f = csvFile.read()
csvFile.close()
f_pickled = pickle.dumps(f)
f_unpickled = pickle.loads(f_pickled)
#save unpickled csv file
new_csvFile = open(pathToSaveTo, 'w')
csvWriter = csv.writer(new_csvFile)
csvWriter.writerow(f_unpickled)
new_csvFile.close()
newFile.csv is created however there are two problems with its content:
There is now a comma between every character.
There is now a pair of quotation marks after every line.
What would I have to change about my code to get an exact copy of file.csv?
The problem is that you are reading the raw text of the file, with f = csvFile.read() then, on writting, you are feeding the data, which is a single lump of text, all in a single string, though a CSV writer object. The CSV writer will see the string as an iterable, and write each of the iterable elements in a CSV cell. Then, there is no data for a second row, and the process ends.
The pickle dumps and loads you perform is just a no-operation: nothing happens there, and if there were any issue, it would rather be due to some unpickleable object reference in the object you are passing to dumps: you'd get an exception, and not differing data when loads is called.
Now, without telling why you want to do this, and what intermediate steps you hav planned for the data, it is hard to tell you: you are performing two non-operations: reading a file, pickling and unpickling its contents, and writting those contents back to disk.
At which point do you need these data structured as rows, or as CSV cells? Just apply the proper transforms where you need it, and you are done.
If you want the whole "do nothing" cycle going through actual having the CSV data separated in different elements in Python you can perform:
from pathlib import Path
import pickle, csv
csvFilePath = Path('file.csv')
pathToSaveTo = Path('newFile.csv')
data = list(csv.reader(open(csvFilePath)))
# ^consumes all iterations of the reader: each iteration is a row, composed of a list where each cell value is a list elemnt
pickled_data = pickle.dumps(data)
restored_data = pickle.loads(pickled_data)
csv.writer(open(pathToSaveTo, "wt")).writerows(restored_data)
Perceive as in this snippet the data is read through csv.reader, not directly. Wrapping it in a list call causes all rows to be read and transformed in list items - because the reader is a lazy iterator otherwise (and it would not be pickeable, as one of the attributs it depends for its state is an open file)
I believe the problem is in how you're attempting to write the CSV file, the pickling and unpickling is fine. If you compare f with f_unpickled:
if f==f_unpickled:
print("Same")
This printed in my case. If you print the type, you'll see there's both strings.
The better option is to follow the document style and write each row one at a time rather than putting the entire string in including new lines. Something like this:
from pathlib import Path
import pickle, csv
csvFilePath = Path('file.csv')
pathToSaveTo = Path('newFile.csv')
rows = []
csvFile = open(csvFilePath, 'r')
with open(csvFilePath, 'r') as file:
reader = csv.reader(file)
for row in reader:
rows.append(row)
# pickle and unpickle
rows_pickled = pickle.dumps(rows)
rows_unpickled = pickle.loads(rows_pickled)
if rows==rows_unpickled:
print("Same")
#save unpickled csv file
with open(pathToSaveTo, 'w', newline='') as csvfile:
csvWriter = csv.writer(csvfile)
for row in rows_unpickled:
csvWriter.writerow(row)
This worked when I tested it--albeit it would take more finagling with line separators to get no empty line at the end.

I need to edit a python script to remove quotes from a csv, then write back to that same csv file, quotes removed

I have seen similar posts to this but they all seem to be print statements (viewing the cleaned data) rather than overwriting the original csv with the cleaned data so I am stuck. When I tried to write back to the csv myself, it just deleted everything in the file. Here is the format of the csv:
30;"unemployed";"married";"primary";"no";1787;"no";"no";"cellular";19;"oct";79;1;-1;0;"unknown";"no"
33;"services";"married";"secondary";"no";4747;"yes";"cellular";11;"may";110;1;339;2;"failure";"no"
35;"management";"single";"tertiary";"no";1470;"yes";"no";"cellular";12;"apr"185;1;330;1;"failure";"no"
It is delimited by semicolons, which is fine, but all text is wrapped in quotes and I only want to remove the quotes and write back to the file. Here is the code I reverted back to that successfully reads the file, removes all quotes, and then prints the results:
import csv
f = open("bank.csv", 'r')
try:
for row in csv.reader(f, delimiter=';', skipinitialspace=True):
print(' '.join(row))
finally:
f.close()
Any help on properly writing back to the csv would be appreciated, thanks!
See here: Python CSV: Remove quotes from value
I've done this basically two different ways, depending on the size of the csv.
You can read the entire csv into a python object (list), do some things and then
overwrite the other existing file with the cleaned version
As in the link above, you can use one reader and one writer, Create a new file, and write line by-line as you clean the input from the csv reader, delete the original csv and rename the new one to replace the old file.
In my opinion option #2 is vastly preferable as it avoids the possibility of data loss if your script has an error part way through writing. It also will have lower memory usage.
Finally: It may be possible to open a file as read/write, and iterate line-by-line overwriting as you go: But that will leave you open to half of your file having quotes, and half not if your script crashes part way through.
You could do something like this. Read it in, and write using quoting=csv.QUOTE_NONE
import csv
f = open("bank.csv", 'r')
inputCSV = []
try:
for row in csv.reader(f, delimiter=';', skipinitialspace=True):
inputCSV.append(row)
finally:
f.close()
with open('bank.csv', 'w', newline='') as csvfile:
csvwriter = csv.writer(csvfile, delimiter=';')
for row in inputCSV:
csvwriter.writerow(row)

CSV reader not reading entire file

I have looked at previous answers to this question, but in each of those scenarios the questioners were asking about something specific they were doing with the file, but the problem occurs for me even when I am not.
I have a .csv file of 27,204 rows. When I open the python interpreter:
python
import csv
o = open('btc_usd1hour.csv','r')
p = csv.reader(o)
for row in p:
print(row)
I then only see roughly the last third of the document displayed to me.
Try so, at me works:
with open(name) as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
print(row)
reference:
https://docs.python.org/3.6/library/csv.html#csv.DictReader
Try the following code
import csv
fname = 'btc_usd1hour.csv'
with open(fname, newline='') as f:
reader = csv.reader(f)
for row in reader:
print(row)
It is difficult to tell what is the problem without having the sample. I guess the problem would be removed if you add that newline='' for opening the file.
Use the with construct to close the file automatically. Use the f name for a file object when no further explanation is needed. Store the file name to fname to make future modifications easier (and also for easy copying the code fragment for your later programs).
olisch may be right that the console just scrolled so fast you could not see the result. You can write the result to another text file like this:
with open(fname, newline='') as fin,\
open('output.txt', 'w') as fout:
reader = csv.reader(fin)
for row in reader:
fout.write(repr(row) + '\n')
The repr function converts the row list into its string representation. The print calls that function internally, so you will have the same result that you otherwise observe on screen.
maybe your scrollback buffer is just to short to see the whole list?
In general your csv.reader call should be working fine, except your 27k rows aren't extremly long so that you might be able to hit any 64bit boundaries, which would be quite uncommon.
len(o) might be interesting to see.

Merging several csv files and storing the file names as a variable - Python

I am trying to append several csv files into a single csv file using python while adding the file name (or, even better, a sub-string of the file name) as a new variable. All files have headers. The following script does the trick of merging the files, but does not cover the file name as variable issue:
import glob
filenames=glob.glob("/filepath/*.csv")
outputfile=open("out.csv","a")
for line in open(str(filenames[1])):
outputfile.write(line)
for i in range(1,len(filenames)):
f = open(str(filenames[i]))
f.next()
for line in f:
outputfile.write(line)
outputfile.close()
I was wondering if there are any good suggestions. I have about 25k small size csv files (less than 100KB each).
You can use Python's csv module to parse the CSV files for you, and to format the output. Example code (untested):
import csv
with open(output_filename, "wb") as outfile:
writer = None
for input_filename in filenames:
with open(input_filename, "rb") as infile:
reader = csv.DictReader(infile)
if writer is None:
field_names = ["Filename"] + reader.fieldnames
writer = csv.DictWriter(outfile, field_names)
writer.writeheader()
for row in reader:
row["Filename"] = input_filename
writer.writerow(row)
A few notes:
Always use with to open files. This makes sure they will get closed again when you are done with them. Your code doesn't correctly close the input files.
CSV files should be opened in binary mode.
Indices start at 0 in Python. Your code skips the first file, and includes the lines from the second file twice. If you just want to iterate over a list, you don't need to bother with indices in Python. Simply use for x in my_list instead.
Simple changes will achieve what you want:
For the first line
outputfile.write(line) -> outputfile.write(line+',file')
and later
outputfile.write(line+','+filenames[i])

Overwriting a specific row in a csv file using Python's CSV module

I'm using Python's csv module to do some reading and writing of csv files.
I've got the reading fine and appending to the csv fine, but I want to be able to overwrite a specific row in the csv.
For reference, here's my reading and then writing code to append:
#reading
b = open("bottles.csv", "rb")
bottles = csv.reader(b)
bottle_list = []
bottle_list.extend(bottles)
b.close()
#appending
b=open('bottles.csv','a')
writer = csv.writer(b)
writer.writerow([bottle,emptyButtonCount,100, img])
b.close()
And I'm using basically the same for the overwrite mode(which isn't correct, it just overwrites the whole csv file):
b=open('bottles.csv','wb')
writer = csv.writer(b)
writer.writerow([bottle,btlnum,100,img])
b.close()
In the second case, how do I tell Python I need a specific row overwritten? I've scoured Gogle and other stackoverflow posts to no avail. I assume my limited programming knowledge is to blame rather than Google.
I will add to Steven Answer :
import csv
bottle_list = []
# Read all data from the csv file.
with open('a.csv', 'rb') as b:
bottles = csv.reader(b)
bottle_list.extend(bottles)
# data to override in the format {line_num_to_override:data_to_write}.
line_to_override = {1:['e', 'c', 'd'] }
# Write data to the csv file and replace the lines in the line_to_override dict.
with open('a.csv', 'wb') as b:
writer = csv.writer(b)
for line, row in enumerate(bottle_list):
data = line_to_override.get(line, row)
writer.writerow(data)
You cannot overwrite a single row in the CSV file. You'll have to write all the rows you want to a new file and then rename it back to the original file name.
Your pattern of usage may fit a database better than a CSV file. Look into the sqlite3 module for a lightweight database.

Categories