Can I modify a CSV file inline using Python's CSV library, or similar technique?
Currently I am processing a file and updating the first column (a name field) to change its formatting. A simplified version of my code looks like this:
import csv

with open('tmpEmployeeDatabase-out.csv', 'w') as csvOutput:
    writer = csv.writer(csvOutput, delimiter=',', quotechar='"')
    with open('tmpEmployeeDatabase.csv', 'r') as csvFile:
        reader = csv.reader(csvFile, delimiter=',', quotechar='"')
        for row in reader:
            row[0] = row[0].title()
            writer.writerow(row)
The approach works, but I am curious whether I can do an in-place edit so that I'm not duplicating the file.
I've tried the following, but this appends the new records to the end of the file instead of replacing them.
with open('tmpEmployeeDatabase.csv', 'r+') as csvFile:
    reader = csv.reader(csvFile, delimiter=',', quotechar='"')
    writer = csv.writer(csvFile, delimiter=',', quotechar='"')
    for row in reader:
        row[0] = row[0].title()
        writer.writerow(row)
No, you should not attempt to write to the file you are currently reading from. You can manage it if you keep seeking back to the right position after reading each row, but it is not advisable, especially if you are writing back more data than you read.
The canonical method is to write to a new, temporary file and move that into place over the old file you read from.
from tempfile import NamedTemporaryFile
import shutil
import csv

filename = 'tmpEmployeeDatabase.csv'
tempfile = NamedTemporaryFile('w+t', newline='', delete=False)

with open(filename, 'r', newline='') as csvFile, tempfile:
    reader = csv.reader(csvFile, delimiter=',', quotechar='"')
    writer = csv.writer(tempfile, delimiter=',', quotechar='"')
    for row in reader:
        row[0] = row[0].title()
        writer.writerow(row)

shutil.move(tempfile.name, filename)
I've made use of the tempfile and shutil libraries here to make the task easier.
There is no underlying system call for inserting data into a file. You can overwrite, you can append, and you can replace. But inserting data into the middle means reading and rewriting the entire file from the point you made your edit down to the end.
As such, the two ways to do this are either (a) slurp the entire file into memory, make your edits there, and then dump the result back to disk, or (b) open a temporary output file, write your results to it while you read the input file, and then replace the old file with the new one once you reach the end. One method uses more RAM, the other uses more disk space.
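Variant (a) can be sketched as follows, reusing the question's title-casing example (the function name here is mine, not part of the original code):

```python
import csv

def titlecase_first_column(filename):
    """Slurp the whole CSV into memory, edit it, then write it back in place."""
    # (a) read every row into a list
    with open(filename, 'r', newline='') as f:
        rows = list(csv.reader(f))
    # make the edits in memory
    for row in rows:
        row[0] = row[0].title()
    # dump the result back over the original file
    with open(filename, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
```

This trades safety for simplicity: if the process dies mid-write, the original file is already partially overwritten, which is exactly why the temp-file variant (b) is preferred for anything important.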
If you just want to modify a CSV file in place using Python, you can use pandas:
import pandas as pd

df = pd.read_csv('yourfilename.csv')
# modify the "name" in row 1 to "Lebron James"
df.loc[1, 'name'] = "Lebron James"
# save the file under the same name; index=False stops pandas
# from adding an extra index column
df.to_csv("yourfilename.csv", index=False)
I have some vocabulary words and their counterparts to create an Anki deck. I need the program to write the output of my code into two columns of a CSV file: the first for the vocabulary and the second for the meaning. I've tried two approaches, but neither of them worked. How can I solve this problem?
Notebook content (vocab):
obligatory,義務的
sole,単独,唯一
defined,一定
obey,従う
...
First try:
import csv

with open("C:/Users/berka/Desktop/Vocab.txt") as csv_file:
    csv_reader = csv.reader(csv_file)
    with open("C:/Users/berka/Desktop/v.csv", "w", newline="") as new_file:
        csv_writer = csv.writer(new_file, delimiter=",")
        for line in csv_reader:
            csv_writer.writerow(line)
Second try:
import csv

with open("C:/Users/berka/Desktop/Vocab.txt") as csv_file:
    csv_reader = csv.DictReader(csv_file)
    with open("C:/Users/berka/Desktop/v.csv", "w") as f:
        field_names = ["Vocabulary", "Meaning"]
        csv_writer = csv.DictWriter(f, fieldnames=field_names, extrasaction="ignore")
        csv_writer.writeheader()
        for line in csv_reader:
            csv_writer.writerow(line)
Result of the first try:
https://cdn.discordapp.com/attachments/696432733882155138/746404430123106374/unknown.png
The second try was not even close.
Expected result:
https://cdn.discordapp.com/attachments/734460259560849542/746432094825087086/unknown.png
Like Kevin said, Excel uses ";" as the delimiter, and your code creates a CSV file with a comma (,) as the delimiter. That's why it's shown with commas in your CSV reader. You can pass ";" as the delimiter if you want Excel to read your file correctly. Or you can create the CSV file as you do now and open it in Notepad if you want to see which delimiter it actually uses.
Your first try works, it's the app you're using for importing that is not recognizing the , as the delimiter. I'm not sure where you're importing this to, but at least in Google Sheets you can choose what the delimiter is, even after the fact.
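If part of the problem is that some meanings themselves contain commas (e.g. 単独,唯一 in the notebook sample), splitting each line on only the first comma keeps everything after it in one column. A sketch under that assumption (the function name and header row are mine):

```python
import csv

def vocab_to_csv(src, dst):
    """Write notebook lines as two CSV columns: vocabulary and meaning.
    Splitting on the FIRST comma only keeps multi-part meanings together."""
    with open(src, encoding='utf-8') as fin, \
         open(dst, 'w', newline='', encoding='utf-8') as fout:
        writer = csv.writer(fout)  # pass delimiter=';' here if Excel expects it
        writer.writerow(['Vocabulary', 'Meaning'])
        for line in fin:
            line = line.strip()
            if line:
                vocab, _, meaning = line.partition(',')
                writer.writerow([vocab, meaning])
```

The csv module will quote any meaning that still contains a comma, so spreadsheet apps read it back as a single cell.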
The following code works fine, and I get an output file with a pipe as the delimiter. However, I do not want a new file to be generated; rather, I would like the existing file to be replaced, with a pipe delimiter instead of a comma. I appreciate your inputs; I am new to Python and learning it on the go.
import csv

with open(dst1, encoding='utf-8', errors='ignore') as input_file:
    with open(dst2, 'w', encoding='utf-8', errors='ignore', newline='') as output_file:
        reader = csv.DictReader(input_file, delimiter=',')
        writer = csv.DictWriter(output_file, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)
The only truly safe way to do this is to write to a new file, then atomically replace the old file with the new file. Any other solution risks data loss/corruption on power loss. The simple approach is to use the tempfile module to make a temporary file in the same directory (so atomic replace will work):
import csv
import os.path
import tempfile

with open(dst1, encoding='utf-8', errors='ignore', newline='') as input_file, \
     tempfile.NamedTemporaryFile(mode='w', encoding='utf-8', newline='',
                                 dir=os.path.dirname(dst1), delete=False) as tf:
    try:
        reader = csv.DictReader(input_file)
        writer = csv.DictWriter(tf, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)
    except:
        # On error, remove temporary before reraising exception
        os.remove(tf.name)
        raise
    else:
        # else is optional, if you want to be extra careful that all
        # data is synced to disk to reduce risk that metadata updates
        # before data synced to disk:
        tf.flush()
        os.fsync(tf.fileno())

# Atomically replace original file with temporary now that with block exited
# and data fully written
try:
    os.replace(tf.name, dst1)
except:
    # On error, remove temporary before reraising exception
    os.remove(tf.name)
    raise
Since you are simply replacing a single-character delimiter with another, there is no change in file size or in the positions of any characters not being replaced. As such, this is a perfect scenario for opening the file in r+ mode, which allows writing the processed content back to the very same file being read, so that no temporary file is ever needed:
import csv

with open(dst, encoding='utf-8', errors='ignore') as input_file, \
     open(dst, 'r+', encoding='utf-8', errors='ignore', newline='') as output_file:
    reader = csv.DictReader(input_file, delimiter=',')
    writer = csv.DictWriter(output_file, reader.fieldnames, delimiter='|')
    writer.writeheader()
    writer.writerows(reader)
EDIT: Please read @ShadowRanger's comment for the limitations of this approach.
I'm not totally sure, but if the file is not too big, you can load it into pandas using read_csv and then save it back with whatever delimiter you like using the to_csv function. For example:
import pandas as pd

data = pd.read_csv(input_file, encoding='utf-8')
# index=False avoids adding an extra index column on the way back out
data.to_csv(input_file, sep='|', index=False, encoding='utf-8')
Hope this helps!!
I have downloaded the billing reports from AWS, which are in CSV format, onto my server.
Now I have to parse those CSV files in Python so that I can show consolidated/individual cost information on a daily/weekly/monthly basis.
Can anyone please help me with this?
import csv

with open('588399947422-aws-billing-detailed-line-items-2015-09.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        print(row)
CSV headers
"InvoiceID","PayerAccountId","LinkedAccountId","RecordType","RecordId","ProductName","RateId","SubscriptionId","PricingPlanId","UsageType","Operation","AvailabilityZone","ReservedInstance","ItemDescription","UsageStartDate","UsageEndDate","UsageQuantity","BlendedRate","BlendedCost","UnBlendedRate","UnBlendedCost","ResourceId","user:Application Name","user:Business Unit"
Use built-in csv module.
From docs:
>>> import csv
>>> with open(path_to_your_file, 'rb') as csvfile:
...     reader = csv.reader(csvfile, delimiter=',', quotechar='|')
...     for row in reader:  # iterate over reader line by line
...         print row  # each row is a list of values
First, you have to open the csv file; the best option is to use with open(filename, 'rb') as f:.
Then instantiate the reader - you have to specify the delimiter (a comma in most cases) and the quotechar (quotes, if there are any).
Then you can iterate over reader line by line.
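Building on that, here is a hedged Python 3 sketch that sums cost per day using the headers shown in the question. The function name and the 'LineItem' filter for skipping summary rows are my assumptions, not something stated in the original post:

```python
import csv
from collections import defaultdict

def daily_costs(path):
    """Sum UnBlendedCost per calendar day.

    Assumes UsageStartDate looks like 'YYYY-MM-DD HH:MM:SS' and that
    detail rows have RecordType == 'LineItem' (summary rows are skipped).
    """
    totals = defaultdict(float)
    with open(path, newline='') as f:
        for row in csv.DictReader(f):
            if row.get('RecordType') == 'LineItem':
                day = row['UsageStartDate'][:10]  # keep just the date part
                totals[day] += float(row['UnBlendedCost'] or 0)
    return dict(totals)
```

Weekly or monthly rollups follow the same pattern with a different key (e.g. `day[:7]` for the month).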
This:
import csv

with open('original.csv', 'rb') as inp, open('new.csv', 'wb') as out:
    writer = csv.writer(out)
    for row in csv.reader(inp):
        if row[2] != "0":
            writer.writerow(row)

os.remove('original.csv')
os.rename('new.csv', 'original.csv')
allows me to delete certain rows of a CSV file.
Is there a more pythonic way to delete some rows of a CSV file, in-place? (instead of creating a file, deleting the original, renaming, etc.)
There isn't a more Pythonic way: you can't delete stuff in the middle of a file. Write out a new file with the stuff you want, and then rename it.
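If the remove/rename dance feels clunky, the same pattern can be wrapped in a helper using tempfile and os.replace for an atomic swap (a sketch for Python 3; the function name and the keep predicate are mine):

```python
import csv
import os
import tempfile

def delete_rows(filename, keep):
    """Rewrite filename keeping only rows for which keep(row) is true,
    then atomically replace the original with the rewritten file."""
    dirname = os.path.dirname(os.path.abspath(filename))
    with open(filename, newline='') as inp, \
         tempfile.NamedTemporaryFile('w', newline='', delete=False,
                                     dir=dirname) as out:
        writer = csv.writer(out)
        writer.writerows(row for row in csv.reader(inp) if keep(row))
    os.replace(out.name, filename)
```

For the question's filter this would be called as `delete_rows('original.csv', lambda row: row[2] != "0")`.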
I noticed that your code does not import the os module, even though you're using it. Regardless, here's a method of doing what you need it to do without using that module.
This will open the file in read mode first to get the data, then in write mode to overwrite it. Note that you need to pass the csv.reader(f) object to the list() function, or else the data variable will simply hold a reader object tied to the file handle and you won't be able to read from it once the file is closed; list() actually copies the rows for you.
import csv
with open("original.csv", "rb") as f:
    data = list(csv.reader(f))

with open("original.csv", "wb") as f:
    writer = csv.writer(f)
    for row in data:
        if row[2] != "0":
            writer.writerow(row)
I want to convert a comma-delimited CSV file to a pipe-delimited file with Python:
This is how I am reading my csv file:
with open('C://Path//InputFile.csv') as fOpen:
    reader = csv.DictReader(fOpen)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)

c = csv.writer(open("C://Path//OutputFile.txt", "wb"), delimiter="|")
How would I write it as pipe delimited file?
Adapting martineau's answer to fix newline issues in Python 3.
import csv

with open('C:/Path/InputFile.csv') as fin:
    # newline='' prevents extra newlines when using Python 3 on Windows
    # https://stackoverflow.com/a/3348664/3357935
    with open('C:/Path/OutputFile.txt', 'w', newline='') as fout:
        reader = csv.DictReader(fin, delimiter=',')
        writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)
This does what I think you want:
import csv

with open('C:/Path/InputFile.csv', 'rb') as fin, \
     open('C:/Path/OutputFile.txt', 'wb') as fout:
    reader = csv.DictReader(fin)
    writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
    writer.writeheader()
    writer.writerows(reader)
I found a quick way to change the comma delimiter to a pipe with pandas, by converting my dataframe to a CSV using "|" as the delimiter:
df.to_csv(fileName, sep="|", index=False)  # index=False avoids adding an extra index column
I don't have much experience with the csv module, so if these solutions aren't interchangeable someone might have to chime in. But this worked surprisingly well for me.
You can use pandas to convert a CSV to a pipe-delimited (or any other delimiter) file.
import pandas as pd
df = pd.read_csv(r'C:\Users\gupta\Documents\inputfile.csv') #read inputfile in a dataframe
df.to_csv(r'C:\Users\gupta\Desktop\outputfile.txt', sep = '|', index=False) #write dataframe df to the outputfile with pipe delimited
https://docs.python.org/2/library/csv.html for Python 2.x
https://docs.python.org/3.3/library/csv.html for Python 3.x
These pages explain how to use csv.writer.
Without testing it, your code looks syntactically valid.
All you need to do is add calls like c.writerow(['data', 'here']) to write your rows; note that writerow takes a single sequence of field values, not separate arguments.
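For instance, a minimal sketch (the file name and rows here are made up for illustration):

```python
import csv

# illustrative file name and rows only
with open('OutputFile.txt', 'w', newline='') as f:
    c = csv.writer(f, delimiter='|')
    c.writerow(['name', 'role'])      # writerow takes one sequence per row
    c.writerow(['Ada', 'engineer'])
```

Each call writes one pipe-delimited line; the writer handles quoting automatically if a field contains the delimiter.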