Convert csv file to pipe delimited file in Python

I want to convert a comma-delimited CSV file to a pipe-delimited file with Python.
This is how I am reading my csv file:
with open('C://Path//InputFile.csv') as fOpen:
    reader = csv.DictReader(fOpen)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)

c = csv.writer(open("C://Path//OutputFile.txt", "wb"), delimiter="|")
How would I write it as a pipe-delimited file?

Adapting martineau's answer to fix newline issues in Python 3.
import csv

with open('C:/Path/InputFile.csv') as fin:
    # newline='' prevents extra newlines when using Python 3 on Windows
    # https://stackoverflow.com/a/3348664/3357935
    with open('C:/Path/OutputFile.txt', 'w', newline='') as fout:
        reader = csv.DictReader(fin, delimiter=',')
        writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)

This does what I think you want:
import csv

with open('C:/Path/InputFile.csv', 'rb') as fin, \
     open('C:/Path/OutputFile.txt', 'wb') as fout:
    reader = csv.DictReader(fin)
    writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
    writer.writeheader()
    writer.writerows(reader)

I found a quick way to change the comma delimiter to a pipe with pandas, by converting my dataframe to a csv with "|" as the separator:
df.to_csv(fileName, sep="|")
I don't have much experience with the csv module, so if these solutions aren't interchangeable someone might have to chime in, but this worked surprisingly well for me.

You can use pandas to convert a csv file to a pipe-delimited (or any other delimiter) file.
import pandas as pd

df = pd.read_csv(r'C:\Users\gupta\Documents\inputfile.csv')  # read the input file into a dataframe
df.to_csv(r'C:\Users\gupta\Desktop\outputfile.txt', sep='|', index=False)  # write dataframe df to the output file, pipe-delimited

https://docs.python.org/2/library/csv.html for Python 2.x
https://docs.python.org/3.3/library/csv.html for Python 3.x
These pages explain how to use csv.writer.
Without testing it, your code looks syntactically valid.
All you need to do is add some c.writerow(['data', 'here']) calls to write your data (writerow takes a single sequence of values).
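For example, a minimal sketch of that reader/writer approach in Python 3 (the paths are the ones from the question, and this assumes a plain comma-delimited input):
import csv

# Read the comma-delimited input and rewrite it pipe-delimited.
with open('C:/Path/InputFile.csv', newline='') as fin, \
     open('C:/Path/OutputFile.txt', 'w', newline='') as fout:
    reader = csv.reader(fin)             # default delimiter is ','
    c = csv.writer(fout, delimiter='|')
    for row in reader:
        c.writerow(row)                  # one sequence of values per call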

Related

Writing the output to a different csv file column

I have some vocabulary and their counterparts to create an Anki deck. I need the program to write the output of my code into two columns of a csv file; the first for the vocabulary and the second for the meaning. I've tried two approaches but neither of them worked. How can I solve this problem?
Notebook content (vocab):
obligatory,義務的
sole,単独,唯一
defined,一定
obey,従う
...
First try:
with open("C:/Users/berka/Desktop/Vocab.txt") as csv_file:
    csv_reader = csv.reader(csv_file)
    with open("C:/Users/berka/Desktop/v.csv", "w", newline="") as new_file:
        csv_writer = csv.writer(new_file, delimiter=",")
        for line in csv_reader:
            csv_writer.writerow(line)
Second try:
with open("C:/Users/berka/Desktop/Vocab.txt") as csv_file:
    csv_reader = csv.DictReader(csv_file)
    with open("C:/Users/berka/Desktop/v.csv", "w") as f:
        field_names = ["Vocabulary", "Meaning"]
        csv_writer = csv.DictWriter(f, fieldnames=field_names, extrasaction="ignore")
        csv_writer.writeheader()
        for line in csv_reader:
            csv_writer.writerow(line)
Result of the first try:
https://cdn.discordapp.com/attachments/696432733882155138/746404430123106374/unknown.png
The second try was not even close.
Expected result:
https://cdn.discordapp.com/attachments/734460259560849542/746432094825087086/unknown.png
Like Kevin said, Excel uses ";" as the delimiter, while your code creates a csv file with a comma (,) as the delimiter. That's why it is shown with commas in your CSV reader. You can pass ";" as the delimiter if you want Excel to read your file correctly. Or you can open the csv file you created in Notepad if you want to see which delimiter it actually uses.
Your first try works; it's the app you're using for importing that is not recognizing the , as the delimiter. I'm not sure where you're importing this to, but at least in Google Sheets you can choose what the delimiter is, even after the fact.
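If the app you are importing into really does expect semicolons, here is a minimal sketch of the first try with delimiter=";" (same paths as above; the semicolon requirement is an assumption about the importer, not something confirmed for Anki):
import csv

# Re-write the vocab file with ";" between columns so the importer splits them.
with open("C:/Users/berka/Desktop/Vocab.txt") as csv_file:
    csv_reader = csv.reader(csv_file)
    with open("C:/Users/berka/Desktop/v.csv", "w", newline="") as new_file:
        csv_writer = csv.writer(new_file, delimiter=";")
        for line in csv_reader:
            csv_writer.writerow(line)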

Replace comma with pipe delimiter within the same file using python

The following code works fine and I get an output file with a pipe as the delimiter. However, I do not want a new file to be generated; rather, I would like the existing file to be replaced with a pipe delimiter instead of a comma. I appreciate your inputs. I am new to Python and learning it on the go.
with open(dst1, encoding='utf-8', errors='ignore') as input_file:
    with open(dst2, 'w', encoding='utf-8', errors='ignore', newline='') as output_file:
        reader = csv.DictReader(input_file, delimiter=',')
        writer = csv.DictWriter(output_file, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)
The only truly safe way to do this is to write to a new file, then atomically replace the old file with the new file. Any other solution risks data loss/corruption on power loss. The simple approach is to use the tempfile module to make a temporary file in the same directory (so atomic replace will work):
import csv
import os
import os.path
import tempfile

with open(dst1, encoding='utf-8', errors='ignore', newline='') as input_file, \
     tempfile.NamedTemporaryFile(mode='w', encoding='utf-8', newline='',
                                 dir=os.path.dirname(dst1), delete=False) as tf:
    try:
        reader = csv.DictReader(input_file)
        writer = csv.DictWriter(tf, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)
    except:
        # On error, remove temporary before reraising exception
        os.remove(tf.name)
        raise
    else:
        # else is optional, if you want to be extra careful that all
        # data is synced to disk to reduce risk that metadata updates
        # before data synced to disk:
        tf.flush()
        os.fsync(tf.fileno())

# Atomically replace original file with temporary now that with block exited and
# data fully written
try:
    os.replace(tf.name, dst1)
except:
    # On error, remove temporary before reraising exception
    os.remove(tf.name)
    raise
Since you are simply replacing one single-character delimiter with another, there will be no change in file size or in the positions of any characters not being replaced. As such, this is a perfect scenario for opening the file in r+ mode, which allows writing the processed content back to the very same file being read at the same time, so that no temporary file is ever needed:
with open(dst, encoding='utf-8', errors='ignore') as input_file, \
     open(dst, 'r+', encoding='utf-8', errors='ignore', newline='') as output_file:
    reader = csv.DictReader(input_file, delimiter=',')
    writer = csv.DictWriter(output_file, reader.fieldnames, delimiter='|')
    writer.writeheader()
    writer.writerows(reader)
EDIT: Please read @ShadowRanger's comment for limitations of this approach.
I'm not totally sure, but if the file is not too big, you can load it into pandas with read_csv and then save it with to_csv using whatever delimiter you like. For example -
import pandas as pd

data = pd.read_csv(input_file, encoding='utf-8')
data.to_csv(input_file, sep='|', encoding='utf-8', index=False)  # index=False avoids adding an extra index column
Hope this helps!!

How to parse CSV files in Python?

I have downloaded the billing reports from AWS, which are in CSV format, onto my server.
Now I have to parse those CSV files in Python so that I can show consolidated/individual cost information on a daily/weekly/monthly basis.
Can anyone please help me with this?
import csv

with open('588399947422-aws-billing-detailed-line-items-2015-09.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        print row
CSV headers:
"InvoiceID","PayerAccountId","LinkedAccountId","RecordType","RecordId","ProductName","RateId","SubscriptionId","PricingPlanId","UsageType","Operation","AvailabilityZone","ReservedInstance","ItemDescription","UsageStartDate","UsageEndDate","UsageQuantity","BlendedRate","BlendedCost","UnBlendedRate","UnBlendedCost","ResourceId","user:Application Name","user:Business Unit"
Use the built-in csv module.
From the docs:
>>> import csv
>>> with open(path_to_your_file, 'rb') as csvfile:
...     reader = csv.reader(csvfile, delimiter=',', quotechar='|')
...     for row in reader:  # iterate over the reader line by line (each line is a list of values in this case)
...         print row  # list of values
First, you have to open the csv file; the best option is to use with open(filename, 'rb') as f:.
Then instantiate the reader - you have to specify the delimiter (a comma in most cases) and the quotechar (the quoting character, if values are quoted).
Then you can iterate over the reader line by line.
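To get to the consolidated per-day numbers the question asks about, here is a minimal Python 3 sketch. It uses the 'UsageStartDate' and 'UnBlendedCost' columns from the header above, and assumes UsageStartDate begins with a YYYY-MM-DD date; adjust for your actual report.
import csv
from collections import defaultdict

daily_cost = defaultdict(float)

with open('588399947422-aws-billing-detailed-line-items-2015-09.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        day = row['UsageStartDate'][:10]   # assumed format 'YYYY-MM-DD HH:MM:SS'
        cost = row['UnBlendedCost']
        if not day or not cost:            # skip summary rows without a date or cost
            continue
        daily_cost[day] += float(cost)

for day in sorted(daily_cost):
    print(day, round(daily_cost[day], 2))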

Inline CSV File Editing with Python

Can I modify a CSV file inline using Python's CSV library, or a similar technique?
Currently I am processing a file and updating the first column (a name field) to change the formatting. A simplified version of my code looks like this:
with open('tmpEmployeeDatabase-out.csv', 'w') as csvOutput:
    writer = csv.writer(csvOutput, delimiter=',', quotechar='"')
    with open('tmpEmployeeDatabase.csv', 'r') as csvFile:
        reader = csv.reader(csvFile, delimiter=',', quotechar='"')
        for row in reader:
            row[0] = row[0].title()
            writer.writerow(row)
The approach works, but I am curious whether I can do an inline edit so that I'm not duplicating the file.
I've tried the following, but it appends the new records to the end of the file instead of replacing them.
with open('tmpEmployeeDatabase.csv', 'r+') as csvFile:
    reader = csv.reader(csvFile, delimiter=',', quotechar='"')
    writer = csv.writer(csvFile, delimiter=',', quotechar='"')
    for row in reader:
        row[1] = row[1].title()
        writer.writerow(row)
No, you should not attempt to write to the file you are currently reading from. You can do it if you keep seeking back after reading a row but it is not advisable, especially if you are writing back more data than you read.
The canonical method is to write to a new, temporary file and move that into place over the old file you read from.
from tempfile import NamedTemporaryFile
import shutil
import csv

filename = 'tmpEmployeeDatabase.csv'
tempfile = NamedTemporaryFile('w+t', newline='', delete=False)

with open(filename, 'r', newline='') as csvFile, tempfile:
    reader = csv.reader(csvFile, delimiter=',', quotechar='"')
    writer = csv.writer(tempfile, delimiter=',', quotechar='"')
    for row in reader:
        row[1] = row[1].title()
        writer.writerow(row)

shutil.move(tempfile.name, filename)
I've made use of the tempfile and shutil libraries here to make the task easier.
There is no underlying system call for inserting data into a file. You can overwrite, you can append, and you can replace. But inserting data into the middle means reading and rewriting the entire file from the point you made your edit down to the end.
As such, the two ways to do this are either (a) slurp the entire file into memory, make your edits there, and then dump the result back to disk, or (b) open up a temporary output file where you write your results while you read the input file, and then replace the old file with the new one once you get to the end. One method uses more RAM, the other uses more disk space.
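For completeness, here is a minimal sketch of option (a), reading everything into memory and then overwriting the same file (using the filename from the question):
import csv

filename = 'tmpEmployeeDatabase.csv'

# (a) slurp the whole file into memory...
with open(filename, newline='') as csvFile:
    rows = list(csv.reader(csvFile, delimiter=',', quotechar='"'))

# ...edit the rows in memory...
for row in rows:
    row[0] = row[0].title()

# ...then overwrite the original file with the edited rows.
with open(filename, 'w', newline='') as csvFile:
    writer = csv.writer(csvFile, delimiter=',', quotechar='"')
    writer.writerows(rows)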
If you just want to modify a csv file in place using Python, you can simply employ pandas:
import pandas as pd

df = pd.read_csv('yourfilename.csv')

# modify the "name" in row 1 to "Lebron James"
df.loc[1, 'name'] = "Lebron James"

# save the file under the same name (index=False avoids writing an extra index column)
df.to_csv("yourfilename.csv", index=False)

How to add a header to a csv file in Python?

I've tried many solutions to add a header to my csv file, but nothing's working properly. Here they are:
I used the writerow method, but my data are overwriting the first row.
I used the DictWriter method, but I don't know how to fill it correctly. Here is my code:
csv = csv.DictWriter(open(directory +'/csv.csv', 'wt'), fieldnames = ["stuff1", "stuff2", "stuff3"], delimiter = ';')
csv.writeheader(["stuff1", "stuff2", "stuff3"])
I got a "2 arguments instead of one" error and I really don't know why.
Any advice?
All you need to do is call DictWriter.writeheader() without arguments:
with open(os.path.join(directory, 'csv.csv'), 'wb') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=["stuff1", "stuff2", "stuff3"], delimiter=';')
    writer.writeheader()
You already told DictWriter() what your headers are.
I encountered a similar problem when writing a CSV file.
I had to read the csv file and modify some of the fields in it.
To write the header in the CSV file, I used the following code:
reader = csv.DictReader(open(infile))
headers = reader.fieldnames

with open('ChartData.csv', 'wb') as outcsv:
    writer1 = csv.writer(outcsv)
    writer1.writerow(headers)
When you write the data rows, you can use a DictWriter in the following way:
writer = csv.DictWriter(open("ChartData.csv", 'a'), headers)
In the above code, "a" stands for appending.
In conclusion -> use 'a' to append data to the csv after you have written your header to the same file.
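Put together, a minimal Python 3 sketch of that write-header-then-append pattern (infile is a placeholder, and the step that modifies fields is omitted):
import csv

infile = 'input.csv'        # placeholder input file
outfile = 'ChartData.csv'

with open(infile, newline='') as f:
    reader = csv.DictReader(f)
    headers = reader.fieldnames
    rows = list(reader)     # each row is a dict keyed by the header names

# Write the header first...
with open(outfile, 'w', newline='') as outcsv:
    csv.writer(outcsv).writerow(headers)

# ...then append the (possibly modified) data rows with a DictWriter.
with open(outfile, 'a', newline='') as outcsv:
    writer = csv.DictWriter(outcsv, headers)
    writer.writerows(rows)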
