Replace comma with pipe delimiter within the same file using Python

The following code works fine and I get an output file with a pipe as the delimiter. However, I do not want a new file to be generated; instead, I would like the existing file to be rewritten with a pipe delimiter in place of the comma. I appreciate your input. I am new to Python and learning it as I go.
with open(dst1, encoding='utf-8', errors='ignore') as input_file:
    with open(dst2, 'w', encoding='utf-8', errors='ignore', newline='') as output_file:
        reader = csv.DictReader(input_file, delimiter=',')
        writer = csv.DictWriter(output_file, reader.fieldnames, 'uft-8', delimiter='|')
        writer.writeheader()
        writer.writerows(reader)

The only truly safe way to do this is to write to a new file, then atomically replace the old file with the new file. Any other solution risks data loss/corruption on power loss. The simple approach is to use the tempfile module to make a temporary file in the same directory (so atomic replace will work):
import csv
import os.path
import tempfile

with open(dst1, encoding='utf-8', errors='ignore', newline='') as input_file, \
     tempfile.NamedTemporaryFile(mode='w', encoding='utf-8', newline='',
                                 dir=os.path.dirname(dst1), delete=False) as tf:
    try:
        reader = csv.DictReader(input_file)
        writer = csv.DictWriter(tf, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)
    except:
        # On error, remove temporary before reraising exception
        os.remove(tf.name)
        raise
    else:
        # else is optional, if you want to be extra careful that all
        # data is synced to disk to reduce risk that metadata updates
        # before data synced to disk:
        tf.flush()
        os.fsync(tf.fileno())

# Atomically replace original file with temporary now that with block
# exited and data fully written
try:
    os.replace(tf.name, dst1)
except:
    # On error, remove temporary before reraising exception
    os.remove(tf.name)
    raise

Since you are simply replacing one single-character delimiter with another, there will be no change in the file size or in the positions of any characters that are not being replaced. This makes it a perfect scenario for opening the file in r+ mode, which lets you write the processed content back to the very same file you are reading, so no temporary file is ever needed:
with open(dst, encoding='utf-8', errors='ignore') as input_file, \
     open(dst, 'r+', encoding='utf-8', errors='ignore', newline='') as output_file:
    reader = csv.DictReader(input_file, delimiter=',')
    writer = csv.DictWriter(output_file, reader.fieldnames, delimiter='|')
    writer.writeheader()
    writer.writerows(reader)
EDIT: Please read @ShadowRanger's comment for the limitations of this approach.

I'm not totally sure, but if the file is not too big, you can load it into pandas using read_csv and then save it with to_csv using whatever delimiter you like. For example:
import pandas as pd
data = pd.read_csv(input_file, encoding='utf-8')
data.to_csv(input_file, sep='|', encoding='utf-8')
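Note that, by default, to_csv also writes the DataFrame's row index as an extra first column. If the original file did not have one, a minimal variant of the snippet above (reusing its input_file name) would pass index=False:
import pandas as pd

# Same conversion as above; index=False prevents pandas from adding its
# row index as a new first column when writing back to the same file.
data = pd.read_csv(input_file, encoding='utf-8')
data.to_csv(input_file, sep='|', encoding='utf-8', index=False)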
Hope this helps!!

Related

Writing the output to a different csv file column

I have some vocabulary and their meanings that I want to turn into an Anki deck. I need the program to write the output of my code into two columns of a CSV file: the first for the vocabulary and the second for the meaning. I've tried two approaches, but neither of them worked. How can I solve this problem?
Notebook content(vocab):
obligatory,義務的
sole,単独,唯一
defined,一定
obey,従う
...
First try:
with open("C:/Users/berka/Desktop/Vocab.txt") as csv_file:
    csv_reader = csv.reader(csv_file)
    with open("C:/Users/berka/Desktop/v.csv", "w", newline="") as new_file:
        csv_writer = csv.writer(new_file, delimiter=",")
        for line in csv_reader:
            csv_writer.writerow(line)
Second try:
with open("C:/Users/berka/Desktop/Vocab.txt") as csv_file:
    csv_reader = csv.DictReader(csv_file)
    with open("C:/Users/berka/Desktop/v.csv", "w") as f:
        field_names = ["Vocabulary", "Meaning"]
        csv_writer = csv.DictWriter(f, fieldnames=field_names, extrasaction="ignore")
        csv_writer.writeheader()
        for line in csv_reader:
            csv_writer.writerow(line)
Result of the first try:
https://cdn.discordapp.com/attachments/696432733882155138/746404430123106374/unknown.png
The second try was not even close.
Expected result:
https://cdn.discordapp.com/attachments/734460259560849542/746432094825087086/unknown.png
Like Kevin said, Excel (depending on locale) expects ";" as the delimiter, while your csv code creates a file with a comma (,) as the delimiter. That's why it is shown with commas in your CSV viewer. You can pass ";" as the delimiter if you want Excel to read your file correctly, or open the generated file in Notepad to see which delimiter it actually uses.
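For example, a minimal sketch of the first attempt with ";" as the output delimiter (same paths as in the question):
import csv

# Write the output with ";" as the separator so spreadsheet programs that
# expect semicolon-delimited files split the columns correctly.
with open("C:/Users/berka/Desktop/Vocab.txt", newline="") as csv_file, \
     open("C:/Users/berka/Desktop/v.csv", "w", newline="") as new_file:
    csv_writer = csv.writer(new_file, delimiter=";")
    for line in csv.reader(csv_file):
        csv_writer.writerow(line)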
Your first try works; it's the app you're importing into that isn't recognizing the comma as the delimiter. I'm not sure where you're importing this to, but at least in Google Sheets you can choose what the delimiter is, even after the fact.

Reading and Writing into CSV file at the same time

I want to read some input from a CSV file, modify it, and replace the old values with the new ones. I can read the values, but I'm stuck at this point because I want to modify all the values present in the file.
So, is it possible to open the file in r mode in one for loop and then immediately in w mode in another loop to write the modified data?
If there is a simpler way to do this, please help me out.
Thank you.
Yes, you can open the same file in different modes in the same program. Just be sure not to do it at the same time. For example, this is perfectly valid:
with open("data.csv") as f:
    # read data into a data structure (list, dictionary, etc.)
    # process lines here if you can do it line by line
    ...

# process data here as needed (replacing your values etc.)

# now open the same filename again for writing;
# the main thing is that the file has been previously closed
# (after the previous `with` block finishes, Python will auto-close the file)
with open("data.csv", "w") as f:
    # write to f here
    ...
As others have pointed out in the comments, reading and writing on the same file handle at the same time is generally a bad idea and won't work as you expect (except in some very specific use cases).
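A concrete sketch of that pattern (assuming a header-less data.csv and an arbitrary example edit):
import csv

# Read everything into memory first, so the file is closed before we reopen it.
with open("data.csv", newline="") as f:
    rows = list(csv.reader(f))

# Modify the data in memory, e.g. upper-case the first column of every row.
for row in rows:
    row[0] = row[0].upper()

# Reopen the same file for writing and write the modified rows back.
with open("data.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)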
You can do open("data.csv", "r+"), which allows you to read and write on the same file handle (note that "rw" is not a valid mode and raises a ValueError).
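A minimal sketch of how that can be used with some care, assuming "data.csv" exists and you finish reading before you start writing:
import csv

# Read all rows first, then seek back to the start and overwrite them
# on the same "r+" handle.
with open("data.csv", "r+", newline="") as f:
    rows = list(csv.reader(f))
    modified = [[cell.strip() for cell in row] for row in rows]  # example edit
    f.seek(0)                          # back to the beginning of the file
    csv.writer(f).writerows(modified)
    f.truncate()                       # drop leftover bytes if the output is shorter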
As others have mentioned, using the same file as both input and output without any backup is a bad idea, especially with a structured file like most .csv files, which are usually more complicated than a plain .txt file. But if you insist, you could try the following:
import csv

file_path = 'some.csv'

with open(file_path, 'rw', newline='') as csvfile:
    read_file = csv.reader(csvfile)
    write_file = csv.writer(csvfile)
Note that the code above will trigger an error with the message ValueError: must have exactly one of create/read/write/append mode.
For safety, I prefer to split it into two different files:
import csv

in_path = 'some.csv'
out_path = 'Out.csv'

with open(in_path, 'r', newline='') as inputFile, open(out_path, 'w', newline='') as writerFile:
    read_file = csv.reader(inputFile)
    write_file = csv.writer(writerFile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for row in read_file:
        # your modifying input data code here
        ...

How do I add to a variable in separate file using python?

My situation is, I have a CSV file, and here is the code for it:
user_file = Path(str(message.author.id) + '.cvs')
if user_file.exists():
    with open('test.csv', 'a') as fp:
        writer = csv.writer(fp, delimiter=',')
        writer.writerows(data)
else:
    with open(user_file, 'w') as fp:
        data = [('xp', 0)]
        writer = csv.writer(fp, delimiter=',')
        writer.writerows(data)
I want a CSV file that keeps track of how many times they type a message, so I need a way of editing the CSV file and adding 1 to what it already has. But I have no idea how to do that! Please help! <3
test.csv:
4
Python:
# Replace test.csv with the file you wish to open. Use "r+" so the
# existing contents can be read before being overwritten
with open("test.csv", "r+") as dat:
    # Assumes the text in the file is an int
    n = int(dat.read())
    dat.seek(0)            # go back to the start before overwriting
    dat.write(str(n + 1))
Result in test.csv:
5
This way it opens the file for reading and writing, reads the number, seeks back to the start, and writes the incremented value back as a string. The write() overwrites the old text from the start of the file, so you don't need to remove it first (if the new value could ever be shorter, add dat.truncate() after the write).
P.S. If that if/else statement is just there to check that the file exists, it's unnecessary: if you open() a file that doesn't exist in 'w' or 'a' mode, Python will create it for you.

Convert csv file to pipe delimited file in Python

I want to convert a comma-delimited CSV file to a pipe-delimited file with Python:
This is how I am reading my csv file:
with open('C://Path//InputFile.csv') as fOpen:
    reader = csv.DictReader(fOpen)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)

c = csv.writer(open("C://Path//OutputFile.txt", "wb"), delimiter="|")
How would I write it as pipe delimited file?
Adapting martineau's answer to fix newline issues in Python 3.
import csv

with open('C:/Path/InputFile.csv') as fin:
    # newline='' prevents extra newlines when using Python 3 on Windows
    # https://stackoverflow.com/a/3348664/3357935
    with open('C:/Path/OutputFile.txt', 'w', newline='') as fout:
        reader = csv.DictReader(fin, delimiter=',')
        writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)
This does what I think you want:
import csv

with open('C:/Path/InputFile.csv', 'rb') as fin, \
     open('C:/Path/OutputFile.txt', 'wb') as fout:
    reader = csv.DictReader(fin)
    writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
    writer.writeheader()
    writer.writerows(reader)
I found a quick way to change the comma delimiter to a pipe with pandas, by converting my dataframe to CSV using "|" as the delimiter:
df.to_csv(fileName, sep="|")
I don't have much experience with the csv module, so if these solutions aren't interchangeable someone might have to chime in. But this worked surprisingly well for me.
You can use pandas to convert a CSV file to a pipe-delimited (or any other delimiter) file.
import pandas as pd

df = pd.read_csv(r'C:\Users\gupta\Documents\inputfile.csv')  # read the input file into a dataframe
df.to_csv(r'C:\Users\gupta\Desktop\outputfile.txt', sep='|', index=False)  # write the dataframe to the output file, pipe-delimited
https://docs.python.org/2/library/csv.html for Python 2.x
https://docs.python.org/3.3/library/csv.html for Python 3.x
These pages explain how to use csv.writer.
Without testing it, your code looks syntactically valid.
All you need to do is add something like c.writerow(['data', 'here']) to write your data (note that writerow() takes a single sequence of fields, not separate arguments).
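A minimal sketch of that writing step (same output path as in the question, Python 3 style; the column names are hypothetical):
import csv

with open("C://Path//OutputFile.txt", "w", newline="") as fout:
    c = csv.writer(fout, delimiter="|")
    c.writerow(["first_column", "second_column"])  # header row (hypothetical names)
    c.writerow(["data", "here"])                   # one data row as a single list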

Inline CSV File Editing with Python

Can I modify a CSV file inline using Python's csv library, or a similar technique?
Currently I am processing a file and updating the first column (a name field) to change the formatting. A simplified version of my code looks like this:
with open('tmpEmployeeDatabase-out.csv', 'w') as csvOutput:
    writer = csv.writer(csvOutput, delimiter=',', quotechar='"')
    with open('tmpEmployeeDatabase.csv', 'r') as csvFile:
        reader = csv.reader(csvFile, delimiter=',', quotechar='"')
        for row in reader:
            row[0] = row[0].title()
            writer.writerow(row)
The philosophy works, but I am curious if I can do an inline edit so that I'm not duplicating the file.
I've tried the following, but it appends the new records to the end of the file instead of replacing them.
with open('tmpEmployeeDatabase.csv', 'r+') as csvFile:
    reader = csv.reader(csvFile, delimiter=',', quotechar='"')
    writer = csv.writer(csvFile, delimiter=',', quotechar='"')
    for row in reader:
        row[1] = row[1].title()
        writer.writerow(row)
No, you should not attempt to write to the file you are currently reading from. You can do it if you keep seeking back after reading a row, but it is not advisable, especially if you are writing back more data than you read.
The canonical method is to write to a new, temporary file and move that into place over the old file you read from.
from tempfile import NamedTemporaryFile
import shutil
import csv

filename = 'tmpEmployeeDatabase.csv'
tempfile = NamedTemporaryFile('w+t', newline='', delete=False)

with open(filename, 'r', newline='') as csvFile, tempfile:
    reader = csv.reader(csvFile, delimiter=',', quotechar='"')
    writer = csv.writer(tempfile, delimiter=',', quotechar='"')
    for row in reader:
        row[1] = row[1].title()
        writer.writerow(row)

shutil.move(tempfile.name, filename)
I've made use of the tempfile and shutil libraries here to make the task easier.
There is no underlying system call for inserting data into a file. You can overwrite, you can append, and you can replace. But inserting data into the middle means reading and rewriting the entire file from the point you made your edit down to the end.
As such, the two ways to do this are either (a) slurp the entire file into memory, make your edits there, and then dump the result back to disk, or (b) open a temporary output file, write your results to it while you read the input file, and then replace the old file with the new one once you reach the end. One method uses more RAM, the other uses more disk space.
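A minimal sketch of option (a), reusing the filename and transformation from the question:
import csv

filename = 'tmpEmployeeDatabase.csv'

# (a) Slurp the whole file into memory...
with open(filename, newline='') as f:
    rows = list(csv.reader(f, delimiter=',', quotechar='"'))

# ...make the edits in memory...
for row in rows:
    row[0] = row[0].title()

# ...then dump the result back over the original file.
with open(filename, 'w', newline='') as f:
    csv.writer(f, delimiter=',', quotechar='"').writerows(rows)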
If you just want to modify a CSV file inline using Python, you can use pandas:
import pandas as pd
df = pd.read_csv('yourfilename.csv')
# modify the "name" in row 1 as "Lebron James"
df.loc[1, 'name'] = "Lebron James"
# save the file using the same name
df.to_csv("yourfilename.csv")
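One caveat worth noting: by default to_csv also writes the DataFrame's row index back as an extra first column, so repeatedly saving the same file this way accumulates unnamed index columns. A hedged variant of the last line avoids that:
# index=False stops pandas from writing the row index back into the file
df.to_csv("yourfilename.csv", index=False)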
