Delimiter change in csv using python

I have a .csv file of around 30000 rows. The default delimiter is a semicolon. I created a small Python script to convert that delimiter to a comma and save the result in the same file. The script runs without any errors but does nothing at the end: the delimiter is still a semicolon. The temp .txt file is created, but the script never writes back to the main file. The code I am using is as follows:
import csv
from pathlib import Path
import os

cwd = os.getcwd()  # Get the current working directory (cwd)
files = os.listdir(cwd)  # Get all the files in that directory
print("Files in %r: %s" % (cwd, files))

with open('RadGridExport.csv', mode='r', encoding='utf-8') as infile:
    reader = csv.reader(infile, dialect="excel")
    with open('temp.txt', mode='w', encoding='utf-8') as outfile:
        writer = csv.writer(outfile, delimiter=',')
        writer.writerows(reader)

You have missed the delimiter while reading. By default csv.reader looks for a comma; since that is not the case here, you have to specify the delimiter:
reader = csv.reader(infile, dialect="excel", delimiter=";")
And you need not mention the comma as delimiter while writing, since it is the default.
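To also get the result back into the original file, which is the part the question's script was missing, a minimal sketch (reusing the filenames from the question) is to finish the conversion to temp.txt and then replace the original with os.replace:
import csv
import os

# Read the semicolon-delimited file and write a comma-delimited copy.
with open('RadGridExport.csv', mode='r', encoding='utf-8', newline='') as infile:
    reader = csv.reader(infile, delimiter=';')
    with open('temp.txt', mode='w', encoding='utf-8', newline='') as outfile:
        writer = csv.writer(outfile)
        writer.writerows(reader)

# Replace the original file with the converted copy.
os.replace('temp.txt', 'RadGridExport.csv')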
Or the easiest way is to use the pandas package:
import pandas as pd

df = pd.read_csv('RadGridExport.csv', sep=';')
df.to_csv('RadGridExport.csv', index=False)
Since read_csv loads the whole file into memory first, writing back to the same path is safe here.

Related

Modifying multiple .csv files from same directory in python

I need to modify multiple .csv files in my directory. Is it possible to do it with a simple script?
My .csv columns are in this order:
X_center,Y_center,X_Area,Y_Area,Classification
I would like to change them to this order:
Classification,X_center,Y_center,X_Area,Y_Area
So far I managed to write:
import os
import csv

for file in os.listdir("."):
    if file.endswith(".csv"):
        with open('*.csv', 'r') as infile, open('reordered.csv', 'a') as outfile:
            fieldnames = ['Classification','X_center','Y_center','X_Area','Y_Area']
            writer = csv.DictWriter(outfile, fieldnames=fieldnames)
            writer.writeheader()
            for row in csv.DictReader(infile):
                writer.writerow(row)
csv_file.close()
But it changes every row to Classification,X_center,Y_center,X_Area,Y_Area (replaces values in every row).
Is it possible to open a file, re-order the columns and save the file under the same name?
I checked similar solutions that were given on other threads but no luck.
Thanks for the help!
First off, I think your problem lay in opening '*.csv' in the loop instead of opening file. Also though, I would recommend never overwriting your original input files. It's much safer to write copies to a new directory. Here's a modified version of your script which does that.
import os
import csv
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", required=True)
ap.add_argument("-o", "--output", required=True)
args = vars(ap.parse_args())

if os.path.exists(args["output"]) and os.path.isdir(args["output"]):
    print("Writing to {}".format(args["output"]))
else:
    print("Cannot write to directory {}".format(args["output"]))
    exit()

for file in os.listdir(args["input"]):
    if file.endswith(".csv"):
        print("{} ...".format(file))
        # newline='' avoids blank rows when writing csv on Windows
        with open(os.path.join(args["input"], file), 'r') as infile, \
                open(os.path.join(args["output"], file), 'w', newline='') as outfile:
            fieldnames = ['Classification', 'X_center', 'Y_center', 'X_Area', 'Y_Area']
            writer = csv.DictWriter(outfile, fieldnames=fieldnames)
            writer.writeheader()
            for row in csv.DictReader(infile):
                writer.writerow(row)
To use it, create a new directory for your outputs and then run like so:
python this.py -i input_dir -o output_dir
Note:
From your question you seemed to want each file to be modified in place so this does basically that (outputs a file of the same name, just in a different directory) but leaves your inputs unharmed. If you actually wanted all the files reordered into a single file as your code open('reordered.csv', 'a') implies, you could easily do that by moving the output initialization code so it is executed before entering the loop.
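For example, a sketch of that single-output variant, reusing the argument parsing from the script above:
# Write all reordered rows into one combined file instead of one file per input.
fieldnames = ['Classification', 'X_center', 'Y_center', 'X_Area', 'Y_Area']
with open(os.path.join(args["output"], 'reordered.csv'), 'w', newline='') as outfile:
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()  # header written once, before entering the loop
    for file in os.listdir(args["input"]):
        if file.endswith(".csv"):
            with open(os.path.join(args["input"], file), 'r') as infile:
                for row in csv.DictReader(infile):
                    writer.writerow(row)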
Using pandas & pathlib.
from pathlib import Path  # available in Python 3.4+
import pandas as pd

dir = r'c:\path\to\csvs'  # raw string for Windows paths
csv_files = list(Path(dir).glob('*.csv'))  # finds all csvs in your folder
cols = ['Classification', 'X_center', 'Y_center', 'X_Area', 'Y_Area']

for csv_file in csv_files:  # iterate over the list
    df = pd.read_csv(csv_file)  # read csv
    df[cols].to_csv(csv_file.name, index=False)  # reorder columns and save
    print(f'{csv_file.name} saved.')
Naturally, if there is a csv without those columns then this code will fail; you can add a try/except if that's the case, as sketched below.
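For instance, a sketch of the loop with such a guard; pandas raises KeyError when a listed column is missing:
for csv_file in csv_files:
    try:
        df = pd.read_csv(csv_file)
        df[cols].to_csv(csv_file.name, index=False)
        print(f'{csv_file.name} saved.')
    except KeyError:
        # the file lacks one of the expected columns
        print(f'{csv_file.name} skipped: missing expected columns.')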

Replace comma with pipe delimiter within the same file using python

The following is the code; it works fine and I get an output file with pipe as the delimiter. However, I do not want a new file to be generated; rather, I would like the existing file to be replaced with the pipe delimiter instead of the comma. Appreciate your inputs. I am new to Python and learning it on the go.
with open(dst1, encoding='utf-8', errors='ignore') as input_file:
    with open(dst2, 'w', encoding='utf-8', errors='ignore', newline='') as output_file:
        reader = csv.DictReader(input_file, delimiter=',')
        writer = csv.DictWriter(output_file, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)
The only truly safe way to do this is to write to a new file, then atomically replace the old file with the new file. Any other solution risks data loss/corruption on power loss. The simple approach is to use the tempfile module to make a temporary file in the same directory (so atomic replace will work):
import csv
import os
import os.path
import tempfile

with open(dst1, encoding='utf-8', errors='ignore', newline='') as input_file, \
     tempfile.NamedTemporaryFile(mode='w', encoding='utf-8', newline='',
                                 dir=os.path.dirname(dst1), delete=False) as tf:
    try:
        reader = csv.DictReader(input_file)
        writer = csv.DictWriter(tf, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)
    except:
        # On error, remove temporary before reraising exception
        os.remove(tf.name)
        raise
    else:
        # else is optional, if you want to be extra careful that all
        # data is synced to disk to reduce risk that metadata updates
        # before data synced to disk:
        tf.flush()
        os.fsync(tf.fileno())

# Atomically replace original file with temporary now that with block exited
# and data fully written
try:
    os.replace(tf.name, dst1)
except:
    # On error, remove temporary before reraising exception
    os.remove(tf.name)
    raise
Since you are simply replacing a single-character delimiter from one to another, there will be no change in file size or positions of any characters not being replaced. As such, this is a perfect scenario to open the file in r+ mode to allow writing back the processed content to the very same file being read at the same time, so that no temporary file is ever needed:
with open(dst, encoding='utf-8', errors='ignore') as input_file, \
     open(dst, 'r+', encoding='utf-8', errors='ignore', newline='') as output_file:
    reader = csv.DictReader(input_file, delimiter=',')
    writer = csv.DictWriter(output_file, reader.fieldnames, delimiter='|')
    writer.writeheader()
    writer.writerows(reader)
EDIT: Please read #ShadowRanger's comment for limitations of this approach.
I'm not totally sure, but if the file is not too big, you can load it in pandas with read_csv and then save it with to_csv using whatever delimiter you like. For example:
import pandas as pd

data = pd.read_csv(input_file, encoding='utf-8')
# index=False keeps pandas from writing an extra index column
data.to_csv(input_file, sep='|', index=False, encoding='utf-8')
Hope this helps!!

Combining multiple csv files into one csv file

I am trying to combine multiple csv files into one, and have tried a number of methods but I am struggling.
I import the data from multiple csv files, and when I compile them into one csv file, the first few rows come out nicely, but then it starts randomly inserting a variable number of blank rows, and it never finishes filling out the combined csv file; it just keeps having data added to it, which does not make sense to me because I am compiling a finite amount of data.
I have already tried writing close statements for the file, and I still get the same result: my designated combined csv file never stops getting data, and the data is randomly spaced throughout the file. I just want a normally compiled csv.
Is there an error in my code? Is there any explanation as to why my csv file is behaving this way?
csv_file_list = glob.glob(Dir + '/*.csv')  # returns the file list
print(csv_file_list)

with open(Avg_Dir + '.csv', 'w') as f:
    wf = csv.writer(f, delimiter=',')
    print(f)
    for files in csv_file_list:
        rd = csv.reader(open(files, 'r'), delimiter=',')
        for row in rd:
            print(row)
            wf.writerow(row)
Your code works for me.
Alternatively, you can merge files as follows:
csv_file_list = glob.glob(Dir + '/*.csv')
with open(Avg_Dir + '.csv', 'w') as wf:
    for file in csv_file_list:
        with open(file) as rf:
            for line in rf:
                if line.strip():  # if line is not empty
                    if not line.endswith("\n"):
                        line += "\n"
                    wf.write(line)
Or, if the files are not too large, you can read each file at once. But in this case all empty lines and headers will be copied:
csv_file_list = glob.glob(Dir + '/*.csv')
with open(Avg_Dir + '.csv', 'w') as wf:
    for file in csv_file_list:
        with open(file) as rf:
            wf.write(rf.read().strip() + "\n")
Consider several adjustments:
Use a context manager, with, for both the read and write processes. This avoids the need to close() file objects, which you do not do on the read objects.
For the skipping lines issue: use either the argument newline='' in open() or the lineterminator="\n" argument in csv.writer(). See SO answers for the former and latter.
Use os.path.join() to properly concatenate folder and file paths. This method is os-agnostic, so it accounts for Windows or Unix machines using forward or backward slashes.
Adjusted script:
import os
import csv, glob

Dir = r"C:\Path\To\Source"
Avg_Dir = r"C:\Path\To\Destination\Output"

csv_file_list = glob.glob(os.path.join(Dir, '*.csv'))  # returns the file list
print(csv_file_list)

with open(os.path.join(Avg_Dir, 'Output.csv'), 'w', newline='') as f:
    wf = csv.writer(f, lineterminator='\n')
    for files in csv_file_list:
        with open(files, 'r') as r:
            next(r)  # SKIP HEADERS
            rr = csv.reader(r)
            for row in rr:
                wf.writerow(row)

Merging several csv files and storing the file names as a variable - Python

I am trying to append several csv files into a single csv file using python while adding the file name (or, even better, a sub-string of the file name) as a new variable. All files have headers. The following script does the trick of merging the files, but does not cover the file name as variable issue:
import glob

filenames = glob.glob("/filepath/*.csv")
outputfile = open("out.csv", "a")
for line in open(str(filenames[1])):
    outputfile.write(line)
for i in range(1, len(filenames)):
    f = open(str(filenames[i]))
    f.next()
    for line in f:
        outputfile.write(line)
outputfile.close()
I was wondering if there are any good suggestions. I have about 25k small size csv files (less than 100KB each).
You can use Python's csv module to parse the CSV files for you, and to format the output. Example code (untested):
import csv

with open(output_filename, "wb") as outfile:
    writer = None
    for input_filename in filenames:
        with open(input_filename, "rb") as infile:
            reader = csv.DictReader(infile)
            if writer is None:
                field_names = ["Filename"] + reader.fieldnames
                writer = csv.DictWriter(outfile, field_names)
                writer.writeheader()
            for row in reader:
                row["Filename"] = input_filename
                writer.writerow(row)
A few notes:
Always use with to open files. This makes sure they will get closed again when you are done with them. Your code doesn't correctly close the input files.
CSV files should be opened in binary mode (this applies to Python 2, which the code above targets; in Python 3, open them in text mode with newline='' instead).
Indices start at 0 in Python. Your code skips the first file, and includes the lines from the second file twice. If you just want to iterate over a list, you don't need to bother with indices in Python. Simply use for x in my_list instead.
Simple changes will achieve what you want:
For the first line,
outputfile.write(line) -> outputfile.write(line + ',file')
and later
outputfile.write(line + ',' + filenames[i])
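Putting those changes together, a rough sketch of the complete script; note that each line read from a file still ends with a newline, so the filename has to be spliced in before it (the rstrip calls below), and it starts from the first file since, as the other answer notes, the original indexing skips it:
import glob

filenames = glob.glob("/filepath/*.csv")

with open("out.csv", "w") as outputfile:
    # first file: keep its header, with the new column name appended
    with open(filenames[0]) as f:
        header = next(f)
        outputfile.write(header.rstrip("\n") + ",file\n")
        for line in f:
            outputfile.write(line.rstrip("\n") + "," + filenames[0] + "\n")
    # remaining files: skip their headers
    for name in filenames[1:]:
        with open(name) as f:
            next(f)
            for line in f:
                outputfile.write(line.rstrip("\n") + "," + name + "\n")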

Convert csv file to pipe delimited file in Python

I want to convert a comma-delimited CSV file to a pipe-delimited file with Python:
This is how I am reading my csv file:
with open('C://Path//InputFile.csv') as fOpen:
    reader = csv.DictReader(fOpen)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)

c = csv.writer(open("C://Path//OutputFile.txt", "wb"), delimiter="|")
How would I write it as pipe delimited file?
Adapting martineau's answer to fix newline issues in Python 3.
import csv

with open('C:/Path/InputFile.csv') as fin:
    # newline='' prevents extra newlines when using Python 3 on Windows
    # https://stackoverflow.com/a/3348664/3357935
    with open('C:/Path/OutputFile.txt', 'w', newline='') as fout:
        reader = csv.DictReader(fin, delimiter=',')
        writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)
This does what I think you want:
import csv

with open('C:/Path/InputFile.csv', 'rb') as fin, \
     open('C:/Path/OutputFile.txt', 'wb') as fout:
    reader = csv.DictReader(fin)
    writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
    writer.writeheader()
    writer.writerows(reader)
I found a quick way to change the comma delimiter to a pipe with pandas, by converting my dataframe to csv with "|" as the delimiter:
df.to_csv(fileName, sep="|")
I don't have much experience with the csv module so if these solutions aren't interchangeable then someone might have to chime in. But this worked surprisingly well for me.
You can use pandas to achieve the conversion of csv to pipe-delimited (or desired delimited) file.
import pandas as pd

df = pd.read_csv(r'C:\Users\gupta\Documents\inputfile.csv')  # read inputfile into a dataframe
df.to_csv(r'C:\Users\gupta\Desktop\outputfile.txt', sep='|', index=False)  # write the dataframe to the output file, pipe delimited
https://docs.python.org/2/library/csv.html for Python 2.x
https://docs.python.org/3.3/library/csv.html for Python 3.x
These pages explain how to use csv.writer.
Without testing it, your code looks syntactically valid.
All you need to do is add some c.writerow(['data', 'here']) calls to write your data (csv.writer.writerow takes a single iterable of fields).
