Add file name as last column of CSV file

Add file name as last column of CSV file - python

I have a Python script which modifies a CSV file to add the filename as the last column:
import sys
import glob
for filename in glob.glob(sys.argv[1]):
file = open(filename)
data = [line.rstrip() + "," + filename for line in file]
file.close()
file = open(filename, "w")
file.write("\n".join(data))
file.close()
Unfortunately, it also adds the filename to the header (first) row of the file. I would like the string "ID" added to the header instead. Can anybody suggest how I could do this?

Have a look at the official csv module.

Here are a few minor notes on your current code:
It's a bad idea to use file as a variable name, since that shadows the built-in type.
You can close the file objects automatically by using the with syntax.
Don't you want to add an extra column in the header line, called something like Filename, rather than just omitting a column in the first row?
If your filenames have commas (or, less probably, newlines) in them, you'll need to make sure that the filename is quoted - just appending it won't do.
That last consideration would incline me to use the csv module instead, which will deal with the quoting and unquoting for you. For example, you could try something like the following code:
import glob
import csv
import sys
for filename in glob.glob(sys.argv[1]):
data = []
with open(filename) as finput:
for i, row in enumerate(csv.reader(finput)):
to_append = "Filename" if i == 0 else filename
data.append(row+[to_append])
with open(filename,'wb') as foutput:
writer = csv.writer(foutput)
for row in data:
writer.writerow(row)
That may quote the data slightly differently from your input file, so you might want to play with the quoting options for csv.reader and csv.writer described in the documentation for the csv module.
As a further point, you might have good reasons for taking a glob as a parameter rather than just the files on the command line, but it's a bit surprising - you'll have to call your script as ./whatever.py '*.csv' rather than just ./whatever.py *.csv. Instead, you could just do:
for filename in sys.argv[1:]:
... and let the shell expand your glob before the script knows anything about it.
One last thing - the current approach you're taking is slightly dangerous, in that if anything fails when writing back to the same filename, you'll lose data. The standard way of avoiding this is to instead write to a temporary file, and, if that was successful, rename the temporary file over the original. So, you might rewrite the whole thing as:
import csv
import sys
import tempfile
import shutil
for filename in sys.argv[1:]:
tmp = tempfile.NamedTemporaryFile(delete=False)
with open(filename) as finput:
with open(tmp.name,'wb') as ftmp:
writer = csv.writer(ftmp)
for i, row in enumerate(csv.reader(finput)):
to_append = "Filename" if i == 0 else filename
writer.writerow(row+[to_append])
shutil.move(tmp.name,filename)

You can try:
data = [file.readline().rstrip() + ",id"]
data += [line.rstrip() + "," + filename for line in file]

You can try changing your code, but using the csv module is recommended. This should give you the result you want:
import sys
import glob
import csv
filename = glob.glob(sys.argv[1])[0]
yourfile = csv.reader(open(filename, 'rw'))
csv_output=[]
for row in yourfile:
if len(csv_output) != 0: # skip the header
row.append(filename)
csv_output.append(row)
yourfile = csv.writer(open(filename,'w'),delimiter=',')
yourfile.writerows(csv_output)

Use the CSV module that comes with Python.
import csv
import sys
def process_file(filename):
# Read the contents of the file into a list of lines.
f = open(filename, 'r')
contents = f.readlines()
f.close()
# Use a CSV reader to parse the contents.
reader = csv.reader(contents)
# Open the output and create a CSV writer for it.
f = open(filename, 'wb')
writer = csv.writer(f)
# Process the header.
header = reader.next()
header.append('ID')
writer.writerow(header)
# Process each row of the body.
for row in reader:
row.append(filename)
writer.writerow(row)
# Close the file and we're done.
f.close()
# Run the function on all command-line arguments. Note that this does no
# checking for things such as file existence or permissions.
map(process_file, sys.argv[1:])
You can run this as follows:
blair#blair-eeepc:~$ python csv_add_filename.py file1.csv file2.csv

you can use fileinput to do in place editing
import sys
import glob
import fileinput
for filename in glob.glob(sys.argv[1]):
for line in fileinput.FileInput(filename,inplace=1) :
if fileinput.lineno()==1:
print line.rstrip() + " ID"
else
print line.rstrip() + "," + filename

Related

How to replace the header of all CSV files in a directory?

I have a folder of CSV files, and I need to simple replace the current header (first row), of the csv, with a different header. As an example, ever CSV has: A, B, C, D, E as the first first row header, but I need to be able to change that to whatever I want; i.e., Apple, Orange, Lemon, Pear, Peach || or, || 1, j, er, fd, j5
All the data in each CSV needs to be retained besides the header, and the replacement header will make all headers of all CSVs in the folder identical, per what is indicated in the code.
import shutil
import glob
files = glob.glob("/home/robert/Testing/D1/*.csv")
for i in range(len(files)):
from_file = open(files[i])
to_file = open(files[i], mode="w")
to_file.write("id,t,s,p,date,e")
shutil.copyfileobj(from_file, to_file)
I used this code, however, it deleted all of the other data in the CSV files, which I needed to keep, and only left/created the headers

from glob import glob
from pathlib import Path
def update_folder(folder: Path):
for fname in folder.glob('*.csv'):
with open(fname) as fin:
lines = fin.readlines() # element 0 is A,B,C...
lines[0] = 'Apple,Orange,Lemon\n'
with open(fname, 'w') as fout:
fout.write(''.join(readlines))

I would suggest using the Python's tempfile module to create a temporary file with the changes in it and then, after they're made, it can simply be renamed to replaced the original file. I would also using its csv module to read the original and write the updated version because it fast, debugged, and can handle many varieties of CSV.
Using the combination make the task relatively easy:
import csv
import os
from pathlib import Path
from tempfile import NamedTemporaryFile
CSV_FOLDER = Path('/home/robert/Testing/D1')
NEW_HEADER = 'id,t,s,p,date,e'.split(',')
for filepath in CSV_FOLDER.glob('*.csv'):
with open(filepath, 'r', newline='') as csv_file, \
NamedTemporaryFile('w', newline='', dir=filepath.parent, delete=False) \
as tmp_file:
reader = csv.reader(csv_file)
writer =csv.writer(tmp_file)
next(reader) # Skip header.
writer.writerow(NEW_HEADER) # Replacement.
writer.writerows(reader) # Copy remaining rows of original file.
os.replace(tmp_file.name, filepath) # Replace original file with updated version.
print('CSV files updated')

os.write() appends file instead of overwriting, but O_APPEND isn't used [duplicate]

I have the following code:
import re
#open the xml file for reading:
file = open('path/test.xml','r+')
#convert to string:
data = file.read()
file.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data))
file.close()
where I'd like to replace the old content that's in the file with the new content. However, when I execute my code, the file "test.xml" is appended, i.e. I have the old content follwed by the new "replaced" content. What can I do in order to delete the old stuff and only keep the new?

You need seek to the beginning of the file before writing and then use file.truncate() if you want to do inplace replace:
import re
myfile = "path/test.xml"
with open(myfile, "r+") as f:
data = f.read()
f.seek(0)
f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
f.truncate()
The other way is to read the file then open it again with open(myfile, 'w'):
with open(myfile, "r") as f:
data = f.read()
with open(myfile, "w") as f:
f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
Neither truncate nor open(..., 'w') will change the inode number of the file (I tested twice, once with Ubuntu 12.04 NFS and once with ext4).
By the way, this is not really related to Python. The interpreter calls the corresponding low level API. The method truncate() works the same in the C programming language: See http://man7.org/linux/man-pages/man2/truncate.2.html

file='path/test.xml'
with open(file, 'w') as filetowrite:
filetowrite.write('new content')
Open the file in 'w' mode, you will be able to replace its current text save the file with new contents.

Using truncate(), the solution could be
import re
#open the xml file for reading:
with open('path/test.xml','r+') as f:
#convert to string:
data = f.read()
f.seek(0)
f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data))
f.truncate()

import os#must import this library
if os.path.exists('TwitterDB.csv'):
os.remove('TwitterDB.csv') #this deletes the file
else:
print("The file does not exist")#add this to prevent errors
I had a similar problem, and instead of overwriting my existing file using the different 'modes', I just deleted the file before using it again, so that it would be as if I was appending to a new file on each run of my code.

See from How to Replace String in File works in a simple way and is an answer that works with replace
fin = open("data.txt", "rt")
fout = open("out.txt", "wt")
for line in fin:
fout.write(line.replace('pyton', 'python'))
fin.close()
fout.close()

in my case the following code did the trick
with open("output.json", "w+") as outfile: #using w+ mode to create file if it not exists. and overwrite the existing content
json.dump(result_plot, outfile)

Using python3 pathlib library:
import re
from pathlib import Path
import shutil
shutil.copy2("/tmp/test.xml", "/tmp/test.xml.bak") # create backup
filepath = Path("/tmp/test.xml")
content = filepath.read_text()
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))
Similar method using different approach to backups:
from pathlib import Path
filepath = Path("/tmp/test.xml")
filepath.rename(filepath.with_suffix('.bak')) # different approach to backups
content = filepath.read_text()
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))

passing file path as command line parameter in python

I need to write something into files, which I am passing through command line in python. I am using the below code mycode.py
import csv
import sys
path = sys.argv[1]
row = ['4', ' Danny', ' New York']
with open(r"path" , 'w') as csvFile:
writer = csv.writer(csvFile)
writer.writerow(row)
When I execute it, the file is not written, but when I hardcode path as
with open(r"C:\Users\venkat\Desktop\python\sam.csv", 'w') as
csvFile:
the file is being written, Please let me know if I am missing anything.
One more requirement is I have to pass only the directory in open, and append some file name.
For example: I can pass
C:\Users\venkat\Desktop\python, sam.csv
I have to append to the directory in code.

You should use the path variable's value.
Replace
with open(r"path" , 'w') as csvFile:
with
with open(path , 'w') as csvFile:
^^^^
If you want to append one file to a directory path, you could use os package.
file_path = os.path.join(path, file)

Well this worked
import csv
import sys
path = sys.argv[1]
row = ['4', ' Danny', ' New York']
with open(path, 'w') as csvFile:
writer = csv.writer(csvFile)
writer.writerow(row)

If you want to append(or write) an existing file,this worked too using format:
path="insert\\pathOf\\file.txt"
with open("{}".format(path),'a') as file:
file.write("excellent\n")
The 'a' is for append,so it will add the 'excellent' string to file.txt.
If you want to write a new file just put 'w' instead of 'a'.
Using 'w' will overwrite the file.txt if already exists.
The \n is for ending in new line so if you run the same code 2 times it will add 'excellent' in two different lines and not side by side.

You should add curly braces if you want to convert it to raw format
with open(rf"{path}" , 'w') as csvFile:

Merge txt files in a folder and replacing characters in python

I have a doubt about how to do to continue the code, I need to take all files from a folder and merge them in 1 file with another text format.
Example:
The Input files are of text format like this:
"{'nr': '3173391045', 'data': '27/12/2017'}"
"{'nr': '2173391295', 'data': '05/01/2017'}"
"{'nr': '5173351035', 'data': '07/03/2017'}"
The Output files must be lines like this:
"3173391045","27/09/2017"
"2173391295","05/01/2017"
"5173351035","07/03/2017"
This is my working code, it's working for merge and taking out the blank lines
import glob2
import datetime
filenames=glob2.glob("*.txt")
with open(datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S-%f")+".SAI", 'w') as file:
for filename in filenames:
with open(filename,"r") as f:
file.write(f.read())
I'm trying something with .replace but is not working, I get syntax errors or blank files
filedata = filedata.replace("{", "") for line in filedata

If your input files had contained valid JSON strings, the correct way would have been to parse the lines as JSON and write them back in csv. As strings are enclosed in single quotes (') they are rejected by the json module of the Python library, and my advice is to use a regex to parse them. Code could become:
import glob2
import datetime
import csv
import re
# the regex to parse the line
rx = re.compile(r".*'nr'\s*:\s*'(\d+)'.*'data'\s*:\s*'([/\d]+)'")
filenames=glob2.glob("*.txt")
with open(datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S-%f")+".SAI", 'w') as file:
wr = csv.writer(file, quoting = csv.QUOTE_ALL)
for filename in filenames:
with open(filename,"r") as f:
for line in f: # process line by line
m = rx.match(line)
wr.writerow(m.groups())

With a few tweaks, the input data can be coerced into a form suitable for JSON parsing:
from datetime import datetime
import json
import glob2
import csv
with open(datetime.now().strftime("%Y-%m-%d-%H-%M-%S-%f")+".SAI", 'w', newline='') as f_output:
csv_output = csv.writer(f_output, quoting=csv.QUOTE_ALL)
for filename in glob2.glob('*.txt'):
with open(filename) as f_input:
for row in f_input:
row_dict = json.loads(row.strip('"\n').replace("'", '"'))
csv_output.writerow([row_dict['nr'], row_dict['data']])
Giving you:
"3173391045","27/12/2017"
"2173391295","05/01/2017"
"5173351035","07/03/2017"
Note, in Python 3.x the output file should be opened with newline=''. Without this, extra blank lines can appear in the output file.

using regex/replaces to parse those strings is dangerous. You could always stumble on a data containing the delimiter, the comma, etc..
And in this case, even if json cannot read those lines,ast.literal_eval can without any modification whatsoever:
import ast
with open("output.csv",newline="") as fw:
cw = csv.writer(fw)
for filename in filenames:
with open(filename) as f:
for line in f:
d = ast.literal_eval(line)
cw.writerow([d['nr'],d['data'])

Skip header when writing to an open CSV

I am compiling a load of CSVs into one. The first CSV contains the headers, which I am opening in write mode (maincsv). I am then making a list of all the others which live in a different folder and attempting to append them to the main one.
It works, however it just writes over the headings. I just want to start appending from line 2. I'm sure it's pretty simple but all the next(), etc. things I try just throw errors. The headings and data are aligned if that helps.
import os, csv
maincsv = open(r"C:\Data\OSdata\codepo_gb\CodepointUK.csv", 'w', newline='')
maincsvwriter = csv.writer(maincsv)
curdir = os.chdir(r"C:\Data\OSdata\codepo_gb\Data\CSV")
csvlist = os.listdir()
csvfiles = []
for file in csvlist:
path = os.path.abspath(file)
csvfiles.append(path)
for incsv in csvfiles:
opencsv = open(incsv)
csvreader = csv.reader(opencsv)
for row in csvreader:
maincsvwriter.writerow(row)
maincsv.close()

To simplify things I have the code load all the files in the directory the python code is run in. This will get the first line of the first .csv file and use it as the header.
import os
count=0
collection=open('collection.csv', 'a')
files=[f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
if ('.csv' in f):
solecsv=open(f,'r')
if count==0:
# assuming header is 1 line
header=solecsv.readline()
collection.write(header)
for x in solecsv:
if not (header in x):
collection.write(x)
collection.close()

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Add file name as last column of CSV file - python

Have a look at the official csv module.

You can try: data = [file.readline().rstrip() + ",id"] data += [line.rstrip() + "," + filename for line in file]

you can use fileinput to do in place editing import sys import glob import fileinput for filename in glob.glob(sys.argv[1]): for line in fileinput.FileInput(filename,inplace=1) : if fileinput.lineno()==1: print line.rstrip() + " ID" else print line.rstrip() + "," + filename

Related

How to replace the header of all CSV files in a directory?

os.write() appends file instead of overwriting, but O_APPEND isn't used [duplicate]

passing file path as command line parameter in python

Merge txt files in a folder and replacing characters in python

Skip header when writing to an open CSV

Categories

Resources