Using CSV module to append multiple files while removing appended headers - python

I would like to use the Python CSV module to open a CSV file for appending. Then, from a list of CSV files, I would like to read each CSV file and write it to the appended CSV file. My script works well, except that I cannot find a way to remove the headers from all but the first CSV file being read. I am fairly sure my else block is not executing; perhaps my if/else syntax is the problem? Any thoughts would be appreciated.
writeFile = open(append_file,'a+b')
writer = csv.writer(writeFile,dialect='excel')
for files in lstFiles:
    readFile = open(input_file,'rU')
    reader = csv.reader(readFile,dialect='excel')
    for i in range(0,len(lstFiles)):
        if i == 0:
            oldHeader = readFile.readline()
            newHeader = writeFile.write(oldHeader)
            for row in reader:
                writer.writerow(row)
        else:
            reader.next()
            for row in reader:
                row = readFile.readlines()
                writer.writerow(row)
    readFile.close()
writeFile.close()

You're effectively iterating over lstFiles twice: for each file in your list, your inner for loop runs over the whole list again, starting from index 0. You want something like:
writeFile = open(append_file,'a+b')
writer = csv.writer(writeFile,dialect='excel')
headers_needed = True
for input_file in lstFiles:
    readFile = open(input_file,'rU')
    reader = csv.reader(readFile,dialect='excel')
    oldHeader = reader.next()
    if headers_needed:
        writer.writerow(oldHeader)
        headers_needed = False
    for row in reader:
        writer.writerow(row)
    readFile.close()
writeFile.close()
You could also use enumerate over lstFiles to iterate over (index, filename) tuples, but I think the boolean shows the logic more clearly.
You probably do not want to mix iterating over the csv reader with calling readline directly on the underlying file, as your original code does.
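For anyone on Python 3, a minimal sketch of the same logic (assuming lstFiles and append_file as above): next(reader) replaces reader.next(), and the csv docs recommend opening files in text mode with newline='':
import csv

with open(append_file, 'a', newline='') as writeFile:
    writer = csv.writer(writeFile, dialect='excel')
    headers_needed = True
    for input_file in lstFiles:
        with open(input_file, newline='') as readFile:
            reader = csv.reader(readFile, dialect='excel')
            header = next(reader)            # consume the header row
            if headers_needed:
                writer.writerow(header)      # keep it only from the first file
                headers_needed = False
            writer.writerows(reader)         # copy the remaining rows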

I think you're iterating too many times (over various things: both your list of files and the files themselves). You've definitely got some consistency problems; it's a little hard to be sure since we can't see your variable initializations. This is what I think you want:
with open(append_file,'a+b') as writeFile:
    need_headers = True
    for input_file in lstFiles:
        with open(input_file,'rU') as readFile:
            headers = readFile.readline()
            if need_headers:
                # Write the headers only if we need them
                writeFile.write(headers)
                need_headers = False
            # Now write the rest of the input file.
            for line in readFile:
                writeFile.write(line)
I took out all the csv-specific stuff since there's no reason to use it for this operation. I also cleaned the code up considerably to make it easier to follow, using the files as context managers and a well-named boolean instead of the "magic" i == 0 check. The result is a much nicer block of code that (hopefully) won't have you jumping through hoops to understand what's going on.

Related

how to read from and write to a csv file using csv module

import csv

with open("C:\\Users\\ki386179\\Desktop\\output.csv","r") as f:
    reader = csv.reader(f)
    for row in reader:
        if 'india' in row:
            pass
        if 'india' not in row:
            with open("C:\\Users\\ki386179\\Desktop\\output.csv","w") as f:
                writer = csv.writer(f)
                writer.writerow('india')
I am trying to achieve something like this: check for a particular value in a particular column, and if it is not present, write it to the file.
You can't write to a file whilst you are in the middle of reading from it. Also, surely you don't want to require every line to contain the string, like your current logic does?
I'm guessing you want to add one line at the end if none of the lines matched:
import csv

seen_string = False
with open("C:\\Users\\ki386179\\Desktop\\output.csv","r") as f:
    reader = csv.reader(f)
    for row in reader:
        if 'india' in row:
            seen_string = True
            break

if not seen_string:
    # Notice append mode; "a" not "w"
    with open("C:\\Users\\ki386179\\Desktop\\output.csv","a") as f1:
        # Notice this one is f1
        writer = csv.writer(f1)
        writer.writerow(['india'])  # wrap in a list, or each character becomes a field
Note also that in behaves differently on strings and lists: with csv.reader, 'india' in row checks whole-field equality, so a field like "amerindian" will not match; but if you test the raw line string instead, 'india' in line does substring matching and would match "amerindian". Compare individual fields if exact equality is what you want.
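A quick illustration of the two behaviours:
row = ['us', 'amerindian', 'india']   # a parsed CSV row (a list)
line = 'us,amerindian,india'          # the raw line (a string)

print('india' in row)     # True: 'india' is an element of the list
print('indi' in row)      # False: no field equals 'indi'
print('indi' in line)     # True: substring match on the raw string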
You can use r+ mode for both reading and writing. It is difficult to be sure what you are trying to achieve, but here is an example that probably does it. I put the code below in a script india.py and made it executable (chmod +x india.py):
#!/usr/bin/env python
import sys
from csv import reader, writer

with open(sys.argv[1], "r+") as f:
    for row in reader(f):
        if "india" in row:
            break
    else:
        # for/else: the else branch runs only if the loop finished without break
        writer(f).writerow("india")
A test run
$ cat test_write.csv
us,canada,mexico
france,germany,norway
brazil,argentina,colombia
china,japan,korea
$ ./india.py test_write.csv
$ cat test_write.csv
us,canada,mexico
france,germany,norway
brazil,argentina,colombia
china,japan,korea
i,n,d,i,a
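About the i,n,d,i,a in the last line: writerow iterates whatever you pass it, so a bare string is written as one field per character. To write a single field, wrap the string in a list:
writer(f).writerow(["india"])  # writes: india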

Python - Most efficient to overwrite a specific row in a CSV file

Given the following csv file :
01;blue;brown;black
02;glass;rock;paper
03;pigeon;squirel;shark
My goal is to replace the (unique) line containing '02' in the first position.
I wrote this piece of code:
with open("csv", 'r+', newline='', encoding='utf-8') as csvfile, open('csvout', 'w', newline='', encoding='utf-8') as out:
reader = csv.reader(csvfile, delimiter=';')
writer = csv.writer(out, delimiter=';')
for row in reader:
if row[0] != '02':
writer.writerow(row)
else:
writer.writerow(['02', 'A', 'B', 'C'])
But re-writing the whole CSV into another file doesn't seem to be the most efficient way to proceed, especially for large files:
- Once the match is found, we continue to read till the end.
- We have to re-write every line one by one.
- Writing a second file isn't very practical, nor is it storage efficient.
I wrote a second piece of code which seems to address these problems:
with open("csv", 'r+', newline='', encoding='utf-8') as csvfile:
content = csvfile.readlines()
for index, row in enumerate(content):
row = row.split(';')
if row[2] == 'rock':
tochange = index
break
content.pop(tochange)
content.insert(tochange, '02;A;B;C\n')
content = "".join(content)
csvfile.seek(0)
csvfile.truncate(0) # Erase content
csvfile.write(content)
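A small possible simplification (not from the thread): the pop/insert pair can be a single item assignment:
content[tochange] = '02;A;B;C\n'  # replaces pop + insert in one step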
Do you agree that the second solution is more efficient ?
Do you have any improvement, or better way to proceed ?
EDIT: The number of characters in the line can vary.
EDIT 2: I'm apparently obliged to read and rewrite everything if I don't want to use padding. A possible solution would be a database; I will consider it for the future.
If I had to choose between those 2 solutions, which one would be the best performance-wise ?
As the number of characters in the line may vary, I either have to read/write the whole file or, as @tobias_k said, use seek() to come back to the beginning of the line and:
- if the new line is shorter, write just the line and pad with spaces;
- if it is the same length, write just the line;
- if it is longer, re-write that line and everything following it.
A sketch of the seek-and-pad variant follows this list.
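Here is that sketch (hedged: the helper name and the fits-on-the-line check are mine, not from the thread):
def overwrite_line_padded(path, match_value, new_fields, delimiter=';'):
    # Hypothetical helper: overwrite the first line whose first field equals
    # match_value, padding with spaces so later lines keep their offsets.
    # Only valid when the replacement is not longer than the original line.
    with open(path, 'r+', newline='', encoding='utf-8') as f:
        pos = f.tell()
        line = f.readline()
        while line:
            if line.split(delimiter)[0] == match_value:
                old_len = len(line.rstrip('\r\n'))
                new_line = delimiter.join(new_fields)
                if len(new_line) > old_len:
                    raise ValueError('replacement longer than original line')
                f.seek(pos)
                f.write(new_line.ljust(old_len))   # pad to the original length
                break
            pos = f.tell()
            line = f.readline()
For example: overwrite_line_padded('csv', '02', ['02', 'A', 'B', 'C']).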
I want to avoid using padding, so I used time.perf_counter() to measure the execution time of both versions; the second solution appears to be almost 2× faster (CSV of 10,000 lines, match at the 6,000th).
One alternative would be to migrate to a relational database.
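If updates like this are frequent, a database handles in-place modification natively. A minimal sqlite3 sketch (the database file, table, and column names are hypothetical):
import sqlite3

conn = sqlite3.connect('rows.db')   # hypothetical database file
conn.execute('CREATE TABLE IF NOT EXISTS rows (id TEXT PRIMARY KEY, a TEXT, b TEXT, c TEXT)')
conn.execute('UPDATE rows SET a = ?, b = ?, c = ? WHERE id = ?', ('A', 'B', 'C', '02'))
conn.commit()
conn.close()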

Check if CSV file is empty or not after reading with DictReader

I'm opening a CSV file and I need to check whether the file is empty or not. I already know about checking with getsize(); I would like a way to do it using DictReader.
This is my code
infocsv = open('nyfile.csv', 'a')
reader = csv.DictReader(infocsv)

with open(parafile, "rb") as paracsv:
    # Read in parameter values as a dictionary
    paradict = csv.DictReader(paracsv)
    has_rows = False
    for line in paradict:
        has_rows = True
    if not has_rows:
        return None
One suggestion is to check the reader's line_num attribute, which the docs describe as: "The number of lines read from the source iterator. This is not the same as the number of records returned, as records can span multiple lines."
Here is an alternative solution:
import csv

with open('nyfile.csv') as infocsv:
    reader = [i for i in csv.DictReader(infocsv)]
    if len(reader) > 0:
        print('not empty')
    else:
        print('empty')
I tried it on a few CSV files of my own and it works. Let me know if this helps.
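Another option, which avoids reading the whole file into memory, is to try to fetch just the first record (a sketch, assuming the same nyfile.csv):
import csv

with open('nyfile.csv', newline='') as infocsv:
    reader = csv.DictReader(infocsv)
    first = next(reader, None)          # None if there are no data rows
    if reader.fieldnames is None:
        print('completely empty: no header at all')
    elif first is None:
        print('header only: no data rows')
    else:
        print('has data, first row:', first)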

Can I use a Dictionary to store keywords that I need to loop through a csv file to find?

I am writing a python script that will go through a csv file row by row looking for a keyword. Once that keyword is found I need to write that entire row to a new .csv file. I am having trouble writing the for loop to complete this task and I do not understand how to write to a new .csv file. I will post what I have done so far below.
#!/usr/bin/python
# First import the csv module to work on .csv files
import csv

# Open the files that will be used to read from and write to
infile = open('infile.csv','rb')
outfile = open('outfile.csv','w')

# Pass the file objects to the csv reader/writer
csv_f_reader = csv.reader(infile)
writer = csv.writer(outfile,delimiter=',',quotechar='',quoting=csv.QUOTE_NONE)

# Create a dictionary to hold all search words as unique keys.
# The associated value will be used to keep count of how many
# successful hits the for loop makes.
search_word={'wifi':0,'WIFI':0,'wi-fi':0,'Wi-Fi':0,'Cisco':0,'cisco':0,'NETGEAR':0,'netgear':0,'Netge$

for csv_line in csv_f_reader:
    match_found = False
    for keyword in search_word.keys():
        for csv_element in csv_line:
            if keyword in csv_element:
                match_found = True
                search_word[keyword] += 1
    if match_found:
        writer.writerow(csv_line)

# Don't forget to close the files
infile.close()
outfile.close()
print search_word.keys(), search_word.values()
You really don't need to use a dictionary for your keywords here. Err... wait, oh you want to keep track of how many times you see each keyword. Your description didn't say that.
Anyway, you should be looping through the lines in the file and the keys. The loop should probably look like this:
for line in csv_f_reader:
    for keyword in search_word.keys():
        if keyword in line:
            search_word[keyword] += 1
            writer.writerow(line)  # csv writers have writerow, not write

infile.close()
outfile.close()
I haven't double checked that you're using the csv module correctly, but that should give you an idea of what it should look like.
You don't need a dictionary for what you're describing (unless you're trying to count up the keyword instances). search_word.keys() gives you a list anyway which is OK.
First you want to iterate through the csv like this:
infile = open('infile.csv')
csv_f_reader = csv.reader(infile)
for csv_line in csv_f_reader:
    print csv_line
If you try that, you'll see that each line gives you a list of all the elements. You can use your list of keywords to compare against each one and write the rows that pass:
for csv_line in csv_f_reader:
    for k in search_word.keys():
        if k in csv_line:
            writer.writerow(csv_line)
In your case, the keywords aren't exactly the same as the CSV elements, they're inside them. We can deal with this by checking the elements for substrings:
for csv_line in csv_f_reader:
    match_found = False
    for k in search_word.keys():
        for csv_element in csv_line:
            if k in csv_element:
                match_found = True
    if match_found:
        writer.writerow(csv_line)
One other thing: you need to open the output file in write mode:
outfile = open('outfile.csv', 'w')
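As an aside, if the long keyword table only exists to cover case variants, normalizing case and counting with collections.Counter keeps it short. A sketch (file names from the question; the keyword list is shortened and hypothetical):
import csv
from collections import Counter

keywords = ['wifi', 'wi-fi', 'cisco', 'netgear']   # lowercase canonical forms
counts = Counter()

with open('infile.csv') as infile, open('outfile.csv', 'w') as outfile:
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        matched = False
        for field in row:
            low = field.lower()
            for kw in keywords:
                if kw in low:
                    counts[kw] += 1
                    matched = True
        if matched:
            writer.writerow(row)

print(counts)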

Nested with blocks in Python, level of nesting variable

I would like to combine columns of various csv files into one csv file, concatenated horizontally, with a new heading row. I want to select only certain columns, chosen by heading; the files to be combined each have different columns.
Example input:
freestream.csv:
static pressure,static temperature,relative Mach number
1.01e5,288,5.00e-02
fan.csv:
static pressure,static temperature,mass flow
0.9e5,301,72.9
exhaust.csv:
static pressure,static temperature,mass flow
1.7e5,432,73.1
Desired output:
combined.csv:
P_amb,M0,Ps_fan,W_fan,W_exh
1.01e5,5.00e-02,0.9e5,72.9,73.1
Possible call to the function:
reorder_multiple_CSVs(["freestream.csv","fan.csv","exhaust.csv"],
                      "combined.csv",
                      ["static pressure,relative Mach number",
                       "static pressure,mass flow",
                       "mass flow"],
                      "P_amb,M0,Ps_fan,W_fan,W_exh")
Here is a previous version of the code, with only one input file allowed. I wrote this with help from write CSV columns out in a different order in Python:
def reorder_CSV(infilename,outfilename,oldheadings,newheadings):
    with open(infilename) as infile:
        with open(outfilename,'w') as outfile:
            reader = csv.reader(infile)
            writer = csv.writer(outfile)
            readnames = reader.next()
            name2index = dict((name,index) for index, name in enumerate(readnames))
            writeindices = [name2index[name] for name in oldheadings.split(",")]
            reorderfunc = operator.itemgetter(*writeindices)
            writer.writerow(newheadings.split(","))
            for row in reader:
                towrite = reorderfunc(row)
                if isinstance(towrite,str):
                    writer.writerow([towrite])
                else:
                    writer.writerow(towrite)
So what I have figured out, in order to adapt this to multiple files, is:
- infilename, oldheadings, and newheadings now need to be lists (all of the same length);
- I need to iterate over the list of input files to make a list of readers;
- readnames can also be a list, built by iterating over the readers;
- which means I can make name2index a list of dictionaries.
One thing I don't know how to do, is use the keyword with, nested n-levels deep, when n is known only at run time. I read this: How can I open multiple files using "with open" in Python? but that seems to only work when you know how many files you need to open.
Or is there a better way to do this?
I am quite new to python so I appreciate any tips you can give me.
I am only replying to the part about opening multiple files with with, where the number of files is not known in advance. It shouldn't be too hard to write your own context manager, something like this (completely untested):
from contextlib import contextmanager

@contextmanager
def open_many_files(filenames):
    files = [open(filename) for filename in filenames]
    try:
        yield files
    finally:
        for f in files:
            f.close()
Which you would use like this:
innames = ['file1.csv', 'file2.csv', 'file3.csv']
outname = 'out.csv'
with open_many(innames) as infiles, open(outname, 'w') as outfile:
for infile in infiles:
do_stuff(in_file)
There is also a standard-library function (contextlib.nested) that does something similar, but it is deprecated.
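On Python 3.3+ there is a supported replacement for that pattern: contextlib.ExitStack manages a number of context managers that is only known at run time. A minimal sketch (file names hypothetical):
from contextlib import ExitStack

innames = ['file1.csv', 'file2.csv', 'file3.csv']   # hypothetical input names
outname = 'out.csv'

with ExitStack() as stack:
    infiles = [stack.enter_context(open(name)) for name in innames]
    outfile = stack.enter_context(open(outname, 'w'))
    for infile in infiles:
        for line in infile:
            outfile.write(line)   # stand-in for real per-file work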
I am not sure if this is the correct way to do this, but I wanted to expand on Bas Swinckels' answer. He had a couple of small inconsistencies in his very helpful answer and I wanted to give the correct code.
Here is what I did, and it worked.
from contextlib import contextmanager
import csv
import operator
import itertools as IT

@contextmanager
def open_many_files(filenames):
    files = [open(filename,'r') for filename in filenames]
    try:
        yield files
    finally:
        for f in files:
            f.close()

def reorder_multiple_CSV(infilenames,outfilename,oldheadings,newheadings):
    with open_many_files(filter(None,infilenames.split(','))) as handles:
        with open(outfilename,'w') as outfile:
            readers = [csv.reader(f) for f in handles]
            writer = csv.writer(outfile)
            reorderfunc = []
            for i, reader in enumerate(readers):
                readnames = reader.next()
                name2index = dict((name,index) for index, name in enumerate(readnames))
                writeindices = [name2index[name] for name in filter(None,oldheadings[i].split(","))]
                reorderfunc.append(operator.itemgetter(*writeindices))
            writer.writerow(filter(None,newheadings.split(",")))
            for rows in IT.izip_longest(*readers, fillvalue=['']*2):
                towrite = []
                for i, row in enumerate(rows):
                    towrite.extend(reorderfunc[i](row))
                if isinstance(towrite,str):
                    writer.writerow([towrite])
                else:
                    writer.writerow(towrite)
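A hedged usage note: unlike the call sketched in the question, this version takes infilenames and newheadings as single comma-separated strings (note the infilenames.split(',')) and oldheadings as a list of strings, so a call would look like:
reorder_multiple_CSV("freestream.csv,fan.csv,exhaust.csv",
                     "combined.csv",
                     ["static pressure,relative Mach number",
                      "static pressure,mass flow",
                      "mass flow"],
                     "P_amb,M0,Ps_fan,W_fan,W_exh")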
