I've got a string containing substrings I'd like to replace, e.g.
text = "Dear NAME, it was nice to meet you on DATE. Hope to talk with you and SPOUSE again soon!"
I've got a csv of the format (first row is a header)
NAME, DATE, SPOUSE
John, October 1, Jane
Jane, September 30, John
...
I'm trying to loop through each row in the CSV file, replacing substrings in text with the element from the column whose header matches the substring. I've got a list called matchedfields which contains all the fields found in both the CSV header row and text (in case there are some columns in the CSV I don't need to use). My next step is to iterate through each CSV row and replace the matched fields with the element from that CSV column. To accomplish this, I'm using
import csv
with open('recipients.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        for match in matchedfields:
            print inputtext.replace(match, row[match])
My problem is that this only replaces the first matched substring in text with the appropriate element from the csv. Is there a way to make multiple replacements simultaneously so I end up with
"Dear John, it was nice to meet you on October 1. Hope to talk with you and Jane again soon!"
"Dear Jane, it was nice to meet you on September 30. Hope to talk with you and John again soon!"
I think the real way to go here is to use string templates. They make your life easy.
Here is a general solution that works under Python 2 and 3:
import string
template_text = string.Template(("Dear ${NAME}, "
                                 "it was nice to meet you on ${DATE}. "
                                 "Hope to talk with you and ${SPOUSE} again soon!"))
And then
import csv
with open('recipients.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(template_text.safe_substitute(row))
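Now, I noticed that your CSV has stray whitespace: there is a space after each comma, so the header keys come out as " DATE" and " SPOUSE" and the values start with a space. You'll have to take care of that first (for example by passing skipinitialspace=True to the CSV reader) or adapt the template.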
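A minimal sketch of that whitespace fix, assuming the header and rows look exactly like the sample in the question (a space after every comma). skipinitialspace=True makes the reader drop the blank that follows each delimiter, so the DictReader keys become "DATE" and "SPOUSE" and the template placeholders match. The helper name fill_letters and the in-memory io.StringIO sample (standing in for recipients.csv) are illustrative, not part of the original answer.

```python
import csv
import io
import string

# Same template as in the answer above.
template_text = string.Template(
    "Dear ${NAME}, "
    "it was nice to meet you on ${DATE}. "
    "Hope to talk with you and ${SPOUSE} again soon!"
)

def fill_letters(csvfile):
    """Return one filled-in letter per data row of csvfile (hypothetical helper)."""
    # skipinitialspace=True strips the space after each comma, so the header
    # " DATE" becomes the key "DATE" and " October 1" becomes "October 1".
    reader = csv.DictReader(csvfile, skipinitialspace=True)
    return [template_text.safe_substitute(row) for row in reader]

sample = "NAME, DATE, SPOUSE\nJohn, October 1, Jane\nJane, September 30, John\n"
for letter in fill_letters(io.StringIO(sample)):
    print(letter)
```

With a real file you would pass the handle from open('recipients.csv', newline='') instead of the StringIO object.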
The problem is that inputtext.replace(match, row[match]) doesn't change the inputtext variable. Strings are immutable, so it returns a new string that you aren't storing. Try this:
import csv
with open('recipients.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        inputtext_copy = inputtext  ## start each row from the template; strings are immutable, so no explicit copy is needed
        for match in matchedfields:
            inputtext_copy = inputtext_copy.replace(match, row[match])  ## keep the result so the next replacement builds on it
        print inputtext  ## should be the original w/ generic field names
        print inputtext_copy  ## should be the new version with all fields changed to specific instantiations
You should reassign the replaced string to the original name so the previous replacements are not thrown away:
with open('recipients.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        inputtext = text
        for match in matchedfields:
            inputtext = inputtext.replace(match, row[match])
        print inputtext
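If you literally want all replacements to happen "simultaneously", one option (a swapped-in technique, not what the answers here use) is a single re.sub pass over an alternation of the field names. Each match is looked up in the row and replaced exactly once, so a value that happens to contain another field name (say a spouse literally named "DATE") is not rewritten again. This is a sketch assuming text, row and matchedfields as in the question; fill_row is a hypothetical helper.

```python
import re

def fill_row(text, row, matchedfields):
    """Replace every field name in text with row[field] in one pass (hypothetical helper)."""
    if not matchedfields:
        return text  # an empty alternation would match the empty string everywhere
    # Longest names first, so a longer field name wins over a shorter prefix of it.
    names = sorted(matchedfields, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    # The callback receives each match; the substituted text is never rescanned.
    return pattern.sub(lambda m: row[m.group(0)], text)

text = "Dear NAME, it was nice to meet you on DATE. Hope to talk with you and SPOUSE again soon!"
row = {"NAME": "John", "DATE": "October 1", "SPOUSE": "Jane"}
print(fill_row(text, row, ["NAME", "DATE", "SPOUSE"]))
```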
On another note, you could build each output string in one step with string formatting, after a small change to the template string:
text = "Dear {0[NAME]}, it was nice to meet you on {0[DATE]}. Hope to talk with you and {0[SPOUSE]} again soon!"
with open('recipients.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        inputtext = text.format(row)
        print inputtext
That formats the string with the dictionary in one step, without having to make replacements iteratively.
I have been doing these tasks:
Write a script that reads in the data from the CSV file pastimes.csv located in the chapter 9 practice files folder, skipping over the header row
Display each row of data (except for the header row) as a list of strings
Add code to your script to determine whether or not the second entry in each row (the "Favorite Pastime") converted to lower-case includes the word "fighting" using the string methods find() and lower()
I have completed two of them, but I really don't understand the third one; my English is not very good and I can't work out what they want.
import csv
with open("pastimes.csv", "r") as my_file:
    my_file_reader = csv.reader(my_file)
    next(my_file_reader)
    for row in my_file_reader:
        print(row)
Output:
['Fezzik', 'Fighting']
['Westley', 'Winning']
['Inigo Montoya', 'Sword fighting']
['Buttercup', 'Complaining']
The header row, which I skipped, is: Person, Favorite pastime
You need something like:
import csv
with open("pastimes.csv", "r") as my_file:
    my_file_reader = csv.reader(my_file)
    next(my_file_reader)
    for row in my_file_reader:
        print(row)
        if row[1].lower().find('fighting') >= 0:
            print('Second entry lowered contains "fighting"')
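To see what the third task asks in isolation: for each row, take the second entry (row[1]), lower-case it with lower(), and call find('fighting'). find() returns the index where the word starts, or -1 when it is absent, which is why the test is "not -1" (equivalently ">= 0"). This sketch uses the four sample rows from the question in a list rather than reading pastimes.csv; people_who_like_fighting is a hypothetical name.

```python
def people_who_like_fighting(rows):
    """Return the names whose 'Favorite Pastime' contains 'fighting' in any case."""
    matches = []
    for row in rows:
        # lower() makes "Fighting" and "Sword fighting" comparable;
        # find() gives -1 when "fighting" does not occur.
        if row[1].lower().find("fighting") != -1:
            matches.append(row[0])
    return matches

rows = [
    ["Fezzik", "Fighting"],
    ["Westley", "Winning"],
    ["Inigo Montoya", "Sword fighting"],
    ["Buttercup", "Complaining"],
]
print(people_who_like_fighting(rows))
```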
I have a program which outputs data into a CSV file. These files use two special characters: the comma as the delimiter, and double quotes ("") around text. The text itself also contains commas.
How can I work with these 2 delimiters?
My current code gives me list index out of range. If the CSV file is needed I can provide it.
Current code:
def readcsv():
    with open('pythontest.csv') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(1024), delimiters=',"')
        csvfile.seek(0)
        reader = csv.reader(csvfile, dialect)
        for row in reader:
            asset_ip_addresses.append(row[0])
            service_protocollen.append(row[1])
            service_porten.append(row[2])
            vurn_cvssen.append(row[3])
            vurn_risk_scores.append(row[4])
            vurn_descriptions.append(row[5])
            vurn_cve_urls.append(row[6])
            vurn_solutions.append(row[7])
The CSV file I'm working with: http://www.pastebin.com/bUbDC419
It seems to have problems handling the second line. If I append the rows to a list, the first row seems to be OK, but the second row is taken as a whole and is no longer separated at the commas.
I guess it has something to do with the "enters" (line breaks) in the text.
I don't think you should need to define a custom dialect, unless I'm missing something.
The official documentation shows that you can pass quotechar as a keyword argument to the csv.reader() function. The example from the documentation, modified for your code:
import csv
with open('pythontest.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in reader:
        print(row)  # do something with the row
row is a list of strings, one per field in the row, with the surrounding " quotes removed.
The IndexError suggests that one of the row[x] lookups fails, i.e. at least one row has fewer fields than you expect (a blank line, for example, is read as an empty list).
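A minimal sketch, assuming the file is ordinary comma-separated data with double-quoted text fields (the pastebin file is not reproduced here, so the sample rows below are made up). It shows that the default csv.reader already keeps "a, b" together as one field, that a quoted field may span two lines, and it skips rows with fewer than the eight columns the question indexes, which is one way the IndexError arises. EXPECTED_COLUMNS and read_rows are hypothetical names. For a real file in Python 3, open it with newline='' as the csv documentation recommends, so line breaks inside quoted fields are handled correctly.

```python
import csv
import io

EXPECTED_COLUMNS = 8  # the question reads row[0] .. row[7]

def read_rows(csvfile):
    """Yield rows that have enough columns; report and skip the rest (hypothetical helper)."""
    # The defaults (delimiter=',', quotechar='"') already treat "a, b" as one field.
    reader = csv.reader(csvfile)
    for row in reader:
        if len(row) < EXPECTED_COLUMNS:
            # Blank or truncated lines would otherwise raise IndexError on row[7].
            print("skipping line", reader.line_num, "with", len(row), "fields")
            continue
        yield row

sample = (
    '10.0.0.1,tcp,80,5.0,7,"Old server, please patch","http://example.com/a","Upgrade, then reboot"\n'
    "\n"
    '10.0.0.2,udp,53,4.3,6,"Spans\ntwo lines",http://example.com/b,Patch\n'
)
rows = list(read_rows(io.StringIO(sample)))
for row in rows:
    print(row)
```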
OK, I think I understand what kind of file you are reading. Let's say the content of your CSV file looks like this:
192.168.12.255,"Great site, a lot of good, recommended",0,"Last, first, middle"
192.168.0.255,"About cats, dogs, must visit!",1,"One, two, three"
Here is code that reads it line by line: text in quotes is returned as a single list element and is not split at its inner commas. The default quotechar='"' is what makes this work; the quoting=csv.QUOTE_ALL used below does no harm when reading, but it is not strictly required.
import csv
with open('students.csv', newline='') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_ALL)
    for row in reader:
        print(row[0])
        print(row[1])
        print(row[2])
        print(row[3])
The printed output will look like this
192.168.12.255
Great site, a lot of good, recommended
0
Last, first, middle
192.168.0.255
About cats, dogs, must visit!
1
One, two, three
PS: This solution is based on the current official documentation: https://docs.python.org/3/library/csv.html
How about a quick fix like this? It splits every field on commas and flattens the result, so a row like a,"b,c",d becomes the strings a, b, c, d.
import csv
def readcsv():
    with open('pythontest.csv') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(1024), delimiters=',"')
        csvfile.seek(0)
        reader = csv.reader(csvfile, dialect)
        for rowx in reader:
            # split each field on commas and flatten into one list of strings
            row = [part for e in rowx for part in e.split(',')]
            # do your stuff on row
I'm looking for a way using python to copy the first column from a csv into an empty file. I'm trying to learn python so any help would be great!
So if this is test.csv
A 32
D 21
C 2
B 20
I want this output
A
D
C
B
I've tried the following commands in python but the output file is empty
f = open("test.csv", 'r')
import csv
reader = csv.reader(f, delimiter="\t")
names = ""
for each_line in reader:
    names = each_line[0]
First, you want to open your files. A good practice is to use the with statement (which, technically speaking, introduces a context manager), so that when your code exits the with block, all the files are closed automatically:
with open('test.csv') as inpfile, open('out.csv', 'w') as outfile:
Next you want a loop over the lines of the input file (note the indentation: we are inside the with block). Line splitting is automatic when you read a text file whose lines are separated by newlines:
    for line in inpfile:
Each line is a string, but you think of it as two fields separated by white space. This situation is so common that strings have a method for it (note again the increasing indent: we are in the for loop block):
        fields = line.split()
By default .split() splits on white space, but you can use, e.g., split(',') to split on commas. fields is now a list of strings; for your first record it is equal to ['A', '32'], and you want to output just the first field of this list. For this purpose a file object has the .write() method, which writes a string (only a string) to the file. fields[0] IS a string, but we have to add a newline character to it, because in this respect .write() is different from print():
        outfile.write(fields[0] + '\n')
That's all; without my comments it's 4 lines of code:
with open('test.csv') as inpfile, open('out.csv', 'w') as outfile:
    for line in inpfile:
        fields = line.split()
        outfile.write(fields[0] + '\n')
When you are done with learning (some) Python, ask for an explanation of this...
with open('test.csv') as ifl, open('out.csv', 'w') as ofl:
    ofl.write('\n'.join(line.split()[0] for line in ifl))
Addendum
The csv module in such a simple case adds the additional conveniences of
auto-splitting each line into a list of strings
taking care of the details of output (newlines, etc)
and when learning Python it's more fruitful to see how these steps can be done using the bare language, or at least that is my opinion.
The situation is different when your data file is complex: it has headers, quoted strings possibly containing delimiters, and so on. In those cases using csv is recommended, as it takes care of all the gory details. For complex data analysis requirements you will need other packages not included in the standard library, e.g., numpy and pandas, but that is another story.
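For comparison with the bare-language version, here is a sketch of the same job using the csv module, assuming test.csv is tab-separated as in the question's own reader call (delimiter="\t"). copy_first_column and its parameters are illustrative names. Both files are opened with newline='' as the csv documentation recommends, and blank lines are skipped because csv.reader yields an empty list for them.

```python
import csv

def copy_first_column(src, dst, delimiter="\t"):
    """Write the first field of every non-empty row of src to dst (hypothetical helper)."""
    with open(src, newline="") as inpfile, open(dst, "w", newline="") as outfile:
        reader = csv.reader(inpfile, delimiter=delimiter)
        writer = csv.writer(outfile)
        for row in reader:
            if row:  # csv.reader yields [] for a blank line
                # writerow expects a sequence of fields, hence the one-item list.
                writer.writerow([row[0]])
```

Usage would be copy_first_column("test.csv", "out.csv"); for the sample input, out.csv then holds A, D, C and B on separate lines.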
This answer reads the CSV file, treating columns as delimited by a space character. You have to add header=None; otherwise the first row will be taken as the header (the column names).
ss is a slice - the 0th column, taking all rows as denoted by :
The last line writes the slice to a new filename.
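import pandas as pd
df = pd.read_csv('test.csv', sep=' ', header=None)
ss = df.iloc[:, 0]  # .ix was removed in pandas 1.0; .iloc selects by position
ss.to_csv('new_path.csv', sep=' ', index=False, header=False)  # header=False: don't write the column label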
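import csv
with open("test.csv", newline="") as inpfile, open("output.csv", "w", newline="") as outfile:
    reader = csv.reader(inpfile, delimiter='\t')
    writer = csv.writer(outfile)
    for e in reader:
        writer.writerow([e[0]])  # wrap in a list: writerow("AB") would write A,B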
The best you can do is create an empty list, append the column to it, and then write that new list into another CSV, for example:
import csv
def writetocsv(l):
    # convert the iterable to a list
    b = list(l)
    print(b)
    with open("newfile.csv", 'w', newline='') as f:
        w = csv.writer(f, delimiter=',')
        for value in b:
            w.writerow([value])
adcb_list = []
with open("test.csv", 'r', newline='') as f:
    reader = csv.reader(f, delimiter="\t")
    for each_line in reader:
        adcb_list.append(each_line[0])  # keep only the first column
writetocsv(adcb_list)
hope this works for you :-)
I am trying to remove a row from a csv file if the 2nd column matches a string. My csv file has the following information:
Name
15 Dog
I want the row with "Name" in it removed. The code I am using is:
import csv
reader = csv.reader(open("info.csv", "rb"), delimiter=',')
f = csv.writer(open("final.csv", "wb"))
for line in reader:
    if "Name" not in line:
        f.writerow(line)
        print line
But the "Name" row isn't removed. What am I doing wrong?
EDIT: I was using the wrong delimiter. Changing it to \t worked. Below is the code that works now.
import csv
reader = csv.reader(open("info.csv", "rb"), delimiter='\t')
f = csv.writer(open("final.csv", "wb"))
for line in reader:
    if "Name" not in line:
        f.writerow(line)
        print line
It seems that you are specifying the wrong delimiter (a comma) in csv.reader.
Each line yielded by reader is a list, split on your delimiter, which, by the way, you specified as ','. Are you sure that is the delimiter you want? Your sample is delimited by tabs.
Anyway, you want to check if 'Name' is in any element of a given line. So this will still work, regardless of whether your delimiter is correct:
for line in reader:
    if not any('Name' in x for x in line):
        f.writerow(line)  # write only the rows that don't mention 'Name'
Notice the difference. This version checks for 'Name' in each list element, yours checks if 'Name' is in the list. They are semantically different because 'Name' in ['blah blah Name'] is False.
I would recommend first fixing the delimiter error. If you still have issues, use if any(...) as it is possible that the exact token 'Name' is not in your list, but something that contains 'Name' is.
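Putting the delimiter fix and the any() check together, a sketch that keeps every row except those with "Name" inside some field. It assumes tab-separated input as in the asker's edit; the sample text (an empty first column, then "Name") is a guess at the layout the question describes, and drop_rows_containing is a hypothetical helper. It uses Python 3 text mode in place of the 'rb'/'wb' modes above; for real files you would pass open('info.csv', newline='') and write to open('final.csv', 'w', newline='').

```python
import csv
import io

def drop_rows_containing(lines, word, delimiter="\t"):
    """Return the rows in which no field contains word (hypothetical helper)."""
    reader = csv.reader(lines, delimiter=delimiter)
    # any(...) looks inside each field; `word in row` would only match a field
    # that is exactly equal to word.
    return [row for row in reader if not any(word in field for field in row)]

sample = "\tName\n15\tDog\n"
kept = drop_rows_containing(io.StringIO(sample), "Name")

# Write the surviving rows back out as tab-separated text.
out = io.StringIO()
csv.writer(out, delimiter="\t").writerows(kept)
print(out.getvalue())
```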
I've got a function that adds new data to a csv file. I've got it somewhat working. However, I'm having a few problems.
When I add the new values (name, phone, address, birthday), it adds them all in one column, rather than in separate columns of the same row. (I don't really know how to split them into separate columns.)
I can only add numbers rather than string values. If I write add_friend(blah, 31, 12, 45), it comes back saying blah is not defined. However, if I write add_friend(3,4,5,6), it adds that to the new row, but into a single column.
One objective of the function: if you try to add a friend who's already in the CSV (say, Bob) and his address, phone and birthday are already in the CSV, then add_friend(Bob, address, phone, birthday) should return False and not add the row. However, I have no clue how to do this. Any ideas?
Here is my code:
def add_friend (name, phone, address, birthday):
    with open('friends.csv', 'ab') as f:
        newrow = [name, phone, address, birthday]
        friendwriter = csv.writer(open('friends.csv', 'ab'), delimiter=' ',
                                  quotechar='|', quoting=csv.QUOTE_MINIMAL)
        friendwriter.writerow(newrow)
        #friendreader = csv.reader(open('friends.csv', 'rb'), delimiter=' ', quotechar='|')
        #for row in friendreader:
        #    print ' '.join(row)
        print newrow
Based on your requirements, and what you appear to be trying to do, I've written the following. It should be verbose enough to be understandable.
You need to be consistent with your delimiters and other properties when reading the CSV files.
Also, try to move "friends.csv" into a global, or at least into a named constant rather than hard-coding it.
import csv
def print_friends():
    reader = csv.reader(open("friends.csv", "rb"), delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for row in reader:
        print row
def friend_exists(friend):
    reader = csv.reader(open("friends.csv", "rb"), delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for row in reader:
        if (row == friend):
            return True
    return False
def add_friend(name, phone, address, birthday):
    friend = [name, phone, address, birthday]
    if friend_exists(friend):
        return False
    writer = csv.writer(open("friends.csv", "ab"), delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(friend)
    return True
print "print_friends: "
print_friends()
print "get_friend: "
test_friend = ["barney", "4321 9876", "New York", "2000"]
print friend_exists(test_friend)
print "add_friend: "
print add_friend("barney", "4321 9876", "New York", "2000")
Re #1: It doesn't do that; csv.writer writes each value into its own column. What makes you think it doesn't? Possibly the quoting scheme you really want isn't the one you specified to csv.writer: there, spaces delimit columns and | is the quoting character, so a program expecting commas will show everything in one column.
blah is not a string literal, "blah" is. blah without quotes is a variable reference, and the variable didn't exist here.
In order to check whether a name is already in the CSV file, you have to read the whole CSV file first, checking for the details. Open the file twice: first for reading ('r'), using csv.reader to turn it into a row iterator and find all the names, and then for appending. You can add those names to a set and check against that set from then on.
Re #3:
To get this set, you could define a function like so:
def get_people():
    with open(..., 'r') as f:
        return set(map(tuple, csv.reader(f)))
And then if you assigned the set somewhere, e.g. existing_people = get_people()
you could then check against it when adding new people, as follows:
newrow = (name, phone, address, birthday)
if newrow in existing_people:
    return False
else:
    existing_people.add(newrow)
    friendwriter.writerow(newrow)
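Pulling those fragments into one self-contained sketch: load the existing rows into a set once, then append only rows that are not already there. It assumes a plain comma-separated friends.csv (the default csv dialect rather than the space/| scheme from the question), so each value lands in its own spreadsheet column. FRIENDS_FILE, load_friends and the extra existing and path parameters of add_friend are illustrative choices, not the asker's API. Values are passed as strings, which is also the fix for the "blah is not defined" error.

```python
import csv
import os

FRIENDS_FILE = "friends.csv"  # hypothetical path, kept in one named constant

def load_friends(path=FRIENDS_FILE):
    """Return the existing rows as a set of tuples; empty if the file is missing."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="") as f:
        return set(map(tuple, csv.reader(f)))

def add_friend(existing, name, phone, address, birthday, path=FRIENDS_FILE):
    """Append a row unless an identical one exists; return True if it was added."""
    newrow = (name, phone, address, birthday)
    if newrow in existing:
        return False
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(newrow)
    existing.add(newrow)  # keep the in-memory set in sync with the file
    return True
```

Typical use: existing = load_friends(), then add_friend(existing, "barney", "4321 9876", "New York", "2000") returns True the first time and False on a repeat, without rereading the file.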
You aren't stating how experienced with Python you already are, so I am aiming this a little low; no offence intended.
There are several "requirements" for your homework. In general, you should ensure that one function does one thing. So, to meet all your requirements, you’ll need several functions; look at creating at least one module (i.e., a file with functions in it).
A space delimiter and a | for quotes is pretty unusual. In your current file, what is the delimiter between columns? And what is used to quote/escape text? (By "escaping text", I mean: if I have a CSV file that uses commas as the column delimiter, and I want to put a sentence with commas into just one column, I need to tell the difference between a comma that means "new column" and a comma that is part of a sentence in a column. Microsoft decided that Excel would support double quotes, so "hello, sailor" became a de facto standard.)
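To see that double-quote convention in action: with Python's default "excel" dialect, csv.writer quotes only the fields that need it (those containing the delimiter, the quote character or a line break), doubles any embedded quote characters, and csv.reader undoes all of this on the way back in. The field values here are made up for illustration.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)  # default dialect: delimiter=',', quotechar='"'
writer.writerow(["bob brown", "hello, sailor", 'say "hi"'])
text = buffer.getvalue()
print(text)

# Reading the line back recovers the original three fields.
row = next(csv.reader(io.StringIO(text)))
print(row)
```

Only "hello, sailor" and 'say "hi"' end up quoted; "bob brown" contains a space but no comma or quote, so it is written as-is.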
If you want to know whether "bob brown" is already in the file, you will need to read the whole file before trying to insert. You can do this by opening it with 'r' to read, then with 'a' to append. But should you read the whole file every time you want to insert one record? If you have a hundred records to add, should you read the whole file each time? Is there a way to store the names during the adding process?
blah is not a string. It needs to be quoted to be a string literal ("blah"). blah just refers to a variable whose name is blah. If it says blah is not defined, that’s because you have not declared the variable blah to hold anything.