I am trying to remove a row from a csv file if the 2nd column matches a string. My csv file has the following information:
Name
15 Dog
I want the row with "Name" in it removed. The code I am using is:
import csv

reader = csv.reader(open("info.csv", "rb"), delimiter=',')
f = csv.writer(open("final.csv", "wb"))
for line in reader:
    if "Name" not in line:
        f.writerow(line)
        print line
But the "Name" row isn't removed. What am I doing wrong?
EDIT: I was using the wrong delimiter. Changing it to \t worked. Below is the code that works now.
import csv

reader = csv.reader(open("info.csv", "rb"), delimiter='\t')
f = csv.writer(open("final.csv", "wb"))
for line in reader:
    if "Name" not in line:
        f.writerow(line)
        print line
It seems you are specifying the wrong delimiter (a comma) in csv.reader.
Each line yielded by reader is a list of fields, split on your delimiter. Which, by the way, you specified as ','. Are you sure that is the delimiter you want? Your sample is delimited by tabs.
Anyway, you want to check if 'Name' is in any element of a given line. So this will still work, regardless of whether your delimiter is correct:
for line in reader:
    if any('Name' in x for x in line):
        # write operation
Notice the difference. This version checks for 'Name' in each list element, yours checks if 'Name' is in the list. They are semantically different because 'Name' in ['blah blah Name'] is False.
I would recommend first fixing the delimiter error. If you still have issues, use if any(...) as it is possible that the exact token 'Name' is not in your list, but something that contains 'Name' is.
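A minimal sketch of the difference (the sample list is made up):

```python
line = ['15', 'Dog Name']                # one parsed CSV row (invented sample)

print('Name' in line)                    # False: no element equals 'Name'
print(any('Name' in x for x in line))    # True: an element contains 'Name'
```

Membership (`in`) on a list compares whole elements, while the `any(...)` version checks each element as a substring.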
Related
I have a program which outputs its data into a CSV file. The file contains two delimiting characters: a comma between fields, and double quotes ("") around text. The text also contains commas.
How can I work with these 2 delimiters?
My current code gives me list index out of range. If the CSV file is needed I can provide it.
Current code:
import csv

def readcsv():
    with open('pythontest.csv') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(1024), delimiters=',"')
        csvfile.seek(0)
        reader = csv.reader(csvfile, dialect)
        for row in reader:
            asset_ip_addresses.append(row[0])
            service_protocollen.append(row[1])
            service_porten.append(row[2])
            vurn_cvssen.append(row[3])
            vurn_risk_scores.append(row[4])
            vurn_descriptions.append(row[5])
            vurn_cve_urls.append(row[6])
            vurn_solutions.append(row[7])
The CSV file I'm working with: http://www.pastebin.com/bUbDC419
It seems to have problems handling the second line. If I append the rows to a list, the first row comes out fine, but the second row is taken as one whole string and the commas are no longer separated.
I guess it has something to do with the line breaks.
I don't think you should need to define a custom dialect, unless I'm missing something.
The official documentation shows you can provide quotechar as a keyword to the reader() method. The example from the documentation modified for your code:
import csv

with open('pythontest.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in reader:
        # do something with the row
row is a list of strings, one for each field in the row, with the " quotes removed.
The index-out-of-range error suggests that one of the row[x] indices does not exist for some row.
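A small self-contained check of that behavior, using io.StringIO in place of the file (the sample data is invented):

```python
import csv
import io

# Invented sample: the middle field is quoted and contains commas
data = '10.0.0.1,"Open ports, filtered",443\n'
reader = csv.reader(io.StringIO(data), delimiter=',', quotechar='"')
rows = list(reader)
print(rows[0])  # ['10.0.0.1', 'Open ports, filtered', '443']
```

The quoted field survives as a single list element, with the quotes stripped.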
OK, I think I understand what kind of file you are reading... let's say the content of your CSV file looks like this
192.168.12.255,"Great site, a lot of good, recommended",0,"Last, first, middle"
192.168.0.255,"About cats, dogs, must visit!",1,"One, two, three"
Here is code that will read it line by line; text in quotes is kept as a single list element, not split on its embedded commas. The parameter you need is quoting=csv.QUOTE_ALL:
import csv

with open('students.csv', newline='') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_ALL)
    for row in reader:
        print(row[0])
        print(row[1])
        print(row[2])
        print(row[3])
The printed output will look like this
192.168.12.255
Great site, a lot of good, recommended
0
Last, first, middle
192.168.0.255
About cats, dogs, must visit!
1
One, two, three
P.S. This solution is based on the latest official documentation, see https://docs.python.org/3/library/csv.html
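The same parse can be reproduced in memory with io.StringIO standing in for the file (worth noting: the reader's default quoting already honors quoted fields on input, so QUOTE_ALL mainly matters when writing):

```python
import csv
import io

# The two sample lines from above, as an in-memory file
data = ('192.168.12.255,"Great site, a lot of good, recommended",0,"Last, first, middle"\n'
        '192.168.0.255,"About cats, dogs, must visit!",1,"One, two, three"\n')
reader = csv.reader(io.StringIO(data), delimiter=',', quoting=csv.QUOTE_ALL)
rows = list(reader)
print(rows[0][1])  # Great site, a lot of good, recommended
print(rows[1][3])  # One, two, three
```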
How about a quick solution like this?
A quick fix that would split a row in the CSV like a,"b,c",d into the strings a, b, c, d:
import csv

def readcsv():
    with open('pythontest.csv') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(1024), delimiters=',"')
        csvfile.seek(0)
        reader = csv.reader(csvfile, dialect)
        for rowx in reader:
            row = [e.split(',') if isinstance(e, str) else e for e in rowx]
            # do your stuff on row
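Note that the split above leaves you with nested lists. If the goal really is a flat list of the strings a, b, c, d, the result can be flattened (a sketch; the sample row stands in for what csv.reader yields):

```python
from itertools import chain

rowx = ['a', 'b,c', 'd']             # what csv.reader yields for a,"b,c",d
row = [e.split(',') for e in rowx]   # nested: [['a'], ['b', 'c'], ['d']]
flat = list(chain.from_iterable(row))
print(flat)  # ['a', 'b', 'c', 'd']
```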
I have a .csv formatted .txt file. I am deliberating over the best manner in which to .capitalize() the text in the first column.
.capitalize() is a string method, so I considered the following: I would need to open the file, convert the data to a list of strings, capitalize the required word, and finally write the data back to the file.
To achieve this, I did the following:
import csv

newGuestList = []
with open("guestList.txt", "r+") as guestFile:
    guestList = csv.reader(guestFile)
    for guest in guestList:
        for guestInfo in guest:
            capitalisedName = guestInfo.capitalize()
            newGuestList.append(capitalisedName)
Which gives the output:
['Peter', '35', ' spain', 'Caroline', '37', 'france', 'Claire', '32', ' sweden']
The problem:
Firstly, in order to write this new list back to the file, I will need to convert it to a string. I can achieve this using the .join() method. However, how can I introduce a newline, \n, after every third word (the country) so that each guest has their own line in the text file?
Secondly, this method of nested for loops seems highly convoluted; is there a cleaner way?
My .txt file:
peter, 35, spain\n
caroline, 37, france\n
claire, 32, sweden\n
You don't need to split the lines, since the first character of the first word is the first character of the line:
with open("lst.txt", "r") as guestFile:
    lines = guestFile.readlines()
newlines = [line.capitalize() for line in lines]
with open("lst.txt", "w") as guestFile:
    guestFile.writelines(newlines)
You can just use a CSV reader and writer and access the element you want to capitalize from the list.
import csv
import os

inp = open('a.txt', 'r')
out = open('b.txt', 'w')
reader = csv.reader(inp)
writer = csv.writer(out)

for row in reader:
    row[0] = row[0].capitalize()
    writer.writerow(row)

inp.close()
out.close()
os.rename('b.txt', 'a.txt')  # if you want to keep the same name
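The same approach can be tried in memory, with io.StringIO standing in for the two files (a Python 3 sketch; sample data simplified from the question):

```python
import csv
import io

src = io.StringIO('peter,35,spain\ncaroline,37,france\n')  # stand-in for a.txt
dst = io.StringIO()                                        # stand-in for b.txt

reader = csv.reader(src)
writer = csv.writer(dst, lineterminator='\n')
for row in reader:
    row[0] = row[0].capitalize()   # capitalize only the first column
    writer.writerow(row)

print(dst.getvalue())
# Peter,35,spain
# Caroline,37,france
```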
I have a tab delimited file with lines of data as such:
8600tab8661tab000000000003148415tab10037-434tabXEOL
8600tab8662tab000000000003076447tab6134505tabEOL
8600tab8661tab000000000003426726tab470005-063tabXEOL
There should be 5 fields with the possibility of the last field having a value 'X' or being empty as shown above.
I am trying to parse this file in Python (2.7) using the csv reader module as such:
import csv

file = open(fname)
reader = csv.reader(file, delimiter='\t', quoting=csv.QUOTE_NONE)
for row in reader:
    for i in range(5):  # there are 5 fields
        print row[i]    # this fails with an index-out-of-bounds error
                        # if there is no 'X' in the last column
If the last column is empty the row structure will end up looking like:
list: ['8600', '8662', '000000000003076447', '6134505']
So when row[4] is called, the error follows.
I was hoping for something like this:
list: ['8600', '8662', '000000000003076447', '6134505', '']
This problem only seems to occur when the very last column is empty. I have been looking through the reader arguments and dialect options to see if there is a simple option to pass to csv.reader that fixes the way it handles an empty field at the end of the line. So far, no luck.
Any help will be much appreciated!
The easiest option would be to check the length of the row beforehand. If the length is 4, append an empty string to your list.
for row in reader:
    if len(row) == 4:
        row.append('')
    for i in range(5):
        print row[i]
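A slightly more general variant pads any short row up to the expected width (a sketch; pad_row is a made-up helper name):

```python
EXPECTED_FIELDS = 5

def pad_row(row, width=EXPECTED_FIELDS):
    """Append empty strings so the row always has `width` fields."""
    return row + [''] * (width - len(row))

short = ['8600', '8662', '000000000003076447', '6134505']
print(pad_row(short))
# ['8600', '8662', '000000000003076447', '6134505', '']
```

This also copes with rows missing more than one trailing field, which the len(row) == 4 check does not.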
There was a minor PEBCAK on my part. I was going back and forth between editing the file in Notepad++ and Gvim. At some point I lost my last tab on the end. I fixed the file and it parsed as expected.
I am saving a list to a CSV file using the writerow function from the csv module. Something went wrong when I opened the final file in MS Office Excel.
Before I encountered this issue, the main problem I was trying to deal with was getting the list saved with one record per row; it was saving every line into a single cell in row 1. I made some small changes, and now this happened. As a novice Python guy, I am certainly very confused.
import csv

inputfile = open('small.csv', 'r')
header_list = []
header = inputfile.readline()
header_list.append(header)

input_lines = []
for line in inputfile:
    input_lines.append(line)
inputfile.close()

AA_list = []
for i in range(0, len(input_lines)):
    if (input_lines[i].split(',')[4]) == 'AA':  # column 4 has different names, including 'AA'
        AA_list.append(input_lines[i])

full_list = header_list + AA_list

resultFile = open("AA2013.csv", 'w+')
wr = csv.writer(resultFile, delimiter=',')
wr.writerow(full_list)
Thanks!
UPDATE:
The full_list looks like this: ['1,2,3,"MEM",...]
UPDATE2(APR.22nd):
Now I got three cells of data (the header in A1 and the rest in A2 and A3 respectively). Apparently, the newline characters are not producing new rows for the three items in one big list. I think the more specific question now is: how do I save a list of records, each ending with '\n', to a CSV file?
Importing the csv module is not enough, you need to use it as well. Right now, you're appending each line as an entire string to your list instead of a list of fields.
Start with
with open('small.csv', 'rb') as inputfile:
    reader = csv.reader(inputfile, delimiter=",")
    header_list = next(reader)
    input_lines = list(reader)
Now header_list contains all the headers, and input_lines contains a nested list of all the rows, each one split into columns.
I think the rest should be pretty straightforward.
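For what it's worth, the rest might look something like this (a Python 3 sketch with invented in-memory data; column index 4 holds the 'AA' marker, as in the question):

```python
import csv
import io

# In-memory stand-in for small.csv; column index 4 holds names including 'AA'
data = ('h1,h2,h3,h4,h5\n'
        '1,2,3,4,AA\n'
        '5,6,7,8,BB\n')
reader = csv.reader(io.StringIO(data))
header_list = next(reader)
aa_rows = [row for row in reader if row[4] == 'AA']

out = io.StringIO()                       # stand-in for AA2013.csv
writer = csv.writer(out, lineterminator='\n')
writer.writerow(header_list)              # one call per row
writer.writerows(aa_rows)
print(out.getvalue())
# h1,h2,h3,h4,h5
# 1,2,3,4,AA
```

The key point is writerow per record (or writerows for a list of records), not one writerow for the whole list.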
append() adds its argument as a single element at the end of a list. So when you write header_list.append(header), the entire header line is appended as one string. You should write
headers = header.split(',')
header_list.append(headers)
This splits the header row on commas, so headers is the list of header words, which is then appended to header_list.
The same goes for AA_list.append(input_lines[i]).
I figured it out.
The difference between [val], val, and val.split(",") inside the writerow call was:
[val]: a single string containing everything, so it takes only the first column in Excel (header and "2013, 1, 2, ..." in A1, B1, C1 and so on).
val: each character (letter, comma, or space; I forgot the technical term) takes its own cell in Excel.
val.split(","): splits the string on commas and puts each comma-separated piece into its own Excel cell.
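The first and third cases can be reproduced in memory (a Python 3 sketch; the rendered helper and io.StringIO output buffer are illustrative):

```python
import csv
import io

val = '1,2,3,"MEM"'

def rendered(arg):
    """Return the single line csv.writer produces for writerow(arg)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator='\n').writerow(arg)
    return buf.getvalue().rstrip('\n')

print(rendered([val]))           # one quoted field:  "1,2,3,""MEM"""
print(rendered(val.split(',')))  # four fields:       1,2,3,"""MEM"""
```

With [val] the whole string is one field, so the writer quotes it; with val.split(',') each piece becomes its own field.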
Here is what I found out: 1. the right way to export the flat list line by line is to use the with syntax; 2. split each list item when writing the row:
csvwriter.writerow(JD.split())

full_list = header_list + AA_list

with open("AA2013.csv", 'w+') as resultFile:
    wr = csv.writer(resultFile, delimiter=",", lineterminator='\n')
    for val in full_list:
        wr.writerow(val.split(','))
This gives the wanted output. Please correct any terms or syntax I have used mistakenly! Thanks.
Hello, I have this text:
1,0.00,,2.00,10,"Block. CertNot Valid.
Query with me",2013-06-20,0,0.00
These are two lines in the CSV file, but they are really one line of data. I want to remove the line break and put this record on just one line using regular expressions.
I've tried: (\")(.*)(\n)(.*)(\") , but it doesn't work.
Don't. There is no need to remove the line break.
Use the csv module to read the CSV file, it'll handle the linebreak correctly:
import csv

with open(csvfilename, 'rb') as infile:
    reader = csv.reader(infile)
    for row in reader:
        print repr(row[5])
will print:
'Block. CertNot Valid.\nQuery with me'
for that row.
This works because that column is correctly quoted.
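That is easy to verify in memory (Python 3 here; io.StringIO stands in for the file):

```python
import csv
import io

# The two physical lines from the question form one logical CSV record
data = '1,0.00,,2.00,10,"Block. CertNot Valid.\nQuery with me",2013-06-20,0,0.00\n'
rows = list(csv.reader(io.StringIO(data)))
print(len(rows))         # 1
print(repr(rows[0][5]))  # 'Block. CertNot Valid.\nQuery with me'
```

The reader sees one record of nine fields; the newline survives inside field 5 rather than splitting the record.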
You can check the result here: https://www.debuggex.com/r/2_X5N-wTLZ2laJKh
Console output:
>>> regex = re.compile("\"(.+?)\"",re.MULTILINE|re.DOTALL|re.VERBOSE)
>>> regex.findall(string)
[u'Block. CertNot Valid.\nQuery with me', u'test\naaa', u'bbb\nvvvv']
And 'string' value is:
1,0.00,,2.00,10,"Block. CertNot Valid.
Query with me",2013-06-20,0,0.00
1,0.00,,2.00,10,"test
aaa",2013-06-20,0,0.00
1,0.00,,2.00,10,"bbb
vvvv",2013-06-20,0,0.00
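If you really do want to strip the line breaks with a regex instead of parsing, one approach is to match each quoted field and replace any newline inside it (a sketch; it assumes quotes are balanced and contain no escaped quotes):

```python
import re

text = ('1,0.00,,2.00,10,"Block. CertNot Valid.\n'
        'Query with me",2013-06-20,0,0.00')

# Rewrite each double-quoted field, replacing embedded newlines with a space
fixed = re.sub(r'"[^"]*"', lambda m: m.group(0).replace('\n', ' '), text)
print(fixed)
# 1,0.00,,2.00,10,"Block. CertNot Valid. Query with me",2013-06-20,0,0.00
```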