Remove double quotes from iterator when using csv writer - python

I want to create a csv from an existing csv, by splitting its rows.
Input csv:
A,R,T,11,12,13,14,15,21,22,23,24,25
Output csv:
A,R,T,11,12,13,14,15
A,R,T,21,22,23,24,25
So far my code looks like:
def update_csv(name):
    #load csv file
    file_ = open(name, 'rb')
    #init first values
    current_a = ""
    current_r = ""
    current_first_time = ""
    file_content = csv.reader(file_)
    #LOOP
    for row in file_content:
        current_a = row[0]
        current_r = row[1]
        current_first_time = row[2]
        i = 2
        #Write row to new csv
        with open("updated_"+name, 'wb') as f:
            writer = csv.writer(f)
            writer.writerow((current_a,
                             current_r,
                             current_first_time,
                             ",".join((row[x] for x in range(i+1,i+5)))
                             ))
        #do only one row, for debug purposes
        return
But the row contains double quotes that I can't get rid of:
A002,R051,02-00-00,"05-21-11,00:00:00,REGULAR,003169391"
I've tried to use writer = csv.writer(f,quoting=csv.QUOTE_NONE) and got a _csv.Error: need to escape, but no escapechar set.
What is the correct approach to delete those quotes?

I think you could simplify the logic to split each row into two using something along these lines:
def update_csv(name):
    with open(name, 'rb') as file_:
        with open("updated_"+name, 'wb') as f:
            writer = csv.writer(f)
            # read one row from input csv
            for row in csv.reader(file_):
                # write 2 rows to new csv
                writer.writerow(row[:8])
                writer.writerow(row[:3] + row[8:])
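In Python 3 the csv module expects files opened in text mode with newline='' rather than 'rb'/'wb', so the same split might be sketched like this. The name split_rows is hypothetical, io.StringIO stands in for the real input and output files, and lineterminator="\n" is set only to keep the output readable (the default is "\r\n"):

```python
import csv
import io


def split_rows(src, dst):
    """Write every input row as two output rows sharing the first 3 columns."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        writer.writerow(row[:8])            # key columns + first group of 5
        writer.writerow(row[:3] + row[8:])  # key columns + remaining values


src = io.StringIO("A,R,T,11,12,13,14,15,21,22,23,24,25\n")
dst = io.StringIO()
split_rows(src, dst)
print(dst.getvalue())
```

With real files you would pass open(name, newline='') and open("updated_" + name, "w", newline='') instead of the StringIO objects.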

writer.writerow expects an iterable and writes each item within it as one field, separated by the appropriate delimiter, into the file. So:
writer.writerow([1, 2, 3])
would write "1,2,3" followed by the line terminator ("\r\n" by default) to the file.
Your call passes an iterable in which one item is a string that already contains the delimiter. The writer therefore has to either escape that delimiter or quote the whole item. For example,
writer.writerow([1, '2,3'])
doesn't give "1,2,3\r\n" but '1,"2,3"\r\n': the string counts as one item in the output.
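Therefore, if you don't want quotes in the output, you need to set quoting=csv.QUOTE_NONE and also provide an escape character (e.g. escapechar='/') to mark the delimiters that shouldn't be counted as such (giving something like "1,2/,3\r\n"). Without an escapechar, QUOTE_NONE raises exactly the "need to escape, but no escapechar set" error you saw.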
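To see the three behaviours side by side, here is a small sketch; write_row is a hypothetical helper that renders one row into an io.StringIO, and lineterminator="\n" is set only to keep the strings short:

```python
import csv
import io


def write_row(row, **fmtparams):
    """Return the text csv.writer produces for a single row."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmtparams).writerow(row)
    return buf.getvalue()


# Default QUOTE_MINIMAL: the field containing the delimiter is quoted.
quoted = write_row([1, "2,3"])
# QUOTE_NONE plus an escape character: the delimiter is escaped instead.
escaped = write_row([1, "2,3"], quoting=csv.QUOTE_NONE, escapechar="/")

# QUOTE_NONE without escapechar: the writer has no way to protect the comma.
error = None
try:
    write_row([1, "2,3"], quoting=csv.QUOTE_NONE)
except csv.Error as exc:
    error = exc

print(repr(quoted))   # '1,"2,3"\n'
print(repr(escaped))  # '1,2/,3\n'
print(error)
```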
However, I think what you actually want is to write all of those elements as separate items. Don't ",".join(...) them yourself; try:
writer.writerow([current_a, current_r, current_first_time] + row[i+1:i+5])
to pass the relevant items from row (the same row[i+1:i+5] slice your join covered) as separate items.
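Here is a sketch of the difference using the field values from the output line in the question (the list layout is an assumption reconstructed from that line): joining the values yourself forces the writer to quote the combined field, while passing them separately needs no quoting at all.

```python
import csv
import io

# Field values taken from the sample output line in the question.
row = ["A002", "R051", "02-00-00", "05-21-11", "00:00:00", "REGULAR", "003169391"]

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(row[:3] + [",".join(row[3:7])])  # one pre-joined field: gets quoted
writer.writerow(row[:3] + row[3:7])              # separate fields: nothing to quote
print(buf.getvalue())
```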

Related

Python reading in integers from a csv file into a list

I am having some trouble trying to read a particular column in a csv file into a list in Python. Below is an example of my csv file:
Col 1 Col 2
1,000,000 1
500,000 2
250,000 3
Basically I am wanting to add column 1 into a list as integer values and am having a lot of trouble doing so. I have tried:
for row in csv.reader(csvfile):
    list = [int(row.split(',')[0]) for row in csvfile]
However, I get a ValueError that says "invalid literal for int() with base 10: '"1'
I then tried:
for row in csv.reader(csvfile):
    list = [(row.split(',')[0]) for row in csvfile]
This time I don't get an error however, I get the list:
['"1', '"500', '"250']
I have also tried changing the delimiter:
for row in csv.reader(csvfile):
    list = [(row.split(' ')[0]) for row in csvfile]
This almost gives me the desired list; however, the list includes the second column as well as a "\n" after each value:
['"1,000,000", 1\n', etc...]
If anyone could help me fix this it would be greatly appreciated!
Cheers
You should choose your delimiter wisely:
If your numbers use . as the decimal separator, you can use , as the delimiter; if your numbers contain , (as a decimal or thousands separator), use ; as the delimiter instead.
Moreover, as described in the documentation for csv.reader, you can use the delimiter= argument to define your delimiter, like so:
with open('myfile.csv', 'r') as csvfile:
    mylist = []
    for row in csv.reader(csvfile, delimiter=';'):
        mylist.append(row[0]) # careful here with [0]
or short version:
with open('myfile.csv', 'r') as csvfile:
    mylist = [row[0] for row in csv.reader(csvfile, delimiter=';')]
To parse your number to a float, you will have to do
float(row[0].replace(',', ''))
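Putting the delimiter advice together with the comma stripping, here is a sketch that assumes the file uses ; between columns (the sample data is hypothetical and held in io.StringIO) and converts the first column with int() rather than float(), since the question asks for integers:

```python
import csv
import io

# Hypothetical sample: ';' separates columns, ',' groups thousands.
data = "Col 1;Col 2\n1,000,000;1\n500,000;2\n250,000;3\n"

reader = csv.reader(io.StringIO(data), delimiter=";")
next(reader)  # skip the header row
numbers = [int(row[0].replace(",", "")) for row in reader]
print(numbers)  # [1000000, 500000, 250000]
```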
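You can also open the file and split each line on whitespace using regular expressions, stripping the thousands separators before calling int():
import re
with open('filename.csv') as f:
    file_data = [re.split(r'\s+', line.strip()) for line in f]
final_data = [int(row[0].replace(',', '')) for row in file_data[1:]]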
First, you must parse your data correctly. It is not, in fact, CSV (comma-separated values) but rather TSV (tab-separated values), and you should tell the CSV reader so (I'm assuming the separator is a tab, but you can in principle use any whitespace with a few tweaks):
for row in csv.reader(csvfile, delimiter="\t"):
Second, you should strip the commas from your integer values, since they are only thousands separators and add no information. After that, the values can be parsed with int():
int(row[0].replace(',', ''))
Third, you really should not iterate over the same file twice. Use either a list comprehension or a normal for loop, not both at the same time over the same data. For example, with a list comprehension:
import csv
from io import StringIO
csvfile = StringIO("Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n")
reader = csv.reader(csvfile, delimiter="\t")
next(reader, None)  # skip the header
lst = [int(row[0].replace(',', '')) for row in reader]
Or with normal iteration:
import csv
from io import StringIO
csvfile = StringIO("Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n")
reader = csv.reader(csvfile, delimiter="\t")
lst = []
for i, row in enumerate(reader):
    if i == 0:
        continue  # your custom header-handling code here
    lst.append(int(row[0].replace(',', '')))
In both cases, lst is set to [1000000, 500000, 250000], as it should be. Enjoy.
By the way, using the built-in name list as a variable name is a bad idea: it is not a reserved keyword, but assigning to it shadows the built-in list type.
UPDATE. There's one more option that I find interesting. Instead of setting the delimiter explicitly you can use csv.Sniffer to detect it e.g.:
import csv
from io import StringIO
csvdata = "Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n"
csvfile = StringIO(csvdata)
dialect = csv.Sniffer().sniff(csvdata)
reader = csv.reader(csvfile, dialect=dialect)
After that, proceed just like in the snippets above. This keeps working even if you replace the tabs with semicolons or commas (commas would require quotes around your comma-grouped integers) or, possibly, something else.

List append not working Python

I'm working on a script to remove bad characters from a csv file then to be stored in a list.
The script runs fine but doesn't remove the bad characters, so I'm a bit puzzled. Any pointers or help on why it's not working would be appreciated.
def remove_bad(item):
    item = item.replace("%", "")
    item = item.replace("test", "")
    return item
raw = []
with open("test.csv", "rb") as f:
    rows = csv.reader(f)
    for row in rows:
        raw.append((remove_bad(row[0].strip()),
                    row[1].strip().title()))
print raw
If I have a csv-file with one line:
tst%,testT
Then your script, slightly modified, should indeed filter the "bad" characters. I changed it to pass both items separately to remove_bad (because you mentioned you had to "remove bad characters from a csv", not only from the first column):
import csv
def remove_bad(item):
    item = item.replace("%","")
    item = item.replace("test","")
    return item
raw = []
with open("test.csv", "rb") as f:
    rows = csv.reader(f)
    for row in rows:
        raw.append((remove_bad(row[0].strip()), remove_bad(row[1].strip()).title()))
print raw
Also, I call title() after remove_bad() (otherwise "testT" would first become "Testt", and the case-sensitive replace() would no longer find "test").
Output (the rows will get stored in a list as tuples, as in your example):
[('tst', 'T')]
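For reference, a Python 3 version of the same idea, assuming the single test line is read from an io.StringIO instead of test.csv (in Python 3 you would open the real file with open("test.csv", newline="") rather than "rb"):

```python
import csv
import io


def remove_bad(item):
    """Strip the unwanted substrings ('%' and 'test') from one field."""
    item = item.replace("%", "")
    item = item.replace("test", "")
    return item


# In-memory stand-in for test.csv containing the single example line.
f = io.StringIO("tst%,testT\n")
raw = []
for row in csv.reader(f):
    # Clean both columns first, then title-case the second one.
    raw.append((remove_bad(row[0].strip()), remove_bad(row[1].strip()).title()))
print(raw)  # [('tst', 'T')]
```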
Feel free to ask questions
import re
import csv
p = re.compile('(test|%|anyotherchars)')  # insert bad chars instead of anyotherchars
def remove_bad(item):
    item = p.sub('', item)
    return item
raw = []
with open("test.csv", "rb") as f:
    rows = csv.reader(f)
    for row in rows:
        raw.append((remove_bad(row[0].strip()),
                    row[1].strip().title()  # do you really need strip() without args?
                    )  # here you create a tuple which you will append to the list
                   )
print raw
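If the set of bad strings grows, the alternation can be built from a list instead of being typed by hand. In this sketch BAD is a hypothetical list, and re.escape() makes sure characters such as . or $ are matched literally instead of being treated as regex syntax:

```python
import re

# Hypothetical list of unwanted substrings; re.escape() makes each one literal.
BAD = ["test", "%", "$", "."]
pattern = re.compile("|".join(re.escape(s) for s in BAD))


def remove_bad(item):
    """Delete every occurrence of any string in BAD from item."""
    return pattern.sub("", item)


print(remove_bad("tst%"))      # tst
print(remove_bad("a.test$b"))  # ab
```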

Python 3.4 CSV Deleting items using the in function

This is my current code. The issue I have is that search returns nothing. How do I get a string value for this variable?
count = 0
with open("userDatabase.csv","r") as myFile:
    with open("newFile.csv","w") as newFile:
        row_count = sum(1 for row in myFile)
        print("aba")
        for x in range(row_count):
            print("aaa")
            for row in myFile:
                search = row[count].readline
                print(search)
                if self.delName.get("1.0","end-1c") in search:
                    count = count + 1
                else:
                    newFile.write(row[count])
                    count = count + 1
The output is:
aba
aaa
aaa
So it runs through it twice, which is good as my userDatabase consists of two rows of data.
The file in question has this data:
"lukefinney","0000000","0000000","a"
"nictaylor","0000000","0000000","a"
You cannot just iterate over an open file more than once without rewinding the file object back to the start.
You'll need to add a file.seek(0) call to put the file reader back to the beginning each time you want to start reading from the first row again:
myFile.seek(0)
for row in myFile:
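A quick way to see the exhaustion and the rewind, using io.StringIO as an assumed stand-in for the real file (it behaves like a text file for iteration and seek()):

```python
import io

# In-memory stand-in for userDatabase.csv, holding the two rows from the question.
f = io.StringIO('"lukefinney","0000000","0000000","a"\n'
                '"nictaylor","0000000","0000000","a"\n')

row_count = sum(1 for row in f)  # counting reads every line
leftover = list(f)               # [] because the position is now at end of file
f.seek(0)                        # rewind to the first line
reread = list(f)                 # both lines are available again
print(row_count, len(leftover), len(reread))  # 2 0 2
```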
The rest of your code makes little sense; when iterating over a file you get individual lines from the file, so each row is a string object. Indexing into strings gives you new strings with just one character in it; 'foo'[1] is the character 'o', for example.
If you wanted to copy across rows that don't match a string, you don't need to know the row count up front at all. You are not handling a list of rows here; you can look at each row individually instead:
filter_string = self.delName.get("1.0","end-1c")
with open("userDatabase.csv","r") as myFile:
    with open("newFile.csv","w") as newFile:
        for row in myFile:
            if filter_string not in row:
                newFile.write(row)
This does a sub-string match. If you need to match whole columns, use the csv module to give you individual columns to match against. The module handles the quotes around column values:
import csv
filter_string = self.delName.get("1.0","end-1c")
with open("userDatabase.csv", "r", newline='') as myFile:
    with open("newFile.csv", "w", newline='') as newFile:
        writer = csv.writer(newFile)
        for row in csv.reader(myFile):
            # row is now a list of strings, like ['lukefinney', '0000000', '0000000', 'a']
            if filter_string != row[0]:  # test against the first column
                # copied across if the first column does not match exactly.
                writer.writerow(row)
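To try the csv-based filter outside the GUI, here is a sketch where the name is passed in as a plain argument instead of coming from self.delName.get(...). filter_rows is a hypothetical name, and QUOTE_ALL is my own choice so the output keeps the fully quoted style of userDatabase.csv (the default QUOTE_MINIMAL would drop those quotes):

```python
import csv
import io


def filter_rows(src, dst, name):
    """Copy every CSV row whose first column is not exactly equal to name."""
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in csv.reader(src):
        if row[0] != name:
            writer.writerow(row)


# In-memory copy of the two rows from the question.
src = io.StringIO('"lukefinney","0000000","0000000","a"\n'
                  '"nictaylor","0000000","0000000","a"\n')
dst = io.StringIO()
filter_rows(src, dst, "lukefinney")
print(dst.getvalue())  # "nictaylor","0000000","0000000","a"
```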
One problem is that row_count = sum(1 for row in myFile) consumes all rows from myFile. Subsequent reads on myFile return an empty string, which signifies end of file. This means the for row in myFile: loop later in your code is never entered, because all rows have already been consumed.
A way around this is to add a call to myFile.seek(0) just before for row in myFile:. This will reset the file pointer and the for loop should then work.
It's not very clear from your code what it is that you are trying to do, but it kind of looks like you want to filter out rows that contain a certain string. Try this:
with open("userDatabase.csv","r") as myFile:
    with open("newFile.csv","w") as newFile:
        for row in myFile:
            if self.delName.get("1.0","end-1c") not in row:
                newFile.write(row)

Trouble in saving a list to csv

I am saving a list to a csv using the writerow function from the csv module. Something went wrong when I opened the final file in MS Office Excel.
Before I encountered this issue, the main problem I was trying to deal with was getting each item of the list saved to its own row. It was saving each line into a cell in row 1. I made some small changes, and now this happened. I am certainly very confused as a novice Python guy.
import csv
inputfile = open('small.csv', 'r')
header_list = []
header = inputfile.readline()
header_list.append(header)
input_lines = []
for line in inputfile:
    input_lines.append(line)
inputfile.close()
AA_list = []
for i in range(0,len(input_lines)):
    if (input_lines[i].split(',')[4]) == 'AA':#column4 has different names including 'AA'
        AA_list.append(input_lines[i])
full_list = header_list+AA_list
resultFile = open("AA2013.csv",'w+')
wr = csv.writer(resultFile, delimiter = ',')
wr.writerow(full_list)
Thanks!
UPDATE:
The full_list looks like this: ['1,2,3,"MEM",...]
UPDATE2(APR.22nd):
Now I get three cells of data (the header in A1 and the rest in A2 and A3 respectively) in the same row. Apparently, the newline characters are not working for three items in one big list. I think the more specific question now is: how do I save a list of records, each ending with '\n', to csv?
UPDATE3(APR.23rd):
original file
Importing the csv module is not enough; you need to use it as well. Right now, you're appending each line as an entire string to your list instead of a list of fields.
Start with
with open('small.csv', 'rb') as inputfile:
    reader = csv.reader(inputfile, delimiter=",")
    header_list = next(reader)
    input_lines = list(reader)
Now header_list contains all the headers, and input_lines contains a nested list of all the rows, each one split into columns.
I think the rest should be pretty straightforward.
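A sketch of what the rest might look like, assuming Python 3 and a made-up small.csv held in io.StringIO whose column index 4 contains the code being filtered on (the column names and rows are hypothetical). writerow() writes one row per call, and writerows() writes a whole list of rows at once:

```python
import csv
import io

# Hypothetical stand-in for small.csv; column index 4 holds the code to filter on.
src = io.StringIO("a,b,c,d,code\n"
                  "2013,1,2,10,AA\n"
                  "2013,1,2,11,UA\n"
                  "2013,1,3,12,AA\n")
reader = csv.reader(src)
header = next(reader)                                # list of column names
aa_rows = [row for row in reader if row[4] == "AA"]  # each row is a list of fields

dst = io.StringIO()                                  # stand-in for AA2013.csv
writer = csv.writer(dst, lineterminator="\n")
writer.writerow(header)    # one row per writerow() call
writer.writerows(aa_rows)  # or many rows in one call
print(dst.getvalue())
```

With real files in Python 3 you would use open('small.csv', newline='') and open('AA2013.csv', 'w', newline='').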
append() adds its argument as a single element at the end of a list. So when you write header_list.append(header), the whole header line (including its trailing newline) is added as one string, not as a list of column names. You should write
headers = header.rstrip('\n').split(',')
header_list.append(headers)
This splits the header row on commas, so headers is the list of header words, and then appends that list to header_list as one element.
The same thing goes for AA_list.append(input_lines[i]).
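The difference between append() and extend() on a string, plus the split that produces proper field names, fits in a few lines (the header string is hypothetical):

```python
header = "a,b,c\n"  # hypothetical header line as returned by readline()

appended = []
appended.append(header)  # adds the whole string as ONE element
print(appended)          # ['a,b,c\n']

extended = []
extended.extend(header)  # iterates the string: one element per character
print(extended)          # ['a', ',', 'b', ',', 'c', '\n']

fields = header.rstrip("\n").split(",")
print(fields)            # ['a', 'b', 'c']
```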
I figured it out.
The difference between passing [val], val, and val.split(",") to writerow() is:
[val]: the whole string is written as a single field, so everything goes into the first column in Excel (the header and "2013, 1, 2,..." in A1, B1, C1 and so on).
val: writerow() iterates over the string itself, so each character (letter, comma, or space) takes its own cell in Excel.
val.split(","): the string is split on commas, and each resulting piece goes into its own Excel cell.
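The three cases can be reproduced with a hypothetical line value and a small helper (as_csv is an illustrative name) that returns what one writerow() call produces:

```python
import csv
import io

val = "2013,1,2"  # hypothetical line from full_list, newline already stripped


def as_csv(row):
    """Return the text one writerow() call produces for row."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue()


one_field = as_csv([val])           # '"2013,1,2"\n': a single quoted field
per_char = as_csv(val)              # '2,0,1,3,",",1,",",2\n': one field per character
per_value = as_csv(val.split(","))  # '2013,1,2\n': one field per value
print(one_field, per_char, per_value, sep="")
```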
Here is what I found out: 1. the right way to export the flat list line by line is to use the with syntax; 2. split each string when writing the row:
csvwriter.writerow(JD.split())
full_list = header_list+AA_list
with open("AA2013.csv",'w+') as resultFile:
    wr = csv.writer(resultFile, delimiter= ",", lineterminator = '\n')
    for val in full_list:
        wr.writerow(val.rstrip('\n').split(','))  # drop the line's trailing newline before splitting
The wanted output
Please correct any terms or syntax I've used incorrectly! Thanks.

Editing a code to create a filter based on a condition and then stripping the condition

SO,
I'm looking for some help changing a bit of code so that it includes an if statement: the filter should only be added if the line contains (BIPL), and the (BIPL) should then be stripped out once the entry is added to the filters list...
1test,tester,testing (BIPL),no,yes
2test,tester,testing,no,yes
3data,datas,datatest (BIPL),yes,no
Current code...
with open('test.csv', 'rb') as old_csv:
    filters = {(row[0].lower(), row[1][:3].upper(), row[2].upper()) for row in csv.reader(old_csv, delimiter=',')}
Effectively the outcome would be as follows, just in a different format.
1test,TES,TESTING
3data,DAT,DATATEST
It should be a simple change but I can't figure it out
csv.reader can accept an iterator as its first argument (not just file handles). So you can define a generator which yields only those lines which contain '(BIPL)' and send that to csv.reader:
import csv
import re
def only_bipl(f):
    for line in f:
        if '(BIPL)' in line:
            yield re.sub(r'\s*\(BIPL\)', '', line)
with open('test.csv', 'rb') as old_csv:
    reader = csv.reader(only_bipl(old_csv), delimiter=',')
    filters = {(row[0].lower(), row[1][:3].upper(), row[2].upper()) for row in reader}
Note the above will yield any line that contains '(BIPL)' anywhere. A better, more targeted alternative would be to match only those lines which contain '(BIPL)' at the end of the third item. You can do that using an if-clause inside the set comprehension:
with open('test.csv', 'rb') as old_csv:
    reader = csv.reader(old_csv, delimiter=',')
    filters = {(row[0].lower(), row[1][:3].upper(), row[2][:-6].strip().upper())
               for row in reader
               if row[2].endswith('(BIPL)')}
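A Python 3 check of the endswith() version against the three sample lines from the question, held in io.StringIO; MARKER is a hypothetical constant used so the slice length is derived from the marker instead of the literal 6:

```python
import csv
import io

# In-memory copy of the three sample lines from the question.
data = ("1test,tester,testing (BIPL),no,yes\n"
        "2test,tester,testing,no,yes\n"
        "3data,datas,datatest (BIPL),yes,no\n")

MARKER = "(BIPL)"
reader = csv.reader(io.StringIO(data))
filters = {
    (row[0].lower(), row[1][:3].upper(), row[2][:-len(MARKER)].strip().upper())
    for row in reader
    if row[2].endswith(MARKER)  # keep only rows whose third field ends in the marker
}
print(sorted(filters))
# [('1test', 'TES', 'TESTING'), ('3data', 'DAT', 'DATATEST')]
```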
