Write data with special chars/quotation into CSV, using python

My data, that I have in a Python list, can contain quotes, etc.:
the Government's
projections`
indexation" that
Now I'd like to write it into a CSV file but it seems the special chars "break" the CSV structure.
csv.register_dialect('doublequote', quotechar='"', delimiter=';', quoting=csv.QUOTE_ALL)
with open('csv_data.csv', 'r+b') as f:
    header = next(csv.reader(f))
    dict_writer = csv.DictWriter(f, header, -999, dialect='doublequote')
    dict_writer.writerow(csv_data_list)
It usually can write up to the first 50 lines or so.
I tried to delete the next row from the source list and it wrote to 60 lines.
Is there any "better" way of writing all sorts of data into a CSV?
I'm trying something like this:
data['title'] = data['title'].replace("'", '`' )
but that doesn't seem to be right?
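For comparison, here is a minimal sketch (not part of the original post) of how the csv module copes with such values on its own. It assumes Python 3 and a hypothetical one-column list of dicts named rows. With QUOTE_ALL the writer quotes every field and doubles any embedded quote character, so a manual replace() should not be needed.

```python
import csv
import io

# Hypothetical sample data containing an apostrophe, a backtick and a double quote.
rows = [
    {"title": "the Government's"},
    {"title": "projections`"},
    {"title": 'indexation" that'},
]

# Write to an in-memory buffer; a real file would be opened with newline=''.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["title"], delimiter=";",
                        quotechar='"', quoting=csv.QUOTE_ALL)
writer.writeheader()
writer.writerows(rows)

# Read the text back to confirm that the values survive unchanged.
buffer.seek(0)
read_back = list(csv.DictReader(buffer, delimiter=";", quotechar='"'))
print(buffer.getvalue())
```

Reading the buffer back with the same delimiter and quotechar returns the original strings, which suggests the special characters themselves are not what breaks the file.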

Related

Python – cleaning CSV file with split records

I have a delimited file in which some of the fields contain line termination characters. They can be LF or CR/LF.
The line terminators cause the records to split over multiple lines.
My objective is to read the file, remove the line termination characters, then write out a delimited file with quotes around the fields.
Sample input record:
444,2018-04-06,19:43:47,43762485,"Request processed"CR\LF
555,2018-04-30,19:17:56,43762485,"Added further note:LF
email customer a receipt" CR\LF
The first record is fine but the second has a LF (line feed) causing the record to fold.
import csv
with open(raw_data, 'r', newline='') as inp, open(csv_data, 'w') as out:
    csvreader = csv.reader(inp, delimiter=',', quotechar='"')
    for row in csvreader:
        print(str(row))
        out.write(str(row)[1:-1] + '\n')
My code nearly works but I don’t think it is correct.
The output I get is:
['444', '2020-04-06', '19:43:47', '344376882485', 'Request processed']
['555', '2020-04-30', '19:17:56', '344376882485', 'Added further note:\nemail customer a receipt']
I use the substring to remove the square brackets at the start and end of the line which I think is not the correct way.
Notice on the second record the new line character has been converted to \n. I would like to know how to get rid of that and also incorporate a csv writer to the code to place double quoting around the fields.
To remove the line terminators I tried replace, but it did not work.
(row.replace('\r', '').replace('\n', '') for row in csvreader)
I also tried to incorporate a csv writer but could not get it working with the list.
Any advice would be appreciated.
This snippet does what you want:
import csv
with open('raw_data.csv', 'r', newline='') as inp, open('csv_data.csv', 'w', newline='') as out:
    reader = csv.reader(inp, delimiter=',', quotechar='"')
    writer = csv.writer(out, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
    for row in reader:
        fixed = [cell.replace('\n', '') for cell in row]
        writer.writerow(fixed)
Quoting all cells is handled by passing csv.QUOTE_ALL as the writer's "quoting" argument.
The line
fixed = [cell.replace('\n', '') for cell in row]
creates a new list of cells where embedded '\n' characters are replaced by the empty string.
The csv writer ends each row with '\r\n' by default, so open the output file with newline='' to stop Python from translating it again. If you want a different end-of-line you can pass a lineterminator argument to the writer.
To me the original csv seems fine: it's normal to have embedded newlines ("soft line-breaks") inside quoted cells, and csv-aware applications such as spreadsheets will handle them correctly. However, they will look wrong in applications that don't understand csv formatting and so treat the embedded newlines as actual end-of-line characters.
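Since the question mentions both LF and CR/LF inside fields, the following is a small in-memory sketch of the same cleaning step that strips both characters. The sample text and the helper name clean_rows are assumptions made for illustration; like the answer above, it replaces the terminators with an empty string.

```python
import csv
import io

# Hypothetical raw input: the second record has a CR/LF inside a quoted field.
raw = ('444,2018-04-06,19:43:47,43762485,"Request processed"\r\n'
       '555,2018-04-30,19:17:56,43762485,"Added further note:\r\nemail customer a receipt"\r\n')

def clean_rows(text):
    """Return the parsed rows with CR and LF characters removed from every cell."""
    reader = csv.reader(io.StringIO(text, newline=""))
    return [[cell.replace("\r", "").replace("\n", "") for cell in row]
            for row in reader]

# Quote every field and end each row with a plain LF.
out = io.StringIO(newline="")
writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerows(clean_rows(raw))
print(out.getvalue())
```

Passing lineterminator explicitly makes the row ending independent of the platform.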

Python how to get the tweet data using specific word in csv file and put it in new csv file

I have Twitter data in a CSV file (which I mined with a Python API), around 1000 lines. Now I want to filter the tweets for the specific Indonesian words “macet” or “kecelakaan” (in English, “traffic” or “accident”) and put the matching rows into a new, separate CSV file, just like using Find All in Excel.
The sample Twitter data is in example1.csv, and the new file that should be created after searching for the word "macet" or "kecelakaan" is example2.csv. But there is no result.
import re
import csv
with open('example1.csv', 'r') as csvFile:
    reader = csv.reader(csvFile)
    if re.search(r'macet', reader):
        for row in reader:
            myData = list(row)
            print(row)
newFile = open('example2.csv', 'w')
with newFile:
    writer = csv.writer(newFile)
    writer.writerows(myData)
print("Writing complete")
I use Spyder with Python 3.6.
The CSV file is already in the same folder as Spyder. Here is a screen capture of my CSV Twitter data:
[image: myCSVtwitterData]
Update: added a sample of the CSV file. OS: Windows.
There are a couple of problems with your code.
In your reading loop you are passing a csv.reader object to re.search, but it doesn't know how to search that object. You need to pass it text or byte strings.
The line
myData = list(row)
converts row into a new list and saves it to myData, but it's already a list, so no conversion is necessary. And that line replaces the previous contents of myData, but you actually want to save all the matching rows. However, there's no need to save the rows, you can just write them to the new file as you go.
Anyway, here's a repaired version of your code. From the screen shot it looks like you only want to search the text in column 2 of the input data (which corresponds to column C in your spreadsheet). I've created a regex that searches for the whole words "macet" and "kecelakaan", the "\b" matches at word boundaries so we don't get a match if "macet" or "kecelakaan" is part of a larger word.
import re
import csv
# Make a case-insensitive regex to match the words "macet" or "kecelakaan"
pattern = re.compile(r'\bmacet\b|\bkecelakaan\b', re.I)
with open('example1.csv', 'r', newline='') as csvFile, open('example2.csv', 'w', newline='') as newFile:
    reader = csv.reader(csvFile)
    writer = csv.writer(newFile)
    for row in reader:
        # Skip empty rows
        if not row:
            continue
        if pattern.search(row[2]):
            print(row)
            writer.writerow(row)
print("Writing complete")
I've just made a couple of improvements to that code. It now uses the newline='' arg to open the CSV files, and it skips any empty lines in the input CSV. And the regex now ignores the case when looking for matching words.
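To illustrate what the word-boundary regex does and does not match, here is a short sketch using the same pattern. The tweet texts are invented for illustration and are not taken from the asker's file.

```python
import re

pattern = re.compile(r'\bmacet\b|\bkecelakaan\b', re.I)

# Hypothetical tweet texts, invented for illustration.
samples = [
    "Jalan Sudirman MACET total",   # whole word, different case
    "Ada kecelakaan di tol",        # whole word
    "kemacetan parah pagi ini",     # "macet" only inside a longer word
    "cuaca cerah",                  # neither word
]

# search() needs a string, so it is applied to each text, not to the reader.
matches = [bool(pattern.search(text)) for text in samples]
print(matches)
```

Only the first two texts match: the third contains "macet" as part of a longer word, which the "\b" anchors exclude.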
This is not a Python answer, but if you are on Linux you can do it with a single command:
grep -i "macet" example1.csv > example2.csv
-i means ignore case, so it will also match "Macet".
How about this?
This code visits the rows one by one, finds the cells that contain a word from word_list, and writes those cells as a new row.
import re
import csv
word_list = ['macet', 'kecelakaan']
with open('example1.csv', 'r') as csvFile, open('example2.csv', 'w') as newFile:
    reader = csv.reader(csvFile)
    writer = csv.writer(newFile, lineterminator='\n')
    for row in reader:
        new_row = [content for content in row if any(map(lambda word: word in content, word_list))]
        if(new_row != []):
            print(new_row)
            writer.writerow(new_row)
print("Writing complete")

Number formatting a CSV

I have developed a script that produces a CSV file. On inspection of the file, some cells are not being interpreted the way I want.
E.g. in my Python list, values such as '02e4' are automatically formatted as 2.00E+04.
table = [['aa02', 'fb4a82', '0a0009'], ['02e4', '452ca2', '0b0004']]
ofile = open('test.csv', 'wb')
for i in range(0, len(table)):
    for j in range(0, len(table[i])):
        ofile.write(table[i][j] + ",")
    ofile.write("\n")
This gives me:
aa02 fb4a82 0a0009
2.00E+04 452ca2 0b0004
I've tried using csv.writer instead, with writer = csv.writer(ofile, ...)
and passing options from the library (e.g. csv.QUOTE_ALL), but it's the same output as before.
Is there a way using the CSV lib to automatically format all my values as strings before it's written?
Or is this not possible?
Thanks
Try setting the quoting parameter in your csv writer to csv.QUOTE_ALL.
See the doc for more info:
import csv
with open('myfile.csv', 'w', newline='') as csvfile:
    wtr = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
    wtr.writerow(...)
Although it sounds like the problem might lie with your csv viewer. Excel has a rather annoying habit of auto-formatting data like you describe.
If you want the '02e4' to show up in Excel as "02e4" then annoyingly you have to write a csv with triple-double quotes: """02e4""". I don't know of a way to do this with the csv writer's quotechar, because it must be a single character. However, you can do something similar to your original attempt:
table = [['aa02', 'fb4a82', '0a0009'], ['02e4', '452ca2', '0b0004']]
with open('test.csv', 'w') as ofile:
    for row in table:
        for cell in row:
            ofile.write('"""%s""",' % cell)
        ofile.write("\n")
If opened in a text editor your csv file will read:
"""aa02""","""fb4a82""","""0a0009""",
"""02e4""","""452ca2""","""0b0004""",
This produces the following result in Excel: [image: Excel screenshot]
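As an aside on the triple-double-quote approach, here is a sketch of a way to get the same quoting from csv.writer, assuming the default doublequote behaviour: if each value already carries literal double quotes, QUOTE_ALL doubles them and adds the enclosing pair. It writes to an in-memory buffer, produces no trailing comma, and how Excel displays the result is not checked here.

```python
import csv
import io

table = [['aa02', 'fb4a82', '0a0009'], ['02e4', '452ca2', '0b0004']]

out = io.StringIO(newline="")
writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
for row in table:
    # Wrap each value in literal double quotes; the writer then doubles them
    # and adds its own enclosing quotes.
    writer.writerow(['"%s"' % cell for cell in row])
print(out.getvalue())
```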
If you wanted to use any single-character quote you could use the csv module like so:
import csv
table = [['aa02', 'fb4a82', '0a0009'], ['02e4', '452ca2', '0b0004']]
with open('test.csv', 'w', newline='') as ofile:
    writer = csv.writer(ofile, delimiter=',', quotechar='|', quoting=csv.QUOTE_ALL)
    writer.writerows(table)
The output in the text editor will be:
|aa02|,|fb4a82|,|0a0009|
|02e4|,|452ca2|,|0b0004|
and Excel will show: [image: Excel screenshot]

Trying to copy column1 from a csv file to another empty file using python

I'm looking for a way using python to copy the first column from a csv into an empty file. I'm trying to learn python so any help would be great!
So if this is test.csv
A 32
D 21
C 2
B 20
I want this output
A
D
C
B
I've tried the following commands in python but the output file is empty
f= open("test.csv",'r')
import csv
reader = csv.reader(f,delimiter="\t")
names=""
for each_line in reader:
    names=each_line[0]
First, you want to open your files. A good practice is to use the with statement (which, technically speaking, introduces a context manager) so that when your code exits the with block all the files are closed automatically:
with open('test.csv') as inpfile, open('out.csv', 'w') as outfile:
Next you want a loop over the lines of the input file (note the indentation, we are inside the with block). Line splitting is automatic when you read a text file with lines separated by newlines:
    for line in inpfile:
Each line is a string, but you think of it as two fields separated by white space. This situation is so common that strings have a method to deal with it (note again the increasing indent, we are in the for loop block):
        fields = line.split()
By default .split() splits on white space, but you can use, e.g., split(',') to split on commas. fields is a list of strings; for your first record it is equal to ['A', '32'], and you want to output just the first field in this list. For this purpose a file object has the .write() method, which writes a string, just a string, to the file. fields[0] IS a string, but we have to add a newline character to it because, in this respect, .write() is different from print():
        outfile.write(fields[0]+'\n')
That's all, but if you omit my comments it's 4 lines of code
with open('test.csv') as inpfile, open('out.csv', 'w') as outfile:
    for line in inpfile:
        fields = line.split()
        outfile.write(fields[0]+'\n')
When you are done with learning (some) Python, ask for an explanation of this...
with open('test.csv') as ifl, open('out.csv', 'w') as ofl:
    ofl.write('\n'.join(line.split()[0] for line in ifl))
Addendum
In such a simple case the csv module adds the conveniences of auto-splitting each line into a list of strings and taking care of the details of output (newlines, etc.), but when learning Python it's more fruitful to see how these steps can be done using the bare language, or at least that is my opinion.
The situation is different when your data file is complex, has headers, has quoted strings possibly containing quoted delimiters, and so on. In those cases the use of csv is recommended, as it takes into account all the gory details. For complex data analysis requirements you will need other packages not included in the standard library, e.g. numpy and pandas, but that is another story.
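For comparison, a csv-module version of the same task might look like the sketch below. It assumes the fields are separated by exactly one space (the question's own code uses a tab, so adjust delimiter accordingly) and uses in-memory buffers in place of test.csv and out.csv.

```python
import csv
import io

# Hypothetical stand-in for test.csv, assuming one space between the fields.
sample = "A 32\nD 21\nC 2\nB 20\n"

reader = csv.reader(io.StringIO(sample, newline=""), delimiter=" ")
out = io.StringIO(newline="")
writer = csv.writer(out, lineterminator="\n")
for row in reader:
    if row:                        # skip blank lines
        writer.writerow([row[0]])  # writerow expects a list of fields
print(out.getvalue())
```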
This answer reads the CSV file with pandas, treating a single space character as the column delimiter. You have to pass header=None, otherwise the first row will be taken as the header (the column names).
ss is a slice: the 0th column, taking all rows as denoted by :
The last line writes the slice to a new file.
import pandas as pd
df = pd.read_csv('test.csv', sep=' ', header=None)
ss = df.iloc[:, 0]
ss.to_csv('new_path.csv', sep=' ', index=False, header=False)
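import csv
with open("test.csv", newline='') as src, open("output.csv", "w", newline='') as dst:
    reader = csv.reader(src, delimiter='\t')
    writer = csv.writer(dst)
    for e in reader:
        # writerow expects a list of fields, so wrap the first column in a list
        writer.writerow([e[0]])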
One option is to create an empty list, append the first column of each row to it, and then write that list to another csv, for example:
import csv
def writetocsv(l):
    # copy the values into a list
    b = list(l)
    print(b)
    with open("newfile.csv", 'w', newline='') as f:
        w = csv.writer(f, delimiter=',')
        for value in b:
            w.writerow([value])
adcb_list = []
with open("test.csv", 'r', newline='') as f:
    reader = csv.reader(f, delimiter="\t")
    for each_line in reader:
        # keep only the first column of each row
        adcb_list.append(each_line[0])
writetocsv(adcb_list)
hope this works for you :-)

Python: writing an entire row to a CSV file. Why does it work this way?

I had exported a csv from Nokia Suite.
"sms","SENT","","+12345678901","","2015.01.07 23:06","","Text"
Following the Python docs, I tried
import csv
with open(sourcefile, 'r', encoding='utf8') as f:
    reader = csv.reader(f, delimiter=',')
    for line in reader:
        # write entire csv row
        with open(filename, 'a', encoding='utf8', newline='') as t:
            a = csv.writer(t, delimiter=',')
            a.writerows(line)
It didn't work until I put brackets around line, i.e. [line].
So the last part became
a.writerows([line])
Why is that so?
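The writerows method expects an iterable of rows, where each row is itself a sequence of fields. line is a single row (a list of strings), so writerows treats each string as a row and writes every character as a separate field. [line] wraps the row in a list with one item in it, which is what writerows expects.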
What you probably want to use instead is writerow.
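The difference can be seen in a small sketch. It uses a shortened, hypothetical version of the exported row and an in-memory buffer instead of a file; the helper name written is an assumption made for illustration.

```python
import csv
import io

line = ["sms", "SENT", "+12345678901"]

def written(method_name, argument):
    """Write argument with the named writer method and return the resulting text."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    getattr(writer, method_name)(argument)
    return out.getvalue()

as_rows = written("writerows", line)       # each string is treated as a row
as_one_row = written("writerows", [line])  # a list containing one row
single = written("writerow", line)         # one row, no wrapping needed
print(as_rows)
print(single)
```

Here writerows(line) splits every string into single-character fields, while writerows([line]) and writerow(line) both produce the one expected row.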
