I'm trying to write to a CSV file with output that looks like this:
14897,40.50891,-81.03926,168.19999
but the CSV writer keeps writing the output with quotes at the beginning and end:
'14897,40.50891,-81.03926,168.19999'
When I print the line normally the output is correct, but I need to call line.split(), or else the csv writer writes the output as 1,4,8,9,7 etc. (one character per field).
But when I do line.split() the output is then
['14897,40.50891,-81.03926,168.19999']
Which is written as '14897,40.50891,-81.03926,168.19999'
How do I make the quotes go away? I already tried csv.QUOTE_NONE, but it doesn't work.
with open(results_csv, 'wb') as out_file:
    writer = csv.writer(out_file, delimiter=',')
    writer.writerow(["time", "lat", "lon", "alt"])
    for f in file_directory:
        for line in open(f):
            print line
            line = line.split()
            writer.writerow(line)
With line.split() you're not splitting on commas but on whitespace (spaces, newlines, tabs). Since the data contains no whitespace, you end up with only one item per row.
Since that item contains commas, the csv module has to quote it to distinguish the cell content from the actual separator (which is also a comma). You would need line.strip().split(",") for it to work, but...
using the csv module to read your data is a better way to fix this.
Replace this:
for line in open(some_file):
    print line
    line = line.split()
    writer.writerow(line)
with this:
with open(some_file) as f:
    cr = csv.reader(f)  # default separator is comma already
    writer.writerows(cr)
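Putting it together with the outer loop from the question, a minimal Python 3 sketch (assuming file_directory is an iterable of file paths, and noting that 'wb' becomes 'w' with newline='' on Python 3) could look like this:
import csv

with open(results_csv, 'w', newline='') as out_file:
    writer = csv.writer(out_file, delimiter=',')
    writer.writerow(["time", "lat", "lon", "alt"])
    for f in file_directory:
        with open(f, newline='') as in_file:
            cr = csv.reader(in_file)   # parses each line into a list of fields
            writer.writerows(cr)       # no manual split, so no spurious quoting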
You don't need to read the file manually. You can simply use a csv reader.
Replace the inner for loop with:
# the with statement ensures the file handle is closed after the block finishes executing
with open(some_file) as file:
    rows = csv.reader(file)    # read rows
    writer.writerows(rows)     # write multiple rows at once
I'd like to create a CSV from a TXT file. I have a text file (300+ lines) whose values are separated by backslashes. I'd like each line to become a separate row, and each backslash-separated value to become a separate column.
The text file looks like:
example 1\example 2\example 3\example 4
test 1\test 2\test 3\test 4
I'd like the CSV to look like:
Example 1
Example 2
Example 3
Example 4
Test 1
Test 2
Test 3
Test 4
So far I have:
import csv
with open('Report.txt') as report:
    report_txt = report.read()

with open('Report.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(report_txt)
I know I need to use \ as a delimiter, but I'm not sure how. Thanks for any help!
Define your delimiter like this (escape the \):
reader = csv.reader(open("Report.txt"), delimiter="\\")
Code:
import csv
with open('Report.txt') as report:
    reader = csv.reader(report, delimiter="\\")
    with open('Report_output.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        for line in reader:
            writer.writerow(line)
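With the sample input shown in the question, Report_output.csv should then contain:
example 1,example 2,example 3,example 4
test 1,test 2,test 3,test 4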
First you have to split the string on the delimiter. You can achieve this with the split() method or a regex.
import csv
with open('file.txt', 'r') as in_file:
    stripped = (line.strip() for line in in_file)
    lines = (line.split("\\") for line in stripped if line)
Then, still inside the same with block (the generators are lazy, so the input file has to stay open while writerows consumes them), write it out to the csv:
    with open('report.csv', 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerows(lines)
Tweak your code accordingly; the concept is the same. Note that the double backslash is needed because the backslash is Python's escape character.
If you are just trying to convert that text into CSV, you can just replace every "\" character with ";" and you'll have a valid CSV file.
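For example, a minimal sketch of that straight replacement (using the same in.txt/out.csv names as the snippet below):
with open('in.txt') as input_file, open('out.csv', 'w') as output_file:
    for txt_line in input_file:
        # every backslash becomes a semicolon; everything else is copied through unchanged
        output_file.write(txt_line.replace('\\', ';'))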
Otherwise, if you want to do something with the parsed data before re-exporting it to CSV, you can read the file line by line, use the split() method with "\\", then rejoin and write line by line, like this:
with open('in.txt') as input_file:
    with open('out.csv', 'a') as output_file:
        txt_line = input_file.readline()
        while txt_line:
            cells = txt_line.split("\\")
            # Do something with each cell...
            csv_line = ";".join(cells)
            output_file.write(csv_line)
            txt_line = input_file.readline()
I am trying to remove duplicate lines from a text file and keep facing issues... The output file keeps putting the first two accounts on the same line. Each account should have a different line... Does anyone know why this is happening and how to fix it?
with open('accounts.txt', 'r') as f:
    unique_lines = set(f.readlines())

with open('accounts_No_Dup.txt', 'w') as f:
    f.writelines(unique_lines)
accounts.txt:
#account1
#account2
#account3
#account4
#account5
#account6
#account7
#account5
#account8
#account4
accounts_No_Dup.txt:
#account4#account3
#account4
#account8
#account5
#account7
#account1
#account2
#account6
print(unique_lines)
{'#account4', '#account7\n', '#account3\n', '#account6\n', '#account5\n', '#account8\n', '#account4\n', '#account2\n', '#account1\n'}
The last line in your file is missing a newline (technically a violation of POSIX standards for text files, but so common you have to account for it), so "#account4\n" earlier on is interpreted as unique relative to "#account4" at the end. I'd suggest unconditionally stripping newlines, and adding them back when writing:
with open('accounts.txt', 'r') as f:
    unique_lines = {line.rstrip("\r\n") for line in f}  # Remove newlines for consistent deduplication

with open('accounts_No_Dup.txt', 'w') as f:
    f.writelines(f'{line}\n' for line in unique_lines)  # Add newlines back
By the by, on modern Python (CPython/PyPy 3.6+, 3.7+ for any interpreter), you can preserve order of first appearance by using a dict rather than a set. Just change the read from the file to:
unique_lines = {line.rstrip("\r\n"): None for line in f}
and you'll see each line the first time it appears, in that order, with subsequent duplicates being ignored.
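A minimal sketch of that dict-based variant, reusing the file names from above:
with open('accounts.txt', 'r') as f:
    unique_lines = {line.rstrip("\r\n"): None for line in f}  # dict keys keep first-seen order (3.7+)

with open('accounts_No_Dup.txt', 'w') as f:
    f.writelines(f'{line}\n' for line in unique_lines)  # iterating a dict yields its keys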
Your problem is that a set changes the order of your lines, and your last element doesn't end with \n because there is no empty line at the end of your file.
Just add the separator yourself, or don't use a set.
with open('accounts.txt', 'r') as f:
    unique_lines = set()
    for line in f.readlines():
        if not line.endswith('\n'):
            line += '\n'
        unique_lines.add(line)

with open('accounts_No_Dup.txt', 'w') as f:
    f.writelines(unique_lines)
You can do it easily using pandas' unique() method.
The code is as below:
import pandas as pd
data = pd.read_csv('d:\\test.txt', sep="\n", header=None)
df = pd.DataFrame(data[0].unique())

with open('d:\\testnew.txt', 'a') as f:
    f.write(df.to_string(header=False, index=False))
Results: the test file read in has the data shown above, and the output file has the duplicate lines removed.
I have a delimited file in which some of the fields contain line termination characters. They can be LF or CR/LF.
The line terminators cause the records to split over multiple lines.
My objective is to read the file, remove the line termination characters, then write out a delimited file with quotes around the fields.
Sample input record:
444,2018-04-06,19:43:47,43762485,"Request processed"CR\LF
555,2018-04-30,19:17:56,43762485,"Added further note:LF
email customer a receipt" CR\LF
The first record is fine but the second has a LF (line feed) causing the record to fold.
import csv
with open(raw_data, 'r', newline='') as inp, open(csv_data, 'w') as out:
    csvreader = csv.reader(inp, delimiter=',', quotechar='"')
    for row in csvreader:
        print(str(row))
        out.write(str(row)[1:-1] + '\n')
My code nearly works but I don’t think it is correct.
The output I get is:
['444', '2020-04-06', '19:43:47', '344376882485', 'Request processed']
['555', '2020-04-30', '19:17:56', '344376882485', 'Added further note:\nemail customer a receipt']
I use slicing to remove the square brackets at the start and end of the line, which I don't think is the correct way.
Notice that on the second record the newline character has been converted to \n. I would like to know how to get rid of that, and also how to incorporate a csv writer into the code to place double quotes around the fields.
To remove the line terminators I tried replace, but it did not work:
(row.replace('\r', '').replace('\n', '') for row in csvreader)
I also tried to incorporate a csv writer but could not get it working with the list.
Any advice would be appreciated.
This snippet does what you want:
with open('raw_data.csv', 'r', newline='') as inp, open('csv_data.csv', 'w', newline='') as out:
    reader = csv.reader(inp, delimiter=',', quotechar='"')
    writer = csv.writer(out, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
    for row in reader:
        fixed = [cell.replace('\r', '').replace('\n', '') for cell in row]
        writer.writerow(fixed)
Quoting all cells is handled by passing csv.QUOTE_ALL as the writer's "quoting" argument.
The line
fixed = [cell.replace('\r', '').replace('\n', '') for cell in row]
creates a new list of cells where embedded '\r' and '\n' characters are replaced by the empty string, so both LF and CR/LF terminators inside fields are removed.
By default, the csv writer terminates rows with '\r\n'. If you want to override this you can pass a lineterminator argument to the writer.
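For example, to force bare LF line endings regardless of platform, something like this should work (the output file is opened with newline='' so Python doesn't translate the terminator again; the sample row is taken from the question):
import csv

with open('csv_data.csv', 'w', newline='') as out:
    writer = csv.writer(out, delimiter=',', quotechar='"',
                        quoting=csv.QUOTE_ALL, lineterminator='\n')
    writer.writerow(['444', '2018-04-06', '19:43:47', '43762485', 'Request processed'])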
To me the original csv seems fine: it's normal to have embedded newlines ("soft line breaks") inside quoted cells, and csv-aware applications such as spreadsheets will handle them correctly. However, they will look wrong in applications that don't understand csv formatting and so treat the embedded newlines as actual end-of-line characters.
I have a CSV file that I loop over, matching each line against my database to get results for those matches.
I ran into a problem when there is a space at the end of the text. I did some research and found that I need to use the rstrip function to remove trailing spaces.
Here is my code:
with open(path, encoding='utf-8') as f:
    data = csv.reader(f, delimiter='|')
    for row in data:
        line = row[0]
        cleanline = line.rstrip()
        lines.append(cleanline)
        query = line
The code is not working. I also tried strings like /s, strip, and replace, but nothing works. What can be the reason? What am I doing wrong?
CSV File with empty space at the end:
Sistem en az 23.8 inç boyutlarında olmalıdır.
1 adet HDMI port olmalıdır.
You could try the following approach:
import csv
path = 'input.csv'
lines = []
with open(path, newline='', encoding='utf-8') as f:
    data = csv.reader(f, delimiter='|', skipinitialspace=True)
    for row in data:
        lines.append([c.strip() for c in row])

print(lines)
This removes all leading and trailing spaces from each cell in a row using strip(). Depending on your data, adding the skipinitialspace=True parameter alone might be enough, though it would not remove trailing spaces before the next delimiter. newline='' should also be used in Python 3.x when opening a file for a csv.reader().
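A quick illustration of that limitation (the pipe-delimited sample line is made up for the demo):
import csv
import io

sample = io.StringIO("Sistem en az 23.8 inç boyutlarında olmalıdır. | 1 adet HDMI port olmalıdır.\n")
rows = list(csv.reader(sample, delimiter='|', skipinitialspace=True))
# skipinitialspace drops the space after '|' but not the trailing space before it,
# so the first cell still ends with a space and the per-cell strip() is still needed
print(rows)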
The file you have given just contains lines of text, so you could read it as follows:
lines = []
with open('input.csv', encoding='utf-8') as f_input:
    for line in f_input:
        lines.append(line.strip())

print(lines)
This would give you lines containing:
['Sistem en az 23.8 inç boyutlarında olmalıdır.', '1 adet HDMI port olmalıdır.']
Hello, I have this text:
1,0.00,,2.00,10,"Block. CertNot Valid.
Query with me",2013-06-20,0,0.00
These are two lines in the CSV file, but it is really one line of data. I want to remove the line break and put the data on a single line using regular expressions.
I've tried: (\")(.*)(\n)(.*)(\") , but it doesn't work.
Don't. There is no need to remove the line break.
Use the csv module to read the CSV file, it'll handle the linebreak correctly:
import csv
with open(csvfilename, 'rb') as infile:
    reader = csv.reader(infile)
    for row in reader:
        print repr(row[5])
will print:
'Block. CertNot Valid.\nQuery with me'
for that row.
This works because that column is correctly quoted.
You can check the result here: https://www.debuggex.com/r/2_X5N-wTLZ2laJKh
Console output:
>>> regex = re.compile("\"(.+?)\"",re.MULTILINE|re.DOTALL|re.VERBOSE)
>>> regex.findall(string)
[u'Block. CertNot Valid.\nQuery with me', u'test\naaa', u'bbb\nvvvv']
And the 'string' value is:
1,0.00,,2.00,10,"Block. CertNot Valid.
Query with me",2013-06-20,0,0.00
1,0.00,,2.00,10,"test
aaa",2013-06-20,0,0.00
1,0.00,,2.00,10,"bbb
vvvv",2013-06-20,0,0.00