python parsing string to csv format - python

I have a file containing a line with the following format:
aaa=A;bbb=B;ccc=C
I want to convert it to csv so that the values on either side of each equals sign become columns and each semicolon starts a new row. I tried something like this:
f = open("aaa.txt", "r")
with open("ccc.csv", 'w') as csvFile:
    writer = csv.writer(csvFile)
    rows = []
    if f.mode == 'r':
        single = f.readline()
        lns = single.split(";")
        for item in lns:
            rows.append(item.replace("=", ","))
    writer.writerows(rows)
f.close()
csvFile.close()
but each letter ends up in its own column, so the result looks like this:
a,a,a,",",A
b,b,b,",",B
c,c,c,",",C,"
The expected result should look like
aaa,A
bbb,B
ccc,C

The following one-line change worked for me:
rows.append(item.split('='))
instead of the existing code
rows.append(item.replace("=", ","))
That way, rows becomes a list of lists, which the writer can handle directly: the list looks like [['aaa', 'A'], ['bbb', 'B'], ['ccc', 'C']] instead of ['aaa,A', 'bbb,B', 'ccc,C'].
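To see the difference without touching the file system, here is a minimal sketch that writes both variants to an in-memory buffer. Using io.StringIO in place of the output file, and the helper name render, are my own assumptions for illustration and not part of the question.

```python
import csv
import io

line = "aaa=A;bbb=B;ccc=C"

def render(rows):
    """Write rows with csv.writer into an in-memory buffer and return the text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

# Each row is a plain string, so every character becomes its own field.
as_strings = [item.replace("=", ",") for item in line.split(";")]
# Each row is a list of two fields, which is what the writer expects.
as_lists = [item.split("=") for item in line.split(";")]

print(render(as_strings))  # first line: a,a,a,",",A
print(render(as_lists))    # first line: aaa,A
```

The first call reproduces the garbled output from the question, the second one the expected result.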

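Alternatively, skip the csv module and write the strings to the target file line by line:
with open("aaa.txt", "r") as f, open("ccc.csv", "w") as csvFile:
    # strip the trailing newline so the last row does not carry it along
    single = f.readline().strip()
    for item in single.split(";"):
        # text mode already translates "\n" to the platform line ending, so os.linesep is not needed
        csvFile.write(item.replace("=", ",") + "\n")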
The output would be:
aaa,A
bbb,B
ccc,C

It helps to execute the commands interactively and print the values, or to add debug prints to the code (to be removed or commented out once everything works). Here you would have seen that rows is ['aaa,A', 'bbb,B', 'ccc,C'], that is, three strings when it should be three sequences of fields.
Because a string is a (read-only) sequence of characters, writerows treats each character as a field.
So you do not want to replace the = with a comma (,), but to split on the equals sign:
...
for item in lns:
    rows.append(item.split("=", 1))
...
The csv module also requires the output file to be opened with newline='' to work properly.
So you should have:
with open("ccc.csv", 'w', newline='') as csvFile:
    ...
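Putting the two points together (splitting on the first = only, and opening the output with newline=''), a complete version might look like the sketch below. The function name convert, the temporary output path, and the sample value C=D are assumptions made for the example.

```python
import csv
import os
import tempfile

def convert(line, out_path):
    """Write 'k=v;k=v' pairs to out_path, one 'k,v' row per pair."""
    # maxsplit=1 keeps any further '=' inside the value
    rows = [item.split("=", 1) for item in line.strip().split(";") if item]
    # newline='' lets the csv module control the line endings itself
    with open(out_path, "w", newline="") as csv_file:
        csv.writer(csv_file).writerows(rows)
    return rows

out_path = os.path.join(tempfile.mkdtemp(), "ccc.csv")  # throwaway location
rows = convert("aaa=A;bbb=B;ccc=C=D\n", out_path)
print(rows)  # [['aaa', 'A'], ['bbb', 'B'], ['ccc', 'C=D']]
```

With the default dialect the rows in the file end in \r\n on every platform, which is what newline='' is there to preserve.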

The parameter to writer.writerows() must be an iterable of rows, which must in turn be iterables of strings or numbers. Since you pass it a list of strings, characters in the strings are treated as separate fields. You can obtain the proper list of rows by splitting the line first on ';', then on '=':
import csv
# newline='' lets the csv module control the line endings
with open('in.txt') as in_file, open('out.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    line = next(in_file).rstrip('\n')
    rows = [item.split('=') for item in line.split(';')]
    writer.writerows(rows)

Related

Writing List to CSV in Python

I'm writing data from a PDF to a CSV. The CSV needs to have one column, with each word on a separate row.
The code below writes each word on a separate row, but also puts each letter in a separate cell.
with open('annualreport.csv', 'w', encoding='utf-8') as f:
    write = csv.writer(f)
    for i in keywords:
        write.writerow(i)
I have also attempted the following, which writes all the words to one row, with each word in a separate column:
with open('annualreport.csv', 'w', encoding='utf-8') as f:
    write = csv.writer(f)
    write.writerow(keywords)
writerow expects a sequence of fields, such as a list. A bare word is treated as a sequence of its letters, so each letter is written to its own cell.
Wrapping each word in a one-element list should fix the problem:
with open('annualreport.csv', 'w', encoding='utf-8') as f:
    write = csv.writer(f)
    for i in keywords:
        write.writerow([i])  # <-- before: write.writerow(i)
Alternatively, build a list of one-element lists and write it in one go with writerows:
import csv
# data to be written row-wise to the csv file
data = [['test'], ['try'], ['goal']]
# open the csv file in 'w' mode; newline='' lets csv control the line endings
with open('output.csv', 'w', newline='') as file:
    write = csv.writer(file)
    # write all rows at once
    write.writerows(data)
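If the words already sit in a flat list, as in the question, the wrapping can be done on the fly. This is only a sketch: the sample keywords are made up, and io.StringIO stands in for annualreport.csv so the result can be inspected directly.

```python
import csv
import io

keywords = ["revenue", "growth", "net income"]  # hypothetical sample words

buf = io.StringIO()  # stand-in for the real output file
writer = csv.writer(buf, lineterminator="\n")
# Wrap each word in a one-element list so it becomes a single field on its own row.
writer.writerows([word] for word in keywords)

print(buf.getvalue())
```

Each word, including the one containing a space, ends up as one cell on its own row.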

How to format txt file in Python

I am trying to convert a txt file into a csv file in Python. The txt file currently consists of several strings separated by spaces. I would like to write each string into its own cell in the csv file.
The txt file has the following structure:
UserID Desktop Display (Version) (Server/Port handle), Date
UserID Desktop Display (Version) (Server/Port handle), Date
etc.
My approach is the following:
with open('licfile.txt', "r+") as in_file:
    stripped = (line.strip() for line in in_file)
    lines = (line.split(" ") for line in stripped if line)
with open('licfile.csv', 'w') as out_file:
    writer = csv.writer(out_file)
    writer.writerow(('user', 'desktop', 'display', 'version', 'server', 'handle', 'date'))
    writer.writerows(lines)
Unfortunately, this does not work as expected: I get ValueError: I/O operation on closed file. In addition, only the header row is written, and all of its entries show up in a single cell of the csv file.
Any tips on how to proceed? Many thanks in advance.
How about:
with open('licfile.txt', 'r') as in_file, open('licfile.csv', 'w') as out_file:
    for line in in_file:
        if line.strip():
            out_file.write(line.strip().replace(' ', ',') + '\n')
And for the German Excel enthusiasts...
...
...
...
... .replace(' ', ';') + '\n')
:)
You can also use the built-in csv module to accomplish this easily:
import csv
with open('licfile.txt', 'r') as in_file, open('licfile.csv', 'w') as out_file:
    reader = csv.reader(in_file, delimiter=" ")
    writer = csv.writer(out_file, lineterminator='\n')
    writer.writerows(reader)
I pass lineterminator='\n' because the writer's default is '\r\n'. On Windows, a file opened in text mode without newline='' turns that into '\r\r\n', which shows up as an extra blank line after every row.
There are also a few arguments you could use if say quoting is needed or a different delimiter is desired: https://docs.python.org/3/library/csv.html#csv-fmt-params
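As a rough illustration of two of those formatting parameters, the sketch below writes the same rows once with a semicolon delimiter and once with every field quoted. The sample rows and the helper name dump are invented for the example; delimiter, quoting and lineterminator are the documented csv parameters.

```python
import csv
import io

rows = [["user1", "desk 7", "1.2"], ["user2", "desk;9", "1.3"]]  # made-up sample data

def dump(rows, **fmt):
    """Return the csv text produced for rows with the given format parameters."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmt).writerows(rows)
    return buf.getvalue()

semicolons = dump(rows, delimiter=";")      # a field containing ';' gets quoted
quoted = dump(rows, quoting=csv.QUOTE_ALL)  # every field is quoted

print(semicolons)
print(quoted)
```

Note that the writer only quotes a field when it has to, unless told otherwise through quoting.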
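Your comprehensions use round brackets, which create generator expressions (not tuples). Generators are lazy: nothing is read from in_file until writerows consumes them, and by then the first with block has already closed the file, hence the ValueError. Use square brackets instead, which build the lists immediately while the file is still open. See the example below: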
stripped = [line.strip() for line in in_file]
lines = [line.split(" ") for line in stripped if line]
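The effect can be reproduced in isolation. The following sketch creates a small throwaway file (its path and contents are assumptions for the demo), builds one lazy generator and one eager list inside a with block, and then consumes both after the file has been closed.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "licfile.txt")  # throwaway sample file
with open(path, "w") as f:
    f.write("u1 desk1 :0\n\nu2 desk2 :1\n")

with open(path) as in_file:
    # Round brackets: a generator expression, nothing is read from the file yet.
    lazy = (line.strip().split(" ") for line in in_file if line.strip())
with open(path) as in_file:
    # Square brackets: the list is built right away, while the file is open.
    eager = [line.strip().split(" ") for line in in_file if line.strip()]

try:
    list(lazy)  # the file behind the generator is already closed here
    error = None
except ValueError as exc:
    error = str(exc)

print(eager)
print(error)
```

The list version holds all rows in memory, which is fine for small files; for large ones, keeping both files open in a single with statement avoids that.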
import pandas as pd
licfile_df = pd.read_csv('licfile.txt',sep=",", header=None)

Python - save csv file with tab separated words in separate cell

I have this input file:
one\tone
two\ttwo
three\tthree
With a tab between each word.
I am trying to save it in a csv file where each word ends up in its own cell. This is my code:
import csv
input = open('input.txt').read()
lines = input.split('\n')
with open('output.csv', 'w') as f:
    writer = csv.writer(f)
    for line in lines:
        writer.writerow([line])
However, both words end up in the same cell.
How do I change the code so that each word ends up in its own cell?
Try this:
import csv
input = open('input.txt').read()
lines = input.split('\n')
with open('output.csv', 'w') as f:
    writer = csv.writer(f)
    for line in lines:
        writer.writerow(line.split('\t'))
The writerow method of the csv writer takes a list of columns.
Currently, you are providing your whole string as the value of the first column:
writer.writerow([line])
Instead, split the string on \t, which creates a list of the individual words, and pass that list to the writer:
writer.writerow(line.split("\t"))
You need to split the input lines into a list, so that csv.writer() will put them into separate columns. Try:
with open('output.csv', 'w') as f:
    writer = csv.writer(f)
    for line in lines:
        writer.writerow(line.split('\t'))
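Another option, sketched here under the assumption that the input really is tab-delimited, is to let csv.reader do the splitting and hand its rows straight to the writer. The two io.StringIO objects stand in for input.txt and output.csv.

```python
import csv
import io

source = io.StringIO("one\tone\ntwo\ttwo\nthree\tthree\n")  # stands in for input.txt
target = io.StringIO()                                      # stands in for output.csv

# The reader splits each line on tabs; the writer joins the fields with commas.
reader = csv.reader(source, delimiter="\t")
csv.writer(target, lineterminator="\n").writerows(reader)

print(target.getvalue())
```

This also avoids the empty trailing row that input.split('\n') yields when the file ends with a newline.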

Python csv write a list to file

I am writing a script that writes a list of tab-separated strings, shown below, to a csv file, but I am not getting the proper output.
out_l = ['host\tuptime\tnfsserver\tnfs status\n', 'node1\t2\tnfs_host\tok\n', 'node2\t100\tnfs_host\tna\n', 'node3\t59\tnfs_host\tok\n']
code:
out_f = open('test.csv', 'w')
w = csv.writer(out_f)
for l in out_l:
    w.writerow(l)
out_f.close()
The output csv file reads as below.
h,o,s,t, ,s,s,h, , , , , ,s,u,d,o,_,h,o,s,t, , , , , , , ,n,f,s,"
"1,9,2,.,1,6,8,.,1,2,2,.,2,0,1, ,o,k, ,n,f,s,h,o,s,t, ,o,k,"
"1,9,2,.,1,6,8,.,1,2,2,.,2,0,2, ,f,a,i,l,e,d, ,n,a, ,n,a,"
"1,9,2,.,1,6,8,.,1,2,2,.,2,0,3, ,o,k, ,n,f,s,h,o,s,t, ,s,h,o,w,m,o,u,n,t, ,f,a,i,l,e,d,"
"
I have also tried csv.writer options such as delimiter and dialect=excel, but no luck.
Can someone help me format the output?
With the formatting you have in out_l, you can just write it to a file:
out_l = ['host\tuptime\tnfsserver\tnfs status\n', 'node1\t2\tnfs_host\tok\n', 'node2\t100\tnfs_host\tna\n', 'node3\t59\tnfs_host\tok\n']
with open('test.csv', 'w') as out_f:
    for l in out_l:
        out_f.write(l)
To use csv properly, out_l should be a list of rows, each row a list of columns, and the csv module should do the formatting with tabs and newlines:
import csv
out_l = [['host','uptime','nfsserver','nfs status'],
         ['node1','2','nfs_host','ok'],
         ['node2','100','nfs_host','na'],
         ['node3','59','nfs_host','ok']]
#with open('test.csv', 'wb') as out_f: # Python 2
with open('test.csv', 'w', newline='') as out_f: # Python 3
    w = csv.writer(out_f, delimiter='\t') # override for tab delimiter
    w.writerows(out_l) # writerows (plural) doesn't need for loop
Note that with will automatically close the file.
See the csv documentation for the correct way to open a file for use with csv.reader or csv.writer.
The writerow method of a csv writer object takes an iterable and writes the values that iterable produces as fields, separated by the specified delimiter:
out_f = open('test.csv', 'w', newline='')
w = csv.writer(out_f, delimiter='\t') # set tab as delimiter
for l in out_l: # l is string (iterable of chars!)
    w.writerow(l.rstrip('\n').split('\t')) # drop the newline, then split to get the correct tokens
out_f.close()
As the strings in your list already contain the necessary tabs, you could just write them directly to the file, no csv tools needed. If you have built/joined the strings in out_l manually, you can omit that step and just pass the original data structure to writerow.
The delimiter parameter
The delimiter parameter controls the delimiter in the output. It has nothing to do with the input out_l.
Why your output is garbled
The writer's writerow method iterates over its input. In your case you are giving it a string ('host\tuptime\tnfsserver\tnfs status\n', etc.), so it iterates over the string, which yields a sequence of single characters.
How to produce the correct output
Give it a list of fields instead of the full string by using str.split(). In your case the string ends with \n, so use str.strip() as well:
import csv
out_l = ['host\tuptime\tnfsserver\tnfs status\n',
         'node1\t2\tnfs_host\tok\n',
         'node2\t100\tnfs_host\tna\n',
         'node3\t59\tnfs_host\tok\n']
out_f = open('test.csv', 'w')
w = csv.writer(out_f)
for l in out_l:
    w.writerow(l.strip().split('\t'))
out_f.close()
This should be what you want:
host,uptime,nfsserver,nfs status
node1,2,nfs_host,ok
node2,100,nfs_host,na
node3,59,nfs_host,ok
Reference: https://docs.python.org/3/library/csv.html
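Very simple:
with open("test.csv", 'w', newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter='\t')
    for item in out_l:
        # split each line into its fields; passing [item] would write the whole line as one quoted field
        writer.writerow(item.rstrip('\n').split('\t'))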

Remove double quotes from iterator when using csv writer

I want to create a csv from an existing csv, by splitting its rows.
Input csv:
A,R,T,11,12,13,14,15,21,22,23,24,25
Output csv:
A,R,T,11,12,13,14,15
A,R,T,21,22,23,24,25
So far my code looks like:
def update_csv(name):
    #load csv file
    file_ = open(name, 'rb')
    #init first values
    current_a = ""
    current_r = ""
    current_first_time = ""
    file_content = csv.reader(file_)
    #LOOP
    for row in file_content:
        current_a = row[0]
        current_r = row[1]
        current_first_time = row[2]
        i = 2
        #Write row to new csv
        with open("updated_"+name, 'wb') as f:
            writer = csv.writer(f)
            writer.writerow((current_a,
                             current_r,
                             current_first_time,
                             ",".join((row[x] for x in range(i+1,i+5)))
                             ))
        #do only one row, for debug purposes
        return
But the row contains double quotes that I can't get rid of:
A002,R051,02-00-00,"05-21-11,00:00:00,REGULAR,003169391"
I've tried to use writer = csv.writer(f,quoting=csv.QUOTE_NONE) and got a _csv.Error: need to escape, but no escapechar set.
What is the correct approach to delete those quotes?
I think you could simplify the logic to split each row into two using something along these lines:
def update_csv(name):
    # 'rb'/'wb' is for Python 2; on Python 3 open both files in text mode with newline=''
    with open(name, 'rb') as file_:
        with open("updated_"+name, 'wb') as f:
            writer = csv.writer(f)
            # read one row from input csv
            for row in csv.reader(file_):
                # write 2 rows to new csv
                writer.writerow(row[:8])
                writer.writerow(row[:3] + row[8:])
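A variant of the same idea that does not hard-code the position 8 could look like the sketch below. It assumes three leading key columns followed by an even number of values, uses in-memory buffers in place of the two files, and is written for Python 3.

```python
import csv
import io

source = io.StringIO("A,R,T,11,12,13,14,15,21,22,23,24,25\n")  # stands in for the input csv
target = io.StringIO()                                         # stands in for the output csv
writer = csv.writer(target, lineterminator="\n")

for row in csv.reader(source):
    head, values = row[:3], row[3:]  # assumed layout: 3 key columns, then the values
    half = len(values) // 2
    # Repeat the key columns in front of each half of the values.
    writer.writerow(head + values[:half])
    writer.writerow(head + values[half:])

print(target.getvalue())
```

Because every value is passed as a separate list item, the writer has no reason to add quotes.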
writer.writerow expects an iterable and writes each of its items as one field, separated by the appropriate delimiter. So:
writer.writerow([1, 2, 3])
would write "1,2,3\r\n" to the file (the default line terminator is '\r\n').
Your call provides it with an iterable, one of whose items is a string that already contains the delimiter. The writer therefore needs some way either to escape the delimiter or to quote that item. For example,
writer.writerow([1, '2,3'])
doesn't just give "1,2,3\r\n" but '1,"2,3"\r\n': the string counts as one item in the output.
Therefore, if you do not want quotes in the output, you need to pass quoting=csv.QUOTE_NONE together with an escape character (e.g. escapechar='/') to mark the delimiters that shouldn't be counted as such (giving something like "1,2/,3\r\n").
However, I think what you actually want is to include all of those elements as separate items. Don't ",".join(...) them yourself. Instead, try (the * unpacking inside a tuple needs Python 3.5 or later):
writer.writerow((current_a, current_r,
                 current_first_time, *row[i+1:i+5]))
to provide the relevant items from row as separate items in the tuple.
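The three behaviours discussed here (automatic quoting, the QUOTE_NONE error, and escaping) can be compared side by side. This is a sketch only: the row is taken from the output shown in the question, the helper one_row is hypothetical, and '/' is just the example escape character mentioned above.

```python
import csv
import io

row = ["A002", "R051", "02-00-00", "05-21-11", "00:00:00", "REGULAR", "003169391"]
pre_joined = row[:3] + [",".join(row[3:])]  # last field already contains commas

def one_row(fields, **fmt):
    """Return the csv text the writer produces for a single row."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmt).writerow(fields)
    return buf.getvalue()

joined = one_row(pre_joined)  # the pre-joined field gets quoted
separate = one_row(row)       # one item per field, no quotes needed
escaped = one_row(pre_joined, quoting=csv.QUOTE_NONE, escapechar="/")

try:
    one_row(pre_joined, quoting=csv.QUOTE_NONE)  # no escapechar available
    error = None
except csv.Error as exc:
    error = str(exc)

print(joined, separate, escaped, error, sep="")
```

Only the variant with separate items gives a file that other csv readers will parse back into seven columns without extra configuration.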
