CSV new-line character and quote inside quoted field - python

I have tried many options in csv.reader, but none of them works. I am new to Python and have tried almost every parameter. A single messy message inside my CSV file looks like this:
"Hey Hi
how are you all,I stuck into this problem,i have tried with such parameter but exceeding the existing number of records,in short file is not getting read properly.
\"I have tried with
datareader=csv.reader(csvfile,quotechar='"',lineterminator='\n\n\n\r\r',quoting=csv.QUOTE_ALL)
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode? \"......... hence the problem continue.
"
As expected, because of the \" and \n inside the message I get more records than actually exist, i.e. the records get broken apart. I have tried different line terminators, as you can see in the message, without success. This is my code right now:
import csv

with open("D:/Python/mssg5.csv", "r") as csvfile:
    datareader = csv.reader(csvfile, quotechar='"', lineterminator='\n', quoting=csv.QUOTE_ALL)
    count = 0
    # csv_out = open('D:/Python/mycsv.csv', 'wb')
    # mywriter = csv.writer(csv_out)
    for row in datareader:
        count = count + 1
    print "COUNT is :%d" % count
Any kind of help is appreciated, thanks.

A couple of things to try in the CSV file:
Put the messy string in triple quotes: """ the string """
At the end of each line within your messy field, use the continuation character \
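If the file cannot be edited, another option may be to tell csv.reader that embedded quotes are escaped with a backslash. The following is a minimal Python 3 sketch, assuming the messages use \" for quotes inside a quoted field, as the sample above suggests. The sample text and the helper name read_rows are hypothetical. The lineterminator argument is left out because the reader ignores it.

```python
import csv
import io

# Hypothetical sample: record 1 has a quoted field containing commas,
# raw newlines, and backslash-escaped quotes.
sample = (
    'id,message\r\n'
    '1,"Hey Hi\nhow are you all,\\"I have tried with\\" this\n"\r\n'
    '2,"ok"\r\n'
)

def read_rows(text):
    """Parse CSV text whose quoted fields escape quotes as \\"."""
    # newline='' leaves line endings untouched so the csv module can
    # decide which newlines end a record and which belong to a field.
    source = io.StringIO(text, newline='')
    reader = csv.reader(source, quotechar='"', escapechar='\\',
                        doublequote=False)
    return list(reader)

rows = read_rows(sample)
print("COUNT is: %d" % len(rows))
```

When reading from disk in Python 3, the equivalent would be to open the file with newline='' and pass the file object to csv.reader with the same arguments.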

Related

Error when getting the tab delimited data from the text file to 2 variables in python

I have a text file which has tab-delimited data with 2 language translations, as follows:
to the regimes thanthrayanta
according to the anuwa
great maha
situation thathwaya
parabraman parabrahman
two of the two dwithwayan
on a matha
depends randa
exist pawathee
he ohu
I am trying to get those data as follows,
# Read the file and split into lines
lines = open('old data/eng-sin.txt' % (lang1, lang2), encoding='utf-8').\
read().strip().split('\n')
But when I run the code, I get this error:
TypeError: not all arguments converted during string formatting
When I searched for the error, I found an answer saying that the % operator is deprecated and the new way is to use .format, but that still doesn't solve the issue. Please help me fix it.
The TypeError occurs because the path string contains no % placeholders, yet two values are passed to the % operator. Even with placeholders, this approach wouldn't separate the columns: a translation can contain spaces, so splitting on whitespace would give lang1 = 'to' and lang2 = 'the' for the first line.
You could do something like this:
with open('old data/eng-sin.txt', encoding='utf-8') as f:
    # Strip the trailing newline before splitting each line on the tab.
    lines = [line.rstrip('\n').split('\t') for line in f]
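To get the two columns into two variables, the split result can be unpacked per line. This is a small sketch, assuming each valid line holds exactly one tab between the English phrase and its translation. The function name parse_pairs and the sample string are made up for illustration.

```python
def parse_pairs(text):
    """Split tab-delimited text into (lang1, lang2) tuples."""
    pairs = []
    for line in text.splitlines():
        parts = line.split('\t')
        # Skip blank or malformed lines instead of failing on them.
        if len(parts) == 2:
            pairs.append((parts[0].strip(), parts[1].strip()))
    return pairs

# Hypothetical sample using a few rows from the question.
sample = "to the regimes\tthanthrayanta\naccording to the\tanuwa\ngreat\tmaha\n"
pairs = parse_pairs(sample)

for lang1, lang2 in pairs:
    print(lang1, '->', lang2)
```

Splitting on '\t' rather than on any whitespace keeps multi-word phrases such as "to the regimes" in one piece.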

Python inconsistent use of tabs writing to csv file

I'm getting an error writing a 'title' line to a csv file:
File ".\aws_ec2_list_instances.py", line 58
title_writer.writerow("AWS Master Instance List " + today)
^
TabError: inconsistent use of tabs and spaces in indentation
I have a variable called today that I want to use:
today = datetime.today()
today = today.strftime("%m-%d-%Y")
This is the line causing the error:
title_writer = csv.writer(output_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
title_writer.writerow("AWS Master Instance List " + today)
I want the date as represented by the today variable listed next to the title.
How can I do this correctly?
You can fix this with a find-and-replace operation on your code:
Find: tab '\t'
Replace with: four spaces '    '
Mixing tabs and spaces makes Python unhappy. Pick one and stick with it; I suggest spaces.
In fact, depending on the editor you use to write your code, this can be done automatically when you press Tab. In Notepad++ the option is under Settings > Preferences > Language > Replace by space.
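Separately from the indentation error, csv.writer.writerow expects a sequence of fields. Passing a single string makes the writer treat every character as its own column. Below is a hedged Python 3 sketch of writing the title and the date as two cells. The helper name write_title and the fixed date are assumptions for the example, and an in-memory buffer stands in for output_file.

```python
import csv
import io
from datetime import datetime

def write_title(output_file, today):
    """Write the title row with the date in the cell next to it."""
    title_writer = csv.writer(output_file, delimiter=',', quotechar='"',
                              quoting=csv.QUOTE_MINIMAL)
    # A list gives one cell per element; a bare string would be split
    # into one cell per character.
    title_writer.writerow(["AWS Master Instance List", today])

# Fixed, hypothetical date so the result is reproducible.
today = datetime(2020, 1, 31).strftime("%m-%d-%Y")
buffer = io.StringIO(newline='')
write_title(buffer, today)
title_line = buffer.getvalue()
print(repr(title_line))
```

If the date should share a cell with the title, a one-element list such as ["AWS Master Instance List " + today] would do that instead.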

Unable to Format Output Text File to Desired Form using "write" function Python

I am unable to format the output of my text file as I want it. I have fooled around with this for almost an hour, to no avail, and it's driving me mad. I want the first four floats to be on one line, and the next 10 values to be delimited by new lines.
if not (debug_flag>0):
    text_file = open("Markov.txt", "w")
    text_file.write("%.2f,%.2f,%.2f,%.2f" % (prob_not_to_not,prob_not_to_occured, prob_occured_to_not, prob_occured_to_occured))
    for x in xrange(0,10):
        text_file.write("\n%d" % markov_sampler(final_probability))
    text_file.close()
Does anyone know what the issue is? The output I'm getting is all on 1 line.
You have to put the line feed at the end of the first line for it to work.
Also, your text editor may be configured to expect \r\n line endings (if you are using Notepad), in which case you would see everything on the same line.
The code with the desired output may look something like this:
if not (debug_flag>0):
    text_file = open("Markov.txt", "w")
    text_file.write("%.2f,%.2f,%.2f,%.2f\n" % (prob_not_to_not,prob_not_to_occured, prob_occured_to_not, prob_occured_to_occured))
    for x in xrange(0,10):
        text_file.write("%d\n" % markov_sampler(final_probability))
    text_file.close()
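The same layout can also be built as one string and written in a single call, which makes the line breaks easy to inspect. This is a Python 3 sketch under stated assumptions: the probabilities and samples are placeholder values, and format_markov_output is a hypothetical helper, not part of the original code.

```python
import os
import tempfile

def format_markov_output(probabilities, samples):
    """Return the probabilities on one line, then one sample per line."""
    lines = [",".join("%.2f" % p for p in probabilities)]
    lines.extend("%d" % s for s in samples)
    return "\n".join(lines) + "\n"

# Placeholder values standing in for the four probabilities and the
# ten results of markov_sampler(final_probability).
text = format_markov_output([0.25, 0.75, 0.5, 0.5], range(10))

# The with statement closes the file even if writing fails.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "Markov.txt")
    with open(path, "w") as text_file:
        text_file.write(text)
    with open(path) as text_file:
        written = text_file.read()

print(written)
```

In text mode Python translates "\n" to the platform's line ending on write, so on Windows the file would contain \r\n.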

Parse log file in python

I have a log file that has lines that look like this:
"1","2546857-23541","f_last","user","4:19 P.M.","11/02/2009","START","27","27","3","c2546857-23541",""
Each line in the log has 12 double-quoted sections, and the 7th one comes from where the user typed something into the chat window:
"22","2546857-23541","f_last","john","4:38 P.M.","11/02/2009","
What's up","245","47","1","c2546857-23541",""
This string also shows the issue I'm having: there are areas in the chat log where the text the user typed starts on a new line in the log file instead of staying on the same line, as in the first example.
So basically I want the lines in the second example to look like the first example.
I've tried using Find/Replace in N++ and I am able to find each "orphaned" line but I was unable to make it join the line above it.
Then I thought of making a python file to automate it for me, but I'm kind of stuck about how to actually code it.
Python errors out at this line when running unutbu's code:
"1760","4746880-00129","bwhiteside","tom","11:47 A.M.","12/10/2009","I do not see ^"refresh your knowledge
^" on the screen","422","0","0","c4746871-00128",""
The csv module is smart enough to recognize when a quoted item is not finished (and thus must contain a newline character).
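import csv

with open('data.log', "r") as fin:
    with open('data2.log', 'w') as fout:
        reader = csv.reader(fin, delimiter=',', quotechar='"', escapechar='^')
        # With doublequote=False the writer also needs an escapechar,
        # otherwise a field containing a quote cannot be written.
        writer = csv.writer(fout, delimiter=',', escapechar='^',
                            doublequote=False, quoting=csv.QUOTE_ALL)
        for row in reader:
            # Field 7 (index 6) holds the chat text; fold newlines into spaces.
            row[6] = row[6].replace('\n', ' ')
            writer.writerow(row)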
If your data is valid CSV, you can use Python's csv.reader class. It should work just fine with your sample data. It may not work correctly, depending on what an embedded double quote looks like in the source system. See: http://docs.python.org/library/csv.html#module-contents.
Unless I'm misunderstanding the problem, you simply need to read in the file and remove any newline characters that occur between double-quote characters.
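The idea of removing newlines that fall between double quotes can be tried on a small in-memory sample. This is a minimal Python 3 sketch. The function name join_orphaned_lines is hypothetical, and the cleanup is applied to every field rather than only the 7th, which is an assumption made for simplicity.

```python
import csv
import io

# Two log lines from the question; the second has a newline inside field 7.
sample = (
    '"1","2546857-23541","f_last","user","4:19 P.M.","11/02/2009",'
    '"START","27","27","3","c2546857-23541",""\n'
    '"22","2546857-23541","f_last","john","4:38 P.M.","11/02/2009",'
    '"\nWhat\'s up","245","47","1","c2546857-23541",""\n'
)

def join_orphaned_lines(text):
    """Rewrite CSV text so that no field contains a newline."""
    reader = csv.reader(io.StringIO(text, newline=''))
    out = io.StringIO(newline='')
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator='\n')
    for row in reader:
        # Replace embedded newlines with spaces, then trim the edges.
        writer.writerow([field.replace('\n', ' ').strip() for field in row])
    return out.getvalue()

cleaned = join_orphaned_lines(sample)
print(cleaned)
```

Because the csv module tracks the open quote, the two physical lines of the second record are read as one row before the newline is replaced.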

Python not splitting CRLF correctly

I'm writing a script to convert very simple function documentation to XML in python. The format I'm using would convert:
date_time_of(date) Returns the time part of the indicated date-time value, setting the date part to 0.
to:
<item name="date_time_of">
<arg>(date)</arg>
<help> Returns the time part of the indicated date-time value, setting the date part to 0.</help>
</item>
So far it works great (the XML I posted above was generated from the program) but the problem is that it should be working with several lines of documentation pasted, but it only works for the first line pasted into the application. I checked the pasted documentation in Notepad++ and the lines did indeed have CRLF at the end, so what is my problem?
Here is my code:
mainText = input("Enter your text to convert:\r\n")
try:
    for line in mainText.split('\r\n'):
        name = line.split("(")[0]
        arg = line.split("(")[1]
        arg = arg.split(")")[0]
        hlp = line.split(")",1)[1]
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name,arg,hlp))
except:
    print("Error!")
Any idea of what the issue is here?
Thanks.
input() only reads one line.
Try this. Enter a blank line to stop collecting lines.
lines = []
while True:
    line = input('line: ')
    if line:
        lines.append(line)
    else:
        break
print(lines)
The best way to handle reading lines from standard input (the console) is to iterate over the sys.stdin object. Rewritten to do this, your code would look something like this:
from sys import stdin

try:
    for line in stdin:
        name = line.split("(")[0]
        arg = line.split("(")[1]
        arg = arg.split(")")[0]
        hlp = line.split(")",1)[1]
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name,arg,hlp))
except:
    print("Error!")
That said, it's worth noting that your parsing code could be significantly simplified with a little help from regular expressions. Here's an example:
import re, sys

for line in sys.stdin:
    result = re.match(r"(.*?)\((.*?)\)(.*)", line)
    if result:
        name = result.group(1)
        # Kept as a string so that %s prints the arguments unchanged.
        arg = result.group(2)
        hlp = result.group(3)
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name,arg,hlp))
    else:
        print("There was an error parsing this line: '%s'" % line)
I hope this helps you simplify your code.
Patrick Moriarty,
It seems to me that you didn't specifically mention the console, and that your main concern is to pass several lines at once to be processed. I could reproduce your problem in only one way: running the program in IDLE, manually copying several lines from a file, and pasting them into raw_input().
Trying to understand your problem led me to the following facts:
When data is copied from a file and pasted into raw_input(), the \r\n newlines are transformed into \n, so the string returned by raw_input() no longer contains \r\n. Hence split('\r\n') cannot split this string.
When data containing isolated \r and \n characters is pasted into a Notepad++ window with the display of special characters activated, CR LF symbols appear at the end of every line, even where only a lone \r or \n is present. Hence using Notepad++ to verify the nature of the newlines leads to an erroneous conclusion.
The first fact is the cause of your problem. I don't know the underlying reason for this transformation of data copied from a file and passed to raw_input(), which is why I posted a question on Stack Overflow:
Strange vanishing of CR in strings coming from a copy of a file's content passed to raw_input()
The second fact is responsible for your confusion and despair. Bad luck...
So, what can be done to solve your problem?
Here is code that reproduces the problem. Note the modified algorithm in it, which replaces the repeated splits you applied to each line.
ch = "date_time_of(date) Returns the time part.\r\n"+\
     "divmod(a, b) Returns quotient and remainder.\r\n"+\
     "enumerate(sequence[, start=0]) Returns an enumerate object.\r\n"+\
     "A\rB\nC"
with open('funcdoc.txt','wb') as f:
    f.write(ch)
print "Having just recorded the following string in a file named 'funcdoc.txt' :\n"+repr(ch)
print "open 'funcdoc.txt' to manually copy its content, and paste it on the following line"
mainText = raw_input("Enter your text to convert:\n")
print "OK, copy-paste of file 'funcdoc.txt' ' s content has been performed"
print "\nrepr(mainText)==",repr(mainText)
try:
    for line in mainText.split('\r\n'):
        name,_,arghelp = line.partition("(")
        arg,_,hlp = arghelp.partition(") ")
        print('<item name="%s">\n<arg>(%s)</arg>\n<help>%s</help>\n</item>\n' % (name,arg,hlp))
except:
    print("Error!")
Here is the solution mentioned by delnan: « read from the source instead of having a human copy and paste it. »
It works with your split('\r\n'):
ch = "date_time_of(date) Returns the time part.\r\n"+\
     "divmod(a, b) Returns quotient and remainder.\r\n"+\
     "enumerate(sequence[, start=0]) Returns an enumerate object.\r\n"+\
     "A\rB\nC"
with open('funcdoc.txt','wb') as f:
    f.write(ch)
print "Having just recorded the following string in a file named 'funcdoc.txt' :\n"+repr(ch)
#####################################
with open('funcdoc.txt','rb') as f:
    mainText = f.read()
print "\nfile 'funcdoc.txt' has just been opened and its content copied and put to mainText"
print "\nrepr(mainText)==",repr(mainText)
print
try:
    for line in mainText.split('\r\n'):
        name,_,arghelp = line.partition("(")
        arg,_,hlp = arghelp.partition(") ")
        print('<item name="%s">\n<arg>(%s)</arg>\n<help>%s</help>\n</item>\n' % (name,arg,hlp))
except:
    print("Error!")
And finally, here is Python's own solution for processing the altered human copy: the splitlines() method, which treats all kinds of newlines (\r, \n, or \r\n) as separators. So replace
for line in mainText.split('\r\n'):
by
for line in mainText.splitlines():
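The effect of splitlines() can be checked on a string that mixes all three newline styles. This is a small Python 3 sketch. The function name to_xml_items and the sample text are illustrative assumptions, and lines without an opening parenthesis are skipped, which the original code did not do.

```python
def to_xml_items(text):
    """Convert 'name(args) help' lines into XML item snippets."""
    items = []
    for line in text.splitlines():
        name, paren, rest = line.partition("(")
        if not paren:
            continue  # no argument list on this line, so skip it
        arg, _, hlp = rest.partition(") ")
        items.append(
            '<item name="%s">\n<arg>(%s)</arg>\n<help>%s</help>\n</item>'
            % (name, arg, hlp)
        )
    return items

# Hypothetical input mixing \r\n, \n and \r as line separators.
doc = (
    "date_time_of(date) Returns the time part.\r\n"
    "divmod(a, b) Returns quotient and remainder.\n"
    "enumerate(sequence[, start=0]) Returns an enumerate object.\r"
    "A"
)
items = to_xml_items(doc)
print("\n".join(items))
```

Because splitlines() does not depend on one particular separator, the same code handles text read from a file in binary mode and text pasted into a prompt.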
