Comparing 2 variables when one is retrieved from text file - python

I am trying to write code that compares variable b with a value retrieved from a text file using linecache.getline.
The problem is that it never prints out "ITS WORKING" because the values never match, even when they should :-(
THE TEXT FILE: The text file contains only one character, and it's "a".
Here is the code:
import linecache
b="a"
a=linecache.getline("TextFile.txt",1)
if a==b:
    print("ITS WORKING")

According to the documentation, linecache.getline() includes the trailing newline character in the line it returns; that's why your comparison never matches.

You probably need to strip the trailing whitespace (including the newline) from the line that is read:
a=linecache.getline("TextFile.txt",1).strip()
Keerthana:~ kiran$ cat TextFile.txt
a
Keerthana:~ kiran$ py Desktop/test.py
a
ITS WORKING
Keerthana:~ kiran$
Hope it helps!
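To see the newline for yourself, print the repr() of what getline() returns. This is a minimal sketch in which a temporary file stands in for TextFile.txt. One assumption worth flagging: CPython's linecache appends a newline to a final line that lacks one, so even a file containing just "a" with no newline comes back as "a\n".

```python
import linecache
import os
import tempfile

# Hypothetical stand-in for TextFile.txt: one character, no trailing newline.
path = os.path.join(tempfile.mkdtemp(), "TextFile.txt")
with open(path, "w") as f:
    f.write("a")

b = "a"
a = linecache.getline(path, 1)
print(repr(a))               # the repr shows the trailing newline
print(a == b)                # False: "a\n" != "a"
print(a.rstrip("\n") == b)   # True: drops only the line ending
print(a.strip() == b)        # True: drops all surrounding whitespace
```

rstrip("\n") is the narrower choice if leading or trailing spaces in the file are meaningful; strip() is simpler when they are not.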

Related

Spaces are different in the same sentence/string

I have some content in an Excel spreadsheet, which is loaded into a webpage where the content is displayed. However, I have noticed that some of the content is oddly formatted, e.g. a sudden line break or something similar.
Then I copied the text from the spreadsheet, pasted it into Notepad++, enabled "Show White Space and Tab", and got this output:
[Screenshot: two copies of the same text in Notepad++ with white space shown]
The second line is copied directly from the spreadsheet, whereas the first is the same string after I assigned it to a variable in Python, printed it, and copied the result from the output console.
As you can see, the first line shows a dot for every space, whereas the second is missing some dots. I suspect this is what causes the trouble, especially because the line breaks happen at exactly those places.
I have tried to just do something like:
import pandas as pd
data = pd.read_excel("my_spreadsheet.xlsx")
data["Strings"] = [str(x).replace(" ", " ") for x in data["Strings"]]
data.to_excel("my_spreadsheet.xlsx", index=False)
But that didn't change anything; it was as if I had copied it straight from the output console.
So, is there an easy way to make all the spaces the same type of space, or do I have to do something else?
I think you would need to figure out which exact character is being used there.
You can load the file and print out the characters one by one together with the character code to figure out what's what.
See the code example below. I added some code to skip alphanumeric characters to reduce the actual output somewhat...
with open("filename.txt") as infile:
    text = infile.readlines()
def print_ordinal(text: list, skip_alphanum: bool=True):
    for line in text:
        for character in line:
            if not(skip_alphanum and character.isalnum()):
                # !r shows invisible characters (e.g. '\xa0') instead of blank output
                print(f"{character!r} - {ord(character)}")
print_ordinal(text)
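Once you know which characters are involved, the standard unicodedata module can help both to name them and to normalize them. The sketch below assumes the odd characters are Unicode space separators (category "Zs", which includes NO-BREAK SPACE, U+00A0); that is a guess about this particular spreadsheet, so confirm it with the diagnostic first. The function names are hypothetical.

```python
import unicodedata

def describe_spaces(text: str) -> list:
    """List position, code point and Unicode name of every whitespace character."""
    return [f"{i}: U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}"
            for i, c in enumerate(text) if c.isspace()]

def normalize_spaces(text: str) -> str:
    """Replace every Unicode space separator (category Zs) with a plain ASCII space."""
    return "".join(" " if unicodedata.category(c) == "Zs" else c for c in text)

# Sample with a NO-BREAK SPACE, an ordinary SPACE and a THIN SPACE.
sample = "foo\u00a0bar baz\u2009qux"
for entry in describe_spaces(sample):
    print(entry)
print(repr(normalize_spaces(sample)))
```

With pandas, applying it to the column might look like data["Strings"] = data["Strings"].astype(str).map(normalize_spaces) (untested against the original spreadsheet). Zero-width characters such as U+200B are in category Cf, not Zs, so they would need separate handling.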

Write new line at the end of a file

I am working with a NumPy array in Python. I want to write the array and its properties to a .txt file, and I want the text output to end with a blank line. How can I do this?
I have tried:
# Create a text document of the output
with open("demo_numpy.txt","w") as text:
    text.write('\n'.join(map(str, [a,shape,size,itemsize,ndim,dtype])) + '\n')
And also:
# Create a text document of the output
with open("demo_numpy.txt","w") as text:
    text.write('\n'.join(map(str, [a,shape,size,itemsize,ndim,dtype])))
    text.write('\n')
However, when I open the file in GitHub Desktop, it still indicates that the last line of the file is "dtype".
When you do "\n".join( ... ) you will get a string of the following form:
abc\ndef\nghi\nhjk
-- in other words, it won't end with \n.
If your code writes another \n then your string will be of the form
abc\ndef\nghi\nhjk\n
But that does not put a blank line at the end of your file, because every line in a text file is supposed to end in \n (that is what the POSIX standard says), so that final \n merely terminates the last line.
So you need another \n so that the last two lines of your file are
hjk\n
\n
Python will not choke if you ask it to read a text file where the final trailing \n is missing. But it also won't treat a single trailing \n in a text file as a blank line. It would not surprise me to learn that GitHub does likewise.
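To check this reasoning, here is a small sketch that writes a file ending in two \n characters and reads it back. The values are hypothetical stand-ins for a, shape, size, itemsize, ndim and dtype from the question, so NumPy isn't needed to run it.

```python
import os
import tempfile

# Hypothetical stand-ins for a, a.shape, a.size, a.itemsize, a.ndim, a.dtype.
values = ["[1 2 3]", (3,), 3, 8, 1, "int64"]

path = os.path.join(tempfile.mkdtemp(), "demo_numpy.txt")
with open(path, "w", newline="\n") as text:
    # '\n'.join terminates every line except the last one...
    text.write("\n".join(map(str, values)))
    # ...so the first '\n' terminates the "int64" line and the
    # second '\n' adds an empty line after it.
    text.write("\n\n")

with open(path) as f:
    content = f.read()
print(content.splitlines()[-2:])   # the last two lines: "int64" and an empty one
```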
This was solved using the Python 3 print function, which automatically appends a newline after whatever it prints.
Here is the code:
with open("demo_numpy.txt","w") as text:
    print(a, file=text)  # the with block closes the file; no text.close() needed
Note: apparently print() is more convenient than .write() for text output, since it converts its arguments to strings and adds the newline for you; .write() writes exactly what you pass it, which is what you need for binary files.
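The print() approach can also produce the blank line directly: print each value on its own line, then call print() with no arguments, which writes just "\n". A sketch using io.StringIO in place of the real file, with hypothetical stand-in values:

```python
import io

values = ["[1 2 3]", (3,), 3, 8, 1, "int64"]  # hypothetical stand-ins

buf = io.StringIO()                  # behaves like a text file opened for writing
print(*values, sep="\n", file=buf)   # one value per line, last line terminated
print(file=buf)                      # an empty print() writes "\n": the blank line
out = buf.getvalue()
print(repr(out))
```

Passing file=text instead of file=buf writes the same output to the open file.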

Python not splitting CRLF correctly

I'm writing a script in Python to convert very simple function documentation to XML. The format I'm using would convert:
date_time_of(date) Returns the time part of the indicated date-time value, setting the date part to 0.
to:
<item name="date_time_of">
<arg>(date)</arg>
<help> Returns the time part of the indicated date-time value, setting the date part to 0.</help>
</item>
So far it works great (the XML above was generated by the program), but it should work with several lines of documentation pasted in at once, and it only works for the first line. I checked the pasted documentation in Notepad++, and the lines did indeed end with CRLF, so what is my problem?
Here is my code:
mainText = input("Enter your text to convert:\r\n")
try:
    for line in mainText.split('\r\n'):
        name = line.split("(")[0]
        arg = line.split("(")[1]
        arg = arg.split(")")[0]
        hlp = line.split(")",1)[1]
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name,arg,hlp))
except:
    print("Error!")
Any idea of what the issue is here?
Thanks.
input() only reads one line.
Try this. Enter a blank line to stop collecting lines.
lines = []
while True:
    line = input('line: ')
    if line:
        lines.append(line)
    else:
        break
print(lines)
The best way to handle reading lines from standard input (the console) is to iterate over the sys.stdin object. Rewritten to do this, your code would look something like this:
from sys import stdin
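try:
    for line in stdin:
        line = line.rstrip("\n")  # lines read from stdin keep their trailing newline
        name = line.split("(")[0]
        arg = line.split("(")[1]
        arg = arg.split(")")[0]
        hlp = line.split(")",1)[1]
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name,arg,hlp))
except IndexError:  # raised when a line has no "(" or ")"
    print("Error!")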
That said, it's worth noting that your parsing code could be simplified significantly with a little help from regular expressions. Here's an example:
import re, sys
for line in sys.stdin:
    result = re.match(r"(.*?)\((.*?)\)(.*)", line)
    if result:
        name = result.group(1)
        arg = result.group(2)  # keep the argument list as written, e.g. "a, b"
        hlp = result.group(3)
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name,arg,hlp))
    else:
        print("There was an error parsing this line: '%s'" % line)
I hope this helps you simplify your code.
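If you want the parsing to be testable without typing into the console, one option is to wrap the regex from this answer in a function that takes a single line. A sketch follows; the names LINE_RE and to_xml are hypothetical, and escaping with xml.sax.saxutils (so that characters like < or & in the help text don't break the XML) is an addition, not part of the original answer.

```python
import re
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

# name(args) help text: the same three groups as the regex in the answer.
LINE_RE = re.compile(r"(.*?)\((.*?)\)(.*)")

def to_xml(line: str) -> Optional[str]:
    """Convert one documentation line to an <item> element, or None if it doesn't parse."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    name, arg, hlp = m.groups()
    # quoteattr() adds the surrounding quotes; escape() handles &, < and >.
    return ("<item name=%s>\n<arg>(%s)</arg>\n<help>%s</help>\n</item>"
            % (quoteattr(name), escape(arg), escape(hlp)))

text = ("date_time_of(date) Returns the time part.\r\n"
        "divmod(a, b) Returns quotient and remainder.\r\n")
for line in text.splitlines():
    print(to_xml(line))
```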
Patrick Moriarty,
It seems to me that you didn't specifically mention the console, and that your main concern is passing several lines at once to be processed. There's only one way I could reproduce your problem: running the program in IDLE, manually copying several lines from a file, and pasting them into raw_input().
Trying to understand your problem led me to the following facts:
When data is copied from a file and pasted into raw_input(), the \r\n newlines are transformed into \n, so the string returned by raw_input() no longer contains any \r\n. Hence split('\r\n') cannot split this string.
When data containing isolated \r and \n characters is pasted into a Notepad++ window and the display of special characters is turned on, CR LF symbols appear at the end of every line, even where there is only a lone \r or \n. Hence, using Notepad++ to check the kind of newlines leads to an erroneous conclusion.
The first fact is the cause of your problem. I don't know the underlying reason for this transformation of data copied from a file and passed to raw_input(), which is why I posted a question on Stack Overflow:
Strange vanishing of CR in strings coming from a copy of a file's content passed to raw_input()
The second fact is responsible for your confusion and despair. Bad luck!
So, what can you do to solve your problem?
Here's code that reproduces the problem. Note the modified algorithm in it, which replaces your repeated splits on each line with str.partition().
ch = "date_time_of(date) Returns the time part.\r\n"+\
"divmod(a, b) Returns quotient and remainder.\r\n"+\
"enumerate(sequence[, start=0]) Returns an enumerate object.\r\n"+\
"A\rB\nC"
with open('funcdoc.txt','wb') as f:
    f.write(ch)
print "Having just recorded the following string in a file named 'funcdoc.txt' :\n"+repr(ch)
print "open 'funcdoc.txt' to manually copy its content, and paste it on the following line"
mainText = raw_input("Enter your text to convert:\n")
print "OK, the content of file 'funcdoc.txt' has been copied and pasted"
print "\nrepr(mainText)==",repr(mainText)
try:
    for line in mainText.split('\r\n'):
        name,_,arghelp = line.partition("(")
        arg,_,hlp = arghelp.partition(") ")
        print('<item name="%s">\n<arg>(%s)</arg>\n<help>%s</help>\n</item>\n' % (name,arg,hlp))
except:
    print("Error!")
Here's the solution mentioned by delnan: "read from the source instead of having a human copy and paste it."
It works with your split('\r\n'):
ch = "date_time_of(date) Returns the time part.\r\n"+\
"divmod(a, b) Returns quotient and remainder.\r\n"+\
"enumerate(sequence[, start=0]) Returns an enumerate object.\r\n"+\
"A\rB\nC"
with open('funcdoc.txt','wb') as f:
    f.write(ch)
print "Having just recorded the following string in a file named 'funcdoc.txt' :\n"+repr(ch)
#####################################
with open('funcdoc.txt','rb') as f:
    mainText = f.read()
print "\nfile 'funcdoc.txt' has just been opened and its content copied and put to mainText"
print "\nrepr(mainText)==",repr(mainText)
print
try:
    for line in mainText.split('\r\n'):
        name,_,arghelp = line.partition("(")
        arg,_,hlp = arghelp.partition(") ")
        print('<item name="%s">\n<arg>(%s)</arg>\n<help>%s</help>\n</item>\n' % (name,arg,hlp))
except:
    print("Error!")
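For Python 3 readers, the same read-from-the-file approach might look like the sketch below. It relies on Python 3's open() semantics: in text mode, newline="" disables newline translation, so the \r\n separators survive both the write and the read. The temporary path stands in for funcdoc.txt.

```python
import os
import tempfile

ch = ("date_time_of(date) Returns the time part.\r\n"
      "divmod(a, b) Returns quotient and remainder.\r\n"
      "enumerate(sequence[, start=0]) Returns an enumerate object.\r\n"
      "A\rB\nC")

path = os.path.join(tempfile.mkdtemp(), "funcdoc.txt")
# newline="" on output: "\r\n", "\r" and "\n" are written exactly as given.
with open(path, "w", newline="") as f:
    f.write(ch)

# newline="" on input: line endings come back untranslated.
with open(path, newline="") as f:
    mainText = f.read()

items = []
for line in mainText.split("\r\n"):
    name, _, arghelp = line.partition("(")
    arg, _, hlp = arghelp.partition(") ")
    items.append('<item name="%s">\n<arg>(%s)</arg>\n<help>%s</help>\n</item>'
                 % (name, arg, hlp))
print("\n".join(items))
```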
And finally, here's Python's own way of processing the altered human copy: the splitlines() method, which treats every kind of newline (\r, \n or \r\n) as a line boundary. So replace
for line in mainText.split('\r\n'):
by
for line in mainText.splitlines():
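A quick way to see the difference is to simulate the paste. This sketch assumes the paste simply turned every \r\n into \n, as described in the list of facts; the replace() call is a stand-in for that, not what IDLE actually does internally.

```python
# CRLF-separated lines plus a lone \r and a lone \n, as in the answer's test string.
original = ("date_time_of(date) Returns the time part.\r\n"
            "divmod(a, b) Returns quotient and remainder.\r\n"
            "A\rB\nC")
# Hypothetical simulation of the copy/paste into raw_input(): CRLF becomes LF.
pasted = original.replace("\r\n", "\n")

print(len(original.split("\r\n")))   # the original splits into 3 pieces
print(len(pasted.split("\r\n")))     # no "\r\n" left, so nothing is split
print(pasted.splitlines())           # every kind of newline is a boundary
```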

Replace text in file with Python

I'm trying to replace some text in a file with a value. Everything works fine, but when I look at the file after it's completed, there is a new (blank) line after each line in the file. Is there something I can do to prevent this from happening?
Here is the code as I have it:
import fileinput
for line in fileinput.FileInput("testfile.txt",inplace=1):
    line = line.replace("newhost",host)
    print line
Thank you,
Aaron
Each line is read from the file with its ending newline, and the print adds one of its own.
You can:
print line,
Which won't add a newline after the line.
The print statement automatically adds a newline. You'd do better to use sys.stdout.write(line) instead.
print adds a new-line character:
A '\n' character is written at the end, unless the print statement ends with a comma. This is the only action if the statement contains just the keyword print.
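The answers here use Python 2's print statement. In Python 3, a minimal sketch of the same in-place edit would pass end="" to print() so that it does not add a second newline. The file contents and the host value below are hypothetical.

```python
import fileinput
import os
import tempfile

host = "db.example.com"  # hypothetical replacement value

path = os.path.join(tempfile.mkdtemp(), "testfile.txt")
with open(path, "w") as f:
    f.write("server=newhost\nport=5432\n")

# With inplace=True, anything printed inside the loop replaces the file's content.
with fileinput.input(path, inplace=True) as fi:
    for line in fi:
        # line already ends with "\n"; end="" stops print() adding another one
        print(line.replace("newhost", host), end="")

with open(path) as f:
    result = f.read()
print(repr(result))
```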

How can I detect DOS line breaks in a file?

I have a bunch of files. Some have Unix line endings, many are DOS. I'd like to test each file to see whether it is DOS-formatted before I switch the line endings.
How would I do this? Is there a flag I can test for? Something similar?
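Python can automatically detect which newline convention a file uses, thanks to the "universal newline mode" (U), and you can access Python's guess through the newlines attribute of file objects (the "U" flag is Python 2 style; it was deprecated in Python 3, where text mode uses universal newlines by default, and removed in Python 3.11):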
f = open('myfile.txt', 'U')
f.readline() # Reads a line
# The following now contains the newline ending of the first line:
# It can be "\r\n" (Windows), "\n" (Unix), "\r" (Mac OS pre-OS X).
# If no newline is found, it contains None.
print repr(f.newlines)
This gives the newline ending of the first line (Unix, DOS, etc.), if any.
As John M. pointed out, if by any chance you have a pathological file that uses more than one newline coding, f.newlines is a tuple with all the newline codings found so far, after reading many lines.
Reference: http://docs.python.org/2/library/functions.html#open
If you just want to convert a file, you can simply do:
with open('myfile.txt', 'U') as infile:
    text = infile.read()  # Automatic ("Universal read") conversion of newlines to "\n"
with open('myfile.txt', 'w') as outfile:
    outfile.write(text)  # Writes newlines for the platform running the program
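In Python 3 the "U" flag isn't needed: default text mode already uses universal newlines, and the file object's newlines attribute still records which endings were translated. A sketch, with hypothetical sample files written to a temporary directory; newline_kinds is an illustrative name.

```python
import os
import tempfile

def newline_kinds(path):
    """Return the newline style(s) seen in a file: a string, a tuple, or None."""
    # Default text mode (newline=None) uses universal newlines and records
    # which line endings it translated in the file object's .newlines attribute.
    with open(path, encoding="utf-8") as f:
        f.read()              # read everything so that every ending is seen
        return f.newlines

d = tempfile.mkdtemp()
samples = {"dos.txt": b"one\r\ntwo\r\n",
           "unix.txt": b"one\ntwo\n",
           "mixed.txt": b"one\r\ntwo\n"}
for name, data in samples.items():
    with open(os.path.join(d, name), "wb") as f:
        f.write(data)
    print(name, repr(newline_kinds(os.path.join(d, name))))
```

For a mixed file the attribute is a tuple, so comparing it with set() avoids depending on its order.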
You could search the string for \r\n; that's the DOS-style line ending.
EDIT: Take a look at this
(Python 2 only:) If you just want to read text files, either DOS or Unix-formatted, this works:
print open('myfile.txt', 'U').read()
That is, Python's "universal" file reader automatically recognizes all the different end-of-line markers and translates them to "\n".
http://docs.python.org/library/functions.html#open
(Thanks handle!)
As a complete Python newbie & just for fun, I tried to find some minimalistic way of checking this for one file. This seems to work:
with open("/path/file.txt", "rb") as f:
    if b"\r\n" in f.read():
        print("DOS line endings found")
Edit: simplified as per John Machin's comment (no need to use regular expressions).
DOS line breaks are \r\n, Unix ones only \n. So just search for \r\n.
Using grep & bash:
grep -c -m 1 $'\r$' file
echo $'\r\n\r\n' | grep -c $'\r$' # test
echo $'\r\n\r\n' | grep -c -m 1 $'\r$'
You can use the following function (which should work in Python 2 and Python 3) to get the newline representation used in an existing text file. All three possible kinds are recognized. The function reads the file only up to the first newline to decide. This is faster and uses less memory when you have large text files, but it does not detect mixed newline endings.
In Python 3, you can then pass the output of this function to the newline parameter of the open function when writing the file. This way you can alter the contents of a text file without changing its newline representation.
def get_newline(filename):
    with open(filename, "rb") as f:
        while True:
            c = f.read(1)
            if not c or c == b'\n':
                break
            if c == b'\r':
                if f.read(1) == b'\n':
                    return '\r\n'
                return '\r'
    return '\n'
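Because get_newline() stops at the first newline, it can't tell you whether a file mixes styles. If that matters, one option is to count each style in the raw bytes; the sketch below uses hypothetical helpers (count_newlines and is_mixed are not from the answer).

```python
def count_newlines(data: bytes) -> dict:
    """Count each newline style in raw bytes; a CRLF pair is counted once."""
    crlf = data.count(b"\r\n")
    return {
        "\r\n": crlf,
        "\n": data.count(b"\n") - crlf,  # LFs not preceded by CR
        "\r": data.count(b"\r") - crlf,  # CRs not followed by LF
    }

def is_mixed(counts: dict) -> bool:
    """True if more than one newline style occurs."""
    return sum(1 for n in counts.values() if n) > 1

counts = count_newlines(b"one\r\ntwo\nthree\rfour")
print(counts)
print(is_mixed(counts))
```

For a real file you would pass the bytes from open(path, "rb").read(); this reads the whole file, so it trades the speed of get_newline() for completeness.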
