Python reading file adds extra characters

I have a file (pass.txt) containing some passwords. When Python prints these passwords or stores them in a variable, a stray "Â" shows up in the text, and I'm trying to figure out why. I could simply strip the "Â", but I would like to know why this is happening.
Some examples:
^oqi£"HS prints out ^oqiÂ£"HS
rS£g)5Q% prints out rSÂ£g)5Q%
Code:
with open('pass.txt') as f:
    first_line = f.readline().rstrip()
print(first_line)
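
A likely explanation, assuming pass.txt was saved as UTF-8 and Python decodes it with a single-byte code page such as cp1252 (the default on many Windows setups): "£" is stored as the two bytes C2 A3, and cp1252 turns each byte into its own character, "Â" followed by "£". The sketch below reproduces the effect in memory and shows the usual remedy of passing an explicit encoding. The helper name read_first_line is made up for illustration.

```python
# The pound sign (U+00A3) takes two bytes in UTF-8: 0xC2 0xA3.
utf8_bytes = "\u00a3".encode("utf-8")

# Decoding those bytes as cp1252 yields one character per byte,
# which is where the extra U+00C2 ("A" with circumflex) comes from.
garbled = utf8_bytes.decode("cp1252")

# Decoding with the encoding the file was written in restores the text.
restored = utf8_bytes.decode("utf-8")


def read_first_line(path, encoding="utf-8"):
    """Hypothetical helper: read the first line using an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return f.readline().rstrip()
```

If the file turns out to use a different encoding, the same idea applies with that encoding's name passed to open().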

Related

Spaces are different in the same sentence/string

I have some text in an Excel spreadsheet, which is loaded into a webpage where the content is displayed. However, I have noticed that some of the content is formatted strangely, for example with a sudden line break.
I then copied the text from the spreadsheet, pasted it into Notepad++, and enabled "Show White Space and Tab". The output was this:
The second line is the one copied directly from the spreadsheet; the first is the same string after I stored it in a Python variable, printed it, and copied the result from the output console.
As you can see, the first line shows a dot for every space, while the second is missing some dots. I suspect that is what causes the problem, especially because the line breaks happen at exactly those places.
I have tried something like:
import pandas as pd
data = pd.read_excel("my_spreadsheet.xlsx")
data["Strings"] = [str(x).replace(" ", " ") for x in data["Strings"]]
data.to_excel("my_spreadsheet.xlsx", index=False)
But that didn't change anything; the result looks the same as when I copied it straight from the output console.
Is there an easy way to convert all the spaces to the same kind of space, or do I have to do something else?

I think you would need to figure out which exact character is being used there.
You can load the file and print out the characters one by one together with the character code to figure out what's what.
See the code example below. I added some code to skip alphanumeric characters to reduce the actual output somewhat...
with open("filename.txt") as infile:
    text = infile.readlines()

def print_ordinal(text: str, skip_alphanum: bool=True):
    for line in text:
        for character in line:
            if not(skip_alphanum and character.isalnum()):
                print(f"{character} - {ord(character)}")

print_ordinal(text)
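
If the character codes reveal that some of the "spaces" are not U+0020, one possible cleanup is to map every Unicode space separator to a plain space. A common culprit in spreadsheet text is the non-breaking space, U+00A0, though that is only an assumption here. A minimal sketch; normalize_spaces is a made-up helper name:

```python
import unicodedata


def normalize_spaces(text):
    """Replace every Unicode space separator (category Zs) with U+0020."""
    return "".join(
        " " if unicodedata.category(ch) == "Zs" else ch
        for ch in text
    )


# U+00A0 is a non-breaking space, U+2009 is a thin space.
sample = "some\u00a0text with\u2009odd spaces"
cleaned = normalize_spaces(sample)
```

With pandas, the same function could be applied to the column from the question, for example data["Strings"] = data["Strings"].astype(str).map(normalize_spaces). Tabs and line breaks are left untouched because they are not in category Zs.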

Write new line at the end of a file

I am working with a numpy array in python. I want to print the array and its properties to a txt output. I want the text output to end with a blank line. How can I do this?
I have tried:
# Create a text document of the output
with open("demo_numpy.txt","w") as text:
    text.write('\n'.join(map(str, [a,shape,size,itemsize,ndim,dtype])) + '\n')
And also:
# Create a text document of the output
with open("demo_numpy.txt","w") as text:
    text.write('\n'.join(map(str, [a,shape,size,itemsize,ndim,dtype])))
    text.write('\n')
However, when I open the file in GitHub desktop, I still get the indication that the last line of the file is "dtype"

When you do "\n".join( ... ) you will get a string of the following form:
abc\ndef\nghi\nhjk
-- in other words, it won't end with \n.
If your code writes another \n then your string will be of the form
abc\ndef\nghi\nhjk\n
But that does not put a blank line at the end of your file, because every line in a text file is supposed to end in \n. That is how the POSIX standard defines a line.
So you need another \n so that the last two lines of your file are
hjk\n
\n
Python will not choke if you ask it to read a text file where the final trailing \n is missing. But it also won't treat a single trailing \n in a text file as a blank line. It would not surprise me to learn that GitHub does likewise.
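
The difference between one and two trailing newlines can be checked without touching a file. The sketch below uses str.splitlines() as a stand-in for a line-oriented reader; the names values and as_lines are illustrative only.

```python
def as_lines(content):
    """Split text the way a line-oriented reader would see it."""
    return content.splitlines()


values = ["abc", "def", "ghi", "hjk"]

# A single trailing newline merely terminates the last line.
one_newline = "\n".join(values) + "\n"

# A second newline adds an empty line after "hjk".
two_newlines = "\n".join(values) + "\n\n"
```

Writing two_newlines to the file is therefore what produces a visible blank last line.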

This was solved using the Python 3.x print function, which automatically appends a newline at the end of each call.
Here is the code:
with open("demo_numpy.txt","w") as text:
    print(a, file = text)
    # no explicit close() is needed: the with block closes the file
Note: apparently print is more appropriate than .write when writing string values to a text file, as opposed to writing binary files.

Scripting a website-opener

I'm using a Python script to read a file containing a bunch of website URLs and open all of them in new tabs. However, I get this error message when opening the first website:
0:41: execution error: "https://www.pandora.com/
" doesn’t understand the “open location” message. (-1708)
My script thus far looks like this:
import os
import webbrowser

websites = []
with open("websites.txt", "r+") as my_file:
    websites.append(my_file.readline())
for x in websites:
    try:
        webbrowser.open(x)
    except:
        print (x + " does not work.")
My file consists of a bunch of URLs on their own lines.

I tried running your code and it works on my machine with Python 2.7.9.
It may be a character encoding issue when you open the file.
Here is my suggestion, with the following edits:
import webbrowser

with open("websites.txt", "r+") as sites:
    # readlines() returns a list of all the lines in your file, which keeps the code concise.
    # The name 'sites' is reused to hold the list returned by the file object.
    sites = sites.readlines()
    print sites  # send the list to the shell to make sure it contains the right information
    for url in sites:
        # encoding is here just in case, so the webbrowser module gets characters it can interpret
        webbrowser.open_new_tab( url.encode('utf-8') )
        # sometimes special characters like '\' or '/' can cause issues unless
        # we encode/decode them or make them raw strings
Hope this helps!
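
The error message quoted in the question shows the URL followed by a line break before the closing quote. That suggests (an inference from the message, not something confirmed in the thread) that the trailing newline returned by readline() is being passed along to the browser. A Python 3 sketch that strips it first; clean_urls and open_all are hypothetical helper names:

```python
import webbrowser


def clean_urls(lines):
    """Strip surrounding whitespace, including the trailing newline, and skip blank lines."""
    return [line.strip() for line in lines if line.strip()]


def open_all(path="websites.txt"):
    """Open every URL listed in the file (one per line) in a new browser tab."""
    with open(path) as f:
        urls = clean_urls(f)
    for url in urls:
        # open_new_tab returns False when no browser could be launched
        if not webbrowser.open_new_tab(url):
            print(url + " does not work.")
```

Calling open_all() with no arguments reads websites.txt from the current directory, as in the question.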

Parse log file in python

I have a log file that has lines that look like this:
"1","2546857-23541","f_last","user","4:19 P.M.","11/02/2009","START","27","27","3","c2546857-23541",""
Each line in the log has 12 double-quoted sections, and the 7th double-quoted section comes from what the user typed into the chat window:
"22","2546857-23541","f_last","john","4:38 P.M.","11/02/2009","
What's up","245","47","1","c2546857-23541",""
This string also shows the issue I'm having: there are places in the chat log where the text the user typed is on a new line in the log file instead of on the same line, as in the first example.
So basically I want the lines in the second example to look like the first example.
I've tried using Find/Replace in N++ and I am able to find each "orphaned" line but I was unable to make it join the line above it.
Then I thought of making a python file to automate it for me, but I'm kind of stuck about how to actually code it.
Python errors out at this line running unutbu's code
"1760","4746880-00129","bwhiteside","tom","11:47 A.M.","12/10/2009","I do not see ^"refresh your knowledge
^" on the screen","422","0","0","c4746871-00128",""

The csv module is smart enough to recognize when a quoted item is not finished (and thus must contain a newline character).
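import csv

with open('data.log',"r") as fin:
    with open('data2.log','w') as fout:
        reader=csv.reader(fin,delimiter=',', quotechar='"', escapechar='^')
        # with doublequote=False the writer also needs an escapechar,
        # otherwise a field containing a quote raises csv.Error
        writer=csv.writer(fout, delimiter=',', escapechar='^',
                          doublequote=False, quoting=csv.QUOTE_ALL)
        for row in reader:
            # the 7th field holds the chat text; fold embedded newlines into spaces
            row[6]=row[6].replace('\n',' ')
            writer.writerow(row)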

If your data is valid CSV, you can use Python's csv.reader class. It should work just fine with your sample data. It may not work correctly, depending on what an embedded double quote looks like in the output of the source system. See: http://docs.python.org/library/csv.html#module-contents.

Unless I'm misunderstanding the problem, you simply need to read in the file and remove any newline characters that occur between double quote characters.
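
To try the csv approach without creating files, the same idea can be run against in-memory text. This is only a sketch under the assumptions stated in the thread (12 quoted fields per record, "^" as the escape character, chat text in the 7th field); join_broken_rows is a made-up name.

```python
import csv
import io


def join_broken_rows(raw_text):
    """Rewrite log text so that each record ends up on a single line."""
    reader = csv.reader(io.StringIO(raw_text), quotechar='"', escapechar="^")
    out = io.StringIO()
    writer = csv.writer(
        out,
        quoting=csv.QUOTE_ALL,
        doublequote=False,
        escapechar="^",
        lineterminator="\n",
    )
    for row in reader:
        # Field 7 (index 6) is the chat text; fold embedded line breaks into spaces.
        row[6] = row[6].replace("\r\n", " ").replace("\n", " ")
        writer.writerow(row)
    return out.getvalue()


# The second example from the question, with the line break inside field 7.
sample = (
    '"22","2546857-23541","f_last","john","4:38 P.M.","11/02/2009","\n'
    'What\'s up","245","47","1","c2546857-23541",""\n'
)
fixed = join_broken_rows(sample)
```

The reader treats the line break as part of the quoted field, so the two physical lines come back as one row.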

Python not splitting CRLF correctly

I'm writing a script to convert very simple function documentation to XML in python. The format I'm using would convert:
date_time_of(date) Returns the time part of the indicated date-time value, setting the date part to 0.
to:
<item name="date_time_of">
<arg>(date)</arg>
<help> Returns the time part of the indicated date-time value, setting the date part to 0.</help>
</item>
So far it works great (the XML above was generated by the program), but it handles only the first line when several lines of documentation are pasted into the application. I checked the pasted documentation in Notepad++ and the lines did indeed end with CRLF, so what is my problem?
Here is my code:
mainText = input("Enter your text to convert:\r\n")
try:
    for line in mainText.split('\r\n'):
        name = line.split("(")[0]
        arg = line.split("(")[1]
        arg = arg.split(")")[0]
        hlp = line.split(")",1)[1]
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name,arg,hlp))
except:
    print("Error!")
Any idea of what the issue is here?
Thanks.

input() only reads one line.
Try this. Enter a blank line to stop collecting lines.
lines = []
while True:
    line = input('line: ')
    if line:
        lines.append(line)
    else:
        break
print(lines)

The best way to handle reading lines from standard input (the console) is to iterate over the sys.stdin object. Rewritten to do this, your code would look something like this:
from sys import stdin
try:
    for line in stdin:
        name = line.split("(")[0]
        arg = line.split("(")[1]
        arg = arg.split(")")[0]
        hlp = line.split(")",1)[1]
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name,arg,hlp))
except:
    print("Error!")
That said, it's worth noting that your parsing code could be simplified significantly with a little help from regular expressions. Here's an example:
import re, sys
for line in sys.stdin:
    # group 1: function name, group 2: argument list, group 3: help text
    result = re.match(r"(.*?)\((.*?)\)(.*)", line)
    if result:
        name = result.group(1)
        arg = result.group(2).split(",")
        hlp = result.group(3)
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name,arg,hlp))
    else:
        print("There was an error parsing this line: '%s'" % line)
I hope this helps you simplify your code.

Patrick Moriarty,
It seems to me that you didn't specifically mention the console, and that your main concern is to pass several lines at once to be processed. I could reproduce your problem in only one way: running the program in IDLE, manually copying several lines from a file, and pasting them into raw_input().
Trying to understand your problem led me to the following facts:
When data is copied from a file and pasted into raw_input(), the \r\n newlines are transformed into \n, so the string returned by raw_input() no longer contains \r\n. Hence split('\r\n') has nothing to split on in this string.
When data containing isolated \r and \n characters is pasted into a Notepad++ window with the display of special characters turned on, CR LF symbols appear at the end of every line, even where there is only a lone \r or \n. Hence using Notepad++ to check the kind of newlines leads to a wrong conclusion.
The first fact is the cause of your problem. I don't know the underlying reason for this transformation of data copied from a file and passed to raw_input(), which is why I posted a question on Stack Overflow:
Strange vanishing of CR in strings coming from a copy of a file's content passed to raw_input()
The second fact is responsible for your confusion and despair. Bad luck...
So, what can be done to solve your problem?
Here is code that reproduces the problem. Note the modified algorithm in it, which replaces the repeated splits you applied to each line.
ch = "date_time_of(date) Returns the time part.\r\n"+\
     "divmod(a, b) Returns quotient and remainder.\r\n"+\
     "enumerate(sequence[, start=0]) Returns an enumerate object.\r\n"+\
     "A\rB\nC"
with open('funcdoc.txt','wb') as f:
    f.write(ch)
print "Having just recorded the following string in a file named 'funcdoc.txt' :\n"+repr(ch)
print "open 'funcdoc.txt' to manually copy its content, and paste it on the following line"
mainText = raw_input("Enter your text to convert:\n")
print "OK, copy-paste of file 'funcdoc.txt' ' s content has been performed"
print "\nrepr(mainText)==",repr(mainText)
try:
    for line in mainText.split('\r\n'):
        name,_,arghelp = line.partition("(")
        arg,_,hlp = arghelp.partition(") ")
        print('<item name="%s">\n<arg>(%s)</arg>\n<help>%s</help>\n</item>\n' % (name,arg,hlp))
except:
    print("Error!")
Here is the solution mentioned by delnan: « read from the source instead of having a human copy and paste it. »
It works with your split('\r\n'):
ch = "date_time_of(date) Returns the time part.\r\n"+\
     "divmod(a, b) Returns quotient and remainder.\r\n"+\
     "enumerate(sequence[, start=0]) Returns an enumerate object.\r\n"+\
     "A\rB\nC"
with open('funcdoc.txt','wb') as f:
    f.write(ch)
print "Having just recorded the following string in a file named 'funcdoc.txt' :\n"+repr(ch)
#####################################
with open('funcdoc.txt','rb') as f:
    mainText = f.read()
print "\nfile 'funcdoc.txt' has just been opened and its content copied and put to mainText"
print "\nrepr(mainText)==",repr(mainText)
print
try:
    for line in mainText.split('\r\n'):
        name,_,arghelp = line.partition("(")
        arg,_,hlp = arghelp.partition(") ")
        print('<item name="%s">\n<arg>(%s)</arg>\n<help>%s</help>\n</item>\n' % (name,arg,hlp))
except:
    print("Error!")
And finally, here is Python's own solution for processing the altered human copy: the splitlines() method, which treats every kind of newline (\r, \n or \r\n) as a separator. So replace
for line in mainText.split('\r\n'):
by
for line in mainText.splitlines():
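
As a quick check of the splitlines() approach, here is a Python 3 sketch that combines it with the partition-based parsing used above. The function name to_xml_items is made up for illustration, and skipping blank lines is an added assumption.

```python
def to_xml_items(text):
    """Convert 'name(args) help text' lines into <item> snippets."""
    items = []
    # splitlines() splits on \r\n, \r and \n alike
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines carry no documentation
        name, _, arghelp = line.partition("(")
        arg, _, hlp = arghelp.partition(") ")
        items.append(
            '<item name="%s">\n<arg>(%s)</arg>\n<help>%s</help>\n</item>'
            % (name, arg, hlp)
        )
    return items


# Mixed newline styles, as in the test string used earlier.
doc = (
    "date_time_of(date) Returns the time part.\r\n"
    "divmod(a, b) Returns quotient and remainder.\n"
    "enumerate(sequence[, start=0]) Returns an enumerate object."
)
items = to_xml_items(doc)
```

Because the split no longer depends on a specific line ending, the same function works whether the text was read from a file or pasted by hand.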
