Spaces are different in the same sentence/string - python

I have some data in an Excel spreadsheet that is loaded into a webpage, where the content is displayed. However, I have noticed that some of the content has weird formatting, e.g. a sudden line break.
I then copied the text from the spreadsheet, pasted it into Notepad++, and enabled "Show White Space and Tab". The output was this:
The second line is the one copied directly from the spreadsheet; the first is the same string after I assigned it to a variable in Python, printed it, and copied the result from the output console.
As you can see, the first line has a dot for every space, whereas the second is missing some dots. I suspect this is what causes the problem, especially because the line breaks happen at exactly those places.
I have tried something like:
import pandas as pd
data = pd.read_excel("my_spreadsheet.xlsx")
data["Strings"] = [str(x).replace(" ", " ") for x in data["Strings"]]
data.to_excel("my_spreadsheet.xlsx", index=False)
But that didn't change anything; the result looked the same as when I copied it straight from the output console.
So, is there an easy way to turn all the spaces into the same type of space, or do I have to do something else?

I think you would need to figure out which exact character is being used there.
You can load the file and print out the characters one by one together with the character code to figure out what's what.
See the code example below. I added some code to skip alphanumeric characters to reduce the actual output somewhat...
with open("filename.txt") as infile:
    text = infile.readlines()

def print_ordinal(lines: list[str], skip_alphanum: bool = True):
    # Print each character next to its Unicode code point.
    for line in lines:
        for character in line:
            # Skip letters and digits to keep the output short.
            if not (skip_alphanum and character.isalnum()):
                print(f"{character} - {ord(character)}")

print_ordinal(text)
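Once the character codes are known, the odd spaces can be replaced with ordinary ones. Here is a minimal sketch, assuming the culprit is a non-breaking space (U+00A0) or another Unicode space separator. The helper name normalize_spaces and the exact set of characters are illustrative choices, not something the question confirms.

```python
import re

# Unicode "space separator" characters other than the ordinary space (U+0020).
# Assumption: the odd spaces in the spreadsheet are among these, most likely U+00A0.
ODD_SPACES = re.compile("[\u00a0\u1680\u2000-\u200a\u202f\u205f\u3000]")


def normalize_spaces(value):
    """Return str(value) with every odd space replaced by a plain space."""
    return ODD_SPACES.sub(" ", str(value))


sample = "some\u00a0text with\u2009mixed spaces"
print(normalize_spaces(sample))
```

With the DataFrame from the question, this could be applied as data["Strings"] = data["Strings"].map(normalize_spaces) before writing the file back.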

Related

Python reading file adds extra characters

I have a python file with some passwords. The problem is when Python prints these passwords or saves them in a variable it adds a random "Â" to the mix and I'm trying to figure out why. I could just replace the "Â" with nothing but I would like to know why this is happening.
Some examples:
^oqi£"HS prints out ^oqiÂ£"HS
rS£g)5Q% prints out rSÂ£g)5Q%
Code:
with open('pass.txt') as f:
    first_line = f.readline().rstrip()
    print(first_line)
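A likely explanation, offered as an assumption since the question does not say how pass.txt was saved: the file is UTF-8, where £ is stored as two bytes (C2 A3), and it is read back with a single-byte encoding such as cp1252, which shows those two bytes as Â£. The sketch below reproduces this with a temporary file standing in for pass.txt and shows that passing the encoding explicitly avoids the extra character.

```python
import os
import tempfile

# Temporary stand-in for pass.txt, saved as UTF-8 (an assumption about the real file).
path = os.path.join(tempfile.mkdtemp(), "pass.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write('^oqi£"HS\n')

# Reading with a single-byte encoding splits the two-byte £ into two characters.
with open(path, encoding="cp1252") as f:
    wrong = f.readline().rstrip()

# Reading with the encoding the file was written in gives the original text.
with open(path, encoding="utf-8") as f:
    right = f.readline().rstrip()

print(wrong)
print(right)
```

If open() is called without an encoding argument, Python falls back to a platform-dependent default, which is why the same script can behave differently on different machines.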

Pandas Output File not separating into different lines

I have this:
with open(str(ssis_txt_file_names_only[a]) + '.dts', 'w', encoding='utf16') as file:
    whatever = whatever.replace("\n","")
    print(whatever)
    file.write(str(whatever))
When I do a print(whatever), all of the text appears on one line instead of being broken up. Does anyone know what might be the cause?
Currently, my output looks like this:
>N</IsConnectionProperty> <Flags> 0</Flags> </AdapterProperty> <AdapterProperty>
What I want is this:
>N</IsConnectionProperty>
<Flags> 0</Flags>
</AdapterProperty>
<AdapterProperty>
Shouldn't the \n be doing this?
Your line whatever = whatever.replace("\n","") is replacing all linebreaks with nothing, so that's the culprit.
Regarding the issue in your comments: Notepad doesn't treat a bare \n as a line break; it needs the full Windows-style \r\n. If you comment out the .replace line and open the file in another editor, you will most likely see the line breaks. Alternatively, if you change the line to whatever = whatever.replace("\n","\r\n"), it should display as expected in Notepad.
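As an alternative to replacing the characters by hand, the newline parameter of open() can do the translation while writing. This is a minimal sketch with a temporary file and sample text standing in for the real .dts output; the file name is hypothetical.

```python
import os
import tempfile

# Sample text with Unix-style line breaks, standing in for the real content.
text = "<Flags> 0</Flags>\n</AdapterProperty>\n"

path = os.path.join(tempfile.mkdtemp(), "example.dts")  # hypothetical file name

# newline="\r\n" makes Python write every "\n" as "\r\n".
with open(path, "w", encoding="utf-16", newline="\r\n") as file:
    file.write(text)

# Read the bytes back untranslated to confirm what ended up on disk.
with open(path, "rb") as file:
    raw = file.read().decode("utf-16")

print(repr(raw))
```

The string in memory keeps its plain \n line breaks, so the rest of the program does not have to care which editor will open the file.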

Write new line at the end of a file

I am working with a numpy array in python. I want to print the array and its properties to a txt output. I want the text output to end with a blank line. How can I do this?
I have tried:
# Create a text document of the output
with open("demo_numpy.txt","w") as text:
    text.write('\n'.join(map(str, [a,shape,size,itemsize,ndim,dtype])) + '\n')
And also:
# Create a text document of the output
with open("demo_numpy.txt","w") as text:
    text.write('\n'.join(map(str, [a,shape,size,itemsize,ndim,dtype])))
    text.write('\n')
However, when I open the file in GitHub Desktop, it still indicates that the last line of the file is "dtype".
When you do "\n".join( ... ) you will get a string of the following form:
abc\ndef\nghi\nhjk
-- in other words, it won't end with \n.
If your code writes another \n then your string will be of the form
abc\ndef\nghi\nhjk\n
But that does not put a blank line at the end of your file, because every line in a text file is supposed to end in \n. That is what the POSIX standard says.
So you need another \n so that the last two lines of your file are
hjk\n
\n
Python will not choke if you ask it to read a textfile where the final trailing \n is missing. But it also won't treat a single trailing \n in a textfile as a blank line. It would not surprise me to learn that GitHub does likewise.
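The point above can be sketched as follows: join the values with \n, then append two more \n characters, one to terminate the last line and one for the blank line. The helper name and the sample values are illustrative only.

```python
def lines_with_trailing_blank(values):
    """Join values one per line and finish with an empty line."""
    # The first "\n" terminates the last value; the second one is the blank line.
    return "\n".join(map(str, values)) + "\n\n"


text = lines_with_trailing_blank(["abc", "def", "ghi", "hjk"])
print(repr(text))

# splitlines() shows the extra empty line at the end.
print(text.splitlines())
```

The returned string can be passed to text.write() in place of the joined string from the question.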
This was solved using the Python 3.x print function, which automatically inserts a new line at the end of each print statement.
Here is the code:
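with open("demo_numpy.txt","w") as text:
    print(a, file=text)
    # No explicit text.close() is needed: the with block closes the file.
Note: print converts its arguments with str() and appends a newline, which is convenient for text output, whereas .write only accepts a string (or bytes for a file opened in binary mode) and adds nothing.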

How to erase part of a read only file when printing it

Practically, I'm reading a file line by line and then printing onto the screen in pygame.
textbeingread = f.readline()
The code uses 'textbeingread' to show text on the screen, but because each piece of text is on a separate line, a little icon appears at the end to indicate that there is a line underneath it (I'm not sure how to show it here). Since each line has a different length, is there a way to omit the last character of the line but keep everything else? Thanks in advance :)
textbeingread = f.readline()[:-1]
or
textbeingread = f.readline()[:-2]
It depends on whether you want to remove only the newline character or also the character before it.
textbeingread = f.readline().rstrip("\r\n")
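The difference between the slicing and rstrip variants shows up when the last line of the file has no trailing newline. A small sketch, using io.StringIO as a stand-in for the real file:

```python
import io

# Two lines; the last one has no trailing newline, as is common in hand-edited files.
f = io.StringIO("first\nlast")

# Slicing always drops one character, even when it is not a newline.
sliced = [line[:-1] for line in f]

f.seek(0)

# rstrip("\r\n") only removes trailing carriage returns and newlines.
stripped = [line.rstrip("\r\n") for line in f]

print(sliced)
print(stripped)
```

Slicing cuts a real character off the final line, while rstrip leaves it intact, so rstrip is the safer choice when line endings are not guaranteed.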

Parse log file in python

I have a log file that has lines that look like this:
"1","2546857-23541","f_last","user","4:19 P.M.","11/02/2009","START","27","27","3","c2546857-23541",""
Each line in the log has 12 double-quoted sections, and the 7th double-quoted section in the string comes from where the user typed something into the chat window:
"22","2546857-23541","f_last","john","4:38 P.M.","11/02/2009","
What's up","245","47","1","c2546857-23541",""
This string also shows the issue I'm having: there are places in the chat log where the text the user typed is on a new line in the log file instead of on the same line, as in the first example.
So basically I want the lines in the second example to look like the first example.
I've tried using Find/Replace in N++ and I am able to find each "orphaned" line but I was unable to make it join the line above it.
Then I thought of making a python file to automate it for me, but I'm kind of stuck about how to actually code it.
Python errors out at this line when running unutbu's code:
"1760","4746880-00129","bwhiteside","tom","11:47 A.M.","12/10/2009","I do not see ^"refresh your knowledge
^" on the screen","422","0","0","c4746871-00128",""
The csv module is smart enough to recognize when a quoted item is not finished (and thus must contain a newline character).
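import csv

with open('data.log', "r") as fin:
    # newline='' stops Python from adding an extra \r to the rows on Windows.
    with open('data2.log', 'w', newline='') as fout:
        reader = csv.reader(fin, delimiter=',', quotechar='"', escapechar='^')
        # The writer needs escapechar too, or quotes inside a field raise csv.Error.
        writer = csv.writer(fout, delimiter=',', escapechar='^',
                            doublequote=False, quoting=csv.QUOTE_ALL)
        for row in reader:
            # Field 7 holds the chat text; join its lines with a space.
            row[6] = row[6].replace('\n', ' ')
            writer.writerow(row)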
If your data is valid CSV, you can use Python's csv.reader class. It should work just fine with your sample data. It may not work correctly depending on what an embedded double quote looks like in the source system. See: http://docs.python.org/library/csv.html#module-contents.
Unless I'm misunderstanding the problem, you simply need to read in the file and remove any newline characters that occur between double-quote characters.
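That last idea can be sketched without the csv module by scanning the text and tracking whether the current position is inside quotes. This is only an illustration: it assumes that ^ escapes the next character, as in the sample line from the question, and that quotes are never doubled. The function name is made up for this example.

```python
def join_quoted_newlines(text, quote='"', escape="^"):
    """Replace newlines that fall inside quoted sections with a space."""
    out = []
    in_quotes = False
    escaped = False
    for ch in text:
        if escaped:
            # The character after the escape is kept literally.
            out.append(ch)
            escaped = False
        elif ch == escape:
            out.append(ch)
            escaped = True
        elif ch == quote:
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == "\n" and in_quotes:
            out.append(" ")
        elif ch == "\r" and in_quotes:
            # Drop the carriage return of a Windows line break inside quotes.
            continue
        else:
            out.append(ch)
    return "".join(out)


raw = '"a","b\nc","d"\n"e","f ^"g\n^" h"\n'
print(join_quoted_newlines(raw))
```

Newlines outside quotes are left alone, so each record still ends up on its own line.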
