'2' character with star inside? - python

Edit: Determined so far: It's not the 2, it's a character before the two, hex value BF, causing the star in the following character (which happens to be 2)
I'm running an elastic-mapreduce job using python scripts I have written, and I'm getting some weird output in the form of unexpected lines. I have noticed a pattern, however. The expected lines all have unexpected '2's in the form of characters with small stars just inside the top curve of the character. That is, when I open the file in Notepad++ (but not Notepad or Word) I see some twos show up like this (excuse the links, I am unable to embed images at less than 10 rep):
In text: http://i.imgur.com/zaWtC3S.png
Zoomed in: http://i.imgur.com/bTYIlh6.png
The weird '2's also show up when I run the python scripts on my own machine (though the unexpected lines do not). Does anyone know what might be causing this? It might shed some light on the odd extra lines of output I'm getting. I'm also just genuinely curious.
Also, I thought it might have had to do with encoding/decoding I was doing to parse safe URLs, but when I took out those parts the weird '2's remained, so it wasn't that.
Thanks

You have EF BB BF in there. That is the UTF-8 encoding of the BOM (byte order mark); see http://en.wikipedia.org/wiki/Byte_order_mark. I suspect that the star in the character is your editor's way of signifying "I just got a BOM". See this earlier question. It seems to be a well-known "thing", and that thread has some suggestions for dealing with it.
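To check for the BOM and drop it, something along these lines may help. This is only a sketch in Python 3 syntax (the question's scripts may well be Python 2), the helper name strip_utf8_bom is made up for illustration, and the sample bytes are invented rather than taken from the real output.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Invented sample: a line whose first "character" is the BOM, followed by a 2.
raw = b"\xef\xbb\xbf2\tsome value"

print(strip_utf8_bom(raw))
# Decoding with the "utf-8-sig" codec also drops a leading BOM.
print(raw.decode("utf-8-sig"))
```

If the BOM comes from an input file, opening that file with encoding="utf-8-sig" is probably the least intrusive fix.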

Related

docstring (triple quotes) in iPython/Jupyter with autoclose brackets/quotes?

I'm trying to use docstrings w/ triple-quotes in my Jupyter notebooks using Python 2.7.
I can disable the autoclose brackets/quotes thing but I'm quite keen on them; major increase in workflow.
Does anyone know how to do triple quotes without over-quoting while keeping the autoclose feature?
If I press the " key 3x I get """""";
If I press it 3x and delete once, I get """"; and
If I press it 3x and delete twice, I get ""
Annoying, right? How can I have the best of both worlds (autoclose | docstrings) ?
This is a pretty low-level question, but I haven't seen an easy fix anywhere so the answer should be useful for the community. If you downvote, can you explain why this is a poor question please?
Nothing is wrong. When you type three " your cursor is at the middle of the resulting six. Thus, anything you type is within the string and has been auto-closed.
Type this exact sequence of characters, without clicking or otherwise moving the cursor: """This is working
The result will be a correctly formatted string, because the editor will have auto-closed it. Therefore you have both docstrings and autoclose.

Vexing Python syntax error

I am writing a python script using version 2.7.3. In the script a line is
toolsDir = 'tools/'
When I run this in terminal I get SyntaxError: invalid syntax on the last character in the string 'r'. I've tried renaming the string, using " as opposed to '. If I actually go into python via bash and declare the string in one line and print it I get no error.
I checked the encoding via file -i update.py and I get text/x-python; charset=us-ascii
I have used TextWrangler, nano and LeafPad as the text editors.
I have a feeling it may be something with the encoding of one of the editors. I have had this script run before without any errors.
Any advice would be greatly appreciated.
The string is 'tools/'. toolsDir is a variable. You're free to use different terminology, of course, but you'll end up confusing people trying to help you. The only r in that line is the last character of the variable name, so I assume that's the location of the error.
Most likely you've managed to introduce a non-breaking space (character code 0xA0) instead of an ordinary space. Try deleting the space, the = and the following space (all three characters) and retyping them.
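To test that guess instead of retyping blindly, you could list every non-ASCII character in the suspect line. A rough sketch in Python 3 syntax follows; find_non_ascii is a made-up helper name, and the sample line is a hypothetical reconstruction with a no-break space planted after the variable name.

```python
import unicodedata

def find_non_ascii(line):
    """Return (column, code point, Unicode name) for each non-ASCII character."""
    hits = []
    for col, ch in enumerate(line):
        if ord(ch) > 127:
            name = unicodedata.name(ch, "UNKNOWN")
            hits.append((col, "U+%04X" % ord(ch), name))
    return hits

# Hypothetical line: U+00A0 sits where the first ordinary space should be.
suspect = "toolsDir\u00a0= 'tools/'"
print(find_non_ascii(suspect))  # [(8, 'U+00A0', 'NO-BREAK SPACE')]
```

An empty list means the line is plain ASCII, and the problem is more likely on a neighbouring line.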
Try running the code through pylint.
You probably have a syntax error on a nearby line before this one. Try commenting this line out and see if the error moves.
You might have a whitespace error, don't forget whitespace counts in python. If you've mixed tabs and spaces anywhere in your file it can throw the syntax checker off by several lines.
If you copied and pasted lines into this from any other source you may have copied whitespace in that doesn't fit with whichever convention you used.
The error was, of course, a silly one.
In one of my imports I use try: without closing or catching the error condition. pylint did not catch this and the error message did not indicate this.
If anyone runs into this in the future: triple-check all the opening code (everything before the reported line) for syntax errors.
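For anyone who wants to reproduce the effect, here is a small sketch in Python 3 syntax. The source strings are a hypothetical reconstruction, not the original script, and first_syntax_error is a made-up helper. It compiles a snippet and reports where Python first complains, which need not be the line where the unfinished try: started.

```python
def first_syntax_error(source, filename="<snippet>"):
    """Return (line number, message) of the first syntax error, or None."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return exc.lineno, exc.msg
    return None

# Hypothetical reconstruction: a try: with no except/finally,
# followed by an assignment that is perfectly valid on its own.
broken = "try:\n    import json\ntoolsDir = 'tools/'\n"
fixed = (
    "try:\n    import json\n"
    "except ImportError:\n    json = None\n"
    "toolsDir = 'tools/'\n"
)

print(first_syntax_error(broken))  # a (line, message) pair; details vary by Python version
print(first_syntax_error(fixed))   # None
```

The exact line number and wording depend on the Python version, so treat the reported location as a hint and read upwards from it.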

How to recognize special eol character when I see it, using Python?

I'm scraping a set of originally pdf files, using Python. Having gotten them to text, I had a lot of trouble getting the line endings out. I couldn't figure out what the line separator was. The trouble is, I still don't know.
It's not a '\n', or, I don't think, '\r\n'. However, I've managed to isolate one of these special characters. I literally have it in memory, and by doing a call to my_str.replace(eol, ''), I can remove all of these characters from one of my files.
So my question is open-ended. I'm a bit lost when it comes to unicode and such. How can I identify this character in my files without resorting to something ridiculous, like serializing it and then reading it in? Is there a way I can refer to it as a code, perhaps? I can't get Python to yield what it actually IS. All I ever see if I print it, or call unicode(special_eol) is the character in its functional usage as a newline.
Please help! Thanks, and sorry if I'm missing something obvious.
To determine what specific character that is, you can use str.encode('unicode_escape') or repr() to get (in Python 2) an ASCII-printable representation of the character:
>>> print u'☃'.encode('unicode_escape')
\u2603
>>> print repr(u'☃')
u'\u2603'
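Once you have the escape, the standard unicodedata module can also tell you the character's official name. A minimal sketch in Python 3 syntax; describe is a made-up helper, and U+2028 is only a stand-in, since the question never says which character it actually was.

```python
import unicodedata

def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))

# Stand-in characters: a snowman, a line separator and a form feed.
for ch in ["\u2603", "\u2028", "\x0c"]:
    print(describe(ch))
```

Control characters such as form feed have no Unicode name, which is why the helper passes a default to unicodedata.name instead of letting it raise ValueError.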

PDFtotext - whitespace showing as aacute on commandline

I am extracting text using python from a textfile created from pdf using pdftotext. It is one of 2000 files and in this particular one, a line of keywords ends in EU. The remainder of the line is blank to the naked eye and so is the following line.
The program normally strips off any trailing blanks at the end of a line and ignores the subsequent blank line.
In this instance, it is saving the whitespace which is seen when it is printed out in a text file between "EU. " and similarly in html (Simile Exhibit).
I also printed to the command line and here I see a string of aacute. [?]
I thought the obvious way to deal with this was to search and replace the aacute. I've tried to do that with a compile statement and I've played with permutations of decoding the incoming text.
Oddly though, when I print "\255" I don't get an aacute, I get an o grave.
It seems likely with this odd combination of errors that I have misunderstood something fundamental. Any tips of how to begin unravelling this?
Many thanks.
The first tip is not to print wildly to all possible output mechanisms using various unstated encodings. Find out exactly what you have got. Do this:
print repr(the_line_with_the_problem) # Python 2.x
print(ascii(the_line_with_the_problem)) # Python 3.x
and edit your question and copy/paste the result.
Second tip: When asking for help, give information about your environment:
What version of Python? What version of what operating system?
Also show locale-related info; the following example is from my computer running Python 2.7 in a Windows 7 Command Prompt window:
>>> import sys, locale
>>> sys.getdefaultencoding()
'ascii'
>>> sys.stdout.encoding
'cp850'
>>> locale.getdefaultlocale()
('en_AU', 'cp1252')
>>>
Third tip: Don't use your own jargon ... the concepts "Simile Exhibit", "printed to the command line", and "compile statement" need explanation.
What is the relevance of "\255"? Where did you get that from?
Wild guesses while waiting for some facts to emerge:
(1) The offending character is U+00A0 NO-BREAK SPACE aka NBSP which appears in your text as "\xA0" and when sent to stdout in a Western European locale on Windows using a Command Prompt window would be treated as being encoded in cp850 and thus appear as a-acute. How this could be transmogrified into o-grave is a mystery.
(2) "\255" == \xAD implies the offending character is U+00AD SOFT HYPHEN but why this would be seen as o-grave is a mystery, and it's not "whitespace"; it shouldn't be shown at all, and if it is shown it should be as a hyphen/minus-sign, not a space.
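Guess (1) can be checked without a Windows console by encoding the character one way and decoding the resulting byte as cp850. This is a sketch in Python 3 syntax, and it assumes the text was written as cp1252, which the question does not confirm; misread is a made-up helper.

```python
def misread(text, written_as="cp1252", shown_as="cp850"):
    """Encode text with one codec and decode the bytes with another."""
    return text.encode(written_as).decode(shown_as)

nbsp = "\u00a0"   # NO-BREAK SPACE
shy = "\255"      # octal escape 255, the same character as "\xad" (SOFT HYPHEN)

print(shy == "\xad")       # True
print(repr(misread(nbsp))) # byte 0xA0 read as cp850 gives a-acute
print(repr(misread(shy)))  # byte 0xAD read as cp850 gives an inverted exclamation mark
```

So a cp850 console accounts for the a-acute, but not for the o-grave.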

Problems with Nested Functions

THIS TURNED OUT TO BE A SYNTAX ERROR ON MY PART A LINE EARLIER IN THE CODE.
Hello, I'm having some trouble with a nested function I wrote in python. Here is the relevant code.
device = "/dev/sr0"
def burn():
    global device
    burnaudiotrack(device)
    createiso(device)
    burntrack2(device)
I'm confused, because every time I try to run the script, python returns this:
File "./install.py", line 72
burnaudiotrack(device)
^
SyntaxError: invalid syntax
I've nested functions before, and done so in a similar manner. I feel like I'm missing something fairly obvious here, but I can't pinpoint it. Thank you for your help/suggestions!
EDIT:
Full code: (I tried to just post relevant info in the original)
http://dpaste.com/hold/291347/
It's a tad messy, and there may be other errors, but this one is vexing me at the moment.
You are missing a close parenthesis on line 61.
Looks like the quote and paren at the end of the line are swapped.
speed = raw_input("Recomended(4);Default(8))"
should be
speed = raw_input("Recomended(4);Default(8)")
The code you have pasted into your question appears to have tabs as well as spaces. You should (according to PEP-8) always use spaces for indenting in Python. Check your text editor settings.
What's probably happened is you have some mix of tabs and spaces that looks correct in your editor, but is being interpreted differently by the Python compiler. The compiler sees different, inconsistent indentation and throws a SyntaxError.
Update: As another answer points out, you are missing a closing parenthesis on a line of code you didn't show in your original question. Nevertheless, my comments about tabs in your source still hold.
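If you want to find the offending lines without relying on how your editor renders whitespace, a short scan of the source is enough. A minimal sketch in Python 3 syntax; lines_with_tab_indent is a made-up helper, and the sample is a hypothetical variant of the code in the question with one tab-indented line.

```python
def lines_with_tab_indent(source):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace is whatever lstrip would remove.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            flagged.append(number)
    return flagged

# Hypothetical sample: line 3 is indented with a tab, the rest with spaces.
sample = (
    "def burn():\n"
    "    global device\n"
    "\tburnaudiotrack(device)\n"
    "    createiso(device)\n"
)
print(lines_with_tab_indent(sample))  # [3]
```

Any line it reports can then be re-indented with spaces, as PEP 8 recommends.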
