Does the inchstr() curses-function work in python? - python

I want to retrieve multiple strings in one row of my terminal right now I'm using instr() but that only extracts the string in that exact position. The function that should actually do this is inchstr() but that doesn't seem to work in python or is it?

No. Python's curses binding does not extend the underlying curses library (much). There's more than one related curses function which python might use, depending on what you are looking at, but none read more than a single line of text:
int instr(char *str);
int inwstr(wchar_t *wstr);
int inchstr(chtype *chstr);
int in_wchstr(cchar_t *wchstr);
The first (instr) and third (inchstr) both read from the screen, but the latter returns attributes (color, underline, etc) along with the text.
Python's instr appears to use the former, since its documentation states
Return a bytes object of characters, extracted from the window starting at the current cursor position, or at y, x if specified. Attributes are stripped from the characters. If n is specified, instr() returns a string at most n characters long (exclusive of the trailing NUL).
The second (inwstr) and fourth (in_wchstr) differ from the other two by allowing for reading wide-characters directly. python actually should provide for using either set (narrow or wide character interfaces), since ncurses' wide-character interface is better suited to returning Unicode strings, but it is using the narrow interface in either case, returning a byte array (and requiring the application to puzzle out how to convert the data into a string).

Related

Python np.fromfile() adding arbitrary random comma when reading from binary file

I encounter weird problem and could not solve it for days. I have created byte array that contains values from 1 to 250 and write it to binary file from C# using WriteAllBytes.
Later i read it from Python using np.fromfile(filename, dtype=np.ubyte). However, i realize this functions was adding arbitrary comma (see the image). Interestingly it is not visible in array property. And if i call numpy.array2string, comma turns '\n'. One solution is to replace comma with none, however i have very long sequences it will take forever on 100gb of data to use replace function. I also recheck the files by reading using .net Core, i'm quite sure comma is not there.
What could i be missing?
Edit:
I was trying to read all byte values to array and cast each member to or entire array to string. I found out that most reliable way to do this is:
list(map(str, (ubyte_array))
Above code returns string list that its elements without any arbitrary comma or blank space.

Can't regex string between string and line break - str(x) versus x.decode()

Why does a regex fail of a string cast from an object when line breaks are present?
That is why does this fail to find a match (ie print 'Green') in a string created from str(obj):
import re
s = str(b'Package Name: Green\\r\\n Release version: 8.1\\r\\n')
match = re.search(r'Package Name: (.*)\r\n', s)
print(match.group(1))
When this succeeds using a string created from obj.decode()?
import re
s = (b'Package Name: Green\\r\\n Release version: 8.1\\r\\n').decode()
match = re.search(r'Package Name: (.*)\r\n', s)
print(match.group(1))
No matter what search pattern was tried, searching the string created by str(obj) failed to find a match...
The reason you get different results is that you’re doing different things. Calling str on a bytes with newline characters returns a string with a literal backslash and n; calling decode returns a string with a newline character in it. So, if you’re searching the results for newline characters, the second one will succeed, and the first will fail. And it’s the second one that you wanted.
In other words, using decode here is right, and str is wrong; that’s why you get different results. If you can’t think through the difference, try just printing them out: print(b.decode()); print(str(b)) and you’ll see the difference immediately.
In fact, you should usually be decoding the strings as soon as you receive them, and never looking at the bytes again. Then you never have to worry about the str representation of bytes objects (except maybe in some code that logs errors caused by invalid strings that you couldn’t decode). The only exception is when you know the bytes are some kind of encoded text, but can’t be sure what the encoding is. For example, if you’re parsing HTTP headers or email messages or Python source code, you don’t know the character set until you read part of the file and search it for special ASCII-encoded strings. Or, if you’re converting a bunch of old text files from Windows to Unix line endings and some are cp1252 while others are cp1250, you don’t care which is which because they both encode line endings the same way. For those cases, just stick with bytes, and search for b'\n' instead of '\n'.
If you want to know why Python makes this so complicated:
bytes objects are used to store strings encoded in your default encoding—but they’re also used to store strings encoded in different encodings, and binary data that isn’t a string at all. And a bytes object has no idea which of those it’s storing; they’re all just sequences of numbers.
Python 2 effectively assumed that a bytes was being used to store a string in your default encoding, so it let you convert back and forth to Unicode by calling functions like str, or even concatenating a bytes and Unicode string. That turned out to be one of the biggest sources of errors in the language. You still see Python 2 users posting questions here every few days asking why they got a UnicodeEncodeError when they weren’t calling encode anywhere (or, worse, when they were calling decode), and fixing that was one of the main reasons for Python 3’s existence.
The human-readable representation of a bytes object has to be something that can be produced without error, and read unambiguously, whether it’s a string in the default encoding, a string in a completely different encoding, or a sequence of pixel brightness values ranging from 0 to 255. The compromise solution (for things like that HTTP headers case above) is the backslash-escaped quoted string.
By the way, during the Python 2 to 3 transition, the core devs assumed multiple people would come up with clever EncodedBytes types that carried around their encoding, and could therefore act more like Python 2 byte strings but without all the associated errors, and after a couple years one of them would be the clear winner on PyPI and maybe they could add it to Python 3.3 or so. That’s what you’re probably instinctively reaching for here. But, as it turned out, nobody used any such libraries, because it’s almost always easier to just decode and encode at the edges of your program and use Unicode everywhere, and the exceptions are almost always cases where you don’t know the encoding so EncodedBytes wouldn’t help.
One last thing: thinking of functions like str or float as “casts” is misleading. While it looks superficially similar to the way you do explicit casts in C or Java or Go or whatever language you’re used to, it has a very different meaning

Python: different byte values for the same character?

The program I'm writing captures individual keypresses with the function mscvrt.getch(), which works very similarly to the C function of the same name, but instead of returning a char variable, it returns a byte, which I have to decode afterwards.
However, it has a problem decoding non-ascii characters, like accented letters (it triggers a UnicodeDecodeError), so I handle this exception with a function that compares the returned byte value with a list of byte values of special characters I want, and if it matches with one of them, the function returns its char equivalent.
The problem is that I noticed that the byte value is different on two machines I use (probably something to do with the system being in different languages, and/or I using keyboards with a different layout).
For example, if I input the character à, the byte value returned will be b'\x85' in one machine, and b'\xe0' in the other.
Why does this happen? How can I make a "universal solution" (elegant, preferably) that can work as I want in any machine?
Use msvcrt.getwch().
It will return a str (rather than a byte) that contains the character, and works with unicode rather than ascii.

What is the end of string notation in python

I am from a c background. started learning python few days ago. my question is what is the end of string notation in python. like we are having \0 in c. is there anything like that in python.
There isn't one. Python strings store the length of the string independent from the string contents.
There is nothing like that in Python. A string is simply a string. The following:
test = "Hello, world!"
is simply a string of 13 characters. It's a self-contained object and it knows how many character it contains, there is no need for an end-of-string notation.
Python's string management is internally a little more complex than that. Strings is a sequence type so that from a python coder's point of view it is more an array of characters than anything. (And so it has no terminating character but just a length property.)
If you must know: Internally python strings' data character arrays are null terminated. But the String object stores a couple of other properties as well. (e.g. the hash of the string for use as key in dictionaries.)
For more detailed info (especially for C coders) see here: http://www.laurentluce.com/posts/python-string-objects-implementation/

Mapping Unicode to ASCII in Python

I receive strings after querying via urlopen in JSON format:
def get_clean_text(text):
return text.translate(maketrans("!?,.;():", " ")).lower().strip()
for track in json["tracks"]:
print track["name"].lower()
get_clean_text(track["name"].lower())
For the string "türlich, türlich (sicher, dicker)" I then get
File "main.py", line 23, in get_clean_text
return text.translate(maketrans("!?,.;():", " ")).lower().strip()
TypeError: character mapping must return integer, None or unicode
I want to format the string to be "türlich türlich sicher dicker".
The question is not a complete self-contained example; I can't be sure whether it's Python 2 or 3, where maketrans came from, etc. There's a good chance I will guess wrong, which is why you should be sure to tag your questions appropriately and provide a short, self contained, correct example. (That, and the fact that various other people—some of them probably smarter than me—likely ignored your question because it was ambiguous.)
Assuming you're using 2.x, and you've done a from string import * to get maketrans, and json["name"] is unicode rather than str/bytes, here's your problem:
There are two kinds of translation tables: old-style 8-bit tables (which are just an array of 256 characters) and new-style sparse tables (which are just a dict mapping one character's ordinal to another). The str.translate function can use either, but unicode.translate can only use the second (for reasons that should be obvious if you think about it for a bit).
The string.maketrans function makes old-style 8-bit translation tables. So you can't use it with unicode.translate.
You can always write your own "makeunitrans" function as a drop-in replacement, something like this:
def makeunitrans(frm, to):
return {ord(f):ord(t) for (f,t) in zip(frm, to)}
But if you just want to map out certain characters, you could do something a bit more special purpose:
def makeunitrans(frm):
return {ord(f):ord(' ') for f in frm}
However, from your final comment, I'm not sure translate is even what you want:
I want to format the string to be "türlich türlich sicher dicker"
If you get this right, you're going to format the string to be "türlich türlich sicher dicker ", because you're mapping all those punctuation characters to spaces, not nothing.
With new-style translation tables you can map anything you want to None, which solves that problem. But you might want to step back and ask why you're using the translate method in the first place instead of, e.g., calling replace multiple times (people usually say "for performance", but you wouldn't be building the translation table in-line every time through if that were an issue) or using a trivial regular expression.

Categories