Unicode as String without conversion Python


I'm trying to convert unicode text to a string literally, but I can't seem to find a way to do this.
input = u'/123/123/123'
Convert to string:
output = "/123/123/123"
If I try str(), it encodes the text, and if I loop over the text and convert it letter by letter, I get each of the unicode characters.
EDIT: Note that the goal is not to convert the string, but to take the letters in the unicode text and create a string from them. If I follow the link provided in the comment:
Convert a Unicode string to a string in Python (containing extra symbols)
import unicodedata
unicodedata.normalize('NFKD', input).encode('ascii','ignore')
output='SSS'
As you can see, this is not the expected output.
EDIT: I used u'/123' as an example, but I'm actually trying to convert Chinese characters, for example:
a=u'\u6c34'
str(a)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u6c34' in position 0: ordinal not in range(128)
output_expected="\u6c34"

I've tried converting it with str() as you mention in your question, and it does work for me. You can check the resulting type with type().
>>> input= u'/123/123/123'
>>> type(input)
<type 'unicode'>
>>> output=str(input)
>>> print output
/123/123/123
>>> type(output)
<type 'str'>
How are you iterating over the letters? When I try it, each letter is still a str. You could convert the input first and then do whatever you want once it is a str:
>>> letters = [x for x in output]
>>> for letter in letters:
...     print type(letter)
...
I hope it helps!

Here's how to do it the easy way:
>>> a=u'\x83\u6c34\U00103ABC'
>>> a.encode('unicode_escape')
'\\x83\\u6c34\\U00103abc'
>>> print a.encode('unicode_escape')
\x83\u6c34\U00103abc
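The same idea in Python 3, where every str is already Unicode. This is a minimal sketch, assuming you want a str (not bytes) that contains the escape sequences: encode('unicode_escape') produces bytes, so the result is decoded as ASCII, and decoding with 'unicode_escape' reverses the process.

```python
# Python 3: str is Unicode, so encode to escape sequences and decode back to str
a = '\x83\u6c34\U00103ABC'

escaped = a.encode('unicode_escape').decode('ascii')
print(escaped)  # \x83\u6c34\U00103abc

# Reverse direction: turn the escape sequences back into the original characters
restored = escaped.encode('ascii').decode('unicode_escape')
print(restored == a)  # True
```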

Here's how to do it the hard way.
ascii_printable = set(unichr(i) for i in range(0x20, 0x7f))
def convert(ch):
    if ch in ascii_printable:
        return ch
    ix = ord(ch)
    if ix < 0x100:
        return '\\x%02x' % ix
    elif ix < 0x10000:
        return '\\u%04x' % ix
    return '\\U%08x' % ix
output = ''.join(convert(ch) for ch in input)
For Python 3 use chr instead of unichr.
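A Python 3 port of the hard way, as a sketch; escape_text is a hypothetical helper name. For text without backslashes or control characters such as \n and \t, it should give the same result as encode('unicode_escape'), which additionally escapes the backslash itself and writes \n, \t and \r in their short forms.

```python
# Python 3 port of the hand-rolled converter (chr instead of unichr)
ASCII_PRINTABLE = {chr(i) for i in range(0x20, 0x7f)}

def convert(ch):
    """Keep printable ASCII as-is; escape everything else as \\xNN, \\uNNNN or \\UNNNNNNNN."""
    if ch in ASCII_PRINTABLE:
        return ch
    ix = ord(ch)
    if ix < 0x100:
        return '\\x%02x' % ix
    elif ix < 0x10000:
        return '\\u%04x' % ix
    return '\\U%08x' % ix

def escape_text(text):
    # Hypothetical helper: apply convert() to every character and join the pieces
    return ''.join(convert(ch) for ch in text)

print(escape_text('\u6c34'))  # \u6c34
```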

Somebody wrote some really complete code for doing this:
import unicodedata
# Python 2 code: relies on unicode, xrange and unichr
def fix_bad_unicode(text):
    if not isinstance(text, unicode):
        raise TypeError("This isn't even decoded into Unicode yet. "
                        "Decode it first.")
    if len(text) == 0:
        return text
    maxord = max(ord(char) for char in text)
    tried_fixing = []
    if maxord < 128:
        # Hooray! It's ASCII!
        return text
    else:
        attempts = [(text, text_badness(text) + len(text))]
        if maxord < 256:
            tried_fixing = reinterpret_latin1_as_utf8(text)
            tried_fixing2 = reinterpret_latin1_as_windows1252(text)
            attempts.append((tried_fixing, text_cost(tried_fixing)))
            attempts.append((tried_fixing2, text_cost(tried_fixing2)))
        elif all(ord(char) in WINDOWS_1252_CODEPOINTS for char in text):
            tried_fixing = reinterpret_windows1252_as_utf8(text)
            attempts.append((tried_fixing, text_cost(tried_fixing)))
        else:
            # We can't imagine how this would be anything but valid text.
            return text
        # Sort the results by badness
        attempts.sort(key=lambda x: x[1])
        #print attempts
        goodtext = attempts[0][0]
        if goodtext == text:
            return goodtext
        else:
            return fix_bad_unicode(goodtext)
def reinterpret_latin1_as_utf8(wrongtext):
    newbytes = wrongtext.encode('latin-1', 'replace')
    return newbytes.decode('utf-8', 'replace')
def reinterpret_windows1252_as_utf8(wrongtext):
    altered_bytes = []
    for char in wrongtext:
        if ord(char) in WINDOWS_1252_GREMLINS:
            altered_bytes.append(char.encode('WINDOWS_1252'))
        else:
            altered_bytes.append(char.encode('latin-1', 'replace'))
    return ''.join(altered_bytes).decode('utf-8', 'replace')
def reinterpret_latin1_as_windows1252(wrongtext):
    return wrongtext.encode('latin-1').decode('WINDOWS_1252', 'replace')
def text_badness(text):
    assert isinstance(text, unicode)
    errors = 0
    very_weird_things = 0
    weird_things = 0
    prev_letter_script = None
    for pos in xrange(len(text)):
        char = text[pos]
        index = ord(char)
        if index < 256:
            weird_things += SINGLE_BYTE_WEIRDNESS[index]
            if SINGLE_BYTE_LETTERS[index]:
                prev_letter_script = 'latin'
            else:
                prev_letter_script = None
        else:
            category = unicodedata.category(char)
            if category == 'Co':
                # Private use
                errors += 1
            elif index == 0xfffd:
                # Replacement character
                errors += 1
            elif index in WINDOWS_1252_GREMLINS:
                lowchar = char.encode('WINDOWS_1252').decode('latin-1')
                weird_things += SINGLE_BYTE_WEIRDNESS[ord(lowchar)] - 0.5
            if category.startswith('L'):
                name = unicodedata.name(char)
                scriptname = name.split()[0]
                freq, script = SCRIPT_TABLE.get(scriptname, (0, 'other'))
                if prev_letter_script:
                    if script != prev_letter_script:
                        very_weird_things += 1
                    if freq == 1:
                        weird_things += 2
                    elif freq == 0:
                        very_weird_things += 1
                prev_letter_script = script
            else:
                prev_letter_script = None
    return 100 * errors + 10 * very_weird_things + weird_things
def text_cost(text):
    """
    Assign a cost function to the length plus weirdness of a text string.
    """
    return text_badness(text) + len(text)
WINDOWS_1252_GREMLINS = [
# adapted from http://effbot.org/zone/unicode-gremlins.htm
0x0152, # LATIN CAPITAL LIGATURE OE
0x0153, # LATIN SMALL LIGATURE OE
0x0160, # LATIN CAPITAL LETTER S WITH CARON
0x0161, # LATIN SMALL LETTER S WITH CARON
0x0178, # LATIN CAPITAL LETTER Y WITH DIAERESIS
0x017E, # LATIN SMALL LETTER Z WITH CARON
0x017D, # LATIN CAPITAL LETTER Z WITH CARON
0x0192, # LATIN SMALL LETTER F WITH HOOK
0x02C6, # MODIFIER LETTER CIRCUMFLEX ACCENT
0x02DC, # SMALL TILDE
0x2013, # EN DASH
0x2014, # EM DASH
0x201A, # SINGLE LOW-9 QUOTATION MARK
0x201C, # LEFT DOUBLE QUOTATION MARK
0x201D, # RIGHT DOUBLE QUOTATION MARK
0x201E, # DOUBLE LOW-9 QUOTATION MARK
0x2018, # LEFT SINGLE QUOTATION MARK
0x2019, # RIGHT SINGLE QUOTATION MARK
0x2020, # DAGGER
0x2021, # DOUBLE DAGGER
0x2022, # BULLET
0x2026, # HORIZONTAL ELLIPSIS
0x2030, # PER MILLE SIGN
0x2039, # SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x203A, # SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x20AC, # EURO SIGN
0x2122, # TRADE MARK SIGN
]
# a list of Unicode characters that might appear in Windows-1252 text
WINDOWS_1252_CODEPOINTS = range(256) + WINDOWS_1252_GREMLINS
# Rank the characters typically represented by a single byte -- that is, in
# Latin-1 or Windows-1252 -- by how weird it would be to see them in running
# text.
#
# 0 = not weird at all
# 1 = rare punctuation or rare letter that someone could certainly
# have a good reason to use. All Windows-1252 gremlins are at least
# weirdness 1.
# 2 = things that probably don't appear next to letters or other
# symbols, such as math or currency symbols
# 3 = obscure symbols that nobody would go out of their way to use
# (includes symbols that were replaced in ISO-8859-15)
# 4 = why would you use this?
# 5 = unprintable control character
#
# The Portuguese letter Ã (0xc3) is marked as weird because it would usually
# appear in the middle of a word in actual Portuguese, and meanwhile it
# appears in the mis-encodings of many common characters.
SINGLE_BYTE_WEIRDNESS = (
# 0 1 2 3 4 5 6 7 8 9 a b c d e f
5, 5, 5, 5, 5, 5, 5, 5, 5, 0, 0, 5, 5, 5, 5, 5, # 0x00
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, # 0x10
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, # 0x20
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, # 0x30
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, # 0x40
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, # 0x50
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, # 0x60
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, # 0x70
2, 5, 1, 4, 1, 1, 3, 3, 4, 3, 1, 1, 1, 5, 1, 5, # 0x80
5, 1, 1, 1, 1, 3, 1, 1, 4, 1, 1, 1, 1, 5, 1, 1, # 0x90
1, 0, 2, 2, 3, 2, 4, 2, 4, 2, 2, 0, 3, 1, 1, 4, # 0xa0
2, 2, 3, 3, 4, 3, 3, 2, 4, 4, 4, 0, 3, 3, 3, 0, # 0xb0
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, # 0xc0
1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, # 0xd0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, # 0xe0
1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, # 0xf0
)
# Pre-cache the Unicode data saying which of these first 256 characters are
# letters. We'll need it often.
SINGLE_BYTE_LETTERS = [
unicodedata.category(unichr(i)).startswith('L')
for i in xrange(256)
]
# A table telling us how to interpret the first word of a letter's Unicode
# name. The number indicates how frequently we expect this script to be used
# on computers. Many scripts not included here are assumed to have a frequency
# of "0" -- if you're going to write in Linear B using Unicode, you're
# probably aware enough of encoding issues to get it right.
#
# The lowercase name is a general category -- for example, Han characters and
# Hiragana characters are very frequently adjacent in Japanese, so they all go
# into category 'cjk'. Letters of different categories are assumed not to
# appear next to each other often.
SCRIPT_TABLE = {
'LATIN': (3, 'latin'),
'CJK': (2, 'cjk'),
'ARABIC': (2, 'arabic'),
'CYRILLIC': (2, 'cyrillic'),
'GREEK': (2, 'greek'),
'HEBREW': (2, 'hebrew'),
'KATAKANA': (2, 'cjk'),
'HIRAGANA': (2, 'cjk'),
'HIRAGANA-KATAKANA': (2, 'cjk'),
'HANGUL': (2, 'cjk'),
'DEVANAGARI': (2, 'devanagari'),
'THAI': (2, 'thai'),
'FULLWIDTH': (2, 'cjk'),
'MODIFIER': (2, None),
'HALFWIDTH': (1, 'cjk'),
'BENGALI': (1, 'bengali'),
'LAO': (1, 'lao'),
'KHMER': (1, 'khmer'),
'TELUGU': (1, 'telugu'),
'MALAYALAM': (1, 'malayalam'),
'SINHALA': (1, 'sinhala'),
'TAMIL': (1, 'tamil'),
'GEORGIAN': (1, 'georgian'),
'ARMENIAN': (1, 'armenian'),
'KANNADA': (1, 'kannada'), # mostly used for looks of disapproval
'MASCULINE': (1, 'latin'),
'FEMININE': (1, 'latin')
}
Then you just call the function:
>>> fix_bad_unicode(u'aあä')
u'a\u3042\xe4'
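The core trick behind reinterpret_latin1_as_utf8 can be shown on its own in Python 3. This is a minimal sketch, assuming the text was UTF-8 that got decoded as Latin-1 exactly once; the sample string is made up for illustration.

```python
def reinterpret_latin1_as_utf8(wrongtext):
    # Latin-1 maps code points 0-255 to bytes one-to-one, so this recovers the
    # original bytes, which are then decoded as the UTF-8 they really were.
    return wrongtext.encode('latin-1', 'replace').decode('utf-8', 'replace')

mojibake = 'caf\u00c3\u00a9'  # 'café' as UTF-8 bytes misread as Latin-1
fixed = reinterpret_latin1_as_utf8(mojibake)  # 'caf\u00e9', i.e. 'café'
```

Characters above U+00FF cannot be encoded as Latin-1 and turn into '?', which is why fix_bad_unicode only tries this reinterpretation when every character is below 256.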

Related

I have a Python list; using the map function omits the first zero of the list

I have this code in Python. When I print the last line, it outputs "11100101100", but I'm expecting "011100101100". Notice that the output starts with 1 and not 0, even though the variable gamma_sum_list is a list of 12 digits that starts with 0. The function somehow drops the first zero. Here is the code, followed by the exact gamma_sum_list:
def convert(list):
    res = int("".join(map(str, list)))
    return res
print(convert(gamma_sum_list))
Input:
[0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
Expected Output:
011100101100
Actual Output:
11100101100

Your issue is caused by converting the result of the join operation to an integer. Integers do not have leading zeroes. If you remove the int function you'll get a string with the leading zero you're after.
gamma_sum_list = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
def convert(my_list):
    res = "".join(map(str, my_list))
    return res
print(convert(gamma_sum_list))
Output:
011100101100

def convert(some_list):
    res = "".join(map(str, some_list))
    return res
gamma_sum_list = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
print(convert(gamma_sum_list))
or
conv = lambda x: ''.join(map(str, x))
print(conv(gamma_sum_list))

Consider that:
>>> "".join(list(map(str, [0, 1])))
'01'
How would you convert '01' to an integer? Well, it's just 1.
>>> int("".join(list(map(str, [0, 1]))))
1
So you probably want to not convert the string to an int, just keep it as a str.
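If you need the number as well as the text, one option is to remember how many digits there were and pad the number when turning it back into a string. A sketch; width is a name introduced here, and whether the digits are decimal or binary is an assumption (the question's int() call reads them as decimal), so both variants are shown.

```python
gamma_sum_list = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
width = len(gamma_sum_list)

digits = ''.join(map(str, gamma_sum_list))  # '011100101100'

# Read as decimal (what int() does by default), then pad back to the original width
as_decimal = int(digits)
print(str(as_decimal).zfill(width))  # 011100101100

# Or, if the digits are meant as binary, parse base 2 and format with a fixed width
as_binary = int(digits, 2)
print(format(as_binary, '0{}b'.format(width)))  # 011100101100
```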

How to convert a string to a list using Python?

I am working with an RC-522 RFID reader for my project. I want to use it to pay transportation fees. I am using Python and the code from: https://github.com/mxgxw/MFRC522-python.git
In the Python script Read.py, sector 8 is read with this code:
# Check if authenticated
if status == MIFAREReader.MI_OK:
    MIFAREReader.MFRC522_Read(8)  # <---- prints sector 8
    MIFAREReader.MFRC522_StopCrypto1()
else:
    print "Authentication error"
The output of this was:
Sector 8 [100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
I convert that last part (Sector 8 [100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) to a string. I want it to be a list, but I can't get that to work. I tried putting it in a variable x and using x.split(), but when I execute print x, the output is ['None'].
x = str(MIFAREReader.MFRC522_Read(8))
x = x.split()
print x  # prints ['None']
I want it to be like this:
DATA = [100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
so that I can use the sum(DATA) to check for balance, and I can access it using indexes like DATA[0]
Thanks a lot!!

Follow these steps:
1. Open MFRC522.py, the module for the RFID reader:
vi MFRC522.py
2. Look for the function
def MFRC522_Read(self, blockAddr)
3. Add the line return backData at the end of the function.
4. Save it.
5. In your read program, call it like this:
DATA = MIFAREReader.MFRC522_Read(8)
print 'DATA :', DATA
I hope this solves the problem.

You can use .split(",") to specify the delimiter ",".
Something like this:
input_string = "[100, 234, 0, 0, 567, 0, 0, 0, 3, 0, 235, 0, 0, 12, 0, 0]"
listed_string = input_string[1:-1].split(",")
total = 0
for item in listed_string:
    total += int(item)
print(total)
prints
1151

Along the lines of Moutch's answer, using a list comprehension:
input='[100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]'
DATA = [int(item) for item in input[1:-1].split(',')]
print(sum(DATA))
If the data string is the entire output of Read.py:
input="""Card read UID: 67,149,225,43
Size: 8
Sector 8 [100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]"""
#find index position of 'Sector' text and select from this using slices.
inputn = input[input.index('Sector')+9:]
DATA = [int(item) for item in inputn[1:-1].split(',')]
print(DATA)
print(sum(DATA))

If you have some guarantee about the source and nature of the data in that list (and you know the format will always be the same), Python's eval would work. For example:
original_string = 'Sector 8 [100, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]'
data_start_index = original_string.index('[') # find '['
data_string = original_string[data_start_index:] # extract the list
data = eval(data_string)
print(type(data)) # <class 'list'>
print(sum(data)) # 101
If you don't have these guarantees, you'll have to use the split method as suggested by Moutch, because eval is fragile and exploitable: it blindly executes whatever (potentially malicious) code is passed to it.
Edit: Use ast.literal_eval instead of plain old eval for safety guarantees. This still requires that the formatting of the string be consistent (e.g., that it always have square brackets) in order to properly evaluate to a Python list.
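A minimal sketch of the ast.literal_eval version, assuming each line looks like the 'Sector 8 [...]' output in the question; parse_sector_line is a hypothetical helper name. literal_eval only accepts Python literals, so a string containing code raises ValueError instead of being executed.

```python
import ast

def parse_sector_line(line):
    # Everything from the first '[' onwards, e.g. '[100, 1, 0, ...]'
    data_string = line[line.index('['):]
    data = ast.literal_eval(data_string)  # parses literals only, never runs code
    if not isinstance(data, list):
        raise ValueError('expected a list literal, got %r' % data_string)
    return data

DATA = parse_sector_line('Sector 8 [100, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]')
print(sum(DATA))  # 101
print(DATA[0])    # 100
```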

Trying to print vertically in Python

I am trying to create an image that needs to be printed vertically:
With the for loop, I can print an image fine by starting a new line for each value; however, I want the image rotated 90 degrees counterclockwise (is this a transpose?).
I tried to use from itertools import zip_longest but it gives:
TypeError: zip_longest argument #1 must support iteration
class Reservoir:
    def __init__(self, landscape):
        self.landscape = landscape
        self.image = ''
        for dam in landscape:
            self.image += '#'*dam + '\n'
        print(self.image)
landscape = [4, 3, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0,
             1, 1, 2, 5, 6, 5, 2, 2, 2, 3, 3, 3, 4, 5, 3, 2, 2]
lake = Reservoir(landscape)
print(lake)

I don't know of a function or library that does this for you, but you can code the rotation by hand.
You don't want to display a real image here, but to print characters that represent a landscape. You have to print the "image" line by line, and since your landscape array holds the number of '#' you want in each column, you loop over the total number of lines and, for each character in a line, print a ' ' or a '#' depending on the corresponding landscape column value.
With
h = max(landscape)
you calculate the total number of lines you want to print by finding the max of the landscape values.
Then you loop over these lines:
for line in reversed(range(h)):
In that loop, line takes the values 5, 4, 3, 2, 1 and 0 (for this landscape, h is 6).
For each line, you loop over the whole landscape array to decide, for each column, whether to print a space or a '#', depending on the value of the landscape column (v) and the current line:
for v in self.landscape:
    self.image += ' ' if line >= v else '#'
The full program:
class Reservoir:
    def __init__(self, landscape):
        self.landscape = landscape
        h = max(landscape)
        self.image = ''
        for line in reversed(range(h)):
            for v in self.landscape:
                self.image += ' ' if line >= v else '#'
            self.image += '\n'
landscape = [4, 3, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 5, 6, 5, 2, 2, 2, 3, 3, 3, 4, 5, 3, 2, 2]
lake = Reservoir(landscape)
print(lake.image)
The result:
                 #
                ###       #
#               ###      ##
##              ###   ######
####           ###############
######       #################
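Since the question mentions zip_longest: it does transpose rows and columns, but it needs iterables, and the TypeError most likely came from passing it the integers in landscape. A sketch, assuming one string of '#' per column; render_vertical is a hypothetical name. It should produce the same rows as the loop above, minus the final newline.

```python
from itertools import zip_longest

def render_vertical(landscape):
    # One string per column, bottom cell first: 4 -> '####', 0 -> ''
    columns = ['#' * v for v in landscape]
    # zip_longest transposes columns into rows, padding short columns with spaces
    rows = list(zip_longest(*columns, fillvalue=' '))
    # rows[0] is the bottom of the picture, so print the rows in reverse
    return '\n'.join(''.join(row) for row in reversed(rows))

landscape = [4, 3, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0,
             1, 1, 2, 5, 6, 5, 2, 2, 2, 3, 3, 3, 4, 5, 3, 2, 2]
print(render_vertical(landscape))
```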

Values from tuples named the same as user input in Python (Hard for me to explain, will update title if I can find a better way to express myself)

I am an extreme novice, taking a Python class to learn. This question has nothing to do with the class, just with a project I am working on as I go through the class (I am working on a simple game, using concepts that I learn in class to update, expand, and clean up my code.)
I am learning about Tuples, Lists, and Dictionaries at this time, and thought that simple tuples would clear up a lot of IF statements and streamline the code. However, I cannot get it to work exactly how I would like it to work.
Basically I have a set of tuples for all of my classes (note, these are training classifications, not Python classes). They have different numbers in them, and then I have a tuple with the names of all of the classes. At some point in the code, I ask for user input to determine a character's class. I would like to use that input to extract (is the correct term splice?) values from the tuple, for example to add whatever value is in the third position of the tuple to another value. Right now I cannot get the user input to associate with the tuple of the same name. Is there a way to do this?
# Class list
Apprentice = (6, 0, 0, 0, 0, 0, 0)
Warrior = (12, 2, 0, 2, 0, 0, 0)
Paladin = (14, 2, 0, 2, 1, 0, 1)
Barbarian = (12, 6, 0, 3, -1, -1, -1)
Blademaster = (10, 4, 4, 0, 0, 0, 0)
Assassin = (8, 0, 8, -2, 0, 0, 0)
Rogue = (8, 0, 4, 0, 0, 0, 0)
Monk = (10, 2, 2, 2, 2, 2, -4)
Bard = (8, 0, 0, 0, 0, 0, 4)
Mage = (6, 0, 0, 0, 2, 2, 0)
Priest = (6, 0, 0, 0, 1, 2, 1)
Wizard = (4, -2, -2, -2, 6, 8, 0)
Scholar = (6, -1, -1, 0, 4, 4, 0)
Necromancer = (6, 0, 0, 0, 6, 6, -5)
classList = ('Apprentice', 'Warrior', 'Priest', 'Mage', 'Wizard', 'Rogue', 'Bard', 'Paladin', 'Scholar', 'Necromancer', 'Barbarian', 'Blademaster', 'Assassin', 'Monk')
validClass = False
while validClass == False:
    charClass = raw_input('What type of training have you had (Class)? ')
    if charClass in classList:
        print ''
        validClass = True
    else:
        print 'That is not a valid class.'

You can do this by accessing the global variables, but I would suggest not doing it this way. A better way is to create a dictionary of classes as follows:
classes = {'Apprentice':Apprentice,'Warrior':Warrior, ...}
Then do something like this:
selected_class = None
while True:
    charClass = raw_input('What type of training have you had (Class)? ')
    if charClass in classes:
        selected_class = classes[charClass]
        break
    else:
        print 'That is not a valid class.'

You should use a dict:
my_class = dict(
Apprentice=(6, 0, 0, 0, 0, 0, 0),
Warrior=(12, 2, 0, 2, 0, 0, 0),
Paladin=(14, 2, 0, 2, 1, 0, 1),
Barbarian=(12, 6, 0, 3, -1, -1, -1),
Blademaster=(10, 4, 4, 0, 0, 0, 0),
Assassin=(8, 0, 8, -2, 0, 0, 0),
Rogue=(8, 0, 4, 0, 0, 0, 0),
Monk=(10, 2, 2, 2, 2, 2, -4),
Bard=(8, 0, 0, 0, 0, 0, 4),
Mage=(6, 0, 0, 0, 2, 2, 0),
Priest=(6, 0, 0, 0, 1, 2, 1),
Wizard=(4, -2, -2, -2, 6, 8, 0),
Scholar=(6, -1, -1, 0, 4, 4, 0),
Necromancer=(6, 0, 0, 0, 6, 6, -5),
)
while 1:
    try:
        val = my_class[raw_input('What type of training have you had (Class)? ')]
        break
    except KeyError:
        print 'That is not a valid class.'

It's better to use a dictionary but if it's an assignment and you're not allowed to use dicts you can do the following:
validClass = False
while validClass == False:
    charClass = raw_input('What type of training have you had (Class)? ')
    if charClass in classList:
        print eval(charClass)
        validClass = True
    else:
        print 'That is not a valid class.'
The eval function evaluates a string as Python code, so eval('Warrior') returns the Warrior tuple. Again, it's better to use a dictionary.

It's much better to use a dictionary, but if you aren't allowed to, you could use the vars() function, which (when called at module level) returns a dictionary of all the global variables.
validClass = False
while validClass == False:
    try:
        vals = vars()[raw_input('What type of training have you had (Class)? ')]
        validClass = True
    except KeyError:
        print 'That is not a valid class.'

Try storing each tuple in a dictionary, keyed by the class name as a string, instead of creating separate variables. Right now it is hard to link your classList with the stats, because the variable name of each class cannot be compared with the string name of each class (you can only compare a string to another string).
If you haven't worked with dictionaries before, try learning about them. I think they would be really helpful in this scenario!
http://www.tutorialspoint.com/python/python_dictionary.htm
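A Python 3 sketch of the dictionary approach the answers recommend (input instead of raw_input), showing how to reach the third value once a class is chosen. Only a few of the question's classes are listed, and ask_class and its read parameter are hypothetical names; read defaults to input and can be swapped out, for example in tests.

```python
# A few of the question's classes, keyed by name (the full table works the same way)
CLASSES = {
    'Apprentice': (6, 0, 0, 0, 0, 0, 0),
    'Warrior': (12, 2, 0, 2, 0, 0, 0),
    'Rogue': (8, 0, 4, 0, 0, 0, 0),
}

def ask_class(read=input):
    """Prompt until the user types a known class; return (name, stats)."""
    while True:
        name = read('What type of training have you had (Class)? ').strip()
        if name in CLASSES:
            return name, CLASSES[name]
        print('That is not a valid class.')
```

Calling name, stats = ask_class() gives you the matching tuple, and stats[2] is its third value (indexing starts at 0), which you can add to another number as usual.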

Convert string to list of bits and vice versa

I need to convert an ASCII string into a list of bits and vice versa:
str = "Hi" -> [0,1,0,0,1,0,0,0,0,1,1,0,1,0,0,1]
[0,1,0,0,1,0,0,0,0,1,1,0,1,0,0,1] -> "Hi"

There are many ways to do this with library functions. But I am partial to the third-party bitarray module.
>>> import bitarray
>>> ba = bitarray.bitarray()
Conversion from strings requires a bit of ceremony. Once upon a time, you could just use fromstring, but that method is now deprecated, since it has to implicitly encode the string into bytes. To avoid the inevitable encoding errors, it's better to pass a bytes object to frombytes. When starting from a string, that means you have to specify an encoding explicitly -- which is good practice anyway.
>>> ba.frombytes('Hi'.encode('utf-8'))
>>> ba
bitarray('0100100001101001')
Conversion to a list is easy. (Also, bitarray objects already have a lot of list-like methods.)
>>> l = ba.tolist()
>>> l
[False, True, False, False, True, False, False, False,
False, True, True, False, True, False, False, True]
bitarrays can be created from any iterable:
>>> bitarray.bitarray(l)
bitarray('0100100001101001')
Conversion back to bytes or strings is relatively easy too:
>>> bitarray.bitarray(l).tobytes().decode('utf-8')
'Hi'
And for the sake of sheer entertainment:
>>> def s_to_bitlist(s):
...     ords = (ord(c) for c in s)
...     shifts = (7, 6, 5, 4, 3, 2, 1, 0)
...     return [(o >> shift) & 1 for o in ords for shift in shifts]
...
>>> def bitlist_to_chars(bl):
...     bi = iter(bl)
...     bytes = zip(*(bi,) * 8)
...     shifts = (7, 6, 5, 4, 3, 2, 1, 0)
...     for byte in bytes:
...         yield chr(sum(bit << s for bit, s in zip(byte, shifts)))
...
>>> def bitlist_to_s(bl):
...     return ''.join(bitlist_to_chars(bl))
...
>>> s_to_bitlist('Hi')
[0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
>>> bitlist_to_s(s_to_bitlist('Hi'))
'Hi'

There are probably faster ways to do this, but using no extra modules:
def tobits(s):
    result = []
    for c in s:
        bits = bin(ord(c))[2:]
        bits = '00000000'[len(bits):] + bits
        result.extend([int(b) for b in bits])
    return result
def frombits(bits):
    chars = []
    # Integer division so that range() also gets an int in Python 3
    for b in range(len(bits) // 8):
        byte = bits[b*8:(b+1)*8]
        chars.append(chr(int(''.join([str(bit) for bit in byte]), 2)))
    return ''.join(chars)

Not sure why you'd want them, but here are two ugly one-liners using only built-ins:
s = "Hi"
l = list(map(int, ''.join([bin(ord(i)).lstrip('0b').rjust(8,'0') for i in s])))
s = "".join(chr(int("".join(map(str,l[i:i+8])),2)) for i in range(0,len(l),8))
yields:
>>> l
[0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
>>> s
'Hi'
In real world code, use the struct or the bitarray module.

You could use the built-in bytearray:
>>> for i in bytearray('Hi', 'ascii'):
... print(i)
...
72
105
>>> bytearray([72, 105]).decode('ascii')
'Hi'
And use bin() to convert each byte to binary.
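Putting bytearray and binary formatting together gives a round trip in Python 3. A sketch, assuming UTF-8 as the encoding (the question only mentions ASCII, which UTF-8 covers); str_to_bits and bits_to_str are hypothetical names. format(byte, '08b') keeps the leading zeros that bin() drops.

```python
def str_to_bits(s, encoding='utf-8'):
    # One byte at a time, most significant bit first, always 8 bits per byte
    return [int(b) for byte in bytearray(s, encoding) for b in format(byte, '08b')]

def bits_to_str(bits, encoding='utf-8'):
    # Regroup into 8-bit chunks and read each chunk as a base-2 number
    data = bytearray(
        int(''.join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)
    )
    return data.decode(encoding)

print(str_to_bits('Hi'))  # [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
print(bits_to_str(str_to_bits('Hi')))  # Hi
```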

def text_to_bits(text):
    """
    >>> text_to_bits("Hi")
    [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
    """
    bits = bin(int.from_bytes(text.encode(), 'big'))[2:]
    return list(map(int, bits.zfill(8 * ((len(bits) + 7) // 8))))
def text_from_bits(bits):
    """
    >>> text_from_bits([0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1])
    'Hi'
    """
    n = int(''.join(map(str, bits)), 2)
    return n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()
See also, Convert Binary to ASCII and vice versa (Python).

def to_bin(string):
    res = ''
    for char in string:
        tmp = bin(ord(char))[2:]
        tmp = '%08d' % int(tmp)
        res += tmp
    return res
def to_str(string):
    res = ''
    # Integer division so that range() also gets an int in Python 3
    for idx in range(len(string) // 8):
        tmp = chr(int(string[idx*8:(idx+1)*8], 2))
        res += tmp
    return res
These functions are really simple, and they don't use any third-party module.

A few speed comparisons. Each of these was run using
python -m timeit "code"
or
cat <<-EOF | python -m timeit
code
EOF
if multiline.
Bits to Byte
A: 100000000 loops, best of 3: 0.00838 usec per loop
res = 0
for idx,x in enumerate([0,0,1,0,1,0,0,1]):
res |= (x << idx)
B: 100000000 loops, best of 3: 0.00838 usec per loop
int(''.join(map(str, [0,0,1,0,1,0,0,1])), 2)
Byte to Bits
A: 100000000 loops, best of 3: 0.00836 usec per loop
[(41 >> x) & 1 for x in range(7, -1, -1)]
B: 100000 loops, best of 3: 2.07 usec per loop
map(int, bin(41)[2:])
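Note that python -m timeit takes the statement from its command-line arguments (one argument per line) and times pass when none is given; it does not read standard input, so the heredoc runs above most likely timed an empty statement. Also, snippet A for bits to byte reads the list least significant bit first (148 for this input) while B reads it most significant bit first (41), and map(int, bin(41)[2:]) drops the leading zeros. The sketch below re-measures with the timeit module using MSB-first versions that agree with each other; the function names are hypothetical and no timings are claimed here.

```python
import timeit

BITS = [0, 0, 1, 0, 1, 0, 0, 1]  # 0b00101001 == 41, most significant bit first

def bits_to_byte_shift(bits):
    # Shift the accumulator left and OR in each bit (MSB first)
    res = 0
    for x in bits:
        res = (res << 1) | x
    return res

def bits_to_byte_str(bits):
    return int(''.join(map(str, bits)), 2)

def byte_to_bits_shift(n):
    return [(n >> x) & 1 for x in range(7, -1, -1)]

def byte_to_bits_str(n):
    # format(n, '08b') keeps leading zeros, unlike bin(n)[2:]
    return [int(b) for b in format(n, '08b')]

if __name__ == '__main__':
    runs = 100000
    for func, arg in [(bits_to_byte_shift, BITS), (bits_to_byte_str, BITS),
                      (byte_to_bits_shift, 41), (byte_to_bits_str, 41)]:
        # The lambda is called immediately inside this iteration, so it sees the current func/arg
        seconds = timeit.timeit(lambda: func(arg), number=runs)
        print('%s: %.3f usec per call' % (func.__name__, seconds / runs * 1e6))
```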

import math
class BitList:
    def __init__(self, value):
        if isinstance(value, str):
            value = sum([bytearray(value, "utf-8")[-i - 1] << (8*i) for i in range(len(bytearray(value, "utf-8")))])
        try:
            self.value = sum([value[-i - 1] << i for i in range(len(value))])
        except Exception:
            self.value = value
    def __getitem__(self, index):
        if isinstance(index, slice):
            if index.step is not None and index.step != 1:
                return list(self)[index]
            else:
                start = index.start if index.start else 0
                stop = index.stop if index.stop is not None else len(self)
                return BitList(math.floor((self.value % (2 ** (len(self) - start))) >> (len(self) - stop)))
        else:
            return bool(self[index:index + 1].value)
    def __len__(self):
        # Exact for any non-negative int, unlike ceil(log2(value + 1)) on large values
        return self.value.bit_length()
    def __str__(self):
        # __str__ must return a str, not an int
        return str(self.value)
    def __repr__(self):
        return "BitList(" + str(self.value) + ")"
    def __iter__(self):
        yield from [self[i] for i in range(len(self))]
Then you can initialize BitList with a number or a list (of numbers or booleans), then you can get its value, get positional items, get slices, and convert it to a list. Note: Cannot currently set items, but when I add that I will edit this post.
I made this myself, then went looking for how to convert a string (or a file) into a list of bits, and figured that out from another answer.

This works for ASCII text, but PEP 8 would not approve (long lines, complex lambdas):
tobits = lambda x: "".join(map(lambda y:'00000000'[len(bin(ord(y))[2:]):]+bin(ord(y))[2:],x))
frombits = lambda x: ''.join([chr(int(str(y), 2)) for y in [x[y:y+8] for y in range(0,len(x),8)]])
These are used like normal functions.

Because I like generators, I'll post my version here:
def bits(s):
    for c in s:
        yield from (int(bit) for bit in bin(ord(c))[2:].zfill(8))
def from_bits(b):
    for i in range(0, len(b), 8):
        yield chr(int(''.join(str(bit) for bit in b[i:i + 8]), 2))
print(list(bits('Hi')))
# [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
print(''.join(from_bits([0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1])))
# Hi

If you have bits in a list, you can join them into a string and parse that string as a base-2 number. The resulting integer behaves like a bit string, so bitwise operations can be applied to it.
For example:
int(''.join(map(str, [1, 0, 0, 1])), 2) | int(''.join(map(str, [1, 0, 1, 1])), 2)  # 11, i.e. 0b1011
