Is it possible to replace one character by two using maketrans? - python

I want to replace the æ character with ae. How can I do that? Here is my attempt with maketrans and translate:
word = 'være'
letters = ('å','ø', 'æ')
replacements = ('a','o','ae')
table = word.maketrans(letters, replacements)
#table = word.maketrans(''.join(letters),''.join(replacements))
word_translated = word.translate(table)
print(word_translated)
It generates these errors (the first with the tuple arguments, the second with the commented-out join variant):
TypeError: maketrans() argument 2 must be str, not tuple
ValueError: the first two maketrans arguments must have equal length

Yes, it's possible. You need to supply a dict as the single argument to maketrans(). As stated in the docs:
If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters (strings of length 1) to Unicode ordinals, strings (of arbitrary lengths) or None. Character keys will then be converted to ordinals.
word = 'være'
letters = ('å','ø', 'æ')
replacements = ('a','o','ae')
table = word.maketrans(dict(zip(letters, replacements)))
word_translated = word.translate(table)
print(word_translated)
output
vaere
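As a small variation on the answer above, the mapping can also be written directly as a dict literal and passed to str.maketrans, which is a static method, so no string instance is needed to build the table. This is only a sketch; the names REPLACEMENTS, TABLE and transliterate are made up for illustration.

```python
# Mapping from single characters to replacement strings of any length.
# REPLACEMENTS, TABLE and transliterate are hypothetical names for this example.
REPLACEMENTS = {'å': 'a', 'ø': 'o', 'æ': 'ae'}

# With a single dict argument, maketrans accepts multi-character replacements.
TABLE = str.maketrans(REPLACEMENTS)


def transliterate(word):
    """Return word with every mapped character replaced."""
    return word.translate(TABLE)


print(transliterate('være'))
```

Building the table once and reusing it avoids recreating the mapping for every word.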

Related

From the 4 numbers code-point to the unicode character?

I've got a 4-digit string corresponding to the code point of a Unicode character.
I need to dynamically convert it to its Unicode character to be stored inside a variable.
For example, my program will produce during its loop a variable a = '0590'. (https://www.compart.com/en/unicode/U+0590)
How do I get the variable b = '\u0590'?
I've tried string concatenation '\u' + a but obviously it's not the way.
chr takes a code point as an integer and returns the corresponding character, so the string has to be converted to an integer first. Because '0590' is a hexadecimal code point, pass an explicit radix of 16 to int (a plain int(a) would read it as decimal 590, which is a different character).
a = '0590'
result = chr(int(a, 16))
print(result)
On Python 2, the function is called unichr, not chr.
a = '0590'
result = unichr(int(a, 16))
print(result)
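For Python 3, the conversion can be wrapped in a small helper and checked against the escape-sequence form from the question. This is a sketch; the function name codepoint_to_char is hypothetical.

```python
def codepoint_to_char(hex_digits):
    """Convert a string of hex digits such as '0590' to the matching character.

    codepoint_to_char is a hypothetical helper name used for illustration.
    """
    # int(..., 16) parses the digits as hexadecimal; chr maps the code point
    # to a one-character string.
    return chr(int(hex_digits, 16))


b = codepoint_to_char('0590')
print(b == '\u0590')
```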

Convert number values into ascii characters?

The part where I need to go from the number values I obtained to characters to spell out a word is not working. It says I need to use an integer for the last part?
# accept string
print "This program reduces and decodes a coded message and determines if it is a palindrome"
string=(str(raw_input("The code is:")))
# change it to lower case
string_lowercase=string.lower()
print "lower case string is:", string_lowercase
# strip special characters
specialcharacters="1234567890~`!##$%^&*()_-+={[}]|\:;'<,>.?/"
for char in specialcharacters:
    string_lowercase=string_lowercase.replace(char,"")
print "With the specials stripped out the string is:", string_lowercase
# input offset
offset=(int(raw_input("enter offset:")))
# conversion of text to ASCII code
result=[]
for i in string_lowercase:
    code=ord(i)
    result.append([code-offset])
# conversion from ASCII code to text
text=''.join(chr(i) for i in result)
print "The decoded string is:", text.format(chr(result))
It looks like you have a list of lists instead of a list of ints when you call result.append([code-offset]). This means later when you call chr(i) for i in result, you are passing a list instead of an int to chr().
Try changing this to result.append(code-offset).
Other small suggestions:
raw_input already gives you a string, so there's no need to explicitly cast it.
Your removal of special characters can be more efficiently written as:
special_characters = "1234567890~`!##$%^&*()_-+={[}]|\\:;'<,>.?/"
string_lowercase = ''.join(c for c in string_lowercase if c not in special_characters)
This iterates through string_lowercase once instead of once per character in special_characters.
When appending to the list, use code-offset instead of [code-offset]. With the brackets, each element you store is a one-item list rather than the ASCII value itself.
Hence your code should be:
result = []
for i in string_lowercase:
    code = ord(i)
    result.append(code-offset)
However, you can simplify this code to:
result = [ord(ch)-offset for ch in string_lowercase]
You can simplify the code even further. A single line that produces the decoded string:
decoded_string = ''.join(chr(ord(ch)-offset) for ch in string_lowercase)
Example with offset as 2:
>>> string_lowercase = 'abcdefghijklmnopqrstuvwxyz'
>>> offset = 2
>>> decoded_string = ''.join(chr(ord(ch)-offset) for ch in string_lowercase)
>>> decoded_string
'_`abcdefghijklmnopqrstuvwx'
You are passing a list to chr when it only accepts integers. Try result.append(code-offset). [code-offset] is a one-item list.
Specifically, instead of:
result=[]
for i in string_lowercase:
    code=ord(i)
    result.append([code-offset])
use:
result=[]
for i in string_lowercase:
    code=ord(i)
    result.append(code-offset)
If you understand list comprehension, this works too: result = [ord(i)-offset for i in string_lowercase]
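Putting the fixes from the answers above together, a Python 3 sketch of the whole decoding step might look like the following. The function name decode is hypothetical, and using string.digits plus string.punctuation as the set of characters to strip is an assumption that only approximates the hand-written list in the question.

```python
import string

# Assumption: digits and ASCII punctuation stand in for the question's
# hand-written list of special characters.
SPECIAL_CHARACTERS = set(string.digits + string.punctuation)


def decode(message, offset):
    """Lowercase message, drop special characters, shift each character back by offset."""
    cleaned = (ch for ch in message.lower() if ch not in SPECIAL_CHARACTERS)
    # chr() receives a plain int here, not a one-item list.
    return ''.join(chr(ord(ch) - offset) for ch in cleaned)


print(decode('Cde!', 2))
```

As in the original code, there is no wrap-around, so with an offset of 2 the letter 'a' becomes '_'.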

How do I unstring in python

So I am trying to make a text encryption-ish program that will change the letters in a text to a differently ordered alphabet. However, a = key[1] (key being the name of the rearranged alphabet) doesn't work because key[1] can't be assigned to a literal. Any ideas of how to get around this?
So key is your rearranged alphabet, and ALPHA is the normal alphabet.
ALPHA = 'abcdefghijklmnopqrstuvwxyz'
key = 'zwlsqpugxackmirnhfdvbjoeyt'
msg = 'secretmessage'
code = []
for i in msg:
    code.append(key[ALPHA.index(i)])
print(''.join(code))
Make the string after encoding, rather than during encoding.
Strings in Python, and many other languages, are immutable, because reasons.
What you need is to create a new string, replacing characters as needed.
For byte strings (that in Python are plain byte arrays) there's .translate. It takes a 256-byte string that describes how to replace each possible byte.
For Unicode strings .translate takes a map which is a bit more convenient but still maybe cumbersome:
unicode('foo bar').translate({ord('f'): u'F', ord('b'): u'B'})
In general case, something like this should work:
def transform_char(char):
    # shift a character 3 positions forward
    return chr(ord(char) + 3)
def transform(source_string):
    return ''.join(transform_char(c) for c in source_string)
What happens in transform? The expression transform_char(c) for c in source_string is a generator expression; written in square brackets, [transform_char(c) for c in source_string], it would be a "list comprehension". It produces a transformed character for each character in source_string. Then all of these characters are efficiently joined together by placing an empty string '' between them.
I hope it's enough for you now.
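In Python 3, the same substitution can be expressed with str.maketrans and str.translate, reusing the ALPHA and key strings from the first answer. This is a sketch rather than a definitive implementation; the names ENCODE_TABLE, DECODE_TABLE, encode and decode are hypothetical.

```python
ALPHA = 'abcdefghijklmnopqrstuvwxyz'
key = 'zwlsqpugxackmirnhfdvbjoeyt'

# Two equal-length strings map characters position by position.
ENCODE_TABLE = str.maketrans(ALPHA, key)
# Swapping the arguments gives the inverse mapping, since key is a permutation.
DECODE_TABLE = str.maketrans(key, ALPHA)


def encode(msg):
    """Return a new string with every letter replaced via the key alphabet."""
    return msg.translate(ENCODE_TABLE)


def decode(code):
    """Undo encode()."""
    return code.translate(DECODE_TABLE)


secret = encode('secretmessage')
print(secret, decode(secret))
```

Characters that are not in ALPHA, such as spaces or punctuation, pass through unchanged.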

Get character position in alphabet

I'm 90% sure there is a built in function that does this.
I need to find the position of a character in an alphabet. So the character "b" is position 1 (counting from 0), etc. Does anyone know what the function is called?
What I'm trying to do is to send all the characters X "steps" back in the alphabet, so if I have the string "hi", it would become "gh" if I send it back one step. There might be a better way of doing it, any tips?
It is called index. For example:
>>> import string
>>> string.lowercase.index('b')
1
>>>
Note: in Python 3, string.lowercase has been renamed to string.ascii_lowercase.
Without the import
def char_position(letter):
    return ord(letter) - 97
def pos_to_char(pos):
    return chr(pos + 97)
You can use ord() to get a character's ASCII position, and chr() to convert an ASCII position into a character.
EDIT: Updated to wrap alphabet so a-1 maps to z and z+1 maps to a
For example:
my_string = "zebra"
difference = -1
new_string = ''.join((chr(97+(ord(letter)-97+difference) % 26) for letter in my_string))
This will create a string with all the characters moved one space backwards in the alphabet ('ydaqz'). It will only work for lowercase words.
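If mixed-case text and punctuation also need to be handled, the same modular arithmetic can be applied per case while leaving every other character alone. This is a sketch under that assumption; the function name shift_letters is hypothetical.

```python
def shift_letters(text, steps):
    """Shift ASCII letters by steps positions, wrapping around the alphabet.

    Non-letters are copied unchanged. shift_letters is a hypothetical name.
    """
    shifted = []
    for ch in text:
        if 'a' <= ch <= 'z':
            base = ord('a')
        elif 'A' <= ch <= 'Z':
            base = ord('A')
        else:
            shifted.append(ch)  # leave spaces, digits, punctuation as they are
            continue
        # Position within the alphabet, moved by steps, wrapped with % 26.
        shifted.append(chr(base + (ord(ch) - base + steps) % 26))
    return ''.join(shifted)


print(shift_letters('zebra', -1))
print(shift_letters('hi', -1))
```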
# define an alphabet
alfa = "abcdefghijklmnopqrstuvwxyz"
# define reverse lookup dict
rdict = dict([ (x[1],x[0]) for x in enumerate(alfa) ])
print alfa[1] # should print b
print rdict["b"] # should print 1
rdict is a dictionary that is created by stepping through the alphabet, one character at a time. The enumerate function returns a tuple with the list index, and the character. We reverse the order by creating a new tuple with this code: ( x[1], x[0]) and then turn the list of tuples into a dictionary. Since a dictionary is a hash table (key, value) data structure, we can now look up the index of any alphabet character.
However, that is not what you want to solve your problem, and if this is a class assignment you would probably get 0 for plagiarism if you submit it. For encoding the strings, first create a SECOND alphabet that is organised so that alfa2[n] is the encoded form of alfa[n]. In your example, the second alphabet would be just shifted by two characters but you could also randomly shuffle the characters or use some other pattern to order them. All of this would continue to work with other alphabets such as Greek, Cyrillic, etc.
I've only just started learning Python, so I have no idea how efficient this is compared to the other methods, but it works. Also, it doesn't matter whether the text is upper case, lower case or if there is any punctuation etc.
If you want to change all letters:
from string import maketrans
textin = "abcdefghijklmnopqrstuvwxyz"
textout = "cdefghijklmnopqrstuvwxyzab"
texttrans = maketrans(textin, textout)
text = "qcc, gr umpiq"
print text.translate(texttrans)
Also works to change some characters:
from string import maketrans
textin = "81972"
textout = "Seios"
texttrans = maketrans(textin, textout)
text = "811, 9t w7rk2"
print text.translate(texttrans)
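The two snippets above are written for Python 2. In Python 3, maketrans is no longer imported from the string module; it is available as str.maketrans, and print is a function. A sketch of the equivalent code (the variable names decoded_letters and decoded_digits are made up for this example):

```python
# Python 3: str.maketrans replaces `from string import maketrans`.
textin = "abcdefghijklmnopqrstuvwxyz"
textout = "cdefghijklmnopqrstuvwxyzab"
texttrans = str.maketrans(textin, textout)
decoded_letters = "qcc, gr umpiq".translate(texttrans)
print(decoded_letters)

# Translating only a few characters works the same way.
digit_trans = str.maketrans("81972", "Seios")
decoded_digits = "811, 9t w7rk2".translate(digit_trans)
print(decoded_digits)
```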
Here's a catch all method that might be useful for someone...
def alphabet(arg, return_lower=True):
    """
    Indexing the English alphabet consisting of 26 letters.
    Note: zero indexed
    example usage:
    alphabet('a')
    >> 0
    alphabet(25, return_lower=False)
    >> 'Z'
    :param arg: Either type int or type chr specifying the \
                index of desired letter or the letter at \
                the desired index respectively.
    :param return_lower: If index is passed, returns letter \
                         with corresponding case. Default is \
                         set to True (lower case returned).
    :returns: integer representing index of passed character \
              or character at passed index.
    """
    arg = str(arg)
    assert arg.isdigit() or arg.isalpha()
    if arg.isdigit():
        if return_lower:
            return chr(int(arg) + 97).lower()
        return chr(int(arg) + 97).upper()
    return ord(arg.lower()) - 97
Equivalent of the COLUMN function in Excel (zero-based):
def position(word):
    if len(word)>1:
        pos = 0
        for idx, letter in enumerate(word[::-1]):
            pos += (position(letter)+(1 if idx!=0 else 0))*26**(idx)
        return pos
    return ord(word.lower()) - 97
print(position("A"))  # --> 0
print(position("AA"))  # --> 26
print(position("AZ"))  # --> 51

Why does csvwriter.writerow() put a comma after each character?

This code opens the URL and appends the /names at the end and opens the page and prints the string to test1.csv:
import urllib2
import re
import csv
url = ("http://www.example.com")
bios = [u'/name1', u'/name2', u'/name3']
csvwriter = csv.writer(open("/test1.csv", "a"))
for l in bios:
    OpenThisLink = url + l
    response = urllib2.urlopen(OpenThisLink)
    html = response.read()
    item = re.search('(JD)(.*?)(\d+)', html)
    if item:
        JD = item.group()
        csvwriter.writerow(JD)
    else:
        NoJD = "NoJD"
        csvwriter.writerow(NoJD)
But I get this result:
J,D,",", ,C,o,l,u,m,b,i,a, ,L,a,w, ,S,c,h,o,o,l,....
If I change the string to ("JD", "Columbia Law School" ....) then I get
JD, Columbia Law School...)
I couldn't find in the documentation how to specify the delimeter.
If I try to use delimeter I get this error:
TypeError: 'delimeter' is an invalid keyword argument for this function
It expects a sequence (eg: a list or tuple) of strings. You're giving it a single string. A string happens to be a sequence of strings too, but it's a sequence of 1 character strings, which isn't what you want.
If you just want one string per row you could do something like this:
csvwriter.writerow([JD])
This wraps JD (a string) with a list.
The csv.writer class takes an iterable as its argument to writerow; as strings in Python are iterable by character, they are an acceptable argument to writerow, but you get the above output.
To correct this, you could split the value based on whitespace (I'm assuming that's what you want)
csvwriter.writerow(JD.split())
This happens because when the group() method of a MatchObject instance returns only a single value, it returns it as a string. When there are multiple values, they are returned as a tuple of strings.
If you are writing a row, I guess, csv.writer iterates over the object you pass to it. If you pass a single string (which is an iterable), it iterates over its characters, producing the result you are observing. If you pass a tuple of strings, it gets an actual string, not a single character on every iteration.
To put it another way - if you add square brackets around the whole output, it will be treated as one item, so commas won't be added. e.g. instead of:
spamwriter.writerow(matrix[row]['id'],matrix[row]['value'])
use:
spamwriter.writerow([matrix[row]['id'] + ',' + matrix[row]['value']])
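The difference between passing a string and passing a list can be reproduced without touching the file system by writing into an in-memory buffer. This is a Python 3 sketch; the helper name rows_as_text and the sample value are hypothetical. Note also that the keyword argument is spelled delimiter; the spelling delimeter is what triggers the TypeError quoted in the question.

```python
import csv
import io


def rows_as_text(rows, delimiter=','):
    """Write each item of rows with csv.writer and return the resulting text.

    rows_as_text is a hypothetical helper used only for this demonstration.
    """
    buffer = io.StringIO()
    # lineterminator is set explicitly so the output is easy to compare.
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator='\n')
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


value = 'JD 2010'
per_character = rows_as_text([value])    # the string is iterated character by character
single_field = rows_as_text([[value]])   # the list holds one field: the whole string
print(per_character, single_field, sep='')
```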
