PyEnchant weird behavior for numbers - python

I am using PyEnchant for some spelling/grammar correction scripting. I have noticed this behavior on my Mac:
>>> import enchant
>>> d = enchant.Dict('en_us')
>>> d.suggest('50')
['W', 'Y', 'w', 'y', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'X', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'x', 'z']
>>> enchant.__version__
'1.6.6'
However, it works more predictably on my linux machine (same version of pyenchant)
>>> import enchant
>>> d = enchant.Dict('en_us')
>>> d.suggest('50')
['5', '0', '50s']

It is due to the underlying provider. On Ubuntu I have an en_US dictionary installed for both myspell and aspell. If I switch providers I get different results. E.g. with a script like this:
import enchant
b = enchant.Broker()
b.set_ordering("en_US","myspell,aspell")
print b.describe()
d=b.request_dict("en_US")
print d.provider
s = '50'
print d.suggest(s)
b = enchant.Broker()
b.set_ordering("en_US","aspell,myspell")
print b.describe()
d=b.request_dict("en_US")
print d.provider
s = '50'
print d.suggest(s)
I get the following output.
[<Enchant: Aspell Provider>, <Enchant: Ispell Provider>, <Enchant: Myspell Provider>, <Enchant: Hspell Provider>]
<Enchant: Myspell Provider>
['5', '0', '50s']
[<Enchant: Aspell Provider>, <Enchant: Ispell Provider>, <Enchant: Myspell Provider>, <Enchant: Hspell Provider>]
<Enchant: Aspell Provider>
['W', 'Y', 'w', 'y', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'X', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'x', 'z']
The first set of suggestions is what you are seeing on Linux but I am using Myspell Provider. The second is what you are seeing on your Mac and I am using Aspell Provider.

Related

Delete a string from an array in Python

I'm writing some code in Python to read from a file some text, and make a 2-dimensional array from it. But when I make the array, in the last spot of the first 2 array(of three) there is : '\n', and I want delete it.
This is the file(data.txt):
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z
And this is the Python code:
data = open("data.txt", mode="r")
arr = data.readlines()
for i in range(len(arr)):
arr[i] = list(arr[i].split(","))
#here I tryed to detele it
for i in range(len(arr)):
if arr[i][len(arr[i])-1] == '\\n':
del arr[len(arr[i])-1]
data.close()
This is the result of the code(but there is anyway '\n'):
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '\n']
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '\n']
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
How I could delete those?
Short solution using str.rstrip() and str.splitlines() functions:
with open('data.txt', 'r') as f:
items = [l.rstrip(',').split(',') for l in f.read().splitlines()]
print(items)
The output:
[['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'], ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'], ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']]
You can use rstrip and list comprehension.
with open("data.txt", 'r', encoding="utf-8") as file:
array = [line.rstrip(',\n').split(',') for line in file]
You can filter the list and only keep values that are not "\n":
for i in range(len(arr)):
arr[i] = [b for b in arr[i] if b != "\n"]
You can just strip the \n as you read the lines:
arr = []
with open('data.txt', mode="r") as f:
for line in f.readlines():
arr.append(line.strip(',\n').split(','))
print arr

Python: Alphabet array sort

I was trying a sample exercise on regexes. To find all the letters of the alphabets. Sort the array, and finally eliminate all repetitions.
>>> letterRegex = re.compile(r'[a-z]')
>>> alphabets = letterRegex.findall("The quick brown fox jumped over the lazy dog")
>>> alphabets.sort()
>>> alphabets
['a', 'b', 'c', 'd', 'd', 'e', 'e', 'e', 'e', 'f', 'g', 'h', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'o', 'o', 'o', 'p', 'q', 'r', 'r', 't', 'u', 'u', 'v', 'w', 'x', 'y', 'z']
After doing the sort I tried to make a loop that'll eliminate all repetitions in the array.
e.g [...'e', 'e'...]
So I did this
>>> i, j = -1,0
>>> for items in range(len(alphabets)):
if alphabets[i+1] == alphabets[j+1]:
alphabets.remove(alphabets[j])
However it didn't work. How can I remove repetitons?
Here's a much easier way of removing co-occurrences:
import itertools
L = ['a', 'b', 'c', 'd', 'd', 'e', 'e', 'e', 'e', 'f', 'g', 'h', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'o', 'o', 'o', 'p', 'q', 'r', 'r', 't', 'u', 'u', 'v', 'w', 'x', 'y', 'z']
answer = []
for k,_group in itertools.groupby(L):
answer.append(k)
Or simpler still:
answer = [k for k,_g in itertools.groupby(L)]
Both yield this:
In [42]: print(answer)
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 't', 'u', 'v', 'w', 'x', 'y', 'z']

split long string inside one list to small lists

How do you split this long string of one list to small multi-lists as show on the output ? (I have file has 100 lines)
Num=['S', 'I', 'R', 'T', 'S', 'A', 'V', 'P', 'S', 'P', 'C', 'G', 'K', 'Y', 'Y', 'T', 'L', 'N', 'G', 'S', 'K', '\n', ',', 'S', 'T', 'P', 'C', 'T', 'T', 'I', 'N', 'K', 'V', 'K', 'A', 'S', 'G', 'M', 'K', 'A', 'I', 'M', 'M', 'A', '\n']
Output should look like this:
['S', 'I', 'R', 'T', 'S', 'A', 'V', 'P', 'S', 'P', 'K', 'G', 'K', 'Y', 'Y', 'T', 'L', 'N', 'G', 'S', 'K']
['S', 'T', 'P', 'C', 'T', 'T', 'I', 'N', 'K', 'V', 'K', 'A', 'S', 'G', 'M', 'K', 'A', 'I', 'M', 'M', 'A']
First join the elements, strip() leading-trailing whitespace characters, split on a new line \n and comma , and then map them to a list again.
In short:
l1, l2 = map(list, "".join(Num).strip().split('\n,'))
Now, l1, l2 look, respectively:
['S', 'I', 'R', 'T', 'S', 'A', 'V', 'P', 'S', 'P', 'C', 'G', 'K', 'Y', 'Y', 'T', 'L', 'N', 'G', 'S', 'K']
and
['S', 'T', 'P', 'C', 'T', 'T', 'I', 'N', 'K', 'V', 'K', 'A', 'S', 'G', 'M', 'K', 'A', 'I', 'M', 'M', 'A']

Difference between string and list and usuage of join

I want to understand why and how python treats below example as string when it is actually a list. Does join(expression) treats expression as string?
EXAMPLE 1
available_letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r',
's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
print 'Available letters: ' + ''.join(available_letters)
When I check type of ' '.join(available_letters) ,it has type string.
But if I use
print 'Available letters: ' + available_letters
then I get error cannot concatenate a string and a list.
This is the reason:
>>> available_letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r',
's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
>>> x = ' '.join(available_letters)
>>> available_letters
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
>>> x
'a b c d e f g h i j k l m n o p q r s t u v w x y z'
>>> type(available_letters)
<type 'list'>
>>> type(x)
<type 'str'>
As you see above, the availabe_letters is a variable of list type itself. But when you call ' '.join method on it, its output is a string. concatenating of list and str is not correct, but concatenating of strs is okay.
Python3:
available_letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r',
's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
print ('Available letters: ', ''.join(available_letters))

Generate a matrix from a given string

I need to create a matrix from a string s where m is the given number of rows and len(s)/m is the number of columns. First column must be filled with the first m chars in the string s (I.E.: 0*m+i chars for every i in range(m) ); the second column with the 1*m+i and so on.
What's the best way to do this in python?
EDIT:
this is the code I wrote by now.
def split_by_n( seq, n ):
"""A generator to divide a sequence into chunks of n units."""
while seq:
yield seq[:n]
seq = seq[n:]
#print list(split_by_n("1234567890",2))
input=list("ZPFKYLGJPNSGNMQGFGCITLVRIWMGFBLBFDSIOAJGBGAVFVHBGLFSRPNIOFSYOBTFCGRQLWWZAAJFUPGAFZSNXLTGARUVFKOLGAIWGUUCMVSEKLIAGJGGUZFBAOILVRIZPORNXWVFRGNMEGCEUNUZSPNIUAHFRQLWALHWEQGQKDFDCCKLUZWFSITKWIKLSMUQKNJUWRTKZAHJGABKDEGEMNCVIMBFRNYXSSKYPWLWHUKKISHFAJPOOFGJBJTBXXSGTRYAJGBNRMYHOGXQBLSFEWVUCHRLEJWAQBIWFRLWSSKRKSBFRAKDFJVRGZUOCJUZEKWAPIQSBRYM")
l = list(split_by_n(input,6))
for i in range(len(l[-2])-len(l[-1])):
l[-1].append('$')
print l
I learned from your comment that you want to make transpose of your matrix that is formed from a given string. Your code creates the matrix from a given string just fine. I have tweaked your code only slightly, and added code for making transpose.
def split_by_n( seq, n ):
while seq:
yield seq[:n]
seq = seq[n:]
def make_matrix(string):
col_count = 6
matrix = list(split_by_n(string,6))
row_count = len(matrix)
# the last row has "less_by" fewer elements than the rest of the rows
less_by = len(matrix[-2]) - len(matrix[-1])
matrix[-1] += '$' * less_by
return matrix
def make_transpose(matrix):
col_count = len(matrix[0])
transpose = []
for i in range(col_count):
transpose.append([row[i] for row in matrix])
return transpose
string = list("ZPFKYLGJPNSGNMQGFGCITLVRIWMGFBLBFDSIOAJGBGAVFVHBGLFSRPNIOFSYOBTFCGRQLWWZAAJFUPGAFZSNXLTGARUVFKOLGAIWGUUCMVSEKLIAGJGGUZFBAOILVRIZPORNXWVFRGNMEGCEUNUZSPNIUAHFRQLWALHWEQGQKDFDCCKLUZWFSITKWIKLSMUQKNJUWRTKZAHJGABKDEGEMNCVIMBFRNYXSSKYPWLWHUKKISHFAJPOOFGJBJTBXXSGTRYAJGBNRMYHOGXQBLSFEWVUCHRLEJWAQBIWFRLWSSKRKSBFRAKDFJVRGZUOCJUZEKWAPIQSBRYM")
matrix = make_matrix(string)
transpose = make_transpose(matrix)
for e in matrix:
print(e)
print('\nThe transpose:')
for e in transpose:
print(e)
Output:
['Z', 'P', 'F', 'K', 'Y', 'L']
['G', 'J', 'P', 'N', 'S', 'G']
['N', 'M', 'Q', 'G', 'F', 'G']
['C', 'I', 'T', 'L', 'V', 'R']
['I', 'W', 'M', 'G', 'F', 'B']
['L', 'B', 'F', 'D', 'S', 'I']
['O', 'A', 'J', 'G', 'B', 'G']
['A', 'V', 'F', 'V', 'H', 'B']
['G', 'L', 'F', 'S', 'R', 'P']
['N', 'I', 'O', 'F', 'S', 'Y']
['O', 'B', 'T', 'F', 'C', 'G']
['R', 'Q', 'L', 'W', 'W', 'Z']
['A', 'A', 'J', 'F', 'U', 'P']
['G', 'A', 'F', 'Z', 'S', 'N']
['X', 'L', 'T', 'G', 'A', 'R']
['U', 'V', 'F', 'K', 'O', 'L']
['G', 'A', 'I', 'W', 'G', 'U']
['U', 'C', 'M', 'V', 'S', 'E']
['K', 'L', 'I', 'A', 'G', 'J']
['G', 'G', 'U', 'Z', 'F', 'B']
['A', 'O', 'I', 'L', 'V', 'R']
['I', 'Z', 'P', 'O', 'R', 'N']
['X', 'W', 'V', 'F', 'R', 'G']
['N', 'M', 'E', 'G', 'C', 'E']
['U', 'N', 'U', 'Z', 'S', 'P']
['N', 'I', 'U', 'A', 'H', 'F']
['R', 'Q', 'L', 'W', 'A', 'L']
['H', 'W', 'E', 'Q', 'G', 'Q']
['K', 'D', 'F', 'D', 'C', 'C']
['K', 'L', 'U', 'Z', 'W', 'F']
['S', 'I', 'T', 'K', 'W', 'I']
['K', 'L', 'S', 'M', 'U', 'Q']
['K', 'N', 'J', 'U', 'W', 'R']
['T', 'K', 'Z', 'A', 'H', 'J']
['G', 'A', 'B', 'K', 'D', 'E']
['G', 'E', 'M', 'N', 'C', 'V']
['I', 'M', 'B', 'F', 'R', 'N']
['Y', 'X', 'S', 'S', 'K', 'Y']
['P', 'W', 'L', 'W', 'H', 'U']
['K', 'K', 'I', 'S', 'H', 'F']
['A', 'J', 'P', 'O', 'O', 'F']
['G', 'J', 'B', 'J', 'T', 'B']
['X', 'X', 'S', 'G', 'T', 'R']
['Y', 'A', 'J', 'G', 'B', 'N']
['R', 'M', 'Y', 'H', 'O', 'G']
['X', 'Q', 'B', 'L', 'S', 'F']
['E', 'W', 'V', 'U', 'C', 'H']
['R', 'L', 'E', 'J', 'W', 'A']
['Q', 'B', 'I', 'W', 'F', 'R']
['L', 'W', 'S', 'S', 'K', 'R']
['K', 'S', 'B', 'F', 'R', 'A']
['K', 'D', 'F', 'J', 'V', 'R']
['G', 'Z', 'U', 'O', 'C', 'J']
['U', 'Z', 'E', 'K', 'W', 'A']
['P', 'I', 'Q', 'S', 'B', 'R']
['Y', 'M', '$', '$', '$', '$']
The transpose:
['Z', 'G', 'N', 'C', 'I', 'L', 'O', 'A', 'G', 'N', 'O', 'R', 'A', 'G', 'X', 'U', 'G', 'U', 'K', 'G', 'A', 'I', 'X', 'N', 'U', 'N', 'R', 'H', 'K', 'K', 'S', 'K', 'K', 'T', 'G', 'G', 'I', 'Y', 'P', 'K', 'A', 'G', 'X', 'Y', 'R', 'X', 'E', 'R', 'Q', 'L', 'K', 'K', 'G', 'U', 'P', 'Y']
['P', 'J', 'M', 'I', 'W', 'B', 'A', 'V', 'L', 'I', 'B', 'Q', 'A', 'A', 'L', 'V', 'A', 'C', 'L', 'G', 'O', 'Z', 'W', 'M', 'N', 'I', 'Q', 'W', 'D', 'L', 'I', 'L', 'N', 'K', 'A', 'E', 'M', 'X', 'W', 'K', 'J', 'J', 'X', 'A', 'M', 'Q', 'W', 'L', 'B', 'W', 'S', 'D', 'Z', 'Z', 'I', 'M']
['F', 'P', 'Q', 'T', 'M', 'F', 'J', 'F', 'F', 'O', 'T', 'L', 'J', 'F', 'T', 'F', 'I', 'M', 'I', 'U', 'I', 'P', 'V', 'E', 'U', 'U', 'L', 'E', 'F', 'U', 'T', 'S', 'J', 'Z', 'B', 'M', 'B', 'S', 'L', 'I', 'P', 'B', 'S', 'J', 'Y', 'B', 'V', 'E', 'I', 'S', 'B', 'F', 'U', 'E', 'Q', '$']
['K', 'N', 'G', 'L', 'G', 'D', 'G', 'V', 'S', 'F', 'F', 'W', 'F', 'Z', 'G', 'K', 'W', 'V', 'A', 'Z', 'L', 'O', 'F', 'G', 'Z', 'A', 'W', 'Q', 'D', 'Z', 'K', 'M', 'U', 'A', 'K', 'N', 'F', 'S', 'W', 'S', 'O', 'J', 'G', 'G', 'H', 'L', 'U', 'J', 'W', 'S', 'F', 'J', 'O', 'K', 'S', '$']
['Y', 'S', 'F', 'V', 'F', 'S', 'B', 'H', 'R', 'S', 'C', 'W', 'U', 'S', 'A', 'O', 'G', 'S', 'G', 'F', 'V', 'R', 'R', 'C', 'S', 'H', 'A', 'G', 'C', 'W', 'W', 'U', 'W', 'H', 'D', 'C', 'R', 'K', 'H', 'H', 'O', 'T', 'T', 'B', 'O', 'S', 'C', 'W', 'F', 'K', 'R', 'V', 'C', 'W', 'B', '$']
['L', 'G', 'G', 'R', 'B', 'I', 'G', 'B', 'P', 'Y', 'G', 'Z', 'P', 'N', 'R', 'L', 'U', 'E', 'J', 'B', 'R', 'N', 'G', 'E', 'P', 'F', 'L', 'Q', 'C', 'F', 'I', 'Q', 'R', 'J', 'E', 'V', 'N', 'Y', 'U', 'F', 'F', 'B', 'R', 'N', 'G', 'F', 'H', 'A', 'R', 'R', 'A', 'R', 'J', 'A', 'R', '$']

Categories