Python How to get the each letters put into a word? - python

When the name is given, for example Aberdeen Scotland.
I need to get the result of Adbnearldteoecns.
Leaving the first word plain, but reverse the last word and put in between the first word.
I have done so far:
coordinatesf = "Aberdeen Scotland"
for line in coordinatesf:
separate = line.split()
for i in separate [0:-1]:
lastw = separate[1][::-1]
print(i)

A bit dirty but it works:
coordinatesf = "Aberdeen Scotland"
new_word=[]
#split the two words
words = coordinatesf.split(" ")
#reverse the second and put to lowercase
words[1]=words[1][::-1].lower()
#populate the new string
for index in range(0,len(words[0])):
new_word.insert(2*index,words[0][index])
for index in range(0,len(words[1])):
new_word.insert(2*index+1,words[1][index])
outstring = ''.join(new_word)
print outstring

Note that what you want to do is only well-defined if the the input string is composed of two words with the same lengths.
I use assertions to make sure that is true but you can leave them out.
def scramble(s):
words = s.split(" ")
assert len(words) == 2
assert len(words[0]) == len(words[1])
scrambledLetters = zip(words[0], reversed(words[1]))
return "".join(x[0] + x[1] for x in scrambledLetters)
>>> print(scramble("Aberdeen Scotland"))
>>> AdbnearldteoecnS
You could replace the x[0] + x[1] part with sum() but I think that makes it less readable.

This splits the input, zips the first word with the reversed second word, joins the pairs, then joins the list of pairs.
coordinatesf = "Aberdeen Scotland"
a,b = coordinatesf.split()
print(''.join(map(''.join, zip(a,b[::-1]))))

Related

finding the index of word and its distance

hello there ive been trying to code a script that finds the distance the word one to the other one in a text but the code somehow doesnt work well...
import re
#dummy text
words_list = ['one']
long_string = "are marked by one the ()meta-characters. two They group together the expressions contained one inside them, and you can one repeat the contents of a group with a repeating qualifier, such as there one"
striped_long_text = re.sub(' +', ' ',(long_string.replace('\n',' '))).strip()
length = []
index_items = {}
for item in words_list :
text_split = striped_long_text.split('{}'.format(item))[:-1]
for space in text_split:
if space:
length.append(space.count(' ')-1)
print(length)
dummy Text :
are marked by one the ()meta-characters. two They group together the expressions contained one inside them, and you can one repeat the contents of a group with a repeating qualifier, such as there one
so what im trying to do is that find the exact distance the word one has to its next similar one in text as you see , this code works well with in some exceptions... if the first text has the word one after 2-3 words the index starts to the counting from the start of the text and this will fail the output but for example if add the word one at the start of the text the code will work fine...
Output of the code vs what it should be :
result = [2, 9, 5, 13]
expected result = [ 9, 5, 13]
Sorry, I can't ask questions in comments yet, but if you want exactly to use regular expressions, than this can help you:
import re
def forOneWord(word):
length = []
lastIndex = -1
for i in range(len(words)):
if words[i].lower() == word:
if lastIndex != -1:
length.append(i - lastIndex - 1)
lastIndex = i
return length
def forWordsList(words_list):
for i in range(len(words_list)): # Make all words lowercase
words_list[i] = words_list[i].lower()
length = [[] for i in range(len(words_list))]
lastIndex = [-1 for i in range(len(words_list))]
for i in range(len(words)): # For all words in string
for j in range(len(words_list)): # For all words in list
if words[i].lower() == words_list[j]:
if lastIndex[j] != -1: # This means that this is not the first match
length[j].append(i - lastIndex[j] - 1)
lastIndex[j] = i
return length
words_list = ['one' , 'the', 'a']
long_string = "are marked by one the ()meta-characters. two They group together the expressions contained one inside them, and you can one repeat the contents of a group with a repeating qualifier, such as there one"
words = re.findall(r"(\w[\w!-]*)+", long_string)
print(forOneWord('one'))
print(forWordsList(words_list))
There are two functions: one just for single-word search and another to search by list. Also if it is not necessary to use RegEx, I can improve this solution in terms of performance if you want so.
Another solution, using itertools.groupby:
from itertools import groupby
words_list = ["one"]
long_string = "are marked by one the ()meta-characters. two They group together the expressions contained one inside them, and you can one repeat the contents of a group with a repeating qualifier, such as there one"
for w in words_list:
out = [
(v, sum(1 for _ in g))
for v, g in groupby(long_string.split(), lambda k: k == w)
]
# check first and last occurence:
if out and out[0][0] is False:
out = out[1:]
if out and out[-1][0] is False:
out = out[:-1]
out = [length for is_word, length in out if not is_word]
print(out)
Prints:
[9, 5, 13]

Get certain items based on their formatting

I have a list of values, some are numeric only, others made up of words, others a mix of the two.
I would like to select only those items composed by the combination number, single letter, number.
let me explain, this is my list of values
l = ['980X2350', 'DO_UN_HPL_Glas_Links', 'DO_UN_HPL_Glas_Rechts',
'930x2115', 'DO_UN_HPL_Links', 'DO_UN_HPL_Rechts', '830X2115',
'Deuropening', 'BF_32_Tourniquets_dubbeledeur_Aluminium']
I'd like to just get back:
['980X2350', '930x2115', '830X2115']
There is no need of importing re for such trivial matter.
Here is an approach that is more efficient than the regex based one:
allowed = '0123456786x'
def filter_str(lst):
output = []
for s in lst:
c = s.lower().strip()
if all(i in allowed for i in c) and c.count('x') == 1:
output.append(s)
return output
If the strings must contain two numeric fields:
allowed = '0123456786x'
def filter_str(lst):
output = []
for s in lst:
c = s.lower().strip()
n = len(c) - 1
if all(i in allowed for i in c) and c.count('x') == 1 and c.index('x') not in (0, n):
output.append(s)
return output
all function short-circuits (i.e. it stops checking as soon as Falsy value is registered), all Python logical operators also short-circuit, for the and operator, the right-hand operand won't be executed if the left-hand operand is Falsy, so my code does look it's longer than the regex based one, but it actually executes faster because regex checks whole strings and does not short-circuit.
Assuming a list of strings as input, you can use a regex and a list comprehension:
l = ['980X2350', 'DO_UN_HPL_Glas_Links', 'DO_UN_HPL_Glas_Rechts',
'930x2115', 'DO_UN_HPL_Links', 'DO_UN_HPL_Rechts', '830X2115',
'Deuropening', 'BF_32_Tourniquets_dubbeledeur_Aluminium']
import re
regex = re.compile('\d+x\d+', flags=re.I)
out = [s for s in l if regex.match(s.strip())]
output:
['980X2350', '930x2115', '830X2115']
Assuming a list of strings :
you can store in a counter the number of letters encountered, if this number is exactly equal to 1 and you have encountered some numbers then you can store it to your output list :
a = ['980X2350', 'DO_UN_HPL_Glas_Links', 'DO_UN_HPL_Glas_Rechts', '930x2115', 'DO_UN_HPL_Links',
'DO_UN_HPL_Rechts', '830X2115', 'Deuropening' ]
alphabet = 'abcdefghijklmnopqrstuvwxyz'
alphabet+= alphabet.upper()
numeric = '0123456789'
numeric_flag = False
output = []
for item in a:
alphabet_count = 0
for char in item:
if char in alphabet:
alphabet_count += 1
if char in numeric:
numeric_flag = True
if alphabet_count == 1 and numeric_flag:
output.append(item)
print(output)
# ['980X2350', '930x2115', '830X2115']

How to find a pattern in an input string

I need to find a given pattern in a text file and print the matching patterns. The text file is a string of digits and the pattern can be any string of digits or placeholders represented by 'X'.
I figured the way to approach this problem would be by loading the sequence into a variable, then creating a list of testable subsequences, and then testing each subsequence. This is my first function in python so I'm confused as to how to create the list of test sequences easily and then test it.
def find(pattern): #finds a pattern in the given input file
with open('sequence.txt', 'r') as myfile:
string = myfile.read()
print('Test data is:', string)
testableStrings = []
#how to create a list of testable sequences?
for x in testableStrings:
if x == pattern:
print(x)
return
For example, searching for "X10X" in "11012102" should print "1101" and "2102".
Let pattern = "X10X", string = "11012102", n = len(pattern) - just for followed illustration:
Without using regular expressions, your algorithm may be as follows:
Construct a list of all subsequences of string with length of n:
In[2]: parts = [string[i:i+n] for i in range(len(string) - n + 1)]
In[3]: parts
Out[3]: ['1101', '1012', '0121', '1210', '2102']
Compare pattern with each element in parts:
for part in parts:
The comparison of pattern with part (both have now equal lengths) will be symbol with symbol in corresponding positions:
for ch1, ch2 in zip(pattern, part):
If ch1 is the X symbol or ch1 == ch2, the comparison of corresponding symbols will continue, else we will break it:
if ch1 == "X" or ch1 == ch2:
continue
else:
break
Finally, if all symbol with symbol comparisons were successful, i. e. all pairs of corresponding symbols were exhausted, the else branch of the for statement will be executed (yes, for statements may have an else branch for that case).
Now you may perform any actions with that matched part, e. g. print it or append it to some list:
else:
print(part)
So all in one place:
pattern = "X10X"
string = "11012102"
n = len(pattern)
parts = [string[i:i+n] for i in range(len(string) - n + 1)]
for part in parts:
for ch1, ch2 in zip(pattern, part):
if ch1 == "X" or ch1 == ch2:
continue
else:
break
else:
print(part)
The output:
1101
2102
You probably wanted to create the list of testable sequences from the individual rows of the input file. So instead of
with open('sequence.txt', 'r') as myfile:
string = myfile.read()
use
with open('sequence.txt') as myfile: # 'r' is default
testableStrings = [row.strip() for row in myfile]
The strip() method removes whitespace characters from the start and end of rows, including \n symbols at the end of lines.
Example of the sequence.txt file:
123456789
87654321
111122223333
The output of the print(testableStrings) command:
['123456789', '87654321', '111122223333']

Find symmetric words in a text [duplicate]

This question already has answers here:
how to find words that made up of letter exactly facing each other? (python) [closed]
(4 answers)
Closed 9 years ago.
I have to write a function which takes one arguments text containing a block of text in the form of a str, and returns a sorted list of “symmetric” words. A symmetric word is defined as a word where for all values i, the letter i positions from the start of the word and the letter i positions from the end of the word are equi-distant from the respective ends of the alphabet. For example, bevy is a symmetric word as: b (1 position from the start of the word) is the second letter of the alphabet and y (1 position from the end of the word) is the second-last letter of the alphabet; and e (2 positions from the start of the word) is the fifth letter of the alphabet and v (2 positions from the end of the word) is the fifth-last letter of the alphabet.
For example:
>>> symmetrics("boy bread aloz bray")
['aloz','boy']
>>> symmetrics("There is a car and a book;")
['a']
All I can think about the solution is this but I can't run it since it's wrong:
def symmetrics(text):
func_char= ",.?!:'\/"
for letter in text:
if letter in func_char:
text = text.replace(letter, ' ')
alpha1 = 'abcdefghijklmnopqrstuvwxyz'
alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
sym = []
for word in text.lower().split():
n = range(0,len(word))
if word[n] == word[len(word)-1-n]:
sym.append(word)
return sym
The code above doesn't take into account the position of alpha1 and alpha2 as I don't know how to put it. Is there anyone can help me?
Here is a hint:
In [16]: alpha1.index('b')
Out[16]: 1
In [17]: alpha2.index('y')
Out[17]: 1
An alternative way to approach the problem is by using the str.translate() method:
import string
def is_sym(word):
alpha1 = 'abcdefghijklmnopqrstuvwxyz'
alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
tr = string.maketrans(alpha1, alpha2)
n = len(word) // 2
return word[:n] == word[::-1][:n].translate(tr)
print(is_sym('aloz'))
print(is_sym('boy'))
print(is_sym('bread'))
(The building of the translation table can be easily factored out.)
The for loop could be modified as:
for word in text.lower().split():
for n in range(0,len(word)//2):
if alpha1.index(word[n]) != alpha2.index(word[len(word)-1-n]):
break
else:
sym.append(word)
return sym
According to your symmetric rule, we may verify a symmetric word with the following is_symmetric_word function:
def is_symmetric_word(word):
alpha1 = 'abcdefghijklmnopqrstuvwxyz'
alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
length = len(word)
for i in range(length / 2):
if alpha1.index(word[i]) != alpha2.index(word[length - 1 - i]):
return False
return True
And then the whole function to get all unique symmetric words out of a text can be defined as:
def is_symmetrics(text):
func_char= ",.?!:'\/;"
for letter in text:
if letter in func_char:
text = text.replace(letter, ' ')
sym = []
for word in text.lower().split():
if is_symmetric_word(word) and not (word in sym):
sym.append(word)
return sym
The following are two test cases from you:
is_symmetrics("boy bread aloz bray") #['boy', 'aloz']
is_symmetrics("There is a car and a book;") #['a']
Code first. Discussion below the code.
import string
# get alphabet and reversed alphabet
try:
# Python 2.x
alpha1 = string.lowercase
except AttributeError:
# Python 3.x and newer
alpha1 = string.ascii_lowercase
alpha2 = alpha1[::-1] # use slicing to reverse alpha1
# make a dictionary where the key, value pairs are symmetric
# for example symd['a'] == 'z', symd['b'] == 'y', and so on
_symd = dict(zip(alpha1, alpha2))
def is_symmetric_word(word):
if not word:
return False # zero-length word is not symmetric
i1 = 0
i2 = len(word) - 1
while True:
if i1 >= i2:
return True # we have checked the whole string
# get a pair of chars
c1 = word[i1]
c2 = word[i2]
if _symd[c1] != c2:
return False # the pair wasn't symmetric
i1 += 1
i2 -= 1
# note, added a space to list of chars to filter to a space
_filter_to_space = ",.?!:'\/ "
def _filter_ch(ch):
if ch in _filter_to_space:
return ' ' # return a space
elif ch in alpha1:
return ch # it's an alphabet letter so return it
else:
# It's something we don't want. Return empty string.
return ''
def clean(text):
return ''.join(_filter_ch(ch) for ch in text.lower())
def symmetrics(text):
# filter text: keep only chars in the alphabet or spaces
for word in clean(text).split():
if is_symmetric_word(word):
# use of yield makes this a generator.
yield word
lst = list(symmetrics("The boy...is a yob."))
print(lst) # prints: ['boy', 'a', 'yob']
No need to type the alphabet twice; we can reverse the first one.
We can make a dictionary that pairs each letter with its symmetric letter. This will make it very easy to test whether any given pair of letters is a symmetric pair. The function zip() makes pairs from two sequences; they need to be the same length, but since we are using a string and a reversed copy of the string, they will be the same length.
It's best to write a simple function that does one thing, so we write a function that does nothing but check if a string is symmetric. If you give it a zero-length string it returns False, otherwise it sets i1 to the first character in the string and i2 to the last. It compares characters as long as they continue to be symmetric, and increments i1 while decrementing i2. If the two meet or pass each other, we know we have seen the whole string and it must be symmetric, in which case we return True; if it ever finds any pair of characters that are not symmetric, it returns False. We have to do the check for whether i1 and i2 have met or passed at the top of the loop, so it won't try to check if a character is its own symmetric character. (A character can't be both 'a' and 'z' at the same time, so a character is never its own symmetric character!)
Now we write a wrapper that filters out the junk, splits the string into words, and tests each word. Not only does it convert the chosen punctuation characters to spaces, but it also strips out any unexpected characters (anything not an approved punctuation char, a space, or a letter). That way we know nothing unexpected will get through to the inner function. The wrapper is "lazy"... it is a generator that yields up one word at a time, instead of building the whole list and returning that. It's easy to use list() to force the generator's results into a list. If you want, you can easily modify this function to just build a list and return it.
If you have any questions about this, just ask.
EDIT: The original version of the code didn't do the right thing with the punctuation characters; this version does. Also, as #heltonbiker suggested, why type the alphabet when Python has a copy of it you can use? So I made that change too.
EDIT: #heltonbiker's change introduced a dependency on Python version! I left it in with a suitable try:/except block to handle the problem. It appears that Python 3.x has improved the name of the lowercase ASCII alphabet to string.ascii_lowercase instead of plain string.lowercase.

How many common English words of 4 letters or more can you make from the letters of a given word (each letter can only be used once)

On the back of a block calendar I found the following riddle:
How many common English words of 4 letters or more can you make from the letters
of the word 'textbook' (each letter can only be used once).
My first solution that I came up with was:
from itertools import permutations
with open('/usr/share/dict/words') as f:
words = f.readlines()
words = map(lambda x: x.strip(), words)
given_word = 'textbook'
found_words = []
ps = (permutations(given_word, i) for i in range(4, len(given_word)+1))
for p in ps:
for word in map(''.join, p):
if word in words and word != given_word:
found_words.append(word)
print set(found_words)
This gives the result set(['tote', 'oboe', 'text', 'boot', 'took', 'toot', 'book', 'toke', 'betook']) but took more than 7 minutes on my machine.
My next iteration was:
with open('/usr/share/dict/words') as f:
words = f.readlines()
words = map(lambda x: x.strip(), words)
given_word = 'textbook'
print [word for word in words if len(word) >= 4 and sorted(filter(lambda letter: letter in word, given_word)) == sorted(word) and word != given_word]
Which return an answer almost immediately but gave as answer: ['book', 'oboe', 'text', 'toot']
What is the fastest, correct and most pythonic solution to this problem?
(edit: added my earlier permutations solution and its different output).
I thought I'd share this slightly interesting trick although it takes a good bit more code than the rest and isn't really "pythonic". This will take a good bit more code than the other solutions but should be rather quick if I look at the timing the others need.
We're doing a bit preprocessing to speed up the computations. The basic approach is the following: We assign every letter in the alphabet a prime number. E.g. A = 2, B = 3, and so on. We then compute a hash for every word in the alphabet which is simply the product of the prime representations of every character in the word. We then store every word in a dictionary indexed by the hash.
Now if we want to find out which words are equivalent to say textbook we only have to compute the same hash for the word and look it up in our dictionary. Usually (say in C++) we'd have to worry about overflows, but in python it's even simpler than that: Every word in the list with the same index will contain exactly the same characters.
Here's the code with the slightly optimization that in our case we only have to worry about characters also appearing in the given word, which means we can get by with a much smaller prime table than otherwise (the obvious optimization would be to only assign characters that appear in the word a value at all - it was fast enough anyhow so I didn't bother and this way we could pre process only once and do it for several words). The prime algorithm is useful often enough so you should have one yourself anyhow ;)
from collections import defaultdict
from itertools import permutations
PRIMES = list(gen_primes(256)) # some arbitrary prime generator
def get_dict(path):
res = defaultdict(list)
with open(path, "r") as file:
for line in file.readlines():
word = line.strip().upper()
hash = compute_hash(word)
res[hash].append(word)
return res
def compute_hash(word):
hash = 1
for char in word:
try:
hash *= PRIMES[ord(char) - ord(' ')]
except IndexError:
# contains some character out of range - always 0 for our purposes
return 0
return hash
def get_result(path, given_word):
words = get_dict(path)
given_word = given_word.upper()
result = set()
powerset = lambda x: powerset(x[1:]) + [x[:1] + y for y in powerset(x[1:])] if x else [x]
for word in (word for word in powerset(given_word) if len(word) >= 4):
hash = compute_hash(word)
for equiv in words[hash]:
result.add(equiv)
return result
if __name__ == '__main__':
path = "dict.txt"
given_word = "textbook"
result = get_result(path, given_word)
print(result)
Runs on my ubuntu word list (98k words) rather quickly, but not what I'd call pythonic since it's basically a port of a c++ algorithm. Useful if you want to compare more than one word that way..
How about this?
from itertools import permutations, chain
with open('/usr/share/dict/words') as fp:
words = set(fp.read().split())
given_word = 'textbook'
perms = (permutations(given_word, i) for i in range(4, len(given_word)+1))
pwords = (''.join(p) for p in chain(*perms))
matches = words.intersection(pwords)
print matches
which gives
>>> print matches
set(['textbook', 'keto', 'obex', 'tote', 'oboe', 'text', 'boot', 'toto', 'took', 'koto', 'bott', 'tobe', 'boke', 'toot', 'book', 'bote', 'otto', 'toke', 'toko', 'oket'])
There is a generator itertools.permutations with which you can gather all permutations of a sequence with a specified length. That makes it easier:
from itertools import permutations
GIVEN_WORD = 'textbook'
with open('/usr/share/dict/words', 'r') as f:
words = [s.strip() for s in f.readlines()]
print len(filter(lambda x: ''.join(x) in words, permutations(GIVEN_WORD, 4)))
Edit #1: Oh! It says "4 or more" ;) Forget what I said!
Edit #2: This is the second version I came up with:
LETTERS = set('textbook')
with open('/usr/share/dict/words') as f:
WORDS = filter(lambda x: len(x) >= 4, [l.strip() for l in f])
matching = filter(lambda x: set(x).issubset(LETTERS) and all([x.count(c) == 1 for c in x]), WORDS)
print len(matching)
Create the whole power set, then check whether the dictionary word is in the set (order of the letters doesn't matter):
powerset = lambda x: powerset(x[1:]) + [x[:1] + y for y in powerset(x[1:])] if x else [x]
pw = map(lambda x: sorted(x), powerset(given_word))
filter(lambda x: sorted(x) in pw, words)
The following just checks each word in the dictionary to see if it is of the appropriate length, and then if it is a permutation of 'textbook'. I borrowed the permutation check from
Checking if two strings are permutations of each other in Python
but changed it slightly.
given_word = 'textbook'
with open('/usr/share/dict/words', 'r') as f:
words = [s.strip() for s in f.readlines()]
matches = []
for word in words:
if word != given_word and 4 <= len(word) <= len(given_word):
if all(word.count(char) <= given_word.count(char) for char in word):
matches.append(word)
print sorted(matches)
This finishes almost immediately and gives the correct result.
Permutations get very big for longer words. Try counterrevolutionary for example.
I would filter the dict for words from 4 to len(word) (8 for textbook).
Then I would filter with regular expression "oboe".matches ("[textbook]+").
The remaining words, I would sort, and compare them with a sorted version of your word, ("beoo", "bekoottx") with jumping to the next index of a matching character, to find mismatching numbers of characters:
("beoo", "bekoottx")
("eoo", "ekoottx")
("oo", "koottx")
("oo", "oottx")
("o", "ottx")
("", "ttx") => matched
("bbo", "bekoottx")
("bo", "ekoottx") => mismatch
Since I don't talk python, I leave the implementation as an exercise to the audience.

Categories