Find symmetric words in a text [duplicate]


This question already has answers here:
how to find words that made up of letter exactly facing each other? (python) [closed]
(4 answers)
Closed 9 years ago.
I have to write a function which takes one argument, text, containing a block of text in the form of a str, and returns a sorted list of “symmetric” words. A symmetric word is defined as a word where for all values i, the letter i positions from the start of the word and the letter i positions from the end of the word are equidistant from the respective ends of the alphabet. For example, bevy is a symmetric word as: b (1 position from the start of the word) is the second letter of the alphabet and y (1 position from the end of the word) is the second-last letter of the alphabet; and e (2 positions from the start of the word) is the fifth letter of the alphabet and v (2 positions from the end of the word) is the fifth-last letter of the alphabet.
For example:
>>> symmetrics("boy bread aloz bray")
['aloz','boy']
>>> symmetrics("There is a car and a book;")
['a']
This is all I have come up with so far, but it doesn't run because it's wrong:
def symmetrics(text):
    func_char= ",.?!:'\/"
    for letter in text:
        if letter in func_char:
            text = text.replace(letter, ' ')
    alpha1 = 'abcdefghijklmnopqrstuvwxyz'
    alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
    sym = []
    for word in text.lower().split():
        n = range(0,len(word))
        if word[n] == word[len(word)-1-n]:
            sym.append(word)
    return sym
The code above doesn't take into account the positions in alpha1 and alpha2, as I don't know how to use them. Can anyone help me?

Here is a hint:
In [16]: alpha1.index('b')
Out[16]: 1
In [17]: alpha2.index('y')
Out[17]: 1
An alternative way to approach the problem is by using the str.translate() method:
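def is_sym(word):
    alpha1 = 'abcdefghijklmnopqrstuvwxyz'
    alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
    # str.maketrans is the Python 3 spelling; Python 2 used string.maketrans
    tr = str.maketrans(alpha1, alpha2)
    n = len(word) // 2
    # compare the first half with the mirrored, reversed second half
    return word[:n] == word[::-1][:n].translate(tr)
print(is_sym('aloz'))   # True
print(is_sym('boy'))    # True
print(is_sym('bread'))  # False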
(The building of the translation table can be easily factored out.)
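A minimal sketch of what factoring the table out could look like, assuming Python 3. The names MIRROR_TABLE and symmetric_words, and the use of a regular expression to split the text into words, are illustrative assumptions rather than part of the answer above:

```python
import re

ALPHABET = 'abcdefghijklmnopqrstuvwxyz'
# Built once at import time: maps a->z, b->y, ..., z->a.
MIRROR_TABLE = str.maketrans(ALPHABET, ALPHABET[::-1])


def is_sym(word):
    # Compare the first half with the mirrored, reversed second half.
    # The middle letter of an odd-length word is not checked.
    n = len(word) // 2
    return word[:n] == word[::-1][:n].translate(MIRROR_TABLE)


def symmetric_words(text):
    # Assumption: a "word" is any run of ASCII letters; everything else
    # (punctuation, digits, whitespace) acts as a separator.
    words = re.findall(r'[a-z]+', text.lower())
    # A set removes duplicates; sorted() gives the sorted list the question asks for.
    return sorted({word for word in words if is_sym(word)})


print(symmetric_words("boy bread aloz bray"))
print(symmetric_words("There is a car and a book;"))
```

With the two sample inputs from the question this should print ['aloz', 'boy'] and ['a'].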

The for loop could be modified as:
for word in text.lower().split():
    for n in range(0,len(word)//2):
        if alpha1.index(word[n]) != alpha2.index(word[len(word)-1-n]):
            break
    else:
        sym.append(word)
return sym

According to your symmetric rule, we may verify a symmetric word with the following is_symmetric_word function:
def is_symmetric_word(word):
    alpha1 = 'abcdefghijklmnopqrstuvwxyz'
    alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
    length = len(word)
    # integer division, so that range() also works on Python 3
    for i in range(length // 2):
        if alpha1.index(word[i]) != alpha2.index(word[length - 1 - i]):
            return False
    return True
And then the whole function to get all unique symmetric words out of a text can be defined as:
def is_symmetrics(text):
    func_char= ",.?!:'\/;"
    for letter in text:
        if letter in func_char:
            text = text.replace(letter, ' ')
    sym = []
    for word in text.lower().split():
        if is_symmetric_word(word) and not (word in sym):
            sym.append(word)
    return sym
The following are two test cases from you:
is_symmetrics("boy bread aloz bray") #['boy', 'aloz']
is_symmetrics("There is a car and a book;") #['a']

Code first. Discussion below the code.
import string

# get alphabet and reversed alphabet
try:
    # Python 2.x
    alpha1 = string.lowercase
except AttributeError:
    # Python 3.x and newer
    alpha1 = string.ascii_lowercase
alpha2 = alpha1[::-1] # use slicing to reverse alpha1

# make a dictionary where the key, value pairs are symmetric
# for example symd['a'] == 'z', symd['b'] == 'y', and so on
_symd = dict(zip(alpha1, alpha2))

def is_symmetric_word(word):
    if not word:
        return False # zero-length word is not symmetric
    i1 = 0
    i2 = len(word) - 1
    while True:
        if i1 >= i2:
            return True # we have checked the whole string
        # get a pair of chars
        c1 = word[i1]
        c2 = word[i2]
        if _symd[c1] != c2:
            return False # the pair wasn't symmetric
        i1 += 1
        i2 -= 1

# note, added a space to list of chars to filter to a space
_filter_to_space = ",.?!:'\/ "

def _filter_ch(ch):
    if ch in _filter_to_space:
        return ' ' # return a space
    elif ch in alpha1:
        return ch # it's an alphabet letter so return it
    else:
        # It's something we don't want. Return empty string.
        return ''

def clean(text):
    return ''.join(_filter_ch(ch) for ch in text.lower())

def symmetrics(text):
    # filter text: keep only chars in the alphabet or spaces
    for word in clean(text).split():
        if is_symmetric_word(word):
            # use of yield makes this a generator.
            yield word

lst = list(symmetrics("The boy...is a yob."))
print(lst) # prints: ['boy', 'a', 'yob']
No need to type the alphabet twice; we can reverse the first one.
We can make a dictionary that pairs each letter with its symmetric letter. This will make it very easy to test whether any given pair of letters is a symmetric pair. The function zip() makes pairs from two sequences; they need to be the same length, but since we are using a string and a reversed copy of the string, they will be the same length.
It's best to write a simple function that does one thing, so we write a function that does nothing but check if a string is symmetric. If you give it a zero-length string it returns False, otherwise it sets i1 to the first character in the string and i2 to the last. It compares characters as long as they continue to be symmetric, and increments i1 while decrementing i2. If the two meet or pass each other, we know we have seen the whole string and it must be symmetric, in which case we return True; if it ever finds any pair of characters that are not symmetric, it returns False. We have to do the check for whether i1 and i2 have met or passed at the top of the loop, so it won't try to check if a character is its own symmetric character. (A character can't be both 'a' and 'z' at the same time, so a character is never its own symmetric character!)
Now we write a wrapper that filters out the junk, splits the string into words, and tests each word. Not only does it convert the chosen punctuation characters to spaces, but it also strips out any unexpected characters (anything not an approved punctuation char, a space, or a letter). That way we know nothing unexpected will get through to the inner function. The wrapper is "lazy"... it is a generator that yields up one word at a time, instead of building the whole list and returning that. It's easy to use list() to force the generator's results into a list. If you want, you can easily modify this function to just build a list and return it.
If you have any questions about this, just ask.
EDIT: The original version of the code didn't do the right thing with the punctuation characters; this version does. Also, as @heltonbiker suggested, why type the alphabet when Python has a copy of it you can use? So I made that change too.
EDIT: @heltonbiker's change introduced a dependency on Python version! I left it in with a suitable try:/except block to handle the problem. It appears that Python 3.x has improved the name of the lowercase ASCII alphabet to string.ascii_lowercase instead of plain string.lowercase.

Related

The longest prefix that is also suffix of two lists

So I have two lists:
def function(w,w2): # => this is how I want to define my function (no more inputs than this 2 lists)
I want to know the longest prefix of w which is also a suffix of w2.
How can I do this only with logic (without importing anything)?

I can try and help get you started on this problem, but it sort of sounds like a homework question so I won't give you a complete answer (per these guidelines).
If I were you I'd start with a small case and build up from there. Let's start with:
w = "ab"
w2 = "ba"
The function for this might look like:
def function(w,w2):
    prefix = ""
    # Does the first letter of w equal the last letter of w2?
    if w[0] == w2[-1]:
        prefix += w[0]
    # What about the second letter?
    if w[1] == w2[-2]:
        prefix += w[1]
    return prefix
Then when you run print(function(w,w2)) you get ab.
This code should work for 2 letter words, but what if the words are longer? This is when we would introduce a loop.
def function(w,w2):
    prefix = ""
    for i in range(0, len(w)):
        if w[i] == w2[(i+1)*-1]:
            prefix+= w[i]
        else:
            return prefix
    return prefix
Hopefully this code will offer a good starting place for you! One issue with what I have written is what happens if w2 is shorter than w: you will get an IndexError! There are a few ways to solve this, but one way is to make sure that w is always the shorter word. Best of luck, and feel free to DM me if you have other questions.

A simple iterative approach could be:
Start from the longest possible prefix (i.e. all of w), and test it against a w2 suffix of the same length.
If they match, you can return it immediately, since it must be the longest possible match.
If they don't match, shorten it by one, and repeat.
If you never find a match, the answer is an empty string.
In code, this looks like:
>>> def function(w, w2):
...     for i in range(len(w), 0, -1):
...         if w[:i] == w2[-i:]:
...             return w[:i]
...     return ''
...
>>> function("asdfasdf", "qwertyasdf")
'asdf'
The slice operator (w[:i] for a prefix of length i, w2[-i:] for a suffix of length i) gracefully handles mismatched lengths by just giving you a shorter string if i is out of the range of the given string (which means they won't match, so the iteration is forced to continue until the lengths do match).
>>> function("aaaaaba", "ba")
'a'
>>> function("a", "abbbaababaa")
'a'
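The slicing behaviour described above can be checked directly. A small sketch, with arbitrary sample strings, contrasting an out-of-range slice with an out-of-range index:

```python
w = "aaaaaba"
w2 = "ba"

# Asking for a 5-character suffix of a 2-character string does not fail;
# the slice simply returns the whole (shorter) string.
long_suffix = w2[-5:]
print(repr(long_suffix))

# Because the lengths differ, a 5-character prefix of w can never equal it,
# so the loop keeps shrinking i until the lengths can match.
print(w[:5] == long_suffix)

# Plain indexing, unlike slicing, raises IndexError when out of range.
try:
    w2[-5]
    indexing_failed = False
except IndexError:
    indexing_failed = True
print(indexing_failed)
```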

Deciding whether a string is a palindrome

This is a Python question. The answer should have O(n) time complexity and use no additional memory. As input I get a string which should be classified as a palindrome or not (a palindrome is a word or phrase that reads the same from left to right and from right to left, e.g. "level"). The input may contain punctuation marks and gaps between words.
For example "I. did,,, did I????" The main goal is to decide whether the input is a palindrome.
When I tried to solve this question I faced several challenges. When I try to delete the non-letter characters
for element in string:
    if ord(element) not in range(97, 122):
        string.remove(element)
    if ord(element) == 32:
        string.remove(element)
this has O(n^2) complexity, because for every element in the string I call remove, which itself has O(n) complexity, where n is the length of the list. I need help eliminating the non-letter characters in O(n) time.
Also, once spaces and punctuation marks are removed, I know how to check whether a word is a palindrome, but my method uses additional memory.

Here is your O(n) solution without creating a new string:
def is_palindrome(string):
    left = 0
    right = len(string) - 1
    while left < right:
        if not string[left].isalpha():
            left += 1
            continue
        if not string[right].isalpha():
            right -= 1
            continue
        if string[left] != string[right]:
            return False
        left += 1
        right -= 1
    return True
print(is_palindrome("I. did,,, did I????"))
Output:
True
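Note that the function above compares letters case-sensitively, so an input such as "Did I" would be rejected. A hedged variant of the same two-pointer idea, assuming that case should be ignored (the name is_palindrome_ignore_case is made up for this sketch):

```python
def is_palindrome_ignore_case(text):
    left = 0
    right = len(text) - 1
    while left < right:
        if not text[left].isalpha():
            # skip non-letters on the left
            left += 1
        elif not text[right].isalpha():
            # skip non-letters on the right
            right -= 1
        elif text[left].lower() != text[right].lower():
            # two letters that differ even when case is ignored
            return False
        else:
            left += 1
            right -= 1
    return True


print(is_palindrome_ignore_case("I. did,,, did I????"))
print(is_palindrome_ignore_case("A man, a plan, a canal: Panama"))
print(is_palindrome_ignore_case("Hello, world"))
```

Each index moves only towards the other, so the scan is still O(n) and no cleaned copy of the string is built.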

I'm assuming you mean you want to test if a string is a palindrome when we remove all punctuation and digits from the string. In that case, the following code should suffice:
from string import ascii_letters
def is_palindrome(s):
    s = ''.join(c for c in s if c in ascii_letters)
    return s == s[::-1]
# some test cases:
print(is_palindrome('hello')) # False
print(is_palindrome('ra_ceca232r')) # True

Here's a one-liner using assignment expression syntax (Python 3.8+):
>>> s = "I. did,,, did I????"
>>> (n := [c.lower() for c in s if c.isalpha()]) == n[::-1]
True
I mostly showed the above as a demonstration; for readability's sake I'd recommend something more like SimonR's solution (although still using isalpha over comparing to ascii_letters).
Alternatively, you can use generator expressions to do the same comparison without allocating O(n) extra memory:
def is_palindrome(s):
    forward = (c.lower() for c in s if c.isalpha())
    back = (c.lower() for c in reversed(s) if c.isalpha())
    return all(a == b for a, b in zip(forward, back))
Note that zip still allocates a list in Python 2; you'll need to use itertools.izip there.

Will this help:
word = input('Input your word: ')
word1 = ''
# keep only letters and digits
for l in word:
    if l.isalnum():
        word1 += l
# build the reversed copy
word2=''
for index in sorted(range(len(word1)),reverse=True):
    word2+=word1[index]
if word1 == word2:
    print('It is a palindrome.')
else:
    print('It is not a palindrome.')

Python How to get the each letters put into a word?

Given a name, for example Aberdeen Scotland,
I need to get the result Adbnearldteoecns.
The first word is left as is, while the last word is reversed and its letters are interleaved with those of the first word.
What I have so far:
coordinatesf = "Aberdeen Scotland"
for line in coordinatesf:
    separate = line.split()
    for i in separate [0:-1]:
        lastw = separate[1][::-1]
        print(i)

A bit dirty but it works:
coordinatesf = "Aberdeen Scotland"
new_word=[]
#split the two words
words = coordinatesf.split(" ")
#reverse the second and put to lowercase
words[1]=words[1][::-1].lower()
#populate the new string
for index in range(0,len(words[0])):
    new_word.insert(2*index,words[0][index])
for index in range(0,len(words[1])):
    new_word.insert(2*index+1,words[1][index])
outstring = ''.join(new_word)
print(outstring)

Note that what you want to do is only well-defined if the input string is composed of two words with the same length.
I use assertions to make sure that is true but you can leave them out.
def scramble(s):
    words = s.split(" ")
    assert len(words) == 2
    assert len(words[0]) == len(words[1])
    scrambledLetters = zip(words[0], reversed(words[1]))
    return "".join(x[0] + x[1] for x in scrambledLetters)
>>> print(scramble("Aberdeen Scotland"))
AdbnearldteoecnS
You could replace the x[0] + x[1] part with "".join(x) but I think that makes it less readable.

This splits the input, zips the first word with the reversed second word, joins the pairs, then joins the list of pairs.
coordinatesf = "Aberdeen Scotland"
a,b = coordinatesf.split()
print(''.join(map(''.join, zip(a,b[::-1]))))

Finding common letters between 2 strings in Python

For a homework assignment, I have to take 2 user-inputted strings and figure out how many letters are common (in the same position of both strings), as well as find common letters. For example, for the two strings 'cat' and 'rat', there are 2 common letter positions (which are positions 2 and 3 in this case), and the common letters are also 2 because 'a' is found once and 't' is found once too.
So I made a program and it worked fine, but then my teacher updated the homework with more examples, specifically examples with repetitive letters, and my program isn't working for that. For example, with strings 'ahahaha' and 'huhu', there are 0 common letters in the same positions, but there are 3 common letters between them (because 'h' in string 2 appears in string 1 three times).
My whole issue is that I can't figure out how to count if "h" appears multiple times in the first string, and I also don't know how to NOT check the SECOND 'h' in huhu, because it should only count unique letters, so the overall common letter count should be 2.
This is my current code:
S1 = input("Enter a string: ")
S2 = input("Enter a string: ")
i = 0
big_string = 0
short_string = 0
same_letter = 0
common_letters = 0
if len(S1) > len(S2):
    big_string = len(S1)
    short_string = len(S2)
elif len(S1) < len(S2):
    big_string = len(S2)
    short_string = len(S1)
elif len(S1) == len(S2):
    big_string = short_string = len(S1)
while i < short_string:
    if (S1[i] == S2[i]) and (S1[i] in S2):
        same_letter += 1
        common_letters += 1
    elif (S1[i] == S2[i]):
        same_letter += 1
    elif (S1[i] in S2):
        common_letters += 1
    i += 1
print("Number of positions with the same letter: ", same_letter)
print("Number of letters from S1 that are also in S2: ", common_letters)
So this code worked for strings without repeated letters, but when I try to use it with "ahahaha" and "huhu" I get 0 common positions (which makes sense) and 2 common letters (when it should be 3). I figured it might work if I tried to add the following:
while x < short_string:
    if S1[i] in S2[x]:
        common_letters += 1
    else:
        pass
    x += 1
However, this doesn't work either.
I am not asking for a direct answer or piece of code to do this, because I want to do it on my own, but I just need a couple of hints or ideas on how to do this.
Note: I can't use any functions we haven't covered in class, and in class we've only done basic loops and strings.

You need a data structure like a multidict. To my knowledge, the most similar data structure in the standard library is Counter from collections.
For simple frequency counting:
>>> from collections import Counter
>>> strings = ['cat', 'rat']
>>> counters = [Counter(s) for s in strings]
>>> sum((counters[0] & counters[1]).values())
2
With index counting:
>>> counters = [Counter(zip(s, range(len(s)))) for s in strings]
>>> sum((counters[0] & counters[1]).values())
2
For your examples ahahaha and huhu, you should get 2 and 0, respectively since we get two h but in wrong positions.
Since you can't use advanced constructs, you just need to simulate counter with arrays.
Create a 26-element array for each string.
Loop over each string and increment the entry for each letter.
Loop over both arrays simultaneously and sum the minimums of the corresponding entries.
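The three steps above can be sketched roughly as follows. This is only an illustration, assuming lowercase ASCII letters are what should be counted; the function names letter_counts and common_letter_count are made up for the sketch:

```python
def letter_counts(s):
    # One slot per letter 'a'..'z', all starting at zero.
    counts = [0] * 26
    for ch in s.lower():
        if 'a' <= ch <= 'z':
            counts[ord(ch) - ord('a')] += 1
    return counts


def common_letter_count(s1, s2):
    c1 = letter_counts(s1)
    c2 = letter_counts(s2)
    total = 0
    # A letter contributes as many times as it occurs in BOTH strings.
    for i in range(26):
        total += min(c1[i], c2[i])
    return total


print(common_letter_count('cat', 'rat'))
print(common_letter_count('ahahaha', 'huhu'))
```

Both calls should print 2, matching the Counter-based frequency count shown earlier in this answer.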

A shorter version is this:
def gen1(listItem):
    # collect each distinct non-space character once, in order of appearance
    returnValue = []
    for character in listItem:
        if character not in returnValue and character != " ":
            returnValue.append(character)
    return returnValue
st = "first string"
r1 = gen1(st)
st2 = "second string"
r2 = gen1(st2)
if len(st)> len(st2):
    print(list(set(r1).intersection(r2)))
else:
    print(list(set(r2).intersection(r1)))
Note:
This is a pretty old post, but since it's got new activity, I posted my version.

Since you can't use arrays or lists,
Maybe try to add every common character to a var_string then test
if c not in var_string:
before incrementing your common counter so you are not counting the same character multiple times.
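A minimal sketch of that hint, using only loops and strings. It assumes "common letters" means distinct letters of the first string that also occur in the second, which is only one reading of the assignment; the names count_unique_common and seen are hypothetical:

```python
def count_unique_common(s1, s2):
    seen = ""      # letters already counted
    count = 0
    for c in s1:
        # count a letter only the first time it is found in both strings
        if c in s2 and c not in seen:
            seen += c
            count += 1
    return count


print(count_unique_common('cat', 'rat'))
print(count_unique_common('ahahaha', 'huhu'))
```

Under this reading, 'cat' and 'rat' share two distinct letters, while 'ahahaha' and 'huhu' share only 'h'.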

You are only getting '2' because you're only going to look at 4 total characters out of ahahaha (because huhu, the shortest string, is only 4 characters long). Change your while loop to go over big_string instead, and then add (len(S2) > i) and to your two conditional tests; the last test performs an in, so it won't cause a problem with index length.
NB: All of the above implicitly assumes that len(S1) >= len(S2); that should be easy enough to ensure, using a conditional and an assignment, and it would simplify other parts of your code to do so. You can replace the first block entirely with something like:
if (len(S2) > len(S1)): (S2, S1) = (S1, S2)
big_string = len(S1)
short_string = len(S2)

We can solve this by using one for loop inside another, as follows:
int y = 0;
for (i = 0; i < big_string; i++)
{
    for (j = 0; j < short_string; j++)
    {
        if (s1[i] == s2[j])
        {
            y++;
            break; /* count each letter of the big string at most once */
        }
    }
}
If you enter 'ahahaha' and 'huhu', the outer loop first takes the first character of the big string, 'a'. The inner loop then compares it with the first letter of the small string, 'h'; they are not equal, so y is not incremented. Next, j is incremented while i stays the same, so 'a' is compared with the second letter of the small string, 'u'; again they are not equal and y remains zero. y is incremented in the following cases:
when the second letter of the big string, 'h', is compared with the first letter of the small string, y is incremented once, i.e. y=1;
when the fourth letter of the big string, 'h', is compared with the first letter of the small string, y is incremented again, i.e. y=2;
when the sixth letter of the big string, 'h', is compared with the first letter of the small string, y is incremented again, i.e. y=3;
Final output is 3. I think that is what we want.

HOW TO "Arbitrary" format items in list/dict/etc. EX: change 4th character in every string in list

First of all, I want to mention that there might not be any real-life applications for this simple script I created, but I did it because I'm learning and I couldn't find anything similar here on SO. I wanted to know what could be done to "arbitrarily" change characters in an iterable like a list.
Sure, title() is a handy tool I learned relatively quickly, but then I got to thinking: what if, just for kicks, I wanted to format (upper case) the last character instead? Or the third, the middle one, etc.? What about lower case? Replacing specific characters with others?
Like I said, this is surely not perfect, but it could give some food for thought to other noobs like myself. Plus, I think this can be modified in hundreds of ways to achieve all kinds of different formatting.
How about helping me improve what I just did? How about making it more lean and mean? Checking for style, methods, efficiency, etc.
Here it goes:
words = ['house', 'flower', 'tree'] #string list
counter = 0 #counter to iterate over the items in list
chars = 4 #character position in string (0,1,2...)
for counter in range (0,len(words)):
    while counter < len(words):
        z = list(words[counter]) # z is a temp list created to slice words
        if len(z) > chars: # to compare char position and z length
            upper = [k.upper() for k in z[chars]] # string formatting EX: uppercase
            z[chars] = upper [0] # replace formatted character with original
            words[counter] = ("".join(z)) # convert and replace temp list back into original word str list
            counter +=1
        else:
            break
print (words)
['housE', 'flowEr', 'tree']

This is somewhat of a combination of both (so +1 to both of them :) ). The main function accepts a list, an arbitrary function and the character to act on:
In [47]: def RandomAlter(l, func, char):
   ....:     return [''.join([func(w[x]) if x == char else w[x] for x in range(len(w))]) for w in l]
   ....:
In [48]: RandomAlter(words, str.upper, 4)
Out[48]: ['housE', 'flowEr', 'tree']
In [49]: RandomAlter([str.upper(w) for w in words], str.lower, 2)
Out[49]: ['HOuSE', 'FLoWER', 'TReE']
In [50]: RandomAlter(words, lambda x: '_', 4)
Out[50]: ['hous_', 'flow_r', 'tree']
The function RandomAlter can be rewritten as follows, which may make it a bit clearer (the one-line version above uses a list comprehension to reduce the lines of code needed).
def RandomAlter(l, func, char):
    # For each word in our list
    main_list = []
    for w in l:
        # Create a container that is going to hold our new 'word'
        new_word = []
        # Iterate over a range that is equal to the number of chars in the word
        # (on Python 2, xrange is a more memory efficient 'range' - same behavior)
        for x in range(len(w)):
            # If the current position is the character we want to modify
            if x == char:
                # Apply the function to the character and append to our 'word'
                # This is a cool Python feature - you can pass around functions
                # just like any other variable
                new_word.append(func(w[x]))
            else:
                # Just append the normal letter
                new_word.append(w[x])
        # Now we append the 'word' to our main_list. However since the 'word' is
        # a list of letters, we need to 'join' them together to form a string
        main_list.append(''.join(new_word))
    # Now just return the main_list, which will be a list of altered words
    return main_list

There's much better Pythonistas than me, but here's one attempt:
[''.join(
    [a[x].upper() if x == chars else a[x]
     for x in range(0,len(a))]
    )
 for a in words]
Also, we're talking about the programmer's 4th, right? What everyone else calls 5th, yes?

Some comments on your code:
for counter in range (0,len(words)):
while counter < len(words):
This won't compile unless you indent the while loop under the for loop. And, if you do that, the inner loop will completely screw up the loop counter for the outer loop. And finally, you almost never want to maintain an explicit loop counter in Python. You probably want this:
for counter, word in enumerate(words):
Next:
z = list(words[counter]) # z is a temp list created to slice words
You can already slice strings, in exactly the same way you slice lists, so this is unnecessary.
Next:
upper = [k.upper() for k in z[chars]] # string formatting EX: uppercase
This is a bad name for the variable, since there's a function with the exact same name—which you're calling on the same line.
Meanwhile, the way you defined things, z[chars] is a character, a copy of words[counter][4]. You can iterate over a single character in Python, because each character is itself a string, but it's generally pointless: [k.upper() for k in z[chars]] is the same thing as [z[chars].upper()].
z[chars] = upper [0] # replace formatted character with original
So you only wanted the list of 1 character to get the first character out of it… why make it a list in the first place? Just replace the last two lines with z[chars] = z[chars].upper().
else:
break
This is going to stop on the first string shorter than length 4, rather than just skip strings shorter than length 4, which is what it seems like you want. The way to say that is continue, not break. Or, better, just fall off the end of the list. In some cases, it's hard to write things without a continue, but in this case, it's easy—it's already at the end of the loop, and in fact it's inside an else: that has nothing else in it, so just remove both lines.
It's hard to tell with upper that your loops are wrong, because if you accidentally call upper twice, it looks the same as if you called it once. Change the upper to chr(ord(k)+1), which replaces any letter with the next letter. Then try it with:
words = ['house', 'flower', 'tree', 'a', 'abcdefgh']
You'll notice that, e.g., you get 'flowgr' instead of 'flowfr'.
You may also want to add a variable that counts up the number of times you run through the inner loop. It should only be len(words) times, but it's actually len(words) * len(words) if you have no short words, or len(words) * len(<up to the first short word>) if you have any. You're making the computer do a whole lot of extra work—if you have 1000 words, it has to do 1000000 loops instead of 1000. In technical terms, your algorithm is O(N^2), even though it only needs to be O(N).
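A rough sketch of that experiment, assuming Python 3. The helper names shift_char, nested_version and single_loop_version are made up for illustration, and the nested loop mirrors the question's structure with the while indented under the for:

```python
def shift_char(word, index):
    # Replace the character at index with the next code point.
    return word[:index] + chr(ord(word[index]) + 1) + word[index + 1:]


def nested_version(words, index):
    words = list(words)        # work on a copy
    iterations = 0
    for counter in range(len(words)):
        while counter < len(words):
            iterations += 1
            if len(words[counter]) > index:
                words[counter] = shift_char(words[counter], index)
                counter += 1
            else:
                break
    return words, iterations


def single_loop_version(words, index):
    words = list(words)        # work on a copy
    iterations = 0
    for counter, word in enumerate(words):
        iterations += 1
        if len(word) > index:
            words[counter] = shift_char(word, index)
    return words, iterations


sample = ['house', 'flower', 'tree', 'a', 'abcdefgh']
print(nested_version(sample, 4))
print(single_loop_version(sample, 4))
```

With this sample the nested version should visit 8 words and turn 'flower' into 'flowgr' (shifted twice), while the single loop visits 5 words and produces 'flowfr'.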
Putting it all together:
words = ['house', 'flower', 'tree', 'a', 'abcdefgh'] #string list
chars = 4 #character position in string (0,1,2...)
for counter, word in enumerate(words):
    if len(word) > chars: # to compare char position and word length
        z = list(word)
        z[chars] = chr(ord(z[chars]) + 1) # replace character with next character
        words[counter] = "".join(z) # convert and replace temp list back into original word str list
print (words)
That does the same thing as your original code (except using "next character" instead of "uppercase character"), without the bugs, with much less work for the computer, and much easier to read.

I think the general case of what you're talking about is a method that, given a string and an index, returns that string, with the indexed character transformed according to some rule.
def transform_string(strng, index, transform):
    lst = list(strng)
    if index < len(lst):
        lst[index] = transform(lst[index])
    return ''.join(lst)
words = ['house', 'flower', 'tree']
output = [transform_string(word, 4, str.upper) for word in words]
To make it even more abstract, you could have a factory that returns a method, like so:
def transformation_factory(index, transform):
    def inner(word):
        lst = list(word)
        if index < len(lst):
            lst[index] = transform(lst[index])
        return ''.join(lst)  # hand back the rebuilt word
    return inner
transform = transformation_factory(4, lambda x: x.upper())
output = list(map(transform, words))  # list() so this is a list on Python 3 as well
