Python code to check repetition in a string - python

I'm trying to build my programming logic. I need to write a Python function that takes a string as input and checks whether any character appears more than once. The function should return True if there are no repetitions and False otherwise. I searched online and found several related examples. I wrote the code below and it seemed fine initially, but then I realized my mistake, and now I'm not sure how to go about it. Please guide me.
def repfree(S):
    for char in S:
        if S.count(char) > 1:
            return True
    return False

Here, you can create a character list to keep track of the characters that have already occurred in S.
Have a look at the code below, hope it helps:
def repfree(S):
    freq = []
    for char in S:
        # if the character is already in the list, S contains a repeated char
        if char in freq:
            return False
        else:
            freq.append(char)
    return True

How about using a set?
def repfree(s):
    char_set = set()
    for c in s:
        char_set.add(c)  # sets have add(), not append()
    return len(char_set) == len(s)

You can try the following code.
def rep_free(text):
    # True when every character is distinct, i.e. there are no repetitions
    return len(text) == len(set(text))
A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. (For other containers see the built-in dict, list, and tuple classes, and the collections module.)
From python doc
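The set-based one-liner always scans the whole string. As a minimal sketch (the name repfree_early is made up for illustration and does not come from any answer here), the set can instead be filled one character at a time, so the function stops at the first repeat:

```python
def repfree_early(text):
    """Return True if no character in text occurs more than once."""
    seen = set()
    for ch in text:
        if ch in seen:      # second occurrence found: stop immediately
            return False
        seen.add(ch)        # sets use add(), not append()
    return True


print(repfree_early("country"))   # True
print(repfree_early("letter"))    # False
```

Membership tests on a set take roughly constant time on average, whereas `char in some_list` gets slower as the list grows.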

This can be done with Python's set data structure, since a set only ever holds unique elements.
string = "country"
def repetition(string):
    if len(set(string)) == len(string):
        print("string has only unique chars")
    else:
        print("string has repeated chars")
repetition(string)

Related

Python returns generator object without yield

I am trying to generate a function which tells me if a word is an isogram (contains duplicate values). However, this code always returns a generator object.
def is_isogram(string):
    return (True if (string.lower().count(letter) == 1) else False for letter in string.lower())
I know how to solve the problem, I was just wondering what is wrong with my code.
I suppose your function is intended to return a boolean, but the return statement has an iteration happening, where each value is mapped to a boolean. The parentheses give you an iterator over those booleans. The function's description suggests that the function should return True when the input has duplicate letters, but the mapping gives True when a letter is not duplicate. So you have three problems:
The iterator
The multiple booleans, when you want one boolean
The booleans indicate the inverse of what you want to return
So your idea for an algorithm should be changed to this:
def is_isogram(string):
    return any(letter for letter in string.lower() if string.lower().count(letter) > 1)
Side note: this algorithm is not efficient. More efficient is to create a set:
def is_isogram(string):
    return len(set(string.lower())) < len(string)
By wrapping your returned value in parentheses you've created a generator expression; see https://peps.python.org/pep-0289/ for details.
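To make the difference concrete, here is a small sketch (the helper name has_duplicate_letters is hypothetical). It shows that the parenthesized expression is only a lazy generator object, and that any() is what reduces it to a single boolean:

```python
def has_duplicate_letters(word):
    """Return True if some letter occurs more than once (case-insensitive)."""
    lowered = word.lower()
    # Generator expression: nothing is evaluated yet, this is just an object.
    checks = (lowered.count(letter) > 1 for letter in lowered)
    # any() consumes the generator and stops at the first True.
    return any(checks)


gen = (ch for ch in "abc")
print(type(gen).__name__)              # generator
print(has_duplicate_letters("Hello"))  # True
print(has_duplicate_letters("World"))  # False
```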

Why can I not check for comparisons on strings without them being lists?

I was working on the popular palindrome question in python. I originally thought this code would be enough:
def is_palindrome(input_string):
    rev_str = reversed(input_string)
    if rev_str == input_string:
        return True
    else:
        return False
But only some of the examples ended up being correct. I checked the solution and I had to change the strings into lists for the code to work properly but I don't understand why.
def is_palindrome(input_string):
    rev_str = reversed(input_string)
    if list(rev_str) == list(input_string):
        return True
    else:
        return False
Any help on understanding why this is the case would be really helpful.
The problem is that reversed("hello") returns a reversed iterator object, not "olleh." This is to save memory, as it doesn't need to compute all the letters until you need them.
>>> reversed("hello")
<reversed object at 0x02A7B170>
If this confuses you, look into what iterators are.
If you want to reverse a string, you can just do
s[::-1]
Where s is your string.
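As a short sketch of how the palindrome check could look with real strings on both sides of the comparison (the second function name is only illustrative):

```python
def is_palindrome(input_string):
    """Compare the string with its reversed copy made by slicing."""
    return input_string == input_string[::-1]


def is_palindrome_joined(input_string):
    """Same check, building the reversed string from the iterator."""
    return input_string == "".join(reversed(input_string))


print(is_palindrome("level"))          # True
print(is_palindrome_joined("hello"))   # False
print(reversed("level") == "level")    # False: an iterator never equals a str
```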

The number of differences between characters in a string in Python 3

Given a string, lets say "TATA__", I need to find the total number of differences between adjacent characters in that string. i.e. there is a difference between T and A, but not a difference between A and A, or _ and _.
My code more or less tells me this. But when a string such as "TTAA__" is given, it doesn't work as planned.
I need to take a character in that string, and check if the character next to it is not equal to the first character. If it is indeed not equal, I need to add 1 to a running count. If it is equal, nothing is added to the count.
This what I have so far:
def num_diffs(state):
    count = 0
    for char in state:
        if char != state[char2]:
            count += 1
        char2 += 1
    return count
When I run it using num_diffs("TATA__") I get 4 as the response. When I run it with num_diffs("TTAA__") I also get 4. Whereas the answer should be 2.
If any of that makes sense at all, could anyone help in fixing it/pointing out where my error lies? I have a feeling is has to do with state[char2]. Sorry if this seems like a trivial problem, it's just that I'm totally new to the Python language.
import operator
def num_diffs(state):
    return sum(map(operator.ne, state, state[1:]))
To open this up a bit: it maps operator.ne (the function form of !=) over state and over state starting at the 2nd character. The map function accepts multiple iterables as arguments and passes elements from them one by one as positional arguments to the given function, until one of the iterables is exhausted (state[1:] in this case will stop first).
The map results in an iterable of boolean values, but since bool in python inherits from int you can treat it as such in some contexts. Here we are interested in the True values, because they represent the points where the adjacent characters differed. Calling sum over that mapping is an obvious next step.
Apart from the string slicing the whole thing runs using iterators in python3. It is possible to use iterators over the string state too, if one wants to avoid slicing huge strings:
import operator
from itertools import islice
def num_diffs(state):
    return sum(map(operator.ne,
                   state,
                   islice(state, 1, len(state))))
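To see what the mapping step produces before sum is applied, here is a small illustration using the asker's second test string (the variable name flags is only for this sketch):

```python
import operator

state = "TTAA__"
# Pair each character with its right-hand neighbour and apply !=.
flags = list(map(operator.ne, state, state[1:]))
print(flags)       # [False, True, False, True, False]
print(sum(flags))  # 2, because True counts as 1 and False as 0
```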
There are a couple of ways you might do this.
First, you could iterate through the string using an index, and compare each character with the character at the previous index.
Second, you could keep track of the previous character in a separate variable. The second seems closer to your attempt.
def num_diffs(s):
    count = 0
    prev = None
    for ch in s:
        if prev is not None and prev != ch:
            count += 1
        prev = ch
    return count
prev is the character from the previous loop iteration. You assign it to ch (the current character) at the end of each iteration so it will be available in the next.
You might want to investigate Python's groupby function which helps with this kind of analysis.
from itertools import groupby
def num_diffs(seq):
    return len(list(groupby(seq))) - 1
for test in ["TATA__", "TTAA__"]:
    print(test, num_diffs(test))
This would display:
TATA__ 4
TTAA__ 2
The groupby() function works by grouping consecutive identical entries together. For each run it yields a key and a group, the key being the repeated entry and the group being an iterator over the matching entries. So every group after the first marks a point where adjacent characters differ, which is why 1 is subtracted from the number of groups.
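A small sketch of what groupby actually yields for these inputs (the helper name runs is made up for illustration):

```python
from itertools import groupby


def runs(seq):
    """Return (key, run_length) for each run of equal adjacent items."""
    return [(key, len(list(group))) for key, group in groupby(seq)]


print(runs("TTAA__"))            # [('T', 2), ('A', 2), ('_', 2)]
print(len(runs("TTAA__")) - 1)   # 2
print(len(runs("TATA__")) - 1)   # 4
```

Note that for an empty string there are no groups at all, so the "number of groups minus one" formula gives -1 rather than 0; guard for that case if it can occur.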
Trying to make as few modifications to your original code as possible:
def num_diffs(state):
    count = 0
    for char2 in range(1, len(state)):
        if state[char2 - 1] != state[char2]:
            count += 1
    return count
One of the problems with your original code was that the char2 variable was not initialized within the body of the function, so it was impossible to predict the function's behaviour.
However, working with indices is not the most Pythonic way and it is error prone (see comments for a mistake that I made). You may want to rewrite the function so that it does one loop over a pair of strings, one pair of characters at a time:
def num_diffs(state):
    count = 0
    for char1, char2 in zip(state[:-1], state[1:]):
        if char1 != char2:
            count += 1
    return count
Finally, that very logic can be written much more succinctly; see @Ilja's answer.

Count occurrences of a given character in a string using recursion

I have to make a function called countLetterString(char, str) where
I need to use recursion to find the amount of times the given character appears in the string.
My code so far looks like this.
def countLetterString(char, str):
    if not str:
        return 0
    else:
        return 1 + countLetterString(char, str[1:])
All this does is count how many characters are in the string but I can't seem to figure out how to split the string then see whether the character is the character split.
The first step is to break this problem into pieces:
1. How do I determine if a character is in a string?
If you are doing this recursively, you need to check the first character of the string.
2. How do I compare two characters?
Python has a == operator that determines whether or not two things are equivalent
3. What do I do after I know whether or not the first character of the string matches or not?
You need to move on to the remainder of the string, yet somehow maintain a count of the characters you have seen so far. This is normally very easy with a for-loop because you can just declare a variable outside of it, but recursively you have to pass the state of the program to each new function call.
Here is an example where I compute the length of a string recursively:
def length(s):
    if not s:  # test if there are no more characters in the string
        return 0
    else:  # maintain a count by adding 1 each time you return
        # get all but the first character using a slice
        return 1 + length(s[1:])
From this example, see if you can complete your problem. Yours will have a single additional step.
4. When do I stop recursing?
This is always a question when dealing with recursion: when do I need to stop calling myself? See if you can figure this one out.
EDIT:
not s will test if s is empty, because in Python the empty string "" evaluates to False; and not False == True
First of all, you shouldn't use str as a variable name as it will mask the built-in str type. Use something like s or text instead.
The if str == 0: line will not do what you expect, the correct way to check if a string is empty is with if not str: or if len(str) == 0: (the first method is preferred). See this answer for more info.
So now you have the base case of the recursion figured out, so what is the "step". You will either want to return 1 + countLetterString(...) or 0 + countLetterString(...) where you are calling countLetterString() with one less character. You will use the 1 if the character you remove matches char, or 0 otherwise. For example you could check to see if the first character from s matches char using s[0] == char.
To remove a single character in the string you can use slicing, so for the string s you can get all characters but the first using s[1:], or all characters but the last using s[:-1]. Hope that is enough to get you started!
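The step described above can be sketched as follows. This is one possible reading of the hints, not the asker's required solution; the name count_letter and the parameter name s are illustrative choices:

```python
def count_letter(char, s):
    """Recursively count how often char occurs in s."""
    if not s:                       # base case: empty string
        return 0
    # Add 1 if the first character matches, otherwise 0.
    first_matches = 1 if s[0] == char else 0
    # Recurse on everything except the first character.
    return first_matches + count_letter(char, s[1:])


print(count_letter("i", "This is a string!"))  # 3
print(count_letter("z", "foobar"))             # 0
```

Each call adds a stack frame and slices a new string, so very long inputs can hit Python's recursion limit; for real work str.count is the simpler choice.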
Reasoning about recursion requires breaking the problem into "regular" and "special" cases. What are the special cases here? Well, if the string is empty, then char certainly isn't in the string. Return 0 in that case.
Are there other special cases? Not really! If the string isn't empty, you can break it into its first character (the_string[0]) and all the rest (the_string[1:]). Then you can recursively count the number of character occurrences in the rest, and add 1 if the first character equals the char you're looking for.
I assume this is an assignment, so I won't write the code for you. It's not hard. Note that your if str == 0: won't work: that's testing whether str is the integer 0. if len(str) == 0: is a way that will work, and if str == "": is another. There are shorter ways, but at this point those are probably clearest.
First of all, I would suggest not using char or str as names. str is a built-in type, and while I don't believe char would give you any problems in Python, it's a reserved word in many other languages. Second, you can achieve the same functionality using count, as in:
letterstring="This is a string!"
letterstring.count("i")
which would give you the number of occurrences of i in the given string, in this case 3.
If you need to do it purely as an exercise, the thing to remember with recursion is to carry some condition or counter over with each call and to place some kind of conditional within the code that will change it. For example:
def countToZero(count):
    print(str(count))
    if count > 0:
        countToZero(count-1)
Keep in mind this is a very quick example, but as you can see, on each call I print the current value and then the function calls itself again while decrementing the count. Once the count is no longer greater than 0, the function will end.
Knowing this, you will want to keep track of your count, the index you are comparing in the string, the character you are searching for, and the string itself, given your example. Without doing the code for you, I think that should at least give you a start.
You have to decide a base case first. The point where the recursion unwinds and returns.
In this case the base case would be the point where there are no (further) instances of a particular character, say X, in the string (if string.find(X) == -1: return count). At that point the function makes no further calls to itself and returns the number of instances found so far, trusting the count passed in by its caller.
Recursion means a function calling itself from within, therefore creating a stack(at least in Python) of calls and every call is an individual and has a specified purpose with no knowledge whatsoever of what happened before it was called, unless provided, to which it adds its own result and returns(not strictly speaking). And this information has to be supplied by its invoker, its parent, or can be done using global variables which is not advisable.
So in this case that information is how many instances of the character were found by the parent call in the first part of the string. The initial call, made by us, also needs to be supplied with that information. Since we are the root of all the calls and have not yet traversed the string, we can safely tell the initial call that zero X have been found so far, hand it the entire string, and ask it to traverse it and find out how many X there are. This 0 could conveniently be the default argument of the function; otherwise you have to supply the 0 every time you make the call.
When will the function call another function?
Recursion is breaking down the task into the most granular level(strictly speaking, maybe) and leave the rest to the (grand)child(ren). The most granular break down of this task would be finding a single instance of X and passing the rest of the string from the point, exclusive(point + 1) at which it occurred to the next call, and adding 1 to the count which its parent function supplied it with.
if string.find(char) != -1:
    string = string[string.find(char) + 1:]
    return countLetterString(char, string, count = count + 1)
Counting X in file through iteration/loop.
It would involve opening the file (TextFile) and reading it into a string, text. Then loop over each character (for char in text:), remember granularity, and each time char == X, increment count with count += 1. Before you run the loop, specify that you have not gone through the string yet and therefore your count of X in text is 0. (Sounds familiar?)
return count.
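Putting the pieces of this answer together, a minimal sketch might look like the following. The function names are hypothetical, and the find-based version assumes char is a single character:

```python
def count_with_find(char, s, count=0):
    """Recursive version: jump to each occurrence with str.find."""
    idx = s.find(char)
    if idx == -1:                 # base case: no further occurrences
        return count
    # Recurse on the part after the occurrence, carrying the count along.
    return count_with_find(char, s[idx + 1:], count + 1)


def count_with_loop(char, text):
    """Iterative version: the count starts at 0 before the loop runs."""
    count = 0
    for ch in text:
        if ch == char:
            count += 1
    return count


print(count_with_find("o", "foobar"))  # 2
print(count_with_loop("o", "foobar"))  # 2
```

In both versions the starting value 0 plays the same role: nothing has been inspected yet, so nothing has been found.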
# This function will print the count using recursion.
def countrec(s, c, cnt = 0):
    if len(s) == 0:
        print(cnt)
        return 0
    if s[-1] == c:
        countrec(s[0:-1], c, cnt+1)
    else:
        countrec(s[0:-1], c, cnt)
# Function call
countrec('foobar', 'o')
With an extra parameter, the same function can be implemented.
Working function code:
def countLetterString(char, str, count = 0):
    if len(str) == 0:
        return count
    if str[-1] == char:
        return countLetterString(char, str[0:-1], count+1)
    else:
        return countLetterString(char, str[0:-1], count)
The function signature below accepts one more parameter, count.
(P.S.: I was presented this question with the function signature pre-defined; I just had to complete the logic.)
Here is the code:
def count_occurrences(s, substr, count=0):
    ''' s - the string to search, substr - the substring to look for
    output : Returns the count of occurrences of substr found in s
    '''
    len_s = len(s)
    len_substr = len(substr)
    if len_s == 0:
        return count
    if len_s < len_substr:
        return count
    if substr == s[0:len_substr]:
        count += 1
    count = count_occurrences(s[1:], substr, count)  ## RECURSIVE CALL
    return count
Output behavior:
count_occurrences("hishiihisha", "hi", 0) => 3
count_occurrences("xxAbx", "xx") => 1 (passing count is not mandatory, since it has a default value)

efficiently checking that string consists of one character in Python

What is an efficient way to check that a string s in Python consists of just one character, say 'A'? Something like all_equal(s, 'A') which would behave like this:
all_equal("AAAAA", "A") = True
all_equal("AAAAAAAAAAA", "A") = True
all_equal("AAAAAfAAAAA", "A") = False
Two seemingly inefficient ways would be to: first convert the string to a list and check each element, or second to use a regular expression. Are there more efficient ways or are these the best one can do in Python? Thanks.
This is by far the fastest, several times faster than even count(); just time it with mgilson's excellent timing suite:
s == len(s) * s[0]
Here all the checking is done inside the Python C code which just:
allocates len(s) characters;
fills the space with the first character;
compares two strings.
The longer the string is, the greater the time saving. However, as mgilson writes, it creates a copy of the string, so if your string is many millions of symbols long, it may become a problem.
As we can see from the timing results, the fastest ways to solve the task generally do not execute any Python code for each symbol. The set() solution also does all its work inside the C code of the Python library, but it is still slow, probably because it handles the string through the Python object interface.
UPD: Concerning the empty string case. What to do with it strongly depends on the task. If the task is "check if all the symbols in a string are the same", s == len(s) * s[0] is a valid answer (no symbols mean an error, and exception is ok). If the task is "check if there is exactly one unique symbol", empty string should give us False, and the answer is s and s == len(s) * s[0], or bool(s) and s == len(s) * s[0] if you prefer receiving boolean values. Finally, if we understand the task as "check if there are no different symbols", the result for empty string is True, and the answer is not s or s == len(s) * s[0].
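The three readings of the task can be written side by side as a short sketch. The function names are invented here to label each interpretation; the expressions themselves are the ones given above:

```python
def all_same_strict(s):
    """'All symbols are the same'; raises IndexError on an empty string."""
    return s == len(s) * s[0]


def exactly_one_unique(s):
    """'Exactly one unique symbol'; an empty string gives False."""
    return bool(s) and s == len(s) * s[0]


def no_different_symbols(s):
    """'No two different symbols'; an empty string gives True."""
    return not s or s == len(s) * s[0]


print(exactly_one_unique("AAAAA"), exactly_one_unique(""))      # True False
print(no_different_symbols("AAfAA"), no_different_symbols(""))  # False True
```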
>>> s = 'AAAAAAAAAAAAAAAAAAA'
>>> s.count(s[0]) == len(s)
True
This doesn't short circuit. A version which does short-circuit would be:
>>> all(x == s[0] for x in s)
True
However, I have a feeling that due to the optimized C implementation, the non-short-circuiting version will probably perform better on some strings (depending on size, etc.).
Here's a simple timeit script to test some of the other options posted:
# Python 2 script: uses print statements and the two-argument str.translate.
import timeit
import re
def test_regex(s,regex=re.compile(r'^(.)\1*$')):
    return bool(regex.match(s))
def test_all(s):
    return all(x == s[0] for x in s)
def test_count(s):
    return s.count(s[0]) == len(s)
def test_set(s):
    return len(set(s)) == 1
def test_replace(s):
    return not s.replace(s[0],'')
def test_translate(s):
    return not s.translate(None,s[0])
def test_strmul(s):
    return s == s[0]*len(s)
tests = ('test_all','test_count','test_set','test_replace','test_translate','test_strmul','test_regex')
print "WITH ALL EQUAL"
for test in tests:
    print test, timeit.timeit('%s(s)'%test,'from __main__ import %s; s="AAAAAAAAAAAAAAAAA"'%test)
    if globals()[test]("AAAAAAAAAAAAAAAAA") != True:
        print globals()[test]("AAAAAAAAAAAAAAAAA")
        raise AssertionError
print
print "WITH FIRST NON-EQUAL"
for test in tests:
    print test, timeit.timeit('%s(s)'%test,'from __main__ import %s; s="FAAAAAAAAAAAAAAAA"'%test)
    if globals()[test]("FAAAAAAAAAAAAAAAA") != False:
        print globals()[test]("FAAAAAAAAAAAAAAAA")
        raise AssertionError
On my machine (OS-X 10.5.8, core2duo, python2.7.3) with these contrived (short) strings, str.count smokes set and all, and beats str.replace by a little, but is edged out by str.translate and strmul is currently in the lead by a good margin:
WITH ALL EQUAL
test_all 5.83863711357
test_count 0.947771072388
test_set 2.01028490067
test_replace 1.24682998657
test_translate 0.941282987595
test_strmul 0.629556179047
test_regex 2.52913498878
WITH FIRST NON-EQUAL
test_all 2.41147494316
test_count 0.942595005035
test_set 2.00480484962
test_replace 0.960338115692
test_translate 0.924381017685
test_strmul 0.622269153595
test_regex 1.36632800102
The timings could be slightly (or even significantly?) different between different systems and with different strings, so that would be worth looking into with an actual string you're planning on passing.
Eventually, if you hit the best case for all enough, and your strings are long enough, you might want to consider that one. It's a better algorithm ... I would avoid the set solution though as I don't see any case where it could possibly beat out the count solution.
If memory could be an issue, you'll need to avoid str.translate, str.replace and strmul as those create a second string, but this isn't usually a concern these days.
You could convert to a set and check there is only one member:
len(set("AAAAAAAA")) == 1
Try using the built-in function all:
all(c == 'A' for c in s)
If you need to check that all the characters in the string are the same and equal to a given character, remove the duplicates with set and check that the result equals the set containing that single character.
>>> set("AAAAA") == set("A")
True
If you only want to check that all characters are the same, whatever the character is, just check the length of the set:
>>> len(set("AAAAA")) == 1
True
Adding another solution to this problem (Python 2 only; in Python 3, str.translate takes a single mapping argument instead):
>>> not "AAAAAA".translate(None,"A")
True
Interesting answers so far. Here's another:
flag = True
for c in 'AAAAAAAfAAAA':
    if not c == 'A':
        flag = False
        break
The only advantage I can think of to mine is that it doesn't need to traverse the entire string if it finds an inconsistent character.
not len("AAAAAAAAA".replace('A', ''))
