The number of differences between characters in a string in Python 3 - python

Given a string, let's say "TATA__", I need to find the total number of differences between adjacent characters in that string, i.e. there is a difference between T and A, but not between A and A, or _ and _.
My code more or less tells me this. But when a string such as "TTAA__" is given, it doesn't work as planned.
I need to take a character in that string, and check if the character next to it is not equal to the first character. If it is indeed not equal, I need to add 1 to a running count. If it is equal, nothing is added to the count.
This is what I have so far:
def num_diffs(state):
    count = 0
    for char in state:
        if char != state[char2]:
            count += 1
        char2 += 1
    return count
When I run it using num_diffs("TATA__") I get 4 as the response. When I run it with num_diffs("TTAA__") I also get 4. Whereas the answer should be 2.
If any of that makes sense at all, could anyone help in fixing it or pointing out where my error lies? I have a feeling it has to do with state[char2]. Sorry if this seems like a trivial problem, it's just that I'm totally new to the Python language.

import operator
def num_diffs(state):
    return sum(map(operator.ne, state, state[1:]))
To open this up a bit, it maps !=, operator.ne, over state and state beginning at the 2nd character. The map function accepts multiple iterables as arguments and passes elements from those one by one as positional arguments to the given function, until one of the iterables is exhausted (state[1:] in this case will stop first).
The map results in an iterable of boolean values, but since bool in python inherits from int you can treat it as such in some contexts. Here we are interested in the True values, because they represent the points where the adjacent characters differed. Calling sum over that mapping is an obvious next step.
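To make the intermediate values visible, here is a small sketch that materialises the booleans before summing them. The variable names state and flags are only illustrative.

```python
import operator

state = "TTAA__"

# Pair each character with its right-hand neighbour and compare them.
flags = list(map(operator.ne, state, state[1:]))
print(flags)       # [False, True, False, True, False]

# True counts as 1 and False as 0, so sum() counts the differences.
print(sum(flags))  # 2
```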
Apart from the string slicing, the whole thing runs on iterators in Python 3. It is possible to iterate lazily over the shifted string too, if one wants to avoid slicing huge strings:
import operator
from itertools import islice
def num_diffs(state):
    return sum(map(operator.ne,
                   state,
                   islice(state, 1, len(state))))

There are a couple of ways you might do this.
First, you could iterate through the string using an index, and compare each character with the character at the previous index.
Second, you could keep track of the previous character in a separate variable. The second seems closer to your attempt.
def num_diffs(s):
    count = 0
    prev = None
    for ch in s:
        if prev is not None and prev != ch:
            count += 1
        prev = ch
    return count
prev is the character from the previous loop iteration. At the end of each iteration you set it to ch (the current character) so it will be available in the next.

You might want to investigate Python's groupby function which helps with this kind of analysis.
from itertools import groupby
def num_diffs(seq):
    return len(list(groupby(seq))) - 1
for test in ["TATA__", "TTAA__"]:
    print(test, num_diffs(test))
This would display:
TATA__ 4
TTAA__ 2
The groupby() function works by grouping consecutive identical entries together. For each run it yields a key and a group: the key is the repeated entry, and the group is an iterator over the matching entries. So each new group after the first marks a point where adjacent characters differ, which is why 1 is subtracted from the number of groups.
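To see the runs that groupby() produces, a minimal sketch follows. The helper name runs is hypothetical and not part of the answer above.

```python
from itertools import groupby

def runs(seq):
    """Return (key, run_length) pairs for consecutive identical items."""
    return [(key, len(list(group))) for key, group in groupby(seq)]

print(runs("TTAA__"))  # [('T', 2), ('A', 2), ('_', 2)]
print(runs("TATA__"))  # [('T', 1), ('A', 1), ('T', 1), ('A', 1), ('_', 2)]
```

Three runs mean two boundaries, and five runs mean four. Note that an empty input has zero runs, so the expression len(list(groupby(seq))) - 1 would give -1 there; a guard may be needed if empty strings can occur.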

Trying to make as few modifications to your original code as possible:
def num_diffs(state):
    count = 0
    for char2 in range(1, len(state)):
        if state[char2 - 1] != state[char2]:
            count += 1
    return count
One of the problems with your original code was that the char2 variable was not initialized within the body of the function, so it was impossible to predict the function's behaviour.
However, working with indices is not the most Pythonic way and it is error prone (see comments for a mistake that I made). You may want to rewrite the function so that it does one loop over a pair of strings, a pair of characters at a time:
def num_diffs(state):
    count = 0
    for char1, char2 in zip(state[:-1], state[1:]):
        if char1 != char2:
            count += 1
    return count
Finally, that very logic can be written much more succinctly; see @Ilja's answer.
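As one possible middle ground between the explicit loop and the operator-based version, the same pairing can be written with a generator expression. This is only a sketch; it relies on zip() stopping at the shorter input, so state[:-1] is not needed.

```python
def num_diffs(state):
    """Count positions where a character differs from the one after it."""
    return sum(a != b for a, b in zip(state, state[1:]))

print(num_diffs("TATA__"))  # 4
print(num_diffs("TTAA__"))  # 2
```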

Related

Error:string index out of range, defining a function

I'm practicing coding on codingbat.com since I'm a complete beginner in python, and here is one of the exercises:
Given a string, return a new string made of every other char starting with the first, so "Hello" yields "Hlo".
Here is my attempt at defining the function string_bits(str):
def string_bits(str):
    char = 0
    first = str[char]
    for char in range(len(str)):
        char += 2
        every_other = str[char]
    return (first + every_other)
Running the code gives an error. What's wrong with my code?
A different approach, with an explanation:
If you need to handle a sentence, where spaces would be included, you can do this using slicing. On a string slicing works as:
[start_of_string:end_of_string:jump_this_many_char_in_string]
So, you want to jump only every second letter, so you do:
[::2]
The first two are empty, because you just want to step every second character.
So, you can do this in one line, like this:
>>> " ".join(i[::2] for i in "Hello World".split())
'Hlo Wrd'
What just happened above is that we take our string and use split to turn it into a list. By default split splits on whitespace, so we will have:
["Hello", "World"]
From there, a comprehension iterates through each item of the list, giving us one word at a time, and applies the desired string manipulation, i[::2], to each word.
The comprehension is:
i[::2] for i in "Hello World".split()
Finally, we call " ".join, which joins the resulting pieces back into a single string separated by spaces, to give us the output:
"Hlo Wrd"
Check out the slicing section from the docs: https://docs.python.org/3/tutorial/introduction.html
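A few slices on the same example string may help make the start:stop:step notation concrete. The variable name word is only illustrative.

```python
word = "Hello World"

print(word[0:5])    # Hello        (start 0, stop 5, default step 1)
print(word[6:])     # World        (from index 6 to the end)
print(word[::2])    # HloWrd       (every second character, spaces included)
print(word[1::2])   # el ol        (every second character, starting at index 1)
print(word[::-1])   # dlroW olleH  (negative step walks backwards)
```

Slicing the whole sentence at once counts the space as a character, which is why the answer above splits into words first.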
The problem is that char += 2 pushes the index past the end of the string: the last values produced by range(len(str)), plus 2, are greater than len(str) - 1, the last valid index. You could do:
def string_bits(string):
    if len(string) == 2:
        return string[0]
    result = ''
    for char in range(0, len(string), 2):  # range creates values in increments of two
        result += string[char]
    return result
A more succinct method would be:
def string_bits(string):
    return string[::2]
You should avoid using 'str' as a variable name: it is not a reserved word, but it is the name of Python's built-in string type, and reusing it shadows that type.
Ok, for me:
You should not use str as a variable name as it is a Python built-in type (replace str by my_str for example)
For example, 'Hello' length is 5, so 0 <= index <= 4. Here you are trying to access index 3+2=5 (when char = 3) in your for loop.
You can achieve what you want with the following code:
def string_bits(my_str):
    result = ""
    for char in range(0, len(my_str), 2):
        result += my_str[char]
    return result
The error you are getting means that you are trying to get the nth letter of a string that has less than n characters.
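The error can be reproduced in isolation. The sketch below indexes one position past the end of a five-character string and, for contrast, shows that a slice with an out-of-range bound does not raise.

```python
word = "Hello"          # valid indices are 0..4

try:
    word[5]             # one past the last valid index
except IndexError as exc:
    message = str(exc)
    print(message)      # string index out of range

# Slicing, unlike indexing, tolerates out-of-range bounds.
print(word[3:100])      # lo
```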
As another suggestion, strings are Sequence-types in Python, which means they have a lot of built-in functionalities for doing exactly what you're trying to do here. See Built-in Types - Python for more information, but know that sequence types support slicing - that is, selection of elements from the sequence.
So, you could slice your string like this:
def string_bits(input_string):
    return input_string[::2]
Meaning "take my input_string from the start (:) to the end (:) and select every second (2) element"

Python memory error when searching substring

I am trying to find the substrings of a very large string and am getting a memory error:
The code:
def substr(string):
    le = []
    st = list(string)
    for s in xrange(len(string)+1):
        for s1 in xrange(len(string)+1):
            le.append(''.join(st[s:s1]))
    cou = Counter(le)
    cou_val = cou.keys()
    cou_val.remove('')
    return le, cou_val
I am getting the error:
  File "solution.py", line 31, in substr
    le.append(''.join(st[s:s1]))
MemoryError
How to tackle this problem?
Answer
I noticed that your code prints all the possible substrings of string in a certain order. I suggest that instead of storing all of them in an array, you use code to return just the substring that you want. I tested the subroutine below with 'a very long string' and it always returns the same value as if you were to get an indexed value from an array.
string = 'a very long string'
def substr2(string, i):
    return string[i//(len(string)+1):i%(len(string)+1)]
print(substr2(string, 10))
Solution
The way you order the arguments for your for loops (s,s1) work similarly to a number system. s1 increments by 1 until it gets to a given value, then it resets to 0 and s increments by 1, repeating the cycle. This is seen in a decimal system (e.g. 01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,16 etc.)
The i//n div operator returns the integer value of i/n. (e.g. 14//10=1).
The i%n mod operator returns the remainder value of i/n. (e.g. 14%10 is 4).
So if we were to, for example, increment i by 1 and define (s,s1) as [i//10,i%10], we would get:
[0,0],[0,1],[0,2],[0,3],[0,4],[0,5],[0,6],[0,7],[0,8],[0,9],[1,0],[1,1],[1,2] etc.
We can utilize these to produce the same answer as in your array.
PS. My first answer. Hope this helps!
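As a quick check of the index arithmetic described above, the following sketch compares substr2 against the nested loops it is meant to replace. It is written for Python 3 (range instead of xrange); the helper all_substrings is hypothetical and exists only for the comparison.

```python
def substr2(string, i):
    """Return the i-th substring in nested-loop order without storing a list."""
    n = len(string) + 1
    start, stop = divmod(i, n)   # same as (i // n, i % n)
    return string[start:stop]

def all_substrings(string):
    """Reference: the original nested loops, materialised as a list."""
    n = len(string) + 1
    return [string[s:s1] for s in range(n) for s1 in range(n)]

text = "abc"
expected = all_substrings(text)
matches = all(substr2(text, i) == expected[i] for i in range(len(expected)))
print(matches)  # True
```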
It seems that you are running out of memory. When the string is too large, the code you posted copies pieces of it over and over again into the le list. As @Rikka's link suggests, buffer/memoryview may be of use to you, but I have never used it.
As a workaround, I would suggest that instead of storing each substring in le, you store its indexes as a tuple. Additionally, I don't think the st list is required (not sure though whether your way speeds it up), so the result would be (code not tested):
from collections import Counter
def substr(string):
    le = []
    for s in xrange(len(string)+1):
        for s1 in xrange(len(string)+1):
            # Skip empty strings (slices with s >= s1 are empty)
            if s < s1:
                le.append((s, s1))
    cou = Counter(le)
    cou_val = cou.keys()
    return le, cou_val
Now, an example of how you can use substr (code not tested):
myString = "very long string here"
matchString = "here"
matchPos = False
indexes, count = substr(myString)
# Get all the substrings without storing them simultaneously in memory
for i in indexes:
    # construct substring and compare
    if myString[i[0]:i[1]] == matchString:
        matchPos = i
        break
After the above you have start and end positions of the 1st occurrence of "here" into your large string. I am not sure what you try to achieve but this can easily be modified to find all occurrences, count matches, etc - I just post it as example. I am also not sure why the Counter is there...
This approach should not raise the memory error. However, it is a trade-off between memory and CPU, and I expect it to be a bit slower at runtime since every time you use the indexes you have to reconstruct the substring.
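Another way to express the same memory-for-CPU trade-off is a generator, which hands out one substring at a time and stores neither the substrings nor the index pairs. This is a Python 3 sketch, and the names iter_substrings and count_matches are hypothetical.

```python
def iter_substrings(string):
    """Yield each non-empty substring one at a time instead of building a list."""
    n = len(string)
    for start in range(n):
        for stop in range(start + 1, n + 1):
            yield string[start:stop]

def count_matches(string, wanted):
    """Count how many generated substrings equal `wanted`."""
    return sum(1 for sub in iter_substrings(string) if sub == wanted)

print(count_matches("very long string here", "here"))  # 1
print(sum(1 for _ in iter_substrings("abc")))           # 6
```

The number of substrings still grows quadratically with the length of the input, so this only bounds memory use; it does not make the search fast.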
Hope it helps
The solution:
A note on the slicing rules, since the loops produce many slices whose indexes are out of order.
When the step is positive, such as 1, the first index must be less than the second for the slice to contain anything. Conversely, when the step is negative, such as -1, the first index must be the greater one (keeping in mind that -1 > -2).
So in your program, whenever the index s is greater than or equal to s1 with a step of one, the slice is simply an empty string; it does not access memory out of range. The MemoryError comes from keeping every one of those substrings in the le list at the same time.

Display the number of lower case letters in a string

This is what I have so far:
count=0
mystring=input("enter")
for ch in mystring:
    if mystring.lower():
        count+=1
print(count)
I figured out how to make a program that displays the number of lower case letters in a string, but it requires that I list each letter individually: if ch=='a' or ch=='b' or ch=='c', etc. I am trying to figure out how to use a command to do so.
This sounds like homework! Anyway, this is a fun way of doing it:
#the operator module contains functions that can be used like
#their operator counterparts. The eq function works like the
#'==' operator; it takes two arguments and tests them for equality.
from operator import eq
#I want to give a warning about the input function. In python2
#the equivalent function is called raw_input. python2's input
#function is very different, and in this case would require you
#to add quotes around strings. I mention this in case you have
#been manually adding quotes if you are testing in both 2 and 3.
mystring = input('enter')
#So what this line below does is a little different in python 2 vs 3,
#but comes to the same result in each.
#First, map is a function that takes a function as its first argument,
#and applies that to each element of the rest of the arguments, which
#are all sequences. Since eq is a function of two arguments, you can
#use map to apply it to the corresponding elements in two sequences.
#in python2, map returns a list of the elements. In python3, map
#returns a map object, which uses a 'lazy' evaluation of the function
#you give on the sequence elements. This means that the function isn't
#actually used until each item of the result is needed. The 'sum' function
#takes a sequence of values and adds them up. The results of eq are all
#True or False, which are really just special names for 1 and 0 respectively.
#Adding them up is the same as adding up a sequence of 1s and 0s.
#so, map is using eq to check each element of two strings (i.e. each letter)
#for equality. mystring.lower() is a copy of mystring with all the letters
#lowercase. sum adds up all the Trues to get the answer you want.
sum(map(eq, mystring, mystring.lower()))
or the one-liner:
#What I am doing here is using a generator expression.
#I think reading it is the best way to understand what is happening.
#For every letter in the input string, check if it is lower, and pass
#that result to sum. sum sees this like any other sequence, but this sequence
#is also 'lazy,' each element is generated as you need it, and it isn't
#stored anywhere. The results are just given to sum.
sum(c.islower() for c in input('enter: '))
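One caveat worth checking: the two snippets above do not count the same thing. Comparing against mystring.lower() also counts characters that lower() leaves unchanged, such as digits, spaces and punctuation, while islower() counts only lowercase letters. The sketch below wraps both in hypothetical helper functions so the difference is visible on a fixed sample instead of user input.

```python
from operator import eq

def count_same_as_lower(text):
    """Count characters unchanged by lower(); includes digits, spaces, etc."""
    return sum(map(eq, text, text.lower()))

def count_lowercase(text):
    """Count characters that are lowercase letters."""
    return sum(c.islower() for c in text)

sample = "Hi 42!"
print(count_same_as_lower(sample))  # 5
print(count_lowercase(sample))      # 1
```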
You have a typo in your code. Instead of:
    if mystring.lower():
It should be:
    if ch.islower():
If you have any questions ask below. Good luck!
I'm not sure if this will handle UTF or special characters very nicely but should work for at least ASCII in Python3, using the islower() function.
count=0
mystring=input("enter:")
for ch in mystring:
    if ch.islower():
        count+=1
print(count)
The correct version of your code would be:
count=0
mystring=input("enter")
for ch in mystring:
    if ch.islower():
        count += 1
print(count)
The method lower converts a string/char to lowercase. Here you want to know if it IS lowercase (you want a boolean), so you need islower.
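The difference between the two methods can be seen directly. In this small sketch, the variable names are only illustrative.

```python
ch = "A"

lowered = ch.lower()
print(lowered)             # a      (a new string)
print(bool(lowered))       # True   (any non-empty string is truthy)
print(ch.islower())        # False  (the actual test for lowercase)
print("a".islower())       # True
print("1".islower())       # False  (no cased characters at all)
```

Because a non-empty string is always truthy, a condition such as if mystring.lower(): passes on every iteration, which is why the original loop counted every character.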
Tip: With a bit of wizardry you can even write this:
mystring= input("enter")
count = sum(map(lambda x: x.islower(), mystring))
or
count = sum([x.islower() for x in mystring])
(True is automatically converted to 1 and False to 0)
:)
I think you can use the following method:
mystring=input("enter:")
print([char.islower() for char in mystring].count(True))

Count occurrences of a given character in a string using recursion

I have to make a function called countLetterString(char, str) where
I need to use recursion to find the amount of times the given character appears in the string.
My code so far looks like this.
def countLetterString(char, str):
    if not str:
        return 0
    else:
        return 1 + countLetterString(char, str[1:])
All this does is count how many characters are in the string but I can't seem to figure out how to split the string then see whether the character is the character split.
The first step is to break this problem into pieces:
1. How do I determine if a character is in a string?
If you are doing this recursively you need to check whether the first character of the string is the one you are looking for.
2. How do I compare two characters?
Python has a == operator that determines whether or not two things are equivalent
3. What do I do after I know whether or not the first character of the string matches or not?
You need to move on to the remainder of the string, yet somehow maintain a count of the characters you have seen so far. This is normally very easy with a for-loop because you can just declare a variable outside of it, but recursively you have to pass the state of the program to each new function call.
Here is an example where I compute the length of a string recursively:
def length(s):
    if not s:  # test if there are no more characters in the string
        return 0
    else:  # maintain a count by adding 1 each time you return
        # get all but the first character using a slice
        return 1 + length( s[1:] )
From this example, see if you can complete your problem. Yours will have a single additional step.
4. When do I stop recursing?
This is always a question when dealing with recursion, when do I need to stop recalling myself. See if you can figure this one out.
EDIT:
not s will test if s is empty, because in Python the empty string "" evaluates to False; and not False == True
First of all, you shouldn't use str as a variable name as it will mask the built-in str type. Use something like s or text instead.
The if str == 0: line will not do what you expect, the correct way to check if a string is empty is with if not str: or if len(str) == 0: (the first method is preferred). See this answer for more info.
So now you have the base case of the recursion figured out, so what is the "step". You will either want to return 1 + countLetterString(...) or 0 + countLetterString(...) where you are calling countLetterString() with one less character. You will use the 1 if the character you remove matches char, or 0 otherwise. For example you could check to see if the first character from s matches char using s[0] == char.
To remove a single character in the string you can use slicing, so for the string s you can get all characters but the first using s[1:], or all characters but the last using s[:-1]. Hope that is enough to get you started!
Reasoning about recursion requires breaking the problem into "regular" and "special" cases. What are the special cases here? Well, if the string is empty, then char certainly isn't in the string. Return 0 in that case.
Are there other special cases? Not really! If the string isn't empty, you can break it into its first character (the_string[0]) and all the rest (the_string[1:]). Then you can recursively count the number of character occurrences in the rest, and add 1 if the first character equals the char you're looking for.
I assume this is an assignment, so I won't write the code for you. It's not hard. Note that your if str == 0: won't work: that's testing whether str is the integer 0. if len(str) == 0: is a way that will work, and if str == "": is another. There are shorter ways, but at this point those are probably clearest.
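Other answers on this page do give complete code, so for comparison, a minimal sketch of the first-character/rest decomposition described above might look like the following. The names count_char and text are hypothetical, chosen to avoid shadowing the built-in str.

```python
def count_char(char, text):
    """Recursively count how many times `char` occurs in `text`."""
    if not text:                       # special case: empty string
        return 0
    first_matches = 1 if text[0] == char else 0
    return first_matches + count_char(char, text[1:])

print(count_char("i", "This is a string!"))  # 3
```

Each call slices off one character, so very long strings can exceed Python's default recursion limit; that is a limitation of this style, not of the idea.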
First of all, I would suggest not using char or str as names. str is a built-in function/type, and while I don't believe char would give you any problems, it's a reserved word in many other languages. Second, you can achieve the same functionality using count, as in:
letterstring="This is a string!"
letterstring.count("i")
which would give you the number of occurrences of i in the given string, in this case 3.
If you need to do it with recursion purely as an exercise, the thing to remember is to carry some condition or counter over with each call and to place some kind of conditional within the code that will change it. For example:
def countToZero(count):
    print(str(count))
    if count > 0:
        countToZero(count-1)
Keep in mind this is a very quick example, but as you can see, on each call I print the current value and then the function calls itself again while decrementing the count. Once the count is no longer greater than 0 the function will end.
Knowing this, you will want to keep track of your count, the index you are comparing in the string, the character you are searching for, and the string itself, given your example. Without doing the code for you, I think that should at least give you a start.
You have to decide a base case first. The point where the recursion unwinds and returns.
In this case the base case is the point where there are no (further) instances of a particular character, say X, in the string (if string.find(X) == -1: return count). The function then makes no further calls to itself and returns the number of instances found so far, trusting the count passed in by its caller.
Recursion means a function calling itself, which creates a stack of calls (at least in Python). Each call has a specific purpose and no knowledge of what happened before it was called unless that information is provided, and it adds its own result to what it was given before returning. That information has to be supplied by its caller (its parent); global variables could also be used, but that is not advisable.
In this case the information is how many instances of the character the parent calls found in the earlier part of the string. The initial call, made by us, needs that information too. Since we are the root of all the calls and have not yet traversed the string, we have found zero X so far, so we pass 0 along with the entire string and ask the function to traverse it and count the X in it. For convenience this 0 can be a default argument of the function; otherwise you have to supply it every time you make the call.
When will the function make another call?
Recursion breaks the task down to the most granular level and leaves the rest to the child calls. The most granular step here is finding a single instance of X, passing the rest of the string after that point (from point + 1) to the next call, and adding 1 to the count supplied by the parent.
def countLetterString(char, string, count=0):
    # char plays the role of X in the description above
    if string.find(char) == -1:
        return count
    string = string[string.find(char) + 1:]
    return countLetterString(char, string, count=count + 1)
Counting X in a file through iteration (a loop).
This would involve opening the file (TextFile) and reading it with text = TextFile.read(), so that text is a string. Then loop over each character (for char in text:), remembering granularity, and each time char == X, increment count with count += 1. Before you run the loop, record that you have not yet gone through the string, so your count of X in text is count = 0. (Sounds familiar?)
Finally, return count.
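The loop described above can be sketched as follows. To keep the example self-contained it works on a string rather than an opened file, and the function name count_iterative is hypothetical.

```python
def count_iterative(text, target):
    """Count occurrences of the single character `target` in `text`."""
    count = 0                 # nothing has been seen yet
    for char in text:
        if char == target:
            count += 1
    return count

print(count_iterative("foobar", "o"))  # 2
```

The starting value count = 0 plays the same role as the default argument in the recursive version.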
#This function will print the count using recursion.
def countrec(s, c, cnt = 0):
    if len(s) == 0:
        print(cnt)
        return 0
    if s[-1] == c:
        countrec(s[0:-1], c, cnt+1)
    else:
        countrec(s[0:-1], c, cnt)
#Function call
countrec('foobar', 'o')
With an extra parameter, the same function can be implemented.
Working function code:
def countLetterString(char, str, count = 0):
    if len(str) == 0:
        return count
    if str[-1] == char:
        return countLetterString(char, str[0:-1], count+1)
    else:
        return countLetterString(char, str[0:-1], count)
The below function signature accepts 1 more parameter - count.
(P.S : I was presented this question where the function signature was pre-defined; just had to complete the logic.)
Hereby, the code :
def count_occurrences(s, substr, count=0):
    ''' s - indicates the string,
        output : Returns the count of occurrences of substr found in s
    '''
    len_s = len(s)
    len_substr = len(substr)
    if len_s == 0:
        return count
    if len_s < len_substr:
        return count
    if substr == s[0:len_substr]:
        count += 1
    count = count_occurrences(s[1:], substr, count)  ## RECURSIVE CALL
    return count
output behavior :
count_occurrences("hishiihisha", "hi", 0) => 3
count_occurrences("xxAbx", "xx") => 1 (not mandatory to pass the count, since it has a default value.)
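Because the function above advances one character at a time (s[1:]), it counts overlapping occurrences, which is not what the built-in str.count does. The following sketch shows the difference with an iterative equivalent; the name count_overlapping is hypothetical, and treating an empty substring as zero matches is an assumption made here for simplicity.

```python
def count_overlapping(s, substr):
    """Count occurrences of substr in s, allowing overlaps (iterative)."""
    if not substr:
        return 0
    last_start = len(s) - len(substr)
    return sum(s.startswith(substr, i) for i in range(last_start + 1))

print(count_overlapping("aaaa", "aa"))  # 3  (matches at indices 0, 1 and 2)
print("aaaa".count("aa"))               # 2  (str.count skips past each match)
```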

Failing to understand recursion

New to Python and trying to understand recursion. I'm trying to make a program that prints out the number of times string 'key' is found in string 'target' using a recursive function, as in Problem 1 of the MIT intro course problem set. I'm having a problem trying to figure out how the function will run. I've read the documentation and some tutorials on it, but does anyone have any tips on how to better comprehend recursion to help me fix this code?
from string import *
def countR(target,key):
    numb = 0
    if target.find(key) == -1:
        print numb
    else:
        numb +=1
        return countR(target[find(target,key):],key)
countR('ajdkhkfjsfkajslfajlfjsaiflaskfal','a')
With recursion you want to split the problem into smaller sub-problems that you can solve independently, and then combine their solutions to get the final solution.
In your case you can split the task in two parts: checking where (if at all) the first occurrence of key exists, and then counting recursively for the rest.
Is there a key in there:
- No: Return 0.
- Yes: Remove key and say that the number of keys is 1 + number of key in the rest
In Code:
def countR(target,key):
    if target.find(key) == -1:
        return 0
    else:
        return 1+ countR(target[target.find(key)+len(key):],key)
Edit:
The following code then prints the desired result:
print(countR('ajdkhkfjsfkajslfajlfjsaiflaskfal','a'))
This is not how recursion works. numb is useless - every time you enter the recursion, numb is created again as 0, so it can only be 0 or 1 - never the actual result you seek.
Recursion works by finding the answer to a smaller problem, and using it to solve the big problem. In this case, you need to find the number of appearances of the key in a string that does not contain the first appearance, and add 1 to it.
Also, you need to actually advance the slice so the string you just found won't appear again.
from string import *
def countR(target,key):
    if target.find(key) == -1:
        return 0
    else:
        return 1+countR(target[target.find(key)+len(key):],key)
print(countR('ajdkhkfjsfkajslfajlfjsaiflaskfal','a'))
Most recursive functions that I've seen make a point of returning an interesting value upon which higher frames build. Your function doesn't do that, which is probably why it's confusing you. Here's a recursive function that gives you the factorial of an integer:
def factorial(n):
    """return the factorial of any positive integer n"""
    if n > 1:
        return n * factorial(n - 1)
    else:
        return 1  # Cheating a little bit by ignoring illegal values of n
The above function demonstrates what I'd call the "normal" kind of recursion – the value returned by inner frames is operated upon by outer frames.
Your function is a little unusual in that it:
Doesn't always return a value.
Outer frames don't do anything with the returned value of inner frames.
Let's see if we can refactor it to follow a more conventional recursion pattern. (Written as spoiler syntax so you can see if you can get it on your own, first):
def countR(target,key):
    idx = target.find(key)
    if idx > -1:
        return 1 + countR(target[idx + 1:], key)
    else:
        return 0
Here, countR adds 1 each time it finds key in target, and then recurs upon the remainder of the string. If it doesn't find a match it still returns a value, but it does two critical things:
When added to outer frames, doesn't change the value.
Doesn't recur any further.
(OK, so the critical things are things it doesn't do. You get the picture.)
Meta/Edit: Despite this meta article it's apparently not possible to actually properly format code in spoiler text. So I'll leave it unformatted until that feature is fixed, or forever, whichever comes first.
If key is not found in target, print numb; otherwise create a new string that starts after the found occurrence (so cut away the beginning) and continue the search from there.
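The "continue the search from there" step does not strictly need a new string: str.find accepts an optional start position. As an iterative counterpart to the recursive answers, here is a sketch under that approach. The name count_with_find is hypothetical, and returning 0 for an empty key is an assumption made to avoid an endless loop.

```python
def count_with_find(target, key):
    """Count non-overlapping occurrences of key, scanning with str.find."""
    if not key:
        return 0
    count = 0
    start = target.find(key)
    while start != -1:
        count += 1
        # Resume the search just past the match that was found.
        start = target.find(key, start + len(key))
    return count

sample = 'ajdkhkfjsfkajslfajlfjsaiflaskfal'
print(count_with_find(sample, 'a') == sample.count('a'))  # True
```

This avoids both the repeated slicing and the recursion depth limit, at the cost of no longer being an exercise in recursion.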
