How to get the position of a character in Python? - python

How can I get the position of a character inside a string in Python?

There are two string methods for this, find() and index(). The difference between the two is what happens when the search string isn't found. find() returns -1 and index() raises a ValueError.
Using find()
>>> myString = 'Position of a character'
>>> myString.find('s')
2
>>> myString.find('x')
-1
Using index()
>>> myString = 'Position of a character'
>>> myString.index('s')
2
>>> myString.index('x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
From the Python manual
string.find(s, sub[, start[, end]])
Return the lowest index in s where the substring sub is found such that sub is wholly contained in s[start:end]. Return -1 on failure. Defaults for start and end and interpretation of negative values is the same as for slices.
And:
string.index(s, sub[, start[, end]])
Like find() but raise ValueError when the substring is not found.
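As a small illustration of the difference described above, here is a minimal sketch. The helper name position_or_none is made up for this example and is not part of Python; it only shows one way to handle the ValueError that index() raises.

```python
def position_or_none(text, sub, start=0):
    """Return the lowest index of sub in text at or after start, or None."""
    try:
        return text.index(sub, start)
    except ValueError:
        # index() raises instead of returning -1
        return None

my_string = 'Position of a character'
first_o = my_string.find('o')               # lowest index of 'o'
next_o = my_string.find('o', first_o + 1)   # resume the search after the first hit

print(first_o, next_o)                      # 1 6
print(position_or_none(my_string, 'x'))     # None
```

Which of the two to prefer is mostly a matter of whether a missing character is an expected case (find) or a genuine error (index).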

Just for the sake of completeness, if you need to find all positions of a character in a string, you can do the following:
s = 'shak#spea#e'
c = '#'
print([pos for pos, char in enumerate(s) if char == c])
which will print: [4, 9]

>>> s="mystring"
>>> s.index("r")
4
>>> s.find("r")
4
"Long winded" way
>>> for i, c in enumerate(s):
...     if c == "r": print(i)
...
4
to get substring,
>>> s="mystring"
>>> s[4:10]
'ring'

Just for completeness: if I want to find the extension in a file name in order to check it, I need to find the last '.'. In this case, use rfind:
path = 'toto.titi.tata..xls'
path.find('.')
4
path.rfind('.')
15
In my case, I use the following, which works as long as the complete file name contains at least one '.':
filename_without_extension = complete_name[:complete_name.rfind('.')]
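One caveat with slicing on rfind(): if the name contains no '.', rfind() returns -1 and the slice silently drops the last character. A minimal sketch of a guarded version follows; the helper name strip_extension is hypothetical and only serves as an illustration.

```python
def strip_extension(complete_name):
    """Drop everything from the last '.' onward; leave dot-less names untouched."""
    dot = complete_name.rfind('.')
    if dot == -1:
        # no '.' found: nothing to strip
        return complete_name
    return complete_name[:dot]

print(strip_extension('toto.titi.tata..xls'))  # toto.titi.tata.
print(strip_extension('README'))               # README
```

For real file paths, the standard library's os.path.splitext is usually the safer choice, since it also copes with directory names that contain dots.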

What happens when the string contains a duplicate character?
From my experience with index(), a duplicated character gets the same index back every time, because index() always returns the first occurrence.
For example:
s = 'abccde'
for c in s:
    print('%s, %d' % (c, s.index(c)))
would print:
a, 0
b, 1
c, 2
c, 2
d, 4
e, 5
In that case you can use enumerate() instead:
for i, character in enumerate(s):
    # i is the position of the character in the string
    print('%s, %d' % (character, i))

string.find(character)
string.index(character)
Perhaps you'd like to have a look at the documentation to find out what the difference between the two is.

A character might appear multiple times in a string. For example, in the string "sentence", the positions of e are 1, 4 and 7 (because indexing starts from zero). However, both find() and index() return only the first position of a character. This can be solved like this:
def charposition(string, char):
    pos = []  # list to store positions for each 'char' in 'string'
    for n in range(len(string)):
        if string[n] == char:
            pos.append(n)
    return pos
s = "sentence"
print(charposition(s, 'e'))
#Output: [1, 4, 7]

If you want to find the first match.
Python has a built-in string method that does the work: index().
string.index(value, start, end)
Where:
value: (Required) The value to search for.
start: (Optional) Where to start the search. Default is 0.
end: (Optional) Where to end the search. Default is to the end of the string.
def character_index():
    string = "Hello World! This is an example sentence with no meaning."
    match = "i"
    return string.index(match)
print(character_index())
# 15
If you want to find all the matches.
Let's say you need all the indexes where the character match is and not just the first one.
The pythonic way would be to use enumerate().
def character_indexes():
    string = "Hello World! This is an example sentence with no meaning."
    match = "i"
    indexes_of_match = []
    for index, character in enumerate(string):
        if character == match:
            indexes_of_match.append(index)
    return indexes_of_match
print(character_indexes())
# [15, 18, 42, 53]
Or even better with a list comprehension:
def character_indexes_comprehension():
    string = "Hello World! This is an example sentence with no meaning."
    match = "i"
    return [index for index, character in enumerate(string) if character == match]
print(character_indexes_comprehension())
# [15, 18, 42, 53]

more_itertools.locate is a third-party tool that finds all indices of items that satisfy a condition.
Here we find all index locations of the letter "i".
Given
import more_itertools as mit
text = "supercalifragilisticexpialidocious"
search = lambda x: x == "i"
Code
list(mit.locate(text, search))
# [8, 13, 15, 18, 23, 26, 30]

Most methods I found refer to finding the first substring in a string. To find all the substrings, you need a workaround.
For example:
Define the string
text = 'iloveyoutosimidaandilikeyou'
Define the substring
key = 'you'
Define a function that can find the location for all the substrings within the string
def find_all_loc(text, key):
    pos = []
    start = 0
    end = len(text)
    while True:
        loc = text.find(key, start, end)
        if loc == -1:  # compare by value; "is -1" is not reliable
            break
        else:
            pos.append(loc)
            start = loc + len(key)
    return pos
pos = find_all_loc(text, key)
print(pos)
[5, 24]
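The same loop can be written as a generator. The sketch below is one possible variant, not the answer's own code: the name iter_positions and the overlapping flag are assumptions added for illustration, to show how the step after each hit decides whether overlapping matches are reported.

```python
def iter_positions(text, key, overlapping=False):
    """Yield each index at which key occurs in text."""
    if not key:
        return  # an empty key would otherwise match at every position
    # Skip past the whole match, or advance by one to allow overlaps.
    step = 1 if overlapping else len(key)
    start = text.find(key)
    while start != -1:
        yield start
        start = text.find(key, start + step)

print(list(iter_positions('iloveyoutosimidaandilikeyou', 'you')))  # [5, 24]
print(list(iter_positions('aaaa', 'aa')))                          # [0, 2]
print(list(iter_positions('aaaa', 'aa', overlapping=True)))        # [0, 1, 2]
```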

A solution with numpy for quick access to all indexes:
import numpy as np
string_array = np.array(list(my_string))
char_indexes = np.where(string_array == 'C')[0]  # np.where returns a tuple of arrays

Related

Counting in lists

I need to write a function, tag_count, that takes as its argument a list of strings. It should return a count of how many of those strings are XML tags. You can tell if a string is an XML tag if it begins with a left angle bracket "<" and end with a right angle bracket ">".
def tag_count(input_list):
    found = 0
    counts = input_list.count('<')
    for key in input_list:
        if key == counts:
            found += 1
    return found
Test for the tag_count function:
list1 = ['<greeting>', 'Hello World!', '</greeting>']
count = tag_count(list1)
print("Expected result: 2, Actual result: {}".format(count))
Can someone tell me why this does not work - and come up with
something that does using a def function.
At the moment, it is returning: Expected result: 2, Actual result: 0
The main problem is that you're trying to count the number of strings in your list that are a single '<'. You need to iterate over your list and count the strings that begin and end with angle brackets:
>>> def tag_count(lst):
...     return sum(s[0] == '<' and s[-1] == '>' for s in lst)
...
>>> list1 = ['<greeting>', 'Hello World!', '</greeting>']
>>> count = tag_count(list1)
>>> count
2
If there may be empty strings in your data, use str.startswith and str.endswith rather than indexing to avoid an IndexError:
    return sum(s.startswith('<') and s.endswith('>') for s in lst)
Taking Cuber's answer into account, a safe and readable way to count XML tags could be:
def is_key_XML(key):
    try:
        return (key[0] == '<') and (key[-1] == '>')
    except IndexError:
        return False
def tag_count(input_list):
    return sum(is_key_XML(k) for k in input_list)
And the test could be:
list1 = ['<greeting>', 'Hello World!', '</greeting>', '< Graou', 'L', '<>', '']
count = tag_count(list1)
print("Expected result: 3, Actual result: {}".format(count))
def tag_count(input_list):
    found = 0
    for key in input_list:
        if (len(key) > 1) and (key[0] == '<') and (key[-1] == '>'):
            found += 1
    return found
You need to check whether the first and last characters of your key are '<' and '>'.
Also, len(key) > 1 checks whether the string has at least 2 characters.
list1 = ['<greeting>', 'Hello World!', '</greeting>', '']
import re
# fullmatch requires the whole string to match, so the '>' must be the last character
len([s for s in list1 if re.fullmatch(r'<.*>', s)])
Output:
2
You can write it in a list comprehension notation:
requested_strs = len([s for s in input_list if s and s.startswith('<') and s.endswith('>')])
Even though it is a simple solution, I don't recommend using regexes in this case. Compiling a regex and matching it against the strings takes too much time for a check as simple as this one.

Search for a pattern in a string in python

Question: I am very new to python so please bear with me. This is a homework assignment that I need some help with.
So, for the matchPat function, I need to write a function that will take two arguments, str1 and str2, and return a Boolean indicating whether str1 is in str2. But I have to use an asterisk as a wild card in str1. The * can only be used in str1 and it will represent one or more characters that I need to ignore. Examples of matchPat are as follow:
matchPat ( 'a*t*r', 'anteaters' ) : True
matchPat ( 'a*t*r', 'albatross' ) : True
matchPat ( 'a*t*r', 'artist' ) : False
My current matchPat function can tell whether the characters of str1 are in str2 but I don't really know how I could tell python (by using the * as a wild card) to look for 'a' (the first letter) and after it finds a, skip the next 0 or more characters until it finds the next letter(which would be 't' in the example) and so on.
def matchPat(str1,str2):
    ## str(*)==str(=>1)
    if str1=='':
        return True
    elif str2=='':
        return False
    elif str1[0]==str2[0]:
        return matchPat(str1[2],str2[len(str1)-1])
    else: return True
Python strings have the in operator; you can check if str1 is a substring of str2 using str1 in str2.
You can split a string into a list of substrings based on a token. "a*b*c".split("*") is ["a","b","c"].
You can find the offset of next occurrence of a substring in a string using the string's find method.
So the problem of wildcard matching becomes:
split the pattern into parts which were separated by asterisks
for each part of the pattern
    can we find this after the previous part's location?
You are going to have to cope with corner cases like patterns that start or end with an asterisk, or have two asterisks beside each other, and so on. Good luck!
There is a find() method of strings that searches for a substring from a particular point, returning either its index (if found) or -1 if not found. The index() method is similar but raises an exception if the target string is not found.
I'd suggest that you first split the pattern string on "*". This will give you a list of chunks to look for. Set the starting position to zero, and for each element in the list of chunks, do a find() or index() from the current position.
If you find the current chunk then work out from its starting position and length where to start searching for the next chunk and update the starting position. If you find all the chunks then the target string matches the pattern. If any chunk is missing then the pattern search should fail.
Since this is homework I am hoping that gives you enough of an idea to move on.
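The split-and-find idea outlined above can be sketched as follows. This is only one possible reading of the assignment: the name match_pat is made up, '*' is assumed to stand for zero or more characters, and the pattern is allowed to start matching anywhere in the text.

```python
def match_pat(pattern, text):
    """Return True if the '*'-separated chunks of pattern occur in text, in order."""
    position = 0
    for chunk in pattern.split('*'):
        # Search only the part of text that lies after the previous chunk.
        found = text.find(chunk, position)
        if found == -1:
            return False
        position = found + len(chunk)
    return True

print(match_pat('a*t*r', 'anteaters'))  # True
print(match_pat('a*t*r', 'albatross'))  # True
print(match_pat('a*t*r', 'artist'))     # False
```

Leading, trailing or doubled asterisks produce empty chunks, which find() locates at the current position, so they need no special handling in this sketch.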
The basic idea here is to compare each character in str1 and str2, and if char in str1 is "*", find that character in str2 which is the character next to the "*" in str1.
Assuming that you are not going to use any function, (except find(), which can be implemented easily), this is the hard way (the code is straight-forward but messy, and I've commented wherever possible)-
def matchPat(str1, str2):
    index1 = 0
    index2 = 0
    while index1 < len(str1):
        c = str1[index1]
        # Check if str2 has run its course.
        if index2 >= len(str2):
            # This needs to be checked, assuming matchPat("*", "") to be true
            if len(str2) == 0 and str1 == "*":
                return True
            return False
        # If c is not "*", then it's a normal comparison.
        if c != "*":
            if c != str2[index2]:
                return False
            index2 += 1
        # If c is "*", then you need to increment index1,
        # search for the next value in str2,
        # and update index2
        else:
            index1 += 1
            if index1 == len(str1):
                return True
            c = str1[index1]
            # Search the character in str2
            i = str2.find(c, index2)
            # If search fails, return False
            if i == -1:
                return False
            index2 = i + 1
        index1 += 1
    return True
OUTPUT -
print(matchPat("abcde", "abcd"))
#False
print(matchPat("a", ""))
#False
print(matchPat("", "a"))
#True
print(matchPat("", ""))
#True
print(matchPat("abc", "abc"))
#True
print(matchPat("ab*cd", "abacacd"))
#False
print(matchPat("ab*cd", "abaascd"))
#True
print(matchPat('a*t*r', 'anteater'))
#True
print(matchPat('a*t*r', 'albatross'))
#True
print(matchPat('a*t*r', 'artist'))
#False
Without giving you the complete answer, first, split the str1 string into a list of strings on the '*' character. I usually call str1 the "needle" and str2 the "haystack", since you are looking for the needle in the haystack.
needles = needle.split('*')
Next, have a counter (which I will call i) start at 0. You will always be looking at haystack[i:] for the next string in needles.
In pseudocode, it'll look like this:
needles = needle.split('*')
i = 0
loop through all strings in needles:
    if current needle not in haystack[i:], return false
    increment i to just after the occurrence of the current needle in haystack (use the find() string method or write your own function to handle this)
return true
Are you allowed to use regular expressions? If so, the function you're looking for already exists in the re.search function. In a regular expression, '.*' matches zero or more arbitrary characters (a lone '.' matches exactly one):
import re
bool(re.search('a.*t.*r', 'anteaters')) # True
bool(re.search('a.*t.*r', 'artist'   )) # False
And if asterisks are a strict necessity, you can use regular expressions for that, too:
newstr = re.sub(r'\*', '.*', 'a*t*r') # Replace * with .*
bool(re.search(newstr, 'anteaters')) # Search using the new string
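If the pattern may contain characters that are special in regular expressions (such as '.' or '+'), they need escaping before the asterisks are translated. The sketch below is one way to do that; the helper names wildcard_to_regex and match_pat_re are hypothetical, and mapping '*' to '.*' (zero or more characters) is an assumption. Use '.+' instead if the wildcard must consume at least one character.

```python
import re

def wildcard_to_regex(pattern):
    """Escape the literal parts of pattern and join them with '.*'."""
    return '.*'.join(re.escape(part) for part in pattern.split('*'))

def match_pat_re(pattern, text):
    """Return True if the wildcard pattern is found anywhere in text."""
    return re.search(wildcard_to_regex(pattern), text) is not None

print(match_pat_re('a*t*r', 'anteaters'))  # True
print(match_pat_re('a*t*r', 'artist'))     # False
print(match_pat_re('a.b', 'axb'))          # False: the dot is taken literally
```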
If regular expressions aren't allowed, the simplest way to do that would be to look at substrings of the second string that are the same length as the first string, and compare the two. Note that this treats each * as standing for exactly one character. Something like this:
def matchpat(str1, str2):
    if len(str1) > len(str2): return False # Can't match if the first string is longer
    for i in range(0, len(str2)-len(str1)+1):
        substring = str2[i:i+len(str1)] # create substring of same length as first string
        for j in range(0, len(str1)):
            matched = False # assume False until match is found
            if str1[j] != '*' and str1[j] != substring[j]: # check each character
                break
            matched = True
        if matched == True: break # we don't need to keep searching if we've found a match
    return matched

Remove Last instance of a character and rest of a string

If I have a string as follows:
foo_bar_one_two_three
Is there a clean way, with RegEx, to return: foo_bar_one_two?
I know I can use split, pop and join for this, but I'm looking for a cleaner solution.
result = my_string.rsplit('_', 1)[0]
Which behaves like this:
>>> my_string = 'foo_bar_one_two_three'
>>> print(my_string.rsplit('_', 1)[0])
foo_bar_one_two
See the documentation entry for str.rsplit([sep[, maxsplit]]).
One way is to use rfind to get the index of the last _ character and then slice the string to extract the characters up to that point:
>>> s = "foo_bar_one_two_three"
>>> idx = s.rfind("_")
>>> if idx >= 0:
...     s = s[:idx]
...
>>> print(s)
foo_bar_one_two
You need to check that the rfind call returns something greater than -1 before using it to get the substring otherwise it'll strip off the last character.
If you must use regular expressions (and I tend to prefer non-regex solutions for simple cases like this), you can do it thus:
>>> import re
>>> s = "foo_bar_one_two_three"
>>> re.sub('_[^_]*$','',s)
'foo_bar_one_two'
Similar to the rsplit solution, rpartition will also work:
result = my_string.rpartition("_")[0]
You'll need to watch out for the case where the separator character is not found. In that case the original string will be in index 2, not 0.
doc string:
rpartition(...)
    S.rpartition(sep) -> (head, sep, tail)
    Search for the separator sep in S, starting at the end of S, and return
    the part before it, the separator itself, and the part after it. If the
    separator is not found, return two empty strings and S.
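To make the not-found case concrete, here is a minimal sketch that checks the middle element of the tuple before trusting the head. The helper name drop_after_last is made up for this illustration.

```python
def drop_after_last(text, sep):
    """Return text up to the last sep, or text unchanged if sep is absent."""
    head, found, _tail = text.rpartition(sep)
    # found is the separator itself on success and '' when nothing was found
    return head if found else text

print('foobar'.rpartition('_'))                        # ('', '', 'foobar')
print(drop_after_last('foo_bar_one_two_three', '_'))   # foo_bar_one_two
print(drop_after_last('foobar', '_'))                  # foobar
```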
Here is a generic function to remove everything after the last occurrence of any specified string. For extra credit, it also supports removing everything after the nth last occurrence.
def removeEverythingAfterLast (needle, haystack, n=1):
    while n > 0:
        idx = haystack.rfind(needle)
        if idx >= 0:
            haystack = haystack[:idx]
            n -= 1
        else:
            break
    return haystack
In your case, to remove everything after the last '_', you would simply call it like this:
updatedString = removeEverythingAfterLast('_', yourString)
If you wanted to remove everything after the 2nd last '_', you would call it like this:
updatedString = removeEverythingAfterLast('_', yourString, 2)
I know this is Python, and my answer may be a little off in syntax, but in Java you would do:
String a = "foo_bar_one_two_three";
String[] b = a.split("_");
String c = "";
for(int i=0; i<b.length-1; i++){
    c += b[i];
    if(i != b.length-2){
        c += "_";
    }
}
//and at this point, c is "foo_bar_one_two"
I hope the split function in Python works the same way. :)
EDIT:
Using the limit part of the function you can do:
String a = "foo_bar_one_two_three";
String[] b = a.split("_",StringUtils.countMatches(a,"_"));
//and at this point, b is the array = [foo,bar,one,two_three] (with a positive limit, the last element keeps the remainder of the string)

Find index of last occurrence of a substring in a string

I want to find the position (or index) of the last occurrence of a certain substring in given input string str.
For example, suppose the input string is str = 'hello' and the substring is target = 'l', then it should output 3.
How can I do this?
Use .rfind():
>>> s = 'hello'
>>> s.rfind('l')
3
Also don't use str as variable name or you'll shadow the built-in str().
You can use rfind() or rindex()
Python2 links: rfind() rindex()
>>> s = 'Hello StackOverflow Hi everybody'
>>> print( s.rfind('H') )
20
>>> print( s.rindex('H') )
20
>>> print( s.rfind('other') )
-1
>>> print( s.rindex('other') )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
The difference is that when the substring is not found, rfind() returns -1 while rindex() raises a ValueError exception (Python2 link: ValueError).
If you do not want to check for the rfind() return code -1, you may prefer rindex(), which provides an understandable error message. Otherwise you may spend minutes searching for where the unexpected value -1 is coming from within your code...
Example: Search of last newline character
>>> txt = '''first line
... second line
... third line'''
>>> txt.rfind('\n')
22
>>> txt.rindex('\n')
22
Use the str.rindex method.
>>> 'hello'.rindex('l')
3
>>> 'hello'.index('l')
2
Not trying to resurrect an inactive post, but since this hasn't been posted yet...
(This is how I did it before finding this question)
s = "hello"
target = "l"
last_pos = len(s) - 1 - s[::-1].index(target)
Explanation: When you're searching for the last occurrence, really you're searching for the first occurrence in the reversed string. Knowing this, I did s[::-1] (which returns a reversed string), and then indexed the target from there. Then I did len(s) - 1 - the index found because we want the index in the unreversed (i.e. original) string.
Watch out, though! If target is more than one character, you probably won't find it in the reversed string. To fix this, search for a reversed version of target and subtract its length instead of 1: last_pos = len(s) - len(target) - s[::-1].index(target[::-1]).
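The reversed-string approach can be wrapped up as below. This is a sketch for illustration only (the function name last_index_reversed is hypothetical); it subtracts len(target) so that the result is the start of the match in the original string, which is also what rfind() reports.

```python
def last_index_reversed(s, target):
    """Index of the last occurrence of target in s, found via reversed strings."""
    # First hit in the reversed string == last hit in the original string.
    reversed_pos = s[::-1].index(target[::-1])   # raises ValueError if absent
    return len(s) - len(target) - reversed_pos

print(last_index_reversed('hello', 'l'))    # 3
print(last_index_reversed('hello', 'll'))   # 2
print('hello'.rfind('ll'))                  # 2
```

In practice rfind() or rindex() does the same job without building reversed copies of the strings.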
Try this:
s = 'hello plombier pantin'
print (s.find('p'))
6
print (s.index('p'))
6
print (s.rindex('p'))
15
print (s.rfind('p'))
15
In this case both the rfind() and rindex() string methods can be used. Both return the highest index in the string where the substring is found, as shown below.
test_string = 'hello'
target = 'l'
print(test_string.rfind(target))
print(test_string.rindex(target))
One thing to keep in mind when using rindex(): it raises a ValueError [substring not found] if the target value is not found within the searched string, whereas rfind() will just return -1.
The more_itertools library offers tools for finding indices of all characters or all substrings.
Given
import more_itertools as mit
s = "hello"
pred = lambda x: x == "l"
Code
Characters
Now there is the rlocate tool available:
next(mit.rlocate(s, pred))
# 3
A complementary tool is locate:
list(mit.locate(s, pred))[-1]
# 3
mit.last(mit.locate(s, pred))
# 3
Substrings
There is also a window_size parameter available for locating the leading item of several items:
s = "How much wood would a woodchuck chuck if a woodchuck could chuck wood?"
substring = "chuck"
pred = lambda *args: args == tuple(substring)
next(mit.rlocate(s, pred=pred, window_size=len(substring)))
# 59
Python String rindex() Method
Description
Python string method rindex() returns the last index where the substring str is found, or raises an exception if no such index exists, optionally restricting the search to string[beg:end].
Syntax
Following is the syntax for rindex() method −
str.rindex(str, beg=0, end=len(string))
Parameters
str − This specifies the string to be searched.
beg − This is the starting index; by default it is 0.
end − This is the ending index; by default it is equal to the length of the string.
Return Value
This method returns the last index if str is found; otherwise it raises an exception.
Example
The following example shows the usage of rindex() method.
#!/usr/bin/python
str1 = "this is string example....wow!!!"
str2 = "is"
print(str1.rindex(str2))
print(str1.index(str2))
When we run the above program, it produces the following result −
5
2
Ref: Python String rindex() Method - Tutorialspoint
If you don't want to use rfind, then this will do the trick:
def find_last(s, t):
    last_pos = -1
    while True:
        pos = s.find(t, last_pos + 1)
        if pos == -1:
            return last_pos
        else:
            last_pos = pos
# Last occurrence of a character in a string without using built-in search functions
text = input("Enter a string : ")
char = input("Enter a character to search in string : ")
last_index = -1  # -1 means "not found"; 0 is a valid index
for i in range(len(text)):
    if text[i] == char:
        last_index = i
if last_index == -1:
    print("Entered character ", char, " is not present in string")
else:
    print("Character ", char, " last occurred at index : ", last_index)
You can use the rindex() function to get the last occurrence of a character in a string:
s="hellloooloo"
b='l'
print(s.rindex(b))
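s = "Hello, World"
target = 'l'
print(s.rfind(target))  # zero-based index of the last occurrence
or
s = "Hello, World"
found = False
target = 'l'
for i, j in enumerate(s[::-1]):
    if target == j:
        found = True
        break
if found:
    print(len(s) - 1 - i)  # convert the reversed position back to a zero-based index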

Examples for string find in Python

I am trying to find some examples but no luck. Does anyone know of some examples on the net? I would like to know what it returns when it can't find, and how to specify from start to end, which I guess is going to be 0, -1.
I'm not sure what you're looking for, do you mean find()?
>>> x = "Hello World"
>>> x.find('World')
6
>>> x.find('Aloha')
-1
you can use str.index too:
>>> 'sdfasdf'.index('cc')
Traceback (most recent call last):
  File "<pyshell#144>", line 1, in <module>
    'sdfasdf'.index('cc')
ValueError: substring not found
>>> 'sdfasdf'.index('df')
1
From the documentation:
str.find(sub[, start[, end]])
Return the lowest index in the string where substring sub is found within the slice s[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 if sub is not found.
So, some examples:
>>> my_str = 'abcdefioshgoihgs sijsiojs '
>>> my_str.find('a')
0
>>> my_str.find('g')
10
>>> my_str.find('s', 11)
15
>>> my_str.find('s', 15)
15
>>> my_str.find('s', 16)
17
>>> my_str.find('s', 11, 14)
-1
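Since start and end follow slice notation, negative values count from the end of the string, and end is exclusive. The short sketch below (variable names are only for illustration) shows why passing 0, -1 is not the same as searching the whole string.

```python
x = "Hello World"

whole = x.find('d')                 # search the entire string
no_last = x.find('d', 0, -1)        # end=-1 stops before the last character
explicit = x.find('d', 0, len(x))   # same range as the default
tail = x.find('o', -5)              # start=-5 searches only "World"

print(whole, no_last, explicit, tail)  # 10 -1 10 7
```

Note that the returned index is always relative to the full string, not to the searched slice.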
Honestly, this is the sort of situation where I just open up Python on the command line and start messing around:
>>> x = "Dana Larose is playing with find()"
>>> x.find("Dana")
0
>>> x.find("ana")
1
>>> x.find("La")
5
>>> x.find("La", 6)
-1
Python's interpreter makes this sort of experimentation easy. (Same goes for other languages with a similar interpreter)
If you want to search for the last instance of a string in a text, you can use rfind.
Example:
s="Hello"
print(s.rfind('l'))
output: 3
*no import needed
Complete syntax:
stringEx.rfind(substr, beg=0, end=len(stringEx))
find( sub[, start[, end]])
Return the lowest index in the string where substring sub is found, such that sub is contained in the range [start, end]. Optional arguments start and end are interpreted as in slice notation. Return -1 if sub is not found.
From the docs.
Try
myString = 'abcabc'
myString.find('a')
This will give you the index!!!
Try this:
with open(file_dmp_path, 'rb') as file:
fsize = bsize = os.path.getsize(file_dmp_path)
word_len = len(SEARCH_WORD)
while True:
p = file.read(bsize).find(SEARCH_WORD)
if p > -1:
pos_dec = file.tell() - (bsize - p)
file.seek(pos_dec + word_len)
bsize = fsize - file.tell()
if file.tell() < fsize:
seek = file.tell() - word_len + 1
file.seek(seek)
else:
break
If x is a string and you search for y, which is also a string, there are two cases:
case 1: y exists in x, so x.find(y) returns the index (the position) of y in x.
case 2: y does not exist in x, so x.find(y) returns -1, which means y was not found in x.
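A short sketch of the two cases, plus a common pitfall: because 0 is a valid position and -1 is truthy, the result of find() should be compared against -1 rather than used directly as a condition. When only a yes/no answer is needed, the in operator is simpler. The variable names below are only for illustration.

```python
x = "Hello World"
y = "Hello"

contains = y in x            # True: membership test, no index needed
position = x.find(y)         # 0: found at the very start

# Pitfall: 0 is falsy and -1 is truthy, so compare against -1 explicitly.
found_by_find = x.find(y) != -1
missing = x.find("Aloha") != -1

print(contains, position, found_by_find, missing)  # True 0 True False
```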
