How can I find a character with specific criteria? - python

I want to loop through a string and find any character that is not a letter, a digit, or one of _ . #. This is my code:
mystr = "saddas das"
for x in range(0, len(mystr)):
    if not(mystr[x].isdigit() or mystr[x].isalpha or mystr[x]=="#" or mystr[x]=="_" or mystr[x]=="."):
        print (x)
Unfortunately it doesn't detect anything, even though it should print the index of the space.

for x in range(0, len(mystr)):
    if not(mystr[x].isdigit() or mystr[x].isalpha() or mystr[x]=="#" or mystr[x]=="_" or mystr[x]=="."):
        print (x)
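You forgot the parentheses in mystr[x].isalpha. To call the method you need mystr[x].isalpha(). Without them, mystr[x].isalpha is the method object itself, which always evaluates to True, so the whole or-expression is True, not(...) is always False, and your code never prints anything.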
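To see why the missing parentheses matter, compare the method object with the result of calling it. This is a minimal sketch using a single space as the test character:

```python
ch = " "

# Without parentheses you get the bound method object itself, which is always truthy.
print(bool(ch.isalpha))  # True

# With parentheses the method runs and actually tests the character.
print(ch.isalpha())      # False
```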

Use enumerate(), which yields both the position and the character as you iterate:
mystr = "saddas das"
for pos,c in enumerate(mystr):
    # change your conditions to make it easier to understand, isalpha() helps
    if c.isdigit() or c.isalpha() or c in "#_.":
        continue # do nothing
    else:
        print (pos)
Output:
6
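If you want all the positions in a list rather than printed one by one, the same enumerate() idea fits in a list comprehension. This sketch uses a hypothetical helper name, find_special, and str.isalnum(), which for ASCII text behaves like isdigit() or isalpha() (for some Unicode numeric characters such as '½' it is slightly broader):

```python
def find_special(s, allowed="#_."):
    """Return the indices of characters that are neither alphanumeric nor in `allowed`."""
    return [pos for pos, c in enumerate(s) if not (c.isalnum() or c in allowed)]


print(find_special("saddas das"))   # [6]
print(find_special("a_b.c#d e-f"))  # [7, 9]
```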

Using a regex:
import re
pattern = re.compile(r'[^\d\w\.#]')
s = "saddas das"
for match in pattern.finditer(s):
print(match.start())
Output
6
The pattern [^\d\w\.#] matches every character that is not a digit, a letter, _, . or #. (\w already includes digits and _, so the \d is redundant.)
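One detail worth knowing: for str patterns, \w matches Unicode letters and digits by default, much like str.isalpha() and str.isdigit(). If only ASCII letters should count, the re.ASCII flag narrows \w to [a-zA-Z0-9_]. A small comparison on the hypothetical input "café!":

```python
import re

pattern = re.compile(r'[^\w.#]')                  # \w already covers digits and _
ascii_pattern = re.compile(r'[^\w.#]', re.ASCII)  # \w limited to [a-zA-Z0-9_]

s = "caf\u00e9!"  # "café!"
unicode_hits = [m.start() for m in pattern.finditer(s)]
ascii_hits = [m.start() for m in ascii_pattern.finditer(s)]
print(unicode_hits)  # [4]: only '!' is reported
print(ascii_hits)    # [3, 4]: 'é' is no longer treated as a letter
```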


Extract letters (and a specific number) from a string

I have a list of strings similar to the one below:
l = ['ad2g3f234','4jafg32','fg23g523']
For each string in l, I want to delete every digit (except for 2 and 3 if they appear as 23). So in this case, I want the following outcome:
n = ['adgf23','jafg','fg23g23']
How do I go about this? I tried re.findall like:
w = [re.findall(r'[a-zA-Z]+',t) for t in l]
but it doesn't give my desired outcome.
You can capture 23 in a group, and remove all other digits. In the replacement, use the group which holds 23 if it is there, else replace with an empty string.
import re
l = ['ad2g3f234', '4jafg32', 'fg23g523']
result = [
    re.sub(
        r"(23)|(?:(?!23)\d)+",
        lambda m: m.group(1) if m.group(1) else "", s) for s in l
]
print(result)
Output
['adgf23', 'jafg', 'fg23g23']
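One way is to protect each 23 with a placeholder, delete the remaining digits, then put the 23s back (this assumes the placeholder * never appears in the input):
[re.sub(r"\d", "", i.replace("23", "*")).replace("*", "23") for i in l]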
Output:
['adgf23', 'jafg', 'fg23g23']
Use a placeholder with re.sub:
import re
l = ['ad2g3f234','4jafg32','fg23g523']
w = [re.sub('#', '23', re.sub(r'\d', '', re.sub('23', '#', t))) for t in l]
print(w)
# ['adgf23', 'jafg', 'fg23g23']
EDIT
As answered by Chris, the approach is the same, although str.replace is a better alternative than re.sub for the fixed-string substitutions.
Using re.sub with a function:
import re
def replace(m):
    if m.group() == '23':
        return m.group()
    else:
        return ''
l = ['ad2g3f234','4jafg32','fg23g523']
w = [re.sub(r'23|\d', replace, x) for x in l]
#w: ['adgf23', 'jafg', 'fg23g23']
Explanation
re.sub(r'23|\d', replace, x)
- the alternation tries 23 first, then a single digit
- the replace function returns a 23 match unchanged
- any other digit match is replaced with the empty string
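Since the question started from re.findall, here is a sketch that stays with it: match either the pair 23 or any single non-digit (\D) and join what is found. Digits that are not part of a 23 match neither alternative, so they are simply skipped:

```python
import re

l = ['ad2g3f234', '4jafg32', 'fg23g523']

# "23" is tried first; otherwise \D keeps any single non-digit character.
result = [''.join(re.findall(r'23|\D', t)) for t in l]
print(result)  # ['adgf23', 'jafg', 'fg23g23']
```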

how to add a dot before each letter in a string in python

We get a string from the user and want to lowercase it, remove the vowels, and add a '.' before each remaining letter. For example, we get 'aBAcAba' and change it to '.b.c.b'. The first two steps are done, but I need some help with the third one.
str = input()
str=str.lower()
for i in range(0,len(str)):
    str=str.replace('a','')
    str=str.replace('e','')
    str=str.replace('o','')
    str=str.replace('i','')
    str=str.replace('u','')
print(str)
for j in range(0,len(str)):
    str=str.replace(str[j],('.'+str[j]))
print(str)
A few things:
- You should avoid the variable name str because it is the name of a built-in, so I've changed it to st.
- In the first part, no loop is necessary; replace will replace all occurrences of a substring.
- For the last part, it is probably easiest to loop through the string and build up a new string. Limiting this answer to basic syntax, a simple for loop will work.
st = input()
st=st.lower()
st=st.replace('a','')
st=st.replace('e','')
st=st.replace('o','')
st=st.replace('i','')
st=st.replace('u','')
print(st)
st_new = ''
for c in st:
    st_new += '.' + c
print(st_new)
Another potential improvement: for the second part, you can also write a loop (instead of your five separate replace lines):
for c in 'aeiou':
    st = st.replace(c, '')
Other possibilities using more advanced techniques:
For the second part, a regular expression could be used:
import re
st = re.sub('[aeiou]', '', st)
For the third part, a generator expression could be used:
st_new = ''.join(f'.{c}' for c in st)
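Another standard-library option for the vowel removal is str.translate() with a table built by str.maketrans(), whose third argument lists characters to delete. A sketch combining all three steps, with dot_consonants as a hypothetical name:

```python
def dot_consonants(text):
    """Lowercase text, delete the vowels a/e/i/o/u, and put '.' before each remaining character."""
    no_vowels = text.lower().translate(str.maketrans('', '', 'aeiou'))
    return ''.join('.' + c for c in no_vowels)


print(dot_consonants('aBAcAba'))  # .b.c.b
```

For an all-vowel input such as 'AEIOU', this returns an empty string.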
You can use str.join() to place a character in between all the existing characters, and then use string concatenation to add one more at the start:
# st = 'bcb'
st = '.' + '.'.join(st)
# '.b.c.b'
As a side note, please don't use str as a variable name. It's the name of the built-in string type, and if you assign to it you can no longer call str() (for example, str(42)) in that scope. string, st, s, etc. are fine, as they don't shadow the built-in str. (str is a built-in name, not a reserved keyword, which is why Python lets you overwrite it at all.)
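A minimal sketch of what goes wrong when str is reassigned. It uses a hypothetical function so the shadowing stays local and the built-in remains usable afterwards:

```python
def shadowed():
    str = "abc"     # this local name now hides the built-in str type
    return str(42)  # so this tries to call a string object


try:
    shadowed()
except TypeError as exc:
    message = str(exc)  # outside shadowed(), str is still the built-in
    print(message)      # 'str' object is not callable

print(str(42))          # 42
```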
z = "aBAcAba"
z = z.lower()
newstring = ''
for i in z:
    if i not in 'aeiou':
        newstring += '.'
        newstring += i
print(newstring)
Here I have gone step by step: first convert the string to lowercase, then for each character check that it is not a vowel, and if so add a dot followed by the character to the final string.
You could split the string into a list of characters and then build a new string by prepending a "." to each element.
It's not too efficient, but it will work.
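That suggestion could look like this, assuming the vowels have already been removed so st holds 'bcb':

```python
st = "bcb"
chars = list(st)                            # split the string into a list of characters
st_new = ''.join(['.' + c for c in chars])  # prepend '.' to every element, then rejoin
print(st_new)                               # .b.c.b
```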
Thanks to all of you, especially allani. The version that builds st_new with a for loop worked for me.
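This does all three steps at once (note that [^aeiou] also keeps any spaces, digits or punctuation, not just consonants):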
import re
data = 'KujhKyjiubBMNBHJGJhbvgqsauijuetystareFGcvb'
matches = re.compile('[^aeiou]', re.I).finditer(data)
final = f".{'.'.join([m.group().lower() for m in matches])}"
print(final)
#.k.j.h.k.y.j.b.b.m.n.b.h.j.g.j.h.b.v.g.q.s.j.t.y.s.t.r.f.g.c.v.b
s = input()
s = s.lower()
for i in s:
    for x in ['a','e','i','o','u']:
        if i == x:
            s = s.replace(i,'')
new_s = ''
for i in s:
    new_s += '.'+ i
print(new_s)
def add_dots(n):
    return ".".join(n)  # puts dots only between characters
print(add_dots("test"))  # t.e.s.t
def remove_dots(a):
    return a.replace(".", "")
print(remove_dots("t.e.s.t"))  # test

more efficient way to replace items on a list based on a condition

I have the following piece of code. Basically, I'm trying to replace a word if it matches one of these regex patterns. If the word matches even once, the word should be completely gone from the new list. The code below works, however, I'm wondering if there's a way to implement this so that I can indefinitely add more patterns to the 'pat' list without having to write additional if statements within the for loop.
To clarify, my regex patterns have negative lookaheads and lookbehinds to make sure it's one word.
pat = [r'(?<![a-z][ ])Pacific(?![ ])', r'(?<![a-z][ ])Global(?![ ])']
if isinstance(x, list):
    new = []
    for i in x:
        if re.search(pat[0], i):
            i = re.sub(pat[0], '', i)
        if re.search(pat[1], i):
            i = re.sub(pat[1], '', i)
        if len(i) > 0:
            new.append(i)
    x = new
else:
    x = x.strip()
Just add another for loop:
for patn in pat:
    if re.search(patn, i):
        i = re.sub(patn, '', i)
if i:
new.append(i)
pat = [r'(?<![a-z][ ])Pacific(?![ ])', r'(?<![a-z][ ])Global(?![ ])']
if isinstance(x, list):
    new = []
    for i in x:
        for p in pat:
            i = re.sub(p, '', i)
        if len(i) > 0:
            new.append(i)
    x = new
else:
    x = x.strip()
Add another loop:
pat = [r'(?<![a-z][ ])Pacific(?![ ])', r'(?<![a-z][ ])Global(?![ ])']
if isinstance(x, list):
    new = []
    for i in x:
        # iterate through pat list
        for regx in pat:
            if re.search(regx, i):
                i = re.sub(regx, '', i)
        ...
If the only thing that changes between your patterns is the word, you can join the words with | to make a single alternation, so the two patterns from your example become one:
r'(?<![a-z][ ])(?:Pacific|Global)(?![ ])'
If you need more words, just separate them with another pipe, for example (?:word1|word2|word3).
The ?: inside the parentheses makes it a non-capturing group.
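Building that alternation from a list means new words need no new code. A sketch that keeps the question's lookarounds, with re.escape() added as a precaution in case a word ever contains a regex metacharacter; the sample items list is hypothetical:

```python
import re

words = ["Pacific", "Global"]  # add more words here; no new if statements needed
# re.escape protects words that contain regex metacharacters such as '.' or '+'
pattern = re.compile(
    r'(?<![a-z][ ])(?:' + '|'.join(map(re.escape, words)) + r')(?![ ])'
)

items = ["Pacific", "Global", "Asia Pacific", "Global Fund", "Other"]
new = [s for s in (pattern.sub('', i) for i in items) if s]
print(new)  # ['Asia Pacific', 'Global Fund', 'Other']
```

"Asia Pacific" and "Global Fund" survive because the lookbehind and lookahead reject words that sit next to another word, exactly as in the original two patterns.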
Something like this, which keeps only the items that match none of the patterns (here l is your list):
[word for word in l if not any(re.search(p, word) for p in pat)]
I will attempt a guess here; if I am wrong, please skip to the "this is how I'd write it" and modify the code that I provide, according to what you intend to do (which I may have failed to understand).
I am assuming you are trying to eliminate the words "Global" and "Pacific" in a list of phrases that may contain them.
If that is the case, I think your regular expression does not do what you specify. You probably intended to have something like the following (which does not work as-is!):
pat = [r'(?<=[a-z][ ])Pacific(?=[ ])', r'(?<=[a-z][ ])Global(?=[ ])']
The difference is in the lookaround assertions, which are positive ((?=...) lookahead and (?<=...) lookbehind) instead of negative ((?!...) and (?<!...)).
Furthermore, writing your regular expressions like this will not always correctly eliminate white space between your words.
This is how I'd write it:
import re
words = ['Pacific', 'Global']
pat = "|".join(r'\b' + word + r'\b\s*' for word in words)
if isinstance(x, str):
    x = x.strip()  # I don't understand why you don't sub here, anyway!
else:
    x = [s for s in (re.sub(pat, '', s) for s in x) if s != '']
In the regular expression for patterns, notice (a) \b, standing for "the empty string, but only at the beginning or end of a word" (see the manual), (b) the use of | for separating alternative patterns, and (c) \s, standing for "characters considered whitespace". The latter is what takes care of correctly removing unnecessary space after each eliminated word.
This works correctly in both Python 2 and Python 3. I think the code is much clearer and, in terms of efficiency, it's best if you leave re to do its work instead of testing each pattern separately.
Given:
x = ["from Global a to Pacific b",
"Global Pacific",
"Pacific Global",
"none",
"only Global and that's it"]
this produces:
x = ['from a to b', 'none', "only and that's it"]

Search for a pattern in a string in python

Question: I am very new to python so please bear with me. This is a homework assignment that I need some help with.
So, for the matchPat function, I need to write a function that takes two arguments, str1 and str2, and returns a Boolean indicating whether str1 is in str2. I have to use an asterisk as a wildcard in str1: the * can only appear in str1, and it represents zero or more characters that should be ignored. Examples of matchPat are as follows:
matchPat ( 'a*t*r', 'anteaters' ) : True
matchPat ( 'a*t*r', 'albatross' ) : True
matchPat ( 'a*t*r', 'artist' ) : False
My current matchPat function can tell whether the characters of str1 are in str2, but I don't really know how to tell Python (using the * as a wildcard) to look for 'a' (the first letter) and, after it finds 'a', skip the next 0 or more characters until it finds the next letter (which would be 't' in the example), and so on.
def matchPat(str1,str2):
    ## str(*)==str(=>1)
    if str1=='':
        return True
    elif str2=='':
        return False
    elif str1[0]==str2[0]:
        return matchPat(str1[2],str2[len(str1)-1])
    else: return True
Python strings have the in operator; you can check if str1 is a substring of str2 using str1 in str2.
You can split a string into a list of substrings based on a token. "a*b*c".split("*") is ["a","b","c"].
You can find the offset of next occurrence of a substring in a string using the string's find method.
So the problem of wildcard matching becomes:
split the pattern into the parts that were separated by asterisks
for each part of the pattern:
    can we find it after the previous part's location?
You are going to have to cope with corner cases like patterns that start or end with an asterisk, or have two asterisks next to each other, and so on. Good luck!
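A quick look at those building blocks, including the empty strings that split() produces for the leading, trailing and doubled asterisks mentioned above:

```python
print("a*t*r".split("*"))        # ['a', 't', 'r']
print("*a**b*".split("*"))       # ['', 'a', '', 'b', '']
print("t" in "anteaters")        # True
print("anteaters".find("t", 3))  # 5: the search starts at index 3
print("anteaters".find("x"))     # -1: not found
```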
There is a find() method of strings that searches for a substring from a particular point, returning either its index (if found) or -1 if not found. The index() method is similar but raises an exception if the target string is not found.
I'd suggest that you first split the pattern string on "*". This will give you a list of chunks to look for. Set the starting position to zero, and for each element in the list of chunks, do a find() or index() from the current position.
If you find the current chunk then work out from its starting position and length where to start searching for the next chunk and update the starting position. If you find all the chunks then the target string matches the pattern. If any chunk is missing then the pattern search should fail.
Since this is homework I am hoping that gives you enough of an idea to move on.
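For reference, the split-and-find approach from the last two answers can be sketched as follows. It assumes, as the examples suggest, that * stands for zero or more characters and that the pattern may occur anywhere in str2 (not necessarily at its start); match_pat is a hypothetical name so it does not clash with your own matchPat:

```python
def match_pat(pattern, text):
    """Return True if pattern occurs in text, where * stands for zero or more characters."""
    start = 0
    for chunk in pattern.split("*"):
        if not chunk:              # leading, trailing or doubled * give empty chunks
            continue
        pos = text.find(chunk, start)
        if pos == -1:              # this chunk never appears after the previous one
            return False
        start = pos + len(chunk)   # the next chunk must begin after this one ends
    return True


print(match_pat('a*t*r', 'anteaters'))  # True
print(match_pat('a*t*r', 'albatross'))  # True
print(match_pat('a*t*r', 'artist'))     # False
```

Taking the earliest occurrence of each chunk is safe here: it leaves the most room for the chunks that follow, so no backtracking is needed.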
The basic idea here is to compare str1 and str2 character by character, and when the character in str1 is "*", search str2 for the character that follows the "*" in str1.
Assuming that you are not going to use any library functions (except find(), which can be implemented easily), this is the hard way; the code is straightforward but messy, and I've commented wherever possible:
def matchPat(str1, str2):
    index1 = 0
    index2 = 0
    while index1 < len(str1):
        c = str1[index1]
        #Check if str2 has run its course.
        if index2 >= len(str2):
            #This needs to be checked, assuming matchPat("*", "") to be true
            if(len(str2) == 0 and str1 == "*"):
                return True
            return False
        #If c is not "*", then it's a normal comparison.
        if c != "*":
            if c != str2[index2]:
                return False
            index2 += 1
        #If c is "*", then you need to increment index1,
        #search for the next character in str2,
        #and update index2
        else:
            index1 += 1
            if(index1 == len(str1)):
                return True
            c = str1[index1]
            #Search for the character in str2
            i = str2.find(c, index2)
            #If the search fails, return False
            if(i == -1):
                return False
            index2 = i + 1
        index1 += 1
    return True
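Output:
print(matchPat("abcde", "abcd"))
#False
print(matchPat("a", ""))
#False
print(matchPat("", "a"))
#True
print(matchPat("", ""))
#True
print(matchPat("abc", "abc"))
#True
print(matchPat("ab*cd", "abacacd"))
#False
print(matchPat("ab*cd", "abaascd"))
#True
print(matchPat('a*t*r', 'anteater'))
#True
print(matchPat('a*t*r', 'albatross'))
#True
print(matchPat('a*t*r', 'artist'))
#False
Note that after a * this version only searches for the next single character and never backtracks, so matchPat("ab*cd", "abacacd") returns False even though the text does consist of ab, then aca, then cd.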
Without giving you the complete answer, first, split the str1 string into a list of strings on the '*' character. I usually call str1 the "needle" and str2 the "haystack", since you are looking for the needle in the haystack.
needles = needle.split('*')
Next, have a counter (which I will call i) start at 0. You will always be looking at haystack[i:] for the next string in needles.
In pseudocode, it'll look like this:
needles = needle.split('*')
i = 0
loop through all strings in needles:
    if current needle not in haystack[i:], return false
    increment i to just after the occurrence of the current needle in haystack (use the find() string method or write your own function to handle this)
return true
Are you allowed to use regular expressions? If so, re.search already does this kind of matching. In a regular expression, .* means "zero or more of any character" (a single . matches exactly one character, which would wrongly reject 'albatross'):
import re
bool(re.search('a.*t.*r', 'anteaters'))  # True
bool(re.search('a.*t.*r', 'albatross'))  # True
bool(re.search('a.*t.*r', 'artist'))     # False
And if asterisks are a strict necessity, you can use regular expressions for that, too:
newstr = re.sub(r'\*', '.*', 'a*t*r')  # Replace each * with .*
bool(re.search(newstr, 'anteaters'))  # Search using the new string
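If the pattern may contain other characters that are special in regular expressions (such as . or +), it is safer to escape each literal piece with re.escape() before joining the pieces with .*. A sketch with hypothetical helper names:

```python
import re


def wildcard_to_regex(pattern):
    """Escape each literal chunk and join the chunks with .* (zero or more characters)."""
    return ".*".join(re.escape(chunk) for chunk in pattern.split("*"))


def match_pat_re(pattern, text):
    # re.DOTALL lets .* also run across newline characters
    return re.search(wildcard_to_regex(pattern), text, re.DOTALL) is not None


print(match_pat_re('a*t*r', 'albatross'))  # True
print(match_pat_re('1.5*x', '1.5 x'))      # True
print(match_pat_re('1.5*x', '125x'))       # False: the '.' is matched literally
```

The standard library's fnmatch module does a similar translation (fnmatch.translate()), but it matches the whole string and also gives ? and [...] special meaning.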
If regular expressions aren't allowed, a simple approach is to look at every substring of the second string that has the same length as the first string and compare the two character by character. Note that this treats each * as exactly one character, so it only suits patterns where the wildcard stands for a single character; with 'a*t*r' it returns False for both 'anteaters' and 'albatross'. Something like this:
def matchpat(str1, str2):
    if len(str1) > len(str2): return False  # can't match if the first string is longer
    for i in range(0, len(str2)-len(str1)+1):
        substring = str2[i:i+len(str1)]  # substring of the same length as str1
        matched = True  # assume a match until a character differs
        for j in range(0, len(str1)):
            if str1[j] != '*' and str1[j] != substring[j]:  # check each character
                matched = False
                break
        if matched: break  # no need to keep searching once a match is found
    return matched

For a string with lowercase then UPPERCASE, give the position where the UPPERCASE starts - Python

If I have a string
ex = 'aaatttgggatgaATG'
and I want to find the index where the lowercase ends,
so in this case it would be
indx_lower = 13
how would I get that value?
Would I have to do a for loop where I check the boolean for each element in the string?
Like this?
total_indx = range(0,len(ex))
for p,k in zip(ex,total_indx):
    if upper print k ?
Yeah, I don't know how I would do this...
next(i for i, j in enumerate(ex) if j.isupper())  # 13; subtract 1 if you want the last lowercase index (12)
The best way is not to use a for loop:
>>> import re
>>> print(re.search("[A-Z]", "aaatttgggatgaATG").start())
13
re.search() returns a match object, and you can ask where the match begins by calling its start() method. (If there is no match, re.search() returns None, so calling .start() on the result raises AttributeError.)
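Because re.search() returns None when nothing matches, a guard avoids that AttributeError for an all-lowercase string. A small sketch with a hypothetical helper that returns -1 in that case, plus the same idea with next() and a default value (note that [A-Z] is ASCII-only, while str.isupper() also accepts non-ASCII uppercase letters):

```python
import re


def first_upper_index(s):
    """Index of the first ASCII uppercase letter in s, or -1 if there is none."""
    m = re.search("[A-Z]", s)
    return m.start() if m else -1


ex = 'aaatttgggatgaATG'
print(first_upper_index(ex))     # 13
print(first_upper_index('abc'))  # -1

# Without re: next() accepts a default that is returned when the generator is exhausted.
print(next((i for i, c in enumerate(ex) if c.isupper()), -1))  # 13
```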
You can use re:
import re
ex = 'aaatttgggatgaATG'
print(ex.index(re.search('[A-Z]', ex).group()))
for x in range(0, len(ex)):
    if ex[x].isupper():
        print(x)
        break
>>> import re
>>> ex = 'aaatttgggatgaATG'
>>> re.search("[A-Z]", ex).start()
13
