Python / Regex: exclude everything except one thing - python

Suppose I have these strings:
a = "hello"
b = "-hello"
c = "-"
d = "hell-o"
e = " - "
How do I match only the "-" (string c)? I've tried `if "-" in something`, but obviously that isn't correct. Could someone please advise?
Let's say we put these strings into a list, looped through it, and all I wanted to extract was c. How would I do this?
for aa in list1:
    if not re.findall('[^-$]', aa):
        print(aa)
Would that be too messy?

If you want to match only variable c:
if '-' == something:
    print('hurray!')
To answer the updates: yes, that would be too messy. You don't need regex there. Simple string methods are faster:
>>> lst = ["hello", "-hello", "-", "hell-o", " - "]
>>> for i, item in enumerate(lst):
...     if item == '-':
...         print(i, item)
...
2 -

As a regex, it's "^-$".
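A minimal sketch of the regex route, for comparison with plain equality. One caveat: `$` also matches just before a trailing newline, so `^-$` accepts "-\n", whereas `re.fullmatch` does not. The list name `candidates` and the extra "-\n" entry are only illustrative and not part of the original question.

```python
import re

candidates = ["hello", "-hello", "-", "hell-o", " - ", "-\n"]

# Anchored search: '$' also matches just before a final newline.
anchored = [s for s in candidates if re.search(r'^-$', s)]

# re.fullmatch requires the whole string to be exactly '-'.
exact = [s for s in candidates if re.fullmatch(r'-', s)]

print(anchored)
print(exact)
```

If the strings can carry a trailing newline (for example, lines read from a file), `re.fullmatch` or a plain `==` comparison is the stricter check.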

If what you're trying to do is strip out the dash (i.e. he-llo gives hello), then this is more of a job for generator expressions.
''.join((char for char in 'he-llo' if char != '-'))
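For a single fixed character, `str.replace` is a simpler alternative to the generator expression. A small sketch showing that both give the same result (the variable names are illustrative):

```python
text = 'he-llo'

# Generator-expression version, as in the answer above.
via_join = ''.join(char for char in text if char != '-')

# Equivalent built-in string method.
via_replace = text.replace('-', '')

print(via_join, via_replace)
```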

if "-" in c and len(c) == 1:
    print("do something")
OR
if c == "-":
    print("do something")

Related

Remove nonalpha, edit, add nonalpha

I want to edit a string after removing the non-alphanumeric characters, then put those characters back in their proper places.
E.g.:
import re
string = input()
alphastring = re.sub('[^0-9a-zA-Z]+', '', string)
alphastring = alphastring[::2]
I want it to work as follows:
string = "heompuem ykojua'rje awzeklvl."
alphastring = heompuemykojuarjeawzeklvl
alphastring = hopeyourewell
?????? = hope you're well.
I tried fixing the problem with different solutions, but none gave me the right output. I resorted to using regex, which I'm not very familiar with.
Any help would be greatly welcomed.
I can't think of any reasonable way to do this with a regexp. Just use a loop that copies from input to output, skipping every other alphanumeric character.
copy_flag = True
string = input()
alphastring = ''
for c in string:
    if c.isalnum():
        # Copy every other alphanumeric character.
        if copy_flag:
            alphastring += c
        copy_flag = not copy_flag
    else:
        # Non-alphanumeric characters are always copied.
        alphastring += c
Here is an approach:
string = "heompuem ykojua'rje awzeklvl."
result, i = "", 0
for c, alph in ((c, c.isalnum()) for c in string):
    if not (alph and i % 2):  # skip odd alphanumerics
        result += c
    i += alph  # only increment the counter for alphanumerics
result
# "hope you're well."
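Both loops above can be wrapped in a reusable function. This is only a sketch under the same assumption as the answers (keep the 1st, 3rd, 5th... alphanumeric character and every non-alphanumeric one); the name `keep_every_other_alnum` is made up for illustration.

```python
def keep_every_other_alnum(text):
    """Drop every second alphanumeric character; keep everything else."""
    kept = []
    seen = 0  # number of alphanumeric characters encountered so far
    for ch in text:
        if ch.isalnum():
            if seen % 2 == 0:
                kept.append(ch)
            seen += 1
        else:
            kept.append(ch)
    return ''.join(kept)


print(keep_every_other_alnum("heompuem ykojua'rje awzeklvl."))
```

Collecting the characters in a list and joining once avoids building many intermediate strings on long inputs.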

Find multiple elements in string in Python

My problem is that I need to find multiple elements in one string.
For example I got one string that looks like this:
line = if ((var.equals("INPUT")) || (var.equals("OUTPUT"))
And then I have this code to find everything between ' (" ' and ' ") ':
char1 = '("'
char2 = '")'
add = line[line.find(char1)+2 : line.find(char2)]
list.append(add)
The current result is just:
['INPUT']
but I need the result to look like this:
['INPUT','OUTPUT', ...]
After it gets the first match it stops searching for other matches, but I need to find everything in that string that matches this search.
I also need to append every single match to the list.
The simplest:
>>> import re
>>> s = """line = if ((var.equals("INPUT")) || (var.equals("OUTPUT"))"""
>>> r = re.compile(r'\("(.*?)"\)')
>>> r.findall(s)
['INPUT', 'OUTPUT']
The trick is to use .*? which is a non-greedy *.
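To see why the non-greedy form matters, here is a small comparison of `.*` and `.*?` on the same input. The variable names are illustrative.

```python
import re

s = 'if ((var.equals("INPUT")) || (var.equals("OUTPUT"))'

# Greedy: .* runs to the LAST '")' in the string, giving one long match.
greedy = re.findall(r'\("(.*)"\)', s)

# Non-greedy: .*? stops at the FIRST '")' after each '("'.
lazy = re.findall(r'\("(.*?)"\)', s)

print(greedy)
print(lazy)
```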
You should look into regular expressions because that's a perfect fit for what you're trying to achieve.
Let's examine a regular expression that does what you want:
import re
regex = re.compile(r'\("([^"]+)"\)')
It matches the string (" then captures anything that isn't a quotation mark and then matches ") at the end.
By using it with findall you will get all the captured groups:
In [1]: import re
In [2]: regex = re.compile(r'\("([^"]+)"\)')
In [3]: line = 'if ((var.equals("INPUT")) || (var.equals("OUTPUT"))'
In [4]: regex.findall(line)
Out[4]: ['INPUT', 'OUTPUT']
If you don't want to use regex, this will help you.
line = 'if ((var.equals("INPUT")) || (var.equals("OUTPUT"))'
char1 = '("'
char2 = '")'
add = line[line.find(char1)+2 : line.find(char2)]
list.append(add)
line1=line[line.find(char2)+1:]
add = line1[line1.find(char1)+2 : line1.find(char2)]
list.append(add)
print(list)
Just add those three lines to your code and you're done.
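The find-and-slice idea can also be put in a loop so that it collects any number of matches rather than a fixed count. A hedged sketch: the function name `find_between` and its parameters are hypothetical, and it relies on the optional start argument of `str.find`.

```python
def find_between(line, start_marker='("', end_marker='")'):
    """Return every substring found between start_marker and end_marker."""
    results = []
    pos = 0
    while True:
        start = line.find(start_marker, pos)
        if start == -1:
            break  # no further opening marker
        start += len(start_marker)
        end = line.find(end_marker, start)
        if end == -1:
            break  # opening marker without a closing one
        results.append(line[start:end])
        pos = end + len(end_marker)  # continue after this match
    return results


line = 'if ((var.equals("INPUT")) || (var.equals("OUTPUT"))'
print(find_between(line))
```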
If I understand you correctly, something like this should help:
line = 'line = if ((var.equals("INPUT")) || (var.equals("OUTPUT"))'
items = []
start = 0
end = 0
c = 0
while c < len(line) - 1:  # stop one short so line[c + 1] is always valid
    if line[c] == '(' and line[c + 1] == '"':
        start = c + 2  # first character after ("
    if line[c] == '"' and line[c + 1] == ')':
        end = c  # position of the closing quote
    if start and end:
        items.append(line[start:end])
        start = end = None
    c += 1
print(items) # ['INPUT', 'OUTPUT']

Pattern search by NOT using Regex algorithm and code in python

Today I had an interview at AMD and was asked a question that I didn't know how to solve without regex. Here is the question:
Find all occurrences of the word "Hello" in a text, where at most ONE extra character may appear between the letters of hello, e.g. search for all instances of "h.ello", "hell o", "he,llo", or "hel!lo".
Since you also tagged this question algorithm, I'm just going to show the general approach that I would take when looking at this question, without including any language tricks from python.
1) I would want to split the string into a list of words
2) Loop through each string in the resulting list, checking if the string matches 'hello' without the character at the current index (or if it simply matches 'hello')
3) If a match is found, return it.
Here is a simple approach in python:
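s = "h.ello hello h!ello hell.o none of these"
words = s.split()  # renamed from `all` to avoid shadowing the built-in
def drop_one(s, match):
    if s == match:
        return True # WARNING: Early Return
    # Try removing one character at a time (the last character is never removed).
    for i in range(len(s) - 1):
        if s[:i] + s[i+1:] == match:
            return True
    return False
matches = [x for x in words if drop_one(x, "hello")]
print(matches)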
The output of this snippet:
['h.ello', 'hello', 'h!ello', 'hell.o']
This should work. I've tried to make it generic. You might have to make some minor adjustments. Let me know if you don't understand any part.
def checkValidity(tlist):
    # Keep only the characters that occur in "hello" and compare the result.
    tmpVar = ''
    for i in range(len(tlist)):
        if tlist[i] in set("hello"):
            tmpVar += tlist[i]
    return(tmpVar == 'hello')

mStr = "he.llo hehellbo hellox hell.o hello helloxy abhell.oyz"
mWord = "hello"
mlen = len(mStr)
wordLen = len(mWord)+1  # window: the word plus one extra character
i=0
print ("given str = ", mStr)
while i<mlen:
    if mStr[i] == 'h':
        # Slice rather than index so an 'h' near the end cannot raise IndexError.
        tmpList = list(mStr[i:i+wordLen])
        validFlag = checkValidity(tmpList)
        if validFlag:
            print("Match starting at index: ",i, ':', mStr[i:i+wordLen])
            i += wordLen
        else:
            i += 1
    else:
        i += 1
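Splitting on whitespace cannot find "hell o", because the space is itself the extra character. The sketch below scans the text position by position instead and accepts either an exact match or a window of len(word) + 1 characters from which one interior character can be removed. It reads the question as "at most one extra character in total", which is an assumption; the name `find_loose` is hypothetical.

```python
def find_loose(text, word):
    """Return (index, substring) pairs where `word` occurs exactly or with
    one extra character inserted between two of its letters."""
    hits = []
    n = len(word)
    i = 0
    while i < len(text):
        exact = text[i:i + n]
        loose = text[i:i + n + 1]
        if exact == word:
            hits.append((i, exact))
            i += n
            continue
        # Remove each interior character of the longer window in turn.
        if len(loose) == n + 1 and any(
            loose[:k] + loose[k + 1:] == word for k in range(1, n)
        ):
            hits.append((i, loose))
            i += n + 1
            continue
        i += 1
    return hits


print(find_loose("h.ello hell o xhello", "hello"))
```

The comparison is case-sensitive; lowercasing both arguments first would be one way to relax that.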

Most Pythonic and efficient way to insert character at end of string if not already there

I have a string:
b = 'week'
I want to check if the last character is an "s". If not, append an "s".
Is there a Pythonic one-liner for this one?
You could use a conditional expression:
b = b + 's' if not b.endswith('s') else b
Personally, I'd still stick with two lines, however:
if not b.endswith('s'):
    b += 's'
def pluralize(string):
    if string:
        if string[-1] != 's':
            string += 's'
    return string
Shortest way possible:
b = b.rstrip('s') + 's'
But I would write it like this:
b = ''.join((b.rstrip('s'), 's'))
b = b + 's' if b[-1:] != 's' else b
I know this is an old post but in one line you could write:
b = '%ss' % b.rstrip('s')
Example
>>> string = 'week'
>>> b = '%ss' % string.rstrip('s')
>>> b
'weeks'
Another solution:
def add_s_if_not_already_there(string):
    return string + 's' * (1 - string.endswith('s'))
I would still stick with the two liner but I like how 'arithmetic' this feels.
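The `rstrip`-based one-liners are not equivalent to the `endswith` versions: `rstrip('s')` removes every trailing "s", so a word ending in "ss" loses a letter. A short sketch comparing the two (the function names are illustrative):

```python
def append_s_endswith(word):
    # Append 's' only when the word does not already end with one.
    return word if word.endswith('s') else word + 's'


def append_s_rstrip(word):
    # Strips ALL trailing 's' characters before adding a single one back.
    return word.rstrip('s') + 's'


for word in ('week', 'weeks', 'class', ''):
    print(repr(word), repr(append_s_endswith(word)), repr(append_s_rstrip(word)))
```

For 'week' and 'weeks' both give 'weeks', but for 'class' the `rstrip` version returns 'clas'.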

Python Regex use during specific string search in a text file

I have to find an expression in a text file like : StartTime="4/11/2013 8:11:20:965" and EndTime="4/11/2013 8:11:22:571"
So I used the regex expression
r'(\w)="(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2}:\d{2,3})"'
Thanks again to eumiro for his help earlier (Retrieve randomly preformatted text from Text File)
But it doesn't find anything in my file, even though I checked that the text is there.
In fact, I can't get past 'GetDuration lvl 1' with it.
I tried simplifying my regex to r'(\d)', and it got as far as lvl 4, so I thought it could be an issue with the " possibly needing to be escaped, but I didn't see anything about this in the Python docs.
What am I missing?
Regular_Exp = r'(\w)="(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2}:\d{2,3})"'
def getDuration(timeCode1, timeCode2):
    duration = 0
    c = ''
    print 'GetDuration lvl 0'
    for c in str(timeCode1):
        m = re.search(Regular_Exp, c)
        print 'GetDuration lvl 1'
        if m:
            print 'GetDuration lvl 2'
            for text in str(timeCode2):
                print 'GetDuration lvl 3'
                n = re.search(Regular_Exp, c)
                if n:
                    print 'GetDuration lvl 4'
                    timeCode1Split = timeCode1.split(' ')
                    timeCode1Date = timeCode1Split[0].split('/')
                    timeCode1Heure = timeCode1Split[1].split(':')
                    timeCode2Split = timeCode2.split(' ')
                    timeCode2Date = timeCode2Split[0].split('/')
                    timeCode2Heure = timeCode2Split[1].split(':')
                    timeCode1Date = dt.datetime(timeCode1Date[0], timeCode1Date[1], timeCode1Date[2], timeCode1Heure[0], timeCode1Heure[0], timeCode1Heure[0], tzinfo=utc)
                    timeCode2Date = dt.datetime(timeCode2Date[0], timeCode2Date[1], timeCode2Date[2], timeCode2Heure[0], timeCode2Heure[0], timeCode2Heure[0], tzinfo=utc)
                    print 'TimeCode'
                    print timeCode1Date
                    print timeCode2Date
                    duration += timeCode1Date - timeCode2Date
    return duration
for c in str(timeCode1) :
    m = re.search(Regular_Exp, c)
    ...
for x in str(something) iterates over something one character at a time (each x is a string of length 1), so a pattern that needs a whole timestamp can never match.
Maybe this expression will help:
r'(\w+?)="(.+?)"'
To use:
>>> string = u'StartTime="4/11/2013 8:11:20:965" and EndTime="4/11/2013 8:11:22:571"'
>>> regex = re.compile(r'(\w+?)="(.+?)"')
# Run findall
>>> regex.findall(string)
[(u'StartTime', u'4/11/2013 8:11:20:965'), (u'EndTime', u'4/11/2013 8:11:22:571')]
Also, with for c in str(timeCode1), try printing c: you are going through the string one character at a time, which is not a good idea with regex.
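Putting the two answers together, a hedged sketch of the whole task: run the regex once on the full line, then parse both timestamps with `datetime.strptime` and subtract. The field order in the date is an assumption (day/month/year here; swap `%d` and `%m` if the file uses month/day), and the names `PATTERN`, `TIME_FORMAT`, and `get_duration` are illustrative rather than taken from the question.

```python
import re
from datetime import datetime

PATTERN = re.compile(r'(\w+)="([^"]+)"')
# Assumption: day/month/year, then hours:minutes:seconds:milliseconds.
TIME_FORMAT = "%d/%m/%Y %H:%M:%S:%f"


def get_duration(text):
    """Return EndTime - StartTime for one line of the log as a timedelta."""
    # findall on the WHOLE string yields (name, value) pairs.
    fields = dict(PATTERN.findall(text))
    start = datetime.strptime(fields["StartTime"], TIME_FORMAT)
    end = datetime.strptime(fields["EndTime"], TIME_FORMAT)
    return end - start


line = 'StartTime="4/11/2013 8:11:20:965" and EndTime="4/11/2013 8:11:22:571"'
duration = get_duration(line)
print(duration.total_seconds())
```

`%f` accepts one to six digits and pads on the right, so "965" is read as 965 milliseconds.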
