Python: replace an exact matching substring with variable

I have a list of strings like 'cdbbdbda', 'fgfghjkbd', 'cdbbd' etc. I also have a variable fed from another list of strings. What I need is to replace a substring in the first list's strings, say b by z, only if it is preceded by a substring from the variable list, leaving all other occurrences untouched.
What I have:
a = ['cdbbdbda', 'fgfghjkbd', 'cdbbd']
c = ['d', 'f', 'l']
What I do:
for i in a:
    for j in c:
        if j+'b' in i:
            i = re.sub('b', 'z', i)
What I need:
'cdzbdzda'
'fgfghjkbd'
'cdzbd'
What I get:
'cdzzdzda'
'fgfghjkbd'
'cdzzd'
All instances of 'b' are replaced.
I'm new to this, so any help is very welcome. Looking for an answer on Stack Overflow, I found many solutions using regex word boundaries, or re.sub or str.replace with a count, but I can't use them because the length of the string and the number of occurrences of 'b' can vary.

I think if you include j in the find and replace, you'll get what you want.
>>> for i in a:
...     for j in c:
...         i = re.sub(j+'b', j+'z', i)
...     print i
...
cdzbdzda
fgfghjkbd
cdzbd
>>>
I added print i because your loop doesn't make in-place changes, so without that output, it's not possible to see what replacements were made.

You should simply use regular expressions with a positive lookbehind assertion.
Like this:
import re
for i in a:
    for j in c:
        i = re.sub('(?<=' + j + ')b', 'z', i)
The base case is:
re.sub('(?<=d)b', 'z', 'cdbbdbda')
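As a rough illustration of why the lookbehind matters (Python 3 syntax; the sample string 'abbb' is made up for this sketch and does not come from the question): a lookbehind checks the preceding text without consuming it, so it can behave differently from replacing prefix + 'b' when the prefix and the target overlap.

```python
import re

# Base case from the answer: only a 'b' directly after 'd' is replaced.
base = re.sub(r'(?<=d)b', 'z', 'cdbbdbda')

# Hypothetical overlapping case: the prefix and the target are both 'b'.
# The lookbehind inspects the original text and consumes only the target...
with_lookbehind = re.sub(r'(?<=b)b', 'z', 'abbb')
# ...whereas matching prefix + target consumes both characters per match.
with_prefix_match = re.sub(r'bb', 'bz', 'abbb')

print(base, with_lookbehind, with_prefix_match)
```

For the data in the question both approaches give the same result; the difference only shows up in overlapping cases like the made-up one above.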

You can use a list comprehension:
import re
a = ['cdbbdbda', 'fgfghjkbd', 'cdbbd']
c = ['d', 'f', 'l']
new_a = [re.sub('|'.join('(?<={})b'.format(i) for i in c), 'z', b) for b in a]
Output:
['cdzbdzda', 'fgfghjkbd', 'cdzbd']
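If the prefixes may contain regex metacharacters or differ in length, one possible variation is to escape them and compile the pattern once. This is only a sketch in Python 3 syntax; the function name replace_after and its parameters are invented for illustration and do not appear in the answers above.

```python
import re

a = ['cdbbdbda', 'fgfghjkbd', 'cdbbd']
c = ['d', 'f', 'l']

def replace_after(strings, prefixes, old='b', new='z'):
    """Replace `old` with `new` only where it directly follows one of `prefixes`."""
    if not prefixes:
        # Nothing can precede `old`, so nothing is replaced.
        return list(strings)
    # One fixed-width lookbehind per prefix, so prefixes may differ in length.
    lookbehinds = '|'.join('(?<={})'.format(re.escape(p)) for p in prefixes)
    pattern = re.compile('(?:{}){}'.format(lookbehinds, re.escape(old)))
    # A function replacement keeps any backslashes in `new` literal.
    return [pattern.sub(lambda m: new, s) for s in strings]

new_a = replace_after(a, c)
print(new_a)
```

Escaping with re.escape means a prefix such as '.' is treated as a literal dot rather than as "any character".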

Related

Python: How to sort the letters in a string alphabetically keeping distinction between uppercases and lowercases

I am trying to order the letters of a string in a particular way. In my code below the output is "MNWdeorwy", but I would like it to be "deMNorWwy" (so I need to keep the letters in alphabetical order regardless of whether they are upper or lowercase).
Could you please help me to understand where I am wrong and why? Thank you
wrd = "MyNewWord"
def order_word(s):
    if s == "":
        return "Invalid String!"
    else:
        c = sorted(s)
        d = ''.join(sorted(c))
        return d
print order_word(wrd)
I would like to clarify that my question is different from the following one: How to sort the letters in a string alphabetically in Python. The answers given there do not consider the difference between upper and lowercase letters in a string.
sorted() sorts based off of the ordinal of each character. Capital letters have ordinals that are lower than all lowercase letters. If you want different behavior, you'll need to define your own key:
c = sorted(s, key=lambda c: (c.lower(), c.islower()))
That way, c is sorted by ('c', True) and C is sorted by ('c', False). Both come before ('d', ...) or ('e', ...) etc., but the capital C is earlier (lower) than the lowercase c, because False sorts before True.
By the way, you shouldn't say d = "".join(sorted(c)) because c has already been sorted. Just do d = "".join(c)
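Putting the two remarks above together, the function from the question could be written roughly as follows. This is a sketch in Python 3 syntax, so print is called as a function, unlike in the question's code.

```python
def order_word(s):
    """Sort letters alphabetically, ignoring case, with uppercase first on ties."""
    if not s:
        return "Invalid String!"
    # Primary key: the lowercased letter; secondary key: False (upper) before True (lower).
    return ''.join(sorted(s, key=lambda ch: (ch.lower(), ch.islower())))

print(order_word("MyNewWord"))  # deMNorWwy
```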
If I understand your requirements correctly, you want to sort a string without changing the case of its letters, as if all the letters had the same case. This can be achieved, e.g.:
In [44]: a = 'zWea'
In [45]: sorted(a,key=lambda c:c.upper())
Out[45]: ['a', 'e', 'W', 'z']
In [46]:
That works because each character is transformed only momentarily, to compute its sort key; the characters in the result keep their original case.
Forgot to mention: you can mix non-alphabetical characters into your string, but a few characters sit between the uppercase and lowercase letters in the character table (e.g., the ^ caret), so where they end up depends on whether you use the .lower() or the .upper() string method:
In [56]: sorted('abCD^',key=lambda c:c.lower())
Out[56]: ['^', 'a', 'b', 'C', 'D']
In [57]: sorted('abCD^',key=lambda c:c.upper())
Out[57]: ['a', 'b', 'C', 'D', '^']
In [58]:
You can also try it like this:
import re
def natural_sort(wrd):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    final = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ]
    return ''.join(sorted(wrd, key = final))
Output:
>>> natural_sort(wrd)
'deMNorwWy'
OR
You can use a third-party library for this, available on PyPI, called natsort:
https://pypi.python.org/pypi/natsort

Remove duplicates but retain sequence

I'm trying to reduce a string with duplicates; however, I do not want to create a set. For example:
mystring = 'TTTTTPPPTPTTTTPPPPPPPPP'
The sequence of the letters is 'TPTPTP', so I need a resulting string of
newstring = 'TPTPTP'
I'm sure there is an easy one-liner, but it's evading me.
You're looking for itertools.groupby.
>>> import itertools
>>> mystring = 'TTTTTPPPTPTTTTPPPPPPPPP'
>>> groups = [x for x, y in itertools.groupby(mystring)]
>>> groups
['T', 'P', 'T', 'P', 'T', 'P']
>>> ''.join(groups)
'TPTPTP'
See the official documentation for itertools.groupby.
Zip each character with the one before it and keep those that are different:
>>> a
'TTTTTPPPTPTTTTPPPPPPPPP'
>>> ''.join(i for i, j in zip(a, '\0' + a) if i != j)
'TPTPTP'
You can also use regular expressions if you feel like it.
>>> import re
>>> mystring = 'TTTTTPPPTPTTTTPPPPPPPPP'
>>> ''.join(re.findall(r'(.)\1*', mystring))
'TPTPTP'
That looks for any character, followed by the same found character zero or more times.
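For reuse, the groupby approach could be wrapped in a small function. This is a minimal sketch; the name collapse_runs is made up here and is not part of itertools.

```python
from itertools import groupby

def collapse_runs(text):
    """Keep one character from each run of consecutive identical characters."""
    # groupby yields (key, group) pairs for each run; only the key is needed.
    return ''.join(key for key, _run in groupby(text))

print(collapse_runs('TTTTTPPPTPTTTTPPPPPPPPP'))  # TPTPTP
```

Unlike a set, this keeps the order of the runs and lets the same letter appear again later in the result.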

Split a string in Python having parenthesis (multiple splitters)

I have a string, for example:
"ab(abcds)kadf(sd)k(afsd)(lbne)"
I want to split it to a list such that the list is stored like this:
a
b
abcds
k
a
d
f
sd
k
afsd
lbne
I need each character outside the parentheses in its own row, and the contents of each pair of parentheses together in a row of their own.
I am not able to think of any solution to this problem.
You can use iter to make an iterator and use itertools.takewhile to extract the strings between the parens:
from itertools import takewhile
s = "ab(abcds)kadf(sd)k(afsd)(lbne)"
it = iter(s)
print([ch if ch != "(" else "".join(takewhile(lambda x: x!= ")",it)) for ch in it])
['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
If ch is not equal to ( we just take the char; otherwise we use takewhile, which keeps taking chars from the same iterator until it hits a ).
Or use re.findall to capture every string enclosed in parentheses with \((.+?)\) and every other single character with (\w):
print([''.join(tup) for tup in re.findall(r'\((.+?)\)|(\w)', s)])
['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
You just need to use the magic of 're.split' and some logic.
import re
string = "ab(abcds)kadf(sd)k(afsd)(lbne)"
temp = []
x = re.split(r'[(]',string)
#x = ['ab', 'abcds)kadf', 'sd)k', 'afsd)', 'lbne)']
for i in x:
    if ')' not in i:
        temp.extend(list(i))
    else:
        t = re.split(r'[)]',i)
        temp.append(t[0])
        temp.extend(list(t[1]))
print temp
#temp = ['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
Note the difference between append and extend: append adds its argument as a single element, while extend adds each item of an iterable separately.
I hope this helps.
You have two options. The really easy one is to just iterate over the string. For example:
my_list = []
in_parens = False
buffer = ''
for char in my_string:
    if char == '(':
        in_parens = True
    elif char == ')':
        in_parens = False
        my_list.append(buffer)
        buffer = ''
    elif in_parens:
        buffer += char
    else:
        my_list.append(char)
The other option is regex.
I would suggest regex. It is worth practicing.
Try: Python re. If you are new to re it may take a bit of time but you can do all kind of string manipulations once you get it.
import re
search_string = 'ab(abcds)kadf(sd)k(afsd)(lbne)'
re_pattern = re.compile(r'(\w)|\((\w*)\)') # Match a single character, or the characters inside parentheses
print [x if x else y for x,y in re_pattern.findall(search_string)]
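The regex answers above could be combined into one reusable function along these lines. This is a sketch in Python 3 syntax, assuming the parentheses are never nested; the names TOKEN and split_parens are invented for illustration.

```python
import re

# Either a parenthesised group (captured without its parentheses)
# or a single word character outside any parentheses.
TOKEN = re.compile(r'\(([^()]*)\)|(\w)')

def split_parens(text):
    """Return single characters, with each parenthesised group kept whole."""
    return [
        m.group(1) if m.group(1) is not None else m.group(2)
        for m in TOKEN.finditer(text)
    ]

print(split_parens("ab(abcds)kadf(sd)k(afsd)(lbne)"))
```

Testing group(1) against None rather than for truthiness means an empty pair of parentheses still yields an empty string instead of being confused with a non-match.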

match the pattern at the end of a string?

Imagine I have the following strings:
['a','b','c_L1', 'c_L2', 'c_L3', 'd', 'e', 'e_L1', 'e_L2']
Where the "c" string has important sub-categories (L1, L2, L3). These indicate special data for our purposes that have been generated in a program based on a pre-designated string "L". In other words, I know that the special entries should have the form:
name_Lnumber
Knowing that I'm looking for this pattern, and that I am using "L" or more specifically "_L" as my designation of these objects, how could I return a list of entries that meet this condition? In this case:
['c', 'e']
Use a simple filter:
>>> l = ['a','b','c_L1', 'c_L2', 'c_L3', 'd', 'e', 'e_L1', 'e_L2']
>>> filter(lambda x: "_L" in x, l)
['c_L1', 'c_L2', 'c_L3', 'e_L1', 'e_L2']
Alternatively, use a list comprehension
>>> [s for s in l if "_L" in s]
['c_L1', 'c_L2', 'c_L3', 'e_L1', 'e_L2']
Since you need the prefix only, you can just split it:
>>> set(s.split("_")[0] for s in l if "_L" in s)
set(['c', 'e'])
You can use the following generator expression inside set():
>>> set(i.split('_')[0] for i in l if '_L' in i)
set(['c', 'e'])
Or, if you want to match only the elements that end with _L followed by a digit, and not something like _Lm, you can use a regex:
>>> import re
>>> set(i.split('_')[0] for i in l if re.match(r'.*?_L\d$',i))
set(['c', 'e'])
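A variation on the regex idea, sketched in Python 3 syntax: capture the name directly in the pattern, allow more than one digit after _L, and keep the names in first-seen order instead of building a set. The names SUFFIX and names_with_levels are hypothetical, and allowing several digits is an assumption about the data.

```python
import re

# Name, then "_L", then one or more digits at the very end of the string.
SUFFIX = re.compile(r'(.+)_L\d+$')

def names_with_levels(entries):
    """Return the distinct names that have at least one name_Lnumber entry."""
    seen = []
    for entry in entries:
        m = SUFFIX.match(entry)
        if m and m.group(1) not in seen:
            seen.append(m.group(1))
    return seen

l = ['a', 'b', 'c_L1', 'c_L2', 'c_L3', 'd', 'e', 'e_L1', 'e_L2']
print(names_with_levels(l))  # ['c', 'e']
```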

Python: Adding a word from a stringlist to another stringlist

I'm trying to add whole words from one list of strings to another if they contain a certain character:
mylist = ["hahah", "hen","cool", "breaker", "when"]
newlist = []
for word in mylist:
    store = word #stores current string
    if 'h' in word: #splits string into characters and searches for 'h'
        newlist += store #adds whole string to list
print newlist
the result I expect is:
newlist = ["hahah","hen","when"]
but instead I'm getting:
newlist = ['h', 'a', 'h', 'a', 'h', 'h', 'e', 'n', 'w', 'h', 'e', 'n']
How do I get my expected result?
Use append:
newlist.append(store)
Or shorter (using a list comprehension):
newlist = [word for word in mylist if 'h' in word]
Why does newlist += store not work?
newlist += store behaves like newlist.extend(store): it extends the existing list (on the left side) by all the items of the sequence on the right side. Strictly speaking it is not the same as newlist = newlist + store, which raises a TypeError because a list can only be concatenated with another list.
The documentation for mutable sequences describes s += t as extending s with the contents of t.
In Python, not only lists are sequences, but strings are too (a sequence of characters). That means every item of the sequence (→ every character) is appended to the list.
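The difference can be seen side by side in a short experiment. This is a sketch in Python 3 syntax; the variable names are made up for illustration.

```python
mylist = ["hahah", "hen", "cool", "breaker", "when"]

extended = []
extended += "hen"          # same effect as extended.extend("hen")

appended = []
appended.append("hen")     # adds the whole string as one element

try:
    [] + "hen"             # plain + refuses to mix list and str
    plus_error = None
except TypeError as exc:
    plus_error = type(exc).__name__

filtered = [word for word in mylist if "h" in word]

print(extended)    # ['h', 'e', 'n']
print(appended)    # ['hen']
print(plus_error)  # TypeError
print(filtered)    # ['hahah', 'hen', 'when']
```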
Out of interest I decided to see which of the three solutions (the loop, the list comprehension and the filter() function) was the quickest. My test code and the results are below for anybody else who is interested.
Initialisation
>>> import timeit
>>> num_runs = 100000
>>> setup_statement = 'mylist = ["hahah", "hen","cool", "breaker", "when"]'
Loop
>>> loop_statement = """
newlist = []
for word in mylist:
    if 'h' in word:
        newlist.append(word)"""
>>> timeit.timeit(loop_statement, setup_statement, number=num_runs) / num_runs
4.3187308311462406e-06
List comprehension
>>> list_statement = "newlist = [word for word in mylist if 'h' in word]"
>>> timeit.timeit(list_statement, setup_statement, number=num_runs) / num_runs
2.9228806495666502e-06
Filter call
>>> filter_statement = """
filt = lambda x: "h" in x
newlist = filter(filt, mylist)"""
>>> timeit.timeit(filter_statement, setup_statement, number=num_runs) / num_runs
7.2317290306091313e-06
Results
List comprehension at 2.92us
Loop at 4.32us (48% slower than the list comprehension)
Filter call at 7.23us (148% slower than the list comprehension)
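Anyone repeating this comparison on Python 3 should note that filter() returns a lazy iterator there, so the filter variant needs a list() call to do the same amount of work as the other two. A possible harness is sketched below; the function names and the average_seconds helper are invented for this sketch, and the numbers it produces depend on the machine and interpreter.

```python
import timeit

mylist = ["hahah", "hen", "cool", "breaker", "when"]

def with_loop(words):
    result = []
    for word in words:
        if "h" in word:
            result.append(word)
    return result

def with_comprehension(words):
    return [word for word in words if "h" in word]

def with_filter(words):
    # list() forces evaluation; in Python 3 filter() on its own is lazy.
    return list(filter(lambda w: "h" in w, words))

def average_seconds(func, runs=10000):
    # Hypothetical helper: mean wall-clock time per call over `runs` calls.
    return timeit.timeit(lambda: func(mylist), number=runs) / runs

timings = {
    func.__name__: average_seconds(func)
    for func in (with_loop, with_comprehension, with_filter)
}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print("{}: {:.2f}us".format(name, seconds * 1e6))
```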
Another alternate syntax for expressing this is to use filter. Thus, an implementation for your problem would look something like
filt = lambda x: 'h' in x
newlist1 = filter(filt, mylist)
Try using:
newlist.append(store)
