I have a string, for example:
"ab(abcds)kadf(sd)k(afsd)(lbne)"
I want to split it to a list such that the list is stored like this:
a
b
abcds
k
a
d
f
sd
k
afsd
lbne
I need each character outside the parentheses as a separate element, and each parenthesized group as a single element.
I can't think of a solution to this problem.
You can use iter to make an iterator and use itertools.takewhile to extract the strings between the parens:
from itertools import takewhile
s = "ab(abcds)kadf(sd)k(afsd)(lbne)"
it = iter(s)
print([ch if ch != "(" else "".join(takewhile(lambda x: x != ")", it)) for ch in it])
['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
If ch is not equal to ( we just take the char; if ch is a ( we use takewhile, which keeps taking chars from the same iterator until we hit a ). The closing paren itself is consumed and dropped.
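The same idea written as an explicit loop may make the shared iterator easier to see. This is only a sketch; the function name split_parens is made up for illustration, and it assumes the parentheses are balanced and not nested:

```python
from itertools import takewhile

def split_parens(s):
    """Split s into single characters, keeping each (...) group as one item."""
    it = iter(s)
    out = []
    for ch in it:
        if ch == "(":
            # takewhile pulls from the same iterator, so the outer loop
            # resumes after the closing paren
            out.append("".join(takewhile(lambda c: c != ")", it)))
        else:
            out.append(ch)
    return out

print(split_parens("ab(abcds)kadf(sd)k(afsd)(lbne)"))
```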
Or, using re.findall, capture every string enclosed in parentheses with \((.+?)\) and every other character with (\w):
import re
print([''.join(tup) for tup in re.findall(r'\((.+?)\)|(\w)', s)])
['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
You just need to use the magic of 're.split' and some logic.
import re
string = "ab(abcds)kadf(sd)k(afsd)(lbne)"
temp = []
x = re.split(r'[(]',string)
#x = ['ab', 'abcds)kadf', 'sd)k', 'afsd)', 'lbne)']
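for i in x:
    if ')' not in i:
        # no closing paren: every character is a separate element
        temp.extend(list(i))
    else:
        t = re.split(r'[)]', i)
        temp.append(t[0])        # text inside the parentheses, kept whole
        temp.extend(list(t[1]))  # characters after the closing paren
print(temp)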
#temp = ['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
Note the difference between append and extend: append adds its argument as a single element, while extend adds each item of an iterable separately.
I hope this helps.
You have two options. The really easy one is to just iterate over the string. For example:
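my_string = "ab(abcds)kadf(sd)k(afsd)(lbne)"
my_list = []
in_parens = False
buffer = ''
for char in my_string:
    if char == '(':
        in_parens = True
    elif char == ')':
        # end of a group: store the collected text and reset the buffer
        in_parens = False
        my_list.append(buffer)
        buffer = ''
    elif in_parens:
        buffer += char
    else:
        my_list.append(char)
print(my_list)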
The other option is regex.
I would suggest regex. It is worth practicing.
Try Python's re module. If you are new to re, it may take a bit of time, but you can do all kinds of string manipulation once you get it.
import re
search_string = 'ab(abcds)kadf(sd)k(afsd)(lbne)'
re_pattern = re.compile(r'(\w)|\((\w*)\)')  # Match a single character or the characters inside parentheses
print([x if x else y for x, y in re_pattern.findall(search_string)])
I have the following string:
bar = 'F9B2Z1F8B30Z4'
I have a function foo that splits the string on F, then adds back the F delimiter.
def foo(my_str):
    res = ['F' + elem for elem in my_str.split('F') if elem != '']
    return res
This works unless there are two "F"s back-to-back in the string. For example,
foo('FF9B2Z1F8B30Z4')
returns
['F9B2Z1', 'F8B30Z4']
(the double "F" at the start of the string is not processed)
I'd like the function to split on the first "F" and add it to the list, as follows:
['F', 'F9B2Z1', 'F8B30Z4']
If there is a double "F" in the middle of the string, then the desired behavior would be:
foo('F9B2Z1FF8B30Z4')
['F9B2Z1', 'F', 'F8B30Z4']
Any help would be greatly appreciated.
Drop the filtering if and use slicing instead, because the empty string is a problem only at the beginning:
def foo(my_str):
    res = ['F' + elem for elem in my_str.split('F')]
    return res[1:] if my_str and my_str[0] == 'F' else res
Output:
>>> foo('FF9B2Z1F8B30Z4')
['F', 'F9B2Z1', 'F8B30Z4']
>>> foo('FF9B2Z1FF8B30Z4FF')
['F', 'F9B2Z1', 'F', 'F8B30Z4', 'F', 'F']
>>> foo('9B2Z1F8B30Z4')
['F9B2Z1', 'F8B30Z4']
>>> foo('')
['F']
Using regex it can be done with
import re
pattern = r'^[^F]+|(?<=F)[^F]*'
The ^[^F]+ matches the leading run of non-F characters when the string does not start with F.
(?<=F)[^F]* matches whatever follows each F up to the next F, including the empty string.
>>> print(['F' + x for x in re.findall(pattern, 'abcFFFAFF')])
['Fabc', 'F', 'F', 'FA', 'F', 'F']
>>> print(['F' + x for x in re.findall(pattern, 'FFabcFA')])
['F', 'Fabc', 'FA']
>>> print(['F' + x for x in re.findall(pattern, 'abc')])
['Fabc']
Note that this returns nothing for empty strings. If empty strings need to return ['F'] then pattern can be changed to pattern = r'^[^F]+|(?<=F)[^F]*|^$' adding ^$ to capture empty strings.
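As a quick check of that variant, here is a minimal sketch that wraps the extended pattern in a function. The name split_keep_f is hypothetical and not part of the original answer:

```python
import re

# Pattern from the answer, extended with ^$ so an empty string gives one empty match
PATTERN = re.compile(r'^[^F]+|(?<=F)[^F]*|^$')

def split_keep_f(text):
    """Return the pieces of text, each prefixed with 'F'."""
    return ['F' + piece for piece in PATTERN.findall(text)]

print(split_keep_f(''))         # ['F']
print(split_keep_f('FFabcFA'))  # ['F', 'Fabc', 'FA']
```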
I need a regex that splits an input string into a list according to the following rules:
1) Split on dots;
2) Do not split an expression if it is in quotes.
Examples:
'a.b.c' -> ['a', 'b', 'c'];
'a."b.c".d' -> ['a', 'b.c', 'd'];
'a.'b.c'.d' -> ['a', 'b.c', 'd'];
'a.'b c'.d' -> ['a', 'b c', 'd'];
You could leverage the newer regex module with the following expression:
(["']).*?\1(*SKIP)(*FAIL)|\.
This matches a quoted span (from an opening quote to the next matching quote) and forces that part to fail, so it is skipped. The other alternative matches the dot.
In Python:
import regex as re
data = """
a.b.c
a."b.c".d
a.'b.c'.d
a.'b c'.d
"""
rx = re.compile(r"""(["']).*?\1(*SKIP)(*FAIL)|\.""")
for line in data.split("\n"):
    if line:
        parts = [part.strip("'").strip('"') for part in rx.split(line) if part]
        print(parts)
Which yields
['a', 'b', 'c']
['a', 'b.c', 'd']
['a', 'b.c', 'd']
['a', 'b c', 'd']
See a demo on regex101.com.
If you want to stick with the re module, you could first replace each dot that lies outside quotes with a placeholder and then split on that placeholder.
import re
data = """
a.b.c
a."b.c".d
a.'b.c'.d
a.'b c'.d
"""
rx = re.compile(r"""(["']).*?\1|(?P<dot>\.)""")
needle = "SUPERMAN"
def replacer(match):
    if match.group('dot') is not None:
        return needle
    else:
        return match.group(0)
for line in data.split("\n"):
    if line:
        line = rx.sub(replacer, line)
        parts = [part.strip("'").strip('"') for part in line.split(needle) if part]
        print(parts)
This yields the exact same output as above. Please note that neither approach handles escaped quotes.
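If escaped quotes matter, one possible alternative to a regex is a small character-by-character scanner. The sketch below is not from the answer above; the function name split_outside_quotes is made up, and treating a backslash as the escape character is an assumption about the input format:

```python
def split_outside_quotes(text, sep="."):
    """Split text on sep, ignoring separators inside single or double quotes."""
    parts, buf = [], []
    quote = None      # the quote character we are currently inside, if any
    escaped = False   # True right after a backslash (assumed escape character)
    for ch in text:
        if escaped:
            buf.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif quote:
            if ch == quote:
                quote = None      # closing quote: drop it
            else:
                buf.append(ch)
        elif ch in "'\"":
            quote = ch            # opening quote: drop it and remember its kind
        elif ch == sep:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts

print(split_outside_quotes('a."b.c".d'))
print(split_outside_quotes("a.'b c'.d"))
```

Unlike the regex versions, this keeps empty fields (for example, 'a..b' gives three parts), which may or may not be what you want.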
You can do it with a little extra effort. Here is how.
First split on '.' and then apply some logic to the pieces.
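string_data = 'a."b.c".d'
data = string_data.split('.')
result = []  # avoid shadowing the built-in list
skip_next = False
for i in range(len(data)):
    if skip_next:
        # this piece was already merged into the previous quoted value
        skip_next = False
    elif '"' in data[i] and i + 1 < len(data):
        # rejoin a quoted value that was split on its inner dot
        result.append((data[i] + '.' + data[i + 1]).strip('"'))
        skip_next = True
    else:
        result.append(data[i])
print(result)
For this input it prints ['a', 'b.c', 'd'], matching the example in your question. Note that it handles only double quotes with a single dot inside them.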
As an alternative, you could use alternation (|) with a positive lookbehind (?<=...) and a positive lookahead (?=...) for the single and double quotes:
(?<=").*?(?=")|(?<=').*?(?=')|[a-z]+
import re
regex = r"(?<=\").*?(?=\")|(?<=').*?(?=')|[a-z]+"
line = "a.\"b.t\".qq.d.d.'d'.'d.g.r'.d.d"
print(re.findall(regex, line))
['a', 'b.t', 'qq', 'd', 'd', 'd', '.', 'd.g.r', 'd', 'd']
here is a regex for you:
\.?([^\"\'\.]+)|\"(.+)\"|\'(.+)\'\.?
implementation:
import re
regex = re.compile( r"""\.?([^\"\'\.]+)|\"(.+)\"|\'(.+)\'\.?""")
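def str2list(string):
    # findall returns one tuple per match; only one group in each tuple is non-empty
    matches = regex.findall(string)
    result = []
    for groups in matches:
        for group in groups:
            if group:
                result.append(group)
    return result
print(str2list('a.b.c'))
print(str2list('a."b.c".d'))
print(str2list("a.'b.c'.d"))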
output:
['a', 'b', 'c']
['a', 'b.c', 'd']
['a', 'b.c', 'd']
Let's say there's a list:
vowels = ['a', 'e', 'i', 'o', 'u']
x = raw_input("Enter something?")
Now, how do I find instances of these vowels in x? I want to modify x so that it contains only non-vowel letters.
.find won't work.
vowels = {'a', 'e', 'i', 'o', 'u'}
x =input('Enter...')
new_string = ''.join(c for c in x if c not in vowels)
This creates a new copy of x minus the vowels, saved as new_string. I have changed vowels to a set so that lookup is faster (somewhat trivial in this example, but it's a good habit to use where appropriate). Strings are immutable, so you can't just take the letters out of x; you have to create a new string that is a copy of x without the characters you don't need. .join() puts the whole thing back together.
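Another way to build the new string, shown here only as a sketch, is str.translate with a deletion table from str.maketrans. The function name remove_vowels is made up, and also deleting uppercase vowels is an assumption that goes beyond the original answer:

```python
VOWELS = "aeiou"

# The third argument of str.maketrans lists characters to delete
_DELETE_VOWELS = str.maketrans("", "", VOWELS + VOWELS.upper())

def remove_vowels(text):
    """Return a copy of text with all vowels removed."""
    return text.translate(_DELETE_VOWELS)

print(remove_vowels("Enter something"))  # ntr smthng
```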
You can use the count method for each letter. For example, x.count('a') counts how many times 'a' occurs in the word. Then iterate over all the vowels and use sum to find the total number of vowels.
vowelCount = sum(x.count(vowel) for vowel in vowels)
from collections import Counter
vowels = {'a', 'e', 'i', 'o', 'u'}
s = "foobar"
print(sum(v for k,v in Counter(s).items() if k in vowels))
3
Or use dict.get with a default value of 0:
s = "foobar"
c = Counter(s)
print(sum(c.get(k,0) for k in vowels))
3
You can do it like this:
>>> import re
>>> st = 'test test'
>>> len(re.findall('[aeiou]', st, re.IGNORECASE))
2
Or,
>>> vowels = ['a', 'e', 'i', 'o', 'u']
>>> sum(map(lambda x: vowels.count(x) if x in vowels else 0, st))
2
Or,
>>> len([ ch for ch in st if ch in vowels])
2
I have a string:
s = 'bubble'
How can I use a regular expression to get a list like this?
['b', 'u', 'bb', 'l', 'e']
I want to capture both single and double occurrences of a letter.
This should do it:
import re
[m.group(0) for m in re.finditer('(.)\\1*',s)]
For 'bubbles' this returns:
['b', 'u', 'bb', 'l', 'e', 's']
For 'bubblesssss' this returns:
['b', 'u', 'bb', 'l', 'e', 'sssss']
You really have two questions. The first question is how to split the list, the second is how to filter.
The splitting takes advantage of backreferences in a pattern. In this case we'll construct a pattern that finds one or two occurrences of a letter, then build a list from the search results. The \1 in the code block refers to the first parenthesized expression.
import re
pattern = re.compile(r'(.)\1?')
s = "bubble"
result = [x.group() for x in pattern.finditer(s)]
print(result)
To filter the list stored in result you could use a list comprehension that filters on length.
filtered_result = [x for x in result if len(x) == 2]
print(filtered_result)
You could just get the set of duplications directly by tweaking the regular expression.
pattern2 = re.compile(r'(.)\1')
result2 = [x.group() for x in pattern2.finditer(s)]
print(result2)
The output from running the above is:
['b', 'u', 'bb', 'l', 'e']
['bb']
['bb']
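For comparison, runs of repeated characters can also be collected without a regex by using itertools.groupby. This is a different technique from the answers above and only a sketch; note that, like (.)\1*, it groups runs of any length rather than at most two:

```python
from itertools import groupby

s = "bubble"

# groupby yields (key, group) pairs for each run of consecutive equal characters
runs = [''.join(group) for _, group in groupby(s)]
print(runs)  # ['b', 'u', 'bb', 'l', 'e']

# keep only the runs where a letter is repeated
repeated = [run for run in runs if len(run) > 1]
print(repeated)  # ['bb']
```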
I am generating all possible three-letter keywords, e.g. aaa, aab, aac, ..., zzy, zzz. Below is my code:
alphabets = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
keywords = []
for alpha1 in alphabets:
    for alpha2 in alphabets:
        for alpha3 in alphabets:
            keywords.append(alpha1+alpha2+alpha3)
Can this functionality be achieved in a more sleek and efficient way?
import itertools
keywords = itertools.product(alphabets, repeat=3)
See the documentation for itertools.product. This gives an iterator of tuples; if you need a list of strings, just use
keywords = [''.join(i) for i in itertools.product(alphabets, repeat=3)]
alphabets also doesn't need to be a list, it can just be a string, for example:
from itertools import product
from string import ascii_lowercase
keywords = [''.join(i) for i in product(ascii_lowercase, repeat = 3)]
will work if you just want the lowercase ascii letters.
You could also use map instead of the list comprehension (this is one of the cases where map can still be faster than the LC):
>>> from itertools import product
>>> from string import ascii_lowercase
>>> keywords = map(''.join, product(ascii_lowercase, repeat=3))
This variation of the list comprehension is also faster than using ''.join
>>> keywords = [a+b+c for a,b,c in product(ascii_lowercase, repeat=3)]
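One thing to keep in mind with the map version: in Python 3, map returns a lazy iterator rather than a list, so the keywords are produced on demand and can be consumed only once. A small sketch of what that looks like:

```python
from itertools import islice, product
from string import ascii_lowercase

# map is lazy in Python 3: nothing is generated until we iterate
keywords = map(''.join, product(ascii_lowercase, repeat=3))

first_three = list(islice(keywords, 3))
print(first_three)  # ['aaa', 'aab', 'aac']

# the iterator continues where it left off; 26**3 - 3 keywords remain
remaining = sum(1 for _ in keywords)
print(remaining)  # 17573
```

Wrap the call in list(...) if you need to index the result or iterate over it more than once.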
from itertools import combinations_with_replacement
alphabets = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
# Note: this yields only non-decreasing triples such as 'aab', never 'aba'
for (a, b, c) in combinations_with_replacement(alphabets, 3):
    print(a + b + c)
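To make the difference from itertools.product concrete, the following sketch counts the results of both functions. combinations_with_replacement treats the order of letters as irrelevant, so it returns fewer strings than the nested loops in the question:

```python
from itertools import combinations_with_replacement, product
from string import ascii_lowercase

combos = [''.join(t) for t in combinations_with_replacement(ascii_lowercase, 3)]
words = [''.join(t) for t in product(ascii_lowercase, repeat=3)]

print(len(combos), len(words))          # 3276 17576
print('aba' in combos, 'aba' in words)  # False True
```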
You can also do this without any external modules by doing simple calculation.
The PermutationIterator is what you are searching for.
def permutation_atindex(_int, _set, length):
    """
    Return the permutation at index '_int' for the sequence '_set'
    with length 'length'.
    """
    items = []
    strLength = len(_set)
    index = _int % strLength
    items.append(_set[index])
    for n in range(1, length):
        # move on to the next "digit" of the index in base strLength
        _int //= strLength
        index = _int % strLength
        items.append(_set[index])
    return items
class PermutationIterator:
    """
    A class that can iterate over possible permutations
    of the given 'iterable' and 'length' argument.
    """
    def __init__(self, iterable, length):
        self.length = length
        self.current = 0
        self.max = len(iterable) ** length
        self.iterable = iterable
    def __iter__(self):
        return self
    def __next__(self):
        if self.current >= self.max:
            raise StopIteration
        try:
            return permutation_atindex(self.current, self.iterable, self.length)
        finally:
            self.current += 1
Give it an iterable object and an integer as the output-length.
from string import ascii_lowercase
for e in PermutationIterator(ascii_lowercase, 3):
    print("".join(e))
This will start from 'aaa' and end with 'zzz'.
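The index calculation amounts to writing the index as a number in base len(alphabet), least significant digit first. A compact sketch of that idea using divmod follows; the function name word_at is hypothetical and not part of the answer's code:

```python
from string import ascii_lowercase

def word_at(index, alphabet, length):
    """Return the word at position index, reading index as base-len(alphabet) digits."""
    letters = []
    for _ in range(length):
        # peel off the lowest digit and keep the rest for the next position
        index, digit = divmod(index, len(alphabet))
        letters.append(alphabet[digit])
    return "".join(letters)

print(word_at(0, ascii_lowercase, 3))            # aaa
print(word_at(1, ascii_lowercase, 3))            # baa
print(word_at(26 ** 3 - 1, ascii_lowercase, 3))  # zzz
```

Because the lowest digit comes first, the second word is 'baa' rather than 'aab'; reverse the letters if you want the usual dictionary order.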
chars = range(ord('a'), ord('z') + 1)
print([chr(a) + chr(b) + chr(c) for a in chars for b in chars for c in chars])
We could solve this without the itertools by utilizing two function definitions:
def combos(alphas, k):
    l = len(alphas)
    KRecur(alphas, "", l, k)
def KRecur(alphas, prfx, l, k):
    if k == 0:
        # the prefix has reached full length: print it
        print(prfx)
    else:
        for i in range(l):
            newPrfx = prfx + alphas[i]
            KRecur(alphas, newPrfx, l, k-1)
Two functions are used so that the length of alphas is computed only once. The second function calls itself recursively until k reaches 0, at which point it prints the completed k-mer for that branch.
Adopted from a solution by Abhinav Ramana on Geeks4Geeks
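If you would rather collect the k-mers than print them, the same recursion can be written as a generator. This is a sketch under that assumption; the name k_mers is made up for illustration:

```python
def k_mers(alphas, k, prefix=""):
    """Yield every string of length k over alphas, built one letter at a time."""
    if k == 0:
        yield prefix
        return
    for ch in alphas:
        # extend the prefix by one letter and recurse for the remaining positions
        yield from k_mers(alphas, k - 1, prefix + ch)

print(list(k_mers("ab", 2)))  # ['aa', 'ab', 'ba', 'bb']
```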
Well, I came up with this solution while thinking about how to cover that topic:
import random
s = "aei"
b = []
length = len(s)
for _ in range(10):
    for _ in range(length):
        password = "".join(random.sample(s, length))
        if password not in b:
            b.append(password)
print(b)
print(len(b))
Let me describe what is going on inside:
We import random,
create a string with the letters that we want to use,
and create an empty list to hold our combinations.
We then loop with range (I used 10; because the sampling is random, too few rounds may miss some combinations).
Next, random.sample with the string and its length produces a random ordering of the letters, which we join into a string.
Then we check whether list b already holds that combination: if so, it is not added again; if not, we append it (we compare the final joined string).
The last step prints list b with all the combinations found and the number of them.
Maybe it is not the clearest or most efficient code, but I think it works...
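Since random.sample draws each letter at most once, the strings collected here are orderings of the letters in s. If the goal is to list all of them, a deterministic sketch with itertools.permutations (a different technique from the random approach above) avoids relying on chance:

```python
from itertools import permutations

s = "aei"

# every ordering of the letters, each letter used exactly once
orderings = [''.join(p) for p in permutations(s)]
print(orderings)       # ['aei', 'aie', 'eai', 'eia', 'iae', 'iea']
print(len(orderings))  # 6
```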
print([a+b+c for a in alphabets for b in alphabets for c in alphabets if a !=b and b!=c and c!= a])
This excludes any string in which a character is repeated.