How to find characters not in parentheses - python

I'm trying to find all occurrences of certain characters in
string1 = '%(example_1).40s-%(example-2)_-%(example3)s_'
so that the output contains every '-' and '_' that is not inside parentheses:
['-', '_', '-', '_']
I don't need to handle nested parentheses.

You can use the re module to do that by passing it a regex:
import re
s = '%(example_1).40s-%(example-2)_-%(example3)s_'
# remove every pair of parentheses together with whatever is inside them
tmp_str = re.sub(r'\([^)]+\)', '', s)
# drop every character except the ones you are looking for
tmp_str = re.sub(r'[^_-]', '', tmp_str)
# and turn the remaining string into a list
result_list = list(tmp_str)
Result
['-', '_', '-', '_']
As Bharath shetty mentioned in a comment, don't use str as a variable name: it isn't a reserved word, but it is the name of Python's built-in string type, and assigning to it hides that type.

The following gives you the desired output:
>>> import re
>>> s = '%(example_1).40s-%(example-2)_-%(example3)s_'
>>> print(list("".join(re.findall(r"[-_]+(?![^(]*\))", s))))
['-', '_', '-', '_']
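This finds every run of '-' and/or '_' that is not inside parentheses: the negative lookahead (?![^(]*\)) rejects a run when a ')' follows it with no '(' in between. Because findall returns runs such as '_-' rather than single characters, the results are joined into one string and then split into individual characters with list().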
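For comparison, here is a minimal sketch that avoids regexes entirely: it scans the string once and keeps a flag saying whether it is currently inside parentheses. The function name chars_outside_parens and its wanted parameter are hypothetical, and the code assumes balanced, non-nested parentheses, which is all the question requires.

```python
def chars_outside_parens(text, wanted="-_"):
    """Collect characters from `wanted` that are not inside (...).

    Assumes balanced, non-nested parentheses, as in the question.
    """
    result = []
    inside = False  # True while we are between '(' and ')'
    for ch in text:
        if ch == "(":
            inside = True
        elif ch == ")":
            inside = False
        elif not inside and ch in wanted:
            result.append(ch)
    return result


s = '%(example_1).40s-%(example-2)_-%(example3)s_'
print(chars_outside_parens(s))  # ['-', '_', '-', '_']
```

If nesting ever matters, the boolean flag can be replaced by a depth counter that is incremented on '(' and decremented on ')'.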

Related

How do I find the first element of intersection of a string from a list?

For example, I want to get "," printed given the following string and list because it's the first character of the string that appears in my list of characters.
my_list = [',', '.', ';', ':']
my_string = "Hello world, I am a programmer."
The whole intersection is ',' and '.', with ',' appearing first, so ',' is what I want to print.
I've tried the following code, but is there a shorter way to do it?
my_list = [',', '.', ';', ':']
my_string = "Hello world, I am a programmer."
my_set = set(my_string).intersection(my_list)
my_list2 = [my_string.find(i) for i in my_set]
my_list2.sort()
num1 = my_list2[0]
print(my_string[num1])
From what I understand, you want the first character of the string that is also one of the characters you specify. If that's the case, you could do something like this:
my_chars = [',', '.', ';', ':']
my_string = "Hello world, I am a programmer."
for char in my_string:
    if char in my_chars:
        print(char)
        break
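The loop above prints the character; if you need the value afterwards, the same idea can be wrapped in a function that returns it. This is a sketch: first_in is a hypothetical name, and returning None when nothing matches is an assumption (the next() answer below uses '' instead).

```python
def first_in(text, chars):
    """Return the first character of `text` that is in `chars`, or None."""
    wanted = set(chars)  # set lookups are constant time on average
    for ch in text:
        if ch in wanted:
            return ch  # stop at the first hit, like the break above
    return None  # no character of `text` appears in `chars`


my_chars = [',', '.', ';', ':']
print(first_in("Hello world, I am a programmer.", my_chars))  # ,
print(first_in("Hello world", my_chars))  # None
```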
You can use next:
my_set = {',', '.', ';', ':'}
my_string = "Hello world, I am a programmer."
output = next((x for x in my_string if x in my_set), '')
print(output) # ,
If there are no common characters, it returns '' (an empty string).

Partitioning a string with multiple delimiters

I know partition() exists, but it only takes one separator; I'm trying to partition around several different values.
For example, say I want to partition around the symbols in a string:
input: "function():"
output: ["function", "(", ")", ":"]
I can't seem to find an efficient way to handle a variable number of separators.
You can use re.findall with an alternation pattern that matches either a word or a non-space character:
re.findall(r'\w+|\S', s)
so that given s = 'function():', this returns:
['function', '(', ')', ':']
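A small sketch of the same call wrapped in a function (tokenize is a hypothetical name). Note that, unlike partition(), this pattern discards whitespace, because neither \w+ nor \S matches a space.

```python
import re


def tokenize(s):
    # \w+ takes a whole run of word characters; \S takes any single other
    # non-space character. Alternatives are tried left to right, so words
    # are matched as a whole before falling back to one character.
    return re.findall(r'\w+|\S', s)


print(tokenize('function():'))  # ['function', '(', ')', ':']
print(tokenize('x = foo(1, 2)'))  # ['x', '=', 'foo', '(', '1', ',', '2', ')']
```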
You could use re.split on \W with a capturing group (...) to keep the delimiters, then drop the empty and whitespace-only parts.
>>> import re
>>> s = "function(): return foo + 3"
>>> [t for t in re.split(r"(\W)", s) if t.strip()]
['function', '(', ')', ':', 'return', 'foo', '+', '3']
Note that this splits at every single special character. If you want to keep certain groups of special characters together, e.g. == or <=, list those alternatives first, separated by |, ahead of \W.
>>> s = "function(): return foo + 3 == 42"
>>> [t for t in re.split(r"(\W)", s) if t.strip()]
['function', '(', ')', ':', 'return', 'foo', '+', '3', '=', '=', '42']
>>> [t for t in re.split(r"(==|!=|<=|\W)", s) if t.strip()]
['function', '(', ')', ':', 'return', 'foo', '+', '3', '==', '42']
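The "test those first with |" advice can be automated by building the alternation from a list of operators. This is a sketch under assumptions: OPERATORS, make_splitter and split_code are hypothetical names, and the operator list is only an example. re.escape is used so that characters like '*' are matched literally.

```python
import re

# Hypothetical set of multi-character operators to keep as single tokens.
OPERATORS = ['==', '!=', '<=', '>=', '**']


def make_splitter(operators):
    # Longest first, so a longer operator is tried before any shorter one
    # that is a prefix of it (e.g. '**=' before '**' if both were listed).
    alternatives = sorted(operators, key=len, reverse=True)
    # re.escape neutralises characters such as '*' that are special in a regex.
    pattern = '(' + '|'.join(re.escape(op) for op in alternatives) + r'|\W)'
    regex = re.compile(pattern)

    def split_code(s):
        # The capturing group keeps the delimiters in the result;
        # empty and whitespace-only pieces are then dropped.
        return [t for t in regex.split(s) if t.strip()]

    return split_code


split_code = make_splitter(OPERATORS)
print(split_code("function(): return foo + 3 == 42"))
# ['function', '(', ')', ':', 'return', 'foo', '+', '3', '==', '42']
print(split_code("x**2 >= y"))
# ['x', '**', '2', '>=', 'y']
```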

[] followed by () in regex altering the meaning of [] in python

My regex call is re.findall("[2]*(.)", "b = 2 + a*10")
Its output: ['b', ' ', '=', ' ', ' ', '+', ' ', 'a', '*', '1', '0']
But from the expression, I would infer that it matches zero or more 2s followed by any character, which should give all characters including the 2! Yet there is no 2 in the output. It actually omits the characters inside [], which I confirmed by replacing 2 with other characters, but I can't understand why this happens. Why does [] followed by () omit the characters inside []?
Read the docs for re.findall:
If one or more groups are present in the pattern, return a
list of groups; this will be a list of tuples if the pattern
has more than one group.
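So when you include (.) in your pattern, re.findall returns only the contents of that group. The 2 is still consumed by [2]* as part of the overall match, but because it lies outside the group it never appears in the result.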
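A small demonstration (Python 3) of how the group changes what findall reports. It uses a non-capturing group (?:...), which makes findall return the whole match, and re.finditer, whose match objects expose both the whole match (group(0)) and the captured part (group(1)).

```python
import re

text = "b = 2 + a*10"

# With a capturing group, findall returns only what (.) captured.
print(re.findall(r"[2]*(.)", text))
# ['b', ' ', '=', ' ', ' ', '+', ' ', 'a', '*', '1', '0']

# A non-capturing group (?:...) makes findall return the whole match instead.
print(re.findall(r"[2]*(?:.)", text))
# ['b', ' ', '=', ' ', '2 ', '+', ' ', 'a', '*', '1', '0']

# finditer keeps both: group(0) is the whole match, group(1) the captured part.
pairs = [(m.group(0), m.group(1)) for m in re.finditer(r"[2]*(.)", text)]
print(pairs[4])  # ('2 ', ' ')
```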

Split a string in Python on spaces, punctuation marks, Unicode characters, etc.

I want to split a string like this:
string = '[[he (∇((comesΦf→chem,'
based on spaces and punctuation marks, and also on Unicode characters. In other words, I expect output like this:
out= ['[', '[', 'he',' ', '(','∇' , '(', '(', 'comes','Φ', 'f','→', 'chem',',']
I am using
re.findall(r"[\w\s\]+|[^\w\s]",String,re.unicode)
for this case, but it returned the following output:
output=['[', '[', 'he',' ', '(', '\xe2', '\x88', '\x87', '(', '(', 'comes\xce', '\xa6', 'f\xe2', '\x86', '\x92', 'chem',',']
How can I solve this problem?
Without using regexes, and assuming words only contain ASCII characters:
from string import ascii_letters
from itertools import groupby
LETTERS = frozenset(ascii_letters)
def is_alpha(char):
    return char in LETTERS

def split_string(text):
    for key, tokens in groupby(text, key=is_alpha):
        if key:  # found letters: join them and yield a word
            yield ''.join(tokens)
        else:  # not letters: yield the single tokens one by one
            yield from tokens
Example result:
In [2]: list(split_string('[[he (∇((comesΦf→chem,'))
Out[2]: ['[', '[', 'he', ' ', '(', '∇', '(', '(', 'comes', 'Φ', 'f', '→', 'chem', ',']
If you are using a Python version older than 3.3, you can replace yield from tokens with:
for token in tokens: yield token
If you are on Python 2, keep in mind that split_string expects a unicode string.
Note that by modifying the is_alpha function you can define different kinds of grouping. For example, if you want to treat all Unicode letters as letters, you can set is_alpha = str.isalpha (or unicode.isalpha in Python 2):
In [3]: is_alpha = str.isalpha
In [4]: list(split_string('[[he (∇((comesΦf→chem,'))
Out[4]: ['[', '[', 'he', ' ', '(', '∇', '(', '(', 'comesΦf', '→', 'chem', ',']
Note that 'comesΦf', which was split apart before, is now a single token.
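The example above switches behaviour by rebinding the global is_alpha. A sketch of an alternative that passes the predicate as an argument instead, so no global state changes; the parameter name is_word_char is hypothetical, and the ASCII-letter default mirrors the answer's LETTERS set.

```python
from itertools import groupby
from string import ascii_letters

ASCII_LETTERS = frozenset(ascii_letters)


def split_string(text, is_word_char=ASCII_LETTERS.__contains__):
    """Yield runs of word characters whole and every other character alone."""
    # groupby splits text into consecutive runs sharing the same predicate value.
    for is_word, chars in groupby(text, key=is_word_char):
        if is_word:
            yield ''.join(chars)
        else:
            yield from chars


s = '[[he (∇((comesΦf→chem,'
print(list(split_string(s)))  # ASCII letters only: 'comes', 'Φ', 'f' separate
print(list(split_string(s, str.isalpha)))  # Unicode letters: 'comesΦf' together
```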
Hope this helps:
In [33]: string = '[[he (∇((comesΦf→chem,'
In [34]: re.split('\W+', string)
Out[34]: ['', 'he', 'comes', 'f', 'chem', '']
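The '\xe2', '\x88', '\x87' pieces in the question's output are the three UTF-8 bytes of '∇', which suggests the pattern was run over a Python 2 byte string. A regex-based sketch for Python 3, where str is Unicode and each of those symbols is a single character. It uses the pattern \w+|\W, which is not taken from the answers above; re.ASCII restricts \w to [a-zA-Z0-9_].

```python
import re

s = '[[he (∇((comesΦf→chem,'

# \w+ keeps a run of word characters together; \W takes one other character.
# With re.ASCII, 'Φ' is not a word character, so it stands alone.
print(re.findall(r'\w+|\W', s, flags=re.ASCII))
# ['[', '[', 'he', ' ', '(', '∇', '(', '(', 'comes', 'Φ', 'f', '→', 'chem', ',']

# Without re.ASCII, \w is Unicode-aware and 'Φ' joins its neighbours.
print(re.findall(r'\w+|\W', s))
# ['[', '[', 'he', ' ', '(', '∇', '(', '(', 'comesΦf', '→', 'chem', ',']
```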

How to remove specific symbols from a text using python?

I have a string like this:
string = 'This is my text of 2013-02-11, & it contained characters like this! (Exceptional)'
These are the symbols I want to remove from my String.
!, #, #, %, ^, &, *, (, ), _, +, =, `, /
What I have tried is:
listofsymbols = ['!', '#', '#', '%', '^', '&', '*', '(', ')', '_', '+', '=', '`', '/']
exceptionals = set(chr(e) for e in listofsymbols)
string.translate(None,exceptionals)
The error is:
an integer is required
Please help me do this!
Try this
>>> my_str = 'This is my text of 2013-02-11, & it contained characters like this! (Exceptional)'
>>> my_str.translate(None, '!##%^&*()_+=`/')
'This is my text of 2013-02-11,  it contained characters like this Exceptional'
Also, please refrain from naming variables that are already built-in names or part of the standard library.
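Two notes on the code above. The error in the question comes from chr(), which expects an integer code point, so chr('!') raises a TypeError. And the two-argument translate(None, deletechars) form only works on Python 2 strings; in Python 3, str.translate takes a single table, which str.maketrans can build. A sketch of the Python 3 equivalent:

```python
# Characters to delete (the duplicate '#' from the question is not needed).
symbols = '!#%^&*()_+=`/'

text = 'This is my text of 2013-02-11, & it contained characters like this! (Exceptional)'

# str.maketrans(x, y, z): characters in the third argument are mapped to None,
# which makes translate() delete them.
table = str.maketrans('', '', symbols)
print(text.translate(table))
# This is my text of 2013-02-11,  it contained characters like this Exceptional
```

The double space after the comma is expected: only the '&' is removed, not the spaces around it.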
How about this? I've also renamed string to s to avoid confusing it with the standard-library module string.
>>> s = 'This is my text of 2013-02-11, & it contained characters like this! (Exceptional)'
>>> listofsymbols = ['!', '#', '#', '%', '^', '&', '*', '(', ')', '_', '+', '=', '`', '/']
>>> print(''.join([i for i in s if i not in listofsymbols]))
This is my text of 2013-02-11,  it contained characters like this Exceptional
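Another option, not taken from the answers here, is a single re.sub call with a character class built from the symbol list. A sketch, assuming the same input; re.escape is applied to each symbol so that characters such as '^' are treated literally inside [...].

```python
import re

symbols = ['!', '#', '%', '^', '&', '*', '(', ')', '_', '+', '=', '`', '/']
s = 'This is my text of 2013-02-11, & it contained characters like this! (Exceptional)'

# Build a class like [!\#%\^...]; re.escape neutralises characters like '^'.
pattern = '[' + ''.join(re.escape(c) for c in symbols) + ']'
print(re.sub(pattern, '', s))
# This is my text of 2013-02-11,  it contained characters like this Exceptional
```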
Another proposal, easily extended to more complex filter criteria or other input data types:
my_string = 'This is my text of 2013-02-11, & it contained characters like this! (Exceptional)'
def isValid(c):
    return c not in "!##%^&*()_+=`/"
print("".join(filter(isValid, my_string)))
