Verify how many pairs of parentheses exist in a string in Python - python

I'm wondering if there's any way to find how many pairs of parentheses are in a string.
I have to do some string manipulation and I sometimes have something like:
some_string = '1.8.0*99(0000000*kWh)'
or something like
some_string = '1.6.1*01(007.717*kW)(1604041815)'
What I'd like to do is:
get all the digits between the parentheses (e.g. for the first string: 0000000)
if there are 2 pairs of parentheses (there will never be more than 2 pairs), get all the digits and join them (e.g. for the second string I'll have: 0077171604041815)
How can I verify how many pairs of parentheses are in a string, so that I can later do something like:
if number_of_pairs == 1:
    do_this
else:
    do_that
Or maybe there's an easier way to do what I want but couldn't think of one so far.
I know how to get only the digits in a string: final_string = re.sub('[^0-9]', '', my_string), but I'm wondering how I could handle both cases.

Parentheses always come in pairs, so just count the opening (or closing) parentheses in the string and you'll have your answer.
number_of_pairs = some_string.count('(')
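As a minimal sketch of the counting idea above, the helper below (count_paren_pairs is a hypothetical name, not something from the question) counts opening parentheses and also checks that the number of closing ones matches. It assumes the parentheses are not nested and only compares counts, not ordering.

```python
def count_paren_pairs(text):
    """Return the number of '(' in text, assuming each one has a matching ')'."""
    opens = text.count('(')
    closes = text.count(')')
    # Only the counts are compared; the order of the parentheses is not checked.
    if opens != closes:
        raise ValueError("unbalanced parentheses in %r" % text)
    return opens


for sample in ('1.8.0*99(0000000*kWh)', '1.6.1*01(007.717*kW)(1604041815)'):
    print(sample, count_paren_pairs(sample))
```

The returned value can then drive the if number_of_pairs == 1 branch from the question.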

You can do this (assuming you already know there's at least one parenthesis):
re.sub(r'[^0-9]+', '', some_string.split('(', 1)[1])
or only with re.sub:
re.sub(r'^[^(]*\(|[^0-9]+', '', some_string)

If you want all the digits in a single string, remove any . characters first, then use re.findall and join the matches into a single string:
In [15]: s = '1.6.1*01(007.717*kW)(1604041815)'
In [16]: "".join(re.findall(r"\((\d+).*?\)", s.replace(".", "")))
Out[16]: '0077171604041815'
In [17]: s = '1.8.0*99(0000000*kWh)'
In [18]: "".join(re.findall(r"\((\d+).*?\)", s.replace(".", "")))
Out[18]: '0000000'
The count of parens is irrelevant when all you want is to extract any digits inside them. Since you say there are at most two pairs, I presume the format is consistent.
Or, if the parens always contain digits, find the data inside the parens and strip everything except the digits:
In [20]: "".join([re.sub("[^0-9]", "", m) for m in re.findall(r"\((.*?)\)", s)])
Out[20]: '0077171604041815'
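The second approach can be wrapped in a small function that handles one pair, two pairs, or none without counting them first. This is only a sketch; digits_in_parens is a made-up name, and it assumes the parentheses are not nested.

```python
import re


def digits_in_parens(text):
    """Join the digits found inside every (...) group of text."""
    # Capture the contents of each parenthesised group (no nesting assumed).
    groups = re.findall(r"\(([^)]*)\)", text)
    # Drop everything except 0-9 from each group, then concatenate.
    return "".join(re.sub(r"[^0-9]", "", group) for group in groups)


print(digits_in_parens('1.8.0*99(0000000*kWh)'))
print(digits_in_parens('1.6.1*01(007.717*kW)(1604041815)'))
```

A string with no parentheses gives an empty string, so no special case is needed.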

Related

How can I search through a string and extract all specific characters?

Say that I have a string along the lines of "One2three4". Is it possible to look through the string, take the integers, and put them in their own string, so that my final result is "24"? Thanks
Using str.join() and str.isdigit():
>>> s = "One2three4"
>>> ''.join(c for c in s if c.isdigit())
'24'
This method looks through the string once and checks if each character is a digit or not; the characters that satisfy this are joined into a new string. In complexity terms, this is O(n), and as we need to check every character in the string, this is the best we can do.
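One caveat worth sketching: str.isdigit() is also true for some non-ASCII characters such as the superscript '²'. If only the characters 0-9 are wanted, a membership test against string.digits is one option. The helper name ascii_digits is an assumption made for this illustration.

```python
import string


def ascii_digits(text):
    """Keep only the characters '0' through '9' from text."""
    return "".join(c for c in text if c in string.digits)


print(ascii_digits("One2three4"))                 # 24
print("".join(c for c in "x²3" if c.isdigit()))   # ²3 (isdigit accepts '²')
print(ascii_digits("x²3"))                        # 3
```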

Is it possible to determine if a string begins with a character repeating x times?

I have two strings:
code_one = "222abc"
code_two = "2abc"
Is there a way I can determine that both strings begin with "2" repeated any number of times?
You can simply use lstrip() and compare the lengths:
>>> code_one = "222abc"
>>> len(code_one) - len(code_one.lstrip("2"))
3
Or if you just want to check the string starts with some characters:
>>> code_one.startswith("222")
True
If the leading character may be something other than '2', compare the character at index [0] with the one at [1]; if they are equal, call lstrip with that character as shown above to get a count (if needed).
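The lstrip idea generalises to any leading character by stripping whatever the first character happens to be. A minimal sketch, with leading_run_length as a hypothetical helper name:

```python
def leading_run_length(text):
    """Return how many times the first character repeats at the start of text."""
    if not text:
        return 0
    # lstrip removes every leading copy of text[0]; the length difference is the run.
    return len(text) - len(text.lstrip(text[0]))


print(leading_run_length("222abc"))  # 3
print(leading_run_length("2abc"))    # 1
```

A result greater than 1 means the first character is repeated.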

Dot notation string manipulation

Is there a way to manipulate a string in Python as follows?
For any string that is stored in dot notation, for example:
s = "classes.students.grades"
Is there a way to change the string to the following:
"classes.students"
Basically, remove the last period and everything after it. So "restaurants.spanish.food.salty" would become "restaurants.spanish.food".
Additionally, is there any way to identify what comes after the last period? The reason I want to do this is that I want to use isdigit().
So, if it was classes.students.grades.0, could I grab the 0 somehow, so that I could use an if statement with isdigit and say: if the part of the string after the last period (so 0 in this case) is a digit, remove it; otherwise, leave it.
You can use split and join together:
s = "classes.students.grades"
print('.'.join(s.split('.')[:-1]))
Splitting the string on . gives you a list of strings; joining the list elements with . turns them back into a single string.
[:-1] picks all the elements of the list except the last one.
To check what comes after the last .:
s.split('.')[-1]
Another way is to use rsplit. It works the same way as split, but if you provide the maxsplit parameter it splits the string starting from the end:
rest, last = s.rsplit('.', 1)
rest  # 'classes.students'
last  # 'grades'
You can also use re.sub to substitute the part after the last . with an empty string:
re.sub(r'\.[^.]+$', '', s)
As for the last part of your question, wrapping the words in [], I would recommend using format with a generator expression:
''.join("[{}]".format(e) for e in s.split('.'))
It'll give you the desired output:
[classes][students][grades]
The best way to do this is using the rsplit method and pass in the maxsplit argument.
>>> s = "classes.students.grades"
>>> before, after = s.rsplit('.', maxsplit=1)  # in Python 2, pass it positionally: s.rsplit('.', 1)
>>> before
'classes.students'
>>> after
'grades'
You can also use the rfind() method with normal slice operation.
To get everything before last .:
>>> s = "classes.students.grades"
>>> last_index = s.rfind('.')
>>> s[:last_index]
'classes.students'
Then everything after the last .:
>>> s[last_index + 1:]
'grades'
If '.' is in s, s.rpartition('.') finds the last dot in s
and returns (before_last_dot, dot, after_last_dot). If there is no dot, it returns ('', '', s) instead.
s = "classes.students.grades"
s.rpartition('.')[0]
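A short sketch of how rpartition behaves with and without a dot may help here. The wrapper split_last is a hypothetical helper, and returning the whole string as the "before" part when no separator is found is a design choice of this example, not something the answer prescribes.

```python
def split_last(text, sep="."):
    """Split text at the last sep; return (text, '') when sep is absent."""
    before, found, after = text.rpartition(sep)
    if not found:
        # rpartition returned ('', '', text): no separator in the string.
        return text, ""
    return before, after


print("classes.students.grades".rpartition("."))  # ('classes.students', '.', 'grades')
print("classes".rpartition("."))                  # ('', '', 'classes')
print(split_last("classes.students.grades"))      # ('classes.students', 'grades')
print(split_last("classes"))                      # ('classes', '')
```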
If your goal is to get rid of a final component that's just a single digit, start and end with re.sub():
s = re.sub(r"\.\d$", "", s)
This will do the job, and leave other strings alone. No need to mess with anything else.
If you do want to know about the general case (separate out the last component, no matter what it is), then use rsplit to split your string once:
>>> "hel.lo.there".rsplit(".", 1)
['hel.lo', 'there']
If there's no dot in the string you'll just get one element in your list: the entire string.
You can do it very simply with rsplit (str.rsplit([sep[, maxsplit]])), which returns a list by splitting the string at the given separator.
You can also specify how many splits should be performed:
>>> s = "res.spa.f.sal.786423"
>>> s.rsplit('.',1)
['res.spa.f.sal', '786423']
So the final function that you describe is:
def dimimak_cool_function(s):
    if '.' not in s:
        return s
    start, end = s.rsplit('.', 1)
    return start if end.isdigit() else s
>>> dimimak_cool_function("res.spa.f.sal.786423")
'res.spa.f.sal'
>>> dimimak_cool_function("res.spa.f.sal")
'res.spa.f.sal'

Find the last occurrence of multiple characters in a string in Python

I would like to find the last occurrence of a number of characters in a string.
str.rfind() will give the index of the last occurrence of a single character in a string, but I need the index of the last occurrence of any of a number of characters. For example if I had a string:
test_string = '([2+2])-[3+4])'
I would want a function that returns the index of the last occurrence of (, [, or {, similar to
test_string.rfind('(', '[', '{')
Which would ideally return 8. What is the best way to do this?
max(test_string.rfind('('), test_string.rfind('['), test_string.rfind('{'))
seems clunky and not Pythonic.
You can use generator expression to do this in a Pythonic way.
max(test_string.rfind(i) for i in "([{")
This iterates through the string of characters that you want to check, calls rfind() on each of them, and returns the maximum of the resulting indices.
This is pretty concise, and will do the trick.
max(map(test_string.rfind, '([{'))
You can use reversed to scan from the end of the string and stop at the first match. Subtracting the reversed index i from len(test_string) - 1 converts it to an index counted from the start, and the scan makes at most a single pass over the string:
test_string = '([2+2])-[3+4])'
st = {"[", "(", "{"}
print(next((len(test_string) - 1 - i
            for i, s in enumerate(reversed(test_string)) if s in st), -1))
8
If there is no match, you get the default value -1. This is a lot more efficient if you have a large number of characters to search for than doing an O(n) rfind for each one and then taking the max of the results.
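The single-pass scan can also be written as a plain loop over indices, which some readers may find easier to follow than the generator with next. This is a sketch only; last_index_of_any is a hypothetical name, and it handles single characters, not multi-character substrings.

```python
def last_index_of_any(text, chars):
    """Return the index of the last character of text that is in chars, or -1."""
    wanted = set(chars)  # set membership tests are O(1) on average
    # Walk backwards from the last index and stop at the first hit.
    for index in range(len(text) - 1, -1, -1):
        if text[index] in wanted:
            return index
    return -1


print(last_index_of_any('([2+2])-[3+4])', '([{'))  # 8
print(last_index_of_any('no brackets', '([{'))     # -1
```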
>>> def last_of_many(string, findees):
...     return max(string.rfind(s) for s in findees)
...
>>> test_string = '([2+2])-[3+4])'
>>> last_of_many(test_string, '([{')
8
>>> last_of_many(test_string, ['+4', '+2'])
10
>>>

Regexp matching equal number of the same character on each side of a string

How do you match only equal numbers of the same character (up to 3) on each side of a string in python?
For example, let's say I am trying to match equal signs
=abc= or ==abc== or ===abc===
but not
=abc== or ==abc=
etc.
I figured out how to do each individual case, but can't seem to get all of them at once. A single case looks like this:
(={1}(?=abc={1}))abc(={1})
but combining the cases with | alternation, as in
((={1}(?=abc={1}))|(={2}(?=abc={2})))abc(={1}|={2})
doesn't seem to work.
Use the following regex:
^(=+)abc\1$
Edit:
If you only want to allow at most three = characters:
^(={1,3})abc\1$
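To check the backreference pattern against the examples from the question, a small sketch follows. It uses re.fullmatch instead of the ^ and $ anchors, which is a substitution made for this example; balanced_equals is a hypothetical helper name.

```python
import re

# \1 must repeat exactly what group 1 captured, so both sides have the same length.
PATTERN = re.compile(r"(={1,3})abc\1")


def balanced_equals(text):
    """True if text is 'abc' wrapped in the same number (1 to 3) of '=' on each side."""
    return PATTERN.fullmatch(text) is not None


for sample in ("=abc=", "==abc==", "===abc===", "=abc==", "==abc="):
    print(sample, balanced_equals(sample))
```

The first three samples match and the last two do not, because the backreference cannot match a run of a different length.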
This is not a regular language. However, you can do it with backreferences:
(=+)[^=]+\1
Assuming the sample is a single string, here's a non-regex approach (one of many):
>>> string = "===abc==="
>>> string.replace("abc", " ").split(" ")
['===', '===']
>>> a, b = string.replace("abc", " ").split(" ")
>>> if a == b:
...     print("ok")
...
ok
You said you want to match equal characters on each side, so regardless of what characters, you just need to check a and b are equal.
You are going to want to use a back reference. Check this post for an example:
Regex, single quote or double quote
