How can I search through a string and extract all specific characters? - python

So say that I have a string along the lines of "One2three4". Is it possible to look through the string, take the integers, and put them in their own string, so that my final result is "24"? Thanks

Using str.join() and str.isdigit():
>>> s = "One2three4"
>>> ''.join(c for c in s if c.isdigit())
'24'
This method looks through the string once and checks if each character is a digit or not; the characters that satisfy this are joined into a new string. In complexity terms, this is O(n), and as we need to check every character in the string, this is the best we can do.
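As a minimal sketch, the same idea can be wrapped in small helpers; the names `extract_digits` and `extract_ascii_digits` are hypothetical. Note that `str.isdigit()` also accepts some non-ASCII digit characters (for example superscript digits), so the second variant restricts the match to ASCII 0-9 with a regular expression.

```python
import re

def extract_digits(s):
    """Return the digit characters of s, in order, as one string."""
    return ''.join(c for c in s if c.isdigit())

def extract_ascii_digits(s):
    """Same idea, but only ASCII 0-9 count as digits."""
    return ''.join(re.findall(r'[0-9]', s))

print(extract_digits("One2three4"))        # 24
print(extract_ascii_digits("One2three4"))  # 24
```

Both helpers still make a single pass over the string, so the O(n) cost is unchanged.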

Related

Confusion with string split method in python

Consider the following example
a= 'Apple'
b = a.split(',')
print(b)
Output is ['Apple'].
I don't understand why it returns a list even though there is no ',' character in 'Apple'.
There might be cases where we call split expecting more than one element in the list, but since the separator is not present in the string there will be only one element. Wouldn't it be better if this mistake were caught by the split method itself?
The behaviour of a.split(',') when no commas are present in a is perfectly consistent with the way it behaves when there are a positive number of commas in a.
a.split(',') says to split string a into a list of substrings that are delimited by ',' in a; the delimiter is not preserved in the substrings.
If 1 comma is found you get 2 substrings in the list, if 2 commas are found you get 3 substrings in the list, and in general, if n commas are found you get n+1 substrings in the list. So if 0 commas are found you get 1 substring in the list.
If you want 0 substrings in the list, then you'll need to supply a string with -1 commas in it. Good luck with that. :)
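The n+1 rule can be checked directly. This is only an illustrative sketch; the sample strings are made up for the demonstration.

```python
for text in ["Apple", "Apple,Pear", "Apple,Pear,Plum"]:
    parts = text.split(',')
    # The number of parts is always one more than the number of commas.
    assert len(parts) == text.count(',') + 1
    print(text.count(','), parts)
```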
The docstring of that method says:
Return a list of the words in the string S, using sep as the delimiter string.
The delimiter is used to separate multiple parts of the string; having only one part is not an error.
That's the way the split() function works. If you do not want that behaviour, you can implement your own my_split() function as follows (note that it returns the original string, not a list, when the delimiter is absent):
def my_split(s, d=' '):
    return s.split(d) if d in s else s
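If the goal is for a missing separator to be caught as a mistake, one possible sketch is a wrapper that raises instead. The name `split_strict` and the choice of `ValueError` are assumptions made for illustration, not part of the standard library.

```python
def split_strict(s, sep=','):
    """Split s on sep, but raise ValueError if sep does not occur in s."""
    if sep not in s:
        raise ValueError("separator %r not found in %r" % (sep, s))
    return s.split(sep)

print(split_strict("Apple,Pear"))  # ['Apple', 'Pear']
```

Unlike my_split(), this version always returns a list when it returns at all, so callers do not need to check the result type.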

Verify how many pair of parentheses exist in a string in Python

I'm wondering if there's any way to find how many pair of parentheses are in a string.
I have to do some string manipulation and I sometimes have something like:
some_string = '1.8.0*99(0000000*kWh)'
or something like
some_string = '1.6.1*01(007.717*kW)(1604041815)'
What I'd like to do is:
get all the digits between the parentheses (e.g for the first string: 0000000)
if there are 2 pairs of parentheses (there will always be max 2 pairs) get all the digits and join them (e.g for the second string I'll have: 0077171604041815)
How can I verify how many pair of parentheses are in a string so that I can do later something like:
if number_of_pairs == 1:
    do_this
else:
    do_that
Or maybe there's an easier way to do what I want but couldn't think of one so far.
I know how to get only the digits in a string: final_string = re.sub('[^0-9]', '', my_string), but I'm wondering how could I treat both cases.
Since parentheses always come in pairs, just count the left (or right) parentheses in the string and you'll get your answer.
num_of_parenthesis = string.count('(')
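A small sketch of the counting approach, with a sanity check that the parentheses really are balanced; the helper name `count_pairs` is hypothetical, and nesting is not validated here.

```python
def count_pairs(s):
    """Count parenthesis pairs, assuming every '(' has a matching ')'."""
    opens = s.count('(')
    if opens != s.count(')'):
        raise ValueError("unbalanced parentheses in %r" % s)
    return opens

print(count_pairs('1.8.0*99(0000000*kWh)'))             # 1
print(count_pairs('1.6.1*01(007.717*kW)(1604041815)'))  # 2
```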
Alternatively, assuming you already know there is at least one parenthesis, drop everything up to the first one and then remove the non-digits:
re.sub(r'[^0-9]+', '', some_string.split('(', 1)[1])
or only with re.sub:
re.sub(r'^[^(]*\(|[^0-9]+', '', some_string)
If you want all the digits in a single string, use re.findall after replacing any . and join into a single string:
In [15]: s = '1.6.1*01(007.717*kW)(1604041815)'
In [16]: "".join(re.findall(r"\((\d+).*?\)", s.replace(".", "")))
Out[16]: '0077171604041815'
In [17]: s = '1.8.0*99(0000000*kWh)'
In [18]: "".join(re.findall(r"\((\d+).*?\)", s.replace(".", "")))
Out[18]: '0000000'
The count of parens is irrelevant when all you want is to extract any digits inside them. Based on the fact "you only have max two pairs" I presume the format is consistent.
Or, if the parens always contain digits, find the data in the parens and strip everything but the digits (with s set back to the two-pair string):
In [20]: "".join([re.sub(r"[^0-9]", "", m) for m in re.findall(r"\((.*?)\)", s)])
Out[20]: '0077171604041815'
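The approach of pulling out each parenthesised group and keeping only its digits can be sketched as a single function that handles one or two pairs alike. The name `digits_in_parens` is hypothetical, and the sketch assumes the parentheses are not nested.

```python
import re

def digits_in_parens(some_string):
    """Join the digits found inside each (...) group of some_string."""
    groups = re.findall(r'\(([^)]*)\)', some_string)
    # Remove every non-digit character from each group, then concatenate.
    return ''.join(re.sub(r'[^0-9]', '', g) for g in groups)

print(digits_in_parens('1.8.0*99(0000000*kWh)'))             # 0000000
print(digits_in_parens('1.6.1*01(007.717*kW)(1604041815)'))  # 0077171604041815
```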

String splitting in python by finding non-zero character

I want to do the following split:
input: 0x0000007c9226fc output: 7c9226fc
input: 0x000000007c90e8ab output: 7c90e8ab
input: 0x000000007c9220fc output: 7c9220fc
I use the following line of code to do this but it does not work!
split = element.rpartition('0')
I got these outputs which are wrong!
input: 0x000000007c90e8ab output: e8ab
input: 0x000000007c9220fc output: fc
What is the fastest way to do this kind of split?
The only idea I have right now is to write a loop and check each character, but that is a little time consuming.
I should mention that the number of zeros in input is not fixed.
Each string can be converted to an integer using int() with a base of 16. Then convert back to a string.
for s in '0x000000007c9226fc', '0x000000007c90e8ab', '0x000000007c9220fc':
    print('%x' % int(s, 16))
Output
7c9226fc
7c90e8ab
7c9220fc
input[2:].lstrip('0')
That should do it. The [2:] skips over the leading 0x (which I assume is always there), then the lstrip('0') removes all the zeros from the left side.
In fact, since lstrip removes any leading characters that appear in its argument, we can simplify this to:
input.lstrip('x0')
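One edge case worth noting: if the value is zero, stripping leaves an empty string. A minimal sketch that guards against this follows; the helper name `strip_hex_prefix` is hypothetical, and it assumes the prefix, when present, is `0x` or `0X`.

```python
def strip_hex_prefix(value):
    """Drop a leading '0x' and any leading zeros; keep a single '0' for zero."""
    digits = value[2:] if value.lower().startswith('0x') else value
    # lstrip('0') returns '' for an all-zero value, so fall back to '0'.
    return digits.lstrip('0') or '0'

print(strip_hex_prefix('0x0000007c9226fc'))  # 7c9226fc
print(strip_hex_prefix('0x0000'))            # 0
```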
format is handy for this:
>>> print('{:x}'.format(0x000000007c90e8ab))
7c90e8ab
>>> print('{:x}'.format(0x000000007c9220fc))
7c9220fc
In this particular case you can just do
your_input[10:]
You'll most likely want to properly parse this; your idea of splitting on separation of non-zero does not seem safe at all.
Seems to be the XY problem.
If the number of characters in the string is constant, then you can use the following code.
input = "0x000000007c9226fc"
output = input[10:]
Documentation
Also, since you are using rpartition, which is defined as
str.rpartition(sep)
Split the string at the last occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself.
Since your input can contain multiple 0s and rpartition splits only at the last occurrence, this is the cause of the malfunction in your code.
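To see why the original attempt returns the wrong pieces, it helps to print all three parts that rpartition produces. This is just an illustrative sketch using two of the inputs from the question.

```python
for element in ('0x000000007c90e8ab', '0x000000007c9220fc'):
    # rpartition splits at the LAST '0', which lies inside the significant digits.
    before, sep, after = element.rpartition('0')
    print(repr(before), repr(sep), repr(after))
# '0x000000007c9' '0' 'e8ab'
# '0x000000007c922' '0' 'fc'
```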
A regular expression for a prefix such as 0x00000 is (0x[0]+); replace the match with an empty string.
import re
st = "0x000007c922433434000fc"
reg = '(0x[0]+)'
rep = re.sub(reg, '', st)
print(rep)
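A slightly stricter variant, sketched under the assumption that the prefix always sits at the very start of the string, anchors the pattern with `^` so that nothing later in the string can match. The function name `strip_leading_zeros` is hypothetical.

```python
import re

def strip_leading_zeros(hex_string):
    """Remove a leading '0x' and the zeros that immediately follow it."""
    return re.sub(r'^0x0*', '', hex_string)

print(strip_leading_zeros('0x000007c922433434000fc'))  # 7c922433434000fc
print(strip_leading_zeros('0x000000007c9220fc'))       # 7c9220fc
```

Zeros inside the significant digits are left alone because the pattern can only match at position 0.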

Remove items in a sequence from a string Python

Okay so I'm trying to make a function that will take a string and a sequence of items (in the form of either a list, a tuple or a string) and remove all items from that list from the string.
So far my attempt looks like this:
def eliminate(s, bad_characters):
    for item in bad_characters:
        s = s.strip(item)
    return s
However, for some reason when I try this or variations of this, it only returns either the original string or a version with only the first item in bad_characters removed.
>>> eliminate("foobar",["o","b"])
'foobar'
Is there a way to remove all items in bad_characters from the given string?
The reason your solution doesn't work is because str.strip() only removes characters from the outsides of the string, i.e. characters on the leftmost or rightmost end of the string. So, in the case of 'foobar', str.strip() with a single character argument would only work if you wanted to remove the characters 'f' and 'r'.
You could eliminate more of the inner characters with strip, but you would need to include one of the outer characters as well.
>>> 'foobar'.strip('of')
'bar'
>>> 'foobar'.strip('o')
'foobar'
Here's how to do it by string-joining a generator expression:
def eliminate(s, bad_characters):
    bc = set(bad_characters)
    return ''.join(c for c in s if c not in bc)
Try replacing the bad characters with empty strings.
def eliminate(s, bad_characters):
    for item in bad_characters:
        s = s.replace(item, '')
    return s
strip() doesn't work because it only removes characters from the beginning and end of the original string.
strip is not the right choice for this task because it removes characters only from the leading and trailing ends of the string. Instead, you can use the str.translate method (this two-argument form is Python 2 only):
>>> s,l="foobar",["o","b"]
>>> s.translate(None,''.join(l))
'far'
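In Python 3, `str.translate` takes a single translation table, so the call above needs a table built with `str.maketrans`. A minimal sketch of that variant follows; it assumes each item in `bad_characters` is a character to delete, so a multi-character item would have each of its characters removed individually.

```python
def eliminate(s, bad_characters):
    """Remove every character listed in bad_characters from s (Python 3)."""
    # The third argument to str.maketrans lists characters to delete.
    table = str.maketrans('', '', ''.join(bad_characters))
    return s.translate(table)

print(eliminate("foobar", ["o", "b"]))  # far
```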
Try this recursive version; it may be slow, and note that it empties the seq list that is passed in:
def eliminate(s, seq):
    if seq:
        return eliminate(s.replace(seq.pop(), ""), seq)
    return s
>>> eliminate("foobar",["o","b"])
'far'

Splice a string based on certain characters

I'm looking for a way to examine only certain characters within a string. For example:
# Given the string
s = '((hello+world))'
s[1:')']  # This obviously doesn't work because you can only slice a string using ints
Basically I want the program to start at the second occurrence of ( and then slice from there until it hits the first occurrence of ). Then maybe I can pass the result to another function or whatever. Any solutions?
You can do it as follows: (assuming you want the innermost parenthesis)
s[s.rfind("("):s.find(")")+1] if you want "(hello+world)"
s[s.rfind("(")+1:s.find(")")] if you want "hello+world"
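The rfind/find slicing can be sketched as a helper that also handles the case where no well-formed innermost pair is found. The name `innermost` is hypothetical, and the sketch assumes a single nested group such as `((...))`, not sibling groups like `(a)(b)`.

```python
def innermost(s):
    """Return the text between the last '(' and the first ')' in s."""
    start = s.rfind('(')
    end = s.find(')')
    # Both calls return -1 when the character is missing.
    if start == -1 or end == -1 or end < start:
        raise ValueError("no innermost parenthesised group in %r" % s)
    return s[start + 1:end]

print(innermost('((hello+world))'))  # hello+world
```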
You can strip the parentheses (if, in your case, they always appear at the beginning and the end of the string):
>>> s= '((hello+world))'
>>> s.strip('()')
'hello+world'
Another option is to use a regular expression (after import re) to extract what is inside the double parentheses:
>>> re.match(r'\(\((.*?)\)\)', s).group(1)
'hello+world'
