Splitting a string on non digits - python

I am trying to split a string on any char that is not a digit.
orig = '0 1,2.3-4:5;6d7'
results = orig.split(r'\D+')
I expect to get a list of integers in results
0, 1, 2, 3, 4, 5, 6, 7
but instead I am getting a list with a single string element which matches the original string.

You are using str.split(), which takes a literal separator string, not a regex. Your code would split on every literal occurrence of '\D+' inside your text:
orig = 'Some\\D+text\\D+tosplit'
results = orig.split(r'\D+') # ['Some', 'text', 'tosplit']
You can use re.split() instead:
import re
orig = '0 1,2.3-4:5;6d7'
results = re.split(r'\D+',orig)
print(results)
to get
['0', '1', '2', '3', '4', '5', '6', '7']
Use data = list(map(int,results)) to convert to int.
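One caveat: re.split() yields empty strings when the input starts or ends with a non-digit, and int('') raises a ValueError. A minimal sketch, assuming the goal is just the list of integers, uses re.findall() instead (the helper name extract_ints is made up for this example):

```python
import re

def extract_ints(text):
    """Return every run of digits in text as an int (hypothetical helper)."""
    return [int(run) for run in re.findall(r'\d+', text)]

# re.split leaves empty strings at the edges of the result...
print(re.split(r'\D+', 'a1,2b'))         # ['', '1', '2', '']
# ...while findall only returns the digit runs themselves.
print(extract_ints('a1,2b'))             # [1, 2]
print(extract_ints('0 1,2.3-4:5;6d7'))   # [0, 1, 2, 3, 4, 5, 6, 7]
```

Unlike the character-by-character approaches, this keeps multi-digit numbers such as '42' together.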

Try this:
orig = '0 1,2.3-4:5;6d7'
[i for i in orig if i.isdigit()]

for i in '0 1,2.3-4:5;6d7':
    try:
        print(int(i), end=' ')
    except ValueError:
        continue
0 1 2 3 4 5 6 7

Related

Replace consecutive delimiters in string with values from list

I have a string, for example:
s = "I ? am ? a ? string"
And I have a list equal in length to the number of ? in the string:
l = ['1', '2', '3']
What is a pythonic way to return s with each consecutive ? replaced by the corresponding value in l, e.g.:
s_new = 'I 1 am 2 a 3 string'
2 Methods:
# Method 1
s = "I ? am ? a ? string"
l = ['1', '2', '3']
for i in l:
    s = s.replace('?', i, 1)
print(s)
# Output: I 1 am 2 a 3 string
# Method 2
from functools import reduce
s = "I ? am ? a ? string"
l = ['1', '2', '3']
s_new = reduce(lambda x, y: x.replace('?', y, 1), l, s)
print(s_new)
# Output: I 1 am 2 a 3 string
If the placeholders (not "delimiters") were {} rather than ?, this would be exactly how the built-in .format method handles empty {} (along with a lot more power). So, we can simply replace the placeholders first, and then use that functionality:
>>> s = "I ? am ? a ? string"
>>> l = ['1', '2', '3']
>>> s.replace('?', '{}').format(*l)
'I 1 am 2 a 3 string'
Notice that .format expects each value as a separate argument, so we use * to unpack the list.
If the original string contains { or } which must be preserved, we can first escape them by doubling them up:
>>> s = "I ? {am} ? a ? string"
>>> l = ['1', '2', '3']
>>> s.replace('?', '{}').format(*l)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'am'
>>> s.replace('{', '{{').replace('}', '}}').replace('?', '{}').format(*l)
'I 1 {am} 2 a 3 string'
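If escaping braces feels fragile, another option is to let re.sub() pull the next value for each match. This is only a sketch; the helper name fill_placeholders and its placeholder parameter are assumptions made for the example, not part of the answers above:

```python
import re

def fill_placeholders(template, values, placeholder='?'):
    """Replace each placeholder, left to right, with the next value (hypothetical helper)."""
    it = iter(values)
    # re.escape makes the placeholder literal; the lambda consumes one value per match.
    return re.sub(re.escape(placeholder), lambda match: next(it), template)

print(fill_placeholders("I ? {am} ? a ? string", ['1', '2', '3']))
# I 1 {am} 2 a 3 string
```

Because the string is scanned only once, braces in the template and ? characters inside the replacement values are left alone. The values must already be strings, and there must be at least as many values as placeholders.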

how to count a repeated char from a string

Imagine you have these strings:
'49999494949'
'4949999949'
'4949499994'
How do you find the longest run of consecutive 9s after a 4 and print it out, e.g. as '99999'?
I tried:
a = '49999494949'
b = "123456780"
for char in b:
    a = a.replace(char, "")
print(a)
but it ends up giving me every 9 in the string, whereas I only want the longest run of consecutive 9s after a 4.
Thanks for helping!
You can split each string on "4", which will give you a list of strings containing "9". Then you can find the longest:
>>> s = "49999494949"
>>> nines = s.split("4")
>>> nines
['', '9999', '9', '9', '9']
>>> max(nines)
'9999'
Or as a single command:
>>> s = "49999494949"
>>> max(s.split("4"))
'9999'
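Note that max() on strings compares them alphabetically, which happens to pick the longest piece here only because every piece consists of 9s. A slightly more defensive sketch, assuming "longest" is what is wanted, passes key=len; the regex variant is an additional suggestion that only looks at runs directly following a 4:

```python
import re

s = "4949999949"

# Longest piece between 4s, compared by length rather than alphabetically.
longest = max(s.split("4"), key=len)
print(longest)  # 99999

# Variant: only consider runs of 9 that directly follow a 4.
runs = re.findall(r'(?<=4)9+', s)
longest_after_4 = max(runs, key=len, default='')
print(longest_after_4)  # 99999
```

The default='' argument keeps max() from raising a ValueError when the string contains no such run.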

Find all the number in a string which are separated by commas in Python

I have a column of strings. The data does not follow any particular format.
I need to find all numbers which are separated by commas.
For example:
string = "There are 5 people in the class and their heights 3,9,6,7,4"
I want to extract just the numbers 3,9,6,7,4, without the number 5.
I ultimately want to concatenate the word before the first number to each number, i.e. heights3,heights9,heights6,heights7,heights4.
ExampleString = "There are 5 people in the class and their heights are 3,9,6,7,4"
temp = re.findall(r'\s\d+\b',ExampleString)
Here I get number 5 as well.
Regex is your friend. You can solve your problem with just one line of code:
[int(n) for n in sum([l.split(',') for l in re.findall(r'[\d,]+[,\d]', test_string)], []) if n.isdigit()]
Ok, let's explain step by step:
The following code produces the list of comma-delimited number strings:
test_string = "There are 5 people in the class and their heights are 3,9,6,7,4 and this 55,66, 77"
list_of_comma = [l for l in re.findall(r'[\d,]+[,\d]', test_string)]
# output: ['3,9,6,7,4', '55,66,', '77']
Split each entry of list_of_comma on commas to produce a list of lists of strings:
list_of_list = [l.split(',') for l in list_of_comma]
# output: [['3', '9', '6', '7', '4'], ['55', '66', ''], ['77']]
I use a trick to flatten the list of lists:
lst = sum(list_of_list, [])
# output: ['3', '9', '6', '7', '4', '55', '66', '', '77']
Convert each element to an integer and exclude non integers:
int_list = [int(n) for n in lst if n.isdigit()]
# output: [3, 9, 6, 7, 4, 55, 66, 77]
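The sum(list_of_list, []) trick works, but it builds a new list at every step. As an alternative sketch, not part of the answer above, the same steps can be written with itertools.chain.from_iterable:

```python
import re
from itertools import chain

test_string = "There are 5 people in the class and their heights are 3,9,6,7,4 and this 55,66, 77"

# Split every comma-delimited match into its pieces (lazily).
pieces = (match.split(',') for match in re.findall(r'[\d,]+[,\d]', test_string))

# chain.from_iterable flattens one level; isdigit() drops the empty strings.
int_list = [int(n) for n in chain.from_iterable(pieces) if n.isdigit()]
print(int_list)  # [3, 9, 6, 7, 4, 55, 66, 77]
```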
EDIT: if you want to format the numeric list in the required format:
keyword = ',heights'
formatted_res = keyword[1:] + keyword.join(map(str, int_list))
# output: 'heights3,heights9,heights6,heights7,heights4,heights55,heights66,heights77'
As stated in the comments, the 4 isn't followed by any number, so this version leaves it out:
>>> t = "There are 5 people in the class and their heights are 3,9,6,7,4"
>>> 'heights'+'heights'.join(re.findall(r'\d+,', t)).rstrip(',')
'heights3,heights9,heights6,heights7'
And if you want to include it you can:
>>> 'heights'+'heights'.join(re.findall(r'\d+,|(?<=,)\d+', t))
'heights3,heights9,heights6,heights7,heights4'
This should work. \d matches a digit (a character in the range 0-9), and + means one or more times:
import re
test_string = "There are 2 apples for 4 persons 4 helasdf 4 23 "
print("The original string : " + test_string)
temp = re.findall(r'\d+', test_string)
res = list(map(int, temp))
print("The numbers list is : " + str(res))
To extract a sequence of numbers in any string:
import re
# some random text just for testing
string = "azrazer 5,6,4 qsfdqdf 5,,1,2,!,88,9,44,aa,2"
# retrieve all sequences of numbers separated by ','
r = r'(?:\d+,)+\d+'
# same, but capture each sequence without its last number
r2 = r'((?:\d+,)+)(?:\d+)'
# best answers for question so far
r3 = r'[\d,]+[,\d]+[^a-z]'
r4 = r'[\d,]+[,\d]'
print('findall r1:', re.findall(r, string))
print('findall r3:', re.findall(r3, string))
print('findall r4:', re.findall(r4, string))
print('-----------------------------------------')
print('findall r2:', re.findall(r2, string))
Output:
findall r1: ['5,6,4', '1,2', '88,9,44'] ---> correct
findall r3: ['5,6,4 ', '5,,1,2,!', ',88,9,44,'] --> wrong
findall r4: ['5,6,4', '5,,1,2,', ',88,9,44,', ',2'] --> wrong
-----------------------------------------
findall r2: ['5,6,', '1,', '88,9,'] --> correct exclude the last element
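Putting the pieces together, here is a rough sketch that uses the first pattern to produce the requested output. The helper name prefix_comma_numbers is hypothetical, and passing the prefix in explicitly is an assumption; the question wants it taken from the text:

```python
import re

def prefix_comma_numbers(text, prefix):
    """Prefix every number that is part of a comma-separated run (hypothetical helper)."""
    numbers = []
    # Each match is a whole run such as '3,9,6,7,4'; lone numbers like '5' never match.
    for run in re.findall(r'(?:\d+,)+\d+', text):
        numbers.extend(run.split(','))
    return ','.join(prefix + n for n in numbers)

t = "There are 5 people in the class and their heights are 3,9,6,7,4"
print(prefix_comma_numbers(t, 'heights'))
# heights3,heights9,heights6,heights7,heights4
```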

How to remove whitespace in a list

I can't remove my whitespace in my list.
invoer = "5-9-7-1-7-8-3-2-4-8-7-9"
cijferlijst = []
for cijfer in invoer:
    cijferlijst.append(cijfer.strip('-'))
I tried the following but it doesn't work. I already made a list from my string and separated everything, but each "-" is now a "".
filter(lambda x: x.strip(), cijferlijst)
filter(str.strip, cijferlijst)
filter(None, cijferlijst)
abc = [x.replace(' ', '') for x in cijferlijst]
Try that:
>>> ''.join(invoer.split('-'))
'597178324879'
If you want the digits as a single string without the -, use .replace():
>>> string_list = "5-9-7-1-7-8-3-2-4-8-7-9"
>>> string_list.replace('-', '')
'597178324879'
If you want the numbers as a list of strings, use .split():
>>> string_list.split('-')
['5', '9', '7', '1', '7', '8', '3', '2', '4', '8', '7', '9']
This looks a lot like the following question:
Python: Removing spaces from list objects
The answer there is to use strip instead of replace. Have you tried:
abc = [x.strip(' ') for x in cijferlijst]
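For what it's worth, the filter(None, ...) attempt from the question is close to working. A minimal sketch, assuming Python 3, where filter() returns a lazy iterator whose result has to be stored (the name zonder_leeg is made up here):

```python
invoer = "5-9-7-1-7-8-3-2-4-8-7-9"

# Same loop as in the question: every '-' becomes an empty string.
cijferlijst = [cijfer.strip('-') for cijfer in invoer]

# filter(None, ...) drops the empty strings; list() materialises the result.
zonder_leeg = list(filter(None, cijferlijst))
print(zonder_leeg)
# ['5', '9', '7', '1', '7', '8', '3', '2', '4', '8', '7', '9']
```

Calling filter() without assigning its result, as in the question, leaves cijferlijst unchanged.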

Changing a list into a string

I have a list like this:
li = [1, 2, 3, 4, 5]
I want to change it into a string, get rid of the quotes, and get rid of the commas so that it looks like this:
1 2 3 4 5
I tried the following:
new_list = []
new_list.append(li)
new_string = " ".join(new_list)
print(new_string)
however I get the below error:
TypeError: sequence item 0: expected str instance, int found
Why does this happen and how can I fix this so that I get the output I want?
The items in the list need to be of the str type in order to join them with the given delimiter. Try this:
' '.join(map(str, your_list)) # join the resulting iterable of strings, after casting ints
This is happening because join is expecting an iterable sequence of strings, and yours contains int.
You need to convert the items of this list to strings, either with a list comprehension:
>>> li
[1, 2, 3, 4, 5]
>>> new_li = [str(val) for val in li]
>>> new_li
['1', '2', '3', '4', '5']
or a regular for loop:
>>> for x in range(len(li)):
...     li[x] = str(li[x])
...
>>> li
['1', '2', '3', '4', '5']
then your expression will work.
>>> result = ' '.join(li)
>>> result
'1 2 3 4 5'
The error comes from attempting to join integers into a string. You can turn every value into a string first, then join them:
new_list = [str(x) for x in li]
new_string = " ".join(new_list)
As a one-liner:
new_string = " ".join([str(x) for x in li])
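Two small variations, offered as a sketch rather than the one right way. Note also that new_list.append(li) in the question nests the whole list as a single item, so join() would still fail even if li held strings:

```python
li = [1, 2, 3, 4, 5]

# join() accepts any iterable of strings, so a generator expression is enough.
new_string = " ".join(str(x) for x in li)
print(new_string)  # 1 2 3 4 5

# If the goal is only to print, unpacking lets print() do the conversion itself.
print(*li)  # 1 2 3 4 5
```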
