I have a string, for example:
s = "I ? am ? a ? string"
And I have a list equal in length to the number of ? in the string:
l = ['1', '2', '3']
What is a Pythonic way to return s with each consecutive ? replaced by the corresponding value in l? For example:
s_new = 'I 1 am 2 a 3 string'
Two methods:
# Method 1
s = "I ? am ? a ? string"
l = ['1', '2', '3']
for i in l:
    s = s.replace('?', i, 1)  # replace only the first remaining '?'
print(s)
# Output: I 1 am 2 a 3 string
# Method 2
from functools import reduce
s = "I ? am ? a ? string"
l = ['1', '2', '3']
s_new = reduce(lambda x, y: x.replace('?', y, 1), l, s)
print(s_new)
# Output: I 1 am 2 a 3 string
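Both methods call str.replace with a count of 1, which searches from the start of the string on every iteration and will also hit a '?' that an earlier replacement value introduced. A single-pass alternative (a sketch, not from the answer above; the helper name replace_in_order is hypothetical) feeds an iterator over l to re.sub, whose callback is called once per match, left to right:

```python
import re

def replace_in_order(s, values, marker='?'):
    """Replace each occurrence of `marker` in `s`, left to right, with the next item of `values`."""
    n = s.count(marker)
    if n != len(values):
        # fail early instead of letting next() raise StopIteration inside re.sub
        raise ValueError(f"expected {n} values, got {len(values)}")
    it = iter(values)
    # re.escape turns '?' into a literal pattern; each match consumes one value
    return re.sub(re.escape(marker), lambda m: str(next(it)), s)

print(replace_in_order("I ? am ? a ? string", ['1', '2', '3']))
# Output: I 1 am 2 a 3 string
```

Because re.sub scans the original string only once, a value that itself contains '?' is inserted verbatim: replace_in_order('? ?', ['?', 'x']) gives '? x', whereas Method 1 gives 'x ?'.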
If the placeholders (not "delimiters") were {} rather than ?, this would be exactly how the built-in .format method handles empty {} (along with a lot more power). So, we can simply replace the placeholders first, and then use that functionality:
>>> s = "I ? am ? a ? string"
>>> l = ['1', '2', '3']
>>> s.replace('?', '{}').format(*l)
'I 1 am 2 a 3 string'
Notice that .format expects each value as a separate argument, so we use * to unpack the list.
If the original string contains { or } which must be preserved, we can first escape them by doubling them up:
>>> s = "I ? {am} ? a ? string"
>>> l = ['1', '2', '3']
>>> s.replace('?', '{}').format(*l)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'am'
>>> s.replace('{', '{{').replace('}', '}}').replace('?', '{}').format(*l)
'I 1 {am} 2 a 3 string'
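The escape-then-format steps can be wrapped in a small helper. This is a sketch: the name fill_placeholders and the up-front count check are additions for illustration, not part of the answer above, and it assumes '?' is the only placeholder character.

```python
from typing import Sequence

def fill_placeholders(template: str, values: Sequence[object]) -> str:
    """Fill each '?' in `template`, left to right, with the next item of `values`."""
    n = template.count('?')
    if n != len(values):
        raise ValueError(f"template has {n} placeholders but {len(values)} values were given")
    # Escape literal braces first so str.format leaves them alone...
    escaped = template.replace('{', '{{').replace('}', '}}')
    # ...then turn each '?' into an automatically numbered '{}' field.
    return escaped.replace('?', '{}').format(*values)

print(fill_placeholders("I ? {am} ? a ? string", ['1', '2', '3']))
# Output: I 1 {am} 2 a 3 string
```

Since .format converts each argument itself, the values do not have to be strings: fill_placeholders('? + ? = ?', [1, 2, 3]) returns '1 + 2 = 3'.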
I have a column of strings. The data does not follow any particular format.
I need to find all numbers which are separated by commas.
For example,
string = "There are 5 people in the class and their heights 3,9,6,7,4".
I want to extract just the numbers 3,9,6,7,4, without the 5.
I ultimately want to concatenate the word before the first number to each number, i.e. heights3,heights9,heights6,heights7,heights4.
import re
ExampleString = "There are 5 people in the class and their heights are 3,9,6,7,4"
temp = re.findall(r'\s\d+\b', ExampleString)
Here I get number 5 as well.
Regex is your friend. You can solve your problem with just one line of code:
[int(n) for n in sum([l.split(',') for l in re.findall(r'[\d,]+[,\d]', test_string)], []) if n.isdigit()]
Ok, let's explain step by step:
The following code produces the list of comma-delimited number strings:
import re
test_string = "There are 5 people in the class and their heights are 3,9,6,7,4 and this 55,66, 77"
list_of_comma = re.findall(r'[\d,]+[,\d]', test_string)
# output: ['3,9,6,7,4', '55,66,', '77']
Split each string in list_of_comma on commas, producing a list of lists of number strings:
list_of_list = [l.split(',') for l in list_of_comma]
# output: [['3', '9', '6', '7', '4'], ['55', '66', ''], ['77']]
I use a trick to flatten the list of lists:
lst = sum(list_of_list, [])
# output: ['3', '9', '6', '7', '4', '55', '66', '', '77']
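As an aside, sum(list_of_list, []) works, but it builds a new list at every step, so it slows down quadratically as the number of sublists grows. itertools.chain.from_iterable flattens in a single pass; a sketch using the same list_of_list value:

```python
from itertools import chain

list_of_list = [['3', '9', '6', '7', '4'], ['55', '66', ''], ['77']]
# chain.from_iterable yields the items of each sublist in turn, without intermediate copies
lst = list(chain.from_iterable(list_of_list))
print(lst)
# output: ['3', '9', '6', '7', '4', '55', '66', '', '77']
```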
Convert each element to an integer, skipping anything that is not all digits (here, the empty string):
int_list = [int(n) for n in lst if n.isdigit()]
# output: [3, 9, 6, 7, 4, 55, 66, 77]
EDIT: if you want the numeric list in the required format:
keyword = ',heights'
formatted_res = keyword[1:] + keyword.join(map(str, int_list))
# output: 'heights3,heights9,heights6,heights7,heights4,heights55,heights66,heights77'
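The keyword[1:] slicing trick can be avoided by building each item explicitly and joining with commas. A sketch (f-strings need Python 3.6+), reusing the int_list values from the steps above:

```python
int_list = [3, 9, 6, 7, 4, 55, 66, 77]
keyword = 'heights'
# prefix every number with the keyword, then join the pieces with commas
formatted_res = ','.join(f'{keyword}{n}' for n in int_list)
print(formatted_res)
# output: heights3,heights9,heights6,heights7,heights4,heights55,heights66,heights77
```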
As noted in the comments, the 4 isn't followed by a comma (or any further number), so this version leaves it out:
>>> import re
>>> t = "There are 5 people in the class and their heights are 3,9,6,7,4"
>>> 'heights'+'heights'.join(re.findall(r'\d+,', t)).rstrip(',')
'heights3,heights9,heights6,heights7'
And if you want to include it you can:
>>> 'heights'+'heights'.join(re.findall(r'\d+,|(?<=,)\d+', t))
'heights3,heights9,heights6,heights7,heights4'
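Both versions hardcode 'heights'. If the prefix should instead be taken from the text, one option is to capture the word directly before the first run of two or more comma-separated numbers. This is a sketch, not from the answers above; the helper name prefix_numbers is hypothetical, and it takes the word immediately before the numbers, so it returns 'heights...' for the "their heights 3,9,6,7,4" phrasing but 'are...' for "heights are 3,9,6,7,4".

```python
import re

def prefix_numbers(text):
    """Prefix each number in the first comma-separated run with the word just before it."""
    # group 1: the preceding word; group 2: at least two comma-separated numbers,
    # so a lone number such as the 5 is skipped
    m = re.search(r'(\w+)\s+(\d+(?:,\d+)+)', text)
    if m is None:
        return None
    word, numbers = m.groups()
    return ','.join(word + n for n in numbers.split(','))

print(prefix_numbers("There are 5 people in the class and their heights 3,9,6,7,4."))
# Output: heights3,heights9,heights6,heights7,heights4
```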
This should work. \d matches a digit (a character in the range 0-9), and + means one or more times, so \d+ matches every run of digits:
import re
test_string = "There are 2 apples for 4 persons 4 helasdf 4 23 "
print("The original string : " + test_string)
temp = re.findall(r'\d+', test_string)
res = list(map(int, temp))
print("The numbers list is : " + str(res))
To extract sequences of comma-separated numbers from any string:
import re
# some random text just for testing
string = "azrazer 5,6,4 qsfdqdf 5,,1,2,!,88,9,44,aa,2"
# match every sequence of numbers separated by ','
r1 = r'(?:\d+,)+\d+'
# same sequences, but capture them without their last number
r2 = r'((?:\d+,)+)(?:\d+)'
# patterns from the best answers to the question so far
r3 = r'[\d,]+[,\d]+[^a-z]'
r4 = r'[\d,]+[,\d]'
print('findall r1:', re.findall(r1, string))
print('findall r3:', re.findall(r3, string))
print('findall r4:', re.findall(r4, string))
print('-----------------------------------------')
print('findall r2:', re.findall(r2, string))
Output:
findall r1: ['5,6,4', '1,2', '88,9,44'] ---> correct
findall r3: ['5,6,4 ', '5,,1,2,!', ',88,9,44,'] --> wrong
findall r4: ['5,6,4', '5,,1,2,', ',88,9,44,', ',2'] --> wrong
-----------------------------------------
findall r2: ['5,6,', '1,', '88,9,'] --> correct (excludes the last element)
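r2 returns something different from r1 because re.findall returns the captured group instead of the whole match when the pattern contains exactly one capturing group. A sketch showing both results, plus turning each comma-separated run into a list of integers (variable names are illustrative):

```python
import re

string = "azrazer 5,6,4 qsfdqdf 5,,1,2,!,88,9,44,aa,2"
r2 = r'((?:\d+,)+)(?:\d+)'

# findall returns only group 1: ['5,6,', '1,', '88,9,']
print(re.findall(r2, string))
# finditer yields match objects; group(0) is the whole match: ['5,6,4', '1,2', '88,9,44']
print([m.group(0) for m in re.finditer(r2, string)])

# with no capturing group, findall returns whole matches; split each into integers
runs = [list(map(int, s.split(','))) for s in re.findall(r'(?:\d+,)+\d+', string)]
print(runs)
# Output: [[5, 6, 4], [1, 2], [88, 9, 44]]
```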
I am trying to split a string on any char that is not a digit.
orig = '0 1,2.3-4:5;6d7'
results = orig.split(r'\D+')
I expect to get a list of integers in results
0, 1, 2, 3, 4, 5, 6, 7
but instead I am getting a list with a single string element which matches the original string.
Well... you are using str.split(), which splits on a literal separator string, not on a regex. Your code would only split on the literal text '\D+' inside your string:
orig = 'Some\\D+text\\D+tosplit'
results = orig.split(r'\D+') # ['Some', 'text', 'tosplit']
You can use re.split() instead:
import re
orig = '0 1,2.3-4:5;6d7'
results = re.split(r'\D+',orig)
print(results)
to get
['0', '1', '2', '3', '4', '5', '6', '7']
Use data = list(map(int, results)) to convert the strings to integers.
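If the input can start or end with a non-digit, re.split produces empty strings at the ends, and int('') raises ValueError. Two ways around that, sketched on a made-up variant of the sample string:

```python
import re

orig = 'x0 1,2.3-4:5;6d7!'
print(re.split(r'\D+', orig))
# Output: ['', '0', '1', '2', '3', '4', '5', '6', '7', '']

# Option 1: drop the empty strings before converting
data = [int(x) for x in re.split(r'\D+', orig) if x]
# Option 2: match the digit runs directly instead of splitting on the gaps
data2 = list(map(int, re.findall(r'\d+', orig)))
print(data, data2)
# Output: [0, 1, 2, 3, 4, 5, 6, 7] [0, 1, 2, 3, 4, 5, 6, 7]
```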
Try this:
orig = '0 1,2.3-4:5;6d7'
[int(i) for i in orig if i.isdigit()]  # only works because every number here is a single digit
for i in '0 1,2.3-4:5;6d7':
    try:
        print(int(i), end=' ')
    except ValueError:  # skip characters that aren't digits
        continue
0 1 2 3 4 5 6 7
My regex
\d+\.*\d+
my String
4a 1 a2 3 21 12a3 123.12
What I get:
['21', '12', '123.12']
What I need:
412321123123.12
It works well when there is just 123.12, for example, but when I add spaces or other characters in between, it splits the number into separate matches. I want to skip all the spaces and other characters and extract just the digits, with the '.' kept in the right position.
Remove everything that is not a digit or a point:
import re
result = re.sub(r'[^.\d]', '', '4a 1 a2 3 21 12a3 123.12')  # drop everything except digits and '.'
print(result)
Output
412321123123.12
Regex-free alternative:
>>> s = '4a 1 a2 3 21 12a3 123.12'
>>>
>>> ''.join(c for c in s if c.isdigit() or c == '.')
'412321123123.12'
Using a list comprehension with str.join is slightly faster than a generator expression, because join builds a list from its argument internally anyway:
>>> ''.join([c for c in s if c.isdigit() or c == '.'])
'412321123123.12'
This assumes that you either want to keep multiple '.' characters or don't expect more than one.
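If the input might contain several dots and only the first should survive (an assumption; the question does not say), str.partition can split the cleaned string at the first '.' and strip dots from the rest. A sketch with a hypothetical helper name:

```python
import re

def digits_and_first_dot(s):
    """Keep digits and only the first '.'; later dots are dropped (assumed behaviour)."""
    cleaned = re.sub(r'[^.\d]', '', s)
    # partition splits at the first '.', so any later dots end up in `tail`
    head, dot, tail = cleaned.partition('.')
    return head + dot + tail.replace('.', '')

print(digits_and_first_dot('4a 1 a2 3 21 12a3 123.12'))  # 412321123123.12
print(digits_and_first_dot('1.5 and 2.25'))              # 1.5225
```

The result can then be passed to float() if a number rather than a string is needed.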
I have a list like this:
li = [1, 2, 3, 4, 5]
I want to change it into a string, get rid of the quotes, and get rid of the commas so that it looks like this:
1 2 3 4 5
I tried the following:
new_list = []
new_list.append(li)
new_string = " ".join(new_list)
print(new_string)
however I get the below error:
TypeError: sequence item 0: expected str instance, int found
Why does this happen and how can I fix this so that I get the output I want?
The items in the list need to be of type str in order to join them with the given delimiter. Try this:
' '.join(map(str, your_list)) # join the resulting iterable of strings, after casting ints
This happens because join expects an iterable of strings, and yours contains ints.
You need to convert the list's items to strings, either with a list comprehension:
>>> li
[1, 2, 3, 4, 5]
>>> new_li = [str(val) for val in li]
>>> new_li
['1', '2', '3', '4', '5']
or a regular for loop:
>>> for x in range(len(li)):
...     li[x] = str(li[x])
...
>>> li
['1', '2', '3', '4', '5']
then your expression will work.
>>> result = ' '.join(li)
>>> result
'1 2 3 4 5'
The error comes from attempting to join integers into a string. You can convert every value to a string first, then join them:
new_list = [str(x) for x in li]
new_string = " ".join(new_list)
As a one-liner:
new_string = " ".join([str(x) for x in li])
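If the goal is only to print the numbers, print can do the conversion itself: unpacking the list passes each int as a separate argument, and print calls str() on each one and separates them with sep, which defaults to a single space. A sketch:

```python
li = [1, 2, 3, 4, 5]

# print() converts each argument with str() and separates them with sep=' '
print(*li)
# Output: 1 2 3 4 5

# to get the string itself rather than printing it, convert each int before joining
new_string = ' '.join(map(str, li))
```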