Imagine you have these strings:
'49999494949'
'4949999949'
'4949499994'
How do you find the longest run of consecutive 9s that comes right after a 4, and print it out, e.g. as '99999'?
I tried:
a = '49999494949'
b = "123456780"
for char in b:
    a = a.replace(char, "")
print(a)
but it ends up giving me all the 9s in the string, whereas I want only the longest run of 9s after a 4.
Thanks for helping!
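You can split each string on "4", which gives you a list of the runs of "9" between the 4s. Then you can take the longest one (since every piece consists only of 9s, the lexicographically largest string is also the longest):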
>>> s = "49999494949"
>>> nines = s.split("4")
>>> nines
['', '9999', '9', '9', '9']
>>> max(nines)
'9999'
Or as a single command:
>>> s = "49999494949"
>>> max(s.split("4"))
'9999'
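If you want to count only 9s that come directly after a 4 (splitting on "4" also returns a leading run of 9s when the string doesn't start with a 4), a minimal sketch could use a regex lookbehind instead. The function name longest_nine_run is just an illustrative choice, and returning an empty string when there is no match is an assumption:

```python
import re

def longest_nine_run(s):
    # (?<=4)9+ matches one or more consecutive 9s immediately preceded by a 4
    runs = re.findall(r'(?<=4)9+', s)
    # key=len compares runs by length; default="" covers strings with no 4-then-9 run
    return max(runs, key=len, default="")

for s in ['49999494949', '4949999949', '4949499994']:
    print(s, '->', longest_nine_run(s))
```

len(longest_nine_run(s)) gives the count itself if you need the number rather than the string.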
I have a column of strings. The data does not follow any particular format.
I need to find all numbers that are separated by commas.
For example,
string = "There are 5 people in the class and their heights 3,9,6,7,4".
I want to extract just the numbers 3,9,6,7,4, without the number 5.
Ultimately I want to prepend the word before the first number to each number, i.e. heights3,heights9,heights6,heights7,heights4.
import re
ExampleString = "There are 5 people in the class and their heights are 3,9,6,7,4"
temp = re.findall(r'\s\d+\b', ExampleString)
With this I get the number 5 as well.
Regex is your friend. With re imported and test_string defined as in the step-by-step code, you can solve your problem with one line of code:
[int(n) for n in sum([l.split(',') for l in re.findall(r'[\d,]+[,\d]', test_string)], []) if n.isdigit()]
Step by step:
The following code produces the list of runs of digits and commas that are at least two characters long:
import re
test_string = "There are 5 people in the class and their heights are 3,9,6,7,4 and this 55,66, 77"
list_of_comma = re.findall(r'[\d,]+[,\d]', test_string)
# output: ['3,9,6,7,4', '55,66,', '77']
Note that this pattern also matches any standalone number with two or more digits, such as 77 here.
Split each item of list_of_comma on commas to produce a list of lists of strings:
list_of_list = [l.split(',') for l in list_of_comma]
# output: [['3', '9', '6', '7', '4'], ['55', '66', ''], ['77']]
I use a trick to flatten the list of lists (sum concatenates them onto the starting empty list):
lst = sum(list_of_list, [])
# output: ['3', '9', '6', '7', '4', '55', '66', '', '77']
Convert each element to an integer, skipping the empty strings:
int_list = [int(n) for n in lst if n.isdigit()]
# output: [3, 9, 6, 7, 4, 55, 66, 77]
EDIT: if you want to format the numeric list in the required format:
keyword = ',heights'
formatted_res = keyword[1:] + keyword.join(map(str, int_list))
# output: 'heights3,heights9,heights6,heights7,heights4,heights55,heights66,heights77'
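The same pipeline can be wrapped into one function. This sketch swaps sum(list_of_list, []) for itertools.chain.from_iterable, which flattens without re-copying the accumulated list at every addition the way sum does (the Python docs for sum also point to itertools.chain for concatenating iterables). The function name prefixed_numbers and its keyword parameter are hypothetical, chosen for illustration:

```python
import re
from itertools import chain

def prefixed_numbers(text, keyword="heights"):
    # Runs of two or more digit/comma characters, same pattern as the answer above
    chunks = re.findall(r'[\d,]+[,\d]', text)
    # Split every chunk on commas and flatten the pieces into one iterable
    pieces = chain.from_iterable(chunk.split(',') for chunk in chunks)
    # Keep only non-empty digit strings (drops the '' left by trailing commas)
    numbers = [int(p) for p in pieces if p.isdigit()]
    return ','.join(keyword + str(n) for n in numbers)

test_string = "There are 5 people in the class and their heights are 3,9,6,7,4 and this 55,66, 77"
print(prefixed_numbers(test_string))
```

It inherits the pattern's behaviour, so standalone numbers with two or more digits are still picked up.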
As stated in the comments, the 4 isn't followed by a comma and another number, so this pattern leaves it out:
>>> import re
>>> t = "There are 5 people in the class and their heights are 3,9,6,7,4"
>>> 'heights'+'heights'.join(re.findall(r'\d+,', t)).rstrip(',')
'heights3,heights9,heights6,heights7'
And if you want to include it, you can:
>>> 'heights'+'heights'.join(re.findall(r'\d+,|(?<=,)\d+', t))
'heights3,heights9,heights6,heights7,heights4'
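This should work. \d matches a digit (0-9, and in Python 3 also other Unicode decimal digits), and + means one or more times. Note that it returns every number in the string, including standalone ones: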
import re
test_string = "There are 2 apples for 4 persons 4 helasdf 4 23 "
print("The original string : " + test_string)
temp = re.findall(r'\d+', test_string)
res = list(map(int, temp))
print("The numbers list is : " + str(res))
To extract comma-separated sequences of numbers from any string:
import re
# some random text just for testing
string = "azrazer 5,6,4 qsfdqdf 5,,1,2,!,88,9,44,aa,2"
# r1: every sequence of numbers separated by ','
r1 = r'(?:\d+,)+\d+'
# r2: every sequence of numbers separated by ',', except the last number
r2 = r'((?:\d+,)+)(?:\d+)'
# r3, r4: patterns from the other answers so far
r3 = r'[\d,]+[,\d]+[^a-z]'
r4 = r'[\d,]+[,\d]'
print('findall r1:', re.findall(r1, string))
print('findall r3:', re.findall(r3, string))
print('findall r4:', re.findall(r4, string))
print('-----------------------------------------')
print('findall r2:', re.findall(r2, string))
Output:
findall r1: ['5,6,4', '1,2', '88,9,44'] --> correct
findall r3: ['5,6,4 ', '5,,1,2,!', ',88,9,44,'] --> wrong
findall r4: ['5,6,4', '5,,1,2,', ',88,9,44,', ',2'] --> wrong
-----------------------------------------
findall r2: ['5,6,', '1,', '88,9,'] --> correct, excludes the last element
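Building on the r1 pattern (?:\d+,)+\d+, a minimal sketch that turns each comma-separated sequence into a list of ints and then applies the "heights" prefix from the question. The helper name comma_sequences is illustrative:

```python
import re

def comma_sequences(text):
    # r1: one or more "digits," groups followed by a final run of digits
    pattern = r'(?:\d+,)+\d+'
    return [[int(n) for n in seq.split(',')] for seq in re.findall(pattern, text)]

text = "There are 5 people in the class and their heights are 3,9,6,7,4"
for seq in comma_sequences(text):
    # prefix each number with the word from the question
    print(','.join('heights' + str(n) for n in seq))
```

Because r1 needs at least one comma, the standalone 5 is never matched.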
I am trying to split a string on any character that is not a digit.
orig = '0 1,2.3-4:5;6d7'
results = orig.split(r'\D+')
I expect results to be a list of integers:
0, 1, 2, 3, 4, 5, 6, 7
but instead I get a list with a single string element, which is the original string.
Well... you are using str.split(), which splits on a literal separator string, not on a regex. Your code would only split on the literal text '\D+' inside your string:
orig = 'Some\\D+text\\D+tosplit'
results = orig.split(r'\D+') # ['Some', 'text', 'tosplit']
You can use re.split() instead:
import re
orig = '0 1,2.3-4:5;6d7'
results = re.split(r'\D+',orig)
print(results)
to get
['0', '1', '2', '3', '4', '5', '6', '7']
Use data = list(map(int, results)) to convert the items to int.
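If the string can start or end with a non-digit, re.split returns empty strings at the edges (for example, re.split(r'\D+', 'a1b2c') gives ['', '1', '2', '']), and int('') then fails. A sketch that sidesteps this by matching the digit runs directly with re.findall; the function name split_on_non_digits is hypothetical:

```python
import re

def split_on_non_digits(s):
    # \d+ matches each maximal run of digits, so no empty strings appear at the edges
    return [int(n) for n in re.findall(r'\d+', s)]

print(split_on_non_digits('0 1,2.3-4:5;6d7'))
print(split_on_non_digits('x12-345y'))
```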
Try this:
orig = '0 1,2.3-4:5;6d7'
[i for i in orig if i.isdigit()]
or:
for i in '0 1,2.3-4:5;6d7':
    try:
        print(int(i), end=' ')
    except ValueError:
        continue
0 1 2 3 4 5 6 7
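Both snippets above look at one character at a time, so a multi-digit number such as 12 would come out as 1 and 2. A regex-free sketch that keeps multi-digit numbers together groups consecutive characters by str.isdigit using itertools.groupby; the name digit_runs is illustrative:

```python
from itertools import groupby

def digit_runs(s):
    # groupby yields (is_digit, chars) for each maximal run of digit or non-digit characters
    return [int(''.join(chars)) for is_digit, chars in groupby(s, key=str.isdigit) if is_digit]

print(digit_runs('0 1,2.3-4:5;6d7'))
print(digit_runs('10-200x3'))
```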
I have a list like this:
li = [1, 2, 3, 4, 5]
I want to change it into a string, get rid of the quotes, and get rid of the commas so that it looks like this:
1 2 3 4 5
I tried the following:
new_list = []
new_list.append(li)
new_string = " ".join(new_list)
print(new_string)
however I get the below error:
TypeError: sequence item 0: expected str instance, int found
Why does this happen and how can I fix this so that I get the output I want?
The items in the list need to be of type str in order to join them with the given delimiter. Try this:
' '.join(map(str, your_list)) # join the resulting iterable of strings, after casting ints
This happens because join expects an iterable of strings, and yours contains ints.
You need to convert the items to strings first, either with a list comprehension:
>>> li
[1, 2, 3, 4, 5]
>>> new_li = [str(val) for val in li]
>>> new_li
['1', '2', '3', '4', '5']
or a regular for loop:
>>> for x in range(len(li)):
...     li[x] = str(li[x])
...
>>> li
['1', '2', '3', '4', '5']
Then your expression will work:
>>> result = ' '.join(li)
>>> result
'1 2 3 4 5'
The error comes from attempting to join integers into a string. You could turn every value into a string first, then join them:
new_list = [str(x) for x in li]
new_string = " ".join(new_list)
As a one-liner:
new_string = " ".join([str(x) for x in li])
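Two related variations: the square brackets in the one-liner are optional, since str.join also accepts a generator expression, and if you only need to print the values, print(*li) already separates its arguments with a single space (print's default sep):

```python
li = [1, 2, 3, 4, 5]

# Generator expression: str() is applied to each item as join consumes it
new_string = " ".join(str(x) for x in li)
print(new_string)

# print unpacks the list into separate arguments and joins them with sep=' ' by default
print(*li)
```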
I have a lot of files and I have saved all the file names to filelists.txt. Here is an example of its contents:
cpu_H1_M1_S1.out
cpu_H1_M1_S2.out
cpu_H2_M1_S1.out
cpu_H2_M1_S2.out
When the program detects _H, _M and _S in a file name, I need to output the numbers that follow them. For example:
_H _M _S
1 1 1
1 1 2
2 1 1
2 1 2
Thank you.
You could use a regexp:
>>> import re
>>> s = 'cpu_H2_M1_S2.out'
>>> re.findall(r'cpu_H(\d+)_M(\d+)_S(\d+)', s)
[('2', '1', '2')]
If it doesn't match the format exactly, you'll get an empty list as a result, which you can use to ignore that line. You could adapt this to convert the strings to ints if you wished (findall returns a list of tuples here, so convert inside each tuple):
[tuple(int(i) for i in match) for match in re.findall(...)]
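Putting it together for filelists.txt: a sketch that skips lines that don't match and converts the captured numbers to ints. The name parse_hms is hypothetical, and the pattern here only looks for _H, _M and _S (it does not require the cpu_ prefix). It takes any iterable of lines, so the demo uses the sample names from the question:

```python
import re

# Captures the numbers after _H, _M and _S, e.g. "cpu_H2_M1_S2.out" -> (2, 1, 2)
PATTERN = re.compile(r'_H(\d+)_M(\d+)_S(\d+)')

def parse_hms(lines):
    rows = []
    for line in lines:
        match = PATTERN.search(line)
        if match:  # lines that don't match are skipped
            rows.append(tuple(int(g) for g in match.groups()))
    return rows

sample = ["cpu_H1_M1_S1.out\n", "cpu_H1_M1_S2.out\n", "cpu_H2_M1_S1.out\n", "cpu_H2_M1_S2.out\n"]
print("_H _M _S")
for h, m, s in parse_hms(sample):
    print(h, m, s)
```

With the real file, pass the open file object instead: with open("filelists.txt") as f: rows = parse_hms(f).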
Something like this, using a regex:
import re
with open("filelists.txt") as f:
    for line in f:
        data = re.findall(r"_H\d+_M\d+_S\d+", line)
        if data:
            # data[0] looks like '_H1_M1_S1': split on '_', drop the leading '', strip the letters
            print([x.strip("HMS") for x in data[0].split("_")[1:]])
['1', '1', '1']
['1', '1', '2']
['2', '1', '1']
['2', '1', '2']
Though I have nothing against regex itself, I think it's overkill for this problem. Here's a lighter solution, which assumes every name has the fixed layout cpu_H?_M?_S?.out with single-digit numbers:
import operator
# Positions of the H, M and S digits in e.g. "cpu_H1_M1_S1.out" (indices 5, 8 and 11)
digits = operator.itemgetter(5, 8, 11)
with open("filelists.txt") as f:
    result = [tuple(int(c) for c in digits(line)) for line in f if line.strip()]
print(result)
Hope that helps