I have a nested list as below:
[['asgy200;f','ssll100',' time is: 10h:00m:12s','xxxxxxx','***','','asgy200;f','frl5100',' time is: 00h:00m:05s','ooo']]
'***' is my delimiter. I want to extract all of the seconds values from the list in Python.
First, I want to use a regular expression to pick out the items that contain the string "time is:", but it doesn't work!
I don't know what to do.
Thanks
import re
x=[['asgy200;f','ssll100','time is: 10h:00m:12s','xxxxxxx','***','','asgy200;f','frl5100','time is: 00h:00m:05s','ooo']]
s=str(x)
print(re.findall(r"(?<=time is)\s*:\s*[^']*:(\d+)", s))
Output: ['12', '05']
You can try this.
You can use a look-behind regex (r'(?<=time is:).*'), where l is the nested list from the question:
>>> import re
>>> sec = [m.group(0).split(':')[2] for m in (re.search(r'(?<=time is:).*', w) for w in l[0]) if m is not None]
>>> sec
['12s', '05s']
You can then convert them to int:
>>> [int(j.replace('s','')) for j in sec]
[12, 5]
If you want the seconds as strings, skip the int conversion after replace:
>>> [j.replace('s','') for j in sec]
['12', '05']
You could also use capturing groups. This version skips any seconds value that is exactly 00:
>>> lst = [['asgy200;f','ssll100','time is: 10h:00m:12s','xxxxxxx','***','','asgy200;f','frl5100','time is: 00h:00m:05s','ooo']]
>>> [i for i in re.findall(r'time\s+is:\s+\d{2}h:\d{2}m:(\d{2})', ' '.join(lst[0])) if int(i) != 0]
['12', '05']
>>> lst = [['asgy200;f','ssll100','time is: 10h:00m:00s','xxxxxxx','***','','asgy200;f','frl5100','time is: 00h:00m:05s','ooo']]
>>> [i for i in re.findall(r'time\s+is:\s+\d{2}h:\d{2}m:(\d{2})', ' '.join(lst[0])) if int(i) != 0]
['05']
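Since the question says '***' is a record delimiter, here is a minimal sketch that first splits the flat list into records and then pulls the seconds out of each one. It assumes every time string has the form `time is: HHh:MMm:SSs`; the helper names `split_records` and `seconds_per_record` are hypothetical.

```python
import re

# Captures the seconds field of strings like "time is: 10h:00m:12s".
SECONDS_RE = re.compile(r"time is:\s*\d+h:\d+m:(\d+)s")


def split_records(items, delimiter="***"):
    """Split a flat list into sublists, treating `delimiter` as a separator."""
    records = [[]]
    for item in items:
        if item == delimiter:
            records.append([])  # the delimiter starts a new record
        else:
            records[-1].append(item)
    return records


def seconds_per_record(items, delimiter="***"):
    """Return, for each record, the seconds values it contains as ints."""
    result = []
    for record in split_records(items, delimiter):
        matches = (SECONDS_RE.search(item) for item in record)
        result.append([int(m.group(1)) for m in matches if m])
    return result


data = [['asgy200;f', 'ssll100', ' time is: 10h:00m:12s', 'xxxxxxx', '***',
         '', 'asgy200;f', 'frl5100', ' time is: 00h:00m:05s', 'ooo']]
print(seconds_per_record(data[0]))  # [[12], [5]]
```

Keeping one inner list per record preserves which seconds value came from which block, which a single findall over the whole list would lose.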
Taking into account your last comment on your question:
>>> x = [['asgy200;f','ssll100','time is: 10h:00m:12s','xxxxxxx','***','','asgy200;f','frl5100','time is: 00h:00m:05s','ooo']]
>>> print(all([w[-3:-1]!='00' for r in x for w in r if w.startswith('time is: ')]))
True
all and any are two useful built-ins.
It works like this: the outer loop runs over the sublists (rows) of x and the inner loop runs over the items (words) in each row. We keep only the words that start with a specific string, and the iterable consists of booleans that are True when the third-last and second-last characters of the picked word (the seconds) differ from '00'. Finally, all consumes the iterable and returns True only if every seconds field differs from '00'.
HTH,
Addendum
Do we want to break out early?
all_secs_differ_from_0 = True
for row in x:
    for word in row:
        if word.startswith('time is: ') and word[-3:-1] == '00':
            all_secs_differ_from_0 = False
            break
    if not all_secs_differ_from_0:
        break
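Another way to break out early, shown as a minimal sketch that assumes the same x: pass all() a generator expression instead of a list comprehension. all() stops pulling items at the first False, whereas the list version builds the whole list before all() sees it. The function name `all_seconds_nonzero` is hypothetical.

```python
def all_seconds_nonzero(rows, prefix="time is: "):
    """Return True if no entry starting with `prefix` has '00' seconds.

    all() receives a generator expression rather than a list, so it stops
    consuming items as soon as one comparison yields False.
    """
    return all(
        word[-3:-1] != "00"  # the two characters just before the final 's'
        for row in rows
        for word in row
        if word.startswith(prefix)
    )


x = [['asgy200;f', 'ssll100', 'time is: 10h:00m:12s', 'xxxxxxx', '***',
      '', 'asgy200;f', 'frl5100', 'time is: 00h:00m:05s', 'ooo']]
print(all_seconds_nonzero(x))  # True
```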
I have a list of strings of digits, and I am trying to replace each one with the sum of its digits, e.g. nums = ["12","23","33"] -> nums = [3,5,6]
Here is my code:
strng = ['12','23','33']
for i in range(len(strng)):
    print(list((map(lambda x:int[x],list(strng[i])))))
For the above I am getting a TypeError: 'type' object is not subscriptable. It works up until map(), but when I add the list(map(...)), I get this error.
Any ideas how to fix it?
After this is fixed, my idea is to do the following:
strng = ['12','23','33']
for i in range(len(strng)):
    strng[i] = sum(list((map(lambda x:int[x],list(strng[i])))))
Which should replace each of strng with the sum of its digits.
The error you're getting is because you wrote int[x] instead of int(x). However, there are some additional issues with your existing solution.
The short and pythonic solution to this problem would be:
answer = [sum(map(int, list(s))) for s in strng]
To break this down:
[... for s in strng]: this is a list comprehension
list(s): This converts each string into a list of its characters, so "123" becomes ["1","2","3"]
map(int, list(s)): This applies the int conversion to each element in list(s), so ["1","2","3"] becomes [1,2,3]
sum(...): We take the sum of the resulting list of ints
The equivalent of the above using a normal for loop would be something like this:
answer = []
for s in strng:
    list_of_chars = list(s)
    list_of_ints = map(int, list_of_chars)
    sum_of_ints = sum(list_of_ints)
    answer.append(sum_of_ints)
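One small simplification of the list-comprehension answer, as a sketch: a string is already iterable character by character, so map(int, s) works without the intermediate list(s). Slice assignment (strng[:] = ...) then replaces the contents of the original list, which matches what the question set out to do. The helper name `digit_sums` is hypothetical.

```python
def digit_sums(strings):
    """Map each string of decimal digits to the sum of its digits."""
    # Iterating over a str yields its characters, so map(int, s) works
    # directly without first building list(s).
    return [sum(map(int, s)) for s in strings]


strng = ['12', '23', '33']
strng[:] = digit_sums(strng)  # slice assignment updates the original list
print(strng)  # [3, 5, 6]
```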
You can use a comprehension: iterate over each digit, convert it to an integer, and pass the results to the sum built-in to get the total.
>>> [sum(int(i) for i in v) for v in strng]
[3, 5, 6]
Not really efficient, but try this:
strng = ['12','23','33']
def function(strng):
    my_list = []
    for string in strng:
        my_list.append(0)
        for digit in string:
            my_list[-1] += int(digit)
    return my_list
strng = function(strng)
print(strng)
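If the values were ints rather than strings, the digit sum could also be computed arithmetically with divmod, without any str conversion. This is a minimal sketch that assumes non-negative integers; `digit_sum` is a hypothetical name.

```python
def digit_sum(n):
    """Sum the decimal digits of a non-negative int without using str()."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    total = 0
    while n:
        n, digit = divmod(n, 10)  # split off the last decimal digit
        total += digit
    return total


print([digit_sum(n) for n in [12, 23, 33]])  # [3, 5, 6]
```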
Suppose I had a string
string1 = "498results should get"
Now I need to get only the integer value from the string, like 498. I don't want to use slicing, because the number of digits may vary, as in these examples:
string2 = "49867results should get"
string3 = "497543results should get"
So I want to get only the integer values out of the strings, in the same order: 498, 49867 and 497543 from string1, string2 and string3 respectively.
Can anyone tell me how to do this in one or two lines?
>>> import re
>>> string1 = "498results should get"
>>> int(re.search(r'\d+', string1).group())
498
If there are multiple integers in the string:
>>> list(map(int, re.findall(r'\d+', string1)))
[498]
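One caveat with re.search(...).group(): if the string contains no digits, re.search returns None and .group() raises AttributeError. A minimal sketch with a guard, using re.match so that only digits at the start of the string count (an assumption based on the examples in the question); `leading_int` is a hypothetical name.

```python
import re


def leading_int(text):
    """Return the integer at the very start of `text`, or None if absent."""
    match = re.match(r'\d+', text)  # re.match only matches at position 0
    return int(match.group()) if match else None


samples = ["498results should get", "49867results should get",
           "497543results should get", "no digits here"]
print([leading_int(s) for s in samples])  # [498, 49867, 497543, None]
```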
An answer taken from ChristopheD here: https://stackoverflow.com/a/2500023/1225603
r = "456results string789"
s = ''.join(x for x in r if x.isdigit())
print(int(s))
456789
Here's your one-liner, without using any regular expressions, which can get expensive at times:
>>> ''.join(filter(str.isdigit, "1234GAgade5312djdl0"))
returns:
'123453120'
If you have multiple sets of numbers, this is another option:
>>> import re
>>> print(re.findall(r'\d+', 'xyz123abc456def789'))
['123', '456', '789']
It's no good for floating-point numbers, though.
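If floating-point numbers matter, the pattern can be extended with an optional fractional part. A minimal sketch, assuming plain decimal notation with an optional leading minus sign (exponents, thousands separators and forms like .5 are not handled); `find_numbers` is a hypothetical name.

```python
import re

# Optional minus sign, one or more digits, optional fractional part.
# The group is non-capturing, so findall returns whole matches.
NUMBER_RE = re.compile(r'-?\d+(?:\.\d+)?')


def find_numbers(text):
    """Return every number in `text`, as int or float depending on its form."""
    return [float(tok) if '.' in tok else int(tok)
            for tok in NUMBER_RE.findall(text)]


print(find_numbers('xyz123abc4.56def-7'))  # [123, 4.56, -7]
```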
Iterator version
>>> import re
>>> string1 = "498results should get"
>>> [int(x.group()) for x in re.finditer(r'\d+', string1)]
[498]
>>> import itertools
>>> int(''.join(itertools.takewhile(lambda s: s.isdigit(), string1)))
498
With Python 3.6, each of these lines returns a list (possibly empty):
>>> [int(x) for x in re.findall(r'\d+', your_string)]
Similar to
>>> list(map(int, re.findall(r'\d+', your_string)))
This approach uses a list comprehension: pass the string to the function and it returns a list of the integers in that string. Note that it only finds numbers separated by whitespace, so for "498results should get" it returns an empty list.
def getIntegers(string):
    numbers = [int(x) for x in string.split() if x.isnumeric()]
    return numbers
For example:
print(getIntegers('this text contains some numbers like 3 5 and 7'))
Output
[3, 5, 7]
def function(string):
    final = ''
    for i in string:
        try:
            final += str(int(i))
        except ValueError:
            # stop at the first non-digit character
            return int(final)
    # the string consisted only of digits
    return int(final)
print(function("4983results should get"))
Another option is to remove the trailing letters using rstrip and string.ascii_lowercase (which contains the lowercase letters):
import string
strings = [string1, string2, string3]
out = [int(s.replace(' ','').rstrip(string.ascii_lowercase)) for s in strings]
Output:
[498, 49867, 497543]
integerstring = ""
string1 = "498results should get"
for i in string1:
    if i.isdigit():
        integerstring = integerstring + i
print(integerstring)
I have an array I want to iterate through. The array consists of strings made of numbers and symbols,
like this: €110.5M
I want to loop over it, remove every euro sign and every M, and return the array with the strings converted to ints.
How would I do this knowing that the array is a column in a table?
You could just strip the characters,
>>> x = '€110.5M'
>>> x.strip('€M')
'110.5'
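def sanitize_string(ss):
    ss = ss.replace('$', '').replace('€', '').lower()
    if 'm' in ss:
        res = float(ss.replace('m', '')) * 1000000
    elif 'k' in ss:
        res = float(ss.replace('k', '')) * 1000
    else:
        res = float(ss)  # no suffix: plain amount
    # round() rather than int(), so results like 28.999999999999996 are not truncated
    return round(res)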
This can be applied to a list as follows:
>>> ls = [sanitize_string(x) for x in ["€3.5M", "€15.7M" , "€167M"]]
>>> ls
[3500000, 15700000, 167000000]
If you want to apply it to the column of a table instead:
dataFrame['price'] = dataFrame['price'].apply(sanitize_string)  # Assuming a pandas DataFrame with a column called 'price'
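A table-driven variant of the same idea, as a minimal sketch: suffixes map to multipliers in a dict, so adding another suffix means adding one entry. It assumes the currency symbol is at the start and the suffix (if any) at the end; `parse_money` and `SUFFIXES` are hypothetical names.

```python
# Suffix multipliers handled by this sketch; extend the dict as needed.
SUFFIXES = {'K': 1_000, 'M': 1_000_000}


def parse_money(text, currency_symbols='€$'):
    """Convert strings like '€110.5M', '€900K' or '€167' to an int amount."""
    s = text.strip().strip(currency_symbols).upper()
    multiplier = 1
    if s and s[-1] in SUFFIXES:
        multiplier = SUFFIXES[s[-1]]
        s = s[:-1]  # drop the suffix before parsing the number
    # round() instead of int(): int(0.29 * 100) truncates to 28
    return round(float(s) * multiplier)


print([parse_money(p) for p in ["€110.5M", "€900K", "€167"]])
# [110500000, 900000, 167]
```

With pandas, the function can be applied to a column in the same way as above, e.g. df['price'] = df['price'].apply(parse_money), assuming a DataFrame df with a 'price' column.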
You can use a list comprehension, where a is your list of price strings:
numbers = [float(p.replace('€','').replace('M','')) for p in a]
which gives:
[110.5, 210.5, 310.5]
You can use a list comprehension to construct one list from another:
foo = ["€13.5M", "€15M" , "€167M"]
foo_cleaned = [value.translate(str.maketrans('', '', '€M')) for value in foo]
str.maketrans('', '', '€M') builds a translation table that deletes every € and M character, and str.translate applies that table to each value.
Try this
arr = ["€110.5M","€110.5M","€110.5M","€110.5M","€110.5M","€110.5M","€110.5M"]
f = [x.replace("€","").replace("M","") for x in arr]
You can call .replace() on a string as often as you like. An initial solution could be something like this:
my_array = ['€110.5M', '€111.5M', '€112.5M']
my_cleaned_array = []
for elem in my_array:
    my_cleaned_array.append(elem.replace('€', '').replace('M', ''))
At this point, you still have strings in your array. If you want ints, you can write int(float(elem.replace('€', '').replace('M', ''))) instead (int() alone raises ValueError on a string like '110.5'). But be aware that you will then lose everything after the decimal point, i.e. you will end up with [110, 111, 112].
You can use a regex to do that.
import re
price = "€110.5M"
# optional minus sign, digits, optional decimal part
x = re.findall(r"-?\d+(?:\.\d+)?", price)
print(x)
I didn't quite understand the second part of the question.
I have a regex that looks for numbers in a file.
I put the results in a list.
The problem is that it prints each result on a new line, once for every number it finds. It also ignores the list I've created.
What I want is to have all the numbers in one list.
I used join() but it doesn't work.
Code:
def readfile():
    regex = re.compile('\d+')
    for num in regex.findall(open('/path/to/file').read()):
        lst = [num]
        jn = ''.join(lst)
        print(jn)
Output:
122
34
764
What goes wrong:
# this iterates the single numbers you find - one by one
for num in regex.findall(open('/path/to/file').read()):
    lst = [num]  # this puts one number back into a new list
    jn = ''.join(lst)  # this gets the number back out of the new list
    print(jn)  # this prints one number
Fixing it:
The documentation of re.findall() shows that it already returns a list.
There is no need to loop over it to print it.
If you want a list, simply use re.findall()'s return value; if you want to print it, use one of the methods in Printing an int list in a single line python3 (there are several more posts on SO about printing in one line):
import re
my_r = re.compile(r'\d+') # define pattern as raw-string
numbers = my_r.findall("123 456 789") # get the list
print(numbers)
# different methods to print a list on one line
# adjust sep / end to fit your needs
print( *numbers, sep=", ") # print #1
for n in numbers[:-1]: # print #2
    print(n, end = ", ")
print(numbers[-1])
print(', '.join(numbers)) # print #3
Output:
['123', '456', '789'] # list of found strings that are numbers
123, 456, 789
123, 456, 789
123, 456, 789
Docs:
print() function for sep= and end=
Printing an int list in a single line python3
Convert all strings in a list to int ... if you need the list as numbers
More on printing in one line:
Print in one line dynamically
Python: multiple prints on the same line
How to print without newline or space?
Print new output on same line
In your case, regex.findall() returns a list, but on every iteration you put one number into a new list, join it, and print it.
That is why you're seeing this problem.
You can try something like this.
numbers.txt
Xy10Ab
Tiger20
Beta30Man
56
My45one
statements:
>>> import re
>>>
>>> regex = re.compile(r'\d+')
>>> lst = []
>>>
>>> for num in regex.findall(open('numbers.txt').read()):
...     lst.append(num)
...
>>> lst
['10', '20', '30', '56', '45']
>>>
>>> jn = ''.join(lst)
>>>
>>> jn
'1020305645'
>>>
>>> jn2 = '\n'.join(lst)
>>> jn2
'10\n20\n30\n56\n45'
>>>
>>> print(jn2)
10
20
30
56
45
>>>
>>> nums = [int(n) for n in lst]
>>> nums
[10, 20, 30, 56, 45]
>>>
>>> sum(nums)
161
>>>
Use the list's append() method to add new values:
import re
def readfile():
    regex = re.compile(r'\d+')
    lst = []
    for num in regex.findall(open('/path/to/file').read()):
        lst.append(num)
    print(lst)
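Putting the suggestions in this thread together, here is a minimal sketch that reads the file inside a with block (so it is closed afterwards), collects every number with a single findall call, and converts the matches to int. The function name `read_numbers` is hypothetical, and the demo writes the sample numbers.txt content to a temporary file so that it runs on its own.

```python
import os
import re
import tempfile

NUMBER_RE = re.compile(r'\d+')  # runs of decimal digits


def read_numbers(path):
    """Return every integer found in the file at `path`, as a single list."""
    with open(path) as f:  # the with-block closes the file when done
        return [int(n) for n in NUMBER_RE.findall(f.read())]


# Demo: write the sample numbers.txt content to a throwaway file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Xy10Ab\nTiger20\nBeta30Man\n56\nMy45one\n')
try:
    found = read_numbers(tmp.name)
finally:
    os.remove(tmp.name)

print(found)  # [10, 20, 30, 56, 45]
print(', '.join(map(str, found)))  # 10, 20, 30, 56, 45
```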
I am able to detect matches but unable to locate where they are.
Given the following list:
['A second goldfish is nice and all', 3456, 'test nice']
I need to search for match (i.e. "nice") and print all the list elements that contain it. Ideally if the keyword to search were "nice" the results should be:
'A second goldfish is nice and all'
'test nice'
I have:
list = data_array
string = input("Search keyword: ")
print(string)
if any(string in s for s in list):
    print("Yes")
So it finds the match and prints both the keyword and "Yes", but it doesn't tell me where the match is.
Should I iterate through every index in list and for each iteration search "string in s" or there is an easier way to do this?
Try this:
list = data_array
string = input("Search keyword: ")
print(string)
for s in list:
    if string in str(s):
        print('Yes')
        print(list.index(s))
If you only want the first matching index, you can also break after the if statement evaluates to true.
matches = [s for s in my_list if my_string in str(s)]
or
matches = list(filter(lambda s: my_string in str(s), my_list))
Note that 'nice' in 3456 will raise a TypeError, which is why I used str() on the list elements. Whether that's appropriate depends on whether you want to consider '45' to be in 3456 or not.
print(list(filter(lambda s: k in str(s), l)))
To print all the elements that contain nice:
mylist = ['nice1', 'def456', 'ghi789', 'nice2', 'nice3']
sub = 'nice'
print("\n".join([e for e in mylist if sub in e]))
Output:
nice1
nice2
nice3
To get the indexes of elements that contain nice (regardless of letter case):
mylist = ['nice1', 'def456', 'ghi789', 'Nice2', 'NicE3']
sub = 'nice'
index_list = []
i = 0
for e in mylist:
    if sub in e.lower():
        index_list.append(i)
    i += 1
print(index_list)
Output:
[0, 3, 4]
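A minimal sketch that combines the ideas from these answers: enumerate yields each element's own position (unlike list.index(s), which returns the first equal element, so duplicates all report the same index), str() avoids the TypeError on ints like 3456, and lowercasing makes the match case-insensitive. The function name `find_matches` and its `ignore_case` parameter are hypothetical.

```python
def find_matches(items, keyword, ignore_case=True):
    """Return (index, item) pairs for items whose text contains `keyword`.

    Non-string items are converted with str() first, so 3456 is searched
    as '3456' instead of raising TypeError.
    """
    needle = keyword.lower() if ignore_case else keyword
    matches = []
    for index, item in enumerate(items):  # enumerate yields index and value
        text = str(item)
        if ignore_case:
            text = text.lower()
        if needle in text:
            matches.append((index, item))
    return matches


data = ['A second goldfish is nice and all', 3456, 'test nice']
for index, item in find_matches(data, 'nice'):
    print(index, repr(item))
```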