I have a very long string. I would like to split it into substrings 16 characters long, skipping one character after each substring (e.g. substring1 = the first 16 characters of the string, substring2 = characters 18 to 33, and so on), and collect them in a list.
I wrote the following code:
string="abcd..."
list=[]
for j in range(0,int(len(string)/17)-1):
    list.append(string[int(j*17):int(j*17+16)])
But it returns:
list=[]
I can't figure out what is wrong with this code.
>>> string="abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
Your original code, without masking the built-in (excludes the final full-length string and any partial string after it):
>>> l = []
>>> for j in range(0,int(len(string)/17)-1):
...     l.append(string[int(j*17):int(j*17+16)])
...
>>> l
['abcdefghijklmnop', 'rstuvwxyzabcdefg', 'ijklmnopqrstuvwx']
A cleaned version that includes all possible strings:
>>> l = []
>>> for j in range(0,len(string),17):
...     l.append(string[j:j+16])
...
>>> l
['abcdefghijklmnop', 'rstuvwxyzabcdefg', 'ijklmnopqrstuvwx', 'zabcdefghijklmno', 'qrstuvwxyz']
How about we turn that last one into a comprehension? Everyone loves comprehensions.
>>> l = [string[j:j+16] for j in range(0,len(string),17)]
We can filter out strings that are too short if we want to:
>>> l = [string[j:j+16] for j in range(0,len(string),17) if len(string[j:j+16])>=16]
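The slicing approach above can be wrapped in a small helper. This is only a sketch: the name split_skipping and its parameters size, skip and keep_short are made up for illustration and do not come from the question.

```python
def split_skipping(text, size=16, skip=1, keep_short=True):
    """Cut text into chunks of `size` characters, dropping `skip`
    characters after each chunk. (Hypothetical helper name.)"""
    step = size + skip
    chunks = [text[start:start + size] for start in range(0, len(text), step)]
    if not keep_short:
        # Discard a trailing chunk that is shorter than `size`.
        chunks = [chunk for chunk in chunks if len(chunk) == size]
    return chunks


alphabet3 = "abcdefghijklmnopqrstuvwxyz" * 3
print(split_skipping(alphabet3))
print(split_skipping(alphabet3, keep_short=False))
```

With keep_short=False the trailing 'qrstuvwxyz' chunk is dropped, which mirrors the filtered comprehension.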
It does work, but only for strings of at least 34 characters (and even then it leaves out the final chunks). You have
range(0,int(len(string)/17)-1)
but, for the string "abcd...", int(len(string)/17)-1 is -1, so the range is empty and the loop body never runs. Add some logic to catch the short-string case and you're good:
...
for j in range(0, max(1, int(len(string)/17)-1)):
...
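To see why the original loop produces nothing for a short string, it may help to evaluate the range bounds directly. A small sketch (the variable names short and stop are illustrative; // is used so the arithmetic is the same on Python 2 and 3):

```python
short = "abcd..."                      # 7 characters, as in the question
stop = len(short) // 17 - 1            # 0 - 1 == -1
print(stop)
print(list(range(0, stop)))            # empty: the loop body never runs
print(list(range(0, max(1, stop))))    # one iteration, thanks to max(1, ...)
```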
Does this work?
>>> from string import ascii_lowercase
>>> s = ascii_lowercase * 2
>>> s
'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz'
>>> spl = [s[i:i+16] for i in range(0, len(s), 17)]
>>> spl
['abcdefghijklmnop', 'rstuvwxyzabcdefg', 'ijklmnopqrstuvwx', 'z']
The following should work:
#!/usr/bin/python
string = "abcdefghijklmnopqrstuvwxyz"
liszt = []
leng = 5
# Step by leng+1 so that one character is skipped after every chunk;
# the last chunk may be shorter than leng.
for ibeg in range(0, len(string), leng + 1):
    liszt.append(string[ibeg:ibeg + leng])
print(liszt)
I know that my_str[1::3] gets me every 2nd character in chunks of 3, but what if I want to get every 2nd and 3rd character? Is there a neat way to do that with slicing, or do I need some other method like a list comprehension plus a join:
new_str = ''.join([my_str[i * 3 + 1: i * 3 + 3] for i in range(len(my_str) // 3)])
I think using a generator expression with enumerate would be the cleanest.
>>> "".join(c if i % 3 in (1,2) else "" for (i, c) in enumerate("peasoup booze scaffold john"))
'eaou boz safol jhn'
Instead of getting only 2nd and 3rd characters, why not filter out the 1st items?
Something like this:
>>> s = '123456789'
>>> tmp = list(s)
>>> del tmp[::3]
>>> new_str = ''.join(tmp)
>>> new_str
'235689'
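Both answers boil down to keeping a character unless its index falls on the first slot of each group of three. A sketch that makes the group size a parameter (drop_every_nth is a hypothetical name, not something from the answers):

```python
def drop_every_nth(text, n=3, offset=0):
    """Return text without the characters whose index i has i % n == offset."""
    return ''.join(c for i, c in enumerate(text) if i % n != offset)


print(drop_every_nth('123456789'))
print(drop_every_nth('peasoup booze scaffold john'))
```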
I have a nested list as below:
[['asgy200;f','ssll100',' time is: 10h:00m:12s','xxxxxxx','***','','asgy200;f','frl5100',' time is: 00h:00m:05s','ooo']]
'***' is my delimiter. I want to extract the seconds from every time entry in the list, in Python.
First I tried to pick out the items that contain the string time is: with a regular expression, but it doesn't work!
I don't know what I should do.
Thanks
import re
x=[['asgy200;f','ssll100','time is: 10h:00m:12s','xxxxxxx','***','','asgy200;f','frl5100','time is: 00h:00m:05s','ooo']]
s=str(x)
print re.findall(r"(?<=time is)\s*:\s*[^']*:(\d+)",s)
Output: ['12', '05']
You can try this.
You can use a look-behind regex (r'(?<=time is\:).*'), where l is your nested list:
>>> sec = [i.group(0).split(':')[2] for i in [re.search(r'(?<=time is\:).*',i) for i in l[0]] if i is not None]
>>> sec
['12s', '05s']
You can then convert them to int:
>>> [int(j.replace('s','')) for j in sec]
[12, 5]
If you want the seconds as strings, skip the int conversion after the replace:
>>> [j.replace('s','') for j in sec]
['12', '05']
You could also use a capturing group. The filter below leaves out the seconds when they are exactly 00:
>>> lst = [['asgy200;f','ssll100','time is: 10h:00m:12s','xxxxxxx','***','','asgy200;f','frl5100','time is: 00h:00m:05s','ooo']]
>>> [i for i in re.findall(r'time\s+is:\s+\d{2}h:\d{2}m:(\d{2})', ' '.join(lst[0])) if int(i) != 00]
['12', '05']
>>> lst = [['asgy200;f','ssll100','time is: 10h:00m:00s','xxxxxxx','***','','asgy200;f','frl5100','time is: 00h:00m:05s','ooo']]
>>> [i for i in re.findall(r'time\s+is:\s+\d{2}h:\d{2}m:(\d{2})', ' '.join(lst[0])) if int(i) != 00]
['05']
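If the hours and minutes are also of interest, a single pattern with three groups may be easier to maintain. A sketch, assuming every time entry looks like 'time is: HHh:MMm:SSs' as in the question's sample data; TIME_RE and parse_times are illustrative names:

```python
import re

# Assumed entry format, based on the sample data: 'time is: 10h:00m:12s'
TIME_RE = re.compile(r'time is:\s*(\d+)h:(\d+)m:(\d+)s')


def parse_times(rows):
    """Return (hours, minutes, seconds) int tuples for each time entry."""
    found = []
    for row in rows:
        for item in row:
            match = TIME_RE.search(item)
            if match:
                found.append(tuple(int(g) for g in match.groups()))
    return found


lst = [['asgy200;f', 'ssll100', 'time is: 10h:00m:12s', 'xxxxxxx', '***', '',
        'asgy200;f', 'frl5100', 'time is: 00h:00m:05s', 'ooo']]
times = parse_times(lst)
print(times)
print([seconds for (_, _, seconds) in times])
```

Working on the items one by one avoids converting the whole list to a string first.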
Taking into account your last comment to your Q,
>>> x = [['asgy200;f','ssll100','time is: 10h:00m:12s','xxxxxxx','***','','asgy200;f','frl5100','time is: 00h:00m:05s','ooo']]
>>> print all([w[-3:-1]!='00' for r in x for w in r if w.startswith('time is: ')])
True
>>>
all and any are two useful builtins...
It operates like this: the outer (slower) loop runs over the sublists (rows) of x, and the inner (faster) loop runs over the items (words) in each row. We pick only the words that start with a specific string, and our iterable is made of booleans that are True when the third-to-last and second-to-last characters of the picked word differ from '00'. Finally, all consumes the iterable and returns True if every seconds field is different from '00'.
HTH,
Addendum
Do we want to break out early?
all_secs_differ_from_0 = True
for row in x:
    for word in row:
        if word.startswith('time is: ') and word[-3:-1] == '00':
            all_secs_differ_from_0 = False
            break
    if not all_secs_differ_from_0: break
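The explicit loop can also be avoided: all stops at the first False it sees, so passing it a generator expression instead of a list gives the early exit for free. A sketch using a variant of x with one 00 seconds field (seconds_fields is an illustrative helper name):

```python
x = [['asgy200;f', 'ssll100', 'time is: 10h:00m:00s', 'xxxxxxx', '***', '',
      'asgy200;f', 'frl5100', 'time is: 00h:00m:05s', 'ooo']]


def seconds_fields(rows):
    """Yield the two-character seconds field of every 'time is: ' item."""
    for row in rows:
        for word in row:
            if word.startswith('time is: '):
                yield word[-3:-1]


# No square brackets: all() pulls values lazily and stops at the first '00'.
all_secs_differ_from_0 = all(field != '00' for field in seconds_fields(x))
print(all_secs_differ_from_0)
```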
I'm new to Python, and I would like to find a substring in a string.
For example, if I have a substring of some constant letters such as:
substring = 'sdkj'
And a string of some letters such as:
string = 'sdjskjhdvsnea'
I want a counter that is incremented by 1 for every letter S, D, K, or J found in the string. For the above example, the counter would be 8.
How can I achieve this?
Maybe this code can help you:
>>> string = 'sdjskjhdvsnea'
>>> substring = 'sdkj'
>>> counter = 0
>>> for x in string:
...     if x in substring:
...         counter += 1
...
>>> counter
8
>>>
An alternative solution using re.findall():
>>> import re
>>> substring = 'sdkj'
>>> string = 'sdjskjhdvsnea'
>>> len(re.findall('|'.join(list(substring)), string))
8
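One caveat with building the pattern by joining on '|': characters that are special in a regex (such as . or *) would need escaping. A sketch that escapes each letter first; count_any_of is an illustrative name:

```python
import re


def count_any_of(letters, text):
    """Count the characters of text that are one of the given letters."""
    if not letters:
        return 0  # an empty pattern would match at every position
    # set() avoids repeated alternatives; re.escape protects metacharacters.
    pattern = '|'.join(re.escape(c) for c in sorted(set(letters)))
    return len(re.findall(pattern, text))


print(count_any_of('sdkj', 'sdjskjhdvsnea'))
print(count_any_of('.a', 'a.b.c'))
```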
Edit:
As you apparently do want the count of the appearances of the whole four-character substring, regex is probably the easiest method:
>>> import re
>>> string = 'sdkjhsgshfsdkj'
>>> substring = 'sdkj'
>>> len(re.findall(substring, string))
2
re.findall will give you a list of all (non-overlapping) appearances of substring in string:
>>> re.findall('sdkj', 'sdkjhsgshfsdkj')
['sdkj', 'sdkj']
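Because the matches are non-overlapping, the count can be lower than expected when the substring can overlap itself. A sketch comparing both counts; the zero-width lookahead is a common regex idiom, and the function names are illustrative:

```python
import re


def count_non_overlapping(sub, text):
    return len(re.findall(re.escape(sub), text))


def count_overlapping(sub, text):
    # A zero-width lookahead matches at every start position without
    # consuming characters, so overlapping occurrences are all found.
    return len(re.findall('(?=' + re.escape(sub) + ')', text))


print(count_non_overlapping('sdkj', 'sdkjhsgshfsdkj'))
print(count_non_overlapping('aa', 'aaaa'))
print(count_overlapping('aa', 'aaaa'))
```

For 'sdkj' the two counts agree, since that substring cannot overlap with itself.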
Normally, "finding a sub-string 'sdkj'" would mean trying to locate the appearances of that complete four-character substring within the larger string. In this case, it appears that you simply want the sum of the counts of those four letters:
sum(string.count(c) for c in substring)
Or, more efficiently, use collections.Counter:
from collections import Counter
counts = Counter(string)
sum(counts.get(c, 0) for c in substring)
This only iterates over string once, rather than once for each c in substring, so is O(m+n) rather than O(m*n) (where m == len(string) and n == len(substring)).
In action:
>>> string = "sdjskjhdvsnea"
>>> substring = "sdkj"
>>> sum(string.count(c) for c in substring)
8
>>> from collections import Counter
>>> counts = Counter(string)
>>> sum(counts.get(c, 0) for c in substring)
8
Note that you may want set(substring) to avoid double-counting:
>>> sum(string.count(c) for c in "sdjks")
11
>>> sum(string.count(c) for c in set("sdjks"))
8
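Putting the Counter approach and the set(substring) note together, a sketch of a reusable function (count_letters is an illustrative name):

```python
from collections import Counter


def count_letters(text, letters):
    """Total occurrences in text of the distinct characters in letters."""
    counts = Counter(text)  # one pass over text
    # A Counter returns 0 for missing keys, so no .get() is needed.
    return sum(counts[c] for c in set(letters))


print(count_letters('sdjskjhdvsnea', 'sdkj'))
print(count_letters('sdjskjhdvsnea', 'sdjks'))
```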
I use nlst on an FTP server, which returns the directories as a list. The format of the returned list is as follows:
[xyz123,abcde345,pqrst678].
I have to separate each element of the list into two parts such that part1 = xyz and part2 = 123, i.e. split the string where the integer part begins. Any help on this will be appreciated!
>>> re.findall(r'\d+|[a-z]+', 'xyz123')
['xyz', '123']
For example, using the re module:
>>> import re
>>> a = ['xyz123','ABCDE345','pqRst678']
>>> regex = r'(\D+)(\d+)'
>>> for item in a:
...     m = re.match(regex, item)
...     (alpha, digits) = m.groups()
...     print(alpha + ' ' + digits)
...
xyz 123
ABCDE 345
pqRst 678
Use the regular expression module re:
import re

def splitEntry(entry):
    # re.search, not re.match: the digits sit at the end of the string,
    # and re.match only looks at the beginning.
    firstDecMatch = re.search(r"\d+$", entry)
    alpha, numeric = "", ""
    if firstDecMatch:
        pos = firstDecMatch.start(0)
        alpha, numeric = entry[:pos], entry[pos:]
    else:  # no digits found at end of string
        alpha = entry
    return (alpha, numeric)
Note that the regular expression is \d+$, which matches all the digits at the end of the string. Digits in the first part of the string are not counted, e.g. xy3zzz134 -> "xy3zzz","134". I opted for that because you say you are expecting filenames, and filenames can include numbers. Of course, it's still a problem if the filename itself ends with digits.
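The same split can be written with two groups, so the position never has to be computed by hand. A sketch under the same assumption (only the digits at the very end count); split_trailing_digits is an illustrative name:

```python
import re


def split_trailing_digits(entry):
    """Split entry into (head, trailing_digits); the tail may be empty."""
    # The lazy first group leaves as many trailing digits as possible
    # to the second group; the pattern matches any string without newlines.
    head, digits = re.match(r'(.*?)(\d*)$', entry).groups()
    return head, digits


print(split_trailing_digits('xyz123'))
print(split_trailing_digits('xy3zzz134'))
print(split_trailing_digits('noDigits'))
```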
Another non-re answer:
>>> import itertools
>>> [''.join(x[1]) for x in itertools.groupby('xyz123', lambda x: x.isalpha())]
['xyz', '123']
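groupby also copes with names that contain several runs of letters and digits. A sketch (split_runs is an illustrative name) that groups on str.isdigit rather than isalpha, so that punctuation stays with the letters:

```python
import itertools


def split_runs(text):
    """Split text into alternating runs of digits and non-digits."""
    return [''.join(group) for _, group in itertools.groupby(text, str.isdigit)]


print(split_runs('xyz123'))
print(split_runs('xy3zzz134'))
```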
If you don't want to use regex, then you can do something like this. Note that I have not tested this so there could be a bug or typo somewhere.
items = ["xyz123", "abcde345", "pqrst678"]
newlist = []
for item in items:
    for pos in range(0, len(item)):
        # isdigit() exists on plain strings in both Python 2 and 3
        if item[pos].isdigit():
            newlist.append([item[:pos], item[pos:]])
            break
>>> import re
>>> [re.findall(r'(.*?)(\d+$)',x)[0] for x in ['xyz123','ABCDE345','pqRst678']]
[('xyz', '123'), ('ABCDE', '345'), ('pqRst', '678')]
I don't think it's that difficult without re:
>>> s="xyz123"
>>> for n,i in enumerate(s):
...     if i.isdigit(): x=n ; break
...
>>> [ s[:x], s[x:] ]
['xyz', '123']
>>> s="abcde345"
>>> for n,i in enumerate(s):
...     if i.isdigit(): x=n ; break
...
>>> [ s[:x], s[x:] ]
['abcde', '345']
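One thing to watch in the loop above: if the string contains no digit, x is never assigned (or keeps its value from the previous string). A sketch that falls back to the end of the string in that case; split_at_first_digit is an illustrative name:

```python
def split_at_first_digit(s):
    """Split s just before its first digit; no digit gives [s, '']."""
    # next() takes a default, used here when the generator is empty.
    pos = next((n for n, c in enumerate(s) if c.isdigit()), len(s))
    return [s[:pos], s[pos:]]


for name in ['xyz123', 'abcde345', 'nodigits']:
    print(split_at_first_digit(name))
```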
I want to use the Python re module to filter integers by their digits.
    1
  700
76093
71365
35837
75671
 ^^
 ||--------------------- this position should not be 6,7,8,9,0
 |---------------------- this position should not be 5,6,7
Code:
int_list=[1,700,76093,71365,35837,75671]
str_list = [str(x).zfill(5) for x in int_list]
reexp = r"\d[0-4,8-9][1-5]\d\d"
import re
p = re.compile(reexp)
result = [int("".join(str(y) for y in x)) for x in str_list if p.match(x)]
I have 2 questions:
1. Is it possible to generate the reexp string from code like the following?
thousand_position = set([1,2,3,4,5,1,1,1,1,1,1,1,1,1,1])
hundred_position = set([1,2,3,4,8,9,0,1,2,3,2,3,1,2])
2. How can I make the reexp simpler and avoid the zero-prefix bug below?
00700
00500 <--- this also matches the reexp; it is a
bug because the number has no thousands digit
10700
reexp = r"\d[0-4,8-9][1-5]\d\d"
Thanks for your time
B.Rgs
PS: thanks for the suggestion of the math solution below. I know it may be easier and faster, but I want the re-based version to weigh against the other ideas.
Are you sure you want to be using the re module? You can get at what you're trying to do with some simple math operations.
def valid_number(n):
    # hundreds digit must be 1-5; thousands digit must not be 5, 6 or 7
    return 0 < n%1000//100 < 6 and not 5 <= n%10000//1000 <= 7
int_list = [1,700,76093,71365,35837,75671,]
result = [x for x in int_list if valid_number(x)]
or alternatively:
result = filter(valid_number, int_list)
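The arithmetic may be easier to read with a helper that pulls out one decimal digit. A sketch: digit_at and the n >= 1000 check are additions for illustration, the latter being an assumption based on the question's remark that numbers without a thousands digit (such as 500) should not pass:

```python
def digit_at(n, place):
    """Return the decimal digit of n at 10**place (0 = units)."""
    return n // 10 ** place % 10


def valid_number(n):
    hundreds = digit_at(n, 2)
    thousands = digit_at(n, 3)
    # Assumption: a number below 1000 has no thousands digit and is rejected.
    return n >= 1000 and 1 <= hundreds <= 5 and thousands not in (5, 6, 7)


int_list = [1, 700, 76093, 71365, 35837, 75671, 500]
result = [x for x in int_list if valid_number(x)]
print(result)
```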
Ok, first, I'm going to post some code that actually does what you describe initially:
>>> int_list=[1, 700, 76093, 71365, 35837, 75671]
>>> str_list = [str(i).zfill(5) for i in int_list]
>>> filtered = [s for s in str_list if re.match(r'\d[0-4,8-9][1-5]\d\d', s)]
>>> filtered
['71365']
Edit: Ok, I think I understand your question now. Instead of using zfill, you could use rjust, which will insert spaces instead of zeros.
>>> int_list=[1,700,76093,71365,35837,75671,500]
>>> str_list = [str(i).rjust(5) for i in int_list]
>>> re_str = r'\d' + str(list(set([0, 1, 2, 3, 4, 8, 9]))) + str(list(set([1, 2, 3, 4, 5]))) + r'\d\d'
>>> filtered = [s for s in str_list if re.match(re_str, s)]
>>> filtered
['71365']
I think doing this mathematically as yan suggests will be faster in the end, but perhaps you have your reasons for using regular expressions.
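On the first question, one way to generate the pattern from sets of allowed digits is to join the digits into a character class. A sketch, with these assumptions: char_class and build_pattern are illustrative names, the allowed sets are read off the [0-4,8-9] and [1-5] classes in the question, and the number is matched unpadded so that anything below 1000 is rejected without zfill or rjust:

```python
import re


def char_class(digits):
    """Turn a collection of digits into a regex class, e.g. {1, 2} -> '[12]'."""
    return '[' + ''.join(str(d) for d in sorted(set(digits))) + ']'


def build_pattern(thousands, hundreds):
    # Optional ten-thousands digit, then the two constrained positions,
    # then tens and units.
    return re.compile(r'\d?' + char_class(thousands) + char_class(hundreds) + r'\d\d')


thousand_position = {0, 1, 2, 3, 4, 8, 9}
hundred_position = {1, 2, 3, 4, 5}
p = build_pattern(thousand_position, hundred_position)

int_list = [1, 700, 76093, 71365, 35837, 75671, 500]
result = [n for n in int_list if p.fullmatch(str(n))]
print(result)
```

Because the pattern needs at least four digits, 500 cannot match, and fullmatch keeps longer numbers from matching on a prefix.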