Splitting an unspaced string of decimal values - Python

An awful person has given me a string like this:
values = '.850000.900000.9500001.000001.50000'
and I need to split it to create the following list:
['.850000', '.900000', '.950000', '1.00000', '1.50000']
I know that if I were dealing only with numbers < 1, I could use the code
dl = '.'
splitvalues = [dl+e for e in values.split(dl) if e != ""]
But in cases like this one, where numbers greater than 1 are buried in the string, splitvalues would end up being
['.850000', '.900000', '.9500001', '.000001', '.50000']
So is there a way to split a string with multiple delimiters while also splitting the string differently based on which delimiter is encountered?

I think this is somewhat closer to a fixed-width format string. Try a regular expression like this:
import re
pattern = r"(\d{1,2}\.\d{5})"  # raw string; the name avoids shadowing the built-in str
m = re.search(pattern, input_str)  # input_str: the string to split
your_first_number = m.group(0)
Try this repeatedly on the remaining string to consume all numbers.
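The "try this repeatedly" step can be sketched as a loop that matches one field at a time, starting where the previous match ended. This is only an illustration under stated assumptions: the pattern below is not the one from the answer, it assumes every value is exactly seven characters wide (either '.dddddd' or 'd.ddddd', as in the question's sample), and the names FIELD and consume_numbers are made up here.

```python
import re

# Assumed field layout (not from the answer above): each value is seven
# characters wide, either '.dddddd' or 'd.ddddd'.
FIELD = re.compile(r"\.\d{6}|\d\.\d{5}")

def consume_numbers(text):
    """Match one field at a time, starting where the previous match ended."""
    numbers = []
    pos = 0
    while pos < len(text):
        m = FIELD.match(text, pos)  # anchored at pos, unlike re.search
        if m is None:
            raise ValueError(f"unexpected data at offset {pos}")
        numbers.append(m.group(0))
        pos = m.end()
    return numbers

print(consume_numbers('.850000.900000.9500001.000001.50000'))
# ['.850000', '.900000', '.950000', '1.00000', '1.50000']
```

Anchoring each match at the current offset means malformed input fails loudly instead of being silently skipped.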

>>> import re
>>> source = '0.850000.900000.9500001.000001.50000'
>>> re.findall(r"(.*?00+(?!0))", source)
['0.850000', '.900000', '.950000', '1.00000', '1.50000']
The split is based on looking for {anything, a double zero, any further run of zeros (followed by a non-zero)}.

Assume that the value before the decimal is less than 10, and then we have,
values = '0.850000.900000.9500001.000001.50000'
result = list()
last_digit = None
for value in values.split('.'):
    if value.endswith('0'):
        result.append(''.join([i for i in [last_digit, '.', value] if i]))
        last_digit = None
    else:
        result.append(''.join([i for i in [last_digit, '.', value[0:-1]] if i]))
        last_digit = value[-1]
if values.startswith('0'):
    result = result[1:]
print(result)
# Output
['.850000', '.900000', '.950000', '1.00000', '1.50000']

How about using re.split():
import re
values = '0.850000.900000.9500001.000001.50000'
print([a + b for a, b in zip(*(lambda x: (x[1::2], x[2::2]))(re.split(r"(\d\.)", values)))])
OUTPUT
['0.85000', '0.90000', '0.950000', '1.00000', '1.50000']

Here the values have a fixed width: 6 digits, or 7 characters including the dot. Take the slices from 0 to 7, 7 to 14, and so on. Because we don't need the initial zero, I use the slice values[1:] for extraction.
values = '0.850000.900000.9500001.000001.50000'
[values[1:][start:start+7] for start in range(0,len(values[1:]),7)]
['.850000', '.900000', '.950000', '1.00000', '1.50000']
Test:
''.join([values[1:][start:start+7] for start in range(0,len(values[1:]),7)]) == values[1:]
True

With a fixed / variable string, you may try something like:
values = '0.850000.900000.9500001.000001.50000'
str_list = []
first_index = values.find('.')
while first_index > 0:
    last_index = values.find('.', first_index + 1)
    if last_index != -1:
        str_list.append(values[first_index - 1: last_index - 2])
        first_index = last_index
    else:
        str_list.append(values[first_index - 1: len(values) - 1])
        break
print(str_list)
Output:
['0.8500', '0.9000', '0.95000', '1.0000', '1.5000']
Assuming that there will always be a single digit before the decimal.
Please take this as a starting point and not a copy paste solution.

Related

How to replace all occurrences of "00000" with "0" repeatedly?

I need to repeatedly replace all occurrence of 00000 with 0 in a binary string input.
Although I'm able to achieve it to some extent, I do not know the logic when there are multiple consecutive 00000s like for example:
25 0s should be replaced with one 0
50 0s should be replaced with two 0s
125 0s should be replaced with one 0
Currently I have following code :
new_list = []
c = 0
l = list(s.split("00000"))
print(l)
for i in l:
    if i == "00000":
        for x in range(l.index(i),l.index(i-3)):
            if l[x] != 0:
                break
        for y in range(0,5):
            del l[i-y]
    new_list.append(i)
    new_list.append("0")
r_list = new_list[0:-1]
r_list= ''.join(map(str, r_list))
print(r_list)
But this will not work for 25 0s.
Also, what would be the regex alternative for this?
To get those results, you would need to repeatedly replace five consecutive zeroes with one zero, until there are no more occurrences of five consecutive zeroes. Here is an example run:
s = "0" * 125 # example input
while "00000" in s:
    s = s.replace("00000", "0")
print(s)
As I state in my comment, my best guess at what you're trying to do is that you're trying to repeatedly apply the rule that five 0s get replaced with one 0, so that, for example, 25 0s get reduced to 00000, which in turn gets reduced to 0. Assuming that's correct:
It's not the most efficient approach, but here's one way to do it:
import re
new = "00000100002000003000000004" + "0"*50
old = ""
while old != new:
    old,new = new,re.sub("0{5}","0",new)
print(new) #0100002030000400
Alternatively, here's a method to apply that change in one pass through the string:
s = "00000100002000003000000004" + "0"*50
stack,ct = ['#'],[-1]
i = 0
while i < len(s):
    if s[i] == stack[-1]:
        ct[-1] += 1
        i+=1
    elif ct[-1] >= 5:
        q,r = divmod(ct[-1],5)
        ct[-1] = q+r
    else:
        stack.append(s[i])
        ct.append(1)
        i+=1
while ct[-1] >= 5:
    q,r = divmod(ct[-1],5)
    ct[-1] = q+r
ans = "".join(c*k for c,k in zip(stack[1:],ct[1:]))
print(ans)
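The run-length arithmetic behind the repeated replacement can also be written with the standard re module alone. A rough sketch, assuming only runs of the character 0 matter: one round of replacing "00000" with "0" turns a run of n zeros into n // 5 + n % 5 zeros, so each run can be reduced on its own until it is shorter than five. The names reduced_length and collapse_zeros are made up for this illustration.

```python
import re

def reduced_length(n):
    """Length of a run of n zeros after replacing '00000' with '0' until stable."""
    while n >= 5:
        q, r = divmod(n, 5)  # q groups of five collapse to q zeros, r are left over
        n = q + r
    return n

def collapse_zeros(s):
    # Each maximal run of zeros is rewritten independently of the others
    return re.sub(r"0+", lambda m: "0" * reduced_length(len(m.group(0))), s)

print(collapse_zeros("0" * 25))   # 0
print(collapse_zeros("0" * 50))   # 00
print(collapse_zeros("0" * 125))  # 0
```

Because non-zero characters separate the runs, reducing them one at a time gives the same result as the looped str.replace on the whole string.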
PyPI regex supports recursion. Something like this could do:
import regex as re
s = re.sub(r"0000(?:(?0)|0)", "0", s)
See this Python demo at tio.run or the regex demo at regex101
At (?0) or alternatively (?R) the pattern gets pasted (recursed).

Changing version number to single digits python

I have a version number in a file like this:
Testing x.x.x.x
So I am grabbing it off like this:
import re
def increment(match):
    # convert the four matches to integers
    a,b,c,d = [int(x) for x in match.groups()]
    # return the replacement string
    return f'{a}.{b}.{c}.{d}'
lines = open('file.txt', 'r').readlines()
lines[3] = re.sub(r"\b(\d+)\.(\d+)\.(\d+)\.(\d+)\b", increment, lines[3])
I want to make it so that if the last digit is a 9, it changes to 0 and the previous digit is incremented by 1. So 1.1.1.9 changes to 1.1.2.0.
I did that by doing:
def increment(match):
    # convert the four matches to integers
    a,b,c,d = [int(x) for x in match.groups()]
    # return the replacement string
    if (d == 9):
        return f'{a}.{b}.{c+1}.{0}'
    elif (c == 9):
        return f'{a}.{b+1}.{0}.{0}'
    elif (b == 9):
        return f'{a+1}.{0}.{0}.{0}'
The issue occurs when it's 1.1.9.9 or 1.9.9.9, where multiple digits need to roll over. How can I handle this?
Use integer addition?
def increment(match):
    # convert the four matches to integers
    a,b,c,d = [int(x) for x in match.groups()]
    *a,b,c,d = [int(x) for x in str(a*1000 + b*100 + c*10 + d + 1)]
    a = ''.join(map(str,a)) # fix for 2 digit 'a'
    # return the replacement string
    return f'{a}.{b}.{c}.{d}'
If no component of your version is ever going to go beyond 9, it is simpler to just convert it to an integer, increment it and then convert back to a string.
This allows you to go up to as many version numbers as you require and you are not limited to thousands.
def increment(match):
    # match is a re.Match object, so join its captured groups into one digit string
    digits = ''.join(match.groups())
    # increment as an integer, keeping any leading zeros
    digits = str(int(digits) + 1).zfill(len(digits))
    # put a dot between every digit
    return '.'.join(digits)
Add 1 to the last element. If it's more than 9, set it to 0 and do the same for the previous element. Repeat as necessary:
import re
def increment(match):
    # convert the four matches to integers
    g = [int(x) for x in match.groups()]
    # increment, last one first
    pos = len(g)-1
    g[pos] += 1
    while pos > 0:
        if g[pos] > 9:
            g[pos] = 0
            pos -= 1
            g[pos] += 1
        else:
            break
    # return the replacement string
    return '.'.join(str(x) for x in g)
print (re.sub(r"\b(\d+)\.(\d+)\.(\d+)\.(\d+)\b", increment, '1.8.9.9'))
print (re.sub(r"\b(\d+)\.(\d+)\.(\d+)\.(\d+)\b", increment, '1.9.9.9'))
print (re.sub(r"\b(\d+)\.(\d+)\.(\d+)\.(\d+)\b", increment, '9.9.9.9'))
Result:
1.9.0.0
2.0.0.0
10.0.0.0
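The same carry logic can be sketched without a regex callback, working directly on a dotted string with any number of components. This is an illustrative variant rather than the answer's code: the function name bump_version and the limit parameter are assumptions introduced here.

```python
def bump_version(version, limit=9):
    """Add 1 to the last component, carrying leftwards past `limit`."""
    parts = [int(p) for p in version.split('.')]
    for i in reversed(range(len(parts))):
        parts[i] += 1
        # stop once nothing overflows; the first component may grow freely
        if parts[i] <= limit or i == 0:
            break
        parts[i] = 0
    return '.'.join(map(str, parts))

print(bump_version('1.1.1.9'))  # 1.1.2.0
print(bump_version('1.9.9.9'))  # 2.0.0.0
print(bump_version('9.9.9.9'))  # 10.0.0.0
```

Letting the first component exceed the limit mirrors the 10.0.0.0 result shown above.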

python intelligent hexadecimal numbers generator

I want to be able to generate 12-character-long chains of hexadecimal digits, BUT with no more than 2 identical digits in a row: 00 is allowed, 000 is not.
I know how to generate ALL possibilities, from 000000000000 to FFFFFFFFFFFF, but I know that I won't use all those values, and because the file generated with ALL possibilities is many GB in size, I want to reduce it by not generating the chains I won't use.
So my goal is to have results like 00A300BF8911 and not like 000300BF8911
Could you please help me to do so?
Many thanks in advance!
If you picked the same one twice, remove it from the choices for a round:
import random
hex_digits = set('0123456789ABCDEF')
result = ""
pick_from = hex_digits
for _ in range(12):
    # random.choice needs a sequence, so sort the set of allowed digits first
    cur_digit = random.choice(sorted(pick_from))
    # if this repeats the previous digit, exclude it from the next round
    if result and result[-1] == cur_digit:
        pick_from = hex_digits - {cur_digit}
    else:
        pick_from = hex_digits
    result += cur_digit
print(result)
Since the title mentions generators, here's the above as a generator:
import random
hex_digits = set('0123456789ABCDEF')
def hexGen():
    while True:
        result = ""
        pick_from = hex_digits
        for _ in range(12):
            # random.choice needs a sequence, so sort the set of allowed digits first
            cur_digit = random.choice(sorted(pick_from))
            # if this repeats the previous digit, exclude it from the next round
            if result and result[-1] == cur_digit:
                pick_from = hex_digits - {cur_digit}
            else:
                pick_from = hex_digits
            result += cur_digit
        yield result
my_hex_gen = hexGen()
counter = 0
for result in my_hex_gen:
    print(result)
    counter += 1
    if counter > 10:
        break
Results:
1ECC6A83EB14
D0897DE15E81
9C3E9028B0DE
CE74A2674AF0
9ECBD32C003D
0DF2E5DAC0FB
31C48E691C96
F33AAC2C2052
CD4CEDADD54D
40A329FF6E25
5F5D71F823A4
You could also change the while True loop to only produce a certain number of these based on a number passed into the function.
I interpret this question as, "I want to construct a rainbow table by iterating through all strings that have the following qualities. The string has a length of 12, contains only the characters 0-9 and A-F, and it never has the same character appearing three times in a row."
def iter_all_strings_without_triplicates(size, last_two_digits = (None, None)):
    a,b = last_two_digits
    if size == 0:
        yield ""
    else:
        for c in "0123456789ABCDEF":
            if a == b == c:
                continue
            else:
                for rest in iter_all_strings_without_triplicates(size-1, (b,c)):
                    yield c + rest
for s in iter_all_strings_without_triplicates(12):
    print(s)
Result:
001001001001
001001001002
001001001003
001001001004
001001001005
001001001006
001001001007
001001001008
001001001009
00100100100A
00100100100B
00100100100C
00100100100D
00100100100E
00100100100F
001001001010
001001001011
...
Note that there will be several hundred terabytes' worth of values outputted, so you aren't saving much room compared to just saving every single string, triplicates or not.
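How little the filter removes can be checked by counting the qualifying strings instead of enumerating them. A minimal sketch, assuming a simple two-state recurrence: track how many strings end in a run of one character and how many end in a run of two. The function name count_without_triplicates is introduced here purely for illustration.

```python
def count_without_triplicates(length, alphabet_size=16):
    """Count strings in which no character appears three times in a row."""
    if length == 0:
        return 1
    run1, run2 = alphabet_size, 0  # length-1 strings all end in a run of one
    for _ in range(length - 1):
        # a different character starts a new run of one;
        # repeating the last character extends a run of one to a run of two
        run1, run2 = (alphabet_size - 1) * (run1 + run2), run1
    return run1 + run2

kept = count_without_triplicates(12)
total = 16 ** 12
print(kept, total, kept / total)
```

The printed ratio shows what fraction of all 12-character hex strings survive the filter.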
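import string, random
source = string.hexdigits[:16]
result = ''
while len(result) < 12:
    # randrange excludes len(source); randint(0, len(source)) could index out of range
    idx = random.randrange(len(source))
    # accept the digit unless it would be the third identical one in a row
    if len(result) < 2 or result[-1] != result[-2] or result[-1] != source[idx]:
        result += source[idx]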
You could extract a random sequence from a list of twice each hexadecimal digits:
import random
digits = list('1234567890ABCDEF') * 2
random.shuffle(digits)
hex_number = ''.join(digits[:12])
If you wanted to allow shorter sequences, you could randomize that too, and left fill the blanks with zeros.
import random
digits = list('1234567890ABCDEF') * 2
random.shuffle(digits)
num_digits = random.randrange(3, 13)
hex_number = ''.join(['0'] * (12-num_digits)) + ''.join(digits[:num_digits])
print(hex_number)
You could use a generator that slides a window over the strings your current implementation yields. Something like (hex_str[i:i + window_size] for i in range(len(hex_str) - window_size + 1)) with window_size = 3. Using len and set you could count the number of different characters in each slice, although in your example it might be easier to just compare all 3 characters.
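The window idea can be sketched as a small predicate that rejects any string containing a window of identical characters. This is only an illustration; the name has_triplicate and the sample strings in the usage lines are assumptions taken from the question's examples.

```python
def has_triplicate(hex_str, window_size=3):
    """True if any window of window_size characters consists of a single character."""
    return any(
        len(set(hex_str[i:i + window_size])) == 1
        for i in range(len(hex_str) - window_size + 1)
    )

print(has_triplicate('000300BF8911'))  # True
print(has_triplicate('00A300BF8911'))  # False
```

Filtering an existing stream of candidates then becomes (s for s in candidates if not has_triplicate(s)), although every candidate still has to be generated first.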
You can create an array from 0 to 255, and use random.sample with your list to get your list

Apply Regular expression on output file

I have written a python script that dumps all versions in a text file. All versions are separated by '|' symbol.
I need to replace all versions starting with 3 under the following condition, e.g.
1) 3.7.0E should be replaced with 03.07.00E
2) 3.17.1E should be replaced with 03.17.01E
All single-digit numbers should be padded with a leading 0.
My output file looks like
3.7.0E|3.7.1E|3.7.2E|3.7.3E|3.7.4E|3.7.5E|16.2.1|16.2.2|3.8.0E|16.3.1|16.3.2|16.3.3|16.3.1a|16.4.1|16.4.2|3.17.1E|3.7.11E
This isn't pretty, but it will do what you want:
import re
s = '3.7.0E|3.7.1E|3.7.2E|3.7.3E|3.7.4E|3.7.5E|16.2.1|16.2.2|3.8.0E|16.3.1|16.3.2|16.3.3|16.3.1a|16.4.1|16.4.2|3.17.1E|3.7.11E'
l = []
# split up based on pipe
for chunk in s.split('|'):
    if chunk.startswith('3'):
        new_chunk = ''
        # split up based on period
        for piece in chunk.split('.'):
            try:
                # if there's a letter, int() raises ValueError
                x = int(piece)
                new_chunk += '0{}.'.format(x) if x < 10 else '{}.'.format(x)
            except ValueError:
                n = int(re.search(r'\d+', piece).group(0))
                # [A-Za-z] rather than \w, because \w also matches digits
                letter = re.search(r'[A-Za-z]', piece).group(0)
                new_chunk += '0{}{}'.format(n, letter) if n < 10 else piece
        l.append(new_chunk)
    else:
        l.append(chunk)
new_s = '|'.join(l)
print(new_s)
The value of new_s will be: '03.07.00E|03.07.01E|03.07.02E|03.07.03E|03.07.04E|03.07.05E|16.2.1|16.2.2|03.08.00E|16.3.1|16.3.2|16.3.3|16.3.1a|16.4.1|16.4.2|03.17.01E|03.07.11E'.
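Since the question asks for a regular expression, the same padding can be sketched with re.sub and a callback. This is a hedged alternative, not the answer's method: the helper name pad_version is made up, and it assumes "starting with 3" means the first component is exactly 3, hence the check for '3.' rather than '3'.

```python
import re

def pad_version(chunk):
    """Zero-pad every number in a version whose first component is 3."""
    if not chunk.startswith('3.'):
        return chunk
    # zfill(2) pads single digits and leaves longer numbers untouched
    return re.sub(r'\d+', lambda m: m.group(0).zfill(2), chunk)

s = '3.7.0E|16.2.1|3.8.0E|16.3.1a|3.17.1E|3.7.11E'
print('|'.join(pad_version(c) for c in s.split('|')))
# 03.07.00E|16.2.1|03.08.00E|16.3.1a|03.17.01E|03.07.11E
```

Trailing letters such as E are not matched by \d+, so they pass through unchanged.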

Python - how to multiply characters in string by number after character

Title, for example I want to make 'A3G3A' into 'AAAGGGA'.
I have this so far:
if any(i.isdigit() for i in string):
    for i in range(0, len(string)):
        if string[i].isdigit():
            (i am lost after this)
Here's a simplistic approach:
string = 'A3G3A'
expanded = ''
for character in string:
    if character.isdigit():
        expanded += expanded[-1] * (int(character) - 1)
    else:
        expanded += character
print(expanded)
OUTPUT: AAAGGGA
It assumes valid input. Its limitation is that the repetition factor has to be a single digit, e.g. 2 - 9. If we want repetition factors greater than 9, we have to do slightly more parsing of the string:
from itertools import groupby
groups = groupby('DA10G3ABC', str.isdigit)
expanded = []
for is_numeric, characters in groups:
    if is_numeric:
        expanded.append(expanded[-1] * (int(''.join(characters)) - 1))
    else:
        expanded.extend(characters)
print(''.join(expanded))
OUTPUT: DAAAAAAAAAAGGGABC
Assuming that the format is always a letter followed by an integer, with the last integer possibly missing:
>>> from itertools import zip_longest
>>> s = 'A3G3A'
>>> ''.join(c*int(i) for c, i in zip_longest(*[iter(s)]*2, fillvalue=1))
'AAAGGGA'
Assuming that the format can be any substring followed by an integer, with the integer possibly longer than one digit and the last integer possibly missing:
>>> from itertools import zip_longest
>>> import re
>>> s = 'AB10GY3ABC'
>>> sp = re.split(r'(\d+)', s)
>>> ''.join(c*int(i) for c, i in zip_longest(*[iter(sp)]*2, fillvalue=1))
'ABABABABABABABABABABGYGYGYABC'
A minimal pure-Python version that handles all cases:
input_string = "WA5OUH2!10"
output = ''
n = ''
c = ''
for x in input_string + 'a':
    if x.isdigit():
        n += x
    else:
        if n == '':
            n = '1'
        output = output + c*int(n)
        n = ''
        c = x
With input_string = "WA5OUH2!10", output is WAAAAAOUHH!!!!!!!!!!.
The trailing + 'a' is a sentinel that flushes the last character, because each character is only written once the next non-digit has been seen.
Another approach could be -
import re
input_string = 'A3G3A'
alphabets = re.findall('[A-Z]', input_string) # List of all alphabets - ['A', 'G', 'A']
digits = re.findall('[0-9]+', input_string) # List of all numbers - ['3', '3']
final_output = "".join([alphabets[i]*int(digits[i]) for i in range(0, len(alphabets)-1)]) + alphabets[-1]
# This expression repeats each letter by the number next to it ( Except for the last letter ), joins the list of strings into a single string, and appends the last character
# final_output - 'AAAGGGA'
Explanation -
In [31]: alphabets # List of alphabets in the string
Out[31]: ['A', 'G', 'A']
In [32]: digits # List of numbers in the string ( Including numbers more than one digit)
Out[32]: ['3', '3']
In [33]: list_of_strings = [alphabets[i]*int(digits[i]) for i in range(0, len(alphabets)-1)] # List of strings after repetition
In [34]: list_of_strings
Out[34]: ['AAA', 'GGG']
In [35]: joined_string = "".join(list_of_strings) # Joined list of strings
In [36]: joined_string
Out[36]: 'AAAGGG'
In [38]: final_output = joined_string + input_string[-1] # Append last character of the string
In [39]: final_output
Out[39]: 'AAAGGGA'
Using * to repeat the characters, assuming each repeat count is in the range [1, 9]:
q = 'A3G3A'
try:
    int(q[-1]) # check if it ends with digit
except ValueError:
    q = q+'1' # repeat only once
"".join([list(q)[i]*int(list(q)[i+1]) for i in range(0,len(q),2)])
One-line solution, assuming repeat counts in the range [1, 9].
>>> s = 'A3G3A'
>>> s = ''.join(s[i] if not s[i].isdigit() else s[i-1]*(int(s[i])-1) for i in range(0, len(s)))
>>> print(s)
AAAGGGA
Embrace regex! This finds all occurrences of the pattern non-digit character followed by non-negative integer (any number of digits) and replaces that substring with that many of the character.
import re
re.sub(r'(\D)(\d+)', lambda m: m.group(1) * int(m.group(2)), 'A3G3A')
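As a rough sketch of how this substitution behaves on other inputs, the wrapper below (the name expand is made up for illustration) applies the same pattern to a multi-digit count and to characters that have no count after them:

```python
import re

def expand(text):
    # (\D) captures one non-digit character, (\d+) the count that follows it;
    # characters with no count after them are left untouched by re.sub
    return re.sub(r'(\D)(\d+)', lambda m: m.group(1) * int(m.group(2)), text)

print(expand('A3G3A'))      # AAAGGGA
print(expand('DA10G3ABC'))  # DAAAAAAAAAAGGGABC
```

A count of 0 removes the character entirely, since multiplying a string by 0 gives an empty string.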
This can be solved by numpy:
import numpy as np
x = 'A3G3A'
if not x[-1].isdigit():
    x += '1'
letters = list(x[::2])
times = list(map(int,x[1::2]))
lst = ''.join(np.repeat(letters, times))
#output
'AAAGGGA'
