Python find similar combinations of elements in a list - python

So I have a list that looks a bit like this:
my_list = [0,1,1,1,0,0,1,0,1,0,1,1,0,0,0,1,0,1,1,0,1 ... 0,1,0]
It contains thousands of 0s and 1s. I'm looking for a way to find repeating combinations of elements in it (runs of 10 consecutive elements, to be specific). So, for example, if there is a
... 0,1,1,1,0,0,1,1,0,1 ...
combination and it appears more than once I would like to know where it is in my list (index) and how many times it repeats.
I need to check all possible combinations here, that is 1024 possibilities...

Here is a solution using regex:
import random
from itertools import product
import re
testlist = [str(random.randint(0,1)) for i in range(1000)]
testlist_str = "".join(testlist)
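for pattern in ("".join(seq) for seq in product("01", repeat=10)):
    # the lookahead (?=...) lets findall count overlapping matches as well
    matches = re.findall(f"(?={pattern})", testlist_str)
    print(f'pattern {pattern} has {len(matches)} matches')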
outputs:
pattern 0000000000 has 0 matches
pattern 0000000001 has 0 matches
pattern 0000000010 has 1 matches
pattern 0000000011 has 2 matches
pattern 0000000100 has 2 matches
pattern 0000000101 has 2 matches
....
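The question also asks where each pattern occurs, not only how often. A minimal sketch of how the regex approach could report start indexes, assuming overlapping occurrences should be counted; the helper name `overlapping_starts` is made up for illustration:

```python
import re

def overlapping_starts(pattern, text):
    """Return the start index of every (possibly overlapping) occurrence."""
    # A zero-width lookahead consumes no characters, so the next search
    # resumes one position later instead of after the whole match.
    return [m.start() for m in re.finditer(f"(?={re.escape(pattern)})", text)]

text = "0101010"
starts = overlapping_starts("0101", text)
print(starts, len(starts))  # [0, 2] 2
```

A plain `re.findall("0101", text)` would report only one match here, because the second occurrence overlaps the first.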

It looks like a homework problem, so I don't want to give the solution at once, just hints.
Don't take it literally. These are 0s and 1s, so you can treat them as the digits of binary numbers.
Some hints:
1024 "patterns" become just numbers from 0 to 1023.
Checking for a pattern is making a number from those 10 digits.
Think how you would do that then.
More hints, more technical:
If you have the number for one pattern, e.g. from the 0th to the 9th element, you can get the number for the 1st to 10th elements by keeping the low 9 digits (the elements at indexes 1 to 9, i.e. value % 512), shifting them left (* 2), and adding the element at index 10.
Make a dictionary or a list of lists where the key/index is the pattern number (0 to 1023) and each list contains the start indexes of that pattern.
I'll edit this answer later to provide an example solution but I gotta take a short break first.
Edit:
Customisable base and length with defaults for your case.
def find_patterns(my_list, base=2, pattern_size=10):
    modulo_value = base ** (pattern_size - 1)
    results = [[] for _ in range(base ** pattern_size)]
    current_value = 0
    for index, elem in enumerate(my_list):
        # drop the oldest digit, shift the rest left and append the new digit
        # (while the first window is still filling up, the modulo is a no-op)
        current_value = base * (current_value % modulo_value) + elem
        if index >= pattern_size - 1:
            results[current_value].append(index + 1 - pattern_size)  # index of the first element in the pattern
    return results

IIUC, you could do:
my_list = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0]
w = 10
occurrences = {}
for i in range(len(my_list) - w + 1):
    key = tuple(my_list[i:i+w])
    occurrences.setdefault(key, []).append(i)
for pattern, indices in occurrences.items():
    print(pattern, indices)
Output
(0, 1, 1, 1, 0, 0, 1, 0, 1, 0) [0]
(1, 1, 1, 0, 0, 1, 0, 1, 0, 1) [1]
(1, 1, 0, 0, 1, 0, 1, 0, 1, 1) [2]
(1, 0, 0, 1, 0, 1, 0, 1, 1, 0) [3]
(0, 0, 1, 0, 1, 0, 1, 1, 0, 0) [4]
(0, 1, 0, 1, 0, 1, 1, 0, 0, 0) [5]
(1, 0, 1, 0, 1, 1, 0, 0, 0, 1) [6]
(0, 1, 0, 1, 1, 0, 0, 0, 1, 0) [7]
(1, 0, 1, 1, 0, 0, 0, 1, 0, 1) [8]
(0, 1, 1, 0, 0, 0, 1, 0, 1, 1) [9]
(1, 1, 0, 0, 0, 1, 0, 1, 1, 0) [10]
(1, 0, 0, 0, 1, 0, 1, 1, 0, 1) [11]
(0, 0, 0, 1, 0, 1, 1, 0, 1, 0) [12]
(0, 0, 1, 0, 1, 1, 0, 1, 0, 1) [13]
(0, 1, 0, 1, 1, 0, 1, 0, 1, 0) [14]

Treat the elements as bits that can be converted to integers. The solution below converts each window of the input list to an integer, then finds the number of occurrences of each integer and the indexes at which it occurs.
import collections
x = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1]
as_int = []
# given the input above, no pattern longer than 6 occurs more than once...
pattern_length = 6
# convert the input to a list of integers, one per window
# can this be done in a nicer way, like skipping the string conversion?
for s in range(len(x) - pattern_length + 1):
    bitstring = ''.join([str(b) for b in x[s:s+pattern_length]])
    as_int.append(int(bitstring, 2))
# create a Counter with the integer as key and its number of occurrences as value
count_dict = collections.Counter(as_int)
# empty dict to store the indexes for each integer
index_dict = {}
# find the indexes for each integer that occurs more than once
for key in dict(count_dict):
    if count_dict[key] > 1:
        indexes = [i for i, v in enumerate(as_int) if v == key]
        index_dict[key] = indexes
# print as binary together with its indexes
for key, value in index_dict.items():
    print('{0:06b}'.format(key), 'appears', count_dict[key], 'times, on index:', value)
Output:
101011 appears 2 times, on index: [6, 18]
010110 appears 2 times, on index: [7, 14]
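On the question in the code comment about skipping the string conversion: one possible way is to keep a running integer and update it with bit operations as the window slides. This is only a sketch under the assumption that the input contains nothing but 0s and 1s; the name `window_positions` is made up for illustration:

```python
def window_positions(bits, width):
    """Map each window value (as an int) to the list of its start indexes."""
    mask = (1 << width) - 1  # keeps only the lowest `width` bits
    value = 0
    positions = {}
    for i, bit in enumerate(bits):
        # shift the window left by one bit, add the new bit, drop the oldest
        value = ((value << 1) | bit) & mask
        if i >= width - 1:  # the first full window ends at index width - 1
            positions.setdefault(value, []).append(i - width + 1)
    return positions

x = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1]
repeated = {v: idx for v, idx in window_positions(x, 6).items() if len(idx) > 1}
for value, idx in repeated.items():
    print(format(value, '06b'), 'appears', len(idx), 'times, on index:', idx)
```

Each element is touched once, so the work grows linearly with the length of the list regardless of the window width.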

Related

How to replace every third 0 in the numpy array consists of 0 and 1?

I'm new to Python and Stackoverflow, so I'm sorry in advance if this question is silly and/or duplicated.
I'm trying to write code that replaces every nth 0 in a numpy array that consists of 0s and 1s.
For example, if I want to replace every third 0 with 0.5, the expected result is:
Input: np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1])
Output: np.array([0, 0, 0.5, 0, 1, 1, 1, 1, 1, 0, 0.5, 1, 0, 1])
And I wrote the following code.
import numpy as np
arr = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1])
counter = 0
for i in range(len(arr)):
    if arr[i] == 0 and counter%3 == 0:
        arr[i] = 0.5
    counter += 1
print(arr)
The expected output is [0, 0, 0.5, 0, 1, 1, 1, 1, 1, 0, 0.5, 1, 0, 1].
However, the output is exactly the same as input and it's not replacing any values...
Does anyone know why this does not replace value and how I can solve this?
Thank you.
Reasonably quick and dirty:
Find the indices of the entries that are zero:
indices = np.flatnonzero(arr == 0)
Take every third of those indices, starting with the third zero (position 2 among the zeros):
indices = indices[2::3]
The array needs a float dtype, because assigning 0.5 into an integer array truncates it to 0:
arr = arr.astype(float)
Set those indices to 0.5:
arr[indices] = 0.5
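Putting these steps together, a minimal sketch of a reusable helper. The name `replace_every_nth_zero` and its parameters are made up for illustration; it returns a float copy and leaves the input untouched:

```python
import numpy as np

def replace_every_nth_zero(arr, n, value):
    """Return a float copy of arr with every n-th zero replaced by value."""
    out = np.array(arr, dtype=float)       # float copy, so 0.5 is not truncated
    zero_idx = np.flatnonzero(out == 0)    # positions of all zeros, in order
    out[zero_idx[n - 1::n]] = value        # the n-th, 2n-th, 3n-th, ... zero
    return out

arr = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1])
print(replace_every_nth_zero(arr, 3, 0.5))
```

For the array from the question, the zeros sit at indexes 0, 1, 2, 3, 9, 10 and 12, so the third and sixth zeros (indexes 2 and 10) are the ones replaced.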

How to add and remove random bit padding in Python

UPDATE: I have a list of integers that represent bits values:
bits = [1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
len(bits) == 30
My question is how to add random bit padding so that the length of the bits becomes 32, and how to remove the padding afterwards.
The same applies when the length of the bits is, say, 20: how do I add 4 bits of padding so that it becomes 24, and how do I remove those 4 bits again?
Here is one approach extracted into a function:
import random
def add_padding(seq, num_bits):
    pad_size = num_bits - len(seq)
    return [random.choice([0, 1]) for _ in range(pad_size)] + seq
bit = [1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
print(len(bit))
padded = add_padding(bit, 32)
print(len(padded)) # <-- now a 32 bits sequence
To remove the padding, you need to remember the number of bits that were added to the sequence and slice them off the front:
num_bits_added = len(padded) - len(bit) # in practice, store this alongside the padded sequence
unpadded = padded[num_bits_added:] # <-- restores the original sequence of bits
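One way to "remember" the padding is to return its size together with the padded sequence. A minimal sketch, assuming the padding is always prepended; `pad_bits` and `unpad_bits` are hypothetical names, not part of any library:

```python
import random

def pad_bits(seq, num_bits):
    """Prepend random bits so the result has num_bits elements.

    Returns (padded_sequence, pad_size) so the padding can be removed later.
    """
    pad_size = num_bits - len(seq)
    if pad_size < 0:
        raise ValueError("sequence is already longer than num_bits")
    padding = random.choices([0, 1], k=pad_size)
    return padding + list(seq), pad_size

def unpad_bits(padded, pad_size):
    """Drop the first pad_size elements, restoring the original bits."""
    return padded[pad_size:]

bits = [1, 0, 1, 1, 0, 1]
padded, pad_size = pad_bits(bits, 8)
print(len(padded), pad_size)                 # 8 2
print(unpad_bits(padded, pad_size) == bits)  # True
```

Since the padding bits are random, they cannot be recognised afterwards; the pad size has to travel with the data.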
[edit]: to adjust to the closest containing number of bytes:
import random
def adjust_a_byte(seq):
    if len(seq) % 8 == 0 and len(seq) > 0: # an empty sequence will return an 8 bit sequence (all padding)
        pad_size = 0
    else:
        pad_size = 8 * (len(seq) // 8 + 1) - len(seq)
    print('len(seq):', len(seq), 'len(seq) % 8:', len(seq) % 8, 'pad', pad_size)
    return random.choices([0, 1], k=pad_size) + seq
bit = [1, 0, 1, 1, 1, 1, 1, 1]
print(len(bit))
padded = adjust_a_byte(bit)
print(len(padded)) # <-- now a multiple of 8 bits sequence

Count since last occurrence in NumPy

Seemingly straightforward problem: I want to create an array that gives the count since the last occurrence of a given condition. In this example, let the condition be that a > 0:
in: [0, 0, 5, 0, 0, 2, 1, 0, 0]
out: [0, 0, 0, 1, 2, 0, 0, 1, 2]
I assume step one would be something like np.cumsum(a > 0), but not sure where to go from there.
Edit: Should clarify that I want to do this without iteration.
Numpy one-liner:
import numpy
x = numpy.array([0, 0, 5, 0, 0, 2, 1, 0, 0])
result = numpy.arange(len(x)) - numpy.maximum.accumulate(numpy.arange(len(x)) * (x > 0))
Gives
[0, 1, 0, 1, 2, 0, 0, 1, 2]
If you want zeros before the first occurrence, set them to zero explicitly:
result[:numpy.nonzero(x)[0][0]] = 0
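The two steps can be wrapped into one function. A minimal sketch, with the made-up name `count_since_last`, that also copes with an array in which the condition never holds (where indexing the first nonzero position would fail):

```python
import numpy as np

def count_since_last(a):
    """Count elements since the last position where a > 0, without a Python loop."""
    a = np.asarray(a)
    hit = a > 0
    idx = np.arange(len(a))
    # index of the most recent hit at or before each position (0 if none yet)
    last_hit = np.maximum.accumulate(np.where(hit, idx, 0))
    result = idx - last_hit
    # positions before the first hit have nothing to count from: force them to 0
    seen = np.cumsum(hit) > 0
    result[~seen] = 0
    return result

print(count_since_last([0, 0, 5, 0, 0, 2, 1, 0, 0]))  # [0 0 0 1 2 0 0 1 2]
```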
Split the array based on the condition and use the lengths of the remaining pieces and the condition state of the first and last element in the array.
A pure python solution:
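result = []
delta = 0
for val in [0, 0, 5, 0, 0, 2, 1, 0, 0]:
    delta += 1
    if val > 0:
        delta = 0
    result.append(delta)
# note: elements before the first val > 0 count up from 1,
# so this gives [1, 2, 0, 1, 2, 0, 0, 1, 2]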

How to check for adjacency in list, then fix adjacency in python

I have this list:
row = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
I need to then shuffle or randomize the list:
shuffle(row)
And then I need to go through and find any adjacent 1's and move them so that they are separated by at least one 0. For example I need the result to look like this:
row = [0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0]
I am not sure what the most efficient way is to search for adjacent 1's and then move them so that they aren't adjacent. I will also be doing this repeatedly to come up with multiple combinations of this row.
Originally when the list was shorter I did it this way:
from itertools import permutations
row = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
rowlist = set(list(permutations(row)))
rowschemes = [(0, 0) + x for x in rowlist if '1, 1' not in str(x)]
But now that my row is 20 elements long this takes forever to come up with all the possible permutations.
Is there an efficient way to go about this?
I had a moderately clever partition-based approach in mind, but since you said there are always 20 numbers and 6 1s, and 6 is a pretty small number, you can construct all the possible locations (38760) and toss the ones which are invalid. Then you can uniformly draw from those, and build the resulting row:
import random
from itertools import combinations
def is_valid(locs):
    return all(y-x >= 2 for x,y in zip(locs, locs[1:]))
def fill_from(size, locs):
    locs = set(locs)
    return [int(i in locs) for i in range(size)]
and then
>>> size = 20
>>> num_on = 6
>>> on_locs = list(filter(is_valid, combinations(range(size), num_on)))
>>> len(on_locs)
5005
>>> fill_from(size, random.choice(on_locs))
[0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
>>> fill_from(size, random.choice(on_locs))
[0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
>>> fill_from(size, random.choice(on_locs))
[1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1]
Why not go directly for what you want? Something like:
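import random
# every "01" token carries its own leading 0, so two 1's can never touch
row = ["0","0","0","0","0","0","0","0","0","01","01","01","01","01","01"]
random.shuffle(row)
# the joined string has 21 characters and always starts with "0"; drop it to get 20
print([int(c) for c in "".join(row)[1:]])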
Since the number of 1's in a row is fixed and you don't want any 1's to be adjacent, let m be the number of 1's and let k be the number of 0's in the row. Then you want to place the m 1's randomly into the (k+1) gaps around the 0's (before the first 0, between consecutive 0's, and after the last 0) so that there is at most one 1 in each gap. This amounts to choosing a random subset of size m from the set {1, 2, ..., k+1}; there are ((k+1) choose m) such subsets. This is easy to do. Given the random choice of subset, you can construct your random arrangement of 0's and 1's so that no two 1's are adjacent. The random choice algorithm takes O(m) time.
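A minimal sketch of this gap-selection idea, using `random.sample` to pick the subset. The function name `random_separated_row` is made up for illustration, and the gaps are numbered from 0 rather than 1:

```python
import random

def random_separated_row(num_ones, num_zeros):
    """Return a random 0/1 list in which no two 1's are adjacent."""
    # num_zeros zeros leave num_zeros + 1 gaps; choose which gaps get a single 1
    chosen = set(random.sample(range(num_zeros + 1), num_ones))
    row = []
    for gap in range(num_zeros + 1):
        if gap in chosen:
            row.append(1)
        if gap < num_zeros:
            row.append(0)  # the zero that follows this gap
    return row

print(random_separated_row(6, 14))  # 20 elements, six separated 1's
```

`random.sample` raises `ValueError` if more 1's are requested than there are gaps, which is exactly the case where no valid arrangement exists.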
Place the 6 1's and 5 of the 0's in a list, giving
row = [1,0,1,0,1,0,1,0,1,0,1]
Then insert the remaining 9 0's one by one at random positions in the (growing) list.
for i in range(11, 20):
    row.insert(random.randint(0, i), 0)

how to use the steady state probability to select a state in each iteration of the python code?

I have an ergodic Markov chain with three states, and I have calculated its steady-state probabilities.
The states represent the inputs of my problem.
I want to solve my problem for n iterations, and in each iteration select the input according to the calculated steady-state probabilities.
In other words, this is the same as having three options with specific probabilities and selecting one of them at random in each iteration.
Do you have any suggestions?
Best,
Aissan
Let's assume you have a vector of probabilities (instead of just 3) and also that your initial state is the first one.
import random
def markov(probs, iter):
    # normalize the probabilities
    total = sum(probs)
    probs = [float(e) / total for e in probs]
    # determine the number of states
    n = len(probs)
    # Set the initial state
    s = 0
    for i in range(iter):
        thresh = random.random()
        buildup = 0
        # When the running sum of our probability vector reaches `thresh`
        # we've found the next state
        for j in range(n):
            buildup += probs[j]
            if buildup >= thresh:
                break
        # Set the new state
        s = j
    return s
And thus
>>> markov([1,1,1], 100)
2
>>> markov([1,1,1], 100)
1
But this only returns the last state. It's easy to fix this with a neat trick, though. Let's turn this into a generator. We literally just need one more line, yield s.
def markov(probs, iter):
    # ...
    for i in range(iter):
        # Yield the current state
        yield s
        # ...
        for j in range(n):
            # ...
Now when we call markov we don't get an immediate response.
>>> g = markov([1,1,1], 100)
>>> g
<generator object markov at 0x10fce3280>
Instead we get a generator object, which is kind of like a "frozen" loop. You can step it once with next:
>>> next(g)
1
>>> next(g)
1
>>> next(g)
2
Or even unwind the whole thing with list
>>> list(markov([1,1,1], 100))
[0, 0, 1, 1, 0, 0, 0, 2, 1, 1, 2, 0, 1, 0, 0, 1, 2, 2, 2, 1, 2, 0, 1, 2, 0, 1, 2, 2, 2, 2, 1, 0, 0, 0, 2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 0, 0, 2, 2, 1, 0, 0, 0, 2, 0, 2, 2, 1, 0, 1, 1, 1, 2, 2, 2, 2, 0, 2, 1, 0, 0, 1, 2, 0, 0, 1, 2, 2, 0, 0, 1, 2, 1, 0, 0, 1, 0, 2, 1, 1, 0, 1, 1, 2, 2, 2, 1, 1, 0, 0, 0]
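If independent draws according to the steady-state probabilities are all that is needed, the standard library's `random.choices` accepts weights directly. A minimal sketch, assuming a made-up steady-state vector `pi` and a hypothetical helper name `sample_states`:

```python
import random

def sample_states(pi, n):
    """Draw n state indexes independently, state i with probability ~ pi[i]."""
    # weights need not sum to 1; random.choices normalises them internally
    return random.choices(range(len(pi)), weights=pi, k=n)

pi = [0.2, 0.5, 0.3]  # made-up steady-state probabilities, for illustration only
for state in sample_states(pi, 10):
    print("selected input:", state)
```

As with the loop-based version above, each draw is independent of the previous one, which matches sampling from the steady-state distribution rather than simulating the chain's transitions.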
