list.insert(position, value) grows list indefinitely - python

If I run the following code:
data = list()
length = 10
for i in range(1000):
    point = i % length
    data.insert(point, i)
len(data)
The output is: 1000
I was expecting the length to be 10 as I am restricting point to be in range 0-9.
What am I doing wrong?

insert() adds a new element at the given position and shifts the existing ones along, so the list grows on every call. To overwrite old elements instead, assign by index:
length = 10
data = [None] * length
for i in range(1000):
    point = i % length
    data[point] = i
len(data)
=> 10
Although it's not clear why you want to loop 1000 times when only the last 10 values are needed... Wouldn't it be better to use range(990, 1000)?
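To make the difference concrete, here is a minimal sketch with a shorter loop (5 iterations, 2 slots; the names grown and slots are only for illustration). It shows insert() growing the list while index assignment keeps the length fixed:

```python
# insert() adds a new element each time, shifting later items to the right
grown = []
for i in range(5):
    grown.insert(i % 2, i)
print(grown, len(grown))   # [4, 2, 3, 0, 1] 5

# index assignment overwrites an existing slot, so the length never changes
slots = [None] * 2
for i in range(5):
    slots[i % 2] = i
print(slots, len(slots))   # [4, 3] 2
```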


removing the middle member or members from list in python

I have the code below. It runs without errors, but it does not print the values I expect.
I think something is wrong with how I call the function, but I can't figure out what.
The function should remove the middle element if the list length is odd, or the middle two elements if the length is even.
This is the code:
One_Ten = [1,2,3,4,5,6,7,8,9,10]
def removeMiddle(data:list)-> list:
    index = 0
    size = len(data)
    index = size // 2
    if (size % 2 == 0 ):
        data = data[:index-1] + data[index+1:]
    if (size % 2 == 1):
        data.pop(index)
    return data
data = list(One_Ten)
removeMiddle(data)
print("After removing the middle element (s):why ", data)
so the desired output should look like
[1,2,3,4,7,8,9,10]
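In the even case, the slice expression builds a new list and rebinds the local name data, so the caller's list is left unchanged. You just need to assign the returned value back to data: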
data = removeMiddle(data)
Alternatively, you can make the function work in place by changing the first condition:
    if (size % 2 == 0):
        data.pop(index)
        data.pop(index-1)
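Putting the in-place idea together, a complete function might look like the sketch below. The name remove_middle_in_place and the use of del with a slice are my own choices for illustration, not part of the original code:

```python
def remove_middle_in_place(data: list) -> list:
    """Remove the middle element (odd length) or middle two (even length) in place."""
    size = len(data)
    if size == 0:
        return data                      # nothing to remove
    index = size // 2
    if size % 2 == 0:
        del data[index - 1:index + 1]    # drop the two middle elements
    else:
        del data[index]                  # drop the single middle element
    return data

one_ten = list(range(1, 11))
remove_middle_in_place(one_ten)          # return value can be ignored
print(one_ten)                           # [1, 2, 3, 4, 7, 8, 9, 10]
```

Because the list object itself is mutated, the caller sees the change even without reassigning the result.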

Q: Expected number of coin tosses to get N heads in a row, in Python. My code gives answers that don't match published correct ones, but unsure why

I'm trying to write Python code to see how many coin tosses, on average, are required to get a sequence of N heads in a row.
The thing that I'm puzzled by is that the answers produced by my code don't match ones that are given online, e.g. here (and many other places) https://math.stackexchange.com/questions/364038/expected-number-of-coin-tosses-to-get-five-consecutive-heads
According to that, the expected number of tosses that I should need to get various numbers of heads in a row are: E(1) = 2, E(2) = 6, E(3) = 14, E(4) = 30, E(5) = 62. But I don't get those answers! For example, I get E(3) = 8, instead of 14. The code below runs to give that answer, but you can change n to test for other target numbers of heads in a row.
What is going wrong? Presumably there is some error in the logic of my code, but I confess that I can't figure out what it is.
You can see, run and make modified copies of my code here: https://trinket.io/python/17154b2cbd
Below is the code itself, outside of that runnable trinket.io page. Any help figuring out what's wrong with it would be greatly appreciated!
Many thanks,
Raj
P.S. The closest related question that I could find was this one: Monte-Carlo Simulation of expected tosses for two consecutive heads in python
However, as far as I can see, the code in that question does not actually test for two consecutive heads, but instead tests for a sequence that starts with a head and then at some later, possibly non-consecutive, time gets another head.
# Click here to run and/or modify this code:
# https://trinket.io/python/17154b2cbd
import random
# n is the target number of heads in a row
# Change the value of n, for different target heads-sequences
n = 3
possible_tosses = [ 'h', 't' ]
num_trials = 1000
target_seq = ['h' for i in range(0,n)]
toss_sequence = []
seq_lengths_rec = []
for trial_num in range(0,num_trials):
    if (trial_num % 100) == 0:
        print 'Trial num', trial_num, 'out of', num_trials
        # (The free version of trinket.io uses Python2)
    target_reached = 0
    toss_num = 0
    while target_reached == 0:
        toss_num += 1
        random.shuffle(possible_tosses)
        this_toss = possible_tosses[0]
        #print([toss_num, this_toss])
        toss_sequence.append(this_toss)
        last_n_tosses = toss_sequence[-n:]
        #print(last_n_tosses)
        if last_n_tosses == target_seq:
            #print('Reached target at toss', toss_num)
            target_reached = 1
            seq_lengths_rec.append(toss_num)
print 'Average', sum(seq_lengths_rec) / len(seq_lengths_rec)
You don't re-initialize toss_sequence for each experiment, so you start every experiment with a pre-existing sequence of heads, having a 1 in 2 chance of hitting the target sequence on the first try of each new experiment.
Initializing toss_sequence inside the outer loop will solve your problem:
import random
# n is the target number of heads in a row
# Change the value of n, for different target heads-sequences
n = 4
possible_tosses = [ 'h', 't' ]
num_trials = 1000
target_seq = ['h' for i in range(0,n)]
seq_lengths_rec = []
for trial_num in range(0,num_trials):
    if (trial_num % 100) == 0:
        print('Trial num {} out of {}'.format(trial_num, num_trials))
        # (The free version of trinket.io uses Python2)
    target_reached = 0
    toss_num = 0
    toss_sequence = []
    while target_reached == 0:
        toss_num += 1
        random.shuffle(possible_tosses)
        this_toss = possible_tosses[0]
        #print([toss_num, this_toss])
        toss_sequence.append(this_toss)
        last_n_tosses = toss_sequence[-n:]
        #print(last_n_tosses)
        if last_n_tosses == target_seq:
            #print('Reached target at toss', toss_num)
            target_reached = 1
            seq_lengths_rec.append(toss_num)
print(sum(seq_lengths_rec) / len(seq_lengths_rec))
You can simplify your code a bit, and make it less error-prone:
import random
# n is the target number of heads in a row
# Change the value of n, for different target heads-sequences
n = 3
possible_tosses = [ 'h', 't' ]
num_trials = 1000
seq_lengths_rec = []
for trial_num in range(0, num_trials):
    if (trial_num % 100) == 0:
        print('Trial num {} out of {}'.format(trial_num, num_trials))
        # (The free version of trinket.io uses Python2)
    heads_counter = 0
    toss_counter = 0
    while heads_counter < n:
        toss_counter += 1
        this_toss = random.choice(possible_tosses)
        if this_toss == 'h':
            heads_counter += 1
        else:
            heads_counter = 0
    seq_lengths_rec.append(toss_counter)
print(sum(seq_lengths_rec) / len(seq_lengths_rec))
We can eliminate one loop by making each experiment long enough (ideally infinite), e.g., tossing a coin n=1000 times per trial. It is then very likely that a sequence of 5 heads appears in each such trial. If it does appear, we call the trial an effective trial; otherwise we reject it.
In the end, we average the number of tosses needed over the effective trials (by the law of large numbers this approximates the expected number of tosses). Consider the following code:
import random
N = 100000 # total number of trials
n = 1000 # long enough sequence of tosses
k = 5 # k heads in a row
ntosses = []
pat = ''.join(['1']*k)
effective_trials = 0
for i in range(N): # num of trials
    seq = ''.join(map(str,random.choices(range(2),k=n))) # toss a coin n times (long enough times)
    if pat in seq:
        ntosses.append(seq.index(pat) + k)
        effective_trials += 1
print(effective_trials, sum(ntosses) / effective_trials)
# 100000 62.19919
Notice that the result may not be correct if n is small, since n stands in for an infinite number of coin tosses (for the expected number of tosses to obtain 5 heads in a row, n=1000 is fine since the actual expected value is 62).
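As a cross-check, the published values quoted in the question (2, 6, 14, 30, 62) fit the closed form E(n) = 2^(n+1) - 2 for a fair coin. The sketch below compares that formula with a simulation that resets its state for every trial. The function names, the trial count and the fixed seed are assumptions made for this illustration:

```python
import random

def expected_tosses(n: int) -> int:
    """Closed form for a fair coin: E(n) = 2**(n + 1) - 2."""
    return 2 ** (n + 1) - 2

def simulate(n: int, trials: int = 10000, seed: int = 0) -> float:
    """Average number of tosses until n heads in a row, over independent trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        run = tosses = 0                  # fresh state for every trial
        while run < n:
            tosses += 1
            run = run + 1 if rng.random() < 0.5 else 0
        total += tosses
    return total / trials

for n in range(1, 6):
    print(n, expected_tosses(n), round(simulate(n), 2))
```

The simulated averages should land close to the formula's values; they will not match exactly because of sampling noise.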

Given a string of a million numbers, return all repeating 3 digit numbers

I had an interview with a hedge fund company in New York a few months ago and unfortunately, I did not get the internship offer as a data/software engineer. (They also asked the solution to be in Python.)
I pretty much screwed up on the first interview problem...
Question: Given a string of a million numbers (Pi for example), write
a function/program that returns all repeating 3 digit numbers and number of
repetition greater than 1
For example: if the string was: 123412345123456 then the function/program would return:
123 - 3 times
234 - 3 times
345 - 2 times
They did not give me the solution after I failed the interview, but they did tell me that the time complexity for the solution was constant of 1000 since all the possible outcomes are between:
000 --> 999
Now that I'm thinking about it, I don't think it's possible to come up with a constant time algorithm. Is it?
You got off lightly, you probably don't want to be working for a hedge fund where the quants don't understand basic algorithms :-)
There is no way to process an arbitrarily-sized data structure in O(1) if, as in this case, you need to visit every element at least once. The best you can hope for is O(n) in this case, where n is the length of the string.
Although, as an aside, a nominal O(n) algorithm will be O(1) for a fixed input size so, technically, they may have been correct here. However, that's not usually how people use complexity analysis.
It appears to me you could have impressed them in a number of ways.
First, by informing them that it's not possible to do it in O(1), unless you use the "suspect" reasoning given above.
Second, by showing your elite skills by providing Pythonic code such as:
inpStr = '123412345123456'
# O(1) array creation.
freq = [0] * 1000
# O(n) string processing.
for val in [int(inpStr[pos:pos+3]) for pos in range(len(inpStr) - 2)]:
    freq[val] += 1
# O(1) output of relevant array values.
print ([(num, freq[num]) for num in range(1000) if freq[num] > 1])
This outputs:
[(123, 3), (234, 3), (345, 2)]
though you could, of course, modify the output format to anything you desire.
And, finally, by telling them there's almost certainly no problem with an O(n) solution, since the code above delivers results for a one-million-digit string in well under half a second. It seems to scale quite linearly as well, since a 10,000,000-character string takes 3.5 seconds and a 100,000,000-character one takes 36 seconds.
And, if they need better than that, there are ways to parallelise this sort of stuff that can greatly speed it up.
Not within a single Python interpreter of course, due to the GIL, but you could split the string into something like (overlap indicated by vv is required to allow proper processing of the boundary areas):
    vv
123412  vv
    123451
        5123456
You can farm these out to separate workers and combine the results afterwards.
The splitting of input and combining of output are likely to swamp any saving with small strings (and possibly even million-digit strings) but, for much larger data sets, it may well make a difference. My usual mantra of "measure, don't guess" applies here, of course.
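The split-and-combine idea can be sketched as follows. This is only an illustration: the helper names are hypothetical, the chunk boundaries differ from the diagram above, and the built-in map runs sequentially. A process pool's map (for example multiprocessing.Pool.map) could be substituted to actually use several workers.

```python
from collections import Counter

def count_triples(chunk: str) -> Counter:
    """Count every 3-character window in one chunk."""
    return Counter(chunk[i:i + 3] for i in range(len(chunk) - 2))

def split_with_overlap(text: str, parts: int, overlap: int = 2) -> list:
    """Split text into chunks that each carry `overlap` extra trailing characters."""
    step = -(-len(text) // parts)        # ceiling division
    return [text[start:start + step + overlap]
            for start in range(0, len(text), step)]

text = '123412345123456'
chunks = split_with_overlap(text, parts=3)

# Each window starts in exactly one chunk, so the partial counts can simply be added.
total = sum(map(count_triples, chunks), Counter())
repeats = {k: v for k, v in sorted(total.items()) if v > 1}
print(repeats)                           # {'123': 3, '234': 3, '345': 2}
```

The two-character overlap ensures that windows straddling a chunk boundary are counted once, by the chunk in which they start.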
This mantra also applies to other possibilities, such as bypassing Python altogether and using a different language which may be faster.
For example, the following C code, running on the same hardware as the earlier Python code, handles a hundred million digits in 0.6 seconds, roughly the same amount of time as the Python code processed one million. In other words, much faster:
#include <stdio.h>
#include <string.h>
int main(void) {
    static char inpStr[100000000+1];
    static int freq[1000];
    // Set up test data.
    memset(inpStr, '1', sizeof(inpStr));
    inpStr[sizeof(inpStr)-1] = '\0';
    // Need at least three digits to do anything useful.
    if (strlen(inpStr) <= 2) return 0;
    // Get initial feed from first two digits, process others.
    int val = (inpStr[0] - '0') * 10 + inpStr[1] - '0';
    char *inpPtr = &(inpStr[2]);
    while (*inpPtr != '\0') {
        // Remove hundreds, add next digit as units, adjust table.
        val = (val % 100) * 10 + *inpPtr++ - '0';
        freq[val]++;
    }
    // Output (relevant part of) table.
    for (int i = 0; i < 1000; ++i)
        if (freq[i] > 1)
            printf("%3d -> %d\n", i, freq[i]);
    return 0;
}
Constant time isn't possible. All 1 million digits need to be looked at at least once, so that is a time complexity of O(n), where n = 1 million in this case.
For a simple O(n) solution, create an array of size 1000 that represents the number of occurrences of each possible 3 digit number. Advance 1 digit at a time, first index == 0, last index == 999997, and increment array[3 digit number] to create a histogram (count of occurrences for each possible 3 digit number). Then output the content of the array with counts > 1.
A million is small for the answer I give below. Assuming only that you have to be able to run the solution in the interview without a noticeable pause, the following works in less than two seconds and gives the required result:
from collections import Counter
def triple_counter(s):
    # range must run to len(s) inclusive so the final triple s[-3:] is counted
    c = Counter(s[n-3: n] for n in range(3, len(s) + 1))
    for tri, n in c.most_common():
        if n > 1:
            print('%s - %i times.' % (tri, n))
        else:
            break
if __name__ == '__main__':
    import random
    s = ''.join(random.choice('0123456789') for _ in range(1_000_000))
    triple_counter(s)
Hopefully the interviewer would be looking for use of the standard library's collections.Counter class.
Parallel execution version
I wrote a blog post on this with more explanation.
The simple O(n) solution would be to count each 3-digit number:
for nr in range(1000):
    # Note: str.count() counts non-overlapping matches only
    cnt = text.count('%03d' % nr)
    if cnt > 1:
        print('%03d is found %d times' % (nr, cnt))
This would search through all 1 million digits 1000 times.
Traversing the digits only once:
counts = [0] * 1000
for idx in range(len(text)-2):
    counts[int(text[idx:idx+3])] += 1
for nr, cnt in enumerate(counts):
    if cnt > 1:
        print('%03d is found %d times' % (nr, cnt))
Timing shows that iterating only once over the index is twice as fast as using count.
Here is a NumPy implementation of the "consensus" O(n) algorithm: walk through all triplets and bin as you go. Binning works as follows: upon encountering, say, "385", add one to bin[3, 8, 5], which is an O(1) operation. Bins are arranged in a 10x10x10 cube. As the binning is fully vectorized, there is no explicit loop in the code.
def setup_data(n):
    import random
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))
def f_np(text):
    # Get the data into NumPy
    import numpy as np
    a = np.frombuffer(bytes(text, 'utf8'), dtype=np.uint8) - ord('0')
    # Rolling triplets
    a3 = np.lib.stride_tricks.as_strided(a, (3, a.size-2), 2*a.strides)
    bins = np.zeros((10, 10, 10), dtype=int)
    # Next line performs O(n) binning
    np.add.at(bins, tuple(a3), 1)
    # Filtering is left as an exercise
    return bins.ravel()
def f_py(text):
    counts = [0] * 1000
    for idx in range(len(text)-2):
        counts[int(text[idx:idx+3])] += 1
    return counts
import numpy as np
import types
from timeit import timeit
for n in (10, 1000, 1000000):
    data = setup_data(n)
    ref = f_np(**data)
    print(f'n = {n}')
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            assert np.all(ref == func(**data))
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(**data)', globals={'f':func, 'data':data}, number=10)*100))
        except:
            print("{:16s} apparently crashed".format(name[2:]))
Unsurprisingly, NumPy is a bit faster than @Daniel's pure Python solution on large data sets. Sample output:
# n = 10
# np 0.03481400 ms
# py 0.00669330 ms
# n = 1000
# np 0.11215360 ms
# py 0.34836530 ms
# n = 1000000
# np 82.46765980 ms
# py 360.51235450 ms
I would solve the problem as follows:
def find_numbers(str_num):
    final_dict = {}
    buffer = {}
    # len - 2 so that the last 3-digit window is included
    for idx in range(len(str_num) - 2):
        num = int(str_num[idx:idx + 3])
        if num not in buffer:
            buffer[num] = 0
        buffer[num] += 1
        if buffer[num] > 1:
            final_dict[num] = buffer[num]
    return final_dict
Applied to your example string, this yields:
>>> find_numbers("123412345123456")
{345: 2, 234: 3, 123: 3}
This solution runs in O(n) for n being the length of the provided string, and is, I guess, the best you can get.
As I understand it, you cannot solve this in constant time. It takes at least one pass over the million-digit number (assuming it is a string). You can slide a 3-digit window over the digits and, for each window, increment the value of its hash key by 1 if the key already exists, or create a new key (initialized to 1) if it does not yet exist in the dictionary.
The code will look something like this:
def calc_repeating_digits(number):
    number = str(number)  # make sure we are slicing a string
    hash = {}
    for i in range(len(number)-2):
        current_three_digits = number[i:i+3]
        if current_three_digits in hash:
            hash[current_three_digits] += 1
        else:
            hash[current_three_digits] = 1
    return hash
You can filter down to the keys which have item value greater than 1.
As mentioned in another answer, you cannot do this algorithm in constant time, because you must look at at least n digits. Linear time is the fastest you can get.
However, the algorithm can be done in O(1) space. You only need to store the counts of each 3 digit number, so you need an array of 1000 entries. You can then stream the number in.
My guess is that either the interviewer misspoke when they gave you the solution, or you misheard "constant time" when they said "constant space."
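To illustrate the constant-space point, here is a rough sketch that consumes the digits one at a time and keeps only a 1000-entry table plus a rolling 3-digit value. The function name and the assumption that the input is an iterable of digit characters are mine, not from the answer above:

```python
from typing import Iterable, List

def count_triples_streaming(digits: Iterable[str]) -> List[int]:
    """O(n) time, O(1) extra space: a fixed table of 1000 counters."""
    counts = [0] * 1000
    window = 0          # numeric value of the last (up to) three digits
    seen = 0            # how many digits have been consumed so far
    for ch in digits:
        window = (window % 100) * 10 + int(ch)   # drop hundreds, append new digit
        seen += 1
        if seen >= 3:   # only count once a full 3-digit window exists
            counts[window] += 1
    return counts

counts = count_triples_streaming(iter("123412345123456"))
repeats = {f"{n:03d}": c for n, c in enumerate(counts) if c > 1}
print(repeats)          # {'123': 3, '234': 3, '345': 2}
```

Since the input is only iterated, it could equally come from a file read in small pieces rather than from a string held in memory.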
Here's my answer:
from timeit import timeit
from collections import Counter
import types
import random
def setup_data(n):
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))
def f_counter(text):
    c = Counter()
    for i in range(len(text)-2):
        ss = text[i:i+3]
        c.update([ss])
    return (i for i in c.items() if i[1] > 1)
def f_dict(text):
    d = {}
    for i in range(len(text)-2):
        ss = text[i:i+3]
        if ss not in d:
            d[ss] = 0
        d[ss] += 1
    return ((i, d[i]) for i in d if d[i] > 1)
def f_array(text):
    a = [[[0 for _ in range(10)] for _ in range(10)] for _ in range(10)]
    for n in range(len(text)-2):
        i, j, k = (int(ss) for ss in text[n:n+3])
        a[i][j][k] += 1
    for i, b in enumerate(a):
        for j, c in enumerate(b):
            for k, d in enumerate(c):
                if d > 1: yield (f'{i}{j}{k}', d)
for n in (1E1, 1E3, 1E6):
    n = int(n)
    data = setup_data(n)
    print(f'n = {n}')
    results = {}
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        print("{:16s}{:16.8f} ms".format(name[2:], timeit(
            'results[name] = f(**data)', globals={'f':func, 'data':data, 'results':results, 'name':name}, number=10)*100))
    for r in results:
        print('{:10}: {}'.format(r, sorted(list(results[r]))[:5]))
The array lookup method is very fast (even faster than @paul-panzer's numpy method!). Of course, it cheats: f_array is a generator function, so the timed call only creates the generator and does none of the counting until the results are consumed. It also doesn't have to check on every iteration whether the key already exists, which is likely to help a lot.
n = 10
counter 0.10595780 ms
dict 0.01070654 ms
array 0.00135370 ms
f_counter : []
f_dict : []
f_array : []
n = 1000
counter 2.89462101 ms
dict 0.40434612 ms
array 0.00073838 ms
f_counter : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_dict : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_array : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
n = 1000000
counter 2849.00500992 ms
dict 438.44007806 ms
array 0.00135370 ms
f_counter : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_dict : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_array : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
Image as answer:
Looks like a sliding window.
Here is my solution:
from collections import defaultdict
string = "103264685134845354863"
d = defaultdict(int)
for elt in range(len(string)-2):
    d[string[elt:elt+3]] += 1
d = {key: d[key] for key in d.keys() if d[key] > 1}
With a bit of creativity in the for loop (and an additional lookup list holding True/False/None, for example), you should be able to get rid of the last line, since you only want to create dict keys for triples that have already been seen once up to that point.
Hope it helps :)
-Speaking from the perspective of C.
-You can have an int 3-d array results[10][10][10];
-Go from 0th location to n-4th location, where n is the size of the string array.
-On each location, check the current, next and next's next.
-Increment the counter as results[current][next][next's next]++;
-Print the values of
results[1][2][3]
results[2][3][4]
results[3][4][5]
results[4][5][6]
results[5][6][7]
results[6][7][8]
results[7][8][9]
-It is O(n) time, there are no comparisons involved.
-You can run some parallel stuff here by partitioning the array and calculating the matches around the partitions.
inputStr = '123456123138276237284287434628736482376487234682734682736487263482736487236482634'
count = {}
for i in range(len(inputStr) - 2):
    subNum = int(inputStr[i:i+3])
    if subNum not in count:
        count[subNum] = 1
    else:
        count[subNum] += 1
print(count)

Looking for unique numbers in a list of sets

I am running my own little experiment and need a little help with the code.
I am creating a list that stores 100 sets in index locations 0-99, with each stored set storing random numbers ranging from 1 to 100 that came from a randomly generated list containing 100 numbers.
For each set of numbers, I use the set() command to filter out any duplicates before appending this set to a list...so basically I have a list of 100 sets which contain numbers between 1-100.
I wrote a little bit of code to check the length of each set - I noticed that my sets were often 60-69 elements in length! Basically, 1/3 of all numbers is a duplicate.
The code:
from random import randint
sets = []
#Generate list containing 100 sets of sets.
#sets contain numbers between 1 and 100 and no duplicates.
for i in range(0, 100):
    nums = []
    for x in range(1, 101):
        nums.append(randint(1, 100))
    sets.append(set(nums))
#print sizes of each set
for i in range(0, len(sets)):
    print(len(sets[i]))
#I now want to create a final set
#using the data stored within all sets to
#see if there is any unique value.
So here is the bit I can't get my head around...I want to see if there is a unique number in all of those sets! What I can't work out is how I go about doing that.
I know I can directly compare a set with another set if they are stored in their own variables...but I can't work out an efficient way of looping through a list of sets and compare them all to create a new set which, I hope, might contain just one unique value!
I have seen this code in the documentation...
s.symmetric_difference_update(t)
But I can't work out how I might apply that to my code.
Any help would be greatly appreciated!!
You could use a Counter to count occurrences across all sets, keeping only the values whose count is 1:
from collections import Counter
from itertools import chain
from random import randint
sets = [{randint(1, 100) for _ in range(100)} for i in range(100)]
cn = Counter(chain.from_iterable(sets))
unique = [k for k, v in cn.items() if v == 1] # use {} to get a set
print(unique)
For an element to be unique to a single set, its count across all sets in your list must be 1.
If we use a simple example where we add a value definitely outside our range:
In [27]: from random import randint
In [28]: from collections import Counter
In [29]: from itertools import chain
In [30]: sets = [{randint(1, 100) for _ in range(100)} for i in range(0, 100)]+ [{1, 2, 102},{3,4,103}]
In [31]: cn = Counter(chain.from_iterable(sets))
In [32]: unique = [k for k, v in cn.items() if v == 1]
In [33]: print(unique)
[103, 102]
If you want to find the sets that contain any of those elements:
In [34]: for st in sets:
   ....:     if not st.isdisjoint(unique):
   ....:         print(st)
   ....:
set([1, 2, 102])
set([3, 4, 103])
For your edited part of the question you can still use a Counter dict using Counter.most_common to get the min and max occurrence:
from collections import Counter
from random import randint
MAX = 100
cn = Counter()
identified_sets = 0
sets = ({randint(1, MAX) for _ in range(MAX)} for i in range(MAX))
for i, st in enumerate(sets):
    cn.update(st)
    if len(st) < 60 or len(st) > 70:
        print("Set {} Set Length: {}, Duplicates discarded: {:.0f}% *****".
              format(i, len(st), (float((MAX - len(st)))/MAX)*100))
        identified_sets += 1
    else:
        print("Set {} Set Length: {}, Duplicates discarded: {:.0f}%".
              format(i, len(st), (float((MAX - len(st)))/MAX)*100))
#print lowest frequency
comm = cn.most_common()
print("key {} : count {}".format(comm[-1][0],comm[-1][1]))
#print highest frequency
print("key {} : count {}".format(comm[0][0], comm[0][1]))
print("Count of identified sets: {}, {:.0f}%".
      format(identified_sets, (float(identified_sets)/MAX)*100))
If you call random.seed(0) before you create the sets in this and your own code you will see they both return identical numbers.
Well, you can do:
result = set()
for s in sets:
    result.symmetric_difference_update(s)
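One caveat with this approach: a chained symmetric difference keeps every element that occurs in an odd number of sets, not only those that occur exactly once. The small sketch below (with made-up example sets) compares it with the Counter approach:

```python
from collections import Counter
from itertools import chain

sets = [{1, 2, 3}, {2, 3, 4}, {3, 5}]

# Chained symmetric difference: elements seen an odd number of times survive
odd = set()
for s in sets:
    odd.symmetric_difference_update(s)

# Counter: elements seen in exactly one set
counts = Counter(chain.from_iterable(sets))
unique = {k for k, v in counts.items() if v == 1}

print(sorted(odd))     # [1, 3, 4, 5]  (3 appears in three sets, so it survives)
print(sorted(unique))  # [1, 4, 5]
```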
After looking through the comments I decided to do things a little differently to accomplish my goal. Essentially, I realised I just wanted to check the frequency of numbers generated by a random number generator after all duplicates have been removed. I thought I could do this by using sets to remove duplicates and then using a set to remove duplicates found in sets...but this actually doesn't work!!
I also noticed that with 100 sets containing a maximum 100 possible numbers, on average the number of duplicated numbers was around 30-40%. As you increase the maximum number of sets and, thus the maximum number of numbers generated, the % of duplicated numbers discarded decreases by a clear pattern.
After further investigation, you can work out the % of discarded numbers - it's all down to the probability of hitting the same number again once it has been generated...
Anyway...thanks for the help!
The code updated:
from random import randint
sets = []
identified_sets = 0
MAX = 100
for i in range(0, MAX):
    nums = []
    for x in range(1, MAX + 1):
        nums.append(randint(1, MAX))
    nums.sort()
    print("Set %i" % i)
    print(nums)
    print()
    sets.append(set(nums))
for i in range(0, len(sets)):
    #Only relevant when using MAX == 100
    if len(sets[i]) < 60 or len(sets[i]) > 70:
        print("Set %i Set Length: %i, Duplicates discarded: %.0f%s *****" %
              (i, len(sets[i]), (float((MAX - len(sets[i])))/MAX)*100, "%"))
        identified_sets += 1
    else:
        print("Set %i Set Length: %i, Duplicates discarded: %.0f%s" %
              (i, len(sets[i]), (float((MAX - len(sets[i])))/MAX)*100, "%"))
#dictionary of numbers
count = {}
for i in range(1, MAX + 1):
    count[i] = 0
#count occurrences of numbers
for s in sets:
    for e in s:
        count[int(e)] += 1
#print lowest frequency
print("key %i : count %i" %
      (min(count, key=count.get), count[min(count, key=count.get)]))
#print highest frequency
print("key %i : count %i" %
      (max(count, key=count.get), count[max(count, key=count.get)]))
#print identified sets <60 and >70 in length as these appear less often
print("Count of identified sets: %i, %.0f%s" %
      (identified_sets, (float(identified_sets)/MAX)*100, "%"))
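The 60-70 set sizes observed above can also be estimated directly. Assuming independent uniform draws, each of the N possible values is missed by all k draws with probability (1 - 1/N)^k, so the expected number of distinct values is N * (1 - (1 - 1/N)^k). The sketch below compares that with a seeded simulation; the function names, trial count and seed are my own choices for illustration:

```python
import random

def expected_distinct(n_values: int, n_draws: int) -> float:
    """Expected number of distinct values after n_draws uniform draws from n_values."""
    return n_values * (1 - (1 - 1 / n_values) ** n_draws)

def simulated_distinct(n_values: int, n_draws: int,
                       trials: int = 2000, seed: int = 0) -> float:
    """Average size of a set built from n_draws random integers in 1..n_values."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randint(1, n_values) for _ in range(n_draws)})
    return total / trials

print(round(expected_distinct(100, 100), 2))   # 63.4
print(round(simulated_distinct(100, 100), 2))  # close to the value above
```

With N = k = 100 this gives roughly 63 distinct values, i.e. about 37% of the draws are discarded as duplicates.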
You can keep the reverse mapping as well, i.e. a mapping from each number to the set of set indexes in which it appears. This mapping should be a dict (from numbers to sets) in general, but a simple list of sets does the trick here.
(We could use Counter too, instead of keeping the whole reversed matrix)
from random import randint
sets = [set() for _ in range(100)]
byNum = [set() for _ in range(101)]  # index 0 is unused; numbers run from 1 to 100
#Generate list containing 100 sets of sets.
#sets contain numbers between 1 and 100 and no duplicates.
for setIndex in range(0, 100):
    for numIndex in range(100):
        num = randint(1, 100)
        byNum[num].add(setIndex)
        sets[setIndex].add(num)
#print sizes of each set
for setIndex, _set in enumerate(sets):
    print(setIndex, len(_set))
#I now want to create a final set
#using the data stored within all sets to
#see if there is any unique value.
for num, setIndexes in list(enumerate(byNum))[1:]:
    if len(setIndexes) == 100:
        print('number %s has appeared in all the random sets' % num)

How to partition an array equally into a predefined number of parts and loop over the folds in Python

I'm using Python and trying to loop over 10 folds. To explain the problem: I have an array of any size > 10 with any content, for example:
myArray = [12,14,15,22,16,20,30,25,21,5,3,8,11,19,40,33,23,45,65]
smallArray = []
bigArray = []
I want to do two things:
divide "myArray" into 10 equal parts [e.g. part1, part2, ..., part10]
I need to loop 10 times and each time to do the following:
smallArray = one distinct part a time
the remaining parts are assigned into "bigArray"
and keep doing this for the remaining 10 folds.
the output for example:
Loop1: smallArray = [part1], bigArray[the remaining parts except part1]
Loop2: smallArray = [part2], bigArray[the remaining parts except part2]
...
Loop10: smallArray = [part10], bigArray[the remaining parts except part10]
How to do so in Python?
l = len(myArray)
#create start and end indices for each slice
slices = ((i * l // 10, (i + 1) * l // 10) for i in xrange(0, 10))
#build (small, big) pairs
pairs = [(myArray[a:b], myArray[:a] + myArray[b:]) for a, b in slices]
for small, big in pairs:
    pass
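For Python 3 (where xrange no longer exists), the same slicing idea can be wrapped in a generator. This is a sketch under the assumption that near-equal parts are acceptable when the length is not divisible by 10; the names k_folds and my_array are hypothetical:

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def k_folds(items: Sequence[T], k: int = 10) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (small, big) pairs: one fold, and everything except that fold."""
    n = len(items)
    for i in range(k):
        a, b = i * n // k, (i + 1) * n // k      # start and end index of fold i
        yield list(items[a:b]), list(items[:a]) + list(items[b:])

my_array = [12, 14, 15, 22, 16, 20, 30, 25, 21, 5, 3, 8, 11, 19, 40, 33, 23, 45, 65]
for fold, (small, big) in enumerate(k_folds(my_array), start=1):
    print(f"Loop{fold}: smallArray = {small}, bigArray has {len(big)} items")
```

With 19 items and 10 folds, the first fold holds one element and the remaining nine hold two each.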
