Random series where none are 75% alike - Python

So, I've been trying to make a random series generator with the given numbers using an array:
the possibilities are: [0-9, 0-9, 0-9, 0-59, 0-9, 0-9, 0-9].
The only problem is that I don't want any two series to be even 75% the same (no more than 2 numbers matching in the same positions).
So here are some examples:
Good:
[1, 1, 1, 1, 1, 1, 1]
[2, 2, 1, 2, 1, 2, 2]
Not good:
[1, 1, 1, 1, 1, 1, 1]
[2, 2, 1, 2, 1, 2, 1]
So, if more than 2 numbers are the same, the second series is discarded.
And the second problem is that I want 10,000 of these series.
Sorry if I didn't explain it well; the code probably shows it better than I can.
TRIGGER WARNING!! CODE ISN'T EFFICIENT AT ALL!!
import random

TOTAL_SERIES = 10000
placement_amount = [9, 9, 9, 59, 9, 9, 9]
all_series = []

def create_series():
    global fail, success  # counters referenced elsewhere (not shown here)
    series = []
    for i in range(len(placement_amount)):
        series.append(random.randint(0, placement_amount[i]))
    for i in all_series:
        count = 0
        for j in range(len(i)):
            if series[j] == i[j]:
                count += 1
        if count > 2:
            return
    all_series.append(series)

while len(all_series) < TOTAL_SERIES:
    create_series()
The code technically works, but it takes around an hour to generate 400 of these, since the longer it runs, the harder it becomes to find a series that follows the rules.
So, my question is: how do I make it more efficient, so it can generate 10,000 series as fast as possible?
What I've tried so far:
Tried adding CUDA so I could run the code on a GPU to speed it up (I have 32-bit Python, so I can't).
Tried creating a few threads, each generating 10,000/threads series, and then running code that deletes all the ones that don't follow the rules (the code just got stuck).
I'm open to hearing how I can try these again with correct code, or anything else that will make it efficient.

The answer for me wasn't code efficiency: it's simply impossible to make 10,000 such series under the original rule, because the first 3 positions admit only 10 × 10 × 10 = 1,000 distinct combinations, so by the pigeonhole principle two of any 10,000 series must agree in at least those 3 positions. So I changed the line:
if count > 2:
to
if count > 3:
Thanks everyone for the help, but if you have a way to make it more efficient, that would be nice :D
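To sanity-check the pigeonhole argument empirically, here is a quick sketch (mine, not part of the original code): it generates 10,000 unconstrained series and counts collisions on the first three positions.

from collections import Counter
import random

placement_amount = [9, 9, 9, 59, 9, 9, 9]
series = [tuple(random.randint(0, a) for a in placement_amount) for _ in range(10000)]
# only 10*10*10 = 1000 distinct value triples exist for the first three
# positions, so 10,000 series cannot all differ there
first_three = Counter(s[:3] for s in series)
print(max(first_three.values()))  # always >= 10 by pigeonhole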

Your original solution is O(P(N)·N); you can reduce it to O(P(N)) with dicts, by precomputing the different index combinations:
- P(N) is the expected number of candidate iterations needed to get N such series
- the constants are larger!
import itertools
import random

indexes = list(itertools.combinations(range(7), 3))
big_dict = {k: {} for k in indexes}
TOTAL_SERIES = 1000
placement_amount = [9, 9, 9, 59, 9, 9, 9]
all_series = []
loops = 0
while len(all_series) < TOTAL_SERIES:
    loops += 1
    candidate = tuple(random.randint(0, amount) for amount in placement_amount)
    # reject the candidate if any of its 3-position value triples was seen before
    if any((candidate[index[0]], candidate[index[1]], candidate[index[2]]) in big_dict[index]
           for index in indexes):
        continue
    else:
        for index in indexes:
            big_dict[index][(candidate[index[0]], candidate[index[1]], candidate[index[2]])] = True
        all_series.append(candidate)
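The same bookkeeping reads a little more simply with one flat set keyed by (index-triple, value-triple) pairs instead of nested dicts; a sketch of that variant (my rewrite, same behaviour):

import itertools
import random

indexes = list(itertools.combinations(range(7), 3))
seen = set()
TOTAL_SERIES = 1000
placement_amount = [9, 9, 9, 59, 9, 9, 9]
all_series = []
while len(all_series) < TOTAL_SERIES:
    candidate = tuple(random.randint(0, amount) for amount in placement_amount)
    keys = [(index, tuple(candidate[i] for i in index)) for index in indexes]
    if any(key in seen for key in keys):
        continue
    seen.update(keys)
    all_series.append(candidate)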

This must be the solution:
import random

def gen_series(pattern):
    return [random.randint(0, max_val) for max_val in pattern]

pattern = [9, 9, 9, 59, 9, 9, 9]
for i in range(100):
    print(gen_series(pattern))

Related

How to improve the time complexity of removing all multiplicands from an array or list?

I am trying to find the elements of an array (integer array) or list which are unique and not divisible by any other element of the same array or list.
You can answer in any language, like Python, Java, C, C++, etc.
I have tried this code in Python 3 and it works perfectly, but I am looking for a better, more optimal solution in terms of time complexity.
Assuming the array or list A is already sorted and has unique elements:
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
i = 0  # these start values were missing from the posted snippet
j = 1
while i < len(A) - 1:
    while j < len(A):
        if A[j] % A[i] == 0:
            A.pop(j)
        else:
            j += 1
    i += 1
    j = i + 1
For the given array A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16], the answer would be ans = [2,3,5,7,11,13].
Another example: for A = [4,5,15,16,17,23,39], ans would be [4,5,17,23,39].
ans contains unique numbers;
an element i from the array is kept only if i % j != 0 for every other element j, where i != j.
I think it's more natural to do it in reverse, by building a new list containing the answer instead of removing elements from the original list. If I'm thinking correctly, both approaches do the same number of mod operations, but you avoid the issue of removing an element from a list.
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
ans = []
for x in A:
    for y in ans:
        if x % y == 0:
            break
    else:
        ans.append(x)
Edit: promoted the completion else (the else clause attached to the for loop).
This algorithm will perform much faster:
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
if (A[-1]-A[0])/A[0] > len(A)*2:
    result = list()
    for v in A:
        for f in result:
            d, m = divmod(v, f)
            if m == 0: v = 0; break
            if d < f: break
        if v: result.append(v)
else:
    retain = set(A)
    minMult = 1
    maxVal = A[-1]
    for v in A:
        if v not in retain: continue
        minMult = v*2
        if minMult > maxVal: break
        if v*len(A) < maxVal:
            retain.difference_update([m for m in retain if m >= minMult and m % v == 0])
        else:
            retain.difference_update(range(minMult, maxVal, v))
            if maxVal % v == 0:
                maxVal = max(retain)
    result = list(retain)
print(result) # [2, 3, 5, 7, 11, 13]
In the spirit of the sieve of Eratosthenes, each number that is retained removes its multiples from the remaining eligible numbers. Depending on the magnitude of the highest value, it is sometimes more efficient to exclude multiples than to check for divisibility: the divisibility check takes several times longer for an equivalent number of factors to check.
At some point, when the data is widely spread out, assembling the result instead of removing multiples becomes faster (this last addition was inspired by Imperishable Night's post).
TEST RESULTS
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] (100000 repetitions)
Original: 0.55 sec
New: 0.29 sec
A = list(range(2,5000))+[9697] (100 repetitions)
Original: 3.77 sec
New: 0.12 sec
A = list(range(1001,2000))+list(range(4000,6000))+[9697**2] (10 repetitions)
Original: 3.54 sec
New: 0.02 sec
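The post doesn't show the timing harness; a minimal version might look like this, with the question's pop-based loop wrapped in a function (the name original is mine):

import timeit

def original(A):
    A = A.copy()  # don't mutate the caller's list
    i, j = 0, 1
    while i < len(A) - 1:
        while j < len(A):
            if A[j] % A[i] == 0:
                A.pop(j)
            else:
                j += 1
        i += 1
        j = i + 1
    return A

A = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
print(timeit.timeit(lambda: original(A), number=100000))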
I know that this is totally insane, but I want to know what you think about this:
A = [4,5,15,16,17,23,39]
prova = [[x for x in A if x != y and y % x == 0] for y in A]
print([A[idx] for idx, x in enumerate(prova) if len(prova[idx]) == 0])
And I think it's still O(n^2).
If you care about speed more than algorithmic efficiency, numpy would be the package to use here in python:
import numpy as np
# Note: doesn't have to be sorted
a = [2, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 16, 29, 29]
a = np.unique(a)
result = a[np.all((a % a[:, None] + np.diag(a)), axis=0)]
# array([2, 3, 5, 7, 11, 13, 29])
This divides all elements by all other elements and stores the remainders in a matrix, checks which columns contain only non-zero values (other than the diagonal, which is masked by adding np.diag(a)), and selects all elements corresponding to those columns.
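To see what that broadcasting expression builds, here is the intermediate matrix for a tiny input (illustration only):

import numpy as np

a = np.array([2, 3, 4])
print(a % a[:, None])               # [[0 1 0]
                                    #  [2 0 1]
                                    #  [2 3 0]]
print(a % a[:, None] + np.diag(a))  # [[2 1 0]
                                    #  [2 3 1]
                                    #  [2 3 4]]
# column 2 (the element 4) still contains a 0 off the diagonal, so 4 is
# divisible by another element and is dropped; columns 0 and 1 survive
print(a[np.all(a % a[:, None] + np.diag(a), axis=0)])  # [2 3]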
This is O(n·M), where M is the maximum value in your list. The integers are all assumed to be non-negative. This also assumes your input list is sorted (I came to that assumption since all the lists you provided are sorted).
a = [4, 7, 7, 8]
# a = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
# a = [4, 5, 15, 16, 17, 23, 39]
M = max(a)
used = set()
final_list = []
for e in a:
    if e in used:
        continue
    else:
        used.add(e)
        # mark every multiple of e up to M as excluded
        for i in range(e, M + 1):
            if not (i % e):
                used.add(i)
        final_list.append(e)
print(final_list)
Maybe this can be optimized even further...
If the list is not sorted, then for the above method to work one must sort it first. The time complexity then becomes O(nlogn + nM), which reduces to O(nlogn) only when M is small relative to log n.
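One further optimization along those lines (my suggestion, not from the post): stepping the inner range by e visits only the actual multiples, replacing the O(n·M) full scan with a harmonic-series bound closer to O(M log M):

a = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
M = max(a)
used = set()
final_list = []
for e in a:
    if e in used:
        continue
    final_list.append(e)
    for i in range(e, M + 1, e):  # step by e: only true multiples are visited
        used.add(i)
print(final_list)  # [2, 3, 5, 7, 11, 13]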

Python methods to reduce the running time of the KTNS algorithm

I have been trying to implement an algorithm called "keep the tool needed soonest" (KTNS), but during simulations I realized that it takes too much time to run.
I want to decrease the running time, and after checking other questions about speeding up Python code, such as Is Python slower than Java/C#? [closed], I found several suggestions, but I don't know how to implement them in my code.
On my computer one call takes 0.004999876022338867 seconds, but the main problem is that the whole program calls this function 13,000 times.
Here I attach my whole code; if you have any suggestion to improve it, please don't hesitate to share it with me.
import sets   # note: Python 2 code (the sets module and itervalues below)
import numpy
import copy
import time

J = {1: [6, 7], 2: [2, 3], 3: [1, 6, 9], 4: [1, 5, 9], 5: [5, 8, 10], 6: [1, 3, 6, 8], 7: [5, 6, 8, 9], 8: [5, 7, 8], 9: [1, 4, 5, 8], 10: [1, 2, 4, 10]}

def KTNS(sigma=[10, 9, 4, 1, 6, 3, 7, 5, 2, 8], Jobs=J, m=10, capacity=4):
    t0 = time.time()
    Tools = {}
    Lin = {}
    n = len(sigma)
    for i in range(1, m + 1):
        for e in sigma:
            if i in Jobs[e]:
                Tools[i] = sets.Set([])
    count = 1
    available_tools = sets.Set()
    for e in sigma:
        for i in Jobs[e]:
            Tools[i].add(count)
            available_tools.add(i)
        count += 1
    Tin = copy.deepcopy(Tools)
    for e in Tin:
        Lin[e] = min(Tin[e])
    count = 1
    J = numpy.array([0] * m)  # note: shadows the global jobs dict inside the function
    W = numpy.array([[0] * m] * n)
    while count <= len(sigma):
        for e in Tools:
            if len(available_tools) < capacity:
                reference = len(available_tools)
            else:
                reference = capacity
            while numpy.count_nonzero(J == 1) < reference:
                min_value = min(Lin.itervalues())
                min_keys = [k for k in Lin if Lin[k] == min_value]
                temp = min_keys[0]  # min(Lin, key=Lin.get)
                if min_value > count:
                    if len(min_keys) >= 2:
                        if count == 1:
                            J[temp - 1] = 1
                            Lin[temp] = '-'
                        else:
                            J0 = W[count - 2]
                            k = 0
                            for elements in min_keys:  # tested
                                if numpy.count_nonzero(J == 1) < reference:
                                    if J0[elements - 1] == 1:
                                        J[elements - 1] = 1
                                        Lin[elements] = '-'
                                        k = 1
                                    else:
                                        pass
                                else:
                                    pass
                            if k == 0:
                                J[temp - 1] = 1
                                Lin[temp] = '-'
                    else:
                        J[temp - 1] = 1
                        Lin[temp] = '-'
                else:
                    J[temp - 1] = 1
                    Lin[temp] = '-'
            Tin[e].discard(count)
        for element in Tin:
            try:
                Lin[element] = min(Tin[element])
            except ValueError:
                Tin[element] = sets.Set([len(sigma) + 1])
                Lin[element] = len(sigma) + 1
        W[count - 1] = J
        J = numpy.array([0] * m)
        count += 1
    Cost = 0
    for e in range(1, len(sigma)):
        temp = W[e] - W[e - 1]
        temp[temp < 0] = 0
        Cost += sum(temp)
    return Cost + capacity, time.time() - t0
One recommendation - try to minimize your use of dictionaries. It looks like many of your dictionaries could instead be lists. Dictionary access is much slower than list access in python.
It looks like you could simply make Tools, Lin and Tin all be lists, e.g. Lin = [] instead of Lin = {}, and I expect you'll see a drastic improvement in performance.
You know the sizes of your 3 dictionaries, so just initialize them to the size you need. Create Lin and Tools as follows (note the parentheses; [None] * m + 1 would be a TypeError):
Lin = [None] * (m + 1)
Tools = [None] * (m + 1)
Tin = [None] * (m + 1)
This will make a list of m+1 elements (which is what you'll get with your loop from 1 through m+1). Since you're doing 1-based indexing, it leaves an empty place in Lin[0], Tools[0], etc., but you'll then be able to access Lin[1] through Lin[10], as you're currently doing.
Simple example you can try for yourself:
python3 -m timeit -s 'foo = [x for x in range(10000)]' 'foo[500]'
100000000 loops, best of 3: 0.0164 usec per loop
python3 -m timeit -s 'foo = {x: x for x in range(10000)}' 'foo[500]'
10000000 loops, best of 3: 0.0254 usec per loop
Simply by changing your dictionaries to lists, you get roughly a 1.5x improvement on each access. Your 65-second task would then take somewhere around 40 seconds.
By the way, check out the Python wiki for tips on improving speed, including lots of references on how to profile your function.
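If profiling is new to you, cProfile from the standard library is the usual starting point; a minimal sketch (assuming the KTNS function defined above is in the same script):

import cProfile
import pstats

cProfile.run('KTNS()', 'ktns.prof')             # profile one call, save stats
stats = pstats.Stats('ktns.prof')
stats.sort_stats('cumulative').print_stats(10)  # show the 10 costliest calls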

How to efficiently create a frequency table of the number of entries in an array in Python

I'm trying to implement an efficient way of creating a frequency table in python, with a rather large numpy input array of ~30 million entries. Currently I am using a for-loop, but it's taking far too long.
The input is an ordered numpy array of the form
Y = np.array([4, 4, 4, 6, 6, 7, 8, 9, 9, 9..... etc])
And I would like to have an output of the form:
Z = {4: 3, 5: 0, 6: 2, 7: 1, 8: 1, 9: 3, ... etc.} (as any data type)
Currently I am using the following implementation:
Z = pd.Series(index=np.arange(Y.min(), Y.max() + 1))
for i in range(Y.min(), Y.max() + 1):  # + 1 so the maximum value itself is counted
    Z[i] = (Y == i).sum()
Is there a quicker way of doing this or a way without iterating through a loop? Thanks for helping, and sorry if this has been asked before!
You can simply do this using Counter from the collections module. Please see the code below, which I ran for your test case.
import numpy as np
from collections import Counter
Y = np.array([4, 4, 4, 6, 6, 7, 8, 9, 9, 9,10,5,5,5])
print(Counter(Y))
It gave the following output:
Counter({4: 3, 9: 3, 5: 3, 6: 2, 7: 1, 8: 1, 10: 1})
You can easily work with this object from there. I hope this helps.
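One caveat: Counter omits values that never occur, while the desired output in the question includes zero entries such as 5:0. If you need those, you can fill them in afterwards; a small sketch:

import numpy as np
from collections import Counter

Y = np.array([4, 4, 4, 6, 6, 7, 8, 9, 9, 9, 10, 5, 5, 5])
cnt = Counter(Y)
Z = {i: cnt.get(i, 0) for i in range(Y.min(), Y.max() + 1)}
print(Z)  # {4: 3, 5: 3, 6: 2, 7: 1, 8: 1, 9: 3, 10: 1}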
If your input array x is sorted, you can do the following to get the counts in linear time:
diff1 = np.diff(x)
# get indices of the elements at which jumps occurred
jumps = np.concatenate([[0], np.where(diff1 > 0)[0] + 1, [len(x)]])
unique_elements = x[jumps[:-1]]
counts = np.diff(jumps)
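Applied to the question's sample data, the pieces look like this (illustration):

import numpy as np

x = np.array([4, 4, 4, 6, 6, 7, 8, 9, 9, 9])
diff1 = np.diff(x)
jumps = np.concatenate([[0], np.where(diff1 > 0)[0] + 1, [len(x)]])
print(x[jumps[:-1]])   # [4 6 7 8 9]  (the unique elements)
print(np.diff(jumps))  # [3 2 1 1 3]  (their counts)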
I think numpy.unique is your solution.
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.unique.html
import numpy as np
t = np.random.randint(0, 1000, 100000000)
print(np.unique(t, return_counts=True))
This takes ~4 seconds for me.
The collections.Counter approach takes ~10 seconds.
But numpy.unique returns the frequencies in an array, while collections.Counter returns a dictionary. It's up to convenience.
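If you want the dict form the question asked for, the two arrays from numpy.unique zip together directly; a small sketch:

import numpy as np

t = np.random.randint(0, 1000, 100000)
values, counts = np.unique(t, return_counts=True)
Z = dict(zip(values.tolist(), counts.tolist()))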
Edit: I cannot comment on other posts, so I'll write here that @lomereiter's solution is lightning fast (linear) and should be the accepted one.

Compare nums needs optimisation (codingame.com)

www.codingame.com
Task
Write a program which, using a given number of strengths,
identifies the two closest strengths and shows their difference as an integer.
Info
n = Number of horses
pi = strength of each horse
d = difference
1 < n < 100000
0 < pi ≤ 10000000
My code currently
def get_dif(a, b):
    return abs(a - b)

horse_str = [10, 5, 15, 17, 3, 8, 11, 28, 6, 55, 7]
n = len(horse_str)
d = 10000001
for x in range(len(horse_str)):
    for y in range(x, len(horse_str) - 1):
        d = min([get_dif(horse_str[x], horse_str[y + 1]), d])
print(d)
Test cases
[3,5,8, 9] outputs: 1
[10, 5, 15, 17, 3, 8, 11, 28, 6, 55, 7] outputs: 1
Problem
They both work, but the next test gives me a very long list of horse strengths, and I get Process has timed out. This may mean that your solution is not optimized enough to handle some cases.
How can I optimise it? Thank you!
EDIT ONE
Default code given
import sys
import math

# Auto-generated code below aims at helping you parse
# the standard input according to the problem statement.

n = int(input())
for i in range(n):
    pi = int(input())

# Write an action using print
# To debug: print("Debug messages...", file=sys.stderr)
print("answer")
Since you can use the built-in sort method (which avoids the costly hand-written bubble sort or double loop, both O(n**2) and prone to timing out on a very big list), let me propose something:
- sort the list
- compute the minimum of the absolute differences of adjacent values, passing a generator expression to the min function
The minimum has to be the absolute difference of two adjacent values. Since the list is sorted using a fast algorithm, the heavy lifting is done for you.
like this:
horse_str = [10, 5, 15, 17, 3, 8, 11, 28, 6, 55, 7]
sh = sorted(horse_str)
print(min(abs(sh[i]-sh[i+1]) for i in range(len(sh)-1)))
I also get 1 as a result (I hope I didn't miss anything)
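Wired into the auto-generated input parsing from the question, the whole submission might look like this (a sketch based on the template above, which reads one strength per line):

n = int(input())
strengths = [int(input()) for _ in range(n)]
strengths.sort()
# adjacent values in the sorted list yield the minimum difference
print(min(strengths[i + 1] - strengths[i] for i in range(n - 1)))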

Python: Specifying Positions in Lists?

I have three lists, of the form:
Day: [1, 1, 1, 2, 2, 2, 3, 3, 3, ..... n, n, n]
Wavelength: [10, 20, 30, 10, 20, 30, 10, 20, 30, ..... 10, 20, 30]
Flux: [1, 2, 3, 1, 2, 3, 1, 2, 3, ..... 1, 2, 3]
I want to split the lists so that the sections of the list with the "Day" value of 1 are separated and run through a function, and the process then repeats all the way through until it has been done for all n days.
I've tried splitting them into lists and currently have:
x = []
y = []
z = []
for i in day:
    if Day[i] == Day[i+1]:
        x.append(Day[i])
        y.append(Wavelength[i])
        z.append(Flux[i])
        i += 1
    else:
        # integrate over the Wavelength/Flux values where the value of Day is 1
        i += 1
This doesn't work, and I'm not convinced I'm going about it the best way. I'm relatively new to programming so it still takes me ages to find and fix errors!
If you use zip() to combine the three lists into one list of tuples, you can then filter it for each day you care about. (This isn't particularly efficient if you have lots of data, and will require more memory than your approach, but has the advantage of being concise, fairly pythonic, and, I believe, readable.)
data = zip(day, wavelength, flux)
for d in range(min(day), max(day)+1):
    print d, [datum for datum in data if datum[0] == d]
Instead of print you could just pass that list (the output of the […] list comprehension) to whatever function you need to run over the data (possibly with d, the day you're dealing with at that time).
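If the Day list is sorted, as in the example, itertools.groupby gives the same per-day grouping in a single pass; a variant sketch (the final print stands in for whatever per-day function you run):

from itertools import groupby

day = [1, 1, 1, 2, 2, 2, 3, 3, 3]
wavelength = [10, 20, 30, 10, 20, 30, 10, 20, 30]
flux = [1, 2, 3, 1, 2, 3, 1, 2, 3]

for d, group in groupby(zip(day, wavelength, flux), key=lambda row: row[0]):
    rows = list(group)          # all (day, wavelength, flux) tuples for day d
    wl = [r[1] for r in rows]
    fl = [r[2] for r in rows]
    print(d, wl, fl)            # e.g. integrate wl/fl for this day here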
