Weird behaviour using LabelShuffleSplit (Python, scikit-learn)

Following the scikit-learn documentation for LabelShuffleSplit, I wish to randomise my train/validation batches to ensure I'm training on all possible data (e.g. for an ensemble).
According to the docs, I should see something like this (note that the train/validation sets are split evenly via test_size=0.5):
>>> from sklearn.cross_validation import LabelShuffleSplit
>>> labels = [1, 1, 2, 2, 3, 3, 4, 4]
>>> slo = LabelShuffleSplit(labels, n_iter=4, test_size=0.5, random_state=0)
>>> for train, test in slo:
...     print("%s %s" % (train, test))
...
[0 1 2 3] [4 5 6 7]
[2 3 6 7] [0 1 4 5]
[2 3 4 5] [0 1 6 7]
[4 5 6 7] [0 1 2 3]
But when I tried labels = [0, 0, 0, 0, 0, 0, 0, 0], it returned:
...
[] [0 1 2 3 4 5 6 7]
[] [0 1 2 3 4 5 6 7]
[] [0 1 2 3 4 5 6 7]
[] [0 1 2 3 4 5 6 7]
(i.e. not evenly split: all the data has simply been put into the validation set?) I understand that in this case it doesn't really matter which indices are put into the train/validation sets, but I was hoping it would still be a 50:50 split.
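A likely explanation: LabelShuffleSplit's test_size is documented as a proportion of the *labels* (groups), not of the samples. With a single unique label, half of one label cannot be split off, and the test count appears to be rounded up to one label, which leaves nothing for training. The sketch below is a hypothetical re-implementation (not scikit-learn's code; the name label_shuffle_split_sketch and the ceiling rounding are assumptions inferred from the output above) that reproduces both behaviours. Its shuffles will not match scikit-learn's random_state.

```python
import math
import random


def label_shuffle_split_sketch(labels, n_iter, test_size, seed=0):
    """Hypothetical sketch: shuffle the UNIQUE labels and send a fraction of
    them (rounded up, assumed) to the test set, together with all their samples."""
    rng = random.Random(seed)
    unique = sorted(set(labels))
    n_test = math.ceil(test_size * len(unique))  # assumed rounding rule
    splits = []
    for _ in range(n_iter):
        perm = unique[:]
        rng.shuffle(perm)
        test_labels = set(perm[:n_test])
        train = [i for i, lab in enumerate(labels) if lab not in test_labels]
        test = [i for i, lab in enumerate(labels) if lab in test_labels]
        splits.append((train, test))
    return splits


# Four distinct labels: 2 labels (4 samples) go to each side.
for train, test in label_shuffle_split_sketch([1, 1, 2, 2, 3, 3, 4, 4], n_iter=2, test_size=0.5):
    print(train, test)

# One distinct label: ceil(0.5 * 1) = 1 label goes to test, so train is empty.
for train, test in label_shuffle_split_sketch([0] * 8, n_iter=2, test_size=0.5):
    print(train, test)
```

In other words, a label-based splitter can only ever split at label boundaries, so a 50:50 sample split requires at least two distinct labels.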


What is the best approach for a digit pattern like this in Python?

I was trying to print this pattern in Python. For n == 6:
1 2 3 4 5
2 3 4 5 1
3 4 5 1 2
4 5 1 2 3
5 1 2 3 4
After thinking about it a lot, I did it like this:
n = 6
for i in range(1, n):
    x = 1
    countj = 0
    for j in range(i, n):
        countj += 1
        print(j, end=" ")
        if j == n-1 and countj < n-1:
            while countj < n-1:
                print(x, end=" ")
                countj += 1
                x += 1
    print()
But I don't think this is the best approach. I searched for a better one but couldn't find a suitable one, so I came here: is there a better approach to this problem?
I would do it like this, using a rotating deque:
>>> from collections import deque
>>> n = 6
>>> d = deque(range(1, n))
>>> for _ in range(1, n):
...     print(*d)
...     d.rotate(-1)
...
1 2 3 4 5
2 3 4 5 1
3 4 5 1 2
4 5 1 2 3
5 1 2 3 4
Similar, shorter code is possible using just range slicing, but it may be a bit harder to see how it works:
>>> ns = range(1, 6)
>>> for i in ns:
...     print(*ns[i-1:], *ns[:i-1])
...
1 2 3 4 5
2 3 4 5 1
3 4 5 1 2
4 5 1 2 3
5 1 2 3 4
You could also create a mathematical function of the coordinates, which might look something like this:
>>> for row in range(5):
...     for col in range(5):
...         print((row + col) % 5 + 1, end=" ")
...     print()
...
1 2 3 4 5
2 3 4 5 1
3 4 5 1 2
4 5 1 2 3
5 1 2 3 4
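The same formula, (row + col) % n + 1, can also be evaluated for the whole grid at once with NumPy broadcasting. This is a sketch that assumes NumPy is available (the answer above does not use it): adding a column vector of row indices to a row vector of column indices produces the full n x n index-sum grid in one step.

```python
import numpy as np

n = 5  # number of distinct values per row (1..n)
rows = np.arange(n)[:, None]  # shape (n, 1): row indices 0..n-1
cols = np.arange(n)[None, :]  # shape (1, n): column indices 0..n-1
pattern = (rows + cols) % n + 1  # broadcasting yields an (n, n) grid
print(pattern)
```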
A too-clever way using list comprehension:
>>> r = range(5)
>>> [[1 + r[i - j - 1] for i in r] for j in reversed(r)]
[[1, 2, 3, 4, 5],
[2, 3, 4, 5, 1],
[3, 4, 5, 1, 2],
[4, 5, 1, 2, 3],
[5, 1, 2, 3, 4]]
more-itertools has this function:
>>> from more_itertools import circular_shifts
>>> circular_shifts(range(1, 6))
[(1, 2, 3, 4, 5),
(2, 3, 4, 5, 1),
(3, 4, 5, 1, 2),
(4, 5, 1, 2, 3),
(5, 1, 2, 3, 4)]
You can use itertools.cycle to make the sequence generated from range repeat itself, and then use itertools.islice to slice the sequence according to the iteration count:
from itertools import cycle, islice
n = 6
for i in range(n - 1):
    print(*islice(cycle(range(1, n)), i, i + n - 1))
This outputs:
1 2 3 4 5
2 3 4 5 1
3 4 5 1 2
4 5 1 2 3
5 1 2 3 4
Your 'pattern' is actually known as a Hankel matrix, commonly used in linear algebra.
So there's a scipy function for creating them.
from scipy.linalg import hankel
hankel([1, 2, 3, 4, 5], [5, 1, 2, 3, 4])
or
from scipy.linalg import hankel
import numpy as np
def my_hankel(n):
    x = np.arange(1, n)
    return hankel(x, np.roll(x, 1))
print(my_hankel(6))
Output:
[[1 2 3 4 5]
[2 3 4 5 1]
[3 4 5 1 2]
[4 5 1 2 3]
[5 1 2 3 4]]
I see lots of answers involving Python libraries. If you want a simple way to do it without them, here it is:
n = 5
arr = [[1 + (start + i) % n for i in range(n)] for start in range(n)]
arr_str = "\n".join(" ".join(str(cell) for cell in row) for row in arr)
print(arr_str)

Sudoku backtracking solver bug

OK, I've been scratching my head over this for a few hours now.
My goal was to code a sudoku solver that uses backtracking and to show the progress of the algorithm using pygame. For this I have to keep track of the events, which I do by appending them to a list named registre, as shown in the code:
def solver(self):
    self.t1 = time.time()
    if self.t1-self.t0 > self.solve_time_max:
        sys.exit(1)
    for i in range(self.grid.shape[0]):
        for j in range(self.grid.shape[1]):
            if self.grid[i][j]==0:
                for n in range(1,10):
                    self.registre.append([n,i,j])
                    if self.verify(n,i,j):
                        self.grid[i][j]=n
                        self.solver()
                        if 0 not in self.grid:
                            break
                    self.registre.append([0,i,j])
                    self.grid[i][j]=0
                return self.grid
I actually succeeded, and everything goes fine for most runs. But sometimes, for a reason I couldn't identify, this happens:
print(une_grille.grid0)
print(une_grille.grid)
print(une_grille.registre[:20])
[[0 8 0 7 0 0 0 0 2]
[5 0 0 0 0 9 0 0 0]
[0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0]
[0 1 0 0 6 0 0 0 0]
[4 0 0 9 0 0 0 0 0]
[0 0 9 0 8 0 0 0 4]
[2 0 0 0 0 0 0 8 0]
[0 0 0 0 0 0 0 0 0]]
[[1 8 3 7 4 5 6 9 2]
[5 2 4 6 1 9 3 7 8]
[6 9 7 2 3 8 1 4 5]
[3 5 2 8 7 1 4 6 9]
[9 1 8 3 6 4 2 5 7]
[4 7 6 9 5 2 8 1 3]
[7 3 9 1 8 6 5 2 4]
[2 4 1 5 9 3 7 8 6]
[8 6 5 4 2 7 9 3 1]]
[[1, 0, 0], [1, 0, 1], [0, 0, 1], [2, 0, 1], [0, 0, 1], [3, 0, 1], [1, 0, 2], [0, 0, 2], [2, 0, 2], [0, 0, 2], [3, 0, 2], [0, 0, 2], [4, 0, 2], [0, 0, 2], [5, 0, 2], [1, 0, 3], [0, 0, 3], [2, 0, 3], [0, 0, 3], [3, 0, 3]]
What is printed is simply the initial grid, the solved grid and the first 20 events in self.registre. For this run the pygame display didn't work: some numbers overlap each other and others are left blank. I am almost sure it's not a display problem, since the display function uses the list registre and works fine for most other runs. I also don't understand these events.
Complete script:
import numpy as np
import random as rd
import time
import sys

class Grid():
    """
    une Docstring
    """
    def __init__(self, nval=15, dim=(9,9), tries_max=1000, init_time_max=5e-3, solve_time_max = 1):
        self.nval = nval+1
        self.dim = dim
        self.t0 = 0
        self.t1 = 0
        self.tries_max = tries_max
        self.k = 0
        self.init_time_max = init_time_max
        self.solve_time_max = solve_time_max
        self.registre = []
        self.grid = self.create_grid()
        self.smthg = 0

    def create_grid(self):
        for tries in range(self.tries_max):
            self.k = 0
            if tries == self.tries_max -1:
                print(f"Tried {self.tries_max} times, I have failed")
                sys.exit(1)
            self.grid0 = np.zeros([self.dim[0],self.dim[1]], dtype=int)
            try:
                self.grid0 = self.initialize_board()
            except SystemExit:
                print(f"TRY #{tries}: Spent too much time initializing board. Re-trying.")
                continue
            self.grid = np.copy(self.grid0)
            try:
                self.t0 = time.time()
                self.grid = self.solver()
                if 0 not in self.grid:
                    print(f"Found grid with solution after n = {tries+1} tries!")
                    return self.grid
                else:
                    print(f"TRY #{tries} converged to null solution")
                    continue
            except SystemExit:
                print(f"TRY #{tries} too much time spent trying to solve current grid, continuing")
                continue
        print("Maximum tries reached")

    def initialize_board(self):
        for i in range(self.nval):
            rx = rd.randint(0, self.grid0.shape[0]-1)
            ry = rd.randint(0, self.grid0.shape[1]-1)
            cx = int(rx/3)
            cy = int(ry/3)
            time0 = time.time()
            while(self.grid0[rx][ry]==0):
                if time.time()-time0 > self.init_time_max:
                    sys.exit(1)
                r = rd.randint(1, 9)
                if((r in self.grid0[rx,:]) or (r in self.grid0[:,ry]) or (r in self.grid0[3*cx:3*cx+3,3*cy:3*cy+3])):
                    continue
                else:
                    self.grid0[rx][ry] = r
        return self.grid0

    def solver(self):
        self.t1 = time.time()
        if self.t1-self.t0 > self.solve_time_max:
            sys.exit(1)
        for i in range(self.grid.shape[0]):
            for j in range(self.grid.shape[1]):
                if self.grid[i][j]==0:
                    for n in range(1,10):
                        self.registre.append([n,i,j])
                        if self.verify(n,i,j):
                            self.grid[i][j]=n
                            self.solver()
                            if 0 not in self.grid:
                                break
                        self.registre.append([0,i,j])
                        self.grid[i][j]=0
                    return self.grid

    def verify(self, number, x, y):
        cx = int(x/3)
        cy = int(y/3)
        if((number in self.grid[x,:]) or (number in self.grid[:,y]) or (number in self.grid[3*cx:3*cx+3,3*cy:3*cy+3])):
            return False
        return True

game = Grid(nval = 35)
print(game.grid)
print(game.grid0)
print(game.registre[:20])
Another instance of the issue:
[[1 2 3 9 5 7 6 4 8]
[5 8 4 6 2 1 9 7 3]
[7 9 6 8 4 3 1 5 2]
[6 7 2 5 1 4 3 8 9]
[3 4 9 7 8 6 5 2 1]
[8 5 1 3 9 2 7 6 4]
[2 1 5 4 6 9 8 3 7]
[4 6 7 1 3 8 2 9 5]
[9 3 8 2 7 5 4 1 6]]
[[0 0 0 9 0 7 6 0 0]
[0 0 0 6 0 0 0 0 3]
[0 0 0 8 4 3 1 5 2]
[6 0 0 0 0 0 0 0 0]
[0 4 9 0 0 6 5 2 0]
[0 5 1 0 9 0 7 0 0]
[0 1 5 0 0 0 0 0 0]
[0 0 7 0 0 0 0 0 5]
[9 0 0 2 0 0 0 1 0]]
[[1, 0, 2], [0, 0, 2], [2, 0, 2], [0, 0, 2], [3, 0, 2], [1, 0, 3], [0, 0, 3], [2, 0, 3], [0, 0, 3], [3, 0, 3], [0, 0, 3], [4, 0, 3], [1, 0, 4], [0, 0, 4], [2, 0, 4], [0, 0, 4], [3, 0, 4], [0, 0, 4], [4, 0, 4], [0, 0, 4]]
I would really appreciate it if you could help me with this.
After you added the code that repeats the generation of grids, it becomes clear that you don't create a new instance of Grid, but mutate the existing one. In that process you should then take care to reset the state completely. You only do this partly, e.g. by resetting t0. But you don't reset registre, and so when you print the first 20 items, you are actually looking at the log of the first attempt, not the successful one.
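The effect can be reproduced without sudoku at all. The sketch below uses a hypothetical, simplified stand-in for Grid (the class Attempts, its outcomes list and the reset_log flag are illustrative inventions): each attempt appends to self.registre, and only the last attempt succeeds. Without a reset, registre[0] still belongs to the first, failed attempt. In the real Grid.create_grid, the analogous fix would be to put self.registre = [] at the top of the for tries loop, next to the existing per-try resets.

```python
class Attempts:
    """Hypothetical stand-in for Grid: every attempt logs into self.registre,
    but only some attempts succeed."""

    def __init__(self, outcomes):
        self.outcomes = outcomes  # e.g. [False, False, True]: third attempt succeeds
        self.registre = []

    def run(self, reset_log):
        for tries, ok in enumerate(self.outcomes):
            if reset_log:
                self.registre = []  # the fix: start every attempt with an empty log
            self.registre.append(f"attempt {tries} started")
            if ok:
                return tries
        return None


buggy = Attempts([False, False, True])
buggy.run(reset_log=False)
print(buggy.registre[0])  # attempt 0 started  (log of a failed attempt)

fixed = Attempts([False, False, True])
fixed.run(reset_log=True)
print(fixed.registre[0])  # attempt 2 started  (log of the successful attempt)
```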

An efficient way to concatenate rows of a 2-dim array according to a given list of pairs of indexes

Suppose I have a 2-D array with a very large number of rows, and a list of pairs of row indexes into that array. I want to create a new 2-D array whose rows are concatenations of rows of the original array, chosen according to the list of index pairs. For example:
a =
1 2 3
4 5 6
7 8 9
0 0 0
indexes = [[0,0], [0,1], [2,3]]
the returned array should be:
1 2 3 1 2 3
1 2 3 4 5 6
7 8 9 0 0 0
Obviously I can iterate over the list of indexes, but my question is whether there is a more efficient way of doing this. I should add that the list of indexes is also very large.
First convert indexes to a Numpy array:
ind = np.array(indexes)
Then generate your result as:
result = np.concatenate([a[ind[:,0]], a[ind[:,1]]], axis=1)
The result is:
array([[1, 2, 3, 1, 2, 3],
[1, 2, 3, 4, 5, 6],
[7, 8, 9, 0, 0, 0]])
Another possible formula (with the same result):
result = np.concatenate([ a[ind[:,i]] for i in range(ind.shape[1]) ], axis=1)
You can do this in one line using NumPy:
import numpy as np
a = np.arange(12).reshape(4, 3)
print(a)
b = [[0, 0], [1, 1], [2, 3]]
b = np.array(b)
print(b)
c = a[b.reshape(-1)].reshape(-1, a.shape[1]*b.shape[1])
print(c)
'''
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
[[0 0]
[1 1]
[2 3]]
[[ 0 1 2 0 1 2]
[ 3 4 5 3 4 5]
[ 6 7 8 9 10 11]]
'''
You can use horizontal stacking np.hstack:
c = np.array(indexes)
np.hstack((a[c[:,0]],a[c[:,1]]))
output:
[[1 2 3 1 2 3]
[1 2 3 4 5 6]
[7 8 9 0 0 0]]
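All of these answers rest on NumPy integer-array ("fancy") indexing, which avoids a Python-level loop over the index list. Indexing with the whole (m, 2) index array returns an (m, 2, ncols) block, one pair of rows per entry, and a reshape then merges each pair into a single row. A minimal sketch using the example data from the question:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [0, 0, 0]])
ind = np.array([[0, 0], [0, 1], [2, 3]])

picked = a[ind]  # shape (3, 2, 3): for each pair, the two selected rows
print(picked.shape)
result = picked.reshape(len(ind), -1)  # join each pair's rows end to end
print(result)
```

Because the reshape only reinterprets the contiguous result of the indexing, the same code works unchanged for index tuples of any length, not just pairs.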

Delete specific Values of an Array: Python

I have an array of shape (1179648, 909).
The problem is that some rows are filled with 0's only. I am checking for this as follows:
for i in range(spectra1Only.shape[0]):
    for j in range(spectra1Only.shape[1]):
        if spectra1Only[i,j] == 0:
            pass  # here I want to remove row i
I now want to remove the whole row i if any 0 appears in it, leaving a smaller array with only the data I need.
My question is: what would be the best method to do so? Remove? Del? numpy.delete? Or any other method?
You can use Boolean indexing with np.any along axis=1:
spectra1Only = spectra1Only[~(spectra1Only == 0).any(1)]
Here's a demonstration:
A = np.random.randint(0, 9, (5, 5))
print(A)
[[5 0 3 3 7]
[3 5 2 4 7]
[6 8 8 1 6]
[7 7 8 1 5]
[8 4 3 0 3]]
print(A[~(A == 0).any(1)])
[[3 5 2 4 7]
[6 8 8 1 6]
[7 7 8 1 5]]
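Note that the question mentions two different criteria: rows "filled with 0's only" and rows where "any 0" appears. The same Boolean-mask idea covers both; swapping .any for .all drops only the rows that are entirely zero. A small sketch on a hand-made array:

```python
import numpy as np

A = np.array([[5, 0, 3],
              [0, 0, 0],
              [3, 5, 2]])

no_zero_rows = A[~(A == 0).any(axis=1)]  # drop rows containing at least one 0
not_all_zero = A[~(A == 0).all(axis=1)]  # drop only rows that are all 0
print(no_zero_rows)
print(not_all_zero)
```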

Python numpy: reshape list into repeating 2D array

I'm new to Python and I have a question about numpy.reshape. I currently have 2 lists of values like this:
x = [0,1,2,3]
y = [4,5,6,7]
And I want them to be in separate 2D arrays, where each item is repeated for the length of the original lists, like this:
xx = [[0,0,0,0]
[1,1,1,1]
[2,2,2,2]
[3,3,3,3]]
yy = [[4,5,6,7]
[4,5,6,7]
[4,5,6,7]
[4,5,6,7]]
Is there a way to do this with numpy.reshape, or is there a better method I could use? I would very much appreciate a detailed explanation. Thanks!
numpy.meshgrid will do this for you.
N.B. From your requested output, it looks like you want 'ij' indexing, not the default 'xy'.
from numpy import meshgrid
x = [0,1,2,3]
y = [4,5,6,7]
xx,yy=meshgrid(x,y,indexing='ij')
print(xx)
[[0 0 0 0]
 [1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]]
print(yy)
[[4 5 6 7]
 [4 5 6 7]
 [4 5 6 7]
 [4 5 6 7]]
For reference, here's 'xy' indexing:
xx, yy = meshgrid(x, y, indexing='xy')
print(xx)
[[0 1 2 3]
 [0 1 2 3]
 [0 1 2 3]
 [0 1 2 3]]
print(yy)
[[4 4 4 4]
 [5 5 5 5]
 [6 6 6 6]
 [7 7 7 7]]
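Since the question asks specifically about numpy.reshape: reshape alone cannot repeat values, but combined with np.repeat (repeat each element) or np.tile (repeat the whole sequence) it reproduces the 'ij' output. This is a sketch of that alternative, not part of the meshgrid answer above; meshgrid remains the more direct tool.

```python
import numpy as np

x = [0, 1, 2, 3]
y = [4, 5, 6, 7]

# [0, 0, 0, 0, 1, 1, 1, 1, ...] folded into len(x) rows of len(y) columns
xx = np.repeat(x, len(y)).reshape(len(x), len(y))
# [4, 5, 6, 7, 4, 5, 6, 7, ...] folded the same way
yy = np.tile(y, len(x)).reshape(len(x), len(y))
print(xx)
print(yy)
```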
