target, image[0] in datasets sklearn


What are target and images[0] used for in the digits dataset?
from sklearn import datasets
digits = datasets.load_digits()
digits.target
digits.images[0]
prints
array([0, 1, 2, ..., 8, 9, 8])
array([[ 0., 0., 5., 13., 9., 1., 0., 0.],
[ 0., 0., 13., 15., 10., 15., 5., 0.],
[ 0., 3., 15., 2., 0., 11., 8., 0.],
[ 0., 4., 12., 0., 0., 8., 8., 0.],
[ 0., 5., 8., 0., 0., 9., 8., 0.],
[ 0., 4., 11., 0., 1., 12., 7., 0.],
[ 0., 2., 14., 5., 10., 12., 0., 0.],
[ 0., 0., 6., 13., 10., 0., 0., 0.]])

target returns a vector in which each value is the label of the corresponding image in the dataset: a digit from 0 to 9.
images[0] is the first image, encoded as a matrix of shape (8, 8).
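As a quick check of the description above, here is a minimal sketch that inspects the shapes involved. It assumes scikit-learn is installed; the variable names `labels`, `first_image` and `flat` are only illustrative.

```python
from sklearn import datasets

digits = datasets.load_digits()

labels = digits.target          # one integer label (0-9) per image
first_image = digits.images[0]  # first sample as an 8x8 grid of pixel intensities
flat = digits.data[0]           # the same sample flattened to 64 features

print(labels.shape)       # one entry per image
print(first_image.shape)  # (8, 8)
print(labels[0])          # label of the first image
```

The `data` attribute holds the same pixels as `images`, flattened row by row, which is the form most estimators expect.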

Related

Python array: Take two and skip two

I have an input array like the one below
array([[[ 1., 0., 2., 0., 3., 0., 4., 0., 5.],
[ 6., 0., 7., 0., 8., 0., 9., 0., 10.],
[11., 0., 12., 0., 13., 0., 14., 0., 15.]],
[[16., 0., 17., 0., 18., 0., 19., 0., 20.],
[21., 0., 22., 0., 23., 0., 24., 0., 25.],
[26., 0., 27., 0., 28., 0., 29., 0., 30.]]])
and I would like to get an output like the one below.
array([[[ 1., 0., 3., 0., 5.],
[ 6., 0., 8., 0., 10.],
[11., 0., 13., 0., 15.]],
[[16., 0., 18., 0., 20.],
[21., 0., 23., 0., 25.],
[26., 0., 28., 0., 30.]]])
I would love a generic solution, not one that only fits this example.

Since the length of the last dimension cannot be guaranteed to be even, here I choose to build the bool indices:
>>> mask = np.arange(ar.shape[-1]) // 2 % 2 == 0 # np.arange() & 2 == 0 is faster
>>> mask
array([ True, True, False, False, True, True, False, False, True])
>>> ar[:, :, mask] # or ar[..., mask]
array([[[ 1., 0., 3., 0., 5.],
[ 6., 0., 8., 0., 10.],
[11., 0., 13., 0., 15.]],
[[16., 0., 18., 0., 20.],
[21., 0., 23., 0., 25.],
[26., 0., 28., 0., 30.]]])
If the length of the last dimension can be guaranteed to be even, reshape with slicing is another technique:
>>> ar
array([[[ 1., 0., 2., 0., 3., 0., 4., 0.],
[ 6., 0., 7., 0., 8., 0., 9., 0.],
[11., 0., 12., 0., 13., 0., 14., 0.]],
[[16., 0., 17., 0., 18., 0., 19., 0.],
[21., 0., 22., 0., 23., 0., 24., 0.],
[26., 0., 27., 0., 28., 0., 29., 0.]]])
>>> shape = ar.shape[:-1]
>>> ar.reshape(*shape, -1, 2)[..., ::2, :].reshape(*shape, -1)
array([[[ 1., 0., 3., 0.],
[ 6., 0., 8., 0.],
[11., 0., 13., 0.]],
[[16., 0., 18., 0.],
[21., 0., 23., 0.],
[26., 0., 28., 0.]]])
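To make the mask approach generic, one possible sketch wraps it in a helper with configurable run lengths. The function name `take_and_skip` and its parameters are hypothetical, not part of NumPy.

```python
import numpy as np

def take_and_skip(ar, take=2, skip=2, axis=-1):
    """Keep `take` items, drop `skip` items, and repeat along `axis`."""
    ar = np.asarray(ar)
    # Position inside each (take + skip) cycle decides whether an index is kept.
    mask = np.arange(ar.shape[axis]) % (take + skip) < take
    return np.compress(mask, ar, axis=axis)

row = np.array([1., 0., 2., 0., 3., 0., 4., 0., 5.])
print(take_and_skip(row))
```

Because the mask is built from the axis length, it works for both even and odd lengths, like the boolean-index version above.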

Is there any way to convert a 3D numpy array to 2D

I have a 3D NumPy array:
array([[[ 12., 0., 0.],
[ 15., 0., 0.],
[ 13., 0., 0.]],
[[ 12., 0., 0.],
[ 11., 0., 0.],
[ 13., 0., 0.]]])
Is there any way to convert it to a 2D array and keep only
[12., 15., 13.]
[12., 11., 13.]

import numpy as np
x = np.array(
[[[ 12., 0., 0.],
[ 15., 0., 0.],
[ 13., 0., 0.]],
[[ 12., 0., 0.],
[ 11., 0., 0.],
[ 13., 0., 0.]]]
)
x_2d = x[:, :, 0]
>>> x_2d
array([[12., 15., 13.],
[12., 11., 13.]])
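The slice above selects index 0 along the last axis, which drops that axis. A short sketch of two equivalent spellings that do not depend on the number of leading dimensions (the names `first` and `same` are only illustrative):

```python
import numpy as np

x = np.array([[[12., 0., 0.], [15., 0., 0.], [13., 0., 0.]],
              [[12., 0., 0.], [11., 0., 0.], [13., 0., 0.]]])

first = x[..., 0]              # Ellipsis stands for "all leading axes"
same = np.take(x, 0, axis=-1)  # function form of the same selection

print(first.shape)  # (2, 3)
```

Note that basic indexing such as `x[..., 0]` returns a view of `x`, so writing into `first` also changes `x`; call `.copy()` if an independent array is needed.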

fold/col2im for convolutions in numpy

Suppose I have an input matrix of shape (batch_size, channels, h, w),
in this case (1, 2, 3, 3):
[[[[ 0., 1., 2.],
[ 3., 4., 5.],
[ 6., 7., 8.]],
[[ 9., 10., 11.],
[12., 13., 14.],
[15., 16., 17.]]]]
To do a convolution with it, I unroll it to the shape
(batch_size, channels * kernel_size * kernel_size, out_h * out_w),
which is:
[[[ 0., 1., 3., 4.],
[ 1., 2., 4., 5.],
[ 3., 4., 6., 7.],
[ 4., 5., 7., 8.],
[ 9., 10., 12., 13.],
[10., 11., 13., 14.],
[12., 13., 15., 16.],
[13., 14., 16., 17.]]]
Now I want to fold the unrolled matrix back into its original form,
which looks like this:
# for demonstration only the first and second column of the unrolled matrix
# the output should be the same shape as the initial matrix -> initialized to zeros
# current column -> [ 0., 1., 3., 4., 9., 10., 12., 13.]
[[[[0+0, 0+1, 0],
[0+3, 0+4, 0],
[0 , 0 , 0]],
[[0+9 , 0+10, 0],
[0+12, 0+13, 0],
[0 , 0 , 0]]]]
# for the next column it would be
# current column -> [ 1., 2., 4., 5., 10., 11., 13., 14.]
[[[[0 , 1+1, 0+2],
[3 , 4+4, 0+5],
[0 , 0 , 0 ]],
[[9 , 10+10, 0+11],
[12 , 13+13, 0+14],
[0 , 0 , 0 ]]]]
You basically put the unrolled elements back in their original places and sum the overlapping parts.
Now to my question:
How could one implement this as fast as possible using numpy, with as few loops as possible? I have already looped through it kernel by kernel, but this approach isn't feasible with larger inputs. I think this could be parallelized quite a bit, but my knowledge of numpy indexing isn't good enough to figure out a good solution by myself.
Thanks for reading and have a nice day :)

With numpy, I expect this can be done using numpy.lib.stride_tricks.as_strided. However, I'd suggest that you look at pytorch, which interoperates easily with numpy and has quite efficient primitives for this operation. In your case, the code would look like:
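import torch

kernel_size = 2
x = torch.arange(18).reshape(1, 2, 3, 3).to(torch.float32)
unfold = torch.nn.Unfold(kernel_size=kernel_size)
fold = torch.nn.Fold(kernel_size=kernel_size, output_size=(3, 3))
unfolded = unfold(x)  # shape (1, 8, 4): one column per sliding block
n_blocks = unfolded.shape[-1]  # 4 here, which happens to equal kernel_size ** 2
cols = torch.arange(n_blocks)
# Fold one column at a time to show where each block lands;
# fold(unfolded) sums all of them in a single call.
for col in range(n_blocks):
    # keep only column `col` and zero out the rest
    unfolded_masked = torch.where(col == cols, unfolded, torch.tensor(0.0, dtype=torch.float32))
    refolded = fold(unfolded_masked)
    print(refolded)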
tensor([[[[ 0., 1., 0.],
[ 3., 4., 0.],
[ 0., 0., 0.]],
[[ 9., 10., 0.],
[12., 13., 0.],
[ 0., 0., 0.]]]])
tensor([[[[ 0., 1., 2.],
[ 0., 4., 5.],
[ 0., 0., 0.]],
[[ 0., 10., 11.],
[ 0., 13., 14.],
[ 0., 0., 0.]]]])
tensor([[[[ 0., 0., 0.],
[ 3., 4., 0.],
[ 6., 7., 0.]],
[[ 0., 0., 0.],
[12., 13., 0.],
[15., 16., 0.]]]])
tensor([[[[ 0., 0., 0.],
[ 0., 4., 5.],
[ 0., 7., 8.]],
[[ 0., 0., 0.],
[ 0., 13., 14.],
[ 0., 16., 17.]]]])
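For a NumPy-only alternative, one possible sketch loops over the kernel offsets instead of over the sliding blocks. That gives kernel_size ** 2 iterations regardless of image size, each one a vectorized slice addition. It assumes stride 1 and no padding, and the function names `unfold` and `fold` are chosen here for illustration; they are not NumPy functions.

```python
import numpy as np

def unfold(x, k):
    """im2col, stride 1, no padding: (N, C, H, W) -> (N, C*k*k, out_h*out_w)."""
    n, c, h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    patches = np.empty((n, c, k, k, out_h, out_w), dtype=x.dtype)
    for i in range(k):
        for j in range(k):
            # All window positions for kernel offset (i, j) at once.
            patches[:, :, i, j] = x[:, :, i:i + out_h, j:j + out_w]
    return patches.reshape(n, c * k * k, out_h * out_w)

def fold(cols, output_size, k):
    """col2im: scatter columns back and sum overlapping contributions."""
    n = cols.shape[0]
    h, w = output_size
    out_h, out_w = h - k + 1, w - k + 1
    patches = cols.reshape(n, -1, k, k, out_h, out_w)
    out = np.zeros((n, patches.shape[1], h, w), dtype=cols.dtype)
    for i in range(k):
        for j in range(k):
            # Slices within one statement do not self-overlap, so += is safe.
            out[:, :, i:i + out_h, j:j + out_w] += patches[:, :, i, j]
    return out

x = np.arange(18, dtype=float).reshape(1, 2, 3, 3)
cols = unfold(x, 2)
restored = fold(cols, (3, 3), 2)
print(restored)
```

Folding the full unrolled matrix returns each input value multiplied by the number of windows that cover it, which is the summed-overlap behaviour described in the question.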

Make 3D array with 1D arrays with zero padding depending on index of 1D array numpythonically

Consider the following 1D arrays
a=np.arange(3)+9
b=np.arange(3)+5
Currently I am initializing the new 3D array using
n=4
cols=3
k=np.vstack((a,b,a*b,np.zeros((n,cols)),a,b,a,a,b**2,np.zeros((n,cols)),a*2,a)).T.reshape(-1,2,n+5)
where a and b will always have the same shape,
which results in
array([[[ 9., 5., 45., 0., 0., 0., 0., 9., 5.],
[ 9., 9., 25., 0., 0., 0., 0., 18., 9.]],
[[ 10., 6., 60., 0., 0., 0., 0., 10., 6.],
[ 10., 10., 36., 0., 0., 0., 0., 20., 10.]],
[[ 11., 7., 77., 0., 0., 0., 0., 11., 7.],
[ 11., 11., 49., 0., 0., 0., 0., 22., 11.]]])
How would I use a similar technique, also without a for loop, to change the zero padding to the following:
array([[[ 9., 5., 45., 9., 5., 0., 0., 0., 0.],
[ 9., 9., 25., 18., 9., 0., 0., 0., 0.]],
[[ 10., 6., 60., 0., 0., 10., 6., 0., 0.],
[ 10., 10., 36., 0., 0., 20., 10., 0., 0.]],
[[ 11., 7., 77., 0., 0., 0., 0., 11., 7.],
[ 11., 11., 49., 0., 0., 0., 0., 22., 11.]]])

One can use advanced-indexing to assign those array values into a zeros initialized array given the column indices -
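out = np.zeros((3,2,9),dtype=float)  # float, to match the output shown below
vals = np.array([[a,b,a*b,a,b],[a,a,b**2,2*a,a]])
# idx is not defined in the original snippet; these column indices are inferred from the desired output
idx = np.array([[0,1,2,3,4],[0,1,2,5,6],[0,1,2,7,8]])
out[np.arange(3)[:,None],:, idx] = vals.T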
Sample run -
In [448]: a
Out[448]: array([ 9, 10, 11])
In [449]: b
Out[449]: array([5, 6, 7])
In [450]: out
Out[450]:
array([[[ 9., 5., 45., 9., 5., 0., 0., 0., 0.],
[ 9., 9., 25., 18., 9., 0., 0., 0., 0.]],
[[ 10., 6., 60., 0., 0., 10., 6., 0., 0.],
[ 10., 10., 36., 0., 0., 20., 10., 0., 0.]],
[[ 11., 7., 77., 0., 0., 0., 0., 11., 7.],
[ 11., 11., 49., 0., 0., 0., 0., 22., 11.]]])
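The column indices can also be computed rather than typed out. The sketch below is one way to do that, under the assumption (inferred from the desired output, not stated in the question) that row r places its trailing values right after the leading ones, shifted by r times the number of trailing values. The names `head`, `tail` and `n_zeros` are illustrative.

```python
import numpy as np

a = np.arange(3) + 9
b = np.arange(3) + 5
n_zeros = 4  # total zero-padding columns per row (n in the question)

head = np.array([[a, b, a * b], [a, a, b ** 2]], dtype=float)  # shape (2, 3, len(a))
tail = np.array([[a, b], [2 * a, a]], dtype=float)             # shape (2, 2, len(a))
vals = np.concatenate([head, tail], axis=1)                    # shape (2, 5, len(a))

rows = np.arange(a.size)
n_head, n_tail = head.shape[1], tail.shape[1]
# Leading columns are the same for every row; trailing columns move right per row.
idx = np.hstack([
    np.tile(np.arange(n_head), (a.size, 1)),
    n_head + rows[:, None] * n_tail + np.arange(n_tail),
])

out = np.zeros((a.size, 2, n_head + n_tail + n_zeros))
# The two index arrays broadcast to (rows, 5); the sliced axis ends up last.
out[rows[:, None], :, idx] = vals.T
print(out)
```

Since the two advanced indices are separated by a slice, NumPy puts the broadcast index dimensions first in the indexed result, which is why `vals.T` with shape (3, 5, 2) is the right-hand side.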

Making OrderedDict out of Lists

I am trying to modify my code so that it will work over different time spans. I want my variable days_dict to look like months_dict.
I have this variable called months_dict which works very well. I made it using these lines of code
graphmonths = [pivot_table[(m)].astype(float).values for m in range(1, 13)]
names = ["Jan", "Feb", "Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov", "Dec"]
months_dict = OrderedDict(list(zip(names, graphmonths)))
It looks like this
months_dict
Out[86]:
OrderedDict([('Jan', array([ 17., 4., 11., 12., 13., 6., 9., 7., 8., 7., 4.,
3., 5., 3., 4., 4., 5., 6., 3., 4., 10., 5.,
8., 5., 3., 4., 9., 4., 2., 2.])), ('Feb', array([ 30., 11., 8., 11., 10., 10., 1., 8., 3., 5., 5.,
11., 6., 8., 3., 5., 2., 5., 15., 2., 3., 8.,
6., 6., 5., 4., 9., 6., 11., 1.])), ('Mar', array([ 29., 13., 25., 6., 7., 11., 2., 9., 9., 5., 4.,
7., 12., 5., 5., 8., 8., 5., 13., 6., 8., 3.,
7., 5., 10., 5., 6., 4., 3., 4.])), ('Apr', array([ 39., 22., 24., 23., 14., 8., 20., 7., 8., 3., 6.,
7., 6., 6., 6., 3., 15., 8., 4., 1., 5., 2.,
4., 7., 2., 4., 6., 3., 5., 0.])), ('May', array([ 15., 34., 7., 11., 6., 3., 6., 11., 9., 3., 5.,
5., 10., 1., 5., 4., 2., 4., 5., 5., 2., 3.,
13., 9., 4., 7., 5., 3., 5., 0.])), ('Jun', array([ 27., 27., 13., 11., 8., 4., 7., 4., 7., 7., 4.,
9., 6., 4., 6., 4., 7., 9., 1., 3., 2., 11.,
8., 1., 4., 4., 5., 1., 3., 10.])), ('Jul', array([ 22., 30., 24., 9., 5., 10., 6., 3., 5., 9., 12.,
6., 4., 6., 5., 10., 6., 7., 1., 9., 2., 6.,
0., 8., 6., 2., 3., 6., 5., 9.])), ('Aug', array([ 12., 18., 0., 4., 10., 8., 4., 3., 7., 7., 14.,
3., 5., 10., 5., 7., 6., 2., 0., 8., 20., 10.,
1., 5., 7., 8., 3., 0., 5., 12.])), ('Sep', array([ 36., 29., 21., 6., 13., 11., 6., 6., 6., 11., 5.,
6., 3., 6., 4., 5., 6., 5., 7., 8., 2., 3.,
1., 4., 5., 6., 3., 3., 10., 7.])), ('Oct', array([ 21., 31., 12., 11., 8., 11., 6., 5., 9., 6., 8.,
5., 4., 4., 7., 4., 1., 6., 9., 3., 5., 7.,
6., 7., 6., 6., 3., 4., 4., 8.])), ('Nov', array([ 18., 17., 12., 5., 12., 18., 12., 8., 3., 10., 2.,
3., 9., 4., 12., 6., 5., 4., 2., 8., 4., 5.,
4., 3., 2., 3., 4., 21., 3., 3.])), ('Dec', array([ 15., 14., 17., 10., 11., 14., 7., 11., 5., 3., 6.,
9., 3., 15., 9., 11., 5., 7., 5., 7., 1., 1.,
4., 1., 7., 7., 3., 4., 2., 2.]))])
Now I am trying to do the same but for a month. So I have
Days = pivot_table.columns[1:].tolist()
this gives
Days
Out[80]:
[1L,
2L,
3L,
4L,
5L,
6L,
7L,
8L,
9L,
10L,
11L,
12L,
13L,
14L,
15L,
16L,
17L,
18L,
19L,
20L,
21L,
22L,
23L,
24L,
25L,
26L,
27L,
28L,
29L,
30L]
Then I have
graphdays = [pivot_table[(m)].astype(float).values for m in range(1, len(Days)+1)]
days_dict = OrderedDict(list(zip(Days, graphdays)))
but days_dict looks like this:
days_dict
Out[73]:
OrderedDict([(1L, array([ 3., 0., 0., 0., 0., 2., 0., 1., 0., 0., 2., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1.,
0., 0., 0., 0.])), (2L, array([ 1., 1., 0., 0., 3., 1., 0., 1., 1., 1., 0., 1., 0.,
0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0.,
0., 0., 0., 0.])), (3L, array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0.])), (4L, array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 2., 0., 0.,
0., 0., 0., 0.])), (5L, array([ 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0.])), (6L, array([ 1., 3., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 1.,
0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 1., 0.])), (7L, array([ 2., 2., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
1., 4., 0., 0., 0., 0., 0., 0., 0., 2., 0., 0., 0.,
0., 0., 2., 0.])), (8L, array([ 9., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
0., 0., 1., 1., 0., 0., 1., 0., 0., 1., 2., 0., 0.,
0., 0., 0., 0.])), (9L, array([ 1., 0., 4., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0.,
0., 1., 0., 0., 0., 0., 0., 0., 3., 2., 0., 0., 0.,
0., 0., 0., 0.])), (10L, array([ 2., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,
0., 0., 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0.,
0., 0., 0., 0.])), (11L, array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0.])), (12L, array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0.])), (13L, array([ 1., 0., 0., 2., 1., 0., 0., 1., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
1., 0., 0., 0.])), (14L, array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0.])), (15L, array([ 0., 2., 10., 1., 0., 0., 1., 1., 0., 1., 1.,
3., 0., 2., 0., 1., 0., 2., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0.])), (16L, array([ 0., 4., 3., 0., 0., 0., 5., 2., 1., 0., 0., 0., 3.,
1., 0., 0., 1., 4., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0.])), (17L, array([ 0., 1., 0., 0., 0., 0., 0., 1., 0., 2., 1., 0., 1.,
0., 0., 0., 0., 0., 0., 2., 0., 0., 0., 0., 0., 0.,
1., 0., 0., 0.])), (18L, array([ 0., 0., 0., 0., 1., 0., 0., 0., 1., 0., 1., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1.,
0., 0., 0., 0.])), (19L, array([ 4., 1., 0., 1., 0., 0., 0., 1., 0., 0., 1., 1., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,
0., 0., 0., 0.])), (20L, array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0.])), (21L, array([ 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0.])), (22L, array([ 2., 1., 0., 5., 0., 0., 0., 0., 0., 0., 0., 2., 0.,
0., 0., 0., 0., 0., 0., 0., 3., 0., 0., 0., 0., 2.,
0., 1., 0., 0.])), (23L, array([ 2., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 1., 0.,
0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
1., 3., 0., 1.])), (24L, array([ 1., 0., 0., 3., 1., 0., 0., 0., 0., 1., 1., 0., 1.,
0., 1., 0., 0., 0., 0., 0., 3., 0., 0., 0., 0., 0.,
0., 0., 1., 2.])), (25L, array([ 1., 4., 0., 1., 0., 2., 0., 0., 1., 0., 1., 0., 0.,
0., 0., 0., 1., 0., 6., 0., 0., 0., 0., 0., 6., 0.,
2., 0., 0., 2.])), (26L, array([ 0., 1., 0., 2., 1., 0., 0., 0., 0., 2., 0., 0., 1.,
1., 0., 0., 0., 0., 0., 2., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0.])), (27L, array([ 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0.])), (28L, array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0.])), (29L, array([ 2., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
1., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1.,
0., 0., 1., 0.])), (30L, array([ 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 1., 0., 0.]))])
With 'L's in the code.
What are these 'L's? How can I remove them so that this code will work the same as with months_dict?

L means long integer (see "Why do integers in database row tuple have an 'L' suffix?"). Long integers and regular integers were merged in Python 3, so you can get rid of the suffix by switching to Python 3. I do not think you have to worry about it, however.
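If the keys should be plain integers regardless, they can be converted while building the dictionary. This is a small sketch with made-up stand-in data: `days` and `graphdays` below are hypothetical placeholders for the values taken from `pivot_table` in the question.

```python
from collections import OrderedDict

# Hypothetical stand-ins for pivot_table.columns[1:].tolist() and the per-day arrays.
days = [1, 2, 3]
graphdays = [[3.0, 0.0], [1.0, 1.0], [0.0, 0.0]]

# int() normalizes each key; use str(d) instead for string labels like in months_dict.
days_dict = OrderedDict(zip((int(d) for d in days), graphdays))

print(days_dict[1])
print(list(days_dict))
```

Either way, lookups such as `days_dict[1]` should behave the same, since the suffix only shows up in how the key is printed.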

Alright I figured out how to make it work. First I had to make a list of Days. I took this from my dataframe. Then I used
graphdays = [pivot_table[(m)].astype(float).values for m in range(1, len(Days)+1)]
days_dict = OrderedDict(list(zip(Days, graphdays)))
and it worked!

We Keep Coding
