Generating split indices for n-fold cross-validation - Python

I need to generate a split for cross-validation. Say s is an index of records:
s = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
Now I want to randomly shuffle the data and split it into 5 folds. Typically I want output something like this:
s = [[1, 5, 4, 6], [2, 3, 19, 20], [...], [...], [...]]
Note: each number should be unique; it should not repeat in any of the arrays.
I know I can use chunk(), but chunk only splits sequentially, like 1-4, 5-8, ...
Can anyone help me with this?

Shuffle your array using random.shuffle and split it into 5 pieces:
For Python2 use
import random
s = range(1, 21)
random.shuffle(s)
s = [s[i::5] for i in range(5)]
or for Python3:
import random
s = list(range(1, 21))
random.shuffle(s)
s = [s[i::5] for i in range(5)]
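If the number of records is not a multiple of the fold count, the same striding idea still applies. Here is a minimal sketch, assuming Python 3; the helper name make_folds and its seed parameter are made up for illustration and are not part of the answer above:

```python
import random

def make_folds(indices, n_folds, seed=None):
    """Shuffle a copy of `indices` and deal it into `n_folds` disjoint folds."""
    rng = random.Random(seed)      # private generator, global random state untouched
    shuffled = list(indices)
    rng.shuffle(shuffled)
    # Striding deals the items round-robin, so fold sizes differ by at most one.
    return [shuffled[i::n_folds] for i in range(n_folds)]

folds = make_folds(range(1, 21), 5, seed=0)
for fold in folds:
    print(fold)
```

Because every index is dealt exactly once, no value can appear in two folds. Passing a seed is only there to make a split reproducible.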

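import random
s = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
fold_size = len(s) // 5
shuffled = random.sample(s, len(s))  # one shuffled copy, so no index repeats across folds
print([shuffled[i:i + fold_size] for i in range(0, len(s), fold_size)])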

Related

Numpy argsort - what is happening?

I have a numpy array called arr1 defined like following.
arr1 = np.array([1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9])
print(arr1.argsort())
array([ 0,  1,  2,  3,  4,  5,  6,  7,  9,  8, 10, 11, 12, 13, 14, 15, 16, 17], dtype=int64)
I expected all the indices of the array to be in numeric order, but indices 8 and 9 seem to have flipped.
Can someone help on why this is happening?
np.argsort by default uses the quicksort algorithm which is not stable. You can specify kind = "stable" to perform a stable sort, which will preserve the order of equal elements:
import numpy as np
arr1 = np.array([1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9])
print(arr1.argsort(kind="stable"))
It gives:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17]
The default sort kind is quicksort, which does not preserve the relative order of equal elements. If you follow the steps of the algorithm you will see that this is why indices 8 and 9 are flipped. See https://numpy.org/doc/stable/reference/generated/numpy.argsort.html
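To see that the flipped indices are harmless, here is a small check (a sketch, assuming NumPy 1.15 or later, where kind="stable" is available). Both index arrays sort the data correctly; they can only differ in how ties are ordered:

```python
import numpy as np

arr1 = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9])

default_order = arr1.argsort()               # order among equal values is not guaranteed
stable_order = arr1.argsort(kind="stable")   # equal values keep their original order

# Either index array yields the same sorted data.
print(np.array_equal(arr1[default_order], arr1[stable_order]))

# The input is already sorted, so the stable result is simply 0..n-1.
print(np.array_equal(stable_order, np.arange(arr1.size)))
```

Whether the default result shows a swap at all may depend on the NumPy version and platform, so only the stable variant should be relied on when tie order matters.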

How do I change my code to draw a table from this list?

seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(len(seq)):
    print(seq[i], end="\t")
How do I get my output table to look like this?
11 34 17 52 26 13
40 20 10 5 16 8
4 2 1
One of many ways is to iterate over the seq list in steps of 6 and print the elements in each slice:
seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(0, len(seq), 6):
    print(*seq[i:i+6], sep=' ')
output
11 34 17 52 26 13
40 20 10 5 16 8
4 2 1
You probably want to make use of string formatting. Below, f"{seq[i]:<4d}" means "A string of length 4, left-aligned, containing the string representation of seq[i]". If you want to right-align, just remove <.
seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(len(seq)):
    print(f"{seq[i]:<4d}", end="")
    if not (i+1) % 6:
        print("")
print("")
Output:
11  34  17  52  26  13
40  20  10  5   16  8
4   2   1
The simplest relevant technique is padding:
for i in range(0, len(seq), 6):
    print(" ".join(str(k).ljust(2) for k in seq[i:i + 6]))
but string formatting, as in "Printing Lists as Tabular Data", makes for a more sophisticated solution.
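The padding width does not have to be hard-coded. Here is a minimal sketch that derives it from the widest value and right-aligns every cell; format_table is a made-up helper name, not something from the answers above:

```python
def format_table(values, per_row=6):
    """Return one string per row, every cell right-aligned to a common width."""
    width = max(len(str(v)) for v in values)   # widest cell decides the column width
    rows = []
    for start in range(0, len(values), per_row):
        chunk = values[start:start + per_row]
        rows.append(" ".join(f"{v:>{width}}" for v in chunk))
    return rows

seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for line in format_table(seq):
    print(line)
```

Returning the lines instead of printing them directly keeps the helper easy to test. Note that it assumes a non-empty list, since max() fails on an empty one.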

Pandas compare items in list in one column with single value in another column

Consider this two-column df. I would like to create an apply function that compares each item in the "other_yrs" column list with the single integer in the "cur" column and counts the items in the list that are less than or equal to the value in "cur". I cannot figure out how to get pandas to do this with apply. I am using apply functions for other purposes and they are working well. Any ideas would be much appreciated.
   cur  other_yrs
1   11  [11, 11]
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]
4   16  [15, 85]
5   17  [17, 17, 16]
6   13  [8, 8]
Below is the function I used to extract the values into the "other_yrs" column. I am thinking I can just insert into this function some way of comparing each successive value in the list with the "cur" column value and keep count. I really only need to store the count of how many of the list items are <= the value in the "cur" column.
def col_check(col_string):
    cs_yr_lst = []
    count = 0
    if len(col_string) < 1:  # avoids col values of 0 meaning no other cases.
        pass
    else:
        case_lst = col_string.split(", ")  # splits the string of cases into a list
        for i in case_lst:
            cs_yr = int(i[3:5])  # gets the case year from each individual case number
            cs_yr_lst.append(cs_yr)  # stores those integers in a list and then into a new column using apply
    return cs_yr_lst
The expected output would be this:
   cur  other_yrs                                    count
1   11  [11, 11]                                         2
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]      11
4   16  [15, 85]                                         1
5   17  [17, 17, 16]                                     3
6   13  [8, 8]                                           2
Use zip inside a list comprehension to zip the columns cur and other_yrs and use np.sum on boolean mask:
df['count'] = [np.sum(np.array(b) <= a) for a, b in zip(df['cur'], df['other_yrs'])]
Another idea:
df['count'] = pd.DataFrame(df['other_yrs'].tolist(), index=df.index).le(df['cur'], axis=0).sum(1)
Result:
   cur  other_yrs                                    count
1   11  [11, 11]                                         2
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]      11
4   16  [15, 85]                                         1
5   17  [17, 17, 16]                                     3
6   13  [8, 8]                                           2
You can also explode the list column, compare, then group on level=0 and sum:
u = df.explode('other_yrs')
df['Count'] = u['cur'].ge(u['other_yrs']).groupby(level=0).sum().astype(int)
print(df)
   cur  other_yrs                                    Count
1   11  [11, 11]                                         2
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]      11
4   16  [15, 85]                                         1
5   17  [17, 17, 16]                                     3
6   13  [8, 8]                                           2
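Since the question asks specifically for apply, here is a minimal row-wise sketch. It rebuilds the sample frame from the question; the helper name count_le is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "cur": [11, 12, 16, 17, 13],
        "other_yrs": [
            [11, 11],
            [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0],
            [15, 85],
            [17, 17, 16],
            [8, 8],
        ],
    },
    index=[1, 2, 4, 5, 6],
)

def count_le(row):
    """Count the entries of row['other_yrs'] that are <= row['cur']."""
    return sum(year <= row["cur"] for year in row["other_yrs"])

# axis=1 passes each row to the function as a Series.
df["count"] = df.apply(count_le, axis=1)
print(df["count"].tolist())
```

Row-wise apply is usually the slowest of these options on large frames, so the list comprehension with zip is likely the better default when speed matters.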
If the columns contain millions of records in both dataframes and each element in the first column has to be compared with all the elements in the second column, then the following code might be helpful:
for element in Dataframe1.Column1:
    Dataframe2[Dataframe2.Column2.isin([element])]
The snippet above returns, one element at a time, the rows of Dataframe2 where the element from Dataframe1.Column1 is found in Dataframe2.Column2.
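For large frames, the per-element loop can often be replaced by a single isin call over the whole column. A sketch with made-up sample data; only the column names Column1 and Column2 come from the snippet above, the value column is hypothetical:

```python
import pandas as pd

df1 = pd.DataFrame({"Column1": [3, 7, 9]})
df2 = pd.DataFrame({"Column2": [1, 3, 3, 5, 7], "value": list("abcde")})

# One vectorised membership test instead of one isin call per element.
matches = df2[df2["Column2"].isin(df1["Column1"])]
print(matches)

# If the per-element grouping is still needed, split the matches afterwards.
rows_by_element = {key: group for key, group in matches.groupby("Column2")}
```

Elements of Column1 that never occur in Column2 (9 in this sample) simply have no entry in rows_by_element.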

convert all rows to columns and columns to rows in Arrays [duplicate]

This question already has answers here:
Matrix Transpose in Python [duplicate]
(19 answers)
Closed 6 years ago.
I'm trying to write a Python program that converts all rows to columns and all columns to rows. To be more specific, the first line of input holds two numbers, N and M: N is the total number of rows and M the total number of columns. I read them with b=map(int, raw_input().split()). Each of the next N lines then contains M space-separated integers. For example:
Input:
3 5
13 4 8 14 1
9 6 3 7 21
5 12 17 9 3
Now the program will store it in a 2D array:
arr=[[13, 4, 8, 14, 1], [9, 6, 3, 7, 21], [5, 12, 17, 9, 3]]
What's required for the output is to print M lines each containing N space separated integers. For example:
Output:
13 9 5
4 6 12
8 3 17
14 7 9
1 21 3
This is what I've tried so far:
# Getting N and M from input
NM = map(int, raw_input().split())
arr = []
for i in xrange(NM[0]):
    c = map(int, raw_input().split())
    arr.append(c)
I've created a 2D array and got the values from input but I don't know the rest. Let me make this clear that I'm definitely NOT asking for code. Just exactly what to do to convert rows to columns and in reverse.
Thanks in advance!
You can use zip to transpose the data (the list() call is needed on Python 3, where zip returns an iterator):
arr = [[13, 4, 8, 14, 1], [9, 6, 3, 7, 21], [5, 12, 17, 9, 3]]
new_arr = list(zip(*arr))
# [(13, 9, 5), (4, 6, 12), (8, 3, 17), (14, 7, 9), (1, 21, 3)]
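To get from the transposed tuples to the required M lines of output, each column can be joined back into a string. A minimal Python 3 sketch that takes the whole input as one string rather than calling raw_input; the helper name transpose_text is made up for illustration:

```python
def transpose_text(text):
    """Parse 'N M' followed by N rows of M integers; return the M transposed lines."""
    lines = text.strip().splitlines()
    n_rows, n_cols = map(int, lines[0].split())
    arr = [list(map(int, line.split())) for line in lines[1:1 + n_rows]]
    if any(len(row) != n_cols for row in arr):
        raise ValueError("every row must contain exactly M integers")
    # zip(*arr) yields the columns of arr, i.e. the rows of the transpose.
    return [" ".join(map(str, column)) for column in zip(*arr)]

sample = """3 5
13 4 8 14 1
9 6 3 7 21
5 12 17 9 3"""
for line in transpose_text(sample):
    print(line)
```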

Matlab vs Python: Reshape

So I found this:
When converting MATLAB code it might be necessary to first reshape a
matrix to a linear sequence, perform some indexing operations and then
reshape back. As reshape (usually) produces views onto the same
storage, it should be possible to do this fairly efficiently.
Note that the scan order used by reshape in Numpy defaults to the 'C'
order, whereas MATLAB uses the Fortran order. If you are simply
converting to a linear sequence and back this doesn't matter. But if
you are converting reshapes from MATLAB code which relies on the scan
order, then this MATLAB code:
z = reshape(x,3,4);
should become
z = x.reshape(3,4,order='F').copy()
in Numpy.
I have a multidimensional 16*2 array called mafs, when I do in MATLAB:
mafs2 = reshape(mafs,[4,4,2])
I get something different than when in python I do:
mafs2 = reshape(mafs,(4,4,2))
or even
mafs2 = mafs.reshape((4,4,2),order='F').copy()
Any help on this? Thank you all.
Example:
MATLAB:
>> mafs = [(1:16)' (17:32)']
mafs =
1 17
2 18
3 19
4 20
5 21
6 22
7 23
8 24
9 25
10 26
11 27
12 28
13 29
14 30
15 31
16 32
>> reshape(mafs,[4 4 2])
ans(:,:,1) =
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
ans(:,:,2) =
17 21 25 29
18 22 26 30
19 23 27 31
20 24 28 32
Python:
>>> import numpy as np
>>> mafs = np.c_[np.arange(1,17), np.arange(17,33)]
>>> mafs.shape
(16, 2)
>>> mafs[:,0]
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
>>> mafs[:,1]
array([17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32])
>>> r = np.reshape(mafs, (4,4,2), order="F")
>>> r.shape
(4, 4, 2)
>>> r[:,:,0]
array([[ 1, 5, 9, 13],
[ 2, 6, 10, 14],
[ 3, 7, 11, 15],
[ 4, 8, 12, 16]])
>>> r[:,:,1]
array([[17, 21, 25, 29],
[18, 22, 26, 30],
[19, 23, 27, 31],
[20, 24, 28, 32]])
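For comparison, here is a short sketch that puts the default C order and order="F" side by side on the same array, which shows where the mismatch with MATLAB comes from:

```python
import numpy as np

mafs = np.c_[np.arange(1, 17), np.arange(17, 33)]   # shape (16, 2)

c_order = mafs.reshape((4, 4, 2))               # NumPy default: last index varies fastest
f_order = mafs.reshape((4, 4, 2), order="F")    # MATLAB-style: first index varies fastest

print(f_order[:, :, 0])   # columns 1..4, 5..8, ... as in the MATLAB output
print(c_order[:, :, 0])   # rows 1..4, 5..8, ... instead
```

With the default order the values 1 to 16 fill each slice row by row, whereas with order="F" they fill it column by column, matching reshape(mafs,[4 4 2]) in MATLAB.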
I was having a similar issue myself, as I am also trying to make the transition from MATLAB to Python. I was finally able to convert a numpy matrix, given in (depth, row, col) format, to a single sheet of column vectors (one per image).
In MATLAB I would have done something like:
output = reshape(imStack,[row*col,depth])
In Python this seems to translate to:
import numpy as np
output=np.transpose(imStack)
output=output.reshape((row*col, depth), order='F')
