convert all rows to columns and columns to rows in Arrays [duplicate] - python

This question already has answers here:
Matrix Transpose in Python [duplicate]
(19 answers)
Closed 6 years ago.
I'm trying to write a program (in Python) that converts all rows to columns and all columns to rows. To be more specific, the first line of input contains two numbers, N and M: N is the total number of rows and M is the total number of columns. I read them with b=map(int, raw_input().split()). Then, based on b[0], each of the next N lines contains M space-separated integers. For example:
Input:
3 5
13 4 8 14 1
9 6 3 7 21
5 12 17 9 3
Now the program will store it in a 2D array:
arr=[[13, 4, 8, 14, 1], [9, 6, 3, 7, 21], [5, 12, 17, 9, 3]]
What's required for the output is to print M lines each containing N space separated integers. For example:
Output:
13 9 5
4 6 12
8 3 17
14 7 9
1 21 3
This is what I've tried so far:
# Getting N and M from input
NM = map(int, raw_input().split())
arr = []
for i in xrange(NM[0]):
    c = map(int, raw_input().split())
    arr.append(c)
I've created a 2D array and got the values from input but I don't know the rest. Let me make this clear that I'm definitely NOT asking for code. Just exactly what to do to convert rows to columns and in reverse.
Thanks in advance!

You can use zip to transpose the data:
arr = [[13, 4, 8, 14, 1], [9, 6, 3, 7, 21], [5, 12, 17, 9, 3]]
new_arr = zip(*arr)
# [(13, 9, 5), (4, 6, 12), (8, 3, 17), (14, 7, 9), (1, 21, 3)]
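In Python 3, zip returns an iterator rather than a list, so the result has to be consumed (for example by looping over it). A minimal sketch of the whole task under that assumption; the helper name transpose_text is made up for illustration, and the input is passed in as a string instead of being read interactively:

```python
def transpose_text(text):
    """Parse 'N M' followed by N rows of integers; return the transposed rows as lines."""
    lines = text.strip().splitlines()
    n = int(lines[0].split()[0])  # number of rows to read
    arr = [list(map(int, line.split())) for line in lines[1:n + 1]]
    # zip(*arr) groups the i-th element of every row, i.e. it yields the columns
    return [" ".join(map(str, column)) for column in zip(*arr)]


sample = """3 5
13 4 8 14 1
9 6 3 7 21
5 12 17 9 3"""

for line in transpose_text(sample):
    print(line)
```

Each tuple produced by zip(*arr) is one column of the original array, which is why joining it gives one output line.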

Related

Numpy argsort - what is happening?

I have a numpy array called arr1 defined like following.
arr1 = np.array([1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9])
print(arr1.argsort())
array([ 0, 1, 2, 3, 4, 5, 6, 7, 9, 8, 10, 11, 12, 13, 14, 15, 16,
17], dtype=int64)
I expected all the indices of the array to be in numeric order, but indices 8 and 9 seem to have flipped.
Can someone help on why this is happening?
np.argsort by default uses the quicksort algorithm which is not stable. You can specify kind = "stable" to perform a stable sort, which will preserve the order of equal elements:
import numpy as np
arr1 = np.array([1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9])
print(arr1.argsort(kind="stable"))
It gives:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17]
This is because the default sort follows the quicksort algorithm; stepping through it shows why the two indices end up flipped. See https://numpy.org/doc/stable/reference/generated/numpy.argsort.html
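To illustrate what "stable" means here, the following is a plain-Python sketch of a stable argsort. It is not how NumPy implements it; the function name stable_argsort is an assumption for illustration, and it relies only on the fact that Python's built-in sorted() is a stable sort:

```python
def stable_argsort(values):
    """Return the indices that sort values, keeping equal values in original order."""
    # sorted() is stable, so indices of equal values stay in ascending order
    return sorted(range(len(values)), key=lambda i: values[i])


arr1 = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]
print(stable_argsort(arr1))
```

Because the input is already sorted and ties keep their order, the result is simply the indices 0 through 17 in sequence.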

Pandas compare items in list in one column with single value in another column

Consider this two-column df. I would like to create an apply function that compares each item in the "other_yrs" column list with the single integer in the "cur" column and counts how many items in the "other_yrs" list are less than or equal to the value in the "cur" column. I cannot figure out how to get pandas to do this with apply. I am using apply functions for other purposes and they are working well. Any ideas would be much appreciated.
   cur  other_yrs
1  11   [11, 11]
2  12   [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]
4  16   [15, 85]
5  17   [17, 17, 16]
6  13   [8, 8]
Below is the function I used to extract the values into the "other_yrs" column. I am thinking I can just insert into this function some way of comparing each successive value in the list with the "cur" column value and keep count. I really only need to store the count of how many of the list items are <= the value in the "cur" column.
def col_check(col_string):
    cs_yr_lst = []
    count = 0
    if len(col_string) < 1:  # avoids col values of 0 meaning no other cases.
        pass
    else:
        case_lst = col_string.split(", ")  # splits the string of cases into a list
        for i in case_lst:
            cs_yr = int(i[3:5])  # gets the case year from each individual case number
            cs_yr_lst.append(cs_yr)  # stores those integers in a list and then into a new column using apply
    return cs_yr_lst
The expected output would be this:
   cur  other_yrs                                    count
1  11   [11, 11]                                     2
2  12   [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]   11
4  16   [15, 85]                                     1
5  17   [17, 17, 16]                                 3
6  13   [8, 8]                                       2
Use zip inside a list comprehension to zip the columns cur and other_yrs and use np.sum on boolean mask:
df['count'] = [np.sum(np.array(b) <= a) for a, b in zip(df['cur'], df['other_yrs'])]
Another idea:
df['count'] = pd.DataFrame(df['other_yrs'].tolist(), index=df.index).le(df['cur'], axis=0).sum(1)
Result:
   cur  other_yrs                                    count
1  11   [11, 11]                                     2
2  12   [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]   11
4  16   [15, 85]                                     1
5  17   [17, 17, 16]                                 3
6  13   [8, 8]                                       2
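For comparison, the same per-row count can be sketched in plain Python, which may make the comparison logic easier to follow. The function name count_le is hypothetical, and the rows below are copied from the question's data rather than read from a DataFrame:

```python
def count_le(cur, other_yrs):
    """Count the items in other_yrs that are less than or equal to cur."""
    return sum(1 for year in other_yrs if year <= cur)


# (cur, other_yrs) pairs taken from the question's example
rows = [
    (11, [11, 11]),
    (12, [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]),
    (16, [15, 85]),
    (17, [17, 17, 16]),
    (13, [8, 8]),
]

counts = [count_le(cur, other) for cur, other in rows]
print(counts)  # [2, 11, 1, 3, 2]
```

With the DataFrame from the question, a function like this could presumably be applied row-wise, for example df.apply(lambda row: count_le(row['cur'], row['other_yrs']), axis=1), which is closer to the apply-based approach the question asks about.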
You can also explode the list column, compare, then group on level=0 and sum (Series.sum(level=0) was removed in pandas 2.0, so groupby is used here):
u = df.explode('other_yrs')
df['Count'] = u['cur'].ge(u['other_yrs']).groupby(level=0).sum().astype(int)
print(df)
   cur  other_yrs                                    Count
1  11   [11, 11]                                     2
2  12   [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]   11
4  16   [15, 85]                                     1
5  17   [17, 17, 16]                                 3
6  13   [8, 8]                                       2
If both dataframes contain millions of records and each element in the first column has to be compared with all the elements in the second column, the following code might be helpful.
for element in Dataframe1.Column1:
    Dataframe2[Dataframe2.Column2.isin([element])]
The snippet above selects, one element at a time, the rows of Dataframe2 in which the element from Dataframe1 is found in Dataframe2.Column2.

multiple iteration in a single for loop [duplicate]

This question already has answers here:
How do I iterate through two lists in parallel?
(8 answers)
Closed 5 years ago.
i have,
list1 = [1, 2, 3, 4, 5]
list2 = [7, 8, 9, 14, 25, 36]
list3 = [43, 65]
and what i want to achieve is
1 7 43
2 8 65
3 9
4 14
5 25
36
I have tried looking into modules like itertools and multiprocessing, but none of them helped.
Is there a way I can achieve this in a single for loop, like...
for I, J, K in list1, list2, list3:
    print('{} {} {}'.format(I, J, K))
Any help?
EDIT
The zip function stops at the shortest list, so if I use zip the output will be
1 7 43
2 8 65
You can try this using itertools:
import itertools
list1 = [1, 2, 3, 4, 5]
list2 = [7, 8, 9, 14, 25, 36]
list3 = [43, 65]
# Python 2; in Python 3 the function is itertools.zip_longest and print is a function
for a, b, c in itertools.izip_longest(list1, list2, list3):
    print a, b, c
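A Python 3 version of the same idea, as a minimal sketch: zip_longest pads the shorter lists, and passing fillvalue="" (an assumption made here so that the gaps print as blanks rather than None) gets close to the layout asked for in the question:

```python
from itertools import zip_longest

list1 = [1, 2, 3, 4, 5]
list2 = [7, 8, 9, 14, 25, 36]
list3 = [43, 65]

# fillvalue="" pads the shorter lists with empty strings instead of None
rows = list(zip_longest(list1, list2, list3, fillvalue=""))

for a, b, c in rows:
    print(a, b, c)
```

The loop runs once per element of the longest list, so all six values of list2 are printed.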

To Generate a split indices for n-fold

I have a requirement to generate a split for cross validation, say s is an index of records
s = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20]
Now I want to randomly shuffle and split the data with 5 folds, typically I want output something like this
s = [[1 5 4 6], [2,3, 19,20], [... ], [... ], [.. ]]
Note: In each array numbers should be unique, it should not repeat
I know I can use chunk() but in chunk you can do only sequence wise like 1-4, 5-8,....
Can anyone help me on this ?
Shuffle your array using random.shuffle and split it into 5 pieces:
For Python2 use
import random
s = range(1, 21)
random.shuffle(s)
s = [s[i::5] for i in range(5)]
or for Python3:
import random
s = list(range(1, 21))
random.shuffle(s)
s = [s[i::5] for i in range(5)]
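The shuffle-and-stride approach can be wrapped in a small helper. This is only a sketch: the name make_folds and the seed parameter are assumptions added for illustration, and a private random.Random instance is used so the global random state is left alone:

```python
import random


def make_folds(indices, k, seed=None):
    """Shuffle a copy of indices and deal it into k folds of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    # Striding by k sends every element to exactly one fold, so nothing repeats
    return [shuffled[i::k] for i in range(k)]


folds = make_folds(range(1, 21), 5, seed=0)
print(folds)
```

Since every index lands in exactly one fold, the folds together cover the original indices with no duplicates.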
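Another option (Python 2) uses random.sample. Note that each call samples independently, so a number can appear in more than one sublist, and this produces 4 lists of 5 items rather than 5 folds:
import random
s = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
print [random.sample(s, 5) for i in xrange(len(s) / 5)]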

How to format data into a python list

I am trying to figure out how to take data which looks like:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
from a file and make it look like:
[1, 2, 3, 4, 5,
6, 7, 8, 9, 10,
11, 12, 13, 14, 15,
16, 17, 18, 19, 20]
Each line represents a row from the data file, so the starting format should not be confused with:
1 2 3 4 5 6 7...
The original data is contained in a file so it must be first read in and then rewritten with the commas and brackets into a new file. My starting point would be to first read in the data:
with open("data.txt", "r") as data:
    lines = data.readlines()
Then I know that I have to take the read lines and rewrite them in the format I need but I don't know how to do this for each element of each line.
You can tell Python to split each line by its spaces, and then join the resulting elements with ', ' like this:
', '.join(line_in.split())
This will convert a string like:
'6 7 8 9 10'
into this:
'6, 7, 8, 9, 10'
Now, you need to decide whether this is the last line of the file or not. If it is the last line of the file you need to append a "]" whereas if it is not the last line, you need to add a ",".
At the beginning of the file you need also to add a "["
Hope it helps
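The steps above can be sketched as follows. The helper name format_rows is made up for illustration, and the lines are given as a list here; in practice they would come from data.readlines() as in the question:

```python
def format_rows(lines):
    """Join each row with ', ' and wrap all rows in brackets, one row per line."""
    rows = [", ".join(line.split()) for line in lines if line.strip()]
    # Every row except the last ends with ","; "[" opens the first, "]" closes the last
    return "[" + ",\n".join(rows) + "]"


lines = ["1 2 3 4 5\n", "6 7 8 9 10\n", "11 12 13 14 15\n", "16 17 18 19 20\n"]
print(format_rows(lines))
```

The returned string can then be written to the new file with a regular open(..., "w") call.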
You can try something like this:
>>> data = open('data.txt').read().split()
>>> data = [int(item) for item in data if item.isdigit()]
>>> data
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
This will get all of the integers in the file as a single flat list. If you don't need the values converted to integers, just remove the second line and keep data = open('data.txt').read().split().
