I have a numpy array named "a":
a = numpy.array([
[[1, 2, 3], [11, 22, 33]],
[[4, 5, 6], [44, 55, 66]],
])
I want to print the following (in this exact format):
1 2 3
11 22 33
4 5 6
44 55 66
To accomplish this, I wrote the following:
for i in range(len(a)):
    block = a[i]
    for j in range(len(block)):
        a1 = block[j][0]
        a2 = block[j][1]
        a3 = block[j][2]
        print(a1, a2, a3)
The output is:
1 2 3
11 22 33
4 5 6
44 55 66
I would like to vectorize my solution (if possible) and discard the for loop. I understand that this problem might not benefit from vectorization. In reality (for work-related purposes), the array "a" has 52 elements and each element contains hundreds of arrays stored inside. I'd like to solve a basic/trivial case and move onto a more advanced, realistic case.
Also, I know that Numpy arrays were not meant to be iterated through.
I could have used Python lists to accomplish the following, but I really want to vectorize this (if possible, of course).
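You could use np.apply_along_axis, which applies a function to the 1-D slices of an array along a given axis. Applying it along axis=2 gives the desired result. Note that it still loops in Python internally, so it is a convenience rather than a speed-up.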
Using print directly as the callback:
>>> np.apply_along_axis(print, 2, a)
[1 2 3]
[11 22 33]
[4 5 6]
[44 55 66]
Or with a lambda wrapper:
>>> np.apply_along_axis(lambda r: print(' '.join([str(x) for x in r])), 2, a)
1 2 3
11 22 33
4 5 6
44 55 66
In [146]: a = numpy.array([
     ...:     [[1, 2, 3], [11, 22, 33]],
     ...:     [[4, 5, 6], [44, 55, 66]],
     ...: ])
     ...:
In [147]: a
Out[147]:
array([[[ 1,  2,  3],
        [11, 22, 33]],

       [[ 4,  5,  6],
        [44, 55, 66]]])
A proper "vectorized" numpy output is:
In [148]: a.reshape(-1,3)
Out[148]:
array([[ 1,  2,  3],
       [11, 22, 33],
       [ 4,  5,  6],
       [44, 55, 66]])
You could also convert that to a list of lists:
In [149]: a.reshape(-1,3).tolist()
Out[149]: [[1, 2, 3], [11, 22, 33], [4, 5, 6], [44, 55, 66]]
But you want printed output without the standard numpy formatting (or list formatting), and for that the iteration is easy:
In [150]: for row in a.reshape(-1,3):
     ...:     print(*row)
     ...:
1 2 3
11 22 33
4 5 6
44 55 66
Since your desired output is a print, or at least "unformatted" strings, there's no "vectorized", i.e. whole-array, option. You have to iterate on each line!
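Since some loop over rows is unavoidable, one option is to build the whole text first and call print once. The following is a minimal sketch of that idea; rows_as_text is a hypothetical helper name, and it assumes the last axis of the array holds the values that belong on one line.

```python
import numpy as np

a = np.array([
    [[1, 2, 3], [11, 22, 33]],
    [[4, 5, 6], [44, 55, 66]],
])


def rows_as_text(arr):
    """Format every innermost row of `arr` as space-separated values.

    Hypothetical helper: assumes the last axis holds one output line.
    """
    rows = arr.reshape(-1, arr.shape[-1])  # collapse all leading axes
    return "\n".join(" ".join(map(str, row)) for row in rows)


text = rows_as_text(a)
print(text)
```

The reshape is the only whole-array step; the string formatting still visits each row, just inside a generator expression instead of a for statement.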
np.savetxt creates a csv output by iterating on rows and writing a format tuple, e.g. f.write(fmt%tuple(row)).
In [155]: np.savetxt('test', a.reshape(-1,3), fmt='%d')
In [156]: cat test
1 2 3
11 22 33
4 5 6
44 55 66
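If the text is needed in memory rather than in a file, np.savetxt also accepts a file handle. The sketch below assumes a NumPy version recent enough to write to a text-mode handle such as io.StringIO; the variable names are illustrative only.

```python
import io

import numpy as np

a = np.array([
    [[1, 2, 3], [11, 22, 33]],
    [[4, 5, 6], [44, 55, 66]],
])

buffer = io.StringIO()  # in-memory text file instead of 'test' on disk
np.savetxt(buffer, a.reshape(-1, a.shape[-1]), fmt="%d")
formatted = buffer.getvalue()
print(formatted, end="")  # each row already ends with a newline
```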
To get that exact output without an explicit loop (the iteration still happens inside tolist, str and replace), try this:
print(str(a.tolist()).replace('], [', '\n').replace('[', '').replace(']', '').replace(',', ''))
Consider this two-column DataFrame. I would like to create an apply function that compares each item in the "other_yrs" column list with the single integer in the "cur" column and keeps count of each item in the "other_yrs" list that is less than or equal to the single value in the "cur" column. I cannot figure out how to get pandas to do this with apply. I am using apply functions for other purposes and they are working well. Any ideas would be very much appreciated.
   cur  other_yrs
1  11   [11, 11]
2  12   [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]
4  16   [15, 85]
5  17   [17, 17, 16]
6  13   [8, 8]
Below is the function I used to extract the values into the "other_yrs" column. I am thinking I can just insert into this function some way of comparing each successive value in the list with the "cur" column value and keep count. I really only need to store the count of how many of the list items are <= the value in the "cur" column.
def col_check(col_string):
    cs_yr_lst = []
    count = 0
    if len(col_string) < 1:  # avoids col values of 0 meaning no other cases.
        pass
    else:
        case_lst = col_string.split(", ")  # splits the string of cases into a list
        for i in case_lst:
            cs_yr = int(i[3:5])  # gets the case year from each individual case number
            cs_yr_lst.append(cs_yr)  # stores those integers in a list and then into a new column using apply
    return cs_yr_lst
The expected output would be this:
   cur  other_yrs                                   count
1  11   [11, 11]                                    2
2  12   [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]  11
4  16   [15, 85]                                    1
5  17   [17, 17, 16]                                3
6  13   [8, 8]                                      2
Use zip inside a list comprehension to zip the columns cur and other_yrs and use np.sum on boolean mask:
df['count'] = [np.sum(np.array(b) <= a) for a, b in zip(df['cur'], df['other_yrs'])]
Another idea:
df['count'] = pd.DataFrame(df['other_yrs'].tolist(), index=df.index).le(df['cur'], axis=0).sum(1)
Result:
   cur  other_yrs                                   count
1  11   [11, 11]                                    2
2  12   [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]  11
4  16   [15, 85]                                    1
5  17   [17, 17, 16]                                3
6  13   [8, 8]                                      2
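Since the question asks specifically about apply, the per-row logic can also be isolated in a small function. This is a minimal sketch in plain Python; count_le is a hypothetical name, and the rows list simply restates the sample data from the question.

```python
def count_le(cur, other_yrs):
    """Count the entries of `other_yrs` that are <= `cur` (hypothetical helper)."""
    return sum(1 for year in other_yrs if year <= cur)


# Sample data restated from the question as (cur, other_yrs) pairs.
rows = [
    (11, [11, 11]),
    (12, [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]),
    (16, [15, 85]),
    (17, [17, 17, 16]),
    (13, [8, 8]),
]

counts = [count_le(cur, yrs) for cur, yrs in rows]
print(counts)  # [2, 11, 1, 3, 2]
```

With the DataFrame from the question, the same function could be used row-wise as df['count'] = df.apply(lambda row: count_le(row['cur'], row['other_yrs']), axis=1).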
You can consider explode, then compare, then group on index level 0 and sum (sum(level=0) was removed in pandas 2.0, so groupby(level=0) is used here):
u = df.explode('other_yrs')
df['Count'] = u['cur'].ge(u['other_yrs']).groupby(level=0).sum().astype(int)
print(df)
   cur  other_yrs                                   Count
1  11   [11, 11]                                    2
2  12   [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]  11
4  16   [15, 85]                                    1
5  17   [17, 17, 16]                                3
6  13   [8, 8]                                      2
If both DataFrames contain millions of records and each element of the first column has to be compared with all the elements of the second column, the following code might be helpful.
for element in Dataframe1.Column1:
    matches = Dataframe2[Dataframe2.Column2.isin([element])]
On each iteration, matches holds the rows of Dataframe2 whose Column2 value equals the current element of Dataframe1.Column1.
I'm trying to write a Python program that converts all rows to columns and all columns to rows. To be more specific, the first line of input contains two numbers, N and M: N is the total number of rows and M is the total number of columns. I read them with b=map(int, raw_input().split()). Then, based on b[0], each of the next N lines contains M space-separated integers. For example:
Input:
3 5
13 4 8 14 1
9 6 3 7 21
5 12 17 9 3
Now the program will store it in a 2D array:
arr=[[13, 4, 8, 14, 1], [9, 6, 3, 7, 21], [5, 12, 17, 9, 3]]
What's required for the output is to print M lines each containing N space separated integers. For example:
Output:
13 9 5
4 6 12
8 3 17
14 7 9
1 21 3
This is what I've tried so far:
#Getting N and M from input
NM=map(int, raw_input().split())
arr=[]
for i in xrange(NM[0]):
    c=map(int, raw_input().split())
    arr.append(c)
I've created a 2D array and got the values from input but I don't know the rest. Let me make this clear that I'm definitely NOT asking for code. Just exactly what to do to convert rows to columns and in reverse.
Thanks in advance!
You can use zip to transpose the data (in Python 3, wrap it in list() because zip returns an iterator):
arr = [[13, 4, 8, 14, 1], [9, 6, 3, 7, 21], [5, 12, 17, 9, 3]]
new_arr = list(zip(*arr))
# [(13, 9, 5), (4, 6, 12), (8, 3, 17), (14, 7, 9), (1, 21, 3)]
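To go from the transposed data to the required output, each column of the original becomes one printed line. A minimal sketch, assuming a rectangular list of lists; transpose is a hypothetical helper name.

```python
def transpose(rows):
    """Return the transpose of a rectangular list of lists (hypothetical helper)."""
    # zip(*rows) pairs up the i-th element of every row, i.e. the i-th column.
    return [list(column) for column in zip(*rows)]


arr = [[13, 4, 8, 14, 1], [9, 6, 3, 7, 21], [5, 12, 17, 9, 3]]
transposed = transpose(arr)

# Print M lines, each with N space-separated integers.
for line in transposed:
    print(" ".join(str(value) for value in line))
```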
Given a list of strings like so (in reality I have a much longer list, but I'll keep it short here):
items=['fish','headphones','wineglass','bowtie','cheese','hammer','socks']
I would like to pick a subset, say 3, of this list randomly so that items can only get picked once. This is easy enough using the following:
import itertools
import random

def random_combination(iterable, r):
    "Random selection from itertools.combinations(iterable, r)"
    pool = tuple(iterable)
    n = len(pool)
    indices = sorted(random.sample(xrange(n), r))
    return tuple(pool[i] for i in indices)

items=['fish','headphones','wineglass','bowtie','cheese','hammer','socks']
randomPick=random_combination(items,3)
Next, to be a pain, I don't want to do this just once, but several times, say 10 times. The final product would be 10 lists in which each item is picked at most once per list, with the constraint that every item appears about equally often across the 10 lists. For example, I'd like to avoid "socks" being picked 10 times and "hammer" only once.
This is the step that I'm stuck with, I simply don't know enough programming or enough about the available functions in python to perform such a thing.
Can anyone help?
The following code might help. It pops a random element until (a copy of) iterable is empty, then starts over from the entire list. The downside is every item is picked once before a single item can be picked a second time. However, as you can see from the output, the distribution of items ends up about equal.
import random

def equal_distribution_combinations(iterable, n, csize):
    """
    Yield 'n' lists of size 'csize' containing distinct random elements
    from 'iterable.' Elements of 'iterable' are approximately evenly
    distributed across all yielded combinations.
    """
    i_copy = list(iterable)
    if csize > len(i_copy):
        raise ValueError(
            "csize cannot exceed len(iterable), as elements could not be distinct."
        )
    for i in range(n):
        comb = []
        for j in range(csize):
            if not i_copy:
                i_copy = list(iterable)
            randi = random.randint(0, len(i_copy) - 1)
            # If i_copy was reinstantiated it would be possible to have
            # duplicate elements in comb without this check.
            while i_copy[randi] in comb:
                randi = random.randint(0, len(i_copy) - 1)
            comb.append(i_copy.pop(randi))
        yield comb
Edit
Apologies for Python 3. The only change to the function for Python 2 should be range -> xrange.
Edit 2 (answering comment question)
equal_distribution_combinations should result in an even distribution for any n, csize, and length of iterable, as long as csize does not exceed len(iterable) (as the combination elements could not be distinct).
Here's a test using the specific numbers in your comment:
items = range(30)
item_counts = {k: 0 for k in items}
for comb in equal_distribution_combinations(items, 10, 10):
    print(comb)
    for e in comb:
        item_counts[e] += 1
print('')
for k, v in item_counts.items():
    print('Item: {0} Count: {1}'.format(k, v))
Output:
[19, 28, 3, 20, 2, 9, 0, 25, 27, 12]
[29, 5, 22, 10, 1, 8, 17, 21, 14, 4]
[16, 13, 26, 6, 23, 11, 15, 18, 7, 24]
[26, 14, 18, 20, 16, 0, 1, 11, 10, 2]
[27, 21, 28, 24, 25, 12, 13, 19, 22, 6]
[23, 3, 8, 4, 15, 5, 29, 9, 7, 17]
[11, 1, 8, 28, 3, 13, 7, 26, 16, 23]
[9, 29, 14, 15, 17, 21, 18, 24, 12, 10]
[19, 20, 0, 2, 25, 5, 22, 4, 27, 6]
[12, 13, 24, 28, 6, 7, 26, 17, 25, 23]
Item: 0 Count: 3
Item: 1 Count: 3
Item: 2 Count: 3
Item: 3 Count: 3
Item: 4 Count: 3
Item: 5 Count: 3
Item: 6 Count: 4
Item: 7 Count: 4
Item: 8 Count: 3
Item: 9 Count: 3
Item: 10 Count: 3
Item: 11 Count: 3
Item: 12 Count: 4
Item: 13 Count: 4
Item: 14 Count: 3
Item: 15 Count: 3
Item: 16 Count: 3
Item: 17 Count: 4
Item: 18 Count: 3
Item: 19 Count: 3
Item: 20 Count: 3
Item: 21 Count: 3
Item: 22 Count: 3
Item: 23 Count: 4
Item: 24 Count: 4
Item: 25 Count: 4
Item: 26 Count: 4
Item: 27 Count: 3
Item: 28 Count: 4
Item: 29 Count: 3
As can be seen, the items are evenly distributed: every count is 3 or 4, since 100 picks spread over 30 items cannot be split exactly evenly.
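I would do something like this:
items = set(items)
res = []
for _ in xrange(10):
    r = random.sample(items, 3)
    res.append(r)
    items -= set(r)
All this does is grab 3 elements, store them, and then subtract them from the remaining set so they can't be selected again. Note that this never reuses an item, so 10 draws of 3 need at least 30 items; random.sample raises ValueError once fewer than 3 remain.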
Ok, in the end I resorted to doing the following.
It is a more constrained implementation where I set the number of times I want to see an item repeat. For example, over the 10 lists I want each item to be picked 5 times:
import random

List = ['airplane',
        'fish',
        'watch',
        'balloon',
        'headphones',
        'wineglass',
        'bowtie',
        'guitar',
        'desk',
        'bottle',
        'glove'] #there is more in my final list but keeping it short here
numIters = 5
numItems = len(List)
finalList=[]
for curList in range(numIters):
    random.shuffle(List)
    finalList.append(List[:numItems // 2]) #append first half
    finalList.append(List[numItems // 2:]) #append second half, including the last item
# finalList now holds 2 * numIters lists; each item appears once per shuffle
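Another way to meet the "equal number of times" constraint for any list size is to always draw from the least-used items. The following is a minimal sketch, not the method of any answer in this thread; balanced_picks and its seed parameter are hypothetical names, and it assumes the items are hashable and unique.

```python
import random


def balanced_picks(items, n_lists, size, seed=None):
    """Return (lists, counts): n_lists lists of `size` distinct items.

    Hypothetical helper. Usage counts never differ by more than one,
    assuming the items are hashable and unique.
    """
    rng = random.Random(seed)
    pool = list(items)
    if size > len(pool):
        raise ValueError("size cannot exceed the number of items")
    counts = {item: 0 for item in pool}
    lists = []
    for _ in range(n_lists):
        rng.shuffle(pool)                  # random order breaks ties
        pool.sort(key=counts.__getitem__)  # stable sort: least-used first
        pick = pool[:size]
        for item in pick:
            counts[item] += 1
        lists.append(pick)
    return lists, counts


items = ['fish', 'headphones', 'wineglass', 'bowtie', 'cheese', 'hammer', 'socks']
lists, counts = balanced_picks(items, n_lists=10, size=3, seed=0)
for picked in lists:
    print(picked)
print(counts)
```

Because each draw takes the currently least-used items, no item can get more than one pick ahead of any other; the shuffle before the stable sort keeps the choice among equally used items random.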
Can you please explain this to me, I'm completely lost here.
This is my code:
def ff(L):
    for a in L:
        k = L.index(a)
        print(k)
        b = L.pop(k)
        g = b
        print(g)
        L.insert(k,g)
    return L
This is the output:
>>> L = [12,13,14]
>>> ff(L)
0
12
1
13
2
14
[12, 13, 14]
But when i do this:
def ff(L):
    for a in L:
        k = L.index(a)
        print(k)
        b = L.pop(k)
        g = b + 1
        print(g)
        L.insert(k,g)
    return L
output:
>>> L = [12,13,14]
>>> ff(L)
0
13
0
14
0
15
[15, 13, 14]
Why So?
It's quite obvious. You can add more print calls to your code to see the reason yourself:
>>> def ff(L):
...     for a in L:
...         k = L.index(a)
...         print('value', a, 'at', k, 'pos in', L, end=' ')
...         b = L.pop(k)
...         g = b + 1
...         print('list after pop', L, end=' ')
...         L.insert(k,g)
...         print('inserted value', g, 'list after ins', L)
...     return L
...
>>> ff(L)
value 12 at 0 pos in [12, 13, 14] list after pop [13, 14] inserted value 13 list after ins [13, 13, 14]
value 13 at 0 pos in [13, 13, 14] list after pop [13, 14] inserted value 14 list after ins [14, 13, 14]
value 14 at 0 pos in [14, 13, 14] list after pop [13, 14] inserted value 15 list after ins [15, 13, 14]
[15, 13, 14]
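So each iteration reads the next element a, but L.index(a) finds the first occurrence of that value, which is the one just written to position 0. That slot is popped and replaced by a+1, the loop moves on to the next element, and the same thing happens again.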
I am not sure what's the question here, but as you incremented the value in the first iteration the item at 0th index becomes 13. And during the second iteration L.index(13) returns the index 0 again, so in the second iteration you modified the item at index 0 again to 14. And this goes on...
def ff(L):
    for a in L:
        print('List', L, 'searching for', a, 'found at', L.index(a))
        k = L.index(a)
        b = L.pop(k)
        g = b + 1
        L.insert(k,g)
    return L

L = [12,13,14]
print(ff(L))
Output:
List [12, 13, 14] searching for 12 found at 0
List [13, 13, 14] searching for 13 found at 0
List [14, 13, 14] searching for 14 found at 0
[15, 13, 14]
So, list.index() always returns the index of first match found, that's why in the second case the item at 0th index gets incremented.
A simple solution to increment all values by 1 will be:
>>> L = [12,13,14]
>>> [x+1 for x in L]
[13, 14, 15]
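If the list has to be modified in place rather than replaced by a new one, the position can be taken from enumerate instead of being searched for with list.index. A minimal sketch; increment_all is a hypothetical name.

```python
def increment_all(values):
    """Add 1 to every element of `values` in place and return the list."""
    for position, value in enumerate(values):
        # Write back by position, so duplicate values cannot confuse the update.
        values[position] = value + 1
    return values


L = [12, 13, 14]
print(increment_all(L))  # [13, 14, 15]
```

Assigning to an existing index does not change the length of the list, so it is safe to do while iterating, unlike the pop and insert pair that depended on index lookups.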