Consider two NumPy arrays:
array1 = np.arange(0, 6)
array2 = np.arange(0, 12)
I want to run a loop (preferably a list comprehension) where the desired output for a single round is
print(array1[0])
print(array2[0], array2[1])
and for the next round
print(array1[1])
print(array2[2], array2[3])
i.e. the loop runs six times, but for every element of array1 it selects the next two consecutive elements from array2.
I have tried something like
for i in range(len(array1)):
    for v in range(len(array2)):
but this evidently runs the second loop inside the first one. How can I run them simultaneously, but select a different number of elements from each array in one round?
I have also tried making the loops equal in length, such as
array1 = np.repeat(np.arange(0, 6), 2)   # array1 = [0, 0, 1, 1, 2, 2, ..., 5, 5]
However, while this makes the two arrays equal in length, I still cannot get the desired output.
(In actual case, the elements of the array are pandas Series objects)
There are a bunch of different ways of going about this. One thing you can do is use the indices:
for ind, item in enumerate(array1):
    print(item, array2[2*ind:2*ind + 2])
This does not use the full power of NumPy, however. The easiest thing I can think of is to concatenate your arrays into a single array containing the desired sequence. You can make it into a 2D array for easy iteration, where each row will be the three elements you want:
array1 = np.arange(6)
array2 = np.arange(12)
combo = np.concatenate((array1.reshape(-1, 1), array2.reshape(-1, 2)), axis=1)
for row in combo:
print(row)
Results in
[0 0 1]
[1 2 3]
[2 4 5]
[3 6 7]
[4 8 9]
[ 5 10 11]
In this case, the explicit reshape of array1 is necessary because array1.T will result in a 1D array.
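A quick check of that point (a small demonstration, not part of the original answer):

```python
import numpy as np

a = np.arange(6)
print(a.T.shape)               # (6,)  -- transposing a 1D array is a no-op
print(a.reshape(-1, 1).shape)  # (6, 1) -- an explicit column vector
```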
You can use a hybrid of the two approaches, as @Divakar suggests, where you reshape array2 but iterate using the index:
array3 = array2.reshape(-1, 2)
for ind, item in enumerate(array1):
    print(item, array3[ind])
Yes, as @MadPhysicist mentioned, there are a lot of ways to do this... but the simplest is:
>>> for x, y, z in zip(array1, array2[::2], array2[1::2]):
...     print(x, y, z)
...
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
5 10 11
for i in range(len(array1)):
    print(array1[i])
    print(array2[2*i], array2[2*i + 1])
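Since the question asked for a list comprehension, the reshape idea above combines directly with zip (a minimal sketch):

```python
import numpy as np

array1 = np.arange(6)
array2 = np.arange(12)

# Pair each element of array1 with the matching two-element chunk of array2.
pairs = [(a, chunk) for a, chunk in zip(array1, array2.reshape(-1, 2))]
for a, chunk in pairs:
    print(a, chunk)
```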
How do I fill a 2D array from an updated 1D list?
For example, I have a list that I get from this code:
a = []
for k, v in data.items():
    b = v/sumcount
    a.append(b)
What I want to do is produce several 'a' lists and put their values into a 2D array, each in a different column, OR put the b values directly into a 2D array, where each column represents one pass of the loop over the keys k.
*My difficulty here is that k is not an integer: the keys are dict keys (str), and there are 9 of them.
I have tried this, but it does not work:
row = len(data.items())
matrix = np.zeros((9, 2))
for i in range(1, 3):
    a = []
    for k, v in data.items():
        b = v/sumcount
        matrix[x][i].fill(b), for x in range (1, 10)
The a list is
[1, 2, 3, 4, 5, 6, 7, 8, 9]
For example, if the outer loop runs twice, I expect 2 columns and 9 rows:
1 6
2 7
3 8
4 9
5 14
6 15
7 16
8 17
9 18
I want to fill the matrix values with b.
import numpy as np
import pandas as pd

matrix = np.zeros((9, 2))
df = pd.DataFrame({'aaa': [1, 2, 3, 4, 5, 6, 7, 8, 9]})
sumcount = [1, 2]
for i in range(len(sumcount)):
    matrix[:, i] = df['aaa']/sumcount[i]
print(matrix)
As far as I understand you, you need to take a column from the dataframe and place it into a NumPy array. There is no need to iterate over each row: that would run slowly. In general, loops are a last resort, used only when there is no other option. In NumPy, slicing is the usual way to set values.
You can also do without the explicit loop entirely by using a list comprehension, wrapping the result in np.array, and applying transpose:
bbb = np.array([df['aaa']/sumcount[i] for i in range(len(sumcount))]).transpose()
print(bbb)
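Broadcasting offers yet another loop-free route (my own sketch, not from the answer above): divide the column by all the divisors at once.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'aaa': [1, 2, 3, 4, 5, 6, 7, 8, 9]})
sumcount = [1, 2]

# Divide the (9, 1) column by the (2,) divisor array;
# broadcasting produces the (9, 2) matrix in one step.
matrix = df['aaa'].to_numpy()[:, None] / np.array(sumcount)
print(matrix)
```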
If I have a dictionary, say
test_dict
and it contains 81 entries, in the correct order:
How would I convert the dictionary's 81 values into a 9x9 2D array? The first 9 values make up the first 9-item row, the second 9 values make up the second, and so on. Is it possible with NumPy? I feel as though I'm missing something simple.
You could try this. Here I have taken a dictionary with 4 elements, extracted its values into a NumPy array, and reshaped that to 2x2; you can reshape yours to 9x9 the same way.
import numpy as np

values = {1: 1, 2: 2, 3: 2, 4: 5}
vals = np.fromiter(values.values(), dtype=int)
print(vals.reshape(2, 2))
Output:
[[1 2]
[2 5]]
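Applied to the 81-entry case from the question (test_dict here is a made-up stand-in for the real dictionary):

```python
import numpy as np

# Hypothetical stand-in: 81 entries, in insertion order.
test_dict = {f"k{i}": i for i in range(81)}

# Values come out in insertion order, so the first 9 form row 0, the next 9 row 1, ...
grid = np.fromiter(test_dict.values(), dtype=int).reshape(9, 9)
print(grid.shape)  # (9, 9)
```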
I have some code right now that works fine, but it entirely too slow. I'm trying to add up the weighted sum of squares for every row in a Pandas dataframe. I'd like to vectorize the operations--that seems to run much, much faster--but there's a wrinkle in the code that has defeated my attempts to vectorize.
totalDist = 0.0
for index, row in pU.iterrows():
totalDist += (row['distance'][row['schoolChoice']]**2.0*float(row['students']))
The row has 'students' (an integer), 'distance' (a NumPy array of length n), and 'schoolChoice' (an integer less than or equal to n-1 that designates which element of the distance array to use for the calculation). Basically, I'm pulling a row-specific value from the NumPy array. I've used df.lookup, but that actually seems to be slower and is being deprecated. Any suggestions on how to make this run faster? Thanks in advance!
If all else fails you can use .apply() on each row:
totalSum = df.apply(lambda row: row.distance[row.schoolChoice] ** 2 * row.students, axis=1).sum()
To go faster you can import numpy:
totalSum = (numpy.stack(df.distance)[numpy.arange(len(df)), df.schoolChoice] ** 2 * df.students).sum()
The numpy method requires distance be the same length for each row - however it is possible to pad them to the same length if needed. (Though this may affect any gains made.)
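One way to do that padding (a sketch with made-up data; zero-padding is safe here because schoolChoice always points at a real element, never at the padding):

```python
import numpy as np
import pandas as pd

# Made-up frame with ragged 'distance' lists.
df = pd.DataFrame({
    "distance": [[1, 2, 3], [4, 5], [7, 8, 9]],
    "schoolChoice": [0, 1, 2],
    "students": [4, 5, 6],
})

# Zero-pad every distance list to the longest length so a single
# 2D array can be fancy-indexed row by row.
width = max(len(d) for d in df.distance)
padded = np.array([list(d) + [0] * (width - len(d)) for d in df.distance])
totalSum = (padded[np.arange(len(df)), df.schoolChoice] ** 2 * df.students).sum()
print(totalSum)  # 1*4 + 25*5 + 81*6 = 615
```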
Tested on a df of 150,000 rows like:
distance schoolChoice students
0 [1, 2, 3] 0 4
1 [4, 5, 6] 2 5
2 [7, 8, 9] 2 6
3 [1, 2, 3] 0 4
4 [4, 5, 6] 2 5
Timings:
method time
0 for loop 15.9s
1 df.apply 4.1s
2 numpy 0.7s
I have data in a text file with 3 columns and 4 rows like this:
5 6.4 17
6 5.8 16
7 5.5 3.9
8 5.3 10.4
I want to read this data from the text file into three 1D arrays, each with 4 elements. I have this code:
import numpy as np
with open('data.txt', 'rt') as filedata:
    values = np.genfromtxt(filedata, unpack=True)
This has created a 2D (3, 4) array. I managed to split it into 3 subarrays using np.split, but then I didn't know how to name and subsequently use those subarrays.
You can use Python's slice notation:
import numpy as np

values = np.genfromtxt('data.txt')  # shape (4, 3): one row per line of the file
array1 = values[:, 0]
array2 = values[:, 1]
array3 = values[:, 2]
When you use slicing, the first index selects rows and the second selects columns. So by typing values[:, 0] you say "give me all elements in the 0th column". The colon lets you specify a range: for example, values[0:2, 0] gives the first two elements in the 0th column. You can look at the slicing notation in more detail here.
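Alternatively, since the question already used unpack=True, the three columns can be unpacked directly into named arrays (this sketch recreates the sample file first so it runs standalone):

```python
import numpy as np

# Recreate the sample data file from the question.
with open('data.txt', 'w') as f:
    f.write("5 6.4 17\n6 5.8 16\n7 5.5 3.9\n8 5.3 10.4\n")

# unpack=True transposes the (4, 3) result, so each file column
# lands in its own 1D array of 4 elements.
array1, array2, array3 = np.genfromtxt('data.txt', unpack=True)
print(array1)  # [5. 6. 7. 8.]
```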
I have a 2D array of the form np.zeros((m, n)).
My objective is to look at the first 2 columns: I want to find the element in the first column that occurs the most (the mode of the first column), but without counting a row twice when its second-column value is a repeat.
7x3 example:
[[1 2 x], [1 2 y], [1 3 z], [5 3 w], [5 6 v], [9 2 x], [9 2 y]]
Desired output, i.e. the number of occurrences of:
[1]: 2
[5]: 2
[9]: 1
So in a way it is a counter function, but conditional on a second array (column 2).
I am relatively new to Python. Is there a function that can do this directly and somewhat efficiently? I need to run it on very large matrices, but could not find such a function.
This function solves your problem:
def count_special(arr):
    counter = {}
    for i in np.unique(arr[:, 0]):
        sec = arr[arr[:, 0] == i, 1]
        counter[i] = len(np.unique(sec))
    return counter
which, for your input, returns:
arr = np.array([[1, 2, 0], [1, 2, 4], [1, 3, 4], [5, 3, 1], [5, 6, 0]])
print(count_special(arr))
-> {1: 2, 5: 2}
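For very large matrices, a loop-free variant (my own sketch, not part of the answer above) first deduplicates the (column 0, column 1) pairs with np.unique and then counts rows per first-column value:

```python
import numpy as np

def count_special_vec(arr):
    # Drop duplicate (col0, col1) pairs, then count the surviving rows per col0 value.
    pairs = np.unique(arr[:, :2], axis=0)
    keys, counts = np.unique(pairs[:, 0], return_counts=True)
    return {int(k): int(c) for k, c in zip(keys, counts)}

arr = np.array([[1, 2, 0], [1, 2, 4], [1, 3, 4], [5, 3, 1], [5, 6, 0]])
print(count_special_vec(arr))  # {1: 2, 5: 2}
```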