I have a pandas dataframe with people who live in different locations (latitude, longitude, floor number). I would like to assign 3 people to a group. The groups should be numbered from 1 to n. Important: These 3 people share different locations in terms of latitude, longitude and floor. This means, at the end of this process, every person is assigned to one particular group. My dataframe has the length of multiples of 9 (e.g 18 people).
Example:
Here is my dataframe:
array_data=([[ 50.56419 , 8.67667 , 2. , 160. ],
[ 50.5740356, 8.6718179, 1. , 5. ],
[ 50.5746321, 8.6831284, 3. , 202. ],
[ 50.5747453, 8.6765588, 4. , 119. ],
[ 50.5748992, 8.6611471, 2. , 260. ],
[ 50.5748992, 8.6611471, 3. , 102. ],
[ 50.575 , 8.65985 , 2. , 267. ],
[ 50.5751 , 8.66027 , 2. , 7. ],
[ 50.5751 , 8.66027 , 2. , 56. ],
[ 50.57536 , 8.67741 , 1. , 194. ],
[ 50.57536 , 8.67741 , 1. , 282. ],
[ 50.5755255, 8.6884584, 0. , 276. ],
[ 50.5755273, 8.674282 , 3. , 167. ],
[ 50.57553 , 8.6826 , 2. , 273. ],
[ 50.5755973, 8.6847492, 0. , 168. ],
[ 50.5756757, 8.6846139, 4. , 255. ],
[ 50.57572 , 8.65965 , 0. , 66. ],
[ 50.57591 , 8.68175 , 1. , 187. ]])
all_persons = pd.DataFrame(data=array_data) # convert back to dataframe
all_persons.rename(columns={0: 'latitude', 1: 'longitude', 2:'floor', 3:'id'}, inplace=True) # rename columns
How can I create this column? As you can see, my approach doesn't work correctly.
This was my approach: Google Colab Link to my solution
temp = ()
temp += (pd.concat([df.loc[users group 1]], keys=[1], names=['group']),)
temp += (pd.concat([df.loc[users group 2]], keys=[2], names=['group']),)
temp += (pd.concat([df.loc[users group 3]], keys=[3], names=['group']),)
df = pd.concat(temp)
Of course you can do this in a loop and locate the users you need in a more elegant way.
Related
Input:
array([[ 1. , 5. , 1. ],
[ 10. , 7. , 1.5],
[ 6.9, 5. , 1. ],
[ 19. , 9. , 100. ],
[ 11. , 11. , 11. ]])
Expected Output:
array([[ 19. , 9. , 100. ],
[ 11. , 11. , 11. ],
[ 10. , 7. , 1.5],
[ 6.9, 5. , 1. ],
[ 1. , 5. , 1. ]])
i tried doing the below:
for i in M:
ls = i.mean()
x = np.append(i,ls)
print(x) #found the mean
After this i am unable to arrange each column based on the mean value in each row. All i can do
is to arrange each row in descending order but that is not what i wanted.
You can do this:
In [405]: row_idxs = np.argsort(np.mean(a * -1, axis=1))
In [406]: a[row_idxs, :]
Out[406]:
array([[ 19. , 9. , 100. ],
[ 11. , 11. , 11. ],
[ 10. , 7. , 1.5],
[ 6.9, 5. , 1. ],
[ 1. , 5. , 1. ]])
Using argsort will sort the indices. Multiplying by -1 allows you to get descending order.
I have this numpy array:
array(
[[ 1. , 9. , 565.98653513],
[ 1. , 1. , 973.18466261],
[ 1. , 25. , 803.17747373],
[ 2. , 9. , 82.56336897],
[ 2. , 1. , 104.69517373],
[ 2. , 25. , 627.01127514],
[ 3. , 21. , 334.07622382],
[ 3. , 34. , 921.37623107],
[ 3. , 20. , 342.08177942],
... ... ... ...
[ 10. , 7. , 424.29338026],
[ 10. , 0. , 232.71475407],
[ 10. , 1. , 330.44846202]])
But I want to sort the matrix by the first column in a cycle: 1, 2, 3, ...,10. It should look like this:
array(
[[ 1. , 9. , 565.98653513],
[ 2. , 9. , 82.56336897],
[ 3. , 21. , 334.07622382],
... ... ... ...
[ 10. , 7. , 424.29338026],
[ 1. , 1. , 973.18466261],
[ 2. , 1. , 104.69517373],
[ 3. , 34. , 921.37623107],
... ... ... ...
[ 10. , 0. , 232.71475407],
[ 1. , 25. , 803.17747373],
[ 2. , 25. , 627.01127514],
[ 3. , 20. , 342.08177942],
... ... ... ...
[ 10. , 1. , 330.44846202]])
How can I do this?
I was thinking of converting it to a dataframe (i.e. pandas) for more sorting options, then covert back to an array...but I don't see a straight forward function to do this.
I appreciate any help or ideas.
Assuming your array is named x:
y = [x[i::10] for i in range(int(len(x)/10))]
y = np.array(y)
y.reshape(x.shape)
print(y)
The x[i:j:k] notation means x from i to j with step k. So x[i::10] mean x from i to the end with step 10.
See more here.
I though this would be super easy but I am struggling a little. I have a data structure as follows
array([[ 5. , 3.40166205],
[ 10. , 2.72778882],
[ 15. , 2.31881804],
[ 20. , 2.50643777],
[ 1. , 3.94076063],
[ 2. , 3.80598599],
[ 3. , 3.67121134],
[ 6. , 3.2668874 ],
[ 7. , 3.13211276],
[ 8. , 2.99733811],
[ 9. , 2.86256347],
[ 11. , 2.64599467],
[ 12. , 2.56420051],
[ 13. , 2.48240635],
[ 14. , 2.4006122 ],
[ 16. , 1.8280531 ],
[ 17. , 1.74625894],
[ 18. , 1.66446479],
[ 19. , 1.58267063],
[ 20. , 1.50087647]])
And I want to sort it ONLY on the first column ... so it is ordered as follows:
array([[1. , 3.9],
[2. , 3.8],
... ,
[20. , 1.5]])
np.sort doesn't seem to work as it moves array to a flat structure. I've also used itemgetter
from operator import itemgetter
sorted(data, key=itemgetter(1))
But this doesn't give me the output I'm looking for.
Help appreciated!
This is a common numpy idiom. You can use argsort (on the first column) + numpy indexing here -
x[x[:, 0].argsort()]
array([[ 1. , 3.94076063],
[ 2. , 3.80598599],
[ 3. , 3.67121134],
[ 5. , 3.40166205],
[ 6. , 3.2668874 ],
[ 7. , 3.13211276],
[ 8. , 2.99733811],
[ 9. , 2.86256347],
[ 10. , 2.72778882],
[ 11. , 2.64599467],
[ 12. , 2.56420051],
[ 13. , 2.48240635],
[ 14. , 2.4006122 ],
[ 15. , 2.31881804],
[ 16. , 1.8280531 ],
[ 17. , 1.74625894],
[ 18. , 1.66446479],
[ 19. , 1.58267063],
[ 20. , 2.50643777],
[ 20. , 1.50087647]])
This question already has answers here:
Sorting a 2D numpy array by multiple axes
(7 answers)
Closed 7 years ago.
So I've a 2d array, that when sorted by the second column using a[np.argsort(-a[:,1])] looks like this:
array([[ 30. , 98.7804878 ],
[ 24. , 98.7804878 ],
[ 21. , 98.7804878 ],
[ 26. , 98.7804878 ],
[ 20. , 98.70875179],
[ 4. , 98.27833572],
[ 1. , 7.10186514]])
Now I want to sort this by the lowest "id" column so it looks like this:
array([[ 21. , 98.7804878 ],
[ 24. , 98.7804878 ],
[ 26. , 98.7804878 ],
[ 30. , 98.7804878 ],
[ 20. , 98.70875179],
[ 4. , 98.27833572],
[ 1. , 7.10186514]])
I can't figure out how to do it, even if I take the top highest percentages from the first and then order them.
You can use np.lexsort for this:
>>> a[np.lexsort((a[:, 0], -a[:, 1]))]
array([[ 21. , 98.7804878 ],
[ 24. , 98.7804878 ],
[ 26. , 98.7804878 ],
[ 30. , 98.7804878 ],
[ 20. , 98.70875179],
[ 4. , 98.27833572],
[ 1. , 7.10186514]])
This sorts by -a[:, 1], then by a[:, 0], returning an array of indices than you can use to index a.
I have two arrays in NumPy:
a1 =
array([[ 262.99182129, 213. , 1. ],
[ 311.98925781, 271.99050903, 2. ],
[ 383. , 342. , 3. ],
[ 372.16494751, 348.83505249, 4. ],
[ 214.55493164, 137.01008606, 5. ],
[ 138.29714966, 199.75 , 6. ],
[ 289.75 , 220.75 , 7. ],
[ 239. , 279. , 8. ],
[ 130.75 , 348.25 , 9. ]])
a2 =
array([[ 265.78259277, 212.99705505, 1. ],
[ 384.23312378, 340.99707031, 3. ],
[ 373.66967773, 347.96688843, 4. ],
[ 217.91461182, 137.2791748 , 5. ],
[ 141.35340881, 199.38366699, 6. ],
[ 292.24401855, 220.83808899, 7. ],
[ 241.53366089, 278.56951904, 8. ],
[ 133.26490784, 347.14279175, 9. ]])
Actually there will be thousands of rows.
But as you can see, the third column in a2 does not have the value 2.0.
What I simply want is to remove from a1 the rows whose 3rd column values are not found in any row of a2.
What's the NumPy way/shortcut to do this fast?
One option is to use np.in1d to check whether each of the values in column 2 of a1 is in column 2 of a2 and use the resulting Boolean array to index the rows of a1.
You can do this as follows:
>>> a1[np.in1d(a1[:, 2], a2[:, 2])]
array([[ 262.99182129, 213. , 1. ],
[ 383. , 342. , 3. ],
[ 372.16494751, 348.83505249, 4. ],
[ 214.55493164, 137.01008606, 5. ],
[ 138.29714966, 199.75 , 6. ],
[ 289.75 , 220.75 , 7. ],
[ 239. , 279. , 8. ],
[ 130.75 , 348.25 , 9. ]])
The row in a1 with 2 in the third column in not in this array as required.