Assign people who have different locations to a group

I have a pandas dataframe of people who live at different locations (latitude, longitude, floor number). I would like to assign the people to groups of 3, with the groups numbered from 1 to n. Important: the 3 people in a group must have different locations in terms of latitude, longitude and floor. At the end of this process, every person is assigned to exactly one group. The length of my dataframe is a multiple of 9 (e.g. 18 people).
Example:
Here is my dataframe:
array_data=([[ 50.56419 , 8.67667 , 2. , 160. ],
[ 50.5740356, 8.6718179, 1. , 5. ],
[ 50.5746321, 8.6831284, 3. , 202. ],
[ 50.5747453, 8.6765588, 4. , 119. ],
[ 50.5748992, 8.6611471, 2. , 260. ],
[ 50.5748992, 8.6611471, 3. , 102. ],
[ 50.575 , 8.65985 , 2. , 267. ],
[ 50.5751 , 8.66027 , 2. , 7. ],
[ 50.5751 , 8.66027 , 2. , 56. ],
[ 50.57536 , 8.67741 , 1. , 194. ],
[ 50.57536 , 8.67741 , 1. , 282. ],
[ 50.5755255, 8.6884584, 0. , 276. ],
[ 50.5755273, 8.674282 , 3. , 167. ],
[ 50.57553 , 8.6826 , 2. , 273. ],
[ 50.5755973, 8.6847492, 0. , 168. ],
[ 50.5756757, 8.6846139, 4. , 255. ],
[ 50.57572 , 8.65965 , 0. , 66. ],
[ 50.57591 , 8.68175 , 1. , 187. ]])
import pandas as pd
all_persons = pd.DataFrame(data=array_data)  # convert back to dataframe
all_persons.rename(columns={0: 'latitude', 1: 'longitude', 2: 'floor', 3: 'id'}, inplace=True)  # rename columns
How can I create the group column? My own approach, linked below, doesn't work correctly.
This was my approach: Google Colab Link to my solution

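# users_group_1, users_group_2 and users_group_3 are placeholders for the index labels of each group's members
temp = ()
temp += (pd.concat([df.loc[users_group_1]], keys=[1], names=['group']),)
temp += (pd.concat([df.loc[users_group_2]], keys=[2], names=['group']),)
temp += (pd.concat([df.loc[users_group_3]], keys=[3], names=['group']),)
df = pd.concat(temp)  # the group number becomes the outer index level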
Of course you can do this in a loop and locate the users you need in a more elegant way.
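
One way to build the group column directly is sketched below. It is not taken from the question or the answer above and rests on two assumptions: "different locations" is read as "no two members of a group share the same (latitude, longitude, floor) triple", and the helper name assign_groups is hypothetical. Sorting by location puts people at the same location next to each other, and dealing the sorted rows out round-robin then separates them, provided no location holds more people than there are groups.

```python
import numpy as np
import pandas as pd

LOCATION_COLS = ['latitude', 'longitude', 'floor']

def assign_groups(people, group_size=3):
    """Hypothetical helper: add a 'group' column numbered 1..n."""
    if len(people) % group_size != 0:
        raise ValueError("number of people must be a multiple of group_size")
    n_groups = len(people) // group_size
    # Round-robin only separates identical locations if none of them
    # occurs more often than there are groups.
    if people.groupby(LOCATION_COLS).size().max() > n_groups:
        raise ValueError("a location holds more people than there are groups")
    # Sort so that people at the same location are adjacent ...
    ordered = people.sort_values(LOCATION_COLS).copy()
    # ... then deal them out to groups 1, 2, ..., n, 1, 2, ...
    ordered['group'] = np.arange(len(ordered)) % n_groups + 1
    return ordered.sort_index()  # restore the original row order

# Illustrative data only: six people, two pairs sharing a location.
demo = pd.DataFrame({
    'latitude':  [50.5751, 50.5751, 50.57536, 50.57536, 50.575, 50.57591],
    'longitude': [8.66027, 8.66027, 8.67741, 8.67741, 8.65985, 8.68175],
    'floor':     [2., 2., 1., 1., 2., 1.],
    'id':        [7., 56., 194., 282., 267., 187.],
})
grouped = assign_groups(demo)
print(grouped)
```

Calling assign_groups(all_persons) on the 18-person dataframe from the question would yield groups 1 to 6 under the same assumptions. If some location has more occupants than there are groups, no valid assignment exists and the sketch raises an error instead of guessing.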

Related

How can I sort an array based on the mean of each column in python?

Input:
array([[ 1. , 5. , 1. ],
[ 10. , 7. , 1.5],
[ 6.9, 5. , 1. ],
[ 19. , 9. , 100. ],
[ 11. , 11. , 11. ]])
Expected Output:
array([[ 19. , 9. , 100. ],
[ 11. , 11. , 11. ],
[ 10. , 7. , 1.5],
[ 6.9, 5. , 1. ],
[ 1. , 5. , 1. ]])
I tried the following:
for i in M:
    ls = i.mean()
    x = np.append(i, ls)
    print(x)  # found the mean
After this I am unable to arrange the rows based on the mean value of each row. All I can do is arrange each row in descending order, but that is not what I want.

You can do this:
In [405]: row_idxs = np.argsort(np.mean(a * -1, axis=1))
In [406]: a[row_idxs, :]
Out[406]:
array([[ 19. , 9. , 100. ],
[ 11. , 11. , 11. ],
[ 10. , 7. , 1.5],
[ 6.9, 5. , 1. ],
[ 1. , 5. , 1. ]])
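np.argsort returns the indices that would sort the row means in ascending order. Multiplying by -1 before taking the mean reverses that order, so the rows come out in descending order of their mean.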
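
As a self-contained check, the same idea can be run on the input from the question. This is a minimal sketch; the variable names row_means, order and sorted_rows are chosen here for illustration.

```python
import numpy as np

a = np.array([[1.0, 5.0, 1.0],
              [10.0, 7.0, 1.5],
              [6.9, 5.0, 1.0],
              [19.0, 9.0, 100.0],
              [11.0, 11.0, 11.0]])

row_means = a.mean(axis=1)      # one mean per row
order = np.argsort(-row_means)  # negate to get descending order
sorted_rows = a[order]          # reorder whole rows, columns stay intact
print(sorted_rows)
```

Negating the means rather than the whole array gives the same ordering and avoids building a negated copy of a.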

Numpy sorting a matrix by column by cycle

I have this numpy array:
array(
[[ 1. , 9. , 565.98653513],
[ 1. , 1. , 973.18466261],
[ 1. , 25. , 803.17747373],
[ 2. , 9. , 82.56336897],
[ 2. , 1. , 104.69517373],
[ 2. , 25. , 627.01127514],
[ 3. , 21. , 334.07622382],
[ 3. , 34. , 921.37623107],
[ 3. , 20. , 342.08177942],
... ... ... ...
[ 10. , 7. , 424.29338026],
[ 10. , 0. , 232.71475407],
[ 10. , 1. , 330.44846202]])
But I want to sort the matrix by the first column in a cycle: 1, 2, 3, ...,10. It should look like this:
array(
[[ 1. , 9. , 565.98653513],
[ 2. , 9. , 82.56336897],
[ 3. , 21. , 334.07622382],
... ... ... ...
[ 10. , 7. , 424.29338026],
[ 1. , 1. , 973.18466261],
[ 2. , 1. , 104.69517373],
[ 3. , 34. , 921.37623107],
... ... ... ...
[ 10. , 0. , 232.71475407],
[ 1. , 25. , 803.17747373],
[ 2. , 25. , 627.01127514],
[ 3. , 20. , 342.08177942],
... ... ... ...
[ 10. , 1. , 330.44846202]])
How can I do this?
I was thinking of converting it to a dataframe (i.e. pandas) for more sorting options and then converting it back to an array, but I don't see a straightforward function to do this.
I appreciate any help or ideas.

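Assuming your array is named x, is grouped by its first column, and contains every value the same number of times:
k = len(x) // 10  # rows per distinct value in the first column (3 in the example)
y = np.concatenate([x[i::k] for i in range(k)])  # every k-th row, starting at offset i
print(y)
The x[i:j:k] notation means x from i to j with step k, so x[i::k] means x from i to the end with step k. Each slice picks one row per value (1, 2, ..., 10), and concatenating the slices gives the cycles in order.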
See more here.
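
The cyclic reordering can also be written as a reshape. The sketch below uses a small made-up array (three distinct values, two rows each) and assumes, as in the question, that the rows are grouped by the first column and that every value occurs equally often. The names k, n_values and cycled are illustrative.

```python
import numpy as np

# Illustrative data: values 1, 2, 3 in the first column, two rows each.
x = np.array([[1.0, 9.0, 0.1],
              [1.0, 1.0, 0.2],
              [2.0, 9.0, 0.3],
              [2.0, 1.0, 0.4],
              [3.0, 21.0, 0.5],
              [3.0, 34.0, 0.6]])

k = 2                   # rows per distinct value (assumed constant)
n_values = len(x) // k  # number of distinct values

# View the rows as [value][occurrence][column], swap the first two axes
# to get [occurrence][value][column], then flatten back to 2D.
cycled = x.reshape(n_values, k, x.shape[1]).swapaxes(0, 1).reshape(x.shape)
print(cycled)
```

For the 30-row array in the question, k would be 3 and n_values would be 10.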

Sort 2D NumPy array by one of the columns

I thought this would be super easy, but I am struggling a little. I have a data structure as follows:
array([[ 5. , 3.40166205],
[ 10. , 2.72778882],
[ 15. , 2.31881804],
[ 20. , 2.50643777],
[ 1. , 3.94076063],
[ 2. , 3.80598599],
[ 3. , 3.67121134],
[ 6. , 3.2668874 ],
[ 7. , 3.13211276],
[ 8. , 2.99733811],
[ 9. , 2.86256347],
[ 11. , 2.64599467],
[ 12. , 2.56420051],
[ 13. , 2.48240635],
[ 14. , 2.4006122 ],
[ 16. , 1.8280531 ],
[ 17. , 1.74625894],
[ 18. , 1.66446479],
[ 19. , 1.58267063],
[ 20. , 1.50087647]])
And I want to sort it ONLY on the first column ... so it is ordered as follows:
array([[1. , 3.9],
[2. , 3.8],
... ,
[20. , 1.5]])
np.sort doesn't seem to work, as it appears to flatten the array. I've also tried itemgetter:
from operator import itemgetter
sorted(data, key=itemgetter(1))
But this doesn't give me the output I'm looking for.
Help appreciated!

This is a common NumPy idiom: call argsort on the first column, then index the array with the resulting row order.
x[x[:, 0].argsort()]
array([[ 1. , 3.94076063],
[ 2. , 3.80598599],
[ 3. , 3.67121134],
[ 5. , 3.40166205],
[ 6. , 3.2668874 ],
[ 7. , 3.13211276],
[ 8. , 2.99733811],
[ 9. , 2.86256347],
[ 10. , 2.72778882],
[ 11. , 2.64599467],
[ 12. , 2.56420051],
[ 13. , 2.48240635],
[ 14. , 2.4006122 ],
[ 15. , 2.31881804],
[ 16. , 1.8280531 ],
[ 17. , 1.74625894],
[ 18. , 1.66446479],
[ 19. , 1.58267063],
[ 20. , 2.50643777],
[ 20. , 1.50087647]])
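
A shortened, self-contained version of the same idiom is sketched below. The only addition is kind='stable', which keeps rows with equal keys (the two rows starting with 20) in their original relative order; the default sort kind does not guarantee that.

```python
import numpy as np

# A few rows in the style of the question's data (values rounded).
x = np.array([[5.0, 3.4],
              [10.0, 2.7],
              [1.0, 3.9],
              [20.0, 2.5],
              [2.0, 3.8],
              [20.0, 1.5]])

order = np.argsort(x[:, 0], kind='stable')  # row order by first column
x_sorted = x[order]                         # rows move as a unit
print(x_sorted)
```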

Sort NumPy array by values in two columns [duplicate]

So I've a 2d array, that when sorted by the second column using a[np.argsort(-a[:,1])] looks like this:
array([[ 30. , 98.7804878 ],
[ 24. , 98.7804878 ],
[ 21. , 98.7804878 ],
[ 26. , 98.7804878 ],
[ 20. , 98.70875179],
[ 4. , 98.27833572],
[ 1. , 7.10186514]])
Now I want to sort this by the lowest "id" column so it looks like this:
array([[ 21. , 98.7804878 ],
[ 24. , 98.7804878 ],
[ 26. , 98.7804878 ],
[ 30. , 98.7804878 ],
[ 20. , 98.70875179],
[ 4. , 98.27833572],
[ 1. , 7.10186514]])
I can't figure out how to do it, even if I take the top highest percentages from the first and then order them.

You can use np.lexsort for this:
>>> a[np.lexsort((a[:, 0], -a[:, 1]))]
array([[ 21. , 98.7804878 ],
[ 24. , 98.7804878 ],
[ 26. , 98.7804878 ],
[ 30. , 98.7804878 ],
[ 20. , 98.70875179],
[ 4. , 98.27833572],
[ 1. , 7.10186514]])
np.lexsort treats the last key as the primary sort key, so this sorts by -a[:, 1] first and breaks ties with a[:, 0]. It returns an array of indices that you can use to index a.
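
The key order of np.lexsort is easy to get backwards, so here is a self-contained version using the array from the question. The names ids, pct and result are illustrative.

```python
import numpy as np

a = np.array([[30.0, 98.7804878],
              [24.0, 98.7804878],
              [21.0, 98.7804878],
              [26.0, 98.7804878],
              [20.0, 98.70875179],
              [4.0, 98.27833572],
              [1.0, 7.10186514]])

ids, pct = a[:, 0], a[:, 1]
# Keys are listed from least to most significant:
# the LAST key (-pct) is the primary one, ids only break ties.
order = np.lexsort((ids, -pct))
result = a[order]
print(result)
```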

NumPy: removing rows in an array if one column's value does not match

I have two arrays in NumPy:
a1 =
array([[ 262.99182129, 213. , 1. ],
[ 311.98925781, 271.99050903, 2. ],
[ 383. , 342. , 3. ],
[ 372.16494751, 348.83505249, 4. ],
[ 214.55493164, 137.01008606, 5. ],
[ 138.29714966, 199.75 , 6. ],
[ 289.75 , 220.75 , 7. ],
[ 239. , 279. , 8. ],
[ 130.75 , 348.25 , 9. ]])
a2 =
array([[ 265.78259277, 212.99705505, 1. ],
[ 384.23312378, 340.99707031, 3. ],
[ 373.66967773, 347.96688843, 4. ],
[ 217.91461182, 137.2791748 , 5. ],
[ 141.35340881, 199.38366699, 6. ],
[ 292.24401855, 220.83808899, 7. ],
[ 241.53366089, 278.56951904, 8. ],
[ 133.26490784, 347.14279175, 9. ]])
Actually there will be thousands of rows.
But as you can see, the third column in a2 does not have the value 2.0.
What I simply want is to remove from a1 the rows whose 3rd column values are not found in any row of a2.
What's the NumPy way/shortcut to do this fast?

One option is to use np.in1d to check whether each of the values in column 2 of a1 is in column 2 of a2 and use the resulting Boolean array to index the rows of a1.
You can do this as follows:
>>> a1[np.in1d(a1[:, 2], a2[:, 2])]
array([[ 262.99182129, 213. , 1. ],
[ 383. , 342. , 3. ],
[ 372.16494751, 348.83505249, 4. ],
[ 214.55493164, 137.01008606, 5. ],
[ 138.29714966, 199.75 , 6. ],
[ 289.75 , 220.75 , 7. ],
[ 239. , 279. , 8. ],
[ 130.75 , 348.25 , 9. ]])
The row in a1 with 2 in the third column is not in this array, as required.
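
The same filter can be written with np.isin, which NumPy's documentation recommends over np.in1d for new code. The sketch below uses shortened, rounded versions of the arrays from the question. It relies on exact floating-point equality, which is fine for integer-valued IDs like these but is an assumption to check for other data.

```python
import numpy as np

a1 = np.array([[262.99, 213.00, 1.0],
               [311.99, 271.99, 2.0],
               [383.00, 342.00, 3.0],
               [372.16, 348.84, 4.0]])
a2 = np.array([[265.78, 213.00, 1.0],
               [384.23, 341.00, 3.0],
               [373.67, 347.97, 4.0]])

# True for each row of a1 whose ID (third column) also occurs in a2.
mask = np.isin(a1[:, 2], a2[:, 2])
filtered = a1[mask]
print(filtered)
```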

