Python Equivalent for R's order function - python

According to this post np.argsort() would be the function I am looking for.
However, it is not giving me the result I want.
Below is the R code that I am trying to convert to Python and my current Python code.
R Code
data.frame %>% select(order(colnames(.)))
Python Code
dataframe.iloc[numpy.array(dataframe.columns).argsort()]
The dataframe I am working with is 1,000,000+ rows and 42 columns, so I can not exactly re-create the output.
But I believe I can re-create the order() outputs.
From my understanding each number represents the original position in the columns list
order(colnames(data.frame)) returns
3,2,5,6,8,4,7,10,9,11,12,13,14,15,16,17,18,19,23,20,21,22,1,25,26,28,24,27,38,29,34,33,36,30,31,32,35,41,42,39,40,37
numpy.array(dataframe.columns).argsort() returns
2,4,5,7,3,6,9,8,10,11,12,13,14,15,16,17,18,22,19,20,21,0,24,25,27,23,26,37,28,33,32,35,29,30,31,34,40,41,38,39,36,1
I know R does not have 0 index like python, so I know the first two numbers 3 and 2 are the same.
I am looking for Python code that returns the same ordering as the R code.

Do you have mixed case? This is handled differently in python and R.
R:
order(c('a', 'b', 'B', 'A', 'c'))
# [1] 1 4 2 3 5
x <- c('a', 'b', 'B', 'A', 'c')
x[order(c('a', 'b', 'B', 'A', 'c'))]
# [1] "a" "A" "b" "B" "c"
Python:
np.argsort(['a', 'b', 'B', 'A', 'c'])+1
# array([4, 3, 1, 2, 5])
x = np.array(['a', 'b', 'B', 'A', 'c'])
x[np.argsort(x)]
# array(['A', 'B', 'a', 'b', 'c'], dtype='<U1')
You can mimic R's behavior using numpy.lexsort, which treats the last key as the primary one: sort by lowercase first, then break ties with the case-swapped original so that lowercase letters come before uppercase:
x = np.array(['a', 'b', 'B', 'A', 'c'])
x[np.lexsort([np.char.swapcase(x), np.char.lower(x)])]
# array(['a', 'A', 'b', 'B', 'c'], dtype='<U1')
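To apply the same idea to DataFrame columns, as in the question, a minimal sketch could look like this. The frame `df` and its column names are made up for illustration, and the result only matches R as far as R's locale collation behaves like the example above (case-insensitive, lowercase first on ties). Note the `[:, order]`: a bare `iloc[order]` would select rows rather than columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with mixed-case column names (illustration only).
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=['a', 'b', 'B', 'A', 'c'])

cols = df.columns.to_numpy().astype(str)
# lexsort uses the LAST key as the primary key: lowercase first,
# then the case-swapped names so that 'a' sorts before 'A' on ties.
order = np.lexsort([np.char.swapcase(cols), np.char.lower(cols)])

sorted_df = df.iloc[:, order]          # reorder columns, not rows
print(list(sorted_df.columns))         # ['a', 'A', 'b', 'B', 'c']
print((order + 1).tolist())            # [1, 4, 2, 3, 5], 1-based like R's order()
```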

For numeric input, np.argsort does the same thing as R's order.
Just experiment
> x=c(1,2,3,10,20,30,5,15,25,35)
> x
[1] 1 2 3 10 20 30 5 15 25 35
> order(x)
[1] 1 2 3 7 4 8 5 9 6 10
>>> x=np.array([1,2,3,10,20,30,5,15,25,35])
>>> x
array([ 1, 2, 3, 10, 20, 30, 5, 15, 25, 35])
>>> x.argsort()+1
array([ 1, 2, 3, 7, 4, 8, 5, 9, 6, 10])
The +1 is only there to make the indices start at 1, since argsort returns 0-based indices.
So maybe the problem comes from your columns (a shot in the dark: you have 2-D arrays and are passing rows to R and columns to Python, or something like that).
But np.argsort is R's order.

Related

How to update original array with groupby in python

I have a dataset and want to iterate over each group and, depending on the group, update the corresponding rows of the original DataFrame:
import pandas as pd
import numpy as np
arr = np.array([1, 2, 4, 7, 11, 16, 22, 29, 37, 46])
df = pd.DataFrame({'grain': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']})
df["target"] = arr
for group_name, b in df.groupby("grain"):
    if group_name == "A":
        pass  # do some processing
    if group_name == "B":
        pass  # do other processing
I expect the original df to be updated. Is there any way to do that?
Here is a way to change the original data. This example requires an index without duplicates, because the rows are written back by index label. I am not sure what the benefit of this approach would be compared to ordinary pandas operations.
import pandas as pd
import numpy as np
arr = np.array([1, 2, 4, 7, 11, 16, 22, 29, 37, 46])
df = pd.DataFrame({'grain': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']})
df["target"] = arr
for g_name, g_df in df.groupby("grain"):
    if g_name == "A":
        df.loc[g_df.index, 'target'] *= 10
    if g_name == "B":
        df.loc[g_df.index, 'target'] *= -1
Output:
>>> df
  grain  target
0     A      10
1     B      -2
2     A      40
3     B      -7
4     A     110
5     B     -16
6     A     220
7     B     -29
8     A     370
9     B     -46
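For comparison, the same update can be written without a loop. This is only a sketch: it assumes the per-group processing can be expressed as a multiplier per group, and the `factors` dictionary is an illustrative name, not something from the question.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 4, 7, 11, 16, 22, 29, 37, 46])
df = pd.DataFrame({'grain': ['A', 'B'] * 5, 'target': arr})

# One multiplier per group; the values mirror the loop-based example.
factors = {'A': 10, 'B': -1}
# map() looks up each row's group, so no index bookkeeping is needed.
df['target'] = df['target'] * df['grain'].map(factors)
print(df['target'].tolist())
# [10, -2, 40, -7, 110, -16, 220, -29, 370, -46]
```

Because the multiplication is aligned row by row, this variant also works when the index contains duplicates.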

python: sort array when sorting other array

I have two arrays:
a = np.array([1,3,4,2,6])
b = np.array(['c', 'd', 'e', 'f', 'g'])
These two arrays are linked (there is a one-to-one correspondence between their elements), so when I sort a in decreasing order I would like b to be reordered in the same way.
For instance, when I do:
a = np.sort(a)[::-1]
I get:
a = [6, 4, 3, 2, 1]
and I would like to be able to get also:
b = ['g', 'e', 'd', 'f', 'c']
I would do something like this:
import numpy as np
a = np.array([1,3,4,2,6])
b = np.array(['c', 'd', 'e', 'f', 'g'])
idx_order = np.argsort(a)[::-1]
a = a[idx_order]
b = b[idx_order]
output:
a = [6 4 3 2 1]
b = ['g' 'e' 'd' 'f' 'c']
I don't know how, or even whether, you can do this with numpy arrays. However, there is a way using standard lists, albeit a slightly convoluted one. Consider this:
a = [1, 3, 4, 2, 6]
b = ['c', 'd', 'e', 'f', 'g']
assert len(a) == len(b)
# Pair each number with its letter.
c = []
for i in range(len(a)):
    c.append((a[i], b[i]))
# Sort the pairs in descending order, as the question asks.
r = sorted(c, reverse=True)
# Unpack the sorted pairs back into the original lists.
for i in range(len(r)):
    a[i], b[i] = r[i]
print(a)
print(b)
In your problem statement, the two lists are related only by position. Here we make that relationship explicit by grouping the corresponding elements into a temporary list of tuples. sorted() compares tuples element by element, so the pairs are ordered by the number first (the letter only breaks ties), and reverse=True gives the descending order you asked for. We then rebuild the original lists from the sorted pairs.
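The pairing idea can also be written more compactly with zip. The following is a sketch of the same technique rather than a different algorithm; the names `pairs`, `a_sorted` and `b_sorted` are illustrative.

```python
a = [1, 3, 4, 2, 6]
b = ['c', 'd', 'e', 'f', 'g']

# Pair the elements, sort the pairs by the number (descending), then unzip.
pairs = sorted(zip(a, b), key=lambda pair: pair[0], reverse=True)
a_sorted, b_sorted = (list(column) for column in zip(*pairs))

print(a_sorted)  # [6, 4, 3, 2, 1]
print(b_sorted)  # ['g', 'e', 'd', 'f', 'c']
```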

Using a list of dictionaries to associate letters with numbers in python [closed]

I am about a week into learning Python and I am very confused about making dictionaries (or if I should even be using a dictionary for this). I've searched all over for a similar problem, but it's possible I just don't understand well enough to recognize an applicable answer.
What I am trying to do is to associate a letter with a unique "score" (this score is a number). So for a toy example, for ABCA with scores of 1, 20, 10, 5...the first A = 1, B = 20, C = 10, last A = 5. The end goal is to then remove the letters with the low "scores"/numbers.
My data in a list is something like this:
x_list = ['ABCDEABCDE10 2 3 4 5 6 7 8 9 1', 'EDCABB6 9 8 8 8 6 9', etc.]
Similar to the toy example, I want A = 10, B = 2, C = 3, etc. in the first string and E = 6, D = 8, C = 8, etc. in the second string. So I think I want to make a dictionary were the letters are keys and numbers are values? And then a list of dictionaries? What I am thinking is something like:
dictionary1 = {A:10, B:2, C:3, D:4, E:5, A:6, B:7, C:8, D:9, E:1}
dictionary2 = {E:9, D:8, C:8, A:8, B:6, B:9}
dictionary_list = (dictionary1, dictionary2)
And then be able to remove all of the values lower than 5 from the original list.
final_list = []
for each_list in dictionary_list:
if value > 5 in each_list:
final_list.append(each_list)
final_list = [[A,A,B,C,D], [E,D,C,A,B,B]]
I've tried looping through x_list with for loops to get the result, but I can't figure out how to get the numbers to line up with the values without using a dictionary.
Any help is very much appreciated!
(This is also my first time posting so please let me know if I make any newbie errors either in coding or if I shouldn't be using dictionaries at all for this.)
*Edited to improve clarity
I'm not sure I understand the problem you are trying to solve correctly, so let me try to rephrase it. Let's focus on a single list for now. If I understand correctly, you are looking at lists that look like this.
l = ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 1, 2, 3, 4, 5, 6, 7, 8, 9, 1]
Here I assume that we have exactly as many letters (or strings) as we have numbers (or integers).
You then want to associate each letter in the first half of the list with a number from the second half of the list. So you want to look at pairs of letters and numbers. As it turns out, Python supports tuples as a data type, and pairs are just tuples with two elements. To match up the letters with the numbers, you could do the following:
letters = l[:len(l)//2] # = ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E']
numbers = l[len(l)//2:] # = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1]
pairs = list(zip(letters, numbers)) # = [('A', 1), ('B', 2), ('C', 3), ('D', 4), ('E', 5), ('A', 6), ('B', 7), ('C', 8), ('D', 9), ('E', 1)]
In the first two lines I use slices to split the list into two halves. Then I use zip to create pairs from the resulting lists.
To then get all letters that are associated with an integer less than k, you could do the following.
k = 5 # or whatever you choose
result = [letter for letter, number in pairs if number < k] # = ['A', 'B', 'C', 'D', 'E']
Here, I am using a list comprehension to generate the result list.
To do all of this on a list of lists, you can wrap the code in a function:
def f(input_list, threshold):
    letters = input_list[:len(input_list)//2]
    numbers = input_list[len(input_list)//2:]
    pairs = list(zip(letters, numbers))
    return [letter for letter, number in pairs if number < threshold]
You can then use another list comprehension to apply the function to each list in a list of lists.
l = [['A', 'B', 100, 2], ['C', 'D', 12, 42]]
threshold = 32
result = [f(input_list, threshold) for input_list in l] # = [['B'], ['C']]
Finally, dictionaries are probably not the right data structure for this particular problem. In a dictionary, you associate keys with values and each key can only have exactly one value. In your example above, you have two different occurrences of the letter 'A' and you associate them with the numbers 1 and 6, respectively. Therefore, using dictionaries wouldn't be very natural. (That said, you could define the value associated with the key 'A' to be a list or set of numbers, but I don't think that would necessarily lead to a better solution of your problem.)
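The approach above assumes the letters and numbers are already separate list items. If the data really arrives as strings like those in the question, one possible sketch is to split each string first. The helper name `keep_high_scores` is hypothetical, the second record is made up for illustration, and keeping scores of 5 or more is an assumption based on "remove all of the values lower than 5".

```python
import re

def keep_high_scores(record, threshold):
    """Return the letters of `record` whose score is at least `threshold`."""
    # Leading run of letters, then the whitespace-separated scores.
    match = re.match(r'([A-Za-z]+)(.*)', record)
    letters = match.group(1)
    scores = [int(token) for token in match.group(2).split()]
    # zip pairs the i-th letter with the i-th score.
    return [letter for letter, score in zip(letters, scores) if score >= threshold]

x_list = ['ABCDEABCDE10 2 3 4 5 6 7 8 9 1', 'XYZ1 7 9']  # second item is invented
final_list = [keep_high_scores(record, 5) for record in x_list]
print(final_list)  # [['A', 'E', 'A', 'B', 'C', 'D'], ['Y', 'Z']]
```

Note that zip stops at the shorter input, so a record with more scores than letters silently ignores the extra scores.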
You can convert each character to its ASCII integer value using ord. In ASCII, the uppercase letters have consecutive codes in alphabetical order, so something like ord('B') > ord('A') will always be true. You can use this to filter the list, e.g.
>>> [c for c in 'ABCDEFGHIJK' if ord(c) > ord('E')]
['F', 'G', 'H', 'I', 'J', 'K']
Dictionary keys must be unique, so you can't create a dictionary with repeated keys. Instead, you can map each letter to a list of its values and build a list of such dictionaries, like this:
[{'A': [2, 7], 'B': [3, 8], 'C': [4, 9], 'D': [5, 1], 'E': [6]}, {'E': [9], 'D': [8], 'C': [8], 'A': [8], 'B': [6, 9]}]
You can achieve it with the following code:
x_list = [['ABCDEABCDE1', 2, 3, 4, 5, 6, 7, 8, 9, 1], ['EDCABB6', 9, 8, 8, 8, 6, 9]]
result = []
for item in x_list:
    string = item[0]
    values = item[1:]
    d = {}
    # Collect every value seen for a letter under that letter's key.
    for letter, value in zip(string, values):
        d.setdefault(letter, []).append(value)
    result.append(d)
print(result)
After that, to get the filtered list:
final_result = []
for d in result:
    kept = []
    for key, val in d.items():
        for v in val:
            if v >= 5:
                kept.append(key)
    final_result.append(kept)
print(final_result)
output:
[['A', 'B', 'C', 'D', 'E'], ['E', 'D', 'C', 'A', 'B', 'B']]

create simple list within a while loop python [duplicate]

I want to create a simple list within a while loop in Python.
I'm using this code:
def get_list(input):
    create_cell = []
    for line in input:
        create_cell.append(line)
    return create_cell
x=0
c = [x,'a','b']
while x < 5:
    new_row = get_list(c)
    print (new_row)
    x = x + 1
It gives the following output
[0, 'a', 'b']
[0, 'a', 'b']
[0, 'a', 'b']
[0, 'a', 'b']
[0, 'a', 'b']
The output I want is:
[0, 'a', 'b']
[1, 'a', 'b']
[2, 'a', 'b']
[3, 'a', 'b']
[4, 'a', 'b']
Assigning to x doesn't change c. You need to update that as well:
while x < 5:
    new_row = get_list(c)
    print (new_row)
    x = x + 1
    c[0] = x
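The reason is that c = [x, 'a', 'b'] stores the value x had at that moment (the integer 0); rebinding x later does not touch the list. An alternative sketch, not taken from the answer above, is to build each row from the current x inside the loop. The name `rows` is illustrative.

```python
rows = []
x = 0
while x < 5:
    # Build the row from the current value of x on every iteration.
    rows.append([x, 'a', 'b'])
    x += 1

for row in rows:
    print(row)
```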

pandas data frame / numpy array - roll without aggregate function

rolling in python aggregates data:
x = pd.DataFrame([[1,'a'],[2,'b'],[3,'c'],[4,'d']], columns=['a','b'])
y = x.rolling(2).mean()
print(y)
gives:
     a  b
0  NaN  a
1  1.5  b
2  2.5  c
3  3.5  d
What I need instead is a 3-dimensional structure (DataFrames or a numpy array) made of windows of 3 samples, each shifted by 1 step from the previous one (in this example):
[
[[1,'a'],[2,'b'],[3,'c']],
[[2,'b'],[3,'c'],[4,'d']]
]
What's the right way to do it for 900 samples, shifting by 1 each step?
Using np.concatenate (this builds windows of length 2):
np.concatenate([x.values[:-1],
                x.values[1:]], axis=1)\
    .reshape([x.shape[0] - 1, x.shape[1], -1])
You can concatenate shifted copies of the DataFrame, one per position in the window (a window length of 2 is used here), and reshape the result. Here df is the DataFrame called x in the question:
length = df.dropna().shape[0]-1
cols = len(df.columns)
pd.concat([df.shift(1),df],axis=1).dropna().astype(int,errors='ignore').values.reshape((length,cols,2))
Out:
array([[[1, 'a'],
        [2, 'b']],
       [[2, 'b'],
        [3, 'c']],
       [[3, 'c'],
        [4, 'd']]], dtype=object)
Let me know whether this solution suits your question.
p = x[['a','b']].values.tolist()  # list of lists: [a, b] for every row of x
#### Output ####
[[1, 'a'], [2, 'b'], [3, 'c'], [4, 'd']]
# iterate over every index except the last two and collect p[i], p[i+1], p[i+2] into one window
list_of_3 = [[p[i],p[i+1],p[i+2]] for i in range(len(p)-2)]
#### Output ####
[
[[1, 'a'], [2, 'b'], [3, 'c']],
[[2, 'b'], [3, 'c'], [4, 'd']]
]
# Use this if you need the result as a numpy ndarray (mixed ints and strings are all converted to strings)
from numpy import array
a = array(list_of_3)
#### Output ####
[[['1' 'a']
  ['2' 'b']
  ['3' 'c']]

 [['2' 'b']
  ['3' 'c']
  ['4' 'd']]]
Since pandas 1.1 you can iterate over rolling objects:
[window.values.tolist() for window in x.rolling(3) if window.shape[0] == 3]
The if makes sure we only get full windows. This solution has the advantage that you can use any parameter of the handy rolling function of pandas.
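If a single 3-dimensional numpy array is wanted, one more option is to build an array of row indices and use it to index the values. This is a sketch under the assumption that all windows fit in memory; the names `window`, `idx` and `windows` are illustrative, and the same code applies to 900 rows by changing only the input.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c'], [4, 'd']], columns=['a', 'b'])
window = 3

values = x.to_numpy()                      # shape (n_rows, n_cols), dtype=object here
n_windows = len(values) - window + 1
# Row i of idx holds the row numbers i, i+1, ..., i+window-1.
idx = np.arange(n_windows)[:, None] + np.arange(window)[None, :]
windows = values[idx]                      # shape (n_windows, window, n_cols)

print(windows.shape)     # (2, 3, 2)
print(windows.tolist())
# [[[1, 'a'], [2, 'b'], [3, 'c']], [[2, 'b'], [3, 'c'], [4, 'd']]]
```

Fancy indexing copies the data, so every window is an independent array and the original dtype (object here) is preserved.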
