According to this post np.argsort() would be the function I am looking for.
However, it is not giving me the desired result.
Below is the R code that I am trying to convert to Python and my current Python code.
R Code
data.frame %>% select(order(colnames(.)))
Python Code
dataframe.iloc[numpy.array(dataframe.columns).argsort()]
The dataframe I am working with has 1,000,000+ rows and 42 columns, so I cannot re-create the exact output here.
But I believe I can re-create the order() outputs.
From my understanding, each number represents the original position in the columns list.
order(colnames(data.frame)) returns
3,2,5,6,8,4,7,10,9,11,12,13,14,15,16,17,18,19,23,20,21,22,1,25,26,28,24,27,38,29,34,33,36,30,31,32,35,41,42,39,40,37
numpy.array(dataframe.columns).argsort() returns
2,4,5,7,3,6,9,8,10,11,12,13,14,15,16,17,18,22,19,20,21,0,24,25,27,23,26,37,28,33,32,35,29,30,31,34,40,41,38,39,36,1
I know R is 1-indexed while Python is 0-indexed, so the leading 3 in the R output and the leading 2 in the Python output refer to the same column.
I am looking for Python code that returns the same ordering as the R code.
Do you have mixed case? This is handled differently in python and R.
R:
order(c('a', 'b', 'B', 'A', 'c'))
# [1] 1 4 2 3 5
x <- c('a', 'b', 'B', 'A', 'c')
x[order(c('a', 'b', 'B', 'A', 'c'))]
# [1] "a" "A" "b" "B" "c"
Python:
np.argsort(['a', 'b', 'B', 'A', 'c'])+1
# array([4, 3, 1, 2, 5])
x = np.array(['a', 'b', 'B', 'A', 'c'])
x[np.argsort(x)]
# array(['A', 'B', 'a', 'b', 'c'], dtype='<U1')
You can mimic R's behavior using numpy.lexsort, sorting by lowercase first and breaking ties with the case-swapped original array (lexsort treats the last key as the primary one):
x = np.array(['a', 'b', 'B', 'A', 'c'])
x[np.lexsort([np.char.swapcase(x), np.char.lower(x)])]
# array(['a', 'A', 'b', 'B', 'c'], dtype='<U1')
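To apply the same idea to DataFrame columns, the ordering can be computed from the column names and then passed to .iloc on the column axis. Below is a minimal sketch in plain Python. r_like_order is a hypothetical helper name, and it assumes R runs in a typical non-C locale where lowercase sorts before uppercase within the same letter. R's actual collation depends on the locale, so treat this as an approximation.

```python
def r_like_order(names):
    """Return 0-based positions that sort names case-insensitively,
    with lowercase before uppercase on ties (hypothetical helper)."""
    return sorted(
        range(len(names)),
        # primary key: lowercase form; tie-breaker: swapped case,
        # which makes 'a' sort before 'A'
        key=lambda i: (names[i].lower(), names[i].swapcase()),
    )

order = r_like_order(['a', 'b', 'B', 'A', 'c'])
print([i + 1 for i in order])  # 1-based, for comparison with R's order()

# With a DataFrame, the column axis has to be explicit in .iloc:
# dataframe.iloc[:, r_like_order(list(dataframe.columns))]
```

Note that dataframe.iloc[idx] with a single indexer selects rows, so reordering columns needs the dataframe.iloc[:, idx] form.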
np.argsort is the same thing as R's order.
Just experiment
> x=c(1,2,3,10,20,30,5,15,25,35)
> x
[1] 1 2 3 10 20 30 5 15 25 35
> order(x)
[1] 1 2 3 7 4 8 5 9 6 10
>>> x=np.array([1,2,3,10,20,30,5,15,25,35])
>>> x
array([ 1, 2, 3, 10, 20, 30, 5, 15, 25, 35])
>>> x.argsort()+1
array([ 1, 2, 3, 7, 4, 8, 5, 9, 6, 10])
The +1 is only there to get 1-based indices, since argsort returns 0-based indices.
So maybe the problem comes from your columns (shot in the dark: you have 2D arrays and are passing rows to R and columns to Python, or something like that).
But for numeric data like this, np.argsort is R's order.
I am wondering if there's a way to write the string conditional assigning method in a more concise way.
import pandas as pd
import numpy as np
from sklearn import datasets
wine = pd.DataFrame(datasets.load_wine().data)
wine.columns = datasets.load_wine().feature_names
conditions = [(wine.hue > 1), (wine.hue == 1), (wine.hue < 1)]
status = ['A', 'B', 'C']
wine['status_label'] = np.select(conditions, status)
You can use a list comprehension:
win_hue = [1, 2, -5, 0, -9, 4, 10, 1, -9, 8]
wine_status = [((wh > 1)*'A' + (wh == 1)*'B' + (wh < 1)*'C') for wh in win_hue]
In [1]: wine_status
Out[1]: ['B', 'A', 'C', 'C', 'C', 'A', 'A', 'B', 'C', 'A']
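Another compact option, sketched here in plain Python, is to turn the comparison into an index: (h > 1) - (h < 1) evaluates to 1, 0 or -1. hue_status and LABELS are hypothetical names, and the mapping follows the question (above 1 is 'A', equal to 1 is 'B', below 1 is 'C'). NaN handling is an assumption the question does not cover; in this sketch NaN falls through to 'B'.

```python
LABELS = ['C', 'B', 'A']  # index 0: below 1, index 1: equal to 1, index 2: above 1

def hue_status(hue):
    # bool - bool gives an int: 1 if hue > 1, -1 if hue < 1, else 0
    sign = (hue > 1) - (hue < 1)
    return LABELS[sign + 1]

sample_hues = [1, 2, -5, 0.5, 1.04]
print([hue_status(h) for h in sample_hues])
```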
I have a list and I want to mark the first occurrence of each element as 1 and every later occurrence as 0. How should I do that?
Initial input:
my_lst = ['a', 'b', 'c', 'c', 'a', 'd']
Expected output:
[1,1,1,0,0,1]
You can use itertools.count and collections.defaultdict for the task:
from itertools import count
from collections import defaultdict
my_lst = ['a', 'b', 'c', 'c', 'a', 'd']
d = defaultdict(count)
out = [int(next(d[v])==0) for v in my_lst]
print(out)
Prints:
[1, 1, 1, 0, 0, 1]
If you want a barebones python solution, this monstrosity would work:
[*map(int, map(lambda x, y: x == my_lst.index(y), *zip(*enumerate(my_lst))))]
Out[30]: [1, 1, 1, 0, 0, 1]
For all items in my_lst, it returns 1 if its index is the index of the first occurrence.
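The one-liner above can be hard to read. A more readable equivalent, as a sketch, compares each position with the index of the first occurrence directly. Both forms call list.index once per element, so they are quadratic in the length of the list.

```python
my_lst = ['a', 'b', 'c', 'c', 'a', 'd']

# 1 where the position equals the index of the first occurrence, else 0
out = [int(my_lst.index(v) == i) for i, v in enumerate(my_lst)]
print(out)
```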
You will need to keep track of which items you have already seen. Here is some example code:
seen_chars = set()
output = []
for c in my_lst:
    if c not in seen_chars:
        seen_chars.add(c)
        output.append(1)
    else:
        output.append(0)
Hope that helped
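For reuse, the same loop can be wrapped in a function. This is a sketch under the assumption that the elements are hashable; mark_first is a hypothetical name. Set lookups make it linear in the length of the input.

```python
def mark_first(items):
    """Return 1 for the first occurrence of each item and 0 for repeats."""
    seen = set()
    flags = []
    for item in items:
        flags.append(0 if item in seen else 1)
        seen.add(item)
    return flags

print(mark_first(['a', 'b', 'c', 'c', 'a', 'd']))
```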
l1 = [['a', 'b', 'c'],
['a', 'd', 'c'],
['a', 'e'],
['a', 'd', 'c'],
['a', 'f', 'c'],
['a', 'e'],
['p', 'q', 'r']]
l2 = [1, 1, 1, 2, 0, 0, 0]
I have two lists as represented above. l1 is a list of lists and l2 is another list with some kind of score.
Problem: For all the lists in l1 with a score of 0 (from l2), find those lists which are either entirely different or have the least length.
For example: if I have the lists [1, 2, 3], [2, 3], [5, 7], all with score 0, I will choose [5, 7] because its elements are not present in any other list, and [2, 3] because it intersects with [1, 2, 3] but is shorter.
How I do this now:
import itertools

l = [x for x, y in zip(l1, l2) if y == 0]
lx = [(x, y) for x, y in zip(l1, l2) if y > 0]
c = list(itertools.combinations(l, 2))
un_usable = []
usable = []
# first pass: for overlapping pairs, keep the shorter list
for i, j in c:
    intersection = len(set(i).intersection(set(j)))
    if intersection > 0:
        if len(i) < len(j):
            usable.append(i)
            un_usable.append(j)
        else:
            usable.append(j)
            un_usable.append(i)
# second pass: keep lists that do not overlap and were not already classified
for i, j in c:
    intersection = len(set(i).intersection(set(j)))
    if intersection == 0:
        if i not in un_usable and i not in usable:
            usable.append(i)
        if j not in un_usable and j not in usable:
            usable.append(j)
final = lx + [(x, 0) for x in usable]
and final gives me:
[(['a', 'b', 'c'], 1),
(['a', 'd', 'c'], 1),
(['a', 'e'], 1),
(['a', 'd', 'c'], 2),
(['a', 'e'], 0),
(['p', 'q', 'r'], 0)]
which is the required result.
EDIT: to handle equal lengths:
l1 = [['a', 'b', 'c'],
['a', 'd', 'c'],
['a', 'e'],
['a', 'd', 'c'],
['a', 'f', 'c'],
['a', 'e'],
['p', 'q', 'r'],
['a', 'k']]
l2 = [1, 1, 1, 2, 0, 0, 0, 0]
l = [x for x, y in zip(l1, l2) if y == 0]
lx = [(x, y) for x, y in zip(l1, l2) if y > 0]
c = list(itertools.combinations(l, 2))
un_usable = []
usable = []
for i, j in c:
    intersection = len(set(i).intersection(set(j)))
    if intersection > 0:
        if len(i) < len(j):
            usable.append(i)
            un_usable.append(j)
        elif len(i) == len(j):
            usable.append(i)
            usable.append(j)
        else:
            usable.append(j)
            un_usable.append(i)
usable = [list(x) for x in set(tuple(x) for x in usable)]
un_usable = [list(x) for x in set(tuple(x) for x in un_usable)]
for i, j in c:
    intersection = len(set(i).intersection(set(j)))
    if intersection == 0:
        if i not in un_usable and i not in usable:
            usable.append(i)
        if j not in un_usable and j not in usable:
            usable.append(j)
final = lx + [(x, 0) for x in usable]
Is there a better, faster & pythonic way of achieving the same?
Assuming I understood everything correctly, here is an O(N) two-pass algorithm, where N is the total number of elements in the zero-score lists.
Steps:
1. Select the lists with zero score.
2. For each element of each zero-score list, find the length of the shortest zero-score list in which the element occurs. Let's call this the length score of the element.
3. For each list, find the minimum of the length scores of all its elements. If the result is less than the length of the list, the list is discarded.
def select_lsts(lsts, scores):
    # pick out zero score lists
    z_lsts = [lst for lst, score in zip(lsts, scores) if score == 0]
    # keep track of the shortest length of any list in which an element occurs
    len_shortest = dict()
    for lst in z_lsts:
        ln = len(lst)
        for c in lst:
            len_shortest[c] = min(ln, len_shortest.get(c, float('inf')))
    # check if the list is of minimum length for each of its chars
    for lst in z_lsts:
        len_lst = len(lst)
        if any(len_shortest[c] < len_lst for c in lst):
            continue
        yield lst
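As a usage sketch, the generator can be combined with the non-zero-score pairs to rebuild final from the question. The function is repeated in condensed form so the snippet runs on its own, and the data is the first example from the question.

```python
def select_lsts(lsts, scores):
    # zero-score lists only
    z_lsts = [lst for lst, score in zip(lsts, scores) if score == 0]
    # shortest zero-score list length seen for each element
    len_shortest = {}
    for lst in z_lsts:
        for c in lst:
            len_shortest[c] = min(len(lst), len_shortest.get(c, float('inf')))
    # keep a list only if no element also occurs in a shorter list
    for lst in z_lsts:
        if all(len_shortest[c] >= len(lst) for c in lst):
            yield lst

l1 = [['a', 'b', 'c'],
      ['a', 'd', 'c'],
      ['a', 'e'],
      ['a', 'd', 'c'],
      ['a', 'f', 'c'],
      ['a', 'e'],
      ['p', 'q', 'r']]
l2 = [1, 1, 1, 2, 0, 0, 0]

lx = [(x, y) for x, y in zip(l1, l2) if y > 0]
final = lx + [(x, 0) for x in select_lsts(l1, l2)]
print(final)
```

Lists of equal length that share an element are both kept, which matches the behaviour described in the question's EDIT.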
def check(temp):
    for i in temp:
        if type(i) == str:
            temp.remove(i)

temp = ['a', 'b']
print(temp)  # Output: ['a', 'b']
check(temp)
print(temp)  # Output: ['b']
When run with
temp = ['a', 1], the output is [1]
temp = [1, 'a', 'b', 'c', 2], the output is [1, 'b', 2]
Could someone explain how the result is evaluated? Thanks.
You are modifying the list while iterating over it. The loop skips elements because removing an item shifts the remaining items one position to the left, while the iterator still advances to the next index. list.remove() also removes only the first occurrence of the given value, which can lead to further unexpected results when the list contains duplicates.
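To see the skipping concretely, the sketch below records which items the loop actually visits. visited is a name introduced only for illustration; the input is the third example from the question.

```python
temp = [1, 'a', 'b', 'c', 2]
visited = []

for item in temp:
    visited.append(item)
    if isinstance(item, str):
        # removing shifts later items left, so the next index skips one
        temp.remove(item)

print(visited)  # items the loop actually looked at
print(temp)     # what is left in the list
```

The loop never looks at 'b' or 2: after 'a' is removed, 'b' slides into the position that was just visited, and after 'c' is removed the list is too short for another step.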
The canonical way of removing elements from a list is to construct a new list, like so:
>>> def check(temp):
...     return list(x for x in temp if not isinstance(x, str))
Or you could return a regular list comprehension:
>>> def check(temp):
...     return [x for x in temp if not isinstance(x, str)]
You should generally test for types with isinstance() instead of type(). A type(x) == str comparison ignores inheritance, so instances of str subclasses would not match.
Examples:
>>> check(['a', 'b', 1])
[1]
>>> check([ 1, 'a', 'b', 'c', 2 ])
[1, 2]
>>> check(['a', 'b', 'c', 'd'])
[]
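The inheritance remark can be illustrated with a small sketch. Tag is a hypothetical str subclass introduced only for this example.

```python
class Tag(str):
    """Hypothetical subclass of str, used only for illustration."""

def check(temp):
    return [x for x in temp if not isinstance(x, str)]

t = Tag('x')
print(type(t) == str)         # exact type comparison ignores the subclass
print(isinstance(t, str))     # isinstance follows inheritance
print(check([1, t, 'a', 2]))  # the Tag instance is filtered out as well
```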
You can use a list comprehension:
def check(temp):
    return [i for i in temp if type(i) != str]

temp = [1, 'a', 'b', 'c', 2]
print(check(temp))
Output:
[1, 2]
OR
def check(temp):
    return [i for i in temp if not isinstance(i, str)]

temp = [1, 'a', 'b', 'c', 2, "e", 4, 5, 6, 7]
print(check(temp))
Output:
[1, 2, 4, 5, 6, 7]
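If callers rely on check() changing the list in place, as the function in the question does, slice assignment keeps that behaviour without mutating the list during iteration. This is a sketch of one possible variant, not the only way to do it.

```python
def check(temp):
    # build the filtered list first, then replace the contents in place
    temp[:] = [x for x in temp if not isinstance(x, str)]

temp = [1, 'a', 'b', 'c', 2]
alias = temp  # second reference to the same list object
check(temp)
print(temp)
print(alias is temp)
```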
>>> text = ['a', 'b', 1, {}]
>>> list(filter(lambda x: type(x) != str, text))
[1, {}]
The function would look like:
>>> def check(temp):
...     return list(filter(lambda x: type(x) != str, temp))
...
>>> check(text)
[1, {}]