how to get the following result - python

These are the indices:
Index= [2, 3, 4, 6]
These are the frequencies of the indices. The two arrays are related by position: for instance, the first element of Index is 2, and its frequency is 2 because the element at position 2 of Frequency is 2.
Frequency=[2, 2, 2, 2, 2, 1, 1]
I need to get the following array labels:
labels=[2, 2, 3, 3, 4, 4, 6]
To get it, I wrote the following code:
labels=[]
for index in Index:
    Counter=Frequency[index]
    for i in range(Counter):
        labels.append(index)
print(labels)
labels=[2, 2, 3, 3, 4, 4, 6]
Is there any other way to optimize this process?

Assuming the frecuency list is the same length as the TrainIndex list:
frecuency = [2,2,2,1]
TrainIndex = [9,4,5,8]
[g for sublist in [[i]*f for (i,f) in zip(TrainIndex,frecuency)] for g in sublist]
[9, 9, 4, 4, 5, 5, 8]
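For the arrays in the question, where Frequency is looked up by the index value rather than by position, a flat comprehension avoids building the intermediate sublists. This is only a sketch; it assumes every value in Index is a valid position in Frequency. The names labels and train_labels are chosen here for illustration.

```python
Index = [2, 3, 4, 6]
Frequency = [2, 2, 2, 2, 2, 1, 1]

# Repeat each index Frequency[index] times (lookup by value, as in the question).
labels = [index for index in Index for _ in range(Frequency[index])]
print(labels)

# Same idea for two parallel lists of equal length, as in the answer above.
frecuency = [2, 2, 2, 1]
TrainIndex = [9, 4, 5, 8]
train_labels = [i for i, f in zip(TrainIndex, frecuency) for _ in range(f)]
print(train_labels)
```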

Related

Relative size in list python

I have a list of integers. Each number can appear several times, the list is unordered.
I want to get the list of relative sizes. Meaning, if for example the original list is [2, 5, 7, 7, 3, 10] then the desired output is [0, 2, 3, 3, 1, 4]
Because 2 is the zero'th smallest number in the original list, 3 is one'th, etc.
Any clear easy way to do this?
Try a list comprehension with a dictionary that maps each unique value to its rank; use set to get the unique values, like below:
>>> lst = [2, 5, 7, 7, 3, 10]
>>> newl = dict(zip(sorted(set(lst)), range(len(set(lst)))))
>>> [newl[i] for i in lst]
[0, 2, 3, 3, 1, 4]
Or use index:
>>> lst = [2, 5, 7, 7, 3, 10]
>>> newl = sorted(set(lst))
>>> [newl.index(i) for i in lst]
[0, 2, 3, 3, 1, 4]
>>>
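The same ranking can be sketched with enumerate. Each list.index call scans the sorted list, whereas a dictionary lookup takes roughly constant time, so this variant may scale better for long lists. The names rank and relative are illustrative.

```python
lst = [2, 5, 7, 7, 3, 10]

# Map each distinct value to its position among the sorted distinct values.
rank = {value: position for position, value in enumerate(sorted(set(lst)))}

# Look up the rank of every element, keeping the original order and repeats.
relative = [rank[value] for value in lst]
print(relative)
```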

How do I get the index of the common integer element from two separate lists and plug it to another list?

I have 3 lists.
A_set = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Q_act = [2, 3]
dur = [0, 4, 5, 2, 1, 3, 4, 8, 2, 3]
All lists are integers.
What I am trying to do is to compare Q_act with A_set then obtain the indices of the numbers that match from A_set.
(Example:
Q_act has the elements [2,3]
it is located in indices [1,2] from A_set)
Afterwards, I will use those indices to obtain the corresponding value in dur and store this in a list called p_dur_Q_act.
(Example: using the result from the previous example, [1,2]
The values in the dur list corresponding to the indices [1,2] should be stored in another list called p_dur_Q_act
i.e. [4,5] should be the values stored in the list p_dur_Q_act)
So, how do I get the index of the common integer element (which is [1,2]) from two separate lists and plug it to another list?
So far, here is the code I have tried:
I wrote this one because it returns the indices, but it does not return [4,5].
p_Q = set(Q_act).intersection(A_set)
p_dur_Q_act = [i + 1 for i, x in enumerate(p_Q)]
print(p_dur_Q_act)
I also tried this but I receive an error TypeError: argument of type 'int' is not iterable
p_dur_Q_act = [i + 1 for i, x in enumerate(Q_act) if any(elem in x for elem in A_set)]
print(p_dur_Q_act)
Another option is to use the enumerate iterator to generate every index, and then select only the ones you want:
a_set = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
q_act = [2, 3]
dur = [0, 4, 5, 2, 1, 3, 4, 8, 2, 3]
p_dur_q_act = [i for i,v in enumerate(a_set) if v in q_act]
print([dur[p] for p in p_dur_q_act]) # [4, 5]
This is more efficient than repeatedly calling index when the number of matches is large: each index call scans a_set, so its cost is proportional to the length of a_set, and the number of calls is proportional to the number of matches. The enumerate approach can be made even more efficient by turning q_act into a set, since the in operator scales better with sets than with lists. At these scales, though, there will be no observable difference.
You don't need to map these to index values, though. You can get the same result if you use zip to map a_set to dur and then select the d values whose a values are in q_act.
a_set = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
q_act = {2, 3}
dur = [0, 4, 5, 2, 1, 3, 4, 8, 2, 3]
p_dur_q_act = [d for a, d in zip(a_set, dur) if a in q_act]
Use the list's index method to get the position of each element in the list.
>>> a_set = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> q_act = [2, 3]
>>> dur = [0, 4, 5, 2, 1, 3, 4, 8, 2, 3]
>>>
>>> print([dur[a_set.index(q)] for q in set(a_set).intersection(q_act)])
[4, 5]
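Another way to sketch this is to build a lookup from each a_set value to its duration once, then read the durations in the order given by q_act. This assumes the values in a_set are unique; dur_by_activity is a name introduced here for illustration.

```python
a_set = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
q_act = [2, 3]
dur = [0, 4, 5, 2, 1, 3, 4, 8, 2, 3]

# One pass to pair every a_set value with its duration.
dur_by_activity = dict(zip(a_set, dur))

# Keep the order of q_act and skip values that do not occur in a_set.
p_dur_q_act = [dur_by_activity[q] for q in q_act if q in dur_by_activity]
print(p_dur_q_act)
```

Unlike iterating over a set intersection, this keeps the results in the order of q_act.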

Python Pandas rolling aggregate a column of lists

I have a simple dataframe df with a column of lists lists. I would like to generate an additional column based on lists.
The df looks like:
import pandas as pd
lists={1:[[1]],2:[[1,2,3]],3:[[2,9,7,9]],4:[[2,7,3,5]]}
#create test dataframe
df=pd.DataFrame.from_dict(lists,orient='index')
df=df.rename(columns={0:'lists'})
df
   lists
1  [1]
2  [1, 2, 3]
3  [2, 9, 7, 9]
4  [2, 7, 3, 5]
I would like df to look like this:
df
Out[9]:
   lists         rolllists
1  [1]           [1]
2  [1, 2, 3]     [1, 1, 2, 3]
3  [2, 9, 7, 9]  [1, 2, 3, 2, 9, 7, 9]
4  [2, 7, 3, 5]  [2, 9, 7, 9, 2, 7, 3, 5]
Basically I want to 'sum'/append the lists over a rolling window of 2. Note that for row 1, because I only have one list, rolllists is that list. But in row 2, I have 2 lists that I want appended. Then for row 3, append df[2].lists and df[3].lists, etc. I have worked on similar things before; for reference, see: Pandas Dataframe, Column of lists, Create column of sets of cumulative lists, and record by record differences.
In addition, if we can get this part above, then I want to do this in a groupby (so the example below would be 1 group for example, so for instance the df might look like this in the groupby):
   Group  lists         rolllists
1  A      [1]           [1]
2  A      [1, 2, 3]     [1, 1, 2, 3]
3  A      [2, 9, 7, 9]  [1, 2, 3, 2, 9, 7, 9]
4  A      [2, 7, 3, 5]  [2, 9, 7, 9, 2, 7, 3, 5]
5  B      [1]           [1]
6  B      [1, 2, 3]     [1, 1, 2, 3]
7  B      [2, 9, 7, 9]  [1, 2, 3, 2, 9, 7, 9]
8  B      [2, 7, 3, 5]  [2, 9, 7, 9, 2, 7, 3, 5]
I have tried various things like df.lists.rolling(2).sum() and I get this error:
TypeError: cannot handle this type -> object
in Pandas 0.24.1. Unfortunately, in Pandas 0.22.0 the command doesn't raise an error, but instead returns the exact same values as in lists. So it looks like newer versions of Pandas can't sum lists? That's a secondary issue.
Love any help! Have Fun!
You can start with
import pandas as pd
mylists={1:[[1]],2:[[1,2,3]],3:[[2,9,7,9]],4:[[2,7,3,5]]}
mydf=pd.DataFrame.from_dict(mylists,orient='index')
mydf=mydf.rename(columns={0:'lists'})
mydf = pd.concat([mydf, mydf], axis=0, ignore_index=True)
mydf['group'] = ['A']*4 + ['B']*4
# initialize your new series
mydf['newseries'] = mydf['lists']
# define the function that appends lists over rows
def append_row_lists(data):
    for i in data.index:
        try:
            data.loc[i+1, 'newseries'] = data.loc[i, 'lists'] + data.loc[i+1, 'lists']
        except:
            pass
    return data
# loop over your groups
for gp in mydf.group.unique():
    condition = mydf.group == gp
    mydf[condition] = append_row_lists(mydf[condition])
Output
   lists         Group  newseries
0  [1]           A      [1]
1  [1, 2, 3]     A      [1, 1, 2, 3]
2  [2, 9, 7, 9]  A      [1, 2, 3, 2, 9, 7, 9]
3  [2, 7, 3, 5]  A      [2, 9, 7, 9, 2, 7, 3, 5]
4  [1]           B      [1]
5  [1, 2, 3]     B      [1, 1, 2, 3]
6  [2, 9, 7, 9]  B      [1, 2, 3, 2, 9, 7, 9]
7  [2, 7, 3, 5]  B      [2, 9, 7, 9, 2, 7, 3, 5]
How about this?
rolllists = [df.lists[1].copy()]
for row in df.iterrows():
    index, values = row
    if index > 1: # or > 0 if zero-indexed
        rolllists.append(df.loc[index - 1, 'lists'] + values['lists'])
df['rolllists'] = rolllists
Or as a slightly more extensible function:
lists={1:[[1]],2:[[1,2,3]],3:[[2,9,7,9]],4:[[2,7,3,5]]}
df=pd.DataFrame.from_dict(lists,orient='index')
df=df.rename(columns={0:'lists'})
def rolling_lists(df, roll_period=2):
    new_roll, rolllists = [], [df.lists[1].copy()] * (roll_period - 1)
    for row in df.iterrows():
        index, values = row
        if index > roll_period - 1: # or -2 if zero-indexed
            res = []
            for i in range(index - roll_period, index):
                res.append(df.loc[i + 1, 'lists']) # or i if 0-indexed
            rolllists.append(res)
    for li in rolllists:
        while isinstance(li[0], list):
            li = [item for sublist in li for item in sublist] # flatten nested list
        new_roll.append(li)
    df['rolllists'] = new_roll
    return df
Easily extensible to groupby as well, just wrap it in a function and use df.apply(rolling_lists). You can give any number of rolling rows to use as roll_period. Hope this helps!
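The rolling concatenation itself does not depend on pandas, so it can also be sketched as a plain Python helper that walks the group labels and the lists side by side. The helper name rolling_concat and its window parameter are assumptions made for this illustration, not part of any library.

```python
from collections import defaultdict, deque
from itertools import chain

def rolling_concat(groups, lists, window=2):
    """Concatenate each list with up to window - 1 previous lists of the same group."""
    # One bounded queue per group; old lists fall out once the window is full.
    recent = defaultdict(lambda: deque(maxlen=window))
    rolled = []
    for group, current in zip(groups, lists):
        recent[group].append(current)
        rolled.append(list(chain.from_iterable(recent[group])))
    return rolled

groups = list("AAAABBBB")
lists = [[1], [1, 2, 3], [2, 9, 7, 9], [2, 7, 3, 5]] * 2
rolllists = rolling_concat(groups, lists)
for group, rolled in zip(groups, rolllists):
    print(group, rolled)
```

With a DataFrame like the grouped example in the question, the result could then be assigned with something like df['rolllists'] = rolling_concat(df['Group'], df['lists']), assuming the rows are already in the desired order within each group.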

Union list of lists without duplicates

I have a list of lists. I need to get all combinations of those lists, from 2 of N to N of N.
I'm generating them with itertools.combinations. This gives me a list of tuples of lists, and I need to merge the lists in each tuple without duplicates.
For example I have got array:
a = np.array([[1,4,7],[8,2,5],[8,1,4,6],[8,1,3,5],
[2,3,4,7],[2,5,6,7],[2,3,4,6,8],[1,3,5,6,7]])
I'm looking for all 3-element combinations:
a2 = list(itertools.combinations(a, 3))
a2[:5]
[([1, 4, 7], [8, 2, 5], [8, 1, 4, 6]),
([1, 4, 7], [8, 2, 5], [8, 1, 3, 5]),
([1, 4, 7], [8, 2, 5], [2, 3, 4, 7]),
([1, 4, 7], [8, 2, 5], [2, 5, 6, 7]),
([1, 4, 7], [8, 2, 5], [2, 3, 4, 6, 8])]
The length of this array: 56.
I need to combine every list in this array without duplicates.
For example, for a2[0] the input is:
([1, 4, 7], [8, 2, 5], [8, 1, 4, 6])
output:
[1, 2, 4, 5, 6, 7, 8]
And so on for all 56 elements.
I tried to do it with set:
arr = list(itertools.combinations(a,3))
for i in arr:
    arrnew[i].append(list(set().union(arr[i][:3])))
But I got an error:
TypeError Traceback (most recent call last)
<ipython-input-75-4049ddb4c0be> in <module>()
3 arrnew = []
4 for i in arr:
----> 5 for j in arr[i]:
6 arrnew[i].append(list(set().union(arr[:n])))
TypeError: list indices must be integers or slices, not tuple
I need a function for N-element combinations that returns the new combined array.
But I don't know how to do this because of this error.
Is there a way to fix this error, or another way to solve this task?
A small function which solves it:
import itertools
def unique_comb(a):
    return list(set(itertools.chain(*a)))
For example:
unique_comb(([1, 4, 7], [8, 2, 5], [8, 1, 4, 6]))
If you want to pass a list as an argument to the function, rather than a list inside a tuple, just remove the * (which unpacks the list).
If you want to apply it to the entire array in one statement without defining a function:
a3 = [list(set(itertools.chain(*row))) for row in a2]
Flattening a tuple of lists:
from itertools import chain
new_tuple = [ list(set(chain.from_iterable(each_tuple))) for each_tuple in main_tuple_coll ]
I think this might solve your problem.
Flatten list combinations
comb = []
for line in a2[:3]:
    l = list(set([x for y in line for x in y]))
    comb.append(l)
comb
[out]
[[1, 2, 4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 7, 8], [1, 2, 3, 4, 5, 7, 8]]
The issue with:
arr = list(itertools.combinations(a,3))
for i in arr:
    arrnew[i].append(list(set().union(arr[i][:3])))
is that i is not the index of the item but the item itself.
What you need is:
import itertools
import numpy as np
# dtype=object is needed in recent NumPy versions because the rows have different lengths
a = np.array([[1,4,7],[8,2,5],[8,1,4,6],[8,1,3,5],
              [2,3,4,7],[2,5,6,7],[2,3,4,6,8],[1,3,5,6,7]], dtype=object)
arrnew = []
for item in itertools.combinations(a,3):
    arrnew.append(list(set().union(*item)))
The result arrnew contains 56 items. Some are equal but none contain duplicates.
I would suggest using sorted rather than list to ensure that the items in each combined list are in ascending order.
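Putting the pieces together, the question's request for every size from 2 of N to N of N might be sketched as below, using plain lists and sorted as suggested. The function name union_combinations and the dictionary by_size are hypothetical names for this illustration.

```python
from itertools import combinations

a = [[1, 4, 7], [8, 2, 5], [8, 1, 4, 6], [8, 1, 3, 5],
     [2, 3, 4, 7], [2, 5, 6, 7], [2, 3, 4, 6, 8], [1, 3, 5, 6, 7]]

def union_combinations(lists, r):
    """Return the sorted, duplicate-free union of every r-element combination."""
    return [sorted(set().union(*combo)) for combo in combinations(lists, r)]

# One result list per combination size, from 2 of N up to N of N.
by_size = {r: union_combinations(a, r) for r in range(2, len(a) + 1)}
print(len(by_size[3]))
print(by_size[3][0])
```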

Fastest way to count identical sub-arrays in a nd-array?

Let's consider a 2d-array A
2 3 5 7
2 3 5 7
1 7 1 4
5 8 6 0
2 3 5 7
The first, second and last rows are identical. The algorithm I'm looking for should return the number of identical rows for each distinct row (i.e. the number of duplicates of each element). If the script can easily be modified to also count identical columns, that would be great.
I use an inefficient naive algorithm to do that:
import numpy
A=numpy.array([[2, 3, 5, 7],[2, 3, 5, 7],[1, 7, 1, 4],[5, 8, 6, 0],[2, 3, 5, 7]])
i=0
end = len(A)
while i<end:
    print(i, end=" ")
    j=i+1
    numberID = 1
    while j<end:
        print(j)
        if numpy.array_equal(A[i,:] ,A[j,:]):
            numberID+=1
        j+=1
    i+=1
print(A, len(A))
Expected result:
array([3,1,1]) # number identical arrays per line
My algorithm is essentially native Python loops over a numpy array, and is therefore inefficient. Thanks for your help.
In numpy >= 1.9.0, np.unique has a return_counts keyword argument you can combine with the solution here to get the counts:
b = np.ascontiguousarray(A).view(np.dtype((np.void, A.dtype.itemsize * A.shape[1])))
unq_a, unq_cnt = np.unique(b, return_counts=True)
unq_a = unq_a.view(A.dtype).reshape(-1, A.shape[1])
>>> unq_a
array([[1, 7, 1, 4],
[2, 3, 5, 7],
[5, 8, 6, 0]])
>>> unq_cnt
array([1, 3, 1])
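In NumPy 1.13 and later, np.unique also accepts an axis argument, which makes the void-view step unnecessary. A minimal sketch, reusing the array A from the question; the variable names are illustrative.

```python
import numpy as np

A = np.array([[2, 3, 5, 7],
              [2, 3, 5, 7],
              [1, 7, 1, 4],
              [5, 8, 6, 0],
              [2, 3, 5, 7]])

# axis=0 treats each row as one element; the unique rows come back sorted.
unq_rows, row_counts = np.unique(A, axis=0, return_counts=True)
print(unq_rows)
print(row_counts)

# axis=1 does the same for columns.
unq_cols, col_counts = np.unique(A, axis=1, return_counts=True)
print(col_counts)
```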
In an older numpy, you can replicate what np.unique does, which would look something like:
a_view = np.array(A, copy=True)
a_view = a_view.view(np.dtype((np.void,
                               a_view.dtype.itemsize*a_view.shape[1]))).ravel()
a_view.sort()
a_flag = np.concatenate(([True], a_view[1:] != a_view[:-1]))
# take the unique rows from the sorted view, not from the unsorted A
a_unq = a_view[a_flag].view(A.dtype).reshape(-1, A.shape[1])
a_idx = np.concatenate(np.nonzero(a_flag) + ([a_view.size],))
a_cnt = np.diff(a_idx)
>>> a_unq
array([[1, 7, 1, 4],
[2, 3, 5, 7],
[5, 8, 6, 0]])
>>> a_cnt
array([1, 3, 1])
You can lexsort on the row entries, which gives you the indices for traversing the rows in sorted order. Identical rows then sit next to each other, so after the O(n log n) sort the duplicates can be counted in a single O(n) pass rather than with an O(n^2) pairwise comparison. Note that by default, the elements in the last column sort last, i.e. the rows are 'alphabetized' right to left rather than left to right.
In [9]: a
Out[9]:
array([[2, 3, 5, 7],
[2, 3, 5, 7],
[1, 7, 1, 4],
[5, 8, 6, 0],
[2, 3, 5, 7]])
In [10]: lexsort(a.T)
Out[10]: array([3, 2, 0, 1, 4])
In [11]: a[lexsort(a.T)]
Out[11]:
array([[5, 8, 6, 0],
[1, 7, 1, 4],
[2, 3, 5, 7],
[2, 3, 5, 7],
[2, 3, 5, 7]])
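Once the rows are sorted, the counting pass can be sketched by comparing each row with its predecessor and measuring the length of each run. This is one possible way to finish the lexsort approach, not necessarily what the answer had in mind; the names new_run, starts and counts are illustrative.

```python
import numpy as np

a = np.array([[2, 3, 5, 7],
              [2, 3, 5, 7],
              [1, 7, 1, 4],
              [5, 8, 6, 0],
              [2, 3, 5, 7]])

# Sort the rows so that identical rows become adjacent.
sorted_rows = a[np.lexsort(a.T)]

# A new run starts at the first row and wherever a row differs from the previous one.
new_run = np.concatenate(([True], np.any(sorted_rows[1:] != sorted_rows[:-1], axis=1)))
starts = np.flatnonzero(new_run)

# Run lengths are the gaps between consecutive run starts (plus the end of the array).
counts = np.diff(np.append(starts, len(sorted_rows)))
unique_rows = sorted_rows[starts]
print(unique_rows)
print(counts)
```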
You can use the Counter class from the collections module for this.
It works like this:
x = [2, 2, 1, 5, 2]
from collections import Counter
c=Counter(x)
print(c)
Output : Counter({2: 3, 1: 1, 5: 1})
The only issue in your case is that every value of x is itself a list, which is not hashable.
If you convert every value of x to a tuple, it works:
x = [(2, 3, 5, 7),(2, 3, 5, 7),(1, 7, 1, 4),(5, 8, 6, 0),(2, 3, 5, 7)]
from collections import Counter
c=Counter(x)
print(c)
Output : Counter({(2, 3, 5, 7): 3, (1, 7, 1, 4): 1, (5, 8, 6, 0): 1})
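The conversion to tuples can be done on the fly, and transposing with zip gives the column counts the question also asks about. A short sketch on nested lists; for a numpy array, A.tolist() would give the same nested-list input.

```python
from collections import Counter

rows = [[2, 3, 5, 7], [2, 3, 5, 7], [1, 7, 1, 4], [5, 8, 6, 0], [2, 3, 5, 7]]

# Lists are not hashable, so turn each row into a tuple before counting.
row_counts = Counter(map(tuple, rows))
print(row_counts)

# zip(*rows) yields the columns as tuples, so they can be counted directly.
col_counts = Counter(zip(*rows))
print(col_counts)
```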
