Append array to beginning of another array - python

I'm attempting to perform a simple task: append an array to the beginning of another array. Here a MWE of what I mean:
a = ['a','b','c','d','e','f','g','h','i']
b = [6,4,1.,2,8,784.,43,6.,2]
c = [8,4.,32.,6,1,7,2.,9,23]
# Define arrays.
a_arr = np.array(a)
bc_arr = np.array([b, c])
# Append a_arr to beginning of bc_arr
print np.concatenate((a_arr, bc_arr), axis=1)
but I keep getting a ValueError: all the input arrays must have same number of dimensions error.
The arrays a_arr and bc_arr come like that from a different process so I can't manipulate the way they are created (ie: I can't use the a,b,c lists).
How can I generate a new array of a_arr and bc_arr so that it will look like:
array(['a','b','c','d','e','f','g','h','i'], [6,4,1.,2,8,784.,43,6.,2], [8,4.,32.,6,1,7,2.,9,23])

Can you do something like.
In [88]: a = ['a','b','c','d','e','f','g','h','i']
In [89]: b = [6,4,1.,2,8,784.,43,6.,2]
In [90]: c = [8,4.,32.,6,1,7,2.,9,23]
In [91]: joined_arr=np.array([a_arr,b_arr,c_arr],dtype=object)
In [92]: joined_arr
Out[92]:
array([['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'],
[6.0, 4.0, 1.0, 2.0, 8.0, 784.0, 43.0, 6.0, 2.0],
[8.0, 4.0, 32.0, 6.0, 1.0, 7.0, 2.0, 9.0, 23.0]], dtype=object)

this should work
In [84]: a=np.atleast_2d(a).astype('object')
In [85]: b=np.atleast_2d(b).astype('object')
In [86]: c=np.atleast_2d(c).astype('object')
In [87]: np.vstack((a,b,c))
Out[87]:
array([[a, b, c, d, e, f, g, h, i],
[6.0, 4.0, 1.0, 2.0, 8.0, 784.0, 43.0, 6.0, 2.0],
[8.0, 4.0, 32.0, 6.0, 1.0, 7.0, 2.0, 9.0, 23.0]], dtype=object)

Related

How to use an index of a dataframe to assign values to a row of a new column?

I have a dataset that consists of ID (participant), run, indexnumber (that is, an index number of a slalom turn) and performance (that could be velocity or time). In addition, I have information for each id and run where in the slalom turn (that is, the index) they actually start to turn.
My goal is to create a new column in the dataframe that contain 0 if the id has not started to turn and 1 if they have started to turn. This column could be called phase.
For example:
For ID1 the point where this skier starts to turn i index 4 for the first run and 9 for the second run. Therefore, I want all rows in the new column to contain 0s until index nr 4 and 1s thereafter (for the first run). For the second run I want all rows to contain 0s until index nr 9 and 1 thereafter.
Is there a simple way to do this with pandas or vanilla python?
example = [[1.0, 1.0, 1.0, 0.6912982024915187],
[1.0, 1.0, 2.0, 0.16453900411106737],
[1.0, 1.0, 3.0, 0.11362801727310845],
[1.0, 1.0, 4.0, 0.587778444335624],
[1.0, 1.0, 5.0, 0.8455388913351765],
[1.0, 1.0, 6.0, 0.5719366584505648],
[1.0, 1.0, 7.0, 0.4665520044952449],
[1.0, 1.0, 8.0, 0.9105152709573275],
[1.0, 1.0, 9.0, 0.4600099001744885],
[1.0, 1.0, 10.0, 0.8577060884077763],
[1.0, 2.0, 1.0, 0.11550722410813963],
[1.0, 2.0, 2.0, 0.5729090378222077],
[1.0, 2.0, 3.0, 0.43990164344919824],
[1.0, 2.0, 4.0, 0.595242293948498],
[1.0, 2.0, 5.0, 0.443684017624451],
[1.0, 2.0, 6.0, 0.3608135854303052],
[1.0, 2.0, 7.0, 0.28525404982906766],
[1.0, 2.0, 8.0, 0.11561422303194391],
[1.0, 2.0, 9.0, 0.8579134051748011],
[1.0, 2.0, 10.0, 0.540598113345226],
[2.0, 1.0, 1.0, 0.4058570295736075],
[2.0, 1.0, 2.0, 0.9422426000325298],
[2.0, 1.0, 3.0, 0.7918655742964762],
[2.0, 1.0, 4.0, 0.4145753321336241],
[2.0, 1.0, 5.0, 0.5256388261997529],
[2.0, 1.0, 6.0, 0.8140335187050629],
[2.0, 1.0, 7.0, 0.12134416740848841],
[2.0, 1.0, 8.0, 0.9016748379372173],
[2.0, 1.0, 9.0, 0.462241316800442],
[2.0, 1.0, 10.0, 0.7839715857746699],
[2.0, 2.0, 1.0, 0.5300527244824904],
[2.0, 2.0, 2.0, 0.8784844676567194],
[2.0, 2.0, 3.0, 0.14395673182343738],
[2.0, 2.0, 4.0, 0.7606405990262495],
[2.0, 2.0, 5.0, 0.5123048342846208],
[2.0, 2.0, 6.0, 0.25608277502943655],
[2.0, 2.0, 7.0, 0.4264542956426933],
[2.0, 2.0, 8.0, 0.9144976708651866],
[2.0, 2.0, 9.0, 0.875888479621729],
[2.0, 2.0, 10.0, 0.3428732760552141]]
turnPhaseId1 = [4,9] #the index number when ID1 starts to turn in run 1 and run 2, respectively
turnPhaseId2 = [2,5] #the index number when ID2 starts to turn in run 1 and run 2, respectively
pd.DataFrame(example, columns=['id', 'run', 'index', 'performance'])
I believe it is a better idea to turnPhase into a dictionary, and then use apply:
turn_dict = {1: [4, 9],
2: [2, 5]}
We also need to change the column types as we need to reach dictionary keys, and list indexes, which are int:
df['id'] = df['id'].astype(int)
df['index'] = df['index'].astype(int)
Finally, apply:
df['new_column'] = df.apply(lambda x: 0 if x['index'] < turn_dict[x['id']][int(x['run'] -1)] else 1 , axis=1)

How to split a list into sublists based on unique values of one column?

I have a list and want to split it based on unique values of its last column in python. This is my list:
pnt=[[1.,2.,4.,'AA'], [0.,0.,0.,'AA'], [2.,1.,0.,'AA'],\
[0.,-3.,1.,'BB'], [2.,5.,8.,'BB'], [.1,3.,0.,'CC']]
I want to get it as:
pnt=[[[1.,2.,4.,'AA'], [0.,0.,0.,'AA'], [2.,1.,0.,'AA']],\
[[0.,-3.,1.,'BB'], [2.,5.,8.,'BB']], [[.1,3.,0.,'CC']]]
I read this solution but still cannot solve my issue.
itertools.groupby is the function you need. It needs to be customized to check key from the fourth element of each list.
Then elements and the return value itself needs to be forced to list to see the actual result (else it's optimized as generator functions). We discard the key (with _) because we're only interested in the elements, not the value that was used to group them.
import itertools
pnt=[[1.,2.,4.,'AA'], [0.,0.,0.,'AA'], [2.,1.,0.,'AA'],\
[0.,-3.,1.,'BB'], [2.,5.,8.,'BB'], [.1,3.,0.,'CC']]
print([list(x) for _,x in itertools.groupby(pnt,lambda x:x[3])])
result:
[[[1.0, 2.0, 4.0, 'AA'], [0.0, 0.0, 0.0, 'AA'], [2.0, 1.0, 0.0, 'AA']],
[[0.0, -3.0, 1.0, 'BB'], [2.0, 5.0, 8.0, 'BB']],
[[0.1, 3.0, 0.0, 'CC']]]
note that this method will group the consecutive groups only.
You can use a dict to get the desired output.
Ex:
pnt=[[1.,2.,4.,'AA'], [0.,0.,0.,'AA'], [2.,1.,0.,'AA'],\
[0.,-3.,1.,'BB'], [2.,5.,8.,'BB'], [.1,3.,0.,'CC']]
result = {}
for i in pnt:
result.setdefault(i[-1], []).append(i) # Form Dict with key as last value from list and value as the list
print(list(result.values())) # Get Values of Dict
Output:
[[[1.0, 2.0, 4.0, 'AA'], [0.0, 0.0, 0.0, 'AA'], [2.0, 1.0, 0.0, 'AA']],
[[0.0, -3.0, 1.0, 'BB'], [2.0, 5.0, 8.0, 'BB']],
[[0.1, 3.0, 0.0, 'CC']]]

How to add elements of one array into the same row in another array?

I have two arrays that I want to add together by inserting the elements of each row of the first list into the same row in the second list. So instead of converting two 2x3 matrices into one 4x3 matrix, I want to convert them into one 2x6 matrix. I have tried the following, as well as .append and .extend:
test_1 = [[0.0, 0.0, 0.0],[0.0, 0.0, 0.0]]
test_2 = [[1.0, 2.0, 3.0],[4.0, 5.0, 6.0]]
test_3 = test_1 + test_2
print(test_3)
This gives me the output:
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
However, what I want is:
[[0.0, 0.0, 0.0, 1.0, 2.0, 3.0],[0.0, 0.0, 0.0, 4.0, 5.0, 6.0]]
How do I add the elements of one matrix into the same row on the other matrix?
test_3 = [a+b for a,b in zip(test_1,test_2)]
However, if you're eventually going to convert this to numpy, as most Python matrix processing does, then you will use np.concatenate to do this.
test_1 = [[0.0, 0.0, 0.0],[0.0, 0.0, 0.0]]
test_2 = [[1.0, 2.0, 3.0],[4.0, 5.0, 6.0]]
test_3=[]
for i in range(len(test_1)):
var=test_1[i]+test_2[i]
test_3.append(var)
print(test_3)

Calculating mean and standard deviation and ignoring 0 values

I have a list of lists with sublists all of which contain float values.
For example the one below has 2 lists with sublists each:
mylist = [[[2.67, 2.67, 0.0, 0.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [0.0, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0]], [[2.67, 2.67, 2.0, 2.0], [0.0, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [0.0, 0.0, 0.0, 0.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0]]]
I want to calculate the standard deviation and the mean of the sublists and what I applied was this:
mean = [statistics.mean(d) for d in mylist]
stdev = [statistics.stdev(d) for d in mylist]
but it takes also the 0.0 values that I do not want because I turned them to 0 in order not to be empty ones. Is there a way to ignore these 0s as they do not exist in the sublist?To not take them under consideration at all? I could not find a way for how I am doing it.
You can use numpy's nanmean and nanstd functions.
import numpy as np
def zero_to_nan(d):
array = np.array(d)
array[array == 0] = np.NaN
return array
mean = [np.nanmean(zero_to_nan(d)) for d in mylist]
stdev = [np.nanstd(zero_to_nan(d)) for d in mylist]
You can do this with a list comprehension.
The following lambda function flattens the nested list into a single list and filters out all zeros:
flatten = lambda nested: [x for sublist in nested for x in sublist if x != 0]
Note that the list comprehension has two for and one ifstatement similar to this code snippet, which does essentially the same:
flat_list = []
for sublist in nested:
for x in sublist:
if x != 0:
flat_list.append(x)
To apply this to your list you can use map. The map function will return an iterator. To get a list we need to pass the iterator to list:
flat_list = list(map(flatten, myList))
Now you can calculate the mean and standard deviation:
mean = [statistics.mean(d) for d in flat]
stdev = [statistics.stdev(d) for d in flat]
print(mean)
print(stdev)
mean = [statistics.mean(d) for d in mylist if d != 0]
stdev = [statistics.stdev(d) for d in mylist if d != 0]
Try:
mean = [statistics.mean([k for k in d if k]) for d in mylist]
stdev = [statistics.stdev([k for k in d if k]) for d in mylist]

Binning a list in groups python

I have a list:
l = [2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 12.0,96.0, 192.0, 480.0, 360.0, 504.0, 300.0]
I want to group the elements in list in group size difference of 10. (i.e, 0-10,10-20,20-30,30-40...etc)
For eg:
Output that I'm looking for is:
[ [2,4,5,6,7,8,10],[12],[96],[192],[300],[360],[480],[504] ]
I tried using:
list(zip(*[iter(l)] * 10))
But getting wrong answer.
Use itertools.groupby to group together after dividing(//) it by 10
from itertools import groupby
l = [2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 12.0,96.0, 192.0, 480.0, 360.0, 504.0, 300.0]
groups = []
for _, g in groupby(l, lambda x: (x-1)//10):
groups.append(list(g)) # Store group iterator as a list
print(groups)
Output:
[[2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0], [12.0], [96.0], [192.0], [480.0], [360.0], [504.0], [300.0]]
A defaultdict might not be bad for this, it's not in one pass, but you can sort the keys to keep everything in place. The integer divide by 10 will bin everything for you
groups = defaultdict(list)
for i in l:
groups[int((i-1)//10)].append(i)
groups_list = sorted(groups.values())
groups_list[[2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0], [12.0], [96.0], [192.0], [300.0], [360.0], [480.0], [504.0]]
Even though, an answer is accepted, here is another way :
l = [2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 12.0,96.0, 192.0, 480.0, 360.0, 504.0, 300.0]
l1 = [int(k) for k in l]
l2 = list(list([k for k in l1 if len(str(k))==j]) for j in range(1,len(str(max(l1))) +1))
OUTPUT :
l2 = [[2, 4, 5, 6, 7, 8], [10, 12, 96], [192, 480, 360, 504, 300]]
It can be sub listed using dictionary : the key for dict will be value-1/10 if same key comes value will be appended:
gd={}
for i in l:
k=int((i-1)//10)
if k in gd:
gd[k].append(i)
else:
gd[k]=[i]
print(gd.values())
You can loop over you list l and create a new list using extend and an if condition:
smaller_list = []
larger_list = []
desired_result_list = []
for element in l:
if element <= 10:
smaller_list.extend([element])
else:
larger_list.append([element])
desired_result_list.extend(larger_list + [smaller_list])

Categories