How to store each iteration's values in a DataFrame? - python

I want to take input in the form of lists and join them into strings. How can I store the output as a DataFrame column?
The input X is a dataframe and the column name is des:
X['des'] =
[5, 13]
[L32MM, 4MM, 2]
[724027, 40]
[58, 60MM, 36MM, 0, 36, 3]
[8.5, 45MM]
[5.0MM, 44MM]
[10]
This is my code:
def clean_text():
    for i in range(len(X)):
        str1 = " "
        print(str1.join(X['des'][i]))
m = clean_text
m()
And here is my output, but how can I store it as a DataFrame column?
5 13
L32MM 4MM 2
724027 40
58 60MM 36MM 0 36 3
8.5 45MM
5.0MM 44MM
10

Note that iterating in pandas is an antipattern. Whenever possible, use DataFrame and Series methods to operate on entire columns at once.
Series.str.join (recommended)
X['joined'] = X['des'].str.join(' ')
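`Series.str.join` only joins lists whose items are all strings; per the pandas documentation, if any list item is not a string (for example the integer 5 instead of the string '5'), the result for that row is NaN. If the `des` lists may contain numbers, a minimal sketch that converts every item with `str` before joining (the sample frame is made up to mirror the question):

```python
import pandas as pd

# Hypothetical sample mirroring the question: lists mixing ints, floats and strings
X = pd.DataFrame({'des': [[5, 13], ['L32MM', '4MM', 2], [8.5, '45MM'], [10]]})

# str.join would return NaN for rows with non-string items,
# so convert each item to str before joining
X['joined'] = X['des'].apply(lambda items: ' '.join(map(str, items)))
print(X['joined'].tolist())
```

If the lists already hold only strings, the one-line `str.join` version is simpler.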
Output:
des joined
0 [5, 13] 5 13
1 [L32MM, 4MM, 2] L32MM 4MM 2
2 [724027, 40] 724027 40
3 [58, 60MM, 36MM, 0, 36, 3] 58 60MM 36MM 0 36 3
4 [8.5, 45MM] 8.5 45MM
5 [5.0MM, 44MM] 5.0MM 44MM
6 [10] 10
Loop (not recommended)
Iterate over the NumPy values and assign using DataFrame.loc (enumerate yields positions, so this assumes X has the default 0..n-1 index):
for i, des in enumerate(X['des'].to_numpy()):
    X.loc[i, 'loop'] = ' '.join(des)
Or iterate via DataFrame.itertuples:
for row in X.itertuples():
    X.loc[row.Index, 'itertuples'] = ' '.join(row.des)
Or iterate via DataFrame.iterrows:
for i, row in X.iterrows():
    X.loc[i, 'iterrows'] = ' '.join(row.des)
Output:
des loop itertuples iterrows
0 [5, 13] 5 13 5 13 5 13
1 [L32MM, 4MM, 2] L32MM 4MM 2 L32MM 4MM 2 L32MM 4MM 2
2 [724027, 40] 724027 40 724027 40 724027 40
3 [58, 60MM, 36MM, 0, 36, 3] 58 60MM 36MM 0 36 3 58 60MM 36MM 0 36 3 58 60MM 36MM 0 36 3
4 [8.5, 45MM] 8.5 45MM 8.5 45MM 8.5 45MM
5 [5.0MM, 44MM] 5.0MM 44MM 5.0MM 44MM 5.0MM 44MM
6 [10] 10 10 10
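If a Python-level loop is really needed (for example, when the per-row logic is more involved than a join), a common alternative to writing each row with `.loc` is to collect the results in a plain list and assign the whole column once. This also does not depend on the index labels. A minimal sketch with a made-up frame that deliberately uses a non-default index:

```python
import pandas as pd

# Hypothetical sample; index 10/20/30 shows the approach does not rely on 0..n-1
X = pd.DataFrame({'des': [['5', '13'], ['L32MM', '4MM', '2'], ['10']]},
                 index=[10, 20, 30])

results = []
for des in X['des']:
    # any per-row logic can go here; a join is used as the example
    results.append(' '.join(des))

# a single assignment; a list is matched to rows by position, not by label
X['joined'] = results
print(X)
```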


Find local maxima or peaks (indices) in a numeric series using numpy and pandas. A peak is a value surrounded by smaller values on both sides

Write a Python program to find all the local maxima or peaks (indices) in a numeric series using numpy and pandas. A peak is a value surrounded by smaller values on both sides.
Note
Create a Pandas series from the given input.
Input format:
The first line of the input consists of a list of integers separated by spaces, used to form a pandas Series.
Output format:
Display the array of indices where peak values are present.
Sample testcase
input1
12 1 2 1 9 10 2 5 7 8 9 -9 10 5 15
output1
[2 5 10 12]
How to solve this problem?
import pandas as pd
a = "12 1 2 1 9 10 2 5 7 8 9 -9 10 5 15"
a = [int(x) for x in a.split(" ")]
angles = []
for i in range(len(a)):
    if i!=0:
        if a[i]>a[i-1]:
            angles.append('rise')
        else:
            angles.append('fall')
    else:
        angles.append('ignore')
prev=0
prev_val = "none"
counts = []
for s in angles:
    if s=="fall" and prev_val=="rise":
        prev_val = s
        counts.append(1)
    else:
        prev_val = s
        counts.append(0)
peaks_pd = pd.Series(counts).shift(-1).fillna(0).astype(int)
df = pd.DataFrame({
    'a':a,
    'peaks':peaks_pd
})
peak_vals = list(df[df['peaks']==1]['a'].index)
This could be improved further. Steps I followed:
First, determine for each position whether the series is rising or falling.
Then mark the last index before the series starts falling after a rise; that index is a peak.
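The same rise-then-fall rule can be written without explicit loops using `numpy.diff`: index i is a peak when the step into i rises (a[i] > a[i-1]) and the step out of i does not (a[i+1] <= a[i]), which matches the 'rise'/'fall' labels in the code above, including counting equal neighbours as 'fall'. A sketch of that idea, not the answerer's code; `rise_fall_peaks` is an illustrative name:

```python
import numpy as np
import pandas as pd

def rise_fall_peaks(values):
    """Indices i where the series rises into i and then falls (or stays flat) after i."""
    a = np.asarray(values)
    d = np.diff(a)               # d[k] = a[k+1] - a[k]
    rising_in = d[:-1] > 0       # a[i] > a[i-1] for i = 1 .. n-2
    falling_out = d[1:] <= 0     # a[i+1] <= a[i] for i = 1 .. n-2
    # mask position j corresponds to index i = j + 1 in the original series
    return np.flatnonzero(rising_in & falling_out) + 1

s = pd.Series([int(x) for x in "12 1 2 1 9 10 2 5 7 8 9 -9 10 5 15".split()])
print(rise_fall_peaks(s).tolist())  # [2, 5, 10, 12]
```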
Use:
import numpy as np
import pandas as pd
from scipy.signal import argrelextrema
data = [12, 1, 2, 1.1, 9, 10, 2.1, 5, 7, 8, 9.1, -9, 10.1, 5.1, 15]
s = pd.Series(data)
n = 3  # argrelextrema: points compared on each side; rolling below: window size
local_max_index = argrelextrema(s.to_frame().to_numpy(), np.greater_equal, order=n)[0].tolist()
print (local_max_index)
[0, 5, 14]
local_max_index = s.index[(s.shift() <= s) & (s.shift(-1) <= s)].tolist()
print (local_max_index)
[2, 5, 10, 12]
local_max_index = s.index[s == s.rolling(n, center=True).max()].tolist()
print (local_max_index)
[2, 5, 10, 12]
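Note that the `<=` comparisons above also count a value as a peak when it only equals a neighbour. The question defines a peak as a value surrounded by smaller values, so strict `>` comparisons follow that definition more closely (for the sample input both give the same indices). A minimal sketch that reads the input line as described in the question and prints the indices in the `[2 5 10 12]` style; `peak_indices` is a hypothetical helper name:

```python
import pandas as pd

def peak_indices(line):
    """Return indices whose value is strictly greater than both neighbours."""
    s = pd.Series([int(x) for x in line.split()])
    # shift() puts NaN at the ends and comparisons with NaN are False,
    # so the first and last positions can never be peaks
    is_peak = (s > s.shift()) & (s > s.shift(-1))
    return s[is_peak].index.tolist()

line = "12 1 2 1 9 10 2 5 7 8 9 -9 10 5 15"  # in the exercise: line = input()
print("[" + " ".join(map(str, peak_indices(line))) + "]")  # [2 5 10 12]
```

Building the bracketed string by hand avoids NumPy's column padding, which would print the array as `[ 2  5 10 12]`.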
EDIT: Solution for processing the value stored in a DataFrame:
df = pd.DataFrame({'Input': ["12 1 2 1 9 10 2 5 7 8 9 -9 10 5 15"]})
print (df)
Input
0 12 1 2 1 9 10 2 5 7 8 9 -9 10 5 15
s = df['Input'].iloc[[0]].str.split().explode().astype(int).reset_index(drop=True)
print (s)
0 12
1 1
2 2
3 1
4 9
5 10
6 2
7 5
8 7
9 8
10 9
11 -9
12 10
13 5
14 15
Name: Input, dtype: int32
local_max_index = s.index[(s.shift() <= s) & (s.shift(-1) <= s)].tolist()
print (local_max_index)
[2, 5, 10, 12]
df['output'] = [local_max_index]
print (df)
Input output
0 12 1 2 1 9 10 2 5 7 8 9 -9 10 5 15 [2, 5, 10, 12]

pandas multiply each dataset row by multiple vectors

df = {1,2,3
4,5,6
7,8,9
10,11,12
}
weights = [[1,3,3],[2,2,2],[3,1,1]]
I want to multiply my df by every row of the weights matrix (so I'll have three different dfs, one for each weight vector), and then combine them by keeping, for each row, the version with the biggest values. Ex:
df0 = df * weights[0] = {1,6,9
4,15,18
7,24,27
10,33,36
}
df1 = df * weights[1] = {2,4,6
8,10,12
14,16,18
20,22,24
}
df2 = df * weights[2] = {3,2,3
12,5,6
21,8,9
30,11,12
}
and
final_df_lines = max{df0,df1,df2} = {1,6,9 - max line from df0,
4,15,18 - max line from df0,
7,24,27 - max line from df0,
10,33,36 - max line from df0
}
In this example all the max lines came from df0, but they could come from any of the three dfs. The "max line" is simply the line with the largest sum of its numbers.
I need to do this vectorized (without any loops or ifs). How do I do this? Is it possible at all? I really need help :( I've been searching the internet for 2 days to do this... I haven't worked in Python for long...
You can try concatenating all the weight-multiplied columns into one DataFrame, with a column suffix representing each weight,
then group by the weight each column was multiplied by and take the index of the maximum sum.
With the weight that gives the maximum, you can multiply the DataFrame.
df2 = pd.concat([(df*i).add_suffix('__'+str(i)) for i in weights],axis=1).T
0 1 2 3
0__[1, 3, 3] 1 4 7 10
1__[1, 3, 3] 6 15 24 33
2__[1, 3, 3] 9 18 27 36
0__[2, 2, 2] 2 8 14 20
1__[2, 2, 2] 4 10 16 22
2__[2, 2, 2] 6 12 18 24
0__[3, 1, 1] 3 12 21 30
1__[3, 1, 1] 2 5 8 11
2__[3, 1, 1] 3 6 9 12
# by grouping with respect to the weight it multiplied, get max index
a = df2.groupby(df2.index.str.split('__').str[1]).apply(lambda x: x.sum()).idxmax()
# max weights with respect to summation of rows
df['idxmax'] = a.str.slice(1,-1).str.split(',').apply(lambda x: list(map(int,x)))
c [1, 3, 3]
d [1, 3, 3]
3 [1, 3, 3]
4 [1, 3, 3]
dtype: object
df.apply(lambda x: x.loc[df.columns.difference(['idxmax'])] * x['idxmax'],1)
0 1 2
0 1 6 9
1 4 15 18
2 7 24 27
3 10 33 36
EDIT: As the question has been updated, I had to update my answer too:
You have to align the matrices first to be able to do an element-wise matrix operation without using any loop:
import numpy as np
a = [
[1,2,3],
[4,5,6],
[7,8,9],
[10,11,12]
]
weights = [
[1,3,3],
[2,2,2],
[3,1,1]
]
w_s = np.array( (4 * [weights[0]], 4 * [weights[1]], 4 * [weights[2]]) )
a_s = np.array(3 * [a])
result_matrix1 = w_s * a_s[0]
result_matrix2 = w_s * a_s[1]
result_matrix3 = w_s * a_s[2]
print(result_matrix1)
print(result_matrix2)
print(result_matrix3)
Output:
[[[ 1 6 9]
[ 4 15 18]
[ 7 24 27]
[10 33 36]]
[[ 2 4 6]
[ 8 10 12]
[14 16 18]
[20 22 24]]
[[ 3 2 3]
[12 5 6]
[21 8 9]
[30 11 12]]]
[[[ 1 6 9]
[ 4 15 18]
[ 7 24 27]
[10 33 36]]
[[ 2 4 6]
[ 8 10 12]
[14 16 18]
[20 22 24]]
[[ 3 2 3]
[12 5 6]
[21 8 9]
[30 11 12]]]
[[[ 1 6 9]
[ 4 15 18]
[ 7 24 27]
[10 33 36]]
[[ 2 4 6]
[ 8 10 12]
[14 16 18]
[20 22 24]]
[[ 3 2 3]
[12 5 6]
[21 8 9]
[30 11 12]]]
The solution uses NumPy, but you can do it with pandas as well if you prefer.
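Neither answer above finishes the last step the question asks for: picking, for each row, the weighted version with the largest row sum. With NumPy broadcasting this can be done without loops: add axes so every weight vector multiplies every row, sum along the last axis, and use `argmax` to choose the best weight per row. A sketch using the question's data; `best_weighted_rows` is an illustrative name:

```python
import numpy as np

def best_weighted_rows(a, weights):
    """For each row of `a`, return the row multiplied by whichever weight vector
    gives the largest row sum (ties go to the first such weight vector)."""
    a = np.asarray(a)
    weights = np.asarray(weights)
    # products[w, r, c] = a[r, c] * weights[w, c] -> shape (n_weights, n_rows, n_cols)
    products = weights[:, None, :] * a[None, :, :]
    # row sum for every (weight, row) pair -> shape (n_weights, n_rows)
    row_sums = products.sum(axis=2)
    # best weight index for each row -> shape (n_rows,)
    best = row_sums.argmax(axis=0)
    # pick products[best[r], r, :] for every row r -> shape (n_rows, n_cols)
    return products[best, np.arange(a.shape[0])]

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
weights = [[1, 3, 3], [2, 2, 2], [3, 1, 1]]
print(best_weighted_rows(a, weights))
```

If `df` is a DataFrame, pass `df.to_numpy()` and wrap the result with `pd.DataFrame(result, index=df.index, columns=df.columns)`.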

Creating subarrays with the number of subarrays passed as an argument in Python

I have a large 100x15 array like this:
[a b c d e f g h i j k l m n o]
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
.
.
.(Up to 100 rows)
I want to split this data into subsets using a function with an argument 'k', where 'k' denotes the number of subsets to be made; say k=3 means the 15 attributes are divided into 3 subsets like below:
[a b c d e] [f g h i j] [k l m n o]
[1 2 3 4 5] [6 7 8 9 10] [11 12 13 14 15]
[1 2 3 4 5] [6 7 8 9 10] [11 12 13 14 15]
[1 2 3 4 5] [6 7 8 9 10] [11 12 13 14 15]
[1 2 3 4 5] [6 7 8 9 10] [11 12 13 14 15]
.
.
.(Up to 100 rows)
and each subset is stored in a separate array. I want to implement this in Python. I have implemented it partially. Can anyone implement this and provide the code in an answer?
Partial logic for the inner loop:
given k
set start_index = 0
set increment = length of array / k, and end_index = increment
for j from start_index to end_index
start_index = end_index + 1
end_index = end_index + increment
//newarray[][] (I'm not sure about this part)
Thank You.
This returns a list of matrices with column size 2, which works for k=2:
import numpy as np
def portion(mtx, k):
    array = []
    array.append( mtx[:, :k])
    for i in range(1, mtx.shape[1]-1):
        array.append( mtx[:, k*i:k*(i+1)])
    return array[:k+1]
mtx = np.matrix([[1,2,3,10,13,14], [4,5,6,11,15,16], [7,8,9,12,17,18]])
k = 2
print(portion(mtx, k))
Unfortunately, I had to do it myself; here is the Python code for the logic. Anyway, thanks to #astaning for the attempt.
import numpy as np
def build_rotationtree_model(k):
    mtx = np.array([[2.95,6,63,23],[2,53,7,79],[3.57,5,65,32],[3.16,5,47,34],[21,2.58,4,46],[3.1,2.16,6,22],[3.5,3.27,3,52],[12,2.56,4,42]])
    #Length of attributes (width of matrix)
    a = mtx.shape[1]
    #Height of matrix (total rows)
    b = mtx.shape[0]
    #Separation limit: columns per sub matrix (integer division, leftover columns are ignored)
    limit = a // k
    #Starting column of sub matrix
    start = 0
    #Ending column of sub matrix (exclusive)
    end = limit
    subsets = []
    #Loop over the sub matrices; start < end also stops the loop if limit is 0
    while start < end <= a:
        #Copy columns start..end-1 of every row into a new sub matrix
        newArray = [[mtx[i][j] for j in range(start, end)] for i in range(b)]
        print(newArray)
        subsets.append(newArray)
        #Call LDA function and add the result to Sparse Matrix
        #sparseMat = LDA(newArray) Should be inside the loop
        start = end
        end = end + limit
    return subsets
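For reference, NumPy can do this column split directly: `np.array_split(mtx, k, axis=1)` returns a list of k consecutive column blocks (when the number of columns is not divisible by k, the first blocks get one extra column), while `np.hsplit(mtx, k)` does the same but raises an error unless the columns divide evenly. A sketch using a 4x15 array standing in for the 100x15 data; `split_columns` is an illustrative name:

```python
import numpy as np

def split_columns(mtx, k):
    """Split the columns of a 2-D array into k consecutive blocks."""
    return np.array_split(np.asarray(mtx), k, axis=1)

# 4 rows of 1..15 stand in for the 100x15 data in the question
mtx = np.tile(np.arange(1, 16), (4, 1))
subsets = split_columns(mtx, 3)
for sub in subsets:
    print(sub[0])  # first row of each block
```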
a=list(input())
for i in range(0,len(a)):
    for j in range(i,len(a)):
        for k in range(i,j+1):
            print(a[k],end=" ")
        print("\n",end="")

Summing rows based on cumsum values

I have a data frame like
index A B C
0 4 7 9
1 2 6 2
2 6 9 1
3 7 2 4
4 8 5 6
I want to create another data frame from this based on the running sum of column C. The catch is that once the sum of C reaches 10 or higher, it should start a new row. Something like this:
index  A B C
0   6 13 11
1   21 16 11
Any help will be highly appreciated. Is there a robust way to do this, or is iterating my last resort?
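There is a non-iterative approach. You'll need a groupby based on the cumulative sum of C modulo 10 (C.cumsum() % 10): a new group starts right after the row where that remainder wraps around.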
# Groupby logic - https://stackoverflow.com/a/45959831/4909087
out = df.groupby((df.C.cumsum() % 10).diff().shift().lt(0).cumsum(), as_index=0).agg('sum')
print(out)
A B C
0 6 13 11
1 21 16 11
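A caveat on the modulo trick: it tracks when the overall cumulative sum of C crosses a multiple of 10, which is not the same as resetting the sum to 0 after each group, because the remainder carries over. It gives the expected groups for this data, but if the exact reset rule matters, the group labels can be computed with a short loop and the summing left to `groupby`. A sketch; `threshold_groups` is a hypothetical helper:

```python
import pandas as pd

def threshold_groups(values, threshold=10):
    """Group labels for consecutive rows: a group closes as soon as its
    running sum of `values` reaches `threshold`, then the sum restarts at 0."""
    labels, group, running = [], 0, 0
    for v in values:
        labels.append(group)
        running += v
        if running >= threshold:
            group += 1
            running = 0
    return labels

df = pd.DataFrame({'A': [4, 2, 6, 7, 8],
                   'B': [7, 6, 9, 2, 5],
                   'C': [9, 2, 1, 4, 6]})
# a trailing group that never reaches the threshold is kept as a final row
out = df.groupby(threshold_groups(df['C'])).sum().reset_index(drop=True)
print(out)
```

For C = [9, 9, 1, 1, 5] the reset rule gives groups [0, 0, 1, 1, 1], whereas the modulo version starts a third group at the last row because the cumulative sum crosses 20 there.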
The code would look something like this:
import pandas as pd
lista = [4, 2, 6, 7, 8]
listb = [7, 6, 9, 2, 5]
listc = [9, 2, 1, 4, 6]
df = pd.DataFrame({'A': lista, 'B': listb, 'C': listc})
def sumsc(df):
    suma=0
    sumb=0
    sumc=0
    list_of_sums = []
    for i in range(len(df)):
        suma+=df.iloc[i,0]
        sumb+=df.iloc[i,1]
        sumc+=df.iloc[i,2]
        if sumc >= 10:
            list_of_sums.append([suma, sumb, sumc])
            suma=0
            sumb=0
            sumc=0
    return pd.DataFrame(list_of_sums)
sumsc(df)
0 1 2
0 6 13 11
1 21 16 11

Printing all solutions in the shape of a matrix using \n

This function prints all possible products of the numbers from 1 to d. I want to print the result in the shape of a d×d matrix.
def example(d):
    for i in range(1,d+1):
        for l in range(1,d+1):
            print(i*l)
For d = 5, the expected output should look like:
1 2 3 4 5
2 4 6 8 10
3 6 9 12 15
4 8 12 16 20
5 10 15 20 25
You could add the values in your second for loop to a list, join the list, and finally print it.
def mul(d):
    for i in range(1, d+1):
        list_to_print = []
        for l in range(1, d+1):
            list_to_print.append(str(l*i))
        print(" ".join(list_to_print))
>>> mul(5)
1 2 3 4 5
2 4 6 8 10
3 6 9 12 15
4 8 12 16 20
5 10 15 20 25
If you want it printed in aligned rows and columns, have a look at "Pretty print 2D Python list".
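For a table like this one, a format specification with a field width is enough for alignment: pad every product to the width of the largest one, d*d. A sketch; `mul_table` is an illustrative name and f-strings need Python 3.6+:

```python
def mul_table(d):
    """Return the d x d multiplication table as right-aligned text rows."""
    width = len(str(d * d))  # the largest product is d*d
    lines = []
    for i in range(1, d + 1):
        lines.append(" ".join(f"{i * l:>{width}}" for l in range(1, d + 1)))
    return "\n".join(lines)

print(mul_table(5))
```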
EDIT
The above example works in both Python 3 and Python 2. However, in Python 3 (as #richard pointed out in the comments), you can also use:
def mul(d):
    for i in range(1, d+1):
        for l in range(1, d+1):
            print(i*l, end=" ")
        print()
>>> mul(5)
1 2 3 4 5
2 4 6 8 10
3 6 9 12 15
4 8 12 16 20
5 10 15 20 25
Try this:
mm = []
ll = []
def mul(d):
    for i in range(1,d+1):
        ll = []
        for l in range(1,d+1):
            # print(i*l),
            ll.append((i*l))
        mm.append(ll)
mul(5)
for x in mm:
    print(x)
[1, 2, 3, 4, 5]
[2, 4, 6, 8, 10]
[3, 6, 9, 12, 15]
[4, 8, 12, 16, 20]
[5, 10, 15, 20, 25]
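The list of lists built in `mm` is the outer product of 1..d with itself, so NumPy can produce it in one call; this also avoids the module-level `mm` list, which keeps growing if `mul` is called more than once. A sketch; `mul_matrix` is an illustrative name:

```python
import numpy as np

def mul_matrix(d):
    """d x d multiplication table as a NumPy array (outer product of 1..d with itself)."""
    r = np.arange(1, d + 1)
    return np.outer(r, r)

for row in mul_matrix(5).tolist():
    print(row)
```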
