Looping through a dataframe element by element - python

If I have a data frame df (indexed by integer)
BBG.KABN.S BBG.TKA.S BBG.CON.S BBG.ISAT.S
index
0 -0.004881 0.008011 0.007047 -0.000307
1 -0.004881 0.008011 0.007047 -0.000307
2 -0.005821 -0.016792 -0.016111 0.001028
3 0.000588 0.019169 -0.000307 -0.001832
4 0.007468 -0.011277 -0.003273 0.004355
and I want to iterate through each element individually (by row and column), I know I need to use .iloc[row, column], but do I need to create two for loops (one for rows and one for columns), and how would I do that?
I guess it would be something like:
for col in rollReturnRandomDf.keys():
    for row in rollReturnRandomDf.iterrows():
        item = df.iloc(col,row)
But I am unsure of the exact syntax.

Maybe try using df.values.ravel().
import pandas as pd
import numpy as np
# data
# =================
df = pd.DataFrame(np.arange(25).reshape(5,5), columns='A B C D E'.split())
Out[72]:
A B C D E
0 0 1 2 3 4
1 5 6 7 8 9
2 10 11 12 13 14
3 15 16 17 18 19
4 20 21 22 23 24
# np.ravel
# =================
df.values.ravel()
Out[74]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24])
for item in df.values.ravel():
    # do something with item
    print(item)
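If the row and column position of each element is also needed, the two-loop idea from the question can be sketched as below. This is only an illustration under the assumption that positional access is wanted; `visited` and `enumerated` are hypothetical names. Note that `.iat` (like `.iloc`) takes square brackets, not parentheses.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5), columns="A B C D E".split())

# Nested loops over row and column positions; .iat[row, col] reads one scalar.
visited = []
for row in range(df.shape[0]):
    for col in range(df.shape[1]):
        visited.append((row, col, df.iat[row, col]))

# np.ndenumerate yields ((row, col), value) pairs without writing two loops.
enumerated = [(r, c, v) for (r, c), v in np.ndenumerate(df.to_numpy())]
```

Both lists hold the same (row, column, value) triples in row-major order, which is the same order that `df.values.ravel()` produces.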

Related

How to conditionally fill new column using for loop in python? [duplicate]

This question already has answers here:
Pandas conditional creation of a series/dataframe column
(13 answers)
Closed 3 years ago.
I want to add a new column and fill values based on condition.
df:
indicator, value, a, b
1, 20, 5, 3
0, 30, 6, 8
0, 70, 2, 2
1, 10, 3, 7
I want to add a new column (value_new) based on Indicator. If indicator == 1, value_new = a*b otherwise value_new = value.
df:
indicator, value, a, b, value_new
1, 20, 5, 3, 15
0, 30, 6, 8, 30
0, 70, 2, 2, 70
1, 10, 3, 7, 21
I have tried following:
value_new = []
for i in range(1, len(df)):
    if df['indicator'][i] == 1:
        value_new.append(df['a'][i]*df['b'][i])
    else:
        value_new.append(df['value'][i])
df['value_new'] = value_new
Error: 'Length of values does not match length of index'
And I have also tried:
for i in range(1, len(df)):
    if df['indicator'][i] == 1:
        df['value_new'][i] = df['a'][i]*df['b'][i]
    else:
        df['value_new'][i] = df['value'][i]
KeyError: 'value_new'
You can use np.where:
import numpy as np
df['value_new'] = np.where(df['indicator'], df['a']*df['b'], df['value'])
print(df)
Prints:
indicator value a b value_new
0 1 20 5 3 15
1 0 30 6 8 30
2 0 70 2 2 70
3 1 10 3 7 21
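For reference, the first attempt in the question most likely fails because `range(1, len(df))` skips row 0, so the list ends up one element shorter than the index. A minimal sketch of a corrected loop next to the vectorized version, assuming the sample data from the question (`value_new_loop` is a hypothetical column name used only for comparison):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "indicator": [1, 0, 0, 1],
    "value": [20, 30, 70, 10],
    "a": [5, 6, 2, 3],
    "b": [3, 8, 2, 7],
})

# range(len(df)) visits every row, so the list matches the index length.
value_new = []
for i in range(len(df)):
    if df["indicator"].iloc[i] == 1:
        value_new.append(df["a"].iloc[i] * df["b"].iloc[i])
    else:
        value_new.append(df["value"].iloc[i])
df["value_new_loop"] = value_new

# Vectorized equivalent with the condition written out explicitly.
df["value_new"] = np.where(df["indicator"] == 1, df["a"] * df["b"], df["value"])
```

Writing the condition as `df["indicator"] == 1` keeps the intent explicit if the indicator column ever holds values other than 0 and 1.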

Python how to use name of a list as a column name in a dataframe

I have a list of variables. I want to use the name of this list as a column name in a dataframe. The name stress and its elements keep changing.
stress = ['M13', 'M14', 'M15', 'M16', 'M17', 'M18']
outputlist = [13, 14, 15, 16, 17, 18] ### obtained from analysis
resultdf[stress] = outputlist ### I want to name the column same as list name.
I want something like this given below.
print(resultdf)
stress
0 13
1 14
2 15
3 16
4 17
5 18
This raises an error because the whole list of values ends up being used as the column header. How can I achieve this?
The column name just needs to be a string. You are passing the variable itself (a list) as the column name. Instead write
resultdf["stress"] = outputlist
This might be what you're looking for, although I'm not sure what the result data looks like:
>>> stress = ['M13', 'M14', 'M15', 'M16', 'M17', 'M18']
>>> data = [[1,2,3,4,5,6], [7,8,9,10,11,12], [13,14,15,16,17,18], [19,20,21,22,23,24], [25,26,27,28,29,30], [31,32,33,34,35,36]]
>>> result = {x: y for x,y in zip(stress, data)}
>>> result
{'M13': [1, 2, 3, 4, 5, 6], 'M14': [7, 8, 9, 10, 11, 12], 'M15': [13, 14, 15, 16, 17, 18], 'M16': [19, 20, 21, 22, 23, 24], 'M17': [25, 26, 27, 28, 29, 30], 'M18': [31, 32, 33, 34, 35, 36]}
Then you can convert the dictionary to a DataFrame:
>>> import pandas as pd
>>> d = pd.DataFrame(result)
>>> d
M13 M14 M15 M16 M17 M18
0 1 7 13 19 25 31
1 2 8 14 20 26 32
2 3 9 15 21 27 33
3 4 10 16 22 28 34
4 5 11 17 23 29 35
5 6 12 18 24 30 36
Edit (based on your update)
If you literally just want a single column named after the variable, put the name in quotes:
>>> d = pd.DataFrame({'stress': outputlist})
>>> d
stress
0 13
1 14
2 15
3 16
4 17
5 18
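Since the question mentions that the name keeps changing, one option is to hold the label in its own string variable. This is a small sketch, not something from the answers above; `column_name` is a hypothetical variable introduced for illustration.

```python
import pandas as pd

outputlist = [13, 14, 15, 16, 17, 18]

# column_name is a hypothetical variable holding the label as a string;
# it can be reassigned whenever the desired column name changes.
column_name = "stress"

resultdf = pd.DataFrame()
resultdf[column_name] = outputlist
```

Python does not offer a reliable way to recover a variable's own name at runtime, so keeping the label as explicit data tends to be simpler.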

Summing rows based on cumsum values

I have a data frame like
index  A B C
0   4 7 9
1   2 6 2
2   6 9 1
3   7 2 4
4   8 5 6
I want to create another data frame out of this based on the sum of the C column. The catch is that whenever the running sum of C reaches 10 or higher, a new row should start. Something like this.
index  A B C
0   6 13 11
1   21 16 11
Any help will be highly appreciated. Is there a robust way to do this, or is iterating my last resort?
There is a non-iterative approach. You'll need a groupby keyed on where the cumulative sum of C, taken modulo 10, wraps around.
# Groupby logic - https://stackoverflow.com/a/45959831/4909087
out = df.groupby((df.C.cumsum() % 10).diff().shift().lt(0).cumsum(), as_index=0).agg('sum')
print(out)
A B C
0 6 13 11
1 21 16 11
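To see how that grouping key comes about, the one-liner can be unpacked step by step. The sketch below assumes the sample frame from the question; `running_mod`, `wrapped` and `key` are hypothetical names, and the key is passed as a plain array so it cannot be confused with column C.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 2, 6, 7, 8],
                   "B": [7, 6, 9, 2, 5],
                   "C": [9, 2, 1, 4, 6]})

# Running total of C modulo 10: 9, 1, 2, 6, 2
running_mod = df["C"].cumsum() % 10

# A drop in the modulo value means the total just passed a multiple of 10.
wrapped = running_mod.diff().lt(0)

# The row after each wrap starts a new group; cumsum turns flags into labels.
key = wrapped.shift(fill_value=False).cumsum()

out = df.groupby(key.to_numpy()).sum().reset_index(drop=True)
```

Note that this detects the running total crossing multiples of 10 rather than resetting the sum after each group, so on other data it may split rows differently from a loop that resets its accumulator.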
The code would look something like this:
import pandas as pd
lista = [4, 7, 10, 11, 7]
listb= [7, 8, 2, 5, 9]
listc = [9, 2, 1, 4, 6]
df = pd.DataFrame({'A': lista, 'B': listb, 'C': listc})
def sumsc(df):
    suma = 0
    sumb = 0
    sumc = 0
    list_of_sums = []
    for i in range(len(df)):
        suma += df.iloc[i, 0]
        sumb += df.iloc[i, 1]
        sumc += df.iloc[i, 2]
        # close the current group once the running sum of C reaches 10
        if sumc >= 10:
            list_of_sums.append([suma, sumb, sumc])
            suma = 0
            sumb = 0
            sumc = 0
    return pd.DataFrame(list_of_sums)
sumsc(df)
0 1 2
0 11 15 11
1 28 16 11

function that takes one column value and returns another column value

Apologies, if this is a duplicate please let me know, I'll gladly delete.
My dataset:
Index Col 1 Col 2
0 1 4, 5, 6
1 2 7, 8, 9
2 3 10, 11, 12
3 4 13, 14, 15
I am attempting to create a function that will take a particular column 1 value as its input and output the column 1 value with its corresponding column 2 values.
For example, if I used the function for when column 1 is equal to 3, the function would return a list of 4 values: 3, 10, 11, 12
Many thanks
def f(a):
    return df.loc[df['Col 1'] == a, 'Col 2'].item()
If you need something more general that also handles no match or multiple matches:
print (df)
Col 1 Col 2
0 1 4, 5, 6
1 1 7, 8, 9
2 3 10, 11, 12
3 4 13, 14, 15
def f(a):
    a = df.loc[df['Col 1'] == a, 'Col 2']
    #for no match
    if a.empty:
        return 'no match'
    #for multiple match
    elif len(a) > 1:
        return a
    else:
        #for match one value only
        return a.item()
print (f(1))
0 4, 5, 6
1 7, 8, 9
Name: Col 2, dtype: object
print (f(3))
10, 11, 12
print (f(6))
no match
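The question asks for a list such as 3, 10, 11, 12, whereas the functions above return the raw 'Col 2' value. A minimal sketch of that final step, assuming 'Col 2' holds comma-separated strings as displayed (`lookup_as_list` is a hypothetical name):

```python
import pandas as pd

df = pd.DataFrame({"Col 1": [1, 2, 3, 4],
                   "Col 2": ["4, 5, 6", "7, 8, 9", "10, 11, 12", "13, 14, 15"]})

def lookup_as_list(value):
    """Return [value, *numbers] for the first row where 'Col 1' equals value."""
    matches = df.loc[df["Col 1"] == value, "Col 2"]
    if matches.empty:
        return []  # no row with that 'Col 1' value
    # Split the string on commas; int() tolerates the surrounding spaces.
    numbers = [int(part) for part in matches.iloc[0].split(",")]
    return [value] + numbers
```

If 'Col 2' already stores lists rather than strings, the split step can be dropped and the stored list concatenated directly.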
Just a simple function:
def getColumn(col1):
    beg = 1+3*col1
    return [col1] + list(range(beg, beg+3))

Python array get positions of value changes

I'm working with some large arrays where usually values are repeated. Something similar to this:
data[0] = 10
data[1] = 10
data[2] = 12
data[3] = 12
data[4] = 13
data[5] = 9
Is there any way to get the positions where the values change? I mean, to get something similar to this:
data[0] = 10
data[2] = 12
data[4] = 13
data[5] = 9
The goal is to somehow compress the array so I can work with smaller arrays. I have been looking at pandas too, but without any success so far.
Thank you,
You can use pandas shift and loc to filter out consecutive duplicates.
In [11]:
# construct a numpy array of data
import pandas as pd
import numpy as np
# I've added some more values at the end here
data = np.array([10,10,12,12,13,9,13,12])
data
Out[11]:
array([10, 10, 12, 12, 13, 9, 13, 12])
In [12]:
# construct a pandas dataframe from this
df = pd.DataFrame({'a':data})
df
Out[12]:
a
0 10
1 10
2 12
3 12
4 13
5 9
6 13
7 12
In [80]:
df.loc[df.a != df.a.shift()]
Out[80]:
a
0 10
2 12
4 13
5 9
6 13
7 12
In [81]:
data[np.roll(data,1)!=data]
Out[81]:
array([10, 12, 13, 9, 13, 12])
In [82]:
np.where(np.roll(data,1)!=data)
Out[82]:
(array([0, 2, 4, 5, 6, 7], dtype=int64),)
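One caveat with `np.roll`: it wraps around, so the first element is compared with the last one, and position 0 would be dropped whenever the two happen to be equal. A sketch of an alternative based on `np.diff`, which has no wrap-around (`change_positions` and `compressed` are hypothetical names):

```python
import numpy as np

data = np.array([10, 10, 12, 12, 13, 9, 13, 12])

# np.diff is nonzero wherever a value differs from its predecessor;
# adding 1 converts those gaps into the index of the changed element.
# Position 0 always starts a run, so it is prepended explicitly.
change_positions = np.concatenate(([0], np.flatnonzero(np.diff(data)) + 1))

# Values at the change positions form the compressed array.
compressed = data[change_positions]
```

Keeping `change_positions` alongside `compressed` makes it possible to expand the data back to its original length later.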
