How do I add values to an existing column in Pandas?

I have a Pandas column like this:
Free-Throw Percentage
0 .371
1 .418
2 .389
3 .355
4 .386
5 .605
And I have a list of values: [.45,.31,.543]
I would like to append these values to the above column such that the final result would be:
Free-Throw Percentage
0 .371
1 .418
2 .389
3 .355
4 .386
5 .605
6 .45
7 .31
8 .543
How can I achieve this?

df = df.append(pd.DataFrame({'Free-Throw Percentage': [.45, .31, .543]}), ignore_index=True)
should do the job. Note that append returns a new DataFrame rather than modifying df in place, and ignore_index=True renumbers the appended rows as 6, 7, 8 instead of repeating 0, 1, 2.

import numpy as np
import pandas as pd

# Sample frame with two integer columns
df_1 = pd.DataFrame(data=zip(np.random.randint(0, 20, 10), np.random.randint(0, 10, 10)), columns=['A', 'B'])
new_vals = [3, 4, 5]
# Append the new values to column B; column A is NaN for the three new rows
df_1 = df_1.append(pd.DataFrame({'B': new_vals}), sort=False, ignore_index=True)
print(df_1)
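Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so both snippets above only run on older versions. A minimal sketch of the same operation with pd.concat on current pandas:
import pandas as pd

df = pd.DataFrame({'Free-Throw Percentage': [.371, .418, .389, .355, .386, .605]})
new_rows = pd.DataFrame({'Free-Throw Percentage': [.45, .31, .543]})
# ignore_index=True renumbers the result 0..8 instead of repeating 0..2
df = pd.concat([df, new_rows], ignore_index=True)
print(df)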

Related

Multi-column, multi-row pandas data frame

I am trying to pass the row with index 0 to a multi-column df. That is, I want to have the same df but with the index-0 row in bold, above the vertical line. I have tried to do this without success. Help would be appreciated.
If I understand your problem correctly, you could try this to transfer the first row of the original df into the header of the new one.
import pandas as pd

df = pd.DataFrame(data={
    'Name': ["Peter", "Andy", "Bob", "Lisa"],
    'Age': [50, 40, 34, 39],
    'Nationality': ["USA", "Australian", "Canadian", "Mexican"]
})

# Use the values of the first row as the keys, i.e. as the new column labels
new_data = {}
for title in df.iloc[0]:
    new_data[title] = ["TestValue", "TestValue"]
print(new_data)
dfN = pd.DataFrame(data=new_data)  # new frame headed by the first row's values
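For reference, with the sample data above, dfN uses the first row's values (Peter, 50, USA) as its column labels, roughly:
       Peter         50        USA
0  TestValue  TestValue  TestValue
1  TestValue  TestValue  TestValue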
A Pandas dataframe can have multiple header levels on its columns or rows.
Make a dataframe.
import pandas as pd
import numpy as np
df = pd.DataFrame(
    data=np.random.randint(0, 10, (6, 4)),
    columns=["a", "b", "c", "d"])
Insert a second column index.
df.columns = pd.MultiIndex.from_tuples(
    zip(['A', 'B', 'C', 'D'], df.columns))
output
A B C D
a b c d
0 4 9 5 1
1 7 5 3 7
2 1 1 5 1
3 8 3 8 3
4 3 8 8 2
5 5 2 3 5
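Selection then works on either level of the new column index (using the df built above):
df['A']         # sub-frame holding the original column 'a'
df[('A', 'a')]  # the single column, as a Series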

How to split a dataframe based on a column-names row

I have an Excel file whose dataframe has 20 rows. After a few rows the column-names row appears again, and I want to divide the dataframe at each such row.
Here is an example:
x
0
1
2
3
4
x
23
34
5
6
expected output is:
df1
x
0
1
2
3
4
df2
x
23
34
5
6
Assuming your column is named col, you can first group the dataframe by a cumulative sum over the rows where the value equals 'x' (df['col'].eq('x').cumsum()). Then, for each group, create a dataframe taking the values from the second row onward and the column names from the first row of that group, using df.iloc[], and save them in a dictionary:
d = {f'df{i}': pd.DataFrame(g.iloc[1:].values, columns=g.iloc[0].values)
     for i, g in df.groupby(df['col'].eq('x').cumsum())}
print(d['df1'])
x
0 0
1 1
2 2
3 3
4 4
print(d['df2'])
x
0 23
1 34
2 5
3 6
Use df.index[df['x'] == 'x'] to find the row index where the column name appears again.
Then split the dataframe in two at the index found:
df = pd.DataFrame(columns=['x'], data=[[0], [1], [2], [3], [4], ['x'], [23], [34], [5], [6]])
split_at = df.index[df['x'] == 'x'].tolist()[0]  # position of the repeated header row
df1 = df.iloc[:split_at]
df2 = df.iloc[split_at + 1:]
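One caveat: because the repeated header was stored as data, the x column comes out with object dtype, and df2 keeps the original row labels (6-9). A small follow-up sketch to tidy both up:
df2 = df2.reset_index(drop=True)    # renumber the rows from 0
df2['x'] = pd.to_numeric(df2['x'])  # restore a numeric dtype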
You didn't say whether this is just a sample of your dataset. If it's the full data, you can simply build the two frames directly:
import pandas as pd

df1 = pd.DataFrame({'df1': ['x', 0, 1, 2, 3, 4]})
df2 = pd.DataFrame({'df2': ['x', 23, 34, 5, 6]})
display(df1, df2)

Match rows between dataframes and preserve order

I work in python and pandas.
Let's suppose that I have a dataframe like this (INPUT):
A B C
0 2 8 6
1 5 2 5
2 3 4 9
3 5 1 1
I want to process it to finally get a new dataframe which looks like this (EXPECTED OUTPUT):
A B C
0 2 7 NaN
1 5 1 1
2 3 3 NaN
3 5 0 NaN
To manage this I do the following:
columns = ['A', 'B', 'C']
data_1 = [[2, 5, 3, 5], [8, 2, 4, 1], [6, 5, 9, 1]]
data_1 = np.array(data_1).T
df_1 = pd.DataFrame(data=data_1, columns=columns)
df_2 = df_1
df_2['B'] -= 1
df_2['C'] = np.nan
df_2 now looks like this:
A B C
0 2 7 NaN
1 5 1 NaN
2 3 3 NaN
3 5 0 NaN
Now I want to do a matching/merging between df_1 and df_2 with using as keys the columns A and B.
I tried with isin() to do this:
df_temp = df_1[df_1[['A', 'B']].isin(df_2[['A', 'B']])]
df_2.iloc[df_temp.index] = df_temp
but it gives me back the same df_2 as before without matching the common row 5 1 1 for A, B, C respectively:
A B C
0 2 7 NaN
1 5 1 NaN
2 3 3 NaN
3 5 0 NaN
How can I do this properly?
By the way, just to be clear, the matching should not be done like
1st row of df1 - 1st row of df2
2nd row of df1 - 2nd row of df2
3rd row of df1 - 3rd row of df2
...
But it has to be done as:
any row of df1 - any row of df2
based on the specified columns as keys.
I think this is why isin() in my code above does not work: it does the filtering/matching in the former way.
On the other hand, .merge() can do the matching in the latter way but it does not preserve the order of the rows in the way I want and it is pretty tricky or inefficient to fix that.
Finally, keep in mind that my actual dataframes will use far more than 2 columns (e.g. 15) as keys for the matching, so the approach should stay concise even for bigger dataframes.
P.S.
See my answer below.
Here's my suggestion using a lambda function in apply. Should be easily scalable to more columns to compare (just adjust cols_to_compare accordingly). By the way, when generating df_2, be sure to copy df_1, otherwise changes in df_2 will carry over to df_1 as well.
So generating the data first:
columns = ['A', 'B', 'C']
data_1 = [[2, 5, 3, 5], [8, 2, 4, 1], [6, 5, 9, 1]]
data_1 = np.array(data_1).T
df_1 = pd.DataFrame(data=data_1, columns=columns)
df_2 = df_1.copy() # Be sure to create a copy here
df_2['B'] -= 1
df_2['C'] = np.nan
and now we 'scan' df_1 for the rows of interest:
cols_to_compare = ['A', 'B']
df_2['C'] = df_2.apply(
    lambda x: 1 if any((df_1.loc[:, cols_to_compare].values == x[cols_to_compare].values).all(1))
    else np.nan, axis=1)
What it does is check, row by row, whether the values of the current row of df_2 also appear together in any row of df_1 in the compared columns.
The output is:
A B C
0 2 7 NaN
1 5 1 1.0
2 3 3 NaN
3 5 0 NaN
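The same check can also be written as a sketch with a set of key tuples (reusing cols_to_compare and the frames defined above), which reads more plainly and avoids a full scan of df_1 per row:
# Collect the (A, B) pairs present in df_1, then test each row of df_2
key_set = set(map(tuple, df_1[cols_to_compare].values))
df_2['C'] = [1 if tuple(row) in key_set else np.nan
             for row in df_2[cols_to_compare].values]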
Someone (I don't remember the username) suggested the following, which I think works, and then deleted their post for some reason:
df_2 = df_2.set_index(['A', 'B'])
temp = df_1.set_index(['A', 'B'])
df_2.update(temp)  # update aligns on the shared (A, B) index, regardless of row order
df_2.reset_index(inplace=True)
You can accomplish this using two for loops:
for row in df_2.iterrows():
    for row2 in df_1.iterrows():
        if [row[1]['A'], row[1]['B']] == [row2[1]['A'], row2[1]['B']]:
            # assign via .loc to avoid chained-indexing pitfalls
            df_2.loc[row[0], 'C'] = row2[1]['C']
Just modify your line below:
df_temp = df_1[df_1[['A', 'B']].isin(df_2[['A', 'B']])]
with:
df_1[df_1['A'].isin(df_2['A']) & df_1['B'].isin(df_2['B'])]
It works for this data, but note that per-column isin() checks membership in each column independently, so it can also match rows whose A and B values come from different rows of df_2.
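For completeness, a merge-based sketch also addresses the ordering concern from the question: with how='left', merge preserves the row order of the left frame (df_2), unmatched key pairs get NaN, and the key list scales to any number of columns. Rebuilding the frames from the question (with the .copy() fix):
import numpy as np
import pandas as pd

columns = ['A', 'B', 'C']
df_1 = pd.DataFrame(data=np.array([[2, 5, 3, 5], [8, 2, 4, 1], [6, 5, 9, 1]]).T, columns=columns)
df_2 = df_1.copy()
df_2['B'] -= 1
df_2['C'] = np.nan

# how='left' keeps df_2's row order; C is filled from df_1 where (A, B) matches
result = df_2.drop(columns='C').merge(df_1, on=['A', 'B'], how='left')
print(result)
which yields:
   A  B    C
0  2  7  NaN
1  5  1  1.0
2  3  3  NaN
3  5  0  NaN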

Pandas - Sorting By Column

I have a pandas data frame known as "df":
x y
0 1 2
1 2 4
2 3 8
I am splitting it up into two frames, and then trying to merge back together:
df_1 = df[df['x']==1]
df_2 = df[df['x']!=1]
My goal is to get it back in the same order, but when I concat, I am getting the following:
frames = [df_1, df_2]
solution = pd.concat(frames)
solution.sort_values(by='x', inplace=False)
x y
1 2 4
2 3 8
0 1 2
The problem is I need the 'x' values to go back into the new dataframe in the same order that I extracted. Is there a solution?
Use .loc to specify the order you want; choose the original index.
solution.loc[df.index]
Or, if you trust the index values in each component, then
solution.sort_index()
setup
df = pd.DataFrame([[1, 2], [2, 4], [3, 8]], columns=['x', 'y'])
df_1 = df[df['x']==1]
df_2 = df[df['x']!=1]
frames = [df_1, df_2]
solution = pd.concat(frames)
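For reference, either option on this setup restores the original order; solution.sort_index() gives:
   x  y
0  1  2
1  2  4
2  3  8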
Try this:
In [14]: pd.concat([df_1, df_2.sort_values('y')])
Out[14]:
x y
0 1 2
1 2 4
2 3 8
When you sort the solution using
solution.sort_values(by='x', inplace=False)
the sorted frame is returned rather than stored. You need to specify inplace=True, or assign the result back to solution; that would take care of it.
Based on these assumptions on df:
Columns x and y are not necessarily ordered.
The index is ordered.
Just order your result by index:
df = pd.DataFrame({'x': [1, 2, 3], 'y': [2, 4, 8]})
df_1 = df[df['x']==1]
df_2 = df[df['x']!=1]
frames = [df_2, df_1]
solution = pd.concat(frames).sort_index()
Now, solution looks like this:
x y
0 1 2
1 2 4
2 3 8

python pandas dataframe : fill nans with a conditional mean

I have the following dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame(data={'Cat': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                        'Vals': [1, 2, 3, 4, 5, np.nan, np.nan]})
Cat Vals
0 A 1
1 A 2
2 A 3
3 B 4
4 B 5
5 A NaN
6 B NaN
And I want indexes 5 and 6 to be filled with the conditional mean of 'Vals' based on the 'Cat' column, namely 2 and 4.5.
The following code works fine:
means = df.groupby('Cat').Vals.mean()
for i in df[df.Vals.isnull()].index:
    df.loc[i, 'Vals'] = means[df.loc[i].Cat]
Cat Vals
0 A 1
1 A 2
2 A 3
3 B 4
4 B 5
5 A 2
6 B 4.5
But I'm looking for something nicer, like
df.Vals.fillna(df.Vals.mean(Conditionally to column 'Cat'))
Edit: I found this, which is one line shorter, but I'm still not happy with it:
means = df.groupby('Cat').Vals.mean()
df.Vals = df.apply(lambda x: means[x.Cat] if pd.isnull(x.Vals) else x.Vals, axis=1)
We wish to "associate" the Cat values with the missing NaN locations.
In Pandas such associations are always done via the index.
So it is natural to set Cat as the index:
df = df.set_index(['Cat'])
Once this is done, then fillna works as desired:
df['Vals'] = df['Vals'].fillna(means)
To return Cat to a column, you could then of course use reset_index:
df = df.reset_index()
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {'Cat': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
     'Vals': [1, 2, 3, 4, 5, np.nan, np.nan]})
means = df.groupby(['Cat'])['Vals'].mean()
df = df.set_index(['Cat'])
df['Vals'] = df['Vals'].fillna(means)
df = df.reset_index()
print(df)
yields
Cat Vals
0 A 1.0
1 A 2.0
2 A 3.0
3 B 4.0
4 B 5.0
5 A 2.0
6 B 4.5
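For the one-liner the question asks for, groupby combined with transform('mean') also works; a minimal sketch:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Cat': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                   'Vals': [1, 2, 3, 4, 5, np.nan, np.nan]})
# transform('mean') broadcasts each group's mean back to the original row positions,
# so fillna can align it directly with the NaNs
df['Vals'] = df['Vals'].fillna(df.groupby('Cat')['Vals'].transform('mean'))
print(df)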
