Pandas Multiple Column Division - python

I am trying to do a division of column 0 by columns 1 and 2. From the below, I would like to return a dataframe of 10 rows, 3 columns. The first column should all be 1's. Instead I get a 10x10 dataframe. What am I doing wrong?
data = np.random.randn(10,3)
df = pd.DataFrame(data)
df[0] / df

First create a 10 by 3 DataFrame in which every column equals the first column, then divide it by your DataFrame.
df[[0, 0, 0]] / df.values
or, if you want to keep the original column names:
df[[0, 0, 0]].values / df
(I use .values to avoid reindexing, which would fail because of the duplicate column names.)

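Dividing a Series by a DataFrame aligns the Series index with the DataFrame's columns, which is why you get a 10x10 result. You need to align the Series with the rows of the DataFrame instead. There are a few ways to do this, but I like to use transposes.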
data = np.random.randn(10,3)
df = pd.DataFrame(data)
(df[0] / df.T).T
0 1 2
0 1 -0.568096 -0.248052
1 1 -0.792876 -3.539075
2 1 -25.452247 1.434969
3 1 -0.685193 -0.540092
4 1 0.451879 -0.217639
5 1 -2.691260 -3.208036
6 1 0.351231 -1.467990
7 1 0.249589 -0.714330
8 1 0.033477 -0.004391
9 1 -0.958395 -1.530424
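Another way to get the same row-wise alignment, sketched here rather than taken from the answers above, is DataFrame.rdiv with axis=0, which computes other / df while matching the Series to the row index. The fixed seed and the variable names are assumptions made only so the example is reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
df = pd.DataFrame(rng.standard_normal((10, 3)))

# df[0] / df aligns the Series index (0..9) with the column labels (0..2),
# so the result is a 10x10 frame that is mostly NaN.
wrong = df[0] / df

# rdiv(other, axis=0) computes other / df, aligning `other` with the rows.
result = df.rdiv(df[0], axis=0)

# Same values as the transpose approach.
via_transpose = (df[0] / df.T).T

print(wrong.shape)   # shape of the misaligned result
print(result.shape)  # shape of the row-aligned result
```

Both row-aligned versions keep the original column labels 0, 1, 2, and the first column is all ones because it is column 0 divided by itself.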

Related

Create dataframe with values from other dataframe's indices and columns

I have a large dataframe df1 that looks like:
0 1 2
0 NaN 1 5
1 0.5 NaN 1
2 1.25 3 NaN
And I want to create another dataframe df2 with three columns where the values for the first two columns correspond to the df1 columns and indices, and the third column is the cell value.
So df2 would look like:
src dst cost
0 0 1 0.5
1 0 2 1.25
2 1 0 5
3 1 2 3
How can I do this?
Thanks
I'm sure there's probably a clever way to do this with pd.pivot or pd.melt but this works:
df2 = (
    # reorganize the data to be row-wise with a multi-index
    df1.stack()
    # drop missing values
    .dropna()
    # name the axes
    .rename_axis(['src', 'dst'])
    # name the values
    .to_frame('cost')
    # return src and dst to columns
    .reset_index(drop=False)
)
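Note that stack() puts the row label first and the column label second, so the chain above names the row label src. If src is meant to be the column label and dst the row label, which is how the first rows of the expected output read, one possible variant is to transpose before stacking. This is a sketch under that assumption; the df1 literal is rebuilt from the question.

```python
import numpy as np
import pandas as pd

# df1 as shown in the question
df1 = pd.DataFrame([[np.nan, 1, 5], [0.5, np.nan, 1], [1.25, 3, np.nan]])

df2 = (
    df1.T                         # column labels become the outer index level
    .stack()                      # one value per (column, row) pair
    .dropna()                     # drop missing cells
    .rename_axis(['src', 'dst'])  # assumption: src = column label, dst = row label
    .to_frame('cost')
    .reset_index()
)

print(df2)
```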

Drop a group of rows if one column has missing data in a pandas dataframe

I have the following dataframe:
df
Group Dist
0 A 5
1 B 2
2 A 3
3 B 1
4 B 0
5 A 5
I am trying to drop every row of a Group if any Dist value in that group equals zero. This works to delete row 4:
df = df[df.Dist != 0]
However, I also want to delete rows 1 and 3 so that I am left with:
df
Group Dist
0 A 5
2 A 3
5 A 5
Any ideas on how to drop the group based off this condition?
Thanks!
First get all Group values where Dist == 0, then filter them out by testing the Group column with isin and inverting the mask with ~:
df1 = df[~df['Group'].isin(df.loc[df.Dist == 0, 'Group'])]
print (df1)
Group Dist
0 A 5
2 A 3
5 A 5
Or you can use GroupBy.transform with 'all' to test whether each group has no 0 values:
df1 = df[(df.Dist != 0).groupby(df['Group']).transform('all')]
EDIT: To remove all groups with missing values:
df2 = df[df['Dist'].notna().groupby(df['Group']).transform('all')]
To test for missing values:
print (df[df['Dist'].isna()])
If this prints an empty DataFrame, there are no missing values (neither NaN nor None).
You can also check a single scalar, e.g. the value in the row with index 10:
print (df.loc[10, 'Dist'])
print (type(df.loc[10, 'Dist']))
You can use groupby and the method filter:
df.groupby('Group').filter(lambda x: x['Dist'].ne(0).all())
Output:
Group Dist
0 A 5
2 A 3
5 A 5
If you want to filter out groups with missing values:
df.groupby('Group').filter(lambda x: x['Dist'].notna().all())
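As a quick check, the three approaches from the answers (isin with an inverted mask, GroupBy.transform, and GroupBy.filter) can be run side by side on the frame from the question. The variable names below are illustrative only.

```python
import pandas as pd

# Frame from the question
df = pd.DataFrame({'Group': list('ABABBA'), 'Dist': [5, 2, 3, 1, 0, 5]})

# 1) Collect the groups that contain a zero, then exclude them.
bad_groups = df.loc[df['Dist'] == 0, 'Group']
by_isin = df[~df['Group'].isin(bad_groups)]

# 2) Broadcast "all values non-zero" back to every row of each group.
by_transform = df[(df['Dist'] != 0).groupby(df['Group']).transform('all')]

# 3) Keep only the groups for which the predicate is True.
by_filter = df.groupby('Group').filter(lambda g: g['Dist'].ne(0).all())

print(by_isin)
```

All three keep the original index labels, so the results can be compared directly with DataFrame.equals.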

Multiply columns with rows by matching column name and row name in Python / Pandas

I have a data frame which looks like this
> data
A B
1 1 2
2 2 1
I have a reference data frame which looks like this
> ref
Names Values
1 A 5
2 B 10
I want to multiply each column of data by the Values entry in ref whose Names entry matches the column name.
the result should be this
> result
A B
1 5 20
2 10 10
What is the fastest way to achieve this in Python? Any help would be greatly appreciated
You may want to look at DataFrame.mul (df here is your data frame):
df.mul(ref.set_index('Names').Values)
Out[137]:
A B
1 5 20
2 10 10
Your reference dataframe ref can be represented as a Series, built either directly as follows or with ref.set_index('Names')['Values']:
s = pd.Series([5, 10], index=['A', 'B'])
Your data dataframe is as follows:
df = pd.DataFrame(dict(A=[1,2], B=[2,1]))
Multiplying the two with df * s produces the desired output because the indexing of each object is used to determine which arrays get multiplied together.
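Putting the two answers together, a self-contained sketch using the frames from the question might look like this. The name factors is illustrative, and the 1-based index is an assumption copied from the printed output in the question.

```python
import pandas as pd

# Frames from the question (1-based index as printed there)
data = pd.DataFrame({'A': [1, 2], 'B': [2, 1]}, index=[1, 2])
ref = pd.DataFrame({'Names': ['A', 'B'], 'Values': [5, 10]}, index=[1, 2])

# Turn ref into a Series indexed by name so it lines up with data's columns.
factors = ref.set_index('Names')['Values']

# Column A is multiplied by 5 and column B by 10.
result = data.mul(factors, axis='columns')

print(result)
```

Because the match is by label, the order of the rows in ref does not matter.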

Python adding column to dataframe causes NaN

I have a series and df
s = pd.Series([1,2,3,5])
df = pd.DataFrame()
When I add columns to df like this
df.loc[:, "0-2"] = s.iloc[0:3]
df.loc[:, "1-3"] = s.iloc[1:4]
I get df
0-2 1-3
0 1 NaN
1 2 2.0
2 3 3.0
Why am I getting NaN? I tried creating a new series with the correct indices, but adding it to df still causes NaN.
What I want is
0-2 1-3
0 1 2
1 2 3
2 3 5
Try either of the following lines.
df.loc[:, "1-3"] = s.iloc[1:4].values
# -OR-
df.loc[:, "1-3"] = s.iloc[1:4].reset_index(drop=True)
Your original code is trying unsuccessfully to match the index of the data frame df to the index of the subset series s.iloc[1:4]. When it can't find the 0 index in the series, it places a NaN value in df at that location. You can get around this by only keeping the values so it doesn't try to match on the index or resetting the index on the subset series.
>>> s.iloc[1:4]
1 2
2 3
3 5
dtype: int64
Notice the index values since the original, unsubset series is the following.
>>> s
0 1
1 2
2 3
3 5
dtype: int64
The index of the first row in df is 0. By dropping the indices with the values call, you bypass the index matching which is producing the NaN. By resetting the index in the second option, you make the indices the same.
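The alignment behaviour can be reproduced in isolation with a minimal sketch. It assumes df already has the row labels 0, 1, 2 (created explicitly here, unlike the empty frame in the question), and the column name "aligned" is made up for the demonstration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 5])
df = pd.DataFrame(index=range(3))  # assumption: rows 0..2 already exist

df["0-2"] = s.iloc[0:3]            # labels 0, 1, 2 match the rows exactly

# Labels 1, 2, 3: row 0 has no match and becomes NaN, label 3 is discarded.
df["aligned"] = s.iloc[1:4]

# A plain array carries no labels, so it is assigned by position.
df["1-3"] = s.iloc[1:4].to_numpy()

print(df)
```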

How do I stack rows in a Pandas data frame to get one "long row"?

Let's say I have a data frame with 4 rows, 3 columns. I'd like to stack the rows horizontally so that I get one row with 12 columns. How to do it and how to handle colliding column names?
You can achieve this by stacking the frame to produce a Series of all the values, converting it back to a DataFrame with to_frame, calling reset_index(drop=True) to discard the index levels, and then transposing with .T:
In [2]:
df = pd.DataFrame(np.random.randn(4,3), columns=list('abc'))
df
Out[2]:
a b c
0 -1.744219 -2.475923 1.794151
1 0.952148 -0.783606 0.784224
2 0.386506 -0.242355 -0.799157
3 -0.547648 -0.139976 -0.717316
In [3]:
df.stack().to_frame().reset_index(drop=True).T
Out[3]:
0 1 2 3 4 5 6 \
0 -1.744219 -2.475923 1.794151 0.952148 -0.783606 0.784224 0.386506
7 8 9 10 11
0 -0.242355 -0.799157 -0.547648 -0.139976 -0.717316
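The result above replaces the column names with 0 to 11. For the part of the question about colliding names, one possible approach (not from the answer above) is to build a label from each column name and row label. The column_row naming scheme and the integer sample data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Small deterministic frame instead of random data
df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list('abc'))

# One label per cell, in the same row-by-row order as the flattened values.
labels = [f"{col}_{row}" for row in df.index for col in df.columns]

# ravel() flattens row by row, giving a single row with 12 columns.
wide = pd.DataFrame([df.to_numpy().ravel()], columns=labels)

print(wide)
```

Each original cell stays identifiable through its label, for example b_2 for column b of row 2.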
