Getting NaN when making a dataframe column equal to another - python

I am trying to make a subset of a dataframe
combo.iloc[:,orig_start_col:orig_start_col+2]
equal to the values another subset already has
combo.iloc[:,sm_col:sm_col+2]
where the columns vary in a loop. The problem is that all I am getting is NaNs, even though the values in the second subset are not NaN.
I tried this for just the first column and it worked; however, doing it with just the second column of the subset returns all NaNs, and doing it for the whole subset returns NaN for everything.
My code is:
for node_col in ('leg2_node', 'leg4_node'):
    combo = orig_combos.merge(all, how='inner', left_on='leg6_node', right_on=node_col)
    combo.reset_index(drop=True, inplace=True)
    orig_start_col = combo.columns.get_loc('leg6_alpha_x')
    sm_col = combo.columns.get_loc(node_col + '_y')
    combo.iloc[:, orig_start_col+1:orig_start_col+2] = combo.iloc[:, sm_col+1:sm_col+2]
What I would expect, since the sm_col:sm_col+2 subset has values in every row, is to get those values in the orig_start_col:orig_start_col+2 subset; instead, everything comes out as NaN.
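A likely culprit, sketched below with made-up column names and toy data (not the real combo): when the right-hand side of an iloc assignment is itself a DataFrame, pandas aligns it on column labels, and labels that don't match come back as NaN. Assigning the underlying array sidesteps the alignment.

import pandas as pd
import numpy as np

# Toy stand-in for `combo`; the column names here are invented for illustration.
combo = pd.DataFrame({
    'leg6_alpha_x': [np.nan, np.nan, np.nan],
    'leg6_beta_x':  [np.nan, np.nan, np.nan],
    'leg2_node_y':  [1.0, 2.0, 3.0],
    'leg2_beta_y':  [4.0, 5.0, 6.0],
})
orig_start_col = combo.columns.get_loc('leg6_alpha_x')
sm_col = combo.columns.get_loc('leg2_node_y')

# DataFrame-to-DataFrame assignment aligns on column labels, so the
# mismatched names produce all NaN in the target columns:
combo.iloc[:, orig_start_col:orig_start_col + 2] = combo.iloc[:, sm_col:sm_col + 2]

# Assigning the raw values bypasses the label alignment:
combo.iloc[:, orig_start_col:orig_start_col + 2] = combo.iloc[:, sm_col:sm_col + 2].to_numpy()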

How do I drop all rows in a DataFrame that have NaN in a specified column?

EDIT: (User error, I wasn't scanning the entire dataframe. Delete the question if needed.) A page I found had a solution that claimed to drop all rows with NaN in a selected column. In this case I am interested in the column with index 78 (an int, not a string; I checked).
The code fragment they provided turns out to look like this for me:
df4=df_transposed.dropna(subset=[78])
That did exactly the opposite of what I wanted: df4 is a dataframe that has NaN in every element. I'm not sure how to proceed.
I tried the dropna() method as suggested on half a dozen pages and expected a dataframe with no NaN values in the column with index 78. Instead, every element of the dataframe was NaN.
df_transposed.dropna(subset=[78], inplace=True)  # removes, in place, the rows that have missing values in column 78
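For reference, a minimal sketch (toy data, not the real df_transposed) of what dropna(subset=[78]) does when 78 is an integer column label: it keeps the rows whose value in that column is present and leaves NaNs in other columns alone.

import pandas as pd
import numpy as np

# Toy frame with integer column labels standing in for df_transposed.
df_transposed = pd.DataFrame({
    78: [1.0, np.nan, 3.0],
    79: [np.nan, 5.0, 6.0],
})

# Drops only the row where column 78 is NaN; the NaN in column 79 stays.
df4 = df_transposed.dropna(subset=[78])
print(df4)
#     78   79
# 0  1.0  NaN
# 2  3.0  6.0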

fillna() only fills the 1st value of the dataframe

I'm facing a strange issue in which I'm trying to replace all NaN values in a dataframe with values taken from another one (same length) that has the relevant values.
Here's a glimpse of the "target dataframe" in which I want to replace the values:
data_with_null
Here's the dataframe where I want to take data from: predicted_paticipant_groups
I've tried:
data_with_null.participant_groups.fillna(predicted_paticipant_groups.participant_groups, inplace=True)
but it just fills all NaN values with the first one (Infra).
Is it because the indexes of data_with_null are all zeros?
Reset the index and try again.
data_with_null.reset_index(drop=True, inplace=True)
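A small sketch of why this happens (toy data; the group values other than 'Infra' are invented): fillna with a Series aligns on the index, so if every row of data_with_null carries the label 0, every NaN gets the source value at label 0, which is 'Infra'. Once the index is reset, the labels line up one-to-one.

import pandas as pd
import numpy as np

# Toy stand-ins: the target's index is all zeros, the source's is 0..2.
data_with_null = pd.DataFrame({'participant_groups': [np.nan, np.nan, np.nan]},
                              index=[0, 0, 0])
predicted_paticipant_groups = pd.DataFrame(
    {'participant_groups': ['Infra', 'Apps', 'Data']}, index=[0, 1, 2])

# fillna aligns on the index: every label-0 row gets the source's label-0
# value, so everything becomes 'Infra'.
bad = data_with_null['participant_groups'].fillna(
    predicted_paticipant_groups['participant_groups'])
print(bad.tolist())          # ['Infra', 'Infra', 'Infra']

# After resetting the index, the labels match one-to-one and the fill works.
data_with_null.reset_index(drop=True, inplace=True)
data_with_null['participant_groups'] = data_with_null['participant_groups'].fillna(
    predicted_paticipant_groups['participant_groups'])
print(data_with_null['participant_groups'].tolist())   # ['Infra', 'Apps', 'Data']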

Pandas joining on index is producing all NaN for right-side DataFrame

I am trying to join two pandas DataFrames on index. They both have the same number of rows and everything about the index appears to be correct. However, when I run the code,
df1=df2.join(df3)
it produces all NaN for df3's values. I have been searching Google for a while now and have no idea why.
I have tried casting into pandas data frames and also reset_index. Neither did the trick.
df1=df2.join(df3)
producing all NaN for df3's columns
In the expected results, the NaNs would all be replaced by df3's values; the actual result is all NaN.
My answer to this was to change the index types on both my dataframes. In my particular instance I converted to string. Thanks!
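A minimal sketch of the dtype-mismatch case described above (toy frames, made-up values): the indexes hold the same-looking labels, but one is integer and the other is string, so join finds no matches and fills the right-hand columns with NaN; casting both indexes to the same type fixes it.

import pandas as pd

# Same-looking index labels, but one index is int and the other is str.
df2 = pd.DataFrame({'a': [10, 20]}, index=[1, 2])
df3 = pd.DataFrame({'b': [30, 40]}, index=['1', '2'])

print(df2.join(df3))          # column 'b' comes out all NaN

# Cast both indexes to the same type so the labels actually match.
df2.index = df2.index.astype(str)
df3.index = df3.index.astype(str)
print(df2.join(df3))          # column 'b' now carries df3's values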

Pandas: Calculating column-wise mean yields nulls

I have a pandas DataFrame, df, and I'd like to get the mean for columns 180 through the end (not including the last column), only using the first 100K rows.
If I use the whole DataFrame:
df.mean().isnull().any()
I get False
If I use only the first 100K rows:
train_means = df.iloc[:100000, 180:-1].mean()
train_means.isnull().any()
I get: True
I'm not sure how this is possible, since the second approach is only getting the column means for a subset of the full DataFrame. So if no column in the full DataFrame has a mean of NaN, I don't see how a column in a subset of the full DataFrame can.
For what it's worth, I ran:
df.columns[df.isna().all()].tolist()
and I get: []. So I don't think I have any columns where every entry is NaN (which would cause a NaN in my train_means calculation).
Any idea what I'm doing incorrectly?
Thanks!
Try looking at
(df.iloc[:100000, 180:-1].isnull().sum()==100000).any()
If this returns True, it means one of your columns is all NaN in the first 100,000 rows.
And now, why you see no nulls when taking the mean over the whole dataframe: mean has skipna=True by default, so it drops the NaNs before computing the mean.
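A small reproduction of the effect (toy sizes, five rows instead of 100,000): a column that is entirely NaN within the sliced rows but has values further down gives a NaN mean for the slice, while the whole-frame mean is fine because skipna drops the NaNs.

import pandas as pd
import numpy as np

# Column 'x' is NaN for the first 5 rows and only has values after that.
df = pd.DataFrame({'x': [np.nan] * 5 + [1.0, 2.0, 3.0],
                   'y': range(8)})

print(df.mean().isnull().any())                    # False: skipna drops the NaNs
train_means = df.iloc[:5].mean()
print(train_means.isnull().any())                  # True: 'x' is all NaN in the slice

# The check from the answer flags the offending column:
print((df.iloc[:5].isnull().sum() == 5).any())     # True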

When taking nlargest in pandas dataframe, is there a way to ignore column with NaN values?

When taking nlargest in a pandas dataframe, is there a way to ignore columns with NaN values? Say I want to pick the 5 column headings with the 5 largest values; if a column has a NaN value, that column should be ignored. If the number of columns with finite values is smaller than 5, then pick all the column headings with finite values (fewer than 5).
nlargest takes the n top rows sorted in descending order by the columns passed to the method. If there are NaN values that get to the top, it will include them. If you want to ignore rows in which NaN values exist in the columns that were sorted by, then do this:
# assume a variable 'columns' exist that defines what columns to sort
# by. You'll have to assign this yourself. Also assign 'n' yourself.
df = df.dropna(subset=columns)
df = df.nlargest(n, columns=columns)
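If what's wanted is instead the column headings of the largest values within a single row, a minimal sketch along the same lines (toy data, names assumed): drop the NaN columns first, then take nlargest, which simply returns fewer than 5 entries when fewer finite values exist.

import pandas as pd
import numpy as np

# Toy row of values; 'b' and 'e' should be ignored because they are NaN.
row = pd.Series({'a': 3.0, 'b': np.nan, 'c': 7.0, 'd': 1.0, 'e': np.nan, 'f': 5.0})

# Drop the NaN columns, then take up to 5 of the largest; with only 4 finite
# values left, all 4 headings come back.
top_cols = row.dropna().nlargest(5).index.tolist()
print(top_cols)   # ['c', 'f', 'a', 'd']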
