Suppose I have a slice of a column in a DataFrame df in which I want to replace float values with other float values, where the replacement values come from another DataFrame, newdf.
I've tried using
df.loc[row index condition, [column to replace vals]] = newdf[column]
but for some reason the resulting values are all NaN. Why is this so?
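The values from newdf need to align with the index of df: when you assign a Series, pandas matches it to the selected rows by index label, and any selected row whose label is missing from newdf gets NaN. If newdf has exactly the number of values you want to insert, you can use .values to assign them by position instead: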
df.loc[row index condition, [column to replace vals]] = newdf[column].values
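A minimal sketch of the difference, using small made-up frames (the column names a and b and the condition are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical data: df has index 0..4, newdf has its own default index 0..1.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0]})
newdf = pd.DataFrame({"b": [10.5, 20.5]})

cond = df["a"] > 3  # selects the rows labelled 3 and 4

# Assigning a Series aligns on index labels: newdf has no labels 3 or 4,
# so both selected cells become NaN.
aligned = df.copy()
aligned.loc[cond, "a"] = newdf["b"]

# .to_numpy() (or .values) drops the index, so the values are written
# positionally; the number of values must match the number of selected rows.
positional = df.copy()
positional.loc[cond, "a"] = newdf["b"].to_numpy()

print(aligned["a"].tolist())     # [1.0, 2.0, 3.0, nan, nan]
print(positional["a"].tolist())  # [1.0, 2.0, 3.0, 10.5, 20.5]
```

Positional assignment silently relies on both sides being in the intended order, so it is worth checking that newdf's rows really correspond to the selected rows of df.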
I want to create a new column, V, in an existing DataFrame, df. I would like the value of the new column to be the difference between the value in the 'x' column in that row, and the value of the 'x' column in the row below it.
As an example, in the picture below, I want the value of the new column to be 93.244598 - 93.093285 = 0.151313.
I know how to create a new column based on existing columns in Pandas, but I don't know how to reference other rows using this method. Is there a way to do this that doesn't involve iterating over the rows in the dataframe? (since I have read that this is generally a bad idea)
You can use pandas.DataFrame.shift for your use case.
The last row has no row below it to subtract, so that cell will be NaN.
df['temp_x'] = df['x'].shift(-1)
df['new_col'] = df['x'] - df['temp_x']
Or as a one-liner:
df['new_col'] = df['x'] - df['x'].shift(-1)
The column new_col will then contain the expected differences.
A more direct solution is to use diff with a period of -1, which subtracts the next row's value from the current row's value:
df['new'] = df['x'].diff(-1)
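A short check that both answers compute the same thing, using the two x values from the question plus a made-up third value:

```python
import pandas as pd

# Only 93.244598 and 93.093285 come from the question; 92.9 is hypothetical.
df = pd.DataFrame({"x": [93.244598, 93.093285, 92.9]})

# Current row minus the row below it, two equivalent ways.
df["via_shift"] = df["x"] - df["x"].shift(-1)
df["via_diff"] = df["x"].diff(-1)

print(df)
```

In both columns the last row is NaN, because there is no row below it.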
The problem is that when I transpose the DataFrame, the header of the transposed DataFrame becomes the numeric index values rather than the values in the "id" column. See the original data below for examples:
Original data that I wanted to transpose (keeping the 0, 1, 2, ... index intact and renaming "id" to "id2" in the final transposed DataFrame).
DataFrame after I transpose; notice the headers are the index values and NOT the "id" values (which is what I was expecting and needed).
Logic Flow
First this helped to get rid of the numerical index that got placed as the header: How to stop Pandas adding time to column title after transposing a datetime index?
Then this helped to get rid of the index numbers as the header, but now "id" and "index" got shuffled around: Reassigning index in pandas DataFrame & Reassigning index in pandas DataFrame
But now my id and index values got shuffled for some reason.
How can I fix this so the columns are [id2,600mpe, au565...]?
How can I do this more efficiently?
Here's my code:
DF = pd.read_table(data,sep="\t",index_col = [0]).transpose() #Add index_col = [0] to not have index values as own row during transposition
m, n = DF.shape
DF.reset_index(drop=False, inplace=True)
DF.head()
This didn't help much: Add indexed column to DataFrame with pandas
If I understand your example, what seems to happen is that the transpose takes your actual index (the 0...n sequence) as the column headers. First, if you want to preserve the numerical index, you can store it as id2:
DF['id2'] = DF.index
Then, if you want id to be the column headers, you must set it as the index, overriding the default one:
DF.set_index('id',inplace=True)
DF.T
I don't have your data reproduced, but this should give you the values of id across columns.
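A minimal sketch of the core idea on made-up data (the "id" values and numbers are hypothetical; only the sample names 600mpe and au565 come from the question), comparing a plain transpose with transposing after set_index:

```python
import pandas as pd

# Hypothetical data in the un-transposed layout: one row per "id".
df = pd.DataFrame({
    "id": ["geneA", "geneB", "geneC"],
    "600mpe": [1.0, 2.0, 3.0],
    "au565": [4.0, 5.0, 6.0],
})

# Transposing directly: the default 0..n-1 index becomes the header.
plain = df.T

# Setting "id" as the index first: its values become the header instead.
by_id = df.set_index("id").T

print(list(plain.columns))  # [0, 1, 2]
print(list(by_id.columns))  # ['geneA', 'geneB', 'geneC']
```

Note that a plain transpose of a frame with mixed column types (here the string "id" column) produces object columns, whereas moving "id" into the index first leaves only float data to transpose.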
I have a dataframe with multiple values as zero.
I want to replace the values that are zero with the mean of each column, without repeating code.
I have columns called runtime, budget, and revenue that all contain zeros, and I want to replace those zero values with the mean of that column.
I have tried to do it one column at a time like this:
print(df['budget'].mean())
-> 14624286.0643
df['budget'] = df['budget'].replace(0, 14624286.0643)
Is there a way to write a function so I don't have to repeat this code for every column that contains zero values?
Since this is a pandas DataFrame, I would use mask to turn every 0 into NaN, then fillna:
df=df.mask(df==0).fillna(df.mean())
The same can be achieved directly with the replace method, without fillna:
df.replace(0,df.mean(axis=0),inplace=True)
Method info:
Replace values given in "to_replace" with "value".
Values of the DataFrame are replaced with other values dynamically.
This differs from updating with .loc or .iloc which require
you to specify a location to update with some value.
How about iterating through all columns and replacing them?
for col in df.columns:
    val = df[col].mean()
    df[col] = df[col].replace(0, val)
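One detail worth knowing: in all of the answers above, the mean is computed on the original data, so the zeros themselves pull the mean down. A minimal sketch on hypothetical data (only the column names come from the question) that restricts the work to those three columns and shows both variants:

```python
import pandas as pd

# Hypothetical data using the column names from the question.
df = pd.DataFrame({
    "runtime": [100.0, 0.0, 120.0],
    "budget": [0.0, 3.0e6, 5.0e6],
    "revenue": [8.0e6, 0.0, 4.0e6],
})
cols = ["runtime", "budget", "revenue"]

# As in the answers: each column's mean includes its zeros.
with_zeros = df.copy()
with_zeros[cols] = with_zeros[cols].replace(0, with_zeros[cols].mean())

# Variant: turn zeros into NaN first, so mean() skips them when filling.
masked = df[cols].mask(df[cols] == 0)
without_zeros = df.copy()
without_zeros[cols] = masked.fillna(masked.mean())
```

Here the missing runtime becomes 220/3 (about 73.3) in with_zeros but 110.0 in without_zeros; which one is appropriate depends on whether the zeros are real values or placeholders for missing data.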
I have one dataframe (df) with a column called "id". I have another dataframe (df2) with only one column called "id". I want to drop the rows in df that have the same values in "id" as df2.
How would I go about doing this?
Use boolean indexing with the isin method.
Note that the tilde ~ indicates that I take the negation of the boolean series returned by df['id'].isin(df2['id'])
df[~df['id'].isin(df2['id'])]
query
In a query string, we refer to df2 using the @ symbol:
df.query('id not in @df2.id')
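A small sketch with made-up frames (the values and the extra column val are hypothetical) showing that the isin and query forms select the same rows:

```python
import pandas as pd

# Hypothetical data: df2 lists the ids to drop from df.
df = pd.DataFrame({"id": [1, 2, 3, 4], "val": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"id": [2, 4]})

# Boolean indexing: keep rows whose id is NOT in df2["id"].
kept_isin = df[~df["id"].isin(df2["id"])]

# query: @df2 refers to the local variable df2, not to a column.
kept_query = df.query("id not in @df2.id")

print(kept_isin)
```

Both results keep the original index labels (0 and 2 here); call reset_index(drop=True) afterwards if a fresh 0..n-1 index is wanted.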
I looked at the unique values in a column of a pandas DataFrame I have, and there are some names in that column that I do not want to include. How do I remove those rows from the DataFrame without using index values, but by saying: if the row value is "this", remove it?
Something like this (pseudocode; drop_values is not an actual pandas method):
new = df.copy()
df['some column'].drop_values('this','that','other')
See indexing with isin (also, boolean indexing):
mask = df['some column'].isin(['this', 'that', 'other'])
df[~mask]
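A runnable sketch of the same approach on hypothetical data (the values and the extra column n are made up), producing the new DataFrame the question was aiming for:

```python
import pandas as pd

# Hypothetical data with a few unwanted names in "some column".
df = pd.DataFrame({
    "some column": ["this", "keep", "that", "other", "also keep"],
    "n": [1, 2, 3, 4, 5],
})

# True for the rows whose value is one of the unwanted names.
mask = df["some column"].isin(["this", "that", "other"])

# Keep the remaining rows; .copy() makes later edits to `new` independent of df.
new = df[~mask].copy()

print(new)
```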