This question already has answers here:
NaN in mapper - name 'nan' is not defined
(3 answers)
Closed 6 months ago.
After I copy and paste a list of intervals taken from a dataframe column, a 'nan' entry is included, and the list looks exactly like this:
from pandas import Interval
inter=[Interval(32.252, 40.21, closed='right'), Interval(40.21, 48.168, closed='right'),nan]
But if I try to print it:
print(inter)
I get the following error:
NameError: name 'nan' is not defined
I tried to replace 'nan' with 'np.nan', but it seems that the 'nan' entry in the 'inter' list, which, I repeat, I copied and pasted manually from an existing list, is still a problem.
How should I solve this?
Python does not have a built-in name nan, nor is there a keyword.
It looks as if you forgot to import it;
numpy defines such a name:
from numpy import nan
From the local name df I infer you are probably using pandas; pandas' documentation usually uses np.nan, where np is the numpy module imported with import numpy as np.
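A minimal sketch of the fix, assuming numpy and pandas are installed: import numpy and write np.nan instead of the bare name nan (the standard library's math.nan, available since Python 3.5, would also work). The list then prints normally, and pd.isna can be used to detect the missing entry.

```python
import numpy as np
import pandas as pd
from pandas import Interval

# np.nan is an ordinary float NaN, so it can sit in a list next to Interval objects
inter = [Interval(32.252, 40.21, closed='right'),
         Interval(40.21, 48.168, closed='right'),
         np.nan]
print(inter)

# pd.isna flags the missing entry; the Interval objects are not NaN
print([pd.isna(item) for item in inter])
```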
Reference: here
Related
When calculating basic statistics, the following works well:
import pandas as pd
max(df["Price"])
min(df["Price"])
But this returns an error:
mean(df["Price"])
NameError: name 'mean' is not defined
I'm just trying to understand the logic of this.
This one works well:
df["Price"].mean()
Which statistics work as methods after the dot, and which ones must wrap the column as functions?
min() and max() are functions provided as Python built-ins.
You can use them on any iterable, which includes pandas Series, which is why what you're doing works.
Pandas also provides .min() and .max() as methods on Series and DataFrames, so e.g. df["Price"].min() would also work. The full list of Series methods is here; the full list of DataFrame methods is here.
If you do want to use a free function called mean(), e.g. when you have something that is not a pandas Series and you don't want to convert it to one, the Python standard library does provide one, but you have to import it:
from statistics import mean
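A small comparison, using a hypothetical Price column: the built-in min() and max() simply iterate over the Series, the Series methods compute the same values after the dot, and statistics.mean() is the standard-library free function.

```python
from statistics import mean

import pandas as pd

df = pd.DataFrame({"Price": [10.0, 20.0, 30.0]})  # hypothetical data

# Built-in functions: work on any iterable, including a Series
print(min(df["Price"]), max(df["Price"]))

# Series methods: called after the dot
print(df["Price"].min(), df["Price"].max(), df["Price"].mean())

# Free function from the standard library (must be imported)
print(mean(df["Price"]))
```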
This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 2 years ago.
I am trying to create two new columns in my dataframe depending on the values of the columns Subscriber, External Party and Direction. If Direction is I (Incoming), column a should take the value of External Party and column b the value of Subscriber. If Direction is O (Outgoing), it should be the other way around. I use this code:
import pandas as pd
import numpy as np
...
df['a'] = np.where((df.Direction == 'I'), df['External Party'], df['Subscriber'])
df['b'] = np.where((df.Direction == 'O'), df['External Party'], df['Subscriber'])
I get a SettingWithCopyWarning from pandas, but the code does what it needs to do. How can I improve this operation to avoid the warning?
Thanks in advance!
Jo
Inspect the place in your code where df is created.
Most probably, it is a view of another DataFrame, something like:
df = df_src[...]
Then any attempt to assign to df triggers exactly this warning.
To avoid it, create df as a truly independent DataFrame, with its
own data buffer. Something like:
df = df_src[...].copy()
Now df has its own data buffer, and can be modified without the
above warning.
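A self-contained sketch of this fix, using a hypothetical df_src with the question's column names and a made-up Keep column as the filter: the filtered frame is copied before the two np.where assignments, and any warnings raised during the assignments are recorded so you can confirm none of them is a SettingWithCopyWarning.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical source frame standing in for the question's df_src
df_src = pd.DataFrame({
    "Subscriber": ["s1", "s2", "s3"],
    "External Party": ["e1", "e2", "e3"],
    "Direction": ["I", "O", "I"],
    "Keep": [True, True, False],
})

# .copy() gives the filtered frame its own data, so later assignments
# are not treated as writes into a slice of df_src
df = df_src[df_src["Keep"]].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df["a"] = np.where(df["Direction"] == "I", df["External Party"], df["Subscriber"])
    df["b"] = np.where(df["Direction"] == "O", df["External Party"], df["Subscriber"])

print(df[["Direction", "a", "b"]])
print([w.category.__name__ for w in caught])
```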
If you plan to keep working with the same df later in your code, it is sometimes useful to create a deep copy of the df before modifying it.
Pandas' native copy method does not always behave as one would expect; here is a similar question that might give more insight.
You can use the copy module from the Python standard library to copy the entire object and ensure that there are no links between the two dataframes.
import copy
df_copy = copy.deepcopy(df)
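A quick check of the copy.deepcopy approach on a hypothetical one-column frame: after modifying the copy, the original is unchanged. Note that pandas implements copy.deepcopy for a DataFrame by calling df.copy(deep=True), so Python objects stored inside object columns (lists, dicts) are still not copied recursively; for plain numbers and strings, as here, the copy is fully independent.

```python
import copy

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})  # hypothetical data
df_copy = copy.deepcopy(df)

# Modify the copy only
df_copy.loc[0, "x"] = 100

print(df["x"].tolist())       # original is unchanged
print(df_copy["x"].tolist())  # only the copy shows the change
```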
This question already has answers here:
Is there a NumPy function to return the first index of something in an array?
(20 answers)
Closed 3 years ago.
I want to ask a question about finding the position of an element within an array in Python's numpy package.
I am using Jupyter Notebook for Python 3 and have the following code:
from numpy import array
concentration_list = array([172.95, 173.97, 208.95])
I want to write a block of code that returns the position of an element within the array.
To demonstrate, I want to find the position of 172.95.
Initially, I tried to use .index(), passing in 172.95, but this did not work because numpy arrays have no .index() method:
concentration_position = concentration_list.index(172.95)
AttributeError: 'numpy.ndarray' object has no attribute 'index'
The SciPy documentation did not mention any such method when I checked the site.
Is there any function available (that I may not have discovered) to solve the problem?
You can use the where function from numpy:
import numpy as np
concentration_list = np.array([172.95, 173.97, 208.95])
number = 172.95
print(np.where(concentration_list == number)[0])
Output: [0]
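Alternatively, use the .nonzero() method (np.where(condition) with a single argument is shorthand for np.asarray(condition).nonzero()), e.g.: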
import numpy as np
concentration_list = np.array([172.95, 173.97, 208.95])
index=np.ravel(np.asarray(concentration_list==172.95).nonzero())
print(index)
#outputs (array of all indexes matching the condition):
>> [0]
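A sketch combining both answers with a tolerance-based comparison: exact == on floating-point values can miss numbers that differ only by rounding error, so np.isclose is used here instead, and np.flatnonzero returns the matching positions. If only the first match is needed and the value is known to be present, converting to a list and calling .index() also works.

```python
import numpy as np

concentration_list = np.array([172.95, 173.97, 208.95])
target = 172.95

# np.isclose compares with a small tolerance instead of exact equality
positions = np.flatnonzero(np.isclose(concentration_list, target))
print(positions)  # every matching position

# First match only, or None if there is no match
first = int(positions[0]) if positions.size else None
print(first)

# Plain-list alternative; raises ValueError if the value is absent
print(concentration_list.tolist().index(target))
```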
This question already has answers here:
Deleting multiple columns based on column names in Pandas
(11 answers)
Closed 4 years ago.
I can't figure this bug out. I think it comes from my misunderstanding of dataframes and how to index into one, and maybe of for loops as well. (I am used to MATLAB for loops... iterating there is, intuitively, way easier :D)
Here is the error:
KeyError: "['United States' 'Canada' 'Mexico'] not found in axis"
This happens at the line: as_df=as_df.drop(as_df[column])
But this makes no sense... I am calling an individual column not the entire set of dummy variables.
The following code can be copied and run as-is. I made sure of it.
MY CODE:
import pandas as pd
import numpy as np
df=pd.DataFrame({"country": ['United States','Canada','Mexico'], "price": [23,32,21], "points": [3,4,4.5]})
df=df[['country','price','points']]
df2=df[['country']]
features=df2.columns
print(features)
target='points'
#------_-__-___---____________________
as_df=pd.concat([df[features],df[target]],axis=1)
#Now for Column Check
for column in as_df[features]:
    col=as_df[[column]]
    #Categorical Data Conversion
    #This will split the countries into their own column with 1 being when it
    #is true and 0 being when it is false
    col.select_dtypes(include='object')
    dummies=pd.get_dummies(col)
    #ML Check:
    dumcols=dummies.drop(dummies.columns[1],axis=1)
    if dumcols.shape[1] > 1:
        print(column)
        as_df=as_df.drop(as_df[column])
    else:
        dummydf=col
        as_df=pd.concat([as_df,dummydf],axis=1)
as_df.head()
I would comment instead of answering, but I do not have enough reputation to do so. (I need clarification to help you and Stack Exchange does not provide me with a way to do so "properly".)
I'm not entirely sure what your end-goal is. Could you clarify what your end result for as_df would look like? Including after the for loop ends, and after the entire code is finished running?
Found my mistake.
as_df=as_df.drop(as_df[column])
should be
as_df=as_df.drop(column,axis=1)
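To see why the original call failed, here is a minimal sketch on a hypothetical two-column frame: passing as_df[column] hands drop the column's values, which it then looks for as row labels on the index, producing the KeyError from the question. Passing the column name with axis=1, or equivalently columns=[...], drops the column.

```python
import pandas as pd

as_df = pd.DataFrame({"country": ["United States", "Canada", "Mexico"],
                      "points": [3, 4, 4.5]})  # hypothetical data

# Passing the column's values makes drop search the row index for
# 'United States', 'Canada', 'Mexico', which raises KeyError
try:
    as_df.drop(as_df["country"])
except KeyError as exc:
    print("KeyError:", exc)

# Passing the column name with axis=1 (or columns=[...]) drops the column
print(as_df.drop("country", axis=1).columns.tolist())
print(as_df.drop(columns=["country"]).columns.tolist())
```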
This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 4 years ago.
I have a small dataframe, say this one:
      Mass32    Mass44
12  0.576703  0.496159
13  0.576658  0.495832
14  0.576703  0.495398
15  0.576587  0.494786
16  0.576616  0.494473
...
I would like to have a rolling mean of column Mass32, so I do this:
x['Mass32s'] = x['Mass32'].rolling(5).mean().shift(-2)
It works, in that I get a new column named Mass32s containing what I expect, but I also get this warning message:
A value is trying to be set on a copy of a slice from a DataFrame. Try
using .loc[row_indexer,col_indexer] = value instead
See the the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I'm wondering if there's a better way to do it, notably to avoid getting this warning message.
This warning appears because your dataframe x is a copy of a slice of another dataframe. It is not always easy to see why, but it has to do with how x was created.
You can either create a proper dataframe out of x by doing
x = x.copy()
This will remove the warning, but it is not the proper way.
You should be using the DataFrame.loc method, as the warning suggests, like this:
x.loc[:, 'Mass32s'] = x['Mass32'].rolling(5).mean().shift(-2)
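A sketch of the same computation on current pandas, assuming x is built from the sample data in the question (the filter that creates x is hypothetical). pandas.rolling_mean was deprecated in pandas 0.18 and has since been removed; Series.rolling(window).mean() is its replacement. Taking a copy when creating x, as in the first suggestion above, means the later column assignment is not flagged. The shift(-2) on a 5-wide window gives the same values as a centered window.

```python
import pandas as pd

# Sample data from the question, indexed 12..16
df = pd.DataFrame(
    {"Mass32": [0.576703, 0.576658, 0.576703, 0.576587, 0.576616],
     "Mass44": [0.496159, 0.495832, 0.495398, 0.494786, 0.494473]},
    index=range(12, 17),
)

# Hypothetical filter; .copy() makes x independent of df
x = df[df["Mass44"] < 0.5].copy()

# Modern replacement for pandas.rolling_mean(x.Mass32, 5)
x["Mass32s"] = x["Mass32"].rolling(5).mean().shift(-2)

# Same values with a centered window instead of shifting
x["Mass32c"] = x["Mass32"].rolling(5, center=True).mean()
print(x)
```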