Error during Label Encoding Sci-kit Library

I am trying to label-encode my DataFrame, whose columns contain strings, but I am receiving this error:
("'<' not supported between instances of 'str' and 'NoneType'", 'occurred at index ProductFabric')
Code:
from sklearn import preprocessing
df1 = df1.apply(preprocessing.LabelEncoder().fit_transform)

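Here is an example from the sklearn documentation; I hope it helps.
In your case, however, you are passing df1, which is a DataFrame with multiple columns and probably contains null values. The error says a str was compared with None, so at least the ProductFabric column has missing entries.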
from sklearn import preprocessing
df = [1, 1, 2, 6]
le = preprocessing.LabelEncoder().fit_transform(df)
print(le)
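One way to handle the null values mentioned above is to replace them with a placeholder string before encoding. This is only a sketch under assumptions: the data is made up (only the column name ProductFabric comes from the question), and the helper encode_column and the placeholder "missing" are hypothetical names.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up data; only the column name ProductFabric comes from the question.
df1 = pd.DataFrame({
    "ProductFabric": ["cotton", None, "silk", "cotton"],
    "ProductColor": ["red", "blue", None, "red"],
})

def encode_column(col):
    # Replace missing entries with a placeholder string (hypothetical choice),
    # so every value is a str and the encoder can sort them.
    filled = col.fillna("missing").astype(str)
    # A fresh encoder per column keeps the label sets independent.
    return LabelEncoder().fit_transform(filled)

encoded = df1.apply(encode_column)
print(encoded)
```

Whether missing values should become their own category or be dropped depends on the data; the placeholder approach simply keeps every row.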

Related

Getting NaN in a column after applying map() function

I'm trying to replace the categorical values in the Gender column, M and F, with 0 and 1. However, after running my code I get NaN instead of 0 and 1.
Code:
df['Gender'] = df['Gender'].map({'F':1, 'M':0})
My input data frame-
Dataframe after running the code-
Details: the dtype of the Gender column is object.
Kindly suggest a way out!

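Maybe the values in your DataFrame differ from the expected strings 'F' and 'M', for example because of extra whitespace or different casing. map() returns NaN for every value that is not a key in the dictionary. Try LabelEncoder from scikit-learn instead: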
from sklearn.preprocessing import LabelEncoder
df['Gender'] = LabelEncoder().fit_transform(df['Gender'])
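If you would rather keep map() and control which value gets which number, it can help to inspect the actual values first and normalize them. The sketch below assumes the mismatch comes from whitespace or casing, which the question does not confirm; the sample data is made up.

```python
import pandas as pd

# Made-up data with the kind of stray whitespace/casing that breaks map().
df = pd.DataFrame({"Gender": ["M", "F ", " m", "F"]})

# Show the exact values; repr exposes hidden spaces.
print([repr(v) for v in df["Gender"].unique()])

# Normalize, then map. Values missing from the dict would still become NaN.
cleaned = df["Gender"].str.strip().str.upper()
df["Gender"] = cleaned.map({"F": 1, "M": 0})
print(df)
```

Unlike LabelEncoder, this keeps the F=1, M=0 assignment from the question explicit.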

This code resolved the issue:
# Import label encoder
from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()
# Encode labels in column 'Gender'.
df['Gender'] = label_encoder.fit_transform(df['Gender'])
df['Gender'].unique()

FeatureAgglomeration: feature_names_in and get_feature_names_out

I used FeatureAgglomeration to cluster my 105x105 dataframe into 40 clusters based on Spearman. Now I want to get the output feature names using feature_names_in and get_feature_names_out, but it does not seem to work, and I cannot find the solution anymore. This is my code:
import pandas as pd
import numpy as np
from sklearn.cluster import FeatureAgglomeration
features = np.array([...])
print(features.shape)
>>> (105,)
Class1_rank=pd.read_excel(r'H:\PycharmProjects\RadiomicsPipeline\Class1_rank.xlsx')
print(Class1_rank)
>>> original_shape_Elongation ... original_ngtdm_Strength
original_shape_Elongation 1.000000 ... -0.054310
original_shape_Flatness 0.616327 ... -0.019544
original_shape_LeastAxisLength 0.271645 ... -0.293157
>>> [105 rows x 105 columns]
print(agglo.n_features_in_)
>>> 105
print(agglo.feature_names_in_(Class1_rank))
print(agglo.get_feature_names_out())
df_reduced = agglo.transform(Class1)
At print(agglo.feature_names_in_()) I get the following error:
TypeError: 'numpy.ndarray' object is not callable
However, Class1_rank is a DataFrame, so it should not give that error. What am I doing wrong here?
What I have tried:
Commenting out print(agglo.feature_names_in_(Class1_rank)). This works, but then print(agglo.get_feature_names_out()) gives the following result, and not the names of the features I included:
['featureagglomeration0' 'featureagglomeration1' 'featureagglomeration2' 'featureagglomeration3' 'featureagglomeration4'....]
Using features as input for both functions gives the same error.
Inserting the features as strings for Class1_rank gives the same error.

feature_names_in_ is an array, not a callable, so agglo.feature_names_in_ is correct, but parentheses after it (empty or not) are incorrect.
get_feature_names_out() gives one name per cluster. The clusters are not in one-to-one correspondence with the input features, so it cannot return anything like the original feature names. You can use the labels_ attribute to find which input features go into which output feature, see e.g. this answer.

TypeError while using label encoder

I am using the Beers dataset in which I want to encode the data with datatype 'object'.
Following is my code.
from sklearn import preprocessing
df3 = BeerDF.select_dtypes(include=['object']).copy()
label_encoder = preprocessing.LabelEncoder()
df3 = df3.apply(label_encoder.fit_transform)
The following error is occurring.
TypeError: Encoders require their input to be uniformly strings or numbers. Got ['float', 'str']
Any insights are helpful!!!

Use:
df3 = df3.astype(str).apply(label_encoder.fit_transform)

From the TypeError, it seems that a column you want to encode contains two different data types, in your case str and float (the floats are often NaN standing in for missing values), which raises the error. To avoid this, give the column a uniform type (all strings or all numbers). For example, in the Iris classification dataset the classes are ['Setosa', 'Versicolour', 'Virginica'] and not ['Setosa', 3, 5, 'Versicolour'].
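To see which columns actually mix types before converting them, something like the following may help. It is a sketch on made-up data: the column name style and the placeholder "missing" are hypothetical, and it fills NaN explicitly rather than relying on how astype(str) renders missing values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up object column mixing str values with a float NaN.
df3 = pd.DataFrame({"style": ["IPA", np.nan, "Stout", "IPA"]}, dtype=object)

# Python types present in each column; more than one entry means mixed types.
types_per_column = {
    col: sorted({type(v).__name__ for v in df3[col]}) for col in df3.columns
}
print(types_per_column)

# Make every value a str, then encode each column with its own encoder.
uniform = df3.fillna("missing").astype(str)
encoded = uniform.apply(lambda col: LabelEncoder().fit_transform(col))
print(encoded)
```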

Python 3.6.5 returns '<' not supported between instances of 'tuple' and 'str' error message

I'm trying to split a data set into a training and a testing part. I'm struggling with a structural problem: the hierarchy of the data seems to be wrong for the code below.
I tried the following:
import pandas as pd
import pandas_datareader.data as web
data = pd.DataFrame(web.DataReader('SPY', data_source='morningstar')['Close'])
cutoff = '2015-1-1'
data = data[data.index < cutoff].dropna().copy()

As data.head() will reveal, data is not actually a pd.DataFrame but a pd.Series whose index is a pd.MultiIndex rather than a pd.DatetimeIndex. The error hints at this too: each index element is a tuple, which cannot be compared with the str cutoff.
You could simply set
df = data.unstack(0)
With that, df[df.index < cutoff] performs the filtering you are trying to do.
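The effect of unstack(0) can be reproduced without downloading anything. The sketch below builds a small Series with a (Symbol, Date) MultiIndex; the level names and the values 1.0, 2.0, 3.0 are made up and only stand in for what the data reader returned.

```python
import pandas as pd

# Made-up stand-in for the downloaded data: MultiIndex of (Symbol, Date).
index = pd.MultiIndex.from_product(
    [["SPY"], pd.to_datetime(["2014-12-30", "2014-12-31", "2015-01-02"])],
    names=["Symbol", "Date"],
)
data = pd.Series([1.0, 2.0, 3.0], index=index, name="Close")

# Move level 0 (Symbol) into the columns; the index is now a DatetimeIndex.
df = data.unstack(0)

cutoff = pd.Timestamp("2015-1-1")
train = df[df.index < cutoff].dropna().copy()
test = df[df.index >= cutoff].dropna().copy()
print(train)
```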

Running sklearns label encoder on all columns at once

I am trying to run LabelEncoder on all columns that are of type object. This is the code I wrote but it throws this error:
TypeError: '<' not supported between instances of 'int' and 'str'
Does anybody know how to fix this?
le=LabelEncoder()
for col in X_test.columns.values:
    if X_test[col].dtypes=='object':
        data=X_train[col].append(X_test[col])
        le.fit(data.values)
        X_train[col]=le.transform(X_train[col])
        X_test[col]=le.transform(X_test[col])

It looks like the column contains values of different types after appending. Try converting everything to str when fitting:
le.fit(data.values.astype(str))
You also have to convert the data to str for transform, since the classes in the LabelEncoder will be str:
X_train[col]=le.transform(X_train[col].astype(str))
X_test[col]=le.transform(X_test[col].astype(str))
To recreate a similar problem, suppose the DataFrame has a column with both int and str values:
import pandas as pd
df = pd.DataFrame({'col1':["tokyo", 1 , "paris"]})
print(df)
Result:
    col1
0  tokyo
1      1
2  paris
Now, using LabelEncoder gives a similar error message, i.e. TypeError: unorderable types: int() < str():
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(df.col1.values)
Converting everything to str in fit, or before it, may resolve the issue:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(df.col1.values.astype(str))
print(le.classes_)
Result:
['1' 'paris' 'tokyo']
If you just call le.transform(df.col1), it throws a similar error again, so it has to be le.transform(df.col1.astype(str)) instead.
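Putting these pieces together, the loop from the question could look roughly like this. It is a sketch on made-up frames: the column names city and n are hypothetical, pd.concat is used because Series.append was removed in pandas 2.0, and a separate encoder is kept per column (in the dict encoders, also a hypothetical name) so each mapping can be inspected or inverted later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up frames; the city column mixes int and str values.
X_train = pd.DataFrame({"city": ["tokyo", 1, "paris"], "n": [1, 2, 3]})
X_test = pd.DataFrame({"city": ["paris", "rome", 1], "n": [4, 5, 6]})

encoders = {}
for col in X_test.columns:
    if X_test[col].dtype == object:
        # Fit on train and test together so both share one label set.
        combined = pd.concat([X_train[col], X_test[col]]).astype(str)
        le = LabelEncoder().fit(combined)
        # transform needs the same str conversion that fit received.
        X_train[col] = le.transform(X_train[col].astype(str))
        X_test[col] = le.transform(X_test[col].astype(str))
        encoders[col] = le

print(encoders["city"].classes_)
print(X_train)
print(X_test)
```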

The error is basically telling you the exact problem: some of the values are strings and some are not. You can solve this by calling c.astype(str) each time you call fit, fit_transform, or transform, on Series c, e.g.:
le.fit(data.values.astype(str))
