I have a pandas Series 'ids' containing only unique ids, with dtype object:
data_df.id.dtype
returns dtype('O')
I'm trying to follow the example here to create a sparse matrix from my df: Efficiently create sparse pivot tables in pandas?
id_u= list(data_df.id.unique())
row = data_df.id.astype('category', categories=id_u).cat.codes
and I get:
TypeError: data type "category" not understood
I'm not sure what this error means and I haven't been able to find much on it.
Try instead:
row = pd.Categorical(data_df['id'], categories=id_u)
You can get the codes using:
row.codes
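A minimal runnable sketch with made-up ids (the data_df and id_u names follow the question):

import pandas as pd

data_df = pd.DataFrame({'id': ['a', 'b', 'a', 'c']})  # hypothetical ids
id_u = list(data_df.id.unique())

# build the categorical explicitly, then take the integer codes
row = pd.Categorical(data_df['id'], categories=id_u)
print(row.codes)  # [0 1 0 2]

In recent pandas the astype route needs an explicit dtype object rather than a categories keyword, i.e. data_df['id'].astype(pd.CategoricalDtype(categories=id_u)).cat.codes gives the same result.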
I have my data set at https://github.com/mayuripandey/Data-Analysis/blob/main/similarity.csv. Is there any way I can take two specific columns, e.g. Count and Topic, and make a matrix out of them?
Simply subset the columns of interest and retrieve the values, without the column names, via the .values attribute:

import pandas as pd
import numpy as np

# GitHub renders the CSV as an HTML table, which read_html can parse
df = pd.read_html("https://github.com/mayuripandey/Data-Analysis/blob/main/similarity.csv")[0]
df[["Count", "Topic"]].values

This returns a 2D numpy array of only the values. If you then need a matrix object, you can convert it like this:

np.matrix(df[["Count", "Topic"]].values)
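Note that read_html only works here because GitHub renders the CSV as an HTML table. Reading the raw file is more robust; a small sketch, assuming the usual raw.githubusercontent.com mapping of that repository path:

import pandas as pd

# read the raw CSV directly instead of scraping the rendered GitHub page
url = "https://raw.githubusercontent.com/mayuripandey/Data-Analysis/main/similarity.csv"
df = pd.read_csv(url)
mat = df[["Count", "Topic"]].to_numpy()  # same as .values, the recommended spelling

Also note that np.matrix is no longer recommended by numpy itself; a plain 2D array from .to_numpy() is usually what downstream libraries expect.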
I am using value_counts() to get the frequency of each sec_id. The output of value_counts() should be integers, but when I build a DataFrame from the result, those columns come out with object dtype. Does anyone know the reason?
They have object dtype because your sec_id column contains string values (e.g. "94114G"). When you call .values on the DataFrame created by .reset_index(), numpy has to pick a single dtype that fits both the string ids and the integer counts, so everything is upcast to object.
More importantly, I think you are doing some unnecessary work. Try this:
sec_count_df = df['sec_id'].value_counts().rename_axis("sec_id").rename("count").reset_index()
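A quick sketch with made-up sec_id values showing where the object dtype comes from:

import pandas as pd

df = pd.DataFrame({'sec_id': ['94114G', '94114G', '02079K']})  # hypothetical ids

sec_count_df = df['sec_id'].value_counts().rename_axis("sec_id").rename("count").reset_index()
print(sec_count_df.dtypes)        # sec_id: object, count: int64
print(sec_count_df.values.dtype)  # object -- strings and ints must share one array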
When I tried to run this code:
X_test = df.values
df_new = ks.DataFrame(X_test, columns = ['Sales','T_Year','T_Month','T_Week','T_Day','T_Hour'])
I am getting a new index for the df_new DataFrame which is not the same as df's.
I tried changing the code as below to retain the original index. However, it gives an error:
X_test = df.values(index=df.index)
TypeError: 'numpy.ndarray' object is not callable
Is there a way to give df_new the same index as the df DataFrame?
DataFrames have a set_index() method for manually setting the index. The Koalas version in particular accepts as its main argument:
keys: label or array-like or list of labels/arrays
This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays. Here, “array” encompasses Series, Index and np.ndarray.
By that, you can pass the Index object of your original df:
X_test = df.values
df_new = ks.DataFrame(X_test, columns = ['Sales','T_Year','T_Month','T_Week','T_Day','T_Hour'])
df_new = df_new.set_index(df.index)  # reuse the index of the original df
Now, about the line that gives you the error:
X_test = df.values(index=df.index)
The error arises because you are confusing numpy arrays with pandas DataFrames.
df.values is an attribute of the DataFrame df: it returns an np.ndarray holding all the values of the dataframe, without the index. It is not a function, so you cannot call it by writing (index=df.index).
Numpy arrays don't carry a custom index; they are just arrays. That is why df_new starts with a fresh default index, and why you have to set the index afterwards as shown above.
Disclaimer: I wasn't able to install Koalas for this answer, so this is only tested on pandas DataFrames. If Koalas fully supports the pandas interface, it should work there too.
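For reference, here is the pandas version of the round trip, with made-up values and only two of the six columns:

import pandas as pd

df = pd.DataFrame({'Sales': [10.0, 20.0], 'T_Year': [2020, 2021]},
                  index=['r1', 'r2'])

X_test = df.values                                  # plain ndarray, index is gone
df_new = pd.DataFrame(X_test, columns=['Sales', 'T_Year'])
df_new = df_new.set_index(df.index)                 # restore the original index
print(df_new.index.tolist())                        # ['r1', 'r2']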
I have the following data frame:
After I perform this operation:
pages = df_ref.groupby("KV").work_p.unique().reset_index()
I'm getting a new dataframe where each value in the work_p column is an array. How can I extract/convert it to an integer?
I feel like I could also achieve the goal by changing the first step, but as I am new to pandas I am, unfortunately, stuck.
Try extracting the first element of each array (this assumes every KV group has exactly one unique work_p value):
pages['work_p'] = pages.work_p.apply(lambda x: x[0])
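A runnable sketch with made-up data, assuming each KV group really does map to a single work_p value:

import pandas as pd

df_ref = pd.DataFrame({'KV': ['a', 'a', 'b'], 'work_p': [3, 3, 7]})  # hypothetical data

pages = df_ref.groupby("KV").work_p.unique().reset_index()
pages['work_p'] = pages.work_p.apply(lambda x: int(x[0]))  # unwrap array([3]) -> 3
print(pages.dtypes)  # work_p: int64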
I was wondering how I would be able to convert my binned dataframe to a binned numpy array that I can use in sklearn's PCA.
Here's my code so far (x is my original unbinned dataframe):
bins=(2,6,10,14,20,26,32,38,44,50,56,62,68,74,80,86,92,98)
binned_data = x.groupby(pd.cut(x.Weight, bins))
I want to convert binned_data to a numpy array. Thanks in advance.
EDIT:
When I try binned_data.values, I receive this error:
AttributeError: Cannot access attribute 'values' of 'DataFrameGroupBy' objects, try using the 'apply' method
You need to apply some kind of aggregation to the GroupBy object to get a DataFrame back. Once you have that, you can use .values to extract the numpy array.
For example, if you wanted the sum or the count of the data in each bin, you could do:
binned_data.sum().values   # per-bin sums of each column
binned_data.size().values  # number of rows in each bin
Edit:
My code wasn't exactly right: after the aggregation, the index and the Weight column end up with the same name, so reset_index() fails. It can be fixed by renaming the index, as below:
binned_data = x.groupby(pd.cut(x.Weight, bins)).sum()
binned_data.index.name = 'Weight_Bin'
binned_data.reset_index().values
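Putting it together, a runnable sketch with made-up data (only the Weight column and the bin edges come from the question; the Height column, the random values, and the sklearn step are assumptions about the intended use):

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x = pd.DataFrame({'Weight': rng.integers(3, 98, size=200),       # hypothetical data
                  'Height': rng.normal(170, 10, size=200)})

bins = (2, 6, 10, 14, 20, 26, 32, 38, 44, 50, 56, 62, 68, 74, 80, 86, 92, 98)

binned_data = x.groupby(pd.cut(x.Weight, bins), observed=False).sum()
binned_data.index.name = 'Weight_Bin'

X = binned_data.values                     # purely numeric 2D array, one row per bin
print(X.shape)                             # (17, 2)
reduced = PCA(n_components=1).fit_transform(X)

Note that for PCA you want binned_data.values rather than reset_index().values: mixing the Interval bin labels into the array would force object dtype, which sklearn cannot use.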