Turn pandas dataframe column into float - python

I need to turn a pandas dataframe column into a float. The value comes from a larger csv file. To get just the number I need into a dataframe, I did:
m_df = pd.read_csv(input_file,nrows=1,header=None,skiprows=4)
m1=m_df.ix[:,1:1]
This gets me the dataframe with just the number I want in the first column. How do I turn that number into a float?

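float(m_df.loc[:, 1:1].values[0, 0])
float() cannot be applied to a DataFrame directly; the cast works on the underlying values, here the single element of the 1x1 array returned by .values. (The .ix indexer has been removed from current pandas versions; .loc is the label-based replacement.)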
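As a minimal sketch of the same idea, the snippet below reads one row from an in-memory CSV and pulls out a single cell as a Python float. The CSV text is a made-up stand-in for the asker's file, and selecting the cell with .iloc[0, 1] is one possible alternative, not the method from the answer above.

```python
import io

import pandas as pd

# Hypothetical stand-in for the larger CSV file; the fifth line holds the value.
csv_text = "a,b,c\nd,e,f\ng,h,i\nj,k,l\nlabel,3.75,extra\nmore,1,2\n"

# Same read_csv arguments as in the question: skip four lines, read one row.
m_df = pd.read_csv(io.StringIO(csv_text), nrows=1, header=None, skiprows=4)

# .iloc[0, 1] selects one cell (row 0, column 1) instead of a sub-frame,
# so float() receives a scalar rather than a DataFrame.
value = float(m_df.iloc[0, 1])
print(value)
```

Selecting a scalar first avoids calling float() on an array at all.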

Related

extract number of ranking position in pandas dataframe

I have a pandas dataframe with a column named ranking_pos. All the rows of this column look like this: #123 of 12,216.
The output I need is only the number of the ranking, so for this example: 123 (as an integer).
How do I extract the number after the # and get rid of the of 12,216?
Currently the dtype of the column is object; converting it to integer with .astype() alone doesn't work because of the other characters.
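You can use .str.extract (expand=False returns a Series rather than a one-column DataFrame):
df['ranking_pos'].str.extract(r'#(\d+)', expand=False).astype(int)
or you can use .str.split():
df['ranking_pos'].str.split(' of ').str[0].str.replace('#', '').astype(int)
Stripping only the # is not enough, because the remaining text (123 of 12,216) still cannot be cast to int; keep just the first token before converting:
df['ranking_pos'] = df['ranking_pos'].str.replace('#', '').str.split().str[0].astype(int)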
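To make the extraction concrete, here is a small self-contained sketch. The two sample rows are invented for illustration and only follow the "#123 of 12,216" pattern described in the question.

```python
import pandas as pd

# Invented sample rows in the format described in the question.
df = pd.DataFrame({"ranking_pos": ["#123 of 12,216", "#7 of 980"]})

# Capture the digits that directly follow '#'; expand=False yields a Series.
ranks = df["ranking_pos"].str.extract(r"#(\d+)", expand=False).astype(int)

df["ranking_pos"] = ranks
print(df)
```

Rows that do not match the pattern would come back as NaN from .str.extract, and the cast to int would then fail, so this sketch assumes every row has the expected format.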

How to check if only the integer portion of the elements in two pandas data columns match?

I checked the answer to "How to get the integer portion of a float column in pandas", but it doesn't work for me, because I need to write further conditional statements that operate on the exact values in the columns and on the corresponding values in other columns.
So basically I am hoping that for my two dataframes df1 and df2 I will form a concatenated dataframe using
dfn_c = pd.concat([dfn_1, dfn_2], axis=1)
then write something like
dfn_cn = dfn_c.loc[df1.X1.isin(df2['X2'])]
where X1 and X2 are the said columns respectively. The above line of course makes an exact comparison whereas I want to compare only the integer portion and then form the new dataframe.
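IIUC, try casting both columns to int and then comparing. Note that astype(int) truncates toward zero and raises an error if a column contains NaN.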
dfn_cn = dfn_c.loc[df1['X1'].astype(int).isin(df2['X2'].astype(int))]
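A short sketch of that comparison on made-up data follows. The frames df1 and df2 and their values are assumptions for illustration, and the concatenated frame is built directly from them.

```python
import pandas as pd

# Invented example data; X1 and X2 are the columns to compare.
df1 = pd.DataFrame({"X1": [1.2, 2.7, 5.9]})
df2 = pd.DataFrame({"X2": [2.1, 5.0, 8.4]})
dfn_c = pd.concat([df1, df2], axis=1)

# Compare only the integer parts; the original float values stay untouched.
mask = df1["X1"].astype(int).isin(df2["X2"].astype(int))
dfn_cn = dfn_c.loc[mask]
print(dfn_cn)
```

Because the cast happens only inside the mask, the rows kept in dfn_cn still hold the exact float values for any later conditional logic.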

read row and convert float to integer in pandas

I have a dataframe with multiple rows and columns. One of my columns (let's call it column A) contains a mix of plain strings, strings with integers in them (e.g. RSE1023), integers only and floats only. I want to convert the rows of column A that are floats to integers. Is there something that can scan through the column, find the rows that are floats and make them integers?
You could try something like:
df['A'] = df['A'].apply(lambda r: int(r) if isinstance(r, float) else r)
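The sketch below applies the same idea to an invented mixed column. The helper name float_to_int and the extra NaN check are assumptions added here: NaN is itself a float, and int() would raise on it.

```python
import pandas as pd

# Invented mixed-type column: strings, an integer and floats.
df = pd.DataFrame({"A": ["RSE1023", "abc", 7, 3.0, 12.0]})


def float_to_int(value):
    """Convert float entries to int; leave everything else (and NaN) alone."""
    if isinstance(value, float) and not pd.isna(value):
        return int(value)  # truncates toward zero, e.g. 3.7 -> 3
    return value


df["A"] = df["A"].apply(float_to_int)
print(df["A"].tolist())
```

The column keeps the object dtype, since it still holds both strings and integers.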
In pandas you do not give datatypes to rows but to columns.
A trick you could use would be to .transpose the dataframe, turning the rows into columns and vice versa.

How do I convert integer column values to categorical or string column values dynamically in python?

I have a column that has the values 1,2,3....
I need to change these values to Cluster_1, Cluster_2, Cluster_3... dynamically. My original table has a column cluster_predicted containing integer values, and I need to convert these numbers to cluster_0, cluster_1...
I have tried the below code
clustersDf['clusterDfCategorical'] = "Cluster_" + str(clustersDf['clusterDfCategorical'])
But this gives me a very weird output.
import pandas as pd
df = pd.DataFrame()
df['cols'] = [1, 2, 3, 4, 5]
df['vals'] = ['one', 'two', 'three', 'four', 'five']
# convert each element to a string, then prepend the prefix
df['cols'] = df['cols'].astype(str)
df['cols'] = 'confuse_' + df['cols']
print(df)
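Try this. The string conversion is what causes the problem: str() applied to a Series returns the printed representation of the whole Series, not one string per element.
One way to convert each element to a string is to use astype(str).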
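Applied to the asker's column names, the same approach could look like the sketch below. The sample values in cluster_predicted are invented for illustration.

```python
import pandas as pd

# Invented cluster labels standing in for the asker's data.
clustersDf = pd.DataFrame({"cluster_predicted": [0, 1, 2, 1]})

# astype(str) converts element-wise, so the prefix is added to every row.
clustersDf["clusterDfCategorical"] = (
    "Cluster_" + clustersDf["cluster_predicted"].astype(str)
)
print(clustersDf)
```

If a true categorical column is wanted, the result can be cast afterwards with .astype("category").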

How to select columns in python based on datatype

I'm trying to organize the columns in the dataframe based on datatype. I thought I'd do this by using .loc to isolate the columns of each datatype and then append them to each other to get one large organized dataset.
import numpy as np
import pandas as pd
control = pd.read_csv(loan_path, chunksize=1000)
control = pd.concat(control, ignore_index=True)
int_columns= control.loc[:, control.dtypes==int]
I expect a new dataset with every row and only the columns that have integer datatypes. Instead I get the index of every row but 0 columns.
I know there are columns with integer datatypes. I've also tried looking for categories and floats and always get the same wrong result
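One possible approach, not taken from this thread, is DataFrame.select_dtypes, which filters columns by dtype without comparing dtypes to the Python int type by hand. The sketch uses an invented frame in place of the loan file and assumes the integer columns are stored as int64, the pandas default.

```python
import pandas as pd

# Invented stand-in for the loan data read from loan_path.
control = pd.DataFrame(
    {
        "loan_id": [1, 2, 3],
        "rate": [0.1, 0.2, 0.3],
        "grade": ["a", "b", "c"],
    }
)

# Keep every row, but only the columns whose dtype is int64.
int_columns = control.select_dtypes(include="int64")
float_columns = control.select_dtypes(include="float64")

# Reassemble the frame grouped by dtype, with remaining columns last.
others = control.drop(columns=[*int_columns.columns, *float_columns.columns])
organized = pd.concat([int_columns, float_columns, others], axis=1)
print(organized.dtypes)
```

Printing control.dtypes first shows which dtypes are actually present, which helps when an integer column was read as float64 (for example because it contains missing values).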
