I have a dataframe with multiple rows and columns. One of my columns (let's call it column A) contains a mix of strings, strings with integers (e.g. RSE1023), integers only, and floats only. I want to find a way to convert the values in column A that are floats to integers, probably with something that can scan through the column, find the rows that are floats, and make them integers.
You could try something like:
df['A'] = df['A'].apply(lambda r: int(r) if isinstance(r, float) else r)
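A minimal, runnable sketch of this answer on an invented mixed column; note that int() raises on NaN, so a pd.isna guard is added here in case the column has missing values:

```python
import pandas as pd

# Invented mixed column: plain string, alphanumeric string, int, float
df = pd.DataFrame({'A': ['RSE1023', 'abc', 7, 2.0]})

# Convert only the float entries to int; guard against NaN, since int(NaN) raises
df['A'] = df['A'].apply(
    lambda r: int(r) if isinstance(r, float) and not pd.isna(r) else r
)
print(df['A'].tolist())  # → ['RSE1023', 'abc', 7, 2]
```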
In pandas you do not give datatypes to rows, but to columns.
A trick you could use is to .transpose() the dataframe, turning the rows into columns and vice versa.
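For illustration, a toy sketch of the transpose idea (data invented): after .T the original rows become columns, which can then be cast as a whole.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': [3.5, 4.5]})
t = df.T                         # original rows are now columns 0 and 1
row0_as_ints = t[0].astype(int)  # cast what was row 0 to int (truncates 3.5 -> 3)
print(row0_as_ints.tolist())     # → [1, 3]
```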
Suppose I have a DataFrame.
I want to convert this into a dataframe which has a single value in each row, like below.
Is there any way to obtain this?
I checked the answer here, but it doesn't work for me.
How to get the integer portion of a float column in pandas
I need to write further conditional statements which will perform operations on the exact values in the columns and the corresponding values in other columns.
So basically, for my two dataframes df1 and df2, I am hoping to form a concatenated dataframe using
dfn_c = pd.concat([dfn_1, dfn_2], axis=1)
then write something like
dfn_cn = dfn_c.loc[df1.X1.isin(df2['X2'])]
where X1 and X2 are the said columns respectively. The above line of course makes an exact comparison whereas I want to compare only the integer portion and then form the new dataframe.
IIUC, try casting to int then compare.
dfn_cn = dfn_c.loc[df1['X1'].astype(int).isin(df2['X2'].astype(int))]
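A runnable sketch on invented frames (the column names X1/X2 are from the question; the values are made up); note that astype(int) raises on NaN, so missing values would need to be dropped or filled first:

```python
import pandas as pd

df1 = pd.DataFrame({'X1': [1.2, 2.7, 5.9]})
df2 = pd.DataFrame({'X2': [2.1, 5.4]})

dfn_c = pd.concat([df1, df2], axis=1)

# Keep rows where the integer part of X1 appears among the integer parts of X2
dfn_cn = dfn_c.loc[df1['X1'].astype(int).isin(df2['X2'].astype(int))]
print(dfn_cn['X1'].tolist())  # → [2.7, 5.9]
```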
I have a DataFrame that has columns with numbers, but these numbers are represented as strings. I want to find these columns automatically, without specifying which columns should be numeric. How can I do this in pandas?
You can use str.contains from pandas, which here matches the column names against a pattern:
>>> df.columns[df.columns.str.contains('.*[0-9].*', regex=True)]
The regex can be modified to accommodate a wide range of patterns you want to search for.
You can first filter using pd.to_numeric and then use combine_first with the original column:
df['COL_NAME'] = pd.to_numeric(df['COL_NAME'],errors='coerce').combine_first(df['COL_NAME'])
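If the goal is to detect columns by their values rather than their names, one sketch (data invented) is to test whether every entry in a column survives pd.to_numeric coercion:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['1', '2', '3'],     # numeric strings
    'b': ['x', 'y', 'z'],     # plain text
    'c': [1.0, 2.0, 3.0],     # already numeric
})

# A column is numeric-like if no value becomes NaN under coercion
numeric_cols = [
    col for col in df.columns
    if pd.to_numeric(df[col], errors='coerce').notna().all()
]
print(numeric_cols)  # → ['a', 'c']
```

Columns that legitimately contain missing values would need a slightly looser check than notna().all().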
I have a dataframe with 15 columns. 5 of those columns use numbers, but some of the entries are either blanks or words. I want to convert those to zero.
I am able to convert the entries in one of the columns to zero, but when I try to do that for multiple columns, I am not able to. I tried this for one column:
pd.to_numeric(Tracker_sample['Product1'],errors='coerce').fillna(0)
and it works, but when I try this for multiple columns:
pd.to_numeric(Tracker_sample[['product1','product2','product3','product4','Total']],errors='coerce').fillna(0)
I get the error : arg must be a list, tuple, 1-d array, or Series
I think the problem is the way I am selecting the columns to be fixed. I am new to pandas, so any help would be appreciated. Thank you.
You can use:
Tracker_sample[['product1','product2','product3','product4','Total']].apply(pd.to_numeric, errors='coerce').fillna(0)
With a for loop?
for col in ['product1','product2','product3','product4','Total']:
    Tracker_sample[col] = pd.to_numeric(Tracker_sample[col], errors='coerce').fillna(0)
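Both answers above can be checked on a small invented frame (only two product columns shown here):

```python
import pandas as pd

Tracker_sample = pd.DataFrame({
    'product1': ['5', '', 'seven'],
    'product2': ['1', '2', '3'],
    'Name': ['a', 'b', 'c'],  # non-numeric column, left untouched
})

cols = ['product1', 'product2']
# apply runs pd.to_numeric once per column, sidestepping the 1-d restriction
Tracker_sample[cols] = (
    Tracker_sample[cols].apply(pd.to_numeric, errors='coerce').fillna(0)
)
print(Tracker_sample['product1'].tolist())  # → [5.0, 0.0, 0.0]
```

The original error arises because pd.to_numeric only accepts one-dimensional input (a Series, list, or 1-d array), which is why it has to be applied column by column rather than passed a whole DataFrame.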
I need to turn a pandas dataframe column into a float. The float is taken from a larger csv file. To get just the number I need into a dataframe, I did:
m_df = pd.read_csv(input_file,nrows=1,header=None,skiprows=4)
m1 = m_df.iloc[:, 1:2]
This gets me the dataframe with just the number I want in the first column. How do I turn that number into a float?
float(m_df.iloc[0, 1])
For pandas dataframes, type casting works when done on the underlying values (or a scalar), rather than on the dataframe itself. Note that .ix has been removed from recent pandas versions; .iloc is the replacement.
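An end-to-end sketch, using an in-memory CSV as a stand-in for the asker's file (the file contents are invented; the wanted number sits on the fifth line, matching skiprows=4):

```python
import io
import pandas as pd

# Stand-in for the real input file: the value we want is on row 5, column 1
csv_text = "h1\nh2\nh3\nh4\nlabel,3.14,extra\n"
m_df = pd.read_csv(io.StringIO(csv_text), nrows=1, header=None, skiprows=4)

m1 = float(m_df.iloc[0, 1])  # scalar lookup, then cast
print(m1)  # → 3.14
```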