Pandas DataFrame column will set to string but not integer - python

I am trying to set the values in a DataFrame to the values from a separate DataFrame. This works just fine when the source column is a string, but the integer columns are not being copied, and no error is thrown.
RentryDf = pd.DataFrame(index=tportDf.index.values, columns=tradesDf.columns)
RentryDf.loc[:, 'TRADER'] = tportDf.loc[:, 'TRADER']
RentryDf.loc[:, 'CONTRACT_VOL'] = tportDf.loc[:, 'DELIVERY VOLUME']
The second line has no problem setting the string names of the traders, but the third line stays NaN. I tried the following two lines just to see if they would work, and even these don't:
RentryDf.loc[:,'CONTRACT_VOL']=11
RentryDf.loc[:,'CONTRACT_VOL'].apply(lambda x: 11)

I solved my question while trying to recreate it (I guess I learned a good strategy!).
The problem was in my declaration of the DataFrame: I was passing columns=tradesDf.columns rather than columns=tradesDf.columns.values.
I am pleased to have it fixed, but does anyone know why this would cause the DataFrame not to set the integer values while it would set the string values?

I can't reproduce the bug; it works for both float64 and int64.
I suspect the problem is index misalignment, since the first line creates a DataFrame with all values NaN.


Truncate decimal numbers in string

A weird thing: I have a dataframe, let's call it ID.
While importing the xlsx source file, I do .astype({"ID_1": str, "ID_2": str}).
Yet, for example, instead of 10300 I get 10300.0.
Moreover, I then get the string "nan" as well.
In order to fix both issues I did this rubbish:
my_df['ID_1'].replace(['None', 'nan'], np.nan, inplace=True)
my_df[my_df['ID_1'].notnull()].ID_1.astype(float).astype(int).astype(str)
As a result I still have these 10300.0 values.
Any thoughts on how to fix this? I could keep the columns as float while importing the data, instead of using .astype, but that does not change anything.
The issue is that int cannot represent NaN, so pandas converts the column to float.
This is a common pitfall: the presence of additional rows with missing data can change the dtype of the whole column.
You can, however, pick a specific pandas dtype to indicate an integer column with missing values; see Convert Pandas column containing NaNs to dtype `int`, especially the link https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
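A short sketch of the nullable integer dtype (the values here are made up to mirror the question):

```python
import pandas as pd
import numpy as np

# A NaN in the data forces the column to float, hence "10300.0"
s = pd.Series([10300, np.nan, 20500])

# The nullable Int64 dtype keeps whole numbers alongside missing values
s_int = s.astype("Int64")

# Converting to string afterwards no longer produces a trailing ".0"
s_str = s_int.astype(str)
```

Note the capital "I" in "Int64": that is the nullable extension dtype, distinct from the plain numpy "int64".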

I need to change the type of a few columns in a pandas dataframe. Can't do so using iloc

In a dataframe with 40+ columns, I am trying to change the dtype of the first 27 columns from float to int using iloc:
df1.iloc[:,0:27]=df1.iloc[:,0:27].astype('int')
However, it's not working. I'm not getting any error, but the dtype is not changing either; it remains float.
Now the strangest part:
If I first change dtype for only 1st column (like below):
df1.iloc[:,0]=df1.iloc[:,0].astype('int')
and then run the earlier line of code:
df1.iloc[:,0:27]=df1.iloc[:,0:27].astype('int')
It works as required.
Any help understanding this, and a solution to it, would be appreciated.
Thanks!
I guess it is a bug in 1.0.5. I tested on my 1.0.5 and have the same issue as you. .loc has the same issue, so I guess the pandas devs broke something in iloc/loc. You need to update to the latest pandas or use a workaround. If you need a workaround, use assignment as follows:
df1[df1.columns[0:27]] = df1.iloc[:, 0:27].astype('int')
I tested it; the approach above overcomes this bug. It will turn the first 27 columns to dtype int32.
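A minimal sketch of why the workaround helps (column names and values are hypothetical; behaviour checked on a recent pandas): assigning through .iloc writes into the existing float columns, which can preserve the old dtype, while reassigning by column label replaces the columns outright.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, 2.0], "b": [3.5, 4.5], "c": [5.0, 6.0]})

# Reassigning by label replaces the first two columns entirely,
# so the new int dtype sticks; column "c" is untouched
df1[df1.columns[0:2]] = df1.iloc[:, 0:2].astype("int")
print(df1.dtypes)
```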
Just don't use iloc. You can loop over the 27 columns and convert them to the data type you want:
df.info()
my_columns = df.columns.to_list()[0:27]
for i in my_columns:
    df[i] = df[i].astype('int32')
df.info()

Trying to convert a column with strings to float via Pandas

Hi, I have looked on Stack Overflow but have not found a solution to my problem. Any help is highly appreciated.
After importing a csv I noticed that all the column types are object and not float.
My goal is to convert all the columns except the YEAR column to float. I have read that you first have to strip the columns to remove blanks, then convert NaNs to 0, and then try to convert the strings to floats. But with the code below I'm getting an error.
My code in my Jupyter notebook is:
And I get the following error.
How do I have to change the code?
All the columns except the YEAR column have to be set to float.
If you could also help me convert the YEAR column to datetime, that would be very nice. But my main problem is getting the data right so I can start making calculations.
Thanks
Runy
The easiest would be:
df = df.astype(float)
df['YEAR'] = df['YEAR'].astype(int)
Also, your code fails because you have two columns with the same name, BBPWN. So when you do df['BBPWN'], you get a dataframe containing those two columns, and df['BBPWN'].str will then fail.
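A tiny sketch of the suggested approach, using made-up data in place of the original csv:

```python
import pandas as pd

# hypothetical frame: everything came in as strings (object dtype)
df = pd.DataFrame({"YEAR": ["2001", "2002"], "BBPWN": ["1.5", "2.5"]})

df = df.astype(float)                 # all columns to float
df["YEAR"] = df["YEAR"].astype(int)   # YEAR back to int

# YEAR can also be parsed as a datetime if preferred
year_dt = pd.to_datetime(df["YEAR"].astype(str), format="%Y")
```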

Python Pandas Dataframe Pulling cell value of Column B based on Column A

Struggling here. I'm probably missing something incredibly easy, but I'm beating my head on my desk while trying to learn Python and realizing that's probably not going to solve this for me.
I have a dataframe df and need to pull the value of column B based on the value of column A.
Here's what I can tell you about my dataset that should make this easier. Column A (FiscalYear) is unique, but despite holding years it was converted with to_numeric. Column B (Sales) is not necessarily unique and, like column A, was converted with to_numeric. This is what I have been trying, since I was able to do the same thing when finding the sales value using idxmax. However, for a specific value it returns an error:
v = df.at[df.FiscalYear == 2007.0, 'Sales']
I am getting ValueError: At based indexing on an integer index can only have integer indexers. I am certain that I am doing something wrong, but I can't quite put my finger on it.
And here's the code that is working for me.
v = df.at[df.FiscalYear.idxmax(), 'Sales']
No issues there, returning the proper value, etc.
Any help is appreciated. I saw a bunch of similar threads, but for some reason searching and blindly writing lines of code is failing me tonight.
You can use the .loc method:
df.Sales.loc[df.FiscalYear==2007.0]
This will be a pandas Series object.
If you want it as a list you can do:
df.Sales.loc[df.FiscalYear==2007.0].tolist()
Can you try this? Note that df.at[df.FiscalYear.eq(2007.0).index[0], 'Sales'] would just grab the first index label regardless of the condition; you need to filter the index by the mask first:
v = df.at[df.index[df.FiscalYear == 2007.0][0], 'Sales']
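A small sketch of why the .at call fails and what works instead (the data is invented to match the question's description):

```python
import pandas as pd

df = pd.DataFrame({"FiscalYear": [2006.0, 2007.0, 2008.0],
                   "Sales": [100.0, 250.0, 300.0]})

# .at expects a single scalar row label, so passing a boolean mask
# raises an error; .loc accepts the mask and returns a Series
sales_2007 = df.loc[df.FiscalYear == 2007.0, "Sales"]

# To stay with .at, look up the matching row label first
label = df.index[df.FiscalYear == 2007.0][0]
v = df.at[label, "Sales"]
```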

Pandas read scientific notation and change

I have a dataframe in pandas that I'm reading in from a csv.
One of my columns has values that include NaN, floats, and scientific notation, e.g. 5.3e-23.
My trouble is that as I read in the csv, pandas treats this data as object dtype, not the float32 it should be; I guess because it thinks the scientific-notation entries are strings.
I've tried to convert the dtype using df['speed'].astype(float) after it's been read in, and tried to specify the dtype as it's being read in using df = pd.read_csv('path/test.csv', dtype={'speed': np.float64}, na_values=['n/a']). This throws the error ValueError: cannot safely convert passed user dtype of <f4 for object dtyped data in column ...
So far neither of these methods has worked. Am I missing something that is an incredibly easy fix?
This question seems to suggest I can specify known values that might throw an error, but I'd prefer to convert the scientific notation back to a float if possible.
EDITED TO SHOW DATA FROM CSV AS REQUESTED IN COMMENTS
7425616,12375,28,2015-08-09 11:07:56,0,-8.18644,118.21463,2,0,2
7425615,12375,28,2015-08-09 11:04:15,0,-8.18644,118.21463,2,NaN,2
7425617,12375,28,2015-08-09 11:09:38,0,-8.18644,118.2145,2,0.14,2
7425592,12375,28,2015-08-09 10:36:34,0,-8.18663,118.2157,2,0.05,2
65999,1021,29,2015-01-30 21:43:26,0,-8.36728,118.29235,1,0.206836151554794,2
204958,1160,30,2015-02-03 17:53:37,2,-8.36247,118.28664,1,9.49242000872744e-05,7
384739,,32,2015-01-14 16:07:02,1,-8.36778,118.29206,2,Infinity,4
275929,1160,30,2015-02-17 03:13:51,1,-8.36248,118.28656,1,113.318511172611,5
It's hard to say without seeing your data, but it seems the problem is that your rows contain something other than numbers and 'n/a' values. You could load your dataframe and then convert it to numeric, as shown in the answers to that question. If you have pandas version >= 0.17.0 then you could use the following:
df1 = df.apply(pd.to_numeric, errors='coerce')
Then you could drop the rows with NA values with dropna, or fill them with zeros with fillna.
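A short sketch of the coercion (sample values echo the question's data; the column name is hypothetical):

```python
import pandas as pd

# strings as read_csv might leave them: plain floats, scientific
# notation, and a missing value
speed = pd.Series(["0.14", "9.49242000872744e-05", None, "0.05"])

# to_numeric parses the scientific notation and coerces anything
# unparseable to NaN instead of raising
speed_num = pd.to_numeric(speed, errors="coerce")
```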
I realised it was the Infinity value causing the issue in my data. Removing it with a find-and-replace worked.
@Anton Protopopov's answer also works, as did @DSM's comment pointing out that I wasn't assigning the result back: df['speed'] = df['speed'].astype(float).
Thanks for the help.
In my case, using Series.round() worked:
df['column'] = df['column'].round(2)
