How to convert "object" columns to "datetime" and keep NaNs as-is - python

I want to convert these "object' columns to "datetime"
I've tried this:
dashboard[['started_at_ahc', 'ended_at_ahc']] = dashboard[['started_at_ahc', 'ended_at_ahc']].apply(pd.to_datetime, errors="coerce")
I want to keep NaN values as NaN, but the code above converted the NaNs to Sep 21, 1677 2:17 AM. How can I fix that, i.e. convert the object columns to datetime while at the same time keeping the NaNs as NaNs?

Pass errors='ignore' to the to_datetime function.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
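For what it's worth, with the keyword spelled correctly (errors, not erros), errors='coerce' already keeps missing values as NaT rather than a real date. A minimal sketch with made-up sample data:

import pandas as pd

# made-up sample frame; the real column names come from the question
dashboard = pd.DataFrame({
    'started_at_ahc': ['2021-03-01 10:00', None],
    'ended_at_ahc':   ['2021-03-01 12:30', None],
})
cols = ['started_at_ahc', 'ended_at_ahc']
dashboard[cols] = dashboard[cols].apply(pd.to_datetime, errors='coerce')
print(dashboard.dtypes)  # both columns are now datetime64[ns]
print(dashboard)         # missing entries stay NaT, not a 1677 timestamp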

The problem was in Streamlit itself. After the conversion, I used the command st.write(dashboard[['date_of_birth', 'started_at_ahc', 'ended_at_ahc']]), which filled each NaT value with what I think is an initial date the Streamlit developers use as a default for NaT. Using the same logic, and also trying your solution @Ismael EL ATIFI, in a Jupyter Notebook, the results were fine and everything was correct. The problem is only with Streamlit. I've posted an issue and am waiting for a reply.
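For anyone hitting the same display problem before Streamlit fixes it, one possible workaround (my own assumption, not an official fix) is to render the datetimes as strings, since Series.dt.strftime leaves NaT entries as missing:

cols = ['date_of_birth', 'started_at_ahc', 'ended_at_ahc']
display_df = dashboard[cols].apply(lambda s: s.dt.strftime('%Y-%m-%d %H:%M'))
st.write(display_df)  # NaT rows show as missing instead of a 1677 date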

Related

I need to change the type of a few columns in a pandas dataframe. Can't do so using iloc

In a dataframe with 40+ columns, I am trying to change the dtype of the first 27 columns from float to int using iloc:
df1.iloc[:,0:27]=df1.iloc[:,0:27].astype('int')
However, it's not working: I'm not getting any error, but the dtype is not changing either. It remains float.
Now the strangest part:
If I first change dtype for only 1st column (like below):
df1.iloc[:,0]=df1.iloc[:,0].astype('int')
and then run the earlier line of code:
df1.iloc[:,0:27]=df1.iloc[:,0:27].astype('int')
It works as required.
Any help understanding this, and a solution, would be greatly appreciated.
Thanks!
I guess it is a bug in pandas 1.0.5. I tested on my 1.0.5 and have the same issue as you. The .loc accessor has the same issue, so I guess the pandas devs broke something in iloc/loc. You need to update to the latest pandas or use a workaround. If you need a workaround, use assignment as follows:
df1[df1.columns[0:27]] = df1.iloc[:, 0:27].astype('int')
I tested it; this approach works around the bug and turns the first 27 columns into dtype int32.
Just don't use iloc. You can loop over the 27 columns and convert each to the data type you want:
df.info()
my_columns = df.columns.to_list()[0:27]
for i in my_columns:
    df[i] = df[i].astype('int32')
df.info()
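If you'd rather avoid the loop, a single astype call with a column-to-dtype mapping should work as well (a sketch, assuming the same first-27-columns selection):

int_cols = df.columns[0:27]
df = df.astype({col: 'int32' for col in int_cols})  # one pass, no loop
df.info()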

Trying to convert a column with strings to float via Pandas

Hi, I have looked on Stack Overflow but have not found a solution for my problem. Any help is highly appreciated.
After importing a csv I noticed that all the types of the columns are object and not float.
My goal is to convert all the columns except the YEAR column to float. I have read that you first have to strip blanks out of the columns, then convert NaNs to 0, and then convert the strings to floats. But with the code below I'm getting an error.
My code in a Jupyter notebook is:
And I get the following error.
How do I have to change the code?
All the columns except the YEAR column have to be set to float.
If you can help me set the column YEAR to datetime, that would also be very nice. But my main problem is getting the data right so I can start making calculations.
Thanks
Runy
Easiest would be
df = df.astype(float)
df['YEAR'] = df['YEAR'].astype(int)
Also, your code fails because you have two columns with the same name BBPWN, so when you do df['BBPWN'], you get a dataframe with those two columns, and df['BBPWN'].str then fails.
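Putting that together, here is a sketch of the strip-then-convert approach the question describes. The path is hypothetical, column names other than YEAR are assumptions, and note that pandas' read_csv deduplicates repeated headers like BBPWN into BBPWN.1 at read time:

import pandas as pd

df = pd.read_csv('data.csv')  # hypothetical path

num_cols = df.columns.drop('YEAR')
# strip blanks, coerce bad strings to NaN, then fill NaNs with 0
df[num_cols] = (df[num_cols]
                .apply(lambda s: pd.to_numeric(s.astype(str).str.strip(),
                                               errors='coerce'))
                .fillna(0))
df['YEAR'] = pd.to_datetime(df['YEAR'], format='%Y')  # year string -> datetime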

How to convert pandas data frame datetime column to int?

I am facing an issue while converting one of my datetime columns in pandas dataframe to int. My code is:
df['datetime_column'].astype(np.int64)
The error which I am getting is:
invalid literal for int() with base 10: '2018-02-25 09:31:15'
I am quite clueless about what is happening, as the conversion for some of my other datetime columns works fine. Is there some issue with the range of dates that can be converted to int?
You would use:
df['datetime_column'].apply(lambda x: x.toordinal())
If it fails, the cause could be that your column is an object and not a datetime, so you need:
df['datetime_column'] = pd.to_datetime(df['datetime_column'])
before converting to ordinals.
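If what you want is the epoch-based integer rather than the ordinal day, a minimal sketch (after the to_datetime conversion above):

import pandas as pd

s = pd.to_datetime(pd.Series(['2018-02-25 09:31:15']))
ints = s.astype('int64')  # nanoseconds since the Unix epoch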
If you are working on feature engineering, you can try creating the number of days between date1 and date2, booleans for whether the date falls in winter, summer, autumn, or spring (by looking at the month), and, if you have a time component, booleans for morning, noon, or night, all depending on your machine learning problem; see the sketch below.
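A sketch of those features, assuming hypothetical date1/date2 columns alongside the question's datetime_column:

df['days_between'] = (df['date2'] - df['date1']).dt.days
month = df['datetime_column'].dt.month
df['is_winter'] = month.isin([12, 1, 2])  # northern-hemisphere convention
hour = df['datetime_column'].dt.hour
df['is_morning'] = hour.between(5, 11)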
It seems you solved the problem yourself, judging from your comment. My guess is that you created the data frame without specifying that the column should be read as anything other than a string, so it's a string. If I'm right and you check the column type, it should show as object; if you check an individual entry in the column, it should show as a string.
If the issue is something else, please follow up.

Pandas dropna does not work as expected on a MultiIndex

I have a Pandas DataFrame with a multiIndex. The index consists of a date and a text string. Some of the values are NaN and when I use dropna(), the row disappears as expected. However, when I look at the index using df.index, the dropped dates are still there. This is problematic as when I use the to_panel function, the dropped dates reappear.
Am I using dropna incorrectly, or how can I resolve this?
I think it is pandas issue 2770, and the solution is described here:
index.get_level_values(level)
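In more recent pandas versions there is also MultiIndex.remove_unused_levels, which prunes the stale level values directly (a sketch, assuming pandas >= 0.20):

df = df.dropna()
df.index = df.index.remove_unused_levels()
df.index.levels  # the dropped dates no longer appear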
For me this actually worked:
df1 = df1[pd.notnull(df1['Column Name'])]

Pandas read scientific notation and change

I have a dataframe in pandas that i'm reading in from a csv.
One of my columns has values that include NaN, floats, and scientific notation, i.e. 5.3e-23
My trouble is that as I read in the csv, pandas views these data as an object dtype, not the float32 that it should be. I guess because it thinks the scientific notation entries are strings.
I've tried to convert the dtype using df['speed'].astype(float) after it's been read in, and tried to specify the dtype as it's being read in using df = pd.read_csv('path/test.csv', dtype={'speed': np.float64}, na_values=['n/a']). This throws the error ValueError: cannot safely convert passed user dtype of <f4 for object dtyped data in column ...
So far neither of these methods has worked. Am I missing something that is an incredibly easy fix?
This question seems to suggest I can specify known values that might throw an error, but I'd prefer to convert the scientific notation back to a float if possible.
EDITED TO SHOW DATA FROM CSV AS REQUESTED IN COMMENTS
7425616,12375,28,2015-08-09 11:07:56,0,-8.18644,118.21463,2,0,2
7425615,12375,28,2015-08-09 11:04:15,0,-8.18644,118.21463,2,NaN,2
7425617,12375,28,2015-08-09 11:09:38,0,-8.18644,118.2145,2,0.14,2
7425592,12375,28,2015-08-09 10:36:34,0,-8.18663,118.2157,2,0.05,2
65999,1021,29,2015-01-30 21:43:26,0,-8.36728,118.29235,1,0.206836151554794,2
204958,1160,30,2015-02-03 17:53:37,2,-8.36247,118.28664,1,9.49242000872744e-05,7
384739,,32,2015-01-14 16:07:02,1,-8.36778,118.29206,2,Infinity,4
275929,1160,30,2015-02-17 03:13:51,1,-8.36248,118.28656,1,113.318511172611,5
It's hard to say without seeing your data, but it seems the problem is that your rows contain something other than numbers and 'n/a' values. You could load your dataframe and then convert it to numeric, as shown in the answers to that question. If you have pandas version >= 0.17.0, you could use the following:
df1 = df.apply(pd.to_numeric, errors='coerce')
Then you could drop rows with NA values with dropna, or fill them with zeros with fillna.
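A short sketch of that coerce-then-clean pattern on values like those in the question ('Infinity' typically parses to inf rather than NaN, so infinities may need replacing too):

import numpy as np
import pandas as pd

raw = pd.Series(['0.14', '9.49242000872744e-05', 'Infinity', 'n/a'])
speed = pd.to_numeric(raw, errors='coerce')          # unparseable -> NaN
speed = speed.replace([np.inf, -np.inf], np.nan).fillna(0)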
I realised it was the Infinity value causing the issue in my data. Removing it with a find-and-replace worked.
@Anton Protopopov's answer also works, as did @DSM's comment pointing out that I wasn't assigning the result: df['speed'] = df['speed'].astype(float).
Thanks for the help.
In my case, using the pandas round() method worked:
df['column'] = df['column'].round(2)
