I have a column in a dataframe that contains times in the format below.
Dataframe: df
column: time
value: 07:00:00, 13:00:00 or 14:00:00
The column will have only one of these three values in each row. I want to convert these to 0, 1 and 2. Can you help replace the times with these numeric values?
Current:
df['time'] = ['07:00:00', '13:00:00', '14:00:00']
Expected:
df['time'] = [0, 1, 2]
Thanks in advance.
You can use map to do this (assuming the column holds datetime.time objects):
import datetime
mapping = {datetime.time(7, 0, 0): 0, datetime.time(13, 0, 0): 1, datetime.time(14, 0, 0): 2}
df['time'] = df['time'].map(mapping)
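For instance, a minimal end-to-end sketch with a hypothetical sample frame whose 'time' column holds datetime.time objects:
import datetime
import pandas as pd

# hypothetical frame holding datetime.time values
df = pd.DataFrame({'time': [datetime.time(7, 0, 0), datetime.time(13, 0, 0), datetime.time(14, 0, 0)]})
mapping = {datetime.time(7, 0, 0): 0, datetime.time(13, 0, 0): 1, datetime.time(14, 0, 0): 2}
df['time'] = df['time'].map(mapping)
print(df['time'].tolist())  # [0, 1, 2]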
One approach is to use map
Ex:
val = {"07:00:00":0, "13:00:00":1, "14:00:00":2}
df = pd.DataFrame({'time':["07:00:00", "13:00:00", "14:00:00"] })
df["time"] = df["time"].map(val)
print(df)
Output:
   time
0     0
1     1
2     2
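One thing to watch: map returns NaN for any value that has no key in the dict. If stray values are possible, a hedged guard (the fallback value -1 here is just a hypothetical choice):
df["time"] = df["time"].map(val).fillna(-1).astype(int)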
I have a dataframe (DF1) where each Personal-ID has 3 dates associated with that ID.
I have created a dataframe (DF_ID) with one row per Personal-ID and a (currently blank) column for each respective date, and I would like to load/loop the 3 dates per Personal-ID from DF1 into the respective date columns of the final dataframe.
I am trying to learn Python and have tried a number of coding scripts to accomplish this, such as:
for index, row in df_bnp_5.iterrows():
    df_id['Date-1'] = row.loc[0, 'hv_lab_test_dt']
    df_id['Date-2'] = row.loc[1, 'hv_lab_test_dt']
    df_id['Date-3'] = row.loc[2, 'hv_lab_test_dt']

for i in range(len(df_bnp_5)):
    df_id['Date-1'] = df1.iloc[i, 0]
    df_id['Date-2'] = df1.iloc[i, 2]
Any assistance would be appreciated.
Thank You!
Here is one way. I created a 'helper' column to arrange the dates for each Personal-ID.
import pandas as pd
# create data frame
df = pd.DataFrame({'Personal-ID': [1, 1, 1, 5, 5, 5],
'Date': ['10/01/2019', '12/28/2019', '05/08/2020',
'01/19/2020', '06/05/2020', '07/19/2020']})
# change data type
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
# create grouping key
df['x'] = df.groupby('Personal-ID')['Date'].rank().astype(int)
# convert to wide table
df = df.pivot(index='Personal-ID', columns='x', values='Date')
# change column names
df = df.rename(columns={1: 'Date-1', 2: 'Date-2', 3: 'Date-3'})
print(df)
x Date-1 Date-2 Date-3
Personal-ID
1 2019-10-01 2019-12-28 2020-05-08
5 2020-01-19 2020-06-05 2020-07-19
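One robustness note: rank yields tied, non-integer ranks if an ID ever repeats a date, which would break both the astype(int) and the pivot. Sorting plus cumcount is a sturdier way to build the grouping key, a sketch that would replace the rank line above:
df = df.sort_values(['Personal-ID', 'Date'])
df['x'] = df.groupby('Personal-ID')['Date'].cumcount() + 1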
Is there a way of parsing the column names themselves as datetimes? My column names look like this:
Name SizeRank 1996-06 1996-07 1996-08 ...
I know that I can convert values for a column to datetime values, e.g for a column named datetime, I can do something like this:
temp = pd.read_csv('data.csv', parse_dates=['datetime'])
Is there a way of converting the column names themselves? I have 285 such columns, i.e. my data runs from 1996 to 2019.
There's no way of doing that immediately while reading the data from a file afaik, but you can fairly simply convert the columns to datetime after you've read them in. You just need to watch out that you don't pass columns that don't actually contain a date to the function.
Could look something like this, assuming all columns after the first two are dates (as in your example):
dates = pd.to_datetime(df.columns[2:])
You can then do whatever you need to do with those datetimes.
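For example, to put the parsed names back on the frame (a sketch, assuming the first two columns should stay as plain strings):
df.columns = list(df.columns[:2]) + list(dates)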
You could do something like this.
df.columns = df.columns[:2].append(pd.to_datetime(df.columns[2:]))
It seems pandas will accept a datetime object as a column name...
import pandas as pd
import re

columns = ["Name", "2019-01-01", "2019-01-02"]
data = [["Tom", 1, 0], ["Dick", 1, 1], ["Harry", 0, 0]]
df = pd.DataFrame(data, columns=columns)
print(df)

newcolumns = {}
for col in df.columns:
    if re.search(r"\d+-\d+-\d+", col):
        newcolumns[col] = pd.to_datetime(col)
    else:
        newcolumns[col] = col
print(newcolumns)

df.rename(columns=newcolumns, inplace=True)
print("--------------------")
print(df)
print("--------------------")
for col in df.columns:
    print(type(col), col)
OUTPUT:
    Name  2019-01-01  2019-01-02
0    Tom           1           0
1   Dick           1           1
2  Harry           0           0
{'Name': 'Name', '2019-01-01': Timestamp('2019-01-01 00:00:00'), '2019-01-02': Timestamp('2019-01-02 00:00:00')}
--------------------
    Name  2019-01-01 00:00:00  2019-01-02 00:00:00
0    Tom                    1                    0
1   Dick                    1                    1
2  Harry                    0                    0
--------------------
<class 'str'> Name
<class 'pandas._libs.tslibs.timestamps.Timestamp'> 2019-01-01 00:00:00
<class 'pandas._libs.tslibs.timestamps.Timestamp'> 2019-01-02 00:00:00
For brevity you can use...
newcolumns = {col: (pd.to_datetime(col) if re.search(r"\d+-\d+-\d+", col) else col) for col in df.columns}
df.rename(columns=newcolumns, inplace=True)
I've got the following issue.
I have a column in my pandas dataframe with some dates and some empty values.
Example:
1 - 3-20-2019
2 -
3 - 2-25-2019
etc
I want to convert the format from month-day-year to day-month-year, and when it's empty, I just want to keep it empty.
What is the fastest approach?
Thanks!
One can initialize the data for the days using strings, then convert the strings to datetimes. A print can then deliver the objects in the needed format.
I will use another format (with dots as separators) so that the conversion between the steps is clear.
Sample code first:
import pandas as pd
data = {'day': ['3-20-2019', None, '2-25-2019'] }
df = pd.DataFrame( data )
df['day'] = pd.to_datetime(df['day'])
df['day'] = df['day'].dt.strftime('%d.%m.%Y')
df[ df == 'NaT' ] = ''
Comments on the above.
The first instance of df is in the ipython interpreter:
In [56]: df['day']
Out[56]:
0 3-20-2019
1 None
2 2-25-2019
Name: day, dtype: object
After the conversion to datetime:
In [58]: df['day']
Out[58]:
0 2019-03-20
1 NaT
2 2019-02-25
Name: day, dtype: datetime64[ns]
so that we have
In [59]: df['day'].dt.strftime('%d.%m.%Y')
Out[59]:
0 20.03.2019
1 NaT
2 25.02.2019
Name: day, dtype: object
That NaT causes problems, so we replace all its occurrences with the empty string.
In [73]: df[ df=='NaT' ] = ''
In [74]: df
Out[74]:
day
0 20.03.2019
1
2 25.02.2019
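A version caveat: in newer pandas releases, dt.strftime yields NaN rather than the string 'NaT' for missing dates, so a guard that covers both cases is safer, e.g.:
df['day'] = df['day'].replace('NaT', '').fillna('')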
Not sure if this is the fastest way to get it done. Anyway,
df = pd.DataFrame({'Date': {0: '3-20-2019', 1:"", 2:"2-25-2019"}}) #your dataframe
df['Date'] = pd.to_datetime(df.Date) #convert to datetime format
df['Date'] = [d.strftime('%d-%m-%Y') if not pd.isnull(d) else '' for d in df['Date']]
Output:
Date
0 20-03-2019
1
2 25-02-2019
Every time I append a dataframe to a text file, I want it to contain a column with the same timestamp for each row. The timestamp could be any arbitrary time, as long as it's different from the next time I append a new dataframe to the existing text file. The code below inserts a column named TimeStamp, but doesn't actually insert datetime values. The column is simply empty. I must be overlooking something simple. What am I doing wrong?
t = [datetime.datetime.now().replace(microsecond=0) for i in range(df.shape[0])]
s = pd.Series(t, name = 'TimeStamp')
df.insert(0, 'TimeStamp', s)
I think the simplest is to use insert only:
df = pd.DataFrame({'A': list('AAA'), 'B': range(3)}, index=list('xyz'))
print (df)
A B
x A 0
y A 1
z A 2
df.insert(0, 'TimeStamp', pd.to_datetime('now').replace(microsecond=0))
print (df)
TimeStamp A B
x 2018-02-15 07:35:35 A 0
y 2018-02-15 07:35:35 A 1
z 2018-02-15 07:35:35 A 2
Your working version - build the Series with the DataFrame's own index, so insert can align them (otherwise the Series' default RangeIndex doesn't match yours and you get missing values):
t = [datetime.datetime.utcnow().replace(microsecond=0) for i in df.index]
s = pd.Series(t, name='TimeStamp', index=df.index)
df.insert(0, 'TimeStamp', s)
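To tie this back to the append-to-file workflow in the question, a minimal sketch (log.txt is a hypothetical file name):
df.insert(0, 'TimeStamp', pd.to_datetime('now').replace(microsecond=0))
df.to_csv('log.txt', mode='a', header=False)  # append rows, all stamped with the same time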
I have a pandas dataframe indexed by time. I want to know the total number of observations (i.e. dataframe rows) that happen each day.
Here is my dataframe:
import pandas as pd
data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994', '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592', '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109', '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877'],
'value': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
df = pd.DataFrame(data, columns = ['date', 'value'])
print(df)
What I want is a dataframe (or series) that looks like this:
date value
0 2014-05-01 2
1 2014-05-02 4
2 2014-05-03 2
3 2014-05-04 2
After reading through a bunch of StackOverflow questions, the closest I can get is:
df['date'].groupby(df.index.map(lambda t: t.day))
But that doesn't produce anything of use.
Use resampling. You'll need the date column to be datetime data type (as is, the values are strings) and you'll need to set it as the index to use resampling.
In [13]: df['date'] = pd.to_datetime(df['date'])
In [14]: df.set_index('date').resample('D').count()
Out[14]:
value
date
2014-05-01 2
2014-05-02 4
2014-05-03 2
2014-05-04 2
You can chain any of the built-in convenience aggregations, including .count(), .sum(), etc., or apply an arbitrary function.
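If you'd rather not set the index at all, pd.Grouper gives the same daily buckets, e.g.:
df.groupby(pd.Grouper(key='date', freq='D'))['value'].count()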
Wow, @Jeff wins:
df.set_index('date').resample('D').count()
My worse answer:
The first problem is that your date column contains strings, not datetimes. Using code from this thread:
import dateutil
df['date'] = df['date'].apply(dateutil.parser.parse)
Then it's trivial, and you had the right idea:
grouped = df.groupby(df['date'].apply(lambda x: x.date()))
grouped['value'].count()
I know nothing about pandas, but in Python you could do something like:
data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994', '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592', '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109', '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877'],
'value': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
import datetime
dates = [datetime.datetime.strptime(ts, '%Y-%m-%d %H:%M:%S.%f').strftime('%Y-%m-%d') for ts in data['date']]
cnt = {}
for d in dates:
    cnt[d] = cnt.get(d, 0) + 1
for i, k in enumerate(sorted(cnt)):
    print("%d %s %d" % (i, k, cnt[k]))
Which would output:
0 2014-05-01 2
1 2014-05-02 4
2 2014-05-03 2
3 2014-05-04 2
If you didn't care about parsing and reformatting your datetime strings, I suppose something like
dates = [d[0:10] for d in data['date']]
could replace the longer dates=... line, but it seems less robust.
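For what it's worth, collections.Counter condenses the counting loop to one line, a sketch using that same slicing shortcut:
from collections import Counter
cnt = Counter(d[:10] for d in data['date'])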
As exp1orer mentions, you'll need to convert the string dates to date format. Or, if you simply want to count observations and don't care about the date format, you can take the first 10 characters of the date column and then use the value_counts() method. (Personally, I prefer this to groupby + sum for simple observation counts.)
You can achieve what you need with a one-liner:
In [93]: df.date.str[:10].value_counts()
Out[93]:
2014-05-02 4
2014-05-04 2
2014-05-01 2
2014-05-03 2
dtype: int64
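If you want the counts back in date order rather than frequency order, sorting on the index does it:
df.date.str[:10].value_counts().sort_index()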