I have a dataset which looks like below
[25/May/2015:23:11:15 000]
[25/May/2015:23:11:15 000]
[25/May/2015:23:11:16 000]
[25/May/2015:23:11:16 000]
Now i have made this into a DF and df[0] has [25/May/2015:23:11:15 and df[1] has 000]. I want to send all the data which ends with same seconds to a file. in the above example they end with 15 and 16 as seconds. So all ending with 15 seconds into one and the other into a different one and many more
I have tried the below code
import pandas as pd
data = pd.read_csv('apache-access-log.txt', sep=" ", header=None)
df = pd.DataFrame(data)
print(df[0],df[1].str[-2:])
Converting that column to a datetime would make it easier to work on, e.g.:
df['date'] = pd.to_datetime(df['date'], format='%d/%B/%Y:%H:%m:%S')
The you can simply iterate over a groupby(), e.g.:
In []:
for k, frame in df.groupby(df['date'].dt.second):
#frame.to_csv('file{}.csv'.format(k))
print('{}\n{}\n'.format(k, frame))
Out[]:
15
date value
0 2015-11-25 23:00:15 0
1 2015-11-25 23:00:15 0
16
date value
2 2015-11-25 23:00:16 0
3 2015-11-25 23:00:16 0
You can set your datetime as the index for the dataframe, and then use loc and to_csv Pandas' functions. Obviously, as other answers points out, you should convert your date to datetime while reading your dataframe.
Example:
df = df.set_index(['date'])
df.loc['25/05/2018 23:11:15':'25/05/2018 23:11:15'].to_csv('df_data.csv')
Try out this,
## Convert a new column with seconds value
df['seconds'] = df.apply(lambda row: row[0].split(":")[3].split(" ")[0], axis=1)
for sec in df['seconds'].unique():
## filter by seconds
print("Resutl ",df[df['seconds'] == sec])
Related
I have the following DataFrame with a Date column,
0 2021-12-13
1 2021-12-10
2 2021-12-09
3 2021-12-08
4 2021-12-07
...
7990 1990-01-08
7991 1990-01-05
7992 1990-01-04
7993 1990-01-03
7994 1990-01-02
I am trying to find the index for a specific date in this DataFrame using the following code,
# import raw data into DataFrame
df = pd.DataFrame.from_records(data['dataset']['data'])
df.columns = data['dataset']['column_names']
df['Date'] = pd.to_datetime(df['Date'])
# sample date to search for
sample_date = dt.date(2021,12,13)
print(sample_date)
# return index of sample date
date_index = df.index[df['Date'] == sample_date].tolist()
print(date_index)
The output of the program is,
2021-12-13
[]
I can't understand why. I have cast the Date column in the DataFrame to a DateTime and I'm doing a like-for-like comparison.
I have reproduced your Dataframe with minimal samples. By changing the way that you can compare the date will work like this below.
import pandas as pd
import datetime as dt
df = pd.DataFrame({'Date':['2021-12-13','2021-12-10','2021-12-09','2021-12-08']})
df['Date'] = pd.to_datetime(df['Date'].astype(str), format='%Y-%m-%d')
sample_date = dt.datetime.strptime('2021-12-13', '%Y-%m-%d')
date_index = df.index[df['Date'] == sample_date].tolist()
print(date_index)
output:
[0]
The search data was in the index number 0 of the DataFrame
Please let me know if this one has any issues
Just as the title says, I am trying to convert my DataFrame lables to type datetime. In the following attempted solution I pulled the labels from the DataFrame to dates_index and tried converting them to datetime by using the function DatetimeIndex.to_datetime, however, my compiler says that DatetimeIndex has no attribute to_datetime.
dates_index = df.index[0::]
dates = DatetimeIndex.to_datetime(dates_index)
I've also tried using the pandas.to_datetime function.
dates = pandas.to_datetime(dates_index, errors='coerce')
This returns the datetime wrapped in DatetimeIndex instead of just datetimes.
My DatetimeIndex labels contain data for date and time and my goal is to push that data into two seperate columns of the DataFrame.
if your DateTimeIndex is myindex, then
df.reset_index() will create a myindex column, which you can do what you want with, and if you want to make it an index again later, you can revert by `df.set_index('myindex')
You can set the index after converting the datatype of the column.
To convert datatype to datetime, use: to_datetime
And, to set the column as index use: set_index
Hope this helps!
import pandas as pd
df = pd.DataFrame({
'mydatecol': ['06/11/2020', '06/12/2020', '06/13/2020', '06/14/2020'],
'othcol1': [10, 20, 30, 40],
'othcol2': [1, 2, 3, 4]
})
print(df)
print(f'Index type is now {df.index.dtype}')
df['mydatecol'] = pd.to_datetime(df['mydatecol'])
df.set_index('mydatecol', inplace=True)
print(df)
print(f'Index type is now {df.index.dtype}')
Output is
mydatecol othcol1 othcol2
0 06/11/2020 10 1
1 06/12/2020 20 2
2 06/13/2020 30 3
3 06/14/2020 40 4
Index type is now int64
othcol1 othcol2
mydatecol
2020-06-11 10 1
2020-06-12 20 2
2020-06-13 30 3
2020-06-14 40 4
Index type is now datetime64[ns]
I found a quick solution to my problem. You can create a new pandas column based on the index and then use datetime to reformat the date.
df['date'] = df.index # Creates new column called 'date' of type Timestamp
df['date'] = df['date'].dt.strftime('%m/%d/%Y %I:%M%p') # Date formatting
I would like to convert a date column in a Dataframe to unixtime in a new column.
Index Date Second Measurement
0 0 2020-02-24 10:52:38 0.000 0.001155460021
1 1 2020-02-24 10:52:39 0.109 0.001124729984
2 2 2020-02-24 10:52:40 0.203 0.001119069988
I tried a lot, but always get an error. This does not work:
laser['unixtime'] = laser['Date'].arrow.timestamp()
laser['unixtime'] = laser['Date'].timestamp()
laser['unixtime'] = time.mktime(laser['Date'].timetuple())
Can anyone help me out?
greets
Solution with two examples (one when the Date column is a string and one when it is not).
import pandas as pd
from datetime import timezone
# Suppress scientific notation in Pandas
pd.set_option('display.float_format', lambda x: f"{x:.0f}")
df = pd.DataFrame()
df["Date"] = pd.date_range(start='2014-08-01 09:00', freq='H', periods=3, tz='Europe/Berlin')
df["DateUnix"] = df.Date.map(lambda x: x.replace(tzinfo=timezone.utc).timestamp())
df
df2 = pd.DataFrame({"Date": ["2020-02-24 10:52:38"]})
# Convert from object to datetime
df2.Date = pd.to_datetime(df2.Date)
df2["DateUnix"] = df2.Date.map(lambda x: x.replace(tzinfo=timezone.utc).timestamp())
df2
I'm trying to create a new date column based on an existing date column in my dataframe. I want to take all the dates in the first column and make them the first of the month in the second column so:
03/15/2019 = 03/01/2019
I know I can do this using:
df['newcolumn'] = pd.to_datetime(df['oldcolumn'], format='%Y-%m-%d').apply(lambda dt: dt.replace(day=1)).dt.date
My issues is some of the data in the old column is not valid dates. There is some text data in some of the rows. So, I'm trying to figure out how to either clean up the data before I do this like:
if oldcolumn isn't a date then make it 01/01/1990 else oldcolumn
Or, is there a way to do this with try/except?
Any assistance would be appreciated.
At first we generate some sample data:
df = pd.DataFrame([['2019-01-03'], ['asdf'], ['2019-11-10']], columns=['Date'])
This can be safely converted to datetime
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
mask = df['Date'].isnull()
df.loc[mask, 'Date'] = dt.datetime(1990, 1, 1)
Now you don't need the slow apply
df['New'] = df['Date'] + pd.offsets.MonthBegin(-1)
Try with the argument errors=coerce.
This will return NaT for the text values.
df['newcolumn'] = pd.to_datetime(df['oldcolumn'],
format='%Y-%m-%d',
errors='coerce').apply(lambda dt: dt.replace(day=1)).dt.date
For example
# We have this dataframe
ID Date
0 111 03/15/2019
1 133 01/01/2019
2 948 Empty
3 452 02/10/2019
# We convert Date column to datetime
df['Date'] = pd.to_datetime(df.Date, format='%m/%d/%Y', errors='coerce')
Output
ID Date
0 111 2019-03-15
1 133 2019-01-01
2 948 NaT
3 452 2019-02-10
I have the following subset with a starting date (DD/MM/YYYY) and Amount
Start Date Amount
1 01/01/2013 20
2 02/05/2007 10
3 01/05/2004 15
4 01/06/2014 20
5 17/08/2008 21
I'd like to create a subset of this dataframe where only where the Start Date Day is 01:
Start Date Amount
1 01/01/2013 20
3 01/05/2004 15
4 01/06/2014 20
I've tried to loop through the table and use the index but couldn't find a suitable way to iterate through a dataframe rows.
Assuming your dates are datetime already then the following should work, if they are strings you can convert them using to_datetime so df['Start Date'] = pd.to_datetime(df['Start Date']), you may also need to pass param dayfirst = True if required. If you imported the data using read_csv you could've done this at the point of import so df = pd.read_csv('data.csv', parse_dates=[n], dayfirst=True) where n is the index (0-based of course) so if it was the first then pass parse_dates=[0].
One method could be to apply a lambda to the column and use the boolean index returned this to index against:
In [19]:
df[df['Start Date'].apply(lambda x: x.day == 1)]
Out[19]:
Start Date Amount
index
1 2013-01-01 20
3 2004-05-01 15
4 2014-06-01 20
Not sure if there is a built in method that doesn't involve setting this to be the index which will convert it into a timeseries index.