extract data in specific interval - python

I have a table with two columns: date (01-01-2010 to 31-08-2021) and value (mm).
I would like to get only the data from 2020. Is there a function or similar to select only the data in a specific period?
For example, to create a pivot table.

Try this:
import pandas as pd
df = pd.DataFrame(
    {'date': ['27-02-2010', '31-1-2020', '31-1-2021', '02-1-2020', '13-2-2020',
              '07-2-2019', '30-4-2018', '04-8-2020', '06-4-2013', '21-6-2020'],
     'value': ['foo', 'bar', 'lorem', 'ipsum', 'alpha', 'omega', 'big', 'small', 'salt', 'pepper']})
# The dates are day-first, so pass the format explicitly
df[pd.to_datetime(df['date'], format='%d-%m-%Y').dt.year == 2020]
Output:
date value
1 31-1-2020 bar
3 02-1-2020 ipsum
4 13-2-2020 alpha
7 04-8-2020 small
9 21-6-2020 pepper
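Or, to search any range, convert the column once and compare it against the bounds. A half-open interval (>= start, < day after the end) keeps every row from 1 January and 31 December:
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
df[(df['date'] >= pd.Timestamp(2020, 1, 1)) & (df['date'] < pd.Timestamp(2021, 1, 1))]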
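Since the question also mentions building a pivot, here is a minimal sketch that combines the range filter with `pivot_table`. The sample rows and the monthly-sum aggregation are assumptions for illustration, not the asker's real data:

```python
import pandas as pd

# Hypothetical daily readings in the question's dd-mm-yyyy format
df = pd.DataFrame({
    'date': ['31-12-2019', '01-01-2020', '15-01-2020', '03-02-2020', '01-01-2021'],
    'value': [5.0, 1.0, 2.5, 4.0, 3.0],
})
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')

# Half-open interval [2020-01-01, 2021-01-01) keeps every timestamp in 2020
mask = (df['date'] >= '2020-01-01') & (df['date'] < '2021-01-01')
df_2020 = df[mask].assign(month=df.loc[mask, 'date'].dt.month)

# Total value per month of 2020
pivot = df_2020.pivot_table(index='month', values='value', aggfunc='sum')
print(pivot)
```

Swapping `aggfunc='sum'` for `'mean'` would give monthly averages instead; which one fits depends on what the mm values represent.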

Here is an example of how you can return the values from a dataset based on the year using string slicing. If this doesn't match your situation, please edit your post with a specific code example.
import pandas as pd
df = pd.DataFrame(
    {'date': ['27-02-2010', '31-1-2020', '31-1-2021', '02-1-2020', '13-2-2020', '07-2-2019', '30-4-2018', '04-8-2020', '06-4-2013', '21-6-2020'],
     'value': ['foo', 'bar', 'lorem', 'ipsum', 'alpha', 'omega', 'big', 'small', 'salt', 'pepper']})
for _, row in df.iterrows():
    # the last four characters of a dd-mm-yyyy string are the year
    if row['date'][-4:] == '2020':
        print(row['value'])
This prints only the values from the DataFrame whose date falls in 2020.
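The same string-slicing idea can be written without a Python loop by using the `.str` accessor, which slices every date string at once. This sketch reuses the example data from the answer above and assumes every date ends in a four-digit year:

```python
import pandas as pd

df = pd.DataFrame(
    {'date': ['27-02-2010', '31-1-2020', '31-1-2021', '02-1-2020', '13-2-2020',
              '07-2-2019', '30-4-2018', '04-8-2020', '06-4-2013', '21-6-2020'],
     'value': ['foo', 'bar', 'lorem', 'ipsum', 'alpha', 'omega', 'big', 'small', 'salt', 'pepper']})

# .str[-4:] takes the last four characters (the year) of every date string
values_2020 = df.loc[df['date'].str[-4:] == '2020', 'value']
print(values_2020.tolist())
```

Because it returns a Series rather than printing, the result can be passed straight into further processing such as a pivot.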

Pandas has extensive time series features that you may want to use, but for a simpler approach, you could define the date column as the index and then slice the data (assuming the table is already sorted by date):
import pandas as pd
df = pd.DataFrame({'date': ['31-12-2019', '01-01-2020', '01-07-2020',
'31-12-2020', '01-01-2021'],
'value': [1, 2, 3, 4, 5]})
df.index = df.date
df.loc['01-01-2020':'31-12-2020']
date value
date
01-01-2020 01-01-2020 2
01-07-2020 01-07-2020 3
31-12-2020 31-12-2020 4
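The slice above works on the strings' positions in the index, which is why the table must already be sorted. A sketch of the time-series route, assuming the same dd-mm-yyyy strings: once the index is a real `DatetimeIndex`, pandas' partial string indexing selects a whole year by date, regardless of the original row order:

```python
import pandas as pd

df = pd.DataFrame({'date': ['31-12-2019', '01-01-2020', '01-07-2020',
                            '31-12-2020', '01-01-2021'],
                   'value': [1, 2, 3, 4, 5]})
# Parse the dd-mm-yyyy strings so the index compares chronologically
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
df = df.set_index('date').sort_index()

# Partial string indexing: '2020' selects every row in that year
print(df.loc['2020'])
# An explicit date range works too; the end date includes the whole day
print(df.loc['2020-01-01':'2020-12-31'])
```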

Related

In pandas dataframes, how would you convert all index labels as type DatetimeIndex to datetime.datetime?

Just as the title says, I am trying to convert my DataFrame labels to type datetime. In the following attempted solution I pulled the labels from the DataFrame into dates_index and tried converting them to datetime using DatetimeIndex.to_datetime; however, Python says that DatetimeIndex has no attribute to_datetime.
dates_index = df.index[0::]
dates = DatetimeIndex.to_datetime(dates_index)
I've also tried using the pandas.to_datetime function.
dates = pandas.to_datetime(dates_index, errors='coerce')
This returns the datetime wrapped in DatetimeIndex instead of just datetimes.
My DatetimeIndex labels contain both the date and the time, and my goal is to put that data into two separate columns of the DataFrame.
If your DatetimeIndex is named myindex, then df.reset_index() will create a myindex column that you can work with as you like. If you want to make it the index again later, you can revert with df.set_index('myindex').
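For the asker's actual goal (date and time in two separate columns), a `DatetimeIndex` already exposes `.date` and `.time`, and `to_pydatetime()` returns plain `datetime.datetime` objects. A minimal sketch; the index name `myindex` and the sample timestamps are hypothetical:

```python
import pandas as pd

# Hypothetical frame whose index is a DatetimeIndex named 'myindex'
idx = pd.DatetimeIndex(['2020-06-11 09:30', '2020-06-12 14:45'], name='myindex')
df = pd.DataFrame({'value': [1, 2]}, index=idx)

# Plain datetime.datetime objects instead of pandas Timestamps
py_dates = df.index.to_pydatetime()

# Split the index into separate date and time columns
df['date'] = df.index.date
df['time'] = df.index.time

# Or move the index into an ordinary column first
flat = df.reset_index()
print(flat)
```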
You can set the index after converting the column's data type.
To convert the data type to datetime, use to_datetime.
To set the column as the index, use set_index.
Hope this helps!
import pandas as pd
df = pd.DataFrame({
'mydatecol': ['06/11/2020', '06/12/2020', '06/13/2020', '06/14/2020'],
'othcol1': [10, 20, 30, 40],
'othcol2': [1, 2, 3, 4]
})
print(df)
print(f'Index type is now {df.index.dtype}')
df['mydatecol'] = pd.to_datetime(df['mydatecol'])
df.set_index('mydatecol', inplace=True)
print(df)
print(f'Index type is now {df.index.dtype}')
Output is
mydatecol othcol1 othcol2
0 06/11/2020 10 1
1 06/12/2020 20 2
2 06/13/2020 30 3
3 06/14/2020 40 4
Index type is now int64
othcol1 othcol2
mydatecol
2020-06-11 10 1
2020-06-12 20 2
2020-06-13 30 3
2020-06-14 40 4
Index type is now datetime64[ns]
I found a quick solution to my problem. You can create a new pandas column from the index and then use .dt.strftime to format the date as a string.
df['date'] = df.index # Creates new column called 'date' of type Timestamp
df['date'] = df['date'].dt.strftime('%m/%d/%Y %I:%M%p') # Date formatting

Create a new column in a dataframe that shows Day of the Week from an already existing dd/mm/yy column? Python

I have a dataframe that contains a column with dates e.g. 24/07/15 etc
Is there a way to create a new column into the dataframe that displays all the days of the week corresponding to the already existing 'Date' column?
I want the output to appear as:
[Date][DayOfTheWeek]
This might work:
If you want day name:
In [1405]: df
Out[1405]:
dates
0 24/07/15
1 25/07/15
2 26/07/15
In [1406]: df['dates'] = pd.to_datetime(df['dates'], format='%d/%m/%y') # Pass the format so day-first dates are not misread as month-first.
In [1408]: df['dow'] = df['dates'].dt.day_name()
In [1409]: df
Out[1409]:
dates dow
0 2015-07-24 Friday
1 2015-07-25 Saturday
2 2015-07-26 Sunday
If you want the day-of-week number (Monday=0, Sunday=6):
In [1410]: df['dow'] = df['dates'].dt.dayofweek
In [1411]: df
Out[1411]:
dates dow
0 2015-07-24 4
1 2015-07-25 5
2 2015-07-26 6
I would try the apply function, so something like this:
def extractDayOfWeek(dateString):
    ...
df['DayOfWeek'] = df.apply(lambda x: extractDayOfWeek(x['Date']), axis=1)
The idea is that you map over every row, extract the 'Date' column, and then apply your own function to create a new column named 'DayOfWeek'.
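One possible body for the placeholder function, assuming the column holds dd/mm/yy strings such as '24/07/15': parse with `datetime.strptime` and format with `%A` (the full weekday name, in the C/English locale). Applying to the single column is a simplification of the row-wise `apply` above:

```python
import datetime

import pandas as pd


def extractDayOfWeek(dateString):
    # Assumes dd/mm/yy strings such as '24/07/15'
    return datetime.datetime.strptime(dateString, '%d/%m/%y').strftime('%A')


df = pd.DataFrame({'Date': ['24/07/15', '25/07/15', '26/07/15']})
# Applying to one column avoids building a Series for every row
df['DayOfWeek'] = df['Date'].apply(extractDayOfWeek)
print(df)
```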
Depending on the type of your Date column, first convert it to datetime:
df['Date'] = pd.to_datetime(df['Date'], format="%d/%m/%y")
df['weekday'] = df['Date'].dt.dayofweek

How to create a pandas DataFrame column based on two other columns that holds dates?

I have a pandas DataFrame with two date columns (A and B), and I would like to create a third column (C) holding dates built from the month and year of column A and the day of column B. Obviously I would need to change the day for months in which that day doesn't exist: if we try to create 31 Feb 2020, it should become 29 Feb 2020.
For example
import pandas as pd
df = pd.DataFrame({'A': ['2020-02-21', '2020-03-21', '2020-03-21'],
'B': ['2020-01-31', '2020-02-11', '2020-02-01']})
for c in df.columns:
    df[c] = pd.to_datetime(df[c])
Then I want to create a new column C that is a new datetime that is:
year = df.A.dt.year
month = df.A.dt.month
day = df.B.dt.day
I don't know how to create this column. Can you please help?
Here is one way to do it, using pandas' time series functionality:
import pandas as pd
# your example data
df = pd.DataFrame({'A': ['2020-02-21', '2020-03-21', '2020-03-21'],
'B': ['2020-01-31', '2020-02-11', '2020-02-01']})
for c in df.columns:
    # keep using the same dataframe here
    df[c] = pd.to_datetime(df[c])
# set back every date from A to the end of the previous month,
# then add the number of days from the date in B
df['C'] = df.A - pd.offsets.MonthEnd() + pd.to_timedelta(df.B.dt.day, unit='D')
print(df)
Result:
A B C
0 2020-02-21 2020-01-31 2020-03-02
1 2020-03-21 2020-02-11 2020-03-11
2 2020-03-21 2020-02-01 2020-03-01
As you can see in row 0, this does not turn "February 31st" into 29 February as you suggested; instead the extra days roll over into March (2020-03-02), which is still a logical result.
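If you need the clamping behaviour the question describes (31 Feb becomes 29 Feb), one sketch is to cap B's day at the number of days in A's month using `dt.days_in_month`, then shift A to that day. It assumes A and B carry no time component that needs preserving, since `normalize()` drops it:

```python
import pandas as pd

df = pd.DataFrame({'A': ['2020-02-21', '2020-03-21', '2020-03-21'],
                   'B': ['2020-01-31', '2020-02-11', '2020-02-01']})
df = df.apply(pd.to_datetime)

# Cap B's day at the length of A's month (29 for February 2020)
day = df['B'].dt.day.clip(upper=df['A'].dt.days_in_month)
# Move A (at midnight) forward or back to the capped day within its own month
df['C'] = df['A'].dt.normalize() + pd.to_timedelta(day - df['A'].dt.day, unit='D')
print(df)
```

Row 0 then gives 2020-02-29 instead of rolling into March; rows 1 and 2 are unchanged.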

How to fill in missing dates and values in a Pandas DataFrame?

The dataset I am using contains only business days, but I want to change the date index so that it covers every calendar day. When I use reindex(), I am unsure how to use its fill_value argument to inherit the value from the row above.
import pandas as pd
idx = pd.date_range("12/18/2019","12/24/2019")
df = pd.Series({'12/18/2019':22.63,
'12/19/2019':22.2,
'12/20/2019':21.03,
'12/23/2019':17,
'12/24/2019':19.65})
df.index = pd.DatetimeIndex(df.index)
df = df.reindex()
Currently, my data set looks like this.
However, when I use reindex I get the below result
In reality I want it to inherit the values directly above if it is a NaN result so the data set becomes the following
Thank you guys for your help!
You were close! You just need to pass the index you want to reindex on (idx in this case) as a parameter to the reindex method, and then you can set the method parameter to 'ffill' to propagate the last valid value forward.
import pandas as pd
idx = pd.date_range("12/18/2019","12/24/2019")
df = pd.Series({'12/18/2019':22.63,
'12/19/2019':22.2,
'12/20/2019':21.03,
'12/23/2019':17,
'12/24/2019':19.65})
df.index = pd.DatetimeIndex(df.index)
df = df.reindex(idx, method='ffill')
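For comparison, `asfreq` can produce the same daily, forward-filled series in one call without building the date range by hand. This sketch derives the range from the data's first and last dates, which is an assumption; pass an explicit `date_range` to `reindex` if the desired range is wider than the data:

```python
import pandas as pd

s = pd.Series({'12/18/2019': 22.63, '12/19/2019': 22.2, '12/20/2019': 21.03,
               '12/23/2019': 17, '12/24/2019': 19.65})
s.index = pd.to_datetime(s.index, format='%m/%d/%Y')

# Option 1: reindex onto every calendar day, forward-filling the gaps
full_range = pd.date_range(s.index.min(), s.index.max(), freq='D')
filled_reindex = s.reindex(full_range, method='ffill')

# Option 2: asfreq converts to daily frequency and fills in the same call
filled_asfreq = s.asfreq('D', method='ffill')
print(filled_asfreq)
```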
It seems that you have created a Series, not a DataFrame. See if the code below helps you.
df = df.reindex(idx) # add the missing calendar days as NaN
df = df.to_frame().reset_index() # convert the Series to a DataFrame
df = df.ffill() # fill each NaN with the last valid value above it
print(df)
Output (you will have to rename the columns):
index 0
0 2019-12-18 22.63
1 2019-12-19 22.20
2 2019-12-20 21.03
3 2019-12-21 21.03
4 2019-12-22 21.03
5 2019-12-23 17.00
6 2019-12-24 19.65

Python data-frame using pandas

I have a dataset which looks like below
[25/May/2015:23:11:15 000]
[25/May/2015:23:11:15 000]
[25/May/2015:23:11:16 000]
[25/May/2015:23:11:16 000]
Now I have made this into a DataFrame: df[0] holds [25/May/2015:23:11:15 and df[1] holds 000]. I want to send all rows that end with the same seconds value to a file. In the example above they end with 15 and 16 seconds, so all rows ending in 15 seconds should go into one file, those ending in 16 into a different one, and so on.
I have tried the below code
import pandas as pd
data = pd.read_csv('apache-access-log.txt', sep=" ", header=None)
df = pd.DataFrame(data)
print(df[0],df[1].str[-2:])
Converting that column to a datetime would make it easier to work with, e.g. (assuming the timestamp column is named 'date' and its leading '[' has been stripped):
df['date'] = pd.to_datetime(df['date'], format='%d/%b/%Y:%H:%M:%S')
Then you can simply iterate over a groupby(), e.g.:
In []:
for k, frame in df.groupby(df['date'].dt.second):
    #frame.to_csv('file{}.csv'.format(k))
    print('{}\n{}\n'.format(k, frame))
Out[]:
15
date value
0 2015-05-25 23:11:15 0
1 2015-05-25 23:11:15 0
16
date value
2 2015-05-25 23:11:16 0
3 2015-05-25 23:11:16 0
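An end-to-end sketch of the groupby approach that actually writes one CSV per seconds value. Here the question's sample lines stand in for 'apache-access-log.txt' via `io.StringIO`; the output directory and the `sec_XX.csv` file names are hypothetical. Keeping the '[' inside the format string avoids stripping it first:

```python
import io
import os
import tempfile

import pandas as pd

# The question's sample lines, standing in for 'apache-access-log.txt'
raw = io.StringIO(
    "[25/May/2015:23:11:15 000]\n"
    "[25/May/2015:23:11:15 000]\n"
    "[25/May/2015:23:11:16 000]\n"
    "[25/May/2015:23:11:16 000]\n"
)
df = pd.read_csv(raw, sep=' ', header=None)

# Column 0 still starts with '[', so include it literally in the format
df['date'] = pd.to_datetime(df[0], format='[%d/%b/%Y:%H:%M:%S')

out_dir = tempfile.mkdtemp()  # hypothetical output location
written = {}
for sec, frame in df.groupby(df['date'].dt.second):
    path = os.path.join(out_dir, f'sec_{int(sec):02d}.csv')
    frame.to_csv(path, index=False)
    written[int(sec)] = path
print(sorted(written))
```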
You can set your datetime column as the index of the DataFrame and then use the pandas loc and to_csv functions. As other answers point out, you should convert your date to datetime first (for example, while reading the DataFrame).
Example:
df = df.set_index('date').sort_index()
df.loc['2015-05-25 23:11:15':'2015-05-25 23:11:15'].to_csv('df_data.csv')
Try this:
## Create a new column holding the seconds value
df['seconds'] = df.apply(lambda row: row[0].split(":")[3].split(" ")[0], axis=1)
for sec in df['seconds'].unique():
    ## filter by seconds
    print("Result", df[df['seconds'] == sec])
