I have a little problem with the .loc function.
Here is the code:
date = df.loc[df['date'] == d].index[0]
d is a specific date (e.g. 21.11.2019)
The problem is that d can fall on a weekend, and in the dataframe's date column there are no values for weekend days (it contains calendar days for working days only).
Is there any way that, if d falls on a weekend, it takes the next available day instead?
I'm looking for something like index.get_loc with method='bfill'.
Does anyone know how to implement that for .loc?
IIUC you want to move dates of the format dd.mm.yyyy to the following Monday if they happen to fall on a weekend, and leave them as they are if they are workdays. The most efficient approach is to just modify d before you pass it to df.loc[...] instead of looking for the nearest neighbour.
What I mean is:
import datetime

d = "22.12.2019"
dt = datetime.datetime.strptime(d, "%d.%m.%Y")
if dt.weekday() in [5, 6]:  # Saturday or Sunday
    dt = dt + datetime.timedelta(days=7 - dt.weekday())
d = dt.strftime("%d.%m.%Y")
Output:
23.12.2019
Edit
In order to take the first date on or after d that has an entry in your dataframe, try:
import datetime

df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y')
dt = datetime.datetime.strptime(d, "%d.%m.%Y")
d = df.loc[df['date'] >= dt, 'date'].min()  # first available date on or after d
df.loc[df['date'] == d]...
...
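If you would rather keep the get_loc/bfill idea from the question, here is a minimal sketch of the same lookup via DatetimeIndex.get_indexer with method='backfill' (the frame and dates below are made up for illustration, and the index must be sorted):

import pandas as pd

# Hypothetical frame containing business days only
df = pd.DataFrame({'date': pd.bdate_range('2019-11-18', periods=10),
                   'value': range(10)})

idx = pd.DatetimeIndex(df['date'])
d = pd.to_datetime('23.11.2019', format='%d.%m.%Y')  # a Saturday

# 'backfill' gives the position of the first date >= d
pos = idx.get_indexer([d], method='backfill')[0]
row = df.iloc[pos]  # the row for Monday, 25.11.2019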
Related
Assuming that I have a series made of daily values:
dates = pd.date_range('1/1/2004', periods=365, freq="D")
ts = pd.Series(np.random.randint(0,101, 365), index=dates)
I need to use .groupby or .reduce with a fixed schema of dates.
Using ts.resample('8d') isn't an option, as the dates must not drift within the month, and the last chunk of each month needs to be flexible to accommodate the different lengths of the months (and leap years).
A list of dates can be obtained through:
g = dates[dates.day.isin([1,8,16,24])]
How can I group or reduce my data to this specific schema, so I can compute the sum, max and min, in a more elegant and efficient way than:
for i in range(0, len(g) - 1):
    ts.loc[(g[i] < ts.index) & (ts.index < g[i+1])]
Well, from a calendar point of view you can group them into calendar weeks, day of week, months and so on.
If that is something you would be interested in, you can do it easily with datetime and pandas, for example:
df['week'] = df['date'].dt.isocalendar().week  # create week column (dt.week is deprecated)
df.groupby(['week'])['values'].sum()  # sum values by week
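That said, the fixed 1/8/16/24 schema from the question can be grouped directly. Here is a minimal sketch using numpy.searchsorted to label every day with the most recent schema date at or before it (the aggregation set is just an example):

import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2004', periods=365, freq='D')
ts = pd.Series(np.random.randint(0, 101, 365), index=dates)
g = dates[dates.day.isin([1, 8, 16, 24])]

# Map each timestamp to the schema date that opens its chunk
labels = g[np.searchsorted(g, ts.index, side='right') - 1]
print(ts.groupby(labels).agg(['sum', 'max', 'min']))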
I'm sure this is really easy to answer but I have only just started using Pandas.
I have a column in my excel file called 'Day' and a Date/time column called 'Date'.
I want to update my Day column with the corresponding day of NUMEROUS Dates from the 'Date' column.
So far, I use the code shown below to change the date/time to just a date:
df['Date'] = pd.to_datetime(df.Date).dt.strftime('%d/%m/%Y')
And then use this code to change the 'Day' column to Tuesday
df.loc[df['Date'] == '02/02/2018', 'Day'] = '2'
(2 signifies the 2nd day of the week)
This works great. The problem is, my excel sheet has 500000+ rows of data and lots of dates. Therefore I need this code to work with numerous dates (4 different dates to be exact)
For example; I have tried this code;
df.loc[df['Date'] == '02/02/2018' + '09/02/2018' + '16/02/2018' + '23/02/2018', 'Day'] = '2'
This does not give me an error, but it does not set Day to 2 either. I know I could just repeat the same line of code and change the date each time... but there must be a way to do it the way I explained? Help would be greatly appreciated :)
2/2/2018 is a Friday, so I don't know what "2nd day of the week" means. Does your week start on Thursday?
Since you have already converted the column to Timestamps (note that the strftime call above turns it back into plain strings, so do this before that step), use the dt accessor:
df['Day'] = df['Date'].dt.dayofweek
Note that dayofweek is an attribute, not a method; Monday is 0 and Sunday is 6. Manipulate that as needed.
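A minimal sketch of that approach, with a hypothetical frame standing in for the Excel data:

import pandas as pd

df = pd.DataFrame({'Date': ['02/02/2018', '09/02/2018', '16/02/2018']})

# Parse to datetime; dayfirst matches the dd/mm/yyyy format used above
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

df['Day'] = df['Date'].dt.dayofweek  # Monday=0 ... Sunday=6
print(df)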
If I got it right, you want to change the Day column for just a few dates, right? If so, you can put these dates in a separate list and do:
my_dates = ['02/02/2018', '09/02/2018', '16/02/2018', '23/02/2018']
df.loc[df['Date'].isin(my_dates), 'Day'] = '2'
Here's a quick problem that I, at first, dismissed as easy. An hour in, and I'm not so sure!
So, I have a list of Python datetime objects, and I want to graph them. The x-values are the year and month, and the y-values are the number of datetime objects from the list that fall in that month.
Perhaps an example will demonstrate this better (dd/mm/yyyy):
[28/02/2018, 01/03/2018, 16/03/2018, 17/05/2018]
-> ([02/2018, 03/2018, 04/2018, 05/2018], [1, 2, 0, 1])
My first attempt tried to simply group by date and year, along the lines of:
import itertools
group = itertools.groupby(dates, lambda date: date.strftime("%b/%Y"))
graph = zip(*[(k, len(list(v))) for k, v in group])  # format the data for graphing
As you've probably noticed though, this will group only by dates that are already present in the list. In my example above, the fact that none of the dates occurred in April would have been overlooked.
Next, I tried finding the starting and ending dates, and looping over the months between them:
import datetime

data = [[], []]
for year in range(min_date.year, max_date.year):
    for month in range(min_date.month, max_date.month):
        k = datetime.datetime(year=year, month=month, day=1).strftime("%b/%Y")
        v = sum([1 for date in dates if date.strftime("%b/%Y") == k])
        data[0].append(k)
        data[1].append(v)
Of course, this only works if min_date.month is smaller than max_date.month, which is not necessarily the case if the dates span multiple years. Also, it's pretty ugly.
Is there an elegant way of doing this?
Thanks in advance
EDIT: To be clear, the dates are datetime objects, not strings. They look like strings here for the sake of being readable.
I suggest using pandas:
import pandas as pd
dates = ['28/02/2018', '01/03/2018', '16/03/2018', '17/05/2018']
s = pd.to_datetime(pd.Series(dates), format='%d/%m/%Y')
s.index = s.dt.to_period('m')
s = s.groupby(level=0).size()
s = s.reindex(pd.period_range(s.index.min(), s.index.max(), freq='m'), fill_value=0)
print(s)
2018-02 1
2018-03 2
2018-04 0
2018-05 1
Freq: M, dtype: int64
s.plot.bar()
Explanation:
First create a Series from the list of dates and convert it with to_datetime.
Create a PeriodIndex with Series.dt.to_period.
groupby by index (level=0) and get counts with GroupBy.size.
Add missing periods with Series.reindex, using a PeriodIndex built from the min and max values of the index.
Last, plot; e.g. for bars, Series.plot.bar.
Using Counter:
import collections
import random

dates = list()
for y in range(2015, 2019):
    for m in range(1, 13):
        for i in range(random.randint(1, 4)):
            dates.append("{}/{}".format(m, y))
print(dates)

counter = collections.Counter(dates)
print(counter)
For your problem with dates that have no occurrences, you can use the subtract method of Counter: generate a list covering the whole range of dates, with each date appearing exactly once, and subtract it. Note that this lowers every count by one, so a month with no occurrences ends up at -1; add one back if you need the raw counts. Like so:
tmp_date_list = ["{}/{}".format(m,y) for y in range(2015,2019) for m in range(1,13)]
counter.subtract(tmp_date_list)
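A minimal end-to-end sketch of that trick on the question's sample months (the data is made up):

import collections

dates = ["2/2018", "3/2018", "3/2018", "5/2018"]
counter = collections.Counter(dates)

# One entry per month in the full range, so absent months drop to -1
all_months = ["{}/2018".format(m) for m in range(2, 6)]
counter.subtract(all_months)

# Add the one back to recover the raw counts, zeros included
counts = {month: counter[month] + 1 for month in all_months}
# {'2/2018': 1, '3/2018': 2, '4/2018': 0, '5/2018': 1}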
I'm new to pandas. I have a dataframe of TTI values, sorted by hour of day across several years. I want to add a new column holding the previous year's TTI value for each row. I wrote this code:
import pandas as pd
tti = pd.read_csv("c:\\users\\Mehrdad\\desktop\\Hourly_TTI.csv")
tti['new_date'] = pd.to_datetime(tti['Date'])
tti['last_year'] = tti['TTI'].shift(1,freq='1-Jan-2009')
print(tti.head(10))
but I don't know how to define the frequency value for shift so that it would shift my data one year back from my first date, which is 01-01-2010!
df['last_year'] = df['date'].apply(lambda x: x - pd.DateOffset(years=1))
# Look up the shifted dates in a date-indexed copy ('value' is a stand-in column name)
df['new_value'] = df.set_index('date')['value'].reindex(df['last_year']).values
df.shift can only move rows by a fixed distance.
Use the offset to create the lookup dates and retrieve the values through the new datetime index, as above. Be aware that the first year has no previous year to look up, so those rows come back as NaN and need to be truncated.
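A minimal sketch with the question's own column names, assuming the timestamps in new_date are unique and that each one also exists one year earlier (missing ones come back as NaN):

import pandas as pd

tti = pd.read_csv("c:\\users\\Mehrdad\\desktop\\Hourly_TTI.csv")
tti['new_date'] = pd.to_datetime(tti['Date'])

# The previous year's timestamp for every row
tti['last_year_date'] = tti['new_date'] - pd.DateOffset(years=1)

# Look up last year's TTI by date; .values keeps positional alignment
tti['last_year'] = tti.set_index('new_date')['TTI'].reindex(tti['last_year_date']).values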
I have a dataframe full of dates, and I would like to select all dates where the month == 12 and the day == 25, and replace the zero in the xmas column with a 1.
Any way to do this? The second line of my code errors out.
df = DataFrame({'date':[datetime(2013,1,1).date() + timedelta(days=i) for i in range(0,365*2)], 'xmas':np.zeros(365*2)})
df[df['date'].month==12 and df['date'].day==25] = 1
Pandas Series with datetimes now behave differently. See the .dt accessor.
This is how it should be done now:
df.loc[(df['date'].dt.day == 25) & (df['date'].dt.month == 12), 'xmas'] = 1
Basically, what you tried won't work because you need & to compare arrays; additionally, you need parentheses due to operator precedence. On top of this, you should use loc to perform the indexing:
df.loc[(df['date'].month==12) & (df['date'].day==25), 'xmas'] = 1
An update was needed in reply to this question. As of today, there's a slight difference in how you extract months from datetime objects in a pd.Series.
So from the very start, in case you have a raw date column, first convert it to datetime objects using a simple function:
import datetime as dt

def read_as_datetime(str_date):
    # replace %Y-%m-%d with your own date format
    return dt.datetime.strptime(str_date, '%Y-%m-%d')
then apply this function to your dates column and save the results in a new column, namely datetime:
df['datetime'] = df.dates.apply(read_as_datetime)
finally, to select by day and month, use the same piece of code that #Shayan RC explained, going through the .dt accessor on the datetime column (there is no .dt.datetime attribute):
df.loc[(df['datetime'].dt.month == 12) & (df['datetime'].dt.day == 25), 'xmas'] = 1
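For reference, a self-contained sketch of the whole approach on the question's frame; it keeps full datetime objects rather than calling .date(), since the .dt accessor does not work on a column of plain datetime.date objects:

import numpy as np
import pandas as pd
from datetime import datetime, timedelta

df = pd.DataFrame({
    'date': [datetime(2013, 1, 1) + timedelta(days=i) for i in range(365 * 2)],
    'xmas': np.zeros(365 * 2),
})
df.loc[(df['date'].dt.month == 12) & (df['date'].dt.day == 25), 'xmas'] = 1
print(df[df['xmas'] == 1])  # the two Christmas days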