Make conditional changes to numerous dates - python

I'm sure this is really easy to answer but I have only just started using Pandas.
I have a column in my excel file called 'Day' and a Date/time column called 'Date'.
I want to update my Day column with the corresponding day of NUMEROUS Dates from the 'Date' column.
So far I use this code shown below to change the date/time to just date
df['Date'] = pd.to_datetime(df.Date).dt.strftime('%d/%m/%Y')
And then use this code to change the 'Day' column to Tuesday
df.loc[df['Date'] == '02/02/2018', 'Day'] = '2'
(2 signifies the 2nd day of the week)
This works great. The problem is, my excel sheet has 500000+ rows of data and lots of dates. Therefore I need this code to work with numerous dates (4 different dates to be exact)
For example; I have tried this code;
df.loc[df['Date'] == '02/02/2018' + '09/02/2018' + '16/02/2018' + '23/02/2018', 'Day'] = '2'
Which does not give me an error, but does not change the date to 2. I know I could just use the same line of code numerous times and change the date each time...but there must be a way to do it the way I explained? Help would be greatly appreciated :)

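2/2/2018 is a Friday, so I don't know what "2nd day of the week" means. Does your week start on Thursday?
Since you already convert the column with pd.to_datetime, use the dt accessor on the datetime values (before the strftime step turns them back into strings):
df['Day'] = df['Date'].dt.dayofweek
dayofweek is a property, so it takes no parentheses. Monday is 0 and Sunday is 6. Manipulate that as needed.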
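A minimal sketch of the dt accessor approach, assuming the 'Date' column still holds datetime values (no strftime applied yet). The sample rows are made up for illustration and stand in for the Excel sheet.

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel sheet
df = pd.DataFrame(
    {"Date": pd.to_datetime(["2018-02-02 08:30", "2018-02-05 14:00", "2018-02-11 09:15"])}
)

# dayofweek is a property: Monday = 0 ... Sunday = 6
df["Day"] = df["Date"].dt.dayofweek

# Optional: the weekday name instead of a number
df["DayName"] = df["Date"].dt.day_name()

print(df)
```

If the week should be numbered differently (for example 1 to 7), add an offset to the 'Day' column afterwards.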

If I got it right, you want to change the Day column for just a few dates, right? If so, you can put these dates in a separate list and use isin:
my_dates = ['02/02/2018', '09/02/2018', '16/02/2018', '23/02/2018']
df.loc[df['Date'].isin(my_dates), 'Day'] = '2'

Related

How to find missing dates in an excel file by python

I'm a beginner in Python. I have an Excel file that shows the rainfall amount between 2016-1-1 and 2020-6-30. It has 2 columns: the first column is the date, the other is the rainfall. Some dates are missing from the file (the rainfall wasn't measured). For example, there isn't a row for 2016-05-05 in my file. This is a sample of my Excel file.
Date rainfall (mm)
1/1/2016 10
1/2/2016 5
.
.
.
12/30/2020 0
I want to find the missing dates but my code doesn't work correctly!
import pandas as pd
from datetime import datetime, timedelta
from matplotlib import dates as mpl_dates
from matplotlib.dates import date2num
df=pd.read_excel ('rainfall.xlsx')
a= pd.date_range(start = '2016-01-01', end = '2020-06-30' ).difference(df.index)
print(a)
Here's a beginner-friendly way of doing it.
First you need to make sure that the Date column in your dataframe is really a date and not a string or object.
Type (or print) df.info().
The date column should show up as datetime64[ns].
If not, df['Date'] = pd.to_datetime(df['Date'], dayfirst=False) fixes that. (Use dayfirst to say whether the day or the month comes first in your date string, because Pandas can't know. Month first is the default, so it would also work if you leave the argument out.)
For the task of finding missing days, there are many ways to solve it. Here's one.
Turn all dates into a series
all_dates = pd.Series(pd.date_range(start = '2016-01-01', end = '2020-06-30' ))
Then print all dates from that series which are not in your dataframe "Date" column. The ~ sign means "not".
print(all_dates[~all_dates.isin(df['Date'])])
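The steps above can be sketched end to end on a small in-memory frame. The sample rows and the shortened date range are assumptions for illustration; with the real file you would load df with pd.read_excel instead.

```python
import pandas as pd

# Hypothetical sample standing in for rainfall.xlsx (month/day/year strings)
df = pd.DataFrame(
    {
        "Date": ["1/1/2016", "1/2/2016", "1/4/2016", "1/6/2016"],
        "rainfall (mm)": [10, 5, 0, 3],
    }
)

# Make sure the column is datetime64[ns], not strings
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Every calendar day in the period of interest
all_dates = pd.Series(pd.date_range(start="2016-01-01", end="2016-01-06"))

# Keep the days that do NOT appear in the Date column
missing = all_dates[~all_dates.isin(df["Date"])]
print(missing.dt.strftime("%Y-%m-%d").tolist())
```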
Try:
df = pd.read_excel('rainfall.xlsx', usecols=[0])
a = pd.date_range(start = '2016-01-01', end = '2020-06-30').difference([l[0] for l in df.values])
print(a)
The dates in the file must look like 2016/1/1.
To find the missing dates from a list, you can apply the Conditional Formatting function in Excel. 4. Click OK > OK, then the positions of the missing dates are highlighted. Note: The last date in the date list will be highlighted.
Note that this trick uses plain Excel, not Python.

Iterating Rows while comparing dates gives the same result each time

I am iterating through a column showing the due dates of invoices. I have also created a variable storing the date of the Sunday of the current week.
I am trying to create a new column to show that, if the due_date is earlier than this week's Sunday, I should pay the invoice.
However, when I run the code, the Status column only shows the value Pay.
My code is as below:
for index, row in df_320.iterrows():
    if due_date[index] < sunday:
        df_320['Status'] = "Pay"
    elif due_date[index] >= sunday:
        df_320['Status'] = "Skip"
I have tried the below code to see if all the conditions show True but it also shows False values.
for index, row in df_320.iterrows():
    print(due_date[index] < sunday)
I would appreciate it if you could point out what I'm doing wrong.
Have you confirmed that due_date and sunday are datetime / date objects? You need to parse date strings into objects before you can compare them. Link to the datetime docs: https://docs.python.org/3/library/datetime.html
Also, you modify the same structure you are iterating over (df_320), and df_320['Status'] = "Pay" assigns the whole column on every pass, regardless of which row you are on, so the last row processed decides the value for every row. Are you sure this is what you want? I'm guessing you want to set the value for the current row only, for example with df_320.loc[index, 'Status'].
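As a possible alternative to the loop, the comparison can be done on the whole column at once. This is only a sketch: the sample due dates and the value of sunday are made up, and np.where is swapped in here in place of the original iterrows approach.

```python
import numpy as np
import pandas as pd

# Hypothetical invoices; due_date is parsed to datetime so it can be compared
df_320 = pd.DataFrame(
    {"due_date": pd.to_datetime(["2021-03-01", "2021-03-07", "2021-03-10"])}
)

# Hypothetical "Sunday of the current week"
sunday = pd.Timestamp("2021-03-07")

# Row-wise result without a Python loop: "Pay" where the condition holds, else "Skip"
df_320["Status"] = np.where(df_320["due_date"] < sunday, "Pay", "Skip")

print(df_320)
```

Each row gets its own value because the condition is evaluated per element, instead of the whole column being overwritten on every iteration.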

Subsetting rows by date

I have a dataframe ('df') and the first column is a timestamp. I successfully converted that timestamp from that milliseconds since Unix epoch thing to a date like this "2020-02-18 13:00:00" (which is 1:00 pm on February 18th, 2020) with the following code:
df['Time'] = pd.to_datetime(df['Time'], unit='ms')
I'm trying to subset to just all of the rows from 2020-02-17 but this code:
df_1day = df[(df['Time'] == '2020-02-17')]
only returns the row at midnight (2020-02-17 00:00:00)
I'm sorry if the answer is somewhere else in this site, or the internet in general, but TIA for any help.
Not sure of protocol if I answer my own questions but I'm doing this edit to include lines of code that solved my issue--even though I'm pretty sure there's an easier way of doing this
## Create new column with 'Time' as a string
df['Day'] = df['Time'].astype(str)
## Take only the first 10 characters of the string (which would be date only)
df['Day'] = df['Day'].str[:10]
## Create dataframe subset based on values in the new column
df_1day = df[(df['Day'] == '2020-02-17')]
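A sketch of two ways to do this without going through strings, assuming 'Time' is already datetime64. The three millisecond timestamps are invented sample values. dt.normalize() sets the time part to midnight, so the comparison matches every row on that day; the half-open range does the same with plain comparisons.

```python
import pandas as pd

# Hypothetical epoch-millisecond timestamps: two on 2020-02-17, one on 2020-02-18
df = pd.DataFrame({"Time": [1581897600000, 1581944400000, 1582030800000]})
df["Time"] = pd.to_datetime(df["Time"], unit="ms")

day = pd.Timestamp("2020-02-17")

# Option 1: compare the date part only
df_1day = df[df["Time"].dt.normalize() == day]

# Option 2: half-open interval [day, day + 1 day)
mask = (df["Time"] >= day) & (df["Time"] < day + pd.Timedelta(days=1))
df_1day_range = df[mask]

print(df_1day)
```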

loc function: next value

I have a little problem with the .loc function.
Here is the code:
date = df.loc [df ['date'] == d] .index [0]
d is a specific date (e.g. 21.11.2019)
The problem is that d can fall on a weekend. The date column of the dataframe has no values for weekend days (it contains calendar days for working days only).
Is there any way to take the next available day if d falls on a weekend?
I am looking for something like index.get_loc with method='bfill'.
Does anyone know how to implement that for .loc?
IIUC, you want to move dates in dd.mm.yyyy format to the following Monday if they happen to fall on a weekend, or leave them as they are if they are workdays. The most efficient approach is to modify d before you pass it to .loc[...] instead of looking for the nearest neighbour.
What I mean is:
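import datetime
d = "22.12.2019"
dt = datetime.datetime.strptime(d, "%d.%m.%Y")
# weekday(): Monday = 0 ... Saturday = 5, Sunday = 6
if dt.weekday() in [5, 6]:
    # move forward to the following Monday
    dt = dt + datetime.timedelta(days=7 - dt.weekday())
d = dt.strftime("%d.%m.%Y")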
Output:
23.12.2019
Edit
In order to take the first date on or after d that has an entry in your dataframe, try:
import datetime
import pandas as pd
df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y')
dt = datetime.datetime.strptime(d, "%d.%m.%Y")
# first date in the column that is on or after dt
d = df.loc[df['date'] >= dt, 'date'].min()
df.loc[df['date'] == d]...
...
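Since the question asks for something like a backfill lookup, here is a sketch using Index.get_indexer with method='bfill', which returns the position of the first entry on or after the target (or -1 if there is none). It assumes the date column is sorted in ascending order; the sample dates are made up.

```python
import pandas as pd

# Hypothetical working-day calendar (no weekend rows), dd.mm.yyyy strings
df = pd.DataFrame({"date": ["20.12.2019", "23.12.2019", "24.12.2019"]})
df["date"] = pd.to_datetime(df["date"], format="%d.%m.%Y")

# 21.12.2019 is a Saturday, so it has no row of its own
d = pd.to_datetime("21.12.2019", format="%d.%m.%Y")

# Position of the first date >= d; requires a monotonic (sorted) index
idx = pd.DatetimeIndex(df["date"])
pos = idx.get_indexer([d], method="bfill")[0]

if pos == -1:
    print("no date on or after", d)
else:
    print(df.iloc[pos])
```

Note that pos is a position, so it is used with iloc; it only equals the .loc label when the dataframe has the default integer index.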

How can a DataFrame change from having two columns (a "from" datetime and a "to" datetime) to having a single column for a date?

I've got a DataFrame that looks like this:
It has two columns, one of them being a "from" datetime and one of them being a "to" datetime. I would like to change this DataFrame such that it has a single column or index for the date (e.g. 2015-07-06 00:00:00 in datetime form) with the variables of the other columns (like deep) split proportionately into each of the days. How might one approach this problem? I've meddled with groupby tricks and I'm not sure how to proceed.
So I don't have time to work through your specific problem at the moment, but the way to approach this is to use the DataFrame's resample() method. Here are the steps I would take. 1) Resample on your "to" date column by minute. 2) Populate the other columns out over that resample. 3) Add the date column back in as an index.
If this doesn't work or is tricky to work with, I would create a date range from your earliest date to your latest date (at the smallest interval you want, so maybe hourly?) and then run some conditional statements over your other columns to fill in the data.
Here is roughly what your code may look like for the resample portion (replace day with hour or whatever):
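# full daily range, in case you go with the date-range approach instead
drange = pd.date_range('01-01-1970', '01-20-2018', freq='D')
# resample needs a DatetimeIndex on data; forward-fill the new daily rows
data = data.resample('D').ffill()
data.index.name = 'date'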
Hope this helps!
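For the "split proportionately into each of the days" part of the question, one possible sketch is to walk over each interval day by day and weight the value by the share of the interval that falls on that day. This is not the resample approach described above; the function name split_by_day, the column names "from" and "to", and the sample row are assumptions, and it expects every "to" to be later than its "from".

```python
import pandas as pd

def split_by_day(df, start_col, end_col, value_col):
    """Spread value_col over calendar days in proportion to overlap time."""
    records = []
    for _, row in df.iterrows():
        start, end = row[start_col], row[end_col]
        total = (end - start).total_seconds()  # assumes end > start
        day = start.normalize()  # midnight of the first day touched
        while day < end:
            next_day = day + pd.Timedelta(days=1)
            # seconds of this interval that fall inside [day, next_day)
            overlap = (min(end, next_day) - max(start, day)).total_seconds()
            records.append({"date": day, value_col: row[value_col] * overlap / total})
            day = next_day
    # sum the shares when several intervals touch the same day
    return pd.DataFrame(records).groupby("date")[value_col].sum()

# Hypothetical row: 12 hours spanning midnight, half on each day
df = pd.DataFrame(
    {
        "from": pd.to_datetime(["2015-07-06 18:00"]),
        "to": pd.to_datetime(["2015-07-07 06:00"]),
        "deep": [120],
    }
)
per_day = split_by_day(df, "from", "to", "deep")
print(per_day)
```

A loop over rows is slow on large frames, but it keeps the proportional logic easy to follow and to check by hand.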
