Python Pandas: convert DD.MM.YYYY to datetime - python

I have a DataFrame that looks like this:
date | ...
09.01.2000 |
02.03.2001 | ...
The format is DD.MM.YYYY. I want to select only data where it is year 2014 (for example). I convert them into datetime using pd.to_datetime:
date = pd.to_datetime(table["date"], format = "%d.%m.%Y")
Then I want my table to take 3 new columns Day, Month, Year. I do:
table[["day", "month", "year"]] = date.dt.day, date.dt.month, date.dt.year
But it throws an error: Must have equal len keys and value when setting with an ndarray
How do I convert them into datetime properly to use df.loc[df["year"] == 2014]?

You need to assign the columns one at a time:
table['day'], table['month'], table['year'] = date.dt.day, date.dt.month, date.dt.year
Explanation: date.dt.day, date.dt.month, date.dt.year is shorthand for a tuple of length 3, which most likely differs from len(table).
On the other hand, since you already have date, you can also slice with:
table[date.dt.year==2014]
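For reference, here is a minimal end-to-end sketch of both approaches. The sample rows (including the 2014 one) are made up for illustration; only the column name "date" and the format string come from the question.

```python
import pandas as pd

# Hypothetical sample data in DD.MM.YYYY format.
table = pd.DataFrame({"date": ["09.01.2000", "02.03.2001", "15.07.2014"]})

# Parse the strings once, with an explicit format.
date = pd.to_datetime(table["date"], format="%d.%m.%Y")

# Option 1: add the helper columns one at a time.
table["day"] = date.dt.day
table["month"] = date.dt.month
table["year"] = date.dt.year
rows_via_column = table.loc[table["year"] == 2014]

# Option 2: filter directly on the parsed dates, no helper columns needed.
rows_2014 = table[date.dt.year == 2014]
```

Both selections return the same rows, so the extra columns are only worth adding if you need them for something else.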

Related

Create date from one year with string and int error - PYTHON

I have the following problem. I want to create a date from another. To do this, I extract the year from the database date and then create the chosen date (day = 30 and month = 9) being the year extracted from the database.
The code is the following
bbdd20Q3['year']=(pd.DatetimeIndex(bbdd20Q3['datedaymonthyear']).year)
y=(bbdd20Q3['year'])
m=int(9)
d=int(30)
bbdd20Q3['mydate']=dt.datetime(y,m,d)
But error message is this
"cannot convert the series to <class 'int'>"
I think dt means datetime, so the line dt.datetime(y,m,d) creates a datetime object.
Should bbdd20Q3['mydate'] hold an int?
If so, try to think of another way to store the date (an 8-digit number, maybe).
Hope that helps :)
I assume that you did import datetime as dt. Then, by doing:
bbdd20Q3['year']=(pd.DatetimeIndex(bbdd20Q3['datedaymonthyear']).year)
y=(bbdd20Q3['year'])
m=int(9)
d=int(30)
bbdd20Q3['mydate']=dt.datetime(y,m,d)
You are passing a Series as the first argument to datetime.datetime, which expects an int or something that can be converted to an int. You should create one datetime.datetime for each element of the Series, not a single datetime.datetime. Consider the following example:
import datetime
import pandas as pd
df = pd.DataFrame({"year":[2001,2002,2003]})
df["day"] = df["year"].apply(lambda x:datetime.datetime(x,9,30))
print(df)
Output:
   year        day
0  2001 2001-09-30
1  2002 2002-09-30
2  2003 2003-09-30
Here's a sample code with the required logic -
import pandas as pd
df = pd.DataFrame.from_dict({'date': ['2019-12-14', '2020-12-15']})
print(df.dtypes)
# convert the date in string format to datetime object,
# if the date column(Series) is already a datetime object then this is not required
df['date'] = pd.to_datetime(df['date'])
print(f'after conversion \n {df.dtypes}')
# logic to create a new data column
df['new_date'] = pd.to_datetime({'year':df['date'].dt.year,'month':9,'day':30})
@eollon I see that you are also new to Stack Overflow. It would be better to add a simple, self-contained code sample that others can try out independently.
(Keeping the comment here since I don't have permission to comment :) )

Pandas giving me the wrong max date in a date time column?

I have a dataframe with a date column:
data['Date']
0 1/1/14
1 1/8/14
2 1/15/14
3 1/22/14
4 1/29/14
...
255 11/21/18
256 11/28/18
257 12/5/18
258 12/12/18
259 12/19/18
But, when I try to get the max date out of that column, I get:
test_data.Date.max()
'9/9/15'
Any idea why this would happen?
Clearly the column is of type object. You should try using pd.to_datetime() and then performing the max() aggregator:
data['Date'] = pd.to_datetime(data['Date'],errors='coerce') #You might need to pass format
print(data['Date'].max())
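To see why the string column returns '9/9/15': strings are compared character by character, so anything starting with '9' sorts after anything starting with '1'. A small sketch with a few made-up rows in the same M/D/YY style as the question:

```python
import pandas as pd

# Hypothetical subset of dates, stored as plain strings (object dtype).
dates = pd.Series(["1/1/14", "9/9/15", "11/21/18", "12/19/18"])

# Lexicographic comparison: "9..." beats "1...", regardless of the year.
string_max = dates.max()

# After parsing, max() compares real points in time.
parsed = pd.to_datetime(dates, format="%m/%d/%y")
date_max = parsed.max()

print(string_max)  # 9/9/15
print(date_max)    # 2018-12-19 00:00:00
```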
.max() treats the values as dates (as you want) only if they are datetime objects. Building upon Seshadri's response, try:
type(data['Date'][1])
If it is a datetime object, this returns:
pandas._libs.tslibs.timestamps.Timestamp
If not, you can convert that column to datetime like so:
data['Date'] = pd.to_datetime(data['Date'],format='%m/%d/%y')
The format argument makes sure the strings are parsed with the right layout. See the full list of format codes in the Python docs.
Your date may be stored as a string. First convert the column from string to datetime. Then, max() should work.
test = pd.DataFrame(['1/1/2010', '2/1/2011', '3/4/2020'], columns=['Dates'])
Dates
0 1/1/2010
1 2/1/2011
2 3/4/2020
pd.to_datetime(test['Dates'], format='%m/%d/%Y').max()
Timestamp('2020-03-04 00:00:00')
That timestamp can be cleaned up using .dt.date:
pd.to_datetime(test['Dates'], format='%m/%d/%Y').dt.date.max()
datetime.date(2020, 3, 4)
References: the format code table in the Python docs (for the to_datetime format argument) and pandas.to_datetime in the pandas docs.

Convert year from unicode format to python's datetime format

I have only year parameter as input in the following manner:
2014,2015,2016
I want to convert each element from my list into python's datetime format. Is it possible to do this kind of things if the only given parameter is the year ?
Just set month and day manually to 1
from datetime import date
YearLst = [2014,2015,2016]
# In Python 3, map() is lazy, so wrap it in list() to get the dates.
dates = list(map(lambda t: date(t, 1, 1), YearLst))
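If the years arrive as a single comma-separated string instead of a list of ints (an assumption about the input, since the question only shows "2014,2015,2016"), a sketch along these lines would split and convert them first. The variable names are illustrative.

```python
from datetime import date

# Assumed input shape: one string with comma-separated years.
raw_years = u"2014,2015,2016"

# Convert each piece to int, then pin month and day to January 1st.
years = [int(part) for part in raw_years.split(",")]
dates = [date(year, 1, 1) for year in years]

print(dates)
```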

Sort and Filter data from a Panda Dataframe according to date range

My dataframe has two columns: (i) a date column in a string format and (ii) an int value. I would like to convert the date string into a date object and then filter and sort the data according to a date range. Converting one string to a date worked fine with:
date = dateutil.parser.parse(date_string)
date = ("%02d:%02d:%02d" % (date.hour, date.minute, date.second))
How can I iterate on all the values in the dataframe and apply the parsing so I can then use the panda library on the df to filter and sort the data as follows?
df.sort(['etime'])
df[df['etime'].isin([begin_date, end_date])]
Sample of my dataframe data is below:
etime instantaneous_ops_per_sec
3 2016-06-15T15:30:09Z 26
4 2016-06-15T15:30:14Z 26
5 2016-06-15T15:30:19Z 24
6 2016-06-15T15:30:24Z 27
You want to use pd.to_datetime. Your strings are ISO 8601 timestamps, so no format argument is needed:
df['etime'] = pd.to_datetime(df['etime'])
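Try this (between takes the two bounds as separate arguments, and the bounds must be timezone-aware because the trailing Z is parsed as UTC):
df['etime'] = pd.to_datetime(df['etime'], format="%Y-%m-%dT%H:%M:%S%z")
df[df['etime'].between(begin_date, end_date)]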
Caution: your code says date, but you format the value as a time and then sort on that time, so the results may not be what you are after. You usually want to filter first and then sort, but the code in the question does the opposite.
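Putting the pieces together, here is a rough sketch of parse, filter, then sort. The rows reuse the sample from the question in shuffled order; begin_date and end_date are made-up bounds. Note that df.sort was removed from pandas, so sort_values is used instead.

```python
import pandas as pd

# Sample rows from the question, deliberately out of order.
df = pd.DataFrame({
    "etime": [
        "2016-06-15T15:30:19Z",
        "2016-06-15T15:30:09Z",
        "2016-06-15T15:30:24Z",
        "2016-06-15T15:30:14Z",
    ],
    "instantaneous_ops_per_sec": [24, 26, 27, 26],
})

# Parse the ISO 8601 strings; utc=True makes the result timezone-aware.
df["etime"] = pd.to_datetime(df["etime"], utc=True)

# Hypothetical range bounds, also timezone-aware so they compare cleanly.
begin_date = pd.Timestamp("2016-06-15T15:30:10Z")
end_date = pd.Timestamp("2016-06-15T15:30:20Z")

# Filter first (both bounds inclusive), then sort the smaller result.
in_range = df[df["etime"].between(begin_date, end_date)]
result = in_range.sort_values("etime")
print(result)
```

Comparing a timezone-aware column against naive bounds raises a TypeError, which is why the bounds carry the trailing Z as well.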

Changing list answers in python

I've been trying to insert into a MySQL table using Python. I want to create a list of all dates from April 2016 to now so I can insert them individually in the SQL INSERT. I searched but couldn't find how to change the value of each list item depending on whether it has 1 digit or 2 digits:
dates = ['2016-04-'+str(i+1) for i in range(9,30)]
I would like a leading 0 added every time i is a single digit (i.e. 1, 2, 3, etc.),
and for it to stay as it is when it has two digits (i.e. 10, 11, 12, etc.).
dates = ['2016-04-'+ '{:02d}'.format(i) for i in range(9,30)]
>>> print(dates)
['2016-04-09', '2016-04-10', '2016-04-11', '2016-04-12', '2016-04-13', '2016-04-14', '2016-04-15',
 '2016-04-16', '2016-04-17', '2016-04-18', '2016-04-19', '2016-04-20', '2016-04-21', '2016-04-22',
 '2016-04-23', '2016-04-24', '2016-04-25', '2016-04-26', '2016-04-27', '2016-04-28', '2016-04-29']
Using C style formatting, all the dates in April:
dates = ['2016-04-%02d'%i for i in range(1,31)]
range(1,31) is needed because the end value of a range is excluded; alternatively, use range(30) and add 1 to i.
The same using .format():
dates = ['2016-04-{:02}'.format(i) for i in range(1,31)]
You can use the dateutil module:
from datetime import datetime
from dateutil.rrule import rrule, DAILY
# Integer literals with leading zeros (04, 01, 05) are a syntax error in Python 3.
start_date = datetime(2016, 4, 1)
w = [each.strftime('%Y-%m-%d') for each in rrule(freq=DAILY, dtstart=start_date, until=datetime(2016, 5, 9))]
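If you would rather avoid the extra dependency, a standard-library sketch can cover the whole span from April 2016 to today. The helper name date_strings is made up for this example.

```python
from datetime import date, timedelta


def date_strings(start, end):
    """Return every day from start to end (inclusive) as 'YYYY-MM-DD'."""
    n_days = (end - start).days + 1
    # isoformat() always zero-pads month and day, so no manual padding is needed.
    return [(start + timedelta(days=i)).isoformat() for i in range(n_days)]


# All dates from 1 April 2016 up to and including today.
dates = date_strings(date(2016, 4, 1), date.today())
print(dates[:3])
```

Because this steps through real calendar days, month lengths and leap years are handled automatically, which the hard-coded '2016-04-' prefix cannot do once the range leaves April.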
