Manipulating Series in Dataframe using Pandas [duplicate] - python

This question already has answers here:
Pandas filter dataframe rows with a specific year
(2 answers)
Closed 3 years ago.
I have a date column in a data frame that looks like this:
(Year-Month-Day)
2017-09-21
2018-11-25
I am trying to create a function that considers only the year. I have been trying the following:
df[df['DateColumn'].str[:4]=='2017']
But I am receiving this error:
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
How can I only consider the first four characters of the date in a function? Thanks.

I think you are looking for:
df['year'] = [d.year for d in df['DateColumn']]
This works only if the elements of the column are pandas Timestamp objects (pandas.tslib.Timestamp in older versions of pandas). If not, convert the column first:
df['DateColumn'] = pd.to_datetime(df['DateColumn'])
df['year'] = [d.year for d in df['DateColumn']]
UPDATE: Use this instead:
df.loc[pd.to_datetime(df['DateColumn']).dt.year == 2017]
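A minimal runnable sketch of the UPDATE approach, assuming the column holds strings in the Year-Month-Day form shown in the question; the sample rows are made up for illustration. The strings are parsed once, the year is compared for every row at the same time, and the original column is left untouched:

```python
import pandas as pd

# Made-up sample rows in the question's Year-Month-Day string format
df = pd.DataFrame({"DateColumn": ["2017-09-21", "2018-11-25", "2017-01-03"]})

# Parse the strings, take the year of each row, and keep only the 2017 rows;
# df itself still holds the original strings
years = pd.to_datetime(df["DateColumn"]).dt.year
rows_2017 = df.loc[years == 2017]
print(rows_2017)
```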

According to this:
https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#dt-accessor
If you have a Series with a datetime dtype, you should be able to use the dt accessor on it.
So you might be able to do something like this:
df[df['DateColumn'].dt.year == 2017]
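A small sketch of where the dt accessor lives, assuming the column has already been converted to datetime (sample values made up). The accessor belongs to a single column (a Series), not to the DataFrame. The question's string-slicing idea also works once the datetime column is turned back into strings, as long as the slice really takes four characters:

```python
import pandas as pd

# Made-up sample column, already converted to datetime64
df = pd.DataFrame({"DateColumn": pd.to_datetime(["2017-09-21", "2018-11-25"])})

# .dt is a Series accessor; a DataFrame has no .dt attribute
print(hasattr(df, "dt"))                # False
print(hasattr(df["DateColumn"], "dt"))  # True

# Select the column first, then use .dt.year
by_dt = df[df["DateColumn"].dt.year == 2017]

# String alternative: .str only works on text, so convert first,
# and take the first four characters with [:4]
by_str = df[df["DateColumn"].astype(str).str[:4] == "2017"]
```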

Try:
df['year'] = pd.to_datetime(df['col']).apply(lambda x: x.year)
This converts col to datetime, extracts the year from each value, and stores the resulting Series in a new year column instead of overwriting df.
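For comparison, here is a sketch that computes the year both with apply (one Python call per element) and with the vectorised .dt.year accessor, assuming a string column named col as in the answer above (sample values made up). Both produce the same years; the result is stored in new columns so the DataFrame itself is kept:

```python
import pandas as pd

# Made-up sample data; 'col' is the column name used in the answer
df = pd.DataFrame({"col": ["2017-09-21", "2018-11-25"]})

parsed = pd.to_datetime(df["col"])
df["year_apply"] = parsed.apply(lambda x: x.year)  # per-element Python call
df["year_dt"] = parsed.dt.year                     # vectorised accessor
print(df)
```

On large frames the .dt form is usually the one to prefer, since it avoids calling a Python lambda for every row.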

Related

Read excel dates and convert them to individual strings [duplicate]

This question already has answers here:
Extract day and month from a datetime object
(4 answers)
Closed 12 months ago.
I recently started using Python.
I have a series of dates in Excel:
01-05-2021
02-05-2021
.
.
29-05-2021
Now I want to load this column and split each row's date into separate strings, so that I can extract the day, month and year of each date.
Can someone help me with how to do that?
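You can do:
import pandas as pd
df = pd.read_excel("filename.xlsx")
# let's imagine your date column name is "date" and it holds text such as "01-05-2021";
# if Excel stored real dates, pandas reads them as datetimes and .split() will fail
df["day"] = df["date"].apply(lambda elem: elem.split("-")[0])
df["month"] = df["date"].apply(lambda elem: elem.split("-")[1])
df["year"] = df["date"].apply(lambda elem: elem.split("-")[2])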
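The three apply calls above can also be done with a single vectorised split. A minimal sketch, assuming the column was read as DD-MM-YYYY text; the small DataFrame below is a made-up stand-in for the Excel file:

```python
import pandas as pd

# Made-up stand-in for the Excel column, assumed to be text in DD-MM-YYYY form
df = pd.DataFrame({"date": ["01-05-2021", "02-05-2021", "29-05-2021"]})

# Split every string on "-" once and spread the three pieces into new columns
df[["day", "month", "year"]] = df["date"].str.split("-", expand=True)
print(df)
```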
from datetime import datetime
str_time = '01-05-2021'
time_convert = datetime.strptime(str_time, '%d-%m-%Y')
print(time_convert, time_convert.day, time_convert.month, time_convert.year)
In your case, do this conversion in a loop over each value you read from the Excel file.
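A sketch of that loop, next to a pandas equivalent that parses the whole column at once; the list of DD-MM-YYYY strings is a made-up stand-in for the values read from Excel:

```python
from datetime import datetime

import pandas as pd

# Made-up sample of DD-MM-YYYY strings, as in the question
values = ["01-05-2021", "02-05-2021", "29-05-2021"]

# Loop version: parse each string with the same format as above
parts = []
for str_time in values:
    t = datetime.strptime(str_time, "%d-%m-%Y")
    parts.append((t.day, t.month, t.year))

# Vectorised version: parse the whole Series, then read the components via .dt
s = pd.to_datetime(pd.Series(values), format="%d-%m-%Y")
days = s.dt.day.tolist()
months = s.dt.month.tolist()
years = s.dt.year.tolist()
print(parts, days, months, years)
```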

Create date from one year with string and int error - PYTHON

I have the following problem. I want to create a date from another one. To do this, I extract the year from the database date and then create the chosen date (day = 30 and month = 9) using the year extracted from the database.
The code is the following:
bbdd20Q3['year']=(pd.DatetimeIndex(bbdd20Q3['datedaymonthyear']).year)
y=(bbdd20Q3['year'])
m=int(9)
d=int(30)
bbdd20Q3['mydate']=dt.datetime(y,m,d)
But I get this error message:
"cannot convert the series to <class 'int'>"
I think dt means datetime, so the line 'dt.datetime(y,m,d)' creates a datetime object.
Should bbdd20Q3['mydate'] get an int?
If so, try to think of another way to store the date (as an 8-digit number, maybe).
Hope I helped :)
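One possible reading of the "8-digit number" idea is a YYYYMMDD integer. A sketch under that assumption, with a made-up year Series standing in for bbdd20Q3['year']; it also shows how to decode the integers back to datetimes:

```python
import pandas as pd

# Made-up stand-in for bbdd20Q3['year']
years = pd.Series([2019, 2020, 2021])

# Encode 30 September of each year as an 8-digit YYYYMMDD integer
mydate_int = years * 10000 + 9 * 100 + 30

# Decode back to real datetimes when date arithmetic is needed
mydate = pd.to_datetime(mydate_int.astype(str), format="%Y%m%d")
print(mydate_int.tolist(), mydate.tolist())
```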
I assume that you did import datetime as dt. Then, by doing:
bbdd20Q3['year']=(pd.DatetimeIndex(bbdd20Q3['datedaymonthyear']).year)
y=(bbdd20Q3['year'])
m=int(9)
d=int(30)
bbdd20Q3['mydate']=dt.datetime(y,m,d)
You are passing a Series as the first argument to datetime.datetime, when it expects an int or something that can be converted to an int. You should create one datetime.datetime for each element of the Series, not a single datetime.datetime. Consider the following example:
import datetime
import pandas as pd
df = pd.DataFrame({"year":[2001,2002,2003]})
df["day"] = df["year"].apply(lambda x:datetime.datetime(x,9,30))
print(df)
Output:
   year        day
0  2001 2001-09-30
1  2002 2002-09-30
2  2003 2003-09-30
Here's some sample code with the required logic:
import pandas as pd
df = pd.DataFrame.from_dict({'date': ['2019-12-14', '2020-12-15']})
print(df.dtypes)
# convert the date in string format to datetime object,
# if the date column(Series) is already a datetime object then this is not required
df['date'] = pd.to_datetime(df['date'])
print(f'after conversion \n {df.dtypes}')
# logic to create a new data column
df['new_date'] = pd.to_datetime({'year':df['date'].dt.year,'month':9,'day':30})
@eollon I see that you are also new to Stack Overflow. It would be better if you could add simple sample code that others can try out independently.
(Keeping the comment here since I don't have permission to comment :) )

Efficient way to convert datetime object to string in Python [duplicate]

This question already has answers here:
datetime to string with series in pandas
(3 answers)
Closed 2 years ago.
I'm converting a datetime column (referred to as DATE) in my Pandas dataframe df to a string of the form 'Ymd' (e.g. '20191201' for December 1st 2019). My current way of doing that is:
import datetime as dt
df['DATE'] = df['DATE'].apply(lambda x: dt.datetime.strftime(x, '%Y%m%d'))
But this is surprisingly inefficient and slow when run on large dataframes with millions of rows. Is there a more efficient alternative I am not seeing? That would be extremely helpful. Thanks.
In pandas you do not need apply; use the .dt accessor directly:
df['DATE'] = df['DATE'].dt.strftime('%Y%m%d')
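A minimal runnable sketch of the .dt.strftime approach, assuming DATE already has a datetime dtype; the two sample dates are made up:

```python
import pandas as pd

# Made-up sample column with a datetime64 dtype
df = pd.DataFrame({"DATE": pd.to_datetime(["2019-12-01", "2020-01-15"])})

# Format the whole column through the .dt accessor instead of calling
# datetime.strftime once per row via apply
df["DATE"] = df["DATE"].dt.strftime("%Y%m%d")
print(df)
```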

Why can't I make a column with extracted months from the 'dates' column in my DataFrame? [duplicate]

This question already has answers here:
Extracting just Month and Year separately from Pandas Datetime column
(13 answers)
Closed 3 years ago.
I have a dataframe with dates, and I want to make a column with only the month of the corresponding date in each row. First, I converted my dates to ts objects like this:
df['Date'] = pd.to_datetime(df['Date'])
After that, I tried to make my new column for the month like this:
df['Month'] = df['Date'].month
However, it gives me an error:
AttributeError: 'Series' object has no attribute 'month'
I do not understand why I can't do it like this. I double checked whether the conversion to ts objects actually works, and that does work. Also, if I extract 1 date using slicing, I can append .month to get the month. I technically could solve the problem by looping over all indices and then slicing for each index, but my dataframe contains 166000+ rows so that is not an option.
You have to use the dt property (an accessor object), which exposes .month for every element of the Series:
df['Month'] = df['Date'].dt.month
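A small sketch contrasting the two cases from the question, with made-up dates in a column named Date: a single Timestamp has .month directly, while the whole Series needs .dt.month:

```python
import pandas as pd

# Made-up sample data using the question's column name
df = pd.DataFrame({"Date": ["2021-01-15", "2021-03-02"]})
df["Date"] = pd.to_datetime(df["Date"])

# A single element (a Timestamp) has .month ...
first_month = df["Date"].iloc[0].month

# ... but the Series itself exposes it through .dt
df["Month"] = df["Date"].dt.month
print(first_month, df)
```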

Why my code didn't select data from Pandas dataframe? [duplicate]

This question already has answers here:
How to filter by month, day, year with Pandas
(1 answer)
Keep only date part when using pandas.to_datetime
(13 answers)
Closed 4 years ago.
Why didn't my date filter work? All other filters work fine.
import pandas as pd
import datetime
data =pd.DataFrame({
'country': ['USA', 'USA', 'Belarus','Brazil'],
'time': ['2018-01-15 16:11:45.923570+00:00', '2018-01-15 16:19:45.923570+00:00', '2018-01-16 16:12:45.923570+00:00', '2018-01-17 16:14:45.923570+00:00']})
# Convert to datetime
data['time'] = pd.to_datetime(data['time'])
# Convert to date
data['time'] = data['time'].dt.date
print(data)
# Look for the date '2018-01-17'
select_date = data.loc[data['time'] == '2018-01-17']
print(select_date)
How can I filter an exact date from the dataframe?
How can I iterate over the dataframe day by day, i.e. get all rows for a specific day with something like:
for i in data:
I wish you all good luck and prosperity!
datetime.date objects are not vectorised with Pandas. The docs indicate this:
Returns numpy array of python datetime.date objects
Regular Python objects are stored in object dtype series which do not support fancy date indexing. Instead, you can normalize:
data['time'] = pd.to_datetime(data['time'])
select_date = data.loc[data['time'].dt.normalize() == '2018-01-17']
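To see why the original filter matched nothing, here is a small sketch using two of the question's timestamps. After .dt.date the Series has object dtype, so comparing it with a string is plain Python equality between a datetime.date and a str, which is always False. With .dt.normalize() the dtype stays datetime64, so pandas parses the string as a date before comparing:

```python
import pandas as pd

data = pd.DataFrame({
    "time": ["2018-01-15 16:11:45.923570+00:00",
             "2018-01-17 16:14:45.923570+00:00"]
})
data["time"] = pd.to_datetime(data["time"])

# Python datetime.date objects stored in an object-dtype Series
as_date = data["time"].dt.date
print(as_date.dtype)                    # object
print((as_date == "2018-01-17").any())  # False

# normalize() sets the time to midnight but keeps the datetime64 dtype
match = data.loc[data["time"].dt.normalize() == "2018-01-17"]
print(match)
```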
You can use the same idea to iterate your dataframe by day:
for day, day_df in data.groupby(data['time'].dt.normalize()):
    # do something with day_df
    pass
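A runnable version of that loop on the question's sample data. The body here, collecting each day's countries into a dict, is only a placeholder for "do something with day_df". Each day is a midnight Timestamp, and day_df holds every row from that day:

```python
import pandas as pd

data = pd.DataFrame({
    "country": ["USA", "USA", "Belarus", "Brazil"],
    "time": ["2018-01-15 16:11:45.923570+00:00", "2018-01-15 16:19:45.923570+00:00",
             "2018-01-16 16:12:45.923570+00:00", "2018-01-17 16:14:45.923570+00:00"],
})
data["time"] = pd.to_datetime(data["time"])

# Placeholder work: record which countries appear on each day
rows_per_day = {}
for day, day_df in data.groupby(data["time"].dt.normalize()):
    rows_per_day[day.strftime("%Y-%m-%d")] = day_df["country"].tolist()
print(rows_per_day)
```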
