Converting Twitter created_at to datetime not working - python

I'm working on a project to analyze tweets and am first trying to convert the created_at column to datetimes.
format = "%Y-%m-%d %H:%M:%S"
df['created_at_datetime'] = pd.to_datetime(df['created_at'], format = format).dt.tz_localize(None)
I keep on getting the following error
I am in a very introductory and rudimentary class on analyzing Twitter so am not a coding expert at all. I've done homework assignments before where this line of code worked so am unsure as to what the error is now.
I am working in Colab and here is the full thing: https://colab.research.google.com/drive/1XXJsoMQouzH-1t7eWRd1c-fsrI3vYFcf?usp=sharing.
Thank you!

Check that all values in the 'created_at' column are timestamps formatted as you expect.
It seems like some row could have the string "en" instead of a timestamp.
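One way to run that check is sketched below. The sample values are hypothetical stand-ins for df['created_at'] (the stray "en" mimics a language code that ended up in the wrong column), and the format string is the one from the question.

```python
import pandas as pd

# Hypothetical sample standing in for df['created_at']; "en" is the suspected stray value.
created_at = pd.Series(["2022-05-05 10:15:00", "en", "2022-05-06 08:00:00"])
expected_format = "%Y-%m-%d %H:%M:%S"

# errors='coerce' turns every value that does not match the format into NaT.
parsed = pd.to_datetime(created_at, format=expected_format, errors="coerce")

# Keep the original strings that failed to parse (ignoring values that were already missing).
bad_rows = created_at[parsed.isna() & created_at.notna()]
print(bad_rows)
```

If bad_rows is not empty, those are the rows to inspect or drop before converting the whole column.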

Try this:
format_y = "%Y-%m-%d %H:%M:%S"
pd.to_datetime(date, format = format_y).tz_localize(None)

You need to find the culprit value that doesn't fit.
Here's the workflow:
import pandas as pd
raw_dt_series = pd.Series(['2022-05-05', 'foobar','2022-05-02', '202', None])
raw_dt_series_notna = raw_dt_series.dropna()
dt_series = pd.to_datetime(raw_dt_series_notna, errors='coerce')
Output:
0 2022-05-05
1 NaT
2 2022-05-02
3 NaT   <- unparseable values become NaT (pandas' missing datetime)
dtype: datetime64[ns]
The NaT rows mark the values that could not be parsed. Select the original strings at those positions:
raw_dt_series_notna.loc[dt_series.isna()]
Time to investigate why the given values don't meet the format.
After you've found out, adjust the format parameter:
pd.to_datetime(raw_dt_series, format='%YOUR%NEW%FORMAT')

Related

How to convert datetime.time into datetime.date

I have a dataframe called pomi that looks like this
date time sub
2019-09-20 00:00:00 25.0 org
I want to convert the values in the column 'date' to datetime.date, so that I'm left with only the dates (ie '2019-09-20').
I have tried:
pomi['date'] = pd.to_datetime(pomi['date'])
pomi['just_date'] = pomi['date'].dt.date
pomi.date = pd.to_datetime(pomi.date,dayfirst=True)
pomi['date'] = pd.to_datetime(pomi["date"].astype(str)).dt.time
pomi['date'] = pd.to_datetime(pomi['date']).dt.date
pomi['date'] = pd.to_datetime(pomi['date']).dt.normalize()
None of them have worked.
Most often I get the error message "TypeError: <class 'datetime.time'> is not convertible to datetime"
All help appreciated. Thanks.
Full disclosure: I am not 100% sure what the issue is, as your code works fine on my end. One thing you can try is converting the values to Timestamp first and then checking again. Both this and your code give the required output for me.
import pandas as pd
df = pd.DataFrame({'date': ['2019-09-20 00:00:00'], 'time':[25], 'sub':['org']})
df['date'] = df['date'].apply(pd.Timestamp)
df['just_date'] = df['date'].dt.date
df
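The error message in the question suggests that at least some cells in the column hold datetime.time objects rather than strings or timestamps. A minimal sketch for checking that is below; the mixed sample frame is hypothetical and only imitates such a column.

```python
import datetime as dt
import pandas as pd

# Hypothetical column mixing a Timestamp with a bare datetime.time value.
pomi = pd.DataFrame({"date": [pd.Timestamp("2019-09-20"), dt.time(0, 0)]})

# Count the Python types stored in the column.
type_counts = pomi["date"].map(type).value_counts()
print(type_counts)

# Flag the rows holding datetime.time objects, which pd.to_datetime cannot convert.
is_time = pomi["date"].map(lambda value: isinstance(value, dt.time))
print(pomi[is_time])
```

A bare time carries no date information, so the flagged rows would need their date from another source before conversion.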

How could I convert a string of the format (YYYYQx) into a datetime object?

I have a table with a Date column and several other country-specific columns (see the picture below). I want to create a heatmap in Seaborn but for that I need the Date column to be a datetime object. How can I change the dates from the current format -i.e. 2021Q3 - to 2021-09-01 (YYYY-MM-DD)?
I have tried the solution below (which works for monthly data - to_date = lambda x: pd.to_datetime(x['Date'], format='%YM%m')), but it does not work for the quarterly data. I get a ValueError: 'q' is a bad directive in format '%YQ%q'... I could not find any solution to the error online...
# loop to transform the Date column's format
to_date = lambda x: pd.to_datetime(x['Date'], format='%YQ%q')
df_eurostat_reg_bank_x = df_eurostat_reg_bank.assign(Date=to_date)
I have also tried this solution, but I get the first month of the quarter in return, whereas I want the last month of the quarter:
df_eurostat_reg_bank['Date'] = df_eurostat_reg_bank['Date'].str.replace(r'(\d+)(Q\d)', r'\1-\2')
df_eurostat_reg_bank['Date'] = pd.PeriodIndex(df_eurostat_reg_bank.Date, freq='Q').to_timestamp()
df_eurostat_reg_bank.Date = df_eurostat_reg_bank.Date.dt.strftime('%m/%d/%Y')
df_eurostat_reg_bank = df_eurostat_reg_bank.set_index('Date')
Thank you in advance!
I assume that your example of 2022Q3 is a string on the grounds that it's not a date format that I recognise.
Thus, simple arithmetic and f-strings will avoid the use of any external modules:
def convdate(d):
    return f'{d[:4]}-{int(d[5]) * 3 - 2:02d}-01'

for d in ['2022Q1','2022Q2','2022Q3','2022Q4']:
    print(convdate(d))
Output:
2022-01-01
2022-04-01
2022-07-01
2022-10-01
Note:
There is no attempt to ensure that the input string to convdate() is valid
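The question asks for the last month of the quarter (2021Q3 becoming 2021-09-01), whereas the function above returns the first month. A variant of the same arithmetic is sketched here; quarter_to_last_month is a hypothetical helper name and, as above, the input is not validated.

```python
import pandas as pd

def quarter_to_last_month(label: str) -> str:
    """Map 'YYYYQx' to the first day of the quarter's last month, e.g. '2021Q3' -> '2021-09-01'."""
    year, quarter = label[:4], int(label[5])
    return f"{year}-{quarter * 3:02d}-01"

# Hypothetical sample standing in for the Date column.
quarters = pd.Series(["2021Q1", "2021Q2", "2021Q3", "2021Q4"])
dates = pd.to_datetime(quarters.map(quarter_to_last_month), format="%Y-%m-%d")
print(dates)
```

The result has a datetime64 dtype, so it can be used as the Date column for plotting.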

how to read data from csv as date format in python pandas

I have data in following format
Month Country BA Total
11/1/2018 CN 3 10
After reading the file, the Month column comes in as object, but I want it in date format.
I tried to convert it to datetime using:
hs = pd.read_csv('history.csv',parse_dates=['Month']) #this is not solving the issue either
hs['Month'] = pd.to_datetime(hs['Month']) #this throws error
Please suggest how to read it as a date or convert it to a date format.
Try one of these lines; the error may depend on whether the day or the month comes first. Note that %Y matches a four-digit year such as 2018, while %y matches only two digits:
df['sale_date'] = pd.to_datetime(df['sale_date'], format='%m/%d/%Y')
# or
df['sale_date'] = pd.to_datetime(df['sale_date'], dayfirst=False)
OR
df['sale_date'] = pd.to_datetime(df['sale_date'], format='%d/%m/%Y')
# or
df['sale_date'] = pd.to_datetime(df['sale_date'], dayfirst=True)
Try this
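import pandas as pd
from datetime import datetime
# date_parser is deprecated since pandas 2.0; newer versions take date_format='%m/%d/%Y' instead
dateparse = lambda x: datetime.strptime(x, '%m/%d/%Y')
df = pd.read_csv('history.csv', parse_dates=['Month'], date_parser=dateparse)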
Since pandas represents timestamps in nanosecond resolution, the timespan that can be represented using a 64-bit integer is limited to approximately 584 years (roughly 1677 to 2262).
Workaround:
This will force the dates which are outside the bounds to NaT
pd.to_datetime(date_col_to_force, errors = 'coerce')
If the dates in the file are less than pristine, it is often a good idea to load the file without parse_dates, then use pd.to_datetime() which has better control, including format and how to deal with errors ('raise', 'coerce', or 'ignore').
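A minimal sketch of that load-then-convert approach follows. The inline CSV text is a hypothetical stand-in for history.csv (assumed comma-separated, with one deliberately bad row added), and the month-first format is an assumption based on the sample row in the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for history.csv; the second data row is deliberately malformed.
csv_text = "Month,Country,BA,Total\n11/1/2018,CN,3,10\nnot a date,CN,1,2\n"

# Load without parse_dates, so Month arrives as plain strings.
hs = pd.read_csv(io.StringIO(csv_text))

# Convert explicitly: month first, four-digit year; values that do not match become NaT.
hs["Month"] = pd.to_datetime(hs["Month"], format="%m/%d/%Y", errors="coerce")
print(hs.dtypes)
print(hs[hs["Month"].isna()])
```

Printing the NaT rows afterwards shows which lines of the file need cleaning.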
thanks a lot to all of the suggestions, I tried last one and it worked, as I am running short of time, could not try other suggestions, will definitely try once I get time.

reformatting the timestamp in my dataset to have it as datetime

I want to reformat the timestamp in my dataset to have it as a date + time.
here is my dataset
and I tried this
data1 = pd.read_excel(r"C:\Users\user\Desktop\Consumption.xlsx")
data1['Timestamp']= pd.to_datetime(['Timestamp'], unit='s')
and I got this error
ValueError: non convertible value Timestamp with the unit 's'
I also tried not to pass the "unit" in the pd.to_datetime function and it gave an error
The dtype of the Timestamp column is object. Any help is appreciated.
The values are not Unix timestamps, which is why unit='s' raises an error. You can split each value on ; and take the second element with str[1], then convert the result to datetimes:
data1['Timestamp']= pd.to_datetime(data1['Timestamp'].str.split(';').str[1])
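Since the dataset itself is not shown, here is a sketch on made-up values. The "index;date time" layout and the day-first format are assumptions chosen only to illustrate the split-then-parse step.

```python
import pandas as pd

# Hypothetical rows: something before the semicolon, the actual date and time after it.
data1 = pd.DataFrame({"Timestamp": ["0;01/03/2021 00:15:00", "1;01/03/2021 00:30:00"]})

# Split on ';' and keep the second piece of each row.
second_part = data1["Timestamp"].str.split(";").str[1]

# Parse with an explicit day-first format so 01/03 is read as 1 March.
data1["Timestamp"] = pd.to_datetime(second_part, format="%d/%m/%Y %H:%M:%S")
print(data1)
```

Passing an explicit format avoids any ambiguity between day-first and month-first dates.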
I would suggest you check the documentation of the function here
If you want to add date-time, you can format like this:
format='%d/%m/%Y %H:%M:%S'
Try this:
data1['Date'] = pd.to_datetime(data1['Timestamp'], format='%d/%m/%Y')

Passing chopped down datetimes

I have been stumped for the past few hours trying to solve the following.
In a large data set I have from an automated system, there is a DATE_TIME value which, for rows just after midnight, does not have a two-digit hour, like: 12-MAY-2017 0:16:20
When I try to convert this to a date (so that it's usable for conversions) as follows:
df['DATE_TIME'].astype('datetime64[ns]')
I get the following error:
Error parsing datetime string "12-MAY-2017 0:16:20" at position 3
I tried writing some regex to pull out each piece but couldn't get anything working, given that the hour could be either one or two characters. It also doesn't seem like an ideal solution to write a regex for each piece.
Any ideas on this?
Try to use pandas.to_datetime() method:
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'], errors='coerce')
The parameter errors='coerce' takes care of strings that can't be converted to the datetime dtype by turning them into NaT.
I think you need pandas.to_datetime only:
df = pd.DataFrame({'DATE_TIME':['12-MAY-2017 0:16:20','12-MAY-2017 0:16:20']})
print (df)
DATE_TIME
0 12-MAY-2017 0:16:20
1 12-MAY-2017 0:16:20
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'])
print (df)
DATE_TIME
0 2017-05-12 00:16:20
1 2017-05-12 00:16:20
Converting with NumPy via astype seems problematic, because it needs strings in ISO 8601 date or datetime format:
df['DATE_TIME'].astype('datetime64[ns]')
ValueError: Error parsing datetime string "12-MAY-2017 0:16:20" at position 3
EDIT:
If some datetimes are broken (stray strings or ints), then use MaxU's answer.
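If you prefer to state the layout explicitly instead of relying on inference, a format string can be passed as well. This is a sketch assuming every value follows the 12-MAY-2017 0:16:20 pattern and that month abbreviations are in English.

```python
import pandas as pd

df = pd.DataFrame({"DATE_TIME": ["12-MAY-2017 0:16:20", "12-MAY-2017 13:05:00"]})

# %d day, %b abbreviated month name, %Y four-digit year; %H accepts a one- or two-digit hour.
df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], format="%d-%b-%Y %H:%M:%S")
print(df)
```

With an explicit format, a value that deviates from the pattern raises an error instead of being guessed at, unless errors='coerce' is added.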
