I am working in PySpark.
I have a column end_date that I want to do some work on, but I can't because of its type: OffsetDateTimeUdt.
Is it possible to convert this type to a date or a string?
Example value: 2021-08-15T00:00:00.000Z
If you have any ideas, please let me know :)
Thanks in advance
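Convert it to a timestamp, ignoring the milliseconds, like below (this assumes an active SparkSession named spark):
from pyspark.sql.functions import from_unixtime, unix_timestamp
x = spark.createDataFrame([("2021-08-15T00:00:00.000Z",)], ['date_str'])
# '.000Z' is quoted, so it is matched as literal text: values with other milliseconds will not parse
x.select(x.date_str, from_unixtime(unix_timestamp(x.date_str,"yyyy-MM-dd'T'HH:mm:ss'.000Z'"))).show(truncate=False)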
output:
+------------------------+------------------------------------------------------------------------------------------+
|date_str |from_unixtime(unix_timestamp(date_str, yyyy-MM-dd'T'HH:mm:ss'.000Z'), yyyy-MM-dd HH:mm:ss)|
+------------------------+------------------------------------------------------------------------------------------+
|2021-08-15T00:00:00.000Z|2021-08-15 00:00:00 |
+------------------------+------------------------------------------------------------------------------------------+
I'm working on a project to analyze tweets and am first trying to convert the created_at column to datetimes.
format = "%Y-%m-%d %H:%M:%S"
df['created_at_datetime'] = pd.to_datetime(df['created_at'], format = format).dt.tz_localize(None)
I keep on getting the following error
I am in a very introductory class on analyzing Twitter, so I am not a coding expert at all. I've done homework assignments before where this line of code worked, so I am unsure what the error is now.
I am working in Colab and here is the full thing: https://colab.research.google.com/drive/1XXJsoMQouzH-1t7eWRd1c-fsrI3vYFcf?usp=sharing.
Thank you!
Check that all values in the 'created_at' column are timestamps formatted the way you expect.
It seems that some row may contain the string "en" instead of a timestamp.
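A minimal sketch of that check, using a hypothetical sample where one row holds "en". Passing errors="coerce" to pd.to_datetime turns values that don't match the format into NaT instead of raising, so the offending rows can be listed:

```python
import pandas as pd

# Hypothetical sample: the second row holds "en" instead of a timestamp
df = pd.DataFrame({"created_at": ["2022-05-05 10:15:00", "en", "2022-05-06 08:00:00"]})

fmt = "%Y-%m-%d %H:%M:%S"
# Values that don't match fmt become NaT rather than raising an error
parsed = pd.to_datetime(df["created_at"], format=fmt, errors="coerce")

# Original values of the rows that failed to parse
bad_rows = df.loc[parsed.isna(), "created_at"]
print(bad_rows.tolist())  # ['en']
```

Once the bad rows are understood, you can drop or fix them before converting the whole column.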
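Try this:
import pandas as pd
format_y = "%Y-%m-%d %H:%M:%S"  # a name that does not shadow the built-in format()
pd.to_datetime(date, format=format_y).tz_localize(None)  # date: a single timestamp string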
You need to find the culprit value that doesn't fit.
Here's the workflow:
import pandas as pd
raw_dt_series = pd.Series(['2022-05-05', 'foobar','2022-05-02', '202', None])
raw_dt_series_notna = raw_dt_series.dropna()
dt_series = pd.to_datetime(raw_dt_series_notna, errors='coerce')
Output:
0 2022-05-05
1 NaT
2 2022-05-02
3 NaT    <- NaT is the datetime counterpart of NaN; isna() detects it
dtype: datetime64[ns]
These are the rows that raised the error; select their original values with:
raw_dt_series_notna.loc[dt_series.isna()]
Now investigate why those values don't match the format.
Once you've found out, adjust the format parameter:
pd.to_datetime(raw_dt_series, format='%YOUR%NEW%FORMAT')  # placeholder: replace with your data's actual format
I am trying to convert my date column df['CO date'], whose values look like 3/02/21, meaning day/month/year. The problem arises when I parse it and then convert it back to a string, like this:
df['CO date'] = pd.to_datetime(df['CO date']).dt.strftime("%d/%m/%y")
For some reason, after converting from datetime to string with the format shown, it returns my date in American format, e.g. 02/03/21. I don't understand why this happens. The only thing I can think of is that Python only has the %d directive, which shows days as 01, 02, 03, 04, etc., whereas the day in my original df is "3" (no zero padding).
Does anybody know how I can solve this problem?
Many thanks in advance
Your formatting looks right. The only way you get that result is if your data frame contains wrong or corrupted data. You can do a sanity check with:
pd.to_datetime("2021-03-02").strftime("%d/%m/%y")
>>>
'02/03/21'
I think you are converting with the wrong format at the start, in the pd.to_datetime(df['CO date']) part. If you know the exact format, pass it through the format parameter of pd.to_datetime, like:
pd.to_datetime("2021-02-03", format="%Y-%d-%m").strftime("%d/%m/%y")
>>>
'02/03/21'
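The likely cause is that pd.to_datetime without a format reads an ambiguous string like 3/02/21 month-first (its dayfirst parameter defaults to False), so day and month are already swapped before strftime runs. A minimal sketch with a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical sample in day/month/two-digit-year form
df = pd.DataFrame({"CO date": ["3/02/21", "15/02/21"]})

# Without a format, "3/02/21" is read month-first: 2 March 2021
wrong = pd.to_datetime("3/02/21")
print(wrong.strftime("%d/%m/%y"))  # 02/03/21

# An explicit day-first format reads it as 3 February 2021
df["CO date"] = pd.to_datetime(df["CO date"], format="%d/%m/%y").dt.strftime("%d/%m/%y")
print(df["CO date"].tolist())  # ['03/02/21', '15/02/21']
```

Note that %d in strftime always zero-pads, so the day comes back as "03" rather than "3".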
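Wrap the date parsing in a try/except block to find the values in the dataframe column that hold invalid dates. You can also check the day, month, and year ranges and raise a custom error if they are exceeded.
from datetime import datetime
import pandas as pd

# date: an already-parsed datetime object
print(date.day)
print(date.month)
print(date.year)

def date_check(date):
    """Return True if the string matches day/month/4-digit-year, else False."""
    try:
        datetime.strptime(date, '%d/%m/%Y')
        return True
    except ValueError:
        return False

or

# .all() is True only if every value in df['date'] matched the format
if pd.to_datetime(df['date'], format='%d-%b-%Y', errors='coerce').notnull().all():
    print("All dates are valid")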
I want to reformat the timestamp in my dataset so that it becomes a date + time.
Here is my dataset
and I tried this:
data1 = pd.read_excel(r"C:\Users\user\Desktop\Consumption.xlsx")
data1['Timestamp']= pd.to_datetime(['Timestamp'], unit='s')
and I got this error:
ValueError: non convertible value Timestamp with the unit 's'
I also tried not passing "unit" to pd.to_datetime, and it gave an error.
The Timestamp column's dtype is object. Any help would be appreciated.
The datetimes are not in Unix time, which is why the error is raised. You can split the values on ;, select the second element of each resulting list with str[1], and then convert to datetimes:
data1['Timestamp']= pd.to_datetime(data1['Timestamp'].str.split(';').str[1])
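A minimal sketch of that split-then-parse approach. The raw values below are hypothetical, since the actual file layout isn't shown; they assume one prefix field, a ;, and an ISO-style date and time:

```python
import pandas as pd

# Hypothetical raw values "<prefix>;<date time>"; the real layout is not shown
data1 = pd.DataFrame({"Timestamp": ["A1;2021-08-15 00:15:00", "A1;2021-08-15 00:30:00"]})

# str.split(";") gives a list per row; .str[1] takes the part after the ";"
time_part = data1["Timestamp"].str.split(";").str[1]

# An explicit format avoids pandas guessing the layout
data1["Timestamp"] = pd.to_datetime(time_part, format="%Y-%m-%d %H:%M:%S")
print(data1["Timestamp"].tolist())
```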
I would suggest checking the pd.to_datetime documentation.
If you want both the date and the time, you can use a format like this:
format='%d/%m/%Y %H:%M:%S'
Try this:
data1['Date'] = pd.to_datetime(data1['Timestamp'], format='%d/%m/%Y')  # pd.DataFrame has no format argument; pd.to_datetime does
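A minimal sketch of parsing date and time together with the full format string, then keeping the calendar date in its own column. The sample values are hypothetical and assume a day/month/year hour:minute:second layout:

```python
import pandas as pd

# Hypothetical values in day/month/year hour:minute:second form
data1 = pd.DataFrame({"Timestamp": ["15/08/2021 00:15:00", "15/08/2021 13:45:30"]})

# Parse date and time in one step
data1["Timestamp"] = pd.to_datetime(data1["Timestamp"], format="%d/%m/%Y %H:%M:%S")

# .dt.date keeps only the calendar date (as datetime.date objects)
data1["Date"] = data1["Timestamp"].dt.date
print(data1)
```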
For a date column whose values look like 20190101, 20190102, how can I change them to 2019/01/01, 2019/01/02 or 2019-01-01, 2019-01-02? Thanks for your help.
I have tried df['date'] = df['date'].dt.strftime('%Y%m%d'), but it doesn't work.
Use:
df['date']=pd.to_datetime(df['date'],format='%Y%m%d')
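The original attempt fails because the .dt accessor only works on datetime-like columns, and an integer column is not one. A minimal sketch, assuming the column holds integers like 20190101, that parses first and then formats with strftime to get either separator:

```python
import pandas as pd

# Assumed input: integer dates in YYYYMMDD form
df = pd.DataFrame({"date": [20190101, 20190102]})

# Cast to str so the values are parsed as YYYYMMDD text
parsed = pd.to_datetime(df["date"].astype(str), format="%Y%m%d")

# .dt.strftime works now that the values are datetimes
df["date_dash"] = parsed.dt.strftime("%Y-%m-%d")
df["date_slash"] = parsed.dt.strftime("%Y/%m/%d")
print(df)
```

strftime returns strings; keep parsed if you still need real datetimes for arithmetic.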
I need to get today's date in pandas in the following int format:
2015,7,27
I am using it to get some data from a certain time frame:
from datetime import date
sdate = date(2015,7,27)
edate = date(2015,7,31)  # I would like not to hardcode this.
I tried:
today = datetime.today().strftime("%Y/%m/%d")
It outputs a string that I have to convert.
<type 'str'>
If I use the string, it gives:
TypeError: an integer is required
Is there a pythonic way to solve this?
Thanks in advance for suggestions.
What you need is not an int tuple: you are calling the date function, which requires three int parameters (the year, month, and day). So to set the right end date, call date and pass it the parts of the current date:
today = datetime.today()  # call once so all three parts come from the same moment
edate = date(today.year, today.month, today.day)
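For comparison, a minimal sketch showing that the standard library can build the same date object directly, without assembling it from year, month, and day:

```python
from datetime import date, datetime

# The answer's approach, with a single datetime.today() call
now = datetime.today()
edate = date(now.year, now.month, now.day)

# date.today() returns the same kind of date object directly
edate_direct = date.today()

# datetime.today().date() is another equivalent
edate_from_datetime = datetime.today().date()

print(edate_direct.year, edate_direct.month, edate_direct.day)  # ints, e.g. for date(...)
```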