convert month-day string to datetime - python

I have a dataframe with a month-day column in the format '1-29' (there is no year data). I want to convert it from a string to a datetime.
I have produced a sample dataframe as follows:
import pandas as pd
df = pd.DataFrame({'id':[78,18,94,55,68,57,78,8],
'monthday':['1-29','1-28','1-27','1-19','1-28','1-19','1-29','1-28']})
I have tried
from datetime import datetime
df['month_day'] = [datetime.strptime(x, '%m-%d') for x in df.monthday]
but the output inserts a year and I end up with 1900-01-29.
Also, with my full dataframe I end up with the error 'ValueError: day is out of range for month'.
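A datetime always carries a year, so strptime fills in 1900 when the format has none. That error is consistent with a '2-29' value in the full data, because 1900 is not a leap year; this is an assumption, since the full dataframe is not shown. A minimal sketch of one workaround is to prepend a placeholder leap year before parsing. The names parse_month_day and PLACEHOLDER_YEAR are hypothetical, and the year 2000 is an arbitrary choice.

```python
from datetime import datetime

# Assumption: any leap year works as a placeholder, so '2-29' stays valid.
PLACEHOLDER_YEAR = 2000


def parse_month_day(text, year=PLACEHOLDER_YEAR):
    """Parse 'M-D' (e.g. '1-29') into a datetime in the placeholder year."""
    return datetime.strptime(f"{year}-{text}", "%Y-%m-%d")


monthdays = ['1-29', '1-28', '2-29']
parsed = [parse_month_day(x) for x in monthdays]
print(parsed[2])  # 2000-02-29 00:00:00
```

The same list comprehension can be assigned to df['month_day'] as in the attempt above. If only the month and day matter, keeping the column as a string or as separate integer month and day columns avoids the placeholder year entirely.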

Related

Remove time from pandas dataframe datetime64[ns] index

I am trying to merge two pandas dataframes, and to do this I want to make it so that they both have the same index. The problem is, one df has an index of datatype object which just includes the date while the other df has an index of datatype datetime64[ns] which includes the date and time. Is there a way to make these both the same data type so that I can merge the two dataframes?
Convert both indexes to pandas datetimes, then keep only the date part.
df['date_only'] = df['dates'].dt.date
You could convert a date and time format to just date as below
import pandas as pd
date_n_time='2015-01-08 22:44:09'
date=pd.to_datetime(date_n_time).date()
Make your index a column using
df = df.reset_index()
and set it back as the index using
df = df.set_index('date_only')  # column name assumed from the example above
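The answers above can be combined into one sketch that keeps both indexes as datetime64 rather than Python date objects. It assumes the object index holds date strings; the frames left and right and their columns are made up for illustration. DatetimeIndex.normalize() sets the time part to midnight, so both indexes line up on the date.

```python
import pandas as pd

# Hypothetical frames: one with an object (string) date index,
# one with a datetime64 index that includes a time of day.
left = pd.DataFrame({'a': [1, 2]}, index=['2015-01-08', '2015-01-09'])
right = pd.DataFrame(
    {'b': [10, 20]},
    index=pd.to_datetime(['2015-01-08 22:44:09', '2015-01-09 07:00:00']),
)

# Bring both indexes to midnight-normalized datetimes.
left.index = pd.to_datetime(left.index)
right.index = right.index.normalize()

# The indexes now match, so an index-based join lines the rows up.
merged = left.join(right)
print(merged)
```

Normalizing keeps the index as a DatetimeIndex, whereas .dt.date produces an object column of Python dates; which is preferable depends on what the merged frame is used for.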

How to correctly convert the column in csv that contains the dates into JSON

In my csv file, the "ESTABLİSHMENT DATE" column is delimited by the slashes like this: 01/22/2012.
I am converting the csv format into the JSON format, which needs to be done with pandas, but the "ESTABLİSHMENT DATE" column isn't correctly translated to JSON.
df = pd.read_csv(my_csv)
df.to_json("some_path", orient="records")
I don't understand why it awkwardly adds the backward slashes.
"ESTABLİSHMENT DATE":"01\/22\/2012",
However, I need to write the result to a file as the following:
"ESTABLİSHMENT DATE":"01/22/2012",
The question "Forward slash in json file from pandas dataframe" explains why the backslashes are added, and this answer shows how to use the json library to solve the issue.
As long as the date format is 01/22/2012, the / will be escaped with \.
To convert a csv date column into JSON correctly with pandas, convert the 'date' column to a proper datetime dtype and then use .to_json.
2012-01-22 is the ISO date format, but by default .to_json serializes a datetime column as epoch milliseconds (1327190400000 here). So after parsing the column with pd.to_datetime, convert it to a string, which renders as %Y-%m-%d.
import pandas as pd
# test dataframe
df = pd.DataFrame({'date': ['01/22/2012']})
# display(df)
#          date
# 0  01/22/2012
# to JSON
print(df.to_json(orient='records'))
# [out]: [{"date":"01\/22\/2012"}]
# set the date column to a proper datetime
df.date = pd.to_datetime(df.date, format='%m/%d/%Y')
# display(df)
#         date
# 0 2012-01-22
# to JSON
print(df.to_json(orient='records'))
# [out]: [{"date":1327190400000}]
# set the date column type to string
df.date = df.date.astype(str)
# to JSON
print(df.to_json(orient='records'))
# [out]: [{"date":"2012-01-22"}]
# as a single line of code
df.date = pd.to_datetime(df.date, format='%m/%d/%Y').astype(str)
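If the original 01/22/2012 text has to be kept, the json-library route mentioned above is the alternative. A minimal sketch: Python's json module does not escape forward slashes. The records list here is a stand-in for what json.loads(df.to_json(orient='records')) would return for the test dataframe.

```python
import json

# Stand-in for json.loads(df.to_json(orient='records')).
records = [{"date": "01/22/2012"}]

# json.dumps leaves '/' as-is, unlike DataFrame.to_json.
text = json.dumps(records)
print(text)  # [{"date": "01/22/2012"}]
```

To write to a file instead, json.dump(records, f) with an open file handle f produces the same text.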

Python: Reading excel file but Index should be DateTime not Sequential Numbers

I am loading data from an Excel sheet. The sheet has 5 columns. The first column is a DateTime, and the next 4 are datasets corresponding to that time. Here is the code:
import os
import numpy as np
import pandas as pd
df = pd.read_excel(r'path\test.xlsx', sheet_name='2018')
I thought it would load it in such that the DateTime is the index, but instead it has another column called Index which is just a set of numbers going from 0 up to the end of the array. How do I have the DateTime column be the index and remove the other column?
Try this after reading the Excel file; it takes two extra lines:
# Assumes the datetime column is named Datetime and its values look like
# 07/15/2020 12:24:45; change the format string if yours differ.
df['Datetime'] = pd.to_datetime(df['Datetime'], format="%m/%d/%Y %H:%M:%S")
# Set the index to a DatetimeIndex built from that column.
df = df.set_index(pd.DatetimeIndex(df['Datetime']))
There is a solution for this problem:
import pandas as pd
df = pd.read_excel(r'path\test.xlsx', sheet_name='2018')
df = df.set_index('timestamp')  # assuming the name of your datetime column is timestamp
You can try this method for setting the Datetime column as the index.

How to convert python dataframe timestamp to datetime format

I have a dataframe with date information in one column.
The date visually appears in the dataframe in this format: 2019-11-24
but when you print the type it shows up as:
Timestamp('2019-11-24 00:00:00')
I'd like to convert each value in the dataframe to a format like this:
24-Nov
or
7-Nov
for single digit days.
I've tried using various datetime and strptime commands to convert but I am getting errors.
Here's one way to do it:
import pandas as pd
df = pd.DataFrame({'date': ["2014-10-23","2016-09-08"]})
df['date_new'] = pd.to_datetime(df['date'])
df['date_new'] = df['date_new'].dt.strftime("%d-%b")
         date date_new
0  2014-10-23   23-Oct
1  2016-09-08   08-Sep
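Note that %d zero-pads the day, so this gives 08-Sep rather than the 8-Sep style asked for. The %-d directive drops the padding on some platforms but is not portable, so the sketch below builds the label from the day attribute instead. The helper name day_month_label is hypothetical.

```python
from datetime import datetime


def day_month_label(ts):
    """Format a datetime as 'D-Mon' with no leading zero on the day."""
    return f"{ts.day}-{ts.strftime('%b')}"


print(day_month_label(datetime(2019, 11, 7)))   # 7-Nov
print(day_month_label(datetime(2019, 11, 24)))  # 24-Nov
```

A pandas Timestamp is a datetime subclass, so the same helper can be applied to a datetime column, for example with df['date_new'].map(day_month_label). The month abbreviation follows the current locale.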

Convert string column(JAN 2018) to date column(01-01-2018) for pyspark dataframe

I'm trying to convert a string type column to date type column using udf functions as given below
Example input column value:
JAN 2018
Expected output value:
01-01-2018
Here is my code:
from datetime import datetime
from pyspark.sql.functions import udf
from pyspark.sql.types import DateType
squared_udf = udf(lambda z: datetime.strptime(z,'%b %Y').strftime('%Y-%m-%d'), DateType())
df = df.select('TIME PERIOD', squared_udf('TIME PERIOD'))
Output of my code:
DataFrame[TIME PERIOD: string, (TIME PERIOD): date]
But I expected the Spark dataframe to be updated, with the TIME PERIOD column converted.
Any suggestions would be appreciated.
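Two things in the code above look relevant, offered as a hedged reading since no error is shown. The lambda returns a string from strftime while the udf is declared with DateType, and evaluating a DataFrame without .show() only displays its schema. The sketch below isolates the string conversion in plain Python. The function name first_of_month is hypothetical, the day-month-year order of the output is an assumption (the example 01-01-2018 is ambiguous), and the commented Spark wiring is untested here.

```python
from datetime import datetime


def first_of_month(label):
    """Convert 'JAN 2018' to '01-01-2018' (day-month-year order assumed)."""
    return datetime.strptime(label, "%b %Y").strftime("%d-%m-%Y")


print(first_of_month("JAN 2018"))  # 01-01-2018

# Hypothetical wiring into Spark, replacing the column in place.
# The return type is StringType because the function returns a string:
# from pyspark.sql.functions import udf
# from pyspark.sql.types import StringType
# to_date_udf = udf(first_of_month, StringType())
# df = df.withColumn('TIME PERIOD', to_date_udf('TIME PERIOD'))
# df.show()
```

If a real date column is wanted rather than a formatted string, the function would need to return a date object (dropping the strftime call and calling .date() instead) with DateType as the declared return type.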
