I have a yfinance download that is working fine, but I want the Date column to be in YYYY/MM/DD format when I write to disk.
The Date column is the Index, so I first remove the index. Then I have tried using Pandas' "to_datetime" and also ".str.replace" to get the column data to be formatted in YYYY/MM/DD.
Here is the code:
import pandas
import yfinance as yf
StartDate_T = '2021-12-20'
EndDate_T = '2022-05-14'
df = yf.download('CSCO', start=StartDate_T, end=EndDate_T, rounding=True)
df.sort_values(by=['Date'], inplace=True, ascending=False)
df.reset_index(inplace=True) # Make it no longer an Index
df['Date'] = pandas.to_datetime(df['Date'], format="%Y/%m/%d") # Tried this, but it fails
#df['Date'] = df['Date'].str.replace('-', '/') # Tried this also - but error re str
file1 = open('test.txt', 'w')
df.to_csv(file1, index=True)
file1.close()
How can I fix this?
Change the format of the date after resetting the index:
df.reset_index(inplace=True)
df['Date'] = df['Date'].dt.strftime('%Y/%m/%d')
As noted in "Convert datetime to another format without changing dtype", you cannot change the display format and keep the datetime dtype, because datetime values are stored internally without any format. So I would use the line above just before writing to the file (it converts the column to strings) and convert it back to datetime afterwards to regain the datetime properties.
df['Date'] = pandas.to_datetime(df['Date'])
You can pass a date format to the to_csv function:
df.to_csv(file1, date_format='%Y/%m/%d')
I have a csv file with many lines and three columns. The first column is the unix time, the second column the price, and the third column the volume of the symbol traded at that price. What I'm doing is calculating ohlc for different time frames (e.g. 1h, 4h, 12h, 1d) out of that csv file. That works very well by first converting the unix time into datetime.
code:
import pandas as pd
df = pd.read_csv('file.csv', names=['date', 'price', 'volume'])
df['date'] = pd.to_datetime(df['date'], unit='s')
df = df.set_index('date')
df = df['price'].resample('4h').ohlc()
df.to_csv('file_4h_ohlc.csv')
result:
date,open,high,low,close
2017-05-01 20:00:00,0.757881,1.07,0.650011,1.069999
target:
I now want to convert the datetime (2017-05-01 20:00:00) back to the unix time (1493658000) within the same file while keeping the ohlc values, or, if that is not possible, save it into a different file.
Thanks a lot for the support, and sorry if this question has already been answered, but I didn't find it.
-hotshot
Resampling builds a new frame whose index is the start of each 4h bin, so the original date column is no longer available after ohlc(). You can parse the Unix times into a separate datestamp column for resampling, then convert the resulting index back to Unix seconds before writing.
import pandas as pd
df = pd.read_csv('file.csv', names=['date', 'price', 'volume'])
df['datestamp'] = pd.to_datetime(df['date'], unit='s')
df = df.set_index('datestamp')
df = df['price'].resample('4h').ohlc()
# Convert the datetime index back to Unix seconds (after calculating ohlc)
df.index = (df.index - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')
# Name the index so the CSV header reads "date"
df.index.name = 'date'
df.to_csv('file_4h_ohlc.csv')
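Alternatively, you can convert an existing datetime column to a Unix timestamp like so (this needs import datetime, and date must be a regular column, so call df = df.reset_index() first if it is the index):
df['date'] = df['date'].apply(lambda x: (x - datetime.datetime(1970, 1, 1)).total_seconds())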
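The same column conversion can also be done without apply. The sketch below is only an illustration under assumptions: the three ticks are made-up values, and the epoch subtraction followed by integer division by one second is one common way to get whole Unix seconds.

```python
import pandas as pd

# Hypothetical ticks: Unix seconds and a price (values invented for the example)
ticks = pd.DataFrame({
    "date": [1493668800, 1493670000, 1493683200],
    "price": [0.75, 1.07, 1.00],
})
ticks["date"] = pd.to_datetime(ticks["date"], unit="s")

# 4h OHLC bars, then move the datetime index back into a column
ohlc = ticks.set_index("date")["price"].resample("4h").ohlc().reset_index()

# Vectorized datetime -> integer Unix seconds
epoch = pd.Timestamp("1970-01-01")
ohlc["date"] = (ohlc["date"] - epoch) // pd.Timedelta("1s")

print(ohlc.to_csv(index=False))
```

Integer division keeps the column as whole seconds, whereas total_seconds() returns floats.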
In my csv file, the "ESTABLİSHMENT DATE" column is delimited by the slashes like this: 01/22/2012.
I am converting the csv format into the JSON format, which needs to be done with pandas, but the "ESTABLİSHMENT DATE" column isn't correctly translated to JSON.
df = pd.read_csv(my_csv)
df.to_json("some_path", orient="records")
I don't understand why it awkwardly adds the backward slashes.
"ESTABLİSHMENT DATE":"01\/22\/2012",
However, I need to write the result to a file as the following:
"ESTABLİSHMENT DATE":"01/22/2012",
The question "Forward slash in json file from pandas dataframe" explains why the backward slashes are added, and its answer shows how to use the json library to solve the issue.
As long as the date format is 01/22/2012, the / will be escaped with \.
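The json-library route mentioned above can be sketched roughly as follows. This assumes a one-row frame built by hand instead of the real csv; the standard json module does not escape forward slashes, and ensure_ascii=False is used here only so the non-ASCII column name is written as-is.

```python
import json

import pandas as pd

# Hypothetical one-row frame standing in for pd.read_csv(my_csv)
df = pd.DataFrame({"ESTABLİSHMENT DATE": ["01/22/2012"]})

# Convert to plain Python records, then let the json module serialize them
records = df.to_dict(orient="records")
json_text = json.dumps(records, ensure_ascii=False)
print(json_text)
```

To write a file, json.dump(records, f, ensure_ascii=False) with a file opened using encoding="utf-8" should give the same text.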
To convert a csv date column into JSON correctly with pandas, first convert the 'date' column to a proper datetime dtype, and then use .to_json.
2012-01-22 is the correct datetime representation, but .to_json will convert that to 1327190400000 (milliseconds since the epoch). So after using pd.to_datetime to parse the column, the type must be set to a string, which yields the %Y-%m-%d form.
import pandas as pd
# test dataframe
df = pd.DataFrame({'date': ['01/22/2012']})
# display(df)
date
0 01/22/2012
# to JSON
print(df.to_json(orient='records'))
[out]: [{"date":"01\/22\/2012"}]
# set the date column to a proper datetime
df.date = pd.to_datetime(df.date, format='%m/%d/%Y')
# display(df)
date
0 2012-01-22
# to JSON
print(df.to_json(orient='records'))
[out]: [{"date":1327190400000}]
# set the date column type to string
df.date = df.date.astype(str)
# to JSON
print(df.to_json(orient='records'))
[out]: [{"date":"2012-01-22"}]
# as a single line of code
df.date = pd.to_datetime(df.date, format='%m/%d/%Y').astype(str)
This question is different from the existing questions and answers on Stack Overflow because I do not want to change my data type to string in order to obtain the desired output.
I find this very confusing and have not been able to find a proper solution to my problem.
I read an excel file which have one column as following-
Date
9/20/2017 7:27:30 PM
9/20/2017 7:27:30 PM
11/21/2018 8:28:30 AM
7/18/2019 9:30:08 PM
.
.
.
I am taking this data from excel sheet with the help of dataframe
df = pd.read_excel("data.xlsx")
Firstly I want to remove time from this column. I am doing it as -
df['Date'] = pd.to_datetime(df['Date'])
df['Date'] = pd.to_datetime(df['Date'], errors='ignore', format='%d/%b/%Y').dt.date
It produces following output and datatype as datetime.date
Date
20/9/2017
20/9/2017
21/11/2018
18/7/2019
.
.
.
But I want it in the following form without changing it into a string, because I want to store this data in another excel file and this column must behave as a date column if we apply filtering in my excel sheet.
Date
20/Sep/2017
20/Sep/2017
21/Nov/2018
18/Jul/2019
.
.
.
I can produce above output by
df['Date'] = df['Date'].apply(lambda x: x.strftime('%d/%b/%Y'))
But again this date column will be changed into a string, and I do not want it as a string. I want it as a datetime type, excluding time values from each cell.
A possible solution is to convert it from string back to datetime as follows, but that adds the time values again:
df['Date'] = pd.to_datetime(df['Date'])
After executing the above two steps it will also include the time as 12:00:00 AM or 00:00:00 along with the date value.
Hope I am clear.
How can I obtain the desired result with the final column values as a date type?
"But I want it as following type without changing it into string"
No, that is not possible. If you want datetimes without times, the only display pattern in python/pandas is YYYY-MM-DD.
#datetimes with no times
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y %I:%M:%S %p').dt.floor('d')
#python dates
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y %I:%M:%S %p').dt.date
For any custom format, the datetimes have to be converted to strings, like:
df['Date'] = df['Date'].dt.strftime('%d/%b/%Y')
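A small sketch of the difference described above, assuming two sample strings copied from the question rather than the real excel file. Dropping the time with dt.normalize() (equivalent to dt.floor('d')) keeps a datetime dtype, while strftime produces text.

```python
import pandas as pd

# Sample values in the question's format (stand-in for the excel column)
raw = pd.Series(["9/20/2017 7:27:30 PM", "11/21/2018 8:28:30 AM"], name="Date")
parsed = pd.to_datetime(raw, format="%m/%d/%Y %I:%M:%S %p")

# Time set to midnight; the column is still a datetime column
dates_only = parsed.dt.normalize()

# Custom display format; the result holds strings, not datetimes
as_text = dates_only.dt.strftime("%d/%b/%Y")

print(dates_only.dtype)
print(as_text.tolist())
```

The month abbreviation produced by %b depends on the active locale, so the text may differ on systems where a non-English locale has been set.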
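You can set the date_format in the ExcelWriter. Note that it takes an Excel number format (e.g. dd/mmm/yyyy), not a strftime pattern; date_format applies to datetime.date values and datetime_format to datetime values, so both are set here.
writer = pd.ExcelWriter("pandas_datetime.xlsx",
                        engine='xlsxwriter',
                        date_format='dd/mmm/yyyy',
                        datetime_format='dd/mmm/yyyy')
df.to_excel(writer)
writer.close()  # write the workbook to disk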
I think I am a bit late here, but as a workaround:
do not format the date column, let it be a regular df date column, save the excel workbook, then open the workbook again and use the openpyxl module to format that column range.
import openpyxl
workbook = openpyxl.load_workbook(file_path)
sheet = workbook['Sheet1'] # select the sheet by name
#-- assuming that the column is M and data starts from M2
last_line_end = 'M' + str(len(df)+1)
for row in sheet['M2:' + last_line_end]:
    for cell in row:
        cell.number_format = "DD/MM/YY"
workbook.save(file_path) # save workbook back to the same file
workbook.close()
I'm having an issue where the date format is not matching up. In my .csv file the dates are in %m/%d/%Y form (e.g. 11/3/2001), but the error refers to %Y/%m/%d or %Y/%d/%m. I've tried all the possible permutations of year, month and day, and I continue to receive the same error of ValueError: time data '2001-11-03 ' %Y:%m %d %H:%M:%S'. Below is my code. Thanks.
df = pd.read_excel('.xlsx', header=None)
df.to_csv('.csv', header=None, index=False)
df= pd.read_csv('.csv', index_col[5,8,9,12], date_parser=lambda x: datetime.datetime.strptime(x, '%Y/%m/$d %H:%M:%S').strptime('%m/%d/%Y))
Note: What I'm trying to do is convert an .xlsx file to .csv and then remove the trailing 0:00 from multiple columns within the .csv file. Hope this helps.
Use parse from dateutil.parser to parse the dates. It is easy to use and handles many date formats without an explicit format string.
from dateutil.parser import parse
df = pd.read_csv('filename.csv', date_parser = parse, index_..)
Or you can use to_datetime, which is native to pandas:
pd.to_datetime(df['Date Col'])
In order to format the date properly, you should use the following:
date_parser=lambda x: parse(x)
#parse from dateutil.parser
df['Date Col'] = df['Date Col'].dt.strftime('%m/%d/%Y')  # .dt accessor is required on a Series
df.to_csv('New File.csv')
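Put together, the steps above might look like the sketch below. It assumes a tiny in-memory csv with a hypothetical "Date Col" column whose values carry a trailing 00:00:00, and uses pd.to_datetime with an explicit format in place of a date_parser.

```python
import io

import pandas as pd

# Hypothetical csv text; the real file and column names will differ
csv_in = io.StringIO(
    "id,Date Col\n"
    "1,2001-11-03 00:00:00\n"
    "2,2001-12-25 00:00:00\n"
)
df = pd.read_csv(csv_in)

# Parse with the format the data actually has
df["Date Col"] = pd.to_datetime(df["Date Col"], format="%Y-%m-%d %H:%M:%S")

# Reformat as month/day/year text, which drops the time part
df["Date Col"] = df["Date Col"].dt.strftime("%m/%d/%Y")

csv_out = df.to_csv(index=False)
print(csv_out)
```

For several date columns, the same two lines can be repeated in a loop over the column names.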
You can use to_datetime since you are using pandas.
import pandas as pd
df = pd.DataFrame({"a": ["11/3/2001", '2001-11-03']})
# On pandas 2.0+, a column mixing formats needs: pd.to_datetime(df["a"], format="mixed")
df["a"] = pd.to_datetime(df["a"])
print(df["a"])
Output:
0 2001-11-03
1 2001-11-03
Name: a, dtype: datetime64[ns]
I convert a string to a date using pandas.
When I write the DF to CSV, the date comes out like '2016-08-15 instead of plain 2016-08-15, and I am unable to read it as a date in my ETL tool. The same is the case for all date fields.
Any suggestions to get the date format right?
df =pd.read_csv(r'/Users/tcssig/Documents/ABP_News_Aug01.csv', parse_dates=['Dates'])
df.to_csv('/Users/tcssig/Documents/Sarang.csv')
You can try this:
df = pd.read_csv(r'/Users/tcssig/Documents/ABP_News_Aug01.csv')
df['date'] = pd.to_datetime(df['date'])
df.to_csv('/Users/tcssig/Documents/Sarang.csv')
(assuming the name of the date field is 'date')
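To see what pandas itself writes, a quick check along these lines may help. It assumes a one-row in-memory csv with a Dates column (the Headline column and its value are invented) and inspects the raw text rather than opening the file in another tool.

```python
import io

import pandas as pd

# Hypothetical input mirroring the question's Dates column
source = io.StringIO("Dates,Headline\n2016-08-15,Example\n")
df = pd.read_csv(source, parse_dates=["Dates"])

# Explicit date_format so the written text is exactly YYYY-MM-DD
csv_text = df.to_csv(index=False, date_format="%Y-%m-%d")
print(repr(csv_text))
```

If the raw text shows no leading apostrophe, it is being introduced by whatever reads or displays the file afterwards.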