I have a CSV file with many lines and three columns. The first column is the unix time, the second column is the price, and the third column represents the volume of the symbol that was traded at that specific price. What I'm doing is calculating OHLC for different time frames (e.g. 1h, 4h, 12h, 1d) out of that CSV file. That works very well by first converting the unix time into a datetime.
code:
import pandas as pd
df = pd.read_csv('file.csv', names=['date', 'price', 'volume'])
df['date'] = pd.to_datetime(df['date'], unit='s')
df = df.set_index('date')
df = df['price'].resample('4h').ohlc()
df.to_csv('file_4h_ohlc.csv')
result:
date,open,high,low,close
2017-05-01 20:00:00,0.757881,1.07,0.650011,1.069999
target:
I now want to convert the datetime (2017-05-01 20:00:00) back to the unix time (1493658000) within the same file, keeping the OHLC values, or if that is not possible, save it into a different file.
Thanks a lot for the support, and sorry if this question has already been answered, but I didn't find it.
-hotshot
You can create a new date column instead of overwriting the existing one, so you can re-use it as the index.
import pandas as pd
df = pd.read_csv('file.csv', names=['date', 'price', 'volume'])
df['datestamp'] = pd.to_datetime(df['date'], unit='s')
df = df.set_index('datestamp')
# Resample the price to OHLC, keeping the first original unix time of each bucket
ohlc = df['price'].resample('4h').ohlc()
ohlc['date'] = df['date'].resample('4h').first()
# Set the index back to the original unix time (after calculating ohlc)
ohlc = ohlc.set_index('date')
ohlc.to_csv('file_4h_ohlc.csv')
Alternatively, you can convert an existing datetime column back to a Unix timestamp like so:
import datetime
df['date'] = df['date'].apply(lambda x: (x - datetime.datetime(1970, 1, 1)).total_seconds())
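The resampled frame keeps its dates in the DatetimeIndex, so another option is to convert that index back to unix seconds directly, which gives the exact bucket start rather than the first trade's time. A minimal sketch, assuming ohlc is the frame straight after the resample('4h').ohlc() call above (before the set_index step):
import pandas as pd

# Subtract the epoch and floor-divide by one second to turn the
# DatetimeIndex bucket starts back into unix timestamps
ohlc.index = (ohlc.index - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')
ohlc.index.name = 'date'
ohlc.to_csv('file_4h_ohlc.csv')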
I have a yfinance download that is working fine, but I want the Date column to be in YYYY/MM/DD format when I write to disk.
The Date column is the Index, so I first remove the index. Then I have tried using Pandas' "to_datetime" and also ".str.replace" to get the column data to be formatted in YYYY/MM/DD.
Here is the code:
import pandas
import yfinance as yf
StartDate_T = '2021-12-20'
EndDate_T = '2022-05-14'
df = yf.download('CSCO', start=StartDate_T, end=EndDate_T, rounding=True)
df.sort_values(by=['Date'], inplace=True, ascending=False)
df.reset_index(inplace=True) # Make it no longer an Index
df['Date'] = pandas.to_datetime(df['Date'], format="%Y/%m/%d") # Tried this, but it fails
#df['Date'] = df['Date'].str.replace('-', '/') # Tried this also - but error re str
file1 = open('test.txt', 'w')
df.to_csv(file1, index=True)
file1.close()
How can I fix this?
Change the format of the date after resetting the index:
df.reset_index(inplace=True)
df['Date'] = df['Date'].dt.strftime('%Y/%m/%d')
As noted in Convert datetime to another format without changing dtype, you cannot change the display format and keep the datetime dtype, due to how datetime stores dates internally. So I would use the line above just before writing to the file (which turns the column into strings) and convert it back to datetime afterwards, to get the datetime properties back:
df['Date'] = pd.to_datetime(df['Date'])
You can pass a date format to the to_csv function:
df.to_csv(file1, date_format='%Y/%m/%d')
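A self-contained sketch of that approach (the small frame below is a stand-in, not the real yfinance data): the 'Date' column stays datetime64 in memory, and only the written file gets the new format.
import pandas as pd

# Stand-in frame; in the question df comes from yf.download() plus reset_index()
df = pd.DataFrame({'Date': pd.to_datetime(['2021-12-20', '2021-12-21']),
                   'Close': [59.0, 59.5]})

# date_format is applied to datetime columns (and a datetime index) when writing
df.to_csv('test.txt', index=False, date_format='%Y/%m/%d')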
I am pulling a time series from a CSV file which has dates in "mm/dd/yyyy" format:
import pandas as pd
from datetime import datetime
df = pd.read_csv('lib_file.csv')
df['Date'] = df['Date'].apply(lambda x: datetime.strptime(x, '%m/%d/%Y').strftime('%d/%m/%Y'))
The output then shows the dates as dd/mm/yyyy strings. Next I convert the dtype of ['Date'] from object to datetime64:
df['Date'] = pd.to_datetime(df['Date'])
But that changes my dates as well (day and month get swapped). How do I fix it?
Try this:
df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
This will infer your dates based on the first non-NaN element, which is parsed correctly in your case, rather than inferring the format separately for each and every row of the dataframe.
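If you would rather not rely on inference at all, a minimal sketch passing the format explicitly (the toy strings below stand in for the re-formatted DD/MM/YYYY column):
import pandas as pd

# Toy DD/MM/YYYY strings standing in for the re-formatted 'Date' column
df = pd.DataFrame({'Date': ['05/01/2021', '06/01/2021']})

# An explicit format (or dayfirst=True) stops pandas from guessing month-first
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')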
Just using the code below helped:
df = pd.read_csv('lib_file.csv')
df['Date'] = pd.to_datetime(df['Date'])
Is there a way to create a new DataFrame from a time series with the daily difference?
That is, suppose that on October 5 I had 5321 counts and on October 6 I had 5331 counts; that is a difference of 10, and what I want is, for example, for my DataFrame to show 10 on October 6.
Here's my code of the raw dataframe:
import pandas as pd
from datetime import datetime, timedelta
url = 'https://raw.githubusercontent.com/mariorz/covid19-mx-time-series/master/data/covid19_confirmed_mx.csv'
df = pd.read_csv(url, index_col=0)
df = df.loc['Colima','18-03-2020':'06-10-2020']
df = pd.DataFrame(df)
df.index = pd.to_datetime(df.index, format='%d-%m-%Y')
df
This is the raw outcome:
Thank you guys!
There's a built-in diff function just for this kind of operation:
df['Diff'] = df.Colima.diff()
Yes, you can use the shift method to access the preceding row's value to calculate the difference.
df['difference'] = df.Colima - df.Colima.shift(1)
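For illustration, a small self-contained sketch showing that diff() and the shift() subtraction give the same daily difference (toy numbers, not the real Colima data):
import pandas as pd

# Toy counts standing in for the Colima column
df = pd.DataFrame({'Colima': [5321, 5331, 5340]},
                  index=pd.to_datetime(['2020-10-05', '2020-10-06', '2020-10-07']))

df['Diff'] = df.Colima.diff()                      # NaN, 10.0, 9.0
df['difference'] = df.Colima - df.Colima.shift(1)  # identical result
print(df)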
I want to use time series with Pandas. I read multiple time series one by one from a CSV file which has the date in a column named "Date" in (YYYY-MM-DD) format:
Date,Business,Education,Holiday
2005-01-01,6665,8511,86397
2005-02-01,8910,12043,92453
2005-03-01,8834,12720,78846
2005-04-01,8127,11667,52644
2005-05-01,7762,11092,33789
2005-06-01,7652,10898,34245
2005-07-01,7403,12787,42020
2005-08-01,7968,13235,36190
2005-09-01,8345,12141,36038
2005-10-01,8553,12067,41089
2005-11-01,8880,11603,59415
2005-12-01,8331,9175,70736
df = pd.read_csv(csv_file, index_col = 'Date',header=0)
Series_list = df.keys()
The time series can have different frequencies: day, week, month, quarter, year. I want to index the time series according to a frequency I decide on before I generate the ARIMA model. Could someone please explain how I can define the frequency of the series?
stepwise_fit = auto_arima(df[Series_name]....
pandas has a built-in function, pandas.infer_freq():
import pandas as pd
df = pd.DataFrame({'Date': ['2005-01-01', '2005-02-01', '2005-03-01', '2005-04-01'],
'Date1': ['2005-01-01', '2005-01-02', '2005-01-03', '2005-01-04'],
'Date2': ['2006-01-01', '2007-01-01', '2008-01-01', '2009-01-01'],
'Date3': ['2006-01-01', '2006-02-06', '2006-03-11', '2006-04-01']})
df['Date'] = pd.to_datetime(df['Date'])
df['Date1'] = pd.to_datetime(df['Date1'])
df['Date2'] = pd.to_datetime(df['Date2'])
df['Date3'] = pd.to_datetime(df['Date3'])
pd.infer_freq(df.Date)
#'MS'
pd.infer_freq(df.Date1)
#'D'
pd.infer_freq(df.Date2)
#'AS-JAN'
Alternatively you could also make use of the datetime functionality of the columns.
df.Date.dt.freq
#'MS'
Of course if your data doesn't actually have a real frequency, then you won't get anything.
pd.infer_freq(df.Date3)
# None
The frequency descriptions are documented under offset aliases.
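To actually attach a frequency to the index before fitting, a minimal sketch (csv_file is the path variable from the question; 'MS' is what the monthly sample data above would infer, so adjust it to whatever frequency you decide on):
import pandas as pd

# parse_dates=True parses the 'Date' index column as datetimes
df = pd.read_csv(csv_file, index_col='Date', header=0, parse_dates=True)

print(pd.infer_freq(df.index))  # e.g. 'MS' for the month-start sample data
df = df.asfreq('MS')            # pin the chosen frequency onto the index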
Say I'm looking at the Rdataset acme.csv found here. How do I import this with an appropriately coarse date? Using parse_dates, it assigns the day to the present day (today being the 18th of July), since no day was specified. Can I make it deal with just month/year like the table does, but keep using the date functionality of pandas?
import pandas as pd
url = 'http://vincentarelbundock.github.io/Rdatasets/csv/boot/acme.csv'
df = pd.read_csv(url, parse_dates=[1])
df.drop('Unnamed: 0', axis=1, inplace=True)
Don't parse dates in read_csv(); use to_datetime with an explicit format instead:
df['month'] = pd.to_datetime(df['month'], format='%m/%y')
Or you can use that function in read_csv() via a lambda:
df = pd.read_csv(url, parse_dates=['month'], date_parser=lambda x:pd.to_datetime(x, format='%m/%y'))
But you will always get some day number in a datetime.
BTW: a datetime always has a time component too, even though pandas sometimes doesn't show it:
print(df['month'].head())
print(df['month'].apply(lambda x: x.time()).head())
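If the day number really is unwanted, one option (just a sketch, not part of the answer above) is to drop down to a monthly Period after parsing, which keeps pandas' time functionality but stores only month and year:
import pandas as pd

url = 'http://vincentarelbundock.github.io/Rdatasets/csv/boot/acme.csv'
df = pd.read_csv(url)
# Parse month/year, then convert to a monthly Period so no day number is stored
df['month'] = pd.to_datetime(df['month'], format='%m/%y').dt.to_period('M')
print(df['month'].head())  # Period[M] values such as 1986-01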