Sorting Timestamps inside a CSV file with Python - python

I'm trying to sort the contents of a CSV file by its timestamps, but it doesn't seem to work. The timestamps look like this:
2021-04-16 12:59:26+02:00
My current code:
from datetime import datetime
import csv
from csv import DictReader
with open('List_32_Data_New.csv', 'r') as read_obj:
    csv_dict_reader = DictReader(read_obj)
    csv_dict_reader = sorted(csv_dict_reader, key = lambda row: datetime.strptime(row['Timestamp'], "%Y-%m-%d %H:%M:%S%z"))
    writer = csv.writer(open("Sorted.csv", 'w'))
    for row in csv_dict_reader:
        writer.writerow(row)
However, it always raises this error:
time data '2021-04-16 12:59:26+02:00' does not match format '%Y-%m-%d %H:%M:%S%z'
I already tried an online compiler, and apparently it works there.
Any help would be much appreciated.

If you use pandas, this gets a bit easier (credit to MrFuppes):
import pandas as pd
df = pd.read_csv(r"path/your.csv")
df['new_timestamps'] = pd.to_datetime(df['timestamps'], format='%Y-%m-%d %H:%M:%S%z')
df = df.sort_values(['new_timestamps'], ascending=True)
df.to_csv(r'path/your.csv')
If you still have errors you can also try to parse the date like this (Credits to: Zerox):
from datetime import datetime
from dateutil.parser import parse
df['new_timestamps'] = df['timestamps'].map(lambda x: datetime.strptime((parse(x)).strftime('%Y-%m-%d %H:%M:%S%z'), '%Y-%m-%d %H:%M:%S%z'))
Unsure about the correct datetime format? You can try auto-detection with infer_datetime_format=True (deprecated since pandas 2.0, where the format is inferred by default):
df['new_timestamps'] = pd.to_datetime(df['timestamps'], infer_datetime_format=True)
Tested with the following sample:
df = pd.DataFrame(['2021-04-15 12:59:26+02:00','2021-04-13 12:59:26+02:00','2021-04-16 12:59:26+02:00'], columns=['timestamps'])
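A possible explanation for the original error, offered as an assumption because the question does not state the Python version: before Python 3.7, the %z directive of strptime did not accept a colon inside the UTC offset, which would also explain why the same code works in an online compiler running a newer interpreter. Below is a minimal standard-library sketch that sidesteps this by removing the colon before parsing. The helper names parse_timestamp and sort_csv_by_timestamp are hypothetical; the column name Timestamp is taken from the question.

```python
import csv
from datetime import datetime


def parse_timestamp(text):
    """Parse '2021-04-16 12:59:26+02:00' style strings into aware datetimes."""
    # Turn a trailing "+HH:MM" / "-HH:MM" offset into "+HHMM" so that "%z"
    # also matches on interpreters older than Python 3.7.
    if len(text) >= 6 and text[-3] == ":" and text[-6] in "+-":
        text = text[:-3] + text[-2:]
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S%z")


def sort_csv_by_timestamp(in_file, out_file, column="Timestamp"):
    """Copy CSV rows from in_file to out_file, ordered by the timestamp column."""
    reader = csv.DictReader(in_file)
    rows = sorted(reader, key=lambda row: parse_timestamp(row[column]))
    # DictWriter writes the values of each row; csv.writer.writerow(row)
    # on a dict would write only its keys.
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```

With real files this would be called as sort_csv_by_timestamp(open('List_32_Data_New.csv', newline=''), open('Sorted.csv', 'w', newline='')), ideally inside with blocks so both files are closed.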

Related

Convert date format from a 'yfinance' download

I have a yfinance download that is working fine, but I want the Date column to be in YYYY/MM/DD format when I write to disk.
The Date column is the Index, so I first remove the index. Then I have tried using Pandas' "to_datetime" and also ".str.replace" to get the column data to be formatted in YYYY/MM/DD.
Here is the code:
import pandas
import yfinance as yf
StartDate_T = '2021-12-20'
EndDate_T = '2022-05-14'
df = yf.download('CSCO', start=StartDate_T, end=EndDate_T, rounding=True)
df.sort_values(by=['Date'], inplace=True, ascending=False)
df.reset_index(inplace=True) # Make it no longer an Index
df['Date'] = pandas.to_datetime(df['Date'], format="%Y/%m/%d") # Tried this, but it fails
#df['Date'] = df['Date'].str.replace('-', '/') # Tried this also - but error re str
file1 = open('test.txt', 'w')
df.to_csv(file1, index=True)
file1.close()
How can I fix this?
Change the format of the date after resetting the index:
df.reset_index(inplace=True)
df['Date'] = df['Date'].dt.strftime('%Y/%m/%d')
As noted in Convert datetime to another format without changing dtype, you cannot change the display format and keep the datetime dtype, because of how datetime values are stored internally. So I would use the line above just before writing to the file (which turns the column into strings) and convert it back to datetime afterwards to regain the datetime properties.
df['Date'] = pandas.to_datetime(df['Date'])
You can pass a date format to the to_csv function:
df.to_csv(file1, date_format='%Y/%m/%d')
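To illustrate the date_format approach without downloading anything, here is a small sketch. The DataFrame is made-up sample data standing in for the yfinance result (the prices are illustrative, not real quotes); only to_csv and its date_format parameter come from the answer above.

```python
import pandas as pd

# Made-up stand-in for the yfinance download: a DatetimeIndex named "Date"
df = pd.DataFrame(
    {"Close": [61.5, 62.1]},
    index=pd.to_datetime(["2021-12-20", "2021-12-21"]),
)
df.index.name = "Date"

# Newest first, then turn the index into an ordinary column
df = df.sort_index(ascending=False).reset_index()

# date_format only changes the text that is written out;
# df["Date"] itself keeps its datetime64 dtype.
csv_text = df.to_csv(index=False, date_format="%Y/%m/%d")
print(csv_text)
```

Because the column stays a datetime column, no conversion back from strings is needed after writing.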

Not able to change DateTime Format to a specified format in python

I want to convert the date string "1/09/2020" to the string "1-Sep-2020" in Python. I have tried every solution I could find on Stack Overflow, but none of them works. Sometimes I get a ValueError saying the data does not match the format; when I change the format to match, I get a "day is out of range" error instead. Is there a problem in the Excel data, or is my code wrong?
xlsm_files=['202009 - September - Diamond Plod Day & Night MKY021.xlsm']
import time
import pandas as pd
import numpy as np
import datetime
df=pd.DataFrame()
for fn in xlsm_files:
    all_dfs=pd.read_excel(fn, sheet_name=None, engine='openpyxl')
    list_data = all_dfs.keys()
    all_dfs.pop('Date',None)
    all_dfs.pop('Ops Report',None)
    all_dfs.pop('Fuel Report',None)
    all_dfs.pop('Bit Report',None)
    all_dfs.pop('Plod Example',None)
    all_dfs.pop('Plod Definitions',None)
    all_dfs.pop('Consumables',None)
    df2 = pd.DataFrame(columns=["PlodDate"])
    for ws in list_data:
        df1 = all_dfs[ws]
        new_row = {'PlodDate':df1.iloc[3,3]}
        df2 = df2.append(new_row,ignore_index=True)
df2['PlodDate']=pd.to_datetime(df2['PlodDate'].astype(str), format="%d/%m/%Y")
df2['PlodDate']=df2['PlodDate'].apply(lambda x: x.strftime("%d-%b-%Y"))
df2
This raises a ValueError: either "day is out of range for month" or a message that the data does not match the format.
Method 1: tried because the error said the day was out of range.
try:
    datetime.datetime.strptime(df2['PlodDate'].astype(str).values[0],"%d/%m/%Y")
except ValueError:
    continue
df2['PlodDate']=pd.to_datetime(df2['PlodDate'].astype(str), format="%d/%m/%Y")
df2['PlodDate']=df2['PlodDate'].apply(lambda x: x.strftime("%d-%b-%Y"))
Excel File Attached
# A Series has no .split(), so split each value on its own (assumes "d/m/Y" strings)
def to_dd_mon_yyyy(value):
    day, month, year = str(value).split('/')
    return datetime.date(int(year), int(month), int(day)).strftime('%d-%b-%Y')
df2['PlodDate'] = df2['PlodDate'].apply(to_dd_mon_yyyy)
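For the plain string conversion asked about in the question, a minimal standard-library sketch may help. It assumes the value really is a day-first string such as "1/09/2020"; if Excel hands back actual datetime objects, astype(str) produces a different layout and this format would not match. The helper name to_day_mon_year is hypothetical. Note that %d always zero-pads, so the day is formatted separately to get "1" rather than "01".

```python
from datetime import datetime


def to_day_mon_year(text):
    """Convert a day-first 'D/MM/YYYY' string to 'D-Mon-YYYY'."""
    parsed = datetime.strptime(text, "%d/%m/%Y")
    # strftime("%d") would give "01"; use the integer day to avoid the padding.
    # "%b" is the abbreviated month name in the current locale (English by default).
    return f"{parsed.day}-{parsed.strftime('%b-%Y')}"


print(to_day_mon_year("1/09/2020"))
```

Applied to a column, this could be used as df2['PlodDate'].astype(str).map(to_day_mon_year), again assuming every cell holds a string in that layout.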

Convert the timestamp column into date format

I want to convert the timestamp column into a readable date column. But when I tried the following code, every row got the same date. Can anyone help me with this problem?
import json
import pandas as pd
with open('/Users/Damon/Desktop/percent-utx-os-in-profit.json', 'r') as f:
    data = json.load(f)
df = pd.DataFrame(data)
(Image: what df looks like before)
from datetime import date
df["date"] = pd.to_datetime(df.t)
(Image: what you get and what you want to get)
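The screenshots are not available here, so the following is only a sketch under an assumption: that the t column holds Unix epoch seconds. If so, pd.to_datetime without a unit treats the integers as nanoseconds since 1970, which places every row on 1970-01-01 and would explain why all dates look the same. Passing unit='s' tells pandas how to read them. The sample values and the column name v are made up.

```python
import pandas as pd

# Made-up sample; "t" is assumed to be Unix epoch seconds
df = pd.DataFrame({"t": [1609459200, 1609545600], "v": [0.91, 0.93]})

# Without unit=..., integers are read as nanoseconds since 1970-01-01
wrong = pd.to_datetime(df["t"])

# With unit="s" they are read as seconds since 1970-01-01 (UTC)
df["date"] = pd.to_datetime(df["t"], unit="s")
print(df)
```

If the values turn out to be milliseconds instead, unit='ms' would be the corresponding setting.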

Convert any Date String Format to a specific date format string

I am making a generic tool that can take any CSV file. I have a CSV file that looks something like this. The first row holds the column names and the second row holds the type of each variable.
sam.csv
Time,M1,M2,M3,CityName
temp,num,num,num,city
20-06-13,19,20,0,aligarh
20-02-13,25,42,7,agra
20-03-13,23,35,4,aligarh
20-03-13,21,32,3,allahabad
20-03-13,17,27,1,aligarh
20-02-13,16,40,5,aligarh
Other CSV file looks like:
Time,M1,M2,M3,CityName
temp,num,num,num,city
20/8/16,789,300,10,new york
12/6/17,464,67,23,delhi
12/6/17,904,98,78,delhi
So the file could use any date format, or even a timestamp. I want to convert it to a "%d-%b-%y" string such as "20-May-13" every time, and sort the column from the oldest date to the newest. I have been able to find the column whose type is "temp" and have tried to convert it to the required format, but every method requires me to specify the original format, which is not possible in my case.
Code:
import csv
import time
from datetime import datetime,date
import pandas as pd
import dateutil
from dateutil.parser import parse
filename = 'sam.csv'
data_date = pd.read_csv(filename)
column_name = data_date.ix[:, data_date.loc[0] == "temp"]
column_work = column_name.iloc[1:]
column_some = column_work.iloc[:,0]
default_date = datetime.combine(date.today(), datetime.min.time()).replace(day=1)
for line in column_some:
    print(parse(line[0], default=default_date).strftime("%d-%b-%y"))
In "sam.csv" the dates are in 2013. My output has the correct format, but all six dates come out as 2-Mar-2018.
You can use the dateutil library for converting any date format to your required format.
Ex:
import csv
from dateutil.parser import parse
p = "PATH_TO_YOUR_CSV.csv" #I have used your sample data to test.
with open(p, "r") as infile:
    reader = csv.reader(infile)
    next(reader) #Skip Header
    next(reader) #Skip Header
    for line in reader:
        print(parse(line[0]).strftime("%d-%B-%y")) #Parse Date and convert it to date-month-year
Output:
20-June-13
20-February-13
20-March-13
20-March-13
20-March-13
20-February-13
20-August-16
06-December-17
06-December-17
More info on dateutil
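Two details are worth a short sketch. In the question's loop, line is already the date string, so line[0] passes only its first character ("2") to parse, which may be why every row came out as the same date. And in the output above, "12/6/17" was read month-first (06-December-17); if the files are day-first, dateutil's dayfirst=True option changes that. The function name normalise_dates is hypothetical, and the "%d-%b-%y" format plus oldest-to-newest ordering are taken from the question.

```python
from dateutil.parser import parse


def normalise_dates(values, dayfirst=True):
    """Parse loosely formatted date strings, sort them, and format as dd-Mon-yy."""
    # dayfirst=True resolves ambiguous dates such as "12/6/17" as 12 June
    parsed = sorted(parse(value, dayfirst=dayfirst) for value in values)
    return [d.strftime("%d-%b-%y") for d in parsed]


print(normalise_dates(["20-06-13", "20-02-13", "20-03-13"]))
print(normalise_dates(["20/8/16", "12/6/17"]))
```

No parser can tell day-first from month-first in a value like "12/6/17" on its own, so a generic tool probably needs the user to state the convention once per file.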

TypeError: Timestamp subtraction

I have a script that collects data, and I am running into the error TypeError: Timestamp subtraction must have the same timezones or no timezones. I have looked at other posts about this error but could not find a solution that works for me.
How can I get around this error? Once the data is collected I don't manipulate it, and I don't understand why I cannot save this DataFrame to an Excel document. Can anyone help?
import pandas as pd
import numpy as np
import os
import datetime
import pvlib
from pvlib.forecast import GFS, NAM
#directories and filepaths
barnwell_dir = r'D:\Saurabh\Production Forecasting\Machine Learning\Sites\Barnwell'
barnwell_training = r'8760_barnwell.xlsx'
#constants
writer = pd.ExcelWriter('test' + '_PythonExport.xlsx', engine='xlsxwriter')
time_zone = 'Etc/GMT+5'
barnwell_list = [r'8760_barnwell.xlsx', 33.2376, -81.3510]
def get_gfs_processed_data1():
    start = pd.Timestamp(datetime.date.today(), tz=time_zone) #used for testing last week
    end = start + pd.Timedelta(days=6)
    gfs = GFS(resolution='quarter')
    #get processed data for lat/long point
    forecasted_data = gfs.get_processed_data(barnwell_list[1], barnwell_list[2], start, end)
    forecasted_data.to_excel(writer, sheet_name='Sheet1')
get_gfs_processed_data1()
When I run your sample code I get the following warning from XlsxWriter at the end of the stacktrace:
"Excel doesn't support timezones in datetimes. "
TypeError: Excel doesn't support timezones in datetimes.
Set the tzinfo in the datetime/time object to None or use the
'remove_timezone' Workbook() option
I think that is reasonably self-explanatory. To strip the timezones from the timestamps pass the remove_timezone option as recommended:
writer = pd.ExcelWriter('test' + '_PythonExport.xlsx',
                        engine='xlsxwriter',
                        options={'remove_timezone': True})
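When I make this change the sample runs and produces an xlsx file. Note that the remove_timezone option requires XlsxWriter >= 0.9.5. Newer pandas releases dropped the options argument of ExcelWriter; there, as far as I know, the same setting is passed as engine_kwargs={'options': {'remove_timezone': True}}.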
You can remove the timezone from all your datetime columns like this:
for col in df.select_dtypes(['datetimetz']).columns:
    df[col] = df[col].dt.tz_convert(None)
df.to_excel('test' + '_PythonExport.xlsx')
After that, you can save to Excel without any problem.
Note:
To select Pandas datetimetz dtypes, use 'datetimetz' (new in 0.20.0)
or 'datetime64[ns, tz]'
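One thing the loop above does not cover is a timezone-aware index, which is what a forecast DataFrame like the one in the question is likely to have. The sketch below handles both the columns and the index; drop_timezones is a hypothetical helper name. It also shows the difference between the two pandas calls: tz_localize(None) keeps the local wall-clock time, while tz_convert(None) converts to UTC first.

```python
import pandas as pd


def drop_timezones(df, keep_local_time=True):
    """Return a copy of df whose datetime columns and index are timezone-naive."""
    out = df.copy()
    for col in out.select_dtypes(include=["datetimetz"]).columns:
        if keep_local_time:
            # 12:59+02:00 stays 12:59
            out[col] = out[col].dt.tz_localize(None)
        else:
            # 12:59+02:00 becomes 10:59 (UTC)
            out[col] = out[col].dt.tz_convert(None)
    if isinstance(out.index, pd.DatetimeIndex) and out.index.tz is not None:
        if keep_local_time:
            out.index = out.index.tz_localize(None)
        else:
            out.index = out.index.tz_convert(None)
    return out


stamps = pd.to_datetime(["2021-04-16 12:59:26+02:00"])
sample = pd.DataFrame({"ts": stamps}, index=stamps)
print(drop_timezones(sample))
```

The result can then be written with to_excel without the XlsxWriter timezone error, since no aware datetimes remain.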
