I am doing some data cleaning and I have a csv with a date column containing “month day”, for example: April 12. I want to add the year 2020 to each date in that column, so that I have: April 12 2020.
I’ve tried using pandas and datetime, but I feel like I am clearly missing an easy answer.
Thanks!
edit:
I should have said this before: I have already imported the csv and I want to add the year after the fact. Furthermore, I have already told pandas that the ‘onset’ column contains dates.
edit 2:
Thanks to this comment from MrNobody33: "You can try df['onset'] = df['onset'].apply(lambda dt: dt.replace(year=2020)) in that case."
That worked! Thanks for all the help. In the future I’ll try to make my posts clearer and include my data when asking a question. I knew there had to be a simple answer!
Try this (note the space before the year):
df['onset'] = df['onset'].astype(str) + ' 2020'
If you are trying to edit the csv itself, you can try something like this:
import pandas as pd
path = 'test.csv' #path of the .csv
df = pd.read_csv(path) #reads the file
df['onset'] = df['onset'].astype(str) +' 2020' #Add the year
df.to_csv(path, index=False) #overwrite the original file
Or, if you want to edit the dataframe imported from that csv, you can try this:
import pandas as pd
path = 'test.csv' #replace with the path of your .csv
df = pd.read_csv(path) #reads the file
df['onset'] = df['onset'].astype(str) +' 2020' #Add the year
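For comparison, here is a small sketch of the approaches in this thread side by side. It assumes a hypothetical 'onset' column of "Month day" strings (the sample values are made up). It appends the year as text, parses the combined "Month day year" string into real datetimes, and, for a column that was already parsed without a year (pandas then fills in 1900), swaps the year with replace(year=2020) as in the accepted comment.

```python
import pandas as pd

# Hypothetical sample data in the question's "Month day" form
df = pd.DataFrame({"onset": ["April 12", "May 3"]})

# 1) Keep the values as text and append the year (note the space before 2020)
df["onset_text"] = df["onset"] + " 2020"

# 2) Parse the combined text into real datetimes (%B = full month name)
df["onset_dt"] = pd.to_datetime(df["onset"] + " 2020", format="%B %d %Y")

# 3) Column already parsed without a year: the year defaults to 1900,
#    so replace it on each Timestamp, as in the accepted comment
parsed = pd.to_datetime(df["onset"], format="%B %d")
df["onset_replaced"] = parsed.apply(lambda ts: ts.replace(year=2020))

print(df)
```

One caveat worth checking with real data: a value like "February 29" will likely fail to parse without a year, because the default year 1900 is not a leap year, so option 2 is the safer route for 2020 dates.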
Related
I'm trying to parse the dates in a column and add a column with the weekday. However, I am very new to Python and programming in general. The line
date_parse = pd.read_csv(input_file, parse_cols = "Date")
gives me an error. My goal was to parse the data from the "Date" column of the pandas DataFrame and save it as date_parse, so I could then convert it with date.weekday. It didn't like parse_cols. I'm not sure if I'm even close to doing this correctly or if I'm way off base. I've been working on this for about a week with no luck, so I broke down and decided to ask here. I suspect that pd.read is maybe for Excel only? Any tips?
Error:
Exception has occurred: TypeError
read_csv() got an unexpected keyword argument 'parse_cols'
File "input_file", line 21, in <module>
date_parse = pd.read_csv(input_file, parse_cols = "Date")
import yfinance as yf
import pandas as pd
import csv
import datetime as dt
import calendar
#Added input_file to make it easier than copying the file path everywhere
input_file = "BrkHist.csv"
#Pulling information from yahoo
brk = yf.Ticker("brk-B")
brk.info
Hist_data = brk.history(period="2y")
Hist_data.to_csv(input_file)
#Change csv file to pandas
data = pd.read_csv(BrkHist.csv)
#Select which columns I want to see on the dataframe
brkp = pd.DataFrame(data, columns= ['Date','Weekday','Open','Close'])
#Parse data from date column
date_parse = pd.read_csv(input_file, parse_cols = "Date")
print(date_parse)
#Add Weekday column to hold days of the week data in integer form
dp_weekday = date(date_parse).weekday()
#Convert weekday column data into string value
days = {0:'Monday',1:'Tuesday',2:'Wednesday',3:'Thursday',4:'Friday',5:'Saturday',6:'Sunday'}
#Add day of week column with new data
data.insert (1, "Weekday"[dp_weekday])
pandas read_csv has no parse_cols keyword (see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html).
Did you mean to use parse_dates=['Date']?
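A minimal sketch of the weekday part, assuming the CSV has a "Date" column like the one yfinance writes. A few inline rows stand in for BrkHist.csv here (hypothetical values). With parse_dates the column arrives as datetimes, so the .dt accessor gives the weekday for every row at once instead of calling date(...).weekday() on a whole column.

```python
import io
import pandas as pd

# Hypothetical stand-in for BrkHist.csv, using the same 'Date' column name
csv_text = """Date,Open,Close
2020-06-01,180.0,182.5
2020-06-02,182.5,181.0
"""

# parse_dates converts the 'Date' column to datetime64 while reading
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Integer weekday (Monday=0 ... Sunday=6), same numbering as date.weekday()
data.insert(1, "WeekdayNum", data["Date"].dt.weekday)
# Weekday name, which replaces the manual {0: 'Monday', ...} lookup
data.insert(2, "Weekday", data["Date"].dt.day_name())

print(data[["Date", "Weekday", "Open", "Close"]])
```

With the real file, the read line would become pd.read_csv(input_file, parse_dates=["Date"]).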
I was looking on Stack Overflow for this but didn't find exactly what I wanted. I would like to open a csv file in Python and add a new column with the header "Date", filled with today's date on every row down to the end of the file. How can I do it? I was trying to do it with pandas but I only know how to append to the end.
I was trying to do it this way with the csv package:
x=open(outfile_name1)
y=csv.reader(x)
z=[]
for row in y:
    z.append(['0'] + row)
Instead of ['0'] I want to put today's date. Can I then convert this list to a csv with pandas or something? Thanks in advance for your help!
Try this:
import pandas as pd
import datetime
df = pd.read_csv("my.csv")
df.insert(0, 'Date', datetime.datetime.today().strftime('%Y-%m-%d'))
df.to_csv("my_withDate.csv", index=False)
PS: Read the docs
Is this what you are looking for?
import pandas as pd
import datetime
df = pd.read_csv("file.csv")
df['Date'] = datetime.datetime.today().strftime('%Y-%m-%d')
df.to_csv("new_file.csv", index=False)
As far as I understand, the ultimate goal is to write the data to a csv. One option is to open the first file for reading and a second one for writing, write the header row into the new file prefixed with the column name 'Date,', and then iterate over the data rows, prefixing each with the date (requires Python 3.6+, since it uses f-strings):
import datetime
today = datetime.date.today()  # compute the date once, not for every row
with open('columns.csv', 'r') as src, open('out.csv', 'w') as dst:
    headers = 'Date,' + next(src)  # the first line is the header row
    print(headers, end='', file=dst)
    for row in src:
        # no space after the comma, otherwise every first field gets a leading space
        print(f'{today},{row}', end='', file=dst)
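The csv-module attempt from the question can also be finished without pandas. This is a sketch under the assumption that the first row of the input is a header; prepend_date_column is a hypothetical helper name. It is the same ['0'] + row idea, with today's date in place of '0' and csv.writer handling quoting and line endings.

```python
import csv
import datetime

def prepend_date_column(in_path, out_path):
    """Copy in_path to out_path, adding a 'Date' column filled with today's date."""
    today = datetime.date.today().isoformat()  # e.g. '2020-06-01'
    # newline='' is what the csv module docs recommend for files it reads or writes
    with open(in_path, newline='') as src, open(out_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)               # first row holds the column names
        writer.writerow(['Date'] + header)
        for row in reader:
            writer.writerow([today] + row)  # like ['0'] + row, but with the date

# Usage (hypothetical file names):
# prepend_date_column('columns.csv', 'out.csv')
```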
So I recently concatenated multiple csv files into one. Since the filenames were dates, I also included "filename" as a column for reference. However, the filename has info that I would not like to include such as the time and file extension. As a beginner, I'm only familiar with importing and printing the file to view. What code is recommended to mass remove the info after the date?
answer filename
7 2018-04-12 21_01_01.csv
7 2018-04-18 18_36_30.csv
7 2018-04-18 21_01_32.csv
8 2018-04-20 15_21_02.csv
7 2018-04-20 21_00_44.csv
7 2018-04-22 21_01_05.csv
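It could be done with plain Python without much difficulty, but a very easy way with pandas would be:
import pandas as pd
df = pd.read_csv('<your csv file name here>', sep=r'\s\s+', engine='python')
# keep only the date before the first space (drops the time and '.csv');
# str.rstrip('.csv') would strip the characters '.', 'c', 's', 'v', not the suffix, and keep the time
df['filename'] = df['filename'].str.split(' ').str[0]
print(df)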
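Another option, not from either answer here, is to match the date itself with a regular expression and convert it to a real datetime column. This sketch assumes every filename starts with a YYYY-MM-DD date, as in the sample rows, and builds a few of those rows inline instead of reading the CSV.

```python
import pandas as pd

# A few rows from the question, built inline instead of read from the CSV
df = pd.DataFrame({
    "answer": [7, 7, 8],
    "filename": ["2018-04-12 21_01_01.csv", "2018-04-18 18_36_30.csv",
                 "2018-04-20 15_21_02.csv"],
})

# Capture the leading YYYY-MM-DD; expand=False returns a Series, not a DataFrame
date_text = df["filename"].str.extract(r"^(\d{4}-\d{2}-\d{2})", expand=False)
df["date"] = pd.to_datetime(date_text)
df = df.drop(columns="filename")
print(df)
```

Filenames that do not match the pattern would come out as NaT rather than raising an error, which makes them easy to spot afterwards.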
When working with tabular data in Python, I highly recommend using the pandas package.
import pandas as pd
df = pd.read_csv("../test_data.csv")
def rem_part(string):
    return string.split(' ')[0]  # could also split on '.' if you want to keep the time
df['date'] = df['filename'].apply(rem_part)
df.drop('filename', axis=1, inplace=True)  # remove the filename column if you so please
df.to_csv("output.csv", index=False)  # save the file as a new CSV or overwrite the old
The test_data.csv file contains the following:
answer,filename
7,2018-04-12 21_01_01.csv
7,2018-04-18 18_36_30.csv
7,2018-04-18 21_01_32.csv
8,2018-04-20 15_21_02.csv
7,2018-04-20 21_00_44.csv
7,2018-04-22 21_01_05.csv
How do I write the dates in a DataFrame to a file?
import csv
import pandas as pd
writeFile = open("dates.csv","w+")
writer = csv.writer(writeFile)
dates = pd.DataFrame(pd.date_range(start = '01-09-2019', end = '30-09-2019'))
Convert2List = dates.values.tolist()
for row in Convert2List:
writer.writerow(row)
writeFile.close()
My actual values are:
1.54699E+18
1.54708E+18
1.54716E+18
1.54725E+18
1.54734E+18
And the expected values should be:
01-09-2019
02-09-2019
03-09-2019
If you have a pandas DataFrame, you can just use the pandas.DataFrame.to_csv method and set its parameters.
Pandas has a built-in write-to-file function. Try:
import pandas as pd
dates = pd.DataFrame(pd.date_range(start = '01-09-2019', end = '30-09-2019'))
#print (dates) # check here whether the dates are correct.
dates.to_csv('dates.csv') # writes the dataframe directly to a file.
The resulting dates.csv file contains:
,0
0,2019-01-09
1,2019-01-10
2,2019-01-11
3,2019-01-12
...snippet...
262,2019-09-28
263,2019-09-29
264,2019-09-30
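Because '01-09-2019' is read month-first (January 9), the range above runs from January to September. Writing the dates in year-month-day order gives the September range with the default settings: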
dates = pd.DataFrame(pd.date_range(start = '2019-09-01', end = '2019-09-30'))
This gives rows 0 to 29: 30 entries, one for each day of September.
To write them in day-month-year order instead, format them as strings:
dates[0] = pd.to_datetime(dates[0]).apply(lambda x:x.strftime('%d-%m-%Y'))
Gives you:
01-09-2019
02-09-2019
03-09-2019
...etc.
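Putting these steps together, a sketch that writes exactly the expected one-date-per-line file might look like this. The column name "date" is an illustrative choice; index=False and header=False drop the row labels and the header line that appear in the output above.

```python
import pandas as pd

# ISO (year-month-day) strings are unambiguous: 1 to 30 September 2019
dates = pd.DataFrame({"date": pd.date_range(start="2019-09-01", end="2019-09-30")})

# Format as day-month-year strings, matching the expected output in the question
dates["date"] = dates["date"].dt.strftime("%d-%m-%Y")

# index=False drops the 0..29 row labels; header=False drops the column name line
dates.to_csv("dates.csv", index=False, header=False)
```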
I have a csv file as follows:
Date,Data
01-01-01,111
02-02-02,222
03-03-03,333
The Date has the following format YEAR-MONTH-DAY. I would like to calculate from these dates the monthly average values of the data (there are way more than 3 dates in my file).
For that I wish to use the following code:
import pandas as pd
import dateutil
import datetime
import os,sys,math,time
from os import path
os.chdir("in/base/dir")
data = pd.DataFrame.from_csv("data.csv")
data['Month'] = pd.DatetimeIndex(data['Date']).month
mean_data = data.groupby('Month').mean()
with open("data_monthly.csv", "w") as f:
    print(mean_data, file=f)
For some reason this gives me the error KeyError: 'Date'.
So it seems that the header is not read by pandas. Does anyone know how to fix that?
Your Date column header is read, but the column is put into the index. You have to use:
data['Month'] = pd.DatetimeIndex(data.reset_index()['Date']).month
Another solution is to use index_col=None while making the dataframe from csv.
data = pd.DataFrame.from_csv("data.csv", index_col=None)
After which your code would be fine.
The better solution is to use read_csv(), since DataFrame.from_csv was deprecated in pandas 0.21 and removed in pandas 1.0:
data = pd.read_csv("data.csv")
Use the read_csv function; by default it expects comma-separated values.
import pandas as pd
df = pd.read_csv(filename)
print(pd.to_datetime(df["Date"]))
Output:
0 2001-01-01
1 2002-02-02
2 2003-03-03
Name: Date, dtype: datetime64[ns]
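For the original goal, monthly averages, a sketch combining read_csv with the groupby from the question could look like this. The inline rows are hypothetical stand-ins for data.csv in the question's YY-MM-DD format; passing an explicit format avoids pandas guessing whether "01-02-03" is day-first or month-first.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv, in the question's YY-MM-DD format
csv_text = """Date,Data
01-01-01,111
01-01-15,113
02-02-02,222
03-03-03,333
"""

data = pd.read_csv(io.StringIO(csv_text))
# Explicit format: two-digit year, then month, then day
data["Date"] = pd.to_datetime(data["Date"], format="%y-%m-%d")

# Calendar month number (1-12) as in the question; this pools the same month
# across different years. Use data["Date"].dt.to_period("M") to keep years apart.
data["Month"] = data["Date"].dt.month
mean_data = data.groupby("Month")["Data"].mean()

# to_csv writes real CSV, unlike print(mean_data, file=f)
mean_data.to_csv("data_monthly.csv")
print(mean_data)
```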