I have a csv file as follows:
Date,Data
01-01-01,111
02-02-02,222
03-03-03,333
The Date column uses the format YEAR-MONTH-DAY. I would like to calculate the monthly average of the data from these dates (my file has many more than 3 dates).
For that I wish to use the following code:
import pandas as pd
import dateutil
import datetime
import os,sys,math,time
from os import path
os.chdir("in/base/dir")
data = pd.DataFrame.from_csv("data.csv")
data['Month'] = pd.DatetimeIndex(data['Date']).month
mean_data = data.groupby('Month').mean()
with open("data_monthly.csv", "w") as f:
    print(mean_data, file=f)
For some reason this gives me the error KeyError: 'Date'.
So it seems that the header is not read by pandas. Does anyone know how to fix that?
Your Date column header is read, but it is put into the index. You have to use:
data['Month'] = pd.DatetimeIndex(data.reset_index()['Date']).month
Another solution is to use index_col=None while making the dataframe from csv.
data = pd.DataFrame.from_csv("data.csv", index_col=None)
After which your code would be fine.
The ideal solution would be to use read_csv(), since DataFrame.from_csv was deprecated and then removed in pandas 1.0.
data = pd.read_csv("data.csv")
Use the read_csv method. By default it expects comma-separated values.
import pandas as pd
df = pd.read_csv(filename)
print(pd.to_datetime(df["Date"]))
Output:
0 2001-01-01
1 2002-02-02
2 2003-03-03
Name: Date, dtype: datetime64[ns]
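Putting the pieces together, here is a minimal sketch of the monthly average with read_csv. It assumes the dates really are two-digit YEAR-MONTH-DAY as the question states, so the format is passed explicitly. The inline CSV text stands in for data.csv, and the 01-01-15 row is made up so that one month has two values to average.

```python
from io import StringIO

import pandas as pd

# Stand-in for data.csv; the second data row is invented for illustration.
csv_text = """Date,Data
01-01-01,111
01-01-15,113
02-02-02,222
03-03-03,333
"""

data = pd.read_csv(StringIO(csv_text))

# Parse two-digit YEAR-MONTH-DAY explicitly instead of letting pandas guess.
data["Date"] = pd.to_datetime(data["Date"], format="%y-%m-%d")
data["Month"] = data["Date"].dt.month

# Average the Data column per calendar month.
mean_data = data.groupby("Month")["Data"].mean()

# With no path given, to_csv returns the CSV text; pass a filename to write a file.
monthly_csv = mean_data.to_csv()
print(monthly_csv)
```

Grouping on the month number alone merges the same month from different years. If that is not wanted, grouping on data["Date"].dt.to_period("M") keeps each year's months apart.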
I'm trying to parse the date from the column and add a column with the weekday. However, I am very new to Python and programming in general. The line
date_parse = pd.read_csv(input_file, parse_cols = "Date")
gives me an error. My goal was to parse the data from the "Date" column of the pandas dataset and save it as date_parse, so I could then convert it with date.weekday. It didn't like parse_cols. I'm not sure if I'm doing this anywhere close to correctly or if I'm way off base. I've been working on this for about a week without success, so I broke down and decided to ask here. I suspect that pd.read is maybe for Excel only? Any tips?
Error:
Exception has occurred: TypeError
read_csv() got an unexpected keyword argument 'parse_cols'
File "input_file", line 21, in <module>
date_parse = pd.read_csv(input_file, parse_cols = "Date")
import yfinance as yf
import pandas as pd
import csv
import datetime as dt
import calendar
#Added input_file to make it easier than copying the file path everywhere
input_file = "BrkHist.csv"
#Pulling information from yahoo
brk = yf.Ticker("brk-B")
brk.info
Hist_data = brk.history(period="2y")
Hist_data.to_csv(input_file)
#Change csv file to pandas
data = pd.read_csv(BrkHist.csv)
#Select which columns I want to see on the dataframe
brkp = pd.DataFrame(data, columns= ['Date','Weekday','Open','Close'])
#Parse data from date column
date_parse = pd.read_csv(input_file, parse_cols = "Date")
print(date_parse)
#Add Weekday column to hold days of the week data in integer form
dp_weekday = date(date_parse).weekday()
#Convert weekday column data into string value
days = {0:'Monday',1:'Tuesday',2:'Wednesday',3:'Thursday',4:'Friday',5:'Saturday',6:'Sunday'}
#Add day of week column with new data
data.insert (1, "Weekday"[dp_weekday])
pandas.read_csv has no parse_cols keyword (see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html).
Did you mean to use parse_dates=['Date']?
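For the weekday column itself, a minimal sketch could look like the following. It assumes the Date column holds plain dates; the inline CSV text and its prices are made up and only stand in for BrkHist.csv.

```python
from io import StringIO

import pandas as pd

# Stand-in for BrkHist.csv; the numbers are invented for illustration.
csv_text = """Date,Open,Close
2021-01-04,230.0,228.5
2021-01-05,228.0,227.5
"""

# parse_dates converts the Date column to datetime64 while reading.
data = pd.read_csv(StringIO(csv_text), parse_dates=["Date"])

# .dt.weekday gives integers (Monday=0); .dt.day_name() gives the names directly,
# so no manual lookup dictionary is needed.
data.insert(1, "Weekday", data["Date"].dt.day_name())

print(data[["Date", "Weekday", "Open", "Close"]])
```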
I haven't been able to find a solution in similar questions yet so I'll have to give it a go here.
I am importing a csv file looking like this in notepad:
",""ItemName"""
"Time,""Raw Values"""
"7/19/2019 10:31:29 PM,"" 0"","
"7/19/2019 10:32:01 PM,"" 1"","
When I save it as a new csv, I want to reformat the date/time and the corresponding value like this (required by the analysis software). The semicolon is important both as the separator and at the end of each line, and I don't really need a header.
2019-07-19 22:31:29;0;
2019-07-19 22:32:01;1;
This is what it looks like in Python:
Item1 = pd.read_csv(r'.\Datafiles\ItemName.csv')
Item1
#Output:
# ,"ItemName"
# 0 Time,"Raw Values"
# 1 7/19/2019 10:31:29 PM," 0",
# 2 7/19/2019 10:32:01 PM," 1",
Item1.dtypes
# ,"ItemName" object
# dtype: object
I have tried using datetime without any luck but there might be something fishy with the datatypes that I am not aware of.
What you want in principle is to read the file into a DataFrame, convert the datetime column and export the DataFrame to csv again. I think you will need to get rid of the quote characters to get the import right. You can do so by reading the file content into a string, removing every '"', and feeding that string to pandas.read_csv. Example:
import os
from io import StringIO
import pandas as pd
# this is just to give an example:
s='''",""ItemName"""
"Time,""Raw Values"""
"7/19/2019 10:31:29 PM,"" 0"","
"7/19/2019 10:32:01 PM,"" 1"","'''
f = StringIO(s)
# in your script, make f a file pointer instead, e.g.
# with open('path_to_input.csv', 'r') as f:
# now get rid of the "
csvcontent = ''
for row in f:
    csvcontent += row.replace('"', '')
# read to DataFrame
df = pd.read_csv(StringIO(csvcontent), sep=',', skiprows=1, index_col=False)
df['Time'] = pd.to_datetime(df['Time'])
# save cleaned output as ;-separated csv
dst = 'path_where_to_save.csv'
# note: pandas 1.5+ spells this keyword lineterminator (line_terminator was removed in 2.0)
df.to_csv(dst, index=False, sep=';', lineterminator=';'+os.linesep)
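If pandas is not needed for anything else, the same conversion can be sketched with the standard library alone. This is only an illustration under a few assumptions: the file always has two header lines, the timestamps follow the month/day/year 12-hour pattern from the question, and the raw_lines list and the convert function are hypothetical names standing in for the real file and script.

```python
from datetime import datetime

# Stand-in for the lines of ItemName.csv as shown in the question.
raw_lines = [
    '",""ItemName"""',
    '"Time,""Raw Values"""',
    '"7/19/2019 10:31:29 PM,"" 0"","',
    '"7/19/2019 10:32:01 PM,"" 1"","',
]


def convert(lines):
    """Return 'YYYY-MM-DD HH:MM:SS;value;' strings for every data line."""
    converted_lines = []
    for line in lines[2:]:  # skip the two header lines
        fields = line.replace('"', '').split(',')
        # %I with %p parses the 12-hour clock; %H below writes 24-hour time.
        stamp = datetime.strptime(fields[0].strip(), "%m/%d/%Y %I:%M:%S %p")
        value = fields[1].strip()
        converted_lines.append(f"{stamp:%Y-%m-%d %H:%M:%S};{value};")
    return converted_lines


converted = convert(raw_lines)
print("\n".join(converted))
```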
I searched Stack Overflow for this but didn't find exactly what I wanted. I would like to open a csv file in Python, add a new column with the header "Date", and fill it with today's date down to the end of the file. How can I do it? I was trying to do it with pandas, but I only know how to append to the end.
I was trying to do this that way with package csv:
x=open(outfile_name1)
y=csv.reader(x)
z=[]
for row in y:
    z.append(['0'] + row)
Instead of ['0'] I wanted to put today's date. Can I then convert this list to csv with pandas or something? Thanks in advance for help!
Try this:
import pandas as pd
import datetime
df = pd.read_csv("my.csv")
df.insert(0, 'Date', datetime.datetime.today().strftime('%Y-%m-%d'))
df.to_csv("my_withDate.csv", index=False)
PS: Read the docs
Is this what you are looking for?
import pandas as pd
import datetime
df = pd.read_csv("file.csv")
df['Date'] = datetime.datetime.today().strftime('%Y-%m-%d')
df.to_csv("new_file.csv", index=False)
As far as I understand, the ultimate goal is to write the data to csv. One option is to open the first file for reading and a second one for writing, write the header row into the new file with the column name 'Date,' prepended, and then iterate over the data rows, prepending the date to each (requires Python 3.6 or later because it uses f-strings):
import datetime
today = datetime.date.today()  # compute the date once
with open('columns.csv', 'r') as out, open('out.csv', 'w') as into:
    headers = 'Date,' + next(out)
    print(headers, end='', file=into)
    for row in out:
        # no space after the comma, so the first field stays unchanged
        print(f'{today},{row}', end='', file=into)
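The asker's own csv.reader attempt can also be finished with csv.writer, which handles quoting of fields that contain commas. A minimal sketch follows; the in-memory source and target buffers and the a,b sample columns are made up and stand in for the real input and output files.

```python
import csv
import datetime
from io import StringIO

# Stand-ins for the input and output files; replace with open(...) in a real script.
source = StringIO("a,b\n1,2\n3,4\n")
target = StringIO()

today = datetime.date.today().isoformat()  # e.g. '2019-09-01'

reader = csv.reader(source)
writer = csv.writer(target, lineterminator="\n")

# Prepend the new column name to the header, then today's date to every row.
header = next(reader)
writer.writerow(["Date"] + header)
for row in reader:
    writer.writerow([today] + row)

result = target.getvalue()
print(result)
```

With real files, open the output with newline="" as the csv module documentation recommends.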
How can I write the dates of a dataframe to a file?
import csv
import pandas as pd
writeFile = open("dates.csv","w+")
writer = csv.writer(writeFile)
dates = pd.DataFrame(pd.date_range(start = '01-09-2019', end = '30-09-2019'))
Convert2List = dates.values.tolist()
for row in Convert2List:
    writer.writerow(row)
writeFile.close()
My actual values are:
1.54699E+18
1.54708E+18
1.54716E+18
1.54725E+18
1.54734E+18
And the expected values should be:
01-09-2019
02-09-2019
03-09-2019
If you have a pandas dataframe you can just use the method pandas.DataFrame.to_csv and set its parameters (see the documentation).
Pandas has a built-in function for writing to a file. Try:
import pandas as pd
dates = pd.DataFrame(pd.date_range(start = '01-09-2019', end = '30-09-2019'))
#print (dates) # check here if the dates is written correctly.
dates.to_csv('dates.csv') # writes the dataframe directly to a file.
The dates.csv file gives me:
,0
0,2019-01-09
1,2019-01-10
2,2019-01-11
3,2019-01-12
...snippet...
262,2019-09-28
263,2019-09-29
264,2019-09-30
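With the default settings, '01-09-2019' is read month-first as January 9, which is why the range above starts in January. Writing the dates in YEAR-MONTH-DAY order gives the September range: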
dates = pd.DataFrame(pd.date_range(start = '2019-09-01', end = '2019-09-30'))
Gives:
Rows 0 to 29, one entry for each of the 30 days of September.
Furthermore, to write the dates in a custom day-month-year order:
dates[0] = pd.to_datetime(dates[0]).apply(lambda x:x.strftime('%d-%m-%Y'))
Gives you:
01-09-2019
02-09-2019
03-09-2019
...etc.
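The large numbers in the question appear because dates.values.tolist() turns datetime64[ns] values into integer nanoseconds since 1970, and 1.54699E+18 corresponds to 9 January 2019. As a sketch of an alternative that avoids converting the column to strings first, to_csv can format the dates while writing through its date_format parameter. The in-memory buffer below is only there to make the example self-contained; a filename such as 'dates.csv' works the same way.

```python
from io import StringIO

import pandas as pd

# Unambiguous YEAR-MONTH-DAY strings give the 30 days of September 2019.
dates = pd.DataFrame(pd.date_range(start="2019-09-01", end="2019-09-30"))

# A Timestamp's .value is the integer nanosecond count the question ran into.
nanoseconds = pd.Timestamp("2019-01-09").value

# Let to_csv format the datetimes; drop the index and header to get bare dates.
buffer = StringIO()
dates.to_csv(buffer, index=False, header=False, date_format="%d-%m-%Y")

lines = buffer.getvalue().splitlines()
print(lines[:3])
```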
I want to export a dataframe to csv. But on top of it, I would like to print the date of the dataframe to produce the following result in the csv file. How can I join the string sentence to the dataframe so that I can export it together to csv?
import pandas as pd
import datetime as dt
today1=dt.datetime.today().strftime('%Y%m%d')
print('This dataframe is created on ',today1)
df=pd.DataFrame({'A':[1,2],'B':[3,4]})
print(df)
df.to_csv('temp.csv')
DataFrame.to_csv accepts a file handle as input. So write your first line, then call to_csv with the same handle:
import pandas as pd
import datetime as dt
today1=dt.datetime.today().strftime('%Y%m%d')
df=pd.DataFrame({'A':[1,2],'B':[3,4]})
with open("temp.csv","w") as f:
    f.write('This dataframe is created on {}\n'.format(today1))
    df.to_csv(f)
When you read the data back, just do the same with pd.read_csv():
with open("temp.csv","r") as f:
    date_line = next(f)
    df = pd.read_csv(f)
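If the date line is not needed when reading back, read_csv can skip it with skiprows=1. The round trip below is a minimal sketch that uses an in-memory buffer in place of temp.csv; index_col=0 is an assumption that the index column written by to_csv should become the index again.

```python
from io import StringIO
import datetime as dt

import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
today1 = dt.datetime.today().strftime("%Y%m%d")

# Write the note line first, then the dataframe, into the same buffer.
buffer = StringIO()
buffer.write(f"This dataframe is created on {today1}\n")
df.to_csv(buffer)

# Read back: skip the note line and restore the saved index column.
buffer.seek(0)
restored = pd.read_csv(buffer, skiprows=1, index_col=0)
print(restored)
```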
Just remove the to_csv line from your code, then run it in a terminal window as below:
python code.py >> temp.csv
The output of your print statements will be redirected into temp.csv. The output file is:
('This dataframe is created on ', '20161220')
A B
0 1 3
1 2 4
Not sure if it works in every OS though.