Parse Dates from column within Pandas dataframe - python

I'm trying to parse the date from the column and add a column with the weekday. However, I am very new to python and programming in general. The
date_parse = pd.read_csv(input_file, parse_cols = "Date")
gives me an error. My goal was to parse the data from the column "Date" of the pandas dataset and save it as date_parse, so I could then convert it with date.weekday. It didn't like the parse_cols. I'm not sure if I'm even close to doing this correctly or if I'm way off base. I've been working on this for about a week and no cookie. So I broke down and decided to ask here. I suspect that pd.read is maybe for Excel only? Any tips?
Error:
Exception has occurred: TypeError
read_csv() got an unexpected keyword argument 'parse_cols'
File "input_file", line 21, in <module>
date_parse = pd.read_csv(input_file, parse_cols = "Date")
import yfinance as yf
import pandas as pd
import csv
import datetime as dt
import calendar
#Added input_file to make it easier than copying the file path everywhere
input_file = "BrkHist.csv"
#Pulling information from yahoo
brk = yf.Ticker("brk-B")
brk.info
Hist_data = brk.history(period="2y")
Hist_data.to_csv(input_file)
#Change csv file to pandas
data = pd.read_csv(BrkHist.csv)
#Select which columns I want to see on the dataframe
brkp = pd.DataFrame(data, columns= ['Date','Weekday','Open','Close'])
#Parse data from date column
date_parse = pd.read_csv(input_file, parse_cols = "Date")
print(date_parse)
#Add Weekday column to hold days of the week data in integer form
dp_weekday = date(date_parse).weekday()
#Convert weekday column data into string value
days = {0:'Monday',1:'Tuesday',2:'Wednesday',3:'Thursday',4:'Friday',5:'Saturday',6:'Sunday'}
#Add day of week column with new data
data.insert (1, "Weekday"[dp_weekday])

pandas.read_csv() has no parse_cols keyword (see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html).
Did you mean to use parse_dates=['Date']?
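A minimal sketch of how parse_dates and a weekday column could fit together. The inline CSV text is a hypothetical stand-in for BrkHist.csv, and it assumes the Date column holds plain dates without timezone offsets; a real yfinance export may need extra handling.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of BrkHist.csv
csv_text = """Date,Open,Close
2021-03-01,100.0,101.0
2021-03-02,101.0,102.5
2021-03-05,102.0,100.5
"""

# parse_dates turns the Date column into datetime64 values while reading
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# The .dt accessor exposes weekday information for the whole column at once
data.insert(1, "Weekday", data["Date"].dt.day_name())

# Keep only the columns of interest
brkp = data[["Date", "Weekday", "Open", "Close"]]
print(brkp)
```

If the integer form is preferred, data["Date"].dt.weekday gives 0 for Monday through 6 for Sunday, which matches the days dictionary in the question.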

Related

Cannot get a file to be read into a list of stock tickers and then get yfinance data for each

I am trying to read a csv file into a dataframe and then iterate over each ticker to get some yahoo finance data, but I struggle with matching the right data type read from the CSV file. The problem is that yfinance needs a str for the ticker parameter: data = yf.download("AAPL", start ="2019-01-01", end="2022-04-20")
and I cannot convert the df row item into str.
This is my code:
combined = yf.download("SPY", start ="2019-01-01", end="2019-01-02")
for index, row in stockslist.iterrows():
    data = yf.download([index,row["ticker"].to_string], start ="2019-01-01", end="2022-04-20")
and this is the csv file
The question is basically about this part of the code: [index,row["ticker"].to_string]. I cannot manage to pass each row of the dataframe as a ticker argument to yfinance.
The error I get is "TypeError: expected string or bytes-like object "
The download function doesn't understand the [index,row["ticker"].to_string] argument. Where does it come from?
You have to give it some context, for example by building an array with the values from the CSV and then passing each element of the array to the download function.
A quick example with fictional code :
import yfinance as yf
#initiate the list with the tickers
array_ticker = ['AAPL', 'MSFT']  # ...and any further tickers
#loop over the list and download each ticker
for ticker in array_ticker:
    data = yf.download(ticker, start="2019-01-01", end="2022-04-20")
UPDATE :
If you want to keep the dataframe you are using now, here is a simple example to help you sort out your issue:
import pandas as pd
d = {'ticker': ['APPL', 'MSFT', 'GAV']}
ticker_list = pd.DataFrame(data=d) #creating the dataframe
print(ticker_list) #print the whole dataframe
print('--------')
print(ticker_list.iloc[1]['ticker']) #print the 2nd value of the column ticker
Same topic : How to iterate over rows in a DataFrame in Pandas
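A rough sketch of passing each value of the ticker column as a plain str. The stockslist frame here is a hypothetical stand-in for the CSV from the question, and the yf.download call is left as a comment so the snippet runs without network access.

```python
import pandas as pd

# Hypothetical stand-in for the dataframe read from the CSV file
stockslist = pd.DataFrame({"ticker": ["AAPL", " MSFT "]})

# Convert the column to str, trim stray whitespace, and get a plain list
tickers = stockslist["ticker"].astype(str).str.strip().tolist()

for ticker in tickers:
    # Each ticker is an ordinary Python str at this point, so it could be
    # passed on its own, e.g.:
    # data = yf.download(ticker, start="2019-01-01", end="2022-04-20")
    print(type(ticker).__name__, ticker)
```

Iterating over the column directly avoids iterrows() and the index value, which is not a ticker symbol and should not be sent to the download function.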

Filtering a CSV File using two columns

I am a newbie to Python. I am working on a CSV file that has over a million records. In the data, every Location has a unique ID (SiteID). I want to filter for and remove any records where there is no value or a mismatch between SiteID and Location in my CSV file. (Note: This script should print the line number and the mismatched field values for each record.)
I have the following code. Please help me out:
import pandas as pd
pd = pandas.read_csv ('car-auction-data-from-ghana', delimiter = ";")
pd.head()
date_time = (pd['Date Time'] >= '2010-01-01T00:00:00+00:00') #to filter from a specific date
comparison_column = pd.where(pd['SiteID'] == pd['Location'], True, False)
comparison_column
This should be your solution:
df = pd.read_csv('car-auction-data-from-ghana', delimiter = ";")
print(df.head())
date_time = (df['Date Time'] >= '2010-01-01T00:00:00+00:00') #to filter from a specific date
df = df[df['SiteID'] == df['Location']]
print(df)
You need to call read_csv as a member of pd, because pd is the alias of the imported package, and use df as the variable for your data frame. The comparison line keeps only the rows where SiteID equals Location, so rows where the two differ are dropped.
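The question also asks for the line number and field values of each rejected record. One possible sketch, using a small made-up frame in place of the real file; the "+ 2" line-number offset assumes a single header line and the default integer index from read_csv.

```python
import pandas as pd

# Hypothetical sample standing in for the real CSV contents
df = pd.DataFrame({
    "SiteID": ["A1", "B2", None, "D4"],
    "Location": ["A1", "B3", "C3", None],
})

# A record is rejected if either field is missing or the two fields differ
missing = df["SiteID"].isna() | df["Location"].isna()
mismatch = missing | (df["SiteID"] != df["Location"])

bad = df[mismatch]
for idx, row in bad.iterrows():
    # +2: one for the header line, one because the index starts at 0
    print(f"line {idx + 2}: SiteID={row['SiteID']!r}, Location={row['Location']!r}")

# The cleaned frame keeps only the matching, complete records
clean = df[~mismatch]
print(clean)
```

Checking for missing values explicitly makes the intent visible, rather than relying on how missing values behave in comparisons.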

Read a csv file into Pandas dataframe yfinance with a tickercode as filename

I'm trying to read a csv file into a Pandas dataframe using a ticker variable as the filename, but I can't find a function to read the csv file (the ????? placeholder in the code below). It is important that the file name comes from the ticker code, because I have already received several suggestions along the lines of pd.read_csv('value.txt'), and that is not what I am looking for. Can anyone help?
import pandas as pd
from pandas_datareader import data as pdr
yf.pdr_override
for ticker in tickers:
    print(ticker)
    df = pdr.?????('daily_stock_dfs/{}.csv'.format(ticker))
    df = rsi_calculator(df)
print('END')
read_csv() is what you are looking for.
This will create a dataframe by reading in a csv. The ticker string can be concatenated in many ways.
df = pd.read_csv('daily_stock_dfs/' + ticker + '.csv')
or
df = pd.read_csv('daily_stock_dfs/{}.csv'.format(ticker))
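A self-contained sketch of the same idea using pathlib and an f-string. The ticker names and file contents are made up, and the files are written to a temporary folder first so the loop has something to read.

```python
import tempfile
from pathlib import Path

import pandas as pd

tickers = ["AAA", "BBB"]  # hypothetical ticker codes

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "daily_stock_dfs"
    folder.mkdir()

    # Write one small CSV per ticker so the example can run on its own
    for ticker in tickers:
        pd.DataFrame({"Close": [1.0, 2.0]}).to_csv(folder / f"{ticker}.csv", index=False)

    # Build each file name from the ticker and read it back
    frames = {}
    for ticker in tickers:
        frames[ticker] = pd.read_csv(folder / f"{ticker}.csv")

print(sorted(frames))
```

Keeping the frames in a dictionary keyed by ticker is one way to avoid overwriting df on every pass through the loop.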

How to convert dataframe containing date into list of list with correct date format and save in csv file

How can I write the dates of a dataframe to a file?
import csv
import pandas as pd
writeFile = open("dates.csv","w+")
writer = csv.writer(writeFile)
dates = pd.DataFrame(pd.date_range(start = '01-09-2019', end = '30-09-2019'))
Convert2List = dates.values.tolist()
for row in Convert2List:
    writer.writerow(row)
writeFile.close()
My actual values are:
1.54699E+18
1.54708E+18
1.54716E+18
1.54725E+18
1.54734E+18
And the expected values should be:
01-09-2019
02-09-2019
03-09-2019
If you have a pandas dataframe, you can just use the method pandas.DataFrame.to_csv and set its parameters (see the pandas documentation).
Pandas has a built-in function for writing to a file. Try:
import pandas as pd
dates = pd.DataFrame(pd.date_range(start = '01-09-2019', end = '30-09-2019'))
#print (dates) # check here if the dates is written correctly.
dates.to_csv('dates.csv') # writes the dataframe directly to a file.
The dates.csv file gives me:
,0
0,2019-01-09
1,2019-01-10
2,2019-01-11
3,2019-01-12
...snippet...
262,2019-09-28
263,2019-09-29
264,2019-09-30
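pd.date_range reads '01-09-2019' month-first, as 9 January, which is why the file starts in January. Writing the dates as year-month-day gives the September range with the default settings:
dates = pd.DataFrame(pd.date_range(start = '2019-09-01', end = '2019-09-30'))
Gives:
Rows 0 to 29, one entry for each of the 30 days of September.
Furthermore, to write the dates in a custom day-month-year format: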
dates[0] = pd.to_datetime(dates[0]).apply(lambda x:x.strftime('%d-%m-%Y'))
Gives you:
01-09-2019
02-09-2019
03-09-2019
...etc.
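As an alternative sketch, to_csv can format the dates itself through its date_format parameter, so no string conversion is needed beforehand. The large numbers in the question most likely come from .values.tolist(), which turns datetime64[ns] values into integer nanosecond counts. The snippet returns the CSV text instead of writing dates.csv so it can be inspected directly.

```python
import pandas as pd

# ISO-formatted bounds avoid the month-first reading of '01-09-2019'
dates = pd.DataFrame(pd.date_range(start="2019-09-01", end="2019-09-30"))

# With no path given, to_csv returns the CSV text instead of writing a file.
# index=False and header=False drop the row numbers and the "0" column label.
csv_text = dates.to_csv(index=False, header=False, date_format="%d-%m-%Y")

lines = csv_text.splitlines()
print(lines[:3])
```

Passing a file name such as 'dates.csv' as the first argument writes the same text to disk.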

Calculate monthly value from csv file

I have a csv file as follows:
Date,Data
01-01-01,111
02-02-02,222
03-03-03,333
The Date has the following format YEAR-MONTH-DAY. I would like to calculate from these dates the monthly average values of the data (there are way more than 3 dates in my file).
For that I wish to use the following code:
import pandas as pd
import dateutil
import datetime
import os,sys,math,time
from os import path
os.chdir("in/base/dir")
data = pd.DataFrame.from_csv("data.csv")
data['Month'] = pd.DatetimeIndex(data['Date']).month
mean_data = data.groupby('Month').mean()
with open("data_monthly.csv", "w") as f:
    print(mean_data, file=f)
For some reason this gives me the error KeyError: 'Date'.
So it seems that the header is not read by pandas. Does anyone know how to fix that?
Your Date column header is read, but it is put into the index. You have to use:
data['Month'] = pd.DatetimeIndex(data.reset_index()['Date']).month
Another solution is to use index_col=None while making the dataframe from csv.
data = pd.DataFrame.from_csv("data.csv", index_col=None)
After which your code would be fine.
The ideal solution would be to use read_csv(), since DataFrame.from_csv was deprecated and later removed from pandas.
data = pd.read_csv("data.csv")
Use the read_csv method. By default it expects comma-separated values.
import pandas as pd
df = pd.read_csv(filename)
print(pd.to_datetime(df["Date"]))
Output:
0 2001-01-01
1 2002-02-02
2 2003-03-03
Name: Date, dtype: datetime64[ns]
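Putting the pieces together, a minimal sketch of the monthly average. The inline CSV text stands in for data.csv (with one extra made-up row so that January has two values), and the explicit format string assumes the two-digit YEAR-MONTH-DAY layout described in the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv, with an extra January row
csv_text = """Date,Data
01-01-01,111
01-01-15,113
02-02-02,222
03-03-03,333
"""

data = pd.read_csv(io.StringIO(csv_text))

# Spell out the format so the year-month-day order is not left to guessing
data["Date"] = pd.to_datetime(data["Date"], format="%y-%m-%d")
data["Month"] = data["Date"].dt.month

# Average the Data column within each calendar month
mean_data = data.groupby("Month")["Data"].mean()
print(mean_data)
```

Grouping on the month number alone merges the same month from different years; grouping on data["Date"].dt.to_period("M") instead would keep each year's months separate. mean_data.to_csv("data_monthly.csv") would write the result as a proper CSV file rather than as printed text.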
