How to access date row in Pandas' DataReader? - python

I'm using Python's Pandas with pandas_datareader to get some finance data from Yahoo Finance. I want to access the dates, but they do not appear in the columns or in the JSON I need, even though they are there when I print the object:
from pandas_datareader.data import DataReader
import datetime
start = datetime.datetime(2015, 6, 1)
goog = DataReader('GOOGL', 'yahoo', start)
goog.to_json(None, orient="records") # this doesn't include dates!
print(goog) # this prints out the dates along with the other data!
print(goog.columns) # prints every column except the date column
How can I include the dates along with the other data in the json string?

list(goog.index) gives you the dates as a list of timestamps.
For getting the dates in the json, I had a quick look at the docs. Try this:
print(goog.to_json(orient="index", date_format='iso'))

Pandas dataframes have an index that is used for grouping and fast lookups. The index is not considered one of the columns. To move the dates from the index to the columns, you can use reset_index(), which will make Date a column and make the index just a sequence of numbers counting up from 0.
So to export as JSON with the dates included:
goog.reset_index().to_json(None, orient='records', date_format='iso')
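To make the difference concrete, here is a minimal sketch that does not need a network connection. The small DataFrame is a hypothetical stand-in for the DataReader result (made-up prices, index named Date); only reset_index() and to_json() come from the answers above.

```python
import json
import pandas as pd

# Hypothetical stand-in for the DataReader result: a DataFrame with a
# DatetimeIndex named "Date". The prices are made up for illustration.
goog = pd.DataFrame(
    {"Open": [100.0, 101.5], "Close": [101.0, 102.25]},
    index=pd.DatetimeIndex(["2015-06-01", "2015-06-02"], name="Date"),
)

# The index is not a column, so orient="records" leaves the dates out.
without_dates = json.loads(goog.to_json(orient="records"))

# reset_index() turns the index into a regular "Date" column first.
with_dates = json.loads(
    goog.reset_index().to_json(orient="records", date_format="iso")
)

print(list(without_dates[0].keys()))
print(with_dates[0])
```

Each record in the second result carries a "Date" key with an ISO-formatted timestamp, while the first result only has the price columns.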

Related

Python how to select specific cells on excel with pandas

I have an excel here as shown in this picture:
I am using pandas to read my excel file and it is working fine, this code below can print all the data in my excel:
import pandas as pd
df = pd.read_csv('alpha.csv')
print(df)
I want to get the values from cell C2 to cell H9 where the month is October and the day is a Monday. I want to store these values in my Python list below:
mynumbers= []
but I am not sure how should I do it, can you please help me?
You should consider slicing your dataframe and then using .values to store them. If you want them as a list, then you can use tolist():
First transform the Date column to a datetime:
df['Date'] = pd.to_datetime(df['Date'],dayfirst=True,infer_datetime_format=True)
Then, slice and return the values for the Column Number 2
mynumbers = df[(df['Date'].dt.month == 10) & \
(df['Date'].dt.weekday == 0)]['Column 2'].values.tolist()
Assigning the following values to mynumbers:
[11,8]
A first step would be to convert your Date column to datetime objects
import datetime
myDate = "10-11-22"
myDate = datetime.datetime.strptime(myDate, '%d-%m-%y')
Then, using myDate.month and myDate.weekday(), you can select Mondays in October (weekday() returns 0 for Monday).
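A minimal sketch of that approach, applied to a few rows. The rows list is hypothetical sample data (the real values would come from the spreadsheet), and the day-month-year format is assumed from the example string above.

```python
import datetime

# Hypothetical sample rows of (date string, value); the real data would
# come from the spreadsheet. Dates are assumed to be day-month-year.
rows = [("03-10-22", 11), ("04-10-22", 5), ("10-10-22", 8), ("07-11-22", 3)]

mynumbers = []
for date_text, value in rows:
    day = datetime.datetime.strptime(date_text, "%d-%m-%y")
    # weekday() returns 0 for Monday; month 10 is October.
    if day.month == 10 and day.weekday() == 0:
        mynumbers.append(value)

print(mynumbers)
```

Only the rows dated 3 and 10 October 2022 are Mondays in October, so the Tuesday row and the November row are skipped.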

Sort pandas dataframe by date or column

The solutions I have found in a similar question are not working for me. I have a pandas DataFrame containing mock sales data. I want to sort it by date, since the rows are currently out of order. I have tried converting the column to datetime. I also tried creating Month and Day columns and sorting by them, but that did not work either. The date is in YYYY-MM-DD format.
Here is my attempt:
import pandas as pd
import datetime
data = pd.read_csv(path)
# sort by date (not working)
data['OrderDate'] = pd.to_datetime(data['OrderDate'])
data.sort_values(by='OrderDate')
data.reset_index(inplace=True)
# sort by month then day (not working)
data.sort_values(by='Month')
data.sort_values(by='Day')
data.reset_index(inplace=True)
# export csv
data.to_csv(fileName, index=False)
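One likely cause, assuming the code is exactly as posted: sort_values() returns a new sorted DataFrame by default and leaves data unchanged, so the result has to be assigned back (or inplace=True passed). The sketch below uses a small hypothetical DataFrame in place of the CSV; the Units column is made up.

```python
import pandas as pd

# Hypothetical mock data standing in for the CSV file.
data = pd.DataFrame(
    {"OrderDate": ["2021-03-15", "2021-01-02", "2021-02-20"], "Units": [3, 1, 2]}
)
data["OrderDate"] = pd.to_datetime(data["OrderDate"])

# sort_values returns a new DataFrame, so assign the result back.
# drop=True discards the old index instead of adding it as a column.
data = data.sort_values(by="OrderDate").reset_index(drop=True)

print(data)
```

For the second attempt, sorting by Month and then by Day in two separate calls would let the second sort override the first; passing both columns at once, as in sort_values(by=['Month', 'Day']), avoids that.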

How to find missing dates in an excel file by python

I'm a beginner in Python. I have an Excel file that shows the rainfall amount between 2016-1-1 and 2020-6-30. It has 2 columns: the first is the date and the second is the rainfall. Some dates are missing from the file (the rainfall was not estimated). For example, there isn't a row for 2016-05-05 in my file. This is a sample of my Excel file:
Date rainfall (mm)
1/1/2016 10
1/2/2016 5
.
.
.
12/30/2020 0
I want to find the missing dates but my code doesn't work correctly!
import pandas as pd
from datetime import datetime, timedelta
from matplotlib import dates as mpl_dates
from matplotlib.dates import date2num
df=pd.read_excel ('rainfall.xlsx')
a= pd.date_range(start = '2016-01-01', end = '2020-06-30' ).difference(df.index)
print(a)
Here's a beginner-friendly way of doing it.
First you need to make sure, that the Date in your dataframe is really a date and not a string or object.
Type (or print) df.info().
The date column should show up as datetime64[ns]
If not, df['Date'] = pd.to_datetime(df['Date'], dayfirst=False) fixes that. (Use dayfirst to tell Pandas whether the day or the month comes first in your date string, because it cannot know. Month first is the default, so it would also work without the argument.)
For the task of finding missing days, there are many ways to solve it. Here's one.
Turn all dates into a series
all_dates = pd.Series(pd.date_range(start = '2016-01-01', end = '2020-06-30' ))
Then print all dates from that series which are not in your dataframe "Date" column. The ~ sign means "not".
print(all_dates[~all_dates.isin(df['Date'])])
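The steps above can be tried on a small sample without the Excel file. In this sketch the DataFrame is hypothetical (six days with 2016-01-03 and 2016-01-05 left out on purpose), and the month-first format is assumed from the sample in the question.

```python
import pandas as pd

# Hypothetical sample with 2016-01-03 and 2016-01-05 left out on purpose.
df = pd.DataFrame(
    {
        "Date": ["1/1/2016", "1/2/2016", "1/4/2016", "1/6/2016"],
        "rainfall (mm)": [10, 5, 0, 2],
    }
)
# Month-first strings, as in the sample shown in the question.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Every calendar day in the range of interest.
all_dates = pd.Series(pd.date_range(start="2016-01-01", end="2016-01-06"))

# Keep only the days that do not occur in the "Date" column.
missing = all_dates[~all_dates.isin(df["Date"])]

print(missing.dt.strftime("%Y-%m-%d").tolist())
```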
Try:
df = pd.read_excel('rainfall.xlsx', usecols=[0])
a = pd.date_range(start = '2016-01-01', end = '2020-06-30').difference([l[0] for l in df.values])
print(a)
The dates in the file must be formatted like 2016/1/1.
To find the missing dates in a list, you can also apply the Conditional Formatting function in Excel. 4. Click OK > OK, and the positions of the missing dates are highlighted. Note: The last date in the date list will be highlighted.
This trick does not use Python; it is an ordinary Excel trick.

Change date '01-Sept-20' to '01-Sep-20' using pandas dataframe

I have a huge .csv file with date as one of the column and I'm trying to plot it on a graph but I'm getting this error
"time data '01-Sept-20' does not match format '%d-%b-%y' (match)"
I'm using this line of code to convert it into datetime format
df['Date'] = pd.to_datetime(df['Date'], format="%d-%b-%y")
I think this error is because 'Sept' should be 'Sep'
What can I do to make Sept to Sep?
I'm using this dataset: covid19 api
As @Mayank pointed out in the comment, you could replace the "Sept" string, and that works.
However, your dataset has a column named Date_YMD which will give you the date without any string replacement.
A complete example:
import pandas as pd
df = pd.read_csv('covid.csv')
df['Date_YMD'] = pd.to_datetime(df['Date_YMD'])
df['Date'] = pd.to_datetime(df['Date'].str.replace('Sept', 'Sep'), format='%d-%b-%y')
I think the main point here is to familiarize yourself with the data before searching for a technical solution.
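For the string replacement itself, here is a minimal sketch on a few hypothetical values (the real ones would come from the CSV). It assumes an English locale, since %b matches the locale's abbreviated month names.

```python
import pandas as pd

# Hypothetical sample of the "Date" column; real values come from the CSV.
df = pd.DataFrame({"Date": ["31-Aug-20", "01-Sept-20", "02-Sept-20"]})

# %b expects three-letter month abbreviations (in an English locale),
# so normalize "Sept" to "Sep" before parsing.
cleaned = df["Date"].str.replace("Sept", "Sep", regex=False)
df["Date"] = pd.to_datetime(cleaned, format="%d-%b-%y")

print(df["Date"].dt.strftime("%Y-%m-%d").tolist())
```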

Break-up year, months & days in Pandas

I have an input parameter dictionary as below -
InparamDict = {'DataInputDate':'2014-10-25'
}
Using the field InparamDict['DataInputDate'], I want to pull up data from 2013-10-01 till 2013-10-25. What would be the best way to arrive at the same using Pandas?
The sql equivalent is -
DATEFROMPARTS(DATEPART(year,GETDATE())-1,DATEPART(month,GETDATE()),'01')
You forgot to mention if you're trying to pull up data from a DataFrame, Series or what. If you just want to get the date parts, you just have to get the attribute you want from the Timestamp object.
from pandas import Timestamp
dt = Timestamp(InparamDict['DataInputDate'])
dt.year, dt.month, dt.day
If the dates are in a DataFrame (df) and you convert them to dates instead of strings, you can select the data by ranges as well, for instance:
from datetime import datetime
df[df['DataInputDate'] > datetime(2013,10,1)]
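To get the exact window asked for (from the first of the month one year earlier up to the same day one year earlier), one possible sketch is below. The DataFrame and its DataInputDate column are hypothetical, and pd.DateOffset with Series.between is just one way to do it, not necessarily what the answer above had in mind.

```python
import pandas as pd

InparamDict = {"DataInputDate": "2014-10-25"}

# End of the window: the input date shifted back by one year.
end = pd.Timestamp(InparamDict["DataInputDate"]) - pd.DateOffset(years=1)
# Start of the window: the first day of that same month.
start = end.replace(day=1)

# Hypothetical DataFrame with one row per day; the column name is assumed.
df = pd.DataFrame({"DataInputDate": pd.date_range("2013-09-28", "2013-10-28")})

# between() is inclusive on both ends by default.
selected = df[df["DataInputDate"].between(start, end)]

print(start.date(), end.date(), len(selected))
```

Here start and end play the role of the DATEFROMPARTS expression in the SQL version, and the filter keeps the rows from 2013-10-01 through 2013-10-25.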
