Quandl + Python: the date column not "working"

I am trying to get some data through the Quandl API, but the date column doesn't seem to work the same way as the other columns. E.g. when I use the following code:
data = quandl.get("WIKI/KO", trim_start="2000-12-12", trim_end="2014-12-30",
                  authtoken=quandl.ApiConfig.api_key)
print(data['Open'])
I end up with the result below:
Date
2000-12-12 57.69
2000-12-13 57.75
2000-12-14 56.00
2000-12-15 55.00
2000-12-18 54.00
i.e. the date appears along with the 'Open' column. And when I try to include Date directly, like this:
print(data[['Open', 'Date']])
it raises an error saying Date doesn't exist as a column. So I have two questions: (1) how do I make Date an actual column, and (2) how do I select only the 'Open' column (and thus not the dates)?
Thanks in advance

Why does print(data['Open']) show dates even though Date is not a column?
quandl.get returns a pandas DataFrame, whose index is a DatetimeIndex.
Thus, to access the dates you would use data.index instead of data['Date'].
(1) How do I make Date an actual column
If you wish to make the DatetimeIndex into a column, call reset_index:
data = data.reset_index()
print(data[['Open', 'Date']])
(2) How do I select only the 'Open' column (and thus not the dates)
To obtain a NumPy array of values without the index, use data['Open'].values.
(All pandas Series and DataFrames have indexes (that's pandas' raison d'ĂȘtre!), so the only way to obtain the values without the index is to convert the Series or DataFrame to a different kind of object, such as a NumPy array.)
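A minimal sketch of all three ideas, using a toy frame with made-up prices in place of the real Quandl response:

```python
import pandas as pd

# Toy frame with a DatetimeIndex, mimicking what quandl.get returns
data = pd.DataFrame(
    {"Open": [57.69, 57.75, 56.00]},
    index=pd.to_datetime(["2000-12-12", "2000-12-13", "2000-12-14"]),
)
data.index.name = "Date"

print(data.index)             # the dates live in the index, not in a column

flat = data.reset_index()     # (1) promote the index to a real 'Date' column
print(flat[["Open", "Date"]])

values = data["Open"].values  # (2) a plain NumPy array, no index attached
print(values)
```

Note that reset_index returns a new DataFrame rather than modifying in place, which is why the result is assigned back.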

Related

How to loop through dataframe column and compare dates to current

Hello, I have a dataframe containing a date column. I would like to loop through these dates and compare each to the current date, to see if any entry is today. I tried converting the column to a list using the tolist() method, but it output "Timestamp('2022-08-02 00:00:00')" rather than the plain date, even though my column only contains dates formatted as %Y-%m-%d, as you can see in the image.
Assuming that your Dataframe is called df, here's a possible way of solving your issue:
df.loc[df.Date == pd.Timestamp.now().date().strftime('%Y-%m-%d')]
It's a straightforward solution: you filter your dataframe by "Date", comparing against the date part of today's timestamp while maintaining the correct %Y-%m-%d format.
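A small sketch of the same filter with made-up data, assuming the column holds actual datetime values (in which case normalized timestamps can also be compared directly, with no string formatting at all):

```python
import pandas as pd

# Toy frame; the second row carries today's date (hypothetical data)
today = pd.Timestamp.now().normalize()
df = pd.DataFrame({
    "Date": [today - pd.Timedelta(days=1), today],
    "value": [10, 20],
})

# Strip the time-of-day component and compare timestamps directly
todays_rows = df.loc[df["Date"].dt.normalize() == today]
print(todays_rows)
```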

I have made date as index of a dataframe in pandas. How can I search rows for a particular date?

I have a data frame named inject. I have made a column named date the index of the data frame. I want to find the rows corresponding to a particular date; the data type of the date column is datetime.
inject_2017["2017-04-20"]
Writing this code throws an error.
Try inject_2017.loc["2017-04-20"]
This way you can select the row (or group of rows) with the corresponding datetime index.
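For instance, with a toy frame standing in for inject_2017 (the column name and values here are made up), partial-string indexing on the DatetimeIndex returns every row that falls on that date:

```python
import pandas as pd

# Toy frame with a DatetimeIndex containing a repeated date
inject_2017 = pd.DataFrame(
    {"volume": [5, 7, 9]},
    index=pd.to_datetime(["2017-04-19", "2017-04-20", "2017-04-20"]),
)

# .loc with a date string selects all rows whose index matches that day
rows = inject_2017.loc["2017-04-20"]
print(rows)
```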

Select DataFrame rows of a specific day

I have a DataFrame with a date_time column. The date_time column contains a date and time. I also managed to convert the column to a datetime object.
I want to create a new DataFrame containing all the rows of a specific DAY.
I managed to do it when I set the date column as the index and used the "loc" method.
Is there a way to do it even if the date column is not set as the index? I only found a method which returns the rows between two days.
You can use the groupby() function. Let's say your dataframe is df:
df_group = df.groupby('Date') # assuming the column containing dates is called Date.
Now you can access rows of any date by passing the date in the get_group function,
df_group.get_group('date_here')
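Put together with toy data (column name and dates are made up for illustration):

```python
import pandas as pd

# Toy frame with a plain 'Date' column, not an index
df = pd.DataFrame({
    "Date": ["2022-01-01", "2022-01-01", "2022-01-02"],
    "value": [1, 2, 3],
})

df_group = df.groupby("Date")

# get_group returns all rows belonging to the requested date
day_rows = df_group.get_group("2022-01-01")
print(day_rows)
```

This works without ever touching the index; the date column stays an ordinary column.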

appending values to a new dataframe and making one of the datatypes the index

I have a new DataFrame df, which was created using:
df= pd.DataFrame()
I have a date value called 'day' which is in format dd-mm-yyyy and a cost value called 'cost'.
How can I append the date and cost values to the df and assign the date as the index?
So for example if I have the following values
day = 01-01-2001
cost = 123.12
the resulting df would look like
date cost
01-01-2001 123.12
I will eventually be adding paired values for multiple days, so the df will eventually look something like:
date cost
01-01-2001 123.12
02-01-2001 23.25
03-01-2001 124.23
: :
01-07-2016 2.214
I have tried to append the paired values to the data frame but am unsure of the syntax. I've tried various things, including the below, but without success.
df.append([day,cost], columns='date,cost',index_col=[0])
There are a few things here. First, making a column the index goes like this, though you can also do it when you load the dataframe from a file (see below):
df.set_index('date', inplace=True)
To add new rows, you should write them out to file first. Pandas isn't great at adding rows dynamically, and this way you can just read the data in when you need it for analysis.
new_row = ...  # a row of new data in string format, with values
               # separated by commas and ending with \n
with open(path, 'a') as f:
    f.write(new_row)
You can do this in a loop, or singly, as many times as you need. Then when you're ready to work with it, use:
df = pd.read_csv(path, index_col=0, parse_dates=True)
Passing index_col=0 makes the first column on disk the index (you can also pass the column's name as a string). Passing parse_dates=True will turn the datetime strings that you declared as the index into datetime objects.
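A concrete sketch of that write-then-read flow; the file path, header, and sample rows here are made up, and dayfirst=True is added because the dates are in dd-mm-yyyy form:

```python
import os
import tempfile
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "costs.csv")

# Write a header once, then append each new row as a plain CSV line
with open(path, "w") as f:
    f.write("date,cost\n")
for new_row in ["01-01-2001,123.12\n", "02-01-2001,23.25\n"]:
    with open(path, "a") as f:
        f.write(new_row)

# Read it back with the first column as a parsed DatetimeIndex
df = pd.read_csv(path, index_col=0, parse_dates=True, dayfirst=True)
print(df)
```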
Try this (DataFrame.append was removed in pandas 2.0, so pd.concat is the safe spelling):
dfapp = pd.DataFrame({'date': [day], 'cost': [cost]})
df = pd.concat([df, dfapp], ignore_index=True)

'combine first' in pandas produces NA error

I have two dataframes, each with a series of dates as the index. The dates do not overlap (in other words, one date range runs from, say, 2013-01-01 through 2016-06-15 by month, and the second DataFrame starts on 2016-06-15 and runs quarterly through 2035-06-15).
Most of the column names overlap (i.e. are the same) and the join works just fine. However, there is one column in each DataFrame that I would like to preserve as 'belonging' to the original DataFrame, so that I have both available for future use. I gave each a different name: DF1 has a column entitled opselapsed_time and DF2 has a column entitled constructionelapsed_time.
When I try to combine DF1 and DF2 together using the command DF1.combine_first(DF2) or vice versa I get this error: ValueError: Cannot convert NA to integer.
Could someone please give me advice on how best to resolve?
Do I need to just stick with using a merge/join type solution instead of combine_first?
Found the best solution:
pd.concat([test.construction, test.ops], join='outer')
(pd.tools.merge.concat is the old spelling; it has since been removed in favour of pd.concat.) This joins along the date index and keeps the differently named columns. Where the column names are the same, the rows are simply stacked; join='outer' keeps the union of columns, join='inner' only the shared ones.
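A minimal sketch of that concat, with toy frames standing in for test.construction and test.ops (column names follow the question; note the modern spelling is pd.concat):

```python
import pandas as pd

# Two frames with non-overlapping date indexes, one shared column,
# and one differently named column each
ops = pd.DataFrame(
    {"shared": [1, 2], "opselapsed_time": [10, 20]},
    index=pd.to_datetime(["2013-01-01", "2013-02-01"]),
)
construction = pd.DataFrame(
    {"shared": [3], "constructionelapsed_time": [30]},
    index=pd.to_datetime(["2016-06-15"]),
)

# Stack along the date index; outer join keeps the union of columns,
# filling the columns a frame lacks with NaN
combined = pd.concat([construction, ops], join="outer")
print(combined)
```

Unlike combine_first, this never tries to reconcile overlapping values, so the "Cannot convert NA to integer" error does not arise.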
