Separating DataFrames in Python to train, test, and graph the data

I am trying to graph the date on the x-axis and the opening stock price on the y-axis for one specific stock, and then do a train/test split on the data, but first I need that stock's data separated out of this huge dataframe.
I am now trying to change st = stock_final.query("Name == 'AMZN'") so that it checks a user-supplied argument string called 'ticker' instead of the hard-coded name, but I do not know how to pass that variable into the query. Any advice?

I assume the date column is the index; if that's not the case, I recommend making the date column the index.
You can then perform a few operations on the dataframe to get a new dataframe that contains only the information you need.
Since you only want a single stock, you can use the dataframe method query to select the stock based on its name in the 'Name' column, and then select the columns you want by name. For example:
df = df.query("Name == 'AAIT'")
df = df[['Open', 'Name']]
Or, if you don't need the Name column in the dataframe anymore:
df = df['Open']
This new dataframe will have the date as its index and the open values for the stock you selected, so you can graph it easily.
Here is the link to the query function in pandas https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html?highlight=query#pandas.DataFrame.query
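To answer the follow-up about the user-supplied ticker argument: query can reference local variables with the @ prefix, so you don't need to build the query string by hand. A minimal sketch, assuming ticker holds the user's input string:
ticker = "AMZN"  # e.g. taken from the user argument
st = stock_final.query("Name == @ticker")  # @ticker refers to the local variable
st = st[['Open']]  # keep only the column(s) you need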

Related

Add Row to Dataframe With Streamlit

I have a data frame with my stock portfolio. I want to be able to add a stock to my portfolio in Streamlit. I have a text input in my sidebar where I enter the Ticker, etc. When I try to add the Ticker to the end of my dataframe, it does not work.
ticker_add = st.sidebar.text_input("Ticker")
df['Ticker'][len(df)+1] = ticker_add
The code [len(df)+1] does not work. When I try [len(df)-1], it works, but it replaces the last stock instead of adding to the end of the dataframe. It seems like this can't add a row to the dataframe.
Solution
First, check the type of ticker_add:
type(ticker_add)
Adding new row to a dataframe
Assuming your ticker_add is a dictionary with the column names of the dataframe df as the keys (and list values; for scalar values you would need to pass an index, e.g. pd.DataFrame(ticker_add, index=[0])), you can do this:
df = df.append(pd.DataFrame(ticker_add))
Assuming it is a single non-array-like input, you can do this:
# adds a new row for a single column ("Ticker") dataframe
df = df.append({'Ticker': ticker_add}, ignore_index=True)
References
Add one row to pandas DataFrame
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html
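Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. On newer versions, the same single-row append can be written with pd.concat; a sketch under the same single-value assumption:
import pandas as pd
new_row = pd.DataFrame({'Ticker': [ticker_add]})  # one-row frame from the input
df = pd.concat([df, new_row], ignore_index=True)  # appends and renumbers the index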

I have made date as index of a dataframe in pandas. How can I search rows for a particular date?

I have a data frame named inject. I have made the column named date the index of inject. I want to find the rows corresponding to a particular date. The data type of the date column is datetime.
inject_2017["2017-04-20"]
Writing this code throws an error.
Try inject_2017.loc["2017-04-20"]
This way you can select the row (or group of rows) with the corresponding datetime index.
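For example, a quick sketch with made-up data (the column name amount is hypothetical):
import pandas as pd
inject_2017 = pd.DataFrame(
    {"amount": [1.2, 3.4, 5.6]},
    index=pd.to_datetime(["2017-04-19", "2017-04-20", "2017-04-20"]),
)
print(inject_2017.loc["2017-04-20"])  # every row whose index falls on that day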

Select DataFrame rows of a specific day

I have a DataFrame with a date_time column. The date_time column contains a date and time. I also managed to convert the column to a datetime object.
I want to create a new DataFrame containing all the rows of a specific DAY.
I managed to do it when I set the date column as the index and used the "loc" method.
Is there a way to do it even if the date column is not set as the index? I only found a method which returns the rows between two days.
You can use the groupby() function. Let's say your dataframe is df:
df_group = df.groupby('Date')  # assuming the column containing dates is called Date
Now you can access the rows for any date by passing that date to the get_group function:
df_group.get_group('date_here')
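Note that if the date_time column also carries times, grouping by it directly would split one day into many groups. In that case, a boolean mask on the column's .dt.date accessor is an alternative; a sketch with made-up values:
import pandas as pd
df = pd.DataFrame({
    "date_time": pd.to_datetime(["2017-04-20 09:30", "2017-04-20 16:00", "2017-04-21 09:30"]),
    "price": [10.0, 10.5, 9.8],
})
one_day = df[df["date_time"].dt.date == pd.Timestamp("2017-04-20").date()]  # rows for that day only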

Cryptocurrency correlation in python, working with dictionaries

I'm working with a cryptocurrency data sample where each cell contains a dictionary holding the open price, close price, highest price, lowest price, volume, and market cap. The columns are the corresponding dates and the index is the name of each cryptocurrency.
I don't know how to prepare the data so I can find the correlation between different currencies, or between the highest price and volume, for example. How can this be done in Python (pandas)? Also, how would I define a date range in such a situation?
Here's a link to the data sample, my coding and a printout of the data (Access is OPEN TO PUBLIC): https://drive.google.com/open?id=1mjgq0lEf46OmF4zK8sboXylleNs0zx7I
To begin with, I would suggest rearranging your data so that each currency's OHLCV values are their own columns (e.g. "btc_open | btc_high" etc.). This makes generating correlation matrices far easier. I'd also suggest beginning with only one metric (e.g. close price) and perhaps period movement (e.g. close-open) in your analysis. To answer your question:
Pandas can return a correlation matrix of all columns with:
df.corr()
If you want to use only specific columns, select those from the DataFrame:
df[["col1", "col2"]].corr()
You can return a single correlation value between two columns with the form:
df["col1"].corr(df["col2"])
If you'd like to specify a specific date range, I'd refer you to this question. I believe this will require your date column or index to be of the type datetime. If you don't know how to work with or convert to this type, I would suggest consulting the pandas documentation (perhaps begin with pandas.to_datetime).
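As a quick sketch of that kind of selection (assuming the frame has a DatetimeIndex, which supports inclusive string slicing):
subset = df.loc["2017-01-01":"2017-03-31"]  # all rows in that date range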
In future, I would suggest including a data snippet in your post. I don't believe Google Drive is an appropriate form to share data, and it definitely is not appropriate to set the data to "request access".
EDIT: I checked your data and created a smaller subset to test this method on. If there are imperfections in the data you may find problems, but I had none when I tested it on a sample of your first 100 days and 10 coins (after transposing, df.iloc[:100, :10]).
Firstly, transpose the DataFrame so columns are organised by coin and rows are dates.
df = df.T
Following this, we concatenate to a new DataFrame (result). Alternatively, concatenate to the original and drop columns after. Unfortunately I can't think of a non-iterative method. This method goes column by column: it creates a DataFrame for each coin, adds the coin name as a prefix to the column names, then concatenates each DataFrame to the end.
result = pd.DataFrame()
coins = df.columns.tolist()
for coin in coins:
    coin_data = df[coin]
    split_coin = coin_data.apply(pd.Series).add_prefix(coin + "_")
    result = pd.concat([result, split_coin], axis=1)
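With the data in that shape, the correlation calls above apply directly. For example (assuming the dictionary keys were names like "close", "high", and "volume", so the prefixed columns look like btc_close):
close_cols = [c for c in result.columns if c.endswith("_close")]
print(result[close_cols].corr())  # correlation matrix of close prices across coins
print(result["btc_high"].corr(result["btc_volume"]))  # highest price vs. volume for one coin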

appending values to a new dataframe and making one of the datatypes the index

I have a new DataFrame df, which was created using:
df= pd.DataFrame()
I have a date value called 'day', which is in the format dd-mm-yyyy, and a cost value called 'cost'.
How can I append the date and cost values to the df and assign the date as the index?
So for example if I have the following values
day = 01-01-2001
cost = 123.12
the resulting df would look like
date cost
01-01-2001 123.12
I will be adding paired values for multiple days, so the df will eventually look something like:
date cost
01-01-2001 123.12
02-01-2001 23.25
03-01-2001 124.23
: :
01-07-2016 2.214
I have tried to append the paired values to the data frame but am unsure of the syntax. I've tried various things, including the line below, without success.
df.append([day,cost], columns='date,cost',index_col=[0])
There are a few things here. First, making a column the index goes like this, though you can also do it when you load the dataframe from a file (see below):
df.set_index('date', inplace=True)
To add new rows, you should write them out to file first. Pandas isn't great at adding rows dynamically, and this way you can just read the data in when you need it for analysis.
new_row = ...  # a row of new data in string format, with values
               # separated by commas and ending with \n
with open(path, 'a') as f:
    f.write(new_row)
You can do this in a loop, or singly, as many times as you need. Then when you're ready to work with it, you use:
df = pd.read_csv(path, index_col=0, parse_dates=True)
index_col=0 makes the first column on disk the index (you can also pass the column's name as a string). Passing parse_dates=True turns the datetime strings you declared as the index into datetime objects.
Try this:
dfapp = {'date': day, 'cost': cost}
df = df.append(dfapp, ignore_index=True)
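For completeness, a sketch that collects the day/cost pairs in a plain list first and builds the indexed DataFrame in one step, avoiding repeated appends:
import pandas as pd
rows = [("01-01-2001", 123.12), ("02-01-2001", 23.25)]  # (day, cost) pairs
df = pd.DataFrame(rows, columns=["date", "cost"])
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")  # dd-mm-yyyy, as in the question
df = df.set_index("date")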
