Add Row to Dataframe With Streamlit - python

I have a data frame with my stock portfolio. I want to be able to add a stock to my portfolio on streamlit. I have text input in my sidebar where I input the Ticker, etc. When I try to add the Ticker at the end of my dataframe, it does not work.
ticker_add = st.sidebar.text_input("Ticker")
df['Ticker'][len(df)+1] = ticker_add
The code [len(df)+1] does not work. When I try to do [len(df)-1], it works but I want to add it to the end of the dataframe, not replace the last stock. It seems like it can't add a row to the dataframe.

Solution
First, check the type of ticker_add:
type(ticker_add)
Adding a new row to a dataframe
Assuming your ticker_add is a dictionary with the column names of the dataframe df as the keys, you can do this (DataFrame.append was removed in pandas 2.0, so use pd.concat and keep the returned dataframe):
df = pd.concat([df, pd.DataFrame([ticker_add])], ignore_index=True)
Assuming it is a single non-array-like input, you can do this:
# adds a new row to a single-column ("Ticker") dataframe
df = pd.concat([df, pd.DataFrame({'Ticker': [ticker_add]})], ignore_index=True)
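As a runnable sketch with made-up tickers: since DataFrame.append was removed in pandas 2.0, another simple way to append a row is assigning to .loc with the next integer label:

```python
import pandas as pd

# hypothetical starting portfolio
df = pd.DataFrame({"Ticker": ["AAPL", "MSFT"]})

ticker_add = "AMZN"  # would come from st.sidebar.text_input("Ticker")

# .loc with the next integer label appends a new row in place
df.loc[len(df)] = [ticker_add]

print(df)
```

This is the in-place equivalent of the pd.concat approach above and avoids the off-by-one problem in the original [len(df)+1] attempt.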
References
Add one row to pandas DataFrame
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html

Related

Separating Dataframes in Python to train, test, and graph the data

I am trying to graph the date on the x-axis and the opening stock price on the y-axis for one stock specifically, and then I would like to do a train/test split on the data, but first I need that data separated out of this huge dataframe.
And now I am trying to change st = stock_final.query("Name == 'AMZN'") to check against a user-supplied argument string called 'ticker', but I do not know how to pass that variable into the query function. Any advice?
I assume the date column is the index; if that is not the case, I recommend making the date column the index.
You can then perform operations on the dataframe to get a new dataframe that only has the information you need.
Since you only want a single stock, you can use the dataframe method query to select the stock you want based on its name in the 'Name' column, and then select the columns you want by name, for example:
df = df.query("Name == 'AAIT'")
df = df[['Open', 'Name']]
Or, if you don't need the Name column anymore (note this returns a Series rather than a dataframe):
df = df['Open']
This new object will have the date in the index and the open value for the stock you selected, and now you can graph it easily.
Here is the link to the query function in pandas https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html?highlight=query#pandas.DataFrame.query
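To plug the user-supplied ticker into the query, pandas lets query reference a local Python variable with the @ prefix. A minimal sketch with made-up data standing in for stock_final:

```python
import pandas as pd

# hypothetical stock data standing in for stock_final
stock_final = pd.DataFrame({
    "Name": ["AMZN", "AAPL", "AMZN"],
    "Open": [130.0, 170.0, 131.5],
})

ticker = "AMZN"  # user-supplied argument string

# '@ticker' refers to the local variable, not a column name
st = stock_final.query("Name == @ticker")
print(st)
```

This keeps the query string fixed while the value comes from the user argument.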

Iteratively add columns to the same dataframe

for date in list_of_dates:
    df = *dataframe with identifiers for rows and dates for columns*
    new_column = *a new column with a new date to be added to the df*
    df_incl_new_column = *original df merged with new column*
I want to use the 'df_incl_new_column' at the start of the loop as the new 'df' and keep iteratively adding new columns for each date, and using the dataframe with the new column as the start 'df' again and again.
I want to do this for a list of over 20 dates to build a new dataframe with all the new columns.
Each new column has data which changes depending on the previous new column having been added to the df.
What is the best way to do this?
It may be that a for loop is not appropriate, but I need to build a dataframe gradually, using the latest data in the dataframe to add the next column.
You should try this:
df = *dataframe with identifiers for rows and dates for columns*
for date in list_of_dates:
    df[date] = *a new column for that date, computed from the current df*
Using the date itself as the column name means each iteration adds a distinct column instead of overwriting the same one.

How to feed new columns every time in a loop to a spark dataframe?

I have a task of reading each column of a Cassandra table into a dataframe to perform some operations. If the table has 5 columns, I want to feed the data like this:
the first column in the first iteration
the first and second columns in the second iteration, into the same dataframe
and so on.
I need generic code. Has anyone tried something similar? Please help me out with an example.
This will work:
df2 = pd.DataFrame()
for i in range(len(df.columns)):
    df2 = pd.concat([df2, df.iloc[:, 0:i+1]], sort=True)
Since the same column names repeat across iterations, the new rows stack under the matching columns rather than creating duplicate columns, so df2 keeps growing row-wise.
You can extract the names from dataframe's schema and then access that particular column and use it the way you want to.
names = df.schema.names
columns = []
for name in names:
    columns.append(name)
# df[columns]: use it the way you want
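As a sketch of the cumulative-column idea in pandas (the same growing-prefix pattern applies to a Spark dataframe via select), with a made-up table standing in for the Cassandra data:

```python
import pandas as pd

# hypothetical table standing in for the Cassandra data
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# collect the growing prefixes: ['a'], then ['a', 'b'], then ['a', 'b', 'c']
prefixes = []
for i in range(len(df.columns)):
    subset = df.iloc[:, : i + 1]
    prefixes.append(subset.columns.tolist())

print(prefixes)
```

Each iteration hands the same dataframe a wider slice of columns, which is the "generic" behaviour the question asks for regardless of how many columns the table has.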

appending values to a new dataframe and making one of the datatypes the index

I have a new dataframe df, which was created using:
df= pd.DataFrame()
I have a date value called 'day' which is in format dd-mm-yyyy and a cost value called 'cost'.
How can I append the date and cost values to the df and assign the date as the index?
So for example if I have the following values
day = 01-01-2001
cost = 123.12
the resulting df would look like
date cost
01-01-2001 123.12
I will eventually be adding paired values for multiple days, so the df will eventually look something like:
date cost
01-01-2001 123.12
02-01-2001 23.25
03-01-2001 124.23
: :
01-07-2016 2.214
I have tried to append the paired values to the data frame but am unsure of the syntax. I've tried various things, including the below, but without success.
df.append([day,cost], columns='date,cost',index_col=[0])
There are a few things here. First, making a column the index goes like this, though you can also do it when you load the dataframe from a file (see below):
df.set_index('date', inplace=True)
To add new rows, you should write them out to file first. Pandas isn't great at adding rows dynamically, and this way you can just read the data in when you need it for analysis.
new_row = ...  # a row of new data in string format with values
               # separated by commas and ending with \n
with open(path, 'a') as f:
    f.write(new_row)
You can do this in a loop, or singly, as many times as you need. Then when you're ready to work with it, you use:
df = pd.read_csv(path, index_col=0, parse_dates=True)
Passing index_col=0 makes the first column on disk the index (index_col also accepts a column name). Passing parse_dates=True will turn the datetime strings that you declared as the index into datetime objects.
Try this (DataFrame.append was removed in pandas 2.0, so build a one-row dataframe and concatenate it):
dfapp = pd.DataFrame([[day, cost]], columns=['date', 'cost'])
df = pd.concat([df, dfapp], ignore_index=True)
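A minimal runnable sketch of the whole round trip, appending rows with .loc and then making the date the index, using made-up day/cost values:

```python
import pandas as pd

df = pd.DataFrame(columns=["date", "cost"])

# hypothetical day/cost pairs
rows = [("01-01-2001", 123.12), ("02-01-2001", 23.25)]
for day, cost in rows:
    # assign to the next integer label to append a row
    df.loc[len(df)] = [day, cost]

# make the date column the index, as in the desired output
df = df.set_index("date")
print(df)
```

This produces the two-column layout shown above, with the date as the index and cost as the only data column.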

adding a first difference column to a pandas dataframe

I have a dataframe df with two columns date and data. I want to take the first difference of the data column and add it as a new column.
It seems that df.set_index('date').shift() or df.set_index('date').diff() give me the desired result. However, when I try to add it as a new column, I get NaN for all the rows.
How can I fix this command:
df['firstdiff'] = df.set_index('date').shift()
to make it work?
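For what it's worth, the NaNs come from index misalignment: df.set_index('date').shift() returns a result indexed by date, which no longer lines up with df's default integer index when assigned back. A sketch of a fix that keeps the indexes aligned, with made-up data:

```python
import pandas as pd

# made-up example data
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    "data": [10.0, 12.0, 11.0],
})

# diff the column directly so the result shares df's integer index
df["firstdiff"] = df["data"].diff()
print(df)
```

The first row is NaN by construction (there is nothing to difference against), but every other row gets the expected value instead of all rows being NaN.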
