Here is my current code:
all_teams = np.unique(march['PitcherTeam'])
all_pitchers = np.unique(march['Pitcher'])
all_pitches = np.unique(march['TaggedPitchType'])
_ = widgets.interact(pitch_chart, df=widgets.fixed(march), team=list(all_teams), pitcher=list(all_pitchers), pitch_type=list(all_pitches))
It currently outputs this:
[screenshot: pitch chart]
I want it to narrow later drop-down choices based on the choices the user has already made. For example, when they select ORE_DUC as their team, I want the next dropdown to present only pitchers who play for ORE_DUC, and then, based on the pitcher they choose, present only the pitches that pitcher throws.
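One way to get that cascading behavior with plain ipywidgets is to build the dropdowns yourself and repopulate the downstream options from observe callbacks. A minimal sketch, assuming the pitch_chart function and the march DataFrame from the code above (the widget names here are just illustrative):

import ipywidgets as widgets

team_dd = widgets.Dropdown(options=list(all_teams), description='Team')
pitcher_dd = widgets.Dropdown(options=list(all_pitchers), description='Pitcher')
pitch_dd = widgets.Dropdown(options=list(all_pitches), description='Pitch')

def on_team_change(change):
    # offer only pitchers on the selected team
    subset = march[march['PitcherTeam'] == change['new']]
    pitcher_dd.options = sorted(subset['Pitcher'].unique())

def on_pitcher_change(change):
    # offer only pitch types thrown by the selected pitcher
    subset = march[march['Pitcher'] == change['new']]
    pitch_dd.options = sorted(subset['TaggedPitchType'].unique())

team_dd.observe(on_team_change, names='value')
pitcher_dd.observe(on_pitcher_change, names='value')

_ = widgets.interact(pitch_chart, df=widgets.fixed(march),
                     team=team_dd, pitcher=pitcher_dd, pitch_type=pitch_dd)

Changing the team repopulates the pitcher options, which typically resets the pitcher value and cascades down to the pitch options in the same way.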
Can anyone help me figure out how to get my y-axis sorted by a field in my dataset?
See the code below. The y-axis is a string concat field of ADP (a decimal number) and an NFL player name. I want to sort this y-axis by the ADP, which is a field called "OWNER Player ADP" that I cast as a float once it goes into the pandas dataframe (I wanted it to be a number with a decimal point).
I also created a field called ADP, which is pretty much the same thing except it is a varchar when it enters the pandas dataframe. In either case, I can't seem to get the graph to sort the y-axis on either of these two variations of the field. I'm also attaching two screenshots of the current output so you can see the data output and chart output. You can see that Aaron Rodgers is at the top of the list even though he has an ADP of 48.3; I want the player with the lowest ADP number to be at the top of the list.
import altair as alt
import pandas as pd
from main.views import sql_to_dataframe
#-- get draft history for a specific owner, league, and draft type
query = """
SELECT draft_type,season,"Player Name","Player Team"
,count(*) "Times Drafted"
,cast(round(cast(SUM(pick_no) AS DECIMAL)/cast(COUNT(DISTINCT draft_id) AS DECIMAL),1) as varchar) "OWNER Player ADP"
,cast(round(cast(SUM(pick_no) AS DECIMAL)/cast(COUNT(DISTINCT draft_id) AS DECIMAL),1) as varchar) "ADP"
,concat(cast(round(cast(SUM(pick_no) AS DECIMAL)/cast(COUNT(DISTINCT draft_id) AS DECIMAL),1) as varchar),' ',"Player Team") "Player ADP"
,1 "x_axis"
FROM
mytable
GROUP BY draft_type,season,"Player Name","Player Team"
"""
source = sql_to_dataframe(query)
source['OWNER Player ADP'] = source['OWNER Player ADP'].astype(float)
print(source.head())
base = alt.Chart(
source,
title="Player Average Draft Position"
).encode(
x=alt.X('x_axis:O')
,y=alt.Y('Player ADP:O',sort=alt.EncodingSortField(field="OWNER Player ADP:O",order ='descending', op='min'))
#,tooltip=['Player Team','OWNER Player ADP:O']
)
bar = base.mark_square(size=300).encode(
color=alt.Color('Times Drafted:Q', scale=alt.Scale(scheme="darkred"))#,domain=[5,0])
,tooltip=['Player Team','OWNER Player ADP:O','Times Drafted:N']
)
# Configure text
text = base.mark_text(baseline='middle',color='white').encode(
text='Times Drafted:O'
,tooltip=['Player Team','OWNER Player ADP:O','Times Drafted:N']
)
(bar+text).properties(width=50)#.interactive()
alt.EncodingSortField does not use type codes like :O, and does not parse them out of the input, so the field name fails to match. Instead of
alt.EncodingSortField(field="OWNER Player ADP:O", ...)
use
alt.EncodingSortField(field="OWNER Player ADP", ...)
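With that change, the y encoding would look something like this; note too that if you want the lowest ADP at the top of the chart, order='ascending' is likely what you want rather than 'descending':

y=alt.Y('Player ADP:O',
        sort=alt.EncodingSortField(field='OWNER Player ADP',
                                   order='ascending', op='min'))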
I'm trying to match data to data in a dataframe, and the way I'm currently attempting it is not working. After some research, I believe I'm selecting either condition rather than requiring both. I have transactions, and I want to match each opening to its closing and disregard the rest. The results still show unclosed transactions.
Code:
# import important stuffs
import pandas as pd
# open the file, keep equity options only, and pair opens to closes
trade_reader = pd.read_csv('TastyTrades.csv')  # read the trade history
options_frame = trade_reader.loc[trade_reader['Instrument Type'] == 'Equity Option']  # options only
BTO = options_frame[options_frame['Action'].isin(['BUY_TO_OPEN', 'SELL_TO_CLOSE'])] # look for BTO/STC
STO = options_frame[options_frame['Action'].isin(['SELL_TO_OPEN', 'BUY_TO_CLOSE'])] # look for STO/BTC
paired_frame = [BTO, STO] # combine
results = pd.concat(paired_frame) # concat
results_sorted = results.sort_values(by=['Symbol', 'Call or Put', 'Date'], ascending=True) # sort by symbol
results_sorted.to_csv('new_taste.csv') # write new list
Results:
310,2019-12-19T15:47:24-0500,Trade,SELL_TO_OPEN,APA 200117P00020000,Equity Option,Sold 1 APA 01/17/20 Put 20.00 @ 0.33,33,1,33.0,-1.0,-0.15,100.0,APA,1/17/2020,20.0,PUT
296,2019-12-31T09:30:07-0500,Trade,BUY_TO_CLOSE,APA 200117P00020000,Equity Option,Bought 1 APA 01/17/20 Put 20.00 @ 0.08,-8,1,-8.0,0.0,-0.14,100.0,APA,1/17/2020,20.0,PUT
8,2020-02-14T12:19:30-0500,Trade,BUY_TO_OPEN,AXAS 200918C00002500,Equity Option,Bought 2 AXAS 09/18/20 Call 2.50 @ 0.05,-10,2,-5.0,-2.0,-0.28,100.0,AXAS,9/18/2020,2.5,CALL
172,2020-01-28T10:05:14-0500,Trade,SELL_TO_OPEN,BAC 200320C00033000
As you can see here, I have one full transaction (APA), one half of a transaction (AXAS), and the opening half of another full transaction (BAC). I don't want to see AXAS in there. AXAS and the others keep popping up no matter how many times I try to get rid of them.
Right now you're just selecting for all opens and all closes, and then stacking them; there's no actual pairing going on. If I'm understanding you correctly, you only want to include transactions that have both an Open and a Close in the dataset? If that's the case, I'd suggest finding the set intersection of the transaction IDs, and using that to select the paired transactions. It'd look something like the code below, assuming that the fifth column in your data (e.g. "APA 200117P00020000") is the TransactionID.
import pandas as pd
trade_reader = pd.read_csv('TastyTrades.csv')
options_frame = trade_reader.loc[
(trade_reader['Instrument Type'] == 'Equity Option')
] # sort for options only
opens = options_frame[
options_frame['Action'].isin(['BUY_TO_OPEN', 'SELL_TO_OPEN'])
] # look for opens
closes = options_frame[
options_frame['Action'].isin(['BUY_TO_CLOSE', 'SELL_TO_CLOSE'])
] # look for closes
# Then create the set intersection of the open and close transaction IDs
paired_ids = set(opens['TransactionID']) & set(closes['TransactionID'])
paired_transactions = options_frame[
options_frame['TransactionID'].isin(paired_ids)
] # And use those to select the paired items
results = paired_transactions.sort_values(
by=['Symbol', 'Call or Put', 'Date'],
ascending=True
) # sort by symbol
results.to_csv('NewTastyTransactions.csv')
I'm trying to get an hv graph with the ability to tap edges separately from nodes. In my case, all the meaningful data is bound to the edges.
gNodes = hv.Nodes((nodes_data.x, nodes_data.y, nodes_data.nid, nodes_data.name),
                  vdims=['name'])
gGraph = hv.Graph(((edges_data.source, edges_data.target, edges_data.name), gNodes),
                  vdims=['name'])
opts = dict(width=1200,height=800,xaxis=None,yaxis=None,bgcolor='black',show_grid=True)
gEdges = gGraph.edgepaths
tiles = gv.tile_sources.Wikipedia()
(tiles * gGraph.edgepaths * gGraph.nodes.opts(size=12)).opts(**opts)
If I use gGraph.edgepaths * gGraph.nodes, no edge information is displayed by the hover tool.
The inspection policy 'edges' for hv.Graph is not suitable for my task, because it does not allow selecting a single edge.
Where did the edge label information in the edgepaths property go? How can I add it?
Thank you!
I created a separate dataframe for each link, grouped it by the unique link label, and inserted an empty row between each group (two rows per edge: source and target), as in this case: Pandas: Inserting an empty row after every 2nd row in a data frame
empty_row = pd.Series(np.NaN, edges_data.columns)
insert_f = lambda d: d.append(empty_row, ignore_index=True)
edges_df = edges_data.groupby(by='name', group_keys=False).apply(insert_f).reset_index(drop=True)
and created hv.EdgePaths from the dataframe:
gPaths2= hv.EdgePaths(edges_df, kdims=['lon_conv_a','lat_conv_a'])
Tap and hover now work fine for me.
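If you also want the edge label shown by the hover tool, declaring it as a value dimension should work; a sketch, assuming hv.EdgePaths accepts vdims the same way hv.Path does:

gPaths2 = hv.EdgePaths(edges_df, kdims=['lon_conv_a', 'lat_conv_a'], vdims=['name'])
gPaths2.opts(tools=['tap', 'hover'])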
I am writing a bot using gspread and IMDbPy. The script takes input (a movie title), grabs the movie ID, finds the movie's rating on IMDb.com, then posts the rating into a specific cell of a spreadsheet.
There is a function named update_cell that updates a specific cell based on the given row and column parameters. Once the bot is complete, I don't want to have to keep going into the code to update the row parameter; I want it to increase by 1 each time the bot runs.
Is there a way to do this? I'll post the code below:
import time

import gspread
import imdb
from oauth2client.service_account import ServiceAccountCredentials

ia = imdb.IMDb()

def take_input():
    fd = open('movielist.txt', "w")
    print("Input your movie please: \n")
    inp = input()
    fd.write(inp)
    fd.close()

take_input()

# Wed 8/28/19 - movie_list is a list object. Must set it equal to our ia.search_movies
# Need to find out where to put movie_list = ia.search_movies in the code, and what to
# remove or keep.

a = 52  # row to write to
b = 18  # column to write to

def Main():
    c = """Python Movie Rating Scraper by Nickydimebags"""
    print(c)
    time.sleep(2)
    f1 = open('movielist.txt')
    movie_list = []
    for i in f1.readlines():
        movie_list.append(i)
    movie_list = ia.search_movie(i)  # search for the last title read
    movie_id = movie_list[0].movieID
    print(movie_id)
    m = ia.get_movie(movie_id)
    print(m)
    rating = m['rating']
    print(rating)
    scope = ["https://spreadsheets.google.com/feeds", "https://www.googleapis.com/auth/spreadsheets", "https://www.googleapis.com/auth/drive.file", "https://www.googleapis.com/auth/drive"]
    creds = ServiceAccountCredentials.from_json_keyfile_name("creds.json", scope)
    client = gspread.authorize(creds)
    sheet = client.open("Movie Fridays").sheet1
    sheet.update_cell(a, b, rating)  # updates specific cell

Main()
^ The a variable is what I need to increase by 1 every time the bot runs
I am guessing the a variable tracks the row index. You could get the index of the next empty cell in the column you are adding values to.
def next_available_row(worksheet, col):
    return len(worksheet.col_values(col)) + 1

sheet = client.open("Movie Fridays").sheet1
sheet.update_cell(next_available_row(sheet, b), b, rating)
You are going to need to save the current or next value of your a variable somewhere and update it every time the script runs.
You could abuse a cell in the spreadsheet for this, or write it out to a file.
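A minimal sketch of the file-based approach, assuming a small counter file named row_counter.txt (a hypothetical name) next to the script:

import os

COUNTER_FILE = 'row_counter.txt'  # hypothetical file name

def load_row(default=52):
    # read the last saved row, falling back to the default on the first run
    if os.path.exists(COUNTER_FILE):
        with open(COUNTER_FILE) as f:
            return int(f.read().strip())
    return default

def save_row(row):
    # persist the row for the next run
    with open(COUNTER_FILE, 'w') as f:
        f.write(str(row))

a = load_row()
# ... sheet.update_cell(a, b, rating) as in your Main() ...
save_row(a + 1)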
I have a dataset of users, books, and ratings, and I want to find the users who rated a particular book highly, and then find which other books those users liked.
My data looks like:
df.sample(5)
User-ID ISBN Book-Rating
49064 102967 0449244741 8
60600 251150 0452264464 9
376698 52853 0373710720 7
454056 224764 0590416413 7
54148 25409 0312421273 9
What I did so far:
df_p = df.pivot_table(index='ISBN', columns='User-ID', values='Book-Rating').fillna(0)
lotr = df_p.ix['0345339703'] # Lord of the Rings Part 1
like_lotr = lotr[lotr > 7].to_frame()
users = like_lotr['User-ID']
The last line failed with:
KeyError: 'User-ID'
I want to obtain the users who rated LOTR higher than 7, and then, for those users, find the other books they liked from the matrix.
Help would be appreciated. Thanks.
In your like_lotr dataframe, 'User-ID' is the name of the index; you cannot select it like a normal column. That is why the line users = like_lotr['User-ID'] raises a KeyError: it is not a column.
Moreover, ix is deprecated; better to use loc in your case. And don't put quotes around the value: it needs to be an integer, since ISBN was originally a column of integers (at least in your sample).
Try like this:
df_p = df.pivot_table(index='ISBN', columns='User-ID', values='Book-Rating').fillna(0)
lotr = df_p.loc[452264464] # used another number from your sample dataframe to test this code.
like_lotr = lotr[lotr > 7].to_frame()
users = like_lotr.index.tolist()
users is now a list with the IDs you want.
Using your small sample above and the number I used to test, users is [251150].
An alternative solution is to use reset_index. The last two lines would then look like this:
like_lotr = lotr[lotr > 7].to_frame().reset_index()
users = like_lotr['User-ID']
reset_index puts the index back into the columns.
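From there, to finish your original goal (the other books those users liked), something like this sketch on top of the df_p pivot should work:

other_books = df_p[users]  # columns of the pivot are User-IDs
liked = other_books[(other_books > 7).any(axis=1)]  # books any of them rated > 7
print(liked.index.tolist())  # the ISBNs of those books

You may want to drop the original book's ISBN from the result, since it will match too.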