Python Bot Import Panda - python

I am trying to use pandas in a Python bot to read data from an Excel workbook. The table below is a sample of one sheet; the real file has 4 columns, 10,000+ rows, and multiple sheets.
Name Marks Rank School
Student1 655 1 Cambridge
Student2 345 2 Cambridge
Student3 554 3 Cambridge
Student4 847 4 St Peter
Student5 343 5 Cambridge
Student6 546 6 St Peter
Student7 755 7 St Peter
Student8 465 8 St Peter
Student9 467 9 Cambridge
I tried all the pandas examples I could find through Google, but every one of them only prints its results to the console or a shell.
import pandas as pd  # reading .xlsx files also needs an Excel engine such as openpyxl installed
df = pd.read_excel('results.xlsx', sheet_name='Sheet1')
print(df[df["Name"] == "Student5"])
So how do I get the information about a particular student into a Discord channel?
For example, for Student5:
Name Marks Rank School
Student5 343 5 Cambridge

I am not familiar with discord but you can get the result as a numpy array:
myarray = df[df["Name"] == "Student5"].values
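Building on the filter above, here is a minimal sketch of turning the matching row into a text message that a bot could post. The helper name student_report and the inline sample data are assumptions made for illustration; a real bot would load the frame with pd.read_excel as in the question.

```python
import pandas as pd

# Sample rows copied from the question; a real bot would read results.xlsx instead.
df = pd.DataFrame({
    "Name": ["Student4", "Student5", "Student6"],
    "Marks": [847, 343, 546],
    "Rank": [4, 5, 6],
    "School": ["St Peter", "Cambridge", "St Peter"],
})

def student_report(frame, name):
    """Return the rows for one student as plain text (hypothetical helper)."""
    match = frame[frame["Name"] == name]
    if match.empty:
        return f"No student named {name!r} found."
    # to_string keeps the column header and drops the numeric index.
    return match.to_string(index=False)

message = student_report(df, "Student5")
print(message)
```

The resulting string can then be passed to whatever send call the bot library provides (for example await ctx.send(message) if the bot is built on discord.py, which is an assumption here). Wrapping the text in a code block usually keeps the columns aligned in Discord.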

Related

Can I have a two line caption in pandas dataframe?

Create dataframe with:
df = pd.DataFrame({'Name' : ['John','Harry','Gary','Richard','Anna','Richard','Gary','Richard'], 'Age' : [25,32,37,43,44,56,37,22], 'Zone' : ['East','West','North','South','East','West','North', 'South']})
df=df.drop_duplicates('Name',keep='first')
df.style.set_caption("Team Members Per Zone")
which outputs:
Team Members Per Zone
Name Age Zone
0 John 25 East
1 Harry 32 West
4 Anna 44 East
6 Gary 37 North
7 Richard 22 South
However I'd like it to look like:
Team Members
Per Zone
Name Age Zone
0 John 25 East
1 Harry 32 West
4 Anna 44 East
6 Gary 37 North
7 Richard 22 South
Using a break works for me in JupyterLab:
df.style.set_caption('This is line one <br> This is line two')
Have you tried with \n? (Sorry, my reputation is too low to just comment.)

pandas read_csv function reading values as NaN even though there are values in cells

I have a file that I am trying to read into a pandas dataframe. However, some of the cells come up as NaN even though they contain values. The cells that do show up are float values. The cells that do not show up were copied and pasted into the file; I am not sure why that would make a difference. Can anyone help? I have included the file as a link at this location: https://www.dropbox.com/s/30rxw07eaza29df/manhattan_hs_gps.csv?dl=0
I tried this and it worked fine; both encoding='unicode-escape' and encoding='latin-1' work:
import pandas as pd
df = pd.read_csv('manhattan_hs_gps.csv', encoding='unicode-escape', header=None)
print(df)
0 1 2 3
0 0 A. Philip Randolph Campus High School 40.818500 -73.950000
1 1 Aaron School 40.744800 -73.983700
2 2 Abraham Joshua Heschel School 40.772300 -73.989700
3 3 Academy of Environmental Science Secondary Hig... 40.785200 -73.942200
4 4 Academy for Social Action: A College Board School 40.815400 -73.955300
.. ... ... ... ...
162 164 Xavier High School 40.737900 -73.994600
163 165 Yeshiva University High School for Boys 40.851749 -73.928695
164 166 York Preparatory School 40.774100 -73.979400
165 167 Young Women's Leadership School 40.792900 -73.947200
166 168 Washington Heights Expeditionary Learning School 40.774100 -73.979400
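To make the effect of the two arguments concrete, here is a small self-contained sketch. The file from the question is not reproduced, so the bytes below are made up for illustration; they contain one non-ASCII character stored as Latin-1 and no header row.

```python
import io
import pandas as pd

# Hypothetical file contents: Latin-1 encoded, no header row.
raw = (
    b"0,Caf\xe9 Campus High School,40.8185,-73.95\n"
    b"1,Aaron School,40.7448,-73.9837\n"
)

# header=None tells pandas the first line is data, so columns are numbered 0..3.
# encoding='latin-1' maps every byte to a character, so decoding cannot fail.
df = pd.read_csv(io.BytesIO(raw), encoding="latin-1", header=None)

print(df)
print(df.isna().sum())  # count of missing values per column
```

Checking df.isna().sum() after loading is a quick way to confirm whether any cells were actually read as NaN.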

Parsing html with correct encoding

I'm trying to use the basketball-reference API using python with the requests and bs4 libraries.
from requests import get
from bs4 import BeautifulSoup
Here's a minimal example of what I'm trying to do:
# example request
r = get(f'https://widgets.sports-reference.com/wg.fcgi?css=1&site=bbr&url=%2Fteams%2FMIL%2F2015.html&div=div_roster')
soup = BeautifulSoup(r.content, 'html.parser')
table = soup.find('table')
It all works well, I can then feed this table to pandas with its read_html and get the data I need nicely packed into a dataframe.
The problem I have is the encoding.
In this particular request I got two NBA player names with weird characters: Ersan Ä°lyasova (Ersan İlyasova) and Jorge GutiÃ©rrez (Jorge Gutiérrez). In the current code they are interpreted as "Ersan Ä°lyasova" and "Jorge GutiÃ©rrez", which is obviously not what I want.
So the question is -- how do I fix it? This website seems to suggest they have the windows-1251 encoding, but I'm not sure how to use that information (in fact I'm not even sure if that's true).
I know I'm missing something fundamental here as I'm a bit confused how these encodings work at which point they are being "interpreted" etc, so I'll be grateful if you help me with this!
I really don't know why you are using a format string, and your question is not clear. You have just copied and pasted the URL from the network traffic, and you are mixing up URL-quoted strings with text encoding.
The code below should do it.
import pandas as pd
df = pd.read_html("https://www.basketball-reference.com/teams/MIL/2015.html")
print(df)
Output:
[ No. Player Pos ... Unnamed: 6 Exp College
0 34 Giannis Antetokounmpo SG ... gr 1 NaN
1 19 Jerryd Bayless PG ... us 6 Arizona
2 5 Michael Carter-Williams PG ... us 1 Syracuse
3 9 Jared Dudley SG ... us 7 Boston College
4 11 Tyler Ennis PG ... ca R Syracuse
5 13 Jorge Gutiérrez PG ... mx 1 California
6 31 John Henson C ... us 2 UNC
7 7 Ersan İlyasova PF ... tr 6 NaN
8 23 Chris Johnson SF ... us 2 Dayton
9 11 Brandon Knight PG ... us 3 Kentucky
10 5 Kendall Marshall PG ... us 2 UNC
11 6 Kenyon Martin PF ... us 14 Cincinnati
12 0 O.J. Mayo SG ... us 6 USC
13 22 Khris Middleton SF ... us 2 Texas A&M
14 3 Johnny O'Bryant PF ... us R LSU
15 27 Zaza Pachulia C ... ge 11 NaN
16 12 Jabari Parker PF ... us R Duke
17 21 Miles Plumlee C ... us 2 Duke
18 8 Larry Sanders C ... us 4 Virginia Commonwealth
19 6 Nate Wolters PG ... us 1 South Dakota State
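As background on the garbled names in the question: that pattern is what UTF-8 bytes look like when they are decoded with a single-byte codec. The sketch below reproduces the effect and reverses it using only the standard library. The choice of cp1252 as the wrong codec is an assumption that happens to match the characters shown; the source does not confirm which step of the original pipeline did the wrong decoding.

```python
# Correct names, as they appear on the site.
names = ["Ersan İlyasova", "Jorge Gutiérrez"]

# Simulate the failure: encode as UTF-8, then decode with a single-byte codec.
garbled = [n.encode("utf-8").decode("cp1252") for n in names]
print(garbled)

# Reverse it: recover the original bytes, then decode them as UTF-8.
repaired = [g.encode("cp1252").decode("utf-8") for g in garbled]
print(repaired)
```

Repairing text after the fact is a workaround. If the response really is UTF-8, it is usually cleaner to decode it correctly up front, for example by setting r.encoding = 'utf-8' before reading r.text with requests, or by passing from_encoding='utf-8' to BeautifulSoup.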

Amend row in a data-frame if it exists in another data-frame

I have two dataframes DfMaster and DfError
DfMaster which looks like:
Id Name Building
0 4653 Jane Smith A
1 3467 Steve Jones B
2 34 Kim Lee F
3 4567 John Evans A
4 3643 Kevin Franks S
5 244 Stella Howard D
and DfError looks like
Id Name Building
0 4567 John Evans A
1 244 Stella Howard D
In DfMaster I would like to change the Building value for a record to DD if it appears in the DfError data-frame. So my desired output would be:
Id Name Building
0 4653 Jane Smith A
1 3467 Steve Jones B
2 34 Kim Lee F
3 4567 John Evans DD
4 3643 Kevin Franks S
5 244 Stella Howard DD
I am trying to use the following:
DfMaster.loc[DfError['Id'], 'Building'] = 'DD'
however I get an error:
KeyError: "None of [Int64Index([4567,244], dtype='int64')] are in the [index]"
What have I done wrong?
Try this, using np.where:
import numpy as np
errors = list(DfError['Id'].unique())
DfMaster['Building'] = np.where(DfMaster['Id'].isin(errors), 'DD', DfMaster['Building'])
DataFrame.loc expects that you input an index or a Boolean series, not a value from a column.
I believe this should do the trick:
DfMaster.loc[DfMaster['Id'].isin(DfError['Id']), 'Building'] = 'DD'
Basically, it says:
For all rows where Id value is present in DfError['Id'], set the value of 'Building' to 'DD'.
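For reference, here is the same boolean-mask approach as a runnable sketch, with the two frames rebuilt from the tables in the question.

```python
import pandas as pd

DfMaster = pd.DataFrame({
    "Id": [4653, 3467, 34, 4567, 3643, 244],
    "Name": ["Jane Smith", "Steve Jones", "Kim Lee",
             "John Evans", "Kevin Franks", "Stella Howard"],
    "Building": ["A", "B", "F", "A", "S", "D"],
})
DfError = pd.DataFrame({
    "Id": [4567, 244],
    "Name": ["John Evans", "Stella Howard"],
    "Building": ["A", "D"],
})

# Boolean Series aligned with DfMaster's index: True where the Id is in DfError.
mask = DfMaster["Id"].isin(DfError["Id"])

# .loc accepts the mask as a row selector and the column label to assign to.
DfMaster.loc[mask, "Building"] = "DD"

print(DfMaster)
```

The original attempt failed because DfMaster.loc[DfError['Id'], ...] treats the values 4567 and 244 as index labels, and DfMaster's index only contains 0 to 5.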

pandas - how to extract top three rows from the dataframe provided

I can produce the following result from my pandas DataFrame df:
grouped = df[(df['X'] == 'venture') & (df['company_code'].isin(['TDS','XYZ','UVW']))].groupby(['company_code','sector'])['X_sector'].count()
The output of this is as follows:
company_code sector
TDS Meta 404
Electrical 333
Mechanical 533
Agri 453
XYZ Sports 331
Electrical 354
Movies 375
Manufacturing 355
UVW Sports 505
Robotics 345
Movies 56
Health 3263
Manufacturing 456
Others 524
Name: X_sector, dtype: int64
What I want to get is the top three sectors within the company codes.
What is the way to do it?
You will have to chain a groupby here. Consider this example:
import pandas as pd
import numpy as np
np.random.seed(111)
names = [
'Robert Baratheon',
'Jon Snow',
'Daenerys Targaryen',
'Theon Greyjoy',
'Tyrion Lannister'
]
df = pd.DataFrame({
'season': np.random.randint(1, 7, size=100),
'actor': np.random.choice(names, size=100),
'appearance': 1
})
s = df.groupby(['season','actor'])['appearance'].count()
print(s.sort_values(ascending=False).groupby('season').head(1)) # <-- head(3) for 3 values
Returns:
season actor
4 Daenerys Targaryen 7
6 Robert Baratheon 6
3 Robert Baratheon 6
5 Jon Snow 5
2 Theon Greyjoy 5
1 Jon Snow 4
Where s is (clipped at 4)
season actor
1 Daenerys Targaryen 2
Jon Snow 4
Robert Baratheon 2
Theon Greyjoy 3
Tyrion Lannister 4
2 Daenerys Targaryen 4
Jon Snow 3
Robert Baratheon 1
Theon Greyjoy 5
Tyrion Lannister 3
3 Daenerys Targaryen 2
Jon Snow 1
Robert Baratheon 6
Theon Greyjoy 3
Tyrion Lannister 3
4 ...
Why make things complicated when simpler code is possible:
Z = df.groupby('company_code')['sector'].value_counts().groupby(level=0).head(3).sort_values(ascending=False).to_frame('counts').reset_index()
Z
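To tie this back to the question, here is a sketch that applies the sort-then-head idea directly to the grouped counts shown there. The Series is rebuilt by hand from the posted output, since the underlying dataframe is not available.

```python
import pandas as pd

# Grouped counts copied from the question's output.
grouped = pd.Series(
    {
        ("TDS", "Meta"): 404, ("TDS", "Electrical"): 333,
        ("TDS", "Mechanical"): 533, ("TDS", "Agri"): 453,
        ("XYZ", "Sports"): 331, ("XYZ", "Electrical"): 354,
        ("XYZ", "Movies"): 375, ("XYZ", "Manufacturing"): 355,
        ("UVW", "Sports"): 505, ("UVW", "Robotics"): 345,
        ("UVW", "Movies"): 56, ("UVW", "Health"): 3263,
        ("UVW", "Manufacturing"): 456, ("UVW", "Others"): 524,
    },
    name="X_sector",
)
grouped.index.names = ["company_code", "sector"]

# Sort all counts from largest to smallest, then keep the first three rows
# seen for each company code; head() preserves the sorted order.
top3 = (
    grouped.sort_values(ascending=False)
    .groupby(level="company_code")
    .head(3)
)

# Flatten to a regular table and group the rows by company for display.
result = top3.reset_index().sort_values(
    ["company_code", "X_sector"], ascending=[True, False]
)
print(result)
```

Sorting before the second groupby matters: head(3) simply takes the first three rows of each group, so the rows must already be in descending order.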
