I found a big table of data online that I would like to use in Python to make a graph out of two of its columns.
I copied and pasted the table, trying to make a string out of it, but the table is just raw numbers, no commas or anything, and Python isn't happy with that.
Is there any way I can do this in Python?
(I added the first couple of commas while experimenting.)
import math
a=(
1983, 937.700, 645 1580 71.6 65.9 65.9 65.8 65.8
1984 3426.020 645 6742 76.8 67.8 67.4 60.5 61.6
1985 3189.450 645 6347 72.4 71.1 69.1 56.4 59.3
1986 3792.140 645 7488 85.5 85.8 74.2 67.1 61.7
1987 4658.460 640 7654 87.4 85.5 76.8 83.1 66.7
1988 5283.590 640 8372 95.3 95.3 80.4 94.0 71.9
1989 4870.250 640 7722 88.2 89.5 81.8 86.9 74.3
1990 4080.560 640 7748 88.4 72.9 80.6 72.8 74.1
1991 3925.510 640 6317 72.1 69.9 79.3 70.0 73.6
1992 4701.500 640 7431 84.6 84.8 79.9 83.6 74.7
1993 4827.100 685 7731 88.2 92.4 81.2 80.4 75.2
1994 5405.460 635 8634 98.6 98.6 82.7 97.2 77.2
1995 4518.970 635 7229 82.5 82.5 82.7 81.2 77.5
1996 5241.980 635 8289 94.4 94.4 83.6 94.0 78.7
1997 4217.520 635 6901 78.8 78.8 83.2 75.8 78.5
1998 3825.060 635 6258 71.4 71.4 82.5 68.8 77.9
1999 3793.280 635 6132 70.0 69.9 81.7 68.2 77.3
2000 4886.200 635 7879 89.7 89.7 82.2 87.6 77.9
2001 4711.190 635 7766 88.6 88.3 82.5 84.7 78.3
2002 4532.290 635 7366 84.1 83.4 82.5 81.5 78.4
2003 3567.070 635 5833 66.6 65.2 81.7 64.1 77.7
2004 4875.390 635 7905 90.0 89.2 82.0 87.4 78.2
2005 4486.190 635 7329 83.7 83.5 82.1 80.6 78.3
2006 4595.250 635 7541 86.1 86.1 82.3 82.6 78.5
2007 4328.590 635 7126 81.4 77.8 82.1 77.8 78.4
2008 3648.410 635 6207 70.7 65.4 81.4 65.4 77.9
2009 3611.440 635 6039 68.9 64.9 80.8 64.9 77.4
2010 3490.450 635 5641 64.4 62.8 80.2 62.8 76.9
2011 3490.600 635 5861 66.9 62.8 79.5 62.8 76.4
2012 3911.560 )
File "", line 3
1983, 937.70, 645 1580 71.6 65.9 65.9 65.8 65.8
^
SyntaxError: invalid syntax
Create a file named data.csv and paste the original values into it (without the commas you added). Now you can read this file:
import csv

with open('data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in reader:
        print(', '.join(row))
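Since the goal was to graph two of the columns, the rows can also be converted to numbers as they are read. A minimal sketch using two inline sample rows instead of the file (treating column 0 as the year and column 1 as the value to plot is an assumption):

```python
import csv
import io

# Two sample rows in the same whitespace-separated layout
# (a stand-in for the contents of data.csv).
raw = """1983 937.700 645 1580 71.6 65.9 65.9 65.8 65.8
1984 3426.020 645 6742 76.8 67.8 67.4 60.5 61.6"""

years, values = [], []
for row in csv.reader(io.StringIO(raw), delimiter=' '):
    fields = [f for f in row if f]  # drop empty strings from repeated spaces
    years.append(int(fields[0]))
    values.append(float(fields[1]))

print(years)   # [1983, 1984]
print(values)  # [937.7, 3426.02]
```

With years and values filled in, matplotlib.pyplot.plot(years, values) draws the graph.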
I have this dataframe called "df_pressure":
Ranking Squad Press Succ Succ% Fail Fail%
11 1 Manchester City 4254 1381 32.5 2873 67.5
10 2 Liverpool 5360 1731 32.3 3629 67.7
5 3 Chelsea 5533 1702 30.8 3831 69.2
16 4 Tottenham 5477 1523 27.8 3954 72.2
0 5 Arsenal 4772 1440 30.2 3332 69.8
12 6 Manchester Utd 5069 1462 28.8 3607 71.2
18 7 West Ham 4917 1372 27.9 3545 72.1
9 8 Leicester City 5982 1719 28.7 4263 71.3
3 9 Brighton 5670 1832 32.3 3838 67.7
19 10 Wolves 5529 1633 29.5 3896 70.5
13 11 Newcastle Utd 5430 1460 26.9 3970 73.1
6 12 Crystal Palace 6041 1809 29.9 4232 70.1
2 13 Brentford 5566 1609 28.9 3957 71.1
1 14 Aston Villa 5515 1524 27.6 3991 72.4
15 15 Southampton 5869 1806 30.8 4063 69.2
7 16 Everton 6346 1892 29.8 4454 70.2
8 17 Leeds United 7078 2118 29.9 4960 70.1
4 18 Burnley 5527 1499 27.1 4028 72.9
17 19 Watford 5730 1656 28.9 4074 71.1
14 20 Norwich City 6146 1570 25.5 4576 74.5
I then decided to create another dataframe for some columns only:
df_pressure_perc=df_pressure[['Squad','Succ%','Fail%']]
df_pressure_perc.reset_index(drop=True, inplace=True)
df_pressure_perc.set_index('Squad')
print(df_pressure_perc)
Output:
Squad Succ% Fail%
0 Manchester City 32.5 67.5
1 Liverpool 32.3 67.7
2 Chelsea 30.8 69.2
3 Tottenham 27.8 72.2
4 Arsenal 30.2 69.8
5 Manchester Utd 28.8 71.2
6 West Ham 27.9 72.1
7 Leicester City 28.7 71.3
8 Brighton 32.3 67.7
9 Wolves 29.5 70.5
10 Newcastle Utd 26.9 73.1
11 Crystal Palace 29.9 70.1
12 Brentford 28.9 71.1
13 Aston Villa 27.6 72.4
14 Southampton 30.8 69.2
15 Everton 29.8 70.2
16 Leeds United 29.9 70.1
17 Burnley 27.1 72.9
18 Watford 28.9 71.1
19 Norwich City 25.5 74.5
Based on this new dataframe "df_pressure_perc", I created a stacked barplot with the following code: df_pressure_perc.plot(kind='barh', stacked=True, ylabel='Squad', colormap='tab10', figsize=(10, 6))
I realised the Y axis of my viz was not labelled with the Squad names. I would like some advice on how to make the Y axis show the Squad names instead of 0-19.
[Visualization: stacked barplot]
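A likely cause: set_index returns a new DataFrame rather than modifying it in place, so the unassigned df_pressure_perc.set_index('Squad') call left the 0-19 RangeIndex in place for plotting. A minimal sketch with a two-row stand-in for the data:

```python
import pandas as pd

# Two-row stand-in for df_pressure_perc.
df_pressure_perc = pd.DataFrame({
    'Squad': ['Manchester City', 'Liverpool'],
    'Succ%': [32.5, 32.3],
    'Fail%': [67.5, 67.7],
})

# set_index returns a new DataFrame unless inplace=True is passed,
# so assign the result before plotting.
df_plot = df_pressure_perc.set_index('Squad')
print(df_plot.index.tolist())  # ['Manchester City', 'Liverpool']
# then: df_plot.plot(kind='barh', stacked=True, colormap='tab10', figsize=(10, 6))
```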
Here is the data I used for the fit which does not work:
x_vals = [20.1 20.2 20.3 20.4 20.5 20.6 20.7 20.8 20.9 21. 21.1 21.2 21.3 21.4
21.5 21.6 21.7 21.8 21.9 22. 22.1 22.2 22.3 22.4 22.5 22.6 22.7 22.8
22.9 23. 23.1 23.2 23.3 23.4 23.5 23.6 23.7 23.8 23.9 24. 24.1 24.2
24.3 24.4 24.5 24.6 24.7 24.8 24.9 25. 25.1 25.2 25.3 25.4 25.5 25.6
25.7 25.8 25.9 26. 26.1 26.2 26.3 26.4 26.5 26.6 26.7 26.8 26.9 27.
27.1 27.2 27.3 27.4 27.5 27.6 27.7 27.8 27.9 28. 28.1 28.2 28.3 28.4
28.5 28.6 28.7 28.8 28.9 29. 29.1 29.2 29.3 29.4 29.5 29.6 29.7 29.8
29.9]
y_vals = [1922 1947 1985 2019 2050 1955 2143 2133 2132 2214 2268 2293 2397 2339
2407 2447 2540 2504 2661 2714 2758 2945 3108 3161 3254 3434 3883 3997
4250 4659 4782 5150 5603 5833 6225 6613 6502 6911 6873 6941 6876 6709
6663 6238 5949 5728 5120 4649 4273 3671 3340 2855 2621 2246 1920 1666
1476 1293 1099 1061 982 993 908 905 806 821 744 705 751 701
673 728 662 677 658 615 684 688 679 624 600 622 608 572
626 637 586 567 579 576 572 585 557 536 549 565 509 511
521]
The fit isn't so great; it's off by a lot and I am not sure how to fix it. Please let me know if there is a better way to fit this.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def lorentzian(x, a, x0):
    return a / ((x - x0)**2 + a**2) / np.pi

# Obtain xdata and ydata
...

# Initial guess of the parameters (you must find them some way!)
#pguess = [2.6, 24]

# Fit the data
normalization_factor = np.trapz(x_vals, y_vals)  # area under the curve
popt, pcov = curve_fit(lorentzian, x_vals, y_vals/normalization_factor)

# Results
a, x0 = popt[0], popt[1]
plt.plot(x_vals, lorentzian(x_vals, popt[0], popt[1])*normalization_factor,
         color='crimson', label='Fitted function')
plt.plot(x_vals, y_vals, 'o', label='data')
plt.show()
You have the arguments to np.trapz reversed. It should be
normalization_factor = np.trapz(y_vals, x_vals)
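Putting the fix together on synthetic data (a sketch: the peak position 24, width 2.6, and the p0 guess below are assumptions for illustration, not values fitted from the data above):

```python
import numpy as np
from scipy.optimize import curve_fit

def lorentzian(x, a, x0):
    # Normalized Lorentzian: integrates to 1 over the real line.
    return a / ((x - x0)**2 + a**2) / np.pi

# Synthetic data standing in for x_vals/y_vals: a peak near x0=24, width 2.6.
x_vals = np.linspace(20, 30, 101)
y_vals = 5000 * lorentzian(x_vals, 2.6, 24.0)

# Note the argument order: np.trapz(y, x), i.e. y first.
normalization_factor = np.trapz(y_vals, x_vals)

# Supplying an initial guess via p0 also helps: without it, curve_fit
# starts from (1, 1), far from the true peak position.
popt, pcov = curve_fit(lorentzian, x_vals, y_vals / normalization_factor,
                       p0=[2.6, 24])
a, x0 = popt
print(x0)  # close to 24
```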
I have 2 data frames, from which I want to create a third data frame for each country using data from both.
Below the data:
Indicator 1
country 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
1 Angola 200.0 193.0 185.0 176.0 167.0 157.0 148.0 138.0 129.0 120.0
2 Albania 24.5 23.1 21.8 20.4 19.2 17.9 16.7 15.5 14.4 13.3
195 Zambia 153.0 142.0 130.0 119.0 110.0 101.0 95.4 90.4 85.1 80.3
Indicator2
country 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
1 Angola 53.4 54.5 55.1 55.5 56.4 57.0 58.0 58.8 59.5 60.2
2 Albania 76.0 75.9 75.6 75.8 76.2 76.9 77.5 77.6 78.0 78.1
193 Zambia 45.2 45.9 46.6 47.7 48.7 50.0 51.9 54.1 55.7 56.5
I need to create a new data frame for each country like below
Angola
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Indicator1 200.0 193.0 185.0 176.0 167.0 157.0 148.0 138.0 129.0 120.0
Indicator2 53.4 54.5 55.1 55.5 56.4 57.0 58.0 58.8 59.5 60.2
I need to know the code for creating this new data frame.
What you asked can be done this way:
import pandas as pd

# Setting up DataFrames
indicator1 = pd.DataFrame({
    'country': ['Angola', 'Albania', 'Zambia'],
    '2001': ["200.0", "24.5", "153.0"],
    '2002': ["193.0", "23.1", "142.0"]
})
indicator2 = pd.DataFrame({
    'country': ['Angola', 'Albania', 'Zambia'],
    '2001': ["53.4", "76.0", "45.2"],
    '2002': ["54.5", "75.9", "45.9"]
})

# For each country
for index, row in indicator1.iterrows():
    # create a new variable with the country as name
    globals()[f"{row['country']}"] = {}
    # For each column of your 2 dataframes
    # (items() replaces the deprecated iteritems())
    for key, value in indicator1.items():
        if key != 'country':
            other = indicator2.loc[indicator2['country'] == row['country'], key].values[0]
            globals()[f"{row['country']}"][key] = [row[key], other]
    globals()[f"{row['country']}"] = pd.DataFrame(globals()[f"{row['country']}"])
I've only done it with an extract of your data, but it can be generalised. I'm not sure storing the newly created DataFrames as global variables like this is the best approach, but I had no better idea, so I'll let you decide.
print(Angola)
# Output :
2001 2002
0 200.0 193.0
1 53.4 54.5
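As an alternative to globals(), a dictionary of DataFrames keyed by country avoids creating dynamic variable names. A sketch on the same extract (the dict name by_country is my own choice):

```python
import pandas as pd

indicator1 = pd.DataFrame({
    'country': ['Angola', 'Albania', 'Zambia'],
    '2001': [200.0, 24.5, 153.0],
    '2002': [193.0, 23.1, 142.0],
})
indicator2 = pd.DataFrame({
    'country': ['Angola', 'Albania', 'Zambia'],
    '2001': [53.4, 76.0, 45.2],
    '2002': [54.5, 75.9, 45.9],
})

# Index both frames by country, then stack one row from each into a
# per-country frame, stored in a dict instead of a global variable.
i1 = indicator1.set_index('country')
i2 = indicator2.set_index('country')
by_country = {
    c: pd.DataFrame([i1.loc[c], i2.loc[c]],
                    index=['Indicator1', 'Indicator2'])
    for c in i1.index
}
print(by_country['Angola'])
```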
I am running into an issue when using BeautifulSoup to scrape data off of www.basketball-reference.com. I've used BeautifulSoup on Bballreference before, so I am a little stumped as to what is happening (granted, I am a pretty huge noob, so please bear with me).
I am trying to scrape team season stats off of https://www.basketball-reference.com/leagues/NBA_2020.html and am running into troubles from the very start:
from bs4 import BeautifulSoup
import requests
web_response = requests.get('https://www.basketball-reference.com/leagues/NBA_2020.html').text
soup = BeautifulSoup(web_response, 'lxml')
table = soup.find('table', id='team-stats-per_game')
print(table)
This prints None, so finding the table in question was unsuccessful, even though I can clearly locate that tag when inspecting the web page. Okay... no biggie so far (usually these errors are on my end), so I instead just print out the whole soup:
soup = BeautifulSoup(web_response, 'lxml')
print(soup)
I copy and paste that into https://codebeautify.org/htmlviewer/ to get a better view than the terminal, and I see that it does not look how I would expect. Essentially the meta tags are fine, but everything else appears to have lost its opening and closing tags, turning my soup into an actual soup...
Again, no biggie (still pretty sure it is something that I am doing), so I go and grab the html from a simple blog site, print it, and paste it into codebeautify and lo and behold it looks normal. Now I have a suspicion that something is occurring on basketball-reference's side that is obscuring my ability to even grab the html.
My question is this: what exactly is going on here? I am assuming there's an 80% chance it is still me, but the other 20% is not so sure at this point. Can someone point out what I am doing wrong, or how to grab the html?
The data is stored within the page, but inside an HTML comment.
To parse it, you can do for example:
import requests
from bs4 import BeautifulSoup, Comment

web_response = requests.get('https://www.basketball-reference.com/leagues/NBA_2020.html').text
soup = BeautifulSoup(web_response, 'lxml')

# find the comment section where the data is stored
for idx, c in enumerate(soup.select_one('div#all_team-stats-per_game').contents):
    if isinstance(c, Comment):
        break

# load the data from the comment:
soup2 = BeautifulSoup(soup.select_one('div#all_team-stats-per_game').contents[idx], 'html.parser')

# print data:
for tr in soup2.select('tr:has(td)'):
    tds = tr.select('td')
    for td in tds:
        print(td.get_text(strip=True), end='\t')
    print()
Prints:
Dallas Mavericks 67 241.5 41.6 90.0 .462 15.3 41.5 .369 26.3 48.5 .542 17.9 23.1 .773 10.6 36.4 47.0 24.5 6.3 5.0 12.8 19.0 116.4
Milwaukee Bucks* 65 240.8 43.5 91.2 .477 13.7 38.6 .356 29.8 52.6 .567 17.8 24.0 .742 9.5 42.2 51.7 25.9 7.4 6.0 14.9 19.2 118.6
Houston Rockets 64 241.2 41.1 90.7 .454 15.4 44.3 .348 25.7 46.4 .554 20.5 26.0 .787 10.4 34.6 44.9 21.5 8.5 5.1 14.7 21.6 118.1
Portland Trail Blazers 66 240.8 41.9 90.9 .461 12.6 33.8 .372 29.3 57.1 .513 17.3 21.7 .798 10.1 35.4 45.5 20.2 6.1 6.2 13.0 21.4 113.6
Atlanta Hawks 67 243.0 40.6 90.6 .449 12.0 36.1 .333 28.6 54.5 .525 18.5 23.4 .790 9.9 33.4 43.3 24.0 7.8 5.1 16.2 23.1 111.8
New Orleans Pelicans 64 242.3 42.6 92.2 .462 14.0 37.6 .372 28.6 54.6 .525 16.9 23.2 .729 11.2 35.8 47.0 27.0 7.6 5.1 16.2 21.0 116.2
Los Angeles Clippers 64 241.2 41.6 89.7 .464 12.2 33.2 .366 29.5 56.5 .522 20.8 26.2 .792 11.0 37.0 48.0 23.8 7.1 5.0 14.8 22.0 116.2
Washington Wizards 64 241.2 41.9 91.0 .461 12.3 33.1 .372 29.6 57.9 .511 19.5 24.8 .787 10.1 31.6 41.7 25.3 8.1 4.3 14.1 22.6 115.6
Memphis Grizzlies 65 240.4 42.8 91.0 .470 10.9 31.1 .352 31.8 59.9 .531 16.2 21.3 .761 10.4 36.3 46.7 27.0 8.0 5.6 15.3 20.8 112.6
Phoenix Suns 65 241.2 40.8 87.8 .464 11.2 31.7 .353 29.6 56.1 .527 19.8 24.0 .826 9.8 33.3 43.1 27.2 7.8 4.0 15.1 22.1 112.6
Miami Heat 65 243.5 39.6 84.4 .470 13.4 34.8 .383 26.3 49.6 .530 19.5 25.1 .778 8.5 36.0 44.5 26.0 7.4 4.5 14.9 20.4 112.2
Minnesota Timberwolves 64 243.1 40.4 91.6 .441 13.3 39.7 .336 27.1 52.0 .521 19.1 25.4 .753 10.5 34.3 44.8 23.8 8.7 5.7 15.3 21.4 113.3
Boston Celtics* 64 242.0 41.2 89.6 .459 12.4 34.2 .363 28.8 55.4 .519 18.3 22.8 .801 10.7 35.3 46.0 22.8 8.3 5.6 13.6 21.4 113.0
Toronto Raptors* 64 241.6 40.6 88.5 .458 13.8 37.0 .371 26.8 51.5 .521 18.1 22.6 .800 9.7 35.5 45.2 25.4 8.8 4.9 14.4 21.5 113.0
Los Angeles Lakers* 63 240.8 42.9 88.6 .485 11.2 31.4 .355 31.8 57.1 .556 17.3 23.7 .730 10.6 35.5 46.1 25.9 8.6 6.8 15.1 20.6 114.3
Denver Nuggets 65 242.3 41.8 88.9 .471 10.9 30.4 .358 31.0 58.5 .529 15.9 20.5 .775 10.8 33.5 44.3 26.5 8.1 4.6 13.7 20.0 110.4
San Antonio Spurs 63 242.8 42.0 89.5 .470 10.7 28.7 .371 31.4 60.8 .517 18.4 22.8 .809 8.8 35.6 44.4 24.5 7.2 5.5 12.3 19.2 113.2
Philadelphia 76ers 65 241.2 40.8 87.7 .465 11.4 31.6 .362 29.4 56.1 .523 16.6 22.1 .752 10.4 35.1 45.5 25.9 8.2 5.4 14.2 20.6 109.6
Indiana Pacers 65 241.5 42.2 88.4 .477 10.0 27.5 .363 32.2 60.9 .529 15.1 19.1 .787 8.8 34.0 42.8 25.9 7.2 5.1 13.1 19.6 109.3
Utah Jazz 64 240.4 40.1 84.6 .475 13.2 34.4 .383 27.0 50.2 .537 17.6 22.8 .772 8.8 36.3 45.1 22.2 5.9 4.0 14.9 20.0 111.0
Oklahoma City Thunder 64 241.6 40.3 85.1 .473 10.4 29.3 .355 29.9 55.8 .536 19.8 24.8 .797 8.1 34.6 42.7 21.9 7.6 5.0 13.5 18.8 110.8
Brooklyn Nets 64 243.1 40.0 90.0 .444 12.9 37.9 .340 27.1 52.2 .519 18.0 24.1 .744 10.8 37.6 48.5 24.0 6.5 4.6 15.5 20.7 110.8
Detroit Pistons 66 241.9 39.3 85.7 .459 12.0 32.7 .367 27.3 53.0 .515 16.6 22.4 .743 9.8 32.0 41.7 24.1 7.4 4.5 15.3 19.7 107.2
New York Knicks 66 241.9 40.0 89.3 .447 9.6 28.4 .337 30.4 61.0 .499 16.3 23.5 .694 12.0 34.5 46.5 22.1 7.6 4.7 14.3 22.2 105.8
Sacramento Kings 64 242.3 40.4 87.8 .459 12.6 34.7 .364 27.7 53.2 .522 15.6 20.3 .769 9.6 32.9 42.5 23.4 7.6 4.2 14.4 21.9 109.0
Cleveland Cavaliers 65 241.9 40.3 87.9 .458 11.2 31.8 .351 29.1 56.1 .519 15.1 19.9 .758 10.8 33.4 44.2 23.1 6.9 3.2 16.5 18.3 106.9
Chicago Bulls 65 241.2 39.6 88.6 .447 12.2 35.1 .348 27.4 53.5 .511 15.5 20.5 .755 10.5 31.4 41.9 23.2 10.0 4.1 15.5 21.8 106.8
Orlando Magic 65 240.4 39.2 88.8 .442 10.9 32.0 .341 28.3 56.8 .498 17.0 22.1 .770 10.4 34.2 44.5 24.0 8.4 5.7 12.6 17.6 106.4
Golden State Warriors 65 241.9 38.6 88.2 .438 10.4 31.3 .334 28.2 56.9 .495 18.7 23.2 .803 10.0 32.9 42.8 25.6 8.2 4.6 14.9 20.1 106.3
Charlotte Hornets 65 242.3 37.3 85.9 .434 12.1 34.3 .352 25.2 51.6 .489 16.2 21.6 .748 11.0 31.8 42.8 23.8 6.6 4.1 14.6 18.8 102.9
League Average 65 241.7 40.8 88.8 .460 12.1 33.9 .357 28.7 54.9 .523 17.7 22.9 .771 10.1 34.7 44.9 24.3 7.7 4.9 14.5 20.6 111.4
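The same comment-extraction idea on a minimal, self-contained snippet (the markup below is a made-up miniature of the page's structure, not the real HTML):

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup: the table is hidden inside an HTML comment,
# mirroring how basketball-reference serves this section.
html = """
<div id="all_team-stats-per_game">
<!--
<table id="team-stats-per_game">
<tr><td>Dallas Mavericks</td><td>116.4</td></tr>
<tr><td>Milwaukee Bucks</td><td>118.6</td></tr>
</table>
-->
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
# Find the Comment node among the wrapper div's children.
comment = next(c for c in soup.find('div', id='all_team-stats-per_game')
               if isinstance(c, Comment))
# Re-parse the comment's text as HTML to recover the table.
inner = BeautifulSoup(comment, 'html.parser')
rows = [[td.get_text() for td in tr.find_all('td')]
        for tr in inner.find_all('tr')]
print(rows)  # [['Dallas Mavericks', '116.4'], ['Milwaukee Bucks', '118.6']]
```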
CODE
import pandas
df = pandas.read_csv('biharpopulation.txt', delim_whitespace=True)
df.columns = ['SlNo','District','Total','Male','Female','Total','Male','Female','SC','ST','SC','ST']
DATA
SlNo District Total Male Female Total Male Female SC ST SC ST
1 Patna 729988 386991 342997 9236 5352 3884 15.5 0.2 38.6 68.7
2 Nalanda 473786 248246 225540 970 524 446 20.2 0.0 29.4 29.8
3 Bhojpur 343598 181372 162226 8337 4457 3880 15.3 0.4 39.1 46.7
4 Buxar 198014 104761 93253 8428 4573 3855 14.1 0.6 37.9 44.6
5 Rohtas 444333 233512 210821 25663 13479 12184 18.1 1.0 41.3 30.0
6 Kaimur 286291 151031 135260 35662 18639 17023 22.2 2.8 40.5 38.6
7 Gaya 1029675 529230 500445 2945 1526 1419 29.6 0.1 26.3 49.1
8 Jehanabad 174738 90485 84253 1019 530 489 18.9 0.07 32.6 32.4
9 Arawal 11479 57677 53802 294 179 115 18.8 0.04
10 Nawada 435975 223929 212046 2158 1123 1035 24.1 0.1 22.4 20.5
11 Aurangabad 472766 244761 228005 1640 865 775 23.5 0.1 35.7 49.7
Saran
12 Saran 389933 199772 190161 6667 3384 3283 12 0.2 33.6 48.5
13 Siwan 309013 153558 155455 13822 6856 6966 11.4 0.5 35.6 44.0
14 Gopalganj 267250 134796 132454 6157 2984 3173 12.4 0.3 32.1 37.8
15 Muzaffarpur 594577 308894 285683 3472 1789 1683 15.9 0.1 28.9 50.4
16 E. Champaran 514119 270968 243151 4812 2518 2294 13.0 0.1 20.6 34.3
17 W. Champaran 434714 228057 206657 44912 23135 21777 14.3 1.5 22.3 24.1
18 Sitamarhi 315646 166607 149039 1786 952 834 11.8 0.1 22.1 31.4
19 Sheohar 74391 39405 34986 64 35 29 14.4 0.0 16.9 38.8
20 Vaishali 562123 292711 269412 3068 1595 1473 20.7 0.1 29.4 29.9
21 Darbhanga 511125 266236 244889 841 467 374 15.5 0.0 24.7 49.5
22 Madhubani 481922 248774 233148 1260 647 613 13.5 0.0 22.2 35.8
23 Samastipur 628838 325101 303737 3362 2724 638 18.5 0.1 25.1 22.0
24 Munger 150947 80031 70916 18060 9297 8763 13.3 1.6 42.6 37.3
25 Begusarai 341173 177897 163276 1505 823 682 14.5 0.1 31.4 78.6
26 Shekhapura 103732 54327 49405 211 115 96 19.7 0.0 25.2 45.6
27 Lakhisarai 126575 65781 60794 5636 2918 2718 15.8 0.7 26.8 12.9
28 Jamui 242710 124538 118172 67357 34689 32668 17.4 4.8 24.5 26.7
The issue is with these 2 lines:
16 E. Champaran 514119 270968 243151 4812 2518 2294 13.0 0.1 20.6 34.3
17 W. Champaran 434714 228057 206657 44912 23135 21777 14.3 1.5 22.3 24.1
If you can somehow remove the space inside "E. Champaran" and "W. Champaran", then you can do this:
df = pd.read_csv('test.csv', sep=r'\s+', skip_blank_lines=True, skipinitialspace=True)
print(df)
SlNo District Total Male Female Total.1 Male.1 Female.1 SC ST SC.1 ST.1
0 1 Patna 729988 386991 342997 9236 5352 3884 15.5 0.20 38.6 68.7
1 2 Nalanda 473786 248246 225540 970 524 446 20.2 0.00 29.4 29.8
2 3 Bhojpur 343598 181372 162226 8337 4457 3880 15.3 0.40 39.1 46.7
3 4 Buxar 198014 104761 93253 8428 4573 3855 14.1 0.60 37.9 44.6
4 5 Rohtas 444333 233512 210821 25663 13479 12184 18.1 1.00 41.3 30.0
5 6 Kaimur 286291 151031 135260 35662 18639 17023 22.2 2.80 40.5 38.6
6 7 Gaya 1029675 529230 500445 2945 1526 1419 29.6 0.10 26.3 49.1
7 8 Jehanabad 174738 90485 84253 1019 530 489 18.9 0.07 32.6 32.4
8 9 Arawal 11479 57677 53802 294 179 115 18.8 0.04 NaN NaN
9 10 Nawada 435975 223929 212046 2158 1123 1035 24.1 0.10 22.4 20.5
10 11 Aurangabad 472766 244761 228005 1640 865 775 23.5 0.10 35.7 49.7
11 12 Saran 389933 199772 190161 6667 3384 3283 12.0 0.20 33.6 48.5
12 13 Siwan 309013 153558 155455 13822 6856 6966 11.4 0.50 35.6 44.0
13 14 Gopalganj 267250 134796 132454 6157 2984 3173 12.4 0.30 32.1 37.8
14 15 Muzaffarpur 594577 308894 285683 3472 1789 1683 15.9 0.10 28.9 50.4
15 16 E.Champaran 514119 270968 243151 4812 2518 2294 13.0 0.10 20.6 34.3
16 17 W.Champaran 434714 228057 206657 44912 23135 21777 14.3 1.50 22.3 24.1
17 18 Sitamarhi 315646 166607 149039 1786 952 834 11.8 0.10 22.1 31.4
18 19 Sheohar 74391 39405 34986 64 35 29 14.4 0.00 16.9 38.8
19 20 Vaishali 562123 292711 269412 3068 1595 1473 20.7 0.10 29.4 29.9
20 21 Darbhanga 511125 266236 244889 841 467 374 15.5 0.00 24.7 49.5
21 22 Madhubani 481922 248774 233148 1260 647 613 13.5 0.00 22.2 35.8
22 23 Samastipur 628838 325101 303737 3362 2724 638 18.5 0.10 25.1 22.0
23 24 Munger 150947 80031 70916 18060 9297 8763 13.3 1.60 42.6 37.3
24 25 Begusarai 341173 177897 163276 1505 823 682 14.5 0.10 31.4 78.6
25 26 Shekhapura 103732 54327 49405 211 115 96 19.7 0.00 25.2 45.6
26 27 Lakhisarai 126575 65781 60794 5636 2918 2718 15.8 0.70 26.8 12.9
27 28 Jamui 242710 124538 118172 67357 34689 32668 17.4 4.80 24.5 26.7
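One way to "remove the space" programmatically before parsing (a sketch that assumes the E./W. Champaran rows are the only ones affected, as in the data shown):

```python
import io
import re

import pandas as pd

# The header plus the two offending rows (a stand-in for the real file).
raw = """SlNo District Total Male Female
16 E. Champaran 514119 270968 243151
17 W. Champaran 434714 228057 206657"""

# Join "E. Champaran" -> "E.Champaran" (likewise for W.) so that
# whitespace splitting yields exactly one token per column.
fixed = re.sub(r'\b([EW])\.\s+', r'\1.', raw)
df = pd.read_csv(io.StringIO(fixed), sep=r'\s+')
print(df['District'].tolist())  # ['E.Champaran', 'W.Champaran']
```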
Your problem is that the CSV is whitespace-delimited, but some of your district names also have whitespace in them. Luckily, none of the district names contain '\t' characters, so we can fix this:
df = pandas.read_csv('biharpopulation.txt', delimiter='\t')