The index in my DataFrame won't drop - python

I am trying to drop the index from my DataFrame, but nothing works. I build an HTML report from this DF, and I use style.set_table_styles() on it to change the header colours, which is probably the cause of my problem. Let's say this is my DF:
A B C
0 263 90 10,8
1 718 90 10,6
2 219 80 9,7
3 217 90 9,6
I want this DF to look like this:
A B C
263 90 10,8
718 90 10,6
219 80 9,7
217 90 9,6
And this is part of my code:
DF.sort_values(by=['B','C'],ascending=[True,False],inplace=True)
DF = DF.reset_index(drop=True)
colors = {'A': '#e6ffcc','B':'#406c13','C':'#b3e87d'}
StyleDF = DF.style.set_table_styles(
    [{
        'selector': f'th.col{i}',
        'props': [('background-color', color)]
    } for i, color in enumerate(DF.columns.map(colors))
    ])
HTML = f'''
<html>
<head>
<title>some title</title>
</head>
<style type="text/css">
some style format.........
</style>
<body>
some text.........
{StyleDF.to_html()}
</body>
</html>
'''
I also tried:
StyleDF.to_html(index=False)
DF = pd.read_csv("file.csv",index_col=False)
Neither of these methods works. I should add that my program generates a similar DataFrame where I don't use `style.set_table_styles()`, and in that case I don't have this problem.

The Styler docs suggest calling .hide(axis="index") on the Styler object.


Get the sum of absolutes of columns for a dataframe

If I have a dataframe and I want to sum the values of the columns I could do something like
import pandas as pd
studentdetails = {
"studentname":["Ram","Sam","Scott","Ann","John","Bobo"],
"mathantics" :[80,90,85,70,95,100],
"science" :[85,95,80,90,75,100],
"english" :[90,85,80,70,95,100]
}
index_labels=['r1','r2','r3','r4','r5','r6']
df = pd.DataFrame(studentdetails ,index=index_labels)
print(df)
df3 = df.sum()
print(df3)
col_list= ['studentname', 'mathantics', 'science']
print( df[col_list].sum())
How can I do something similar, but get the sum of absolute values of some columns instead of the plain sum (in this particular case the two would be the same)?
I tried abs in several ways, but it did not work.
Edit:
   studentname  mathantics  science  english
r1         Ram          80       85       90
r2         Sam          90       95      -85
r3       Scott         -85       80       80
r4         Ann          70       90       70
r5        John          95      -75       95
r6        Bobo         100      100      100
Expected output
mathantics 520
science 525
english 520
Edit2:
The col_list cannot include columns with string values.
You need numeric columns for absolute values:
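import numpy as np
# Option 1: exclude the string column by name
col_list = df.columns.difference(['studentname'])
df[col_list].abs().sum()
# Option 2: move the string column into the index
df.set_index('studentname').abs().sum()
# Option 3: keep only the numeric columns
df.select_dtypes(np.number).abs().sum()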
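As a worked check, here is a small self-contained sketch that rebuilds the edited table from the question (with the negative values) and applies the numeric-columns approach. The variable names numeric_cols and abs_sums are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "studentname": ["Ram", "Sam", "Scott", "Ann", "John", "Bobo"],
        "mathantics": [80, 90, -85, 70, 95, 100],
        "science": [85, 95, 80, 90, -75, 100],
        "english": [90, -85, 80, 70, 95, 100],
    },
    index=["r1", "r2", "r3", "r4", "r5", "r6"],
)

# Keep only numeric columns so that abs() is never applied to strings.
numeric_cols = df.select_dtypes("number").columns

# Take absolute values first, then sum each column.
abs_sums = df[numeric_cols].abs().sum()
print(abs_sums)
# mathantics    520
# science       525
# english       520
```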

Renaming some part of columns of dataframe with values from another dataframe

I want to rename the columns of one DataFrame using values from another DataFrame.
There are some similar questions on Stack Overflow, but I need a more advanced version.
data1 = {
"ABC-123_afd": [420, 380, 390],
"LFK-402_ote": [50, 40, 45],
"BPM-299_qbm": [50, 40, 45],
}
data2 = {
"ID": ['ABC-123', 'LFK-402', 'BPM-299'],
"NewID": ['IQU', 'EUW', 'NMS']
}
data1_df=pd.DataFrame(data1)
# ABC-123_afd LFK-402_ote BPM-299_qbm
#0 420 50 50
#1 380 40 40
#2 390 45 45
data2_df=pd.DataFrame(data2)
# ID NewID
#0 ABC-123 IQU
#1 LFK-402 EUW
#2 BPM-299 NMS
I want to make the final result as below:
data_final_df
# IQU_afd EUW_ote NMS_qbm
#0 420 50 50
#1 380 40 40
#2 390 45 45
I tried the code in "Renaming columns of dataframe with values from another dataframe".
It ran without error, but nothing changed. I think this is because the column names in data1 do not exactly match the ID values in data2.
How can I change part of each column name using values from another pandas DataFrame?
We could create a mapping from "ID" to "NewID" and use it to modify column names:
mapping = dict(zip(data2['ID'], data2['NewID']))
data1_df.columns = [mapping[x] + '_' + y for x, y in data1_df.columns.str.split('_')]
print(data1_df)
or
s = data1_df.columns.str.split('_')
data1_df.columns = s.str[0].map(mapping) + '_' + s.str[1]
or use the DataFrame data2_df:
s = data1_df.columns.str.split('_')
data1_df.columns = s.str[0].map(data2_df.set_index('ID')['NewID']) + '_' + s.str[1]
Output:
IQU_afd EUW_ote NMS_qbm
0 420 50 50
1 380 40 40
2 390 45 45
One option is to use replace:
mapping = dict(zip(data2['ID'], data2['NewID']))
s = pd.Series(data1_df.columns)
data1_df.columns = s.replace(regex = mapping)
data1_df
IQU_afd EUW_ote NMS_qbm
0 420 50 50
1 380 40 40
2 390 45 45
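If some column prefixes might be missing from data2, a lookup with a fallback avoids a KeyError or NaN column names. This is only a sketch: the extra column "XYZ-000_zzz" and the helper rename_prefix are hypothetical additions for illustration, and it assumes the ID is everything before the first underscore.

```python
import pandas as pd

data1_df = pd.DataFrame({
    "ABC-123_afd": [420, 380, 390],
    "LFK-402_ote": [50, 40, 45],
    "BPM-299_qbm": [50, 40, 45],
    "XYZ-000_zzz": [1, 2, 3],  # hypothetical column with no entry in data2_df
})
data2_df = pd.DataFrame({
    "ID": ["ABC-123", "LFK-402", "BPM-299"],
    "NewID": ["IQU", "EUW", "NMS"],
})

mapping = dict(zip(data2_df["ID"], data2_df["NewID"]))

def rename_prefix(col):
    # Split on the first underscore only; keep the prefix unchanged if it is unmapped.
    prefix, sep, suffix = col.partition("_")
    return mapping.get(prefix, prefix) + sep + suffix

data_final_df = data1_df.rename(columns=rename_prefix)
print(list(data_final_df.columns))
# ['IQU_afd', 'EUW_ote', 'NMS_qbm', 'XYZ-000_zzz']
```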

Collecting data from rotten tomatoes using python API

I'm new to Python and APIs, and I am trying to collect data from this link:
https://www.rottentomatoes.com/top/bestofrt/top_100_action__adventure_movies/
The data I want is the first 25 movies and their info, and I have to use an API.
The code I'm trying is this:
import requests
result = requests.get('https://www.rottentomatoes.com/top/bestofrt/top_100_action__adventure_movies/').text
print(result)
That's as far as I could get (I'm very new).
The result is very long, but this is an example from it:
<td class="bold">8.</td>
<td>
<span class="tMeterIcon tiny">
<span class="icon tiny certified_fresh"></span>
<span class="tMeterScore"> 92%</span>
</span>
</td>
<td>
<a href="/m/dunkirk_2017" class="unstyled articleLink">
Dunkirk (2017)</a>
</td>
<td class="right hidden-xs">461</td>
The info I need from this is the rank (class="bold"), rating (class="tMeterScore"), title (class="unstyled articleLink") and number of reviews (class="right hidden-xs").
So the problem is that I don't know how to extract the data I need from the result, and I don't know whether I'm even doing it the right way (or whether there is a better way to get the data).
For tables on simple web pages, pandas.read_html is great.
import pandas as pd
# Read all the page tables with a simple call
tables= pd.read_html('https://www.rottentomatoes.com/top/bestofrt/top_100_action__adventure_movies/')
# display the table shapes to manually select the correct one
print("Tables")
print('\n'.join([f"{i} shape:{t.shape}" for i, t in enumerate(tables)]))
# Selection of the table ('2' results from the manual observation, see previous comment)
table = tables[2]
# some data
print('\n'.join(["", "Table:", "",
"Columns types:", str(table.dtypes), "", "",
"5 first and last rows:", str(table), "", "",
"First row:", str(table.iloc[0])
]))
Output:
Tables
0 shape:(11, 2)
1 shape:(12, 2)
2 shape:(100, 4)
3 shape:(10, 3)
4 shape:(10, 3)
Table:
Columns types:
Rank float64
RatingTomatometer object
Title object
No. of Reviews int64
dtype: object
5 first and last rows:
Rank ... No. of Reviews
0 1.0 ... 525
1 2.0 ... 547
2 3.0 ... 437
3 4.0 ... 434
4 5.0 ... 392
.. ... ... ...
95 96.0 ... 93
96 97.0 ... 130
97 98.0 ... 324
98 99.0 ... 203
99 100.0 ... 66
[100 rows x 4 columns]
First row:
Rank 1.0
RatingTomatometer 96%
Title Black Panther (2018)
No. of Reviews 525
Name: 0, dtype: object
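If you would rather pick the fields out by class name, as described in the question, BeautifulSoup is one option. The sketch below works on the HTML fragment quoted in the question, held in a string; the surrounding <table> and <tr> tags are an assumption about the page structure, and the live page may well have changed since.

```python
from bs4 import BeautifulSoup

# Fragment from the question, wrapped in assumed <table>/<tr> tags.
sample_html = """
<table>
<tr>
<td class="bold">8.</td>
<td>
<span class="tMeterIcon tiny">
<span class="icon tiny certified_fresh"></span>
<span class="tMeterScore"> 92%</span>
</span>
</td>
<td>
<a href="/m/dunkirk_2017" class="unstyled articleLink">
Dunkirk (2017)</a>
</td>
<td class="right hidden-xs">461</td>
</tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
movies = []
for row in soup.find_all("tr"):
    rank = row.find("td", class_="bold")
    if rank is None:
        continue  # skip rows that do not describe a movie
    movies.append({
        "rank": int(rank.get_text(strip=True).rstrip(".")),
        "rating": row.find("span", class_="tMeterScore").get_text(strip=True),
        "title": row.find("a", class_="articleLink").get_text(strip=True),
        "reviews": int(row.find("td", class_="hidden-xs").get_text(strip=True)),
    })

print(movies)
# [{'rank': 8, 'rating': '92%', 'title': 'Dunkirk (2017)', 'reviews': 461}]
```

On the real response you would pass result instead of sample_html and slice movies[:25] for the first 25 entries.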

Obtaining \r\n\r\n while scraping from web in Python

I am working on scraping text with Python from this link: tournament link
Here is my code to get the tabular data:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from urllib.request import urlopen
from bs4 import BeautifulSoup
url = "http://www.hubertiming.com/results/2017GPTR10K"
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')
rows = soup.find_all('tr') ## find the table rows
Now, the goal is to obtain the data as a dataframe.
listnew=[]
for row in rows:
    row_td = row.find_all('td')
    str_cells = str(row_td)
    cleantext = BeautifulSoup(str_cells, "lxml").get_text() ##obtain text part
    listnew.append(cleantext) ## append to list
df = pd.DataFrame(listnew)
df.head(10)
Then we get the following output:
0 []
1 [Finishers:, 577]
2 [Male:, 414]
3 [Female:, 163]
4 []
5 [1, 814, \r\n\r\n JARED WIL...
6 [2, 573, \r\n\r\n NATHAN A ...
7 [3, 687, \r\n\r\n FRANCISCO...
8 [4, 623, \r\n\r\n PAUL MORR...
9 [5, 569, \r\n\r\n DEREK G O..
I don't know why there are newline and carriage return characters (\r\n\r\n). How can I remove them and get a DataFrame in the proper format? Thanks in advance.
Pandas can parse HTML tables, give this a try:
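from io import StringIO
from urllib.request import urlopen
import pandas as pd
from bs4 import BeautifulSoup
url = "http://www.hubertiming.com/results/2017GPTR10K"
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')
# select the results table by its id
table_1_html = soup.find('table', attrs={'id': 'individualResults'})
# wrap the markup in StringIO; recent pandas versions deprecate passing literal HTML strings
t_1 = pd.read_html(StringIO(table_1_html.prettify()))[0]
print(t_1)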
Output:
Place Bib Name ... Chip Pace Gun Time Team
0 1 814 JARED WILSON ... 5:51 36:24 NaN
1 2 573 NATHAN A SUSTERSIC ... 5:55 36:45 INTEL TEAM F
2 3 687 FRANCISCO MAYA ... 6:05 37:48 NaN
3 4 623 PAUL MORROW ... 6:13 38:37 NaN
4 5 569 DEREK G OSBORNE ... 6:20 39:24 INTEL TEAM F
.. ... ... ... ... ... ... ...
572 573 273 RACHEL L VANEY ... 15:51 1:38:34 NaN
573 574 467 ROHIT B DSOUZA ... 15:53 1:40:32 INTEL TEAM I
574 575 471 CENITA D'SOUZA ... 15:53 1:40:34 NaN
575 576 338 PRANAVI APPANA ... 16:15 1:42:01 NaN
576 577 443 LIBBY B MITCHELL ... 16:20 1:42:10 NaN
[577 rows x 10 columns]
It seems that some cells in the HTML code have a lot of leading and trailing spaces and newlines:
<td>
JARED WILSON
</td>
Use str.strip to remove all leading and trailing whitespace, like this:
BeautifulSoup(str_cells, "lxml").get_text().strip()
Well, looking at the URL you provided, you can see the newlines in the page source:
...
<td>814</td>
<td>
JARED WILSON
</td>
...
so that's what you get when you scrape. These can easily be removed by the very convenient .strip() string method.
Your DataFrame is not formatted correctly because you are giving it a list of lists, which are not all of the same size (see the first 4 lines), which come from another table located on the top right. One easy fix is to remove the first 4 lines, though it would be way more robust to select the table you want based on its id ("individualResults").
df = pd.DataFrame(listnew[4:])
df.head(10)
Have a look here: BeautifulSoup table to dataframe
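Note that calling .strip() on the text of a whole row only trims the two ends of that string, so whitespace inside the row survives. One way to apply the advice above is to strip each cell separately. The sketch below uses a tiny made-up HTML string in place of the live page; only the table id "individualResults" and the two runner names come from this thread, the rest is illustrative.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Stand-in for the downloaded page, with the same kind of padded cells.
sample_html = """
<table id="individualResults">
  <tr><th>Place</th><th>Bib</th><th>Name</th></tr>
  <tr><td>1</td><td>814</td><td>\r\n\r\n    JARED WILSON\r\n\r\n  </td></tr>
  <tr><td>2</td><td>573</td><td>\r\n\r\n    NATHAN A SUSTERSIC\r\n\r\n  </td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
table = soup.find("table", attrs={"id": "individualResults"})

header = [th.get_text(strip=True) for th in table.find_all("th")]
rows = []
for tr in table.find_all("tr"):
    # get_text(strip=True) trims the whitespace of each cell individually
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # the header row has <th> cells only, so it is skipped here
        rows.append(cells)

df = pd.DataFrame(rows, columns=header)
print(df)
```

Every row then has the same number of cells, so the DataFrame gets one column per field instead of a single column of list-like strings.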

Pandas hiding a column, to stay in df, but not displayed in html table

I have roughly 300 lines of code that produce a specific df table, and this table needs to be displayed in HTML. Everything is set up and working perfectly. The only problem is that the df contains a column (crucial for other big calculations) that I cannot delete or adjust. I would like to keep this column in the df, but hide it when viewing the HTML table.
For example (simplified):
'''The table I have as df (cannot be changed fundamentally) '''
col1 col2 col3 col4
r1 2 34 45 23
r2 2 65 34 56
r3 2 34 34 54
r4 2 76 54 34
'''The Table I need to be displayed in html (without actually removing col1, just hiding it)'''
col2 col3 col4
r1 34 45 23
r2 65 34 56
r3 34 34 54
r4 76 54 34
You can create custom CSS styles. Here borders are added and the first column (CSS class col0, which is col1 in this table) is hidden with Styler.set_table_styles:
css = [
    {
        # an empty selector targets the table element itself
        'selector': '',
        'props': [
            ('border-collapse', 'collapse')]
    },
    {
        'selector': 'th',
        'props': [
            ('border-color', 'black'),
            ('border-style', 'solid'),
            ('border-width', '1px')]
    },
    {
        'selector': 'td',
        'props': [
            ('border-color', 'black'),
            ('border-style', 'solid'),
            ('border-width', '1px')]
    },
    {'selector': '.col0',
     'props': [('display', 'none')]}]
# Styler.render() was removed in pandas 2.0; to_html() is its replacement
html = df.style.set_table_styles(css).to_html()
EDIT:
If you want to print the DataFrame without col1, you can either drop the column with DataFrame.drop or select only the columns to print:
df_print = df.drop('col1', axis=1)
df_print = df[['col2', 'col3', 'col4']]
