How to insert a variable in xpath within a for loop? - python
for i in range(length):
    # print(i)
    driver.execute_script("window.history.go(-1)")
    range = driver.find_element_by_xpath("(//a[@class = 'button'])[i]").click()
    content2 = driver.page_source.encode('utf-8').strip()
    soup2 = BeautifulSoup(content2,"html.parser")
    name2 = soup2.find('h1', {'data-qa-target': 'ProviderDisplayName'}).text
    phone2 = soup2.find('a', {'class': 'click-to-call-button-secondary hg-track mobile-click-to-call'}).text
    print(name2, phone2)
Hey guys, I am trying to scrape the first and last name and the telephone number of each person listed on this website: https://www.healthgrades.com/family-marriage-counseling-directory. I want the button index on line 4 (l.4) to follow the loop variable i. If I manually replace i with a number, everything works perfectly fine, but as soon as I put in the variable i it doesn't work. Any help is much appreciated!
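Inside an ordinary string literal, i is just the letter i, so the XPath never receives the value of the loop variable. Instead of this:
range = driver.find_element_by_xpath("(//a[@class = 'button'])[i]").click()
do this (the f-string interpolates the current value of i):
range = driver.find_element_by_xpath(f"(//a[@class = 'button'])[{i}]").click()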
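One detail that may still bite after this change: XPath positions are 1-based, while range(length) starts at 0, so the first iteration would ask for [0] and match nothing. Below is a minimal sketch of building the locator with that shift applied. The helper name button_xpath is hypothetical and only illustrates the string construction; it does not touch the browser.

```python
def button_xpath(position: int) -> str:
    """Build the XPath for the button at a 0-based Python position."""
    # XPath predicates count from 1, so shift the Python index by one.
    return f"(//a[@class = 'button'])[{position + 1}]"


# Show the locators that a loop over range(3) would produce.
for i in range(3):
    print(button_xpath(i))
```

The resulting string can then be passed to the same find-element call used above.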
Update 1 :
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(driver_path)  # driver_path: location of your chromedriver
driver.maximize_window()
driver.implicitly_wait(50)
driver.get("https://www.healthgrades.com/family-marriage-counseling-directory")
for name in driver.find_elements(By.CSS_SELECTOR, "a[data-qa-target='provider-details-provider-name']"):
    print(name.text)
Output :
Noe Gutierrez, MSW
Melissa Huston, LCSW
Gina Kane, LMHC
Dr. Mary Marino, PHD
Emili-Erin Puente, MED
Richard Vogel, LMFT
Lynn Bednarz, LCPC
Nicole Palow, LMHC
Dennis Hart, LPCC
Dr. Robert Meeks, PHD
Jody Davis
Dr. Kim Logan, PHD
Artemis Paschalis, LMHC
Mark Webb, LMFT
Deirdre Holland, LCSW-R
John Paul Dilorenzo, LMHC
Joseph Hayes, LPC
Dr. Maylin Batista, PHD
Ella Gray, LCPC
Cynthia Mack-Ernsdorff, MA
Dr. Edward Muldrow, PHD
Rachel Sievers, LMFT
Dr. Lisa Burton, PHD
Ami Owen, LMFT
Sharon Lorber, LCSW
Heather Rowley, LCMHC
Dr. Bonnie Bryant, PHD
Marilyn Pearlman, LCSW
Charles Washam, BCD
Dr. Liliana Wolf, PHD
Christy Kobe, LCSW
Dana Paine, LPCC
Scott Kohner, LCSW
Elizabeth Krzewski, LMHC
Luisa Contreras, LMFT
Dr. Joel Nunez, PHD
Susanne Sacco, LISW
Lauren Reminger, MA
Thomas Recher, AUD
Kristi Smith, LCSW
Kecia West, LPC
Gregory Douglas, MED
Gina Smith, LCPC
Anne Causey, LPC
Dr. David Greenfield, PHD
Olga Rothschild, LMHC
Dr. Susan Levin, PHD
Ferguson Jennifer, LMHC
Marci Ober, LMFT
Christopher Checke, LMHC
Update 2 :
leng = len(driver.find_elements(By.CSS_SELECTOR, "a[data-qa-target='provider-details-provider-name']"))
for i in range(1, leng + 1):  # XPath positions start at 1, not 0
    driver.find_element(By.XPATH, f"(//a[text()='View Profile'])[{i}]").click()
Related
How to deal with long names in data cleaning?
I have a users database. I want to separate the names into two columns, user1 and user2. My approach was to split the names into multiple columns and then merge them back into the two user columns. The issue I run into is that some names are long, so after the split they occupy extra columns in the data frame, which makes it harder to merge properly.

Users
Maria Melinda Del Valle Justin Howard
Devin Craig Jr. Michael Carter III
Jeanne De Bordeaux Alhamdi

After I split the user columns:

0 1 2 3 4 5 6 7 8
Maria Melinda Del Valle Justin Howard
Devin Craig Jr. Michael Carter III
Jeanne De Bordeaux Alhamdi

The expected result is the following:

User1                      User2
Maria Melinda Del valle    Justin Howard
Devin Craig Jr.            Michael Carter III
Jeanne De Bordeaux         Alhamdi
You can use:

def f(sr):
    m = sr.isna().cumsum().loc[lambda x: x < 2]
    return sr.dropna().groupby(m).apply(' '.join)

out = df.apply(f, axis=1).rename(columns=lambda x: f'User{x+1}')

Output:

>>> out
                     User1               User2
0  Maria Melinda Del Valle       Justin Howard
1          Devin Craig Jr.  Michael Carter III
2       Jeanne De Bordeaux             Alhamdi

As suggested by @Barmar, if you know where to put the blank columns in the first split, you should know how to create both columns.
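The grouping idea can be illustrated without pandas: treat each blank cell as a separator and join every run of consecutive non-blank tokens into one name. The sketch below loosely mirrors what f does per row. The helper name merge_tokens and the sample row, including where its blanks sit, are assumptions made for illustration, since the question does not show the exact position of the empty cells.

```python
from itertools import groupby


def merge_tokens(tokens, max_users=2):
    """Join runs of consecutive non-missing tokens into full names."""
    names = [
        " ".join(run)
        for is_missing, run in groupby(tokens, key=lambda t: t is None)
        if not is_missing
    ]
    # Keep only the first max_users names, like the `x < 2` filter in f.
    return names[:max_users]


# Assumed layout: None marks the blank cell between two users.
row = ["Maria", "Melinda", "Del", "Valle", None, "Justin", "Howard", None, None]
print(merge_tokens(row))
```

If the split never leaves a blank cell between the two users, neither this sketch nor the pandas version has anything to group on, which is the point of the closing remark above.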
webscraping stars from imdb page using beautifulsoup
I am trying to get the names of the stars from an IMDb page. Below is my code:

from requests import get
url = 'https://www.imdb.com/search/title/?title_type=tv_movie,tv_series&user_rating=6.0,10.0&adult=include&ref_=adv_prv'
response = get(url)
from bs4 import BeautifulSoup
html_soup = BeautifulSoup(response.text, 'html.parser')
movie_containers = html_soup.find_all('div', 'lister-item mode-advanced')
first_movie = movie_containers[0]
first_stars = first_movie.select('a[href*="name"]')
first_stars

I got the following output:

[Bob Odenkirk, Rhea Seehorn, Jonathan Banks, Michael Mando]

I am trying to get only the names of the stars, and first_stars.text gives the following error:

AttributeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_3104\1297903165.py in <module>
      1 first_stars = first_movie.select('a[href*="name"]')
----> 2 first_stars.text

~\Anaconda3\lib\site-packages\bs4\element.py in __getattr__(self, key)
   2288 """Raise a helpful exception to explain a common code fix."""
   2289 raise AttributeError(
-> 2290 "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
   2291 )

AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

When I tried

first_stars = first_movie.find('a[href*="name"]')
first_stars.text

I also got the following error:

AttributeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_3104\2359725208.py in <module>
      1 first_stars = first_movie.find('a[href*="name"]')
----> 2 first_stars.text

AttributeError: 'NoneType' object has no attribute 'text'

Any idea how I can extract only the names of the stars?
If you need the star names without distinction, this might help.

block = soup.find_all("div", attrs={"class":"lister-item mode-advanced"})
starList= list()
for star in block:
    starList.append(star.find("p", attrs={"class":""}).text.replace("Stars:", "").replace("\n", "").strip())
print(starList)

It prints

['Bob Odenkirk, Rhea Seehorn, Jonathan Banks, Michael Mando', 'Jason Bateman, Laura Linney, Sofia Hublitz, Skylar Gaertner', 'Gia Sandhu, Anson Mount, Ethan Peck, Jess Bush', 'Josh Brolin, Imogen Poots, Lili Taylor, Tom Pelphrey', 'Pablo Schreiber, Shabana Azmi, Natasha Culzac, Olive Gray', 'Titus Welliver, Mimi Rogers, Madison Lintz, Stephen A. Chang', 'Rachel Griffiths, Sophia Ali, Shannon Berry, Jenna Clause', 'Adam Scott, Zach Cherry, Britt Lower, Tramell Tillman', 'Milo Ventimiglia, Mandy Moore, Sterling K. Brown, Chrissy Metz', 'Emilia Clarke, Peter Dinklage, Kit Harington, Lena Headey', 'Bryan Cranston, Aaron Paul, Anna Gunn, Betsy Brandt', 'Millie Bobby Brown, Finn Wolfhard, Winona Ryder, David Harbour', 'Joe Locke, Kit Connor, Yasmin Finney, William Gao', 'John C. Reilly, Quincy Isaiah, Jason Clarke, Gaby Hoffmann', 'Bill Hader, Stephen Root, Sarah Goldberg, Anthony Carrigan', 'Kaley Cuoco, Zosia Mamet, Griffin Matthews, Rosie Perez', 'Caitríona Balfe, Sam Heughan, Sophie Skelton, Richard Rankin', 'Evan Rachel Wood, Jeffrey Wright, Ed Harris, Thandiwe Newton', 'Patrick Stewart, Alison Pill, Michelle Hurd, Santiago Cabrera', 'Nicola Coughlan, Jonathan Bailey, Ruth Gemmell, Florence Hunt', 'Andrew Lincoln, Norman Reedus, Melissa McBride, Lauren Cohan', 'Cillian Murphy, Paul Anderson, Sophie Rundle, Helen McCrory', 'Manuel Garcia-Rulfo, Becki Newton, Neve Campbell, Christopher Gorham', 'Jane Fonda, Lily Tomlin, Sam Waterston, Martin Sheen', 'Ellen Pompeo, Chandra Wilson, James Pickens Jr., Justin Chambers', 'Alexander Dreymon, Eliza Butterworth, Arnas Fedaravicius, Mark Rowley', 'Luke Grimes, Kelly Reilly, Wes Bentley, Cole Hauser', 'Elisabeth Moss, Wagner Moura, Phillipa Soo, Chris Chalk', "Scott Whyte, Nolan North, Steven Pacey, Emily O'Brien", 'Steve Carell, Jenna Fischer, John Krasinski, Rainn Wilson', 'Jodie Whittaker, Peter Capaldi, Pearl Mackie, Matt Smith', 'Ansel Elgort, Ken Watanabe, Rachel Keller, Shô Kasamatsu', 'James Spader, Megan Boone, Diego Klattenhoff, Ryan Eggold', 'Mark Harmon, David McCallum, Sean Murray, Pauley Perrette', 'Zendaya, Hunter Schafer, Angus Cloud, Jacob Elordi', 'Niv Sultan, Shaun Toub, Shervin Alenabi, Arash Marandi', 'Asa Butterfield, Gillian Anderson, Emma Mackey, Ncuti Gatwa', 'Jack Lowden, Kristin Scott Thomas, Gary Oldman, Chris Reilly', 'Karl Urban, Jack Quaid, Antony Starr, Erin Moriarty', 'Mariska Hargitay, Christopher Meloni, Ice-T, Dann Florek', "Nathan Fillion, Alyssa Diaz, Richard T. Jones, Melissa O'Neil", "Saoirse-Monica Jackson, Louisa Harland, Tara Lynne O'Neill, Kathy Kiera Clarke", 'Donald Glover, Brian Tyree Henry, LaKeith Stanfield, Zazie Beetz', 'Jennifer Aniston, Courteney Cox, Lisa Kudrow, Matt LeBlanc', 'Jared Padalecki, Jensen Ackles, Jim Beaver, Misha Collins', 'Julia Roberts, Sean Penn, Dan Stevens, Betty Gilpin', 'James Gandolfini, Lorraine Bracco, Edie Falco, Michael Imperioli', 'Natasha Lyonne, Charlie Barnett, Greta Lee, Elizabeth Ashley', 'Jean Smart, Hannah Einbinder, Carl Clemons-Hopkins, Rose Abdoo', 'Katheryn Winnick, Gustaf Skarsgård, Alexander Ludwig, Georgia Hirst']

Or, if you need both the title and its stars:

block = soup.find_all("div", attrs={"class":"lister-item mode-advanced"})
starList= list()
movieDict = dict()
for star in block:
    movieDict = {
        "moviename":star.find("h3", attrs={"class":"lister-item-header"}).text.split("\n")[2],
        "stars": star.find("p", attrs={"class":""}).text.replace("Stars:", "").replace("\n", "").strip()
    }
    starList.append(movieDict)
print(starList)

This will print

[{'moviename': 'Better Call Saul', 'stars': 'Bob Odenkirk, Rhea Seehorn, Jonathan Banks, Michael Mando'}, {'moviename': 'Ozark', 'stars': 'Jason Bateman, Laura Linney, Sofia Hublitz, Skylar Gaertner'}, {'moviename': 'Star Trek: Strange New Worlds', 'stars': 'Gia Sandhu, Anson Mount, Ethan Peck, Jess Bush'}, {'moviename': 'Outer Range', 'stars': 'Josh Brolin, Imogen Poots, Lili Taylor, Tom Pelphrey'}, {'moviename': 'Halo', 'stars': 'Pablo Schreiber, Shabana Azmi, Natasha Culzac, Olive Gray'}, {'moviename': 'Bosch: Legacy', 'stars': 'Titus Welliver, Mimi Rogers, Madison Lintz, Stephen A. Chang'}, {'moviename': 'The Wilds', 'stars': 'Rachel Griffiths, Sophia Ali, Shannon Berry, Jenna Clause'}, {'moviename': 'Severance', 'stars': 'Adam Scott, Zach Cherry, Britt Lower, Tramell Tillman'}, {'moviename': 'This Is Us', 'stars': 'Milo Ventimiglia, Mandy Moore, Sterling K. Brown, Chrissy Metz'}, {'moviename': 'Game of Thrones', 'stars': 'Emilia Clarke, Peter Dinklage, Kit Harington, Lena Headey'}, {'moviename': 'Breaking Bad', 'stars': 'Bryan Cranston, Aaron Paul, Anna Gunn, Betsy Brandt'}, {'moviename': 'Stranger Things', 'stars': 'Millie Bobby Brown, Finn Wolfhard, Winona Ryder, David Harbour'}, {'moviename': 'Heartstopper', 'stars': 'Joe Locke, Kit Connor, Yasmin Finney, William Gao'}, {'moviename': 'Winning Time: The Rise of the Lakers Dynasty', 'stars': 'John C. Reilly, Quincy Isaiah, Jason Clarke, Gaby Hoffmann'}, {'moviename': 'Barry', 'stars': 'Bill Hader, Stephen Root, Sarah Goldberg, Anthony Carrigan'}, {'moviename': 'The Flight Attendant', 'stars': 'Kaley Cuoco, Zosia Mamet, Griffin Matthews, Rosie Perez'}, {'moviename': 'Outlander', 'stars': 'Caitríona Balfe, Sam Heughan, Sophie Skelton, Richard Rankin'}, {'moviename': 'Westworld', 'stars': 'Evan Rachel Wood, Jeffrey Wright, Ed Harris, Thandiwe Newton'}, {'moviename': 'Star Trek: Picard', 'stars': 'Patrick Stewart, Alison Pill, Michelle Hurd, Santiago Cabrera'}, {'moviename': 'Bridgerton', 'stars': 'Nicola Coughlan, Jonathan Bailey, Ruth Gemmell, Florence Hunt'}, {'moviename': 'The Walking Dead', 'stars': 'Andrew Lincoln, Norman Reedus, Melissa McBride, Lauren Cohan'}, {'moviename': 'Peaky Blinders', 'stars': 'Cillian Murphy, Paul Anderson, Sophie Rundle, Helen McCrory'}, {'moviename': 'The Lincoln Lawyer', 'stars': 'Manuel Garcia-Rulfo, Becki Newton, Neve Campbell, Christopher Gorham'}, {'moviename': 'Grace and Frankie', 'stars': 'Jane Fonda, Lily Tomlin, Sam Waterston, Martin Sheen'}, {'moviename': "Grey's Anatomy", 'stars': 'Ellen Pompeo, Chandra Wilson, James Pickens Jr., Justin Chambers'}, {'moviename': 'The Last Kingdom', 'stars': 'Alexander Dreymon, Eliza Butterworth, Arnas Fedaravicius, Mark Rowley'}, {'moviename': 'Yellowstone', 'stars': 'Luke Grimes, Kelly Reilly, Wes Bentley, Cole Hauser'}, {'moviename': 'Shining Girls', 'stars': 'Elisabeth Moss, Wagner Moura, Phillipa Soo, Chris Chalk'}, {'moviename': 'Love, Death & Robots', 'stars': "Scott Whyte, Nolan North, Steven Pacey, Emily O'Brien"}, {'moviename': 'The Office', 'stars': 'Steve Carell, Jenna Fischer, John Krasinski, Rainn Wilson'}, {'moviename': 'Doctor Who', 'stars': 'Jodie Whittaker, Peter Capaldi, Pearl Mackie, Matt Smith'}, {'moviename': 'Tokyo Vice', 'stars': 'Ansel Elgort, Ken Watanabe, Rachel Keller, Shô Kasamatsu'}, {'moviename': 'The Blacklist', 'stars': 'James Spader, Megan Boone, Diego Klattenhoff, Ryan Eggold'}, {'moviename': 'NCIS: Naval Criminal Investigative Service', 'stars': 'Mark Harmon, David McCallum, Sean Murray, Pauley Perrette'}, {'moviename': 'Euphoria', 'stars': 'Zendaya, Hunter Schafer, Angus Cloud, Jacob Elordi'}, {'moviename': 'Tehran', 'stars': 'Niv Sultan, Shaun Toub, Shervin Alenabi, Arash Marandi'}, {'moviename': 'Sex Education', 'stars': 'Asa Butterfield, Gillian Anderson, Emma Mackey, Ncuti Gatwa'}, {'moviename': 'Slow Horses', 'stars': 'Jack Lowden, Kristin Scott Thomas, Gary Oldman, Chris Reilly'}, {'moviename': 'The Boys', 'stars': 'Karl Urban, Jack Quaid, Antony Starr, Erin Moriarty'}, {'moviename': 'Law & Order: Special Victims Unit', 'stars': 'Mariska Hargitay, Christopher Meloni, Ice-T, Dann Florek'}, {'moviename': 'The Rookie', 'stars': "Nathan Fillion, Alyssa Diaz, Richard T. Jones, Melissa O'Neil"}, {'moviename': 'Derry Girls', 'stars': "Saoirse-Monica Jackson, Louisa Harland, Tara Lynne O'Neill, Kathy Kiera Clarke"}, {'moviename': 'Atlanta', 'stars': 'Donald Glover, Brian Tyree Henry, LaKeith Stanfield, Zazie Beetz'}, {'moviename': 'Friends', 'stars': 'Jennifer Aniston, Courteney Cox, Lisa Kudrow, Matt LeBlanc'}, {'moviename': 'Supernatural', 'stars': 'Jared Padalecki, Jensen Ackles, Jim Beaver, Misha Collins'}, {'moviename': 'Gaslit', 'stars': 'Julia Roberts, Sean Penn, Dan Stevens, Betty Gilpin'}, {'moviename': 'The Sopranos', 'stars': 'James Gandolfini, Lorraine Bracco, Edie Falco, Michael Imperioli'}, {'moviename': 'Russian Doll', 'stars': 'Natasha Lyonne, Charlie Barnett, Greta Lee, Elizabeth Ashley'}, {'moviename': 'Hacks', 'stars': 'Jean Smart, Hannah Einbinder, Carl Clemons-Hopkins, Rose Abdoo'}, {'moviename': 'Vikings', 'stars': 'Katheryn Winnick, Gustaf Skarsgård, Alexander Ludwig, Georgia Hirst'}]
You have to iterate the ResultSet:

first_stars = [s.text for s in first_movie.select('a[href*="name"]')]
first_stars

Output:

['Bob Odenkirk', 'Rhea Seehorn', 'Jonathan Banks', 'Michael Mando']
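Both errors from the question can be reproduced and avoided on a small offline snippet. The sketch below assumes a hypothetical piece of markup standing in for one IMDb result block (the real page structure may differ): select() returns a list-like ResultSet, so .text has to be read per element, and find() expects a tag name rather than a CSS selector, which is why it returned None. select_one() is the single-element counterpart of select().

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one IMDb result block.
html = """
<div class="lister-item mode-advanced">
  <p>Stars:
    <a href="/name/nm0000001/">Bob Odenkirk</a>,
    <a href="/name/nm0000002/">Rhea Seehorn</a>
  </p>
</div>
"""
first_movie = BeautifulSoup(html, "html.parser").div

# select() returns a ResultSet, so read .text from each element in turn.
names = [a.text for a in first_movie.select('a[href*="name"]')]

# select_one() takes a CSS selector and returns the first match (or None).
first = first_movie.select_one('a[href*="name"]')

print(names)
print(first.text)
```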
Pandas -Split data and create columns when string occurs
I am looking to read in a text file (see below) and then create columns for all the English leagues only. So I'll be looking to do something like: where "Alias name" is "England_", create a new column with the alias name as the header and the player names in the rows. Note that the first occurrence of Alias appears as "Aliases" in the text file.

"-----------------------------------------------------------------------------------------------------------"
"- NEW TEAM -"
"-----------------------------------------------------------------------------------------------------------"
Europe Players 17/04/2019 07:59 p.m.
Aliases for England_Premier League
-------------------------------------------------------------------------------
Harry Kane
Mohamed Salah
Kevin De Bruyne
The command completed successfully.
Alias name England_Division 1
Comment Teams
Members
-------------------------------------------------------------------------------
Will Grigg
Jonson Clarke-Harris
Jerry Yates
Ivan Toney
Troy Parrott
The command completed successfully.
Alias name Spanish La Liga
Comment
Members
-------------------------------------------------------------------------------
Lionel Messi
Luis Suarez
Cristiano Ronaldo
Sergio Ramos
The command completed successfully.
Alias name England_Division 2
Comment
Members
-------------------------------------------------------------------------------
Eoin Doyle
Matt Watters
James Vughan
The command completed successfully.

This is my current code for reading in the data:

df = pd.read_csv(r'Desktop\SampleData.txt', sep='\n', header=None)

This gives me a pandas DataFrame with one column. I'm fairly new to Python, so I'm wondering how I would go about getting the result below. Should I use a delimiter when reading in the file?

England_Premier League    England_Division 1      England_Division 2
Harry Kane                Will Griggs             Eoin Doyle
Mohamed Salah             Jonson Clarke-Harris    Matt Watters
Kevin De Bruyne           Ivan Toney              James Vughan
                          Troy Parrott
You can use the re module for the task. For example:

import re
import pandas as pd

txt = """
"-----------------------------------------------------------------------------------------------------------"
"- NEW TEAM -"
"-----------------------------------------------------------------------------------------------------------"
Europe Players 17/04/2019 07:59 p.m.
Aliases for England_Premier League
-------------------------------------------------------------------------------
Harry Kane
Mohamed Salah
Kevin De Bruyne
The command completed successfully.
Alias name England_Division 1
Comment Teams
Members
-------------------------------------------------------------------------------
Will Grigg
Jonson Clarke-Harris
Jerry Yates
Ivan Toney
Troy Parrott
The command completed successfully.
Alias name Spanish La Liga
Comment
Members
-------------------------------------------------------------------------------
Lionel Messi
Luis Suarez
Cristiano Ronaldo
Sergio Ramos
The command completed successfully.
Alias name England_Division 2
Comment
Members
-------------------------------------------------------------------------------
Eoin Doyle
Matt Watters
James Vughan
The command completed successfully.
"""

r_competitions = re.compile(r"^Alias(?:(?:es for)| name)\s*(.*?)$", flags=re.M)
r_names = re.compile(r"^-+$\s*(.*?)\s*The command", flags=re.M | re.S)

dfs = []
for comp, names in zip(r_competitions.findall(txt), r_names.findall(txt)):
    if not "England" in comp:
        continue
    data = []
    for n in names.split("\n"):
        data.append({comp: n})
    dfs.append(pd.DataFrame(data))

print(pd.concat(dfs, axis=1).fillna(""))

Prints:

  England_Premier League    England_Division 1 England_Division 2
0             Harry Kane            Will Grigg         Eoin Doyle
1          Mohamed Salah  Jonson Clarke-Harris       Matt Watters
2        Kevin De Bruyne           Jerry Yates       James Vughan
3                                   Ivan Toney
4                                 Troy Parrott
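To see what the two patterns capture before pandas gets involved, it may help to run them on a shortened sample. The sketch below reuses the regular expressions from the answer unchanged; the two-block sample text is made up for illustration and assumes the same one-item-per-line layout as the file.

```python
import re

# Hypothetical two-block sample in the same layout as the file above.
sample = """Aliases for England_Premier League
-------
Harry Kane
Mohamed Salah
The command completed successfully.
Alias name Spanish La Liga
Comment
Members
-------
Lionel Messi
The command completed successfully.
"""

# re.M lets ^ and $ match at every line; re.S lets . span line breaks.
r_competitions = re.compile(r"^Alias(?:(?:es for)| name)\s*(.*?)$", flags=re.M)
r_names = re.compile(r"^-+$\s*(.*?)\s*The command", flags=re.M | re.S)

# Pair each league header with its block of names, keeping English leagues only.
leagues = {
    comp: names.split("\n")
    for comp, names in zip(r_competitions.findall(sample), r_names.findall(sample))
    if "England" in comp
}
print(leagues)
```

The pairing relies on every header being followed by exactly one dashed block, so a file with a missing block would shift the names onto the wrong league.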
startswith() function help needed in Pandas Dataframe
I have a name column in a DataFrame that contains multiple names.

DataFrame:

import pandas as pd
df = pd.DataFrame({'name': ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux', "Mr. Roderick Robert Crispin", "Cunningham"," Mr. Alfred Fleming"]})

OUTPUT

                                   Name
0  Brailey, Mr. William Theodore Ronald
1                   Roger Marie Bricoux
2           Mr. Roderick Robert Crispin
3                            Cunningham
4                    Mr. Alfred Fleming

I wrote a row classification function: if I pass it a row/name, it should return the output class.

mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux', 'John Frederick Preston Clarke']
def classify_role(row):
    if row.loc['name'] in mus:
        return 'musician'

Calling the function:

is_brailey = df['name'].str.startswith('Brailey')
print(classify_role(df[is_brailey].iloc[0]))

This should show 'musician', but the output shows a different class. I think I am writing something wrong in classify_role(); it must be this line:

if row.loc['name'] in mus:

Summary: I need a solution where, if I pass the first name of a person who is in mus to startswith(), it returns musician.
EDIT: If you want to test whether values exist in lists, you can create a dictionary and test membership with Series.isin:

mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux', 'John Frederick Preston Clarke']
cat1 = ['Mr. Alfred Fleming','Cunningham']
d = {'musician':mus, 'category':cat1}
for k, v in d.items():
    df.loc[df['Name'].isin(v), 'type'] = k
print (df)

                                   Name      type
0  Brailey, Mr. William Theodore Ronald  musician
1                   Roger Marie Bricoux  musician
2           Mr. Roderick Robert Crispin       NaN
3                            Cunningham  category
4                    Mr. Alfred Fleming  category

Your solution should be changed:

mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux', 'John Frederick Preston Clarke']
def classify_role(row):
    if row in mus:
        return 'musician'
df['type'] = df['Name'].apply(classify_role)
print (df)

                                   Name      type
0  Brailey, Mr. William Theodore Ronald  musician
1                   Roger Marie Bricoux  musician
2           Mr. Roderick Robert Crispin      None
3                            Cunningham      None
4                    Mr. Alfred Fleming      None

You can pass the values as a tuple to Series.str.startswith; the solution can be expanded to match more categories with a dictionary:

d = {'musician': ['Brailey, Mr. William Theodore Ronald'], 'cat1':['Roger Marie Bricoux', 'Cunningham']}
for k, v in d.items():
    df.loc[df['Name'].str.startswith(tuple(v)), 'type'] = k
print (df)

                                   Name      type
0  Brailey, Mr. William Theodore Ronald  musician
1                   Roger Marie Bricoux      cat1
2           Mr. Roderick Robert Crispin       NaN
3                            Cunningham      cat1
4                    Mr. Alfred Fleming       NaN
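The tuple form of startswith also exists on plain Python strings, which makes the prefix idea easy to try outside a DataFrame. The following is a minimal sketch rather than the answer's exact method: the PREFIXES table and its short prefixes are hypothetical, chosen so that a first name or surname fragment is enough to classify a person.

```python
# Hypothetical prefix table: category -> name prefixes that identify it.
PREFIXES = {
    "musician": ("Brailey", "Roger Marie"),
    "category": ("Cunningham",),
}


def classify_role(name):
    """Return the first category with a matching prefix, else None."""
    cleaned = name.strip()  # tolerate stray leading spaces such as " Mr. ..."
    for role, prefixes in PREFIXES.items():
        # str.startswith accepts a tuple and matches any of its members.
        if cleaned.startswith(prefixes):
            return role
    return None


print(classify_role("Brailey, Mr. William Theodore Ronald"))
print(classify_role(" Mr. Alfred Fleming"))
```

A function like this can be handed to Series.apply in the same way as the classify_role shown above.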
Scraping table by beautiful soup 4
Hello, I am trying to scrape the table at this URL: https://www.espn.com/nfl/stats/player/_/stat/rushing/season/2018/seasontype/2/table/rushing/sort/rushingYards/dir/desc

There are 50 rows in this table. However, if you click Show more (just below the table), more rows appear. My Beautiful Soup code works fine, but the problem is that it retrieves only the first 50 rows. It does not retrieve the rows that appear after clicking Show more. How can I get all the rows, including the first 50 and those that appear after clicking Show more? Here is the code:

import requests
import pandas as pd
from bs4 import BeautifulSoup

#Request to get the target wiki page
rqst = requests.get("https://www.espn.com/nfl/stats/player/_/stat/rushing/season/2018/seasontype/2/table/rushing/sort/rushingYards/dir/desc")
soup = BeautifulSoup(rqst.content,'lxml')
table = soup.find_all('table')
NFL_player_stats = pd.read_html(str(table))
players = NFL_player_stats[0]
players.shape

out[0]: (50,1)
Using DevTools in Firefox, I see that the page gets the data for the next page (in JSON format) from

https://site.web.api.espn.com/apis/common/v3/sports/football/nfl/statistics/byathlete?region=us&lang=en&contentorigin=espn&isqualified=false&limit=50&category=offense%3Arushing&sort=rushing.rushingYards%3Adesc&season=2018&seasontype=2&page=2

If you change the value in page=, you can get the other pages.

import requests

url = 'https://site.web.api.espn.com/apis/common/v3/sports/football/nfl/statistics/byathlete?region=us&lang=en&contentorigin=espn&isqualified=false&limit=50&category=offense%3Arushing&sort=rushing.rushingYards%3Adesc&season=2018&seasontype=2&page='

for page in range(1, 4):
    print('\n---', page, '---\n')
    r = requests.get(url + str(page))
    data = r.json()
    #print(data.keys())
    for item in data['athletes']:
        print(item['athlete']['displayName'])

Result:

--- 1 ---

Ezekiel Elliott
Saquon Barkley
Todd Gurley II
Joe Mixon
Chris Carson
Christian McCaffrey
Derrick Henry
Adrian Peterson
Phillip Lindsay
Nick Chubb
Lamar Miller
James Conner
David Johnson
Jordan Howard
Sony Michel
Marlon Mack
Melvin Gordon
Alvin Kamara
Peyton Barber
Kareem Hunt
Matt Breida
Tevin Coleman
Aaron Jones
Doug Martin
Frank Gore
Gus Edwards
Lamar Jackson
Isaiah Crowell
Mark Ingram II
Kerryon Johnson
Josh Allen
Dalvin Cook
Latavius Murray
Carlos Hyde
Austin Ekeler
Deshaun Watson
Kenyan Drake
Royce Freeman
Dion Lewis
LeSean McCoy
Mike Davis
Josh Adams
Alfred Blue
Cam Newton
Jamaal Williams
Tarik Cohen
Leonard Fournette
Alfred Morris
James White
Mitchell Trubisky

--- 2 ---

Rashaad Penny
LeGarrette Blount
T.J. Yeldon
Alex Collins
C.J. Anderson
Chris Ivory
Marshawn Lynch
Russell Wilson
Blake Bortles
Wendell Smallwood
Marcus Mariota
Bilal Powell
Jordan Wilkins
Kenneth Dixon
Ito Smith
Nyheim Hines
Dak Prescott
Jameis Winston
Elijah McGuire
Patrick Mahomes
Aaron Rodgers
Jeff Wilson Jr.
Zach Zenner
Raheem Mostert
Corey Clement
Jalen Richard
Damien Williams
Jaylen Samuels
Marcus Murphy
Spencer Ware
Cordarrelle Patterson
Malcolm Brown
Giovani Bernard
Chase Edmonds
Justin Jackson
Duke Johnson
Taysom Hill
Kalen Ballage
Ty Montgomery
Rex Burkhead
Jay Ajayi
Devontae Booker
Chris Thompson
Wayne Gallman
DJ Moore
Theo Riddick
Alex Smith
Robert Woods
Brian Hill
Dwayne Washington

--- 3 ---

Ryan Fitzpatrick
Tyreek Hill
Andrew Luck
Ryan Tannehill
Josh Rosen
Sam Darnold
Baker Mayfield
Jeff Driskel
Rod Smith
Matt Ryan
Tyrod Taylor
Kirk Cousins
Cody Kessler
Darren Sproles
Josh Johnson
DeAndre Washington
Trenton Cannon
Javorius Allen
Jared Goff
Julian Edelman
Jacquizz Rodgers
Kapri Bibbs
Andy Dalton
Ben Roethlisberger
Dede Westbrook
Case Keenum
Carson Wentz
Brandon Bolden
Curtis Samuel
Stevan Ridley
Keith Ford
Keenan Allen
John Kelly
Kenjon Barner
Matthew Stafford
Tyler Lockett
C.J. Beathard
Cameron Artis-Payne
Devonta Freeman
Brandin Cooks
Isaiah McKenzie
Colt McCoy
Stefon Diggs
Taylor Gabriel
Jarvis Landry
Tavon Austin
Corey Davis
Emmanuel Sanders
Sammy Watkins
Nathan Peterman

EDIT: get all data as a DataFrame

import requests
import pandas as pd

url = 'https://site.web.api.espn.com/apis/common/v3/sports/football/nfl/statistics/byathlete?region=us&lang=en&contentorigin=espn&isqualified=false&limit=50&category=offense%3Arushing&sort=rushing.rushingYards%3Adesc&season=2018&seasontype=2&page='

rows = []  # collect rows in a list; DataFrame.append was removed in pandas 2.0

for page in range(1, 4):
    print('page:', page)
    r = requests.get(url + str(page))
    data = r.json()
    #print(data.keys())
    for item in data['athletes']:
        player_name = item['athlete']['displayName']
        position = item['athlete']['position']['abbreviation']
        gp = item['categories'][0]['totals'][0]
        other_values = item['categories'][2]['totals']
        row = [player_name, position, gp] + other_values
        rows.append(row)  # append one row

df = pd.DataFrame(rows)  # build the DataFrame once, after the loop
df.columns = ['NAME', 'POS', 'GP', 'ATT', 'YDS', 'AVG', 'LNG', 'BIG', 'TD', 'YDS/G', 'FUM', 'LST', 'FD']
print(len(df)) # 150
print(df.head(20))
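The loop above hard-codes three pages. If the number of pages is not known in advance, one option is to keep requesting pages until one comes back empty. The sketch below shows that loop shape offline: fetch_page is a hypothetical stand-in for requests.get(url + str(page)).json(), the player names are placeholders, and the assumption that a page past the end returns an empty athletes list has not been checked against the real API.

```python
def fetch_page(page):
    """Hypothetical stand-in for requests.get(url + str(page)).json()."""
    fake_api = {
        1: {"athletes": [{"athlete": {"displayName": "Player A"}},
                         {"athlete": {"displayName": "Player B"}}]},
        2: {"athletes": [{"athlete": {"displayName": "Player C"}}]},
    }
    # Assumption: a page past the end returns an empty athlete list.
    return fake_api.get(page, {"athletes": []})


def collect_names(fetch, max_pages=100):
    """Request pages until one comes back empty; return all names."""
    names = []
    for page in range(1, max_pages + 1):
        athletes = fetch(page).get("athletes", [])
        if not athletes:
            break
        names.extend(item["athlete"]["displayName"] for item in athletes)
    return names


print(collect_names(fetch_page))
```

The max_pages cap is there so that an API which never returns an empty page cannot keep the loop running indefinitely.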