Unable to scrape similar links from different depth out of a webpage - python

I've created a script in Python to parse different links from a webpage. There are two sections on the landing page: Top Experiences and More Experiences. My current attempt can fetch the links from both categories.
The type of links I want to collect are (a few of them) under the Top Experiences section at the moment. However, when I follow the links under the More Experiences section, they all lead to a page with a section named Experiences, and the links under it are similar to the links under Top Experiences on the landing page. I want to grab them all.
One such desirable link I'm after looks like: https://www.airbnb.com/experiences/20712?source=seo.
website link
My current attempt fetches the links from both the categories:
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
URL = "https://www.airbnb.com/sitemaps/v2/experiences_pdp-L0-0"
def get_links(link):
    res = requests.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    items = [urljoin(link,item.get("href")) for item in soup.select("div[style='margin-top:16px'] a._1f0v6pq")]
    return items
if __name__ == '__main__':
    for item in get_links(URL):
        print(item)
How can I parse all the links under Top Experiences section along with the links under Experiences section that can be found upon traversing the links under More Experiences?
Please check out the image if anything is unclear. I used the pen tool in Paint, so the writing may be a little hard to read.

The solution is slightly tricky, and it can be achieved in several ways. The one I find most useful is to call get_links() recursively on the links under More Experiences. All the links under More Experiences share a common keyword, _pdp-.
So, when you add a conditional statement inside the function that sends those links back through get_links() recursively, the else block produces the desired links. The most important thing to notice is that all the desired links have the class _1f0v6pq, so the logic for getting the links is fairly easy.
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
URL = "https://www.airbnb.com/sitemaps/v2/experiences_pdp-L0-0"
def get_links(link):
    res = requests.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    for item in soup.select("div[style='margin-top:16px'] a._1f0v6pq"):
        if "_pdp-" in item.get("href"):
            get_links(urljoin(URL,item.get("href")))
        else:
            print(urljoin(URL,item.get("href")))
if __name__ == '__main__':
    get_links(URL)
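The recursion above can also be arranged to return the links instead of printing them, and to remember which sitemap pages were already visited. A minimal sketch under stated assumptions: collect_links and fetch_hrefs are hypothetical names that do not appear in the answer, and fetch_hrefs stands in for the requests + BeautifulSoup step (it should return the href values of the a._1f0v6pq anchors on a page), so the traversal logic can be tried on made-up pages without hitting the site.

```python
from urllib.parse import urljoin

def collect_links(start_url, fetch_hrefs, keyword="_pdp-"):
    """Follow sitemap links (those containing `keyword`) and return the rest.

    fetch_hrefs is a hypothetical callable: page URL -> list of href strings.
    """
    found = []
    visited = set()

    def visit(page_url):
        if page_url in visited:  # guard against fetching a sitemap page twice
            return
        visited.add(page_url)
        for href in fetch_hrefs(page_url):
            absolute = urljoin(page_url, href)
            if keyword in href:
                visit(absolute)         # another sitemap page: recurse into it
            else:
                found.append(absolute)  # an experience link: keep it

    visit(start_url)
    return found

# Made-up pages standing in for the real site, for illustration only.
fake_site = {
    "https://example.com/sitemaps/v2/experiences_pdp-L0-0": [
        "/experiences/1?source=seo",
        "/sitemaps/v2/experiences_pdp-L1-0",
    ],
    "https://example.com/sitemaps/v2/experiences_pdp-L1-0": [
        "/experiences/2?source=seo",
    ],
}
links = collect_links(
    "https://example.com/sitemaps/v2/experiences_pdp-L0-0",
    lambda url: fake_site.get(url, []),
)
print(links)
```

In real use, fetch_hrefs would wrap the same requests.get and soup.select calls shown in the answer.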

Process:
Get all Top Experiences links
Get all More Experiences links
Send a request to each More Experiences link, one by one, and get the links under Experiences on each page.
On every page, the links sit under a div with the same class, _12kw8n71.
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
from time import sleep
from random import randint
URL = "https://www.airbnb.com/sitemaps/v2/experiences_pdp-L0-0"
res = requests.get(URL)
soup = BeautifulSoup(res.text,"lxml")
top_experiences= [urljoin(URL,item.get("href")) for item in soup.find_all("div",class_="_12kw8n71")[0].find_all('a')]
more_experiences= [urljoin(URL,item.get("href")) for item in soup.find_all("div",class_="_12kw8n71")[1].find_all('a')]
generated_experiences=[]
#visit each link in more_experiences
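for url in more_experiences:
    sleep(randint(1,10))#avoid blocking by putting some delay
    #request the page itself; otherwise the landing-page soup would be reused on every iteration
    page = BeautifulSoup(requests.get(url).text,"lxml")
    generated_experiences.extend([urljoin(URL,item.get("href")) for item in page.find_all("div",class_="_12kw8n71")[0].find_all('a')])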
Notes:
Your required links will be in the three lists top_experiences, more_experiences and generated_experiences.
I have added random delay to avoid getting blocked.
Not printing the lists as it will be too long.
top_experiences - 50 links
more_experiences - 299 links
generated_experiences - 14950 links
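If a single list of experience links is wanted at the end, the lists can be merged and de-duplicated. A small sketch, assuming top_experiences and generated_experiences are lists of URL strings as built above; merge_unique is a hypothetical helper name and the sample values are placeholders, not real scraped results.

```python
def merge_unique(*lists):
    """Concatenate lists of URLs, dropping repeats but keeping first-seen order."""
    merged = []
    for urls in lists:
        merged.extend(urls)
    # dict keys keep insertion order, so this removes duplicates without reordering
    return list(dict.fromkeys(merged))

# Placeholder data for illustration only.
top_experiences = [
    "https://www.airbnb.com/experiences/1?source=seo",
    "https://www.airbnb.com/experiences/2?source=seo",
]
generated_experiences = [
    "https://www.airbnb.com/experiences/2?source=seo",
    "https://www.airbnb.com/experiences/3?source=seo",
]
all_experiences = merge_unique(top_experiences, generated_experiences)
print(len(all_experiences))
```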

It seems that both the "Top Experiences" and "More Experiences" links share the same class, so you can just use .find_all to obtain the links.
import requests
#from urllib.parse import urljoin
from bs4 import BeautifulSoup
# URL to scrape
url = "https://www.airbnb.com/sitemaps/v2/experiences_pdp-L0-0"
# Make request and Initialize BS4 with request content
req = requests.get(url)
soup = BeautifulSoup(req.content, "lxml")
# Tag that contains "Top Experiences" and "More Experiences"
soup.find_all(class_="_l8g1fr")
# Test Code
#Prints title of links and the href
links = soup.find_all(class_="_l8g1fr")
for link in links:
    print(link.find("a").get_text())
    print(link.find("a").get('href'))
Refactor code to meet your coding paradigm.

You can scrape from the divs with class "_12kw8n71":
from bs4 import BeautifulSoup as soup
import requests
d = soup(requests.get('https://www.airbnb.com/sitemaps/v2/experiences_pdp-L0-0').text, 'html.parser')
a, b, *_ = d.find_all('div', {'class':'_12kw8n71'})
result = {'top_experiences':[[i.text, i['href']] for i in a.find_all('a')], 'more_experiences':[[i.text, i['href']] for i in b.find_all('a')]}
Output (Only top experiences and part of the links from more experiences, as the full output exceeds Stackoverflow's character limit):
{'top_experiences': [["#1 HOLLYWOOD SIGN TOUR - WORLD'S BEST", '/experiences/26790?source=seo'], ['**#1-COMEDY/INSTAGRAM HOLLYWOOD WALK**', '/experiences/94033?source=seo'], ["A Potter's Wheel in Brooklyn", '/experiences/139532?source=seo'], ['Absolute: Seoul Pub Crawl & Party', '/experiences/161886?source=seo'], ['Amsterdam Experience Cruise', '/experiences/145329?source=seo'], ['Argentine Tango classes in London', '/experiences/241771?source=seo'], ['Axe throwing Canadian Experience', '/experiences/372251?source=seo'], ['Be a chocolate maker for a day', '/experiences/161912?source=seo'], ['Chianti in a Glass', '/experiences/179469?source=seo'], ['Cooking class in the Chianti Hills', '/experiences/25122?source=seo'], ['Cycle and Snack through hidden Bangkok', '/experiences/315116?source=seo'], ['Discover Valpolicella', '/experiences/77468?source=seo'], ["Don't watch Soweto, be part of Soweto", '/experiences/243188?source=seo'], ['Explore Pompeii w/ an Archaeologist', '/experiences/176605?source=seo'], ['Explore backstreets with a historian', '/experiences/90528?source=seo'], ['Explore the hidden food gems of Athens', '/experiences/256850?source=seo'], ['Fall and Winter Forest Trail Rides', '/experiences/236040?source=seo'], ['Foodies of Palermo - street food tour', '/experiences/212673?source=seo'], ['Forge a silver ring with jewellers', '/experiences/209198?source=seo'], ['Handmade Family Pasta & Tiramisù', '/experiences/115906?source=seo'], ['Hidden Gems of Old Delhi', '/experiences/154177?source=seo'], ['Hike Runyon Canyon with a rescue dog', '/experiences/82265?source=seo'], ['Hike Table Mountain with a Guide!', '/experiences/137248?source=seo'], ['Kayak in La Jolla', '/experiences/132375?source=seo'], ['Lakehouse Jazz', '/experiences/53239?source=seo'], ["Lisbon's best flavors", '/experiences/64564?source=seo'], ["London's Top Sights Tour : Fun Guide!", '/experiences/203605?source=seo'], ['Make ramen noodles from scratch', '/experiences/61279?source=seo'], 
['Mount Etna excursion and lava tubes', '/experiences/221142?source=seo'], ['Must Have L.A. Pictures!', '/experiences/59665?source=seo'], ['MySurf School Bali', '/experiences/153910?source=seo'], ['PASTAMANIA', '/experiences/103280?source=seo'], ['Paddle with the Penguins', '/experiences/112515?source=seo'], ['Paella Maestro', '/experiences/51311?source=seo'], ["Paris' Best Kept Secrets Tour", '/experiences/113388?source=seo'], ['Play, Walk & Feed Anteaters +Kinkajou', '/experiences/270405?source=seo'], ['Red Dunes Safari & BBQ #The Real Camp', '/experiences/105242?source=seo'], ['San José Chocolate Tasting Tour', '/experiences/280822?source=seo'], ['Snorkel in Manly with a local expert', '/experiences/66873?source=seo'], ['Sunrise SUP into Caves & Grottos', '/experiences/242911?source=seo'], ['Sushi-making Experience', '/experiences/53271?source=seo'], ['Swim with sea lions in their natural habitat', '/experiences/327235?source=seo'], ['The Drinking & Prohibition walk', '/experiences/102778?source=seo'], ['The Ultimate Harry Potter Walking Tour', '/experiences/188432?source=seo'], ['Tsukiji (Old) vs Toyosu (New) S.S Tour', '/experiences/71924?source=seo'], ['Wolf Encounter', '/experiences/47240?source=seo'], ['XXL Wild&Organic Roman food: Eat&Learn', '/experiences/161461?source=seo'], ['Your First Chinese Seal-engraving Work', '/experiences/166172?source=seo'], ['Под руку с духами по старой Москве', '/experiences/94558?source=seo'], ["⭐BIKE to taste the world's BEST TACOS!", '/experiences/75047?source=seo']], 'more_experiences': [[" Flying Trapeze - 'Ten Things I Hate About You' Tour", '/sitemaps/v2/experiences_pdp-L1-0'], ["'The Devil in the White City' Tour - 23:59 in Paris", '/sitemaps/v2/experiences_pdp-L1-1'], ["24h en Republica Dominicana - 80s Cover Band's Secret Gig[FUKUOKA]", '/sitemaps/v2/experiences_pdp-L1-2'], ['80s Nights - A Journey Through History', '/sitemaps/v2/experiences_pdp-L1-3'], ["A Journey for the Tastebuds, in Malaga - A Tea Sommelier's 
Zen Tea Party", '/sitemaps/v2/experiences_pdp-L1-4'], ['A Time for Home - A music walk with a french composer', '/sitemaps/v2/experiences_pdp-L1-5'], ['A musical walk in Salzburg - ANALOG PHOTOGRAPHY in DC! Shoot FILM!', '/sitemaps/v2/experiences_pdp-L1-6'], ['ANGEL, ENERGIA GASTRONOMICA ANCESTRAL! - Acuarela al aire libre en Poblenou', '/sitemaps/v2/experiences_pdp-L1-7'], ['Acupuncture in Historic Clinic - Afternoon tea at the Casavant Villa', '/sitemaps/v2/experiences_pdp-L1-8'], ["Afterwork dans un Atelier d'Artistes - Alpine Raft Expérience", '/sitemaps/v2/experiences_pdp-L1-9'], ['Alpine Tour Skiing Uphill/Downhill - Amélie en live!', '/sitemaps/v2/experiences_pdp-L1-10'], ['An African Dance & Cultural Experience - Another BKK', '/sitemaps/v2/experiences_pdp-L1-11'], ['Another Etna: Trekking, Food and Wine - Aprende o perfecciona bailes caribeños', '/sitemaps/v2/experiences_pdp-L1-12'], ['Aprende una actividad unica de Oaxaca - Around historical area with lovely dog', '/sitemaps/v2/experiences_pdp-L1-13'], ['Around the Neva & its embankments - Art tour Soumaya & Jumex Museums visit', '/sitemaps/v2/experiences_pdp-L1-14'], ['Art tour at Bellas Artes - Ascent to the crater of the Puracé volcano', '/sitemaps/v2/experiences_pdp-L1-15'], ['Asciende a la cima del Batea Mahuida - Atlas Mountains Day Trip: imlil Valley', '/sitemaps/v2/experiences_pdp-L1-16'], ['Atlas Mountains Full day Cultural Trip - Authentic Tango Night', '/sitemaps/v2/experiences_pdp-L1-17'], ['Authentic Tango with Live Music - BIARRITZ WALKING HISTORICAL TOUR', '/sitemaps/v2/experiences_pdp-L1-18'], ['BIG SUP Fun - Paddle Sevilla! 
- Bake ‘Dorayaki’ Japanese pancake', '/sitemaps/v2/experiences_pdp-L1-19'], ['Bakery Hop with Perth Brunch Bloggers - Banyuatis-Munduk Amazing Bike Tour', '/sitemaps/v2/experiences_pdp-L1-20'], ["Baptism of Rome for First-Timers - Batik: What's That?", '/sitemaps/v2/experiences_pdp-L1-21'], ['Batom vermelho - um estilo de vida - Beach Exercise Under The Sun', '/sitemaps/v2/experiences_pdp-L1-22'], ['Beach FlowJAM - Become Captain of the Day with Vicky', '/sitemaps/v2/experiences_pdp-L1-23'], ['Become Reiki 1 Certified in Hawaii! - Beginner Workshop', '/sitemaps/v2/experiences_pdp-L1-24'], ['Beginner and Kid Friendly Kayak Tour - Best Croissants + Coffee of Paris Walk', '/sitemaps/v2/experiences_pdp-L1-25'], ['Best Dirt Biking Experience in KL! - Big Easy Food Tours', '/sitemaps/v2/experiences_pdp-L1-26'], ['Big Fish In The Big Apple - Bike the Wineries', '/sitemaps/v2/experiences_pdp-L1-27'], ['Bike the local backroads - Birdwatching al Parco del Circeo', '/sitemaps/v2/experiences_pdp-L1-28'], ['Birdwatching and camping - Boat Trip at Arrabida with Lunch', '/sitemaps/v2/experiences_pdp-L1-29'], ['Boat Trip to historic Clonmacnoise - Boozin in Brooklyn: Spirits & Beer', '/sitemaps/v2/experiences_pdp-L1-30'], ["Boozy Icy Secrets - Breakfast Coffee Yoga # Kellogg's Cafe", '/sitemaps/v2/experiences_pdp-L1-31'], ['Breakfast With The (Chicago) Bear - Brugge voor jou.', '/sitemaps/v2/experiences_pdp-L1-32'], ['Brumby (wild) Horse Experience - Bushwalk with Bob', '/sitemaps/v2/experiences_pdp-L1-33'], ["Bushwalking; Airlie's best kept secret - Cabalgata, pasión por la aventura.", '/sitemaps/v2/experiences_pdp-L1-34'], ['Cabalgatas "Buscando el sol de ayer" - Camina por el parque observando aves', '/sitemaps/v2/experiences_pdp-L1-35'], ['Camina por la Historia de Jujuy - Canoeing with Champagne and Pastries', '/sitemaps/v2/experiences_pdp-L1-36'], ['Canoeing, Kayaking, Cycling in Catba - Capture your relationship in photos', '/sitemaps/v2/experiences_pdp-L1-37'], ['Capture 
your smile in beautiful hanbok - Catalonia Sailing Xperience', '/sitemaps/v2/experiences_pdp-L1-38'], ['Catalonia from Above - Cervantes Tapas Crawl Alcalá d Henares', '/sitemaps/v2/experiences_pdp-L1-39'], ['Cervezas artesanales de Querétaro - Cherry blossom viewing party in Ueno', '/sitemaps/v2/experiences_pdp-L1-40'], ['Cherry blossoms and your photo session - Chocolate tasting and tour', '/sitemaps/v2/experiences_pdp-L1-41'], ['Chocolate therapy - City Tour With Visit to Teatro Colon', '/sitemaps/v2/experiences_pdp-L1-42'], ['City Tour around Cali - Climb a sea cliff in El Empordà coast', '/sitemaps/v2/experiences_pdp-L1-43'], ['Climb an iconic hill with an instructor - Cocktails and Canapés', '/sitemaps/v2/experiences_pdp-L1-44'], ['Cocktails and Canvas - Columbia River King Salmon Fishing', '/sitemaps/v2/experiences_pdp-L1-45'], ['Columbus Cemetery. History and Art - Conhecendo o centro do recife', '/sitemaps/v2/experiences_pdp-L1-46'], ['Conhecendo o litoral e as belas praias - Cook great traditional dish with Heba', '/sitemaps/v2/experiences_pdp-L1-47'], ['Cook with Cristiana and Mamma Nora - Cook shelf mania.. authentic n savory', '/sitemaps/v2/experiences_pdp-L1-48'], ['Cook tapas at a secret private eatery - Cooking Spanish "tapas"', '/sitemaps/v2/experiences_pdp-L1-49'], ['Cooking Spanish classics with Soul! 
- Copenhagen Skyline Art Workshop', '/sitemaps/v2/experiences_pdp-L1-50'], ['Copenhagen Small Group Tour - Couture dans un atelier de Montmartre', '/sitemaps/v2/experiences_pdp-L1-51'], ["Covered Walkways Through a Writer’s Eyes - Crawlin' in the City - Bar Crawl", '/sitemaps/v2/experiences_pdp-L1-52'], ["Crazy French Sauces - Create a Story in a Writer's Retreat", '/sitemaps/v2/experiences_pdp-L1-53'], ['Create a Travel Journal - Create with a pro tie-dye artist', '/sitemaps/v2/experiences_pdp-L1-54'], ['Create wooden guitar picks and jewelry - Creative Raw Vegan Cooking Workshop', '/sitemaps/v2/experiences_pdp-L1-55'], ['Creative Rendezvous in Copenhagen - Créer ses cosmétiques naturels (DIY)', '/sitemaps/v2/experiences_pdp-L1-56'], ['Créer un Mandala Energétique - Cupcake Bake & Decorate - Spitalfields', '/sitemaps/v2/experiences_pdp-L1-57'], ['Cupcake Bouquet Class - Cycling in quiet NE. Connecticut.', '/sitemaps/v2/experiences_pdp-L1-58'], ['Cycling in the Don Valley - DJ WORKSHOP IN AACHEN GEBE DEN TAKT AN', '/sitemaps/v2/experiences_pdp-L1-59'], ['DJ Workshop and Indie Scene Night - Dark tour of Split', '/sitemaps/v2/experiences_pdp-L1-60'], ['Darkroom Photography in Kuala Lumpur - Decorative Wreath Workshop', '/sitemaps/v2/experiences_pdp-L1-61'], ['Decoupage for DIY memorabilia & gifts - Descubre México a través de la foto', '/sitemaps/v2/experiences_pdp-L1-62'], ['Descubre Playa Blanca, Saltos - Design a Garment in the World Market', '/sitemaps/v2/experiences_pdp-L1-63'], ['Design a game level in a game studio - Dinghy Sailing at El Portet', '/sitemaps/v2/experiences_pdp-L1-64'], ['Dingle Traditional Rowing - Discover CDMX from the best Rooftops', '/sitemaps/v2/experiences_pdp-L1-65'], ['Discover Caen "Art de vivre" - Discover Manzanita Scavenger Hunt', '/sitemaps/v2/experiences_pdp-L1-66'], ["Discover Marrakech Secrets by Bike - Discover Tarraco's living history", '/sitemaps/v2/experiences_pdp-L1-67'], ['Discover Tarragona coast from the sea - Discover 
nd paint your Aztec protector', '/sitemaps/v2/experiences_pdp-L1-68'], ['Discover new flavors with an oenophile - Discover the cosmos', '/sitemaps/v2/experiences_pdp-L1-69'], ['Discover the emotion of Kitesurf - Discovering Montjuic Hill', '/sitemaps/v2/experiences_pdp-L1-70'], ['Discovering Old Medina of Casablanca - Diving, Dipping, Swiming in Varadero', '/sitemaps/v2/experiences_pdp-L1-71'], ['Dj Workshop - Downtown Toronto Photoshoot Experience', '/sitemaps/v2/experiences_pdp-L1-72'], ['Downtown Traditional Cantina Pub Crawl - Drinking&Eating with Tokyohandsomeboy', '/sitemaps/v2/experiences_pdp-L1-73'], ['Drinks and a Show Walking Tour - Découverte des dauphins', '/sitemaps/v2/experiences_pdp-L1-74'], ['Découverte des spiritueux québécois - EL FASCINANTE MUNDO DEL MOSAICO', '/sitemaps/v2/experiences_pdp-L1-75'], ['ELDORADO: sus plantas y lugares. - Eat Like a Local - Dinner Experience', '/sitemaps/v2/experiences_pdp-L1-76'], ['Eat Like a Local at Butterworth - Eco Friendly Ranch & Petting Zoo', '/sitemaps/v2/experiences_pdp-L1-77'], ['Eco Hike in Pinar del Rio - Electric Skateboard down Paris', '/sitemaps/v2/experiences_pdp-L1-78'], ['Electric Skateboard the Highline Canal - Enigmatic / Sintra', '/sitemaps/v2/experiences_pdp-L1-79'], ['Enjoin Local Meal By Taking Vegetable - Enjoy local food experience in Tokyo!', '/sitemaps/v2/experiences_pdp-L1-80'], ['Enjoy local western part of Jeju ! - Escalada en Roca en Acantilados Valpo', '/sitemaps/v2/experiences_pdp-L1-81'], ['Escalada en Tarifa - Everyone Can Dance! Brussels', '/sitemaps/v2/experiences_pdp-L1-82'], ['Everyone can do Calligraphy - Experience Fabulous698B! 
Spring menu!', '/sitemaps/v2/experiences_pdp-L1-83'], ['Experience Falconry Firsthand - Experience a tasting menu in Saigon', '/sitemaps/v2/experiences_pdp-L1-84'], ['Experience a traditional Sunday lunch - Experiencia de golf en Buenos Aires', '/sitemaps/v2/experiences_pdp-L1-85'], ['Experiencia de un día en Salamanca - Explore Bowen Island Walking Adventure', '/sitemaps/v2/experiences_pdp-L1-86'], ['Explore Brazilian cuisine with a chef - Explore Korea broadcast Town & MBC', '/sitemaps/v2/experiences_pdp-L1-87'], ['Explore Korčula with Art historian - Explore Saratoga Passage by Kayak', '/sitemaps/v2/experiences_pdp-L1-88'], ['Explore Sceneries of Sokcho - Explore city nightlife with a musician', '/sitemaps/v2/experiences_pdp-L1-89'], ['Explore city nightlife with locals! - Explore the Old District Sham Shui Po', '/sitemaps/v2/experiences_pdp-L1-90'], ['Explore the Oslo Wilderness in Kayak - Exploring Maspalomas by E-Scooter', '/sitemaps/v2/experiences_pdp-L1-91'], ['Exploring RawaPening Lake of Salatiga - FOOTBALL MATCH IN MEDELLIN', '/sitemaps/v2/experiences_pdp-L1-92'], ['FOREST BATHING AND CREATING MEMORIES - Fantasy Photo~Shoot with A Unicorn...', '/sitemaps/v2/experiences_pdp-L1-93'], ['Fantasy adventure in the mountains - Fazendo Arte na Serra da Cantareira!', '/sitemaps/v2/experiences_pdp-L1-94'], ['Faça acrobacia em tecido como circense - Film your experience in Osaka', '/sitemaps/v2/experiences_pdp-L1-95'], ['FilmNeverDies-Filmphotography Workshop - Fishing with the Presidents', '/sitemaps/v2/experiences_pdp-L1-96'], ['Fishing, Malecon and Sunset - Florist in Paris', '/sitemaps/v2/experiences_pdp-L1-97'], ['Florist terrace & aperol spritz - Follow a historian through Havana', '/sitemaps/v2/experiences_pdp-L1-98'], ['Follow a local in vibrant East London - Food, Fotos & The French', '/sitemaps/v2/experiences_pdp-L1-99'], ['Food, Nature and Art on Tuscan Hills - Fossil Recovery Exploration', '/sitemaps/v2/experiences_pdp-L1-100'], ['Fossil hunting, Lets 
GO! - French food market tour & aperitif', '/sitemaps/v2/experiences_pdp-L1-101'], ['French macarons baking class - Full day surf experience', '/sitemaps/v2/experiences_pdp-L1-102'], ['Full day trail ride - Gaelic Craic and Cuisine', '/sitemaps/v2/experiences_pdp-L1-103'], ['Gain insight from a spiritual healer - Gems of Rome...', '/sitemaps/v2/experiences_pdp-L1-104'], ['Generations of Beekeeping - Get uplifted with gospel in Harlem', '/sitemaps/v2/experiences_pdp-L1-105'], ['Get ur Shine On Moonshine/Whiskey Tour - Glenorchy and Paradise Explorer', '/sitemaps/v2/experiences_pdp-L1-106'], ['Gliding on the Ice in the Red Dot - Goat Yoga!', '/sitemaps/v2/experiences_pdp-L1-107'], ['Goatherd for one day! - Graffiti, Street & Young Art in Vienna', '/sitemaps/v2/experiences_pdp-L1-108'], ['Grampians hike & waterfalls - Guide tours with E-bike in Menorca', '/sitemaps/v2/experiences_pdp-L1-109'], ['Guided Photo Tour ART Walk - Guitar workshop for the Byron Vibe', '/sitemaps/v2/experiences_pdp-L1-110'], ['Gulf Coast Birding For Beginners - Hand Building with clay in Brooklyn', '/sitemaps/v2/experiences_pdp-L1-111'], ['Hand Coffee Roasting & Cupping Class - Hang loose w/ Island Style Surf School', '/sitemaps/v2/experiences_pdp-L1-112'], ['Hang out in Shibuya and craft bonsai - Have a personal photographer for a day', '/sitemaps/v2/experiences_pdp-L1-113'], ['Have a picnic in a national park - Helicopter tour from Paris to Versailles', '/sitemaps/v2/experiences_pdp-L1-114'], ['Helicopter tour in São Paulo - Hidden Sculpture Stories Tour', '/sitemaps/v2/experiences_pdp-L1-115'], ['Hidden Sights & Cosmopolitan Magic - Hike Forest+ City Summit+ Rose Garden', '/sitemaps/v2/experiences_pdp-L1-116'], ['Hike Gracia hills for view & workout - Hike the Braids and Blackford Hill', '/sitemaps/v2/experiences_pdp-L1-117'], ['Hike the Bridge to Nowhere - Hiking Ajusco (1day)', '/sitemaps/v2/experiences_pdp-L1-118'], ['Hiking Ancient Mountain Olive Trees - Hill Country Estate Winery 
Tour', '/sitemaps/v2/experiences_pdp-L1-119'], ['Hill Country Yoga Hike - History and Tapas', '/sitemaps/v2/experiences_pdp-L1-120'], ['History and art inside Tijuca forest - Home-based cooking class in Panaji', '/sitemaps/v2/experiences_pdp-L1-121'], ['Home-cooked traditional Indian Food - Horseback ride California cowboy style', '/sitemaps/v2/experiences_pdp-L1-122'], ['Horseback ride and lunch with a farmer - Hungarian Cooking Class & Market Walk', '/sitemaps/v2/experiences_pdp-L1-123'], ['Hungarian pasta workshop and dinner - Ikebana experience', '/sitemaps/v2/experiences_pdp-L1-124'], ['Ikebana exprence - In the middle of the wet forest.', '/sitemaps/v2/experiences_pdp-L1-125'], ['In the trails of Rio - Inside Scoop LA', '/sitemaps/v2/experiences_pdp-L1-126'], ['Inside Showjumping with a Journalist - Interactive experience of lightpainting room', '/sitemaps/v2/experiences_pdp-L1-127'], ['Interactive outdoor navigation course - Invigorating Yoga Flow at Balboa Park', '/sitemaps/v2/experiences_pdp-L1-128'], ['Invitation To Ginza Personal Shopping - JUNGLE ART WALK TULUM', '/sitemaps/v2/experiences_pdp-L1-129'], ['JUNGLE IN THE CITY - Jazz in Cape Town with Lee Thomson', '/sitemaps/v2/experiences_pdp-L1-130'], ['Jazz in a Secret Venue - Joshua Tree Macramé Journey', '/sitemaps/v2/experiences_pdp-L1-131'], ['Joshua Tree Recording Studio & Lodging - Kamikochi hike and visit the Shrine', '/sitemaps/v2/experiences_pdp-L1-132'], ['Kanazawa Haul at Depachika - Kayak with the Dolphins', '/sitemaps/v2/experiences_pdp-L1-133'], ['Kayak&Snorkeling experience in Genoa - Kitesurf In Lake Garda', '/sitemaps/v2/experiences_pdp-L1-134'], ['Kitesurf Lecce-Scuola Kite Salento - Kundalini Yoga to awaken Inner Lover', '/sitemaps/v2/experiences_pdp-L1-135'], ['Kundalini Yoga, Gong and Yogi Tea - LGBTQ+ History Tour of Brighton', '/sitemaps/v2/experiences_pdp-L1-136'], ['LGBTQ+ queer and trans tour of London - Lamawandern im Mostviertel', '/sitemaps/v2/experiences_pdp-L1-137'], 
['Lamb Farm Life - Learn 3D printing at Maker Bean Cafe!', '/sitemaps/v2/experiences_pdp-L1-138'], ['Learn 3D printing: design a souvenir - Learn Pencak Silat (Bali Martial Art)', '/sitemaps/v2/experiences_pdp-L1-139'], ['Learn Phone Photography in Mongkok - Learn and indulge in Quercy cuisine', '/sitemaps/v2/experiences_pdp-L1-140'], ['Learn and play poker in Barão Geraldo - Learn how to wakeboard', '/sitemaps/v2/experiences_pdp-L1-141'], ['Learn how to wakeboard or wakesurf - Learn to DJ and dance at a queer club', '/sitemaps/v2/experiences_pdp-L1-142'], ['Learn to DJ at Private Studio - Learn to bake sourdough in my kitchen', '/sitemaps/v2/experiences_pdp-L1-143'], ['Learn to bake traditional Irish Breads - Learn to play badminton!', '/sitemaps/v2/experiences_pdp-L1-144'], ['Learn to play poker from a Vegas pro - Legends and secrets with traditional music', '/sitemaps/v2/experiences_pdp-L1-145'], ["Lei Po'o (Flower Crown) Workshop - Let's learn how to wearing kimono", '/sitemaps/v2/experiences_pdp-L1-146'], ["Let's learn to SKETCH & DRAW! - Life changing Hike on Crowders Mtn.", '/sitemaps/v2/experiences_pdp-L1-147'], ['Life coaching with horses - Lisbon with the historic tram', '/sitemaps/v2/experiences_pdp-L1-148'], ["Lisbon's CatWalk - Living the Santa Cruz Canyon!", '/sitemaps/v2/experiences_pdp-L1-149'], ['Living the dying city! 
- London Chicken Wing Tour!', '/sitemaps/v2/experiences_pdp-L1-150'], ['London Christmas Lights Running Tour - Luis Barragán en 5 tiempos.', '/sitemaps/v2/experiences_pdp-L1-151'], ['Lujan de Cuyo Private Wine Tour - Macaron Class in Paris', '/sitemaps/v2/experiences_pdp-L1-152'], ["Macaron Masterclass in Borough - Makati's best food drinks and music", '/sitemaps/v2/experiences_pdp-L1-153'], ['Make Fujiyama Katsu-Curry with a chef - Make Wine Candles+Bottle Cutting (DIY)', '/sitemaps/v2/experiences_pdp-L1-154'], ['Make Your Leather Tortellino Keychain - Make a seashell frame after a boatride', '/sitemaps/v2/experiences_pdp-L1-155'], ['Make a signature scent with a perfumer - Make sushi in a traditional folk house', '/sitemaps/v2/experiences_pdp-L1-156'], ['Make sustainable leather espadrilles - Make your own silk scarf in Paris !', '/sitemaps/v2/experiences_pdp-L1-157'], ['Make your own silver jewel in Florence - Malen und Zeichnen', '/sitemaps/v2/experiences_pdp-L1-158'], ['Maleny Yoga Rainforest Wellness - Market Tour and a Home Cooked Meal', '/sitemaps/v2/experiences_pdp-L1-159'], ['Market Tour&Cook Kenyan cultural food - Matera Photo Journey', '/sitemaps/v2/experiences_pdp-L1-160'], ['Mathematics Walking Tour of Amsterdam - Meditazione rilassamento profondo', '/sitemaps/v2/experiences_pdp-L1-161'], ['Meditazione e benEssere in spiaggia - Men & Womens Wellness Retreat', '/sitemaps/v2/experiences_pdp-L1-162'], ['Mental Wellness Picnic Chat - Midday Vermouth Madrileño & Tortilla', '/sitemaps/v2/experiences_pdp-L1-163'], ['Middle Eastern Food Cooking Class - Mini Horse Experience!', '/sitemaps/v2/experiences_pdp-L1-164'], ['Mini Nature Retreat - Moniz Family Surf: Private Surf Lesson', '/sitemaps/v2/experiences_pdp-L1-165'], ['Moniz Family Surf: Waikiki Surf Lesson - Morocco in frames:Culture &photography', '/sitemaps/v2/experiences_pdp-L1-166'], ['Mortadella & Bologna: a love story - Mountain plateau hike (5-7 hr)', '/sitemaps/v2/experiences_pdp-L1-167'], 
['Mountain summit and cozy restaurant - Music, Mummies, and Museums!', '/sitemaps/v2/experiences_pdp-L1-168'], ['Music, art, history, Rio nightlife - NEW! Amsterdam light cruise', '/sitemaps/v2/experiences_pdp-L1-169'], ['NEW! Aperitivo tour - Fun Fun Fun! ❤️ - National Arboretum and #madeinDC Tour', '/sitemaps/v2/experiences_pdp-L1-170'], ['National Gallery with an Art Historian - Nature trails & Birding near Cali', '/sitemaps/v2/experiences_pdp-L1-171'], ['Nature walk near Hotspring in Jozankei - Night Cheese: California Artisans', '/sitemaps/v2/experiences_pdp-L1-172'], ["Night Cherry Viewing - Nonna Cecilia's Pasta Workshop!", '/sitemaps/v2/experiences_pdp-L1-173'], ['Noodle & Roll & Water Puppet Theater - Obstacle Course Adventure', '/sitemaps/v2/experiences_pdp-L1-174'], ['Ocean Beach Coffee Cart Bike Tour - Old Montreal: Intimate & Personalized', '/sitemaps/v2/experiences_pdp-L1-175'], ['Old Quebec : Mesdames & Mesdemoiselles - Ontdek de ambacht van het jam maken', '/sitemaps/v2/experiences_pdp-L1-176'], ['OoLong Tea Tasting - Osaka Old City Walk Tour with Sweets!', '/sitemaps/v2/experiences_pdp-L1-177'], ['Osaka Shinsekai/Dotonbori Walking Tour - PASTA MATRICIANA MAKING CLASS', '/sitemaps/v2/experiences_pdp-L1-178'], ['PASTA grani antichi PANE lievito nat - Paddle with the Penguins', '/sitemaps/v2/experiences_pdp-L1-179'], ['Paddle your way in history - Paint a skateboard deck with an artist', '/sitemaps/v2/experiences_pdp-L1-180'], ['Paint a surfboard to send home! - Painting on the Amalfi Coast.', '/sitemaps/v2/experiences_pdp-L1-181'], ['Painting on the Pier - Paris fabrics & crafts', '/sitemaps/v2/experiences_pdp-L1-182'], ['Paris for Families - Passeio a cavalo no por do Sol', '/sitemaps/v2/experiences_pdp-L1-183'], ['Passeio agradável no bairro do Bexiga! 
- Pedal untamed back roads of Outaouais', '/sitemaps/v2/experiences_pdp-L1-184'], ['Pedala nelle pinete di Ravenna - Personal Shopping Consultant', '/sitemaps/v2/experiences_pdp-L1-185'], ['Personal Shopping For Bespoke Suits - Photo Session in beautiful Basel', '/sitemaps/v2/experiences_pdp-L1-186'], ['Photo Session in sunny Ibiza - Photo tour of Dublin city', '/sitemaps/v2/experiences_pdp-L1-187'], ["Photo tour on bike with an expert - Photographing in 'off-piste' Tokyo.", '/sitemaps/v2/experiences_pdp-L1-188'], ['Photographing in Palermo markets - Photoshoot for yoga lovers in Paris', '/sitemaps/v2/experiences_pdp-L1-189'], ['Photoshoot from Varadero to Havana - Pick plant n Cook,KANOM JAAK,dessert', '/sitemaps/v2/experiences_pdp-L1-190'], ['Pick wild apples and can applesauce - Pizza Class on a Renaissance Terrace', '/sitemaps/v2/experiences_pdp-L1-191'], ['Pizza Cooking Class in Milan - Play the Chinese ink art with Runkuan', '/sitemaps/v2/experiences_pdp-L1-192'], ['Play the Fiddle in Under an Hour - Por los Cerros de Santiago.', '/sitemaps/v2/experiences_pdp-L1-193'], ['Por los cementerios de Chepe. 
- Pose for pin-ups with a photographer', '/sitemaps/v2/experiences_pdp-L1-194'], ['Pose for portraits in Cambridge - Prague experience: archaeology&history', '/sitemaps/v2/experiences_pdp-L1-195'], ['Prague for Locals and You - Private Alpine Backpacking Trip', '/sitemaps/v2/experiences_pdp-L1-196'], ['Private Amalfi & Positano HalfDay tour - Private Yoga Class & Eco Snack', '/sitemaps/v2/experiences_pdp-L1-197'], ['Private Yoga Class for All Levels - Professional photos of your LA Vacay!', '/sitemaps/v2/experiences_pdp-L1-198'], ['Professional photoshoot by the sea - Pêche dans le Parc du Mercantour', '/sitemaps/v2/experiences_pdp-L1-199'], ['Pêche des Calamars en mer - Rafting the Isar', '/sitemaps/v2/experiences_pdp-L1-200'], ['Rafting trip-Snorkeling&cliff jumping - Recording Studio Magic (Immersive)', '/sitemaps/v2/experiences_pdp-L1-201'], ['Recording your own k-pop song - Relaxing Costarican Typical Trails', '/sitemaps/v2/experiences_pdp-L1-202'], ['Relaxing Countryside : Somewhere Snowy - Ride OFF Road in Puerto Rico', '/sitemaps/v2/experiences_pdp-L1-203'], ['Ride Retired Racehorses at Lake Fork - River hike captured on drone video', '/sitemaps/v2/experiences_pdp-L1-204'], ['River view cooking with super mom chef - Romancing the Bean', '/sitemaps/v2/experiences_pdp-L1-205'], ['Romantic "Aperitivo" with lake view - Rugged North Cascades Private Day Tour', '/sitemaps/v2/experiences_pdp-L1-206'], ['Ruin Pub Crawl with a Local Hostess - Running Tour of Brooklyn!', '/sitemaps/v2/experiences_pdp-L1-207'], ['Running Tour of San Diego - SAORI freestyle weaving for everyone', '/sitemaps/v2/experiences_pdp-L1-208'], ['SASHIKO; Learn the Japanese embroidery - SUP around Balboa Isle & the Back Bay', '/sitemaps/v2/experiences_pdp-L1-209'], ['SUP at sunset - Sail around the city with a boat lover', '/sitemaps/v2/experiences_pdp-L1-210'], ['Sail at Sunset in Porto - Sails, Whales, and Trails', '/sitemaps/v2/experiences_pdp-L1-211'], ['Sailtour Lake Lucerne - Samurai 
Kendo Experience', '/sitemaps/v2/experiences_pdp-L1-212'], ['Samurai Ninja Experience & Guided Tour - Sardinia: sip by sip', '/sitemaps/v2/experiences_pdp-L1-213'], ['Sardinian identity in the kitchen - Scotch Whisky & Cheese Tasting', '/sitemaps/v2/experiences_pdp-L1-214'], ['Scotland Adventure Adrenaline Day - Seaside Yoga', '/sitemaps/v2/experiences_pdp-L1-215'], ['Seaside bike ride and paddle surfing - Secretos de la Perla Tapatía', '/sitemaps/v2/experiences_pdp-L1-216'], ['Secretos del centro de Coyoacán - Segeln in Berlin', '/sitemaps/v2/experiences_pdp-L1-217'], ['Segeltörn auf dem Steinhuder Meer - Seville From The Rooftops', '/sitemaps/v2/experiences_pdp-L1-218'], ['Seville Great Experience in E-Bike - Shmurgle a Goat, Cosy Winter Edition', '/sitemaps/v2/experiences_pdp-L1-219'], ['Shochu tasting workshop in Asakusa - Shop-til-you-drop with Fashion Stylist', '/sitemaps/v2/experiences_pdp-L1-220'], ['Shopper bag making class - Silicon Valley High Tech Company Tour', '/sitemaps/v2/experiences_pdp-L1-221']]}

Related

Why is Selenium not able to find element by class?

I was trying to extract some data from the website, but the code below just returns an empty list even though there is clearly an element with the class name given in the code.
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://terraria.fandom.com/wiki/Weapons")
search = driver.find_elements(By.CLASS_NAME, "infocard clearfix terraria compact")
print(search)
I am really puzzled, I even tried using sleep to give the website time to load but it didn't affect the result. Any help will be much appreciated.
You are getting an empty list because By.CLASS_NAME expects a single class name, so the space-separated value infocard clearfix terraria compact doesn't select anything. The following locator strategy works.
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Keep Chrome open after the script ends so you can see what happened; comment this out to let it close
options.add_experimental_option("detach", True)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()),options=options)
URL ='https://terraria.fandom.com/wiki/Weapons'
driver.get(URL)
time.sleep(5)
for item in driver.find_elements(By.XPATH,'//*[@class="itemlist"]/ul/li/span/span/span/a'):
    print(item.text)
Output:
Copper Shortsword
Tin Shortsword
Wooden Sword
Boreal Wood Sword
Copper Broadsword
Iron Shortsword
Palm Wood Sword
Rich Mahogany Sword
Cactus Sword
Lead Shortsword
Silver Shortsword
Tin Broadsword
Ebonwood Sword
Iron Broadsword
Shadewood Sword
Tungsten Shortsword
Gold Shortsword
Lead Broadsword
Silver Broadsword
Bladed Glove
Tungsten Broadsword
Zombie Arm
Gold Broadsword
Platinum Shortsword
Mandible Blade
Stylish Scissors
Ruler
Platinum Broadsword
Umbrella
Breathing Reed
Gladius
Bone Sword
Bat Bat
Tentacle Spike
Candy Cane Sword
Katana
Ice Blade
Light's Bane
Tragic Umbrella
Muramasa
Exotic Scimitar
Phaseblade
Blood Butcherer
Starfury
Enchanted Sword
Purple Clubberfish
Bee Keeper
Falcon Blade
Blade of Grass
Fiery Greatsword
Night's Edge
Pearlwood Sword
Classy Cane
Slap Hand
Cobalt Sword
Palladium Sword
Phasesaber
Ice Sickle
Brand of the Inferno
Mythril Sword
Orichalcum Sword
Breaker Blade
Cutlass
Frostbrand
Adamantite Sword
Beam Sword
Titanium Sword
Fetid Baghnakhs
Bladetongue
Tizona
Ham Bat
Excalibur
True Excalibur
Chlorophyte Saber
Death Sickle
Psycho Knife
Keybrand
Chlorophyte Claymore
The Horseman's Blade
Christmas Tree Sword
True Night's Edge
Seedler
Flying Dragon
Starlight
Terra Blade
Influx Waver
Star Wrath
Meowmere
Wooden Yoyo
Rally
Malaise
Artery
Amazon
Code 1
Valor
Cascade
Format:C
Gradient
Chik
Hel-Fire
Amarok
Code 2
Yelets
Red's Throw
Valkyrie Yoyo
Kraken
The Eye of Cthulhu
Terrarian
Spear
Trident
Storm Spear
The Rotted Fork
Swordfish
Dark Lance
Cobalt Naginata
Palladium Pike
Mythril Halberd
Orichalcum Halberd
Adamantite Glaive
Titanium Trident
Gungnir
Ghastly Glaive
Chlorophyte Partisan
Tonbogiri
Mushroom Spear
Obsidian Swordfish
North Pole
Wooden Boomerang
Enchanted Boomerang
Fruitcake Chakram
Bloody Machete
Shroomerang
Ice Boomerang
Thorn Chakram
Combat Wrench
Flamarang
Flying Knife
Sergeant United Shield
Light Disc
Bananarang
Possessed Hatchet
Paladin's Hammer
Chain Knife
Mace
Flaming Mace
Ball O' Hurt
The Meatball
Blue Moon
Sunfury
Anchor
KO Cannon
Drippler Crippler
Chain Guillotines
Dao of Pow
Flower Pow
Golem Fist
Flairon
Terragrim
Arkhalis
Jousting Lance
Shadowflame Knife
Hallowed Jousting Lance
Sleepy Octopod
Scourge of the Corruptor
Shadow Jousting Lance
Vampire Knives
Sky Dragon's Fury
Daybreak
Solar Eruption
Zenith
Wooden Bow
Boreal Wood Bow
Copper Bow
Palm Wood Bow
Rich Mahogany Bow
Tin Bow
Ebonwood Bow
Iron Bow
Shadewood Bow
Lead Bow
Silver Bow
Tungsten Bow
Gold Bow
Platinum Bow
Demon Bow
Tendon Bow
Blood Rain Bow
The Bee's Knees
Hellwing Bow
Molten Fury
Sharanga
Pearlwood Bow
Marrow
Ice Bow
Daedalus Stormbow
Shadowflame Bow
Phantom Phoenix
Pulse Bow
Aerial Bane
Tsunami
Eventide
Phantasm
Cobalt Repeater
Palladium Repeater
Mythril Repeater
Orichalcum Repeater
Adamantite Repeater
Titanium Repeater
Hallowed Repeater
Vulcan Repeater
Chlorophyte Shotbow
Stake Launcher
Red Ryder
Flintlock Pistol
Musket
The Undertaker
Sandgun
Revolver
Minishark
Boomstick
Quad-Barrel Shotgun
Handgun
Phoenix Blaster
Pew-matic Horn
Clockwork Assault Rifle
Gatligator
Shotgun
Onyx Blaster
Coin Gun
Uzi
Megashark
Venus Magnum
Tactical Shotgun
Sniper Rifle
Candy Corn Rifle
Chain Gun
Xenopopper
Vortex Beater
S.D.M.G.
Grenade Launcher
Proximity Mine Launcher
Rocket Launcher
Nail Gun
Stynger
Jack 'O Lantern Launcher
Snowman Cannon
Celebration
Electrosphere Launcher
Celebration Mk2
Paper Airplane
White Paper Airplane
Shuriken
Throwing Knife
Poisoned Knife
Snowball
Spiky Ball
Bone
Rotten Egg
Star Anise
Molotov Cocktail
Frost Daggerfish
Javelin
Bone Javelin
Bone Throwing Knife
Grenade
Sticky Grenade
Bouncy Grenade
Beenade
Happy Grenade
Flare Gun
Ale Tosser
Blowpipe
Blowgun
Snowball Cannon
Paintball Gun
Harpoon
Egg Cannon
Star Cannon
Toxikarp
Dart Pistol
Dart Rifle
Flamethrower
Piranha Gun
Elf Melter
Super Star Shooter
Wand of Sparking
Thunder Zapper
Amethyst Staff
Topaz Staff
Sapphire Staff
Emerald Staff
Ruby Staff
Diamond Staff
Amber Staff
Vilethorn
Weather Pain
Magic Missile
Aqua Scepter
Flower of Fire
Flamelash
Sky Fracture
Crystal Serpent
Flower of Frost
Frost Staff
Crystal Vile Shard
Life Drain
Meteor Staff
Poison Staff
Rainbow Rod
Unholy Trident
Tome of Infinite Wisdom
Venom Staff
Nettle Burst
Bat Scepter
Blizzard Staff
Inferno Fork
Shadowbeam Staff
Spectre Staff
Resonance Scepter
Razorpine
Staff of Earth
Betsy's Wrath
Gray Zapinator
Space Gun
Bee Gun
Roman candle
Laser Rifle
Orange Zapinator
Zapinator
Wasp Gun
Leaf Blower
Rainbow Gun
Heat Ray
Laser Machinegun
Charged Blaster Cannon
Bubble Gun
Water Bolt
Book of Skulls
Demon Scythe
Cursed Flames
Golden Shower
Crystal Storm
Magnet Sphere
Razorblade Typhoon
Lunar Flare
Crimson Rod
Ice Rod
Clinger Staff
Nimbus Rod
Magic Dagger
Medusa Head
Spirit Flame
Shadowflame Hex Doll
Blood Thorn
Magical Harp
Toxic Flask
Nightglow
Stellar Tune
Nebula Arcanum
Nebula Blaze
Last Prism
Finch Staff
Slime Staff
Flinx Staff
Abigail's Flower
Hornet Staff
Vampire Frog Staff
Imp Staff
Blade Staff
Spider Staff
Pirate Staff
Sanguine Staff
Optic Staff
Deadly Sphere Staff
Pygmy Staff
Raven Staff
Desert Tiger Staff
Tempest Staff
Terraprisma
Xeno Staff
Stardust Cell Staff
Stardust Dragon Staff
Lightning Aura Rod
Flameburst Rod
Explosive Trap Rod
Ballista Rod
Houndius Shootius
Queen Spider Staff
Lightning Aura Cane
Flameburst Cane
Explosive Trap Cane
Ballista Cane
Staff of the Frost Hydra
Lightning Aura Staff
Flameburst Staff
Explosive Trap Staff
Ballista Staff
Lunar Portal Staff
Rainbow Crystal Staff
Leather Whip
Snapthorn
Spinal Tap
Firecracker
Cool Whip
Durendal
Dark Harvest
Morning Star
Kaleidoscope
Snowball Launcher
Cannon
Bunny Cannon
Land Mine
Bomb
Sticky Bomb
Bouncy Bomb
Scarab Bomb
Dynamite
Sticky Dynamite
Bouncy Dynamite
Bomb Fish
Holy Hand Grenade
Holy Water
Unholy Water
Blood Water
Boring Bow
Etherian Javelin
First Fractal
Skull Bow
Scythe
Soul Scythe
Icemourne
WebdriverManager
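As a complementary sketch, the original compound class value can still be used if it is turned into a CSS selector, because By.CSS_SELECTOR accepts several chained classes while By.CLASS_NAME expects only one. The helper names class_attr_to_css and find_by_all_classes are hypothetical, and whether the matched elements hold the data you want on this page is not verified here.

```python
def class_attr_to_css(class_attr):
    """Turn a space-separated class attribute into a compound CSS selector."""
    return "".join(f".{name}" for name in class_attr.split())


def find_by_all_classes(driver, class_attr):
    """Return the elements that carry every class listed in class_attr.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver.
    Selenium is imported lazily so the pure-string helper above can be
    used (and tested) without a browser installed.
    """
    from selenium.webdriver.common.by import By
    return driver.find_elements(By.CSS_SELECTOR, class_attr_to_css(class_attr))


print(class_attr_to_css("infocard clearfix terraria compact"))
```

With a live driver, find_by_all_classes(driver, "infocard clearfix terraria compact") would replace the By.CLASS_NAME call from the question.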

How do I remove the <a href... tags from my web scraper

Right now I'm trying to scrape a table from rottentomatoes.com, but every time I run the code it just prints <a href tags. For now, all I want are the movie titles, numbered.
This is my code so far:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
url = "https://www.rottentomatoes.com/top/bestofrt/"
headers = {"Accept-Language": "en-US, en;q=0.5"}
titles = []
year_released = []
def get_requests():
    try:
        result = requests.get(url=url)
        soup = BeautifulSoup(result.text, 'html.parser')
        table = soup.find('table', class_='table')
        for name in table:
            td = soup.find_all('a', class_='unstyled articleLink')
            titles.append(td)
            print(titles)
            break
    except:
        print("The result could not get fetched")
And this is my output:
[[Opening This Week, Top Box Office, Coming Soon to Theaters, Weekend Earnings, Certified Fresh Movies, On Dvd & Streaming, VUDU, Netflix Streaming, iTunes, Amazon and Amazon Prime, Top DVD & Streaming, New Releases, Coming Soon to DVD, Certified Fresh Movies, Browse All, Top Movies, Trailers, Forums,
View All
,
View All
, Top TV Shows, Certified Fresh TV, 24 Frames, All-Time Lists, Binge Guide, Comics on TV, Countdown, Critics Consensus, Five Favorite Films, Now Streaming, Parental Guidance, Red Carpet Roundup, Scorecards, Sub-Cult, Total Recall, Video Interviews, Weekend Box Office, Weekly Ketchup, What to Watch, The Zeros, View All, View All, View All,
It Happened One Night (1934),
Citizen Kane (1941),
The Wizard of Oz (1939),
Modern Times (1936),
Black Panther (2018),
Parasite (Gisaengchung) (2019),
Avengers: Endgame (2019),
Casablanca (1942),
Knives Out (2019),
Us (2019),
Toy Story 4 (2019),
Lady Bird (2017),
Mission: Impossible - Fallout (2018),
BlacKkKlansman (2018),
Get Out (2017),
The Irishman (2019),
The Godfather (1972),
Mad Max: Fury Road (2015),
Spider-Man: Into the Spider-Verse (2018),
Moonlight (2016),
Sunset Boulevard (1950),
All About Eve (1950),
The Cabinet of Dr. Caligari (Das Cabinet des Dr. Caligari) (1920),
The Philadelphia Story (1940),
Roma (2018),
Wonder Woman (2017),
A Star Is Born (2018),
Inside Out (2015),
A Quiet Place (2018),
One Night in Miami (2020),
Eighth Grade (2018),
Rebecca (1940),
Booksmart (2019),
Logan (2017),
His Girl Friday (1940),
Portrait of a Lady on Fire (Portrait de la jeune fille en feu) (2020),
Coco (2017),
Dunkirk (2017),
Star Wars: The Last Jedi (2017),
A Night at the Opera (1935),
The Shape of Water (2017),
Thor: Ragnarok (2017),
Spotlight (2015),
The Farewell (2019),
Selma (2014),
The Third Man (1949),
Rear Window (1954),
E.T. The Extra-Terrestrial (1982),
Seven Samurai (Shichinin no Samurai) (1956),
La Grande illusion (Grand Illusion) (1938),
Arrival (2016),
Singin' in the Rain (1952),
The Favourite (2018),
Double Indemnity (1944),
All Quiet on the Western Front (1930),
Snow White and the Seven Dwarfs (1937),
Marriage Story (2019),
The Big Sick (2017),
On the Waterfront (1954),
Star Wars: Episode VII - The Force Awakens (2015),
An American in Paris (1951),
The Best Years of Our Lives (1946),
Metropolis (1927),
Boyhood (2014),
Gravity (2013),
Leave No Trace (2018),
The Maltese Falcon (1941),
The Invisible Man (2020),
12 Years a Slave (2013),
Once Upon a Time In Hollywood (2019),
Argo (2012),
Soul (2020),
Ma Rainey's Black Bottom (2020),
The Kid (1921),
Manchester by the Sea (2016),
Nosferatu, a Symphony of Horror (Nosferatu, eine Symphonie des Grauens) (Nosferatu the Vampire) (1922),
The Adventures of Robin Hood (1938),
La La Land (2016),
North by Northwest (1959),
Laura (1944),
Spider-Man: Far From Home (2019),
Incredibles 2 (2018),
Zootopia (2016),
Alien (1979),
King Kong (1933),
Shadow of a Doubt (1943),
Call Me by Your Name (2018),
Psycho (1960),
1917 (2020),
L.A. Confidential (1997),
The Florida Project (2017),
War for the Planet of the Apes (2017),
Paddington 2 (2018),
A Hard Day's Night (1964),
Widows (2018),
Never Rarely Sometimes Always (2020),
Baby Driver (2017),
Spider-Man: Homecoming (2017),
The Godfather, Part II (1974),
The Battle of Algiers (La Battaglia di Algeri) (1967), View All, View All]]
Reading tables via pandas.read_html(), as suggested by @F.Hoque, would probably be the leaner approach, but you can also get your results with BeautifulSoup only.
Iterate over all <tr> of the <table>, pick the information from the tags via .text / .get_text() and store it in a structured list of dicts:
data = []
for row in soup.select('table.table tr')[1:]:
    data.append({
        'rank': row.td.text,
        'title': row.a.text.split(' (')[0].strip(),
        'releaseYear': row.a.text.split(' (')[1][:-1]
    })
Example
import requests
from bs4 import BeautifulSoup
url = "https://www.rottentomatoes.com/top/bestofrt/"
headers = {"Accept-Language": "en-US, en;q=0.5"}
result = requests.get(url=url)
soup = BeautifulSoup(result.text, 'html.parser')
data = []
for row in soup.select('table.table tr')[1:]:
    data.append({
        'rank': row.td.text,
        'title': row.a.text.split(' (')[0].strip(),
        'releaseYear': row.a.text.split(' (')[1][:-1]
    })
data
Output
[{'rank': '1.', 'title': 'It Happened One Night', 'releaseYear': '1934'},
{'rank': '2.', 'title': 'Citizen Kane', 'releaseYear': '1941'},
{'rank': '3.', 'title': 'The Wizard of Oz', 'releaseYear': '1939'},
{'rank': '4.', 'title': 'Modern Times', 'releaseYear': '1936'},
{'rank': '5.', 'title': 'Black Panther', 'releaseYear': '2018'},...]
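One caveat worth sketching: splitting on the first " (" works for the rows shown above, but titles that carry their own parentheses, such as Parasite (Gisaengchung) (2019) from the question's output, would have the alternative title land in the year field. A minimal variant, assuming the year is always the last parenthesised group, splits from the right instead. The helper name split_title_year is hypothetical.

```python
def split_title_year(text):
    """Split 'Title (extra) (1999)' into ('Title (extra)', '1999').

    Assumption: the release year is the last ' (' group in the link text.
    If no such group exists, the whole text is returned as the title.
    """
    text = text.strip()
    title, sep, year = text.rpartition(" (")
    if not sep:
        return text, ""
    return title.strip(), year.rstrip(")")


print(split_title_year("It Happened One Night (1934)"))
print(split_title_year("Parasite (Gisaengchung) (2019)"))
```

Inside the loop, title and releaseYear would then both come from one call such as split_title_year(row.a.text).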

Unable to let my script populate result using post requests

I've created a script using python in combination with selenium to parse the id, vikey and cbhtmlfragid, which are meant to be used as the payload of a POST HTTP request. As I found it difficult to scrape id, vikey and cbhtmlfragid using requests, I thought I'd grab them using selenium so that I can use them when making the POST request.
I'm trying to populate the results by entering a in the input box right next to Entity Name Or Identifier. I noticed that the results are populated through a POST request, which I'm trying to reproduce programmatically.
website link
To populate the result it is necessary to follow the steps sequentially in this image which ultimately leads to this image
I've tried with:
import re
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = 'https://www.businessregistration.moc.gov.kh/'
post_url = 'https://www.businessregistration.moc.gov.kh/cambodia-master/viewInstance/update.html?id={}'
payload = {
'QueryString': 'a',
'SourceAppCode': 'cambodia-br-soleproprietorships',
'OriginalVersionIdentifier': '',
'nodeW772-Advanced': 'N',
'_CBASYNCUPDATE_': 'true',
'_CBHTMLFRAGNODEID_': 'W762',
'_CBHTMLFRAGID_': '',
'_CBHTMLFRAG_': 'true',
'_CBNODE_': 'W778',
'_VIKEY_': '',
'_CBNAME_': 'buttonPush'
}
def get_content(wait,link):
    driver.get(link)
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"a[data-rel='#appMainNavigation']"))).click()
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"a[class$='menu-soleproprietorships']"))).click()
    elem = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"a[class$='menu-brSoleProprietorSearch']")))
    driver.execute_script("arguments[0].click();",elem)
    item_id = driver.current_url.split("id=")[1].split("&_timestamp")[0]
    x_catalyst = re.findall(r"sessionId:'(.*?)',", str(driver.page_source), flags=re.DOTALL)[0]
    item = re.findall(r"viewInstanceKey:'(.*?)',", str(driver.page_source), flags=re.DOTALL)[0]
    elem = re.findall(r"guid:(.*?),", str(driver.page_source), flags=re.DOTALL)[0]
    return item_id,x_catalyst,item,elem

def make_post_requests(item_id,x_catalyst,item,elem):
    payload['_VIKEY_'] = item
    payload['_CBHTMLFRAGID_'] = elem
    res = requests.post(post_url.format(item_id),data=payload,headers={
        'user-agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
        'x-requested-with':'XMLHttpRequest',
        'x-catalyst-session-global':x_catalyst
    })
    soup = BeautifulSoup(res.text,"lxml")
    result_count = soup.select_one("[class='appPagerBanner']")
    print(result_count)

if __name__ == '__main__':
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver,10)
    item_id,x_catalyst,item,elem = get_content(wait,link)
    make_post_requests(item_id,x_catalyst,item,elem)
    driver.quit()
When I execute the above script, there is no result in the response, so I suppose I went wrong somewhere.
How can I let my script populate the result using POST requests?
The POST request needs several parameters that come from the website itself, so here's the plan.
First, make a GET request to the target URL and collect all the required POST data:
https://www.businessregistration.moc.gov.kh/cambodia-master/relay.html?url=https%3A%2F%2Fwww.businessregistration.moc.gov.kh%2Fcambodia-master%2Fservice%2Fcreate.html%3FtargetAppCode%3Dcambodia-master%26targetRegisterAppCode%3Dcambodia-br-soleproprietorships%26service%3DregisterItemSearch&target=cambodia-master
Next, work out how the POST request is structured, which means tracking each POST parameter and how it behaves behind the scenes:
nodeWxxx-Advanced, where xxx is three auto-generated digits.
_CBHTMLFRAGNODEID_ also holds an auto-generated value.
_CBHTMLFRAGID_ holds an auto-generated Unix timestamp used to validate the POST request.
_CBNODE_ is auto-generated on the first load, then offset upward by 4, after which it stays static.
_VIKEY_ holds a value that is also present in the HTML source.
_CBNAME_ takes one of three values: buttonPush, selectPage and pageSizeChange. We don't need the first, so we will work with the second and third; the names are self-explanatory.
_CBVALUE_ is dual-purpose. It holds a number that acts as the page number when sent with selectPage, and as the page-size option when sent with pageSizeChange, where 4 means 200 items per page, which is the maximum.
You might find this complicated if you parse the HTML and see multiple Wxxx values (where xxx is three digits), but I was able to work out the host's structure.
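To make the parameter list concrete, here is a minimal sketch that applies the same kind of regular expressions to a made-up page fragment. SAMPLE_HTML, extract_tokens and shift_node are hypothetical names, and the guid and viewInstanceKey values are invented placeholders; the real page has to be fetched to get usable values.

```python
import re

# Hypothetical fragment that only imitates where the tokens appear.
SAMPLE_HTML = """
<input name="nodeW772-Advanced" value="N">
<div id="AsyncWrapperW762"></div>
<script>setup({guid:1585000000000,viewInstanceKey:'hypothetical-key'});</script>
"""


def extract_tokens(html):
    """Collect the auto-generated values the POST payload depends on."""
    return {
        "node": re.search(r"nodeW\d{3}-Advanced", html).group(),
        # Keep only the trailing 'Wxxx' part of 'AsyncWrapperWxxx'.
        "frag_node_id": re.search(r"AsyncWrapperW\d{3}", html).group()[-4:],
        "frag_id": re.search(r"guid:(\d+)", html).group(1),
        "vikey": re.search(r"viewInstanceKey:'([^']*)'", html).group(1),
    }


def shift_node(node, offset=4):
    """Offset a 'Wxxx' node name, e.g. 'W774' becomes 'W778' for offset 4."""
    return f"W{int(node[1:]) + offset}"


tokens = extract_tokens(SAMPLE_HTML)
print(tokens)
print(shift_node("W774"))
```

Each re.search call returns None when the pattern is missing, so a production version would want to check for that before calling .group().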
Now the POST data is prepared. Note, however, that the GET request is redirected to the following URL:
https://www.businessregistration.moc.gov.kh/cambodia-master/viewInstance/view.html?id=
Here id is an automatically generated token that the host uses to decide whether your request is valid.
Before making the POST request, we need to change the part of the URL that controls the operation. In our case that means changing view to update, as follows:
https://www.businessregistration.moc.gov.kh/cambodia-master/viewInstance/update.html?id
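The view-to-update rewrite can be sketched on its own. The id token and _timestamp value below are placeholders, not real values, and to_update_url is a hypothetical helper name.

```python
def to_update_url(view_url):
    """Drop everything after the id parameter and switch view.html to update.html."""
    return view_url.split("&")[0].replace("view.", "update.")


# Placeholder redirect target; the real id is issued by the host.
sample = (
    "https://www.businessregistration.moc.gov.kh/cambodia-master/"
    "viewInstance/view.html?id=HYPOTHETICAL_TOKEN&_timestamp=123"
)
print(to_update_url(sample))
```

Replacing "view." rather than "view" leaves the viewInstance path segment untouched.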
Now we are ready to call the target. First, make a POST request to change the page size to 200 items, as described above.
After that, loop over the pages, sending a POST with the form data for each one.
The site currently returns 1,421 results, so at 200 items per page we need 8 pages: seven full pages (1,400 items) plus a last page with only 21 items.
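The page arithmetic can be checked with a short sketch, which also shows how the loop bound could be derived from the result count instead of being hard-coded. The helper name page_plan is hypothetical.

```python
import math


def page_plan(total_items, per_page):
    """Return (number_of_pages, items_on_last_page) for a paginated result."""
    if total_items <= 0:
        return 0, 0
    pages = math.ceil(total_items / per_page)
    last_page_items = total_items - (pages - 1) * per_page
    return pages, last_page_items


print(page_plan(1421, 200))  # (8, 21)
```

With this, range(1, pages + 1) gives the same page numbers as the fixed range(1, 9) for the 1,421 results mentioned in the text.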
Below is the final code:
import requests
from bs4 import BeautifulSoup
import re
one = "https://www.businessregistration.moc.gov.kh/cambodia-master/relay.html?url=https%3A%2F%2Fwww.businessregistration.moc.gov.kh%2Fcambodia-master%2Fservice%2Fcreate.html%3FtargetAppCode%3Dcambodia-master%26targetRegisterAppCode%3Dcambodia-br-soleproprietorships%26service%3DregisterItemSearch&target=cambodia-master"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"
}
def main(url):
    with requests.session() as req:
        # The GET is redirected to view.html?id=...; the page holds the tokens we need
        r = req.get(url, headers=headers, allow_redirects=True)
        node = re.search(r"nodeW\d{3}-Advanced", r.text).group()
        nodeid = re.search(r"AsyncWrapperW\d{3}", r.text).group()
        agid = re.search(r"guid:(\d+)", r.text).group(1)
        cbnode = re.findall(r"Callback\('([^']*)'", r.text)[5]
        vikey = re.search(r"viewInstanceKey:'([^']*)'", r.text).group(1)
        data = {
            'QueryString': 'a',
            'SourceAppCode': 'cambodia-br-soleproprietorships',
            'OriginalVersionIdentifier': '',
            f"{node}": 'N',
            '_CBASYNCUPDATE_': 'true',
            '_CBHTMLFRAGNODEID_': nodeid[-4:],
            '_CBHTMLFRAGID_': agid,
            '_CBHTMLFRAG_': 'true',
            '_CBNODE_': f"W{int(cbnode[1:])+4}",
            '_VIKEY_': vikey,
            '_CBNAME_': 'selectPage'
        }
        # Switch the redirected URL from view.html to update.html
        target = r.url.split("&")[0].replace("view.", "update.")
        # First POST: set the page size to option 4 (200 items per page)
        change = data.copy()
        change['_CBNAME_'] = "pageSizeChange"
        change['_CBVALUE_'] = "4"
        r = req.post(target, headers=headers, data=change)
        # Then request pages 1 to 8 one by one
        for num in range(1, 9):
            print(f"Extracting Page# {num}\n {'*' * 40}")
            data['_CBVALUE_'] = num
            r = req.post(target, headers=headers, data=data)
            soup = BeautifulSoup(r.content, 'html.parser')
            for item in soup.findAll("span", class_="appReceiveFocus")[3:]:
                print(item.text)

main(one)
Please also stop deleting your questions after you receive an answer, as you have usually done with your previous posts.
Sample of output:
Extracting Page# 1
****************************************
អេមី ជេមស៍ / AMY GEMS (50009790)
អាហារដ្ឋាន លីន លៀនជីន ហ៊្វូដ ហ្វ្លេវើ / AHARATHAN LIN LIANJIN FOOD FLAVOR (50009800)
អាមេទីស ដាយមិន ខេធីវី / AMETHYST DIAMOND KTV (50009801)
អេស៊ា ដាយអឹម៉ិន ស៊ីធី អេនធើប្រាយ / ASIA DIAMOND CITY ENTERPRISE (50009845)
អរុណ យ៉ាណា / ARUN YANA (50009848)
អេ ដឺ ស៊ាវ ជី ឌាន / A DE XIAO CHI DIAN (50009856)
អង្គរ ផែល ប៊ូទិក / ANGKOR PAL BOUTIQUE (50009862)
អូហ្ក្រេ ឌុយ សូឡី / AUGRE DU SOLEIL (50001007)
អង្គរធំ មន្ទីរពហុព្យាបាល និង សម្ភព / ANGKOR THOM MONTY PEAHUK PYEABAL & SAMPHUB (50009905)
អាមីកា ត្រាវែល (កាំបូដ្យ) / AMICA TRAVEL (CAMBODGE) (50009916)
អង្គរ ជិនសារ​ ហ៊្វូដ / ANGKOR JINSHA FOOD (50009973)
អចលនទ្រព្យ វ៉ាន់សៀង / ACHALNAK TROP VEAN SIENG (50009986)
អាឡាដាំង សឹបផ្លាយអឺរ / ALADANG SUPPLIER (50010017)
អាន ស៊ី មីន ណាន ស៊ាវ ឈី / AN XI MIN NAN XIAO CHI (50010068)
អង្គរ មែនហ្គោ ទ្រី រីហ្សត / ANGKOR MANGO TREE RESORT (50010072)
អាហារដ្ឋាន ជៀ មីង យៀក កាយ / AHARATHAN JIA MING YUE CAI (50010073)
អូប៊ែ អេលេហ្វិន ប្លង្គ / AUBERGE ELEPHANT BLANC (50001029)
អានីស ស្តារ អ៊ិនវ៉ាយរ៉ិនម៉ិនថល ខូតធីង / ANISE STAR ENVIRONMENTAL COATINGS (50010091)
អរសាំ ហៅស៍ វីឡា / AWESOME HOUSE VILLA (50010131)
អាន វេងស្រ៊ុន ត្បូងឃ្មុំ / AN VENGSRUN TBONG KHMUM (50010150)
ភោជនីយដ្ឋាន អា យុង តា ផាយ តាំង / AH YONG TA PAI TANG RESTAURANT (50010177)
អា ឆ័ន ចុង ឆាង ធីង / A CHUAN ZHONG CHAN TING (50010232)
អាតឡូវ ខនសាល់ធីង / ARTLOW CONSULTING (50010235)
សណ្ឋាគារ អង្គរ យ៉ាយ័ន / ANGKOR YAYUAN HOTEL (50010236)
អេ.អេម ហាងបញ្ចាំ / A.M HANG BANH CHAM (50010326)
អង់ទ្រីក្ស៍ ត្រេឌើរ / ANTRIKSH TRADERS (50010335)
អង្គរ សុខសាន្តត្រាណ / ANGKOR SOKSAN TRAN (50001055)
អង្គរ ស៊ែរ ធួរ / ANGKOR SHARE TOURS (50010341)
អង្គរ សិលា ពេជ្រ / ANGKOR SEILA PICH (50001056)
អ័ង ចូង អ័ង / ANG CHOUNG ANG (50010382)
អៅ មិន ជិន ឡុង 18 / AO MEN JIN LONG 18 (50010387)
អង្គរ ភើល / ANGKOR PEARL (50001067)
ភោជនីយដ្ឋាន អូ ម៉ាស្សេ / AU MARCHE RESTAURANT (50010465)
ភោជនីយដ្ឋាន អេ អាយ / A I RESTAURANT (50010472)
អាន ស៊ី មីង ស៊័ន ជឹន ស័រ / AN XI MING XUAN ZHEN SUO (50010513)
អាម៉ាណូ ឃែរ វ៉ធើរ / AMANO CARE WATER (50010521)
អាសស្តា វ៉ធើរ / ASSDA WATER (50010555)
អង្គរ ឡូធើស អាត / ANGKOR LOTUS ART (50010576)
សណ្ឋាគារ អេផល 2 / APPLE 2 HOTEL (50010621)
អាហារដ្ឋាន ផ្ទះបាយខ្មែរ ភឿក / AHARATHAN PHTEAH BAI KHMER PERK (50001084)
អង្គរ ស៊ីវុត្ថា ហូថេល / ANGKOR SIVUTHA HOTEL (50010625)
អាទីហ្វិក ឃីតឈិនវែរ អិកសុីប៊ីសិន សេនធ័រ / ARTIFEX KITCHENWARE EXHIBITION CENTER (50010654)
សណ្ឋាគារ អង្គរ យៀ ធីម / ANGKOR YEAR THEME HOTEL (50010659)
អេប៊ីឡូស៍ ឌីជីធលអេដជ៍ / ABILOS DIGITALEDGE (50010666)
សណ្ឋាគារ អេភី 7 ដេ / AP 7 DAY HOTEL (50010700)
អង្គរ អឹមេហ្ស៊ីង ហូលីដេ & ធួរ / ANGKOR AMAZING HOLIDAY & TOURS (50001092)
អេអេហ្វអេសស៊ី ព្រីមៀម ស្ទ័រ / AFSC PREMIUM STORE (50010703)
ភោជនីយដ្ឋាន អូ រ៉ង់ដេ-វូ / AU RENDEZ-VOUS RESTAURANT (50010720)
អន្តរជាតិលេខ១ វី យា ណា / ANYARAKCHEAT LEK 1 VY YA NA (50010733)
អៅ ម៉ាយ យ៉ា លី យី មេន ជ័ង / AO MAY YA LY YIE MAN CHUANG (50010740)
អេស្រ៊ី អ៊ិនធើណេសិនណល អ៊ិនវេសម៉ិន / ATHREE INTERNATIONAL INVESTMENT (50010764)
អង្គរ សាន់ អែន គុយ គីមយ៉ា ហៅស៍ / ANGKOR SUN AND KUY KIM YA HOUSE (50010801)
អាតឡូវ ខនសាល់ធីង I / ARTLOW CONSULTING I (50010809)
អា ហ្វុង ស៊ុបភើម៉ាឃីត / AR FONG SUPERMARKET (50010826)
អាម៉ាហ្សូន អេឡិចទ្រីក សប / AMAZON ELECTRIC SHOP (50010830)
អេស៊ីអិល ប៊ី ផនសប / ACL B PAWNSHOP (50010857)
អេពិក អ៊េចឌី អេទ្បីវាទ័រ / APEX HD ELEVATOR (50010891)
អេ & អ៊ីវ៉ា ហ្វេសិន / A & EVA FASHION (50010895)
អេននី វីល ម៉ាត / ANNIE WILL MART (50010912)
ភោជនីយដ្ឋាន អានហួយ ហ័យណាន / AN HUI HUAI NAN RESTAURANT (50010941)
អេនថូនី អឹផាតម៉ិន / ANTHONY APARTMENT (50010982)
អាណែត ប៊ីយូធី បាយ តូណូ ប្រ៊េន / AH NETH BEAUTY BY TONO BRAND (50011007)
ភោជនីយដ្ឋាន អេរេប៉ាស៍ ខារ៉ាខាស៍ / AREPAS CARACAS RESTAURANT (50011033)
អង្គរ ប៊ែដមីនថុន ក្លឹប សៀមរាប / ANGKOR BADMINTON CLUB SIEMREAP (50011075)
ភោជនីយដ្ឋាន អា វ៉ាង ជឺ ប៉ាវ អ៊ី / A WANG CHEU BAO YI RESTAURANT (50011102)
ភោជនីយដ្ឋាន អាយ សាង សួយ ជូរ / AI SHANG SHUI ZHOU RESTAURANT (50011103)
អេ ធី អ៊ីនជិននារីង វើកសប / A T ENGINEERING WORKSHOP (50011140)
អចលនទ្រព្យ ទឹកមាស ធីអិម / AKCHALNAKTROP TEUK MEAS TM (50011184)
អារុណរះ រ៉ូមែន / ARUNRASS ROMAIN (50011207)
អាអយ ហូសថេល / AOI HOSTEL (50011221)
អំពូលភ្លើង ពន្លឺពត៌មាន / AMPUOULPHLEUNG PONLEU POR MEAN (50011335)
អន្លង់ធំ វ៉ធើរ / ANLONGTHOUM WATER (50011341)
អង្គរ សរ សេរី រីហ្សត / ANGKOR SOR SEREI RESORT (50011347)
ផលិតកម្ម អាថាន់ ភាពយន្ត / ATHAN FILM PRODUCTION (50011350)
អេ អាយ ជី អូ កាហ្វេ / A I G O CAFE (50011416)
អាឡេវ៉ូ / ALEVU (50011433)
អាហារសមុទ្រចំហុយ ក្តាមមាស / AHAR SAMUTH CHOMHUY KDAMMEAS (50011470)
អសនៈ ប៊ូទិក វីឡា / ASANAK BOUTIQUE VILLA (50011487)
អាហារដ្ឋាន ផ្ទះបាយខ្មែរ ភឿក I / AHARATHAN PHTEAH BAI KHMER PERK I (50011504)
អចលនទ្រព្យ យាន ឈីងប៊ូ / ACHALNAKTRORP YAN QINGBU (50011521)
ភោជនីយដ្ឋាន អា ឡាំង សាវ ខាវ / AH LAING SHAO KHAO (50011555)
អារ៉ាយ៉ា អង្គរ រេសុីដេន I / ARAYA ANGKOR RESIDENCE I (50011563)
អាស៊ីង ភ្នាក់ងារទេសចរណ៏ / AH XING PNAKNGEATESACHOR (50011586)
សណ្ឋាគារ អង្គរ បូស្គី / ANGKOR BOSKY HOTEL (50011621)
ហាងនំប័ុង អូសសីុ ហ្គួមេ / AUSSIE GOURMET BAKERY (50001188)
សណ្ឋាគារ អង្គរ ហាវជាំង / ANGKOR HAOJIANG HOTEL (50011644)
អាប៉ូឡូ រៀល អុីស្ទេត / APOLLO REAL ESTATE (50011655)
ផ្ទះសំណាក់ អរុណរះ ហេងហេង / ARUNRAS HENG HENG GUESTHOUSE (50011717)
អូតូ ហាប់ (ខេមបូឌា) / AUTO HUB (CAMBODIA) (50011772)
ផ្សារទំនើប អាយ សាំង / AI SANG SUPERMARKET (50011790)
អាណាមេន ព្រីនធីង ហៅស៍ / ANAMAN PRINTING HOUSE (50011839)
អាតម៉ាលែន រីសត៍ / ATMALAND RESORT (50011842)
អាសុី ទ្រីលីហ្គល អុិនធើណេសិនណល ស្កូល (អេ.ធី.អាយ.អេស) / ASIA TRILINGUAL INTERNATIONAL SCHOOL (A.T.I.S) (50011908)
អេ & ហ្សេត ម៉ាស្ទ័រ / A & Z MASTER (50011909)
អាយ យៀក ស៊ួយ ឡេវ ហ៊ួយ / AI YUE SHUI LIAO HUI (50011946)
អេស៊ា ម៉ាំងឃី ត្រាវែល (ខេមបូឌា) / ASIA MONKEY TRAVEL (CAMBODIA) (50011972)
អេលីស វីឡា / ALICE VILLA (50011993)
អង្គរ ប៊ែមប៊ូ ហ្វាយប៊ឺរ / ANGKOR BAMBOO FIBERS (50012035)
ភោជនីយដ្ឋាន អាដាហ្ស៊ី យេផេនីស / ADACHI JAPANESE RESTAURANT (50012044)
ភោជនីយដ្ឋាន អៃ សាង ជា ស៊ី / AI SHANG JIA SHI RESTAURANT (50012069)
អ៊ែតវេនឈ័រ សៀមរាប ​រេស៊ីដេន / ADVENTURE SIEMREAP RESIDENCE (50012135)
អាំង ប៊ីប៊ីឃ្យូ គ្រីល / ANG BBQ GRILL (50012143)
អចលនទ្រព្យ ចាសួន / AKCHALNAKTROP JASOUN (50012160)
អឹផាតម៉ិន 103 / APARTMENT 103 (50012161)
អេ.ឌី អាន ដុង អេនធើធេនមេន / A.D AN DONG ENTERTAINMENT (50012165)
អេស៊ា រីហ្វ្លេក ត្រាវែល អេន ធួរ / ASIA REFLECT TRAVEL AND TOUR (50012181)
អាណាខន់ដា តិចណូឡូជី / ANACONDA TECHNOLOGY (50012187)
អេឌីជេ គ្រីអេធីវ / ADJ CREATIVE (50012256)
អង្គរ ប៊ុន ធួរ / ANGKOR BUN TOUR (50012265)
អ៊ែមប៊ើ អង្គរ វីឡា ហូថេល អេន ស្ប៉ា / AMBER ANGKOR VILLA HOTEL AND SPA (50012311)
អៃ ឈីង ហាយ ប៊ូទីក ហូថេល / AI QING HAI BOUTIQUE HOTEL (50012317)
អនុស្សាវរីយ៍ល្អកោះរ៉ុង / ANUSAORY LAOR KOH RONG (50012340)
អង្គរ មែនហ្គោ ទ្រី រេស៊ីដេន ហូថេល / ANGKOR MANGO TREE RESIDENCE HOTEL (50012378)
សណ្ឋាគារ អេឡិចស៊ីស / ALEXIS HOTEL (50012418)
អចលនទ្រព្យ តាកទឺ / AKCHALNAK TROP TAK TEUR (50012422)
អាហារដ្ឋាន ហោ វ៉ាង សេង យ៉ាន / AHARATHAN HAO WANG SHENG YAN (50012429)
អាំង ថាណូ / AING TANO (50012439)
អាយសាង កុំព្យូទ័រ ណេត សប / AISANG COMPUTER NET SHOP (50012501)
អង្គរកាតាលីណា ប៊ូទីកវីឡា / ANGKOR CATALINA BOUTIQUE VILLA (50012509)
អេឌវ៉ាន់ឈ័រ ដូម រីសត / ADVENTURE DOME RESORT (50012521)
អេ.ភី.ធី.ធី ត្រាវែល & ធួរ / A.P.T.T TRAVEL & TOURS (50012528)
អឹផាតម៉ិន វីភី លី ហេង / APARTMENT VP LY HENG (50012540)
អាហារដ្ឋាន វីភី លី ហេង / AHARATHAN VP LY HENG (50012541)
អាដូរាស៍ ស្ប៉ា / ADORASS SPA (50012593)
ភោជនីយដ្ឋាន អាន់ណាពួរណា ឥណ្ឌាន / ANNAPURNA INDIAN RESTAURANT (50012620)
អវតាន ឆាតធើរ បាស / AVATAN CHARTER BUS (50012640)
អាយសាង កុំព្យូទ័រ ណេត សប សង្កាត់ 3 / AISANG COMPUTER NET SHOP SANGKAT 3 (50012660)
អាមីលីយ៉ា ប៊ីយូធី ខសមេធីក / AMILIYA BEAUTY COSMETICS (50012679)
អង្គរ ស្ក្រាប់ មរកត & ស្ប៉ា / ANGKOR SCRUB MOROKOT & SPA (50012680)
អេស៊ាន II ហាងបញ្ចាំ / ASIAN II HANG BANHCHAM (50012681)
អាហារដ្ឋាន ម្លប់សុបិន្ត / AHARATHAN MLOB SOBEN (50012682)
អា លីលី ប៊្រែន ប៊ីយូធី / AH LYLY BRAND BEAUTY (50012729)
អង្គរ ម៉ូនូមែន ហូថេល / ANGKOR MONUMENT HOTEL (50012733)
អេភីខេអ សៀមរាបអង្គរ ខនទ្រីសាយ / APKR SIEM REAP ANGKOR COUNTRY SIDE (50012752)
អង្គរ អុីន ហ្វ័រ ហូមស្តេ / ANGKOR UNE FOIS HOMESTAY (50000130)
អង្ករ ស្ថាបនិក / ANGKOR SATHABNEK (50001312)
អេ1 រីសាយខល / A1 RECYCLE (50012850)
អាដ័រ មី / ADORE ME (50012890)
អាវ គឹមហយ / AV KOEMHORY (50012896)
អឹផាតម៉ិន 19 / APARTMENT 19 (50012900)
អានខិន ខា រីភែ / ANKEN CAR REPAIR (50012909)
សណ្ឋាគារ អង្គរ សេនត្រល់ ផាក & ស្ប៉ា / ANGKOR CENTRAL PARK HOTEL & SPA (50012921)
អង្គរ ម៉ាវេលើស ត្រាវែល / ANGKOR MARVELLOUS TRAVEL (50012945)
អេនជូលី វីឡា / ANJULIE VILLA (50013023)
អាហារដ្ឋាន សំ សំអឿន ភូមិគៀនត្រជាក់ចិត្ត / AHARATHAN SAM SAM OEUN PHOM KEAN TRACHEAK CHET (50013043)
អេនី ផ្ទះ / ANNE PHTEASH (50013049)
អាបូរេស្ត ហូសថេល / ABOREST HOSTEL (50013073)
អេ.វី.អេស.អាយ.អេស.ប៊ី (ខេមបូឌា) / A.V.S.I.S.B (CAMBODIA) (50013098)
អា ហ៊ុយ ផាយ តាង / AR HUI PAI DANG (50013141)
អង្គរ ក្រូសង់ បេកឃើរី / ANGKOR CROISSANT BAKERY (50013147)
អាំង បានភាគ្យ / AING BANPHEAK (50001347)
អេនដ្រូ សម / ANDREW SAM (50013301)
ភោជនីយដ្ឋាន អារ៉ាប៊ីក សាវ៉ារម៉ា / ARABIC SHAWARMA RESTAURANT (50013339)
អាហារដ្ឋាន ថៅ យាន ទៅ ខៅ អ៊ី / AHARATHAN THAO YEAN TOUV KHAO YI (50013351)
អាហារដ្ឋាន យ៉ាន ជា ជឺ ប៉ៅ អ៊ី / AHARATHAN YAN CHEA CHEU PAO YI (50013352)
អាហារដ្ឋាន ចិន ឈីង អាយ ខៅ ប៉ា / AHARATHAN CHEN CHHING AI KHAO PA (50013353)
អាហារដ្ឋាន អ័រ អាយ អ័រ ជា ស៊ន់ ឆៃ អ៊ី / AHARATHAN OR AI OR CHEA SHUN CHHAI YI (50013354)
អាហារដ្ឋាន ចិន វួយ សា ស៊ាន សាវ ឈឺ / AHARATHAN CHEN WUY SHA SEAN SAV CHHER (50013355)
អាង-យី ប៊ីយូធី សប / ANG-YI BEAUTY SHOP (50013361)
ភោជនីយដ្ឋាន អាន់ណាពួរណា I ឥណ្ឌាន / ANNAPURNA I INDIAN RESTAURANT (50013381)
អឹឡាយអិន ត្រេនស្លេសិន រីសសស៍ / ALLIANCE TRANSLATION RESOURCE (50013407)
អង្គរ អ៉ិមប្រេស ឡេថិច / ANGKOR EMBRACE LATEX (50001377)
អេ.អ.វ៉ាយ ខសមេធីក / A.R.Y COSMETIC (50013492)
អេស៊ា ម៉ាឃីត ផាប់ស្ទ្រីត / ASIA MARKET PUB STREET (50001395)
អឹឈីវើ អិុនធើណេសិនណល ស្គូល / ACHIEVERS INTERNATIONAL SCHOOL (50001513)
អាតវ៉កស៍ ស្ទូឌីយោ (ខេមបូឌា) / ARTWORX STUDIOS (CAMBODIA) (50001559)
អេឃ្វើរៀស ហូថេល អេន អឺបេន រីហ្សត / AQUARIUS HOTEL AND URBAN RESORT (50001601)
អេមីធី អង្គរ សឺវីស / AMITY ANGKOR SERVICE (50001604)
អាទីឡា អង្គរ / ATTILA ANGKOR (50001613)
ផ្ទះសំណាក់ អង្គរ ភ្នំមាស / ANGKOR PHNOM MEAS GUESTHOUSE (50001646)
អង្គរ សេនលីន ត្រាវែល / ANGKOR SENLIN TRAVEL (50001659)
អាប៊ែនឌែន ឡៃ អុិនធើណេសិនណល​ ស្គូល / ABUNDANT LIFE INTERNATIONAL SCHOOL (50001664)
សណ្ឋាគារ អង្គរ ប៊ូធីក ត្រភីក / ANGKOR BOUTIQUE TROPIC HOTEL (50001668)
អង្គរ រង់ដេវូ វីឡា / ANGKOR RENDEZVOUS VILLA (50001703)
អង្គរ ហាត បឹងហ្គាឡូ / ANGKOR HEART BUNGALOW (50001751)
អេស៊ា អិចផ្លរើ ត្រាវែល / ASIA EXPLORER TRAVEL (50001758)
អាស្ត្រូ អេស៊ា ធួរ & ត្រាវែល សឺវីស / ASTRO ASIA TOUR & TRAVEL SERVICES (50001767)
អូន សុគន្ធា / AUN SOKUNTHEA (50001796)
អង្គរដេសធីណេសិនត្រាវែល / ANGKOR DESTINATION TRAVEL (50001799)
អង្គរ ធួរ បាស ត្រេនស្ព័តថេសិន / ANGKOR TOUR BUS TRANSPORTATION (50001809)
អង្គរ អ៉ិនធើ ហ្វីតណេស / ANGKOR INTER FITNESS (50001836)
អ៊ែតមឺស៊្វែរ រេសថឹរេន / ATMO SPHERE RESTAURANT (50001849)
អេស៊ា ជេមអាប់ ត្រាវែល / ASIA JAMUP TRAVEL (50001901)
អ៊ែបសឹលូត ធួរ & ត្រាវែល (ខេមបូឌា) / ABSOLUTE TOURS & TRAVEL (CAMBODIA) (50001903)
អង្គរ អេ ណាទួរ វ៉ូយ៉ាហ្ស៍ / ANGKOR ET NATURE VOYAGE (50001942)
អង្គរ ម៉ាល់ធីម៉ីឌៀ / ANGKOR MULTIMEDIA (50001949)
អេ.អេស.អេច ត្រាវែល I / A.S.H TRAVEL I (50001955)
អេអេ ប្រាយវេត ជើនី អេន វីឡា / AA PRIVATE JOURNEYS AND VILLAS (50001966)
អង្គរ ថោន ថ្មី / ANGKOR TOWN THMEY (50001990)
អឹឃូស្ទីក អេនធើធេនមិុន / ACOUSTIC ENTERTAINMENT (50002025)
អង្គរ បុរីប្រិមប្រិយ៍ ម៉ាស្សា ខ្មែរ បុរាណ / ANGKOR BOREIBREMBREI MASSAGE KHMER BORAN (50000204)
អប្សរា ស៊ីល សេនធ័រ / APSARA SILK CENTER (50002122)
អ៊ែតវេនឈ័រ អេស៊ា ត្រាវែល & ធួរ / ADVENTURE ASIA TRAVEL & TOURS (50002161)
អង្គរ ហ្វីលីង ប៊ូទីក / ANGKOR FEELING BOUTIQUE (50002206)
សណ្ឋាគារ អេ-វើខេសិន ប៊ូធីក / A-VACATION BOUTIQUE HOTEL (50002210)
អង្គរ ដឹ ថេន ប៊ែលស៍ / ANGKOR THE TEN BELLS (50002218)
អេមប្រាយ ថេលើរ / AMPRISE TAILOR (50002223)
អេស៊ា​ ហេបភី​ វីឡា / ASIA HAPPY VILLA (50002232)
អេ សិដ្ឋ អូតូ / A SETH AUTO (50002321)
អង្គរទិព្វ លីគង កាត់ដេរ / ANGKOR TIP LYKORNG KATDE (50002374)
I am not sure why you want to mimic the POST request when you can automate the flow with Selenium. Since you are already using Selenium, you can automate it without dealing with the underlying complexity of preparing raw requests and passing parameters.
With Selenium it is quite easy to automate the whole flow, unless:
You ultimately want to use Python requests to avoid the browser's load time.
You expect the site's HTML to change frequently (though the POST request can change too, and even a small parameter change can break it).
If either of these two reasons applies, or you have a third one, please don't accept this as an answer; update the question and I will remove it.
The code below automates the flow. I generated it with Selenium IDE, then modified it to use labels instead of hardcoded XPath combinations and added visibility and clickability checks.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://www.businessregistration.moc.gov.kh/")
driver.maximize_window()
# Navigate: Online Services -> Sole Proprietorships -> Search Entity
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(text(),'Online Services')]"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Sole Proprietorships']"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@class='appSubMenuLink menu-brSoleProprietorSearch']/span[text()='Search Entity']"))).click()
# Wait for the search box to become visible, type the query, then click Search
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "QueryString")))
driver.find_element(By.ID, "QueryString").send_keys("a")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[@class='appReceiveFocus' and text()='Search']"))).click()
time.sleep(10)  # keep the results on screen briefly before closing the browser
driver.quit()
Of course, it doesn't cover extracting data from the grid after the search. Tested with chromedriver 80.0 on Ubuntu.

Python webscraper only returns 'None'

I am trying to write a program to scrape eBay to show me an inventory of products, but all I get printed to the terminal is 'None'. I am expecting a title of the product.
I've changed my code to 'find' different keywords, but the output is always the same: 'None'. Usually that means it can't find the specified keyword, but the keywords I enter are shown on the web page and in 'inspect element'.
Here's a snippet:
def get_detail_data(soup):
    try:
        title = soup.find('div', class_="it-ttl").find('itemprop')
    except:
        title = ''
I am trying to fetch only the results that correspond with the keyword I enter into the program.
Probably the selector is wrong. I was able to extract titles from eBay HTML with this code.
from bs4 import BeautifulSoup
import requests
import os
params = {"_nkw": "coffee"}
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36 OPR/66.0.3515.103"
}
proxies = {"http": os.getenv("PROXY_URL")}
# r = requests.get("https://www.ebay.com/sch/i.html", params=params, headers=headers, proxies=proxies)
# html = r.content
html = open("ebay.html", "r").read()
soup = BeautifulSoup(html, "lxml")
titles = [title.get_text() for title in soup.select("h3.s-item__title")]
print(titles)
Output
['Custom Photo Mug Coffee Cup Picture Photo Text Phrase Logo Personalized Gift', 'Automatic Self Stirring Coffee Mixing Cup Mug 400ml Electric Stainless Steel NEW', 'Grinch Cup Of Fuckoffee One Splash No One Cares Gift Black Coffee Mug', '6 Pack Stainless Steel Coffee Soup Mug Tumbler Camping Mug Cup 12oz ', 'BULLET THERMOS 16oz Coffee Hunting Hot Insulated Stainless Steel Cup Mug Flask', 'Lot of 1-48 Pack Stainless Steel Coffee Soup Mug Tumbler Camping Mug Cup 16 oz ', 'Custom Magic Coffee Cup Personalized Image Photo Text Color Changing Mug Gift', '400ml Automatic Self Stirring Mug Coffee Cup Mixer Tea Home Insulated Green', 'Set of 2 Strong Glass Double Wall Coffee Mug Tea Espresso Cup 16oz with Wood Lid', 'Funny Mug Donald Trump 45 Coffee Mug President Gun Right Cup Political Gifts Mug', 'Custom Photo Mug Coffee Cup Picture Photo Text Phrase Logo Personalized Gift 15', 'Trump 2020 Keep America Great Office Work Cup Gift Coffee Tea Ceramic Mug', 'New ListingNew Custom Red coffee cup set', 'Green Self Stirring Mug Coffee Cup Tea Auto Mixer Drink Insulate Stainless 400ml', 'Have A Nice Day Coffee Mug Middle Finger Funny Cup for Milk, Juice, Tea, Coffee ', 'Custom Magic Coffee Cup Personalized Image Photo Text Heart Handle Inner Colour', 'New ListingJoe Biden & Harris 46th President Inauguration Day 2021 Coffee Mug Gifts Cup', 'Travel Coffee Mug Stainless Steel Lid Tea Drink Tea Cup Handle Double Wall 10 Oz', 'MAGA Make America Great Again 45th President Donald Trump Coffee Mug Cup', 'Black Pistol Gun Mug Coffee Tea Beverage Cup Novelty Handgun Grip Joke Gift', 'Auto Mixing coffee cup Stainless Electric Lazy Self Stirring Mug Tea Mug', 'Personalized Coffee Mug Custom Photo Text Logo Name Printed Gift Ceramic Cup ', 'Baby Yoda The Child Mandalorian Mug No Coffee No Forcee Meme Ceramic ( Black)', 'HOT & Cold Double Insulated Self Stirring Funny Joke Mug Electric Coffee Cup WS', '400ML Camera Lens Cup Coffee Travel Mug Thermos Stainless Steel Leak-Proof Lid', 'Trump 
2020 Keep America Great - Ceramic Coffee Mug Tea Cup', 'Mug Set of 5, Coffee Cup with Accent Spoon Tea Cups 12 Oz. Porcelain Mugs Black', 'MAGA Make America Great Again 45th President Donald Trump Coffee Mug Cup', 'Starbucks Style 18oz Stainless Steel Thermal Insulated Chubby Coffee Mug Cup', 'To My Wife Gift Mug - Never Forget That I Love You - Coffee Mug Tea Cup Gift', 'Donald Trump Twitter Ban Funny Gift White Coffee Mug 11Oz 15Oz', "New ListingFunny Valentine's Coffee Mug for Men - Thanks for All the Coffee Mug - Valentine", 'Bruntmor Porcelain Coffee Cups Mugs Set of 6 Large sized 12 Ounce Matte Black', 'New ListingRare! MAC MILLER Coffee Mug Cup Tea 11oz ', 'New ListingGifts For Husband - You Are My Everything Coffee Mug - Meaningful Gift Idea ', 'New ListingCeramic Coffee Tea Cup Gift, Funny Coffe Mug Gift Mug Coffee Tea Cup ', 'I love catturd mug Perfect gift for valentine day, Best gift ever', 'New ListingWorlds Best Boss Mug 12oz Manager Gift Supervisor Coworker Coffee Cup', 'Maxam 15oz Double Wall Stainless Steel Coffee Cup', 'To My Gorgeous Wife Meeting You Was Fate Black Ceramic 11oz 15oz Coffee Tea Cup', 'New ListingFunny Naruto Coffee Tea Cup Mug', 'New ListingHallmark Signature Sunflower Coffee Cup Mug Always', 'New ListingInspired ! 
Kiss Band Ricks White Tea Coffee Mug Cup 11oz', '12V Electric Heated Travel Mug Stainless Steel Coffee Tea Cup Warmer Heater NEW', 'Stainless Steel Insulated Double Wall Travel Coffee Mug Cup 14 Oz Thermos Tea !!', 'New ListingJoe Biden -Trump Coffee Mug Cup 11oz Tea Coffee', 'New ListingA Ceramic Coffee Mug Beverage Cup ..."Shopping Solves Everything" ', 'The Big Guy Coffee Mug Joe Biden Sunglasses Red White Blue Election Coffee Cup', 'Bruntmor Ceramic Stacking Coffee Mug Set Tea Cup Set of 6 Matte Black 18 Ounce', 'New ListingThankful Porcelain Ceramic Thanksgiving and Fall Coffee Mug', 'To My Husband Never Forget That I Love You If Could Give One Gift Coffee Mug', 'Stainless Steel Insulated Travel Coffee Mug CUP 16 OZ NEW! PURPLE color', 'Funny Parody Middle Finger Coffee Mug Home Office Ceramic Porcelain Tea Cup Gift', '2pk Donut Warming White Ceramic Coffee Mug Set 20oz Cup Drip Trap Mustache Guard', 'Firefly Serenity Shiny Coffee Mug Tea Cup Novelty Gift Mugs', 'Polka Dot Coffee Cups Mugs Set of 6 Large sized 16 Ounce Ceramic Coffee Mug Set', 'Customized Prescription Bottle Coffee Mug Tea Cup', 'Camera Lens 24-105mm Travel Coffee Mug / Cup with Drinking Lid Best Gift', "New ListingCeramic Coffee Mug with Joe Biden's Big Cup Campaign Design for Presidents 11oz", 'Stainless Steel Double Wall Mug Cup Portable Travel Coffee Tea Cups 500/350ml', 'New ListingCalm The Hell Down Coffee Mug', 'Ceramic Coffee Cups Mugs Set of 6 Large-sized 16 Ounce Grooved Mugs Matte Black', "Big Sister Mug - Big Sister Gift - God Created My World's Best Funny Coffee Cup"]
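A side note on the snippet in the question: in BeautifulSoup, find('itemprop') looks for a tag named itemprop, not for an attribute, so it returns None even when the attribute is present. The function in the question also never returns title, so printing its result would show None either way. Below is a minimal sketch of both points. SAMPLE_HTML is made-up markup, not real eBay HTML, and get_title is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page structure may differ.
SAMPLE_HTML = """
<div class="it-ttl"><h1 itemprop="name">Stainless Steel Coffee Mug</h1></div>
"""


def get_title(soup):
    """Return the text of the element with itemprop="name", or '' if it is missing."""
    container = soup.find("div", class_="it-ttl")
    if container is None:
        return ""
    # Match on the attribute; find("itemprop") would search for an <itemprop> tag.
    node = container.find(attrs={"itemprop": "name"})
    return node.get_text(strip=True) if node else ""


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The lookup from the question: there is no <itemprop> tag, so this is None.
wrong = soup.find("div", class_="it-ttl").find("itemprop")

title = get_title(soup)
print(wrong)
print(title)
```

The explicit return is what lets the caller print the title; without it, the function's result is None regardless of which selector is used.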

Scrape Google News with lxml and python

I'm trying to scrape Google News using python and lxml. Everything is going well, but when I try to print each div's data in a for loop, everything gets messed up.
Here is my code:
# -*- coding: utf-8 -*-
from stem import Signal
from stem.control import Controller
from lxml import html
from lxml import cssselect
from lxml import etree
import requests
proxies = {
    'http' : 'http://127.0.0.1:8123'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}
url = "https://www.google.it/search?hl=en&tbm=nws&as_occt=any&tbs=cdr:1,cd_min:9/1/2014,cd_max:9/1/2014,sbd:1&as_nsrc=Daily%20Mail&start=0"
page = requests.get(url,proxies=proxies,headers=headers)
tree = html.fromstring(page.content)
results = tree.xpath('//div[@class="_cnc"]')
for div in results:
    print(div)
I get this output:
<Element div at 0x7f4154df9470>
<Element div at 0x7f4154df94c8>
<Element div at 0x7f4154df9520>
<Element div at 0x7f4154df9578>
<Element div at 0x7f4154df95d0>
<Element div at 0x7f4154df9628>
<Element div at 0x7f4154df9680>
<Element div at 0x7f4154df96d8>
<Element div at 0x7f4154df9730>
<Element div at 0x7f4154df9788>
I would like to extract from each div -> title, href and snippet, with something like this:
....
for div in results:
    title = div.xpath('//a[@class="l _HId"]/text()')
    href = div.xpath('//a[@class="l _HId"]/@href')
    snippet = div.xpath('//div[@class="st"]/text()')
    #for example
    print(title)
....
When I try to print, I get the same output multiple times:
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
Does anyone know what's wrong with my code?
You are almost there - just prepend the dots to the inner XPath expressions to make them specific to the context of the current node:
for div in results:
    title = div.xpath('.//a[@class="l _HId"]/text()')
    href = div.xpath('.//a[@class="l _HId"]/@href')
    snippet = div.xpath('.//div[@class="st"]/text()')
    #for example
    print(title)
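To see the difference without hitting Google, here is a small self-contained sketch. SAMPLE is made-up markup that only reuses the class names from the question; it is not the real results page. An expression starting with // searches from the document root even when called on a child element, while .// restricts the search to that element's descendants.

```python
from lxml import html

# Hypothetical markup reusing the class names from the question.
SAMPLE = """
<html><body>
  <div class="_cnc"><a class="l _HId" href="/a">First</a><div class="st">Snippet A</div></div>
  <div class="_cnc"><a class="l _HId" href="/b">Second</a><div class="st">Snippet B</div></div>
</body></html>
"""

tree = html.fromstring(SAMPLE)
results = tree.xpath('//div[@class="_cnc"]')

# '//' ignores the context node: every div yields every title in the document.
absolute = [div.xpath('//a[@class="l _HId"]/text()') for div in results]

# './/' is relative to the current div: each div yields only its own title.
relative = [div.xpath('.//a[@class="l _HId"]/text()') for div in results]
hrefs = [div.xpath('.//a[@class="l _HId"]/@href') for div in results]

print(absolute)
print(relative)
print(hrefs)
```

With the absolute form, each iteration repeats the full list of titles, which matches the duplicated output shown in the question.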
