I have this string and want to turn it into two arrays: one holding the film titles and the other holding the years. The positions in the two arrays need to correspond with each other. Is there a way to do this?
films = ("""Endless Love (1981), Top Gun (1986), The Color of Money (1986), Rain Man (1988),
Born on the Fourth of July (1989), Interview with the Vampire: The Vampire Chronicles (1994),
Mission: Impossible (1996), Jerry Maguire (1996), The Matrix (1999), Mission: Impossible II (2000),
Vanilla Sky (2001), Cocktail (1988), A Few Good Men (1992), The Firm (1993), Eyes Wide Shut (1999),
Magnolia (1999), Minority Report (2002), Austin Powers in Goldmember (2002), Days of Thunder (1990),
The Powers of Matthew Star (1982), Cold Mountain (2003), The Talented Mr. Ripley (1999),
War of the Worlds (2005), The Oprah Winfrey Show (1986), Far and Away (1992), Taps (1981),
The Last Samurai (2003), Valkyrie (2008), Jack Reacher (2012), Edge of Tomorrow (2014),
Enemy of the State (1998), Mission: Impossible III (2006), Crimson Tide (1995), Reign Over Me (2007),
Batman Forever (1995), Batman Begins (2005), The Simpsons (1989), The Simpsons: Brother from the Same Planet (1993),
The Simpsons: When You Dish Upon a Star (1998), End of Days (1999), House of D (2004), The Indian Runner (1991),
Harry & Son (1984), Mission: Impossible - Ghost Protocol (2011), Aladdin (1992), Pacific Rim (2013),
Oblivion (2013), Knight and Day (2010),
""")
First split the input string on comma to generate a list, then use comprehensions to get the title and year as separate lists.
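import re
# Split on commas; strip() plus the filter drops the empty item left by the trailing comma
films_list = [x for x in re.split(r',\s*', films.strip()) if x]
# \s+ (not \s*) so the pattern cannot also match an empty string right before "("
titles = [re.split(r'\s+(?=\(\d+\))', x)[0] for x in films_list]
years = [re.split(r'\s+(?=\(\d+\))', x)[1] for x in films_list]  # e.g. '(1981)', parentheses included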
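As a rough alternative sketch (not part of the answer above), `re.findall` with two capture groups can pull out title and year in a single pass. The shortened `sample` string and the four-digit-year pattern are assumptions made for illustration.

```python
import re

# Shortened sample in the same shape as the question's string (assumed for illustration)
sample = """Endless Love (1981), Top Gun (1986),
Mission: Impossible (1996), Knight and Day (2010),
"""

# Group 1: the title (lazy, up to the whitespace before the year); group 2: a four-digit year
pairs = re.findall(r'\s*(.+?)\s*\((\d{4})\),?', sample)

# Unzip the (title, year) pairs into two parallel lists
titles = [title for title, _ in pairs]
years = [year for _, year in pairs]

print(titles)
print(years)
```

Because both lists are built from the same `pairs`, `titles[i]` and `years[i]` always refer to the same film.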
Tim's answer works well. Here is an alternative for anyone who would like to solve the problem without using regex.
a = films.split(",")
years = []
for i in a:
    if "(" in i:  # skip the blank piece left after the trailing comma
        years.append(i[i.find("(")+1:i.find(")")])
The same approach can be applied to the titles.
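A minimal sketch of that idea, assuming the title is simply everything before the opening parenthesis. The short `sample` string is made up for illustration and stands in for `films`.

```python
# Shortened sample in the same shape as the question's string (assumed for illustration)
sample = "Endless Love (1981), Top Gun (1986),\nRain Man (1988),\n"

titles = []
years = []
for item in sample.split(","):
    start = item.find("(")
    if start == -1:
        continue  # skip the blank piece left after the trailing comma
    titles.append(item[:start].strip())           # text before "(" is the title
    years.append(item[start + 1:item.find(")")])  # text between "(" and ")" is the year

print(titles)
print(years)
```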
You can do something like this (without any kind of import or extra module needed, or regex complexity):
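delimiter = ", "
# Turn line breaks into spaces and drop the trailing comma so every film splits cleanly
movies_with_year = films.replace("\n", " ").strip().rstrip(",").split(delimiter)
movies = []
years = []
for movie_with_year in movies_with_year:
    movie = movie_with_year[:-6]  # everything before the 6-character "(YYYY)" suffix
    year = movie_with_year[-6:].replace("(","").replace(")","")
    movies.append(movie)
    years.append(year)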
This script will result in something like this:
movies : ['Endless Love ', ...]
years : ['1981', ...]
You should remove all newline characters (\n) and use try/except to skip over the problem with the last element.
films = ("""Endless Love (1981), Top Gun (1986), The Color of Money (1986), Rain Man (1988),
Born on the Fourth of July (1989), Interview with the Vampire: The Vampire Chronicles (1994),
Mission: Impossible (1996), Jerry Maguire (1996), The Matrix (1999), Mission: Impossible II (2000),
Vanilla Sky (2001), Cocktail (1988), A Few Good Men (1992), The Firm (1993), Eyes Wide Shut (1999),
Magnolia (1999), Minority Report (2002), Austin Powers in Goldmember (2002), Days of Thunder (1990),
The Powers of Matthew Star (1982), Cold Mountain (2003), The Talented Mr. Ripley (1999),
War of the Worlds (2005), The Oprah Winfrey Show (1986), Far and Away (1992), Taps (1981),
The Last Samurai (2003), Valkyrie (2008), Jack Reacher (2012), Edge of Tomorrow (2014),
Enemy of the State (1998), Mission: Impossible III (2006), Crimson Tide (1995), Reign Over Me (2007),
Batman Forever (1995), Batman Begins (2005), The Simpsons (1989), The Simpsons: Brother from the Same Planet (1993),
The Simpsons: When You Dish Upon a Star (1998), End of Days (1999), House of D (2004), The Indian Runner (1991),
Harry & Son (1984), Mission: Impossible - Ghost Protocol (2011), Aladdin (1992), Pacific Rim (2013),
Oblivion (2013), Knight and Day (2010),
""")
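movies = []
years = []
for item in films.replace("\n", "").split("),"):
    try:
        # The empty last item cannot be unpacked into two parts and raises ValueError
        title, year = item.rsplit(" (", 1)
    except ValueError:
        continue
    movies.append(title.strip())
    years.append(year)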
I am attempting to extract all the events from a wiki article on a date, such as May 9 (for example), and have all those events in a one-column dataframe while also ignoring the <h3> tag sub-headings Pre-1600, 1601–1900, 1901–present. All those events in those subsections should just be concatenated together into one column seamlessly.
I also want to ignore the other sections such as births, deaths, etc which are denoted in <h2> tag as well. So, only the events section is being extracted. The <h2> tag/section of interest is the second in the list as seen here.
import requests, itertools, re
from bs4 import BeautifulSoup as soup
d = soup(requests.get('https://en.wikipedia.org/wiki/May_9').text, 'html.parser')
h2 = d.find_all("h2")
h2
[<h2 id="mw-toc-heading">Contents</h2>,
<h2><span class="mw-headline" id="Events">Events</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span>edit<span class="mw-editsection-bracket">]</span></span></h2>,
<h2><span class="mw-headline" id="Births">Births</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span>edit<span class="mw-editsection-bracket">]</span></span></h2>,
<h2><span class="mw-headline" id="Deaths">Deaths</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span>edit<span class="mw-editsection-bracket">]</span></span></h2>,
<h2><span class="mw-headline" id="Holidays_and_observances">Holidays and observances</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span>edit<span class="mw-editsection-bracket">]</span></span></h2>,
<h2><span class="mw-headline" id="References">References</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span>edit<span class="mw-editsection-bracket">]</span></span></h2>,
<h2><span class="mw-headline" id="External_links">External links</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span>edit<span class="mw-editsection-bracket">]</span></span></h2>,
<h2>Navigation menu</h2>]
I'm struggling with constructing a function that selects the Events section and then the subsequent <li> tags but ignores the subheadings and the other sections.
I've attempted to separate out the <h2> sections with
data = [[i.name, i] for i in d.find_all(re.compile('h2|ul'))]
new_data = [[a, list(b)] for a, b in itertools.groupby(data, key=lambda x:x[0] == 'h2')]
But I'm stuck at this point. If there is a better approach, I'm happy to use it.
You can use .find_previous to check if previous <h2> is the Events heading:
import requests
from bs4 import BeautifulSoup
url = "https://en.wikipedia.org/wiki/May_9"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
for li in soup.select("h3 + ul > li"):
    if (h2 := li.find_previous("h2")) and (h2.find(id="Events")):
        date, event = li.text.replace("–", "-").split(" - ", maxsplit=1)
        print("{:<10} {}".format(date, event))
Prints:
0328 Athanasius is elected Patriarch of Alexandria.[1]
1009 Lombard Revolt: Lombard forces led by Melus revolt in Bari against the Byzantine Catepanate of Italy.
1386 England and Portugal formally ratify their alliance with the signing of the Treaty of Windsor, making it the oldest diplomatic alliance in the world which is still in force.
1450 'Abd al-Latif (Timurid monarch) is assassinated.
1540 Hernando de Alarcón sets sail on an expedition to the Gulf of California.
1662 The figure who later became Mr. Punch makes his first recorded appearance in England.[2]
1671 Thomas Blood, disguised as a clergyman, attempts to steal England's Crown Jewels from the Tower of London.
1726 Five men arrested during a raid on Mother Clap's molly house in London are executed at Tyburn.
1864 Second Schleswig War: The Danish navy defeats the Austrian and Prussian fleets in the Battle of Heligoland.
1865 American Civil War: Nathan Bedford Forrest surrenders his forces at Gainesville, Alabama.
1865 American Civil War: President Andrew Johnson issues a proclamation ending belligerent rights of the rebels and enjoining foreign nations to intern or expel Confederate ships.
1873 Der Krach: Vienna stock market crash heralds the Long Depression.
1877 Mihail Kogălniceanu reads, in the Chamber of Deputies, the Declaration of Independence of Romania. This day became the Independence Day of Romania.
1901 Australia opens its first national parliament in Melbourne.
1911 The works of Gabriele D'Annunzio are placed in the Index of Forbidden Books by the Vatican.
1915 World War I: Second Battle of Artois between German and French forces.
1918 World War I: Germany repels Britain's second attempt to blockade the port of Ostend, Belgium.
1920 Polish-Soviet War: The Polish army under General Edward Rydz-Śmigły celebrates its capture of Kiev with a victory parade on Khreshchatyk.
1926 Admiral Richard E. Byrd and Floyd Bennett claim to have flown over the North Pole (later discovery of Byrd's diary appears to cast some doubt on the claim.)
1927 Old Parliament House, Canberra officially opens.[3]
1936 Italy formally annexes Ethiopia after taking the capital Addis Ababa on May 5.
1941 World War II: The German submarine U-110 is captured by the Royal Navy. On board is the latest Enigma machine which Allied cryptographers later use to break coded German messages.
1942 The Holocaust in Ukraine: The SS executes 588 Jewish residents of the Podolian town of Zinkiv (Khmelnytska oblast. The Zoludek Ghetto (in Belarus) is destroyed and all its inhabitants executed or deported.
1945 World War II: The final German Instrument of Surrender is signed at the Soviet headquarters in Berlin-Karlshorst.
1946 King Victor Emmanuel III of Italy abdicates and is succeeded by Umberto II.
1948 Czechoslovakia's Ninth-of-May Constitution comes into effect.
1950 Robert Schuman presents the "Schuman Declaration", is considered by some people to be the beginning of the creation of what is now the European Union.
1955 Cold War: West Germany joins NATO.
1960 The Food and Drug Administration announces it will approve birth control as an additional indication for Searle's Enovid, making Enovid the world's first approved oral contraceptive pill.
1969 Carlos Lamarca leads the first urban guerrilla action against the military dictatorship of Brazil in São Paulo, by robbing two banks.
1974 Watergate scandal: The United States House Committee on the Judiciary opens formal and public impeachment hearings against President Richard Nixon.
1979 Iranian Jewish businessman Habib Elghanian is executed by firing squad in Tehran, prompting the mass exodus of the once 100,000-strong Jewish community of Iran.
1980 In Florida, United States, Liberian freighter MV Summit Venture collides with the Sunshine Skyway Bridge over Tampa Bay, making a 1,400-ft. section of the southbound span collapse. Thirty-five people in six cars and a Greyhound bus fall 150 ft. into the water and die.
1980 In Norco, California, United States, five masked gunmen hold up a Security Pacific bank, leading to a violent shoot-out and one of the largest pursuits in California history. Two of the gunmen and one police officer are killed and thirty-three police and civilian vehicles are destroyed in the chase.
1987 LOT Flight 5055 Tadeusz Kościuszko crashes after takeoff in Warsaw, Poland, killing all 183 people on board.
1988 New Parliament House, Canberra officially opens.[3]
1992 Armenian forces capture Shusha, marking a major turning point in the First Nagorno-Karabakh War.
1992 Westray Mine disaster kills 26 workers in Nova Scotia, Canada.
2001 In Ghana, 129 football fans die in what became known as the Accra Sports Stadium disaster. The deaths are caused by a stampede (caused by the firing of tear gas by police personnel at the stadium) that followed a controversial decision by the referee.
2002 The 38-day stand-off in the Church of the Nativity in Bethlehem comes to an end when the Palestinians inside agree to have 13 suspected terrorists among them deported to several different countries.[4]
2017 US President Donald Trump fires FBI Director James Comey.[5]
2018 The historic defeat for Barisan Nasional, the governing coalition of Malaysia since the country's independence in 1957 in 2018 Malaysian general election.
2020 The COVID-19 recession causes the U.S. unemployment rate to hit 14.9 percent, its worst rate since the Great Depression.[6]
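The replace/split step in the loop above can be tried offline. The sketch below wraps it in a hypothetical helper, `split_event`, and collects the results as rows instead of printing them directly; the two sample strings are assumptions standing in for `li.text`.

```python
def split_event(text):
    """Split 'YEAR – event' into (year, event), mirroring the replace/split step above."""
    year, event = text.replace("–", "-").split(" - ", maxsplit=1)
    return year.strip(), event.strip()

# Sample strings assumed for illustration; real input would be li.text from the page
samples = [
    "1009 – Lombard Revolt: Lombard forces led by Melus revolt in Bari.",
    "1955 – Cold War: West Germany joins NATO.",
]

rows = [split_event(s) for s in samples]
for year, event in rows:
    print("{:<10} {}".format(year, event))
```

A list of tuples like `rows` could then be passed to `pandas.DataFrame(rows, columns=[...])` if a dataframe is wanted, as in the question.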
I'm trying to append a string of words into a list however when I try to index that list, it gives back individual letters.
For example:
url = 'https://almostginger.com/famous-movie-locations/'
titles = []
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')
titles = soup.select('h3')
for t in titles:
    tt = t.text.strip()
    for s in range(len(tt)):
        print(s)
Shows that only individual letters are indexed, whereas if I'm trying to create a list, I get the error:
titles.append(tt)
AttributeError: 'str' object has no attribute 'text'
Expected outcome:
'Café des Deux Moulins as seen in Amélie (2001)',
'Royal Palace of Caserta as seen in Angels and Demons (2009)'
You get an error simply because of a duplicate variable name. Change one of the titles to something else.
import requests
from bs4 import BeautifulSoup
url = 'https://almostginger.com/famous-movie-locations/'
titles_ = []
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')
titles = soup.select('h3')
for t in titles:
    tt = t.text.strip()
    titles_.append(tt)
titles_
Output:
['Café des Deux Moulins as seen in Amélie (2001)',
'Royal Palace of Caserta as seen in Angels and Demons (2009)',
'Odesa Steps as seen in Battleship Potemkin (1926)',
'Promenade Plantée as seen in Before Sunset (2004)',
'Curracloe\xa0Beach as seen in Brooklyn (2015)',
'Belfry of Bruges as seen in In Bruges (2008)',
'Sirmione as seen in Call Me By Your Name (2017)',
'Villa del Balbianello as seen in Casino Royale (2006)',
'Neuschwanstein Castle as seen in Chitty Chitty Bang Bang (1968)',
'Nyhavn Harbour as seen in The Danish Girl (2015)',
'Rosslyn Chapel as seen in The Da Vinci Code (2006)',
'Highclere Castle as seen in Downton Abbey (2010-2019)',
'Juvet Landscape Hotel as seen in Ex Machina (2014)',
'Mini Hollywood as seen in For a Few Dollars More (1965)',
'The Dark Hedges as seen in Game of Thrones (2011-2019)',
'Kaufhaus Görlitz as seen in The Grand Budapest Hotel (2014)',
'Bar Vitelli as seen in The Godfather (1972)',
'Glenfinnan Viaduct as seen in Harry Potter and the Chamber of Secrets (2002)',
'Old Royal Naval College as seen in The King’s Speech (2010)',
'Trevi Fountain as seen in La Dolce Vita (1960)',
'Juliet’s House as seen in Letters to Juliet (2010)',
'Church of Agios Ioannis Kastri as seen in Mamma Mia! (2008)',
'Palace of Versailles as seen in Marie Antoinette (2006)',
'Shakespeare & Company Bookshop as seen in Midnight in Paris (2011)',
'Doune Castle as seen in Monty Python and The Holy Grail (1975)',
'The Notting Hill Bookshop as seen in Notting Hill (1999)',
'Belchite as seen in Pan’s Labyrinth (2006)',
'Umschlagplatz as seen in The Pianist (2002)',
'Popeye Village as seen in Popeye (1980)',
'Cliffs of Moher as seen in The Princess Bride (1987)',
'Wicklow Mountains in P.S. I Love You (2007)',
'Mouth of Truth as seen in Roman Holiday (1953)',
'Piłsudskiego Bridge as seen in Schindler’s List (1993)',
'Kirkjufell Mountain as seen in The Secret Life of Walter Mitty (2013)',
'Residenzplatz as seen in The Sound of Music (1965)',
'The Fairy Glen as seen in Stardust (2007)',
'Skellig Michael as seen in Star Wars Episode VIII: The Last Jedi (2017)',
'Spanish Steps as seen in The Talented Mr Ripley (1999)',
'Riesenrad Ferris Wheel as seen in The Third Man (1949)',
'Hotel Carlton as seen in To Catch a Thief (1955)',
'Tibidabo Amusement Park as seen in Vicky Cristina Barcelona (2008)',
'Haweswater Reservoir as seen in Withnail & I (1989)',
'Aït Benhaddou as seen in Gladiator (2000)',
'Masai Mara as seen in Out of Africa (1985)',
'Sidi Idriss Hotel as seen in Star Wars Episode III: A New Hope (1977)',
'Maya Bay as seen in The Beach (2000)',
'Hongcun Ancient Village as seen in Crouching Tiger Hidden Dragon (2000)',
'Lebua State Tower as seen in The Hangover Part II (2011)',
'Petra as seen in Indiana Jones and the Last Crusade (1989)',
'Angkor Thom as seen in Lara Croft: Tomb Raider (2001)',
'Park Hyatt Hotel as seen in Lost in Translation (2003)',
'Phang Nga Bay as seen in The Man with the Golden Gun (1974)',
'Burj Khalifa as seen in Mission: Impossible – Ghost Protocol (2011)',
'Chhatrapati Shivaji Maharaj Terminus as seen in Slumdog Millionaire (2008)',
'King’s Canyon as seen in The Adventures of Priscilla, Queen of the Desert (1994)',
'Hobbiton as seen in The Lord of the Rings: The Fellowship of the Ring (2001)',
'Pine Oak Court as seen in Neighbours (1985-Present)',
'Devil’s Tower as seen in Close Encounters of the Third Kind (1977)',
'Route 66 as seen in Easy Rider (1969)',
'Art Institute of Chicago as seen in Ferris Bueller’s Day Off (1986)',
'Monument Valley as seen in Forrest Gump (1994)',
'New York Public Library as seen in Ghostbusters (1984)',
'Salvation Mountain as seen in Into the Wild (2007)',
'Martha’s Vineyard as seen in Jaws (1975)',
'Griffith Observatory as seen in Rebel Without a Cause (1955)',
'Philadelphia Museum of Art as seen in Rocky (1976)',
'Edmund Pettus Bridge as seen in Selma (2014)',
'Timberline Lodge as seen in The Shining (1980)',
'Dead Horse Point State Park as seen in Thelma & Louise (1991)',
'Golden Gate Bridge as seen in Vertigo (1958)',
'Katz’s Delicatessen as seen in When Harry Met Sally (1989)',
'Prairie Mountain as seen in Brokeback Mountain (2005)',
'Iguazu Falls as seen in Indiana Jones and the Crystal Skull (2008)',
'Machu Picchu as seen in The Motorcycle Diaries (2004)',
'Bahia De Cacaluta Beach as seen in Y Tu Mamá También (2001)',
'3 thoughts on “75+ Famous Movie Locations You Can Actually Visit”',
'Leave a Reply Cancel reply']
This is a good use-case for a list comprehension:
import requests
from bs4 import BeautifulSoup as BS
with requests.Session() as session:
    r = session.get('https://almostginger.com/famous-movie-locations/')
    r.raise_for_status()
    soup = BS(r.text, 'lxml')
    titles = [title.text+'\n' for title in soup.select('h3')]
    print(*titles)
from os import listdir
from os.path import isfile, join
from datasets import load_dataset
from transformers import BertTokenizer
test_files = [join('./test/', f) for f in listdir('./test') if isfile(join('./test', f))]
dataset = load_dataset('json', data_files={"test": test_files}, cache_dir="./.cache_dir")
After running the code, here is the output of dataset["test"]["abstract"]:
[['eleven politicians from 7 parties made comments in letter to a newspaper .',
"said dpp alison saunders had ` damaged public confidence ' in justice .",
'ms saunders ruled lord janner unfit to stand trial over child abuse claims .',
'the cps has pursued at least 19 suspected paedophiles with dementia .'],
['an increasing number of surveys claim to reveal what makes us happiest .',
'but are these generic lists really of any use to us ?',
'janet street-porter makes her own list - of things making her unhappy !'],
["author of ` into the wild ' spoke to five rape victims in missoula , montana .",
"` missoula : rape and the justice system in a college town ' was released april 21 .",
"three of five victims profiled in the book sat down with abc 's nightline wednesday night .",
'kelsey belnap , allison huguet and hillary mclaughlin said they had been raped by university of montana football '
'players .',
"huguet and mclaughlin 's attacker , beau donaldson , pleaded guilty to rape in 2012 and was sentenced to 10 years .",
'belnap claimed four players gang-raped her in 2010 , but prosecutors never charged them citing lack of probable '
'cause .',
'mr krakauer wrote book after realizing close friend was a rape victim .'],
['tesco announced a record annual loss of £ 6.38 billion yesterday .',
'drop in sales , one-off costs and pensions blamed for financial loss .',
'supermarket giant now under pressure to close 200 stores nationwide .',
'here , retail industry veterans , plus mail writers , identify what went wrong .'],
...,
['snp leader said alex salmond did not field questions over his family .',
"said she was not ` moaning ' but also attacked criticism of women 's looks .",
'she made the remarks in latest programme profiling the main party leaders .',
'ms sturgeon also revealed her tv habits and recent image makeover .',
'she said she relaxed by eating steak and chips on a saturday night .']]
I would like each sentence to be tokenized. How can I do this using Hugging Face? I think I have to flatten each inner list above to get a single list of strings and then tokenize each string.
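The flattening step described here can be sketched with `itertools.chain.from_iterable`. The `abstracts` list below copies a few sentences from the output above and stands in for `dataset["test"]["abstract"]`; the tokenization itself is left out of this sketch.

```python
from itertools import chain

# A few abstracts copied from the output above, each a list of sentences
abstracts = [
    ['an increasing number of surveys claim to reveal what makes us happiest .',
     'but are these generic lists really of any use to us ?'],
    ['tesco announced a record annual loss of £ 6.38 billion yesterday .',
     'drop in sales , one-off costs and pensions blamed for financial loss .'],
]

# Flatten the list of lists into a single list of sentence strings
sentences = list(chain.from_iterable(abstracts))
print(len(sentences))
```

The resulting flat list of strings is the shape a tokenizer would typically accept as a batch, though which tokenizer arguments to use depends on the model.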
I am new to Python and I'm working on an interface. I need to get the top 250 movies from the IMDb website.
def clicked(self):
    movie=self.movie_name.text()
    url="https://www.imdb.com/chart/top/"
    response=requests.get(url)
    html_content=response.content
    soup=BeautifulSoup(html_content,"html.parser")
    movie_name = soup.find_all("td",{"class":"titleColumn"})
    for i in movie_name:
        i=i.text
        i=i.strip()
        i=i.replace("\n","")
        if (movie == i):
            self.yazialani.setText(i)
With this code, the output looks like this:
6. Schindler's List(1993)
7. The Lord of the Rings: The Return of the King(2003)
8. Pulp Fiction(1994)
But for my project I just want the movie names, not the years and rankings. How should I change my code?
One primitive solution, assuming each string has the form digits + ". " + name_of_movie + "(YEAR)", is to take just the part between the first period and the year:
a=["6. Schindler's List(1993)", "7. The Lord of the Rings: The Return of the King(2003)", "8. Pulp Fiction(1994)"]
just_names=[]
for name in a:
    i=0
    while True:
        if name[i]=='.':
            just_names.append(name[i+2:-6]) # To delete the space after the point
            break
        i+=1
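For comparison, here is a sketch of the same extraction with a regular expression instead of the manual character scan. It assumes the same string shape as above (rank, period, title, four-digit year in parentheses); strings that do not fit are simply skipped.

```python
import re

a = ["6. Schindler's List(1993)",
     "7. The Lord of the Rings: The Return of the King(2003)",
     "8. Pulp Fiction(1994)"]

# rank, period, optional spaces, title (captured), optional spaces, four-digit year in parentheses
pattern = re.compile(r'^\d+\.\s*(.*?)\s*\(\d{4}\)$')

just_names = []
for name in a:
    match = pattern.match(name)
    if match:  # ignore strings that do not have the assumed shape
        just_names.append(match.group(1))

print(just_names)
```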
Only the name of the movie is contained in the anchor tag, so select the anchor tag text for each td.
import requests
from bs4 import BeautifulSoup
url="https://www.imdb.com/chart/top/"
response=requests.get(url)
html_content=response.content
soup=BeautifulSoup(html_content,"html.parser")
movie_name = soup.find_all("td",{"class":"titleColumn"})
for i in movie_name:
    print(i.find("a").get_text(strip=True))
Output:
The Shawshank Redemption
The Godfather
The Godfather: Part II
The Dark Knight
12 Angry Men
Schindler's List
The Lord of the Rings: The Return of the King
Pulp Fiction
Il buono, il brutto, il cattivo
The Lord of the Rings: The Fellowship of the Ring
Fight Club
Forrest Gump
Inception
Star Wars: Episode V - The Empire Strikes Back
The Lord of the Rings: The Two Towers
The Matrix
Goodfellas
One Flew Over the Cuckoo's Nest
Shichinin no samurai
Se7en
La vita è bella
Cidade de Deus
The Silence of the Lambs
Hamilton
It's a Wonderful Life
Star Wars
Saving Private Ryan
Sen to Chihiro no kamikakushi
Gisaengchung
The Green Mile
Interstellar
Léon
The Usual Suspects
Seppuku
The Lion King
Back to the Future
The Pianist
Terminator 2: Judgment Day
American History X
Modern Times
Psycho
Gladiator
City Lights
The Departed
The Intouchables
Whiplash
The Prestige
...
I wrote the code below and built a dictionary from it, but I want to create tuples of (lemma, NER type) and collect counts over the tuples. I don't know how to do it. Can you please help me? NER stands for named entity recognition.
text = """
Seville.
Summers in the flamboyant Andalucían capital often nudge 40C, but spring is a delight, with the parks in bloom and the scent of orange blossom and jasmine in the air. And in Semana Santa (Holy Week, 14-20 April) the streets come alive with floats and processions. There is also the raucous annual Feria de Abril – a week-long fiesta of parades, flamenco and partying long into the night (4-11 May; expect higher hotel prices if you visit then).
Seville is a romantic and energetic place, with sights aplenty, from the Unesco-listed cathedral – the largest Gothic cathedral in the world – to the beautiful Alcázar royal palace. But days here are best spent simply wandering the medieval streets of Santa Cruz and along the river to La Real Maestranza, Spain’s most spectacular bullring.
Seville is the birthplace of tapas and perfect for a foodie break – join a tapas tour (try devoursevillefoodtours.com), or stop at the countless bars for a glass of sherry with local jamón ibérico (check out Bar Las Teresas in Santa Cruz or historic Casa Morales in Constitución). Great food markets include the Feria, the oldest, and the wooden, futuristic-looking Metropol Parasol.
Nightlife is, unsurprisingly, late and lively. For flamenco, try one of the peñas, or flamenco social clubs – Torres Macarena on C/Torrijano, perhaps – with bars open across town until the early hours.
Book it: In an atmospheric 18th-century house, the Hospes Casa del Rey de Baeza is a lovely place to stay in lively Santa Cruz. Doubles from £133 room only, hospes.com
Trieste.
"""
doc = nlp(text).ents
en = [(entity.text, entity.label_) for entity in doc]
en
#entities
#The list stored in the variable entities has type list[list[tuple[str, str]]],
#from pprint import pprint
pprint(en)
sum(filter(None, entities), [])
from collections import defaultdict
type2entities = defaultdict(list)
for entity, entity_type in sum(filter(None, entities), []):
    type2entities[entity_type].append(entity)
from pprint import pprint
pprint(type2entities)
I hope the following code snippets solve your problem.
import spacy
# Load English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load("en_core_web_sm")
text = ("""Seville.
Summers in the flamboyant Andalucían capital often nudge 40C, but spring is a delight, with the parks in bloom and the scent of orange blossom and jasmine in the air. And in Semana Santa (Holy Week, 14-20 April) the streets come alive with floats and processions. There is also the raucous annual Feria de Abril – a week-long fiesta of parades, flamenco and partying long into the night (4-11 May; expect higher hotel prices if you visit then).
Seville is a romantic and energetic place, with sights aplenty, from the Unesco-listed cathedral – the largest Gothic cathedral in the world – to the beautiful Alcázar royal palace. But days here are best spent simply wandering the medieval streets of Santa Cruz and along the river to La Real Maestranza, Spain’s most spectacular bullring.
Seville is the birthplace of tapas and perfect for a foodie break – join a tapas tour (try devoursevillefoodtours.com), or stop at the countless bars for a glass of sherry with local jamón ibérico (check out Bar Las Teresas in Santa Cruz or historic Casa Morales in Constitución). Great food markets include the Feria, the oldest, and the wooden, futuristic-looking Metropol Parasol.
Nightlife is, unsurprisingly, late and lively. For flamenco, try one of the peñas, or flamenco social clubs – Torres Macarena on C/Torrijano, perhaps – with bars open across town until the early hours.
Book it: In an atmospheric 18th-century house, the Hospes Casa del Rey de Baeza is a lovely place to stay in lively Santa Cruz. Doubles from £133 room only, hospes.com
Trieste.""")
doc = nlp(text)
lemma_ner_list = []
for entity in doc.ents:
    lemma_ner_list.append((entity.lemma_, entity.label_))
# print list of lemma ner tuples
print(lemma_ner_list)
# print count of tuples
print(len(lemma_ner_list))
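The snippet above prints the total number of tuples. To collect a count per distinct (lemma, NER type) tuple, as the question asks, `collections.Counter` is one option. The tuples below are hypothetical stand-ins for `lemma_ner_list`; the labels an actual spaCy model assigns may differ.

```python
from collections import Counter

# Hypothetical (lemma, NER type) tuples standing in for lemma_ner_list
lemma_ner_list = [
    ("Seville", "GPE"),
    ("Santa Cruz", "GPE"),
    ("Seville", "GPE"),
    ("14-20 April", "DATE"),
    ("Santa Cruz", "GPE"),
    ("Seville", "GPE"),
]

# Tuples are hashable, so Counter can count them directly
counts = Counter(lemma_ner_list)

# most_common() yields ((lemma, label), count) pairs, highest count first
for (lemma, label), n in counts.most_common():
    print(lemma, label, n)
```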