<div class="content">
<div class="container">
<div class="row pt-2">
<div class="col pe-1">
<div class="grid-cell p-2">
<a href="united-states_florida/company/met-west-commercial-lender/tom-mchugh-975">
Tom Mchugh
<div class="col ps-1">
<div class="grid-cell p-2">
<a href="united-states_florida/company/met-west-commercial-lender">
Met West Commercial Lender
My result is showing up like this.
I want it to look like the following table:
Column A      Column B
Tom Mchugh    Met West Commercial Lender
There might be different approaches. Here is an elegant one.
y = df.Name.values
df = pd.DataFrame({'A' : y[::2], 'B' : y[1::2]})
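The stride slicing used above works on any sequence, so the idea can be checked without pandas. The following is a minimal sketch, assuming names and companies strictly alternate in the scraped values; the second pair ("Jane Doe", "Example Corp") is made-up illustration data, not part of the question.

```python
# Flat list as scraped: name, company, name, company, ...
values = [
    "Tom Mchugh", "Met West Commercial Lender",
    "Jane Doe", "Example Corp",  # hypothetical extra row
]

names = values[::2]       # items at even positions 0, 2, 4, ...
companies = values[1::2]  # items at odd positions 1, 3, 5, ...

# Pair them up row by row, like columns A and B of the desired table.
rows = list(zip(names, companies))
for name, company in rows:
    print(f"{name:<12} {company}")
```

These are the same two slices the answer passes to `pd.DataFrame` as columns `A` and `B`. If the list has an odd length, `zip` silently drops the unpaired last item, so it may be worth checking the length first.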
I am trying to pull the name and position of random people from Sales Navigator. Each person shows up as a card that contains all the information. I obtain a list of the cards, and for each one I want to get the name and title. I tried the code below to extract the information from a card; the HTML of one result is shown after it.
So far, my attempts always return an error indicating that the element could not be found. How could I solve this?
def testeo(driver):
    lista = driver.find_elements_by_xpath("//*[contains(@class,'pv5 ph2 search-results__result-item')]")
    nombres = []
    for i in range(0, len(lista)):
        nombres.append(lista[i].find_element_by_xpath(".//*[contains(@class,'t-14 t-bold')]").text)
<li class="pv5 ph2 search-results__result-item" data-scroll-into-view="urn:li:fs_salesProfile:(ACwAAAJ-Ab0Bu4JpScPs9SE2b8R_LP9L9vU9nM8,NAME_SEARCH,fH_T)">
<div class="pt5 absolute search-results__select-container">
<input id="search-result-ember6830" class="small-input ember-checkbox ember-view" type="checkbox">
<label class="m0" for="search-result-ember6830">
<span class="a11y-text">
Select Jean Jongejan
<div style="" id="ember6866" class="flex full-width deferred-area ember-view"> <div class="search-results__result-container full-width pl2">
<div id="ember6981" class="ember-view"> <div id="ember6982" class="ember-view">
<h3 class="a11y-text">
Profile result – Jean Jongejan
<section class="result-lockup">
<h4 class="a11y-text">
Profile result lockup – Jean Jongejan
<div class="result-lockup__profile-info flex flex-column">
<div class="horizontal-person-entity-lockup-4 result-lockup__entity ml6">
<a href="/sales/people/ACwAAAJ-Ab0Bu4JpScPs9SE2b8R_LP9L9vU9nM8,NAME_SEARCH,fH_T?_ntb=ErSmZYlWS8KlI9CD0cB6Yg%3D%3D" id="ember6985" class="result-lockup__icon-link ember-view">
<div class="presence-entity--size-4 relative mr2">
<img src="" loading="lazy" alt="Go to Jean Jongejan’s profile" id="ember6986" class="max-width max-height circle-entity-4 lazy-image ghost-person loaded ember-view">
<div class="presence-indicator presence-indicator--size-4 hidden presence-entity__indicator presence-entity__indicator--size-4" title="Reachable">
<span class="a11y-text">
Jean Jongejan is reachable
</a> </figure>
<dt class="result-lockup__name">
<a href="/sales/people/ACwAAAJ-Ab0Bu4JpScPs9SE2b8R_LP9L9vU9nM8,NAME_SEARCH,fH_T?_ntb=ErSmZYlWS8KlI9CD0cB6Yg%3D%3D" id="ember6989" class="ember-view"> Jean Jongejan
</a> </dt>
<dd class="inline-flex vertical-align-middle">
<ul class="ml1 flex align-items-center list-style-none">
<li class="mr1">
<span class="a11y-text">
3rd degree contact
<span class="label-16dp block" aria-hidden="true">
<!----><!----><!----> </ul>
<dd class="result-lockup__highlight-keyword">
<span class="t-14 t-bold">EXT Key Account Management & Consultancy</span>
<span data-entity-hovercard-id="urn:li:fs_salesCompany:36314" class="result-lockup__position-company">
<a href="/sales/company/36314?_ntb=Z6Rvdg6sRMiPD6xsYlUuFQ%3D%3D" id="ember6991" class="Sans-14px-black-75%-bold ember-view"> <span aria-hidden="true">
<span class="a11y-text">
Go to Marimekko account page
</a> <button aria-expanded="false" aria-label="See more about Marimekko" class="entity-hovercard__a11y-trigger p0 b0" data-entity-hovercard-id="urn:li:fs_salesCompany:36314" data-entity-hovercard-trigger="click"></button>
<span class="t-12 t-black--light">
3 years 11 months in role and company
<ul class="mv1 t-12 t-black--light result-lockup__misc-list">
<li class="result-lockup__misc-item">Breda, North Brabant, Netherlands</li>
<!----> </div>
<div class="result-lockup__actions flex">
<ul class="result-lockup__common-actions">
<li class="result-lockup__action-item mb3">
<div class="display-flex">
<div id="ember6993" class="ember-view"> <div id="ember6995" class="save-to-list-dropdown artdeco-dropdown artdeco-dropdown--placement-bottom artdeco-dropdown--justification-right ember-view"><button aria-expanded="false" id="ember6996" class="save-to-list-dropdown__trigger ph4 artdeco-button artdeco-button--secondary artdeco-button--pro artdeco-button--1 m-type--message artdeco-dropdown__trigger artdeco-dropdown__trigger--placement-bottom ember-view" type="button" tabindex="0"> Save
<!----></button><div tabindex="-1" aria-hidden="true" id="ember6997" class="save-to-list-dropdown__content-container artdeco-dropdown__content artdeco-dropdown--is-dropdown-element artdeco-dropdown__content--has-arrow artdeco-dropdown__content--arrow-right artdeco-dropdown__content--justification-right artdeco-dropdown__content--placement-bottom ember-view"><div class="artdeco-dropdown__content-inner">
<div id="ember6998" class="ember-view">
</div> <div class="relative">
<div id="ember6999" class="ember-view">
<div id="ember7000" class="artdeco-dropdown artdeco-dropdown--placement-bottom artdeco-dropdown--justification-right ember-view"><button aria-expanded="false" id="ember7001" class="artdeco-dropdown__trigger result-lockup__action-button m-type--more artdeco-dropdown__trigger--non-button artdeco-dropdown__trigger--placement-bottom ember-view" type="button" tabindex="0"> <span class="a11y-text">See more actions for this result</span>
<li-icon aria-hidden="true" type="ellipsis-horizontal-icon" class="artdeco-button artdeco-button--tertiary artdeco-button--1 artdeco-button--muted p0" size="medium"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" data-supported-dps="24x24" fill="currentColor" width="24" height="24" focusable="false">
<path d="M2 10h4v4H2v-4zm8 4h4v-4h-4v4zm8-4v4h4v-4h-4z"></path>
<!----></button><div tabindex="-1" aria-hidden="true" id="ember7002" class="artdeco-dropdown__content result-lockup__dropdown-more artdeco-dropdown--is-dropdown-element artdeco-dropdown__content--has-arrow artdeco-dropdown__content--arrow-right artdeco-dropdown__content--justification-right artdeco-dropdown__content--placement-bottom ember-view"><!----></div></div>
<div id="ember7003" class="ember-view"><!----></div>
<!----></div> </div>
<!----> </ul>
<section class="result-context relative pt1">
<h4 class="a11y-text">Profile result context – Jean Jongejan</h4>
<!----> </section>
</div> </div>
Can you try this?
I am trying to scrape a title from an h1 element by its class, but I keep getting "None".
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
title = soup.find('h1', {'class': 'prod-name'})
I've also tried it this way:
name_div = soup.find_all('div', {'class': 'col-md-12 col-sm-12 col-xs-12'})[0]
name = name_div.find('h1').text
in which case I get: "IndexError: list index out of range"
Can anybody help me out?
This is the source code:
<div class="row attachDetails __web-inspector-hidebefore-shortcut__">
<div class="row">
<div class="col-md-12 col-sm-12 col-xs-12">
<div class="brand-desc">POLO RALPH LAUREN</div>
<h1 class="prod-name">ARAN CREWNECK SWEATER</h1>
<div class="panel-group" id="accordion">
<div class="borders-overview">
<div class="panel-heading">
<h4 class="panel-title">
<label class="overview-label collapsed" data-angle="overview-label" data-toggle="collapse" data-parent="#accordion" href="#collapse1">
<a class="fa fa-angle-up pull-right"></a>
<a class="over-view">OVERVIEW</a>
<span class="color-disp over-view">COLOR: FAWN GREY HEATHER</span>
<span class="style-num over-view">MATERIAL# : 710766783002
<div id="collapse1" class="panel-collapse collapse">
<div class="short-desc-section"></div>
<div class="border-details">
<div class="panel-heading">
<h4 class="panel-title">
<label class="prod-details collapsed" data-angle="prod-details" data-toggle="collapse" data-parent="#accordion" href="#collapse2">
<a class="detail-link">Details</a>
<a class="fa fa-angle-up pull-right"></a>
<div id="collapse2" class="long-desc panel-collapse collapse">
<div><ol><li>STANDARD FIT</li><li>COTTON</li></ol></div>
<div><li><b>Board:</b> S196SC23</li></div>
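The source shown above does contain the `h1` with class `prod-name`, so a `None` result suggests that the HTML received by `requests` differs from what the browser shows (for example, content rendered by JavaScript). That is an assumption, since the URL is not given. A minimal sketch for checking this, using only the standard library's `html.parser` on a fragment of the source above; `ProdNameParser` and `find_prod_name` are hypothetical names introduced for illustration:

```python
from html.parser import HTMLParser


class ProdNameParser(HTMLParser):
    """Collects the text of the first <h1> whose class list includes 'prod-name'."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h1" and "prod-name" in classes:
            self._in_title = True

    def handle_data(self, data):
        # Keep only the first piece of text found inside the matching <h1>.
        if self._in_title and self.title is None:
            self.title = data.strip()

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_title = False


def find_prod_name(html_text):
    """Return the product name, or None if the element is not in the HTML."""
    parser = ProdNameParser()
    parser.feed(html_text)
    return parser.title


sample = (
    '<div class="brand-desc">POLO RALPH LAUREN</div>'
    '<h1 class="prod-name">ARAN CREWNECK SWEATER</h1>'
)
print(find_prod_name(sample))                      # element present
print(find_prod_name("<div>no title here</div>"))  # element missing -> None
```

Running the same check on `page.text` would show whether the element is in the downloaded HTML at all. If it is not, no selector will find it, and the `IndexError` from the second attempt has the same cause.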
The following HTML is held as a string in my code. That is fine, but I need to know how to extract:
"class="company-image company-34""
For each company-## there is also a price, found in this tag further down in the HTML:
class="small-12 medium-4 cell text-right" data-after="kr./år">1.813
I tried following code:
for x in html:
    if "company-image company" in x:
        print("Oh yes")
but it doesn't really work. My idea is to find every occurrence of "company-image company" and capture the whole string including the number that follows (always two digits, ##). Whenever one is found, I then look for "data-after="kr./år"" and capture the number that follows it. Eventually this would go in a for loop, as there are multiple companies and prices.
<app-offer-match _ngcontent-vdv-c20="" _nghost-vdv-c22="" class="ng-star-inserted">
<div _ngcontent-vdv-c22="" class="box">
<div _ngcontent-vdv-c22="" class="line1">
<div _ngcontent-vdv-c22="" class="company-image company-34"><img _ngcontent-vdv-c22="" src="/assets/images/companies/34.svg"></div>
<div _ngcontent-vdv-c22="" class="button compare">Sammenlign </div>
<div _ngcontent-vdv-c22="" class="line2">
<div _ngcontent-vdv-c22="" class="container-button">
<div _ngcontent-vdv-c22="" class="button mini-accordion"></div>
<div _ngcontent-vdv-c22="" class="container-insurance-list">
<div _ngcontent-vdv-c22="" class="indbo ng-star-inserted">
<div _ngcontent-vdv-c22="" class="grid-x container-product-overview">
<div _ngcontent-vdv-c22="" class="small-5 cell detail"><span _ngcontent-vdv-c22="">Indbo</span>
<!----><span _ngcontent-vdv-c22="" class="ng-star-inserted">Kongshaven 3</span>
<div _ngcontent-vdv-c22="" class="small-6 cell">
<div _ngcontent-vdv-c22="" class="grid-x price">
<div _ngcontent-vdv-c22="" class="small-12 medium-8 cell text-right" data-after="kr.">Selvrisiko 2.199</div>
<div _ngcontent-vdv-c22="" class="small-12 medium-4 cell text-right" data-after="kr./år">1.813 </div>
EDIT: Added desired output.
Desired output would be a pandas dataframe of:
Company Price
company-image company-34 1.813
It looks like XML only because I formatted it that way for readability. When I output it, it is of type str. Thank you.
Try this:
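company = """[your string above]"""
import lxml.html as lh
import pandas as pd
doc = lh.fromstring(company)
columns = ["Company", "Price"]
rows = []
targets = doc.xpath('//div[contains(@class,"company-image company")]')
for target in targets:
    row = []
    # the class attribute doubles as the company label
    row.append(target.get("class"))
    price = target.xpath('../following-sibling::div//div[@data-after="kr./år"]')[0]
    row.append(price.text.strip())
    rows.append(row)
df = pd.DataFrame(rows, columns=columns)
print(df)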
Company Price
0 company-image company-34 1.813
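The string-search idea from the question can also be sketched with the standard library's `re` module. Note that `for x in html` over a string yields single characters, which is why the substring test in the question never matches. This sketch assumes each company block carries exactly one "kr./år" price and that the two appear in the same order; `COMPANY_RE` and `PRICE_RE` are hypothetical names, and the HTML is a shortened copy of the snippet above.

```python
import re

html = '''
<div class="line1">
<div class="company-image company-34"><img src="/assets/images/companies/34.svg"></div>
</div>
<div class="small-12 medium-8 cell text-right" data-after="kr.">Selvrisiko 2.199</div>
<div class="small-12 medium-4 cell text-right" data-after="kr./år">1.813 </div>
'''

# Capture the full class value, e.g. "company-image company-34".
COMPANY_RE = re.compile(r'class="(company-image company-\d+)"')
# Capture the number that follows the yearly-price marker only.
PRICE_RE = re.compile(r'data-after="kr\./år">\s*([\d.,]+)')

companies = COMPANY_RE.findall(html)
prices = PRICE_RE.findall(html)

# Pair by position (assumption: one yearly price per company, same order).
rows = list(zip(companies, prices))
print(rows)
```

The resulting list of pairs can be passed to `pd.DataFrame(rows, columns=["Company", "Price"])`. Regular expressions are brittle against markup changes, so the XPath approach above is usually the safer choice when the structure is known.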
I just found out about how to process webpages in python using BeautifulSoup.
There's a list of divs from which I want to get those in a specific range. The range is delimited by two divs that each have an h2 child.
How would I do that? Thank you for your support!
EDIT: I added an actual representation of my html code below instead of a previous "simplified" version that was missing tags.
The new code shows a root div with class foo-bar-details.
Nested inside it are 9 div tags, two of which have a nested h2 tag. All 9 div tags contain img elements deeply nested within. What I need is every img element from the divs that lie between the two containing an h2 element.
An expected outcome if applied to the html code below would be:
<img src="../../images/123456_thumb.jpg" alt="Image 123456" title="Image 123456">
<img src="../../images/67890_thumb.JPG" alt="Image 67890 " title="Image 67890">
This is the html code:
<div class="foo-bar-details">
<div class="padding-y-10 padding-x-40 gray-sand-bg" id="sec-feat-3-1">
<div class="row">
<div class="col-sm-6 info-panel">
<div class="row">
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>fsuhfsdf </strong>
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>Feat</strong><span class="icon-help"></span>
<div class="col-sm-6 foo-images">
<div class="row">
<img src="../../images/39826_thumb.JPG" alt="Image 39826" title="Image 39826 ">
<div class="img-description">
<div class="padding-y-10 padding-x-40 gray-sand-bg" id="sec-feat-3-1">
<div class="row">
<div class="col-sm-6 info-panel">
<div class="row">
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>JHFDFD </strong>
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>Feat</strong><span class="icon-help"></span>
<div class="col-sm-6 foo-images">
<div class="row">
<img src="../../images/223234_thumb.JPG" alt="Image 223234" title="Image 223234 ">
<div class="img-description">
<div class="padding-y-10 padding-x-40 gray-sand-bg" id="sec-feat-3-1">
<div class="row">
<div class="col-sm-6 info-panel">
<div class="row">
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>sdfsdf </strong>
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>Feat</strong><span class="icon-help"></span>
<div class="col-sm-6 foo-images">
<div class="row">
<img src="../../images/223823_thumb.JPG" alt="Image 223823" title="Image 223823 ">
<div class="img-description">
<div class="element-header mystic-bg padding-y-10 padding-x-20" id="elem-4">
<h2 class="h3 margin-bottom-5">
<ul class="list-inline margin-0">
<li> Foo feature </li>
<div id="info-panel-header" class="padding-y-10 padding-x-40">
<div class="row">
<div class="col-se-6 element-info">
<div class="col-se-12">
<div class="row">
<div class="col-sm-6 foo-images">
<div class="row">
<img src="../../images/123456_thumb.jpg" alt="Image 123456" title="Image 123456">
<div class="img-description">
<div class="padding-y-10 padding-x-40 gray-wild-sand-bg" id="sec-feat-4-1">
<div class="row">
<div class="col-sm-6 info-panel">
<div class="row">
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>Foo strin: </strong>
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>Barbar</strong><span class="icon-help"></span>
<div class="row">
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>Mine: </strong>
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
TEST<span class="icon-help"></span>
<div class="col-sm-6 foo-images">
<div class="row">
<img src="../../images/67890_thumb.JPG" alt="Image 67890 " title="Image 67890">
<div class="img-description">
<div class="element-header mystic-bg padding-y-10 padding-x-20" id="elem-5">
<h2 class="h3 margin-bottom-5">
<ul class="list-inline margin-0">
<li> Bar feature </li>
<div class="padding-y-10 padding-x-40 gray-sand-bg" id="sec-feat-3-1">
<div class="row">
<div class="col-sm-6 info-panel">
<div class="row">
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>fsuhfsdf </strong>
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>Feat</strong><span class="icon-help"></span>
<div class="col-sm-6 foo-images">
<div class="row">
<img src="../../images/39826_thumb.JPG" alt="Image 39826" title="Image 39826 ">
<div class="img-description">
<div class="padding-y-10 padding-x-40 gray-sand-bg" id="sec-feat-3-1">
<div class="row">
<div class="col-sm-6 info-panel">
<div class="row">
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>fsuhfsdf </strong>
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>Feat</strong><span class="icon-help"></span>
<div class="col-sm-6 foo-images">
<div class="row">
<img src="../../images/209876_thumb.JPG" alt="Image 209876" title="Image 209876 ">
<div class="img-description">
Here is a solution involving lxml.html:
We extract all divs between the first and last divs which contain an h2 tag:
import lxml.html
# HTML file saved as "file.html"
file_name = "file.html"
with open(file_name, 'r') as f:
    tree = lxml.html.fromstring(f.read())
# all_div = tree.findall('div')
all_div = tree.find_class('foo-bar-details')[0].findall('div')
start, stop = None, None
for k, div in enumerate(all_div):
    if div.findall('h2') and start is None:
        print("Range starts at %d" % k)
        start = k
    # elif, so the div that opens the range is not also reported as its end
    elif div.findall('h2') and start is not None:
        print("Range stops at %d" % k)
        stop = k + 1  # add one because the slice end is exclusive
# div_list = all_div[start:stop]
img_list = [_.xpath('.//img') for _ in all_div[start:stop]]
# [[], [<Element img at 0x20b58d73f40>], [<Element img at 0x20b58d73f90>], []]
# Or
img_list = [_.xpath('.//img/@src') for _ in all_div[start:stop]]
# [[], ['../../images/123456_thumb.jpg'], ['../../images/67890_thumb.JPG'], []]
Another solution involving SimplifiedDoc:
from simplified_scrapy.simplified_doc import SimplifiedDoc
html ='''
<div class="foo-bar-details">
<div class="element-header mystic-bg padding-y-10 padding-x-20" id="elem-4">
<h2 class="h3 margin-bottom-5">
<ul class="list-inline margin-0">
<li> Foo feature </li>
<div id="info-panel-header" class="padding-y-10 padding-x-40">Test 1</div>
<div class="padding-y-10 padding-x-40 gray-wild-sand-bg" id="foo-feat-4-1">Test 2</div>
<div class="padding-y-10 padding-x-40 " id="foo-feat-4-2">Test 3</div>
<div class="padding-y-10 padding-x-40 gray-wild-sand-bg" id="foo-feat-4-3">Test 4</div>
<div class="element-header mystic-bg padding-y-10 padding-x-20" id="elem-5">
<h2 class="h3 margin-bottom-5">
<ul class="list-inline margin-0">
<li> Bar feature </li>
'''
doc = SimplifiedDoc(html)
divs = doc.select('div.foo-bar-details').divs.contains('<h2')
print ([div.id for div in divs])
divs = doc.select('div.foo-bar-details').divs.notContains('<h2')
print ([div.id for div in divs])
['elem-4', 'elem-5']
['info-panel-header', 'foo-feat-4-1', 'foo-feat-4-2', 'foo-feat-4-3']
The SimplifiedDoc library does not depend on third-party libraries, which keeps it lightweight and makes it a good fit for beginners.
More examples are available here.
If I understand you correctly, you want to find the <img> tags and the corresponding <h2> to which each image belongs.
This example (txt variable contains the HTML snippet from your question):
from bs4 import BeautifulSoup
from pprint import pprint
soup = BeautifulSoup(txt, 'html.parser')
out = {}
for img in soup.select('div:has(h2) ~ div img'):
    out.setdefault(img.find_previous('h2').get_text(strip=True), []).append(img['src'])
pprint(out)
{'Bar': ['../../images/39826_thumb.JPG', '../../images/209876_thumb.JPG'],
'Foo': ['../../images/123456_thumb.jpg', '../../images/67890_thumb.JPG']}
I just came across lxml in Python and need some help, as I have no experience with XPath.
I want to get text data from a webpage into a dictionary.
I'm referring to the html snippet I posted below. Within the original html page there's a div element of the class general-info that I retrieve using the following line:
general_info = document_tree.xpath("//div[contains(concat(' ', normalize-space(@class), ' '), 'general-info')]")
From here on I want to iterate over the nested divs and get the 2 <p> tags as key and value. The text inside the <strong> being the key.
There can also be empty div tags and there can be a special case where the key and the value for the dictionary can be within the same div (see the last element).
The number of elements can change, so it would be best to use the <strong> tags as starting point and then search for the next <p> tag.
This is code that I was able to write using BeautifulSoup:
generalinfo = documentSoup.findAll("div", {"class": "general-info"})
if generalinfo:
    strongs = generalinfo[0].find_all('strong')
    for descr in strongs:
        p = descr.find_next_sibling("p")
        if p:
            key = descr.text.strip().rstrip(':')
            details_dict[key] = p.text.strip()
        nextdiv = descr.parent.parent.find_next_sibling("div")
        if nextdiv:
            child = nextdiv.findChild()
            if child:
                key = descr.text.strip()[:-1]
                details_dict[key] = child.text.strip()
I am going for the following output:
{'Title:': 'This is a title',
 'Owner:': 'This is an owner',
 'Category:': 'This is a category',
 'Type:': 'This is a type',
 'Special case:': 'This is a special case'}
If anyone can help me out here I will appreciate this!
html code:
<div class="general-info margin-bottom-20 margin-top-20">
<div class="row padding-x-20">
<div class="col-sm-4">
<p class="margin-0">
<div class="col-sm-8">
<p class="margin-0">This is a title</p>
<div class="row padding-x-20">
<div class="col-sm-4">
<p class="margin-0">
<div class="col-sm-8">
<p class="margin-0">This is an owner</p>
<h2 class="h3 margin-top-10 margin-bottom-10 padding-x-20">Validity</h2>
<div class="row padding-x-20">
<div class="col-sm-4">
<p class="margin-0">
<div class="col-sm-8">
<p class="margin-0">This is a category</p>
<div class="row padding-x-40"></div>
<div class="row padding-x-20">
<div class="col-sm-4">
<p class="margin-0">
<div class="col-sm-8">
<p class="margin-0">This is a type</p>
<div class="row padding-x-40">
<strong>Special case:</strong>
<p>This is a special case</p>
I believe this is about as generalized as I can get given the html provided:
general_info = doc.xpath("//div[contains(concat(' ', normalize-space(@class), ' '), 'general-info')]//p[@class='margin-0']")
entry = ''  # accumulates the values; must be initialised before the loop
for i in general_info:
    if len(i.xpath('./strong/text()')) > 0:
        topic = i.xpath('./strong/text()')[0]
    if len(i.text.strip()) > 0:
        entry += i.text.replace('\n', '').strip()
        print(topic + ' ' + i.text.replace('\n', '').strip())
special = general_info[0].xpath('./ancestor::div[@class="general-info margin-bottom-20 margin-top-20"]//div/div/strong')[0]
print(special.text + " ", special.xpath('./following-sibling::p/text()')[0])
('Title: This is a title',
'Owner: This is an owner',
'Category: This is a category',
'Type: This is a type',
'Special case: This is a special case')
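The approach described in the question (start from each `<strong>` and take the next `<p>` that has text) can also be sketched with the standard library alone. This is a minimal illustration rather than the method used in the answer above. It assumes a trimmed, well-formed stand-in for the snippet, with the `<strong>` labels placed inside the first `<p>` of each row as the question describes. `pair_strong_with_next_p` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the question's HTML (assumption: real
# pages are rarely valid XML, so an HTML parser would be needed for those).
SNIPPET = """
<div class="general-info">
  <div class="row">
    <div class="col-sm-4"><p class="margin-0"><strong>Title:</strong></p></div>
    <div class="col-sm-8"><p class="margin-0">This is a title</p></div>
  </div>
  <h2>Validity</h2>
  <div class="row"></div>
  <div class="row">
    <strong>Special case:</strong>
    <p>This is a special case</p>
  </div>
</div>
"""


def pair_strong_with_next_p(root):
    """Map each <strong> label to the text of the next non-empty <p>."""
    details = {}
    pending_key = None
    for element in root.iter():  # walks the tree in document order
        if element.tag == "strong":
            pending_key = (element.text or "").strip()
        elif element.tag == "p" and pending_key is not None:
            value = (element.text or "").strip()
            if value:  # skip a <p> that has no text of its own
                details[pending_key] = value
                pending_key = None
    return details


details = pair_strong_with_next_p(ET.fromstring(SNIPPET))
print(details)
```

Because the walk follows document order, empty rows and the `<h2>` are skipped naturally, and the special case where label and value share one `<div>` needs no separate branch.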
I recommend another solution, which is very suitable for extracting data from XML.
from simplified_scrapy.spider import SimplifiedDoc
html = '''
<div class="general-info margin-bottom-20 margin-top-20">
<div class="row padding-x-20">
<div class="col-sm-4">
<p class="margin-0">
<div class="col-sm-8">
<p class="margin-0">This is a title</p>
<div class="row padding-x-20">
<div class="col-sm-4">
<p class="margin-0">
<div class="col-sm-8">
<p class="margin-0">This is an owner</p>
<h2 class="h3 margin-top-10 margin-bottom-10 padding-x-20">Validity</h2>
<div class="row padding-x-20">
<div class="col-sm-4">
<p class="margin-0">
<div class="col-sm-8">
<p class="margin-0">This is a category</p>
<div class="row padding-x-40"></div>
<div class="row padding-x-20">
<div class="col-sm-4">
<p class="margin-0">
<div class="col-sm-8">
<p class="margin-0">This is a type</p>
<div class="row padding-x-40">
<strong>Special case:</strong>
<p>This is a special case</p>
'''
doc = SimplifiedDoc(html) # create doc
divs = doc.selects('div.general-info')
# First way
for div in divs:
strongs = div.strongs
for strong in strongs:
p = strong.next
if not p:
# Second way
for div in divs:
ds = div.selects('strong|p>text()')
for i in range(0,len(ds),2):
{'Title:': 'This is a title', 'Owner:': 'This is an owner', 'Category:': 'This is a category', 'Type:': 'This is a type', 'Special case:': 'This is a special case'}
{'Title:': 'This is a title', 'Owner:': 'This is an owner', 'Category:': 'This is a category', 'Type:': 'This is a type', 'Special case:': 'This is a special case'}
Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/blob/master/doc_examples/