Web Scraping: getting KeyError when parsing JSON in Python

I want to extract the full address from the webpage and I'm using BeautifulSoup and JSON.
Here's my code:
import bs4
import json
from bs4 import BeautifulSoup
import requests
url = 'xxxxxxxxxxxxxxxxx'
response = requests.get(url)
html = response.text
soup = BeautifulSoup(html, 'html.parser')
for i in soup.find_all('div', attrs={'data-integration-name':'redux-container'}):
    info = json.loads(i.get('data-payload'))
I printed 'info' out:
{'storeName': None, 'props': {'locations': [{'dirty': False, 'updated_at': '2016-05-05T07:57:19.282Z', 'country_code': 'US', 'company_id': 106906, 'longitude': -74.0001954, 'address': '5 Crosby St 3rd Floor', 'state': 'New York', 'full_address': '5 Crosby St 3rd Floor, New York, 10013, New York, USA', 'country': 'United States', 'id': 17305, 'to_params': 'new-york-us', 'latitude': 40.719753, 'region': '', 'city': 'New York', 'description': '', 'created_at': '2015-01-19T01:32:16.317Z', 'zip_code': '10013', 'hq': True}]}, 'name': 'LocationsMapList'}
What I want is the "full_address" under "locations", so my code was:
info = json.loads(i.get('data-payload'))
for i in info['props']['locations']:
    print(i['full_address'])
But I got this error:
----> 5 for i in info['props']['locations']:
KeyError: 'locations'
I want to print the full address out, which is '5 Crosby St 3rd Floor, New York, 10013, New York, USA'.
Thanks a lot!

The data you are parsing seems to be inconsistent: the keys are not present in all objects.
If you still want to loop over the data, you need either a try/except statement to catch the exception, or the get method to provide a fallback when you look up a key that may not be in the dictionary.
info = json.loads(i.get('data-payload'))
for item in info['props'].get('locations', []):
    print(item.get('full_address', 'no address'))
get('locations', []): returns an empty list if the key locations doesn't exist, so the loop doesn't run any iterations.
get('full_address', 'no address'): returns 'no address' if there is no such key.
EDIT:
The data is inconsistent (never trust data). Some JSON objects have a props key with a null/None value. The following fix handles that:
info = json.loads(i.get('data-payload'))
if info.get('props'):
    for item in info['props'].get('locations', []):
        print(item.get('full_address', 'no address'))

Your first object is fine, but it's clear that your second object has no locations key anywhere, nor full_address.
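Putting the pieces together, a minimal end-to-end sketch of the safe version (the url placeholder stands in for the real page) would be:
import json
import requests
from bs4 import BeautifulSoup

url = 'xxxxxxxxxxxxxxxxx'  # placeholder for the real page URL
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

for div in soup.find_all('div', attrs={'data-integration-name': 'redux-container'}):
    info = json.loads(div.get('data-payload'))
    if info.get('props'):  # skip objects whose props is None
        for location in info['props'].get('locations', []):
            print(location.get('full_address', 'no address'))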

Related

Python Returning An Empty List Using Beautiful Soup HTML Parsing

I'm currently working on a project that involves web scraping a real estate website (for educational purposes). I'm taking data from home listings like address, price, bedrooms, etc.
After building and testing along the way with the print function (it worked successfully!), I'm now building a dictionary for each data point in the listing. I'm storing that dictionary in a list in order to eventually use Pandas to create a table and send to a CSV.
Here is my problem. My list contains only empty dictionaries, with no error. Please note, I've successfully scraped the data already and have seen it when using the print function. Now it displays nothing after I add each data point to a dictionary and put it in a list. Here is my code:
import requests
from bs4 import BeautifulSoup
r=requests.get("https://www.century21.com/real-estate/colorado-springs-co/LCCOCOLORADOSPRINGS/")
c=r.content
soup=BeautifulSoup(c,"html.parser")
all=soup.find_all("div", {"class":"infinite-item"})
all[0].find("a",{"class":"listing-price"}).text.replace("\n","").replace(" ","")
l=[]
for item in all:
    d={}
    try:
        d["Price"]=item.find("a",{"class":"listing-price"}.text.replace("\n","").replace(" ",""))
        d["Address"]=item.find("div",{"class":"property-address"}).text.replace("\n","").replace(" ","")
        d["City"]=item.find_all("div",{"class":"property-city"})[0].text.replace("\n","").replace(" ","")
        try:
            d["Beds"]=item.find("div",{"class":"property-beds"}).find("strong").text
        except:
            d["Beds"]=None
        try:
            d["Baths"]=item.find("div",{"class":"property-baths"}).find("strong").text
        except:
            d["Baths"]=None
        try:
            d["Area"]=item.find("div",{"class":"property-sqft"}).find("strong").text
        except:
            d["Area"]=None
    except:
        pass
    l.append(d)
When I call l (the list that contains my dictionary) - this is what I get:
[{},
{},
{},
{},
{},
{},
{},
{},
{},
{},
{},
{},
{},
{},
{},
{},
{},
{},
{},
{},
{}]
I'm using Python 3.8.2 with Beautiful Soup 4. Any ideas or help with this would be greatly appreciated. Thanks!
This does what you want much more concisely and is more pythonic (using a dict comprehension nested inside a list comprehension):
import requests
from bs4 import BeautifulSoup
r = requests.get("https://www.century21.com/real-estate/colorado-springs-co/LCCOCOLORADOSPRINGS/")
c = r.content
soup = BeautifulSoup(c, "html.parser")
css_classes = [
    "listing-price",
    "property-address",
    "property-city",
    "property-beds",
    "property-baths",
    "property-sqft",
]
pl = [{css_class.split('-')[1]: item.find(class_=css_class).text.strip()  # this shouldn't error if not found
       for css_class in css_classes}  # find each class in the class list
      for item in soup.find_all('div', class_='property-card-primary-info')]  # find each property card div
print(pl)
Output:
[{'address': '512 Silver Oak Grove',
'baths': '6 baths',
'beds': '4 beds',
'city': 'Colorado Springs CO 80906',
'price': '$1,595,000',
'sqft': '6,958 sq. ft'},
{'address': '8910 Edgefield Drive',
'baths': '5 baths',
'beds': '5 beds',
'city': 'Colorado Springs CO 80920',
'price': '$499,900',
'sqft': '4,557 sq. ft'},
{'address': '135 Mayhurst Avenue',
'baths': '3 baths',
'beds': '3 beds',
'city': 'Colorado Springs CO 80906',
'price': '$420,000',
'sqft': '1,889 sq. ft'},
{'address': '7925 Bard Court',
'baths': '4 baths',
'beds': '5 beds',
'city': 'Colorado Springs CO 80920',
'price': '$405,000',
'sqft': '3,077 sq. ft'},
{'address': '7641 N Sioux Circle',
'baths': '3 baths',
'beds': '4 beds',
'city': 'Colorado Springs CO 80915',
'price': '$389,900',
'sqft': '3,384 sq. ft'},
...
]
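Since the stated goal was a pandas table written out to CSV, a minimal follow-up sketch (the output filename is arbitrary) could be:
import pandas as pd

df = pd.DataFrame(pl)                    # one row per property card
df.to_csv("listings.csv", index=False)   # write the table to a CSV file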
You should use functions for the repetitive work. This makes your code clearer.
I've put together this code, which works:
import requests
from bs4 import BeautifulSoup
def find_div_and_get_value(soup, html_balise, attributes):
    return soup.find(html_balise, attrs=attributes).text.replace("\n","").strip()

def find_div_and_get_value2(soup, html_balise, attributes):
    return soup.find(html_balise, attrs=attributes).find('strong').text.replace("\n","").strip()

r=requests.get("https://www.century21.com/real-estate/colorado-springs-co/LCCOCOLORADOSPRINGS/")
c=r.content
soup = BeautifulSoup(c,"html.parser")
houses = soup.findAll("div", {"class":"infinite-item"})

l=[]
for house in houses:
    try:
        d = {}
        d["Price"] = find_div_and_get_value(house, 'a', {"class": "listing-price"})
        d["Address"] = find_div_and_get_value(house, 'div', {"class": "property-address"})
        d["City"] = find_div_and_get_value(house, 'div', {"class":"property-city"})
        d["Beds"] = find_div_and_get_value2(house, 'div', {"class":"property-beds"})
        d["Baths"] = find_div_and_get_value2(house, 'div', {"class":"property-baths"})
        d["Area"] = find_div_and_get_value2(house, 'div', {"class":"property-sqft"})
        l.append(d)
    except:
        break

for house in l:
    print(house)

Get JSON data with Django

I am playing around with GeoIP2 and requested the following in my view.
g = GeoIP2()
city = g.city('google.com')
tests = Test.objects.all()
args = { 'tests': tests }
return render(request, 'app/home.html', args)
I receive a JSON response with a bunch of data, I am interested in "city" for example.
{'city': None, 'continent_code': 'NA', 'continent_name': 'North America', 'country_code': 'US', 'country_name': 'United States', 'dma_code': None, 'latitude': 37.751, 'longitude': -97.822, 'postal_code': None, 'region': None, 'time_zone': 'America/Chicago'}
My model
# Create your models here.
class Test(models.Model):
    city = models.CharField(default='', max_length=100)

    def __str__(self):
        return self.browser_family
Despite some googling and YouTube videos, I am not quite sure how to grab, for example, "city" out of the JSON response. I looked at previous threads here, but I'm not sure they apply; they seemed to cover more sophisticated cases.
Any suggestions out there?
SOLVED
city = g.city('google.com')
json = city['city']
You can grab/assign it as follows:
city = JSON_response['city']
What happens here:
You assign the value of the key "city" of your JSON_response to the variable city.
In your example city will be None
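If the goal is to store that value on the Test model from the question, a minimal sketch (the surrounding view wiring is assumed) could be:
from django.contrib.gis.geoip2 import GeoIP2

g = GeoIP2()
data = g.city('google.com')      # plain dict, as shown above
city_name = data.get('city')     # may be None, as in this example

# hypothetical: save it on the Test model and pass it to the template
Test.objects.create(city=city_name or '')
args = {'tests': Test.objects.all(), 'city': city_name}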

Where can I get payer_id for executing a payment (PayPal)?

I'm working with the PayPal Python REST SDK. After a payment is created, we need to execute it. There is a code sample at https://github.com/paypal/PayPal-Python-SDK/blob/master/samples/payment/execute.py. But when executing the payment
if payment.execute({"payer_id": "DUFRQ8GWYMJXC"}): # return True or False
print("Payment[%s] execute successfully" % (payment.id))
else:
print(payment.error)
we need to supply payer_id. But how can I obtain it? Any ideas or code examples?
The payerid is returned in the host headers on the callback. Alternatively, you can do the following...
In the process of creating a sale, do...
saleinfo_ = {"intent":"sale",
"redirect_urls":{
"return_url":("myurlonsuccess"),
"cancel_url":("myurlonfail")
},
"payer":{"payment_method":"paypal"},
"transactions":[
{"amount":{"total":out_totaltopay_, "details":{"tax":out_taxes_, "subtotal":out_subtotal_}, "currency":symbol_}, "description":description_}
payment_ = Payment(saleinfo_)
if (payment_.create()):
token_ = payment_.id
Then when the return callback arrives, use...
payment_ = Payment.find(token_)
if (payment_ != None):
    payerid_ = payment_.payer.payer_info.payer_id
    if (payment_.execute({"payer_id":payerid_})):
        ....
The json data received in the find process is similar to the following
{'payment_method': 'paypal', 'status': 'VERIFIED', 'payer_info': {'shipping_address': {'line1': '1 Main St', 'recipient_name': 'Test Buyer', 'country_code': 'US', 'state': 'CA', 'postal_code': '95131', 'city': 'San Jose'}, 'first_name': 'Test', 'payer_id': '<<SOMEID>>', 'country_code': 'US', 'email': 'testbuyer#mydomain.com', 'last_name': 'Buyer'}}
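For completeness, in the REST checkout flow the PayerID is typically also appended to the return_url as a query parameter; a rough sketch of reading it there (the callback URL below is made up) might look like:
from urllib.parse import urlparse, parse_qs
from paypalrestsdk import Payment

# hypothetical redirect received on the return_url after buyer approval
callback_url = "https://example.com/myurlonsuccess?paymentId=PAY-123&token=EC-456&PayerID=DUFRQ8GWYMJXC"

params = parse_qs(urlparse(callback_url).query)
payer_id = params.get("PayerID", [None])[0]      # assumed to be present on the redirect
payment_id = params.get("paymentId", [None])[0]

payment = Payment.find(payment_id)
if payment and payment.execute({"payer_id": payer_id}):
    print("Payment[%s] execute successfully" % payment.id)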
Hope that helps

Get headlines from web archive

I am trying to get headlines from www.bbc.co.uk/news. The code I have works fine and is shown below:
from bs4 import BeautifulSoup, SoupStrainer
import urllib2
import re
opener = urllib2.build_opener()
url = 'http://www.bbc.co.uk/news'
soup = BeautifulSoup(opener.open(url), "lxml")
titleTag = soup.html.head.title
print(titleTag.string)
titles = soup.find_all('span', {'class' : 'title-link__title-text'})
headlines = [t.text for t in titles]
print(headlines)
I would like to build a dataset for a given date, let's say 1st April 2016, but the headlines keep changing during the day and the BBC does not keep a history.
So I thought to get it from web archive. For example, I would like to get headlines from this url (http://web.archive.org/web/20160203074646/http://www.bbc.co.uk/news) for the timestamp 20160203074646.
When I paste the url in my code, the output contains the headlines.
EDIT
But how do I automate this process for all the timestamps?
To see all snapshots for a given URL, replace the timestamp with an asterisk:
http://web.archive.org/web/*/http://www.bbc.co.uk
then screen scrape that.
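If you'd rather not scrape that calendar page, the Wayback Machine also has a CDX listing endpoint that returns all snapshots for a URL; a rough sketch (the date range here is just an example) could be:
import requests

resp = requests.get(
    "http://web.archive.org/cdx/search/cdx",
    params={"url": "bbc.co.uk/news", "output": "json", "from": "20160401", "to": "20160430"},
)
rows = resp.json()
timestamps = [row[1] for row in rows[1:]]  # first row is the header; column 1 is the timestamp

for ts in timestamps:
    snapshot_url = "http://web.archive.org/web/%s/http://www.bbc.co.uk/news" % ts
    # ...fetch snapshot_url and reuse the BeautifulSoup code above...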
A few things to consider:
The Wayback API will give you the nearest single snapshot to a given timestamp. You seem to want all available snapshots, which is why I suggested screen scraping.
The BBC might change headlines faster than the Wayback Machine can snapshot them.
The BBC provides RSS feeds which can be parsed more reliably. There is a listing under "Choose a Feed".
EDIT: have a look at the feedparser docs
import feedparser
d = feedparser.parse('http://feeds.bbci.co.uk/news/rss.xml?edition=uk')
d.entries[0]
Output
{'guidislink': False,
'href': u'',
'id': u'http://www.bbc.co.uk/news/world-europe-37003819',
'link': u'http://www.bbc.co.uk/news/world-europe-37003819',
'links': [{'href': u'http://www.bbc.co.uk/news/world-europe-37003819',
'rel': u'alternate',
'type': u'text/html'}],
'media_thumbnail': [{'height': u'432',
'url': u'http://c.files.bbci.co.uk/12A34/production/_90704367_mediaitem90704366.jpg',
'width': u'768'}],
'published': u'Sun, 07 Aug 2016 21:24:36 GMT',
'published_parsed': time.struct_time(tm_year=2016, tm_mon=8, tm_mday=7, tm_hour=21, tm_min=24, tm_sec=36, tm_wday=6, tm_yday=220, tm_isdst=0),
'summary': u"Turkey's President Erdogan tells a huge rally in Istanbul that he would approve the return of the death penalty if it was backed by parliament and the public.",
'summary_detail': {'base': u'http://feeds.bbci.co.uk/news/rss.xml?edition=uk',
'language': None,
'type': u'text/html',
'value': u"Turkey's President Erdogan tells a huge rally in Istanbul that he would approve the return of the death penalty if it was backed by parliament and the public."},
'title': u'Turkey death penalty: Erdogan backs return at Istanbul rally',
'title_detail': {'base': u'http://feeds.bbci.co.uk/news/rss.xml?edition=uk',
'language': None,
'type': u'text/plain',
'value': u'Turkey death penalty: Erdogan backs return at Istanbul rally'}}
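To collect just the headlines from the feed, something like this should work on the parsed result:
headlines = [entry.title for entry in d.entries]
print(headlines)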

python BeautifulSoup parsing table

I'm learning python requests and BeautifulSoup. For an exercise, I've chosen to write a quick NYC parking ticket parser. I am able to get an HTML response, which is quite ugly. I need to grab the lineItemsTable and parse all the tickets.
You can reproduce the page by going here: https://paydirect.link2gov.com/NYCParking-Plate/ItemSearch and entering a NY plate T630134C
soup = BeautifulSoup(plateRequest.text)
#print(soup.prettify())
#print soup.find_all('tr')
table = soup.find("table", { "class" : "lineItemsTable" })
for row in table.findAll("tr"):
    cells = row.findAll("td")
    print cells
Can someone please help me out? Simple looking for all tr does not get me anywhere.
Here you go:
data = []
table = soup.find('table', attrs={'class':'lineItemsTable'})
table_body = table.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele]) # Get rid of empty values
This gives you:
[ [u'1359711259', u'SRF', u'08/05/2013', u'5310 4 AVE', u'K', u'19', u'125.00', u'$'],
[u'7086775850', u'PAS', u'12/14/2013', u'3908 6th Ave', u'K', u'40', u'125.00', u'$'],
[u'7355010165', u'OMT', u'12/14/2013', u'3908 6th Ave', u'K', u'40', u'145.00', u'$'],
[u'4002488755', u'OMT', u'02/12/2014', u'NB 1ST AVE # E 23RD ST', u'5', u'115.00', u'$'],
[u'7913806837', u'OMT', u'03/03/2014', u'5015 4th Ave', u'K', u'46', u'115.00', u'$'],
[u'5080015366', u'OMT', u'03/10/2014', u'EB 65TH ST # 16TH AV E', u'7', u'50.00', u'$'],
[u'7208770670', u'OMT', u'04/08/2014', u'333 15th St', u'K', u'70', u'65.00', u'$'],
[u'$0.00\n\n\nPayment Amount:']
]
Couple of things to note:
The last row in the output above (the Payment Amount) is not part of the ticket data, but that is how the table is laid out. You can filter it out by checking whether the length of the list is less than 7 (see the one-liner below).
The last column of every row will have to be handled separately since it is an input text box.
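For example, building on the data list from above, the trailing payment-amount row could be dropped with a small sketch like:
tickets = [row for row in data if len(row) >= 7]  # keep only full ticket rows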
Updated Answer
If a programmer is interested only in parsing a table from a webpage, they can use the pandas method pandas.read_html.
Let's say we want to extract the GDP data table from the website: https://worldpopulationreview.com/countries/countries-by-gdp/#worldCountries
Then the following code does the job perfectly (no need for BeautifulSoup or fancy HTML parsing):
Using pandas only
# sometimes we can read directly from the website
import pandas as pd

url = "https://en.wikipedia.org/wiki/AFI%27s_100_Years...100_Movies"
df = pd.read_html(url)[0]  # read_html returns a list of DataFrames; take the first table
df.head()
Using pandas and requests (More General Case)
# if pd.read_html(url) does not work directly, we can fetch the page with requests first
import pandas as pd
import requests
url = "https://worldpopulationreview.com/countries/countries-by-gdp/#worldCountries"
r = requests.get(url)
df_list = pd.read_html(r.text) # this parses all the tables in webpages to a list
df = df_list[0]
df.head()
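If the page contains several tables, read_html's match argument (a regular expression matched against the table text) can help select the right one; the pattern below is just an illustrative assumption about the page:
df_list = pd.read_html(r.text, match="GDP")  # keep only tables whose text matches the pattern
df = df_list[0]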
Required modules
pip install lxml
pip install requests
pip install pandas
Solved, this is how you parse their HTML results:
table = soup.find("table", { "class" : "lineItemsTable" })
for row in table.findAll("tr"):
    cells = row.findAll("td")
    if len(cells) == 9:
        summons = cells[1].find(text=True)
        plateType = cells[2].find(text=True)
        vDate = cells[3].find(text=True)
        location = cells[4].find(text=True)
        borough = cells[5].find(text=True)
        vCode = cells[6].find(text=True)
        amount = cells[7].find(text=True)
        print amount
Here is a working example for a generic <table>. (The links in the original question are broken.)
We'll extract the table from a countries-by-GDP (Gross Domestic Product) page.
htmltable = soup.find('table', { 'class' : 'table table-striped' })
# where the dictionary specifies unique attributes for the 'table' tag
The tableDataText function parses an HTML segment that starts with a <table> tag, followed by multiple <tr> (table row) and inner <td> (table data) tags. It returns a list of rows with their inner columns. It accepts only one row of <th> (table header) cells, in the first row.
def tableDataText(table):
    rows = []
    trs = table.find_all('tr')
    headerow = [td.get_text(strip=True) for td in trs[0].find_all('th')] # header row
    if headerow: # if there is a header row include first
        rows.append(headerow)
        trs = trs[1:]
    for tr in trs: # for every table row
        rows.append([td.get_text(strip=True) for td in tr.find_all('td')]) # data row
    return rows
Using it we get (first two rows).
list_table = tableDataText(htmltable)
list_table[:2]
[['Rank',
'Name',
"GDP (IMF '19)",
"GDP (UN '16)",
'GDP Per Capita',
'2019 Population'],
['1',
'United States',
'21.41 trillion',
'18.62 trillion',
'$65,064',
'329,064,917']]
That can be easily transformed in a pandas.DataFrame for more advanced tools.
import pandas as pd
dftable = pd.DataFrame(list_table[1:], columns=list_table[0])
dftable.head(4)
I was interested in the tables in the MediaWiki Special:Version display, such as https://en.wikipedia.org/wiki/Special:Version
unit test
from unittest import TestCase
import pprint

class TestHtmlTables(TestCase):
    '''
    test the HTML Tables parser
    '''

    def testHtmlTables(self):
        url = "https://en.wikipedia.org/wiki/Special:Version"
        html_table = HtmlTable(url)
        tables = html_table.get_tables("h2")
        pp = pprint.PrettyPrinter(indent=2)
        debug = True
        if debug:
            pp.pprint(tables)
        pass
HtmlTable.py
'''
Created on 2022-10-25

@author: wf
'''
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

class HtmlTable(object):
    '''
    HtmlTable
    '''

    def __init__(self, url):
        '''
        Constructor
        '''
        req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        self.html_page = urlopen(req).read()
        self.soup = BeautifulSoup(self.html_page, 'html.parser')

    def get_tables(self, header_tag: str = None) -> dict:
        """
        get all tables from my soup as a list of list of dicts

        Args:
            header_tag(str): if set, take the table name from the previous header tag

        Return:
            dict: the list of list of dicts for all tables
        """
        tables = {}
        for table_index, table in enumerate(self.soup.find_all("table")):
            fields = []
            table_data = []
            for tr in table.find_all('tr', recursive=True):
                for th in tr.find_all('th', recursive=True):
                    fields.append(th.text)
            for tr in table.find_all('tr', recursive=True):
                record = {}
                for i, td in enumerate(tr.find_all('td', recursive=True)):
                    record[fields[i]] = td.text
                if record:
                    table_data.append(record)
            if header_tag is not None:
                header = table.find_previous_sibling(header_tag)
                table_name = header.text
            else:
                # use the outer loop counter so the inner enumerate doesn't shadow it
                table_name = f"table{table_index}"
            tables[table_name] = table_data
        return tables
Result
Finding files... done.
Importing test modules ... done.
Tests to run: ['TestHtmlTables.testHtmlTables']
testHtmlTables (tests.test_html_table.TestHtmlTables) ... Starting test testHtmlTables, debug=False ...
{ 'Entry point URLs': [ {'Entry point': 'Article path', 'URL': '/wiki/$1'},
{'Entry point': 'Script path', 'URL': '/w'},
{'Entry point': 'index.php', 'URL': '/w/index.php'},
{'Entry point': 'api.php', 'URL': '/w/api.php'},
{'Entry point': 'rest.php', 'URL': '/w/rest.php'}],
'Installed extensions': [ { 'Description': 'Brad Jorsch',
'Extension': '1.0 (b9a7bff) 01:45, 9 October '
'2022',
'License': 'Get a summary of logged API feature '
'usages for a user agent',
'Special pages': 'ApiFeatureUsage',
'Version': 'GPL-2.0-or-later'},
{ 'Description': 'Brion Vibber, Kunal Mehta, Sam '
'Reed, Aaron Schulz, Brad Jorsch, '
'Umherirrender, Marius Hoch, '
'Andrew Garrett, Chris Steipp, '
'Tim Starling, Gergő Tisza, '
'Alexandre Emsenhuber, Victor '
'Vasiliev, Glaisher, DannyS712, '
'Peter Gehres, Bryan Davis, James '
'D. Forrester, Taavi Väänänen and '
'Alexander Vorwerk',
'Extension': '– (df2982e) 23:10, 13 October 2022',
'License': 'Merge account across wikis of the '
'Wikimedia Foundation',
'Special pages': 'CentralAuth',
'Version': 'GPL-2.0-or-later'},
{ 'Description': 'Tim Starling and Aaron Schulz',
'Extension': '2.5 (648cfe0) 06:20, 17 October '
'2022',
'License': 'Grants users with the appropriate '
'permission the ability to check '
"users' IP addresses and other "
'information',
'Special pages': 'CheckUser',
'Version': 'GPL-2.0-or-later'},
{ 'Description': 'Ævar Arnfjörð Bjarmason and '
'James D. Forrester',
'Extension': '– (2cf4aaa) 06:41, 14 October 2022',
'License': 'Adds a citation special page and '
'toolbox link',
'Special pages': 'CiteThisPage',
'Version': 'GPL-2.0-or-later'},
{ 'Description': 'PediaPress GmbH, Siebrand '
'Mazeland and Marcin Cieślak',
'Extension': '1.8.0 (324e738) 06:20, 17 October '
'2022',
'License': 'Create books',
'Special pages': 'Collection',
'Version': 'GPL-2.0-or-later'},
{ 'Description': 'Amir Aharoni, David Chan, Joel '
'Sahleen, Kartik Mistry, Niklas '
'Laxström, Pau Giner, Petar '
'Petković, Runa Bhattacharjee, '
'Santhosh Thottingal, Siebrand '
'Mazeland, Sucheta Ghoshal and '
'others',
'Extension': '– (56fe095) 11:56, 17 October 2022',
'License': 'Makes it easy to translate content '
'pages',
'Special pages': 'ContentTranslation',
'Version': 'GPL-2.0-or-later'},
{ 'Description': 'Andrew Garrett, Ryan Kaldari, '
'Benny Situ, Luke Welling, Kunal '
'Mehta, Moriel Schottlender, Jon '
'Robson and Roan Kattouw',
'Extension': '– (cd01f9b) 06:21, 17 October 2022',
'License': 'System for notifying users about '
'events and messages',
'Special pages': 'Echo',
'Version': 'MIT'},
..
'Installed libraries': [ { 'Authors': 'Benjamin Eberlei and Richard Quadling',
'Description': 'Thin assertion library for input '
'validation in business models.',
'Library': 'beberlei/assert',
'License': 'BSD-2-Clause',
'Version': '3.3.2'},
{ 'Authors': '',
'Description': 'Arbitrary-precision arithmetic '
'library',
'Library': 'brick/math',
'License': 'MIT',
'Version': '0.8.17'},
{ 'Authors': 'Christian Riesen',
'Description': 'Base32 encoder/decoder according '
'to RFC 4648',
'Library': 'christian-riesen/base32',
'License': 'MIT',
'Version': '1.6.0'},
...
{ 'Authors': 'Readers Web Team, Trevor Parscal, Roan '
'Kattouw, Alex Hollender, Bernard Wang, '
'Clare Ming, Jan Drewniak, Jon Robson, '
'Nick Ray, Sam Smith, Stephen Niedzielski '
'and Volker E.',
'Description': 'Provides 2 Vector skins:\n'
'\n'
'2011 - The Modern version of MonoBook '
'with fresh look and many usability '
'improvements.\n'
'2022 - The Vector built as part of '
'the WMF mw:Desktop Improvements '
'project.',
'License': 'GPL-2.0-or-later',
'Skin': 'Vector',
'Version': '1.0.0 (93f11b3) 20:24, 17 October 2022'}],
'Installed software': [ { 'Product': 'MediaWiki',
'Version': '1.40.0-wmf.6 (bb4c5db)17:39, 17 '
'October 2022'},
{'Product': 'PHP', 'Version': '7.4.30 (fpm-fcgi)'},
{ 'Product': 'MariaDB',
'Version': '10.4.25-MariaDB-log'},
{'Product': 'ICU', 'Version': '63.1'},
{'Product': 'Pygments', 'Version': '2.10.0'},
{'Product': 'LilyPond', 'Version': '2.22.0'},
{'Product': 'Elasticsearch', 'Version': '7.10.2'},
{'Product': 'LuaSandbox', 'Version': '4.0.2'},
{'Product': 'Lua', 'Version': '5.1.5'}]}
test testHtmlTables, debug=False took 1.2 s
ok
----------------------------------------------------------------------
Ran 1 test in 1.204s
OK
from behave import *
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import pandas as pd
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate

class readTableDataFromDB:
    def LookupValueFromColumnSingleKey(context, tablexpath, rowName, columnName):
        print("element present readData From Table")
        element = context.driver.find_elements_by_xpath(tablexpath+"/descendant::th")
        indexrow = 1
        indexcolumn = 1
        for values in element:
            valuepresent = values.text
            print("text present here::"+valuepresent+"rowName::"+rowName)
            if valuepresent.find(columnName) != -1:
                print("current row"+str(indexrow)+"value"+valuepresent)
                break
            else:
                indexrow = indexrow+1
        indexvalue = context.driver.find_elements_by_xpath(
            tablexpath+"/descendant::tr/td[1]")
        for valuescolumn in indexvalue:
            valuepresentcolumn = valuescolumn.text
            print("Team text present here::" +
                  valuepresentcolumn+"columnName::"+rowName)
            print(indexcolumn)
            if valuepresentcolumn.find(rowName) != -1:
                print("current column"+str(indexcolumn) +
                      "value"+valuepresentcolumn)
                break
            else:
                indexcolumn = indexcolumn+1
        print("index column"+str(indexcolumn))
        print(tablexpath+"//descendant::tr["+str(indexcolumn)+"]/td["+str(indexrow)+"]")
        #lookupelement = context.driver.find_element_by_xpath(tablexpath+"//descendant::tr["+str(indexcolumn)+"]/td["+str(indexrow)+"]")
        #print(lookupelement.text)
        return context.driver.find_elements_by_xpath(tablexpath+"//descendant::tr["+str(indexcolumn)+"]/td["+str(indexrow)+"]")

    def LookupValueFromColumnTwoKeyssss(context, tablexpath, rowName, columnName, columnName1):
        print("element present readData From Table")
        element = context.driver.find_elements_by_xpath(
            tablexpath+"/descendant::th")
        indexrow = 1
        indexcolumn = 1
        indexcolumn1 = 1
        for values in element:
            valuepresent = values.text
            print("text present here::"+valuepresent)
            indexrow = indexrow+1
            if valuepresent == columnName:
                print("current row value"+str(indexrow)+"value"+valuepresent)
                break
        for values in element:
            valuepresent = values.text
            print("text present here::"+valuepresent)
            indexrow = indexrow+1
            if valuepresent.find(columnName1) != -1:
                print("current row value"+str(indexrow)+"value"+valuepresent)
                break
        indexvalue = context.driver.find_elements_by_xpath(
            tablexpath+"/descendant::tr/td[1]")
        for valuescolumn in indexvalue:
            valuepresentcolumn = valuescolumn.text
            print("Team text present here::"+valuepresentcolumn)
            print(indexcolumn)
            indexcolumn = indexcolumn+1
            if valuepresent.find(rowName) != -1:
                print("current column"+str(indexcolumn) +
                      "value"+valuepresentcolumn)
                break
        print("indexrow"+str(indexrow))
        print("index column"+str(indexcolumn))
        lookupelement = context.driver.find_element_by_xpath(
            tablexpath+"//descendant::tr["+str(indexcolumn)+"]/td["+str(indexrow)+"]")
        print(tablexpath +
              "//descendant::tr["+str(indexcolumn)+"]/td["+str(indexrow)+"]")
        print(lookupelement.text)
        return context.driver.find_element_by_xpath(tablexpath+"//descendant::tr["+str(indexrow)+"]/td["+str(indexcolumn)+"]")
