How to upgrade IBM Watson's Python version?

My homework is a Python notebook project in Watson. The homework provides the code below for the function get_basketball_stats(link="..."). However, it returns an erroneous result: the dictionary's keys and values are mismatched, e.g. the key "PPG" is given "GP"'s values.
I tried the same code in Google Colab and the result is correct. Google Colab's Python version is 3.6.7. I suspect that the outdated Python version in Watson (3.5.5) causes the erroneous dictionary, hence my question: how do I upgrade Watson's Python version?
import requests
import bs4

def get_basketball_stats(link='https://en.wikipedia.org/wiki/Michael_Jordan'):
    # read the webpage
    response = requests.get(link)
    # create a BeautifulSoup object to parse the HTML
    soup = bs4.BeautifulSoup(response.text, 'html.parser')
    # the player stats are in the table whose CSS class is 'wikitable sortable';
    # therefore we create a tag object "table"
    table = soup.find(class_='wikitable sortable')
    # the headers of the table are the first table row (tr); we create a tag object holding that row
    headers = table.tr
    # the column names are displayed as abbreviations; finding all the abbr tags returns them in order
    titles = headers.find_all("abbr")
    # we create a dictionary and pass the table headers as the keys
    data = {title['title']: [] for title in titles}
    # we will store each column as a list in the dictionary; the column header is the dictionary key
    # we iterate over each table row by finding every tr tag after the header
    for row in table.find_all('tr')[1:]:
        # we iterate over each cell in the row; each cell corresponds to a column,
        # so we pair it with the key for that column
        for key, a in zip(data.keys(), row.find_all("td")[2:]):
            # we append each element and strip any extra HTML content
            data[key].append(''.join(c for c in a.text if (c.isdigit() or c == ".")))
    # we remove extra rows by finding the smallest list
    Min = min([len(x) for x in data.values()])
    # we convert the elements of each column to floats
    for key in data.keys():
        data[key] = list(map(lambda x: float(x), data[key][:Min]))
    return data
I expect the keys to match their corresponding values in Watson, like they do in Google Colab.
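For what it's worth, the version difference alone may explain the mismatch without any upgrade: in Python 3.5 a plain dict does not preserve insertion order, so data.keys() can come back in a different order than the table headers, while Python 3.6+ happens to preserve insertion order. A minimal sketch of a version-independent fix, assuming the rest of the function stays unchanged, is to build the dictionary as an OrderedDict:
from collections import OrderedDict

# OrderedDict preserves the header order on every Python version,
# so zip(data.keys(), ...) pairs each cell with the right column
data = OrderedDict((title['title'], []) for title in titles)
With that change, the zip over data.keys() should line up with the td cells regardless of the interpreter version.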

Related

Making lists store all data from the loop and not only the last one

I want to store the JSON I get from an API, but I only get the JSON from the last loop iteration. How can I make the lists dynamic? I also need to use the last query (Pandas), but it's not working.
Lastly, how do I make an API to:
List the latest forecast for each location for every day.
List the average the_temp of the last 3 forecasts for each location for every day.
Get the top n locations based on each available metric, where n is a parameter given in the API call.
import requests
import json
import sqlite3
import pandas as pd  # library for data frames

print(sqlite3.sqlite_version)

for x in range(20, 28):  # I need the LONDjson/BERLjson/SANjson lists to be dynamic, storing all 7 JSONs from each URL
    r = requests.get('https://www.metaweather.com/api/location/44418/2021/4/' + str(x) + '/')  # GET request to the source URL
    LONDjson = r.json()  # JSON object of the result
    r2 = requests.get('https://www.metaweather.com//api/location/2487956/2021/4/' + str(x) + '/')
    SANjson = r2.json()
    r3 = requests.get('https://www.metaweather.com//api/location/638242/2021/4/' + str(x) + '/')
    BERLjson = r3.json()

conn = sqlite3.connect(r'D:\weatherdb.db')  # create the db at this path
cursor = conn.cursor()

# import pprint
# pprint.pprint(LONDjson)

cursor.executescript('''
DROP TABLE IF EXISTS LONDjson;
DROP TABLE IF EXISTS SANjson;
DROP TABLE IF EXISTS BERLjson;
CREATE TABLE LONDjson (id int, data json);
''')

for LOND in LONDjson:
    cursor.execute("insert into LONDjson values (?, ?)",
                   [LOND['id'], json.dumps(LOND)])
conn.commit()

z = cursor.execute('''
SELECT json_extract(data, '$.id', '$.the_temp', '$.weather_state_name', '$.applicable_date')
FROM LONDjson;
''').fetchall()  # query the data
Hint: in your initial for loop you are not storing the results of the API calls; you store them in variables, but those variables are overwritten on every iteration.
A common solution is to start with an empty list and append to it; if you are storing multiple variables, store a dictionary as each element of the list.
Example:
results = []
for x in range(10):
    results.append(
        {
            'x': x,
            'x_squared': x * x,
            'abs_x': abs(x)
        }
    )
print(results)
It looks like there are at least two things that can be improved in the data manipulation part of your code.
Using a list to store the retrieved data:
LONDjson = []
SANjson = []
BERLjson = []
for x in range(20, 28):
    r = requests.get('https://www.metaweather.com/api/location/44418/2021/4/' + str(x) + '/')
    LONDjson.append(r.json())
    r2 = requests.get('https://www.metaweather.com//api/location/2487956/2021/4/' + str(x) + '/')
    SANjson.append(r2.json())
    r3 = requests.get('https://www.metaweather.com//api/location/638242/2021/4/' + str(x) + '/')
    BERLjson.append(r3.json())
Retrieving the data from the list:
# the retrieved data is a dictionary inside a list with only one entry
for LOND in LONDjson:
    print(LOND[0]['id'])
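The question also asks for the average the_temp of the last 3 forecasts per location per day. A minimal pandas sketch, assuming each stored entry is a list of forecast dicts carrying 'applicable_date', 'created' and 'the_temp' fields (the first two field names are assumptions based on the metaweather responses used above):
import pandas as pd

# flatten the per-day lists of forecast dicts into one DataFrame
records = [forecast for day in LONDjson for forecast in day]
df = pd.DataFrame(records)

# average the_temp of the last 3 forecasts (by creation time) for each day
avg_last3 = (df.sort_values('created')
               .groupby('applicable_date')['the_temp']
               .apply(lambda s: s.tail(3).mean()))
print(avg_last3)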
Hope this helps you out.

Loop over a web table

I am new to Python and I am trying to get some values from a table in a web page; I need to get the values highlighted in yellow on the page.
I have this code; it gets all the values in the "Instruments" column, but I don't know how to get the specific values:
body = soup.find_all("tr")
for Rows in body:
    RowValue = Rows.find_all('th')
    if len(RowValue) > 0:
        CellValue = RowValue[0]
        ThisWeekValues.append(CellValue.text)
Any suggestions?
One of the ways could be this, since only the id is different (using Selenium):
ids = driver.find_elements_by_xpath('//*[@id]')  # every element that carries an id attribute
if 'your_element_id' in [el.get_attribute('id') for el in ids]:
    # do something
    pass
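Since the question already uses BeautifulSoup, a minimal sketch of picking one column out of each row may also help; the column index here is a hypothetical placeholder, as the screenshot with the yellow cells is not available:
ThisWeekValues = []
for row in soup.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) > 2:  # skip header rows and short rows
        ThisWeekValues.append(cells[2].text.strip())  # third column, as an example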

petl convert data from duplicate entries

I am trying to use the petl library to build an ETL process that copies data between two tables. The destination table contains a unique slug field. For that, I wrote my script so it would identify duplicate slugs and convert them by appending the ID to the slug value.
import petl as etl
from slugify import slugify_unicode

table = etl.fromdb(source_con, 'SELECT * FROM user')
# get whatever remains as duplicates
duplicates = etl.duplicates(table, 'slug')
for dup in [i for i in duplicates.values('id')]:
    table = etl.convert(
        table,
        'slug',
        lambda v, row: '{}-{}'.format(slugify_unicode(v), str(row.id).encode('hex')),
        where=lambda row: row.id == dup,
        pass_row=True
    )
The above did not work as expected; it seems the table object still contains duplicate values after the loop.
Can anyone advise?
Thanks
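A hedged guess at the cause: petl tables are lazy, so the lambdas are not evaluated until the table is iterated, and by then the loop variable dup is bound to its final value (Python closures capture variables, not values). Binding dup as a default argument is a minimal sketch of a fix:
for dup in duplicates.values('id'):
    table = etl.convert(
        table,
        'slug',
        # dup=dup freezes the current loop value inside each lambda
        lambda v, row, dup=dup: '{}-{}'.format(slugify_unicode(v), row.id),
        where=lambda row, dup=dup: row.id == dup,
        pass_row=True
    )
Note the value lambda here appends row.id directly rather than str(row.id).encode('hex'), which is Python 2 only.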

ParsePy - Filter results that contain string

I have a table stored in Parse.com, and I'm using ParsePy to get and filter the data in my Python Django program.
My table has three columns: objectId (string), name (string), and type (array). I want to query the name column and return any objects whose name contains a partial search term. For example, if I search for amp and there is a row where name: Example Name, this row should be returned (since "Example" contains "amp").
Here's my code so far:
def searchResults(self, searchTerm):
    register('parseKey', 'parseRestKey')
    myParseObject = ParseObject()
    allData = myParseObject.Query.filter(name=searchTerm)
    return allData
The problem with this code is it only works if searchTerm is exactly the same as what's in the name column. The Parse REST API says that the queries accept regex parameters, but I'm not sure how to use them in ParsePy.
Yes, you have to use a regex for that. This is how it would look:
allData = myParseObject.Query.filter(name__regex="<your_regex>")
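For a plain substring match like the amp example, escaping the user's input keeps regex metacharacters from misfiring; a minimal sketch, assuming name__regex passes the pattern through to Parse's $regex operator:
import re

def searchResults(self, searchTerm):
    register('parseKey', 'parseRestKey')
    myParseObject = ParseObject()
    # re.escape makes characters like '.' or '+' in the term match literally
    return myParseObject.Query.filter(name__regex=re.escape(searchTerm))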

Writing full contents of Pandas dataframe to HTML table

I am embedding links in one column of a Pandas dataframe (table, below) and writing the dataframe to HTML.
Links in the dataframe table are formatted as shown (indexing the first link in the table):
In:  table.loc[0,'Links']
Out: u'<a href="http://xxx.xx.xxx.xxx/browser/I6.html">I6</a>'
If I view the dataframe itself (in a notebook) rather than indexing a specific row, the link text is truncated:
<a href="http://xxx.xx.xxx.xxx/browser/I6.html...
I write the dataframe to HTML:
table_1 = table.to_html(classes='table', index=False, escape=False)
But the truncated link (rather than the full text) is written to the HTML table:
<td> <a href="http://xxx.xx.xxx.xxx/browser/I6.html...</td>\n
I probably need an additional parameter for to_html(). I'm looking at the documentation now, but advice is appreciated:
http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.to_html.html
Thanks!
So there is probably a pandas-specific explanation, but you could also work around the problem by (a) replacing the links with a key value, (b) writing the HTML table string, and then (c) replacing the keys with the appropriate links.
For example, replace each link with a key, storing the keys in a dict (note the counter must be initialized outside the loop, or every link gets the same key):
link_map = {}
counter = 0
for i in df.index:
    if df.ix[i]['Links'] in link_map:
        df.ix[i, 'Links'] = link_map[df.ix[i]['Links']]
    else:
        link_map[df.ix[i, 'Links']] = 'href' + str(counter)
        counter += 1
        df.ix[i, 'Links'] = link_map[df.ix[i]['Links']]
Write the table:
table_1 = df.to_html(classes='table', index=False, escape=False)
Re-write the links:
for key, value in link_map.iteritems():
    table_1 = table_1.replace(value, key)
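There may also be a pandas-native route: the truncation is governed by the display.max_colwidth option (default 50 characters), which older versions of to_html respect as well. A minimal sketch, assuming a pandas version where None (or -1 in older releases) means unlimited column width:
import pandas as pd

pd.set_option('display.max_colwidth', None)  # use -1 on older pandas
table_1 = table.to_html(classes='table', index=False, escape=False)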
