I am embedding links in one column of a Pandas dataframe (table, below) and writing the dataframe to html.
Links in the dataframe table are formatted as shown (indexing the first link in the table):
In: table.loc[0,'Links']
Out: u'I6'
If I view the dataframe in the notebook (rather than indexing a specific row), the link text is truncated:
<a href="http://xxx.xx.xxx.xxx/browser/I6.html...
I write the dataframe to html:
table_1=table.to_html(classes='table',index=False,escape=False)
But, the truncated link (rather than the full text) is written to the html table:
<td> <a href="http://xxx.xx.xxx.xxx/browser/I6.html...</td>\n
I probably need an additional parameter for to_html().
I am looking at the documentation now, but any advice is appreciated:
http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.to_html.html
Thanks!
There is probably a pandas-specific explanation, but you could also work around the problem by (a) replacing the links with a key value, (b) writing the HTML table string, and then (c) replacing the keys with the appropriate links.
For example, replace each link with a key, storing the mapping in a dict (the counter must be initialized outside the loop, and the dict is named link_map so it does not shadow the builtin map):
link_map = {}
counter = 0
for i in df.index:
    link = df.loc[i, 'Links']
    if link not in link_map:
        link_map[link] = 'href' + str(counter)
        counter += 1
    df.loc[i, 'Links'] = link_map[link]
Write the table:
table_1 = df.to_html(classes='table', index=False, escape=False)
Re-write the links:
for key, value in link_map.iteritems():
    table_1 = table_1.replace(value, key)
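As for the pandas-specific explanation alluded to above: cell contents are truncated to display.max_colwidth characters (50 by default), and this option has historically also governed to_html output. A minimal sketch of raising the limit before writing (the link content here is illustrative, not the asker's actual URL):

```python
import pandas as pd

# a cell longer than the default 50-character display limit
df = pd.DataFrame({
    'Links': ['<a href="http://example.com/browser/I6.html">I6</a>'],
})

# disable column-width truncation (use -1 on very old pandas versions)
pd.set_option('display.max_colwidth', None)

table_1 = df.to_html(classes='table', index=False, escape=False)
assert 'I6.html' in table_1   # the full link survives into the HTML
```

With the option set, no key/replace round-trip is needed.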
I am new to Python and I am trying to get some values from a table in a webpage; I need to get the values highlighted in yellow on the page:
I have this code; it is getting all the values in the "Instruments" column, but I don't know how to get the specific values:
body = soup.find_all("tr")
for Rows in body:
    RowValue = Rows.find_all('th')
    if len(RowValue) > 0:
        CellValue = RowValue[0]
        ThisWeekValues.append(CellValue.text)
Any suggestions?
ids = driver.find_elements_by_xpath('//*[@id]')
if 'Your element id' in [elem.get_attribute('id') for elem in ids]:
    # do something
This could be one way, since only the id is different.
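If the target is a particular column rather than an element id, a BeautifulSoup-only route is to index the td cells of each row (the HTML below is illustrative, since the original page isn't shown):

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr><th>Instrument</th><th>This Week</th><th>Next Week</th></tr>
  <tr><th>EUR/USD</th><td>1.10</td><td>1.12</td></tr>
  <tr><th>USD/JPY</th><td>109.5</td><td>110.1</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

this_week_values = []
for row in soup.find_all("tr"):
    cells = row.find_all("td")
    if cells:                       # skip the header row, which has no <td>
        this_week_values.append(cells[0].text)

print(this_week_values)             # ['1.10', '109.5']
```

Changing `cells[0]` to another index selects a different column.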
Homework is a Python notebook project in Watson. The homework provides the code below for the function get_basketball_stats(link="..."). However, it returns an erroneous result: the dictionary's values and keys are mismatched, e.g. the key "PPG" is given "GP"'s values.
I tried the same code in Google Colab and the result is correct. Google Colab's Python version is 3.6.7. I suspect that the outdated Python version in Watson (3.5.5) causes the erroneous dictionary, and hence I ask the question here: how do I upgrade Watson's Python version?
def get_basketball_stats(link='https://en.wikipedia.org/wiki/Michael_Jordan'):
    # read the webpage
    response = requests.get(link)
    # create a BeautifulSoup object to parse the HTML
    soup = bs4.BeautifulSoup(response.text, 'html.parser')
    # the player stats are defined with the CSS class 'wikitable sortable';
    # therefore we create a tag object "table"
    table = soup.find(class_='wikitable sortable')
    # the headers of the table are the first table row (tr); we create a tag object for the first row
    headers = table.tr
    # the table column names are displayed as abbreviations; therefore we find all the abbr tags
    titles = headers.find_all("abbr")
    # we create a dictionary and pass the table headers as the keys
    data = {title['title']: [] for title in titles}
    # we store each column as a list in the dictionary; the header of the column is the dictionary key
    # we iterate over each table row by finding each tr tag
    for row in table.find_all('tr')[1:]:
        # we iterate over each cell in the row; as each cell corresponds to a different column,
        # we pair it with the corresponding key
        for key, a in zip(data.keys(), row.find_all("td")[2:]):
            # we append each element, stripping any extra HTML content
            data[key].append(''.join(c for c in a.text if (c.isdigit() or c == ".")))
    # we remove extra rows by finding the smallest list
    Min = min([len(x) for x in data.values()])
    # we convert the elements of each list to floats
    for key in data.keys():
        data[key] = list(map(lambda x: float(x), data[key][:Min]))
    return data
I expect the keys to match their corresponding values in Watson, as they do in Google Colab.
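For what it's worth, the symptom is consistent with dict key order not being guaranteed before Python 3.6/3.7: in 3.5, iterating data.keys() need not match the document order of the abbr tags, so values can land under the wrong keys. A hedged sketch (with made-up column data) of an order-safe variant using collections.OrderedDict:

```python
from collections import OrderedDict

titles = ["GP", "PPG", "RPG"]            # column headers in document order
rows = [["82", "28.2", "6.5"],           # made-up stats rows
        ["80", "30.1", "6.9"]]

# OrderedDict guarantees iteration order == insertion order on every version
data = OrderedDict((t, []) for t in titles)
for row in rows:
    for key, cell in zip(data.keys(), row):
        data[key].append(cell)

print(list(data.keys()))   # ['GP', 'PPG', 'RPG']
print(data["PPG"])         # ['28.2', '30.1']
```

Swapping the plain dict comprehension in the homework code for an OrderedDict would sidestep the version difference without upgrading Watson's Python.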
I am trying to use the petl library to build an ETL process that copies data between two tables. The table contains a unique slug field on the destination. For that, I wrote my script so it would identify duplicate slugs and convert them by appending the ID to the slug value.
table = etl.fromdb(source_con, 'SELECT * FROM user')
# get whatever remains as duplicates
duplicates = etl.duplicates(table, 'slug')
for dup in [i for i in duplicates.values('id')]:
    table = etl.convert(
        table,
        'slug',
        lambda v, row: '{}-{}'.format(slugify_unicode(v), str(row.id).encode('hex')),
        where=lambda row: row.id == dup,
        pass_row=True
    )
The above did not work as expected; it seems like the table object still contains duplicate values after the loop.
Can anyone advise?
Thanks
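One likely culprit (an assumption, since the rest of the pipeline isn't shown): petl tables are evaluated lazily, and the where lambda captures dup by reference, so by the time the table is actually iterated every conversion compares against the last dup value. The usual fix is to bind the value with a default argument, i.e. where=lambda row, dup=dup: row.id == dup. A minimal stdlib-only demonstration of the difference:

```python
# Late binding: every lambda closes over the same variable i,
# so once the loop has finished they all see its final value (2)
late = [lambda x: x == i for i in range(3)]
print([f(0) for f in late])    # [False, False, False]

# Early binding: a default argument captures the value of i
# at definition time, one value per lambda
early = [lambda x, i=i: x == i for i in range(3)]
print([f(0) for f in early])   # [True, False, False]
```

Since petl defers evaluation until the table is consumed, this bites exactly the "build up conversions in a loop" pattern used above.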
I have a table stored in Parse.com, and I'm using ParsePy to get and filter the data in my Python Django program.
My table has three columns: objectId (string), name (string), and type (array). I want to query the name column and return any objects whose name contains a partial search term. For example, if I search for amp, and there's a row where name is Example Name, this row should be returned (since "amp" appears in "Example").
Here's my code so far:
def searchResults(self, searchTerm):
    register('parseKey', 'parseRestKey')
    myParseObject = ParseObject()
    allData = myParseObject.Query.filter(name = searchTerm)
    return allData
The problem with this code is it only works if searchTerm is exactly the same as what's in the name column. The Parse REST API says that the queries accept regex parameters, but I'm not sure how to use them in ParsePy.
Yes, you have to use a regex for that. This is how it would look:
allData = myParseObject.Query.filter(name__regex = "<your_regex>")
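One caveat worth noting: if searchTerm comes from user input, escape it before embedding it in the regex so metacharacters match literally. The standard library's re.escape does this:

```python
import re

# escape regex metacharacters so the term matches as a literal substring
term = re.escape("a.b+c")
print(term)                               # a\.b\+c
assert re.search(term, "xa.b+cy") is not None
assert re.search(term, "aXbYc") is None   # '.' no longer matches any char
```

The filter would then be myParseObject.Query.filter(name__regex=re.escape(searchTerm)); this assumes ParsePy passes the pattern through to Parse's $regex operator unchanged.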
I am working with lxml to fetch an HTML page.
I want to fetch the HTML tables which have the class name 'class1'.
I have done something like this:
for table in doc.xpath('//table[@class="class1"]'):
    print table
But after this I found that there are 4 tables in the HTML page with the class name 'class1', for example:
table A
table B
table C
table D
All 4 tables have the same class name. How can I fetch only table B?
You can just get the second item of the list:
result = doc.xpath('//table[@class="class1"]')
if len(result) > 1:
    print result[1]
Or, if your table has an id, you can get it via XPath:
print doc.xpath('//table[@id="your id"]')[0]
I think what you might want here is:
doc.xpath('//table[@class="class1"]')[1]