I want to extract website names from a URL. For example, https://plus.google.com/in/test.html
should give the output "plus google".
Some more test cases are:
WWW.OH.MADISON.STORES.ADVANCEAUTOPARTS.COM/AUTO_PARTS_MADISON_OH_7402.HTML
Output:- OH MADISON STORES ADVANCEAUTOPARTS
WWW.LQ.COM/LQ/PROPERTIES/PROPERTYPROFILE.DO?PROPID=6054
Output:- LQ
WWW.LOCATIONS.DENNYS.COM
Output:- LOCATIONS DENNYS
WV.WESTON.STORES.ADVANCEAUTOPARTS.COM
Output:- WV WESTON STORES ADVANCEAUTOPARTS
WOODYANDERSONFORDFAYETTEVILLE.NET/
Output:- WOODYANDERSONFORDFAYETTEVILLE
WILMINGTONMAYFAIRETOWNCENTER.HGI.COM
Output:- WILMINGTONMAYFAIRETOWNCENTER HGI
WHITEHOUSEBLACKMARKET.COM/
Output:- WHITEHOUSEBLACKMARKET
WINGATEHOTELS.COM
Output:- WINGATEHOTELS
string = str(input("Enter the url "))
new_list = list(string)
count=0
flag=0
if 'w' in new_list:
index1 = new_list.index('w')
new_list.pop(index1)
count += 1
if 'w' in new_list:
index2 = new_list.index('w')
if index2 != -1 and index2 == index1:
new_list.pop(index2)
count += 1
if 'w' in new_list:
index3= new_list.index('w')
if index3!= -1 and index3== index2 and new_list[index3+1]=='.':
new_list.pop(index3)
count+=1
flag = 1
if flag == 0:
start = string.find('/')
start += 2
end = string.rfind('.')
new_string=string[start:end]
print(new_string)
elif flag == 1:
start = string.find('.')
start = start + 1
end = string.rfind('.')
new_string=string[start:end]
print(new_string)
The above works for some test cases but not all. Please help me with it.
Thanks
This is something you could build upon, using urllib.parse.urlparse:
from urllib.parse import urlparse
tests = ('https://plus.google.com/in/test.html',
('WWW.OH.MADISON.STORES.ADVANCEAUTOPARTS.COM/'
'AUTO_PARTS_MADISON_OH_7402.HTML'),
'WWW.LQ.COM/LQ/PROPERTIES/PROPERTYPROFILE.DO?PROPID=6054')
def extract(url):
# urlparse will not work without a 'scheme'
if not url.startswith('http'):
url = 'http://' + url
parsed = urlparse(url).netloc
split = parsed.split('.')[:-1] # get rid of TLD
if split[0].lower() == 'www':
split = split[1:]
ret = ' '.join(split)
return ret
for url in tests:
print(extract(url))
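If a third-party package is an option, tldextract splits a host into subdomain, domain and public suffix, which handles multi-part TLDs such as .co.uk better than simply dropping the last dot-separated label. A minimal sketch, assuming tldextract is installed (the helper name is mine):
import tldextract  # third-party: pip install tldextract

def site_name(url):
    ext = tldextract.extract(url)
    # ext.subdomain may itself contain dots, e.g. 'oh.madison.stores'
    labels = ext.subdomain.split(".") + [ext.domain]
    return " ".join(l for l in labels if l and l.lower() != "www")

print(site_name("https://plus.google.com/in/test.html"))  # plus google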
The function below strips the URL between the double slash and the next single slash; the rest is cleanup:
def stripURL( url, TwoSlashes, OneSlash ):
try:
start = url.index(TwoSlashes) + len(TwoSlashes)
end = url.index( OneSlash, start )
return url[start:end]
except ValueError:
return ""
url= raw_input("URL : ")
if "www." in url:url=url.replace("www.","")
Strip = stripURL( url, "//", "/" )
# Strips anything after the last period found
Stripped = Strip[:Strip.rfind(".")]
# get rid of any periods used in the name
Stripped = Stripped.replace("."," ")
print Stripped
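If you are on Python 3, a rough port of the same snippet (raw_input becomes input and print becomes a function) would be:
def strip_url(url, two_slashes, one_slash):
    try:
        start = url.index(two_slashes) + len(two_slashes)
        end = url.index(one_slash, start)
        return url[start:end]
    except ValueError:
        return ""

url = input("URL : ")
if "www." in url:
    url = url.replace("www.", "")
strip = strip_url(url, "//", "/")
# strip anything after the last period, then turn the remaining periods into spaces
stripped = strip[:strip.rfind(".")].replace(".", " ")
print(stripped)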
I have two lists of multiline strings and I am trying to get the diff lines for these strings. First I tried to just split all lines of each string, treat them all as one big "file", and get the diff for it, but I ran into a lot of bugs. I cannot simply diff by index, since I do not know which multiline string was added, which was deleted, and which one was modified.
Let's say I have the following example:
import difflib
oldList = ["one\ntwo\nthree","four\nfive\nsix","seven\neight\nnine"]
newList = ["four\nfifty\nsix","seven\neight\nnine","ten\neleven\ntwelve"]
oldAllTogether = []
for string in oldList:
oldAllTogether.extend(string.splitlines())
newAllTogether = []
for string in newList:
newAllTogether.extend(string.splitlines())
diff = difflib.unified_diff(oldAllTogether,newAllTogether)
So I somehow have to find out which strings belong to each other.
I had to implement my own code in order to get the desired output. It is basically the same as Differ.compare(), with the difference that we look at multiline blocks instead of lines. So the code would be:
diffString = ""
oldList = ["one\ntwo\nthree","four\nfive\nsix","seven\neight\nnine"]
newList = ["four\nfifty\nsix","seven\neight\nnine","ten\neleven\ntwelve"]
a = oldList
b = newList
cruncher = difflib.SequenceMatcher(None, a, b)
for tag, alo, ahi, blo, bhi in cruncher.get_opcodes():
if tag == 'replace':
best_ratio, cutoff = 0.74, 0.75
oldstrings = a[alo:ahi]
newstrings = b[blo:bhi]
for j in range(len(newstrings)):
newstring = newstrings[j]
cruncher.set_seq2(newstring)
for i in range(len(oldstrings)):
oldstring = oldstrings[i]
cruncher.set_seq1(oldstring)
if cruncher.real_quick_ratio() > best_ratio and \
cruncher.quick_ratio() > best_ratio and \
cruncher.ratio() > best_ratio:
best_ratio, best_old, best_new = cruncher.ratio(), i, j
if best_ratio < cutoff:
#added string
stringLines = newstring.splitlines()
for line in stringLines: diffString += "+" + line + "\n"
else:
#replaced string
start = False
for diff in difflib.unified_diff(oldstrings[best_old].splitlines(),newstrings[best_new].splitlines()):
if start:
diffString += diff + "\n"
if diff[0:2] == '@@':
start = True
del oldstrings[best_old]
#deleted strings
stringLines = []
for string in oldstrings:
stringLines.extend(string.splitlines())
for line in stringLines: diffString += "-" + line + "\n"
elif tag == 'delete':
stringLines = []
for string in a[alo:ahi]:
stringLines.extend(string.splitlines())
for line in stringLines:
diffString += "-" + line + "\n"
elif tag == 'insert':
stringLines = []
for string in b[blo:bhi]:
stringLines.extend(string.splitlines())
for line in stringLines:
diffString += "+" + line + "\n"
elif tag == 'equal':
continue
else:
raise ValueError('unknown tag %r' % (tag,))
which results in the following:
print(diffString)
four
-five
+fifty
six
-one
-two
-three
+ten
+eleven
+twelve
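The block-level alignment this relies on comes straight from SequenceMatcher.get_opcodes(); here is a minimal standalone check on the example data (the printed opcodes are shown roughly in the comments):
import difflib

oldList = ["one\ntwo\nthree", "four\nfive\nsix", "seven\neight\nnine"]
newList = ["four\nfifty\nsix", "seven\neight\nnine", "ten\neleven\ntwelve"]

matcher = difflib.SequenceMatcher(None, oldList, newList)
for tag, alo, ahi, blo, bhi in matcher.get_opcodes():
    print(tag, oldList[alo:ahi], newList[blo:bhi])
# roughly:
# replace ['one\ntwo\nthree', 'four\nfive\nsix'] ['four\nfifty\nsix']
# equal ['seven\neight\nnine'] ['seven\neight\nnine']
# insert [] ['ten\neleven\ntwelve']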
Given an initial list of URLs crawled from a site:
https://somesite.com/
https://somesite.com/advertise
https://somesite.com/articles
https://somesite.com/articles/read
https://somesite.com/articles/read/1154
https://somesite.com/articles/read/1155
https://somesite.com/articles/read/1156
https://somesite.com/articles/read/1157
https://somesite.com/articles/read/1158
https://somesite.com/blogs
I am trying to turn the list into a tab-organized tree hierarchy:
https://somesite.com
/advertise
/articles
/read
/1154
/1155
/1156
/1157
/1158
/blogs
I've tried using lists, tuples, and dictionaries. So far I have figured out two flawed ways to output the content.
Method 1 will miss elements if they have the same name and position in the hierarchy:
Input:
https://somesite.com
https://somesite.com/missions
https://somesite.com/missions/playit
https://somesite.com/missions/playit/extbasic
https://somesite.com/missions/playit/extbasic/0
https://somesite.com/missions/playit/stego
https://somesite.com/missions/playit/stego/0
Output:
https://somesite.com/
/missions
/playit
/extbasic
/0
/stego
----------------^ Missing expected output "/0"
Method 2 will not miss any elements, but it will print redundant content:
Input:
https://somesite.com
https://somesite.com/missions
https://somesite.com/missions/playit
https://somesite.com/missions/playit/extbasic
https://somesite.com/missions/playit/extbasic/0
https://somesite.com/missions/playit/stego
https://somesite.com/missions/playit/stego/0
Output:
https://somesite.com/
/missions
/playit
/extbasic
/0
/missions <- Redundant content
/playit <- Redundant content
/stego
/0
I'm not sure how to properly do this, and my googling has only turned up references to urllib that don't seem to be what I need. Perhaps there is a much better approach, but I have been unable to find it.
My code for getting the content into a usable list:
#!/usr/bin/python3
import re
# Read the original list of URLs from file
with open("sitelist.raw", "r") as f:
raw_site_list = f.readlines()
# Extract the prefix and domain from the first line
first_line = raw_site_list[0]
prefix, domain = re.match("(http[s]://)(.*)[/]" , first_line).group(1, 2)
# Remove instances of prefix and domain, and trailing newlines, drop any lines that are only a slash
clean_site_list = []
for line in raw_site_list:
clean_line = line.strip(prefix).strip(domain).strip()
if not clean_line == "/":
if not clean_line[len(clean_line) - 1] == "/":
clean_site_list += [clean_line]
# Split the resulting relative paths into their component parts and filter out empty strings
split_site_list = []
for site in clean_site_list:
split_site_list += [list(filter(None, site.split("/")))]
This gives a list to manipulate, but I've run out of ideas on how to output it without losing elements or outputting redundant elements.
Thanks
Edit: This is the final working code I put together based on the answer chosen below:
# Read list of URLs from file
with open("sitelist.raw", "r") as f:
urls = f.readlines()
# Remove trailing newlines
for url in urls:
urls[urls.index(url)] = url[:-1]
# Remove any trailing slashes
for url in urls:
if url[-1:] == "/":
urls[urls.index(url)] = url[:-1]
# Remove duplicate lines
unique_urls = []
for url in urls:
if url not in unique_urls:
unique_urls += [url]
# Do the actual work (modified to use unique_urls and use tabs instead of 4x spaces, and to write to file)
base = unique_urls[0]
tabdepth = 0
tlen = len(base.split('/'))
final_urls = []
for url in unique_urls[1:]:
t = url.split('/')
lt = len(t)
if lt != tlen:
tabdepth += 1 if lt > tlen else -1
tlen = lt
pad = ''.join(['\t' for _ in range(tabdepth)])
final_urls += [f'{pad}/{t[-1]}']
with open("sitelist.new", "wt") as f:
f.write(base + "\n")
for url in final_urls:
f.write(url + "\n")
This works with your sample data:
urls = ['https://somesite.com',
'https://somesite.com/missions',
'https://somesite.com/missions/playit',
'https://somesite.com/missions/playit/extbasic',
'https://somesite.com/missions/playit/extbasic/0',
'https://somesite.com/missions/playit/stego',
'https://somesite.com/missions/playit/stego/0']
base = urls[0]
print(base)
tabdepth = 0
tlen = len(base.split('/'))
for url in urls[1:]:
t = url.split('/')
lt = len(t)
if lt != tlen:
tabdepth += 1 if lt > tlen else -1
tlen = lt
pad = ''.join([' ' for _ in range(tabdepth)])
print(f'{pad}/{t[-1]}')
This code will help you with your task. I agree this code might be a bit long and might contain some redundant checks, but it will create a dictionary containing the hierarchy of the URLs; you can use that dictionary however you like, print it or store it.
Moreover, this code will also parse different domains and create a separate tree for each of them (see code and output).
EDIT: This will also take care of redundant URLs.
Code:
from json import dumps
def process_urls(urls: list):
tree = {}
for url in urls:
url_components = url.split("/")
# First three components will be the protocol
# an empty entry
# and the base domain
base_domain = url_components[:3]
base_domain = base_domain[0] + "//" + "".join(base_domain[1:])
# Add base domain to tree if not there.
try:
tree[base_domain]
except:
tree[base_domain] = {}
structure = url_components[3:]
for i in range(len(structure)):
# add the first element
if i == 0 :
try:
tree[base_domain]["/"+structure[i]]
except:
tree[base_domain]["/"+structure[i]] = {}
else:
base = tree[base_domain]["/"+structure[0]]
for j in range(1, i):
base = base["/"+structure[j]]
try:
base["/"+structure[i]]
except:
base["/"+structure[i]] = {}
return tree
def print_tree(tree: dict, depth=0):
for key in tree.keys():
print("\t"*depth+key)
# redundant checks
if type(tree[key]) == dict:
# if dictionary is empty then do nothing
# else call this function recursively
# increase depth by 1
if tree[key]:
print_tree(tree[key], depth+1)
if __name__ == "__main__":
urls = [
'https://somesite.com',
'https://somesite.com/missions',
'https://somesite.com/missions/playit',
'https://somesite.com/missions/playit/extbasic',
'https://somesite.com/missions/playit/extbasic/0',
'https://somesite.com/missions/playit/extbasic/0',
'https://somesite.com/missions/playit/extbasic/0',
'https://somesite.com/missions/playit/extbasic/0',
'https://somesite.com/missions/playit/stego',
'https://somesite.com/missions/playit/stego/0',
'https://somesite2.com/missions/playit',
'https://somesite2.com/missions/playit/extbasic',
'https://somesite2.com/missions/playit/extbasic/0',
'https://somesite2.com/missions/playit/stego',
'https://somesite2.com/missions/playit/stego/0'
]
tree = process_urls(urls)
print_tree(tree)
Output:
https://somesite.com
/missions
/playit
/extbasic
/0
/stego
/0
https://somesite2.com
/missions
/playit
/extbasic
/0
/stego
/0
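As a side note, the try/except blocks used above to create missing keys can be collapsed with dict.setdefault. A condensed sketch of the same tree-building idea (the function name is mine, not part of the code above):
def process_urls_compact(urls: list):
    tree = {}
    for url in urls:
        parts = url.split("/")
        # parts[0] is the scheme, parts[1] is empty, parts[2] is the domain
        base_domain = parts[0] + "//" + "".join(parts[1:3])
        node = tree.setdefault(base_domain, {})
        for part in parts[3:]:
            node = node.setdefault("/" + part, {})
    return tree
Calling print_tree(process_urls_compact(urls)) on the same list should print the same hierarchy.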
On a personal whim I have written some code to search for the shortest series of links between any two Wikipedia articles. It turned out to be very brute force and takes a long long time to find the goal if it's more than a link or two deep, but it works! I will eventually keep track of and make use of the link paths and stuff, but I wanted to get the search working optimally first. Is there a faster way to do this or a good way to cut some major corners here?
import urllib2
from bs4 import BeautifulSoup
Start = 'http://en.wikipedia.org/wiki/Alan_Reid_%28politician%29'
End = 'http://en.wikipedia.org/wiki/Ayr'
#Using BeautifulSoup, this grabs the page
def soup_request(target):
request = urllib2.Request(target)
request.add_header("User-Agent", "Mozilla/5.0")
page = urllib2.urlopen(target)
soup = BeautifulSoup(page)
return soup
#This will grab all Wiki links off a given page
def get_links(Start):
soup = soup_request(Start)
Wiki_links = []
#Finds all links
for url in soup.findAll('a'):
result = url.get('href')
try:
if str(result)[:5] == '/wiki':
Wiki_links.append(result)
except:
pass
for q in range(len(Wiki_links)):
Wiki_links[q] = 'http://en.wikipedia.org'+str(Wiki_links[q])
print "Got new links from",Start
return Wiki_links
#This will check all the given links to see if the title matches the goal webpage
def check_links(Links,End):
goalsoup = soup_request(End)
goaltitle = goalsoup.html.title
Found = False
count = 0
for q in Links:
if Found:
break
length = len(Links)
#Runs through all the given links and checks their titles for correct one
if q is not None:
count += 1
soup = soup_request(q)
print "Checked",count,"links out of",length
try:
title = soup.html.head.title
if title == goaltitle:
Found = True
print "Found it!"
break
except:
print 'doh'
pass
return Found
#Top function to do all the stuff in the right order, applying a maximum depth for how deep into the links to search
def wiki_crawl(Start, End, depth):
Old_Links = [Start]
count = depth
while count > 0:
New_Links = []
for q in range(len(Old_Links)):
New_Links.extend(get_links(Old_Links[q]))
Found = check_links(New_Links,End)
if Found:
print "All done."
break
Old_Links = New_Links
count -= 1
print "_______________________________________________________________ROUND DONE"
if not Found:
print "Did not find the page, you must go deeper!"
wiki_crawl(Start, End, 2)
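One corner that can be cut regardless of the search order is repeated work: the same pages are fetched over and over, and every candidate page is downloaded just to compare titles, even though the link URL itself can be compared with the goal URL. A rough sketch of that bookkeeping, reusing the get_links function above and assuming the goal can be recognised by an exact URL match:
def wiki_crawl_dedup(start, end, depth):
    seen = set([start])          # pages already queued or expanded
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for page in frontier:
            for link in get_links(page):
                if link == end:
                    print "Found it!"
                    return True
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    print "Did not find the page, you must go deeper!"
    return False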
Here are some functions to take info from Wikipedia. The only problem with them is that they sometimes take out a space from the info on the webpage.
def take_out_parenthesis(st):
string = list(st)
for a in string:
if a == '(':
del string[st.find(a)]
if a == ')':
del string[st.find(a) - 1]
return ''.join(string)
def take_out_tags(string):
st = list(string)
odd = ['<', '>']
times = 0
for a in string:
if a in odd:
times += 1
times /= 2
for b in range(times):
start = string.find('<') - 1
end = string.find('>')
bet = end - start + 1
for a in range(bet):
del st[start]
string = ''.join(st)
return string
def take_out_brackets(string):
st = list(string)
odd = ['[', ']']
times = 0
for a in string:
if a in odd:
times += 1
times /= 2
for b in range(times):
start = string.find('[') - 1
end = string.find(']')
bet = end - start + 1
for a in range(bet):
del st[start]
string = ''.join(st)
return string
def take_from_web_page(text):
n = 0
url = text.replace(" ", "_")
search = "http://en.wikipedia.org/wiki/%s" % url
page = urllib2.urlopen(search).read()
start = page.find('<p><b>') + 6
end = page.find('</a>.', start) + 5
new_page = page[start:end]
for a in new_page:
if a == '<':
if new_page[n - 1] != ' ':
lst = list(new_page)
lst.insert(n, ' ')
new_page = ''.join(lst)
n += 1
n += 1
return take_out_parenthesis(take_out_brackets(take_out_tags(new_page)))
def isexact(pat):
for c in pat.upper():
if c not in 'ATGC':
return 0
return 1
def print_matches(ofh, enz, matches):
if matches:
print >>ofh, "Enzyme %s matches at:" % enz,
for m in matches:
print >>ofh, m,
print >>ofh
else:
print >>ofh, "No match found for enzyme %s." % enz
def get_site_only(pat):
newpat = ""
for c in pat:
if c.isalpha():
newpat += c
return newpat
def findpos(seq, pat):
matches = []
current_match = seq.find(pat)
while current_match != -1:
matches.append(current_match)
current_match =seq.find(pat, current_match+1)
return matches
seq = ""
ifh = open("C:\Python27\\link_cutzymes.txt",'r')
ofh = open("C:\Python27\\re-en-output.txt", "w")
line = ifh.readline()
while line:
fields = line.split()
name = fields[0]
pat = get_site_only(fields[2])
if isexact(pat):
print_matches(ofh, name, findpos(seq, pat))
line = ifh.readline()
else:
line = ifh.readline()
ofh.close()
ifh.close()
It is showing a list index error. Can you help me?
Traceback (most recent call last):
  File "C:/Users/ram/Desktop/rest_enz7.py", line 55, in
    name = fields[0]
IndexError: list index out of range
name = fields[0] - you are probably reading an empty line, splitting it, and accessing it at index 0, which is out of range for an empty list.
You can make sure your file contains only lines in your expected format, check for empty lines in the code, or use try and except, to name a few options.
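For example, a guard for blank lines inside the read loop could look like this (a sketch based on the loop in the question):
line = ifh.readline()
while line:
    fields = line.split()
    if not fields:                  # blank line: nothing to split, skip it
        line = ifh.readline()
        continue
    name = fields[0]
    pat = get_site_only(fields[2])
    if isexact(pat):
        print_matches(ofh, name, findpos(seq, pat))
    line = ifh.readline()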
While reading the data from the file, if there is no data to split, it will not convert into a list. I can see that name = fields[0] in your code is causing the error.
In that case, please use try and except in your code.
You can rewrite the code as:
try:
fields = line.split()
name = fields[0]
except:
pass
What fields[x] does is get the element at position x of the list. This means that if there is no object at position x, you get an error.
So if name = fields[0] raises an error, then fields must be an empty list (it would look like this: []), because there is no first element (Python counts from zero, so index 0 is the first element, index 1 is the second, and so on). You can fix this with a try: and except: like so:
try:
name = fields[0]
except:
name = '' #Or whatever code you want to run if it fails
Use this in place of name = fields[0].
I am using Python 2.7 and I have a problem that I haven't encountered before. When I print a certain string and then a variable on the same line, the variable is printed over the string. For example, the script contains print 'IP Rating = ', ipRating and the output in the command prompt is 'IP20ating = '. I have no idea why this is happening, but I have the same code for various variables and strings in the same script and they all come out as expected. I have tried renaming the variable and changing the string, but there is still no difference. Has anybody encountered this error before, or does anyone have any ideas why this might be happening? I can post the code if requested.
Many thanks :)
EDIT
Here is the code. I know I may have repeated myself a few times and there are unnecessary libraries in there, but the way I work is by importing all libraries I might need and then removing unnecessary code at the end.
from bs4 import BeautifulSoup as Soup
from bs4 import BeautifulSoup
from urllib import urlopen
import webbrowser
import httplib
import urllib2
import urllib
import string
import mylib
import xlrd
import glob
import xlwt
import bs4
import sys
import os
import re
print '\nStarting Web Search'
found = False
while found == False:
excelFile = "F:\\len\\web sheets completed\\csv formatted\\imported\\re-imported\\Import Corrections\\saxby web spreadsheet.xls"
try:
inFi = xlrd.open_workbook(excelFile)
found = True
except IOError:
print 'File not found.'
inFi = xlrd.open_workbook(excelFile)
inWS = inFi.sheet_by_index(0)
headers = mylib.getHeader(inWS)
supplyHead = mylib.findHeader('Supplier Part Ref', headers)
saxbeginurl = "http://www.saxbylighting.com/index.php?pg=search&ser="
badLink = "index.php?pg=search&ser=10180&next=0"
resLink = "http://www.saxbylighting.com/images/ProductImages/Zoomed/"
overCount = 0
for t in range(524,534):
projection = 0
ipRating = 0
diameter = 0
width = 0
weight = 0
length = 0
height = 0
i = 0
w = 0
l = 0
h = 0
d = 0
p = 0
x = 0
iP = 0
wei = 0
imgStock = str(inWS.cell(t, supplyHead).value.encode('latin-1'))
overCount = overCount + 1
print '\n',imgStock
if imgStock == '3TRAWI':
url = 'http://www.saxbylighting.com/index.php?pg=details&prod=53'
elif imgStock == '10313':
url = 'http://www.saxbylighting.com/index.php?pg=details&prod=204'
else:
url = saxbeginurl + imgStock
html_page = urllib2.urlopen(url)
soup = BeautifulSoup(html_page)
img_tags = soup.find_all("img")
the_image_tag = soup.find("img", src='/images/dhl_logo.png')
try:
for dataSheet in soup.find('div',{'class':'panes'}):
#print dataSheet, ' -- ', str(i)
i = i + 1
if i == 4:
reqData = str(dataSheet).split('<img', 1)[0]
first_Data = reqData.replace('<br/>','\n')
second_Data = first_Data.replace('<b>','')
third_Data = second_Data.replace('</b>','')
fourth_Data = third_Data.replace(':',': ')
dataList = fourth_Data.split('\n')
#print dataList
for information in dataList:
if 'Weight' in dataList[wei]:
pre_Weight = dataList[wei]
sec_weight = str(pre_Weight).replace('Weight :','')
weight = sec_weight.replace(' ','')
wei += 1
if 'IP' in dataList[iP]:
ipRating = str(dataList[iP])
iP += 1
for product_Dimensions in dataList:
if 'Product dimensions :' in dataList[x]:
#print dataList[x]
dimensionList = str(dataList[x]).replace('mm','mm:')
#print dimensionList
prelim_Dimensions = dimensionList.replace('Product dimensions :','')
first_Dimensions = prelim_Dimensions.replace('cm','0mm')
sec_Dimensions = first_Dimensions.replace(' ',' ')
third_Dimensions = sec_Dimensions.strip()
dimenList = third_Dimensions.split('mm:')
#print dimenList
for project in dimenList:
if 'Proj' in dimenList[p]:
pre_pro = str(dimenList[p]).replace('Proj','')
sec_pro = pre_pro.replace(':','')
thro_pro = sec_pro.replace(' ','')
projection = thro_pro
elif p == len(dimenList):
print 'Projection not found'
p += 1
for diamet in dimenList:
if 'dia' in dimenList[d]:
pre_dia = str(dimenList[d]).replace('dia','')
sec_dia = pre_dia.replace(':','')
third_dia = sec_dia.replace(' ','')
diameter = third_dia
elif d == len(dimenList):
print 'Diameter not found'
d += 1
for heig in dimenList:
if 'H:' in dimenList[h]:
pre_hei = str(dimenList[h]).replace('H','')
sec_hei = pre_hei.replace(':','')
third_hei = sec_hei.replace(' ','')
height = third_hei
elif h == len(dimenList):
print 'Height not found'
h += 1
for lent in dimenList:
if 'L:' in dimenList[l]:
pre_leng = str(dimenList[l]).replace('L','')
sec_leng = pre_leng.replace(':','')
third_leng = sec_leng.replace(' ','')
length = third_leng
elif l == len(dimenList):
print 'Length not found'
l += 1
for wid in dimenList:
if 'W:' in dimenList[w]:
pre_wid = str(dimenList[w]).replace('W','')
sec_wid = pre_wid.replace(':','')
third_wid = sec_wid.replace(' ','')
width = third_wid
elif w == len(dimenList):
print 'Width not found'
w += 1
x += 1
print 'IP Rating = ', ipRating
print 'Weight = ', weight
print 'Projection = ', projection, 'mm'
print 'Diameter = ',diameter, 'mm'
print 'Length = ',length, 'mm'
print 'Height = ',height, 'mm'
print 'Width = ',width, 'mm'
except TypeError:
print 'Type Error... skipping this product and carrying on.'
Here is an example output
IP44ating =
Weight = .51KGS
Projection = 35 mm
Diameter = 0 mm
Length = 0 mm
Height = 90 mm
Width = 120 mm
I strongly suspect that your data ipRating that you think is IP20 is actually \rIP20. That is, you have a stray 0x0D carriage return character at the start of the variable. The carriage return character moves the print position to the start of the line, and then the variable overwrites what you printed before.
You can test whether this is the problem by adding the line:
ipRating = ipRating.replace("\r", "")
before your print statement.
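You can also confirm it directly by printing the repr of the value before cleaning it up:
print repr(ipRating)   # something like '\rIP20' here would confirm the stray carriage return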
This is the proper way to do what you're doing.
print('IP Rating = %s' % ipRating)
or
print('IP Rating = %d' % ipRating)
That is just one example from all the print statements you have at the end of your code.
If you're printing a string variable, use %s; if it's an integer, use %d. If you have any more questions, just ask.
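Combining the two suggestions (sketch):
ipRating = ipRating.replace("\r", "")    # drop the stray carriage return
print 'IP Rating = %s' % ipRating        # format with %s instead of a trailing comma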