I have a piece of code as follows. I want to take a title and remove the special symbols " !##$%^&* " from it, but I've tried everything and still can't. I hope someone can help; thank you very much.
try:
    title = driver.find_element(By.XPATH,'/html/body/main/section[2]/div/div/article/div[3]/p[1]/span').text
    print(title)
    if title.count("#") > 0:
        titles.append(title)
        titles[number] = title[0:title.index('#')]
        number += 1
    else:
        titles.append(title)
        number += 1
    if titles[number-1] == '':
        titles[number-1] = f"Invalid Title"
    banned_char = '<>:"/\|?*'
    for character in banned_char:
        if title.count(character) > 0:
            titles[number-1] = title[title.replace('<>:"/\|?*',' ')]
except:
    titles.append(f'Failed Title number {number}')
    number+=1
    print(f'Download {number} have no title.')
I see two mistakes in your code:
1. replace searches for exactly the string '<>:"/\|?*', so you have to replace every character separately: .replace('<',' ').replace('>',' ').replace(':',' ') (or do it in a for-loop).
2. You have to assign title = title.replace(...), not title[title.replace(...)].
banned_char = r'<>:"/\|?*'  # raw string, so the backslash is not treated as an escape
for character in banned_char:
    title = title.replace(character, ' ')
# --- after loop ---
titles[number-1] = title
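If you prefer to avoid the loop, str.maketrans and str.translate can replace every banned character in a single pass. This is a minimal sketch; the helper name clean_title is hypothetical. It also mirrors the question's other two steps (cutting the title at the first '#' and falling back to "Invalid Title"), and it adds a .strip() that the original code does not do:

```python
# Same characters as banned_char above (the set Windows forbids in file names)
BANNED_CHARS = r'<>:"/\|?*'
# Translation table that maps each banned character to a space
_TABLE = str.maketrans({ch: ' ' for ch in BANNED_CHARS})

def clean_title(title: str) -> str:
    """Hypothetical helper: cut at '#', replace banned characters, fall back if empty."""
    head = title.partition('#')[0]            # like title[0:title.index('#')], but safe when there is no '#'
    cleaned = head.translate(_TABLE).strip()  # one pass over the string; strip() is an extra step
    return cleaned or "Invalid Title"

print(clean_title('Intro: part 1 #tag'))  # Intro  part 1
```

partition is used instead of index because index raises ValueError when the title contains no '#', which would send you into the except branch.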
I want to extract website names from a URL. For example, https://plus.google.com/in/test.html should give the output "plus google".
Some more test cases:
WWW.OH.MADISON.STORES.ADVANCEAUTOPARTS.COM/AUTO_PARTS_MADISON_OH_7402.HTML
Output: OH MADISON STORES ADVANCEAUTOPARTS
WWW.LQ.COM/LQ/PROPERTIES/PROPERTYPROFILE.DO?PROPID=6054
Output: LQ
WWW.LOCATIONS.DENNYS.COM
Output: LOCATIONS DENNYS
WV.WESTON.STORES.ADVANCEAUTOPARTS.COM
Output: WV WESTON STORES ADVANCEAUTOPARTS
WOODYANDERSONFORDFAYETTEVILLE.NET/
Output: WOODYANDERSONFORDFAYETTEVILLE
WILMINGTONMAYFAIRETOWNCENTER.HGI.COM
Output: WILMINGTONMAYFAIRETOWNCENTER HGI
WHITEHOUSEBLACKMARKET.COM/
Output: WHITEHOUSEBLACKMARKET
WINGATEHOTELS.COM
Output: WINGATEHOTELS
string = str(input("Enter the url "))
new_list = list(string)
count=0
flag=0
if 'w' in new_list:
    index1 = new_list.index('w')
    new_list.pop(index1)
    count += 1
    if 'w' in new_list:
        index2 = new_list.index('w')
        if index2 != -1 and index2 == index1:
            new_list.pop(index2)
            count += 1
            if 'w' in new_list:
                index3= new_list.index('w')
                if index3!= -1 and index3== index2 and new_list[index3+1]=='.':
                    new_list.pop(index3)
                    count+=1
                    flag = 1
if flag == 0:
    start = string.find('/')
    start += 2
    end = string.rfind('.')
    new_string=string[start:end]
    print(new_string)
elif flag == 1:
    start = string.find('.')
    start = start + 1
    end = string.rfind('.')
    new_string=string[start:end]
    print(new_string)
The above works for some test cases but not all. Please help me with it.
Thanks
This is something you could build upon, using urllib.parse.urlparse:
from urllib.parse import urlparse
tests = ('https://plus.google.com/in/test.html',
         ('WWW.OH.MADISON.STORES.ADVANCEAUTOPARTS.COM/'
          'AUTO_PARTS_MADISON_OH_7402.HTML'),
         'WWW.LQ.COM/LQ/PROPERTIES/PROPERTYPROFILE.DO?PROPID=6054')
def extract(url):
    # urlparse will not work without a 'scheme'
    if not url.startswith('http'):
        url = 'http://' + url
    parsed = urlparse(url).netloc
    split = parsed.split('.')[:-1]  # get rid of TLD
    if split and split[0].lower() == 'www':  # guard against hosts without a dot
        split = split[1:]
    ret = ' '.join(split)
    return ret
for url in tests:
    print(extract(url))
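To check the approach against every example in the question, a table-driven check can help. This is a sketch of a slightly more defensive variant of extract, under two assumptions of mine: a scheme is detected by '://' rather than a leading 'http', and any user:password@ prefix or :port suffix should be dropped. The expected values are the question's outputs (for the .NET case, the expected value keeps the D of FORD that appears in the input):

```python
from urllib.parse import urlparse

def extract(url: str) -> str:
    # Without a scheme, urlparse puts the host into .path instead of .netloc
    if '://' not in url:
        url = 'http://' + url
    host = urlparse(url).netloc                      # keeps the original letter case
    host = host.rsplit('@', 1)[-1].split(':', 1)[0]  # assumed: drop user:password@ and :port
    labels = host.split('.')[:-1]                    # get rid of the TLD
    if labels and labels[0].lower() == 'www':
        labels = labels[1:]
    return ' '.join(labels)

# Expected outputs from the question
CASES = {
    'https://plus.google.com/in/test.html': 'plus google',
    'WWW.OH.MADISON.STORES.ADVANCEAUTOPARTS.COM/AUTO_PARTS_MADISON_OH_7402.HTML':
        'OH MADISON STORES ADVANCEAUTOPARTS',
    'WWW.LQ.COM/LQ/PROPERTIES/PROPERTYPROFILE.DO?PROPID=6054': 'LQ',
    'WWW.LOCATIONS.DENNYS.COM': 'LOCATIONS DENNYS',
    'WV.WESTON.STORES.ADVANCEAUTOPARTS.COM': 'WV WESTON STORES ADVANCEAUTOPARTS',
    'WOODYANDERSONFORDFAYETTEVILLE.NET/': 'WOODYANDERSONFORDFAYETTEVILLE',
    'WILMINGTONMAYFAIRETOWNCENTER.HGI.COM': 'WILMINGTONMAYFAIRETOWNCENTER HGI',
    'WHITEHOUSEBLACKMARKET.COM/': 'WHITEHOUSEBLACKMARKET',
    'WINGATEHOTELS.COM': 'WINGATEHOTELS',
}

for url, expected in CASES.items():
    got = extract(url)
    status = 'ok' if got == expected else 'FAIL'
    print(status, url, '->', got)
```

Only the last label is treated as the TLD, so a two-part suffix such as .co.uk would leave "co" in the result.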
The function strips the URL from the double slash to the next single slash; the rest is clean-up:
def stripURL(url, TwoSlashes, OneSlash):
    try:
        start = url.index(TwoSlashes) + len(TwoSlashes)
        end = url.index(OneSlash, start)
        return url[start:end]
    except ValueError:
        return ""
url = input("URL : ")
if "www." in url:
    url = url.replace("www.", "")
Strip = stripURL(url, "//", "/")
# Strips anything after the last period found
Stripped = Strip[:Strip.rfind(".")]
# get rid of any periods used in the name
Stripped = Stripped.replace(".", " ")
print(Stripped)
This is an example of what is in the text file that I am searching:
15 - Project Name
APP_IDENTIFIER=ie.example.example
DISPLAY_NAME=Mobile Banking
BUNDLE_VERSION=1.1.1
HEADER_COLOR=#72453h
ANDROID_VERSION_CODE=3
20 - Project Name
APP_IDENTIFIER=ie.exampleTwo.exampleTwp
DISPLAY_NAME=More Mobile Banking
BUNDLE_VERSION=1.2.3
HEADER_COLOR=#23456g
ANDROID_VERSION_CODE=6
If, for example, the user types in 15, I want Python to copy the following info:
ie.example.example
Mobile Banking
1.1.1
#72453h
3
because I need to copy it into a different text file.
I get the user to input a project number (in this example the project numbers are 15 and 20), and then I need the program to copy the APP_IDENTIFIER, DISPLAY_NAME, BUNDLE_VERSION and ANDROID_VERSION_CODE of the project matching the number that the user entered.
How do I get Python to search the text file for the number entered by the user and take only the needed information from the lines directly below that project?
I have a whole program written, but this is just one section of it.
I don't really have any code yet to find and copy the specific information I need.
Here is the code I have to search for the project ID:
while True:
    CUID = int(input("\nPlease choose an option:\n"))
    if CUID == 0:
        print("Project one")
        break
    elif CUID == 15:
        print("Project two")
        break
    elif CUID == 89:
        print("Project three")
        break
    else:
        print("Incorrect input")
The solution, thanks to Conor:
CUID = str(CUID)
with open("C:/mobileBuildSettings.txt", "r") as projectFile:
    for line in projectFile:
        # Header lines look like "15 - Project Name"; startswith avoids "5 - " matching "15 - "
        if line.startswith(CUID + " - "):
            appIdentifier = next(projectFile).split("=", 1)[1].strip()
            displayName = next(projectFile).split("=", 1)[1].strip()
            bundleVersion = next(projectFile).split("=", 1)[1].strip()
            next(projectFile)  # skip HEADER_COLOR
            androidVersionCode = next(projectFile).split("=", 1)[1].strip()
            print(appIdentifier, displayName, bundleVersion, androidVersionCode)
            break
with open("projects", "r") as projectfile:
    for line in projectfile:
        if str(CUID) in line:  # CUID was read as an int, so convert it before searching
            appIdentifier = next(projectfile).split("=", 1)[1].strip()
            displayName = next(projectfile).split("=", 1)[1].strip()
            bundleVersion = next(projectfile).split("=", 1)[1].strip()
            next(projectfile)  # skip HEADER_COLOR
            androidVersionCode = next(projectfile).split("=", 1)[1].strip()
            # Do whatever with the 4 values here, call function etc.
            break
Then do what you like with appIdentifier, displayName, bundleVersion and androidVersionCode; each one holds just the value after the '='.
Although I would recommend against generically searching for an integer: what if the integer also appears in the bundle or Android version?
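One way to avoid matching the number anywhere in the file is to recognise only header lines of the form "15 - Project Name" and read the KEY=VALUE lines under each header into a dictionary. This is a sketch that assumes every project block starts with such a header; the HEADER regex and the parse_projects name are illustrative, not taken from the answers here:

```python
import re

# Assumed header format, taken from the sample file: "<number> - <name>"
HEADER = re.compile(r'^(\d+)\s+-\s+')

def parse_projects(lines):
    """Map each project number (a string) to a dict of its KEY=VALUE settings."""
    projects = {}
    current = None
    for raw in lines:
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # Only a number at the very start of a header line starts a new project
            current = projects.setdefault(match.group(1), {})
        elif current is not None and '=' in line:
            key, _, value = line.partition('=')  # split at the first '=' only
            current[key] = value
    return projects

SAMPLE = """15 - Project Name
APP_IDENTIFIER=ie.example.example
DISPLAY_NAME=Mobile Banking
BUNDLE_VERSION=1.1.1
HEADER_COLOR=#72453h
ANDROID_VERSION_CODE=3
20 - Project Name
APP_IDENTIFIER=ie.exampleTwo.exampleTwp
DISPLAY_NAME=More Mobile Banking
BUNDLE_VERSION=1.2.3
HEADER_COLOR=#23456g
ANDROID_VERSION_CODE=6
"""

projects = parse_projects(SAMPLE.splitlines())
wanted = ('APP_IDENTIFIER', 'DISPLAY_NAME', 'BUNDLE_VERSION', 'ANDROID_VERSION_CODE')
print([projects['15'][key] for key in wanted])
# ['ie.example.example', 'Mobile Banking', '1.1.1', '3']
```

With a real file you can pass the open file object straight to parse_projects, check `str(CUID) in projects` instead of listing every number in an if/elif chain, and write the selected values to the other file.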
There is no reason to list all individual numbers in a long if..else list. You can use a regular expression to check if a line starts with any digit. If it does, check if it matches the number you are looking for, and if it does not, skip the following lines until you reach your blank line separator.
As soon as you have the data you are looking for, you can use a regular expression again to locate the =, or simply use .find:
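import re
numberToLookFor = '18'
with open("project.txt") as file:
    while True:
        line = file.readline()
        if not line:
            break
        line = line.rstrip('\r\n')
        if re.match('^' + re.escape(numberToLookFor) + r'\b', line):
            # Matching project: print the value after '=' on each line up to the blank separator
            while line:
                if '=' in line:  # find() returns -1 (truthy) when '=' is missing, so test membership
                    print(line[line.find('=') + 1:])
                line = file.readline().rstrip('\r\n')
        else:
            # Any other project: skip its lines up to the blank separator
            while line:
                line = file.readline().rstrip('\r\n')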
Here you go:
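while True:
    CUID = int(input("\nPlease choose an option:\n"))
    if CUID == 0:
        # value holds the whole line taken from the text file,
        # e.g. "APP_IDENTIFIER=ie.example.example"
        appid = value.split("APP_IDENTIFIER=")[1]  # get the value after "APP_IDENTIFIER="
        print(appid)  # output >>> ie.example.example
        break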
You can apply the same code to all the values there; just change the key before "=".
Get the whole line from the text, then take only the value after "=" with this code to get the output shown.
On a personal whim I have written some code to search for the shortest series of links between any two Wikipedia articles. It turned out to be very brute force and takes a long, long time to find the goal if it's more than a link or two deep, but it works! I will eventually keep track of and make use of the link paths, but I wanted to get the search working optimally first. Is there a faster way to do this, or a good way to cut some major corners here?
import urllib2
from bs4 import BeautifulSoup
Start = 'http://en.wikipedia.org/wiki/Alan_Reid_%28politician%29'
End = 'http://en.wikipedia.org/wiki/Ayr'
#Using BeautifulSoup, this grabs the page
def soup_request(target):
    request = urllib2.Request(target)
    request.add_header("User-Agent", "Mozilla/5.0")
    page = urllib2.urlopen(request)  # open the Request so the User-Agent header is actually sent
    soup = BeautifulSoup(page)
    return soup
#This will grab all Wiki links off a given page
def get_links(Start):
    soup = soup_request(Start)
    Wiki_links = []
    #Finds all links
    for url in soup.findAll('a'):
        result = url.get('href')
        try:
            if str(result)[:5] == '/wiki':
                Wiki_links.append(result)
        except:
            pass
    for q in range(len(Wiki_links)):
        Wiki_links[q] = 'http://en.wikipedia.org'+str(Wiki_links[q])
    print "Got new links from",Start
    return Wiki_links
#This will check all the given links to see if the title matches the goal webpage
def check_links(Links,End):
    goalsoup = soup_request(End)
    goaltitle = goalsoup.html.title
    Found = False
    count = 0
    for q in Links:
        if Found:
            break
        length = len(Links)
        #Runs through all the given links and checks their titles for correct one
        if q is not None:
            count += 1
            soup = soup_request(q)
            print "Checked",count,"links out of",length
            try:
                title = soup.html.head.title
                if title == goaltitle:
                    Found = True
                    print "Found it!"
                    break
            except:
                print 'doh'
                pass
    return Found
#Top function to do all the stuff in the right order, applying a maximum depth of how deep into the links
def wiki_crawl(Start, End, depth):
    Old_Links = [Start]
    count = depth
    while count > 0:
        New_Links = []
        for q in range(len(Old_Links)):
            New_Links.extend(get_links(Old_Links[q]))
        Found = check_links(New_Links,End)
        if Found:
            print "All done."
            break
        Old_Links = New_Links
        count -= 1
        print "_______________________________________________________________ROUND DONE"
    if not Found:
        print "Did not find the page, you must go deeper!"
wiki_crawl(Start, End, 2)
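A common way to cut corners here is a breadth-first search that never fetches a page twice (a visited set), compares each extracted link with the goal URL while the links are collected (instead of downloading every linked page to compare titles), and records each page's parent so the path can be rebuilt. The sketch below assumes this swapped-in approach: get_links is passed in as a function (in the question's code it would wrap soup_request), and a toy in-memory graph with hypothetical page names stands in for Wikipedia. One caveat is that comparing URLs misses redirects that a title comparison would catch:

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional

def shortest_link_path(start: str, goal: str,
                       get_links: Callable[[str], Iterable[str]],
                       max_depth: int = 3) -> Optional[List[str]]:
    """Breadth-first search from start to goal; returns the page path or None."""
    if start == goal:
        return [start]
    parent: Dict[str, Optional[str]] = {start: None}  # doubles as the visited set
    frontier = deque([(start, 0)])
    while frontier:
        page, depth = frontier.popleft()
        if depth >= max_depth:
            continue  # do not expand pages beyond the depth limit
        for link in get_links(page):  # one fetch per expanded page
            if link in parent:
                continue  # already seen: never fetch or queue it twice
            parent[link] = page
            if link == goal:  # checked while collecting links, no extra fetch
                path = [link]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            frontier.append((link, depth + 1))
    return None

# Toy link graph standing in for Wikipedia (hypothetical data)
GRAPH = {
    "Alan_Reid": ["Scotland", "Politics"],
    "Scotland": ["Ayrshire", "Edinburgh"],
    "Politics": ["Scotland"],
    "Ayrshire": ["Ayr"],
}
print(shortest_link_path("Alan_Reid", "Ayr", lambda p: GRAPH.get(p, [])))
# ['Alan_Reid', 'Scotland', 'Ayrshire', 'Ayr']
```

Because BFS explores pages level by level, the first time the goal appears among the links the recorded path is a shortest one, so there is no need to keep searching.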
Here are some functions to take info from Wikipedia. The only problem with them is that they sometimes take out a space from the info on the webpage.
def take_out_parenthesis(st):
    string = list(st)
    for a in string:
        if a == '(':
            del string[st.find(a)]
        if a == ')':
            del string[st.find(a) - 1]
    return ''.join(string)
def take_out_tags(string):
    st = list(string)
    odd = ['<', '>']
    times = 0
    for a in string:
        if a in odd:
            times += 1
    times /= 2
    for b in range(times):
        start = string.find('<') - 1
        end = string.find('>')
        bet = end - start + 1
        for a in range(bet):
            del st[start]
        string = ''.join(st)
    return string
def take_out_brackets(string):
    st = list(string)
    odd = ['[', ']']
    times = 0
    for a in string:
        if a in odd:
            times += 1
    times /= 2
    for b in range(times):
        start = string.find('[') - 1
        end = string.find(']')
        bet = end - start + 1
        for a in range(bet):
            del st[start]
        string = ''.join(st)
    return string
import urllib2
def take_from_web_page(text):
    n = 0
    url = text.replace(" ", "_")
    search = "http://en.wikipedia.org/wiki/%s" % url
    page = urllib2.urlopen(search).read()
    start = page.find('<p><b>') + 6
    end = page.find('</a>.', start) + 5
    new_page = page[start:end]
    for a in new_page:
        if a == '<':
            if new_page[n - 1] != ' ':
                lst = list(new_page)
                lst.insert(n, ' ')
                new_page = ''.join(lst)
                n += 1
        n += 1
    return take_out_parenthesis(take_out_brackets(take_out_tags(new_page)))
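The lost space most likely comes from take_out_tags and take_out_brackets starting each deletion at string.find('<') - 1 (or '[' - 1). That also deletes the character just before the tag, which is often a space. Below is a regex-based sketch that removes only the markup itself and then collapses whitespace. The name clean_fragment is hypothetical. Like take_out_parenthesis appears intended to do, it removes the parenthesis characters but keeps their contents. It assumes tags never sit directly between two words with no space, since removing such a tag would join the words:

```python
import re

def clean_fragment(fragment: str) -> str:
    """Strip HTML tags, [bracketed] citation markers and parenthesis characters."""
    text = re.sub(r'<[^>]*>', '', fragment)        # drop tags only, not neighbouring characters
    text = re.sub(r'\[[^\]]*\]', '', text)         # drop "[1]"-style markers with their contents
    text = text.replace('(', '').replace(')', '')  # drop the parenthesis characters themselves
    return re.sub(r'\s+', ' ', text).strip()       # collapse any doubled whitespace

print(clean_fragment('<b>Ayr</b> (<a href="/wiki/Scottish_Gaelic">Scottish Gaelic</a>: Inbhir Air)[1] is a town.'))
# Ayr Scottish Gaelic: Inbhir Air is a town.
```

Regular expressions are fine for a short fragment like the first paragraph taken here; for anything larger, BeautifulSoup's get_text() (already used in the crawler above) is the more robust way to drop tags.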
I want to search for a string in a file: if the string is there, do one action, and if it isn't, do another. But with this code:
itcontains = self.textCtrl2.GetValue()
self.textCtrl.AppendText("\nTY: " + itcontains)
self.textCtrl2.Clear()
pztxtflpath = "TCM/Zoznam.txt"
linenr = 0
with open(pztxtflpath) as f:
    found = False
    for line in f:
        if re.search("\b{0}\b".format(itcontains),line):
            hisanswpath = "TCM/" + itcontains + ".txt"
            hisansfl = codecs.open(hisanswpath, "r")
            textline = hisansfl.readline()
            linenr = 0
            ans = ""
            while textline <> "":
                linenr += 1
                textline = hisansfl.readline()
            hisansfl.close()
            rnd = random.randint(1, linenr) - 1
            hisansfl = codecs.open(pztxtflpath, "r")
            textline = hisansfl.readline()
            linenr = 0
            pzd = ""
            while linenr <> rnd:
                textline = hisansfl.readline()
                linenr += 1
            ans = textline
            hisansfl.close()
            self.textCtrl.AppendText("\nTexter: " + ans)
if not found:
    self.textCtrl.AppendText("\nTexter: " + itcontains)
    wrtnw = codecs.open(pztxtflpath, "a")
    wrtnw.write("\n" + itcontains)
    wrtnw.close()
If the string is not there, it works correctly, but if the string I am searching for is there, it still runs the "if not found" action. I really don't know how to fix it; I have already tried some code from other sites, but it doesn't work in mine. Can somebody help, please?
Are you saying that the code underneath the following if statement executes if the string contains what you're looking for?
if re.search("\b{0}\b".format(itcontains),line):
If so, then you just need to add the following to the code block underneath this statement:
found = True
This will keep your if not found clause from running. If the string you are looking for should only be found once, I would also add a break statement to that block to break out of the loop.
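Separately, note that "\b{0}\b" is not a raw string, so each \b is a backspace character (\x08) rather than a regex word boundary, and the search cannot match ordinary text. Here is a minimal sketch of the whole pattern. It uses a raw string, applies re.escape to the user's input, and sets the flag and breaks as soon as a match is found. The helper name contains_word is hypothetical:

```python
import re

def contains_word(lines, word):
    """Return True as soon as one line contains word as a whole word."""
    # r"..." keeps \b as a regex word boundary; re.escape neutralises characters like '.' or '?'
    pattern = re.compile(r"\b{0}\b".format(re.escape(word)))
    found = False
    for line in lines:
        if pattern.search(line):
            found = True
            break  # no need to look at the remaining lines
    return found

known = ["hello", "how are you"]
if contains_word(known, "hello"):
    print("found: answer from the existing file")
else:
    print("not found: append it to the file")
```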
When replying to an SMS, I have a limit of 160 characters. I currently have code set up to take a reply (which can be longer than 160 characters) and split it into a list of multiple texts, each shorter than 160. It's also set up so that it keeps words whole. Here it is:
# wrap the splitting in a function so that "return texts" is valid
def split_reply(repl):
    texts=[]
    words=repl.split()
    curtext=''
    for word in words:
        #for the first word, drop the space
        if len(curtext)==0:
            curtext+=word
        #check if there's enough space left in the current message
        elif len(curtext)<=155-(len(word)+1):
            curtext+=' '+word
        #not enough space. make a new message
        else:
            texts.append(curtext)
            curtext=word
    if curtext!='':
        texts.append(curtext)
    return texts
texts = split_reply('message to be sent. may be >160')
However, I now want to modify it so that it appends "reply m for more" to the end of every second message. Any ideas on how to do this?
(I'm writing code in Python)
reply = "text to be sent ...."
texts = []
count = 0
current_text = []
for word in reply.split():
    if count + len(word) < (160 if len(texts) % 2 == 0 else (160-17)):
        current_text.append(word)
        count += (len(word) + 1)
    else:
        if len(texts) % 2 != 0:
            #odd-numbered text gets additional message...
            texts.append(" ".join(current_text) + "\nreply m for more")
        else:
            texts.append(" ".join(current_text))
        # start the next text with the word that did not fit
        current_text = [word]
        count = len(word) + 1
# don't forget the last, partially filled text
if current_text:
    texts.append(" ".join(current_text))
def sms_calculator(msg_text):
    sms_lst=[]
    if len(msg_text) == 0:
        return sms_lst
    l_m_text = (msg_text.split())
    if len(max(l_m_text, key=len))> 160:
        return sms_lst
    sms_string=l_m_text[0]
    for i in range(1,len(l_m_text)):
        if len(sms_string +' '+ l_m_text[i]) < 160 :
            sms_string=sms_string +' '+ l_m_text[i]
        else:
            sms_lst.append(sms_string)
            sms_string = l_m_text[i]
    sms_lst.append(sms_string)
    return sms_lst
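Here is a sketch of the whole idea in one function. It makes two assumptions: the suffix goes on the 2nd, 4th, ... message but not on the last one (nothing follows it), and no single word is longer than the room left in a message. The name split_sms and its parameters are hypothetical; the 160 limit and the suffix text come from the question. Every second message reserves len(suffix) characters so that it still fits once the suffix is appended:

```python
def split_sms(text, limit=160, suffix="\nreply m for more"):
    """Greedy word wrap into SMS-sized texts; every second text ends with suffix."""
    texts = []
    current = ""
    for word in text.split():
        needs_suffix = len(texts) % 2 == 1          # the text being built is the 2nd, 4th, ...
        room = limit - (len(suffix) if needs_suffix else 0)
        candidate = word if not current else current + " " + word
        # A word that does not fit even in an empty text is kept whole (assumed rare)
        if len(candidate) <= room or not current:
            current = candidate
        else:
            texts.append(current + (suffix if needs_suffix else ""))
            current = word
    if current:
        texts.append(current)  # last text: nothing follows, so no suffix
    return texts

for part in split_sms(" ".join(["word"] * 100)):
    print(len(part), repr(part[-20:]))
```

Building each candidate string and measuring it directly avoids the off-by-one bookkeeping of a separate character counter.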