I have two different kinds of URLs in a list:
The first kind looks like this and starts with the word 'meldung':
meldung/xxxxx.html
The other kind starts with 'artikel':
artikel/xxxxx.html
I want to detect whether a URL starts with 'meldung' or 'artikel' and then do different operations based on that. To achieve this I tried to use a loop with if and else conditions:
for line in r:
    if re.match(r'^meldung/', line):
        print('je')
    else:
        print('ne')
I also tried this with line.startswith():
for line in r:
    if line.startswith('meldung/'):
        print('je')
    else:
        print('ne')
But neither method works, since the strings I am checking don't contain any whitespace.
How can I do this correctly?
You can just use the following, if the links are stored as strings within the list:
for line in r:
    if 'meldung' in line:
        print('je')
    else:
        print('ne')
What about this:
r = ['http://example.com/meldung/page1.html', 'http://example.com/artikel/page2.html']
for line in r:
    url_tokens = line.split('/')
    if url_tokens[-2] == 'meldung':
        print(url_tokens[-1]) # the xxxxx.html part
    elif url_tokens[-2] == 'artikel':
        print('ne')
    else:
        print('something else')
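You can do it using a regex:
import re
def check(string):
    # re.match only matches at the start of the string; the group keeps
    # the alternation from matching 'artikel' somewhere later in the URL
    if re.match(r'(meldung|artikel)/', string):
        print("je")
    else:
        print("ne")
for line in r:
    check(line)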
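One possible way to combine the answers above is sketched here. It assumes the URLs may carry stray whitespace or a trailing newline (for example when read from a file), which is a guess about why the original checks failed. The helper name classify and the sample URLs are made up for illustration.

```python
def classify(url):
    """Return 'meldung', 'artikel', or None depending on how the URL starts."""
    cleaned = url.strip()  # drop stray whitespace or newlines around the URL
    for kind in ('meldung', 'artikel'):
        if cleaned.startswith(kind + '/'):
            return kind
    return None

# Hypothetical sample data, including a trailing newline and a leading space.
urls = ['meldung/12345.html', 'artikel/67890.html\n', ' other/1.html']
for url in urls:
    print(classify(url))
```

Returning the kind instead of printing makes it easy to branch into the different operations afterwards.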
My goal is to remove any non-countries from the list, "Checklist". Obviously "Munster" is not a country. Portugal is though. Why does it get removed then?
checklist = ["Portugal", "Germany", "Munster", "Spain"]
def CheckCountry(i):
    with open("C:/Users/Soham/Desktop/Python Exercises/original.txt","r") as f:
        for countries in f.readlines():
            if countries==i:
                continue
            else:
                return True
    return False
for i in checklist:
    if CheckCountry(i)==True:
        index=checklist.index(i)
        checklist.pop(index)
    else:
        CheckCountry(i)
print(checklist)
Please tell me what is wrong with my code. Keep in mind I have not learned regex or lambda yet.
I think you're using the continue statement improperly in this scenario. Your function almost never reaches the return False: continue only skips a matching line, and the first line that does not match returns True. So try this:
checklist = ["Portugal", "Germany", "Munster", "Spain"]
def CheckCountry(i):
    with open("C:/Users/Soham/Desktop/Python Exercises/original.txt","r") as f:
        for country in f.readlines():
            # strip the trailing newline before comparing
            if country.strip() == i:
                return True
    return False
# iterate over a copy, because removing from a list while looping over it skips items
for i in checklist[:]:
    if not CheckCountry(i):
        checklist.remove(i)
print(checklist)
Also remember that Python doesn't require an else statement after an if, so in your situation they are just taking up space. I don't know how your .txt file is set up, but I'm assuming each line has only a single country. If there are multiple countries per line, I suggest using a separator of some sort so that you can turn each line into a list, like so:
def CheckCountry(i):
    with open("C:/Users/Soham/Desktop/Python Exercises/original.txt","r") as f:
        for countries in f.readlines():
            country_lst = countries.strip().split(',') # using , as a separator
            for x in country_lst:
                if x.strip() == i:
                    return True
    return False
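As a rough sketch of the same idea without reopening the file for every name: read the country lines once into a set, then build a new list. The function name keep_countries and the sample lines are assumptions for illustration; in practice the lines would come from the countries file.

```python
def keep_countries(names, country_lines):
    """Return the names that appear in country_lines (one country per line)."""
    # A set gives fast membership tests; strip() drops newlines and padding.
    known = {line.strip() for line in country_lines if line.strip()}
    return [name for name in names if name in known]

# Stand-in for the lines of the countries file (assumed contents).
sample_lines = ["Portugal\n", "Germany\n", "Spain\n", "France\n"]
checklist = ["Portugal", "Germany", "Munster", "Spain"]
print(keep_countries(checklist, sample_lines))
```

Building a new list also sidesteps the problem of removing items from a list while looping over it.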
from docx import Document
alphaDic = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','!','?','.','~',',','(',')','$','-',':',';',"'",'/']
while docIndex < len(doc.paragraphs):
    firstSen = doc.paragraphs[docIndex].text
    rep_dic = {ord(k):None for k in alphaDic + [x.upper() for x in alphaDic]}
    translation = (firstSen.translate(rep_dic))
    removeSpaces = " ".join(translation.split())
    removeLineBreaks = removeSpaces.replace('\n','')
    doc.paragraphs[docIndex].text = removeLineBreaks
    docIndex +=1
I am attempting to remove line breaks from the document, but it doesn't work.
I am still getting
Hello
There
Rather than
Hello
There
I think what you want to do is get rid of an empty paragraph. The following function could help; it deletes a paragraph that you don't want:
def delete_paragraph(paragraph):
    p = paragraph._element
    p.getparent().remove(p)
    p._p = p._element = None
Code by: Scanny*
In your code, you could check if translation is equal to '' or not, and if it is then call the delete_paragraph function, so your code would be like:
while docIndex < len(doc.paragraphs):
    firstSen = doc.paragraphs[docIndex].text
    rep_dic = {ord(k):None for k in alphaDic + [x.upper() for x in alphaDic]}
    translation = (firstSen.translate(rep_dic))
    if translation != '':
        doc.paragraphs[docIndex].text = translation
    else:
        delete_paragraph(doc.paragraphs[docIndex])
        docIndex -=1 # go one step back in the loop because of the deleted index
    docIndex +=1
*Reference- feature: Paragraph.delete()
The package comes with an example program that extracts the text.
That said, I think your problem springs from the fact that you are trying to operate on paragraphs. But the separation between paragraphs is where the newlines are happening. So even if you replace a paragraph with the empty string (''), there will still be a newline added to the end of it.
You should either take the approach of the example program, and do your own formatting, or you should make sure that you delete any spurious "empty" paragraphs that might be between the "full" ones you have ("Hello", "", "There") -> ("Hello", "There").
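The second option can be illustrated on plain strings. This is only a sketch of the filtering step: the list stands in for the paragraph texts, and drop_empty is a made-up helper name. With a real document you would delete the empty paragraph objects rather than filter strings.

```python
def drop_empty(paragraph_texts):
    """Return the texts with empty or whitespace-only entries removed."""
    return [text for text in paragraph_texts if text.strip()]

print(drop_empty(["Hello", "", "There"]))
```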
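Since readlines can read any kind of text file, you can open the file, rewrite the lines you want, and ignore the lines you don't want.
"""example"""
# read everything first; opening with "w" would empty the file before it is read
with open("file name") as file:
    lines = file.readlines()
with open("file name", "w") as file:
    for line in lines:
        if line.strip() != '':
            file.write(line)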
I need to write a function that will search for words in a matrix. For the moment I'm trying to search line by line to see if the word is there. This is my code:
def search(p):
    w=[]
    for i in p:
        w.append(i)
    s=read_wordsearch() #This is my matrix full of letters
    for line in s:
        l=[]
        for letter in line:
            l.append(letter)
            if w==l:
                return True
            else:
                pass
This code works only if my word begins in the first position of a line.
For example I have this matrix:
[[a,f,l,y],[h,e,r,e],[b,n,o,i]]
I want to find the word "fly" but can't because my code only works to find words like "here" or "her" because they begin in the first position of a line...
Any form of help, hint, or advice would be appreciated. (And sorry if my English is bad...)
You can convert each line in the matrix to a string and try to find the search word in it.
def search(p):
    s=read_wordsearch()
    for line in s:
        if p in ''.join(line):
            return True
I'll give you a tip to search within a text for a word. I think you will be able to extrapolate to your data matrix.
s = "xxxxxxxxxhiddenxxxxxxxxxxx"
target = "hidden"
# + 1 so that a match at the very end of s is also checked
for i in range(len(s) - len(target) + 1):
    if s[i:i+len(target)] == target:
        print("Found it at index", i)
        break
If you want to search for words of any length, for example from a list of possible solutions:
s = "xxxxxxxxxhiddenxxxtreasurexxxxxxxx"
targets = ["hidden","treasure"]
for i in range(len(s)):
    # j runs up to len(s) so that slices ending at the last character are included
    for j in range(i+1, len(s)+1):
        if s[i:j] in targets:
            print("Found", s[i:j], "at index", i)
def search(p):
    w = ''.join(p)
    s=read_wordsearch() #This is my matrix full of letters
    for line in s:
        word = ''.join(line)
        if word.find(w) >= 0:
            return True
    return False
Edit: there are already a lot of string functions available in Python. You just need to use strings to be able to use them.
Join the characters in the inner lists to create a word and search with in.
def search(word, data):
    return any(word in ''.join(characters) for characters in data)
data = [['a','f','l','y'], ['h','e','r','e'], ['b','n','o','i']]
if search('fly', data):
    print('found')
data contains the matrix, characters is the name of each individual inner list. any will stop after it has found the first match (short circuit).
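Since the question mentions searching the whole matrix eventually, here is a hedged sketch that extends the same join-and-search idea to columns. It assumes words only run left to right or top to bottom; search_grid is a made-up name.

```python
def search_grid(word, grid):
    """Return True if word appears left-to-right in a row or top-to-bottom in a column."""
    rows = [''.join(row) for row in grid]
    # zip(*grid) transposes the matrix, so each tuple is one column.
    columns = [''.join(col) for col in zip(*grid)]
    return any(word in line for line in rows + columns)

data = [['a', 'f', 'l', 'y'], ['h', 'e', 'r', 'e'], ['b', 'n', 'o', 'i']]
print(search_grid('fly', data))   # found in the first row
print(search_grid('fen', data))   # found in the second column: f, e, n
print(search_grid('cat', data))   # not in the grid
```

Diagonals or reversed words would need extra lines built the same way before the final any().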
I've been working on this, and googling for hours. I can't seem to figure out what is going wrong.
The purpose of this program, is to check a text file for stock market ticker symbols, and add a ticker only if it is not already in the file.
There are two things going wrong. When the text file is empty, it won't add any tickers at all. When it has even a single character in the text file, it is adding every ticker you give it, regardless of if that ticker is already on the list.
import re
def tickerWrite(tick):
    readTicker = open('Tickers.txt', 'r')
    holder = readTicker.readlines()
    readTicker.close()
    if check(tick) == False:
        writeTicker = open('Tickers.txt', 'w')
        holder.append(tick.upper() + '\n')
        writeTicker.writelines(holder)
        writeTicker.close()
def check(ticker):
    with open('Tickers.txt') as tList:
        for line in tList:
            if re.search(ticker, line):
                return True
            else:
                return False
Another module calls AddReadTickers.tickerWrite(ticker) in order to add tickers entered by a user.
First of all, I suggest using
if not check(tick):
instead of
if check(tick) == False:
Then, I think it is better to use
writeTicker = open('Tickers.txt', 'a')
and you will not need holder at all.
Here is my attempt at rewriting the code:
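from __future__ import print_function
import re
import sys
def tickerWrite(tick):
    if not check(tick):
        with open('Tickers.txt', 'a') as writeTicker:
            print(tick.upper(), file=writeTicker)
def check(ticker):
    with open('Tickers.txt') as tList:
        # look at every line, not just the first; escape the ticker so
        # regex metacharacters in it are matched literally
        # (note: this is still a substring match, as in the question)
        pattern = re.escape(ticker.upper())
        return any(re.search(pattern, line) for line in tList)
if __name__ == '__main__':
    tickerWrite(sys.argv[1])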
It seems to work for me.
The check function should return False by default. For an empty Tickers.txt it returns None, so the test "if check(tick) == False:" is always False. That is why it won't add any ticker to an empty file.
My guess is that the second problem comes from the content of ticker. Since you use the ticker as a regex pattern, it can give unexpected results when the ticker contains regular-expression special characters. In addition, the else branch returns False after looking at only the first line. As I understand it, you can just compare the strings directly inside the loop and return False only after the loop has finished:
for line in tList:
    if ticker.upper() == line.strip():
        return True
return False
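Putting the points from these answers together, a minimal sketch could look like the following. It assumes one ticker per line and an exact, case-insensitive match; add_ticker and its path parameter are illustrative names, not part of the original module.

```python
import os
import tempfile

def add_ticker(ticker, path):
    """Append ticker (upper-cased) to the file at path unless it is already listed."""
    ticker = ticker.strip().upper()
    existing = set()
    if os.path.exists(path):
        with open(path) as handle:
            existing = {line.strip() for line in handle}
    if ticker in existing:
        return False
    with open(path, 'a') as handle:
        handle.write(ticker + '\n')
    return True

# Demo on a throwaway file so no real Tickers.txt is touched.
demo_path = os.path.join(tempfile.mkdtemp(), 'Tickers.txt')
print(add_ticker('aapl', demo_path))   # file does not exist yet
print(add_ticker('AAPL', demo_path))   # already listed
print(add_ticker('msft', demo_path))
```

Comparing whole stripped lines avoids both the regex special-character issue and partial matches such as 'A' matching 'AAPL'.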
My code that is meant to replace certain letters (a with e, e with a and s with 3 specifically) is not working, but I am not quite sure what the error is, as it is not changing the text file I am feeding it.
pattern = "ae|ea|s3"
def encode(pattern, filename):
    message = open(filename, 'r+')
    output = []
    pattern2 = pattern.split('|')
    for letter in message:
        isfound = false
        for keypair in pattern2:
            if letter == keypair[0]:
                output.append(keypair[1])
                isfound = true
            if isfound == true:
                break;
        if isfound == false:
            output.append(letter)
    message.close()
I've been racking my brain trying to figure this out for a while now.
It is not changing the textfile because you do not replace the textfile with the output you create. Instead this function is creating the output string and dropping it at the end of the function. Either return the output string from the function and store it outside, or replace the file in the function by writing to the file without appending.
As this seems like an exercise I prefer to not add the code to do it, as you will probably learn more from writing the function yourself.
Here is a quick implementation with the desired result; you will need to modify it yourself to read files, etc.:
def encode(pattern, string):
    # build a lookup table such as {'a': 'e', 'e': 'a', 's': '3'}
    rep = {}
    for pair in pattern.split("|"):
        rep[pair[0]] = pair[1]
    out = []
    for c in string:
        # characters without a replacement are kept unchanged
        out.append(rep.get(c, c))
    return "".join(out)
print(encode("ae|ea|s3", "Hello, this is my default string to replace"))
#output => "Hallo, thi3 i3 my dafeult 3tring to rapleca"
If you want to modify a file, you need to specifically tell your program to write to the file. Simply appending to your output variable will not change it.
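For illustration, one way to write the result back might look like this. It is a sketch that assumes the file is small enough to read in one go and swaps in str.translate for the manual loop; encode_file and the demo file are made up for the example.

```python
import os
import tempfile

def encode_file(pattern, filename):
    """Rewrite the file in place, swapping characters per 'ae|ea|s3'-style pairs."""
    table = str.maketrans({pair[0]: pair[1] for pair in pattern.split('|')})
    with open(filename) as handle:
        text = handle.read()
    # Reopening in 'w' mode replaces the old contents with the encoded text.
    with open(filename, 'w') as handle:
        handle.write(text.translate(table))

# Demo on a throwaway file.
demo_path = os.path.join(tempfile.mkdtemp(), 'message.txt')
with open(demo_path, 'w') as handle:
    handle.write('Hello, this is a test\n')
encode_file('ae|ea|s3', demo_path)
with open(demo_path) as handle:
    print(handle.read())
```

Reading first and writing afterwards keeps the two steps separate, so the file is never truncated before its contents have been read.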