I have a text file, test.txt, that contains 'a 2hello 3fox 2hen 1dog'.
I want to read the file, add all the items to a list, and then strip the integers so that the list looks like this: 'a hello fox hen dog'.
I tried the code below, but it is not working. The result is ['a 2hello 3fox 2hen 1dog']. Thanks.
newList = []
filename = input("Enter a file to read: ")
openfile = open(filename,'r')
for word in openfile:
    newList.append(word)
for item in newList:
    item.strip("1")
    item.strip("2")
    item.strip("3")
print(newList)
openfile.close()
From the Python docs:
str.strip([chars]): Return a copy of the string with the leading and
trailing characters removed. The chars argument is a string specifying
the set of characters to be removed. If omitted or None, the chars
argument defaults to removing whitespace. The chars argument is not a
prefix or suffix; rather, all combinations of its values are stripped.
strip won't modify the string; it returns a copy of the string with the specified characters removed.
>>> text = '132abcd13232111'
>>> text.strip('123')
'abcd'
>>> text
'132abcd13232111'
You can try:
out_put = []
for item in newList:
    out_put.append(item.strip("123"))
If you want to remove every 1, 2 and 3 from the string, not just those at the ends, use a regular expression with re.sub:
import re
newList = [re.sub('[123]', '', word) for word in openfile]
Note: this removes every 1, 2 and 3 from each line, wherever it appears.
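To see the difference between the two approaches, here is a small sketch on an in-memory list instead of a file (the sample strings are made up for illustration): strip only removes the listed characters from the ends of each string, while re.sub removes them everywhere.

```python
import re

# Hypothetical sample data standing in for the contents of test.txt.
words = ["a", "2hello", "3fox", "2hen", "1dog", "he2llo3"]

# strip("123") only trims 1, 2 and 3 from the start and end of each string.
stripped = [w.strip("123") for w in words]

# re.sub("[123]", "", w) deletes every 1, 2 and 3, wherever it appears.
substituted = [re.sub("[123]", "", w) for w in words]

print(stripped)     # ['a', 'hello', 'fox', 'hen', 'dog', 'he2llo']
print(substituted)  # ['a', 'hello', 'fox', 'hen', 'dog', 'hello']
```

The last item shows the difference: the 2 in the middle of 'he2llo3' survives strip but not re.sub.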
Pointers:
strip returns a new string, so you need to assign the result to something (better yet, just use a list comprehension).
Iterating over a file object gives you lines, not words, so instead you can read the whole file and then split it on whitespace.
The with statement saves you from having to call close manually.
strip accepts multiple characters, so you don't need to call it three times.
Code:
filename = input("Enter a file to read: ")
with open(filename, 'r') as openfile:
    new_list = [word.strip('123') for word in openfile.read().split()]
print(new_list)
This will give you a list that looks like ['a', 'hello', 'fox', 'hen', 'dog'].
If you want to turn it back into a string, you can use ' '.join(new_list).
There are several strip methods in Python (strip, lstrip and rstrip); they remove the specified characters from both ends, the start, or the end of a string, respectively. In your case you could use lstrip or just strip:
s = 'a 2hello 3fox 2hen 1dog'
' '.join([word.strip('0123456789') for word in s.split()])
Output:
'a hello fox hen dog'
A function in Python is called in this way:
result = function(arguments...)
This calls function with the arguments and stores the result in result.
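If you discard the result of the call, as you do in your case, it is lost.
Another way to use it is to pass the result straight on, for example to append:
l = []
for x in range(5):
    l.append("  something  ".strip())
Lists have no strip method, so strip has to be called on each string; here it removes the leading and trailing spaces from each one before it is appended.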
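If you prefer a plain loop to a list comprehension, one option is to write each stripped copy back into the list by index. This is a minimal sketch using a hard-coded list in place of the file contents:

```python
# Hypothetical stand-in for the words read from the file.
new_list = ["a", "2hello", "3fox", "2hen", "1dog"]

# strip() returns a new string, so store it back at the same index;
# calling item.strip(...) without assigning the result changes nothing.
for i, item in enumerate(new_list):
    new_list[i] = item.strip("0123456789")

print(new_list)  # ['a', 'hello', 'fox', 'hen', 'dog']
```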
Related
I have a word inside two pairs of parentheses, like this: ((word)).
I want to remove the first and the last parenthesis so they are not duplicated, in order to obtain something like this: (word).
I have tried using strip('()') on the variable that contains ((word)). However, it removes ALL parentheses at the beginning and at the end. Result: word.
Is there a way to specify that I only want the first and last one removed?
For this you could slice the string and only keep from the second character until the second to last character:
word = '((word))'
new_word = word[1:-1]
print(new_word)
Produces:
(word)
For a varying number of parentheses, you could count how many there are first and pass that to the slice, as below. This leaves only one bracket on each side; if you only want to remove the first and last bracket, use the first suggestion.
word ='((((word))))'
quan = word.count('(')
new_word = word[quan-1:len(word)-quan+1]  # keep one bracket on each side, also when quan == 1
print(new_word)
Produces:
(word)
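If you are on Python 3.9 or later, str.removeprefix and str.removesuffix each remove exactly one occurrence of a given prefix or suffix, which matches the "remove only the first and last parenthesis" requirement without any counting. A short sketch:

```python
# Requires Python 3.9+ for str.removeprefix / str.removesuffix.
word = "((word))"

# Each call removes the given string once, and only if it is actually present.
new_word = word.removeprefix("(").removesuffix(")")
print(new_word)  # (word)

# Unlike word[1:-1], nothing is cut off when there is no parenthesis to remove.
plain = "word".removeprefix("(").removesuffix(")")
print(plain)  # word
```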
You can use regex.
import re
word = '((word))'
re.findall(r'(\(?\w+\)?)', word)[0]
This only keeps one pair of brackets.
Instead, use str.replace, for example word.replace('(', '', 1).
Normally replace substitutes every '(' with '', but the third argument limits it to the first n occurrences of the substring, so only the first '(' is replaced.
From the documentation:
replace(...)
S.replace (old, new[, count]) -> string
Return a copy of string S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.
You can replace the double opening and double closing parentheses, setting the count argument to 1 for both operations:
print('((word))'.replace('((', '(', 1).replace('))', ')', 1))
But this will not work if the string contains more than one '))', because only the first one is replaced.
Reversing the string before replacing the closing ones makes the replacement hit the last '))' instead:
t= '((word))'
t = t.replace('((','(',1)
t = t[::-1] # see string reversal topic [https://stackoverflow.com/questions/931092/reverse-a-string-in-python]
t = t.replace('))',')',1)
t = t[::-1] # and reverse again
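As an alternative to reversing the string, str.rpartition splits around the last occurrence of a separator, so you can replace only the final '))' directly. A minimal sketch; the function name collapse_outer is hypothetical:

```python
def collapse_outer(t):
    """Collapse the first '((' and the last '))' in t to single parentheses."""
    t = t.replace("((", "(", 1)           # first '((' only
    head, sep, tail = t.rpartition("))")  # split around the last '))'
    if sep:                               # rebuild only if '))' was found
        t = head + ")" + tail
    return t

print(collapse_outer("((word))"))   # (word)
print(collapse_outer("((a)) b))"))  # (a)) b)
print(collapse_outer("word"))       # word
```

The second call shows that the earlier '))' is left alone and only the last one is collapsed.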
I used a regular expression for this, substituting each run of brackets with a single one using the re.sub function:
import re
s="((((((word)))))))))"
t=re.sub(r"\(+","(",s)
g=re.sub(r"\)+",")",t)
print(g)
Output
(word)
Try below:
>>> import re
>>> w = '((word))'
>>> re.sub(r'([()])\1+', r'\1', w)
'(word)'
>>> w = 'Hello My ((word)) into this world'
>>> re.sub(r'([()])\1+', r'\1', w)
'Hello My (word) into this world'
>>>
Try this one:
s = "((word))"
s = s[1:len(s)-1]
print(s)
And the output is: (word)
I am making a dictionary application using argparse in Python 3. I'm using difflib to find the closest matches to a given word. The result is a list, and its items have newline characters at the end, like:
['hello\n', 'hallo\n', 'hell\n']
When I enter a word, the output looks like this:
hellllok could be spelled as hello
hellos
hillock
Question:
I'm wondering if there is a reverse or inverse \n so I can counteract these \n's.
Any help is appreciated.
There's no "reverse newline" in the standard character set but, even if there was, you would have to apply it to each string in turn.
And, if you can do that, you can equally modify the strings to remove the newline. In other words, create a new list using the current one, with newlines removed. That would be something like:
>>> oldlist = ['hello\n', 'hallo\n', 'hell\n']
>>> oldlist
['hello\n', 'hallo\n', 'hell\n']
>>> newlist = [s.replace('\n','') for s in oldlist]
>>> newlist
['hello', 'hallo', 'hell']
That will remove all newlines from each of the strings. If you want to make sure you only remove a single newline at the end of each string, you can instead use:
import re
newlist = [re.sub(r'\n\Z', '', s) for s in oldlist]
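A few alternatives, compared on a made-up list whose last item has two trailing newlines: rstrip('\n') removes every trailing newline, removesuffix('\n') (Python 3.9+) removes at most one, and the \Z anchor only matches at the very end of the string, whereas $ also matches just before a final newline.

```python
import re

# Made-up sample: the last item has two trailing newlines.
oldlist = ["hello\n", "hallo\n", "hell\n\n"]

# rstrip("\n") removes every trailing newline.
stripped_all = [s.rstrip("\n") for s in oldlist]

# removesuffix("\n") (Python 3.9+) removes at most one trailing newline.
one_removed = [s.removesuffix("\n") for s in oldlist]

# \Z matches only at the very end of the string, so at most one newline goes.
anchored = [re.sub(r"\n\Z", "", s) for s in oldlist]

# $ also matches just before a final newline, so "\n$" removes both here.
dollar = [re.sub(r"\n$", "", s) for s in oldlist]

print(stripped_all)  # ['hello', 'hallo', 'hell']
print(one_removed)   # ['hello', 'hallo', 'hell\n']
print(anchored)      # ['hello', 'hallo', 'hell\n']
print(dollar)        # ['hello', 'hallo', 'hell']
```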
I am reading a file in my Python script which looks like this:
#im a useless comment
this is important
I wrote a script to read and split the "this is important" part and ignore the comment lines that start with #.
I only need the first and the last word (in my case, "this" and "important").
Is there a way to tell Python that I don't need certain parts of a split?
In my example I have what I want and it works.
However, if the string is longer and I have something like 10 unused variables, I guess that is not how programmers would do it.
Here is my code:
#!/usr/bin/python3
import re
filehandle = open("file")
for line in filehandle:
    if re.search("#",line):
        continue
    else:
        a,b,c = line.split(" ")
        print(a)
        print(b)
filehandle.close()
Another possibility would be:
a, *_, b = line.split()
print(a, b)
# <a> <b>
Extended iterable unpacking (*_) was added in Python 3.0 (PEP 3132), so it does not work in Python 2.
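A quick illustration of extended unpacking on a hypothetical line: the starred name collects everything between the first and last items as a list, and unpacking fails with a ValueError if there are fewer than two items.

```python
line = "this is important\n"

# *middle collects every word between the first and the last one as a list.
first, *middle, last = line.split()
print(first, last)  # this important
print(middle)       # ['is']

# With fewer than two words there is nothing to put in both first and last.
unpack_failed = False
try:
    first, *_, last = "alone".split()
except ValueError as exc:
    unpack_failed = True
    print("ValueError:", exc)
```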
On line 8, instead of
a,b,c = line.split(" ")
use:
splitLines = line.split(" ")
a, b, c = splitLines[0], splitLines[1:-1], splitLines[-1]
Negative indices in Python count from the end of the list, so splitLines[-1] is the last element.
I think Python's negative indexing can solve your problem:
import re
filehandle = open("file")
for line in filehandle:
    if re.search("#",line):
        continue
    else:
        split_word = line.split()
        print(split_word[0]) #First Word
        print(split_word[-1]) #Last Word
filehandle.close()
You can save the result to a list, and get the first and last elements:
res = line.split(" ")
# res[0] and res[-1]
If you want every third element (indices 0, 3, 6, ...), you can use:
res[::3]
Otherwise, if you don't have a specific pattern, you'll need to manually extract elements by their index.
See the split documentation for more details.
If I've understood your question, you can try this:
s = "this is a very very very veeeery foo bar bazzed looong string"
splitted = s.split() # splitted is a list
splitted[0] # first element
splitted[-1] # last element
str.split() returns a list of the words in the string, using sep as the delimiter string. ... If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
In that way you can get the first and the last words of your string.
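The difference between split(" ") and split() matters here, because lines read from a file keep their trailing newline. A short sketch on a hypothetical line:

```python
line = "this is important\n"

# split(" ") splits on single spaces only, so the trailing newline stays attached.
by_space = line.split(" ")
print(by_space)       # ['this', 'is', 'important\n']

# split() with no argument splits on runs of whitespace and drops the newline.
by_whitespace = line.split()
print(by_whitespace)  # ['this', 'is', 'important']

# Double spaces produce empty strings with split(" ") but not with split().
print("a  b".split(" "))  # ['a', '', 'b']
print("a  b".split())     # ['a', 'b']
```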
For multiline text (using the re.search() function):
import re
with open('yourfile.txt', 'r') as f:
    result = re.search(r'^(\w+).+?(\w+)$', f.read(), re.M)
a,b = result.group(1), result.group(2)
print(a,b)
The output:
this important
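Putting the pieces together, here is a minimal sketch that skips comment lines and keeps only the first and last word of every other line. To keep it self-contained it reads from io.StringIO instead of a real file; with a real file you would use open("file") in the with statement. The function name first_and_last is hypothetical.

```python
import io

def first_and_last(lines):
    """Return (first, last) word pairs for non-comment, non-empty lines."""
    pairs = []
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and lines that start with '#'.
        if not stripped or stripped.startswith("#"):
            continue
        words = stripped.split()
        pairs.append((words[0], words[-1]))
    return pairs

sample = io.StringIO("#im a useless comment\nthis is important\n")
with sample as f:
    print(first_and_last(f))  # [('this', 'important')]
```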
I'm trying to start a new line every time I find a word starting with a capital letter. Here is my code:
import re

def new_line(name):
    fr = open(name, 'r')
    string = fr.read()
    new_list = []
    fw = open('output', 'w')
    c = 0
    m = re.findall('\s+[A-Z]\w+', string,re.MULTILINE)
    for i in m:
        j = str(i)
        l = re.sub('[A-Z]\w+','\n'+str(m[c]), string,re.MULTILINE)
        c = c+1
        print("These are the list items:"+j+"\n")
    print("STRINGY STRING BELOW!!!")
    print(string)
    print('/////////////////////////////////////////////')
    print("Output :\n"+l)
    print(m)

new_line('task.txt')
The desired output is something like this:
These are the list items: Miss
These are the list items: Catherine
.
.
.
These are the list items: Heathcliff
That should be followed by the text with new lines added. However, instead of replacing every match with \n plus the match itself, my code replaces every match with only the last item from list m,
like this:
Output :
I got
Heathcliff
Heathcliff and myself to
Heathcliff
Heathcliff; and, to my agreeable disappointment, she behaved infinitely better than I dared to expect.
Heathcliff seemed almost over-fond of
Heathcliff.
Heathcliff; and even to his sister she showed plenty of affection.
I didn't post the original input text as it's too long.
You could try this. It replaces the whitespace before each capitalized word with \n.
>>> re.sub(r'\s+([A-Z])', r'\n\g<1>', "Heathcliff and myself to Heathcliff; to my")
'Heathcliff and myself to\nHeathcliff; to my'
Here is my approach: use re.sub to search for whitespace followed by a capital letter, and replace each match with a newline followed by the capital letter itself.
import re
with open(name) as infile, open('output', 'w') as outfile:
    contents = infile.read()
    new_contents = re.sub(r'\s+([A-Z])', r'\n\1', contents)
    outfile.write(new_contents)
Notes
The parentheses in the pattern tell re to remember the text they match.
The \1 in the replacement text refers to that remembered text.
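For the desired output in the question, one option is to call re.findall once to list the capitalized words and re.sub once, outside any loop, to insert the newlines. This is a sketch on a short made-up string rather than task.txt:

```python
import re

text = "I got Heathcliff and myself to Catherine; she smiled."

# List every word of two or more characters that starts with a capital letter.
names = re.findall(r"\b[A-Z]\w+", text)
for name in names:
    print("These are the list items: " + name)

# One substitution over the whole text: each run of whitespace followed by a
# capitalized word is replaced with a newline plus that captured word.
output = re.sub(r"\s+([A-Z]\w+)", r"\n\1", text)
print("Output :\n" + output)
```

Because re.sub already visits every match, there is no need to loop over the list and call it repeatedly.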
On each pass through the loop, re.sub replaces every word starting with an uppercase letter in the whole document with m[c], and l is overwritten each time. After the loop finishes, l holds the result of the last pass, so every replaced word has become the last name in the list.
Try stopping the loop after c = 1, c = 2, and so on, and you will find that all the names become the corresponding item of the list.
re.sub() replaces all non-overlapping occurrences of your pattern.
What does that mean? See the following example:
import re
test_str = 'spam spam spam'
print(re.sub('spam', 'beans', test_str))
will print
beans beans beans
What this means is that your code is replacing all occurrences of capitalized words in the string with your last word. That is why you're seeing 'Heathcliff' everywhere: it was the last capitalized word in your text.
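A related pitfall in the question's code: the fourth positional parameter of re.sub is count, not flags, so re.sub(pattern, repl, string, re.MULTILINE) limits the number of replacements instead of setting the flag. A small sketch with made-up text (written with explicit keywords to show what each call actually means):

```python
import re

text = " ".join(["Spam"] * 10)  # ten capitalized words

# re.MULTILINE is an integer flag whose value is 8. Passed as the fourth
# positional argument it becomes count, i.e. the same as count=re.MULTILINE.
limited = re.sub(r"[A-Z]\w+", "Beans", text, count=re.MULTILINE)

# Passed by keyword it is applied as a flag, and count stays 0 (replace all).
all_replaced = re.sub(r"[A-Z]\w+", "Beans", text, flags=re.MULTILINE)

print(int(re.MULTILINE))                                  # 8
print(limited.count("Beans"), all_replaced.count("Beans"))  # 8 10
```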
def censor2(filename):
    infile = open(filename,'r')
    contents = infile.read()
    contentlist = contents.split()
    print (contents)
    print (contentlist)
    for letter in contentlist:
        if len(letter) == 4:
            print (letter)
            contents = contents.replace(letter,'xxxx')
    outfile = open('censor.txt','w')
    outfile.write(contents)
    infile.close()
    outfile.close()
This code works. It accepts a file such as 'example.txt', reads it, loops through replacing all four-letter words with the string 'xxxx', and writes the result to a new file called censor.txt (keeping the original formatting).
I used the replace method to replace the words I found. However, while the word 'abcd' is replaced correctly, the next word, 'abcde', is turned into 'xxxxe'.
How do I prevent 'abcde' from being changed?
I could not get the examples below to work, but after experimenting with re.sub I found that the following code replaces only four-letter words and not five-letter words:
contents = re.sub(r"\b\w{4}\b", "xxxx", contents)
How about:
re.sub(r'\babcd\b','',my_text)
This requires the match to have a word boundary on either side.
This is where regular expressions can be helpful. You would want something like this:
import re
...
contents = re.sub(r'\babcd\b', 'xxxx', contents)
....
The \b is the "word boundary" marker. It matches the position between a word character and a non-word character (whitespace, punctuation, etc.), as well as the start or end of the string next to a word character.
You'll need the r'' style string for the regex pattern so that the backslashes are not treated as escape characters.
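If you want to censor a specific list of words rather than every four-letter word, one option is to build a single alternation with re.escape, so that characters which are special in regular expressions are matched literally. The function name censor_words and the sample text are hypothetical:

```python
import re

def censor_words(text, words, mask="xxxx"):
    """Replace each whole-word occurrence of any word in `words` with `mask`."""
    # re.escape makes each word match literally; \b keeps 'abcd' from
    # matching inside 'abcde'.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    return re.sub(pattern, mask, text)

print(censor_words("abcd abcde, abcd.", ["abcd"]))  # xxxx abcde, xxxx.
```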