python - remove string from words in an array - python

#!/usr/bin/python
#this looks for words in dictionary that begin with 'in' and the suffix is a real word
wordlist = [line.strip() for line in open('/usr/share/dict/words')]
newlist = []
for word in wordlist:
    if word.startswith("in"):
        newlist.append(word)
for word in newlist:
    word = word.split('in')
print newlist
How would I get the program to remove the string "in" from the start of every word that begins with it? Right now it does not work.

#!/usr/bin/env python
# Look for all words beginning with 'in'
# such that the rest of the word is also
# a valid word.
# load the dictionary:
with open('/usr/share/dict/words') as inf:
    allWords = set(word.strip() for word in inf)  # one word per line
Using 'with' ensures the file is always properly closed. I make allWords a set, which makes each membership test an O(1) operation on average.
Then we can do
# get the remainder of all words beginning with 'in'
inWords = [word[2:] for word in allWords if word.startswith("in")]
# filter to get just those which are valid words
inWords = [word for word in inWords if word in allWords]
or roll it into a single statement, like
inWords = [word for word in (word[2:] for word in allWords if word.startswith("in")) if word in allWords]
Doing it the second way also lets us use a generator for the inner loop, so the intermediate list is never built and memory use drops.
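For anyone who wants to check the logic without the dictionary file, here is a minimal sketch on a small hard-coded set. The names strip_prefix_words and sample_words are made up for illustration and are not part of the answer above.

```python
def strip_prefix_words(all_words, prefix="in"):
    """Return remainders of words that start with prefix and are words themselves."""
    return sorted(
        word[len(prefix):]
        for word in all_words
        if word.startswith(prefix) and word[len(prefix):] in all_words
    )

# Hypothetical sample data standing in for /usr/share/dict/words.
sample_words = {"inside", "side", "input", "put", "ink", "into", "to", "in"}
print(strip_prefix_words(sample_words))  # prints ['put', 'side', 'to']
```

"ink" is dropped because "k" is not in the set, and "in" itself is dropped because its remainder is the empty string.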

split() returns a list of the segments obtained by splitting. Furthermore,
word = word.split('in')
doesn't modify your list; it only rebinds the loop variable.
Try replacing your second loop with this:
for i in range(len(newlist)):
    word = newlist[i].split('in', 1)
    newlist[i] = word[1]
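The difference between rebinding the loop variable and assigning through an index can be seen in this small sketch (the sample list is made up for illustration):

```python
words = ["inside", "input"]

# Rebinding the loop variable leaves the list untouched.
for word in words:
    word = word[2:]
unchanged = list(words)

# Assigning through the index replaces the element in the list.
for i, word in enumerate(words):
    words[i] = word[2:]

print(unchanged)  # ['inside', 'input']
print(words)      # ['side', 'put']
```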

It's difficult to tell from your question what you want in newlist. If you just want the words that start with "in", with the "in" removed, then you can use a slice:
newlist = [word[2:] for word in wordlist if word.startswith('in')]
If you want the words that start with "in" and are still in wordlist once the "in" has been removed (is that what you meant by "real" in your comment?), then you need something a little different:
newlist = [word for word in wordlist if word.startswith('in') and word[2:] in wordlist]
Note that in Python this is called a list, not an "array".

Suppose that wordlist is the list of words. The following code should do the trick:
for i in range(len(wordlist)):
    if wordlist[i].startswith("in"):
        wordlist[i] = wordlist[i][2:]
If the list is very large and you are on Python 2, use xrange instead of range (or a while loop with a manual index) so that the full list of indices is not built in memory.

Related

What's the difference between "word = line.split()" and "for word in line.split()"?

I'm new to programming and this is my first question here. I feel it might be a very silly beginner doubt, but here goes.
On multiple occasions, I've typed out the whole code right except for this one line, on which I make the same mistake every time.
Could someone please explain to me what the computer understands when I type each of the following lines, and what the difference is?
word = line.split()
for word in line.split()
The difference between the expected and my actual output is just because I typed the former instead of the latter:
word = line.split()
This will split the line variable (using the default "any amount of white space" separator) and give you back a list of words built from it. You then bind the variable word to that list.
On the other hand:
for word in line.split()
initially does the same thing the previous command did (splitting the line to get a list) but, instead of binding the word variable to that entire list, it iterates over the list, binding word to each string in the list in turn.
The following transcript hopefully makes this clearer:
>>> line = 'pax is good-looking'
>>> word = line.split() ; print(word)
['pax', 'is', 'good-looking']
>>> for word in line.split(): print(word)
...
pax
is
good-looking
split() is a string method that breaks a string apart at a separator.
word = line.split() returns a list made by splitting line into words at any run of whitespace (the default separator).
for word in line.split(): iterates over that list (line.split()).
Here is an example for clarification.
line = "Stackoverflow is amazing"
word = line.split()
print(word)
>>>['Stackoverflow', 'is', 'amazing']
for word in line.split():
    print(word)
>>>
Stackoverflow
is
amazing
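As a side note on the default separator, here is a short sketch (with a made-up sample string) showing how split() with no argument differs from split(" ") when the text contains repeated spaces, tabs, or a trailing newline:

```python
line = "pax  is\tgood-looking\n"

default_split = line.split()    # splits on any run of whitespace
space_split = line.split(" ")   # splits on each single space only

print(default_split)  # ['pax', 'is', 'good-looking']
print(space_split)    # ['pax', '', 'is\tgood-looking\n']
```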

searching for reversed strings in a file

I am trying to write a script that reads a file containing all the words of my language (one word per line) and checks whether each word also appears reversed in the file: basically, palindromes and semi-palindromes.
words = open('AllWords.txt', 'r')
for line in words:
    reverse = line[::-1]
    if reverse in words:
        print(reverse)
    if reverse not in words:
        continue
However, it seems that after the first word in the file (whose reverse is not in words) it stops iterating.
Does anyone know how I could fix this?
The problem is that words is an iterator and the check reverse in words exhausts it. So for the next iteration of the for loop there is no further element available (the iterator is exhausted) and so it stops iterating.
You could use a list or set instead:
words = set(map(str.rstrip, open(...).readlines()))
Then perform the rest of the code as you've already indicated.
If order matters then you can use a list for the iteration and a set for the check (membership tests for sets are O(1)):
with open(...) as fh:
    words = [x.rstrip() for x in fh]
word_set = set(words)
for word in words:
    if word[::-1] in word_set:
        print(word)
You can also use two sets since the palindromes are the intersection between two sets, one for words and one for reversed words:
with open(...) as fh:
    words = set(map(str.rstrip, fh))
words_reversed = set(x[::-1] for x in words)
palindromes = words & words_reversed
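The exhaustion problem can be reproduced without a real file. This sketch uses io.StringIO as a stand-in for the open file (the three sample words are made up), and shows that a single membership test consumes everything that is left:

```python
import io

fh = io.StringIO("level\nstop\npots\n")  # stands in for an open text file

first = next(fh)        # reads the first line, 'level\n'
found = "zzz\n" in fh   # the membership test reads every remaining line
remaining = list(fh)    # nothing is left to iterate over

print(repr(first), found, remaining)  # 'level\n' False []
```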
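words = open('AllWords.txt', 'r').read().splitlines()
for line in words:
    reverse = line[::-1]
    if reverse in words:
        print(reverse)
Note that the lines are read with splitlines() rather than readlines(), so they carry no trailing newline; otherwise every reversed string would start with "\n" and never match. The process is going to be quite slow if you have a large number of words in your file, because each membership test scans the whole list. You could get results much faster using set operations:
words = open("AllWords.txt").read().splitlines()
palindromes = set(words).intersection(w[::-1] for w in words)
for palindrome in palindromes: print(palindrome)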
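The set intersection returns true palindromes and reversible pairs together. If you want them separately (the "palindromes and semi-palindromes" from the question), one possible sketch, using a made-up in-memory list in place of the file, is:

```python
words = ["level", "stop", "pots", "apple"]  # hypothetical sample data
word_set = set(words)

# Words that read the same in both directions.
palindromes = sorted(w for w in word_set if w == w[::-1])

# Words whose reverse is a different word that is also in the set.
reverse_pairs = sorted(w for w in word_set if w != w[::-1] and w[::-1] in word_set)

print(palindromes)    # ['level']
print(reverse_pairs)  # ['pots', 'stop']
```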

How do I loop over a string and add words that start with a certain letter to an empty list?

So for an assignment I have to create an empty list variable empty_list = [], then have python loop over a string, and have it add each word that starts with a 't' to that empty list. My attempt:
text = "this is a text sentence with words in it that start with letters"
empty_list = []
for twords in text:
    if text.startswith('t') == True:
        empty_list.append(twords)
        break
print(empty_list)
This just prints a single ['t']. I'm pretty sure I'm not using startswith() correctly. How would I go about making this work correctly?
text = "this is a text sentence with words in it that start with letters"
print([word for word in text.split() if word.startswith('t')])
Here is a working solution. In your own loop you need to iterate over text.split() rather than text, so that you get words instead of characters, and replace text.startswith('t') with twords.startswith('t'), because twords is the variable that takes each word in turn. You also used break, which exits the for loop after the first match, so only the first word would be added. To get all the words beginning with t, remove the break.
text = "this is a text sentence with words in it that start with letters"
empty_list = []
for twords in text.split():
    if twords.startswith('t'):
        empty_list.append(twords)
print(empty_list)
> ['this', 'text', 'that']
Try something like this:
text = "this is a text sentence with words in it that start with letters"
t = text.split(' ')
ls = [s for s in t if s.startswith('t')]
ls will be the resulting list.
Python's list comprehensions are a good fit for this kind of filtering.
The code below works:
empty_list = []
for i in text.split(" "):
    if i.startswith("t"):
        empty_list.append(i)
print(empty_list)
The problem in your code is that iterating over a string directly yields one character at a time, not one word.
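A quick way to see the difference is to look at what each kind of loop actually receives. A minimal sketch with a shortened sample sentence:

```python
text = "this is a text"

first_chars = list(text)[:4]  # iterating a string yields characters
words = text.split()          # split() yields whole words

print(first_chars)  # ['t', 'h', 'i', 's']
print(words)        # ['this', 'is', 'a', 'text']
```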

Call multiple functions inside list comprehension

I'm trying to import a text file and return the text into a list of strings for each word while also returning lower case and no punctuation.
I've created the following code but this doesn't split each word into a string. Also is it possible to add .lower() into the comprehension?
def read_words(words_file):
    """Turns file into a list of strings, lower case, and no punctuation"""
    return [word for line in open(words_file, 'r') for word in line.split(string.punctuation)]
Yes, you can add .lower() to the comprehension. It should probably be applied to word. Also, your code probably does not split into words because of string.punctuation: split() treats its argument as one whole separator string, not as a set of characters. If you are just trying to split on whitespace, calling .split() without arguments will suffice.
Here's a list comprehension that should do everything you want (Python 2 only, since the two-argument form of str.translate does not exist in Python 3):
[word.translate(None, string.punctuation).lower() for line in open(words_file) for word in line.split()]
You need to split on whitespace (the default) to separate the words. Then you can transform each resulting string to remove the punctuation and make it lowercase.
import string
def read_words(words_file):
    """Turns file into a list of strings, lower case, and no punctuation"""
    with open(words_file, 'r') as f:
        lowered_text = f.read().lower()
    return ["".join(char for char in word if char not in string.punctuation) for word in lowered_text.split()]
Use a mapping to translate the words and use it in a generator function.
import string
def words(filepath):
    '''Yield words from filepath with punctuation and whitespace removed.'''
    # map uppercase to lowercase and punctuation/whitespace to an empty string
    t = str.maketrans(string.ascii_uppercase,
                      string.ascii_lowercase,
                      string.punctuation + string.whitespace)
    with open(filepath) as f:
        for line in f:
            for word in line.strip().split():
                word = word.translate(t)
                # don't yield empty strings
                if word:
                    yield word
Usage
for word in words('foo.txt'):
    print(word)
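For completeness, the one-line comprehension from the first answer can be sketched for Python 3, where str.translate takes a single table built with str.maketrans. The function name clean_words and the sample lines are made up for illustration; it takes any iterable of lines, so an open file would work the same way.

```python
import string

def clean_words(lines):
    """Split lines on whitespace, strip punctuation, and lowercase each word."""
    # Third argument of maketrans lists the characters to delete.
    table = str.maketrans("", "", string.punctuation)
    return [word.translate(table).lower() for line in lines for word in line.split()]

sample_lines = ["Hello, World!\n", "It's a test.\n"]  # hypothetical input
print(clean_words(sample_lines))  # ['hello', 'world', 'its', 'a', 'test']
```

Note that a token made up only of punctuation would come out as an empty string here; filter those out if they matter.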

Python lists not working properly

import random
words = ["Football" , "Happy" ,"Sad", "Love", "Human"]
for word in words:
    word = random.choice(words)
    print(word)
    words.remove(word)
Why does the above code only print out 3 words instead of all 5? Am I trying to achieve printing the words from words in a random order in an incorrect way?
You shouldn't modify a list (by adding or removing elements) while iterating over it: the loop tracks its position by index, so as the list shrinks the loop reaches the end early and skips elements. Here's a possible alternative for what you're doing that doesn't have that problem:
random.shuffle(words)
for word in words:
    print(word)
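To see where the number 3 comes from, here is a deterministic sketch that removes one element per pass, just as the question's loop does, and counts the passes:

```python
words = ["Football", "Happy", "Sad", "Love", "Human"]

iterations = 0
for word in words:
    words.pop()       # the list shrinks by one on every pass
    iterations += 1   # the loop's internal index grows by one

# The loop stops once the index (3) is no longer below the length (2).
print(iterations)  # 3
print(words)       # ['Football', 'Happy']
```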
This is because you are not looping correctly. Try this:
import random
words = ["Football" , "Happy" ,"Sad", "Love", "Human"]
while words:
    word = random.choice(words)
    print(word)
    words.remove(word)
The while loop keeps going until words is empty, which avoids modifying the list while a for loop is iterating over it.
People have mostly explained why you're not getting the behavior you want, but just to throw an alternate solution into the mix using a different idiom:
import random
words = ["Football" , "Happy" ,"Sad", "Love", "Human"]
random.shuffle(words)
while words:
    print(words.pop())
You should not modify a list while iterating over it. Try:
for _ in range(len(words)):
    word = random.choice(words)
    words.remove(word)
    print(word)
To state blogbeard's suggestion explicitly:
>>> import random
>>> random.shuffle(words)
>>> print(*words)
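All of the approaches above either empty the list or reorder it in place. If the original list needs to stay intact, one possible alternative (not suggested in the answers above) is random.sample, which returns a new list:

```python
import random

words = ["Football", "Happy", "Sad", "Love", "Human"]

# Sampling len(words) items without replacement gives a shuffled copy;
# the original list is left unchanged.
shuffled = random.sample(words, len(words))
for word in shuffled:
    print(word)
```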
