Python remove words in every string from list - python

So I have a list whose elements are strings, each one a sentence, like this
a = ["This is a sentence with some words.", "And this is a sentence as well.", "Also this right here is a sentence."]
What I want to do with this list is to only keep the third and fourth word of each string, so in the end I want a list like
b = ["a sentence", "is a", "right here"]
The first thing to do, I presume, is to split each string on spaces, so something like
for x in a:
    x.split()
However, I'm a bit confused about how to continue. The above loop should basically produce one list per sentence, where every word is its own element. I thought about doing this
e = []
for x in a:
    x.split()
    a = x[0:2]
    a = x[2:]
    e.append(a)
but instead of removing words it removes characters and I get the following output
['is is a sentence with some words.', 'd this is a sentence as well.', 'so this right here is a sentence.']
I'm not sure why it produces this behavior. I have been sitting at this for a while now and probably missed something really stupid, so I would really appreciate some help.

Nothing can modify a string; strings are immutable. You can only derive new data from one. As others have said, you need to store the return value of .split().
Lists are mutable, but slicing one does not modify it in place either; it creates a new sublist, which you need to store somewhere. Overall this can be done like so:
e = [' '.join(x.split()[2:4]) for x in a]
The whole thing is a list comprehension in case you're not familiar. .join() converts the sublist back into a string.

When you do x.split(), x itself is not changed, since strings are immutable; the call returns a new list of strings, which you need to store:
lst = x.split()
Then just join your desired items:
e.append(' '.join(lst[2:4]))
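Putting these steps together, a minimal sketch of the corrected loop might look like the following. The names words and result are illustrative and not taken from the question; the slice [2:4] picks the third and fourth words.

```python
a = [
    "This is a sentence with some words.",
    "And this is a sentence as well.",
    "Also this right here is a sentence.",
]

result = []
for x in a:
    words = x.split()                    # keep the list that split() returns
    result.append(" ".join(words[2:4]))  # third and fourth word, rejoined with a space

print(result)  # ['a sentence', 'is a', 'right here']
```

A slice never raises on a short list, so a sentence with fewer than four words simply yields fewer words (or an empty string).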

Strings are immutable. x.split() returns a list of strings, but does not modify x. However you do not capture that return value, so it is lost.
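A small sketch of why the loop in the question removed characters rather than words, using one of the example sentences. The variable names sliced_chars and sliced_words are made up for illustration.

```python
x = "This is a sentence with some words."

x.split()                 # the returned list is discarded; x is still the same string
sliced_chars = x[2:]      # so this slices characters, not words
print(sliced_chars)       # is is a sentence with some words.

words = x.split()         # capture the list instead
sliced_words = words[2:]  # now the slice operates on words
print(sliced_words)       # ['a', 'sentence', 'with', 'some', 'words.']
```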

Related

Extract data from a list Python

I have a list of strings and I want to take the last "word" of each one. Explanation:
Here's my code :
myList = ["code 53 value 281", "code 53 value 25", ....]
And I want to take only the number at the end :
myList = ["281", "25", ....]
Thank you.
Let's break down your problem.
So first off, you've got a list of strings. You know that each string will end with some kind of numeric value, you want to pull that out and store it in the list. Basically, you want to get rid of everything except for that last numeric value.
To write it in code terms, we need to iterate on that list, split each string by a space character ' ', then grab the last word from that collection, and store it in the list.
There are quite a few ways you could do this, but the simplest would be a list comprehension.
myList = ["Hey 123", "Hello 456", "Bye 789"] # we want 123, 456, 789
myNumericList = [x.split(' ')[-1] for x in myList]
# for x in myList is pretty obvious, looks like a normal for loop
# x.split(' ') will split the string by the space, as an example, "Hey 123" would become ["Hey", "123"]
# [-1] gets the last element from the collection
print(myNumericList) # ['123', '456', '789']
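One detail worth knowing about the comprehension above: split(' ') splits on every single space, while split() with no argument splits on runs of whitespace and ignores leading and trailing whitespace. The sketch below assumes a hypothetical input with a trailing space to show the difference.

```python
s = "code 53 value 281 "       # hypothetical input, note the trailing space

by_space = s.split(" ")[-1]    # the trailing space produces an empty last item
by_whitespace = s.split()[-1]  # trailing whitespace is ignored

print(repr(by_space))       # ''
print(repr(by_whitespace))  # '281'
```

If the data is guaranteed to be clean, both forms give the same result.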
I don't know why you would want to check if there are integers in your text, extract them and then convert them back to a string and add to a list. Anyhow, you can use .split() to split the text on whitespace and take the last piece of each string, like so:
myList = ["code 53 value 281", "code 53 value 25"]
result = []  # avoid naming this "list", which would shadow the built-in
for var in myList:
    result.append(var.split()[-1])
print(result)
Loop through the list and for a particular value at i-th index in the list simply pick the last value.
See code section below:
ans=[]
for i in myList:
    ans.append(i.split(" ")[-1])
print(ans)
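Since only the last word is needed, another option is to split once from the right. This is a sketch, not part of the answers above: last_word is a hypothetical helper name, and returning an empty string for blank input is an assumption about the desired behaviour.

```python
def last_word(text):
    """Return the last whitespace-separated word of text, or '' if there is none."""
    parts = text.rsplit(None, 1)  # split at most once, starting from the right
    return parts[-1] if parts else ""

myList = ["code 53 value 281", "code 53 value 25", ""]
print([last_word(s) for s in myList])  # ['281', '25', '']
```

The guard matters because indexing [-1] on the empty list returned for a blank string would raise IndexError.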

Is there a reverse \n?

I am making a dictionary application using argparse in Python 3. I'm using difflib to find the closest matches to a given word. The word list, though, has newline characters at the end of each entry, like:
['hello\n', 'hallo\n', 'hell\n']
And when I put a word in, it gives output like this:
hellllok could be spelled as hello
hellos
hillock
Question:
I'm wondering if there is a reverse or inverse \n so I can counteract these \n's.
Any help is appreciated.
There's no "reverse newline" in the standard character set but, even if there were, you would have to apply it to each string in turn.
And, if you can do that, you can equally modify the strings to remove the newline. In other words, create a new list using the current one, with newlines removed. That would be something like:
>>> oldlist = ['hello\n', 'hallo\n', 'hell\n']
>>> oldlist
['hello\n', 'hallo\n', 'hell\n']
>>> newlist = [s.replace('\n','') for s in oldlist]
>>> newlist
['hello', 'hallo', 'hell']
That will remove all newlines from each of the strings. If you want to remove a newline only at the end of each string, leaving any interior newlines alone, you can instead use:
import re
newlist = [re.sub('\n$','',s) for s in oldlist]
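A regex-free alternative is str.rstrip, sketched below. The fourth list entry is a made-up example added to show that interior newlines survive. Note that rstrip('\n') removes every trailing newline, not just one.

```python
oldlist = ['hello\n', 'hallo\n', 'hell\n', 'two\nlines\n']

# rstrip('\n') strips newline characters from the right-hand end only
newlist = [s.rstrip('\n') for s in oldlist]

print(newlist)  # ['hello', 'hallo', 'hell', 'two\nlines']
```

When the list comes from reading a file, the same call can be applied to each line as it is read, so the newlines never reach the list in the first place.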

Insert word into specific position in a list

Sorry if the title isn't descriptive enough. Basically, I have a list like
["The house is red.", "Yes it is red.", "Very very red."]
and I'd like to insert the word "super" before the first character, between the middle characters and after the last character of each string. So I would have something like this for the first element:
["superThe houssupere is red.super",...]
How would I do this? I know with a single string I could add the "super" string to the beginning, then use len() to find the middle of the string and add "super" there. Is there a way to get this to work with a list, or should I try a different approach?
The method used here is to iterate through the original list, splitting each item into two halves and building the final item string using .format before appending it into a new list.
orig_list = ["The house is red.", "Yes it is red.", "Very very red."]
new_list = []
word = 'super'
for item in orig_list:
    first_half = item[:len(item) // 2]
    second_half = item[len(item) // 2:]
    item = '{}{}{}{}{}'.format(word, first_half, word, second_half, word)
    new_list.append(item)
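The same approach can be wrapped in a small function and applied with a list comprehension. This is only a sketch: insert_word is a hypothetical name, and it uses an f-string, which assumes Python 3.6 or later.

```python
def insert_word(item, word="super"):
    """Put word before, in the middle of, and after item."""
    middle = len(item) // 2  # for odd lengths the first half is one character shorter
    return f"{word}{item[:middle]}{word}{item[middle:]}{word}"

orig_list = ["The house is red.", "Yes it is red.", "Very very red."]
new_list = [insert_word(item) for item in orig_list]

print(new_list[0])  # superThe houssupere is red.super
```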

Python: find out if an element in a list has a specific string

I am looking for a specific string in a list; this string is part of a longer string.
Basically, I loop through a text file and add each line as a separate element of a list. Now my objective is to scan the whole list to find out whether any element contains a specific string.
example of the source file:
asfasdasdasd
asdasdasdasdasd mystring asdasdasdasd
asdasdasdasdasdasdadasdasdasdas
Now imagine that each of the 3 strings is an element of the list, and you want to know if the list has the string "my string" in any of its elements (I don't need to know where it is, or how many occurrences of the string are in the list). I tried this, but it does not seem to find any occurrence:
work_list=["asfasdasdasd", "asdasdasdasd my string asdasdasdasd", "asdadadasdasdasdas"]
has_string=False
for item in work_list:
    if "mystring" in work_list:
        has_string=True
        print "***Has string TRUE*****"
print " \n".join(work_list)
The output is just the list, and the bool has_string stays False.
Am I missing something, or am I using the in statement the wrong way?
You want it to be:
if "mystring" in item:
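The reason the original test fails is that in means different things for lists and strings. A short sketch using the list from the question (in_list and in_item are illustrative names):

```python
work_list = ["asfasdasdasd", "asdasdasdasd my string asdasdasdasd", "asdadadasdasdasdas"]

# On a list, "in" asks whether some element is EQUAL to the value
in_list = "my string" in work_list

# On a string, "in" asks whether the value occurs as a substring
in_item = "my string" in work_list[1]

print(in_list)  # False
print(in_item)  # True
```

Note also that the question searches for "mystring" in one place and "my string" in another; the substring must match the data exactly, including the space.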
A concise (and usually faster) way to do this:
if any("my string" in item for item in work_list):
    has_string = True
    print("found mystring")
But really what you've done is implement grep.
Method 1
[s for s in stringList if ("my string" in s)]
# --> ["blah my string blah", "my string", ...]
This will yield a list of all the strings which contain "my string".
Method 2
If you just want to check if it exists somewhere, you can be faster by doing:
any(("my string" in s) for s in stringList)
# --> True|False
This has the benefit of terminating the search on the first occurrence of "my string".
Method 3
You will want to put this in a function, preferably a lazy generator:
def search(stringList, query):
    for s in stringList:
        if query in s:
            yield s
list( search(["an apple", "a banana", "a cat"], "a ") )
# --> ["a banana", "a cat"]
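Because the generator is lazy, you can also ask it for just the first match and stop there. A sketch, repeating the function so the block runs on its own; the names first and missing are illustrative.

```python
def search(string_list, query):
    """Lazily yield every string that contains query."""
    for s in string_list:
        if query in s:
            yield s

# next() pulls a single item; the second argument is returned if nothing matches
first = next(search(["an apple", "a banana", "a cat"], "a "), None)
missing = next(search(["an apple"], "zzz"), None)

print(first)    # a banana
print(missing)  # None
```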

Using Python to check words

I'm stuck on a simple problem. I've got a dictionary of words in the English language, and a sample text that is to be checked. I've got to check every word in the sample against the dictionary, and the code I'm using is wrong.
for word in checkList: # iterates through every word in the sample
    if word not in refDict: # checks if word is not in the dictionary
        print word # just to see if it's recognizing misspelled words
The only problem is, as it goes through the loop it prints out every word, not just the misspelled ones. Can someone explain this and offer a solution possibly? Thank you so much!
The snippet you have is functional. See for example
>>> refDict = {'alpha':1, 'bravo':2, 'charlie':3, 'delta':4}
>>> s = 'he said bravo to charlie O\'Brian and jack Alpha'
>>> for word in s.split():
...     if word not in refDict:
...         print(repr(word))  # by temporarily using repr() we can see exactly
...                            # what the words are like
...
'he'
'said'
'to'
"O'Brian"
'and'
'jack'
'Alpha' # note how Alpha was not found in refDict (u/l case difference)
Therefore, the dictionary contents must differ from what you think, or the words out of checkList are not exactly as they appear (e.g. with stray whitespace or different capitalization; see the use of repr() (*) in the print call to help identify the former).
Debugging suggestion: FOCUS on the first word from checkList (or the first that you expect to be found in the dictionary). Then, for this word and this word only, print it in detail, with its length, with a bracket on either side, etc., for both the word out of checkList and the corresponding key in the dictionary...
(*) repr() was a suggestion from John Machin. Instead I often use brackets or other characters, as in print('[' + word + ']'), but repr() is more exacting in its output.
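To illustrate the debugging tip, here is a sketch with a hypothetical word that still carries the newline it was read with. The contents of refDict and word are assumptions for the example, not data from the question.

```python
refDict = {"bravo": 2}
word = "bravo\n"  # hypothetical: read from a file, newline still attached

found = word in refDict
print(found)       # False, although the word looks right when printed
print(repr(word))  # 'bravo\n'  (the hidden newline becomes visible)
print(len(word))   # 6, one more than the five letters you see
```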
Consider stripping your words of any whitespace that might be there, and changing all the words of both sets to the same case. Like this:
word.strip().lower()
That way you can make sure you're comparing apples to apples.
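A sketch of that normalization applied to both sides of the comparison. The word lists and the helper name normalize are invented for the example.

```python
def normalize(word):
    """Strip surrounding whitespace and lowercase, so comparisons are like for like."""
    return word.strip().lower()

ref_words = ["Alpha\n", " bravo", "CHARLIE "]  # hypothetical reference words
check_words = ["alpha", "Bravo ", "delta"]     # hypothetical sample words

known = {normalize(w) for w in ref_words}
misspelled = [w for w in check_words if normalize(w) not in known]

print(misspelled)  # ['delta']
```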
Clearly "word not in refDict" always evaluates to True. This is probably because the contents of refDict or checkList are not what you think they are. Are they both tuples or lists of strings?
The code you have would work if the keys in refDict are the correctly spelt words. If the correctly spelt words are the values in your dict then you need something like this:
for word in checkList:
    if word not in refDict.values():
        print(word)
Is there a reason your dictionary is stored as a mapping as opposed to a list or a set? A Python dict contains key-value pairs; for example, I could use the mapping {"dog":23, "cat":45, "pony":67} to store an index of words and the page numbers where they are found in some book. In your case, your dict is a mapping of what to what?
Are the words in the refDict the keys or the values?
Your code will only see keys: e.g.:
refDict = { 'w':'x', 'y':'z' }
for word in [ 'w','x','y','z' ]:
    if word not in refDict:
        print(word)
prints:
x
z
Otherwise you want:
if word not in refDict.values():
Of course this rather assumes that your dictionary is an actual python dictionary which seems an odd way to store a list of words.
Your refDict is probably wrong. The in keyword checks if the value is in the keys of the dictionary. I believe you've put your words in as values.
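If the words really are stored as values, one option is to collect them into a set once and test against that. This is a sketch with made-up data; the dictionary layout is an assumption about what the asker's refDict might look like.

```python
refDict = {1: "dog", 2: "cat", 3: "pony"}  # hypothetical: words stored as values
checkList = ["dog", "hrse", "cat"]

in_keys = "dog" in refDict       # "in" on a dict looks at keys only
known = set(refDict.values())    # build the lookup set once
misspelled = [w for w in checkList if w not in known]

print(in_keys)     # False
print(misspelled)  # ['hrse']
```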
I'd propose using a set instead of a dictionary.
knownwords = {"dog", "cat"}
knownwords.add("apple")
text = "The dog eats an apple."
for word in text.split(" "):
    # to ignore case word is converted to lowercase
    if word.lower() not in knownwords:
        print(word)
# The
# eats
# an
# apple. <- doesn't work because of the dot
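One way to deal with the trailing dot is to strip punctuation from both ends of each word before the lookup. A sketch, assuming that dropping leading and trailing punctuation is acceptable for your text; unknown_words is a hypothetical helper name.

```python
import string

def unknown_words(text, known):
    """Return the words of text that are not in known, ignoring case and edge punctuation."""
    result = []
    for raw in text.split():
        word = raw.strip(string.punctuation).lower()  # "apple." becomes "apple"
        if word and word not in known:
            result.append(raw)
    return result

knownwords = {"dog", "cat", "apple"}
print(unknown_words("The dog eats an apple.", knownwords))  # ['The', 'eats', 'an']
```

Punctuation inside a word, such as the apostrophe in O'Brian, is left alone because strip only touches the ends.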
