how to split String with a "pattern" from list - python

I have a small problem finding parts of a string in a list with Python.
I load the string from a file, and its value is one of the following: empty, 1 element from the list, 2 elements from the list, 3 elements from the list, or more.
I need to perform different actions depending on whether the String equals "", equals 1 element from the list, or consists of 2 or more elements. For example:
List = [ 'Aaron', 'Albert', 'Arcady', 'Leo', 'John' ... ]
String = "" #this is just example
String = "Aaron" #this is just example
String = "AaronAlbert" #this is just example
String = "LeoJohnAaron" #this is just example
I created something like this:
if String == "": #this works well on Strings with 0 values
    print "something"
elif String in List: #this works well on Strings with 1 value
    print "something else"
elif ... #dont know what now
The best way would be to split this String with a pattern from a list. I was trying:
String.Find(x) #failed.
I tried to find similar posts but couldn't.

if String == "": #this works well on Strings with 0 values
    print "something"
elif String in List: #this works well on Strings with 1 value
    print "something else"
elif len([1 for x in List if x in String]) == 2:
    ...
This is called a list comprehension. It goes through the list, collects an entry for every list element that occurs as a substring of the string at hand, and len() then returns how many were found.
Note that there may be some issues if you have names like "Ann" and "Anna": the name "Anna" in the string will get counted twice. If you need a solution that accounts for that, I would suggest splitting the string on capital letters to explicitly separate it into individual names (if you want, I can update this solution to show how to do that with regex).
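The capital-letter split mentioned above could be sketched with re.findall. This is only a minimal sketch, assuming every name is one uppercase ASCII letter followed by lowercase letters; split_names and count_known are hypothetical helper names that do not appear in the question.

```python
import re

def split_names(text):
    """Split a run of capitalized names such as 'LeoJohnAaron' into a list."""
    # Assumption: each name is one uppercase letter followed by lowercase letters.
    return re.findall(r"[A-Z][a-z]*", text)

def count_known(text, names):
    """Count how many whole names from `names` occur in `text`."""
    known = set(names)
    return sum(1 for part in split_names(text) if part in known)

names = ["Aaron", "Albert", "Arcady", "Leo", "John", "Ann", "Anna"]
print(split_names("LeoJohnAaron"))  # ['Leo', 'John', 'Aaron']
print(count_known("Anna", names))   # 1, because 'Ann' is not counted separately
```

Because whole pieces are compared against the list, "Anna" no longer also counts as "Ann".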

I think the most straightforward approach would be to loop over the list of names and, for each of them, check if it's in your string.
for name in List:
    if name in String:
        print("do something here")

So, you want to find whether some string contains any members of the given list.
Iterate over the list and check whether the string contains the current item:
for data in List:
    if data in String:
        print("Found it!")
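If the goal is to recover the individual names rather than only detect them, one option is to build a regular expression from the list itself. The following is a rough sketch, assuming the list is non-empty and contains no empty strings; split_by_names is a hypothetical helper name, not something from the question.

```python
import re

def split_by_names(text, names):
    """Return the names from `names` found in `text`, in order of appearance."""
    # Try longer names first so that 'Anna' is preferred over 'Ann'.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered))
    return pattern.findall(text)

List = ["Aaron", "Albert", "Arcady", "Leo", "John"]
for String in ["", "Aaron", "AaronAlbert", "LeoJohnAaron"]:
    found = split_by_names(String, List)
    if not found:
        print("no names")
    elif len(found) == 1:
        print("one name:", found[0])
    else:
        print("several names:", found)
```

The length of the returned list then drives the three branches the question asks about.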

Related

Extract data from a list Python

I have a list of strings and I want to take the last "word" of each one. To explain, here's my code:
myList = ["code 53 value 281", "code 53 value 25", ....]
And I want to keep only the number at the end:
myList = ["281", "25", ....]
Thank you.
Let's break down your problem.
So first off, you've got a list of strings. You know that each string will end with some kind of numeric value, you want to pull that out and store it in the list. Basically, you want to get rid of everything except for that last numeric value.
To write it in code terms, we need to iterate on that list, split each string by a space character ' ', then grab the last word from that collection, and store it in the list.
There are quite a few ways you could do this, but the simplest would be list comprehension.
myList = ["Hey 123", "Hello 456", "Bye 789"] # we want 123, 456, 789
myNumericList = [x.split(' ')[-1] for x in myList]
# for x in myList is pretty obvious, looks like a normal for loop
# x.split(' ') will split the string by the space, as an example, "Hey 123" would become ["Hey", "123"]
# [-1] gets the last element from the collection
print(myNumericList) # ['123', '456', '789']
I don't know why you would want to check whether there are integers in your text, extract them, and then store them as strings again. Anyhow, you can use .split() to split the text on whitespace and take the last piece of each string, like so:
myList = ["code 53 value 281", "code 53 value 25"]
result = []  # avoid naming this "list", which would shadow the built-in
for var in myList:
    result.append(var.split()[-1])
print(result)
Loop through the list and for a particular value at i-th index in the list simply pick the last value.
See code section below:
ans=[]
for i in myList:
    ans.append(i.split(" ")[-1])
print(ans)
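The answers above all return the last token as a string. If actual integers are wanted, a sketch along these lines may help. It assumes the trailing token consists of plain ASCII digits, and it skips entries that do not end in a number; last_numbers is a hypothetical name.

```python
def last_numbers(items):
    """Return the trailing token of each string as an int, skipping the rest."""
    numbers = []
    for text in items:
        parts = text.rsplit(None, 1)  # split once from the right on whitespace
        # Assumption: numeric tokens are plain ASCII digits.
        if parts and parts[-1].isdigit():
            numbers.append(int(parts[-1]))
    return numbers

myList = ["code 53 value 281", "code 53 value 25", "no number here", ""]
print(last_numbers(myList))  # [281, 25]
```

Using rsplit with a limit of 1 avoids splitting the whole string when only the last piece is needed.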

Removing item in list during loop

I have the code below. I'm trying to remove two strings from the lists predict_strings and test_strings if one of them is found in the other. The issue is that I have to split each of them up and check whether a "portion" of one string is inside the other. If there is, I count it as a match and delete both strings from their lists so they are no longer iterated over.
ValueError: list.remove(x): x not in list
I get the above error though and I am assuming this is because I can't delete the string from test_strings since it is being iterated over? Is there a way around this?
Thanks
for test_string in test_strings[:]:
    for predict_string in predict_strings[:]:
        split_string = predict_string.split('/')
        for string in split_string:
            if (split_string in test_string):
                no_matches = no_matches + 1
                # Found match so remove both
                test_strings.remove(test_string)
                predict_strings.remove(predict_string)
Example input:
test_strings = ['hello/there', 'what/is/up', 'yo/do/di/doodle', 'ding/dong/darn']
predict_strings =['hello/there/mister', 'interesting/what/that/is']
so I want there to be a match between hello/there and hello/there/mister and for them to be removed from the list when doing the next comparison.
After one iteration I expect it to be:
test_strings == ['what/is/up', 'yo/do/di/doodle', 'ding/dong/darn']
predict_strings == ['interesting/what/that/is']
After the second iteration I expect it to be:
test_strings == ['yo/do/di/doodle', 'ding/dong/darn']
predict_strings == []
You should never try to modify an iterable while you're iterating over it, which is still effectively what you're trying to do. Make a set to keep track of your matches, then remove those elements at the end.
Also, your line for string in split_string: isn't really doing anything. You're not using the variable string. Either remove that loop, or change your code so that you're using string.
You can use augmented assignment to increase the value of no_matches.
no_matches = 0
found_in_test = set()
found_in_predict = set()
for test_string in test_strings:
    test_set = set(test_string.split("/"))
    for predict_string in predict_strings:
        split_strings = set(predict_string.split("/"))
        if not split_strings.isdisjoint(test_set):
            no_matches += 1
            found_in_test.add(test_string)
            found_in_predict.add(predict_string)
for element in found_in_test:
    test_strings.remove(element)
for element in found_in_predict:
    predict_strings.remove(element)
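From your code it seems likely that two parts of split_string match the same test_string. The first time through the loop removes test_string; the second time tries to do so but can't, since it has already been removed!
You can try breaking out of the inner for loop if it finds a match, or use any instead. Because the pairs are generated from copies, also skip any pair whose strings were already removed.
import itertools
for test_string, predict_string in itertools.product(test_strings[:], predict_strings[:]):
    # skip pairs where either string was removed by an earlier match
    if test_string not in test_strings or predict_string not in predict_strings:
        continue
    if any(s in test_string for s in predict_string.split('/')):
        no_matches += 1 # isn't this counter-intuitive?
        test_strings.remove(test_string)
        predict_strings.remove(predict_string)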
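Another way to sidestep the removal problem is to build new lists instead of mutating the originals. The sketch below is only one possible reading of the requirement: it assumes each test string should be paired with at most one predict string, and that a match means sharing at least one '/'-separated part. remove_matches is a hypothetical name.

```python
def remove_matches(test_strings, predict_strings):
    """Pair each test string with at most one predict string sharing a part."""
    remaining_predict = list(predict_strings)  # work on a copy
    remaining_test = []
    matches = 0
    for test_string in test_strings:
        test_parts = set(test_string.split("/"))
        for predict_string in remaining_predict:
            if not test_parts.isdisjoint(predict_string.split("/")):
                # Removing is safe here because we leave the loop right away.
                remaining_predict.remove(predict_string)
                matches += 1
                break
        else:
            remaining_test.append(test_string)  # no predict string matched
    return remaining_test, remaining_predict, matches

test_strings = ['hello/there', 'what/is/up', 'yo/do/di/doodle', 'ding/dong/darn']
predict_strings = ['hello/there/mister', 'interesting/what/that/is']
print(remove_matches(test_strings, predict_strings))
# (['yo/do/di/doodle', 'ding/dong/darn'], [], 2)
```

The input lists are left untouched, so the caller decides whether to rebind them to the returned values.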

Python - If Value in List (or similar)

I need to use or create a comparison function in Python. Perhaps a way of doing this already exists?
I need to compare a string with a value in a list, and I need to get a match even if it's a couple of characters off. Here is an example so you can see what I mean.
Example 1:
Value in list : Name: This is the title
Value in search : Name This is the title
Example 2:
Value in list : Name and shortening m.m.
Value in search : Name and shortening m.m
As you can see the values I want to compare and need to match are very similar. The values in the search are folder names so they are a bit different because they contain illegal characters.
Maybe the easiest way to accomplish this is to remove the disallowed characters \/:*?"<>| and any trailing periods from the string before making the comparison.
Any tips on the most efficient way to compare the strings and get a match?
Edit: Is this an ugly way of doing it?
def Cleanup(Str):
    Illegal = ['\\','/',':','*','?','"','<','>','|']
    return ''.join([char for char in Str if char not in Illegal]).rstrip('.')
I'm sure there's a better way to do this, but here's my crack at it
import string
a = "Name: This is the title"
b = "Name This is the title"
# remove punctuation and make all lower-case
def strippedAndLowered(myString):
    return "".join(i for i in myString if i not in string.punctuation).lower()
strippedAndLowered(a) == strippedAndLowered(b) # returns True
Use the following code to strip the punctuation, and then compare them:
def search(lst, item):
    for i in lst:
        i = ''.join([char for char in i if char.isalpha() or char == ' '])
        if item == i:
            return True
    return False
The translate function should be faster:
item = "Name: This is the title"
search = "Name This is the title"
illegal = r'\/:*?"<>|'
def compare(s1, s2):
    # Python 2 signature: str.translate(table, deletechars)
    return s1.translate(None, illegal) == s2.translate(None, illegal)
print compare(search, item)
Gives:
True
And if you are really worried about the performance, and have many comparisons, you can cache the translated versions in a dictionary.
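For Python 3, where str.translate takes a single table argument, the same idea together with the suggested cache might look roughly like this. It is a sketch rather than a tuned implementation: _cache and cleaned are hypothetical names, and stripping trailing periods is taken from the question, not from the answer above.

```python
# Characters that are not allowed in folder names (from the question).
ILLEGAL = '\\/:*?"<>|'
DELETE_TABLE = str.maketrans("", "", ILLEGAL)

_cache = {}  # hypothetical cache: original string -> cleaned string

def cleaned(text):
    """Remove illegal characters and trailing periods, caching the result."""
    if text not in _cache:
        _cache[text] = text.translate(DELETE_TABLE).rstrip(".")
    return _cache[text]

def compare(s1, s2):
    """Return True if both strings are equal after cleaning."""
    return cleaned(s1) == cleaned(s2)

print(compare("Name: This is the title", "Name This is the title"))    # True
print(compare("Name and shortening m.m.", "Name and shortening m.m"))  # True
```

The cache only pays off when the same strings are compared many times; for a handful of comparisons the plain translate call is enough.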

Python: find out if an element in a list has a specific string

I am looking for a specific string in a list; this string is part of a longer string.
Basically, I loop through a text file and add each line as a separate element of a list. Now my objective is to scan the whole list to find out whether any of the element strings contains a specific string.
example of the source file:
asfasdasdasd
asdasdasdasdasd mystring asdasdasdasd
asdasdasdasdasdasdadasdasdasdas
Now imagine that each of the 3 strings is in an element of the list, and you want to know if the list has the string "mystring" in any of its elements (I don't need to know where it is, or how many occurrences of the string are in the list). I tried this, but it does not seem to find any occurrence:
work_list=["asfasdasdasd", "asdasdasdasd mystring asdasdasdasd", "asdadadasdasdasdas"]
has_string=False
for item in work_list:
    if "mystring" in work_list:
        has_string=True
        print "***Has string TRUE*****"
print " \n".join(work_list)
The output is just the list, and the bool has_string stays False.
Am I missing something, or am I using the in statement in the wrong way?
You want it to be:
if "mystring" in item:
A concise (and usually faster) way to do this:
if any("mystring" in item for item in work_list):
    has_string = True
    print "found mystring"
But really what you've done is implement grep.
Method 1
[s for s in stringList if ("my string" in s)]
# --> ["blah my string blah", "my string", ...]
This will yield a list of all the strings which contain "my string".
Method 2
If you just want to check if it exists somewhere, you can be faster by doing:
any(("my string" in s) for s in stringList)
# --> True|False
This has the benefit of terminating the search on the first occurrence of "my string".
Method 3
You will want to put this in a function, preferably a lazy generator:
def search(stringList, query):
    for s in stringList:
        if query in s:
            yield s
list( search(["an apple", "a banana", "a cat"], "a ") )
# --> ["a banana", "a cat"]
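Since the question starts from a text file, the any() approach can also be applied to the file object directly, without building the list first. This is a minimal sketch: io.StringIO stands in for a real file here, and contains is a hypothetical helper name.

```python
import io

def contains(lines, query):
    """Return True as soon as any line contains `query`."""
    return any(query in line for line in lines)

# io.StringIO stands in for a real file opened with open(path).
source = io.StringIO(
    "asfasdasdasd\n"
    "asdasdasdasdasd mystring asdasdasdasd\n"
    "asdasdasdasdasdasdadasdasdasdas\n"
)
print(contains(source, "mystring"))  # True
```

Iterating a file object yields one line at a time, so the search stops reading at the first match.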

How to check if an element of a list contains some substring

The code below does not work as intended. It seems to search the complete list instead of each element separately, and the condition is always true.
The intent is to search for the substring in one element of the list per iteration and get true or false. Instead, it appears to look at the complete list.
The print statement prints the complete list inside <<>> if I use find() or the in operator, but prints only one word if I use the == operator.
The issue code:
def myfunc(mylist):
    for i in range(len(mylist)):
        count = 0
        for word in mylist:
            print('<<{}>>'.format(word))
            if str(word).casefold().find('abc') or 'def' in str(word).casefold():
                count += 1
                abcdefwordlist.append(str(word))
                break
This code searches for 'abc' or 'def' in mylist instead of in the word.
If I use str(word).casefold() == 'abc' or str(word).casefold() == 'def', then it compares with the word only.
How can I check whether word contains either 'abc' or 'def' in such a loop?
You have several problems here.
abcdefwordlist is not defined (at least not in the code you showed us).
You're looping over the length of the list and then over the list of words itself, which means that too many elements will be added to your resulting list.
This function doesn't return anything, unless you meant for it to just update abcdefwordlist from outside of it.
You had the right idea with 'def' in str(word), but you have to use it for both substrings. To sum up, a function that does what you want would look like this:
def myfunc(mylist):
    abcdefwordlist = [] # unless it already exists elsewhere
    for word in mylist:
        if 'abc' in str(word).lower() or 'def' in str(word).lower():
            abcdefwordlist.append(word)
    return abcdefwordlist
This can also be shortened to a one-liner using a list comprehension:
def myfunc(mylist):
    return [word for word in mylist if 'abc' in str(word).lower() or 'def' in str(word).lower()]
BTW, I used lower() instead of casefold() because the substrings I'm searching for are definitely lowercase.
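As a side note, the original condition misfires because str.find returns an index rather than a boolean. The sketch below shows that, and also one way to generalize the check to any number of substrings with any(). contains_any is a hypothetical helper name, and the sample words are made up for illustration.

```python
def contains_any(word, substrings=("abc", "def")):
    """Return True if the case-folded word contains any of the substrings."""
    folded = str(word).casefold()
    return any(sub in folded for sub in substrings)

# find() returns an index, not a bool, so it misleads inside an if.
print("xyz".find("abc"))     # -1, which is truthy
print("abcxyz".find("abc"))  # 0, which is falsy

words = ["ABCdog", "cat", "undefined", 42]  # made-up sample data
print([w for w in words if contains_any(w)])  # ['ABCdog', 'undefined']
```

Passing the substrings as a tuple keeps the condition readable when more than two patterns are needed.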
