Split in Python on a special character

I am looping over an array of strings and splitting each one. The split must follow this rule:
Split the string in two at the first special character and keep the first part as the result.
SCRIPT
array = [
    'srv1 #s',
    'srv2;192.168.9.1'
]
result = []
for x in array:
    outfinally = [line.split(';')[0] and line.split()[0] for line in x.splitlines() if line and line[0].isalpha()]
    for srv in outfinally:
        if srv != None:
            result.append(srv)
for i in result:
    print(i)
OUTPUT
srv1
srv2;192.168.9.1
DESIRED OUTPUT
srv1
srv2

This splits on any special character and appends the first part of each split to a new list:
import re
array = [
    'srv1 #s',
    'srv2;192.168.9.1'
]
sep = r'[`\-=~!##$%^&*()_+\[\]{};\'\\:"|<,./<>?]'
new_array = []
for i in array:
    new_array.append(re.split(sep, i)[0])
print(new_array)
Output:
['srv1 ', 'srv2']

You can split twice with the two different separators instead:
result = [s.split()[0].split(';')[0] for s in array]
result becomes:
['srv1', 'srv2']
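A minimal sketch of the same idea wrapped in a helper. The name first_field and its seps parameter are hypothetical, not from the answers here; the helper also guards against empty or whitespace-only input, where a bare s.split()[0] would raise IndexError.

```python
def first_field(text, seps=";#"):
    """Return the leading token of text, cut at whitespace or at any character in seps."""
    for sep in seps:
        # Keep only what precedes the first occurrence of this separator.
        text = text.split(sep, 1)[0]
    parts = text.split()  # whitespace split also trims stray spaces
    return parts[0] if parts else ""


array = ['srv1 #s', 'srv2;192.168.9.1']
result = [first_field(s) for s in array]
print(result)  # ['srv1', 'srv2']
```

Passing a different seps string changes which characters count as "special" without touching the loop.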

The problem is here: line.split(';')[0] and line.split()[0]
Your second condition splits on whitespace. As a result, it'll always return the whitespace-split version unless there's a semicolon at the start of the input (in which case you get empty string).
You probably want to chain the two splits instead:
line.split(';')[0].split()[0]
To see what the code in your question is doing, take a look at what your conditional expression does in a few different cases:
>>> array = ['srv1 s', 'srv2;192.168.9.1', ';192.168.1.1', 'srv1;srv2 192.168.1.1']
>>> for item in array:
...     print("Original: {}\n\tSplit: {}".format(item, item.split(';')[0] and item.split()[0]))
...
Original: srv1 s
    Split: srv1 # split on whitespace
Original: srv2;192.168.9.1
    Split: srv2;192.168.9.1 # split on whitespace!
Original: ;192.168.1.1
    Split: # split on special char, returned empty which is falsey, returns empty str
Original: srv1;srv2 192.168.1.1
    Split: srv1;srv2 # split only on whitespace

Change
outfinally = [line.split(';')[0] and line.split()[0] for line in x.splitlines() if line and line[0].isalpha()]
To
outfinally = [line.replace(';', ' ').split()[0] for line in x.splitlines() if line and line[0].isalpha()]
When you use and like that, it returns the second operand whenever the first one is truthy, and the first operand only when it is falsy. split returns the full string in a one-element list when the separator is not found, so line.split(';')[0] is truthy for any line that does not start with a semicolon, and the whole expression evaluates to line.split()[0], the whitespace-only split. (If you use or, like I first tried to do, a truthy first operand is returned as-is and you never reach the second one.) Instead of having 2 conditions, you have to combine them into one. Something like line.replace(';', ' ').split()[0] works, or blhsing's solution, which is even better.
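To make the operator behaviour concrete, here is a small sketch using one of the strings from the question. The variable names are illustrative only.

```python
line = 'srv2;192.168.9.1'

left = line.split(';')[0]   # 'srv2'              (split on the semicolon)
right = line.split()[0]     # 'srv2;192.168.9.1'  (split on whitespace)

# `and` returns its second operand when the first is truthy.
and_result = left and right
# `or` returns its first operand when that operand is truthy.
or_result = left or right
# A falsy first operand (an empty string) is returned unchanged by `and`.
empty_and = '' and right

# Chaining the two splits applies both rules to the same string.
chained = line.split(';')[0].split()[0]

print(and_result)  # srv2;192.168.9.1
print(or_result)   # srv2
print(chained)     # srv2
```

Note that or only looks right for this particular input: for 'srv1 #s' it would return the unsplit 'srv1 #s', which is why the splits need to be chained or combined.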

Related

Python string split and do not use middle part

I am reading a file in my Python script which looks like this:
#im a useless comment
this is important
I wrote a script to read and split the "this is important" part and ignore the comment lines that start with #.
I only need the first and the last word (In my case "this" and "important").
Is there a way to tell Python that I don't need certain parts of a split?
In my example I have what I want and it works.
However, if the string is longer and I end up with something like 10 unused variables, I guess that is not how programmers would do it.
Here is my code:
#!/usr/bin/python3
import re
filehandle = open("file")
for line in filehandle:
    if re.search("#", line):
        continue  # skip comment lines instead of stopping at the first one
    else:
        a, b, c = line.split(" ")
        print(a)
        print(c)
filehandle.close()

Another possibility would be:
a, *_, b = line.split()
print(a, b)
# <a> <b>
Note that starred unpacking (*_) is not backwards compatible with Python 2: it was added in Python 3.0 by PEP 3132, so any Python 3 version works.
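A short sketch of starred unpacking on a longer line. The helper name first_and_last is hypothetical; it returns None for lines with fewer than two words, because the unpacking itself raises ValueError in that case.

```python
def first_and_last(line):
    """Return (first_word, last_word), or None if the line has fewer than two words."""
    words = line.split()
    if len(words) < 2:
        return None
    # `*_` soaks up every word in the middle, however many there are.
    first, *_, last = words
    return first, last


print(first_and_last("this is a much longer but still important"))
# ('this', 'important')
```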

On line 8, instead of
a,b,c = line.split(" ")
use:
splitLines = line.split(" ")
a, b, c = splitLines[0], splitLines[1:-1], splitLines[-1]
Negative indices in Python count from the end of the list, so splitLines[-1] is the last word and b holds the list of middle words.

I think Python's negative indexing can solve your problem:
import re
filehandle = open("file")
for line in filehandle:
    if re.search("#", line):
        continue  # skip comment lines
    else:
        split_word = line.split()
        print(split_word[0])  # First word
        print(split_word[-1])  # Last word
filehandle.close()
Read more about Python Negative Index
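For reference, a self-contained sketch of the same loop that can be run without a file on disk. It uses io.StringIO as a stand-in for the opened file, and the function name first_last_words is an assumption made for this illustration.

```python
import io


def first_last_words(handle):
    """Collect (first, last) word pairs from every non-comment, non-blank line."""
    pairs = []
    for line in handle:
        stripped = line.strip()
        # Ignore blank lines and lines that start with '#'.
        if not stripped or stripped.startswith("#"):
            continue
        words = stripped.split()
        pairs.append((words[0], words[-1]))
    return pairs


sample = io.StringIO("#im a useless comment\nthis is important\n")
pairs = first_last_words(sample)
print(pairs)  # [('this', 'important')]
```

With a real file, the call would look like `with open("file") as handle: first_last_words(handle)`, which also closes the file automatically.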

You can save the result to a list, and get the first and last elements:
res = line.split(" ")
# res[0] and res[-1]
If you want to print each 3rd element, you can use:
res[::3]
Otherwise, if you don't have a specific pattern, you'll need to manually extract elements by their index.
See the split documentation for more details.

If I've understood your question, you can try this:
s = "this is a very very very veeeery foo bar bazzed looong string"
splitted = s.split() # splitted is a list
splitted[0] # first element
splitted[-1] # last element
str.split() returns a list of the words in the string, using sep as the delimiter string. ... If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
In that way you can get the first and the last words of your string.

For multiline text (with re.search() function):
import re
with open('yourfile.txt', 'r') as f:
    result = re.search(r'^(\w+).+?(\w+)$', f.read(), re.M)
    a, b = result.group(1), result.group(2)
    print(a, b)
The output:
this important

How to pass multiple elements of a list to the re.split() function?

f = open('sentences.txt')
lines = [line.lower() for line in f]
print lines[0:5]
words = re.split("\s+", lines[0:5])
With "print" it works perfectly well, but when I try to do the same inside re.split(), I get the error "TypeError: expected string or buffer".

I think you're searching for join, i.e.:
words = "".join(lines[0:5]).split()
Note:
No need for re module, split() is enough.

Why not just:
words = re.split("\s+", ''.join(lines))
The split function expects a string, which is then split into substrings based on the regex and returned as a list. Passing a list would not make a whole lot of sense. If you're expecting it to take your list of strings and split each string element individually and then return a list of lists of strings, you'll have to do that yourself:
lines_split = []
for line in lines:
    lines_split.append(re.split(r"\s+", line))

As you see, you are getting a TypeError in your function call, which means that you are passing the wrong parameter from what the function is expecting. So you need to think about what you are passing.
If you have a debugger or IDE you can step through and see what type your parameter has, or even use type to print it, via
print(type(lines[0:5]))
which returns
<class 'list'>
so you need to transform that into a String. Each element in your list is a String, so think of a way to get each row out of the list. An example would be
words = [re.split('\s+', line) for line in lines]
where I am using a list comprehension to step through lines and process each row individually.

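Your re.split('\s+', line) is nearly equivalent to line.split(); the difference is that line.split() also drops the empty strings that re.split leaves behind when the line starts or ends with whitespace (such as a trailing newline). So you can write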
words = [line.split() for line in lines]
See the documentation for str.split.
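The following sketch reproduces the TypeError and compares the two per-line approaches. The sample lines are made up for illustration; in Python 3 the error text differs slightly from the Python 2 message quoted in the question.

```python
import re

# Hypothetical sample data standing in for the lines read from the file.
lines = ["  hello   world\n", "foo bar\n"]

error = None
try:
    re.split(r"\s+", lines)  # passing the whole list, as in the question
except TypeError as exc:
    error = exc              # re.split only accepts a single string

# Splitting each line individually works.
regex_words = [re.split(r"\s+", line) for line in lines]
plain_words = [line.split() for line in lines]

print(regex_words)  # [['', 'hello', 'world', ''], ['foo', 'bar', '']]
print(plain_words)  # [['hello', 'world'], ['foo', 'bar']]
```

The empty strings in regex_words come from the leading spaces and the trailing newline, which is the practical reason to prefer str.split() here.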

Removing item in list during loop

I have the code below. I'm trying to remove two strings from the lists predict_strings and test_strings if one of them has been found in the other. The issue is that I have to split each of them up and check whether a "portion" of one string is inside the other. If it is, I count that as a match and delete both strings from their lists so they are no longer iterated over.
ValueError: list.remove(x): x not in list
I get the above error though and I am assuming this is because I can't delete the string from test_strings since it is being iterated over? Is there a way around this?
Thanks
for test_string in test_strings[:]:
    for predict_string in predict_strings[:]:
        split_string = predict_string.split('/')
        for string in split_string:
            if (split_string in test_string):
                no_matches = no_matches + 1
                # Found match so remove both
                test_strings.remove(test_string)
                predict_strings.remove(predict_string)
Example input:
test_strings = ['hello/there', 'what/is/up', 'yo/do/di/doodle', 'ding/dong/darn']
predict_strings =['hello/there/mister', 'interesting/what/that/is']
so I want there to be a match between hello/there and hello/there/mister and for them to be removed from the list when doing the next comparison.
After one iteration I expect it to be:
test_strings == ['what/is/up', 'yo/do/di/doodle', 'ding/dong/darn']
predict_strings == ['interesting/what/that/is']
After the second iteration I expect it to be:
test_strings == ['yo/do/di/doodle', 'ding/dong/darn']
predict_strings == []

You should never try to modify an iterable while you're iterating over it, which is still effectively what you're trying to do. Make a set to keep track of your matches, then remove those elements at the end.
Also, your line for string in split_string: isn't really doing anything. You're not using the variable string. Either remove that loop, or change your code so that you're using string.
You can use augmented assignment to increase the value of no_matches.
no_matches = 0
found_in_test = set()
found_in_predict = set()
for test_string in test_strings:
    test_set = set(test_string.split("/"))
    for predict_string in predict_strings:
        split_strings = set(predict_string.split("/"))
        if not split_strings.isdisjoint(test_set):
            no_matches += 1
            found_in_test.add(test_string)
            found_in_predict.add(predict_string)
for element in found_in_test:
    test_strings.remove(element)
for element in found_in_predict:
    predict_strings.remove(element)
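As a check, here is a self-contained run of this set-tracking approach on the example input from the question. As a small variation (an assumption of this sketch, not part of the answer), it rebuilds the lists with comprehensions at the end instead of calling remove repeatedly.

```python
test_strings = ['hello/there', 'what/is/up', 'yo/do/di/doodle', 'ding/dong/darn']
predict_strings = ['hello/there/mister', 'interesting/what/that/is']

no_matches = 0
found_in_test = set()
found_in_predict = set()

for test_string in test_strings:
    test_parts = set(test_string.split("/"))
    for predict_string in predict_strings:
        # A match means the two strings share at least one '/'-separated part.
        if not test_parts.isdisjoint(predict_string.split("/")):
            no_matches += 1
            found_in_test.add(test_string)
            found_in_predict.add(predict_string)

# Nothing is removed while iterating; the lists are filtered afterwards.
test_strings = [s for s in test_strings if s not in found_in_test]
predict_strings = [s for s in predict_strings if s not in found_in_predict]

print(no_matches)       # 2
print(test_strings)     # ['yo/do/di/doodle', 'ding/dong/darn']
print(predict_strings)  # []
```

Unlike the substring test in the question, this compares whole path parts, so 'is' matches 'is' but not 'interesting'.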

From your code it seems likely that two split_strings match the same test_string. The first time through the loop removes test_string, the second time tries to do so but can't, since it's already removed!
You can try breaking out of the inner for loop if it finds a match, or use any instead.
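import itertools
for test_string, predict_string in itertools.product(test_strings[:], predict_strings[:]):
    # Skip pairs whose members were already removed by an earlier match.
    if test_string not in test_strings or predict_string not in predict_strings:
        continue
    if any(s in test_string for s in predict_string.split('/')):
        no_matches += 1 # isn't this counter-intuitive?
        test_strings.remove(test_string)
        predict_strings.remove(predict_string)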

How to Check if the substring is matching in a list of strings in Python

I have a list and I want to find if the string is present in the list of strings.
li = ['Convenience','Telecom Pharmacy']
txt = '1 convenience store'
I want to match the txt with the Convenience from the list.
I have tried
if any(txt.lower() in s.lower() for s in li):
    print s
print [s for s in li if txt in s]
Both the methods didn't give the output.
How to match the substring with the list?

You could use set() and intersection:
In [19]: set.intersection(set(txt.lower().split()), set(s.lower() for s in list1))
Out[19]: {'convenience'}

I think split is your answer. Here is the description from the python documentation:
string.split(s[, sep[, maxsplit]])
Return a list of the words of the string s. If the optional second argument sep is absent or None, the words are separated by arbitrary strings of whitespace characters (space, tab, newline, return, formfeed). If the second argument sep is present and not None, it specifies a string to be used as the word separator. The returned list will then have one more item than the number of non-overlapping occurrences of the separator in the string. If maxsplit is given, at most maxsplit number of splits occur, and the remainder of the string is returned as the final element of the list (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified or -1, then there is no limit on the number of splits (all possible splits are made).
The behavior of split on an empty string depends on the value of sep. If sep is not specified, or specified as None, the result will be an empty list. If sep is specified as any string, the result will be a list containing one element which is an empty string.
Use the split command on your txt variable. It will give you a list back. You can then do a compare on the two lists to find any matches. I personally would write the nested for loops to check the lists manually, but python provides lots of tools for the job. The following link discusses different approaches to matching two lists.
How can I compare two lists in python and return matches
Enjoy. :-)

I see two cases.
Do you want to check whether txt EXACTLY matches an item in the list? In that case, nothing could be simpler:
if txt in list1:
    pass  # do something
You can also call .upper() or .lower() on both sides if you want the comparison to be case-insensitive.
But if, as I understand it, you want to find a string in the list that is part of txt, you have to use a "for" loop:
def find(list1, txt):
    # return the item if found, False otherwise
    for i in list1:
        if i.upper() in txt.upper():
            return i
    return False
It should work.
Console output:
>>>print(find(['Convenience','Telecom Pharmacy'], '1 convenience store'))
Convenience
>>>

You can try this,
>> list1 = ['Convenience','Telecom Pharmacy']
>> txt = '1 convenience store'
>> filter(lambda x: txt.lower().find(x.lower()) >= 0, list1)
['Convenience']
# Or you can use this as well
>> filter(lambda x: x.lower() in txt.lower(), list1)
['Convenience']
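The session above relies on Python 2, where filter returns a list. In Python 3 filter returns a lazy iterator, so a sketch of the same check (same list1 and txt, other names illustrative) would wrap it in list() or use a list comprehension:

```python
list1 = ['Convenience', 'Telecom Pharmacy']
txt = '1 convenience store'

lowered = txt.lower()  # lowercase once instead of on every comparison

# List comprehension: keep the items that occur inside txt, ignoring case.
matches = [item for item in list1 if item.lower() in lowered]

# Equivalent filter call; list() is needed to materialise the iterator.
matches_via_filter = list(filter(lambda item: item.lower() in lowered, list1))

print(matches)             # ['Convenience']
print(matches_via_filter)  # ['Convenience']
```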

Removing empty strings from a list in python

I need to split a string. I am using this:
def ParseStringFile(string):
    p = re.compile(r'\W+')
    result = p.split(string)
But I have an error: my result has two empty strings (''), one before 'Лев'. How do I get rid of them?

As nhahtdh pointed out, the empty string is expected since there's a \n at the start and end of the string, but if they bother you, you can filter them very quickly and efficiently.
>>> filter(None, ['', 'text', 'more text', ''])
['text', 'more text']

You could strip the leading and trailing newlines from the string before splitting it:
p.split(string.strip('\n'))
Alternatively, split the string and then remove the first and last element:
result = p.split(string)[1:-1]
The [1:-1] slice takes a copy of the result that starts at index 1 (i.e. drops the first element) and ends at index -2, the second-to-last element, because the second index is exclusive.
A longer and less elegant alternative would be to modify the list in-place:
result = p.split(string)
del result[-1] # remove last element
del result[0] # remove first element
Note that in these two solutions the first and last element must be the empty string. If sometimes the input doesn't contain these empty strings at the beginning or end, then they will misbehave. However they are also the fastest solutions.
If you want to remove all empty strings in the result, even if they happen inside the list of results you can use a list-comprehension:
[word for word in p.split(string) if word]
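A runnable Python 3 sketch of these options. The input string is a made-up stand-in, since the question does not show the original text; note that in Python 3 filter needs list() around it to produce a list.

```python
import re

p = re.compile(r'\W+')
string = '\nhello, world\n'  # hypothetical input with a newline at each end

parts = p.split(string)
print(parts)  # ['', 'hello', 'world', '']

# Drop every empty string, wherever it appears in the result.
cleaned = [word for word in parts if word]
cleaned_via_filter = list(filter(None, parts))

# Or avoid the empty strings up front by stripping before the split.
stripped_first = p.split(string.strip())

# Or match the words directly instead of splitting on the gaps.
found = re.findall(r'\w+', string)

print(cleaned)  # ['hello', 'world']
```

The re.findall variant is an alternative technique rather than something proposed in the answers above; it sidesteps the empty strings because it never produces the pieces between separators.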
