Unexpected behavior with string.split()

Say I have a string, string = 'a'.
I call string.split() and get ['a'].
I don't want this; I only want a list when the string contains whitespace, as in string = 'a b c d'.
So far, I've tried all of the following with no luck:
>>> a = 'a'
>>> a.split()
['a']
>>> a = 'a b'
>>> a.split(' ')
['a', 'b']
>>> a = 'a'
>>> a.split(' ')
['a']
>>> import re
>>> re.findall(r'\S+', a)
['a']
>>> re.findall(r'\S', a)
['a']
>>> a = 'a b'
>>> re.findall(r'\S+', a)
['a', 'b']
>>> re.split(r'\s+', a)
['a', 'b']
>>> a = 'a'
>>> re.split(r'\s+', a)
['a']
>>> a.split(" ")
['a']
>>> a = "a"
>>> a.split(" ")
['a']
>>> a.strip().split(" ")
['a']
>>> a = "a".strip()
>>> a.split(" ")
['a']
Am I crazy? I see no whitespace in the string "a".
>>> r"[^\S\n\t]+"
'[^\\S\\n\\t]+'
>>> print(re.findall(r'[^\S\n\t]+',a))
[]
What up?
EDIT
FWIW, this is how I got what I needed:
# test for linked array
if typename == 'org.apache.ctakes.typesystem.type.textsem.ProcedureMention':
    for f in AnnotationType.all_features:
        if 'Array' in f.rangeTypeName:
            if attributes.get(f.name) and typesystem.get_type(f.elementType):
                print([int(i) for i in attributes[f.name].split()])
and that is the end...

Split will always return a list, try this.
def split_it(s):
    if len(s.split()) > 1:
        return s.split()
    else:
        return s

The behavior of split makes sense, it always returns a list. Why not just check if the list length is 1?
def weird_split(a):
    words = a.split()
    if len(words) == 1:
        return words[0]
    return words
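
For reference, here is a minimal sketch of how split() behaves on a few inputs, which may make the "always a list" rule easier to see. The helper name split_or_keep is hypothetical and does not come from the answers above; it only combines the length check with whitespace-based splitting.

```python
def split_or_keep(s):
    """Return a list of words only if s contains more than one word.

    Hypothetical helper for illustration; otherwise the input is
    returned unchanged as a string.
    """
    words = s.split()          # no argument: split on any run of whitespace
    return words if len(words) > 1 else s


# str.split() always returns a list, even for a single word or an empty string.
print('a'.split())             # ['a']
print(''.split())              # []
print('a  b'.split())          # ['a', 'b']
print('a  b'.split(' '))       # ['a', '', 'b']

print(split_or_keep('a'))      # a
print(split_or_keep('a b c'))  # ['a', 'b', 'c']
```

Note that the return type then depends on the input, so callers have to handle both a string and a list.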

You could use the conditional expression to check for the presence of space, and use split only if a space is detected:
str1 = 'abc'
split_str1 = str1 if (' ' not in str1) else str1.split(' ')
print (split_str1)
str1 = 'ab c'
split_str1 = str1 if (' ' not in str1) else str1.split(' ')
print (split_str1)
This would give the output:
abc
['ab', 'c']

Related

Remove words containing vowels

I am looking for an output string with the words containing vowels removed.
Input: My name is 123
Output: my 123
I tried the code below:
def without_vowels(sentence):
    vowels = 'aeiou'
    word = sentence.split()
    for l in word:
        for k in l:
            if k in vowels:
                l = ''
without_vowels('my name 123')
Can anyone give me the result using a list comprehension?

You can use a regex search for 'a|e|i|o|u', calling .lower() on each word in case it contains uppercase characters, as below:
>>> import re
>>> st = 'My nAmE Is 123 MUe'
>>> [s for s in st.split() if not re.search(r'a|e|i|o|u',s.lower())]
['My', '123']
>>> ' '.join(s for s in st.split() if not re.search(r'a|e|i|o|u',s.lower()))
'My 123'

This is one way to do it
def without_vowels(sentence):
    words = sentence.split()
    vowels = ['a', 'e', 'i', 'o', 'u']
    cleaned_words = [w for w in words if not any(v in w for v in vowels)]
    cleaned_string = ' '.join(cleaned_words)
    print(cleaned_string)
Outputs my 123
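
A variation on the same idea, sketched with a set intersection so that uppercase vowels are caught as well. The name words_without_vowels is an assumption for illustration; unlike the function above, it returns the string rather than printing it.

```python
VOWELS = set('aeiou')


def words_without_vowels(sentence):
    """Keep only the words that contain no vowel (case-insensitive)."""
    kept = [word for word in sentence.split()
            if not VOWELS & set(word.lower())]
    return ' '.join(kept)


print(words_without_vowels('My name is 123'))  # My 123
```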

def rem_vowel(string):
    vowels = ['a', 'e', 'i', 'o', 'u']
    result = [letter for letter in string if letter.lower() not in vowels]
    result = ''.join(result)
    print(result)
string = "My name is 123"
rem_vowel(string)

import re
def rem_vowel(string):
    return re.sub("[aeiouAEIOU]", "", string)
# Driver program
string = " I am uma Bhargav "
print(rem_vowel(string))

Split string into pair

What would be the best/easiest way to split a string into pairs of words?
Ex:
string = "This is a string"
Output:
["This is", "is a", "a string"]

>>> import itertools
>>> a, b = itertools.tee('this is a string'.split())
>>> next(b, None)
'this'
>>> [' '.join(words) for words in zip(a, b)]
['this is', 'is a', 'a string']

string_list = string.split()
result = [f'{string_list[i]} {string_list[i+1]}' for i in range(len(string_list) - 1)]

words = string.split()
output = []
i = 1
while i < len(words):
    cur_word = words[i]
    prev_word = words[i - 1]
    output.append(f"{prev_word} {cur_word}")
    i += 1

You can use the zip function, with its first argument the whole list of words and its second the list of words without the first word. Since zip aggregates elements from each of the iterables, this will connect each word with its next in the list:
string = "This is a string"
zipped_lst = zip(string.split(), string.split()[1:])
print(list(zipped_lst))
This outputs
[('This', 'is'), ('is', 'a'), ('a', 'string')]
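
If the pairs are wanted as strings, as in the question, each tuple can be joined with a space. This is a small sketch; word_pairs is a hypothetical name.

```python
def word_pairs(text):
    """Return each word joined with the word that follows it."""
    words = text.split()
    # zip stops at the shorter input, so the last word has no partner
    return [' '.join(pair) for pair in zip(words, words[1:])]


print(word_pairs("This is a string"))
# ['This is', 'is a', 'a string']
```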

result = [ch for ch in string if (ch != ' ')]; result = str(result); print(result); prints list, not string

I need to convert 'a b c' to 'abc' so made this code:
string = 'a b c'
result = [ch for ch in string if (ch != ' ')]
print(type(result))
result = str(result)
print(type(result))
print(result)
The result of this code is expected to be:
<class 'list'>
<class 'str'>
'abc'
but the actual result is:
<class 'list'>
<class 'str'>
['a', 'b', 'c']
Why is the result printed as a list? This causes faults in other parts of my code.

str() on a list does not magically turn that list into a delimiter-free string. When you do
str(['a', 'b', 'c']), what you actually get is "['a', 'b', 'c']", which is indeed a string.
If you'd like the result to be 'abc', use .join:
''.join(['a', 'b', 'c'])
Output:
'abc'

Here, result = [ch for ch in string if (ch != ' ')] eliminates the spaces and saves the characters into a list. To get a string again, you need to join them:
string = 'a b c'
result = [ch for ch in string if (ch != ' ')]
s = "".join(result)
print(s) # abc
Here is the related doc page.

You can try this:
string = 'a b c'
print(string.replace(' ', ''))
Output:
abc

You are getting a list because you are using a list comprehension.
Here I split the string on spaces using the string method split,
which returns a list.
I print the type of the list and the type of an element in the list,
then join the elements back together using the string method join:
string = 'a b c'
string = string.split(' ')
print(type(string))
print(type(string[0]))
print("".join(string))

The line of code
result = [ch for ch in string if (ch != ' ')]
is an example of a list comprehension in Python.
The for loop within this list comprehension takes each character from the string, one at a time, and places it into the list unless it is a space; this is why the list ['a', 'b', 'c'] appears in your code.
To obtain a string without spaces from a string with spaces:
string = 'a b c'
# obtain characters from the string that are not spaces
result = [ch for ch in string if (ch != ' ')]
# put the string back together without spaces
string_no_spaces = ''.join(result)
print(string_no_spaces)  # abc

What str(['a', 'b', 'c']) actually does is call the __str__ method of the given object (here, a list), which produces the string "['a', 'b', 'c']".
Why is the result printed as a list? This causes faults in other parts of my code.
What print does before writing its output is call str("['a', 'b', 'c']"), which effectively does nothing here; printing just drops the outer quotes, making you think that it's a list.
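
A short sketch that makes the difference visible: str() on a list produces the list's printable representation, while join concatenates the elements. Nothing here goes beyond built-in behavior; the variable names are only for illustration.

```python
chars = [ch for ch in 'a b c' if ch != ' ']

as_text = str(chars)        # the list's representation, brackets and quotes included
joined = ''.join(chars)     # the elements concatenated into one string

print(repr(as_text))        # "['a', 'b', 'c']"
print(len(as_text))         # 15
print(repr(joined))         # 'abc'
```

Both values are of type str, which is why the type check in the question could not tell them apart.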

How do I compare two strings in python if order does not matter?

I have two strings like
string1="abc def ghi"
and
string2="def ghi abc"
How can I determine that these two strings are the same without breaking up the words?

It seems the question is not about string equality but about set equality. You can compare them this way only by splitting the strings and converting them to sets:
s1 = 'abc def ghi'
s2 = 'def ghi abc'
set1 = set(s1.split(' '))
set2 = set(s2.split(' '))
print(set1 == set2)
The result will be
True

If you want to know whether both strings are equal, you can simply do
print(string1 == string2)
But if you want to know whether they both have the same set of characters, each occurring the same number of times, you can use collections.Counter, like this
>>> string1, string2 = "abc def ghi", "def ghi abc"
>>> from collections import Counter
>>> Counter(string1) == Counter(string2)
True

>>> s1="abc def ghi"
>>> s2="def ghi abc"
>>> s1 == s2 # For string comparison
False
>>> sorted(list(s1)) == sorted(list(s2)) # For comparing if they have same characters.
True
>>> sorted(list(s1))
[' ', ' ', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
>>> sorted(list(s2))
[' ', ' ', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

For that, you can use the standard difflib module in Python:
from difflib import SequenceMatcher
def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()
Then call similar() as
similar(string1, string2)
It returns a ratio between 0 and 1; compare it against a threshold (ratio >= threshold) to decide whether the strings match.
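
A minimal sketch of the threshold check described above. The function name is_similar and the default threshold of 0.8 are assumptions chosen for illustration, not values from the answer.

```python
from difflib import SequenceMatcher


def is_similar(a, b, threshold=0.8):
    """Return True if the similarity ratio of a and b reaches threshold.

    The default threshold of 0.8 is an arbitrary, illustrative choice.
    """
    ratio = SequenceMatcher(None, a, b).ratio()   # float between 0.0 and 1.0
    return ratio >= threshold


print(is_similar('Since 1958.', 'Since 1958'))    # True
print(is_similar('abc', 'xyz'))                   # False
```

Note that this measures sequence similarity, so strings with the same words in a different order do not score as identical.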

Equality by direct comparison:
string1 = "sample"
string2 = "sample"
if string1 == string2:
    print("Strings are equal with text : ", string1, " & ", string2)
else:
    print("Strings are not equal")
Equality of word sets:
string1 = 'abc def ghi'
string2 = 'def ghi abc'
set1 = set(string1.split(' '))
set2 = set(string2.split(' '))
print(set1 == set2)
if string1 == string2:
    print("Strings are equal with text : ", string1, " & ", string2)
else:
    print("Strings are not equal")

Something like this:
if string1 == string2:
    print('they are the same')
Update: if you want to see whether each substring exists in the other:
elem1 = [x for x in string1.split()]
elem2 = [x for x in string2.split()]
for item in elem1:
    if item in elem2:
        print(item)

If you just need to check if the two strings are exactly same,
text1 = 'apple'
text2 = 'apple'
text1 == text2
The result will be
True
If you need the matching percentage,
import difflib
text1 = 'Since 1958.'
text2 = 'Since 1958'
output = str(int(difflib.SequenceMatcher(None, text1, text2).ratio()*100))
Matching percentage output will be,
'95'

I am going to provide several solutions and you can choose the one that meets your needs:
1) If you are concerned with just the characters, i.e., the same characters with equal frequencies in both strings, then use:
''.join(sorted(string1)).strip() == ''.join(sorted(string2)).strip()
2) If you are also concerned with the number of spaces (white space characters) in both strings, then simply use the following snippet:
sorted(string1) == sorted(string2)
3) If you are considering words but not their ordering, and checking whether both strings have equal frequencies of words regardless of their order/occurrence, then you can use:
sorted(string1.split()) == sorted(string2.split())
4) Extending the above, if you are not concerned with the frequency count, but just need to make sure that both the strings contain the same set of words, then you can use the following:
set(string1.split()) == set(string2.split())
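
The four options above can be sketched as small functions. The function names are hypothetical, and collections.Counter is swapped in for the sorted-list comparison in option 3; both compare word frequencies.

```python
from collections import Counter


def same_chars(s1, s2):
    """Option 1: same characters and frequencies, ignoring spaces."""
    return ''.join(sorted(s1)).strip() == ''.join(sorted(s2)).strip()


def same_chars_and_spaces(s1, s2):
    """Option 2: same characters and frequencies, spaces included."""
    return sorted(s1) == sorted(s2)


def same_word_counts(s1, s2):
    """Option 3: same words with the same frequencies, in any order."""
    return Counter(s1.split()) == Counter(s2.split())


def same_word_set(s1, s2):
    """Option 4: same set of words, frequencies ignored."""
    return set(s1.split()) == set(s2.split())


a, b = "abc def ghi", "def ghi abc"
print(same_chars(a, b), same_chars_and_spaces(a, b),
      same_word_counts(a, b), same_word_set(a, b))
# True True True True

print(same_word_counts("abc abc def", "abc def"))  # False
print(same_word_set("abc abc def", "abc def"))     # True
```

The last two lines show where options 3 and 4 differ: a repeated word changes the counts but not the set.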

I think difflib is a good library to do this job
>>> import difflib
>>> diff = difflib.Differ()
>>> a='he is going home'
>>> b='he is goes home'
>>> list(diff.compare(a,b))
['  h', '  e', '   ', '  i', '  s', '   ', '  g', '  o', '+ e', '+ s', '- i', '- n', '- g', '   ', '  h', '  o', '  m', '  e']
>>> list(diff.compare(a.split(),b.split()))
['  he', '  is', '- going', '+ goes', '  home']

Open both files,
then compare them by splitting their contents into words:
log_file_A='file_A.txt'
log_file_B='file_B.txt'
read_A=open(log_file_A,'r')
read_A=read_A.read()
print(read_A)
read_B=open(log_file_B,'r')
read_B=read_B.read()
print(read_B)
File_A_set = set(read_A.split(' '))
File_B_set = set(read_B.split(' '))
print(File_A_set == File_B_set)
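
A sketch of the same file comparison using context managers, so the files are closed automatically, and split() without arguments, so that newlines and repeated spaces do not produce spurious differences. The function name files_have_same_words is hypothetical.

```python
def files_have_same_words(path_a, path_b):
    """Return True if both files contain the same set of words."""
    with open(path_a) as file_a:
        words_a = set(file_a.read().split())
    with open(path_b) as file_b:
        words_b = set(file_b.read().split())
    return words_a == words_b
```

It would be called as files_have_same_words('file_A.txt', 'file_B.txt'), assuming both files exist.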

If you want a really simple answer:
s_1 = "abc def ghi"
s_2 = "def ghi abc"
flag = 0
for i in s_1:
    if i not in s_2:
        flag = 1
if flag == 0:
    print("a == b")
else:
    print("a != b")

This is a pretty basic example, but beyond the logical comparison (==) or string1.lower() == string2.lower(), it may be useful to try some of the basic distance metrics between two strings.
You can find examples of these and other metrics everywhere; also try the fuzzywuzzy package (https://github.com/seatgeek/fuzzywuzzy).
import Levenshtein
import difflib
print(Levenshtein.ratio('String1', 'String2'))
print(difflib.SequenceMatcher(None, 'String1', 'String2').ratio())

Try converting both strings to upper or lower case. Then you can use the == comparison operator.

You can use a simple loop to check whether two strings are equal, but ideally you would just use return s1 == s2.
def strings_equal(s1, s2):
    # strings of different lengths can never be equal
    if len(s1) != len(s2):
        return False
    # compare the strings character by character
    for c1, c2 in zip(s1, s2):
        if c1 != c2:
            return False
    return True
s1 = 'hello'
s2 = 'hello'
print(strings_equal(s1, s2))

Split line in Python 2.7

I want to split the line W03*17*65.68*KG*0.2891*CR*1*1N with Python and then capture
Value qty as 17
Value kg as 65,68
Tried with split
myarray = Split(strSearchString, "*")
a = myarray(0)
b = myarray(1)
Thanks for your help

split is a method of the string itself, and you access elements of a list with square brackets, [42], not with call syntax, (42). Try:
s = 'W03*17*65.68*KG*0.2891*CR*1*1N'
lst = s.split('*')
qty = lst[1]
weight = lst[2]
weight_unit = lst[3]
You may also be interested in tuple unpacking:
s = 'W03*17*65.68*KG*0.2891*CR*1*1N'
_,qty,weight,weight_unit,_,_,_,_ = s.split('*')
You can even use a slice:
s = 'W03*17*65.68*KG*0.2891*CR*1*1N'
qty,weight,weight_unit = s.split('*')[1:4]
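
Note that split returns strings. If numeric values are needed, they can be converted explicitly. The sketch below assumes the second field is always an integer and the third a decimal number, which the question does not state; parse_line is a hypothetical name.

```python
def parse_line(line):
    """Split a '*'-delimited record and convert the quantity and weight."""
    fields = line.split('*')
    qty = int(fields[1])          # assumed to be a whole number
    weight = float(fields[2])     # assumed to be a decimal number
    weight_unit = fields[3]
    return qty, weight, weight_unit


print(parse_line('W03*17*65.68*KG*0.2891*CR*1*1N'))
# (17, 65.68, 'KG')
```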

>>> s = "W03*17*65.68*KG*0.2891*CR*1*1N"
>>> lst = s.split("*")
>>> lst[1]
'17'
>>> lst[2]
'65.68'

You need to invoke split method on a certain string to split it. Just using Split(my_str, "x") won't work: -
>>> my_str = "Python W03*17*65.68*KG*0.2891*CR*1*1N"
>>> tokens = my_str.split('*')
>>> tokens
['Python W03', '17', '65.68', 'KG', '0.2891', 'CR', '1', '1N']
>>> tokens[1]
'17'
>>> tokens[2]
'65.68'

import string
# Python 2 only: the string.split function was removed in Python 3
myarray = string.split(strSearchString, "*")
qty = myarray[1]
kg = myarray[2]

If you'd like to capture Value qty as 17 Value kg as 65.68,
one way to solve it is using dictionary after splitting strings.
>>> s = 'W03*17*65.68*KG*0.2891*CR*1*1N'
>>> s.split('*')
['W03', '17', '65.68', 'KG', '0.2891', 'CR', '1', '1N']
>>> t = s.split('*')
>>> dict(qty=t[1],kg=t[2])
{'kg': '65.68', 'qty': '17'}
Hope it helps.

>>> s = "W03*17*65.68*KG*0.2891*CR*1*1N"
>>> my_string = s.split("*")[1]
>>> my_string
'17'
>>> my_string = s.split("*")[2]
>>> my_string
'65.68'
