how does str.split() divide a string?

line = (' 1.')
print(line.split(), len(line.split()))
This gives
['1.'] 1
But if I do
for value in line.split():
    val = value
    print(val, len(val))
I get
1. 2
Inspecting val gives me
val[0]
'1'
val[1]
'.'
I'm confused: why does .split() seem to divide '1.' into two pieces (index 0 and index 1) in the second example?

split() divides a string using whitespace as the separator (when no separator is given, a run of consecutive whitespace counts as a single separator).
see: http://docs.python.org/2.7/library/stdtypes.html#str.split
You can also pass the separator you want as a parameter; for example, mystr.split(",") will use the comma as the separator to split mystr.
There is also a second parameter, maxsplit, that tells the method the maximum number of splits to perform.
So:
mystr = "1 - 2 - 3 - 4"
print(mystr.split()) # split using spaces
print(mystr.split("-")) # split using "-"
print(mystr.split("-",2)) # split using "-" with 2 splits maximum
will produce the following output:
['1', '-', '2', '-', '3', '-', '4']
['1 ', ' 2 ', ' 3 ', ' 4']
['1 ', ' 2 ', ' 3 - 4']
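To see how the default whitespace splitting differs from an explicit ' ' separator, here is a small sketch (the sample string is made up for illustration):

```python
s = "  a  b "

# No separator: runs of whitespace act as one separator, and
# leading/trailing whitespace produces no empty strings.
print(s.split())         # ['a', 'b']

# Explicit ' ': every single space is a separator, so empty strings appear.
print(s.split(' '))      # ['', '', 'a', '', 'b', '']

# Default separator with maxsplit=1: at most one split is made, and the
# remainder keeps its trailing whitespace.
print(s.split(None, 1))  # ['a', 'b ']
```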

When you do line.split(), it returns a list with one element: ['1.'].
When you iterate over line.split() (for value in line.split()), value is each element of the resulting list, i.e. the string '1.', not the list ['1.'].
Then you call len() on that string, which has 2 characters ('1' and '.'). split() is not dividing anything further; len() of a string simply counts its characters.
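The difference between the length of the list and the length of the string inside it can be checked directly:

```python
line = ' 1.'

parts = line.split()          # a list of strings
print(parts, len(parts))      # ['1.'] 1  -> the list has one element

for value in parts:           # value is the string '1.', not the list
    print(value, len(value))  # 1. 2  -> the string has two characters
    print(list(value))        # ['1', '.']  -> indexing a string gives characters
```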

Related

How to keep the white space when splitting sentence to a list?

Below is the code that splits the sentence "s".
s = "1 a 3 bb  b8"
b = s.split()
print(b)
The output from the above code is ['1', 'a', '3', 'bb', 'b8'].
The desired output is ['1', 'a', '3', 'bb', ' b8']. Be aware that there is only one white space in the last field.

The code is not the most elegant or efficient, but it works. It distinguishes spaces that act as field separators from spaces that are part of the data by replacing the latter with a special placeholder string (e.g. $KEEP_THAT_SPACE$). Next, the string is split on the remaining separator spaces. Finally, every placeholder in every element is replaced back with a blank.
#!/usr/bin/env python3
s = "1 a 3 bb  b8"
# assume that spaces belonging to the data only occur as runs of exactly two spaces
keep_placeholder = '$KEEP_THAT_SPACE$'
# keep the first space as a separator, mark the second one as data
s = s.replace('  ', f' {keep_placeholder}')
b = s.split()
for index, element in enumerate(b):  # iterate over all fields
    # str.replace() replaces every occurrence, so no while loop is needed
    b[index] = element.replace(keep_placeholder, ' ')
print(b)
The output is ['1', 'a', '3', 'bb', ' b8']; note that there is only one blank space at the beginning of the last field.
The code can easily be adapted if your fields contain more than two consecutive blank spaces.
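A different technique, not the one used in the answer above, avoids the placeholder entirely: re.split() on a space that is not preceded by another space, so the first space of each run is the separator and any further spaces stay with the next field. This works for runs of any length. It is a sketch assuming the string has no leading or trailing spaces, and the function name is hypothetical:

```python
import re

def split_keep_extra_spaces(s):
    # (?<! ) is a negative lookbehind: it matches a space only when the
    # character before it is not a space, i.e. the first space of every run.
    return re.split(r'(?<! ) ', s)

print(split_keep_extra_spaces("1 a 3 bb  b8"))   # ['1', 'a', '3', 'bb', ' b8']
print(split_keep_extra_spaces("1 a 3 bb   b8"))  # ['1', 'a', '3', 'bb', '  b8']
```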

That is a tricky one, which makes it hard to do with a generic function, so it requires some custom code.
I took s = "1 a 3 bb   b8", with 3 white spaces before b8, to make it more fun :)
The first thing you can do is specify the delimiter explicitly in your split:
s.split(' ')
Would give the following result: ['1', 'a', '3', 'bb', '', '', 'b8']
Now you have to interpret each '' as a ' ' that must be added to the next non-empty string. The following for loop implements the "business rules" that put the white spaces in the expected place:
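temp_split = s.split(' ')  # ['1', 'a', '3', 'bb', '', '', 'b8']
split_list = []
buffer = ''
for elt in temp_split:
    if elt != "":
        # non-empty field: prepend the spaces collected so far
        split_list.append(buffer + elt)
        buffer = ''
    else:
        # each empty string stands for one extra space
        buffer += ' '
print(split_list)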
And the result is: ['1', 'a', '3', 'bb', '  b8'] (of the three spaces in the input, one acts as the separator and the other two are kept).
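The same loop can be wrapped in a reusable function so it is easy to try on different inputs. The name split_with_buffer is hypothetical; note that trailing spaces left in the buffer at the end are simply dropped:

```python
def split_with_buffer(s):
    """Split on single spaces; every extra space in a run is kept as a
    prefix of the next non-empty field (same rules as the loop above)."""
    split_list = []
    buffer = ''
    for elt in s.split(' '):
        if elt != '':
            split_list.append(buffer + elt)
            buffer = ''
        else:
            # an empty string means one extra space before the next field
            buffer += ' '
    return split_list  # trailing spaces (still in buffer) are dropped

print(split_with_buffer("1 a 3 bb  b8"))   # ['1', 'a', '3', 'bb', ' b8']
print(split_with_buffer("1 a 3 bb   b8"))  # ['1', 'a', '3', 'bb', '  b8']
```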

How to remove blank inverted comma from my list and tuple in python?

I am a new Python developer and would be glad if you could help me with this. I made a list and a tuple from numbers that the user types in as comma-separated input.
So I was expecting this result :
List : ['3', '4', '2', '3', '5']
Tuple : ('3', '4', '2', '3', '5')
The numbers should also be shown in ascending order. It works with the code below, but it shows some extra blank quoted entries (' '), so how do I remove them?
Type some comma-separated numbers: 4 4 5 6 2 3 1
List : [' ', ' ', ' ', ' ', ' ', ' ', '1', '2', '3', '4', '4', '5', '6']
Tuple : [' ', ' ', ' ', ' ', ' ', ' ', '1', '2', '3', '4', '4', '5', '6']
This is the code I am using to make the list and tuple in ascending order. Another thing: when I wrap input() in int() so that the user can only type numbers, not arbitrary strings, it raises an error. How do I do that?
values = input("Type some comma separated numbers: ")
list = values.split()
tuple = tuple(list)
tuple= sorted(values, reverse = False)
list= sorted(values, reverse = False)
print('List : ', list)
print('Tuple : ', tuple)

Don't use list, tuple, set or any other built-in names as variable names.
Use values.split(',') instead of values.split(): you must pass the separator that will be used to split your string.
To sort a list, you can use either sorted(<your_list>), which returns a new sorted list, or <your_list>.sort(), which sorts the list in place and returns None.
Finally, your code can be reduced to:
values = input("Type some comma separated numbers: ")
list_variable = sorted(values.split(','))
tuple_variable = tuple(list_variable)
print('List : ', list_variable)
print('Tuple : ', tuple_variable)
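One caveat: sorted() on strings compares them character by character, so '10' sorts before '9'. If the numbers can have more than one digit, passing key=int sorts them numerically while keeping them as strings. A minimal sketch with a hard-coded string standing in for input():

```python
values = "10,9,2,33"  # stands in for input(...) in this sketch

as_strings = sorted(values.split(','))
print(as_strings)          # ['10', '2', '33', '9']  (lexicographic order)

as_numbers = sorted(values.split(','), key=int)
print(as_numbers)          # ['2', '9', '10', '33']  (numeric order, still strings)
print(tuple(as_numbers))   # ('2', '9', '10', '33')
```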

You can't wrap input() in int() here because the numbers are separated by spaces, so the string you get is not a valid base-10 integer literal; hence the ValueError you get.
For example:
values = '4 4 5 6 2 3 1'
int(values)
ValueError: invalid literal for int() with base 10: '4 4 5 6 2 3 1'
However:
values = '4456231'
int(values)
4456231
As mentioned by @Corralien, you shouldn't use built-in names like list and tuple as variable names: doing so shadows the built-ins, and you would have to del your variable to get them back, which makes things unnecessarily complex. It's also considered good style (PEP 8) not to put spaces around the = sign of keyword arguments. To get what you need, you could do:
values = input("Type some comma separated numbers: ")
l = sorted(values.split(), reverse=False)
t = tuple(l)  # sorted() returns a list, so convert the sorted list to a tuple
print('List : ', l)
print('Tuple : ', t)
But that wouldn't give you the values as integers; for that, a list comprehension works well. We use it to iterate over the values of the list l and convert them to int in the list n.
values = input("Type some comma separated numbers: ")
l = values.split()
n = sorted(int(x) for x in l)  # convert to int before sorting, so 10 sorts after 9
t = tuple(n)  # sorted() returns a list, so build the tuple from it
print('List : ', n)
print('Tuple : ', t)
However, you're asking the user to enter comma-separated values, not space-separated ones. Therefore, most users would enter 4,4,5,6,2,3,1 rather than 4 4 5 6 2 3 1. In that case, you'd need to adapt your code to look more like this:
values = input("Type some comma separated numbers: ")
l = values.split(',')
n = sorted(int(x) for x in l)  # int() ignores surrounding spaces, e.g. ' 5'
t = tuple(n)
print('List : ', n)
print('Tuple : ', t)
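As for making sure the user types only numbers: int() raises ValueError for anything that isn't an integer literal, so one option is to catch that and ask again. A sketch with a hypothetical parse_numbers() helper, kept separate from input() so it can be tried on plain strings:

```python
def parse_numbers(text):
    """Return the comma-separated integers in text, sorted ascending.
    Raises ValueError if any entry is not an integer (hypothetical helper)."""
    # int() tolerates surrounding whitespace, so '4, 5' is accepted
    return sorted(int(x) for x in text.split(','))

def ask_for_numbers():
    # Keep prompting until the input parses cleanly.
    while True:
        try:
            return parse_numbers(input("Type some comma separated numbers: "))
        except ValueError:
            print("Please enter only integers separated by commas.")

print(parse_numbers("4,4,5,6,2,3,1"))  # [1, 2, 3, 4, 4, 5, 6]
```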

You can simply do this:
values = input("Type some comma separated numbers: ")
List=sorted(values.split(','))
Tuple=tuple(List)
print('List : ', List)
print('Tuple : ', Tuple)

You mixed up your variable names: your code stores the result of input() in values.
Then you split values and store the split data in list (do not use built-in names as variables!).
Then you do tuple= sorted(values, reverse = False), which uses the unsplit (!) original string, not the split data you just created.
THAT is the mistake you need to fix.
Essentially you sort the unsplit string, which gives you a sorted list of the characters of your input!
After that, commit to one type of separator (spaces, commas or whatever) and stick to it.
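The effect of sorting the unsplit string versus the split fields can be seen with a shortened version of the input from the question:

```python
values = '4 4 5'  # a shortened version of what input() returned in the question

print(sorted(values))          # [' ', ' ', '4', '4', '5']  -> sorts single characters
print(sorted(values.split()))  # ['4', '4', '5']            -> sorts the split fields
```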

Delete spaces in dictionary values python

I'm reading data from an external source. The data has "Name" and "Value with warnings", so I put those in a dictionary like this:
d[data[i:i+6]] = data[i+8:i+17], data[i+25:i+36]
So in the end my dict looks like this:
{'GPT-P ': ('169 ', 'H '), 'GOT-P ': ('47 ', ' '), .....
As seen above both keys and values might have unnecessary spaces.
I was able to remove the spaces in the keys with:
d = {x.replace(' ',''): v
for x, v in d.items()}
but can't seem to manage similar for values. I tried using d.values() but it trims the key name and also works only for 1 of the values.
Can you help me understand how I can remove the spaces from several values (2 values in this particular case) and end up with something like:
{'GPT-P': ('169', 'H'), 'GOT-P': ('47', ''), .....
Thanks. Stay safe and healthy

You will need to do the space replacement in your v values too, but it seems that in your case the values in your dictionary are tuples.
I guess you will want to remove the spaces in every element of each tuple, so you need a second iteration here. You can do something like this:
d = {'GPT-P ': ('169 ', 'H '), 'GOT-P ': ('47 ', ' ')}
{x.replace(' ', ''): tuple(w.replace(' ', '') for w in v) for x, v in d.items()}
Which returns:
{'GPT-P': ('169', 'H'), 'GOT-P': ('47', '')}
Notice the generator expression tuple(w.replace(' ', '') for w in v) nested inside the dictionary comprehension; tuple() turns it back into a tuple.

Given:
DoT={'GPT-P ': ('169 ', 'H '), 'GOT-P ': ('47 ', ' ')}
Since you have tuples of strings as your values, you need to apply .strip() to each string in the tuple:
>>> tuple(e.strip() for e in ('47 ', ' '))
('47', '')
Apply that to each key, value in a dict comprehension and there you are:
>>> {k.strip():tuple(e.strip() for e in t) for k,t in DoT.items()}
{'GPT-P': ('169', 'H'), 'GOT-P': ('47', '')}
You use .replace(' ','') in your attempt. That will replace ALL spaces:
>>> ' 1 2 3 '.replace(' ','')
'123'
It is more typical to use one of the .strips():
>>> ' 1 2 3 '.strip()
'1 2 3'
>>> ' 1 2 3 '.lstrip()
'1 2 3 '
>>> ' 1 2 3 '.rstrip()
' 1 2 3'
You can use .replace or any of the .strips() in the comprehensions that I used above.
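Another option is to strip the pieces when the dictionary is built, so no cleanup pass is needed afterwards. The fixed-width records below are made up to fit the slice offsets from the question (name in columns 0-5, value in 8-16, warning in 25-35); the real layout of the data may differ:

```python
# Hypothetical fixed-width records matching the question's slices.
records = [
    'GPT-P'.ljust(8) + '169'.ljust(17) + 'H'.ljust(11),
    'GOT-P'.ljust(8) + '47'.ljust(17) + ''.ljust(11),
]

d = {}
for data in records:
    i = 0  # each record starts at offset 0 in this sketch
    # strip() each slice as it is taken, so keys and values are already clean
    d[data[i:i+6].strip()] = (data[i+8:i+17].strip(), data[i+25:i+36].strip())

print(d)  # {'GPT-P': ('169', 'H'), 'GOT-P': ('47', '')}
```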

Sort list of strings by very special key [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
I need to sort a list of strings in a way that is very similar to the sorted function, but with one important difference. As you know, sorted orders the space character before digit characters, so sorted(['1 ', ' 9']) gives [' 9', '1 ']. I need a sort that orders digit characters before spaces, so in this example the result should be ['1 ', ' 9'].
Update
As I understand it, by default sorted relies on the order of characters in the ASCII 'alphabet' (i.e. ''.join([chr(i) for i in range(32, 127)]) for the printable characters), so I decided to implement my own ASCII 'alphabet' in the my_ord function.
I planned to use this function in conjunction with a simple my_sort function as a key for sorted:
import string

def my_ord(c):
    punctuation1 = ''.join([chr(i) for i in range(32, 48)])
    other_stuff = ''.join([chr(i) for i in range(59, 127)])
    my_alphabet = string.digits + punctuation1 + other_stuff
    return my_alphabet.find(c)

def my_sort(w):
    return sorted(w, key=my_ord)
like this: sorted([' 1 ', 'abc', ' zz zz', '9 '], key=my_sort).
What I'm expecting in this case is ['9 ', ' 1 ', ' zz zz', 'abc']. Unfortunately, the result doesn't match what I expect; moreover, it differs from time to time.

You can use str.lstrip as the key function to ignore the whitespace on the left (the front) of the string.
r = sorted(['1 ', ' 9' , ' 4', '2 '], key=str.lstrip)
# r == ['1 ', '2 ', ' 4', ' 9']
As the documentation puts it, key specifies a function of one argument that is used to extract a comparison key from each list element.

Try this
import string
MY_ALPHABET = (
string.digits
+ ''.join([chr(i) for i in range(32, 127) if chr(i) not in string.digits])
)
inp = [' 1 ', 'abc', ' zz zz', '9 ', 'a 1', 'a ']
print(inp, '-->', sorted(inp, key=lambda w: [MY_ALPHABET.index(c) for c in w]))
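MY_ALPHABET.index(c) scans the alphabet for every character and raises ValueError for characters outside printable ASCII (a tab, for instance). A variant that precomputes a rank table and places unknown characters after all known ones might look like this; RANK and alphabet_key are illustrative names, and the fallback rule for unknown characters is an assumption:

```python
import string

MY_ALPHABET = (
    string.digits
    + ''.join(chr(i) for i in range(32, 127) if chr(i) not in string.digits)
)
# Precompute each character's position once instead of calling .index() per character.
RANK = {c: i for i, c in enumerate(MY_ALPHABET)}

def alphabet_key(w):
    # Unknown characters sort after every character of MY_ALPHABET,
    # ordered among themselves by code point.
    return [RANK.get(c, len(MY_ALPHABET) + ord(c)) for c in w]

inp = [' 1 ', 'abc', ' zz zz', '9 ', 'a 1', 'a ']
print(sorted(inp, key=alphabet_key))
# ['9 ', ' 1 ', ' zz zz', 'a ', 'a 1', 'abc']
```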

You want a combination of lexical and numerical sorting. You can do that by chopping the string up into a tuple and converting the digits to int. Tuple comparison then compares each element by its own rules (strings lexicographically, ints numerically).
I've used a regex to split the string into (beginning text, whitespace, the digits, everything else), created an int from the digits and used that in the key. If the string doesn't match the pattern, the key is just the original string in a tuple, so it can still be compared.
I moved the whitespace before the digit (group(2)) after the digit but it may make more sense to leave it out of the comparison completely.
import re

test = ['1 ', ' 9']
wanted = ['1 ', ' 9']

def sort_key(val):
    """Return a tuple of (text, int, spaces, remainder), or just (text,),
    suitable for sorting text lexicographically but embedded numbers
    numerically."""
    m = re.match(r"(.*?)(\s*)(\d+)(.*)", val)
    if m:
        return (m.group(1), int(m.group(3)), m.group(2), m.group(4))
    else:
        return (val,)

result = sorted(test, key=sort_key)
print(test, '-->', result)
assert result == wanted, "results compare"

For completeness and maybe efficiency in extreme cases, here is a solution using numpy argsort:
import numpy as np
lst = ['1 ', ' 9' , ' 4', '2 ']
order = np.argsort(np.array([s.lstrip() for s in lst]))
result = list(np.array(lst)[order])
Overall, I think that using sorted(..., key=...) is generally superior, and this solution makes more sense if the input is already a numpy array. On the other hand, the sort itself runs inside numpy, so it is possible that for large enough lists it could be faster (both approaches call strip() only once per item, since sorted() also calls the key function once per element). Additionally, it produces order, which shows where each sorted element was in the original list.
As a last comment: from the code you provide (but not the example you give), I am not sure whether you just want to strip the leading whitespace or do more, e.g. strip punctuation too (see best-way-to-strip-punctuation-from-a-string-in-python), or first order on the string without punctuation and then, if those are equal, order on the rest (the solution by tdelaney). In any case, it might not be a bad idea to compile a pattern, e.g.:
import numpy as np
import re
pattern = re.compile(r'[^\w]')
lst = ['1 ', ' 9' , ' 4', '2 ']
order = np.argsort(np.array([pattern.sub('',s) for s in lst]))
result = list(np.array(lst)[order])
or:
import re
pattern = re.compile(r'[^\w]')
r = sorted(['1 ', ' 9' , ' 4', '2 '], key= lambda s: pattern.sub('',s))

Remove empty entries during string split

Currently I use this helper function to remove the empty entries.
Is there a built-in way for this?
def getNonEmptyList(str, splitSym):
    lst=str.split(splitSym)
    lst1=[]
    for entry in lst:
        if entry.strip() !='':
            lst1.append(entry)
    return lst1

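If you are splitting on whitespace, yes: call split() without a separator. From the documentation for str.split(sep=None, maxsplit=-1):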
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
For example:
>>> '1 2 3'.split()
['1', '2', '3']
>>> '1 2 3'.split(maxsplit=1)
['1', '2 3']
>>> '   1   2   3   '.split()
['1', '2', '3']

This split could be done more compactly with a comprehension like:
def getNonEmptyList(str, splitSym):
    return [s for s in str.split(splitSym) if s.strip() != '']

You could use filter:
def get_non_empty_list(s, delimiter):
    return list(filter(str.strip, s.split(delimiter)))
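To make the behaviour of these helpers concrete: entries that are empty or whitespace-only are dropped, but the surviving entries are not stripped. A quick check using the filter version (the sample string is illustrative):

```python
def get_non_empty_list(s, delimiter):
    # str.strip returns '' (falsy) for whitespace-only entries, so filter drops them
    return list(filter(str.strip, s.split(delimiter)))

print(get_non_empty_list('a,, b,  ,c,', ','))  # ['a', ' b', 'c']

# If the kept entries should be stripped too, strip them as you go:
print([e.strip() for e in 'a,, b,  ,c,'.split(',') if e.strip()])  # ['a', 'b', 'c']
```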

If you want to split text by newline and remove any empty lines, here's a one-liner :)
lines = [l for l in text.split('\n') if l.strip()]
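A related variant uses str.splitlines() instead of split('\n'); it also treats '\r\n' as one line boundary, which matters for text from Windows files. A small sketch with made-up text:

```python
text = "first\r\n\r\nsecond\n   \nthird"

# split('\n') leaves the '\r' at the end of Windows-style lines
print([l for l in text.split('\n') if l.strip()])   # ['first\r', 'second', 'third']

# splitlines() removes '\r\n' as a single line boundary
print([l for l in text.splitlines() if l.strip()])  # ['first', 'second', 'third']
```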

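You can use a regex to absorb the extra whitespace around the separator. This alone does not drop the empty entries produced by consecutive separators, so they are filtered out at the end:
import re

def get_non_empty_list(string, split_sym):
    # re.escape() makes symbols such as '.' or '|' match literally
    split_re = r'\s*{}\s*'.format(re.escape(split_sym))
    return [s for s in re.split(split_re, string.strip()) if s]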
