What's the difference between these two input statements? - python

I've tried these two input statements in Python, and both return the same output. What's the difference between split() and split(" ")?
a=[int(i) for i in input().split(" ")]
print(a)
and
a=[int(i) for i in input().split()]
print(a)

By default, the split method on a string splits on any run of whitespace, and leading or trailing whitespace produces no empty strings:
>>> 'foo bar'.split()
['foo', 'bar']
>>> 'foo \n \t bar'.split()
['foo', 'bar']
If you pass a literal space as the argument, however, only a literal space acts as the separator, and adjacent spaces produce empty strings:
>>> 'foo \n \t   bar'.split(' ')
['foo', '\n', '\t', '', '', 'bar']
If the input has only single, ordinary spaces between the numbers (no leading, trailing, or doubled spaces), there will be no observable difference.
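To see where the two forms diverge in the question's int-parsing context, here is a small sketch. The input strings are made up and stand in for what input() might return; the point is that split(" ") turns a doubled space into an empty string, which int() then rejects.

```python
# Hypothetical input lines: one with single spaces, one with a doubled space.
single = "1 2 3"
double = "1  2 3"

print(single.split(" "))  # ['1', '2', '3']
print(double.split(" "))  # ['1', '', '2', '3']  (empty string from the double space)
print(double.split())     # ['1', '2', '3']

# With split(" "), the empty string makes int() raise ValueError.
try:
    [int(i) for i in double.split(" ")]
except ValueError as exc:
    print("ValueError:", exc)

# Plain split() tolerates doubled, leading, and trailing whitespace.
print([int(i) for i in " 1  2 3 ".split()])  # [1, 2, 3]
```

In practice, split() with no argument is the more forgiving choice for whitespace-separated numbers typed by a user.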

Related

Split string in Python while keeping the line break inside the generated list

As simple as it sounds, I can't think of a straightforward way to do the following in Python.
my_string = "This is a test.\nAlso\tthis"
list_i_want = ["This", "is", "a", "test.", "\n", "Also", "this"]
I need the same behaviour as str.split(), i.e. remove any type and number of whitespace characters, except for line breaks (\n), each of which I need as a standalone list item.
How could I do this?
Split String using Regex findall()
import re
my_string = "This is a test.\nAlso\tthis"
my_list = re.findall(r"\S+|\n", my_string)
print(my_list)
How it Works:
"\S+": "\S" matches a non-whitespace character, and "+" is a greedy quantifier, so this finds each run of non-whitespace characters, i.e. each word
"|": OR logic
"\n": matches "\n", so each line break is also returned as its own list item
Output:
['This', 'is', 'a', 'test.', '\n', 'Also', 'this']
Here's code that works but is definitely not efficient or Pythonic:
my_string = "This is a test.\nAlso\tthis"
l = my_string.splitlines()  # split into lines
list_i_want = []
for i in l:
    list_i_want.extend(i.split())  # add the words of this line
    list_i_want.append('\n')  # add a newline marker after each line
if list_i_want:
    list_i_want.pop()  # remove the newline added after the last line
print(list_i_want)
Output:
['This', 'is', 'a', 'test.', '\n', 'Also', 'this']
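The two answers agree on the example string, but they are not quite equivalent: splitlines() swallows a trailing line break, while re.findall keeps every "\n". A minimal sketch comparing them; the two function names are hypothetical wrappers around the code above.

```python
import re

def split_keep_newlines_regex(s):
    # Words are runs of non-whitespace; each "\n" is kept as its own item.
    return re.findall(r"\S+|\n", s)

def split_keep_newlines_lines(s):
    # The splitlines() approach from the second answer.
    result = []
    for line in s.splitlines():
        result.extend(line.split())
        result.append("\n")
    if result:
        result.pop()  # drop the newline added after the last line
    return result

s = "This is a test.\nAlso\tthis"
print(split_keep_newlines_regex(s) == split_keep_newlines_lines(s))  # True

t = "a b\n"  # string ending in a line break
print(split_keep_newlines_regex(t))  # ['a', 'b', '\n']
print(split_keep_newlines_lines(t))  # ['a', 'b']
```

Which behaviour is right depends on whether a final line break should appear in the output.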

list comprehension using regex conditional

I have a list of strings.
If any of these strings has a 4-digit year, I want to truncate the string at the end of the year.
Otherwise I leave the strings alone.
I tried using:
for x in my_strings:
    m=re.search("\D\d\d\d\d\D",x)
    if m: x=x[:m.end()]
I also tried:
my_strings=[x[:re.search("\D\d\d\d\d\D",x).end()] if re.search("\D\d\d\d\d\D",x) for x in my_strings]
Neither of these is working.
Can you tell me what I am doing wrong?
Something like this seems to work on trivial data:
>>> regex = re.compile(r'^(.*(?<=\D)\d{4}(?=\D))(.*)')
>>> strings = ['foo', 'bar', 'baz', 'foo 1999', 'foo 1999 never see this', 'bar 2010 n 2015', 'bar 20156 see this']
>>> [regex.sub(r'\1', s) for s in strings]
['foo', 'bar', 'baz', 'foo 1999', 'foo 1999', 'bar 2010', 'bar 20156 see this']
It looks like the only bound on the result string is at end(), so you should use re.match() instead and modify your regex to:
my_expr = r".*?\D\d{4}\D"
Then, in your code, do:
regex = re.compile(my_expr)
my_new_strings = []
for string in my_strings:
    match = regex.match(string)
    if match:
        my_new_strings.append(match.group())
    else:
        my_new_strings.append(string)
Or as a list comprehension:
regex = re.compile(my_expr)
matches = ((regex.match(string), string) for string in my_strings)
my_new_strings = [match.group() if match else string for match, string in matches]
Alternatively, you could use re.sub to replace the year and everything after it with just the year:
regex = re.compile(r'(\D\d{4})\D.*')
new_strings = [regex.sub(r'\1', string) for string in my_strings]
I am not entirely sure of your use case, but the following code may give you some hints:
import re
my_strings = ['abcd', 'ab12cd34', 'ab1234', 'ab1234cd', '1234cd', '123cd1234cd']
for index, string in enumerate(my_strings):
    match = re.search(r'\d{4}', string)
    if match:
        my_strings[index] = string[0:match.end()]
print(my_strings)
# ['abcd', 'ab12cd34', 'ab1234', 'ab1234', '1234', '123cd1234']
You were actually pretty close with the list comprehension, but the syntax is off: the part before "for" must be a complete conditional expression, x if <condition> else y, so it needs an else branch:
[x[:re.search(r"\D\d\d\d\d\D",x).end()] if re.search(r"\D\d\d\d\d\D",x) else x for x in my_strings]
Obviously this is pretty ugly and hard to read. There are better ways to split the string around a 4-digit year, such as:
[re.split(r'(?<=\D\d{4})\D', x)[0] for x in my_strings]
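The first loop in the question has no effect because x = x[:m.end()] only rebinds the loop variable; the list itself is never updated. Also, with the pattern "\D\d\d\d\d\D", m.end() lands after the character that follows the year. A minimal sketch that fixes both, assuming a "year" means exactly four digits with no digit on either side (the lookaround pattern and the helper name truncate_after_year are my own, not from the answers above):

```python
import re

# Assumption: a "4-digit year" is exactly four digits with no digit on either side.
YEAR = re.compile(r"(?<!\d)\d{4}(?!\d)")

def truncate_after_year(strings):
    """Return a new list where each string is cut right after its first 4-digit year."""
    result = []
    for s in strings:
        m = YEAR.search(s)
        # Rebinding the loop variable would not change `strings`,
        # so build a new list instead.
        result.append(s[:m.end()] if m else s)
    return result

my_strings = ['foo', 'foo 1999 never see this', 'bar 2010 n 2015', 'bar 20156 see this']
print(truncate_after_year(my_strings))
# ['foo', 'foo 1999', 'bar 2010', 'bar 20156 see this']

# The same thing as a single comprehension (needs Python 3.8+ for :=).
print([s[:m.end()] if (m := YEAR.search(s)) else s for s in my_strings])
```

The assignment expression avoids running the regex twice per string, which was the main source of ugliness in the one-liner from the question.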

How to strip comma in Python string

How can I strip the comma from a Python string such as 'Foo, bar'? I tried 'Foo, bar'.strip(','), but it didn't work.
You want to replace it, not strip it:
s = s.replace(',', '')
Use the string replace method, not strip:
s = s.replace(',','')
An example:
>>> s = 'Foo, bar'
>>> s.replace(',',' ')
'Foo  bar'
>>> s.replace(',','')
'Foo bar'
>>> s.strip(',')  # strip() only removes commas at the start and end of the string, and there are none here
'Foo, bar'
>>> s.strip(',') == s
True
'foo,bar'.translate(str.maketrans('', '', ','))
This removes every comma from the text: the third argument of str.maketrans lists characters that translate() should delete.
for row in inputfile:
    # strip(', ') only removes commas and spaces at the start and end of the value
    place = row['your_row_number_here'].strip(', ')

How to split string with 2 arguments?

If I have a string that's 'asdf foo\nHi\nBar thing', I want to split it so the output is ['asdf', 'foo', 'Hi', 'Bar', 'thing']. That's essentially x.split(' ') combined with x.split('\n'). How can I do this efficiently? I'd like it to be about one line long, instead of having a for loop to split again...
Omit the argument to split(): x.split() splits on any whitespace, i.e. spaces, newline characters, and also tabs.
Example:
>>> x = 'asdf foo\nHi\nBar thing'
>>> x.split()
['asdf', 'foo', 'Hi', 'Bar', 'thing']
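If you really want to split only on spaces and newlines (and not on tabs or other whitespace), re.split with a character class is one option. A small sketch comparing the two; the string y is a made-up example containing a tab.

```python
import re

x = 'asdf foo\nHi\nBar thing'

# str.split() with no argument splits on any whitespace run (spaces, newlines, tabs, ...).
print(x.split())               # ['asdf', 'foo', 'Hi', 'Bar', 'thing']

# re.split with a character class splits only on the listed separators.
print(re.split(r'[ \n]+', x))  # ['asdf', 'foo', 'Hi', 'Bar', 'thing']

# The difference shows up with a tab, which only str.split() treats as a separator.
y = 'a\tb c\nd'
print(y.split())               # ['a', 'b', 'c', 'd']
print(re.split(r'[ \n]+', y))  # ['a\tb', 'c', 'd']
```

Unlike str.split(), re.split returns an empty first or last item when the string starts or ends with a separator.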

How can I parse a comma-delimited string into a list (caveat)?

I need to be able to take a string like:
'''foo, bar, "one, two", three four'''
into:
['foo', 'bar', 'one, two', 'three four']
I have a feeling (with hints from #python) that the solution is going to involve the shlex module.
It depends on how complicated you want to get. Do you want to allow more than one type of quoting? What about escaped quotes?
Your syntax looks very much like the common CSV file format, which is supported by the Python standard library:
import csv
reader = csv.reader(['''foo, bar, "one, two", three four'''], skipinitialspace=True)
for r in reader:
    print(r)
Outputs:
['foo', 'bar', 'one, two', 'three four']
HTH!
The shlex solution allows escaped quotes, one quote type escaping the other, and all the other fancy quoting the shell supports.
>>> import shlex
>>> my_splitter = shlex.shlex('''foo, bar, "one, two", three four''', posix=True)
>>> my_splitter.whitespace += ','
>>> my_splitter.whitespace_split = True
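>>> print(list(my_splitter))
['foo', 'bar', 'one, two', 'three', 'four']
Note that, because the space is still part of whitespace, 'three four' comes out as two items here, unlike the requested output.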
Escaped-quotes example:
>>> my_splitter = shlex.shlex('''"test, a",'foo,bar",baz',bar ä baz''', posix=True)
>>> my_splitter.whitespace = ',' ; my_splitter.whitespace_split = True
>>> print(list(my_splitter))
['test, a', 'foo,bar",baz', 'bar ä baz']
You may also want to consider the csv module. I haven't tried it, but it looks like your input data is closer to CSV than to shell syntax (which is what shlex parses).
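Trying it, the csv module does handle this input, and it also covers the escaped-quote question: in CSV a quote inside a quoted field is written twice. A minimal sketch; parse_line is a hypothetical helper name, and it assumes one logical line per call.

```python
import csv

def parse_line(line):
    # csv.reader accepts any iterable of strings, so wrap the single line in a list.
    # skipinitialspace=True drops the blank after each comma.
    return next(csv.reader([line], skipinitialspace=True))

print(parse_line('''foo, bar, "one, two", three four'''))
# ['foo', 'bar', 'one, two', 'three four']

# CSV escapes a quote inside a quoted field by doubling it.
print(parse_line('a, "say ""hi""", b'))
# ['a', 'say "hi"', 'b']
```

This only supports the CSV quoting convention (one quote character, doubled to escape), not shell-style single quotes or backslash escapes, so shlex remains the option if you need those.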
You could do something like this:
>>> import re
>>> pattern = re.compile(r'\s*("[^"]*"|.*?)\s*,')
>>> def split(line):
...     return [x[1:-1] if x[:1] == x[-1:] == '"' else x
...             for x in pattern.findall(line.rstrip(',') + ',')]
...
>>> split("foo, bar, baz")
['foo', 'bar', 'baz']
>>> split('foo, bar, baz, "blub blah"')
['foo', 'bar', 'baz', 'blub blah']
I'd say a regular expression is what you're looking for here, though I'm not terribly familiar with Python's regex engine.
Using lazy matches, you can collect all the matches in the string (for example with re.findall) and put them into your list.
If it doesn't need to be pretty, this might get you on your way:
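def f(s, splitifeven):
    # odd-indexed pieces of ss.split('"') were inside quotes: keep them whole
    if splitifeven & 1:
        return [s]
    # even-indexed pieces are outside quotes: split on commas and drop blanks
    return [x.strip() for x in s.split(",") if x.strip() != '']
ss = 'foo, bar, "one, two", three four'
print(sum([f(s, sie) for sie, s in enumerate(ss.split('"'))], []))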
