How can I strip the comma from a Python string such as Foo, bar? I tried 'Foo, bar'.strip(','), but it didn't work.
You want to replace it, not strip it:
s = s.replace(',', '')
Use the replace method of strings, not strip:
s = s.replace(',','')
An example:
>>> s = 'Foo, bar'
>>> s.replace(',',' ')
'Foo  bar'
>>> s.replace(',','')
'Foo bar'
>>> s.strip(',')  # strip only removes ','s at the start and end of the string, and there are none here
'Foo, bar'
>>> s.strip(',') == s
True
'foo,bar'.translate({ord(char): None for char in ','})
This will strip commas and spaces from both ends of each field; commas inside the text are left in place.
for row in inputfile:
    place = row['your_row_number_here'].strip(', ')
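To make the difference between the approaches in this thread concrete, here is a minimal Python 3 sketch. The helper name `remove_commas` and the sample string are made up for illustration.

```python
def remove_commas(text):
    """Return text with every comma removed (hypothetical helper)."""
    return text.replace(',', '')

s = ',Foo, bar,'

# strip() only trims the listed characters from both ends.
print(repr(s.strip(',')))                               # 'Foo, bar'
# replace() removes every occurrence, wherever it is.
print(repr(remove_commas(s)))                           # 'Foo bar'
# translate() can drop several different characters in one pass.
print(repr(s.translate(str.maketrans('', '', ',;'))))   # 'Foo bar'
```

If only one character has to go, `replace` is probably the simplest choice; `translate` starts to pay off when a whole set of characters must be removed.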
Related
I've tried these two input statements in Python. Both return the same output. What's the difference between using split() and split(" ")?
a=[int(i) for i in input().split(" ")]
print(a)
and
a=[int(i) for i in input().split()]
print(a)
The default action of method split on a string is to split on any grouping of white space:
>>> 'foo bar'.split()
['foo', 'bar']
>>> 'foo \n \t bar'.split()
['foo', 'bar']
If you pass a literal space as the argument, however, the split is done differently, with only a literal space as the splitter, and with empty strings resulting from adjacent literal spaces:
>>> 'foo \n \t   bar'.split(' ')
['foo', '\n', '\t', '', '', 'bar']
If the input has only single, ordinary spaces, there will be no observable difference.
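As a rough illustration of when the difference does become observable, the sketch below feeds the same kind of input to both variants. The function name `parse_ints` is an assumption for the example, not something from the question.

```python
def parse_ints(line, sep=None):
    """Split line on sep (None means any run of whitespace) and convert to int."""
    return [int(token) for token in line.split(sep)]

print(parse_ints("1 2 3"))        # [1, 2, 3]
print(parse_ints("1 2 3", " "))   # [1, 2, 3]
print(parse_ints(" 1  2\t3 "))    # [1, 2, 3]

try:
    parse_ints("1  2", " ")       # two spaces produce an empty token
except ValueError as exc:
    print("split(' ') failed:", exc)
```

With user-typed input, stray double spaces or tabs are common, so the argument-less `split()` is usually the safer default.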
I have a list of strings.
If any of these strings has a 4-digit year, I want to truncate the string at the end of the year.
Otherwise I leave the strings alone.
I tried using:
for x in my_strings:
    m=re.search("\D\d\d\d\d\D",x)
    if m: x=x[:m.end()]
I also tried:
my_strings=[x[:re.search("\D\d\d\d\d\D",x).end()] if re.search("\D\d\d\d\d\D",x) for x in my_strings]
Neither of these is working.
Can you tell me what I am doing wrong?
Something like this seems to work on trivial data:
>>> regex = re.compile(r'^(.*(?<=\D)\d{4}(?=\D))(.*)')
>>> strings = ['foo', 'bar', 'baz', 'foo 1999', 'foo 1999 never see this', 'bar 2010 n 2015', 'bar 20156 see this']
>>> [regex.sub(r'\1', s) for s in strings]
['foo', 'bar', 'baz', 'foo 1999', 'foo 1999', 'bar 2010', 'bar 20156 see this']
Looks like your only bound on the result string is at the end(), so you should be using re.match() instead, and modify your regex to:
my_expr = r".*?\D\d{4}\D"
Then, in your code, do:
regex = re.compile(my_expr)
my_new_strings = []
for string in my_strings:
    match = regex.match(string)
    if match:
        my_new_strings.append(match.group())
    else:
        my_new_strings.append(string)
Or as a list comprehension:
regex = re.compile(my_expr)
matches = ((regex.match(string), string) for string in my_strings)
my_new_strings = [match.group() if match else string for match, string in matches]
Alternatively, you could use re.sub and let the pattern consume everything after the year:
regex = re.compile(r'(\D\d{4})\D.*')
new_strings = [regex.sub(r'\1', string) for string in my_strings]
I am not entirely sure of your use case, but the following code can give you some hints:
import re
my_strings = ['abcd', 'ab12cd34', 'ab1234', 'ab1234cd', '1234cd', '123cd1234cd']
for index, string in enumerate(my_strings):
    match = re.search(r'\d{4}', string)
    if match:
        my_strings[index] = string[0:match.end()]
print(my_strings)
# ['abcd', 'ab12cd34', 'ab1234', 'ab1234', '1234', '123cd1234']
You were actually pretty close with the list comprehension, but your syntax is off - you need to make the first expression a "conditional expression" aka x if <boolean> else y:
[x[:re.search("\D\d\d\d\d\D",x).end()] if re.search("\D\d\d\d\d\D",x) else x for x in my_strings]
Obviously this is pretty ugly and hard to read. There are several better ways to split your string around a 4-digit year, such as:
[re.split(r'(?<=\D\d{4})\D', x)[0] for x in my_strings]
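The answers in this thread can be folded into one small function. This is only a sketch: the names `YEAR` and `truncate_at_year` are invented here, and it keeps the question's own assumption that a year is four digits with a non-digit on each side, so a year at the very start or very end of a string is not treated as a match.

```python
import re

# \D\d{4} followed by a non-digit that is only peeked at, never consumed.
YEAR = re.compile(r'\D\d{4}(?=\D)')

def truncate_at_year(text):
    """Cut text right after the first 4-digit year; return it unchanged otherwise."""
    match = YEAR.search(text)
    return text[:match.end()] if match else text

samples = ['foo', 'foo 1999', 'foo 1999 never see this', 'bar 20156 see this']
print([truncate_at_year(s) for s in samples])
# ['foo', 'foo 1999', 'foo 1999', 'bar 20156 see this']
```

Building a new list this way also sidesteps the problem in the original loop, where assigning to `x` never changed the list itself.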
I have several python strings from which I want unwanted characters removed.
Examples:
"This is '-' a test"
should be "This is a test"
"This is a test L)[_U_O-Y OH : l’J1.l'}/"
should be "This is a test"
"> FOO < BAR"
should be "FOO BAR"
"I<<W5§!‘1“¢!°\" I"
should be ""
(because if only words are extracted then it returns I W I and none of them form words)
"l‘?£§l%nbia ;‘\\~siI.ve_rswinq m"
should be ""
"2|'J]B"
should be ""
This is what I have so far; however, it does not keep the original spaces between words.
>>> line = re.sub(r"\W+","","This is '-' a test")
>>> line
'Thisisatest'
>>> line = re.sub(r"\W+","","This is a test L)[_U_O-Y OH : l’J1.l'}/")
>>> line
'ThisisatestL_U_OYOHlJ1l'
# although I would prefer this to be "This is a test", but if that is not possible I would
# prefer "This is a test L_U_OYOHlJ1l"
>>> line = re.sub(r"\W+","","> FOO < BAR")
>>> line
'FOOBAR'
>>> line = re.sub(r"\W+","","I<<W5§!‘1“¢!°\" I")
>>> line
'IW51I'
>>> line = re.sub(r"\W+","","l‘?£§l%nbia ;‘\\~siI.ve_rswinq m")
>>> line
'llnbiasiIve_rswinqm'
>>> line = re.sub(r"\W+","","2|'J]B")
>>> line
'2JB'
I will be filtering the regex cleaned words through a list of predefined words later.
I'd go with a split and filter, like this:
' '.join(word for word in line.split() if word.isalpha() and word.lower() in words)
This will remove all non-alphabetic words and all alphabetic words that are not in your word list (words).
Examples:
def myfilter(string):
    words = {'this', 'test', 'i', 'a', 'foo', 'bar'}
    return ' '.join(word for word in string.split() if word.isalpha() and word.lower() in words)
>>> myfilter("This is '-' a test")
'This a test'
>>> myfilter("This is a test L)[_U_O-Y OH : l’J1.l'}/")
'This a test'
>>> myfilter("> FOO < BAR")
'FOO BAR'
>>> myfilter("I<<W5§!‘1“¢!°\" I")
'I'
>>> myfilter("l‘?£§l%nbia ;‘\\~siI.ve_rswinq m")
''
>>> myfilter("2|'J]B")
''
This one removes any group of non-space characters that contains at least one non-alphabetic character. It will leave some unwanted groups of letters, though:
re.sub(r"\w*[^a-zA-Z ]+\w*","","This is a test L)[_U_O-Y OH : l’J1.l'}/")
gives:
'This is a test OH '
It will also leave groups of more than one space:
re.sub(r"[^a-zA-Z ]+\w*","","This is '-' a test")
'This is  a test' # two spaces
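One possible way to avoid the leftover double spaces is to work token by token, or to collapse whitespace after substituting. The sketch below shows both ideas; the function names `drop_noisy_tokens` and `squeeze_spaces` are made up for illustration, and the later dictionary check from the question is deliberately left out.

```python
import re

def drop_noisy_tokens(line):
    """Keep only purely alphabetic tokens and rejoin them with single spaces."""
    return ' '.join(tok for tok in line.split() if tok.isalpha())

def squeeze_spaces(text):
    """Collapse every run of whitespace to one space and trim the ends."""
    return re.sub(r'\s+', ' ', text).strip()

print(drop_noisy_tokens("This is '-' a test"))   # This is a test
print(drop_noisy_tokens("> FOO < BAR"))          # FOO BAR
print(squeeze_spaces("This is  a test "))        # This is a test
```

Splitting first means the spacing of the result no longer depends on what the regex happened to remove.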
With Python I know that the "\n" breaks to the next line in a string, but what I am trying to do is replace every "," in a string with a '\n'. Is that possible? I am kind of new to Python.
Try this:
text = 'a, b, c'
text = text.replace(',', '\n')
print(text)
For lists:
text = ['a', 'b', 'c']
text = '\n'.join(text)
print(text)
>>> s = 'Hello, world'
>>> s = s.replace(',', '\n')
>>> print(s)
Hello
 world
>>> s_list = s.split('\n')
>>> print(s_list)
['Hello', ' world']
For further string operations, see: http://docs.python.org/library/stdtypes.html
You can insert a literal \n into your string by escaping the backslash, e.g.
>>> print('\n')   # prints an empty line
>>> print('\\n')  # prints \n
\n
The same principle is used in regular expressions. Use this expression to replace every , in a string with \n:
>>> re.sub(",", "\\n", "flurb, durb, hurr")
'flurb\n durb\n hurr'
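Note that a plain replacement keeps the space that followed each comma. If that is unwanted, a split-and-join along these lines may be closer to the intent; the function name `commas_to_lines` is hypothetical.

```python
def commas_to_lines(text):
    """Put each comma-separated item on its own line, trimming stray spaces."""
    return '\n'.join(item.strip() for item in text.split(','))

print(commas_to_lines('a, b, c'))
# a
# b
# c
```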
I have four strings and any of them can be empty. I need to join them into one string with spaces between them. If I use:
new_string = string1 + ' ' + string2 + ' ' + string3 + ' ' + string4
The result has a blank space at the beginning of the new string if string1 is empty. Also, I get three blank spaces in a row if string2 and string3 are empty.
How can I easily join them without blank spaces when I don't need them?
>>> strings = ['foo','','bar','moo']
>>> ' '.join(filter(None, strings))
'foo bar moo'
Passing None to filter() removes all falsy elements, which includes empty strings.
If you KNOW that the strings have no leading/trailing whitespace:
>>> strings = ['foo','','bar','moo']
>>> ' '.join(x for x in strings if x)
'foo bar moo'
otherwise:
>>> strings = ['foo ','',' bar', ' ', 'moo']
>>> ' '.join(x.strip() for x in strings if x.strip())
'foo bar moo'
and if any of the strings have non-leading/trailing whitespace, you may need to work harder still. Please clarify what it is that you actually have.
>>> strings = ['foo','','bar','moo']
>>> ' '.join([x for x in strings if x != ''])
'foo bar moo'
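For the original case of four separate variables, the same idea can be wrapped in a small helper. This is a sketch only; `join_nonempty` is an invented name, and it assumes that whitespace-only strings should be skipped as well.

```python
def join_nonempty(*parts, sep=' '):
    """Join the parts with sep, skipping any that are empty or whitespace-only."""
    return sep.join(p.strip() for p in parts if p and p.strip())

print(repr(join_nonempty('', 'foo', '', ' ', 'bar')))   # 'foo bar'
print(repr(join_nonempty('', '', '', '')))              # ''
```

Called as `join_nonempty(string1, string2, string3, string4)`, it yields no leading, trailing, or doubled spaces regardless of which inputs are empty.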