Reconstituting Strings in Python

I would like to do something like:
temp=a.split()
#do some stuff with this new list
b=" ".join(temp)
where a is the original string, and b is after it has been modified. The problem is that when performing such methods, the newlines are removed from the new string. So how can I do this without removing newlines?

I assume in your third line you mean join(temp), not join(a).
To split and yet keep the exact "splitters", you need the re.split function (or split method of RE objects) with a capturing group:
>>> import re
>>> f='tanto va\nla gatta al lardo'
>>> re.split(r'(\s+)', f)
['tanto', ' ', 'va', '\n', 'la', ' ', 'gatta', ' ', 'al', ' ', 'lardo']
The pieces you'd get from just re.split are at index 0, 2, 4, ... while the odd indices have the "separators" -- the exact sequences of whitespace that you'll use to re-join the list at the end (with ''.join) to get the same whitespace the original string had.
You can either work directly on the even-spaced items, or you can first extract them:
>>> x = re.split(r'(\s+)', f)
>>> y = x[::2]
>>> y
['tanto', 'va', 'la', 'gatta', 'al', 'lardo']
then alter y as you will, e.g.:
>>> y[:] = [z+z for z in y]
>>> y
['tantotanto', 'vava', 'lala', 'gattagatta', 'alal', 'lardolardo']
then reinsert and join up:
>>> x[::2] = y
>>> ''.join(x)
'tantotanto vava\nlala gattagatta alal lardolardo'
Note that the \n is exactly in the position equivalent to where it was in the original, as desired.
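The steps above can be folded into one small helper. This is only a sketch: the name transform_words and its func parameter are made up for illustration, and it assumes func can cope with an empty string, because re.split yields an empty first or last piece when the text starts or ends with whitespace.

```python
import re

def transform_words(text, func):
    # The capturing group makes re.split keep each whitespace run in the list.
    parts = re.split(r'(\s+)', text)
    # Even indices hold the words; odd indices hold the separators.
    parts[::2] = [func(word) for word in parts[::2]]
    # Joining with '' puts every original separator back where it was.
    return ''.join(parts)

f = 'tanto va\nla gatta al lardo'
print(repr(transform_words(f, str.upper)))
```

Passing a function that returns each word unchanged should give back the original string exactly, which is a quick way to check that no whitespace was lost.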

You need to use regular expressions to rip your string apart. The resulting match object can give you the character ranges of the parts that match various sub-expressions.
Since you might have an arbitrarily large number of sections separated by whitespace, you're going to have to match the string multiple times at different starting points within the string.
If this answer is confusing to you, I can look up the appropriate references and put in some sample code. I don't really have all the libraries memorized, just what they do. :-)
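A possible reading of this suggestion, sketched with the standard re module: re.finditer walks the string match by match, and each match object reports its character range. The pattern \S+ (runs of non-whitespace) is an assumption about what counts as a "section" here.

```python
import re

text = 'tanto va\nla gatta al lardo'

# Each match object gives the start and end offsets of one non-whitespace run.
spans = [(m.start(), m.end(), m.group()) for m in re.finditer(r'\S+', text)]

# re.sub with a function rewrites only the matched ranges,
# so every whitespace character (including '\n') stays in place.
doubled = re.sub(r'\S+', lambda m: m.group() * 2, text)

print(spans[:2])
print(repr(doubled))
```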

It depends on what you want to split on.
By default, split() treats any whitespace, including '\n' and ' ', as a delimiter. You can use
a.split(" ")
if you only want spaces as the delimiter.
http://docs.python.org/library/stdtypes.html#str.split

I don't really understand your question. Can you give an example of what you want to do?
Anyway, maybe this can help:
b = '\n'.join(a)

First of all, I assume that when you say
b = " ".join(a)
You actually mean
b = " ".join(temp)
When you call split() without specifying a separator, the function will interpret whitespace of any length as a separator. Whitespace includes newlines, so those disappear when you split the string. Try explicitly passing a separator (such as a simple " " space character) to split(). If you have multiple spaces in a row, using split this way will split on each of them and include a series of "" empty strings in the returned list.
To restore the original spacing, just make sure that you call join() from the same string which you used as your separator in split(), and that you don't remove any elements from your intermediary list of strings.
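A short sketch of that round trip, using a made-up sample string with a double space and a newline in it:

```python
a = "one  two\nthree four"

# Splitting on a single space keeps the newline inside its piece
# and produces an empty string for the doubled space.
temp = a.split(" ")
print(temp)

# Joining with the same separator restores the original string,
# as long as no elements were removed from the list.
b = " ".join(temp)
print(b == a)
```

Note that 'two\nthree' stays together as one element here, so this approach only helps if treating it as a single item is acceptable.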

Related

how to remove all zeros that are 7 characters long python

I have made a string without spaces: instead of spaces, I used 0000000, and there are no alphabet letters in it. So, for example, 000000020000000050000000190000000200000000 should equal "test". Sorry, I am very new to Python and am not good at it, so if someone can help me out, that would be awesome.
You should be able to achieve the desired effect using regular expressions and re.sub()
If you want to extract the literal word "test" from that string as mentioned in the comments, you'll need to account for the fact that if you have 8 0's, it will match the first 7 from left to right, so a number like 20 followed by 7 0's would cause a few issues. We can get around this by matching the string in reverse (right to left) and then reversing the finished string to undo the initial reverse.
Here's the solution I came up with as my revised answer:
import re
my_string = '000000020000000050000000190000000200000000'
# Substitute a space in place of 7 0's
# Reverse the string in the input, and then reverse the output
new_string = re.sub('0{7}', ' ', my_string[::-1])[::-1]
# >>> new_string
# ' 20 5 19 20 '
Then we can strip the leading and trailing whitespace from this answer and split it into an array
my_array = new_string.strip().split()
# >>> my_array
# ['20', '5', '19', '20']
After that, you can process the array in whatever way you see fit to get the word "test" out of it.
My solution to that would probably be the following:
import string
word = ''.join([string.ascii_lowercase[int(x) - 1] for x in my_array])
# >>> word
# 'test'
NOTE: This answer has been completely rewritten (v2).
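For convenience, the steps above could be combined into a single function. This is a sketch only: the name decode is invented here, and it assumes every number in the input is between 1 and 26 (a = 1, ..., z = 26).

```python
import re
import string

def decode(encoded):
    # Replace each run of seven zeros with a space, scanning right to left
    # so that a number ending in 0 (such as 20) keeps its own zero.
    spaced = re.sub('0{7}', ' ', encoded[::-1])[::-1]
    # split() with no argument also discards the leading and trailing spaces.
    numbers = spaced.split()
    # Map 1 -> 'a', 2 -> 'b', ... (assumes each number is in 1..26).
    return ''.join(string.ascii_lowercase[int(n) - 1] for n in numbers)

print(decode('000000020000000050000000190000000200000000'))
```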

How to convert a vertical string to a horizontal one?

how can I convert a vertical string into a horizontal one in Python?
I tried:
result=astring.replace("\n", "")
but it doesn't do anything, it remains vertical..
The code is the following:
names = "".join(name).replace("\n","")
print(names)
where "names" is:
Federica
Silvio
Enrico
I would like:
Federica, Silvio, Enrico
x = """Federica
Silvio
Enrico"""
x.replace("\n",', ')
'Federica, Silvio, Enrico'
Your method is fundamentally wrong. When you call join, it combines the items of an iterable with the separator string in between, e.g.
" ".join("hello")
'h e l l o'
So when you call it on a string with an empty separator, the string is unchanged. Then you replace '\n' with '', which will flatten the string but not insert the comma.
If you have the names in a string format, for example:
names = """Federica
Silvio
Enrico"""
You can convert the vertical string into a horizontal one using replace:
result = names.replace("\n", ", ")
Which results in:
print(result)
Federica, Silvio, Enrico
From this, I can say your approach was not wrong. Maybe you were not storing the result of the replace? replace does not modify the string; it returns a new one with the operation performed.
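If the string might end with a trailing newline, replace would leave a dangling ", " at the end. One alternative, sketched here with the same sample names, is to split on line boundaries and join the pieces:

```python
names = """Federica
Silvio
Enrico
"""

# splitlines() drops the line endings and does not produce
# an empty last item for the trailing newline.
result = ", ".join(names.splitlines())
print(result)
```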

Is there a reverse \n?

I am making a dictionary application using argparse in Python 3. I'm using difflib to find the closest matches to a given word. Though it's a list, and it has newline characters at the end, like:
['hello\n', 'hallo\n', 'hell\n']
And when I put a word in, it gives an output like this:
hellllok could be spelled as hello
hellos
hillock
Question:
I'm wondering if there is a reverse or inverse \n so I can counteract these \n's.
Any help is appreciated.
There's no "reverse newline" in the standard character set but, even if there was, you would have to apply it to each string in turn.
And, if you can do that, you can equally modify the strings to remove the newline. In other words, create a new list using the current one, with newlines removed. That would be something like:
>>> oldlist = ['hello\n', 'hallo\n', 'hell\n']
>>> oldlist
['hello\n', 'hallo\n', 'hell\n']
>>> newlist = [s.replace('\n','') for s in oldlist]
>>> newlist
['hello', 'hallo', 'hell']
That will remove all newlines from each of the strings. If you want to ensure you only replace a single newline at the end of the strings, you can instead use:
import re
newlist = [re.sub('\n$','',s) for s in oldlist]
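The same clean-up can also be sketched without the re module, using only string methods. The second list below is a made-up example to show the difference between removing every trailing newline and removing at most one.

```python
oldlist = ['hello\n', 'hallo\n', 'hell\n']

# rstrip('\n') removes every trailing newline from each string.
stripped = [s.rstrip('\n') for s in oldlist]
print(stripped)

# To drop at most one trailing newline, check for it explicitly.
samples = ['hello\n\n', 'hell']
one_removed = [s[:-1] if s.endswith('\n') else s for s in samples]
print(one_removed)
```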

Dot notation string manipulation

Is there a way to manipulate a string in Python using the following ways?
For any string that is stored in dot notation, for example:
s = "classes.students.grades"
Is there a way to change the string to the following:
"classes.students"
Basically, remove the last period and everything after it. So "restaurants.spanish.food.salty" would become "restaurants.spanish.food".
Additionally, is there any way to identify what comes after the last period? The reason I want to do this is I want to use isdigit().
So, if it was classes.students.grades.0 could I grab the 0 somehow, so I could use an if statement with isdigit, and say if the part of the string after the last period (so 0 in this case) is a digit, remove it, otherwise, leave it.
you can use split and join together:
s = "classes.students.grades"
print('.'.join(s.split('.')[:-1]))
You are splitting the string on . - it'll give you a list of strings, after that you are joining the list elements back to string separating them by .
[:-1] will pick all the elements from the list but the last one
To check what comes after the last .:
s.split('.')[-1]
Another way is to use rsplit. It works the same way as split but if you provide maxsplit parameter it'll split the string starting from the end:
rest, last = s.rsplit('.', 1)
'classes.students'
'grades'
You can also use re.sub to substitute the part after the last . with an empty string:
re.sub(r'\.[^.]+$', '', s)
And for the last part of your question, wrapping the words in [], I would recommend using format and a generator expression:
''.join("[{}]".format(e) for e in s.split('.'))
It'll give you the desired output:
[classes][students][grades]
The best way to do this is using the rsplit method and pass in the maxsplit argument.
>>> s = "classes.students.grades"
>>> before, after = s.rsplit('.', maxsplit=1) # keyword form needs Python 3; in Python 2.x use rsplit('.', 1)
>>> before
'classes.students'
>>> after
'grades'
You can also use the rfind() method with normal slice operation.
To get everything before last .:
>>> s = "classes.students.grades"
>>> last_index = s.rfind('.')
>>> s[:last_index]
'classes.students'
Then everything after last .
>>> s[last_index + 1:]
'grades'
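One thing to watch with rfind(): it returns -1 when there is no dot, and s[:-1] would then silently drop the last character. A guarded version might look like the sketch below; the function name split_last_dot is hypothetical.

```python
def split_last_dot(s):
    # rfind returns -1 when '.' does not occur in s.
    last_index = s.rfind('.')
    if last_index == -1:
        # No dot: treat the whole string as the "before" part.
        return s, ''
    return s[:last_index], s[last_index + 1:]

print(split_last_dot("classes.students.grades"))
print(split_last_dot("classes"))
```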
If '.' is in s, s.rpartition('.') finds the last dot in s
and returns (before_last_dot, dot, after_last_dot):
s = "classes.students.grades"
s.rpartition('.')[0]
If your goal is to get rid of a final component that's just a single digit, start and end with re.sub():
s = re.sub(r"\.\d$", "", s)
This will do the job, and leave other strings alone. No need to mess with anything else.
If you do want to know about the general case (separate out the last component, no matter what it is), then use rsplit to split your string once:
>>> "hel.lo.there".rsplit(".", 1)
['hel.lo', 'there']
If there's no dot in the string you'll just get one element in your list, the entire string.
You can do it very simply with rsplit (str.rsplit([sep[, maxsplit]])), which returns a list by breaking the string along the given separator, starting from the right.
You can also specify how many splits should be performed:
>>> s = "res.spa.f.sal.786423"
>>> s.rsplit('.',1)
['res.spa.f.sal', '786423']
So the final function that you describe is:
def dimimak_cool_function(s):
    if '.' not in s: return s
    start, end = s.rsplit('.', 1)
    return start if end.isdigit() else s
>>> dimimak_cool_function("res.spa.f.sal.786423")
'res.spa.f.sal'
>>> dimimak_cool_function("res.spa.f.sal")
'res.spa.f.sal'
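The same behaviour could also be sketched with rpartition, which avoids the separate "is there a dot" check because the middle item is empty when no dot is found. The name strip_numeric_suffix is made up for this example.

```python
def strip_numeric_suffix(path):
    # rpartition returns ('', '', path) when there is no '.' in path.
    head, dot, tail = path.rpartition('.')
    # Drop the last component only if a dot was found and the part after it is all digits.
    return head if dot and tail.isdigit() else path

print(strip_numeric_suffix("classes.students.grades.0"))
print(strip_numeric_suffix("classes.students.grades"))
```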

How to split a string in Python?

I have read the documentation but don't fully understand how to do it.
I understand that I need to have some kind of identifier in the string so that the functions can find where to split the string (unless I can target the first space in the sentence?).
So for example how would I split:
"Sico87 is an awful python developer" to "Sico87" and "is an awful Python developer"?
The strings are retrieved from a database (if this does matter).
Use the split method on strings:
>>> "Sico87 is an awful python developer".split(' ', 1)
['Sico87', 'is an awful python developer']
How it works:
Every string is an object. String objects have certain methods defined on them, such as split in this case. You call them using obj.<methodname>(<arguments>).
The first argument to split is the character that separates the individual substrings. In this case that is a space, ' '.
The second argument is the number of times the split should be performed. In your case that is 1. Leaving out this second argument applies the split as often as possible:
>>> "Sico87 is an awful python developer".split(' ')
['Sico87', 'is', 'an', 'awful', 'python', 'developer']
Of course you can also store the substrings in separate variables instead of a list:
>>> a, b = "Sico87 is an awful python developer".split(' ', 1)
>>> a
'Sico87'
>>> b
'is an awful python developer'
But do note that this will cause trouble if certain inputs do not contain spaces:
>>> a, b = "string_without_spaces".split(' ', 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: need more than 1 value to unpack
Use partition(' '), which always returns three items in the tuple: the part up to the separator, the separator itself, and then the part after it. Slots in the tuple that are not applicable are still there, just set to empty strings.
Examples:
"Sico87 is an awful python developer".partition(' ') returns ("Sico87", " ", "is an awful python developer")
"Sico87 is an awful python developer".partition(' ')[0] returns "Sico87"
An alternative, trickier way is to use split(' ', 1), which works similarly but returns a variable number of items. It will return a list of one or two items, the first item being the first word up to the delimiter and the second being the rest of the string (if there is any).
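To make the difference concrete, here is a small sketch built on partition. The helper name first_word_and_rest is invented for illustration; it always returns two values, so unpacking never fails on input without spaces.

```python
def first_word_and_rest(sentence):
    # partition always returns a 3-tuple, even when ' ' is absent.
    first, _sep, rest = sentence.partition(' ')
    return first, rest

a, b = first_word_and_rest("Sico87 is an awful python developer")
print(a)
print(b)

# With no space in the input, the second value is simply an empty string.
c, d = first_word_and_rest("string_without_spaces")
print(repr(c), repr(d))
```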
