I have read the documentation but don't fully understand how to do it.
I understand that I need to have some kind of identifier in the string so that the functions can find where to split the string (unless I can target the first space in the sentence?).
So for example how would I split:
"Sico87 is an awful python developer" to "Sico87" and "is an awful Python developer"?
The strings are retrieved from a database (if that matters).
Use the split method on strings:
>>> "Sico87 is an awful python developer".split(' ', 1)
['Sico87', 'is an awful python developer']
How it works:
Every string is an object. String objects have certain methods defined on them, such as split in this case. You call them using obj.<methodname>(<arguments>).
The first argument to split is the separator string that divides the individual substrings. In this case that is a single space, ' '.
The second argument is the number of times the split should be performed. In your case that is 1. Leaving out this second argument applies the split as often as possible:
>>> "Sico87 is an awful python developer".split(' ')
['Sico87', 'is', 'an', 'awful', 'python', 'developer']
Of course you can also store the substrings in separate variables instead of a list:
>>> a, b = "Sico87 is an awful python developer".split(' ', 1)
>>> a
'Sico87'
>>> b
'is an awful python developer'
But do note that this will cause trouble if certain inputs do not contain spaces:
>>> a, b = "string_without_spaces".split(' ', 1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: need more than 1 value to unpack
Use partition(' '), which always returns a tuple of three items: the part up to the separator, the separator itself, and the part after it. Slots in the tuple that are not applicable are still there, just set to empty strings.
Examples:
"Sico87 is an awful python developer".partition(' ') returns ("Sico87", " ", "is an awful python developer")
"Sico87 is an awful python developer".partition(' ')[0] returns "Sico87"
An alternative, trickier way is to use split(' ', 1), which works similarly but returns a variable number of items. It returns a list of one or two items, the first being the first word up to the delimiter and the second being the rest of the string (if there is any).
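To make the difference concrete, here is a minimal sketch, assuming you only need the first word and the remainder. The helper name split_first_word is made up for illustration and is not part of the standard library:

```python
def split_first_word(text):
    """Return (first_word, rest); rest is '' when text contains no space."""
    # partition always returns a 3-tuple, so this unpacking cannot fail
    head, _sep, tail = text.partition(' ')
    return head, tail


print(split_first_word("Sico87 is an awful python developer"))
print(split_first_word("string_without_spaces"))
```

Unlike a, b = s.split(' ', 1), this does not raise a ValueError when the input has no space; the second item is simply an empty string.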
Consider the following example
a= 'Apple'
b = a.split(',')
print(b)
Output is ['Apple'].
I don't understand why it returns a list even though there is no ',' character in Apple.
There may be cases where we call split expecting more than one element in the list, but because the separator is not present in the string there is only one element. Wouldn't it be better if this mistake were caught by the split method itself?
The behaviour of a.split(',') when no commas are present in a is perfectly consistent with the way it behaves when there are a positive number of commas in a.
a.split(',') says to split string a into a list of substrings that are delimited by ',' in a; the delimiter is not preserved in the substrings.
If 1 comma is found you get 2 substrings in the list, if 2 commas are found you get 3 substrings in the list, and in general, if n commas are found you get n+1 substrings in the list. So if 0 commas are found you get 1 substring in the list.
If you want 0 substrings in the list, then you'll need to supply a string with -1 commas in it. Good luck with that. :)
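The n commas give n+1 substrings rule is easy to check for yourself. A small sketch (count_parts is a hypothetical helper, only for demonstration):

```python
def count_parts(text, sep=','):
    """Return (number of separators found, number of substrings from split)."""
    return text.count(sep), len(text.split(sep))


for sample in ('Apple', 'Apple,Pear', 'Apple,Pear,Plum'):
    print(sample, count_parts(sample))
```

In every case the second number is one more than the first, including the zero-comma case.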
The docstring of that method says:
Return a list of the words in the string S, using sep as the delimiter string.
The delimiter is used to separate multiple parts of the string; having only one part is not an error.
That's how the split() method works. If you do not want that behaviour, you can implement your own my_split() function as follows:
def my_split(s, d=' '):
    # returns a list when d occurs in s, otherwise the unsplit string itself
    return s.split(d) if d in s else s
I'm trying to write a function whose input is a string in which a keyword occurs multiple times, and which prints the text enclosed in double quotation marks after each occurrence of the keyword. Essentially...
Input= 'alkfjjiekeyword "someonehelpmepls"fjioee... omgsos someonerandom help helpppmeeeeeee keyword"itonlygivesmeoneinsteadofmultiple"... sadnesssadness!sadness'
Output= someonehelpmepls
itonlygivesmeoneinsteadofmultiple
If it's possible to have each output on its own line, that would be better.
Here's what I have so far:
def getEm(s):
    h = s.find('keyword')
    if h == -1:
        return -1
    else:
        begin = s.find('"', h)
        end = s.find('"', begin+1)
        result = s[begin+1:end]
        print(result)
Please don't suggest import. I do not know how to do that nor know what it is, I am a beginner.
Let's take some sample input:
>>> Input= 'alkfjjiekeyword "someonehelpmepls"fjioee... omgsos someonerandom help helpppmeeeeeee keyword"itonlygivesmeoneinsteadofmultiple"... sadnesssadness!sadness'
I believe that one " was missing from the sample input, so I added it.
As I understand it, you want to get the strings in double-quotes that follow the word keyword. If that is the case, then:
def get_quoted_after_keyword(input):
    results = []
    split_by_keyword = input.split('keyword')
    # you said no results before the keyword
    for s in split_by_keyword[1:]:
        split_by_quote = s.split('"')
        if len(split_by_quote) > 1:
            # assuming you want exactly one quoted result per keyword
            results.append(split_by_quote[1])
    return results
>>> print('\n'.join(get_quoted_after_keyword(Input)))
someonehelpmepls
itonlygivesmeoneinsteadofmultiple
How it works
Let's look at the first piece:
>>> Input.split('keyword')
['alkfjjie',
' "someonehelpmepls"fjioee... omgsos someonerandom help helpppmeeeeeee ',
'"itonlygivesmeoneinsteadofmultiple"... sadnesssadness!sadness']
Splitting Input on keyword gives us, in this case, three strings. Every string from the second onward follows an occurrence of the word keyword. To get those strings without the first one, we use slicing:
>>> Input.split('keyword')[1:]
[' "someonehelpmepls"fjioee... omgsos someonerandom help helpppmeeeeeee ',
'"itonlygivesmeoneinsteadofmultiple"... sadnesssadness!sadness']
Now, our next task is to get the part of these strings that is in double-quotes. To do that, we split each of these strings on ". The second string, the one numbered 1, will be the string in double quotes. As a simpler example, let's take these strings:
>>> [s.split('"')[1] for s in ('"one"otherstuff', ' "two"morestuff')]
['one', 'two']
Next, we put these two steps together:
>>> [s.split('"')[1] for s in Input.split('keyword')[1:]]
['someonehelpmepls', 'itonlygivesmeoneinsteadofmultiple']
We now have the strings that we want. The last step is to print them out nicely, one per line:
>>> print('\n'.join(s.split('"')[1] for s in Input.split('keyword')[1:]))
someonehelpmepls
itonlygivesmeoneinsteadofmultiple
Limitation: this approach assumes that keyword never appears inside the double-quoted strings.
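If that limitation matters, one possible way around it is to keep the find-based approach from the question and put it in a loop, resuming the search after each closing quote. This is only a sketch under the assumption that you want the first quoted string after each keyword; the function name and the keyword parameter are made up for illustration, and no import is needed:

```python
def get_all_quoted_after_keyword(text, keyword='keyword'):
    """Collect the first double-quoted string after each occurrence of keyword."""
    results = []
    pos = text.find(keyword)
    while pos != -1:
        begin = text.find('"', pos + len(keyword))
        if begin == -1:
            break  # no opening quote left in the string
        end = text.find('"', begin + 1)
        if end == -1:
            break  # opening quote without a closing one
        results.append(text[begin + 1:end])
        # resume after the closing quote, so a keyword inside the quotes is skipped
        pos = text.find(keyword, end + 1)
    return results


sample = 'xkeyword "first keyword inside" y keyword"second" z'
print('\n'.join(get_all_quoted_after_keyword(sample)))
```

Because the search restarts after the closing quote, the word keyword inside "first keyword inside" is not treated as a new match.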
I want to do the following split:
input: 0x0000007c9226fc output: 7c9226fc
input: 0x000000007c90e8ab output: 7c90e8ab
input: 0x000000007c9220fc output: 7c9220fc
I use the following line of code to do this but it does not work!
split = element.rpartition('0')
I got these outputs which are wrong!
input: 0x000000007c90e8ab output: e8ab
input: 0x000000007c9220fc output: fc
What is the fastest way to do this kind of split?
The only idea for me right now is to make a loop and perform checking but it is a little time consuming.
I should mention that the number of zeros in input is not fixed.
Each string can be converted to an integer using int() with a base of 16. Then convert back to a string.
for s in '0x000000007c9226fc', '0x000000007c90e8ab', '0x000000007c9220fc':
    # parse as a base-16 integer, then format back as hex without prefix or padding
    print('%x' % int(s, 16))
Output
7c9226fc
7c90e8ab
7c9220fc
input[2:].lstrip('0')
That should do it. The [2:] skips over the leading 0x (which I assume is always there), then the lstrip('0') removes all the zeros from the left side.
In fact, since lstrip treats its argument as a set of characters to remove rather than as a prefix, we can simplify:
input.lstrip('x0')
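One edge case worth knowing about: if the value is all zeros, lstrip removes everything and you get an empty string. A small sketch that guards against that (strip_hex_prefix is a hypothetical helper name, and falling back to '0' is my assumption about what you would want):

```python
def strip_hex_prefix(value):
    """Drop a leading '0x' and any leading zeros from a hex string."""
    # lstrip removes any leading run of the characters '0' and 'x'
    stripped = value.lstrip('0x')
    # an all-zero input strips down to '', so fall back to '0'
    return stripped or '0'


for s in ('0x0000007c9226fc', '0x000000007c90e8ab', '0x0000'):
    print(strip_hex_prefix(s))
```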
format is handy for this:
>>> print('{:x}'.format(0x000000007c90e8ab))
7c90e8ab
>>> print('{:x}'.format(0x000000007c9220fc))
7c9220fc
In this particular case you can just do
your_input[10:]
You'll most likely want to properly parse this; your idea of splitting on separation of non-zero does not seem safe at all.
Seems to be the XY problem.
If the number of characters in a string is constant then you can use the following code.
input = "0x000000007c9226fc"
output = input[10:]
Also, you are using rpartition, which is defined as
str.rpartition(sep)
Split the string at the last occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself.
Since your input can contain multiple 0s, and rpartition splits only at the last occurrence, your code cuts the string at the last 0 instead of at the end of the leading zeros.
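You can see this by unpacking the result for one of the inputs from the question. The second half is just one possible alternative, assuming the 0x prefix is always present:

```python
element = '0x000000007c90e8ab'

# rpartition splits at the LAST '0', which sits inside the significant digits
head, sep, tail = element.rpartition('0')
print(head, sep, tail)

# partitioning on 'x' isolates the digits; lstrip then drops the zero padding
digits = element.partition('x')[2].lstrip('0')
print(digits)
```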
A regular expression for the 0x00000 prefix is (0x[0]+); replace the match with an empty string.
import re
st = "0x000007c922433434000fc"
reg = '(0x[0]+)'
rep = re.sub(reg, '', st)
print(rep)
So I'm working on a problem where I have to find various string repeats after encountering an initial string, say we take ACTGAC so the data file has sequences that look like:
AAACTGACACCATCGATCAGAACCTGA
So in that string once we find ACTGAC then I need to analyze the next 10 characters for the string repeats which go by some rules. I have the rules coded but can anyone show me how once I find the string that I need, I can make a substring for the next ten characters to analyze. I know that str.partition function can do that once I find the string, and then the [1:10] can get the next ten characters.
Thanks!
You almost have it already (but note that indexes start counting from zero in Python).
The partition method will split a string into head, separator, tail, based on the first occurrence of separator.
So you just need to take a slice of the first ten characters of the tail:
>>> data = 'AAACTGACACCATCGATCAGAACCTGA'
>>> head, sep, tail = data.partition('ACTGAC')
>>> tail[:10]
'ACCATCGATC'
Python allows you to leave out the start-index in slices (it defaults to zero, the start of the string), and also the end-index (it defaults to the length of the string).
Note that you could also do the whole operation in one line, like this:
>>> data.partition('ACTGAC')[2][:10]
'ACCATCGATC'
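One thing to watch for: if the separator is not in the string, partition returns the whole string as head and an empty tail, so the slice silently gives you ''. A minimal sketch that makes the not-found case explicit (chars_after is a hypothetical helper, and returning None is just one possible convention):

```python
def chars_after(data, sep, count=10):
    """Return up to `count` characters after the first `sep`, or None if sep is absent."""
    head, found, tail = data.partition(sep)
    if not found:
        # partition returned (data, '', ''): the separator was not found
        return None
    # slicing never raises, even when fewer than `count` characters remain
    return tail[:count]


print(chars_after('AAACTGACACCATCGATCAGAACCTGA', 'ACTGAC'))
print(chars_after('AAAA', 'ACTGAC'))
```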
So, based on marcog's answer in "Find all occurrences of a substring in Python", I propose:
>>> import re
>>> data = 'AAACTGACACCATCGATCAGAACCTGAACTGACTGACAAA'
>>> sep = 'ACTGAC'
>>> [data[m.start()+len(sep):][:10] for m in re.finditer('(?=%s)'%sep, data)]
['ACCATCGATC', 'TGACAAA', 'AAA']
I would like to do something like:
temp=a.split()
#do some stuff with this new list
b=" ".join(temp)
where a is the original string, and b is after it has been modified. The problem is that when performing such methods, the newlines are removed from the new string. So how can I do this without removing newlines?
I assume in your third line you mean join(temp), not join(a).
To split and yet keep the exact "splitters", you need the re.split function (or split method of RE objects) with a capturing group:
>>> import re
>>> f='tanto va\nla gatta al lardo'
>>> re.split(r'(\s+)', f)
['tanto', ' ', 'va', '\n', 'la', ' ', 'gatta', ' ', 'al', ' ', 'lardo']
The pieces you'd get from just re.split are at index 0, 2, 4, ... while the odd indices have the "separators" -- the exact sequences of whitespace that you'll use to re-join the list at the end (with ''.join) to get the same whitespace the original string had.
You can either work directly on the even-indexed items, or you can first extract them:
>>> x = re.split(r'(\s+)', f)
>>> y = x[::2]
>>> y
['tanto', 'va', 'la', 'gatta', 'al', 'lardo']
then alter y as you will, e.g.:
>>> y[:] = [z+z for z in y]
>>> y
['tantotanto', 'vava', 'lala', 'gattagatta', 'alal', 'lardolardo']
then reinsert and join up:
>>> x[::2] = y
>>> ''.join(x)
'tantotanto vava\nlala gattagatta alal lardolardo'
Note that the \n is exactly in the position equivalent to where it was in the original, as desired.
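The steps above can be wrapped into one small function. This is a sketch rather than a definitive implementation; transform_words and its func parameter are names I made up:

```python
import re


def transform_words(text, func):
    """Apply func to each whitespace-separated word, keeping the original whitespace."""
    # the capturing group makes re.split keep the separators at the odd indices
    parts = re.split(r'(\s+)', text)
    # even indices hold the words (possibly '' if text starts or ends with whitespace)
    parts[::2] = [func(word) for word in parts[::2]]
    return ''.join(parts)


print(repr(transform_words('tanto va\nla gatta al lardo', str.upper)))
```

Note that func may be called with an empty string when the text begins or ends with whitespace, so it should tolerate that.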
You need to use regular expressions to rip your string apart. The resulting match object can give you the character ranges of the parts that match various sub-expressions.
Since you might have an arbitrarily large number of sections separated by whitespace, you're going to have to match the string multiple times at different starting points within the string.
If this answer is confusing to you, I can look up the appropriate references and put in some sample code. I don't really have all the libraries memorized, just what they do. :-)
It depends on what you want to split on.
By default, split treats any whitespace (including '\n' and ' ') as the delimiter. You can use
a.split(" ")
if you only want spaces as the delimiter.
http://docs.python.org/library/stdtypes.html#str.split
I don't really understand your question. Can you give an example of what you want to do?
Anyway, maybe this can help:
b = '\n'.join(a)
First of all, I assume that when you say
b = " ".join(a)
You actually mean
b = " ".join(temp)
When you call split() without specifying a separator, the function interprets a run of whitespace of any length as a single separator. Whitespace includes newlines, so those disappear when you split the string. Try explicitly passing a separator (such as a simple " " space character) to split(). If you have multiple spaces in a row, splitting this way removes each space individually and leaves a series of "" empty strings in the returned list.
To restore the original spacing, just make sure that you call join() from the same string which you used as your separator in split(), and that you don't remove any elements from your intermediary list of strings.
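A quick sketch of that round trip, using a made-up sample string:

```python
a = "one  two\nthree four"

# split() with no argument collapses all whitespace, so the newline is lost
print(a.split())

# split(" ") keeps the newline inside an item and marks the double space with ''
temp = a.split(" ")
print(temp)

# joining with the same separator restores the original string exactly
b = " ".join(temp)
print(b == a)
```

Keep in mind that with this approach the newline stays attached to its neighbours ('two\nthree' is a single item), so any per-word processing has to account for that.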