Assume we have a string a.
A part of a looks like ac5:9qr$28c#.
This pattern (value1:value2$value3#) repeats.
Now, my question is: How do I look for these values and extract them?
Note: These string parts aren't necessarily special characters.
re.findall works well for this problem.
Try this:
import re
data = 'abc:def$ghi#ac5:9qr$28c#1234:4567$89#'
result = re.findall(r'(.*?):(.*?)\$(.*?)#', data)
print(result)
Result:
[('abc', 'def', 'ghi'), ('ac5', '9qr', '28c'), ('1234', '4567', '89')]
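If you would rather refer to each field by name than by tuple position, re.finditer with named groups is one option. This is a minimal sketch on the same sample data. The group names value1/value2/value3 are illustrative and are borrowed from the question's description of the pattern.

```python
import re

data = 'abc:def$ghi#ac5:9qr$28c#1234:4567$89#'

# Same non-greedy pattern as above, but each group gets a name
# (value1/value2/value3 are illustrative names, not required by re).
pattern = re.compile(r'(?P<value1>.*?):(?P<value2>.*?)\$(?P<value3>.*?)#')

# groupdict() maps each group name to the text it matched.
records = [m.groupdict() for m in pattern.finditer(data)]
print(records)
```

Each record is a dict such as {'value1': 'abc', 'value2': 'def', 'value3': 'ghi'}, so later code can use record['value2'] instead of index 1.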
Something like this should do:
a = "ac5:9qr$28c#"
values = []
delimiters = [':','$','#']
while len(a) > 0:
    for delimiter in delimiters:
        delimiterIndex = a.index(delimiter)
        newValue = a[0:delimiterIndex]
        values.append(newValue)
        a = a[delimiterIndex+1:]
print(values)
Output is:
['ac5', '9qr', '28c']
Of course you could implement something similar to retain the original 'a' string.
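One way to keep the original string intact is to track a start position and pass it to str.index, which accepts an optional start argument. This is a sketch of that idea; split_fields is a hypothetical helper name, and like the loop above it raises ValueError if a delimiter is missing.

```python
def split_fields(s, delimiters=(':', '$', '#')):
    """Split s into fields by cycling through the delimiters, without modifying s."""
    values = []
    pos = 0
    while pos < len(s):
        for delimiter in delimiters:
            # str.index(sub, start) searches from `pos`; raises ValueError if absent.
            end = s.index(delimiter, pos)
            values.append(s[pos:end])
            pos = end + 1
    return values


a = "ac5:9qr$28c#1234:4567$89#"
print(split_fields(a))
print(a)  # still the original string
```

Because only the integer pos moves forward, no intermediate copies of the remaining string are created.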
I am new to Python, and am trying to filter a string that looks similar to this:
"{Red,Plant,Eel}{Blue,Animal,Maple}{Yellow,Plant,Crab}"
And so on, for hundreds of three-word sets.
I want to extract the second word from every set marked by "{ }", so in this example I want the output:
"Plant,Animal,Plant"
And so on.
How can I do it efficiently?
Right now I am using string.split(",")[1] individually for each "{ }" group.
Thanks.
This does the trick:
str_ = "{Red,Plant,Eel}{Blue,Animal,Maple}{Yellow,Plant,Crab}"
res = [x.split(',')[1] for x in str_[1:-1].split('}{')]
and produces
['Plant', 'Animal', 'Plant']
With str_[1:-1] we remove the leading "{" and trailing "}", and then split the remaining text on every occurrence of "}{", producing:
["Red,Plant,Eel", "Blue,Animal,Maple", ...]
Finally, we split each of these strings on "," to obtain
[["Red", "Plant", "Eel"], ...]
from which x[1] keeps only the second element of each sublist.
Note that for your specific purpose, slicing the original string with str_[1:-1] is not strictly necessary (it works without it as well), but it would make a difference if you wanted the first item instead of the second, because the first group would otherwise still start with "{". The same holds if you wanted the third item, because the last group would still end with "}".
If you want to concatenate the strings of the output to match your desired result, you can simply pass the resulting list to .join as follows:
out = ','.join(res)
which then gives you
"Plant,Animal,Plant"
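Since the slicing matters once you want a word other than the second, it can help to wrap the approach in a function that takes the word index as a parameter. A sketch, assuming the groups are written back to back with no whitespace between them; nth_fields is a hypothetical name.

```python
def nth_fields(s, n):
    """Return the n-th (0-based) comma-separated word from every {...} group."""
    # Strip the outer braces, then split the groups on the '}{' boundaries.
    groups = s[1:-1].split('}{')
    return [g.split(',')[n] for g in groups]


str_ = "{Red,Plant,Eel}{Blue,Animal,Maple}{Yellow,Plant,Crab}"
print(','.join(nth_fields(str_, 1)))  # Plant,Animal,Plant
print(','.join(nth_fields(str_, 0)))  # Red,Blue,Yellow
```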
Try this ([:-1] drops the empty string left over after the final "}"):
[i.split(',')[1] for i in str_[1:].split('}')[:-1]]
Another solution uses a regex. It is a bit more complicated, but it's a technique worth knowing:
import re
input_string = "{Red,Plant,Eel}{Blue,Animal,Maple}{Yellow,Plant,Crab}"
regex_string = r"\{\w+,(\w+),\w+\}"
result_list = re.findall(regex_string, input_string)
Then result_list is:
['Plant', 'Animal', 'Plant']
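#!/usr/bin/env python3
string = "{Red,Plant,Eel}{Blue,Animal,Maple}{Yellow,Plant,Crab}"
# Remove '{' and turn '}' into ',' so all words are comma-separated,
# then take every third word starting at index 1 (the middle word of each set).
a = string.replace('{', '').replace('}', ',').split(',')[1::3]
print(a)
The result is: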
['Plant', 'Animal', 'Plant']
Let's say I have a string that looks like this:
myStr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
What I would like to obtain in the end would be:
myStr_l1 = '(Txt_l1) or (Txt2_l1)'
and
myStr_l2 = '(Txt_l2) or (Txt2_l2)'
Some properties:
all "Txt_"-elements of the string start with an uppercase letter
the string can contain much more elements (so there could also be Txt3, Txt4,...)
the suffixes '_l1' and '_l2' look different in reality; they cannot be used for matching (I chose them for demonstration purposes)
I found a way to get the first part done by using:
myStr_l1 = re.sub(r'\(\w+\)', '', myStr)
which gives me
'(Txt_l1 ) or (Txt2_l1 )'
However, I don't know how to obtain myStr_l2. My idea was to remove everything between two open parentheses. But when I do something like this:
re.sub(r'\(w+\(', '', myStr)
the entire string is returned.
re.sub(r'\(.*\(', '', myStr)
removes - of course - far too much and gives me
'Txt2_l2))'
Does anyone have an idea how to get myStr_l2?
When there is an "and" instead of an "or", the strings look slightly different:
myStr2 = '(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2))'
Then I can still use the command from above:
re.sub(r'\(\w+\)', '', myStr2)
which gives:
'(Txt_l1  and Txt2_l1 )'
but I again fail to get myStr2_l2. How would I do this for these kinds of strings?
And how would one then do this for mixed expressions with "and" and "or" e.g. like this:
myStr3 = '(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2)) or (Txt3_l1 (Txt3_l2) and Txt4_l1 (Txt2_l2))'
re.sub(r'\(\w+\)', '', myStr3)
gives me
'(Txt_l1  and Txt2_l1 ) or (Txt3_l1  and Txt4_l1 )'
but again: How would I obtain myStr3_l2?
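Regular expressions are not powerful enough for arbitrarily nested expressions (in your case, nested elements in parentheses), because Python's re module cannot match balanced brackets to unlimited depth. You will have to write a parser. Look at https://pyparsing.wikispaces.com/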
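As a rough illustration of what a small hand-written parser could look like without an external library, here is a sketch. It assumes balanced parentheses, at most two levels of nesting, element names that start with an uppercase letter (as the question states) while operators such as and/or are lowercase, and that each level-2 group directly follows its element. The function names parse and render_level are hypothetical, and the level-2 output for the "and" case, (Txt_l2 and Txt2_l2), is one assumed interpretation of what myStr2_l2 should be.

```python
import re


def parse(s):
    """Turn the string into nested lists: '(' opens a group, ')' closes it."""
    root = []
    stack = [root]
    # Tokens are '(', ')', or any run of characters that is neither
    # whitespace nor a parenthesis (element names and operators).
    for tok in re.findall(r'[()]|[^\s()]+', s):
        if tok == '(':
            group = []
            stack[-1].append(group)
            stack.append(group)
        elif tok == ')':
            stack.pop()
        else:
            stack[-1].append(tok)
    return root


def render_level(items, level):
    """Rebuild the expression keeping level-1 names (level=1)
    or their parenthesized level-2 names (level=2)."""
    out = []
    i = 0
    while i < len(items):
        item = items[i]
        nxt = items[i + 1] if i + 1 < len(items) else None
        if isinstance(item, str) and item[:1].isupper() and isinstance(nxt, list):
            # An element name followed by its group: the name is level 1,
            # the group's contents are level 2.
            out.append(item if level == 1 else render_level(nxt, level))
            i += 2
        elif isinstance(item, list):
            out.append('(' + render_level(item, level) + ')')
            i += 1
        else:
            out.append(item)  # operators such as 'and' / 'or'
            i += 1
    return ' '.join(out)


myStr3 = ('(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2)) or '
          '(Txt3_l1 (Txt3_l2) and Txt4_l1 (Txt2_l2))')
tree = parse(myStr3)
print(render_level(tree, 1))  # (Txt_l1 and Txt2_l1) or (Txt3_l1 and Txt4_l1)
print(render_level(tree, 2))  # (Txt_l2 and Txt2_l2) or (Txt3_l2 and Txt2_l2)
```

Parsing into a tree first means the and/or mixing does not need special-case regexes; deeper nesting would need a more general rendering rule.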
I'm not entirely sure what you want, but I wrote this to strip out the parentheses and keep the text between them.
import re
mystr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
sets = mystr.split(' or ')
noParens = []
for line in sets:
    # group(1): the outer word; group(2): the inner word plus the closing ')'s
    mat = re.match(r'\((.* )\((.*\)\))', line)
    if mat:
        noParens.append(mat.group(1).strip())
        noParens.append(mat.group(2).replace(')', ''))
print(noParens)
This strips all the parentheses and puts your elements in a list. Here's an alternative way of doing it without regular expressions.
mystr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
mystr = mystr.replace(' or ', ' ')
mystr = mystr.replace(')', '')
mystr = mystr.replace('(', '')
noParens = mystr.split()
print(noParens)
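The two replace calls that delete "(" and ")" can also be done in a single pass with str.translate and a deletion table built by str.maketrans. A small sketch of that variant:

```python
mystr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'

# str.maketrans('', '', '()') builds a table that deletes both parenthesis characters.
table = str.maketrans('', '', '()')
noParens = mystr.replace(' or ', ' ').translate(table).split()
print(noParens)  # ['Txt_l1', 'Txt_l2', 'Txt2_l1', 'Txt2_l2']
```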
I have a list with strings.
list_of_strings
They look like this:
'/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
I want to split off this part of the string:
/folder1/folder2/folder3/folder4/folder5/exp-* and put it into a new list.
I thought to do something like that, but I am lacking the right snippet to do what I want:
list_of_stringparts = []
for string in sorted(list_of_strings):
    part = string.split('/')[7]  # or whatever returns the first part of my string
    list_of_stringparts.append(part)
Does anyone have an idea? Do I need a regex?
You are using indexing, which extracts a single (the eighth) element. To get the first seven elements, you need a slice [N:M:S], like this:
>>> l = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> l.split('/')[:7]
['', 'folder1', 'folder2', 'folder3', 'folder4', 'folder5', 'exp-*']
In this case N is omitted (it defaults to 0), and so is the step S (it defaults to 1), so you get the elements at indices 0 through 6 of the split result (M=7 itself is excluded).
To rebuild the string, use str.join():
>>> '/'.join(l.split('/')[:7])
'/folder1/folder2/folder3/folder4/folder5/exp-*'
I would do it like this:
>>> s = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> s.split('/')[:7]
['', 'folder1', 'folder2', 'folder3', 'folder4', 'folder5', 'exp-*']
>>> '/'.join(s.split('/')[:7])
'/folder1/folder2/folder3/folder4/folder5/exp-*'
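If you prefer to treat these strings as paths rather than plain text, pathlib offers a similar approach: PurePosixPath.parts splits a path into components, with the leading "/" as its own first component. A sketch, assuming POSIX-style paths; note that pathlib also normalizes redundant slashes, so for unusual inputs the result can differ from a plain split('/').

```python
from pathlib import PurePosixPath

s = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'

# .parts -> ('/', 'folder1', ..., 'exp-*', 'exp-*', 'otherfolder', 'file')
parts = PurePosixPath(s).parts
# Rebuild a path from the first seven components ('/' plus six names).
prefix = str(PurePosixPath(*parts[:7]))
print(prefix)  # /folder1/folder2/folder3/folder4/folder5/exp-*
```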
Using re.match:
>>> import re
>>> s = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> re.match(r'.*?\*', s).group()
'/folder1/folder2/folder3/folder4/folder5/exp-*'
Your example suggests that you want to partition the strings at the first * character. This can be done with str.partition():
list_of_stringparts = []
list_of_strings = ['/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file', '/folder1/exp-*/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file', '/folder/blah/pow']
for s in sorted(list_of_strings):
    head, sep, tail = s.partition('*')
    list_of_stringparts.append(head + sep)
>>> list_of_stringparts
['/folder/blah/pow', '/folder1/exp-*', '/folder1/folder2/folder3/folder4/folder5/exp-*']
Or this equivalent list comprehension:
list_of_stringparts = [''.join(s.partition('*')[:2]) for s in sorted(list_of_strings)]
This keeps any string that does not contain a * unchanged; it's not clear from your question whether that is desired.
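If strings without a * should be dropped instead, one option is to check the separator that str.partition returns: it is the empty string when the separator was not found. A sketch using the same sample list:

```python
list_of_strings = [
    '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file',
    '/folder1/exp-*/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file',
    '/folder/blah/pow',
]

list_of_stringparts = []
for s in sorted(list_of_strings):
    head, sep, _ = s.partition('*')
    if sep:  # sep is '' when the string contains no '*'
        list_of_stringparts.append(head + sep)

print(list_of_stringparts)
```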
How do I add a dot into a Python list?
For example
groups = [0.122, 0.1212, 0.2112]
If I want to output this data, how would I make it so it is like
122, 1212, 2112
I tried write(groups...[0]) and further research but didn't get far. Thanks.
[str(g).split(".")[1] for g in groups]
results in
['122', '1212', '2112']
Edit:
Use it like this:
groups = [0.122, 0.1212, 0.2112]
decimals = [str(g).split(".")[1] for g in groups]
You could use a list comprehension that returns a list of strings:
groups = [0.122, 0.1212, 0.2112]
[str(x).split(".")[1] for x in groups]
Result
['122', '1212', '2112']
The list comprehension is doing the following:
Turn each list element into a string
Split the string about the "." character
Take the substring to the right of the "."
Collect the results into a new list
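To get the exact "122, 1212, 2112" output from the question, the list can be joined into one string. This sketch swaps split(".")[1] for str.partition, which returns an empty string instead of raising IndexError when a value has no dot. It still relies on str() of a float, so very small values that Python prints in exponent form (such as 1e-05) would not give a useful result.

```python
groups = [0.122, 0.1212, 0.2112]

# partition('.') returns (before, '.', after); [2] is the part after the dot.
decimals = [str(g).partition('.')[2] for g in groups]
print(', '.join(decimals))  # 122, 1212, 2112
```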
This should do it:
import re
groups = [0.122, 0.1212, 0.2112]
groups_str = ", ".join([str(x) for x in groups])
re.sub('[0-9]*[.]', "", groups_str)
[str(x) for x in groups] converts each item to a string.
", ".join joins those strings into a single string, separated by ", ".
import re gives access to Python's regular expression module.
re.sub then replaces every run of digits followed by a dot (here, each "0.") with an empty string, giving '122, 1212, 2112'.
EDIT (no extra modules):
Building on Lutz's answer, this also works when an item is an integer (no dot):
decimals = [str(g).split("0.") for g in groups]
decimals = [i for x in decimals for i in x if i != '']
It won't work though when you have numbers like 11.11, where there is a part you don't want to ignore in front of the dot.
Hi, I am trying to split a text.
For example:
'C:/bye1.txt'
I would want 'C:/bye1.txt' only
'C:/bye1.txt C:/hello1.txt'
I would want C:/hello1.txt only
'C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'
I would want C:/bye3, and so on.
Code I tried; the problem is that it only prints out y.
x = "Hello i am boey"
Last = x[len(x)-1]
print(Last)
Look at this:
>>> x = 'C:/bye1.txt'
>>> x.split()[-1]
'C:/bye1.txt'
>>> y = 'C:/bye1.txt C:/hello1.txt'
>>> y.split()[-1]
'C:/hello1.txt'
>>> z = 'C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'
>>> z.split()[-1]
'C:/bye3'
>>>
Basically, you split the strings using str.split and then get the last item (that's what [-1] does).
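When only the last item is needed, str.rsplit with maxsplit=1 splits once from the right instead of building the full list. Called with no separator, it treats runs of whitespace the same way split() does. A sketch; last_token is a hypothetical helper name, and an empty or all-whitespace string would raise IndexError here.

```python
def last_token(text):
    """Return the last whitespace-separated token of text."""
    # maxsplit=1: split only once, starting from the right-hand end.
    return text.rsplit(maxsplit=1)[-1]


print(last_token('C:/bye1.txt'))                                # C:/bye1.txt
print(last_token('C:/bye1.txt C:/hello1.txt'))                  # C:/hello1.txt
print(last_token('C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'))  # C:/bye3
```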
x = "Hello i am boey".split()
Last = x[len(x)-1]
print(Last)
Though more Pythonic:
x = "Hello i am boey".split()
print(x[-1])
Try:
>>> x = 'C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'
>>> x.split()[-1]
'C:/bye3'
>>>
x[len(x)-1] returns the last character of the string (indexing a string gives single characters). It is the same as x[-1].
To get the output you want, do this:
x.split()[-1]
If your items are separated by some other delimiter, you can pass it to split like so:
delimiter = ','
item = x.split(delimiter)[-1]  # split the string on the delimiter and take the last item
In addition, the strip(), lstrip(), and rstrip() methods can be useful for removing unwanted characters from both ends, the start, or the end of your string, respectively. See the Python documentation for details.
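For example, with a comma-separated string that has spaces after the commas and a trailing comma, rstrip and strip can be combined with split like this (the sample string is made up for illustration):

```python
x = 'red, green, blue,'
delimiter = ','

# rstrip(',') drops the trailing comma so split() does not end with an empty item;
# strip() then removes the space that followed the previous comma.
item = x.rstrip(delimiter).split(delimiter)[-1].strip()
print(item)  # blue
```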
Answer:
'C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'.split()[-1]
would give you 'C:/bye3'.
Details:
Called without arguments, split() splits on runs of whitespace. In the example above, it returns the following list:
['C:/bye1.txt', 'C:/hello1.txt', 'C:/bye2', 'C:/bye3']
An index of -1 selects the last item of the list (counting from the back).