I am new to Python, and am trying to filter a string that looks similar to this:
"{Red,Plant,Eel}{Blue,Animal,Maple}{Yellow,Plant,Crab}"
And so on for 100s of three word sets.
I want to extract the second word from every set marked by "{ }", so in this example I want the output:
"Plant,Animal,Plant"
And so on.
How can I do it efficiently?
Right now I am using string.split(",")[1] individually for each "{ }" group.
Thanks.
This does the trick:
str_ = "{Red,Plant,Eel}{Blue,Animal,Maple}{Yellow,Plant,Crab}"
res = [x.split(',')[1] for x in str_[1:-1].split('}{')]
and produces
['Plant', 'Animal', 'Plant']
With str_[1:-1] we remove the initial "{" and the trailing "}", and we then split the remainder on every instance of "}{", producing:
["Red,Plant,Eel", "Blue,Animal,Maple", ...]
finally, for every string, we split on "," to obtain
[["Red", "Plant", "Eel"], ...]
from which we keep only the second element of each sublist with [1].
Note that for your specific purpose, slicing the original string with str_[1:-1] is not mandatory (it works without it as well), because the braces only stick to the first field of the first group and the last field of the last group. If you wanted the first item instead of the second, it would make a difference, and the same holds if you wanted the third.
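To see why the slice matters for the first or third item, here is a small sketch. The helper name nth_words and its strip_braces flag are made up for illustration; they are not part of the answer above.

```python
str_ = "{Red,Plant,Eel}{Blue,Animal,Maple}{Yellow,Plant,Crab}"

def nth_words(s, n, strip_braces=True):
    # Hypothetical helper: optionally drop the outer braces, then split into groups
    body = s[1:-1] if strip_braces else s
    return [group.split(',')[n] for group in body.split('}{')]

first_sliced = nth_words(str_, 0)
first_unsliced = nth_words(str_, 0, strip_braces=False)
third_unsliced = nth_words(str_, 2, strip_braces=False)

print(first_sliced)    # ['Red', 'Blue', 'Yellow']
print(first_unsliced)  # ['{Red', 'Blue', 'Yellow']
print(third_unsliced)  # ['Eel', 'Maple', 'Crab}']
```

Without the slice, the stray "{" and "}" stay attached to the first and last words, which is harmless only when you pick the middle field.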
If you want to concatenate the strings of the output to match your desired result, you can simply pass the resulting list to .join as follows:
out = ','.join(res)
which then gives you
"Plant,Animal,Plant"
Try this:
[i.split(',')[1] for i in str_[1:].split('}')[:len(str_.split('}'))-1]]
Another solution is to use a regex. It is a bit more complicated, but it's a technique worth talking about:
import re
input_string = "{Red,Plant,Eel}{Blue,Animal,Maple}{Yellow,Plant,Crab}"
regex_string = r"\{\w+,(\w+),\w+\}"
result_list = re.findall(regex_string, input_string)
The resulting result_list is:
['Plant', 'Animal', 'Plant']
here's a link for regex in python
and an online regex editor
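One caveat with \w+: it only matches letters, digits, and underscores, so a group containing a space or a hyphen would be skipped. If your real data can contain such characters (an assumption; the question only shows plain words), a character class that excludes the delimiters may be a safer sketch. The sample string below is invented for illustration:

```python
import re

# Invented sample with a space and a hyphen inside the fields
input_string = "{Red,Plant,Eel}{Dark Blue,Sea-Animal,Maple}"

# Each field is "anything except a comma or a brace"; the second field is captured
pattern = re.compile(r"\{[^,{}]*,([^,{}]*),[^{}]*\}")
second_fields = pattern.findall(input_string)

print(second_fields)  # ['Plant', 'Sea-Animal']
```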
#!/bin/python3
string = "{Red,Plant,Eel}{Blue,Animal,Maple}{Yellow,Plant,Crab}"
a = string.replace('{','').replace('}',',').split(',')[1::3]
print(a)
result is
['Plant', 'Animal', 'Plant']
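To make the one-liner above easier to follow, here is the same idea with the intermediate list spelled out. It is only a sketch, and it relies on every group having exactly three fields:

```python
string = "{Red,Plant,Eel}{Blue,Animal,Maple}{Yellow,Plant,Crab}"

# Drop every "{" and turn every "}" into a comma, then split into one flat list.
# The trailing "}" leaves an empty string at the end of the list.
flat = string.replace('{', '').replace('}', ',').split(',')
print(flat)

# Start at index 1 (the second word) and take every third element
second_words = flat[1::3]
print(second_words)  # ['Plant', 'Animal', 'Plant']
```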
Related
I have a few lines of string e.g:
AR0003242303
TR0402304004
CR0402340404
I want to create a dictionary from these lines.
And I need to change them with a regex to:
KOLAORM0003242303
KOLTORM0402304004
KOLCORM0402340404
So I need to split off the first 2 characters, put KOL before them, put O between them, and put M after the second character. How can I achieve this? After many attempts I have lost patience with regex, and unfortunately I have no time to learn it better right now. I need a result now :(
Could someone help me with this case?
Using re.sub --> re.sub(r"^([A-Z])([A-Z])", r"KOL\1O\2M", string)
Ex:
import re
s = ["AR0003242303", "TR0402304004", "CR0402340404"]
for i in s:
    print(re.sub(r"^([A-Z])([A-Z])", r"KOL\1O\2M", i))
Output:
KOLAORM0003242303
KOLTORM0402304004
KOLCORM0402340404
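If the lines arrive as one multi-line string rather than a list (an assumption about your input), the same substitution can be applied in a single call by adding re.MULTILINE, so that ^ matches at the start of every line. A minimal sketch:

```python
import re

# Assumed input: all lines in one string, separated by newlines
text = "AR0003242303\nTR0402304004\nCR0402340404"

# With re.MULTILINE, ^ anchors at the beginning of each line
converted = re.sub(r"^([A-Z])([A-Z])", r"KOL\1O\2M", text, flags=re.MULTILINE)
print(converted)
```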
You don't need regex for this. You can get the list of characters from the string, rebuild the list, and join it back into a string:
def get_convert_s(s):
    li = list(s)
    # Insert the letter 'O' (not the digit zero) between the first two characters
    li = ['KOL', li[0], 'O', li[1], 'M', *li[2:]]
    return ''.join(li)
print(get_convert_s('AR0003242303'))
#KOLAORM0003242303
print(get_convert_s('TR0402304004'))
#KOLTORM0402304004
print(get_convert_s('CR0402340404'))
#KOLCORM0402340404
import re
regex = re.compile(r"([A-Z])([A-Z])([0-9]+)")
inputs = [
'AR0003242303',
'TR0402304004',
'CR0402340404'
]
results = []
for text in inputs:
    matches = regex.match(text)
    groups = matches.groups()
    results.append('KOL{}O{}M{}'.format(*groups))
print(results)
Assuming the length of the strings in your list will always be the same, Devesh's answer is pretty much the best approach (no reason to overcomplicate it).
My solution is similar to Devesh's; I just like writing functions as one-liners:
strings = ["AR0003242303", "TR0402304004", "CR0402340404"]
def convert_s(s):
    # The letter 'O' goes between the first two characters
    return "KOL" + s[0] + "O" + s[1] + "M" + s[2:]
for s in strings:
    print(convert_s(s))
It returns the same output, though.
I have this code:
def add_peer_function():
    all_devices=[]
    all_devices.append("cisco,linux")
    print(all_devices)
add_peer_function()
Which results in :
['cisco,linux']
My question is how I can append to the list without the quotes, so that I get a result like this:
[cisco,router]
Well, I know two possible ways, but the first one is faster:
1:
def add_peer_function():
    all_devices=[item for item in "cisco,linux".split(',')] # or `all_devices = ["cisco", "linux"]`
    print(', '.join(all_devices)) # A prettier way to print list Thanks to Philo
add_peer_function()
2:
def add_peer_function():
    all_devices=[]
    for item in "cisco,linux".split(','): # or `all_devices = ["cisco", "linux"]`
        all_devices.append(item)
    print(', '.join(all_devices)) # A prettier way to print list Thanks to Philo
add_peer_function()
Python str.split documentation.
Python str.join documentation.
Python list comprehension documentation.
By default, Python prints a list using the repr of its elements, so strings appear between quotes.
If you want to get another format, you can write your own formatter.
For lists of strings, a common pattern in Python is:
my_list = ['one', 'two', 'three']
print(', '.join(my_list))
Replace ', ' with another separator if needed.
Finally, note that "cisco,linux" is just a string with a comma, which is different from a list of strings: ["cisco", "linux"].
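Putting these points together, a minimal sketch for getting bracketed output without quotes could look like this. The variable name formatted is only illustrative, and the bracket layout is an assumption based on the output asked for in the question:

```python
# Split the single string into a real list of two strings
all_devices = "cisco,linux".split(",")

# Build the display text by hand instead of relying on the list's default repr
formatted = "[" + ",".join(all_devices) + "]"

print(all_devices)  # ['cisco', 'linux']
print(formatted)    # [cisco,linux]
```

The list itself still holds ordinary strings; only the printed text changes.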
Of course, if you append the string 'cisco,linux' to a list, you get ['cisco,linux'] which is the string representation of this list in Python.
What you want is to split the string.
Try:
>>> 'cisco,linux'.split(',')
['cisco', 'linux']
append accepts only one argument, so your_list.append(something) adds something to your_list as a single element. You can, however, do something like this:
your_list += [el for el in "cisco,linux".split(",")]
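An equivalent way to add several elements at once is list.extend, which takes any iterable. A short sketch, where the starting entry "juniper" is a hypothetical value chosen only to show that existing items are kept:

```python
your_list = ["juniper"]  # hypothetical existing entry

# extend adds each element of the iterable, unlike append which adds one object
your_list.extend("cisco,linux".split(","))

print(your_list)  # ['juniper', 'cisco', 'linux']
```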
I have a string with a lot of occurrences of a single pattern like
a = 'eresQQQutnohnQQQjkhjhnmQQQlkj'
and I have another string like
b = 'rerTTTytu'
I want to substitute the entire second string, using 'QQQ' and 'TTT' as the reference points, and in this case I want to find 3 different results:
'ererTTTytuohnQQQjkhjhnmQQQlkj'
'eresQQQutnrerTTTytujhnmQQQlkj'
'eresQQQutnohnQQQjkhjrerTTTytu'
I've tried using re.sub
re.sub('\w{3}QQQ\w{3}' ,b,a)
but I obtain only the first one, and I don't know how to get the other two solutions.
Edit: As you requested, the two characters surrounding 'QQQ' will be replaced as well now.
I don't know if this is the most elegant or simplest solution for the problem, but it works:
import re
# Find all occurrences of ??QQQ?? in a - where ? is any non-whitespace character
matches = [x.start() for x in re.finditer(r'\S{2}QQQ\S{2}', a)]
# Replace each ??QQQ?? with b
results = [a[:idx] + re.sub(r'\S{2}QQQ\S{2}', b, a[idx:], 1) for idx in matches]
print(results)
Output
['errerTTTytunohnQQQjkhjhnmQQQlkj',
'eresQQQutnorerTTTytuhjhnmQQQlkj',
'eresQQQutnohnQQQjkhjhrerTTTytuj']
Since you didn't specify the output format, I just put it in a list.
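A possible variation, sketched under the same assumptions, is to use the start and end of each match directly instead of running re.sub a second time. This also avoids re.sub interpreting backslashes in b as escape sequences:

```python
import re

a = 'eresQQQutnohnQQQjkhjhnmQQQlkj'
b = 'rerTTTytu'

pattern = re.compile(r'\S{2}QQQ\S{2}')

# For each match, keep the text before and after it and put b in between
results = [a[:m.start()] + b + a[m.end():] for m in pattern.finditer(a)]
print(results)
```

For this input it produces the same three strings as the output shown above.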
This is how the string splitting works for me right now:
output = string.encode('UTF8').split('}/n}')[0]
output += '}\n}'
But I am wondering if there is a more pythonic way to do it.
The goal is to get everything before this '}/n}' including '}/n}'.
This might be a good use of str.partition.
string = '012za}/n}ddfsdfk'
parts = string.partition('}/n}')
# ('012za', '}/n}', 'ddfsdfk')
''.join(parts[:-1])
# 012za}/n}
Or, you can find it explicitly with str.index.
repl = '}/n}'
string[:string.index(repl) + len(repl)]
# 012za}/n}
This is probably better than using str.find since an exception will be raised if the substring isn't found, rather than producing nonsensical results.
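The difference is easy to see with an input that lacks the delimiter. The sample string here is invented for the demonstration:

```python
string = 'no delimiter here'  # invented input without the delimiter
repl = '}/n}'

# str.find returns -1 when nothing is found, so the slice end becomes -1 + 4 = 3
silent_result = string[:string.find(repl) + len(repl)]
print(repr(silent_result))  # 'no '

# str.index raises ValueError instead, which makes the failure visible
try:
    string[:string.index(repl) + len(repl)]
    index_raised = False
except ValueError:
    index_raised = True
print(index_raised)  # True
```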
It seems like anything "more elegant" would require regular expressions.
import re
re.search('(.*?}/n})', string).group(0)
# 012za}/n}
It can be done with re.split(); the key is putting parens around the split pattern to preserve what you split on:
import re
output = "".join(re.split(r'(}/n})', string.encode('UTF8'))[:2])
However, I doubt that this is either the most efficient or the most Pythonic way to achieve what you want, i.e. I don't think this is naturally a split sort of problem. For example:
tag = '}/n}'
encoded = string.encode('UTF8')
output = encoded[:encoded.index(tag)] + tag
or if you insist on a one-liner:
output = (lambda string, tag: string[:string.index(tag)] + tag)(string.encode('UTF8'), '}/n}')
or returning to regex:
output = re.match(r".*}/n}", string.encode('UTF8')).group(0)
>>> string_to_split = 'first item{\n{second item'
>>> sep = '{\n{'
>>> output = [item + sep for item in string_to_split.split(sep)]
NOTE: output = ['first item{\n{', 'second item{\n{']
then you can use the result:
for item_with_delimiter in output:
    ...
It might be useful to look up os.linesep if you're not sure what the line ending will be. os.linesep is whatever the line ending is under your current OS, so '\r\n' under Windows or '\n' under Linux or Mac. It depends where input data is from, and how flexible your code needs to be across environments.
Adapted from Slice a string after a certain phrase?, you can combine find and slice to get the first part of the string and retain }/n}.
str = "012za}/n}ddfsdfk"
str[:str.find("}/n}")+4]
Will result in 012za}/n}
Let's say I have a string that looks like this:
myStr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
What I would like to obtain in the end would be:
myStr_l1 = '(Txt_l1) or (Txt2_l1)'
and
myStr_l2 = '(Txt_l2) or (Txt2_l2)'
Some properties:
all "Txt_"-elements of the string start with an uppercase letter
the string can contain much more elements (so there could also be Txt3, Txt4,...)
the suffixes '_l1' and '_l2' look different in reality; they cannot be used for matching (I chose them for demonstration purposes)
I found a way to get the first part done by using:
myStr_l1 = re.sub('\(\w+\)','',myStr)
which gives me
'(Txt_l1 ) or (Txt2_l1 )'
However, I don't know how to obtain myStr_l2. My idea was to remove everything between two open parentheses. But when I do something like this:
re.sub('\(w+\(', '', myStr)
the entire string is returned.
re.sub('\(.*\(', '', myStr)
removes - of course - far too much and gives me
'Txt2_l2))'
Does anyone have an idea how to get myStr_l2?
When there is an "and" instead of an "or", the strings look slightly different:
myStr2 = '(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2))'
Then I can still use the command from above:
re.sub('\(\w+\)','',myStr2)
which gives:
'(Txt_l1 and Txt2_l1 )'
but I again fail to get myStr2_l2. How would I do this for these kind of strings?
And how would one then do this for mixed expressions with "and" and "or" e.g. like this:
myStr3 = '(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2)) or (Txt3_l1 (Txt3_l2) and Txt4_l1 (Txt2_l2))'
re.sub('\(\w+\)','',myStr3)
gives me
'(Txt_l1 and Txt2_l1 ) or (Txt3_l1 and Txt4_l1 )'
but again: How would I obtain myStr3_l2?
Regular expressions are not powerful enough for arbitrarily nested expressions (in your case: nested elements in parentheses). You will have to write a parser. Look at https://pyparsing.wikispaces.com/
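As a rough illustration of the parser idea, here is a hand-written scanner that tracks the nesting depth. It does not use pyparsing, and the function name words_by_depth is made up for this sketch. It only covers the pure "or" case from the question; in the "and" variants the operator sits inside the parentheses, so the rebuilding step would need extra handling. It also does not check that the parentheses are balanced.

```python
from collections import defaultdict

def words_by_depth(text):
    """Group whitespace-separated words by parenthesis nesting depth."""
    depth = 0
    buffer = ""
    words = defaultdict(list)
    for ch in text:
        if ch in "() \t\n":
            # A delimiter ends the current word, if any
            if buffer:
                words[depth].append(buffer)
                buffer = ""
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
        else:
            buffer += ch
    if buffer:
        words[depth].append(buffer)
    return dict(words)

myStr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
levels = words_by_depth(myStr)

# Rebuild one string per nesting level, joined with the top-level operator
myStr_l1 = ' or '.join('({})'.format(w) for w in levels[1])
myStr_l2 = ' or '.join('({})'.format(w) for w in levels[2])
print(myStr_l1)  # (Txt_l1) or (Txt2_l1)
print(myStr_l2)  # (Txt_l2) or (Txt2_l2)
```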
I'm not entirely sure what you want, but I wrote this to strip everything between the parentheses.
import re
mystr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
sets = mystr.split(' or ')
noParens = []
for line in sets:
    mat = re.match(r'\((.* )\((.*\)\))', line, re.M)
    if mat:
        noParens.append(mat.group(1))
        noParens.append(mat.group(2).replace(')',''))
print(noParens)
This takes all the parentheses away and puts your elements in a list. Here's an alternate way of doing it without using regular expressions.
mystr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
noParens = []
mystr = mystr.replace(' or ', ' ')
mystr = mystr.replace(')','')
mystr = mystr.replace('(','')
noParens = mystr.split()
print(noParens)