match the pattern at the end of a string? - python

Imagine I have the following strings:
['a','b','c_L1', 'c_L2', 'c_L3', 'd', 'e', 'e_L1', 'e_L2']
Where the "c" string has important sub-categories (L1, L2, L3). These indicate special data for our purposes that have been generated in a program based on a pre-designated string "L". In other words, I know that the special entries should have the form:
name_Lnumber
Knowing that I'm looking for this pattern, and that I am using "L" or more specifically "_L" as my designation of these objects, how could I return a list of entries that meet this condition? In this case:
['c', 'e']

Use a simple filter:
>>> l = ['a','b','c_L1', 'c_L2', 'c_L3', 'd', 'e', 'e_L1', 'e_L2']
>>> filter(lambda x: "_L" in x, l)
['c_L1', 'c_L2', 'c_L3', 'e_L1', 'e_L2']
Alternatively, use a list comprehension
>>> [s for s in l if "_L" in s]
['c_L1', 'c_L2', 'c_L3', 'e_L1', 'e_L2']
Since you need the prefix only, you can just split it:
>>> set(s.split("_")[0] for s in l if "_L" in s)
set(['c', 'e'])

You can use the following generator expression inside set():
>>> set(i.split('_')[0] for i in l if '_L' in i)
set(['c', 'e'])
Or, if you want to match only the elements that end with _L followed by a digit, and not something like _Lm, you can use a regex:
>>> import re
>>> set(i.split('_')[0] for i in l if re.match(r'.*?_L\d$',i))
set(['c', 'e'])

Related

Splitting string values in list into individual values, Python

I have a list of values in which some values are words separated by commas, but are considered single strings as shown:
l = ["a",
"b,c",
"d,e,f"]
#end result should be
#new_list = ['a','b','c','d','e','f']
I want to split those strings and was wondering if there's a one-liner or something short to do such a mutation. So far, I was thinking of just iterating through l, calling .split(',') on each element, and then merging the results, but that seems like it would take a while to run.
import itertools
new_list = list(itertools.chain(*[x.split(',') for x in l]))
print(new_list)
>>> ['a', 'b', 'c', 'd', 'e', 'f']
Kind of unusual but you could join all your elements with , and then split them:
l = ["a",
"b,c",
"d,e,f"]
newList = ','.join(l).split(',')
print(newList)
Output:
['a', 'b', 'c', 'd', 'e', 'f']
Here's a one-liner using a (nested) list comprehension:
new_list = [item for csv in l for item in csv.split(',')]
Not exactly a one-liner, but 2 lines:
>>> l = ["a",
"b,c",
"d,e,f"]
>>> ll =[]
>>> [ll.extend(x.split(',')) for x in l]
[None, None, None]
>>> ll
['a', 'b', 'c', 'd', 'e', 'f']
The accumulator needs to be created separately since x.split(',') can not be unpacked inside a comprehension.
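As a possible alternative to the side-effect comprehension, itertools.chain.from_iterable can flatten the split pieces without a separate accumulator. This is a minimal sketch; the helper name split_all and its sep parameter are assumptions introduced for illustration.

```python
from itertools import chain

def split_all(values, sep=","):
    """Split every string in values on sep and flatten the pieces into one list."""
    # chain.from_iterable consumes the generator lazily, one split result at a time
    return list(chain.from_iterable(v.split(sep) for v in values))

l = ["a", "b,c", "d,e,f"]
new_list = split_all(l)
print(new_list)  # ['a', 'b', 'c', 'd', 'e', 'f']
```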

Converting all letters in a document from one list to another in Python

I have two lists
list1 = ['a', 'b', 'c']
list2 = ['d', 'e', 'f']
I have a variable with random text in it.
var = 'backout'
I want to convert all letters in the variable that exist in list1 to the letters in list2.
expectedOutput = 'edfkout'
Is there a way to do this?
You want to use str.translate using a translation table from string.maketrans (This is str.maketrans in Python 3)
from string import maketrans
s1 = 'abc'
s2 = 'def'
table = maketrans(s1, s2)
print('backout'.translate(table))
Edit:
Note that we have to use strings instead of lists as our arguments to maketrans.
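For Python 3, a minimal sketch of the same idea uses str.maketrans and joins the lists from the question into strings first. The variable names table and result are illustrative only.

```python
list1 = ['a', 'b', 'c']
list2 = ['d', 'e', 'f']

# str.maketrans(x, y) needs two strings of equal length, so join the lists first
table = str.maketrans(''.join(list1), ''.join(list2))

# translate() maps every character in a single pass, so replacements never chain
result = 'backout'.translate(table)
print(result)  # edfkout
```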
We can map the keys to values using zip() wrapped in dict(), and then iterate over the letters, mapping each one to its counterpart and falling back to the letter itself when it is not found:
keys = ['a', 'b', 'c']
values = ['d', 'e', 'f']
mapper = dict(zip(keys, values))
var = 'backout'
output = "".join([mapper.get(k, k) for k in var])
print(output) # edfkout
Convert each character to its code point using map(ord, list1) and build the one-to-one mapping between list1 and list2 using zip:
tbl = dict(zip(map(ord,list1), map(ord,list2)))
var.translate(tbl)
Output:
edfkout
For people who prefer a shorter but more complicated solution (instead of using the translate() method), here it is:
list1 = ['a', 'b', 'c']
list2 = ['d', 'e', 'f']
var = 'backout'
trans_dict = dict(zip(list1, list2))
out_string = ''.join([trans_dict.get(ch, ch) for ch in var])
The explanation:
dict(zip(list1, list2))
creates the {'a': 'd', 'b': 'e', 'c': 'f'} dictionary.
trans_dict.get(ch, ch)
The 1st argument is a key - if it is found in keys, we obtain its value: trans_dict[ch]
The 2nd argument is a default value - used if the first argument is not found in keys. We obtain ch.
[trans_dict.get(ch, ch) for ch in var]
is a list comprehension: it builds a list by starting from an empty list and appending one element per iteration of the for loop.
''.join(list_of_string)
is a standard way for concatenating individual elements of the list (in our case, the individual characters).
(Instead of the empty string there may be an arbitrary string - it is used for delimiting individual elements in the concatenated string)

Split the word by each character python 3.5 [duplicate]

This question already has answers here:
How do I split a string into a list of characters?
(15 answers)
Closed 2 years ago.
I tried all the methods suggested by others, but it's not working.
Methods like str.split() and lst = list("abcd") throw an error saying [TypeError: 'list' object is not callable]
I want to convert a string into a list with one element per character of the word.
The input str = "abc" should give list = ['a','b','c']
I want to get the characters of the str in the form of a list.
Expected output: ['a','b','c','d','e','f'], but it's giving ['abcdef']
str = "abcdef"
l = str.split()
print(l)
First, don't use list as a variable name. It will prevent you from doing what you want, because it will shadow the list class name.
You can do this by simply constructing a list from the string:
l = list('abcdef')
sets l to the list ['a', 'b', 'c', 'd', 'e', 'f']
First of all, don't use list as the name of a variable in your program. It is the name of a built-in type in Python (not a keyword), so rebinding it hides the built-in, which is not good practice.
If you had,
s = 'a b c d e f g'
then,
words = s.split()
print(words)
>>>['a', 'b', 'c', 'd', 'e', 'f', 'g']
Since split by default splits on whitespace, it will give what you need.
In your case, you can just use,
print(list('abcdef'))
>>>['a', 'b', 'c', 'd', 'e', 'f']
Q. "I want to convert string to list for each character in the word"
A. You can use a simple list comprehension.
Input:
new_str = "abcdef"
[character for character in new_str]
Output:
['a', 'b', 'c', 'd', 'e', 'f']
Just use a for loop.
input:::
s = "abc"
li = []
for i in s:
    li.append(i)
print(li)
# use the list function instead of a for loop
print(list(s))
output:::
['a', 'b', 'c']
['a', 'b', 'c']
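The TypeError from the question can be reproduced in isolation. The sketch below is illustrative only, and both function names are made up: the first function builds the character list normally, while the second rebinds the name list inside its own scope and then fails in the way the question describes.

```python
def chars(word):
    """Return the characters of word as a list, using the built-in list type."""
    return list(word)

def chars_with_shadowing(word):
    """Show what happens once the name list refers to a list object instead of the type."""
    list = ["not", "the", "built-in"]  # this local name now hides the built-in list
    try:
        return list(word)              # calling a list object raises TypeError
    except TypeError as exc:
        return str(exc)

print(chars("abc"))                 # ['a', 'b', 'c']
print(chars_with_shadowing("abc"))  # 'list' object is not callable
```

In an interactive session, running del list after such an accidental assignment removes the shadowing name so that the built-in is visible again.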

Split a string in Python having parenthesis (multiple splitters)

I have a string, for example:
"ab(abcds)kadf(sd)k(afsd)(lbne)"
I want to split it to a list such that the list is stored like this:
a
b
abcds
k
a
d
f
sd
k
afsd
lbne
I need to get the elements outside the parenthesis in separate rows and the ones inside it in separate ones.
I am not able to think of any solution to this problem.
You can use iter to make an iterator and use itertools.takewhile to extract the strings between the parens:
it = iter(s)
from itertools import takewhile
print([ch if ch != "(" else "".join(takewhile(lambda x: x!= ")",it)) for ch in it])
['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
If ch is not equal to ( we just take the char else if ch is a ( we use takewhile which will keep taking chars until we hit a ) .
Or, using re.findall, get all strings enclosed in parentheses with \((.+?)\) and all other characters with (\w):
print([''.join(tup) for tup in re.findall(r'\((.+?)\)|(\w)', s)])
['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
You just need to use the magic of 're.split' and some logic.
import re
string = "ab(abcds)kadf(sd)k(afsd)(lbne)"
temp = []
x = re.split(r'[(]', string)
# x = ['ab', 'abcds)kadf', 'sd)k', 'afsd)', 'lbne)']
for i in x:
    if ')' not in i:
        # no closing parenthesis: every character is its own element
        temp.extend(list(i))
    else:
        t = re.split(r'[)]', i)
        temp.append(t[0])        # text inside the parentheses stays whole
        temp.extend(list(t[1]))  # text after ')' is split into characters
print(temp)
# temp = ['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
Note the difference between append and extend: append adds its argument as a single element, while extend adds each item of its argument separately.
I hope this helps.
You have two options. The really easy one is to just iterate over the string. For example:
my_string = "ab(abcds)kadf(sd)k(afsd)(lbne)"
my_list = []
in_parens = False
buffer = ''
for char in my_string:
    if char == '(':
        in_parens = True
    elif char == ')':
        in_parens = False
        my_list.append(buffer)
        buffer = ''
    elif in_parens:
        buffer += char
    else:
        my_list.append(char)
The other option is regex.
I would suggest regex. It is worth practicing.
Try Python's re module. If you are new to re it may take a bit of time, but you can do all kinds of string manipulation once you get it.
import re
search_string = 'ab(abcds)kadf(sd)k(afsd)(lbne)'
re_pattern = re.compile(r'(\w)|\((\w*)\)')  # Match a single character or the characters in parentheses
print([x if x else y for x, y in re_pattern.findall(search_string)])
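The regex idea can also be wrapped in a reusable Python 3 function. This is a sketch under stated assumptions: parentheses are balanced and never nested, and the names TOKEN and split_groups are hypothetical. Unlike the \w-based patterns above, it treats any character outside parentheses as its own element, not only word characters.

```python
import re

# Either a parenthesised group (captured without the parentheses)
# or a single character that is not a parenthesis.
TOKEN = re.compile(r"\(([^()]*)\)|([^()])")

def split_groups(text):
    """Split text into single characters, keeping each (...) group as one item."""
    tokens = []
    for m in TOKEN.finditer(text):
        group, single = m.groups()
        # group is None when the match was a lone character outside parentheses
        tokens.append(group if group is not None else single)
    return tokens

print(split_groups("ab(abcds)kadf(sd)k(afsd)(lbne)"))
# ['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
```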

string splitting after every other comma in string in python

I have a string in which every word is separated by a comma. I want to split the string at every other comma in Python. How should I do this?
E.g., "xyz,abc,jkl,pqr" should give "xyzabc" as one string and "jklpqr" as another string.
It's probably easier to split on every comma, and then rejoin pairs
>>> original = 'a,1,b,2,c,3'
>>> s = original.split(',')
>>> s
['a', '1', 'b', '2', 'c', '3']
>>> alternate = list(map(''.join, zip(s[::2], s[1::2])))
>>> alternate
['a1', 'b2', 'c3']
Is that what you wanted?
Split, and rejoin.
So, to split:
In [173]: "a,b,c,d".split(',')
Out[173]: ['a', 'b', 'c', 'd']
And to rejoin:
In [193]: z = iter("a,b,c,d".split(','))
In [194]: [a+b for a,b in zip(*([z]*2))]
Out[194]: ['ab', 'cd']
This works because ([z]*2) is a list of two elements, both of which are the same iterator z. Thus, zip takes the first, then second element from z to create each tuple.
This also works as a oneliner, because in [foo]*n foo is evaluated only once, whether or not it is a variable or a more complex expression:
In [195]: [a+b for a,b in zip(*[iter("a,b,c,d".split(','))]*2)]
Out[195]: ['ab', 'cd']
I've also cut out a pair of brackets, because unary * has lower precedence than binary *.
Thanks to @pillmuncher for pointing out that this can be extended with izip_longest (zip_longest in Python 3) to handle lists with an odd number of elements:
In [214]: from itertools import izip_longest
In [215]: [a+b for a,b in izip_longest(*[iter("a,b,c,d,e".split(','))]*2, fillvalue='')]
Out[215]: ['ab', 'cd', 'e']
(See: http://docs.python.org/library/itertools.html#itertools.izip_longest )
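In Python 3 the same pairing trick might look like the sketch below, which uses itertools.zip_longest. The function name join_pairs and its sep parameter are assumptions made for illustration. Passing the same iterator twice is what makes each tuple consume two consecutive items.

```python
from itertools import zip_longest

def join_pairs(text, sep=","):
    """Split text on sep and glue consecutive pieces together two at a time."""
    parts = iter(text.split(sep))
    # Both arguments are the same iterator, so each tuple takes two successive pieces;
    # fillvalue="" pads the last tuple when the number of pieces is odd.
    return [a + b for a, b in zip_longest(parts, parts, fillvalue="")]

print(join_pairs("xyz,abc,jkl,pqr"))  # ['xyzabc', 'jklpqr']
print(join_pairs("a,b,c,d,e"))        # ['ab', 'cd', 'e']
```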
parts = "a,b,c,d".split(",")
print(["".join(parts[i:i+2]) for i in range(0, len(parts), 2)])
Just split on every comma, then combine it back:
splitList = someString.split(",")
joinedString = ','.join([splitList[i - 1] + splitList[i] for i in range(1, len(splitList), 2)])
Like this
"a,b,c,d".split(',')
