Python: substitute elements inside a list

I have the following code that filters and prints a list. The final output is JSON whose entries have the form name.example.com. I want to replace that with name.sub.example.com, but I'm having a hard time actually doing it. filterIP is a working bit of code that removes elements entirely, and I have been trying to reuse it to modify elements as well; it doesn't have to be handled this way, though.
import re
def filterIP(fullList):
    regexIP = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$')
    return filter(lambda i: not regexIP.search(i), fullList)
def filterSub(fullList2):
    regexSub = re.compile(r'example\.com, sub.example.com')
    return filter(lambda i: regexSub.search(i), fullList2)
groups = {key: filterSub(filterIP(list(set(items)))) for (key, items) in groups.iteritems()}
print(self.json_format_dict(groups, pretty=True))
This is what I get without filterSub:
"type_1": [
    "server1.example.com",
    "server2.example.com"
],
This is what I get with filterSub:
"type_1": [],
This is what I'm trying to get:
"type_1": [
    "server1.sub.example.com",
    "server2.sub.example.com"
],

The statement:
regexSub = re.compile(r'example\.com, sub.example.com')
doesn't do what you think it does. It creates a compiled regular expression that matches the string "example.com" followed by a comma, a space, the string "sub", an arbitrary character, the string "example", an arbitrary character, and the string "com". It does not create any sort of substitution.
Instead, you want to write something like this, using the re.sub function to perform the substitution and using map to apply it:
def filterSub(fullList2):
    regexSub = re.compile(r'example\.com')
    return map(lambda i: re.sub(regexSub, "sub.example.com", i),
               filter(lambda i: re.search(regexSub, i), fullList2))
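On Python 3, map and filter return lazy iterators, which the standard json module will not serialize as lists. A minimal sketch of the same filter-then-substitute idea written as a list comprehension, assuming every host of interest ends in ".example.com"; the pattern is anchored with $ and includes the leading dot so that a name such as "myexample.com" is left alone. filter_sub and EXAMPLE_DOMAIN are hypothetical names:

```python
import re

# Hypothetical pattern: ".example.com" only at the very end of the hostname.
EXAMPLE_DOMAIN = re.compile(r'\.example\.com$')

def filter_sub(hosts):
    """Return a real list of rewritten hostnames.

    Hosts that do not end in .example.com are dropped, mirroring the
    filter-then-map answer above.
    """
    return [EXAMPLE_DOMAIN.sub('.sub.example.com', h)
            for h in hosts if EXAMPLE_DOMAIN.search(h)]

print(filter_sub(['server1.example.com', 'server2.example.com', '10.0.0.1']))
# ['server1.sub.example.com', 'server2.sub.example.com']
```

Because the result is a plain list, it can go straight into a JSON encoder.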

If the examples are all truly as simple as those you listed, a regex is probably overkill. A simpler solution is to use the string methods .split and .join, which will likely be faster as well.
First split the url at the first period:
url = 'server1.example.com'
split_url = url.split('.', 1)
# ['server1', 'example.com']
Then rejoin the pieces with '.sub.' as the separator:
subbed_url = '.sub.'.join(split_url)
# 'server1.sub.example.com'
Of course, you can do the split and the join in a single expression:
'.sub.'.join(url.split('.', 1))
Or create a simple function:
def sub_url(url):
    return '.sub.'.join(url.split('.', 1))
To apply this to the list you can take several approaches.
A list comprehension:
subbed_list = [sub_url(url)
               for url in url_list]
Map it:
subbed_list = map(sub_url, url_list)  # in Python 3 this is a lazy map object; wrap it in list() to get a list
Or, my favorite, a generator expression:
gen_subbed = (sub_url(url)
              for url in url_list)
The last looks like a list comprehension, but it has the added benefit of not building a new list up front: it processes one element at a time as the generator is iterated. If you decide later that you do need a list, you can simply convert it:
subbed_list = list(gen_subbed)
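A quick Python 3 check, using hypothetical sample hostnames, that the three approaches produce the same result (map and the generator are wrapped in list() to compare them). It also shows the main caveat of the split/join approach: it does no filtering, so a dotted IP address gets rewritten too, while a name without any dot comes back unchanged:

```python
def sub_url(url):
    # Insert "sub" after the first label: 'server1.example.com' -> 'server1.sub.example.com'.
    # Any dotted string is rewritten, so filter out IP addresses beforehand.
    return '.sub.'.join(url.split('.', 1))

url_list = ['server1.example.com', 'server2.example.com']  # hypothetical sample

via_comprehension = [sub_url(url) for url in url_list]
via_map = list(map(sub_url, url_list))
via_generator = list(sub_url(url) for url in url_list)

print(via_comprehension)   # ['server1.sub.example.com', 'server2.sub.example.com']
print(sub_url('10.0.0.1'))   # 10.sub.0.0.1
print(sub_url('localhost'))  # localhost
```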

Related

How to search through a set(list) using if statement with regex

I'm trying to search through a large list, and from what I gather it's much faster to use a set rather than a plain list. My struggle is combining an if statement with a regex to find the string in the list.
I've tried the following:
import re
search = re.compile(r'\[edit\s')
if search in set(list):
    print('found')
I created a list that has '[edit interfaces]' as an element but it doesn't seem to locate it.
I think this might be what you are looking for:
import re
search = re.compile(r'\[edit\s')
l1 = ["that", '[edit interfaces]', "that"]
for v in set(l1):
    if search.search(v):
        print('found ' + v)
Your if statement asks whether the compiled pattern object itself is an element of the set, which it never is. You have to check each element of the set against the regular expression, which you can do with a simple check inside the for loop.
Since a regex search has to look at every element anyway, converting the list to a set gains you nothing here (a set only speeds up exact membership tests), so I think you can skip it and use Python's built-in filter function instead:
import re
search = re.compile(r'\[edit\s')
l1 = ["that", '[edit interfaces]', "that"]
res = filter(lambda x: search.search(x), l1)
print(list(res))
This avoids writing an explicit loop: filter still visits every element, but the iteration happens inside the built-in, which is fast.
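If all you need is the yes/no answer from the question's if statement, here is a sketch using the built-in any() with a generator expression. It stops at the first match instead of collecting every match into a list. contains_edit and EDIT_PATTERN are hypothetical names:

```python
import re

# Raw string, so \[ and \s reach the regex engine unchanged.
EDIT_PATTERN = re.compile(r'\[edit\s')

def contains_edit(lines):
    """Return True as soon as one element matches; later elements are not checked."""
    return any(EDIT_PATTERN.search(line) for line in lines)

l1 = ["that", "[edit interfaces]", "that"]
if contains_edit(l1):
    print('found')
```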

How to filter a list without converting to string or loop it

I've got an object of type list and a second object of type string.
I would like to filter for all values in the list object that do not match the value of the string object.
I have written a loop that goes through the list, uses a regex to find all the entries that do not match, and adds them to a new list.
This example uses the hostnames "ma-tsp-a01", "ma-tsp-a02" and "ma-tsp-a03".
Currently I do further work on this new list to create a clean list of hostnames.
import re
local_hostname = 'ma-tsp-a01'
profile_files = ['/path/to/file/TSP_D01_ma-tsp-a01\n',
                 '/path/to/file/TSP_D02_ma-tsp-a02\n',
                 '/path/to/file/TSP_ASCS00_ma-tsp-a01\n',
                 '/path/to/file/TSP_DVEBMGS03_ma-tsp-a03\n',
                 '/path/to/file/TSP_DVEBMGS01_ma-tsp-a01\n']
result_list = [local_hostname]
for list_obj in profile_files:
    if re.search(r".*\w{3}\_\w{1,7}\d{2}\_(?!" + local_hostname + r").*", list_obj):
        # keep the part after the last "_" of the file name, without the newline
        result_list.append(list_obj.split("/")[-1].splitlines()[0].split("_")[-1])
print(result_list)
At the end I get the following output:
['ma-tsp-a01', 'ma-tsp-a02', 'ma-tsp-a03']. This is exactly what I am looking for. But is there a more Pythonic way to do this without the for loop?
You can create a filter object:
filtered = filter(lambda x: re.search(r".*\w{3}\_\w{1,7}\d{2}\_(?!" + local_hostname + r").*", x), profile_files)
Or use a generator expression:
filtered = (x for x in profile_files if re.search(r".*\w{3}\_\w{1,7}\d{2}\_(?!" + local_hostname + r").*", x))
Both behave the same: each lazily yields the matching paths (not yet the bare hostnames) as you iterate over it.
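A sketch that goes one step further and also extracts the hostnames, still without a for statement (a list comprehension necessarily iterates too; it just hides the loop). It assumes, as in the question, that the hostname is everything after the last underscore of each line. re.escape is an addition that guards against regex metacharacters in local_hostname, and the leading and trailing .* are dropped because re.search does not need them:

```python
import re

local_hostname = 'ma-tsp-a01'
profile_files = ['/path/to/file/TSP_D01_ma-tsp-a01\n',
                 '/path/to/file/TSP_D02_ma-tsp-a02\n',
                 '/path/to/file/TSP_ASCS00_ma-tsp-a01\n',
                 '/path/to/file/TSP_DVEBMGS03_ma-tsp-a03\n',
                 '/path/to/file/TSP_DVEBMGS01_ma-tsp-a01\n']

# Same structure as the question's pattern: XXX_<1-7 word chars><2 digits>_
# not followed by the local hostname.
pattern = re.compile(r'\w{3}_\w{1,7}\d{2}_(?!' + re.escape(local_hostname) + r')')

# strip() removes the trailing newline; rsplit keeps the text after the last "_".
result_list = [local_hostname] + [
    path.strip().rsplit('_', 1)[-1]
    for path in profile_files
    if pattern.search(path)
]
print(result_list)  # ['ma-tsp-a01', 'ma-tsp-a02', 'ma-tsp-a03']
```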

All permutations of string using formatting

I have a string template and I want to generate filenames from it. It uses percent formatting with named placeholders right now, and there can be any number of parts to be replaced.
template = "image_%(uval)02d_%(vval)02d.%(frame)04d.tif"
I have a dict that maps each placeholder name to a list of values:
params = {
    "uval": [1, 2],
    "vval": [1, 2],
    "frame": [10, 11]
}
And I want to generate all permutations (strictly speaking, every combination of one value per key) with formatting:
[
"image_01_01.0010.tif",
"image_01_01.0011.tif",
"image_01_02.0010.tif",
"image_01_02.0011.tif",
"image_02_01.0010.tif",
"image_02_01.0011.tif",
"image_02_02.0010.tif",
"image_02_02.0011.tif"
]
So I tried this:
def permutations(template, params):
    # loop through params, each time replacing expanded with the
    # new list of resolved filenames.
    expanded = [template]
    for param in params:
        newlist = []
        for filename in expanded:
            for number in params[param]:
                newlist.append(filename % {param: number})
        expanded = newlist
    return expanded
print(permutations(template, params))
And the problem is:
newlist.append(filename % {param: number})
KeyError: 'uval'
Because each formatting step supplies only one key, every other placeholder still in the template has no value, so % raises KeyError. Ideally, replacing one key would leave the rest of the template untouched.
It works fine if there's only one placeholder of course:
template = "image.%(frame)04d.tif"
params = {"frame": [10, 11]}
print(permutations(template, params))
Result: ['image.0010.tif', 'image.0011.tif']
I don't mind using a different system, but ideally I want the template string to be expressive and easy to reason about.
Ideas welcome
I'd use itertools.product to select the parameters, and for each combination, build a single dictionary to use in a formatting step that replaces all the placeholders at once:
import itertools
def permutations(template, params):
    for vals in itertools.product(*params.values()):
        substitution_dict = dict(zip(params, vals))
        yield template % substitution_dict
This is a generator function, so it returns an iterator rather than a list of results. In order to print it, you'll need to pass the iterator to list first. But if your real code is going to do something else (like looping over the results in a for loop, doing something with each one), you may not need to create the list at all. You can just loop on the iterator from the generator function directly.
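A runnable usage sketch of that generator on the question's data, assuming Python 3.7+, where dicts keep insertion order. Since itertools.product varies its last input fastest, "frame" changes fastest and the output comes out in the same order as the list in the question:

```python
import itertools

def permutations(template, params):
    # One dict per combination, so every placeholder is filled in a single % step.
    for vals in itertools.product(*params.values()):
        substitution_dict = dict(zip(params, vals))
        yield template % substitution_dict

template = "image_%(uval)02d_%(vval)02d.%(frame)04d.tif"
params = {
    "uval": [1, 2],
    "vval": [1, 2],
    "frame": [10, 11],
}

names = list(permutations(template, params))  # materialize the generator
for name in names[:3]:
    print(name)
# image_01_01.0010.tif
# image_01_01.0011.tif
# image_01_02.0010.tif
```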

How can I sort list of strings in specific order?

Let's say I have a list like this:
['word_4_0_w_7',
'word_4_0_w_6',
'word_3_0_w_10',
'word_3_0_w_2']
and I want to sort it by the number that comes after "word" and then by the number after "w".
The result should look like this:
['word_3_0_w_2',
'word_3_0_w_10',
'word_4_0_w_6',
'word_4_0_w_7']
What comes to mind is to create a bunch of lists, one for each number after "word", fill each with its strings sorted by the number after "w", and then merge them.
Is there a cleverer way to do this in Python?
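Use the key parameter of sorted() and list.sort(), in conjunction with the other answers:
def mykey(value):
    ls = value.split("_")
    return int(ls[1]), int(ls[-1])
newlist = sorted(firstlist, key=mykey)
## or, if you want it in place:
firstlist.sort(key=mykey)
A key function is more efficient than a cmp function: Python calls the key once per element instead of once per comparison (and Python 3 no longer accepts cmp at all).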
You can pass a key function to the sort() method of list objects:
l = ['word_4_0_w_7',
     'word_4_0_w_6',
     'word_3_0_w_10',
     'word_3_0_w_2']
def my_key_func(x):
    xx = x.split("_")
    return (int(xx[1]), int(xx[-1]))
l.sort(key=my_key_func)
Output:
print(l)
['word_3_0_w_2', 'word_3_0_w_10', 'word_4_0_w_6', 'word_4_0_w_7']
Edit: changed the code according to a comment by @dwanderson.
You can use a function to extract the relevant parts of your string and then use those parts to sort:
a = ['word_4_0_w_7', 'word_4_0_w_6', 'word_3_0_w_10', 'word_3_0_w_2']
def sort_func(x):
    parts = x.split('_')
    sort_key = parts[1] + parts[2] + "%02d" % int(parts[4])
    return sort_key
a_sorted = sorted(a, key=sort_func)
The expression "%02d" % int(x.split('_')[4]) adds a leading zero in front of the second number; otherwise the string '10' would sort before '2'. You may have to do the same with the numbers extracted by x.split('_')[1] and x.split('_')[2] if they can have more than one digit.
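The tuple-of-ints keys in the first two answers avoid the zero-padding problem entirely. If the strings do not always have the same number of underscore-separated fields, a common alternative is a "natural sort" key. This technique is swapped in here rather than taken from the answers: it splits each string into text and digit runs and compares the digit runs as integers. natural_key is a hypothetical helper:

```python
import re

def natural_key(s):
    """Split s into alternating text and digit chunks; digit chunks become ints.

    re.split with a capturing group always puts text at even indices and the
    captured digit runs at odd indices, so chunks at the same position always
    have the same type and compare cleanly.
    """
    return [int(chunk) if i % 2 else chunk
            for i, chunk in enumerate(re.split(r'(\d+)', s))]

words = ['word_4_0_w_7', 'word_4_0_w_6', 'word_3_0_w_10', 'word_3_0_w_2']
print(sorted(words, key=natural_key))
# ['word_3_0_w_2', 'word_3_0_w_10', 'word_4_0_w_6', 'word_4_0_w_7']
```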

How to compare an element of a tuple (int) to determine if it exists in a list

I have the following two lists:
# List of tuples representing the index of resources and their unique properties
# Format of (ID,Name,Prefix)
resource_types=[('0','Group','0'),('1','User','1'),('2','Filter','2'),('3','Agent','3'),('4','Asset','4'),('5','Rule','5'),('6','KBase','6'),('7','Case','7'),('8','Note','8'),('9','Report','9'),('10','ArchivedReport',':'),('11','Scheduled Task',';'),('12','Profile','<'),('13','User Shared Accessible Group','='),('14','User Accessible Group','>'),('15','Database Table Schema','?'),('16','Unassigned Resources Group','#'),('17','File','A'),('18','Snapshot','B'),('19','Data Monitor','C'),('20','Viewer Configuration','D'),('21','Instrument','E'),('22','Dashboard','F'),('23','Destination','G'),('24','Active List','H'),('25','Virtual Root','I'),('26','Vulnerability','J'),('27','Search Group','K'),('28','Pattern','L'),('29','Zone','M'),('30','Asset Range','N'),('31','Asset Category','O'),('32','Partition','P'),('33','Active Channel','Q'),('34','Stage','R'),('35','Customer','S'),('36','Field','T'),('37','Field Set','U'),('38','Scanned Report','V'),('39','Location','W'),('40','Network','X'),('41','Focused Report','Y'),('42','Escalation Level','Z'),('43','Query','['),('44','Report Template ','\\'),('45','Session List',']'),('46','Trend','^'),('47','Package','_'),('48','RESERVED','`'),('49','PROJECT_TEMPLATE','a'),('50','Attachments','b'),('51','Query Viewer','c'),('52','Use Case','d'),('53','Integration Configuration','e'),('54','Integration Command','f'),('55','Integration Target','g'),('56','Actor','h'),('57','Category Model','i'),('58','Permission','j')]
# This is a list of resource ID's that we do not want to reference directly, ever.
unwanted_resource_types=[0,1,3,10,11,12,13,14,15,16,18,20,21,23,25,27,28,32,35,38,41,47,48,49,50,57,58]
I'm attempting to compare the two in order to build a third list containing the 'Name' of each resource type whose ID appears in unwanted_resource_types. For example, the final result list should be:
result = ['Group','User','Agent','ArchivedReport','Scheduled Task','...','...']
I've tried the following, which I thought should work:
result = []
for res in resource_types:
    if res[0] in unwanted_resource_types:
        result.append(res[1])
and when that failed to populate result I also tried:
result = []
for res in resource_types:
    for type in unwanted_resource_types:
        if res[0] == type:
            result.append(res[1])
That didn't work either. Is there something I'm missing? I believe this would be the right place for a list comprehension, but I don't fully understand those yet (the Python docs are a bit too succinct for me in this case).
I'm also open to completely rethinking this problem, but I do need to retain the list of tuples as it's used elsewhere in the script. Thank you for any assistance you may provide.
Your resource types are using strings, and your unwanted resources are using ints, so you'll need to do some conversion to make it work.
Try this:
result = []
for res in resource_types:
    if int(res[0]) in unwanted_resource_types:
        result.append(res[1])
or using a list comprehension:
result = [item[1] for item in resource_types if int(item[0]) in unwanted_resource_types]
The numbers in resource_types are numbers contained within strings, whereas the numbers in unwanted_resource_types are plain numbers, so your comparison is failing. This should work:
result = []
for res in resource_types:
    if int(res[0]) in unwanted_resource_types:
        result.append(res[1])
The problem is that your triples contain strings while your unwanted resources contain numbers. Either change the data to
resource_types=[(0,'Group','0'), ...
or use int() to convert the strings to ints before comparison, and it should work. Your result can be computed with a list comprehension as in
result=[rt[1] for rt in resource_types if int(rt[0]) in unwanted_resource_types]
If you change ('0', ...) into (0, ... you can leave out the int() call.
Additionally, you may change the unwanted_resource_types variable into a set, like
unwanted_resource_types=set([0,1,3, ... ])
to speed up the in test: a membership check on a set takes constant time on average, versus a linear scan for a list (this only matters if speed is an issue).
The one-liner:
result = map(lambda x: dict(map(lambda a: (int(a[0]), a[1]), resource_types))[x], unwanted_resource_types)
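does the job without any explicit loop.
OK, you don't want to use this in production code (it rebuilds the dictionary for every ID, and in Python 3 map returns a lazy iterator rather than a list), but it's fun. ;-)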
Comment:
The inner dict(map(lambda a: (int(a[0]), a[1]), resource_types)) creates a dictionary from the input data:
{0: 'Group', 1: 'User', 2: 'Filter', 3: 'Agent', ...
The outer map chooses the names from the dictionary.
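For real code, here is a sketch of the same dictionary idea that builds the lookup only once. resource_types below is a hypothetical excerpt of the question's list, with the IDs kept as strings exactly as in the question. IDs missing from the lookup are skipped rather than raising KeyError. Note that the result follows the order of unwanted_resource_types, while the loop-based answers follow the order of resource_types:

```python
# Hypothetical excerpt of the question's (ID, Name, Prefix) tuples; IDs are strings.
resource_types = [('0', 'Group', '0'), ('1', 'User', '1'), ('2', 'Filter', '2'),
                  ('3', 'Agent', '3'), ('10', 'ArchivedReport', ':'),
                  ('11', 'Scheduled Task', ';')]
unwanted_resource_types = [0, 1, 3, 10, 11]

# Build the int-ID -> name lookup once, instead of once per ID.
names_by_id = {int(rt[0]): rt[1] for rt in resource_types}

# Keep the order of unwanted_resource_types; skip IDs that have no entry.
result = [names_by_id[i] for i in unwanted_resource_types if i in names_by_id]
print(result)  # ['Group', 'User', 'Agent', 'ArchivedReport', 'Scheduled Task']
```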
