I want to produce a list of possible websites from two lists:
strings = ["string1", "string2", "string3"]
tlds = ["com', "net", "org"]
to produce the following output:
string1.com
string1.net
string1.org
string2.com
string2.net
string2.org
I've got to this:
for i in strings:
    print i + tlds[0:]
But I can't concatenate str and list objects. How can I join these?
itertools.product is designed for exactly this purpose:
import itertools

url_tuples = itertools.product(strings, tlds)
urls = ['.'.join(url_tuple) for url_tuple in url_tuples]
print(urls)
A (nested) list comprehension would be another alternative:
[s + '.' + tld for s in strings for tld in tlds]
The itertools module provides a function that does this.
from itertools import product
urls = [".".join(elem) for elem in product(strings, tlds)]
The urls variable now holds this list:
['string1.com',
'string1.net',
'string1.org',
'string2.com',
'string2.net',
'string2.org',
'string3.com',
'string3.net',
'string3.org']
One very simple way to write this looks the same as it would in most other languages:
for s in strings:
    for t in tlds:
        print(s + '.' + t)
I am struggling to generate a list of substrings from a given list of strings.
I have a list of domains -
domains = ["abc.pqr.com", "pqr.yum.abc.com"]
Now, for each domain in the list I want to generate subdomains.
For example the subdomains of domain "abc.pqr.com" would be
["pqr.com", "abc.pqr.com"]
Also, for domain "pqr.yum.abc.com" the subdomains would be
["yum.abc.com", "pqr.yum.abc.com", "abc.com"]
So the output of the method would be -
["yum.abc.com", "pqr.yum.abc.com", "abc.com", "pqr.com", "abc.pqr.com"]
First, iterate over the elements and split each one on the '.' separator. To keep the final 'com' element intact, iterate only up to len(liste) - 1. After collecting every suffix, join each result back together with '.':
domains = ["abc.pqr.com", "pqr.yum.abc.com"]
domains_new = []
for d in domains:
    liste = d.split(".")
    for i in range(len(liste) - 1):
        domains_new.append(liste[i:])
domains_new = [".".join(ele) for ele in domains_new]
domains_new
Output:
['abc.pqr.com', 'pqr.com', 'pqr.yum.abc.com', 'yum.abc.com', 'abc.com']
Assuming the domains only contain simple TLDs like .com and no second-level domains like .co.uk, you can use a Python list comprehension:
[domain.split(".", x)[-1] for domain in domains for x in range(domain.count("."))]
domains = ["abc.pqr.com", "pqr.yum.abc.com"]
lst = []
for i in domains:
    splits = i.split('.')
    for j in range(len(splits), 1, -1):
        lst.append('.'.join(splits[-j:]))
I did something similar to https://stackoverflow.com/users/12959241/alphabetagamma
What's a cute way to do this in Python?
Say we have a list of strings:
clean_be
clean_be_al
clean_fish_po
clean_po
and we want the output to be:
be
be_al
fish_po
po
Another approach which will work for all scenarios:
import re
data = ['clean_be', 'clean_be_al', 'clean_fish_po', 'clean_po',
        'clean_a', 'clean_clean', 'clean_clean_1']
for item in data:
    item = re.sub('^clean_', '', item)
    print(item)
Output:
be
be_al
fish_po
po
a
clean
clean_1
Here is a possible solution that works with any prefix:
prefix = 'clean_'
result = [s[len(prefix):] if s.startswith(prefix) else s for s in lst]
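On Python 3.9+, str.removeprefix does the same check-and-slice in one call:
result = [s.removeprefix(prefix) for s in lst]  # Python 3.9+; string is unchanged when the prefix is absent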
You've provided only minimal information about what you're trying to achieve, but the desired output for the four given inputs can be produced with the following function:
def func(string):
    return "_".join(string.split("_")[1:])
You can do this:
strlist = ['clean_be', 'clean_be_al', 'clean_fish_po', 'clean_po']

def func(myList: list, start: str):
    ret = []
    for element in myList:
        # Caution: lstrip treats its argument as a set of characters,
        # not a literal prefix, so extra leading characters can be eaten
        ret.append(element.lstrip(start))
    return ret

print(func(strlist, 'clean_'))
I hope it was useful.
There are many ways to do this based on what you have provided.
Apart from the above answers, you can also do it this way:
string = 'clean_be_al'
string = string.replace('clean_','',1)
This would remove the first occurrence of clean_ in the string.
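Note that replace matches anywhere in the string, not just at the start; a quick illustration of the edge case:
print('be_clean_al'.replace('clean_', '', 1))  # 'be_al', even though the string has no clean_ prefix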
Also, if the first word is guaranteed to be 'clean', you can slice it off directly:
string = 'clean_be_al'
print(string[6:])
You can use lstrip to remove a prefix and rstrip to remove a suffix
line = "clean_be"
print(line.lstrip("clean_"))
Drawback:
lstrip([chars])
The [chars] argument is not a prefix; rather, all combinations of its values are stripped.
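A quick illustration of that pitfall:
print('clean_cleaner'.lstrip('clean_'))  # 'r' — the leading letters of 'cleaner' are all in the character set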
I have a list of strings with two different prefixes that I would like to remove.
example_list=[
'/test1/test2/test3/ABCD_1',
'/test1/test2/test3/ABCD_2',
'/test1/test2/test3/ABCD_3',
'/test1/test4/test5/test6/ABCD_4',
'/test1/test4/test5/test6/ABCD_5',
'/test1/test4/test5/test6/ABCD_6',
'/test1/test4/test5/test6/ABCD_7']
I would like the new list to look like:
example_list=[
'ABCD_1',
'ABCD_2',
'ABCD_3',
'ABCD_4',
'ABCD_5',
'ABCD_6',
'ABCD_7']
I was trying something like this, but I keep running into errors.
for i in example_list:
    if i.startswith('/test1/test2/test3/'):
        i = i[19:]
    else:
        i = i[25:]
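(A note on the attempt: rebinding the loop variable i never modifies example_list itself; under the same approach you would build a new list, as in this sketch.)
stripped = []
for i in example_list:
    if i.startswith('/test1/test2/test3/'):
        stripped.append(i[19:])
    else:
        stripped.append(i[25:])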
example_list = [path.split('/')[-1] for path in example_list]
Output:
['ABCD_1', 'ABCD_2', 'ABCD_3', 'ABCD_4', 'ABCD_5', 'ABCD_6', 'ABCD_7']
Given that these are all filesystem paths, I suggest using pathlib:
from pathlib import Path
example_list = [
'/test1/test2/test3/ABCD_1',
'/test1/test2/test3/ABCD_2',
'/test1/test2/test3/ABCD_3',
'/test1/test4/test5/test6/ABCD_4',
'/test1/test4/test5/test6/ABCD_5',
'/test1/test4/test5/test6/ABCD_6',
'/test1/test4/test5/test6/ABCD_7']
res = [Path(item).name for item in example_list]
print(res) # ['ABCD_1', 'ABCD_2', 'ABCD_3', 'ABCD_4', 'ABCD_5', 'ABCD_6', 'ABCD_7']
Just use reverse indexing:
new_list = []
for i in example_list:
    j = i[-6:]  # relies on every name being exactly six characters long
    new_list.append(j)
print(new_list)
Output will be
['ABCD_1', 'ABCD_2', 'ABCD_3', 'ABCD_4', 'ABCD_5', 'ABCD_6', 'ABCD_7']
I have a list, files, of strings in the following format:
files = ['/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_418000.caffemodel.h5',
'/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_502000.caffemodel.h5', ...]
I want to extract the int between iter_ and .caffemodel and return a list of those ints.
After some research I came up with this solution that does the trick, but I was wondering if there is a more elegant/pythonic way to do it, possibly using a list comprehension?
li = []
for f in files:
    tmp = re.search(r'iter_[\d]+\.caffemodel', f).group()
    li.append(int(re.search(r'\d+', tmp).group()))
Just to add another possible solution: join the file names together into one big string (it looks like they all end with .h5, so there is no danger of creating unwanted matches) and use re.findall on that:
import re
li = [int(d) for d in re.findall(r'iter_(\d+)\.caffemodel', ''.join(files))]
Just use:
li = []
for f in files:
    tmp = int(re.search(r'iter_(\d+)\.caffemodel', f).group(1))
    li.append(tmp)
If you put an expression in parentheses, it creates another capturing group for the matched expression.
You can also use a lookbehind assertion:
regex = re.compile(r"(?<=iter_)\d+")
for f in files:
    number = regex.search(f).group(0)
Solution with list comprehension, as you wished:
import re
re_model_id = re.compile(r'iter_(?P<model_id>\d+)\.caffemodel')
li = [int(re_model_id.search(f).group('model_id')) for f in files]
Without a regex:
files = [
'/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_418000.caffemodel.h5',
'/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_502000.caffemodel.h5']
print([f.rsplit("_", 1)[1].split(".", 1)[0] for f in files])
['418000', '502000']
Or if you want to be more specific:
print([f.rsplit("iter_", 1)[1].split(".caffemodel", 1)[0] for f in files])
But your pattern seems to repeat so the first solution is probably sufficient.
You can also slice using find and rfind:
print([f[f.find("iter_") + 5:f.rfind("caffe") - 1] for f in files])
['418000', '502000']
I'm trying to build a list of domain names from an Enom API call. I get back a lot of information and need to locate the domain name related lines, and then join them together.
The string that comes back from Enom looks somewhat like this:
SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1
I'd like to build a list from that which looks like this:
[domain1.com, domain2.org, domain3.co.uk, domain4.net]
To find the different domain name components I've tried the following (where "enom" is the string above) but have only been able to get the SLD and TLD matches.
re.findall("^.*(SLD|TLD).*$", enom, re.M)
Edit:
Every time I see a question asking for a regular-expression solution, I have this bizarre urge to try to solve it without regular expressions. It's often more efficient than using a regex; I encourage the OP to test which of the solutions is fastest.
Here is the naive approach:
a = """SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1"""
b = a.split("\n")
c = [x.split("=")[1] for x in b if x != 'TLDOverride=1']
for x in range(0, len(c), 2):
    print(".".join(c[x:x + 2]))
>> domain1.com
>> domain2.org
>> domain3.co.uk
>> domain4.net
You have a capturing group in your expression. re.findall documentation says:
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
That's why only the content of the capturing group is returned.
Try:
re.findall(r"^.*((?:SLD|TLD)\d*)=(.*)$", enom, re.M)
This would return a list of tuples:
[('SLD1', 'domain1'), ('TLD1', 'com'), ('SLD2', 'domain2'), ('TLD2', 'org'), ('SLD3', 'domain3'), ('TLD4', 'co.uk'), ('SLD5', 'domain4'), ('TLD5', 'net')]
Combining SLDs and TLDs is then up to you.
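For example, one way to pair them up (a sketch assuming each SLD line is immediately followed by its TLD line):
pairs = re.findall(r"^((?:SLD|TLD)\d*)=(.*)$", enom, re.M)
values = [value for _, value in pairs]
print(['.'.join(values[i:i + 2]) for i in range(0, len(values), 2)])
# ['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']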
This works for your example:
>>> sld_list = re.findall(r"^.*SLD[0-9]*?=(.*?)$", enom, re.M)
>>> tld_list = re.findall(r"^.*TLD[0-9]*?=(.*?)$", enom, re.M)
>>> ['.'.join(pair) for pair in zip(sld_list, tld_list)]
['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']
I'm not sure why you're talking about regular expressions. I mean, why don't you just run a for loop?
A famous quote seems to be appropriate here:
Some people, when confronted with a problem, think “I know, I'll use
regular expressions.” Now they have two problems.
domains = []
components = []
for line in enom.split('\n'):
    k, v = line.split('=')
    if k == 'TLDOverride':
        continue
    components.append(v)
    if k.startswith('TLD'):
        domains.append('.'.join(components))
        components = []
P.S. I'm not sure what this TLDOverride is, so the code just ignores it.
Here's one way:
import re
print(list(map('.'.join, zip(*[iter(re.findall(r'^(?:S|T)LD\d+=(.*)$', text, re.M))]*2))))
# ['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']
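The zip(*[iter(...)]*2) part is the standard grouper idiom: handing zip the same iterator twice pairs up consecutive items. Decomposed, a sketch:
values = re.findall(r'^(?:S|T)LD\d+=(.*)$', text, re.M)  # flat list: sld, tld, sld, tld, ...
it = iter(values)
print(['.'.join(pair) for pair in zip(it, it)])  # same result as the one-liner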
Just for fun, map -> filter -> map:
input = """
SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
"""
parts = list(map(lambda x: x.split("="), data.split()))
slds = filter(lambda x: x[1][0].startswith('SLD'), enumerate(parts))
print(list(map(lambda x: '.'.join([x[1][1], parts[x[0] + 1][1]]), slds)))
>>> ['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']
This appears to do what you want:
domains = re.findall(r'SLD\d+=(.+)', re.sub(r'\nTLD\d+=', '.', enom))
It assumes the lines are sorted and an SLD always comes right before its TLD. If that might not be the case, try this slightly more verbose code without regexes (note that it matches SLDs to TLDs by number, and the sample data pairs SLD3 with TLD4, so that entry would come out without a TLD):
d = dict(x.split('=') for x in enom.strip().splitlines())
domains = [
    d[key] + '.' + d.get('T' + key[1:], '')
    for key in d if key.startswith('SLD')
]
You need to use a multiline regex for this.
data = """SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1"""
domain_seq = re.compile(r"SLD\d+=(\w+)\nTLD\d+=([\w.]+)", re.M)  # [\w.] so TLDs like co.uk match fully
for item in domain_seq.finditer(data):
    domain, tld = item.group(1), item.group(2)
    print("%s.%s" % (domain, tld))
As some other answers already said, there's no need to use a regular expression here. A simple split and some filtering will do nicely:
lines = data.split("\n") #assuming data contains your input string
sld, tld = [[x.split("=")[1] for x in lines if x[:3] == t] for t in ("SLD", "TLD")]
result = [x+y for x, y in zip(sld, tld)]