Extract the part of interest from each list value - python

From this list
List = ['/asd/dfg/ert.py','/wer/cde/xcv.img']
I want to get this
List = ['ert.py','xcv.img']

There's a low-level split-based approach:
>>> a = ['/asd/dfg/ert.py','/wer/cde/xcv.img']
>>> b = [elem.split("/")[-1] for elem in a]
>>> b
['ert.py', 'xcv.img']
Or a higher-level, more descriptive approach, which is probably more robust:
>>> import os
>>> b = [os.path.basename(filename) for filename in a]
>>> b
['ert.py', 'xcv.img']
Of course this assumes that I've guessed right about what you wanted; your example is somewhat underspecified.
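As a further sketch along the same lines, assuming the values are POSIX-style paths and Python 3.4+ is available, pathlib can do the same job. The variable names here are illustrative only:

```python
from pathlib import PurePosixPath

paths = ['/asd/dfg/ert.py', '/wer/cde/xcv.img']

# .name is the final path component, much like os.path.basename
names = [PurePosixPath(p).name for p in paths]
print(names)  # ['ert.py', 'xcv.img']

# Unlike a plain split("/")[-1], a trailing slash does not yield an empty string
trailing = PurePosixPath('/asd/dfg/').name
print(trailing)  # dfg
```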

$List = array('/asd/dfg/ert.py','/wer/cde/xcv.img');
$pattern = "#/.*/#";
foreach ($List AS $key => $str)
$List[$key] = preg_replace($pattern, '', $str);
print_r($List);

Related

how to split after each word and get the following string in an organized way?

Given the following string:
'hello0192239world0912903spam209394'
I would like to be able to split the above string into this
hello, 0192239, world, 0912903, spam, 209394
and ideally end with a list:
[hello, 0192239], [world, 0912903], [spam, 209394]
But I don't know how to approach even the first step, splitting between each word and number. I know there's the split method and something called regex, but I don't know how to use them or whether they are even the right tools.
Try this:
>>> import re
>>> lst = re.split(r'(\d+)','hello0192239world0912903spam209394')
>>> list(zip(lst[::2],lst[1::2]))
[('hello', '0192239'), ('world', '0912903'), ('spam', '209394')]
>>> lst = re.split(r'(\d+)','09182hello2349283world892')
>>> list(zip(lst[::2],lst[1::2]))
[('', '09182'), ('hello', '2349283'), ('world', '892')]
# as a list
>>> list(map(list,zip(lst[::2],lst[1::2])))
[['', '09182'], ['hello', '2349283'], ['world', '892']]
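If every word is guaranteed to be followed by a number, a two-group pattern with re.findall might be a more direct route. This is only a sketch under that assumption: digits at the very start of the string (as in the second example above) would simply be skipped.

```python
import re

text = 'hello0192239world0912903spam209394'

# With two groups in the pattern, findall returns (letters, digits) tuples
pairs = [list(m) for m in re.findall(r'([A-Za-z]+)(\d+)', text)]
print(pairs)  # [['hello', '0192239'], ['world', '0912903'], ['spam', '209394']]
```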
See below. The idea is to maintain a 'mode' and flip mode every time you switch from digit to char or the other way around.
data = 'hello0192239world0912903spam209394'
A = 'A'
D = 'D'
mode = D if data[0].isdigit() else A
holder = []
tmp = []
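for x in data:
    if mode == A:
        is_digit = x.isdigit()
        if is_digit:
            # switched from letters to digits: flush the current chunk
            mode = D
            holder.append(''.join(tmp))
            tmp = [x]
            continue
    else:
        is_char = not x.isdigit()
        if is_char:
            # switched from digits to letters: flush the current chunk
            mode = A
            holder.append(''.join(tmp))
            tmp = [x]
            continue
    tmp.append(x)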
holder.append(''.join(tmp))
print(holder)
output
['hello', '0192239', 'world', '0912903', 'spam', '209394']
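The same "flip the mode when the character class changes" idea can also be sketched with itertools.groupby, which starts a new group whenever the key (here str.isdigit) changes value. This is an alternative take, not the answerer's code:

```python
from itertools import groupby

data = 'hello0192239world0912903spam209394'

# Consecutive characters with the same isdigit() result form one chunk
chunks = [''.join(group) for _, group in groupby(data, key=str.isdigit)]
print(chunks)  # ['hello', '0192239', 'world', '0912903', 'spam', '209394']
```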

Zip Lists together based on many to one relationship

I have two lists and I would like to find a way to link them together (I'm not sure the exact term for doing this) by zipping them.
In list one I have a series of tif files:
list1=['LT50300281984137PAC00_sr_band1.tif',
'LT50300281984137PAC00_sr_band2.tif',
'LT50300281984137PAC00_sr_band3.tif','LT50300281994260XXX03_sr_band1.tif',
'LT50300281994260XXX03_sr_band2.tif',
'LT50300281994260XXX03_sr_band3.tif']
In list two I have two files:
list2=['LT50300281984137PAC00_mask.tif','LT50300281994260XXX03_mask.tif']
I want to zip the files in list one which start with LT50300281984137PAC00 to the file in list 2 which starts the same way, and the same for the files which start with LT50300281994260XXX03
The code I have tried is:
ziplist=zip(sorted(list1),sorted(list2))
but this returns:
[('LT50300281984137PAC00_sr_band1.tif', 'LT50300281984137PAC00_mask.tif'), ('LT50300281984137PAC00_sr_band2.tif', 'LT50300281994260XXX03_mask.tif')]
I would like this to be returned:
[('LT50300281984137PAC00_sr_band1.tif', 'LT50300281984137PAC00_sr_band2.tif', 'LT50300281984137PAC00_sr_band3.tif', 'LT50300281984137PAC00_mask.tif'), ('LT50300281994260XXX03_sr_band1.tif', 'LT50300281994260XXX03_sr_band2.tif','LT50300281994260XXX03_sr_band3.tif','LT50300281994260XXX03_mask.tif')]
You can use itertools.groupby:
from itertools import groupby
list1 = [
'LT50300281984137PAC00_sr_band1.tif',
'LT50300281984137PAC00_sr_band2.tif',
'LT50300281984137PAC00_sr_band3.tif',
'LT50300281994260XXX03_sr_band1.tif',
'LT50300281994260XXX03_sr_band2.tif',
'LT50300281994260XXX03_sr_band3.tif'
]
list2 = [
'LT50300281984137PAC00_mask.tif',
'LT50300281994260XXX03_mask.tif'
]
def extract_key(s):
    return s[:s.index('_')]
l = sorted(list1 + list2, key=extract_key)
l = [tuple(items) for s, items in groupby(l, key=extract_key)]
Result:
[('LT50300281984137PAC00_sr_band1.tif', 'LT50300281984137PAC00_sr_band2.tif', 'LT50300281984137PAC00_sr_band3.tif', 'LT50300281984137PAC00_mask.tif'), ('LT50300281994260XXX03_sr_band1.tif', 'LT50300281994260XXX03_sr_band2.tif', 'LT50300281994260XXX03_sr_band3.tif', 'LT50300281994260XXX03_mask.tif')]
The idea is to sort the union of the two lists by the first part of each filename (extract_key). Then use groupby to create groups of the same first part.
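You can use a list comprehension and the built-in function filter (wrapped in list() so that it also works on Python 3, where filter returns an iterator):
In [24]: [tuple(list(filter(lambda x: x.startswith(e.split('_')[0]), list1))+[e]) for e in list2]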
Out[24]:
[('LT50300281984137PAC00_sr_band1.tif',
'LT50300281984137PAC00_sr_band2.tif',
'LT50300281984137PAC00_sr_band3.tif',
'LT50300281984137PAC00_mask.tif'),
('LT50300281994260XXX03_sr_band1.tif',
'LT50300281994260XXX03_sr_band2.tif',
'LT50300281994260XXX03_sr_band3.tif',
'LT50300281994260XXX03_mask.tif')]
Can also be done using regex.
import re
list1=['LT50300281984137PAC00_sr_band1.tif'
,'LT50300281984137PAC00_sr_band2.tif',
'LT50300281984137PAC00_sr_band3.tif','LT50300281994260XXX03_sr_band1.tif',
'LT50300281994260XXX03_sr_band2.tif',
'LT50300281994260XXX03_sr_band3.tif']
list2=['LT50300281984137PAC00_mask.tif','LT50300281994260XXX03_mask.tif']
match = re.findall(r'(\b\w+(?:PAC00)\w+.\w+\b)'," ".join(list1))
tuple1 = tuple(match+[list2[0]])
match = re.findall(r'(\b\w+(?:0XXX0)\w+.\w+\b)'," ".join(list1))
tuple2 = tuple(match+[list2[1]])
print([tuple1,tuple2])
Output
[('LT50300281984137PAC00_sr_band1.tif', 'LT50300281984137PAC00_sr_band2.tif', 'LT50300281984137PAC00_sr_band3.tif', 'LT50300281984137PAC00_mask.tif'), ('LT50300281994260XXX03_sr_band1.tif', 'LT50300281994260XXX03_sr_band2.tif', 'LT50300281994260XXX03_sr_band3.tif', 'LT50300281994260XXX03_mask.tif')]
A dictionary will work better here, you can then later repurpose it for what you need:
results = {}
for f in list2:
    common = f.split('_')[0]
    results[common] = []
for f in list1:
    common = f.split('_')[0]
    try:
        results[common].append(f)
    except KeyError:
        print('{} not a valid grouper'.format(common))
# To convert into a list of tuples (use iteritems() instead of items() on Python 2)
as_list = [(k,)+tuple(v) for k,v in results.items()]
print(as_list)
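A variation on the dictionary idea, sketched here with collections.defaultdict, keeps the mask file itself as the last item of each tuple, which is the shape the question asked for. The helper name prefix is made up for illustration, and the group order relies on dicts preserving insertion order (Python 3.7+):

```python
from collections import defaultdict

list1 = ['LT50300281984137PAC00_sr_band1.tif',
         'LT50300281984137PAC00_sr_band2.tif',
         'LT50300281984137PAC00_sr_band3.tif',
         'LT50300281994260XXX03_sr_band1.tif',
         'LT50300281994260XXX03_sr_band2.tif',
         'LT50300281994260XXX03_sr_band3.tif']
list2 = ['LT50300281984137PAC00_mask.tif',
         'LT50300281994260XXX03_mask.tif']

def prefix(name):
    # Hypothetical helper: everything before the first underscore
    return name.split('_')[0]

groups = defaultdict(list)
for name in list1 + list2:  # bands first, so each mask ends up last in its group
    groups[prefix(name)].append(name)

as_list = [tuple(files) for files in groups.values()]
print(as_list)
```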
I would use itertools.chain and itertools.groupby, with a lambda expression that takes only the part up to the first _ as the grouping key. Example:
>>> from itertools import chain,groupby
>>> list1=['LT50300281984137PAC00_sr_band1.tif','LT50300281984137PAC00_sr_band2.tif','LT50300281984137PAC00_sr_band3.tif','LT50300281994260XXX03_sr_band1.tif','LT50300281994260XXX03_sr_band2.tif','LT50300281994260XXX03_sr_band3.tif']
>>> list2=['LT50300281984137PAC00_mask.tif','LT50300281994260XXX03_mask.tif']
>>>
>>> chained_sorted = sorted(chain(list1,list2))
>>> ret = []
>>> for i, x in groupby(chained_sorted,lambda x: x.split('_')[0]):
...     ret.append(tuple(x))
...
>>> ret
[('LT50300281984137PAC00_mask.tif', 'LT50300281984137PAC00_sr_band1.tif', 'LT50300281984137PAC00_sr_band2.tif', 'LT50300281984137PAC00_sr_band3.tif'), ('LT50300281994260XXX03_mask.tif', 'LT50300281994260XXX03_sr_band1.tif', 'LT50300281994260XXX03_sr_band2.tif', 'LT50300281994260XXX03_sr_band3.tif')]
My first answer on StackOverflow, so please be patient. But I didn't see a need for zip()
mask1, mask2 = list2[0], list2[1]
for b in reversed(list1):
    if b[0:20] in mask1:
        mask1 = b + " " + mask1
    else:
        mask2 = b + " " + mask2
ziplist = [tuple(mask1.split()), tuple(mask2.split())]
I think ziplist should now be what you were asking for.

Programmatically figuring out if translated names are equivalent

I'm trying to see if two translated names are equivalent. Sometimes the translation will have the names ordered differently. For example:
>>> import difflib
>>> a = 'Yuk-shing Au'
>>> b = 'Au Yuk Sing'
>>> seq=difflib.SequenceMatcher(a=a.lower(), b=b.lower())
>>> seq.ratio()
0.6086956521739131
'Yuk-Shing Au' and 'Au Yuk Sing' are the same person. Is there a way to detect something like this, such that the ratio for names like this will be much higher? Similar to the result for:
>>> a = 'Yuk-shing Au'
>>> b = 'Yuk Sing Au'
>>> seq=difflib.SequenceMatcher(a=a.lower(), b=b.lower())
>>> seq.ratio()
0.8181818181818182
You can normalize the ordering of names before comparing:
def normalize(name):
    # lowercase before sorting, otherwise 'Sing' and 'shing' sort on different sides of 'Yuk'
    name_parts = name.lower().replace("-", " ").split()
    return " ".join(sorted(name_parts))
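As a rough sketch of how this could plug into the difflib comparison from the question: the version below lowercases before sorting (an assumption about the intended behaviour, so the word order does not depend on capitalisation), and name_ratio is a made-up helper name.

```python
import difflib

def normalize(name):
    # Lowercase first so sorting is case-insensitive, then treat hyphens as spaces
    parts = name.lower().replace("-", " ").split()
    return " ".join(sorted(parts))

def name_ratio(a, b):
    # Compare the normalized forms instead of the raw strings
    return difflib.SequenceMatcher(a=normalize(a), b=normalize(b)).ratio()

score = name_ratio('Yuk-shing Au', 'Au Yuk Sing')
print(normalize('Yuk-shing Au'))  # au shing yuk
print(normalize('Au Yuk Sing'))   # au sing yuk
print(score)
```

With the word order normalized, the two spellings differ by a single character, so the ratio should come out well above the 0.61 seen in the question.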

python get difference from arrays

I have the following two arrays. I am trying to check whether the elements in invalid_id_arr exist in valid_id_arr; those that don't exist should form the diff array. But with the code below, diff contains ['id123', 'id124', 'id125', 'id126', 'id789', 'id666'], whereas I expect the output to be ["id789","id666"]. What am I doing wrong here?
tag_file= {}
tag_file['invalid_id_arr']=["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"]
tag_file['valid_id_arr']=["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
diff = [ele.split('-')[0] for ele in tag_file['invalid_id_arr'] if str(ele.split('-')[0]) not in tag_file['valid_id_arr']]
Current Output:
['id123', 'id124', 'id125', 'id126', 'id789', 'id666']
Expected output:
["id789","id666"]
Using a set is more efficient, but your main problem is that you weren't removing the second half of the elements in valid_id_arr.
invalid_id_arr=["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"]
valid_id_arr=["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
valid_id_set = set(ele.split('-')[0] for ele in valid_id_arr)
diff = [ele for ele in invalid_id_arr if ele.split('-')[0] not in valid_id_set]
print(diff)
output:
['id789-123', 'id666']
http://ideone.com/Q9JBw
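To illustrate why the original comprehension kept every ID, here is a small sketch on a shortened version of the data: a bare prefix such as "id123" is never equal to a full entry like "id123-12345", so the membership test only succeeds once the valid entries are reduced to their prefixes as well.

```python
valid_id_arr = ["id123-12345", "id124-1122"]

# The list holds full strings, so the bare prefix is not a member
in_list = "id123" in valid_id_arr
print(in_list)  # False

# After stripping the suffix from each valid entry, the prefix is found
valid_id_set = {ele.split('-')[0] for ele in valid_id_arr}
in_set = "id123" in valid_id_set
print(in_set)  # True
```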
Try sets:
invalid_id_arr = ["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"]
valid_id_arr = ["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
set_invalid = set(x.split('-')[0] for x in invalid_id_arr)
print(set_invalid.difference(x.split('-')[0] for x in valid_id_arr))
>>> a = ["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"]
>>> b = ["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
>>> c = set(s.split('-')[0] for s in b)  # a set, not a generator, so repeated membership tests are safe
>>> [ele.split('-')[0] for ele in a if ele.split('-')[0] not in c]
['id789', 'id666']
>>>

editing List content in Python

I have a variable data:
data = [b'script', b'-compiler', b'123cds', b'-algo', b'timing']
I need to convert it to remove every "b" prefix from the list.
How can I do that?
Not sure whether this will help, but in Python 2 it works with your sample:
initList = [b'script', b'-compiler', b'123cds', b'-algo', b'timing']
resultList = [str(x) for x in initList ]
Or in Python 3:
resultList = [x.decode("utf-8") for x in initList ] # where utf-8 is encoding used
Check more on decode function.
Also you may want to take a look into the following related SO thread.
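A short sketch of the difference on Python 3, assuming the bytes are UTF-8 encoded: str() gives the printable representation, "b" prefix included, while decode() gives the actual text.

```python
data = [b'script', b'-compiler', b'123cds', b'-algo', b'timing']

# On Python 3, str() of a bytes object keeps the b'...' wrapper
as_repr = [str(x) for x in data]
print(as_repr[0])  # b'script'

# decode() turns each bytes object into a real str (assuming UTF-8 data)
decoded = [x.decode("utf-8") for x in data]
print(decoded)  # ['script', '-compiler', '123cds', '-algo', 'timing']
```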
>>> a = [b'script', b'-compiler', b'123cds', b'-algo', b'timing']
>>> list(map(bytes.decode, a))
['script', '-compiler', '123cds', '-algo', 'timing']
strin = "[b'script', b'-compiler', b'123cds', b'-algo', b'timing']"
arr = strin.strip('[]').split(', ')
res = [part.strip("b'") for part in arr]
>>> res
['script', '-compiler', '123cds', '-algo', 'timing']
