python get difference from arrays - python

I have the following two arrays. I am trying to check whether the elements in invalid_id_arr exist in valid_id_arr; any element that doesn't exist should go into the diff array. With the code below, however, diff comes out as ['id123', 'id124', 'id125', 'id126', 'id789', 'id666'], whereas I expect ["id789","id666"]. What am I doing wrong here?
tag_file= {}
tag_file['invalid_id_arr']=["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"]
tag_file['valid_id_arr']=["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
diff = [ele.split('-')[0] for ele in tag_file['invalid_id_arr'] if str(ele.split('-')[0]) not in tag_file['valid_id_arr']]
Current Output:
['id123', 'id124', 'id125', 'id126', 'id789', 'id666']
Expected output:
["id789","id666"]

Using a set is more efficient, but your main problem is that you weren't stripping the part after the '-' from the elements of valid_id_arr, so the bare prefixes never matched the full strings.
invalid_id_arr=["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"]
valid_id_arr=["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
valid_id_set = set(ele.split('-')[0] for ele in valid_id_arr)
diff = [ele for ele in invalid_id_arr if ele.split('-')[0] not in valid_id_set]
print(diff)
output:
['id789-123', 'id666']
http://ideone.com/Q9JBw
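For reference, here is a minimal Python 3 sketch of the same idea wrapped in a function. The names id_prefix and missing_prefixes are made up for illustration and do not come from the question. It returns the bare prefixes, which is the form the asker expected.

```python
def id_prefix(item):
    """Return the part of an ID before the first '-' (the whole string if there is none)."""
    return item.split('-')[0]


def missing_prefixes(candidates, known):
    """Prefixes from `candidates` that match no prefix in `known`, in original order."""
    known_prefixes = {id_prefix(item) for item in known}  # set: O(1) average lookups
    return [id_prefix(item) for item in candidates
            if id_prefix(item) not in known_prefixes]


invalid_id_arr = ["id123-3431", "id124-4341", "id125-4341", "id126-1w", "id789-123", "id666"]
valid_id_arr = ["id123-12345", "id124-1122", "id125-13232", "id126-12332", "id1new", "idagain"]

diff = missing_prefixes(invalid_id_arr, valid_id_arr)
print(diff)  # ['id789', 'id666']
```

Building the set once keeps each membership test cheap, and the list comprehension preserves the order of invalid_id_arr.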

Try sets:
invalid_id_arr = ["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"]
valid_id_arr = ["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
set_invalid = set(x.split('-')[0] for x in invalid_id_arr)
print(set_invalid.difference(x.split('-')[0] for x in valid_id_arr))

>>> a = ["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"]
>>> b = ["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
>>> c = {s.split('-')[0] for s in b}  # a set; a generator here would be consumed by the repeated "in" tests
>>> [ele.split('-')[0] for ele in a if str(ele.split('-')[0]) not in c]
['id789', 'id666']
>>>
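One caveat when testing membership against a generator expression rather than a set or list: a generator can be walked only once, and every "in" test consumes items from it. The sketch below uses made-up sample values to show the effect. With a generator, a snippet like the one above can give the right answer only because the matching items happen to appear in the same order in both lists.

```python
ids = ["id1", "id2", "id3"]  # hypothetical sample values

prefixes_gen = (s for s in ids)        # generator: can be walked only once
first_check = "id2" in prefixes_gen    # True; consumes "id1" and "id2"
second_check = "id1" in prefixes_gen   # False; "id1" was already consumed

prefixes_set = set(ids)                # set: can be queried any number of times
set_checks = ("id2" in prefixes_set, "id1" in prefixes_set)

print(first_check, second_check, set_checks)  # True False (True, True)
```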

Related

'for' loop does not loop correctly

a=[['kyle','movie_1','c_13'],
['blair','food','a_29'],
['reese','movie_2','abc_76']]
b=['df.movie_1',
'ghk.food',
'df.movie_2']
x = {}
for i in b:
    y = i.split('.')
    for j in a:
        if y[1] in j : x[y[0]]=j
print(x)
This is my code to check whether a string is inside one of the lists in a.
The output that I got is
{'df': ['reese', 'movie_2', 'abc_76'], 'ghk': ['blair', 'food', 'a_29']}
My desired output is
{'df': [['kyle','movie_1','c_13'],['reese', 'movie_2', 'abc_76']], 'ghk': ['blair', 'food', 'a_29']}
The cause is that the value is overwritten whenever x['df'] already exists.
You could use a defaultdict to collect them (the result differs slightly from what you expect, since every value becomes a list of lists, but it is very easy):
from collections import defaultdict
a = [['kyle', 'movie_1', 'c_13'],
['blair', 'food', 'a_29'],
['reese', 'movie_2', 'abc_76']]
b = ['df.movie_1',
'ghk.food',
'df.movie_2']
x = defaultdict(list)
for i in b:
    y = i.split('.')
    for j in a:
        if y[1] in j:
            x[y[0]].append(j)
print(x)
# defaultdict(<class 'list'>, {'df': [['kyle', 'movie_1', 'c_13'], ['reese', 'movie_2', 'abc_76']], 'ghk': [['blair', 'food', 'a_29']]})
As mentioned in a previous answer, the problem is that your loops end up overwriting the value of x[y[0]]. Based on your desired output, what you need is to append to a list instead. There is already a nice solution using defaultdict. If you would rather use a plain dict with list values, this is one way to do it:
a = [
['kyle','movie_1','c_13'],
['blair','food','a_29'],
['reese','movie_2','abc_76']]
b = [
'df.movie_1',
'ghk.food',
'df.movie_2']
x = {}
for i in b:
    y = i.split('.')
    for j in a:
        if y[1] in j:
            if y[0] not in x:  # if this is the first time we append
                x[y[0]] = []   # make it an empty list
            x[y[0]].append(j)  # then always append
print(x)
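Hopefully this works, as a single line of code:
Code:
op_dict={}
[op_dict.setdefault(x.split('.')[0], []).append(y) for x in b for y in a if x.split('.')[1] in y]
print(op_dict)
Output:
{'df': [['kyle', 'movie_1', 'c_13'], ['reese', 'movie_2', 'abc_76']], 'ghk': [['blair', 'food', 'a_29']]}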

Zip Lists together based on many to one relationship

I have two lists and I would like to find a way to link them together (I'm not sure the exact term for doing this) by zipping them.
In list one I have a series of tif files:
list1=['LT50300281984137PAC00_sr_band1.tif',
'LT50300281984137PAC00_sr_band2.tif',
'LT50300281984137PAC00_sr_band3.tif','LT50300281994260XXX03_sr_band1.tif',
'LT50300281994260XXX03_sr_band2.tif',
'LT50300281994260XXX03_sr_band3.tif']
In list two I have two files:
list2=['LT50300281984137PAC00_mask.tif','LT50300281994260XXX03_mask.tif']
I want to zip the files in list one which start with LT50300281984137PAC00 to the file in list 2 which starts the same way, and the same for the files which start with LT50300281994260XXX03
The code I have tried is:
ziplist=zip(sorted(list1),sorted(list2))
but this returns:
[('LT50300281984137PAC00_sr_band1.tif', 'LT50300281984137PAC00_mask.tif'), ('LT50300281984137PAC00_sr_band2.tif', 'LT50300281994260XXX03_mask.tif')]
I would like this to be returned:
[('LT50300281984137PAC00_sr_band1.tif', 'LT50300281984137PAC00_sr_band2.tif', 'LT50300281984137PAC00_sr_band3.tif', 'LT50300281984137PAC00_mask.tif'), ('LT50300281994260XXX03_sr_band1.tif', 'LT50300281994260XXX03_sr_band2.tif', 'LT50300281994260XXX03_sr_band3.tif', 'LT50300281994260XXX03_mask.tif')]
You can use itertools.groupby:
from itertools import groupby
list1 = [
'LT50300281984137PAC00_sr_band1.tif',
'LT50300281984137PAC00_sr_band2.tif',
'LT50300281984137PAC00_sr_band3.tif',
'LT50300281994260XXX03_sr_band1.tif',
'LT50300281994260XXX03_sr_band2.tif',
'LT50300281994260XXX03_sr_band3.tif'
]
list2 = [
'LT50300281984137PAC00_mask.tif',
'LT50300281994260XXX03_mask.tif'
]
def extract_key(s):
    return s[:s.index('_')]
l = sorted(list1 + list2, key=extract_key)
l = [tuple(items) for s, items in groupby(l, key=extract_key)]
Result:
[('LT50300281984137PAC00_sr_band1.tif', 'LT50300281984137PAC00_sr_band2.tif', 'LT50300281984137PAC00_sr_band3.tif', 'LT50300281984137PAC00_mask.tif'), ('LT50300281994260XXX03_sr_band1.tif', 'LT50300281994260XXX03_sr_band2.tif', 'LT50300281994260XXX03_sr_band3.tif', 'LT50300281994260XXX03_mask.tif')]
The idea is to sort the union of the two lists by the first part of each filename (extract_key). Then use groupby to create groups of the same first part.
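The sort matters because itertools.groupby only merges items that sit next to each other. The sketch below uses shortened, made-up filenames (not the ones from the question) to show what happens with and without sorting first.

```python
from itertools import groupby


def extract_key(name):
    """Everything before the first underscore."""
    return name.split('_')[0]


# Hypothetical, shortened filenames for illustration only.
names = ['A_band1.tif', 'B_band1.tif', 'A_mask.tif']

# Without sorting, the two 'A' files are not adjacent, so they land in separate groups.
unsorted_groups = [(k, list(g)) for k, g in groupby(names, key=extract_key)]

# Sorting by the same key first brings equal keys together.
sorted_groups = [(k, list(g))
                 for k, g in groupby(sorted(names, key=extract_key), key=extract_key)]

print(unsorted_groups)
print(sorted_groups)
```

Because Python's sort is stable, files with the same key keep their original relative order inside each group.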
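You can use a list comprehension and the built-in function filter (wrapped in list() so that it also works in Python 3, where filter returns an iterator):
In [24]: [tuple(list(filter(lambda x: x.startswith(e.split('_')[0]), list1))+[e]) for e in list2]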
Out[24]:
[('LT50300281984137PAC00_sr_band1.tif',
'LT50300281984137PAC00_sr_band2.tif',
'LT50300281984137PAC00_sr_band3.tif',
'LT50300281984137PAC00_mask.tif'),
('LT50300281994260XXX03_sr_band1.tif',
'LT50300281994260XXX03_sr_band2.tif',
'LT50300281994260XXX03_sr_band3.tif',
'LT50300281994260XXX03_mask.tif')]
Can also be done using regex.
import re
list1=['LT50300281984137PAC00_sr_band1.tif'
,'LT50300281984137PAC00_sr_band2.tif',
'LT50300281984137PAC00_sr_band3.tif','LT50300281994260XXX03_sr_band1.tif',
'LT50300281994260XXX03_sr_band2.tif',
'LT50300281994260XXX03_sr_band3.tif']
list2=['LT50300281984137PAC00_mask.tif','LT50300281994260XXX03_mask.tif']
match = re.findall(r'(\b\w+(?:PAC00)\w+.\w+\b)'," ".join(list1))
tuple1 = tuple(match+[list2[0]])
match = re.findall(r'(\b\w+(?:0XXX0)\w+.\w+\b)'," ".join(list1))
tuple2 = tuple(match+[list2[1]])
print([tuple1,tuple2])
Output
[('LT50300281984137PAC00_sr_band1.tif', 'LT50300281984137PAC00_sr_band2.tif', 'LT50300281984137PAC00_sr_band3.tif', 'LT50300281984137PAC00_mask.tif'), ('LT50300281994260XXX03_sr_band1.tif', 'LT50300281994260XXX03_sr_band2.tif', 'LT50300281994260XXX03_sr_band3.tif', 'LT50300281994260XXX03_mask.tif')]
A dictionary will work better here; you can later repurpose it for whatever you need:
results = {}
for f in list2:
    common = f.split('_')[0]
    results[common] = []
for f in list1:
    common = f.split('_')[0]
    try:
        results[common].append(f)
    except KeyError:
        print('{} not a valid grouper'.format(common))
# To convert into a list of tuples
as_list = [(k,)+tuple(v) for k,v in results.items()]
print(as_list)
I would use itertools.chain and itertools.groupby, with a lambda expression that takes everything up to the first _ as the grouping key. Example:
>>> from itertools import chain,groupby
>>> list1=['LT50300281984137PAC00_sr_band1.tif','LT50300281984137PAC00_sr_band2.tif','LT50300281984137PAC00_sr_band3.tif','LT50300281994260XXX03_sr_band1.tif','LT50300281994260XXX03_sr_band2.tif','LT50300281994260XXX03_sr_band3.tif']
>>> list2=['LT50300281984137PAC00_mask.tif','LT50300281994260XXX03_mask.tif']
>>>
>>> chained_sorted = sorted(chain(list1,list2))
>>> ret = []
>>> for i, x in groupby(chained_sorted,lambda x: x.split('_')[0]):
...     ret.append(tuple(x))
...
>>> ret
[('LT50300281984137PAC00_mask.tif', 'LT50300281984137PAC00_sr_band1.tif', 'LT50300281984137PAC00_sr_band2.tif', 'LT50300281984137PAC00_sr_band3.tif'), ('LT50300281994260XXX03_mask.tif', 'LT50300281994260XXX03_sr_band1.tif', 'LT50300281994260XXX03_sr_band2.tif', 'LT50300281994260XXX03_sr_band3.tif')]
My first answer on StackOverflow, so please be patient. But I didn't see a need for zip()
mask1, mask2 = list2[0], list2[1]
for b in reversed(list1):
    if b[0:20] in mask1:
        mask1 = b + " " + mask1
    else:
        mask2 = b + " " + mask2
ziplist = [tuple(mask1.split()), tuple(mask2.split())]
I think ziplist should now be what you were asking for.

Extract some interested part of list value

From this list
List = ['/asd/dfg/ert.py','/wer/cde/xcv.img']
I want to get this:
List = ['ert.py','xcv.img']
There's a low-level split-based approach:
>>> a = ['/asd/dfg/ert.py','/wer/cde/xcv.img']
>>> b = [elem.split("/")[-1] for elem in a]
>>> b
['ert.py', 'xcv.img']
Or a higher-level, more descriptive approach, which is probably more robust:
>>> import os
>>> b = [os.path.basename(filename) for filename in a]
>>> b
['ert.py', 'xcv.img']
Of course this assumes that I've guessed right about what you wanted; your example is somewhat underspecified.
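As a further option along the same higher-level lines, the standard-library pathlib module exposes the last path component as .name. This is only a sketch, and it assumes the strings are POSIX-style paths; the trailing-slash path is a made-up example to show one place where it behaves differently from a plain split.

```python
from pathlib import PurePosixPath

paths = ['/asd/dfg/ert.py', '/wer/cde/xcv.img']
names = [PurePosixPath(p).name for p in paths]
print(names)  # ['ert.py', 'xcv.img']

# Hypothetical path with a trailing slash: split("/")[-1] yields an empty
# string, while pathlib ignores the trailing slash.
trailing = '/asd/dfg/'
print(trailing.split('/')[-1] == '')   # True
print(PurePosixPath(trailing).name)    # dfg
```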
$List = array('/asd/dfg/ert.py','/wer/cde/xcv.img');
$pattern = "#/.*/#";
foreach ($List AS $key => $str)
    $List[$key] = preg_replace($pattern, '', $str);
print_r($List);

using FOR statement on 2 elements at once python

I have the following list of variables and a mastervariable
a = (1,5,7)
b = (1,3,5)
c = (2,2,2)
d = (5,2,8)
e = (5,5,8)
mastervariable = (3,2,5)
I'm trying to check whether at least 2 elements of each variable exist in the master variable, so that the above would show b (3,5) and d (5,2) as having at least 2 elements matching mastervariable. Note that using sets would make c show up as matching, but I don't want to count c because only one of its elements is in mastervariable (i.e. 2 shows up once in mastervariable, not twice).
I currently have the very inefficient:
if current_variable[0] == mastervariable[0]:
    if current_variable[1] == mastervariable[1]:
        True
    elif current_variable[2] == mastervariable[1]:
        True
    #### I don't use OR here because I need to know which variables match.
elif current_variable[1] == mastervariable[0]: ##<-- I'm now checking 2nd element
    ...  # etc. etc.
I then continue to iterate like the above by checking each one at a time which is extremely inefficient. I did the above because using a FOR statement resulted in me checking the first element twice which was incorrect:
for i in a:
    for j in a:
        ### this checked if 1 was in the master variable and not 1,5 or 1,7
Is there a way to use 2 FOR statement that allows me to check 2 elements in a list at once while skipping any element that has been used already? Alternatively, can you suggest an efficient way to do what I'm trying?
Edit: Mastervariable can have duplicates in it.
For the case where matching elements can be duplicated so that set breaks, use Counter as a multiset - the duplicates between a and master are found by:
from collections import Counter
count_a = Counter(a)
count_master = Counter(master)
count_both = count_a + count_master
dups = Counter({e : min((count_a[e], count_master[e])) for e in count_a if count_both[e] > count_a[e]})
The logic is reasonably intuitive: if there's more of an item in the combined count of a and master, then it is duplicated, and the multiplicity is however many of that item are in whichever of a and master has less of them.
It gives a Counter of all the duplicates, where the count is their multiplicity. If you want it back as a tuple, you can do tuple(dups.elements()):
>>> a
(2, 2, 2)
>>> master
(1, 2, 2)
>>> dups = Counter({e : min((count_a[e], count_master[e])) for e in count_a if count_both[e] > count_a[e]})
>>> tuple(dups.elements())
(2, 2)
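The same multiset intersection is also available directly through Counter's & operator, which keeps the minimum count of each element. The sketch below applies it to the values from the question; the names candidates and matches are made up for illustration.

```python
from collections import Counter

candidates = {
    "a": (1, 5, 7),
    "b": (1, 3, 5),
    "c": (2, 2, 2),
    "d": (5, 2, 8),
    "e": (5, 5, 8),
}
mastervariable = (3, 2, 5)
master_count = Counter(mastervariable)

matches = {}
for name, values in candidates.items():
    common = Counter(values) & master_count  # per-element minimum of the two counts
    if sum(common.values()) >= 2:            # at least two elements in common
        matches[name] = tuple(sorted(common.elements()))

print(matches)  # {'b': (3, 5), 'd': (2, 5)}
```

Here c is excluded because 2 appears only once in mastervariable, so the intersection holds a single 2.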
Seems like a good job for sets. Edit: sets aren't suitable since mastervariable can contain duplicates. Here is a version using Counters.
>>> a = (1,5,7)
>>>
>>> b = (1,3,5)
>>>
>>> c = (2,2,2)
>>>
>>> d = (5,2,8)
>>>
>>> e = (5,5,8)
>>> D=dict(a=a, b=b, c=c, d=d, e=e)
>>>
>>> from collections import Counter
>>> mastervariable = (5,5,3)
>>> mvc = Counter(mastervariable)
>>> for k,v in D.items():
...     vc = Counter(v)
...     if sum(min(count, vc[item]) for item, count in mvc.items()) >= 2:
...         print(k)
...
b
e

editing List content in Python

I have a variable data:
data = [b'script', b'-compiler', b'123cds', b'-algo', b'timing']
I need to convert it so that every "b" prefix is removed from the items in the list.
How can I do that?
Not sure whether it will help, but in Python 2 this works with your sample:
initList = [b'script', b'-compiler', b'123cds', b'-algo', b'timing']
resultList = [str(x) for x in initList ]
Or in Python 3:
resultList = [x.decode("utf-8") for x in initList ] # where utf-8 is encoding used
Check more on decode function.
Also you may want to take a look into the following related SO thread.
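A short Python 3 sketch of the difference between the two approaches, assuming the bytes hold UTF-8 text (the question does not state the encoding):

```python
data = [b'script', b'-compiler', b'123cds', b'-algo', b'timing']

# decode() turns each bytes object into a real str.
decoded = [item.decode('utf-8') for item in data]  # assumes UTF-8 encoded text
print(decoded)  # ['script', '-compiler', '123cds', '-algo', 'timing']

# In Python 3, str() on bytes returns the repr instead, b prefix included.
as_repr = str(b'script')
print(as_repr)  # b'script'
```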
>>> a = [b'script', b'-compiler', b'123cds', b'-algo', b'timing']
>>> map(str, a)
['script', '-compiler', '123cds', '-algo', 'timing']
strin = "[b'script', b'-compiler', b'123cds', b'-algo', b'timing']"
arr = strin.strip('[]').split(', ')
res = [part.strip("b'") for part in arr]
>>> res
['script', '-compiler', '123cds', '-algo', 'timing']
