I have 3 lists:
list1 = [min_0, min_1 ... min_150] consists of minimum indexes and usually has 50-150 elements,
list2 = [max_0, max_1 ... max_150] consists of maximum indexes and usually has 50-150 elements,
list3 = [min_0, max_0, max_1, min_1 ... max_149, min_150]
list3 is the merge of list1 and list2, and it is ordered. list3 generally has 200-300 elements.
I want to create 5-element combinations [x0,x1,x2,x3,x4] from list3 that fit 2 conditions, using Python's itertools.
condition 1: x0, x2 and x4 must be in list1 and x1, x3 must be in list2, or x0, x2, x4 must be in list2 and x1, x3 must be in list1
condition 2: x4 - x0 <= 89
The problem is performance. The number of possible combinations for C(300, 5) is 19,582,837,560. I have tried splitting list3 into n parts and got some performance gain, but in that case I missed some combinations that fit my conditions.
I hope the question is clear. How can I get the best performance? Thanks.
In order to avoid billions of iterations, you will need to simplify the combination domains. This will be easier using sets.
So let's say your 3 lists are actually sets:
set1 = set(list1)
set2 = set(list2)
set3 = set(list3)
You have two patterns to look for:
Let's do part 1:
elements of list3 where x0,x2,x4 are in list1 and x1,x3 are in list2
x0,x2,x4 will be combinations of 3 out of set3 & set1
x1,x3 will be combinations of 2 out of set3 & set2
The 5 value tuples will be the product of these combinations:
part1 = { (x0,x1,x2,x3,x4) for x0,x2,x4 in combinations(set3&set1,3)
                           if x4-x0 <= 89
                           for x1,x3 in combinations(set3&set2,2) }
the second part uses the same approach but with the odd/even elements from the other lists:
part2 = { (x0,x1,x2,x3,x4) for x0,x2,x4 in combinations(set3&set2,3)
                           if x4-x0 <= 89
                           for x1,x3 in combinations(set3&set1,2) }
And the result is the union of the two parts:
result = part1 | part2
Depending on the data, this could still be in the millions of combinations but this method will greatly reduce the number of invalid combinations that need to be filtered out by conditions.
If that is still not fast enough, you should consider writing your own combinations function to optimize applying the set3 filter and x4-x0<89 condition within the combinatory logic. (i.e. a 3 level nested loop giving (x0,x4,x2) that skips x4 values that don't fit the condition, preferably from sorted lists to allow short circuiting)
Note that if any of your lists contain duplicate values, you will definitely need to write your own filtering and combination functions to obtain pre-filtered subsets before multiplying 3-tuple and 2-tuple combinations
[EDIT] here is an example of how to write the custom combination function. I made it a generator in order to avoid creating a result set with a hundred million elements. It only generates valid combinations and applies condition 2 as early as possible to avoid useless iterations through invalid combinations:
m = 150
n = 200
list1 = list(range(m))
list2 = list(range(m,2*m))
list3 = list(range(2,2*n,2))
def combine(L1, L2, L3):
    S3 = set(L3)
    inL1 = [x for x in L1 if x in S3]
    inL2 = [x for x in L2 if x in S3]
    for evens, odds in [(inL1, inL2), (inL2, inL1)]:   # only generate valid combinations
        for p0, x0 in enumerate(evens[:-2]):
            for p4, x4 in enumerate(evens[p0+2:], p0+2):
                if abs(x4 - x0) > 89: continue         # short circuit condition 2 early
                for x2 in evens[p0+1:p4]:
                    for p1, x1 in enumerate(odds[:-1]):
                        for x3 in odds[p1+1:]:
                            yield (x0, x1, x2, x3, x4)
print(sum(1 for _ in combine(list1,list2,list3))) # 230488170
The 230,488,170 combinations were produced in 22 seconds on my laptop.
Here are the first few combinations in my example:
for combo in combine(list1,list2,list3): print(combo)
(2, 150, 4, 152, 6)
(2, 150, 4, 154, 6)
(2, 150, 4, 156, 6)
(2, 150, 4, 158, 6)
(2, 150, 4, 160, 6)
(2, 150, 4, 162, 6)
(2, 150, 4, 164, 6)
(2, 150, 4, 166, 6)
(2, 150, 4, 168, 6)
(2, 150, 4, 170, 6)
(2, 150, 4, 172, 6)
(2, 150, 4, 174, 6) ...
KeyboardInterrupt
If you get hundreds of millions of valid combinations, you may need to rethink the way you're processing the data because you will run into performance and memory problems at every corner.
Use a function for condition 1, then apply condition 2 on top of it; that way condition 1 will be applied precisely.
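For example, a minimal sketch of that idea (not from the original answer; it assumes list1, list2 and list3 are defined as in the question and keeps the brute-force itertools.combinations scan, so it only illustrates factoring condition 1 into a function rather than solving the performance problem):

from itertools import combinations

def condition1(combo, set1, set2):
    # the 5-tuple must alternate between the two index lists
    x0, x1, x2, x3, x4 = combo
    return (({x0, x2, x4} <= set1 and {x1, x3} <= set2) or
            ({x0, x2, x4} <= set2 and {x1, x3} <= set1))

def valid_combos(list1, list2, list3):
    set1, set2 = set(list1), set(list2)
    for combo in combinations(list3, 5):
        if condition1(combo, set1, set2) and combo[4] - combo[0] <= 89:
            yield combo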
I find myself in a unique situation in which I need to multiply single elements within a listed pair of numbers where each pair is nested within a parent list of elements. For example, I have my pre-defined variables as:
output = []
initial_list = [[1,2],[3,4],[5,6]]
I am trying to calculate an output such that each element is the product of a unique combination (always of length len(initial_list)) of a single element from each pair. Using my example of initial_list, I am looking to generate an output of length pow(2, len(initial_list)) that is scalable for any "n" number of pairs in initial_list (with a minimum of 2 pairs). So in this case each element of the output would be as follows:
output[0] = 1 * 3 * 5
output[1] = 1 * 3 * 6
output[2] = 1 * 4 * 5
output[3] = 1 * 4 * 6
output[4] = 2 * 3 * 5
output[5] = 2 * 3 * 6
output[6] = 2 * 4 * 5
output[7] = 2 * 4 * 6
In my specific case, the order of output assignments does not matter other than output[0], which I need to be equivalent to the product of the first element in each pair in initial_list. What is the best way to proceed to generate an output list such that each element is a unique combination of every element in each list?
...
My initial approach consisted of using:
from itertools import combinations
from itertools import permutations
from itertools import product
to somehow generate a list of every possible combination, then multiply the elements of each combination together and append each product to the output list, but I couldn't figure out a way to implement the tools successfully. I have since tried to create a recursive function that combines for x in range(2): with nested recursive calls, but once again I cannot figure out a solution.
Someone more experienced and smarter than me please help me out; Any and all help is appreciated! Thank you!
Without using any external library
def multi_comb(my_list):
    """
    This returns the multiplication of
    every possible combination of
    the `my_list` of type [[a1, a2], [b1, b2], ...]
    Arg: List
    Return: List
    """
    if not my_list: return [1]
    a, b = my_list.pop(0)        # note: pop(0) mutates the input list
    result = multi_comb(my_list)
    left = [a * i for i in result]
    right = [b * i for i in result]
    return left + right

print(multi_comb([[1, 2], [3, 4], [5, 6]]))
# Output
# [15, 18, 20, 24, 30, 36, 40, 48]
I am using recursion to get the result. Here's an illustration of how it works.
Instead of taking a top-down approach, we can take a bottom-up approach to better understand how this program works.
At the last step, a and b become 5 and 6 respectively. Calling multi_comb() with an empty list returns [1] as the result. So left and right become [5] and [6]. Thus we return [5, 6] to the previous step.
At the second-to-last step, a and b were 3 and 4 respectively. From the last step we got [5, 6] as the result. After multiplying each of the values inside that result with a and b (notice left and right), we return [15, 18, 20, 24] to the previous step.
At the first step, our starting step, we had a and b as 1 and 2 respectively. The value returned from the previous step becomes our result, i.e., [15, 18, 20, 24]. Now we multiply both a and b with this result and return our final output.
Note:
This program works only if the list is in the form [ [a1, a2], [b1, b2], [c1, c2], ... ], as stated by the OP in the comments. Solving for sub-lists containing n items will be a little different in code, but the concept is the same as in this answer (see the sketch below).
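As a rough sketch of that generalisation (not part of the original answer), the recursion stays the same but loops over every element of the first sub-list instead of unpacking exactly two:

def multi_comb_general(my_list):
    # products of every combination taking one element from each sub-list
    if not my_list:
        return [1]
    first, rest = my_list[0], my_list[1:]
    tails = multi_comb_general(rest)
    return [x * t for x in first for t in tails]

print(multi_comb_general([[1, 2], [3, 4], [5, 6]]))
# [15, 18, 20, 24, 30, 36, 40, 48]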
This problem can also be solved using dynamic programming
output = [1, ]
for arr in initial_list:
    output = [a * b for a in arr for b in output]
This problem is easy to solve if you have just one subarray -- the output is the given subarray.
Suppose you solved the problem for the first n - 1 subarrays and you got the output. Now a new subarray is appended. How should the output change? The new output is all pair-wise products of the previous output and the "new" subarray.
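For example, a quick trace of how output grows with each sub-list (using the loop above on the OP's initial_list):

initial_list = [[1, 2], [3, 4], [5, 6]]
output = [1]
for arr in initial_list:
    output = [a * b for a in arr for b in output]
    print(output)
# [1, 2]
# [3, 6, 4, 8]
# [15, 30, 20, 40, 18, 36, 24, 48]

Note that output[0] == 15, the product of the first element of each pair, as the OP requires; the rest of the ordering differs but the OP says it does not matter.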
Look closely, there's an easy pattern. Let there be n sublists, and 2 elements in each: at index 0 and 1. Now, the indexes selected can be represented as a binary string of length n.
It'll start with 0000..000, then 0000...001, 0000...010 and so on. So all you need to do is:
n = len(lst)
output = []
for i in range(2 ** n):
    binary = bin(i)[2:].zfill(n)   # zero-padded binary representation
    p = 1
    for j in range(n):
        if binary[j] == "1":
            p *= lst[j][1]         # include jth list's element at index 1
        else:
            p *= lst[j][0]         # include jth list's element at index 0
    output.append(p)
The problem with scaling this solution is that, since you're generating all possible combinations, the time complexity will be O(2^N).
Your idea to use itertools.product is great!
import itertools
initial_list = [[1,2],[3,4],[5,6]]
combinations = list(itertools.product(*initial_list))
# [(1, 3, 5), (1, 3, 6), (1, 4, 5), (1, 4, 6), (2, 3, 5), (2, 3, 6), (2, 4, 5), (2, 4, 6)]
Now, you can get the product of each tuple in combinations using for-loops, or using functools.reduce, or you can use math.prod, which was introduced in Python 3.8:
import itertools
import math
initial_list = [[1,2],[3,4],[5,6]]
output = [math.prod(c) for c in itertools.product(*initial_list)]
# [15, 18, 20, 24, 30, 36, 40, 48]
import itertools
import functools
import operator
initial_list = [[1,2],[3,4],[5,6]]
output = [functools.reduce(operator.mul, c) for c in itertools.product(*initial_list)]
# [15, 18, 20, 24, 30, 36, 40, 48]
import itertools

initial_list = [[1,2],[3,4],[5,6]]
output = []
for c in itertools.product(*initial_list):
    p = 1
    for x in c:
        p *= x
    output.append(p)
# output == [15, 18, 20, 24, 30, 36, 40, 48]
Note: if you are more familiar with lambdas, operator.mul is pretty much equivalent to lambda x,y: x*y.
itertools.product and math.prod are a nice fit -
from itertools import product
from math import prod
input = [[1,2],[3,4],[5,6]]
output = [prod(x) for x in product(*input)]
print(output)
[15, 18, 20, 24, 30, 36, 40, 48]
I have a function to find common and uncommon items and their rates between a given list (one list) and other lists (60,000 lists) for each user (4,000 users). Running the loop below takes too long and uses too much memory; the list is only partially constructed before the process crashes. I think this is due to the length of the returned list and its heavy elements (tuples), so I divided the work into the two functions below, but the problem seems to be in appending the list items as tuples, [(user, [items], rate), (user, [items], rate), ....]. I want to create dataframes from the returned values. What should I do to the algorithm to get around this matter and reduce memory usage?
I am using Python 3.7, Windows 10, 64-bit, RAM 8 GB.
common items function:
def common_items(user, list1, list2):
    com_items = list(set(list1).intersection(set(list2)))
    com_items_rate = len(com_items) / len(set(list1).union(set(list2)))
    return user, com_items, com_items_rate
uncommon items function:
def uncommon_items(user, list1, list2):
    com_items = list(set(list1).intersection(set(list2)))
    com_items_rate = len(com_items) / len(set(list1).union(set(list2)))
    uncom_items = list(set(list2) - set(com_items))  # uncommon items that belong to list2
    uncom_items_rate = len(uncom_items) / len(set(list1).union(set(list2)))
    return user, com_items_rate, uncom_items, uncom_items_rate  # common_items_rate is also needed
Constructing the list:
common_item_rate_tuple_list = []
for usr in users:  # users.shape = 4,000
    list1 = get_user_list(usr)  # a function to get list1, it takes 0:00:00.015632 or less for a user
    # print(usr, len(list1))
    for list2 in df["list2"]:  # df.shape = 60,000
        common_item_rate_tuple = common_items(usr, list1, list2)
        common_item_rate_tuple_list.append(common_item_rate_tuple)
print(len(common_item_rate_tuple_list))  # 4,000 * 60,000 = 240,000,000 items
# sample of common_item_rate_tuple_list:
# [(1, [2,5,8], 0.676), (1, [7,4], 0.788), .... (4000, [1,5,7,9], 0.318), (4000, [8,9,6], 0.521)]
I looked at (Memory errors and list limits?) and (Memory error when appending to list in Python); they deal with the constructed list. And I could not apply the suggested answer for (Python list memory error).
There are a couple things you should consider for speed and memory management with data this big.
You are, or should be, working only with sets here because order has no meaning in your lists and you are doing a lot of intersecting of sets. So, can you change your get_user_list() function to return a set instead of a list? That will prevent all of the unnecessary conversions you are doing. Same for list2: just make a set right away.
For the "uncommon items", you should just use set difference / symmetric difference operators on the sets. Much faster, and many fewer list -> set conversions (see the sketch after this list).
At the end of your loop, do you really want to create a list of 240M sub-lists? That is probably your memory explosion. I would suggest a dictionary with the user name as key, and you only need to create an entry in it if there are common items. If the matches are "sparse", you will get a much smaller data container.
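A minimal sketch of the first two points (an illustration only, assuming get_user_list() and the entries of df["list2"] can be changed to hold sets; the function name is made up):

def common_uncommon(user, set1, set2):
    # set1, set2 are already sets, so no list <-> set conversions are needed
    union_size = len(set1 | set2)
    common = set1 & set2               # intersection
    only_in_set2 = set2 - set1         # uncommon items that belong to set2
    # set1 ^ set2 (symmetric difference) would give the uncommon items on both sides
    return (user,
            len(common) / union_size,
            only_in_set2,
            len(only_in_set2) / union_size)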
--- Edit w/ example
So I think your hope of keeping it all in a data frame is too ambitious. Perhaps you can do what is needed without storing it in a data frame; a dictionary makes sense. You may even be able to compute things "on the fly" and not store the data at all. Anyhow, here is a toy example that shows the memory problem using 4K users and 10K "other lists". Of course the size of the intersected sets may make this vary, but it is informative:
import sys
import pandas as pd

# create list of users by index
users = list(range(4000))

match_data = list()
size_list2 = 10_000
for user in users:
    for t in range(size_list2):
        match_data.append((user, (1, 5, 6, 9), 0.55))  # 4 dummy matches and fake percentage

print(match_data[:4])
print(f'size of match: {sys.getsizeof(match_data)/1_000_000} MB')

df = pd.DataFrame(match_data)
print(df.head())
print(f'size of dataframe {sys.getsizeof(df)/1_000_000} MB')
This yields the following:
[(0, (1, 5, 6, 9), 0.55), (0, (1, 5, 6, 9), 0.55), (0, (1, 5, 6, 9), 0.55), (0, (1, 5, 6, 9), 0.55)]
size of match: 335.072536 MB
0 1 2
0 0 (1, 5, 6, 9) 0.55
1 0 (1, 5, 6, 9) 0.55
2 0 (1, 5, 6, 9) 0.55
3 0 (1, 5, 6, 9) 0.55
4 0 (1, 5, 6, 9) 0.55
size of dataframe 3200.00016 MB
You can see that even this nutshell version of your idea, with only 10K other lists, is 3.2 GB in a dataframe. This will be unmanageable.
Here is an idea for a data structure just to use dictionaries all the way.
del df

# just keep it in a dictionary
data = {}  # intended format: key = (usr, other_list) : value = [common elements]

# some fake data
user_items = {1: {2, 3, 5, 7, 99},
              2: {3, 5, 88, 790},
              3: {2, 4, 100}}

# some fake "list 2" data
list2 = [{1, 2, 3, 4, 5},
         {88, 100},
         {11, 13, 200}]

for user in user_items.keys():
    for idx, other_set in enumerate(list2):  # using enumerate to get the index of the other list
        common_elements = user_items.get(user) & other_set  # set intersection
        if common_elements:  # only put it into the dictionary if it is not empty
            data[(user, idx)] = common_elements

# try a couple data pulls
print(f'for user 1 and other list 0: {data.get((1, 0))}')
print(f'for user 2 and other list 2: {data.get((2, 2))}')  # use .get() to be safe. It will return None if no entry
The output here is:
for user 1 and other list 0: {2, 3, 5}
for user 2 and other list 2: None
Your other alternative, if you are going to be working with this data a lot, is just to put these tables into a database like sqlite, which is built in and won't bomb out your memory.
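A minimal sqlite sketch (table and column names are made up for illustration; it stores only non-empty matches on disk instead of in RAM):

import sqlite3

conn = sqlite3.connect("matches.db")   # file-backed, so it does not live in RAM
conn.execute("""CREATE TABLE IF NOT EXISTS matches
                (user INTEGER, other_idx INTEGER, common TEXT, rate REAL)""")

def store_match(user, other_idx, common_elements, rate):
    # call this inside the user / list2 loops instead of appending to a list
    conn.execute("INSERT INTO matches VALUES (?, ?, ?, ?)",
                 (user, other_idx, ",".join(map(str, sorted(common_elements))), rate))

# ... after the loops
conn.commit()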
This problem is largely the same as a classic 0-1 knapsack problem, but with some minor rule changes and a large dataset to play with.
Dataset (product ID, price, length, width, height, weight):
(20,000 rows)
Problem:
A company is closing in fast on delivering its 1 millionth order. The marketing team decides to give the customer who makes that order a prize as a gesture of appreciation. The prize is: the lucky customer gets a delivery tote and 1 hour in the warehouse. Use the hour to fill up the tote with any products you desire and take them home for free.
Rules:
1 of each item
Combined volume < tote capacity (45 * 35 * 30 = 47250)
Item must fit individually (Dimensions are such that it can fit into the tote, e.g. 45 * 45 * 1 wouldn't fit)
Maximize value of combined products
Minimize weight on draws
Solution (using dynamic programming):
from functools import reduce

# The main solver function
def Solver(myItems, myCapacity):
    dp = {myCapacity: (0, (), 0)}
    getKeys = dp.keys
    for i in range(len(myItems)):
        itemID, itemValue, itemVolume, itemWeight = myItems[i]
        for oldVolume in list(getKeys()):
            newVolume = oldVolume - itemVolume
            if newVolume >= 0:
                myValue, ListOfItems, myWeight = dp[oldVolume]
                node = (myValue + itemValue, ListOfItems + (itemID,), myWeight + itemWeight)
                if newVolume not in dp:
                    dp[newVolume] = node
                else:
                    currentValue, loi, currentWeight = dp[newVolume]
                    if currentValue < node[0] or (currentValue == node[0] and node[-1] < currentWeight):
                        dp[newVolume] = node
    return max(dp.values())

# Generate the product of all elements within a given list
def List_Multiply(myList):
    return reduce(lambda x, y: x * y, myList)

toteDims = [30, 35, 45]
totalVolume = List_Multiply(toteDims)
productsList = []

with open('products.csv', 'r') as myFile:
    for myLine in myFile:
        myData = [int(x) for x in myLine.strip().split(',')]
        itemDims = [myDim for myDim, maxDim in zip(sorted(myData[2:5]), toteDims) if myDim <= maxDim]
        if len(itemDims) == 3:
            productsList.append((myData[0], myData[1], List_Multiply(myData[2:5]), myData[5]))

print(Solver(productsList, totalVolume))
Issue:
The output is giving repeated items
ie. (14018, (26, 40, 62, 64, 121, 121, 121, 152, 152), 13869)
How can I correct this to make it choose only 1 of each item?
It seems that the reason your code may produce answers with duplicate items is that, in the inner loop, while you iterate over all volumes generated so far, the code may already have replaced the solution for a volume value you have not reached yet, so you later build on a solution that already contains the current item.
E.g. if your productsList contained the following
productsList = [
    # id, value, volume, weight
    [1, 1, 2, 1],
    [2, 1, 3, 2],
    [3, 3, 5, 1]
]
and
totalVolume = 10
then by the time you got to the third item, dp.keys() would contain:
10, 8, 7, 5
The order of iteration is not guaranteed, but for the sake of this example, let's assume it is as given above. Then dp[5] would be replaced by a new solution containing item #3, and later in the iteration, we would be using that as a base for a new, even better solution (except now with a duplicate item).
To overcome the above problem, you could sort the keys before the iteration (in ascending order, which is the default), like for oldVolume in sorted(getKeys()). Assuming all items have a non-negative volume, this should guarantee that we never replace a solution in dp before we have iterated over it.
Another possible problem I see above is the way we get the optimal solution at the end using max(dp.values()). In the problem statement, it says that we want to minimize weight in the case of a draw. If I'm reading the code correctly, the elements of the tuple are value, list of items, weight in that order, so below we're tied for value, but the latter choice would be preferable because of the smaller weight... however max compares the tuples lexicographically, so the tie is broken by the list of items rather than the weight:
>>> max([(4, (2, 3), 3), (4, (1, 3), 2)])
(4, (2, 3), 3)
It's possible to specify the sorting key to max so something like this might work:
>>> max([(4, (2, 3), 3), (4, (1, 3), 2)], key=lambda x: (x[0], -x[-1]))
(4, (1, 3), 2)
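Putting both fixes together, a hedged sketch of a corrected Solver (same logic as the original, only the iteration order and the final max change) might look like this:

def Solver(myItems, myCapacity):
    dp = {myCapacity: (0, (), 0)}
    for itemID, itemValue, itemVolume, itemWeight in myItems:
        # ascending order: new entries always land at smaller, already-visited volumes,
        # so an item can never be added twice in the same pass
        for oldVolume in sorted(dp.keys()):
            newVolume = oldVolume - itemVolume
            if newVolume >= 0:
                myValue, ListOfItems, myWeight = dp[oldVolume]
                node = (myValue + itemValue, ListOfItems + (itemID,), myWeight + itemWeight)
                if (newVolume not in dp
                        or dp[newVolume][0] < node[0]
                        or (dp[newVolume][0] == node[0] and node[-1] < dp[newVolume][-1])):
                    dp[newVolume] = node
    # break ties on value by preferring the smaller weight
    return max(dp.values(), key=lambda x: (x[0], -x[-1]))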
I'm having a math brain fart moment, and google has failed to answer my quandary.
Given a sequence or list of 2 item tuples (from a Counter object), how do I quickly and elegantly get python to spit out a linear sequence or array of all the possible combinations of those tuples? My goal is trying to find the combinations of results from a Counter object.....
For example clarity, if I have this sequence:
[(500, 2), (250, 1)]
Doing this example out manually by hand, it should yield these results:
250, 500, 750, 1000, 1250.
Basically, I THINK it's a*b for the range of b and then add the resulting lists together...
I've tried this (where c=Counter object):
res = [[k*(j+1) for j in range(c[k])] for k in c]
And it will give me back:
res = [[250], [500, 1000]]
So far so good, it's going through each tuple and multiplying x * y for each y... But the resulting list isn't full of all the combinations yet, the first list [250] needs to be added to each element of the second list. This would be the case for any number of results I believe.
Now I think I need to take each list in this result list and add it to the other elements in the other lists in turn. Am I going about this wrong? I swear there should be a simpler way. I feel there should be a way to do this in a one line list comp.
Is the solution recursive? Is there a magic import or builtin method I don't know about? My head hurts......
I'm not entirely sure I follow you, but maybe you're looking for something like
from itertools import product
def lincombs(s):
    terms, ffs = zip(*s)
    factors = product(*(range(f+1) for f in ffs))
    outs = (sum(v*f for v, f in zip(terms, ff)) for ff in factors if any(ff))
    return outs
which gives
>>> list(lincombs([(500, 2), (250, 1)]))
[250, 500, 750, 1000, 1250]
>>> list(lincombs([(100, 3), (10, 3)]))
[10, 20, 30, 100, 110, 120, 130, 200, 210, 220, 230, 300, 310, 320, 330]
v*f multiplication from #DSM's answer could be avoided:
>>> from itertools import product
>>> terms = [(500, 2), (250, 1)]
>>> map(sum, product(*[xrange(0, v*a+1, v) for v, a in terms]))
[0, 250, 500, 750, 1000, 1250]
To get a sorted output without duplicates:
from itertools import groupby, imap
from operator import itemgetter
it = imap(itemgetter(0), groupby(sorted(it)))
though sorted(set(it)) that you use is ok in this case.
I have a large file with each line of the form
a b c
I would like to remove all such lines where there does not exist another line either like
b d e
or d a e
with abs(c - e) < 10.
a, b, c, d, e are all integers.
For example if the input is:
0 1 10
1 2 20
2 3 25
0 1 15
1 4 40
then the output should be
1 2 20
2 3 25
0 1 15
Is it possible to do this in anything like linear time?
One idea is to create two dictionaries of sorted lists. One for the third column values associated with first column values. The other for the third column values associated with second column values. Then when you see a b c, look up c in the sorted list you get using key a in the second dictionary and then c in the sorted list you get using key b in the first dictionary.
I don't know if this can be done in linear time. It is straightforward to do it in O(n·log n) time if there are n triplets in the input. Here is a sketch of a method, in a not-necessarily-preferred form of implementation:
1. Make an array of markers M, initially all clear.
2. Create an array and make a copy of the input, sorted first on the middle element and then by the third element whenever middle elements are equal. (Time is O(n·log n) so far.)
3. For each distinct middle value, make a BST (binary search tree) with key = third element. (Time is O(n·log n) again.)
4. Make a hash table keyed by middle values, with data pointing at the appropriate BSTs. That is, given a middle value y and third element z, in time O(1) we can get to the BST for triplets whose middle value is y; and from that, in time O(log n) we can find the triplet with third-element value closest to z.
5. For each triplet t = (x,y,z) in turn, if its marker is not yet set, use the hash table to find the BST, if any, corresponding to x. In that BST, find the triplet u with third element closest to z. If the difference is less than 10, set the markers for t and u. (Time is O(n·log n) again.)
6. Repeat steps 2–5 but with BSTs based on first-element values rather than middle values, and lookups in step 5 based on y rather than x. (Although the matching relations are symmetric, so that we can set two markers at each cycle in step 5, some qualifying triplets may end up not marked; i.e., they are within tolerance but more distant than the nearest match that is found. It would be possible to mark all of the qualifying triplets in step 5, but that would increase worst-case time from O(n·log n) to O(n²·log n).)
7. For each marker that is set, output the corresponding triplet.
Overall time: O(n·log n). The same time can be achieved without building BST's but instead using binary searches within subranges of the sorted arrays.
Edit: In python, one can build structures usable with bisect as illustrated below in excerpts from an ipython interpreter session. (There may be more efficient ways of doing these steps.) Each data item in dictionary h is an array suitable for searching with bisect.
In [1]: from itertools import groupby
In [2]: a=[(0,1,10), (1,2,20), (2,3,25), (0,1,15), (1,4,40), (1,4,33), (3,3,17), (2,1,19)]
In [3]: b=sorted((e[1],e[2],i) for i,e in enumerate(a)); print b
[(1, 10, 0), (1, 15, 3), (1, 19, 7), (2, 20, 1), (3, 17, 6), (3, 25, 2), (4, 33, 5), (4, 40, 4)]
In [4]: h={k:list(g) for k,g in groupby(b,lambda x: x[0])}; h
Out[4]:
{1: [(1, 10, 0), (1, 15, 3), (1, 19, 7)],
2: [(2, 20, 1)],
3: [(3, 17, 6), (3, 25, 2)],
4: [(4, 33, 5), (4, 40, 4)]}
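To complete the lookup step, one could binary-search those per-key arrays with bisect; a sketch (assuming the dictionary h built above and a tolerance of 10; the function name is made up):

from bisect import bisect_left

def has_close_match(h, key, z, tol=10):
    # True if some triplet grouped under `key` has a third element within tol of z
    group = h.get(key)
    if not group:
        return False
    thirds = [t[1] for t in group]     # third-column values, already in sorted order
    i = bisect_left(thirds, z)
    return ((i < len(thirds) and abs(thirds[i] - z) < tol) or
            (i > 0 and abs(thirds[i - 1] - z) < tol))

# for a triplet (a, b, c), the "d a e" pattern (h keyed on the middle column) is:
# has_close_match(h, a, c)
# a second dictionary built the same way but keyed on the first column handles the "b d e" pattern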
Like others have said, linear time may not be possible. Here is an easy O(n^2) implementation. If you sort the lists inside the dictionaries, you should be able to improve the runtime (see the sketch after the code).
lines = """0 1 10
1 2 20
2 3 25
0 1 15
1 4 40"""
Adata = {}
Bdata = {}
for line in lines.split('\n'):
a,b,c = line.split(' ')[:3]
vals = map(int,[a,b,c])
if b in Adata:
Adata[b].append(vals)
else:
Adata[b] = [vals]
if a in Bdata:
Bdata[a].append(vals)
else:
Bdata[a] = [vals]
def case1(a,b,c):
if a in Adata:
for val in Adata[a]:
if abs(int(c)-val[2]) < 10:
return True
return False
def case2(a,b,c):
if b in Bdata:
for val in Bdata[b]:
if abs(int(c)-val[2]) < 10:
return True
return False
out = []
for line in lines.split('\n'):
a,b,c = line.split(' ')[:3]
if case1(a,b,c) or case2(a,b,c):
out.append(line)
for line in out:
print line
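As mentioned above, sorting the per-key lists once and binary-searching them with bisect turns each case check into O(log n); a rough sketch on top of the Adata/Bdata built above (Athirds, Bthirds and close_enough are new names, introduced only for illustration):

from bisect import bisect_left

# keep only the sorted third-column values for each key
Athirds = {k: sorted(v[2] for v in triplets) for k, triplets in Adata.items()}
Bthirds = {k: sorted(v[2] for v in triplets) for k, triplets in Bdata.items()}

def close_enough(thirds, c, tol=10):
    # binary search a sorted list for a value within tol of c
    i = bisect_left(thirds, c)
    return ((i < len(thirds) and abs(thirds[i] - c) < tol) or
            (i > 0 and abs(thirds[i - 1] - c) < tol))

def case1(a, b, c):
    return a in Athirds and close_enough(Athirds[a], int(c))

def case2(a, b, c):
    return b in Bthirds and close_enough(Bthirds[b], int(c))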
I think what you're looking for is something like
lines = set()
for line in infile:
    if line not in lines:
        lines.add(line)
        outfile.write(line)