My algorithm isn't correct. Why so?

My algorithm isn't correct. Why so? - python

I am trying to solve the following problem:
The group of people consists of N members. Every member has one or more friends in the group. You are to write program that divides this group into two teams. Every member of each team must have friends in another team.
Input:
The first line of input contains the only number N (N ≤ 100). Members are numbered from 1 to N. The second, the third,…and the (N+1)th line contain list of friends of the first, the second, …and the Nth member respectively. This list is finished by zero. Remember that friendship is always mutual in this group.
Output:
The first line of output should contain the number of people in the first team or zero if it is impossible to divide people into two teams. If the solution exists you should write the list of the first group into the second line of output. Numbers should be divided by single space. If there are more than one solution you may find any of them.
My algorithm looks like this:
create a dictionary where each player maps to a list of friends
team1 = ['1']
team2 = []
left = []
for player in dictionary:
if its friend in team1:
add to team2
elif its freind in team2:
add to team1
else:
add it to left
But still it isn't correct. There may be cycles in the dictionary where the friend of 6 would be 7 and the only friend of 7 would be 6. What should I do in such a case? I do not know how long such a cycle may be. What should I do. Since, I have a while loop around my code, I currently am running into an infinite loop. I am also trying to add players from left to teams but its not working since they have cycles among them. I don't know how to solve the following problem.
Thanks.

Since this is a competition problem and it's clear you want to learn from it, I'll be a little sparse on details and explain more about how I thought about the problem.
First, consider a connected friendship component, then pick any vertex. Since the friendship relationship is commutative, it's easy to see that adding an edge means that both the vertices are "solved". This seems to suggest something like finding a perfect matching.
However, finding a perfect matching is not sufficient, as for the complete graph with three vertices, a perfect matching doesn't exist, yet it can be solved. So thinking about it little more, it seems that a Hamiltonian path is sufficient, because you can just alternate teams.
If you consider a sufficiently large tree though, it should be clear that there's no Hamiltonian path, but the obvious splitting of teams by even or odd height produces the right result. So the answer seems to be that if you can find a spanning tree, that tree can be used to split the teams into two.
This can be repeated for each component, and just playing around with graphs, it should be convincing enough for a competition, as every component has a spanning tree, so there's nowhere else to expand to. I'm not sure what would be a graph with no possible assignment. Maybe if you have an unconnected node, that's considered invalid?

Update: I found even simpler solution. The original answer is at the bottom. This one is cleaner and comes with a proof ;)
We will be building the solution incrementally. The initial state is that all the people are unallocated, and both teams are empty. We will extend the solution using one of two actions below. After each step, the division will be legal, meaning every allocated person will have a friend allocated to the other team.
Action 1: pick any two unallocated guys that are friends. Put one of them in Team A, the other in Team B. The invariant holds, because newly allocated people know each other and are on separate teams.
Action 2: pick any guy who has an allocated friend, and place him on the other team. The invariant holds, because the one allocated person was allocated in such a way to satisfy it.
So at very step you pick any doable action and execute it. Repeat until there are no more possible actions. When does this happen? It would mean that no-one of the unallocated people has any friends. Since we assumed that everyone has at least one friend, you will be able to execute the actions until there is nobody left.
Original answer:
The problem seems complicated at first, but in fact does not require rocket science. The constraint on the division is rather loose - everyone needs just one friend on the other team.
Consider a simpler case first. Let's say you are given two teams of people and one extra player that got late to the party and needs to be allocated to one of the two existing teams. If he has no friends at all, that's impossible. But if he does have any friends, you pick one of his friends and allocate the newcommer to the other team.
The outcome? If you could start with some small teams and then arrange the rest of the people in such a way that they always know someone who came before, you're golden. This means we reduced initial big problem to two smaller ones.
Tackling the first one is easy. In order to bootstrap the teams, just pick any two guys that know each other, put one in Team A, the other in Team B, and it works.
Now, the second: adding the rest of the people. Take a look at all the people that are already allocated to teams and see if they have any unallocated friends. Case 1: one of already allocated guys has an unallocated friend. You can easily add him somewhere. Case 2: all the friends of the allocated guys are already allocated, too. This means the initial friendship graph was not connected and doesn't hurt at all - just take any random unallocated guy and place him anywhere.

Related

How to solve a iterative slots allocation problem (points motion)

I am starting a project which involves an allocation problem, and having explored a bit by myself, it is pretty challenging for me to solve it efficiently.
What I call here allocation problem is the following:
There is a set of available slots in 3D space, randomly sampled ([xspot_0, yspot_0, zspot_0], [xspot_1, yspot_1, zspot_1], etc.). These slots are identified with an ID and a position, and are fixed, so they will not change with time.
There are then mobile elements (same number as the number of available slots, on the order of 250,000) which can go from spot to spot. They are identified with an ID, and at a given time step, the spot in which they are.
Each spot must have one and only one element at a given step.
At first, elements are ordered in the same way as spots: the first element (element_id=0) is in the first spot (spot_id=0), etc.
But then, these elements need to move, based on a motion vector that is defined for each spot, which is also fixed. For example, ideally at the first step, the first element should move from [xspot_0, yspot_0, zspot_0] to [xspot_0 + dxspot_0, yspot_0 + dyspot_0, zspot_0 + dzspot_0], etc.
Since spots were randomly sampled, the new target position might not exist among the spots. The goal is therefore to find a candidate slot for the next step that is as close as possible to the "ideal" position the element should be in.
On top of that first challenge, since this will probably be done through a loop, it is possible that the best candidate was already assigned to another element.
Once all new slots are defined for each element (or each element is assigned to a new slot, depending on how you see it), we do it again, applying the same motion with the new order. This is repeated as many times as I need.
Now that I defined the problem, the first thing I tried was a simple allocation based on this information. However, if I pick the best candidate every time based on the distance to the target position, as I said some elements have their best candidate already taken, so they pick the 2nd, 3rd, ... 20th, ... 100th candidate slot, which becomes highly wrong compared to the ideal position.
Another technique I was trying, without being entirely sure about what I was doing, was to assign a probability distribution calculated by doing the inverse exponential of the distance between the slots and the target position. Then I normalized this distribution to obtain probabilities (which seem arbitrary). I still do not get very good results for a single step.
Therefore, I was wondering if someone knows how to solve this type of problem in a more accurate/more efficient way. For your information, I mainly use Python 3 for development.
Thank you!

Scheduling: Minimizing Gaps between Non-Overlapping Time Ranges

Using Django to develop a small scheduling web application where people are assigned certain times to meet with their superiors. Employees are stored as models, with a OneToMany relation to a model representing time ranges and day of the week where they are free. For instance:
Bob: (W 9:00, 9:15), (W 9:15, 9:30), ... (W 15:00, 15:20)
Sarah: (Th 9:05, 9:20), (F 9:20, 9:30), ... (Th 16:00, 16:05)
...
Mary: (W 8:55, 9:00), (F 13:00, 13:35), ... etc
My program allows a basic schedule setup, where employers can choose to view the first N possible schedules with the least gaps in between meetings under the condition that they meet all their employees at least once during that week. I am currently generating all possible permutations of meetings, and filtering out schedules where there are overlaps in meeting times. Is there a way to generate the first N schedules out of M possible ones, without going through all M possibilities?
Clarification: We are trying to get the minimum sum of gaps for any given day, summed over all days.

I would use a search algorithm, like A-star, to do this. Each node in the graph represents a person's available time slots and a path from one node to another means that node_a and node_b are in the partial schedule.
Another solution would be to create a graph in which the nodes are each person's availability times and there is a edge from node_a to node_b if the person associated with node_a is not the same as the person associated with node_b. The weight of each node is the amount of time between the time associated with the two nodes.
After creating this graph, you could generate a variant of a minimum spanning tree from the graph. The variant would differ from MSTs in that:
you'll only add a node to the MST if the person associated with the node is not already in the MST.
you finish creating the MST when all persons are in the MST.
The minimum spanning tree generated would represent a single schedule.
To generate other schedules, remove all the edges from the graph which are found in the schedule you just created and then create a new minimum spanning tree from the graph with the removed edges.

In general, scheduling problems are NP-hard, and while I can't figure out a reduction for this problem to prove it such, it's quite similar to a number of other well-known NP-complete problems. There may be a polynomial-time solution for finding the minimum gap for a single day (though I don't know it off hand, either), but I have less hopes for needing to solve it for multiple days. Unfortunately, it's a complicated problem, and there may not be a perfectly elegant answer. (Or, I'm going to kick myself when someone posts one later.)
First off, I'd say that if your dataset is reasonably small and you've been able to compute all possible schedules fairly quickly, you may just want to stick with that solution, as all others will be approximations, and could possibly end up running slower, if the constant factor of their running time is large. (Meaning that it doesn't grow with the size of the dataset, so it will relatively be smaller for a large dataset.)
The simplest approximation would be to just use a greedy heuristic. It will almost assuredly not find the optimum schedules, and may take a long time to find a solution if most of the schedules are overlapping, and there are only a few that are even valid solutions - but I'm going to assume that this is not the case for employee times.
Start with an arbitrary schedule, choosing one timeslot for each employee at random. For each iteration, pick one employee and change his timeslot to the best possible time, with respect to the rest of current schedule. Repeat this process until your satisfied with the result - when it isn't improving quickly enough anymore or has taken too long. You're probably not going to want to repeat until you can't make any more changes that improve the schedule, since this process will likely loop for most data.
It's not a great heuristic, but it should produce some reasonable schedules, and has a lot of room for adjustment. You may want to always try to switch overlapping times first before any others, or you may want to try to flip the employee who currently contributes to the largest gap, or maybe eliminate certain slots that you've already tried. You may want to sometimes allow a move to a less optimal solution in hopes that you're at a local minima and want to get out of it - some randomness can also help with this. Make sure you always keep track of the best solution you've seen so far.
To generate more schedules, the most obvious thing would be to just start the process over with a different random schedule. Or, maybe flip a few arbitrary times from the previous solution you found, and repeat from there.
Edit: This is all fairly related to genetic algorithms, and you may want to use some of the ideas I presented here in a GA.

Program, that chooses the best out of 10

I need to make a program in python that chooses cars from an array, that is filled with the mass of 10 cars. The idea is that it fills a barge, that can hold ~8 tonnes most effectively and minimum space is left unfilled. My idea is, that it makes variations of the masses and chooses one, that is closest to the max weight. But since I'm new to algorithms, I don't have a clue how to do it

I'd solve this exercise with dynamic programming. You should be able to get the optimal solution in O(m*n) operations (n beeing the number of cars, m beeing the total mass).
That will only work however if the masses are all integers.
In general you have a binary linear programming problem. Those are very hard in general (NP-complete).
However, both ways lead to algorithms which I wouldn't consider to be beginners material. You might be better of with trial and error (as you suggested) or simply try every possible combination.

This is a 1D bin-packing problem. It's a NP problem and there isn't an optimal solution. However there is a way to solve this with greedy algorithm. Most likely you want to try my bin-packing solver at phpclasses.org (bin-packing).
If I have a graph unweigthed and undirected and each node is connected which each node then I have (n^2-n)/2 pairs of node and overall n^2-n possibilities/combinations:
1,2,3,4,5,...,64
2,1,X,X,X,...,X
3,X,1,X,X,...,X
4,X,X,1,X,...,X
5,X,X,X,1,...,X
.,X,X,X,X,1,.,X
.,X,X,X,X,X,1,X
64,X,X,X,X,X,X,1
Isn't this the same with 10 cars? (45 pairs of cars and 90 combinations/possibilites). Did I forgot something? Where is the error?

A problem like you have here is similar to the classic traveling salesman problem, which asks for the most efficient way for a salesman to visit a list of cities. The difference is that it is conceivable that you might not need every car to fill the barge, whereas the salesman must visit every city. But the problem is similar. The brute-force way to solve the problem is to investigate every possible combination of cars, from 1 car to all 10. We will assume that it is valid to have any number of each car (i.e. if car 2 is a Ford Focus, you could have three Ford Foci). This is easy to change if the car list is an exact list of specific cars, however, and you can use only 1 of each.
Now, this quickly begins to consume a lot of time. As the number of cars goes up, the number of possible combinations of cars goes up geometrically, which means that with a number smaller than you expect, it will take longer to run the program then there is time left in your life. 10 should be manageable, though (it turns out to be a little more than 700,000 combinations, or 1024 if you can only have one of each item).
The first thing is to define the weight of each car and the maximum weight the barge can carry.
weights = [1, 2, 1, 3, 1, 2, 2, 4, 1, 2, 2]
capacity = 8
Now we need some way to find each possible combination. Python's itertools module has a function that will give us every combination of a given length, but we want all lengths. So we will write one loop that goes from 1 to 10 and calls itertools.combinations_with_replacement for each length. We can then find out the total weight of each combination, and if it is higher than any weight we have already found, yet still within capacity, we will remember it as the best we have found so far.
The only real trick here is that we don't want combinations of the weights -- we want combinations of the indexes of the weights, because at the end we want to know which cars to put on the barge, not their weights. So combinations_with_replacements(range(10), ...) rather than combinations_with_replacements(weights, ...). Inside the loop you will want to get the weight of each car in the combination with weights[i] to sum it up.
I originally had code here, but took it out since this is homework. :-) (It wasn't originally tagged as such, but I should have known -- I blame the time change.)
If you wanted to allow only one of each car, you'd use itertools.combinations instead of combinations_with_replacement.
A shortcut is possible since you mention elsewhere that cars weigh from 1-2 tonnes. This means that you know you will need at least 4 of them (4 * 2 = 8), so you can skip all the combinations of 1-3 cars. However, this wouldn't generalize well if the prof changed the parameters on you.

Help Me Figure Out A Random Scheduling Algorithm using Python and PostgreSQL

I am trying to do the schedule for the upcoming season for my simulation baseball team. I have an existing Postgresql database that contains the old schedule.
There are 648 rows in the database: 27 weeks of series for 24 teams. The problem is that the schedule has gotten predictable and allows teams to know in advance about weak parts of their schedule. What I want to do is take the existing schedule and randomize it. That way teams are still playing each other the proper number of times but not in the same order as before.
There is one rule that has been tripping me up: each team can only play one home and one road series PER week. I had been fooling around with SELECT statements based on ORDER BY RANDOM() but I haven't figured out how to make sure a team only has one home and one road series per week.
Now, I could do this in PHP (which is the language I am most comfortable with) but I am trying to make the shift to Python so I'm not sure how to get this done in Python. I know that Python doesn't seem to handle two dimensional arrays very well.
Any help would be greatly appreciated.

Have you considered keeping your same "schedule", and just shuffling the teams? Generating a schedule where everyone plays each other the proper number of times is possible, but if you already have such a schedule then it's much easier to just shuffle the teams.
You could keep your current table, but replace each team in it with an id (0-23, or A-X, or whatever), then randomly generate into another table where you assign each team to each id (0 = TeamJoe, 1 = TeamBob, etc). Then when it's time to shuffle again next year, just regenerate that mapping table.
Not sure if this answers the question the way you want, but is probably what I would go with (and is actually how I do it on my fantasy football website).

I'm not sure I fully understand the problem, but here is how I would do it:
1. create a complete list of matches that need to happen
2. iterate over the weeks, selecting which match needs to happen in this week.
You can use Python lists to represent the matches that still need to happen, and, for each week, the matches that are happening in this week.
In step 2, selecting a match to happen would work this way:
a. use random.choice to select a random match to happen.
b. determine which team has a home round for this match, using random.choice([1,2]) (if it could have been a home round for either team)
c. temporarily remove all matches that get blocked by this selection. a match is blocked if one of its teams has already two matches in the week, or if both teams already have a home match in this week, or if both teams already have a road match in this week.
d. when there are no available matches anymore for a week, proceed to the next week, readding all the matches that got blocked for the previous week.

I think I've understood your question correctly, but anyhow, you can make use of Python's set datatype and generator functionality:
import random
def scheduler(teams):
""" schedule generator: only accepts an even number of teams! """
if 0 != len(teams) % 2:
return
while teams:
home_team = random.choice(list(teams))
teams.remove(home_team)
away_team = random.choice(list(teams))
teams.remove(away_team)
yield(home_team, away_team)
# team list from sql select statement
teams = set(["TEAM A", "TEAM B", "TEAM C", "TEAM D"])
for team in scheduler(teams):
print(team)
This keeps SQL processing to a minimum and should be very easy to add to new rules, like the ones I didn't understand ;) Good luck
EDIT:
Ah, makes more sense now, should have had one less beer! In which case, I'd definitely recommend NumPy. Take a look through the tutorial and look at consuming the 2-dimensional home-away teams array as you are grabbing random fixtures. It would probably be best to feed the home and away teams from the home day into the away day, so you can ensure there that each team plays home and away each week.

writing optimization function

I'm trying to write a tennis reservation system and I got stucked with this problem.
Let's say you have players with their prefs regarding court number, day and hour.
Also every player is ranked so if there is day/hour slot and there are several players
with preferences for this slot the one with top priority should be chosen.
I'm thinking about using some optimization algorithms to solve this problem but I'am not sure what would be the best cost function and/or algorithm to use.
Any advice?
One more thing I would prefer to use Python but some language-agnostic advice would be welcome also.
Thanks!
edit:
some clarifications-
the one with better priority wins and loser is moved to nearest slot,
rather flexible time slots question
yes, maximizing the number of people getting their most highly preffered times

The basic Algorithm
I'd sort the players by their rank, as the high ranked ones always push away the low ranked ones. Then you start with the player with the highest rank, give him what he asked for (if he really is the highest, he will always win, thus you can as well give him whatever he requested). Then I would start with the second highest one. If he requested something already taken by the highest, try to find a slot nearby and assign this slot to him. Now comes the third highest one. If he requested something already taken by the highest one, move him to a slot nearby. If this slot is already taken by the second highest one, move him to a slot some further away. Continue with all other players.
Some tunings to consider:
If multiple players can have the same rank, you may need to implement some "fairness". All players with equal rank will have a random order to each other if you sort them e.g. using QuickSort. You can get some some fairness, if you don't do it player for player, but rank for rank. You start with highest rank and the first player of this rank. Process his first request. However, before you process his second request, process the first request of the next player having highest rank and then of the third player having highest rank. The algorithm is the same as above, but assuming you have 10 players and player 1-4 are highest rank and players 5-7 are low and players 8-10 are very low, and every player made 3 requests, you process them as
Player 1 - Request 1
Player 2 - Request 1
Player 3 - Request 1
Player 4 - Request 1
Player 1 - Request 2
Player 2 - Request 2
:
That way you have some fairness. You could also choose randomly within a ranking class each time, this could also provide some fairness.
You could implement fairness even across ranks. E.g. if you have 4 ranks, you could say
Rank 1 - 50%
Rank 2 - 25%
Rank 3 - 12,5%
Rank 4 - 6,25%
(Just example values, you may use a different key than always multiplying by 0.5, e.g. multiplying by 0.8, causing the numbers to decrease slower)
Now you can say, you start processing with Rank 1, however once 50% of all Rank 1 requests have been fulfilled, you move on to Rank 2 and make sure 25% of their requests are fulfilled and so on. This way even a Rank 4 user can win over a Rank 1 user, somewhat defeating the initial algorithm, however you offer some fairness. Even a Rank 4 player can sometimes gets his request, he won't "run dry". Otherwise a Rank 1 player scheduling every request on the same time as a Rank 4 player will make sure a Rank 4 player has no chance to ever get a single request. This way there is at least a small chance he may get one.
After you made sure everyone had their minimal percentage processed (and the higher the rank, the more this is), you go back to top, starting with Rank 1 again and process the rest of their requests, then the rest of the Rank 2 requests and so on.
Last but not least: You may want to define a maximum slot offset. If a slot is taken, the application should search for the nearest slot still free. However, what if this nearest slot is very far away? If I request a slot Monday at 4 PM and the application finds the next free one to be Wednesday on 9 AM, that's not really helpful for me, is it? I might have no time on Wednesday at all. So you may limit slot search to the same day and saying the slot might be at most 3 hours off. If no slot is found within that range, cancel the request. In that case you need to inform the player "We are sorry, but we could not find any nearby slot for you; please request a slot on another date/time and we will see if we can find a suitable slot there for you".

This is an NP-complete problem, I think, so it'll be impossible to have a very fast algorithm for any large data sets.
There's also the problem where you might have a schedule that is impossible to make. Given that that's not the case, something like this pseudocode is probably your best bet:
sort players by priority, highest to lowest
start with empty schedule
for player in players:
for timeslot in player.preferences():
if timeslot is free:
schedule.fillslot(timeslot, player)
break
else:
#if we get here, it means this player couldn't be accomodated at all.
#you'll have to go through the slots that were filled and move another (higher-priority) player's time slot

You are describing a matching problem. Possible references are the Stony Brook algorithm repository and Algorithm Design by Kleinberg and Tardos. If the number of players is equal to the number of courts you can reach a stable matching - The Stable Marriage Problem. Other formulations become harder.

There are several questions I'd ask before answering this queston:
what happens if there is a conflict, i.e. a worse player books first, then a better player books the same court? Who wins? what happens for the loser?
do you let the best players play as long as the match runs, or do you have fixed time slots?
how often is the scheduling run - is it run interactively - so potentially someone could be told they can play, only to be told they can't; or is it run in a more batch manner - you put in requests, then get told later if you can have your slot. Or do users set up a number of preferred times, and then the system has to maximise the number of people getting their most highly preferred times?
As an aside, you can make it slightly less complex by re-writing the times as integer indexes (so you're dealing with integers rather than times).

I would advise using a scoring algorithm. Basically construct a formula that pulls all the values you described into a single number. Who ever has the highest final score wins that slot. For example a simple formula might be:
FinalScore = ( PlayerRanking * N1 ) + ( PlayerPreference * N2 )
Where N1, N2 are weights to control the formula.
This will allow you to get good (not perfect) results very quickly. We use this approach on a much more complex system with very good results.
You can add more variety to this by adding in factors for how many times the player has won or lost slots, or (as someone suggested) how much the player paid.
Also, you can use multiple passes to assign slots in the day. Use one strategy where it goes chronologically, one reverse chronologically, one that does the morning first, one that does the afternoon first, etc. Then sum the scores of the players that got the spots, and then you can decide strategy provided the best results.

Basically, you have the advantage that players have priorities; therefore, you sort the players by descending priority, and then you start allocating slots to them. The first gets their preferred slot, then the next takes his preferred among the free ones and so on. It's a O(N) algorithm.

I think you should use genetic algorithm because:
It is best suited for large problem instances.
It yields reduced time complexity on the price of inaccurate answer(Not the ultimate best)
You can specify constraints & preferences easily by adjusting fitness punishments for not met ones.
You can specify time limit for program execution.
The quality of solution depends on how much time you intend to spend solving the program..
Genetic Algorithms Definition
Genetic Algorithms Tutorial
Class scheduling project with GA
Also take a look at :a similar question and another one

Money. Allocate time slots based on who pays the most. In case of a draw don't let any of them have the slot.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.