Efficient algorithm to handle system of changing probabilities - python

I need help in formulating an algorithm. I have a social networking app built in python with a redis backend. One of the features is an advertising server that advertisers can use to create and serve ads to site users.
Advertisers buy clicks. Cost per click is not fixed. One advertiser may pay 1 cent/click, another may pay 2 cents/click, etc. The more an advertiser pays, the greater the probability of her advertisement being displayed. I.e., higher paying advertisers get clicks faster, all else being equal.
E.g. imagine advertiser 1 paid $10 for 1000 clicks, advertiser 2 paid $20 for 1000 clicks and advertiser 3 paid $30 for 1000 clicks. All else equal, at each ad impression, advertiser 1's ad will have a 1/6 probability of appearing, advertiser 2's ad will have a 2/6 probability of appearing and advertiser 3's ad will have a 3/6 probability of appearing. If a 4th advertiser was added to the mix, the probabilities will revise. Once an advertiser's designated clicks are reached, she's taken out of the mix and the probabilities revise again.
How can this be modeled in an efficient algorithm?
One reason I get stumped when trying to model this is because upon adding (or subtracting) an advertiser, the whole system's probabilities are revised on the fly. I haven't been able to wrap my head around this; hopefully experts can help out.

As an advertiser buying ad spots, I'm personally unsure I would want to pay for an uncertain probability my ad would be shown. The closest to reasonable thing I can imagine, (which may be what you were thinking of anyway) is to only guarantee that those who paid the same rate get the same probability. An efficient way to do this would be to keep a cumulative sum of the rates paid to give your random range:
cumsum, rate, company
1, 1, A
2, 1, B
4, 2, C
8, 4, D
if you generate a random number from 0 to 7 (or 1 to 8) you can do a binary search (O log(n)) to determine which ad should be shown.
If you keep the companies sorted by rate, you could probably even do better than (O log(n))... maybe
adding and subtracting companies would be relatively simple: add (or subtract) a line for the company and the rate, then recalculate cumsum.

Related

OR Tools formulate revenue

I am seeking advice on formulating a problem to solve in OR tools.
The context is that I am a games owner, and in a shift, I can set varying number of games station. Let's say if I have 2 stations and 5 customers, then assume the most even distribution, which is 3 customers at a station and 2 customers at another station.
With different number of customers at a station, the game speed decreases. To maximise game speed, if I have 5 customers, the best solution is to have 5 stations such that each customer has 1 station each. But this increases the operational cost.
How do I represent the total game speed in this kind of scenario, where it involves even distribution of customers to station, and that the game speed depends on the number of customer at a station?
OR-Tools does not contain a general non-linear solver.
You can access quadratic solvers, although not easily, using the MPSolver API (targeting SCIP or Gurobi).
If your problem can be discretized, you can use CP-SAT to solve your problem. See this section on integer expressions .
Otherwise, OR-Tools is not what you are looking for.

How to pre-process transactional data to predict probability to buy?

I'm working on a model for a departament store that uses data from previous purchases to predict the customer's probability to buy today. For the sake of simplicity, say that we have 3 categories of products (A, B, C) and I want to use the purchase history of the customers in Q1, Q2 and Q3 2017 to predict the probability to buy in Q4 2017.
How should I structure my indicators file?
My try:
The variables I want to predict are the red colored cells in the production set.
Please note the following:
Since my set of customers is the same for both years, I'm using a photo of how customers acted last year to predict what will they do at the end of this year (which is unknown).
Data is separated by trimester, a co-worker sugested this is not correct, because I'm unintentionally giving greater weight to the indicators splitting each one in 4, when they should only be one per category.
Alternative:
Another aproach I was sugested was to use two indicators per category: Ex.'bought_in_category_A' and 'days_since_bought_A'. For me this looks simpler, but then the model will only be able to predict IF the customer will buy Y, not WHEN they will buy Y. Also, what will happen if the customer never bought A? I cannot use a 0 since that will imply customers who never bought are closer to customers who just bought a few days ago.
Questions:
Is this structure ok or would you structure the data in another way?
Is it ok to use information from last year in this case?
Is it ok to 'split' a cateogorical variable into several binary variables? does this affect the importance given to that variable?
Unfortunately, you need a different approach in order to achieve predictive analysis.
For example the products' properties are unknown here (color, taste,
size, seasonality,....)
There is no information about the customers
(age, gender, living area etc...)
You need more "transactional"
information, (when, why - how did they buy etc......)
What is the products "lifecycle"? Does it have to do with fashion?
What branch are you in? (Retail, Bulk, Finance, Clothing...)
Meanwhile have you done any campaign? How will this be measured?
I would first (if applicable) concetrate on the categories relations and behaviour for each Quarter:
For example When n1 decreases then n2 decreases
when q1 is lower than q2 or q1/2016 vs q2/2017.
I think you should first of all, work this out with a business analyst in order to to find out the right "rules" and approach.
I do no think you could get a concrete answer with these generic-assumed data.
Usually you need data from at least 3-5 recent years to do some descent predictive analysis, depending of course, on the nature of your product.
Hope, this helped a bit.
;-)
-mwk

Program, that chooses the best out of 10

I need to make a program in python that chooses cars from an array, that is filled with the mass of 10 cars. The idea is that it fills a barge, that can hold ~8 tonnes most effectively and minimum space is left unfilled. My idea is, that it makes variations of the masses and chooses one, that is closest to the max weight. But since I'm new to algorithms, I don't have a clue how to do it
I'd solve this exercise with dynamic programming. You should be able to get the optimal solution in O(m*n) operations (n beeing the number of cars, m beeing the total mass).
That will only work however if the masses are all integers.
In general you have a binary linear programming problem. Those are very hard in general (NP-complete).
However, both ways lead to algorithms which I wouldn't consider to be beginners material. You might be better of with trial and error (as you suggested) or simply try every possible combination.
This is a 1D bin-packing problem. It's a NP problem and there isn't an optimal solution. However there is a way to solve this with greedy algorithm. Most likely you want to try my bin-packing solver at phpclasses.org (bin-packing).
If I have a graph unweigthed and undirected and each node is connected which each node then I have (n^2-n)/2 pairs of node and overall n^2-n possibilities/combinations:
1,2,3,4,5,...,64
2,1,X,X,X,...,X
3,X,1,X,X,...,X
4,X,X,1,X,...,X
5,X,X,X,1,...,X
.,X,X,X,X,1,.,X
.,X,X,X,X,X,1,X
64,X,X,X,X,X,X,1
Isn't this the same with 10 cars? (45 pairs of cars and 90 combinations/possibilites). Did I forgot something? Where is the error?
A problem like you have here is similar to the classic traveling salesman problem, which asks for the most efficient way for a salesman to visit a list of cities. The difference is that it is conceivable that you might not need every car to fill the barge, whereas the salesman must visit every city. But the problem is similar. The brute-force way to solve the problem is to investigate every possible combination of cars, from 1 car to all 10. We will assume that it is valid to have any number of each car (i.e. if car 2 is a Ford Focus, you could have three Ford Foci). This is easy to change if the car list is an exact list of specific cars, however, and you can use only 1 of each.
Now, this quickly begins to consume a lot of time. As the number of cars goes up, the number of possible combinations of cars goes up geometrically, which means that with a number smaller than you expect, it will take longer to run the program then there is time left in your life. 10 should be manageable, though (it turns out to be a little more than 700,000 combinations, or 1024 if you can only have one of each item).
The first thing is to define the weight of each car and the maximum weight the barge can carry.
weights = [1, 2, 1, 3, 1, 2, 2, 4, 1, 2, 2]
capacity = 8
Now we need some way to find each possible combination. Python's itertools module has a function that will give us every combination of a given length, but we want all lengths. So we will write one loop that goes from 1 to 10 and calls itertools.combinations_with_replacement for each length. We can then find out the total weight of each combination, and if it is higher than any weight we have already found, yet still within capacity, we will remember it as the best we have found so far.
The only real trick here is that we don't want combinations of the weights -- we want combinations of the indexes of the weights, because at the end we want to know which cars to put on the barge, not their weights. So combinations_with_replacements(range(10), ...) rather than combinations_with_replacements(weights, ...). Inside the loop you will want to get the weight of each car in the combination with weights[i] to sum it up.
I originally had code here, but took it out since this is homework. :-) (It wasn't originally tagged as such, but I should have known -- I blame the time change.)
If you wanted to allow only one of each car, you'd use itertools.combinations instead of combinations_with_replacement.
A shortcut is possible since you mention elsewhere that cars weigh from 1-2 tonnes. This means that you know you will need at least 4 of them (4 * 2 = 8), so you can skip all the combinations of 1-3 cars. However, this wouldn't generalize well if the prof changed the parameters on you.

Teleporting Traveler, Optimal Profit over time Problem

I'm new to the whole traveling-salesman problem as well as stackoverflow so let me know if I say something that isn't quite right.
Intro:
I'm trying to code a profit/time-optimized multiple-trade algorithm for a game which involves multiple cities (nodes) within multiple countries (areas), where:
The physical time it takes to travel between two connected cities is always the same ;
Cities aren't linearly connected (you can teleport between some cities in the same time);
Some countries (areas) have teleport routes which make the shortest path available through other countries.
The traveler (or trader) has a limit on his coin-purse, the weight of his goods, and the quantity tradeable in a certain trade-route. The trade route can span multiple cities.
Question Parameters:
There already exists a database in memory (python:sqlite) which holds trades based on their source city and their destination city, the shortest-path cities inbetween as an array and amount, and the limiting factor with its % return on total capital (or in the case that none of the factors are limiting, then just the method that gives the highest return on total capital).
I'm trying to find the optimal profit for a certain preset chunk of time (i.e. 30 minutes)
The act of crossing into a new city is actually simultaneous
It usually takes the same defined amount of time to travel across the city map (i.e. 2 minutes)
The act of initiating the first or any new trade takes the same time as crossing one city map (i.e. 2 minutes)
My starting point might not actually have a valid trade ( I would have to travel to the first/nearest/best one )
Pseudo-Solution So Far
Optimization
First, I realize that because I have a limit on the time it takes, and I know how long each hop takes (including -1 for intiating the trade), I can limit the graph to all trades whose hops are under or equal to max_hops=int(max_time/route_time) -1. I cut elements of the trade database that don't fall within this time limit, pruning cities that have shortest-path lengths greater than max_hops.
I make another entry into the trades database that includes the shortest-paths between my current city and the starting cities of all the existing trades that aren't my current city, and give them a return of 0%. I would limit these to where the number of city hops is less than max_hops, and I would also calculate whether the current city to the starting city plus that trades shortest-path-hops would excede max_hops and remove those that exceded this limit.
Then I take the remaining trades that aren't (current_city->starting_city) and add trade routes with return of 0% between all destination and starting cities wheres the hops doesn't excede max_hops
Then I make one last prune for all cities that aren't in the trades database as either a starting city, destination city, or part of the shortest path city arrays.
Graph Search
I am left with a (much) smaller graph of trades feasible within the time limit (i.e. 30 mins).
Because all the nodes that are connected are adjacent, the connections are by default all weighted 1. I divide the %return over the number of hops in the trade then take the inverse and add + 1 (this would mean a weight of 1.01 for a 100% return route). In the case where the return is 0%, I add ... 2?
It should then return the most profitable route...
The Question:
Mostly,
How do I add the ability to take multiple routes when I have left over money or space and keep route finding through path discrete to single trade routes? Due to the nature of the goods being sold at multiple prices and quantities within the city, there would be a lot of overlapping routes.
How do I penalize initiating a new trade route?
Is graph search even useful in this situation?
On A Side Note,
What kinds of prunes/optimizations to the graph should I (or should I not) make?
Is my weighting method correct? I have a feeling it will give me disproportional weights. Should I use the actual return instead of percentage return?
If I am coding in python are libraries such as python-graph suitable for my needs? Or would it save me a lot of overhead (as I understand, graph search algorithms can be computationally intensive) to write a specialized function?
Am I best off using A* search ?
Should I be precalculating shortest-path points in the trade database and maxing optimizations or should I leave it all to the graph-search?
Can you notice anything that I could improve?
If this is a game where you are playing against humans I would assume the total size of the data space is actually quite limited. If so I would be inclined to throw out all the fancy pruning as I doubt it's worth it.
Instead, how about a simple breadth-first search?
Build a list of all cities, mark them unvisited
Take your starting city, mark the travel time as zero
for each city:
if not finished and travel time <> infinity then
attempt to visit all neighbors, only record the time if city is unvisited
mark the city finished
repeat until all cities have been visited
O(): the outer loop executes cities * maximum hops times. The inner loop executes once per city. No memory allocations are needed.
Now, for each city look at what you can buy here and sell there. When figuring the rate of return on a trade remember that growth is exponential, not linear. Twice the profit for a trade that takes twice as long is NOT a good deal! Look up how to calculate the internal rate of return.
If the current city has no trade don't bother with the full analysis, simply look over the neighbors and run the analysis on them instead, adding one to the time for each move.
If you have CPU cycles to spare (and you very well might, anything meant for a human to play will have a pretty small data space) you can run the analysis on every city adding in the time it takes to get to the city.
Edit: Based on your comment you have tons of CPU power available as the game isn't running on your CPU. I stand by my solution: Check everything. I strongly suspect it will take longer to obtain the route and trade info than it will be to calculate the optimal solution.
I think you've defined something that fits into a class of problems called inventory - routing problems. I assume since you have both goods and coin, the traveller is both buying and selling along the chosen route. Let's first assume that EVERYTHING is deterministic - all quantities of goods in demand, supply available, buying and selling prices, etc are known in advance. The stochastic version gets more difficult (obviously).
One objective would be to maximize profits with a constraint on the purse and the goods. If the traveller has to return home its looks like a tour, if not, it looks like a path. Since you haven't required the traveller to visit EVERY node, it is NOT a TSP. That's good - shortest path problems are generally much easier than TSPs to solve.
Because of the side constraints and the limited choice of next steps at each node - I'd consider using dynamic programming first attempt at a solution technique. It will help you enumerate what you buy and sell at each stage and there's a limited number of stages. Also, because you put a time constraint on the decision, that limits the state space of choices.
To those who suggested Djikstra's algorithm - you may be right - the labelling conventions would need to include the time, coin, and goods and corresponding profits. It may be that the assumptions of Djikstra's may not work for this with the added complexity of profit. Haven't thought through that yet.
Here's a link to a similar problem in capital budgeting.
Good luck !

writing optimization function

I'm trying to write a tennis reservation system and I got stucked with this problem.
Let's say you have players with their prefs regarding court number, day and hour.
Also every player is ranked so if there is day/hour slot and there are several players
with preferences for this slot the one with top priority should be chosen.
I'm thinking about using some optimization algorithms to solve this problem but I'am not sure what would be the best cost function and/or algorithm to use.
Any advice?
One more thing I would prefer to use Python but some language-agnostic advice would be welcome also.
Thanks!
edit:
some clarifications-
the one with better priority wins and loser is moved to nearest slot,
rather flexible time slots question
yes, maximizing the number of people getting their most highly preffered times
The basic Algorithm
I'd sort the players by their rank, as the high ranked ones always push away the low ranked ones. Then you start with the player with the highest rank, give him what he asked for (if he really is the highest, he will always win, thus you can as well give him whatever he requested). Then I would start with the second highest one. If he requested something already taken by the highest, try to find a slot nearby and assign this slot to him. Now comes the third highest one. If he requested something already taken by the highest one, move him to a slot nearby. If this slot is already taken by the second highest one, move him to a slot some further away. Continue with all other players.
Some tunings to consider:
If multiple players can have the same rank, you may need to implement some "fairness". All players with equal rank will have a random order to each other if you sort them e.g. using QuickSort. You can get some some fairness, if you don't do it player for player, but rank for rank. You start with highest rank and the first player of this rank. Process his first request. However, before you process his second request, process the first request of the next player having highest rank and then of the third player having highest rank. The algorithm is the same as above, but assuming you have 10 players and player 1-4 are highest rank and players 5-7 are low and players 8-10 are very low, and every player made 3 requests, you process them as
Player 1 - Request 1
Player 2 - Request 1
Player 3 - Request 1
Player 4 - Request 1
Player 1 - Request 2
Player 2 - Request 2
:
That way you have some fairness. You could also choose randomly within a ranking class each time, this could also provide some fairness.
You could implement fairness even across ranks. E.g. if you have 4 ranks, you could say
Rank 1 - 50%
Rank 2 - 25%
Rank 3 - 12,5%
Rank 4 - 6,25%
(Just example values, you may use a different key than always multiplying by 0.5, e.g. multiplying by 0.8, causing the numbers to decrease slower)
Now you can say, you start processing with Rank 1, however once 50% of all Rank 1 requests have been fulfilled, you move on to Rank 2 and make sure 25% of their requests are fulfilled and so on. This way even a Rank 4 user can win over a Rank 1 user, somewhat defeating the initial algorithm, however you offer some fairness. Even a Rank 4 player can sometimes gets his request, he won't "run dry". Otherwise a Rank 1 player scheduling every request on the same time as a Rank 4 player will make sure a Rank 4 player has no chance to ever get a single request. This way there is at least a small chance he may get one.
After you made sure everyone had their minimal percentage processed (and the higher the rank, the more this is), you go back to top, starting with Rank 1 again and process the rest of their requests, then the rest of the Rank 2 requests and so on.
Last but not least: You may want to define a maximum slot offset. If a slot is taken, the application should search for the nearest slot still free. However, what if this nearest slot is very far away? If I request a slot Monday at 4 PM and the application finds the next free one to be Wednesday on 9 AM, that's not really helpful for me, is it? I might have no time on Wednesday at all. So you may limit slot search to the same day and saying the slot might be at most 3 hours off. If no slot is found within that range, cancel the request. In that case you need to inform the player "We are sorry, but we could not find any nearby slot for you; please request a slot on another date/time and we will see if we can find a suitable slot there for you".
This is an NP-complete problem, I think, so it'll be impossible to have a very fast algorithm for any large data sets.
There's also the problem where you might have a schedule that is impossible to make. Given that that's not the case, something like this pseudocode is probably your best bet:
sort players by priority, highest to lowest
start with empty schedule
for player in players:
for timeslot in player.preferences():
if timeslot is free:
schedule.fillslot(timeslot, player)
break
else:
#if we get here, it means this player couldn't be accomodated at all.
#you'll have to go through the slots that were filled and move another (higher-priority) player's time slot
You are describing a matching problem. Possible references are the Stony Brook algorithm repository and Algorithm Design by Kleinberg and Tardos. If the number of players is equal to the number of courts you can reach a stable matching - The Stable Marriage Problem. Other formulations become harder.
There are several questions I'd ask before answering this queston:
what happens if there is a conflict, i.e. a worse player books first, then a better player books the same court? Who wins? what happens for the loser?
do you let the best players play as long as the match runs, or do you have fixed time slots?
how often is the scheduling run - is it run interactively - so potentially someone could be told they can play, only to be told they can't; or is it run in a more batch manner - you put in requests, then get told later if you can have your slot. Or do users set up a number of preferred times, and then the system has to maximise the number of people getting their most highly preferred times?
As an aside, you can make it slightly less complex by re-writing the times as integer indexes (so you're dealing with integers rather than times).
I would advise using a scoring algorithm. Basically construct a formula that pulls all the values you described into a single number. Who ever has the highest final score wins that slot. For example a simple formula might be:
FinalScore = ( PlayerRanking * N1 ) + ( PlayerPreference * N2 )
Where N1, N2 are weights to control the formula.
This will allow you to get good (not perfect) results very quickly. We use this approach on a much more complex system with very good results.
You can add more variety to this by adding in factors for how many times the player has won or lost slots, or (as someone suggested) how much the player paid.
Also, you can use multiple passes to assign slots in the day. Use one strategy where it goes chronologically, one reverse chronologically, one that does the morning first, one that does the afternoon first, etc. Then sum the scores of the players that got the spots, and then you can decide strategy provided the best results.
Basically, you have the advantage that players have priorities; therefore, you sort the players by descending priority, and then you start allocating slots to them. The first gets their preferred slot, then the next takes his preferred among the free ones and so on. It's a O(N) algorithm.
I think you should use genetic algorithm because:
It is best suited for large problem instances.
It yields reduced time complexity on the price of inaccurate answer(Not the ultimate best)
You can specify constraints & preferences easily by adjusting fitness punishments for not met ones.
You can specify time limit for program execution.
The quality of solution depends on how much time you intend to spend solving the program..
Genetic Algorithms Definition
Genetic Algorithms Tutorial
Class scheduling project with GA
Also take a look at :a similar question and another one
Money. Allocate time slots based on who pays the most. In case of a draw don't let any of them have the slot.

Categories