SciPy Optimization algorithm - python

I need to solve an optimization task with Python.
The task is as follows:
A factory produces desks, chairs, bureaus and cupboards. Two types of boards can be used to make them. The factory has 1500 m of the first type and 1000 m of the second, and it has 800 employees. What should the factory produce, and in what quantities, to maximize profit?
The input values are as follows:
| Resource / Product | Desk | Chair | Bureau | Cupboard |
|--------------|------|-------|--------|----------|
| Board 1 type | 5 | 1 | 9 | 12 |
| Board 2 type | 2 | 3 | 4 | 1 |
| Employees | 3 | 2 | 5 | 10 |
| Profit | 12 | 5 | 15 | 10 |
Unfortunately, I have no experience with optimization tasks, so I don't even know where to start. What I have done so far:
I found the SciPy optimization package, which is supposed to solve this type of problem.
I have a rough idea of the input and output of my function: the input should be the amount of each type of product, and the output should be the profit. But the allocation of resources (boards, employees) can also vary, and this affects the implementation.
Could you please point me in the right direction? Thank you!
EDIT:
Basically @Balzola is right: this is a linear programming problem of the kind the simplex algorithm solves. It can be solved with scipy.optimize.linprog, which can use a simplex method under the hood.

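This is a typical case for the simplex algorithm: https://en.wikipedia.org/wiki/Simplex_algorithm
It looks like SciPy can do it:
https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html#nelder-mead-simplex-algorithm-method-nelder-mead
(Note that the Nelder-Mead "simplex" method described at that link is a general-purpose nonlinear optimizer, not the linear programming simplex algorithm. For this problem the relevant routine is scipy.optimize.linprog.)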
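As a rough sketch, the problem in the question can be written as a linear program and passed to scipy.optimize.linprog. The numbers come from the table in the question; the variable names are only illustrative, and it is an assumption here that fractional quantities are acceptable (the result is a continuous relaxation, not a whole-unit plan).

```python
from scipy.optimize import linprog

# Profit per unit, in the order: desk, chair, bureau, cupboard.
profit = [12, 5, 15, 10]

# Resource use per unit (rows: board type 1, board type 2, employees).
A_ub = [
    [5, 1, 9, 12],
    [2, 3, 4, 1],
    [3, 2, 5, 10],
]
# Available amount of each resource.
b_ub = [1500, 1000, 800]

# linprog minimizes, so negate the profit in order to maximize it.
c = [-p for p in profit]

# Quantities cannot be negative (assumption: fractional amounts are allowed).
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 4)

quantities = res.x
max_profit = -res.fun
print(res.status, quantities, max_profit)
```

With these inputs the employees are the binding resource: the optimum should come out at roughly 266.67 desks and nothing else, for a profit of 3200. If whole units are required, the problem becomes an integer program and needs a different treatment.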

Related

Which methodology or programming technique could I use to solve workflow optimization with constraints?

The problem is how to maximize the productivity of a production line under many constraints.
The table below shows each worker's processing time and which steps they can perform.
The constraints are as follows:
Each product must go through these 6 procedures sequentially (1 to 2 to 3 to 4 to 5 to 6), and each worker can only perform certain steps. All the products start from Building A, and after completing all the steps they can be in either building for shipment. Each worker can only process 1 product at a time and is not allowed to run different procedures concurrently. It is assumed that the product is always available to start at Building X.
The transportation time within the same building is assumed to be negligible. However, cross-building transportation takes 25 minutes. The truck, with a maximum capacity of 5, can only be at one building at any point in time.
| Worker | Procedure 1 time/min | Procedure 2 time/min | Procedure 3 time/min | Procedure 4 time/min | Procedure 5 time/min | Procedure 6 time/min |
| -------- | -------- |-------- |-------- |-------- |-------- |-------- |
| a | 5 | | 10 | | | |
| b | | 15 | | | | 10 |
| c | | 15 | | | 10 | |
| d | 5 | | | 15 | | |
| e | 5 | |5 | | 15 | |
| f | | | | 10 | | 10 |
The objective is to find the maximum throughput (the total number of products produced) within 168 hours. You will also need to be able to list every step that each product went through during the process.
I have tried to split the question into two parts:
First, the workers produce the products normally (I have to list every single step by hand, and I am still not sure whether this is the best way to optimise the result). Then, from some point in time, the last stage assumes that all the workers are in an equilibrium state for each procedure, with each procedure producing the same amount of product in the same time. (The idea is to assume that all the workers, as well as the truck, are working all the time to maximise productivity.) I tried to solve the second part with linear programming and got results, but this approach does not give me the specific steps that lead to the optimised result.
Now I am not sure which methodology I could use to solve this problem. Can someone give me any suggestions, please? I would really appreciate it.
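The equilibrium part described above could look roughly like the following linear program. This is only a sketch under strong assumptions: it ignores the truck and the buildings entirely (the table does not say which building each worker is in), so it gives an upper bound on the steady-state rate rather than a schedule. The variable names are made up for illustration.

```python
from scipy.optimize import linprog

# Minutes per product for each (worker, procedure) pair, taken from the table.
times = {
    "a": {1: 5, 3: 10},
    "b": {2: 15, 6: 10},
    "c": {2: 15, 5: 10},
    "d": {1: 5, 4: 15},
    "e": {1: 5, 3: 5, 5: 15},
    "f": {4: 10, 6: 10},
}
pairs = [(w, p) for w, procs in times.items() for p in procs]
n = len(pairs)  # one rate variable per pair; the throughput T sits at index n

# Maximize T, i.e. minimize -T.
c = [0.0] * n + [-1.0]

# Flow balance: every procedure must run at the common rate T.
A_eq = [
    [1.0 if p == proc else 0.0 for (_, p) in pairs] + [-1.0]
    for proc in range(1, 7)
]
b_eq = [0.0] * 6

# Capacity: each worker can do at most one minute of work per minute.
A_ub = [
    [float(times[w][p]) if w == worker else 0.0 for (w, p) in pairs] + [0.0]
    for worker in times
]
b_ub = [1.0] * len(times)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (n + 1))

rate_per_min = res.x[n]                   # products per minute in steady state
upper_bound_168h = rate_per_min * 168 * 60
print(rate_per_min, upper_bound_168h)
```

The rates in res.x only say how each worker's time is split between procedures on average. They do not list the steps of individual products, which is exactly the limitation described in the question.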

How to process multiple time series with a machine learning/deep learning method (fault diagnosis)

There is an industrial fault diagnosis scenario. It is a binary classification problem on time series. When a fault occurs, the data from one machine looks as shown below: the label changes from zero to one.
| time | feature |label|
| -------- | -------------- | -------------- |
| 1 | 26 |0|
| 2 |29 |1|
| 3 | 30 |1|
| 4 | 20 |0|
The problem is that faults don't happen frequently, so I need to select a sufficient number of time series slices for training.
So my question is how I should organize these data: should I treat them as one time series, or is there another option? How should I organize the data, and what machine learning method should I use for fault diagnosis?
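One common way to organize such data, shown here only as a sketch, is to cut each machine's series into overlapping fixed-length windows and label every window with the label at its last time step. The function name, the window length and the labeling rule are all assumptions for illustration; windows would also have to be built per machine so that no window mixes data from two machines.

```python
def make_windows(features, labels, window):
    """Return (X, y): overlapping windows and the label at each window's last step."""
    X, y = [], []
    for end in range(window, len(features) + 1):
        X.append(features[end - window:end])
        y.append(labels[end - 1])
    return X, y


# The four rows from the table in the question (one machine).
features = [26, 29, 30, 20]
labels = [0, 1, 1, 0]

X, y = make_windows(features, labels, window=2)
print(X)
print(y)
```

Each row of X can then be fed to an ordinary classifier. Because faults are rare, the resulting classes will be imbalanced, which has to be handled separately.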

PCA on complex-valued data

What's the correct way to undertake PCA on complex-valued data?
I see that this solution exists.
Are there any python packages that have implemented a complex-PCA?
So far I have just broken my data into real and imaginary parts and performed PCA as if they were real. For example:
| sw | fw | mw |
|------|------|------|
| 4+4i | 3+2i | 1-1i |
would become:
| swreal | swimag | fwreal | fwimag | mwreal | mwimag |
|--------|--------|--------|--------|--------|--------|
| 4 | 4 | 3 | 2 | 1 | -1 |
My PCA ends up looking like this:
I want to pursue a complex PCA, but I'm not even sure how I would end up representing it. If it were a 2D plot, is the only way similar to the above? If so, would it look any different?
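For reference, a complex PCA can be sketched directly with NumPy by diagonalizing the Hermitian covariance matrix of the centered data. The function name complex_pca and the random test data are made up for illustration; this is one possible formulation, not a packaged implementation.

```python
import numpy as np


def complex_pca(X, n_components):
    """PCA on complex data: rows are samples, columns are features."""
    X = np.asarray(X, dtype=complex)
    Xc = X - X.mean(axis=0)                      # center each feature
    cov = Xc.conj().T @ Xc / (X.shape[0] - 1)    # Hermitian covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # real eigenvalues, ascending
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]               # complex principal directions
    scores = Xc @ components                     # complex projections
    return eigvals[order], components, scores


# Made-up complex data: 50 samples, 3 features (sw, fw, mw in the question).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) + 1j * rng.normal(size=(50, 3))

eigvals, components, scores = complex_pca(X, n_components=2)
print(eigvals)
print(np.abs(scores[:3]), np.angle(scores[:3]))
```

The scores are themselves complex, so a 2D plot has to pick a representation, for example the real against the imaginary part of one component, or magnitude and phase as printed above. Unlike the split into real and imaginary columns, this formulation keeps the phase relationship between features inside a single component.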

Is there a python package that can find the most impactful group (categorical features) from my data?

My problem is that I have a dataset of our campaign like this:
| Customer | Province | District | City | Age | No. of Order |
| -------- | ------- | -------- | -----| ----| ------- |
| A | P1 | D1 | C1 | 21 | 5 |
| B | P2 | D2 | C2 | 22 | 9 |
....
And I need to find the most impactful group of customers (usually there will be >20 categorical groups). For example: "Customers from Province P1, District D1, Age 25 are the most promising group because they contributed 50% total order while being 10% of our customer base".
I'm currently using Pandas to loop through all the combinations of 2, 3 and 4 of my categorical features and calculate the sales proportion for each group, but it is very time-consuming.
Is there already a Python package that can help to find that kind of group?
You can automate that by using decision trees.
Not all features may be useful; eliminate the trivial ones using PCA (principal component analysis).
You can use the scikit-learn package for both of the above.
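For comparison, the brute-force search described in the question can be sketched in plain Python with one pass over the data per feature combination. The function name rank_groups, the "lift" score (share of orders divided by share of customers) and customers C and D are made up for illustration; only customers A and B come from the table.

```python
from collections import defaultdict
from itertools import combinations


def rank_groups(rows, features, value_key, sizes=(2, 3)):
    """Rank feature-value groups by (share of value) / (share of customers)."""
    total_value = sum(r[value_key] for r in rows)
    total_count = len(rows)
    results = []
    for k in sizes:
        for cols in combinations(features, k):
            value = defaultdict(float)
            count = defaultdict(int)
            for r in rows:                       # single pass per combination
                key = tuple(r[c] for c in cols)
                value[key] += r[value_key]
                count[key] += 1
            for key in value:
                value_share = value[key] / total_value
                customer_share = count[key] / total_count
                results.append((value_share / customer_share,
                                dict(zip(cols, key)),
                                value_share, customer_share))
    results.sort(key=lambda t: t[0], reverse=True)
    return results


rows = [
    {"Customer": "A", "Province": "P1", "District": "D1", "orders": 5},
    {"Customer": "B", "Province": "P2", "District": "D2", "orders": 9},
    {"Customer": "C", "Province": "P1", "District": "D1", "orders": 1},  # made up
    {"Customer": "D", "Province": "P2", "District": "D1", "orders": 1},  # made up
]

ranked = rank_groups(rows, ["Province", "District"], "orders", sizes=(2,))
print(ranked[0])
```

A group with a score well above 1 contributes more orders than its share of the customer base would suggest. In practice a minimum group size is usually needed as well, otherwise single-customer groups dominate the ranking.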

Simple moving average for random related time values

I'm a beginner programmer looking for help with a simple moving average (SMA). I'm working with two-column files, where the first column is the time and the second is the value. Both the time intervals and the values are random. The files are usually not big, but the process collects data for a long time. In the end the files look similar to this:
+-----------+-------+
| Time | Value |
+-----------+-------+
| 10 | 3 |
| 1345 | 50 |
| 1390 | 4 |
| 2902 | 10 |
| 34057 | 13 |
| (...) | |
| 898975456 | 10 |
+-----------+-------+
After the whole process the number of rows is around 60k-100k.
Then I try to "smooth" the data with some time window. For this purpose I'm using an SMA. [AWK_method]
awk -v size="$timewindow" '{mod=NR%size; if(NR<=size){count++}else{sum-=array[mod]}; sum+=$2; array[mod]=$2; print sum/count}' file.dat
To make the SMA work properly with the predefined $timewindow, I create a linearly incremented time axis and fill the gaps with zeros. Next, I run the script with different $timewindow values and observe the results.
+-----------+-------+
| Time | Value |
+-----------+-------+
| 1 | 0 |
| 2 | 0 |
| 3 | 0 |
| (...) | |
| 10 | 3 |
| 11 | 0 |
| 12 | 0 |
| (...) | |
| 1343 | 0 |
| (...) | |
| 898975456 | 10 |
+-----------+-------+
For small data this was relatively comfortable, but now it is quite time-consuming, and the created files are starting to be too big. I'm also familiar with Gnuplot, but an SMA there is hell...
So here are my questions:
Is it possible to change the awk solution to bypass filling the data with zeros?
Do you recommend any other solution using bash?
I have also considered learning Python because, after 6 months of learning bash, I have got to know its limitations. Will I be able to solve this in Python without creating big data?
I'll be glad for any form of help or advice.
Best regards!
[AWK_method] http://www.commandlinefu.com/commands/view/2319/awk-perform-a-rolling-average-on-a-column-of-data
You included a python tag, check out traces:
http://traces.readthedocs.io/en/latest/
Here are some other insights:
Moving average for time series with unequal intervals
http://www.eckner.com/research.html
https://stats.stackexchange.com/questions/28528/moving-average-of-irregular-time-series-data-using-r
https://en.wikipedia.org/wiki/Unevenly_spaced_time_series
key phrase in bold for more research:
In statistics, signal processing, and econometrics, an unevenly (or unequally or irregularly) spaced time series is a sequence of observation time and value pairs (tn, Xn) with strictly increasing observation times. As opposed to equally spaced time series, the spacing of observation times is not constant.
awk '{Q=$2-last;if(Q>0){while(Q>1){print "| "++i" | 0 |";Q--};print;last=$2;next};last=$2;print}' Input_file
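In Python the zero filling can be avoided altogether, as in the following sketch. It keeps only the samples inside the current time window and divides their sum by the window length, which mirrors what averaging over zero-filled rows does once the window is full. The function name and the window of 100 time units are assumptions for illustration, and reading the file is left out.

```python
from collections import deque


def time_window_sma(samples, window):
    """Yield (time, average), where average is the sum of the values
    with time in (t - window, t], divided by the window length."""
    recent = deque()
    running_sum = 0.0
    for t, value in samples:
        recent.append((t, value))
        running_sum += value
        # Drop samples that have fallen out of the window ending at t.
        while recent[0][0] <= t - window:
            running_sum -= recent.popleft()[1]
        yield t, running_sum / window


# First rows of the example file from the question.
samples = [(10, 3), (1345, 50), (1390, 4), (2902, 10), (34057, 13)]

result = list(time_window_sma(samples, window=100))
for t, avg in result:
    print(t, avg)
```

This produces one output row per input row, so memory use and output size stay proportional to the original 60k-100k rows rather than to the full time range.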
