Which methodology of programming technique could I use to solve the workflow optimization with constraints? - python

The problem is how to maximize the productivity of a production line subject to many constraints.
Below is a table of the time each worker needs for each procedure, which also shows which steps each worker can perform.
The constraints are as follows:
Each product must go through these 6 procedures sequentially (1 to 2 to 3 to 4 to 5 to 6), and each worker can only process certain steps. All the products start from Building A, and after completing all the steps they can be in either building for shipment. Each worker can only process 1 product at a time and is not allowed to run different procedures concurrently. It is assumed that the product is always available to start at Building X.
The transportation time within the same building is assumed to be negligible. However, cross-building transportation takes 25 minutes. The truck has a maximum capacity of 5 products and can only be at one building at any point in time.
| Worker | Procedure 1 time/min | Procedure 2 time/min | Procedure 3 time/min | Procedure 4 time/min | Procedure 5 time/min | Procedure 6 time/min |
| -------- | -------- |-------- |-------- |-------- |-------- |-------- |
| a | 5 | | 10 | | | |
| b | | 15 | | | | 10 |
| c | | 15 | | | 10 | |
| d | 5 | | | 15 | | |
| e | 5 | |5 | | 15 | |
| f | | | | 10 | | 10 |
The objective is to find the maximum throughput (the total number of products produced) within 168 hours. You will also need to be able to list every step that each product went through during the process.
I have tried to split the question into two parts:
First, the workers produce products normally (I have to list out every single step by hand, and I am still not sure this is the best way to optimise the result). Then, at some point in time, the last stage assumes that all the workers are in an equilibrium state for each procedure, with each procedure producing the same amount of product over the same period. (The idea is to assume that all the workers, as well as the truck, are working all the time to maximise productivity.) I have solved the second part using linear programming and got results, but this method does not give me the specific steps that achieve the optimum.
Now I am not sure which methodology I could use to solve this problem. Can someone give me any suggestions, please? I would really appreciate it.
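The equilibrium part described above can be written as a small linear program. The following is only a sketch under stated assumptions: it ignores the buildings and the truck entirely (the table does not say which building each worker is in), and the names `TIMES` and `steady_state_rate` are made up for illustration. Each variable is the fraction of time a worker spends on a procedure, and the rate is limited by the slowest procedure.

```python
import numpy as np
from scipy.optimize import linprog

# Minutes per product, copied from the table in the question.
TIMES = {
    "a": {1: 5, 3: 10},
    "b": {2: 15, 6: 10},
    "c": {2: 15, 5: 10},
    "d": {1: 5, 4: 15},
    "e": {1: 5, 3: 5, 5: 15},
    "f": {4: 10, 6: 10},
}

def steady_state_rate(times):
    """Return (products per minute, time share per (worker, procedure)).

    Assumption: no transport delays and no start-up phase are modelled.
    """
    pairs = [(w, p) for w, procs in times.items() for p in procs]
    n = len(pairs)  # one time-share variable per pair, plus the rate
    procedures = sorted({p for _, p in pairs})

    cost = np.zeros(n + 1)
    cost[-1] = -1.0  # linprog minimizes, so minimize -rate

    rows, rhs = [], []
    # rate <= combined capacity of every procedure
    for p in procedures:
        row = np.zeros(n + 1)
        row[-1] = 1.0
        for i, (w, q) in enumerate(pairs):
            if q == p:
                row[i] = -1.0 / times[w][q]
        rows.append(row)
        rhs.append(0.0)
    # each worker's time shares add up to at most 1
    for w in times:
        row = np.zeros(n + 1)
        for i, (v, _) in enumerate(pairs):
            if v == w:
                row[i] = 1.0
        rows.append(row)
        rhs.append(1.0)

    res = linprog(cost, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=(0, None))
    shares = {pair: res.x[i] for i, pair in enumerate(pairs)}
    return res.x[-1], shares

rate, shares = steady_state_rate(TIMES)
upper_bound = rate * 168 * 60  # products in 168 hours, ignoring the truck
print(rate, upper_bound)
```

The value is only an upper bound on the real throughput. Producing the actual step-by-step schedule, including the truck, would need a scheduling model or a discrete-event simulation on top of this.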

Related

how to process multiple time series with machine-learning/deep learning method(fault diagnosis)

This is an industrial fault diagnosis scenario: a binary classification problem on time series. When a fault occurs, the data from one machine looks like the table below, where the label changes from zero to one:
| time | feature |label|
| -------- | -------------- | -------------- |
| 1 | 26 |0|
| 2 |29 |1|
| 3 | 30 |1|
| 4 | 20 |0|
The issue is that faults don't happen frequently, so I need to select a sufficient number of time series slices for training.
So my question is how I should organize this data: should I treat it as one time series, or are there other options? How should I organize the data, and what machine learning method should I use for fault diagnosis?
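One common way to organize such data is to cut each series into fixed-width sliding windows and label each window by its last time step. This is only a sketch of that idea, not a recommendation of a specific model; the function name `make_windows` and the window width are assumptions chosen for illustration.

```python
def make_windows(features, labels, width):
    """Slice one series into overlapping windows of `width` steps.

    Each window is labelled with the label of its final time step
    (an assumption; other labelling rules are possible).
    """
    samples = []
    for end in range(width, len(features) + 1):
        window = features[end - width:end]
        samples.append((window, labels[end - 1]))
    return samples

# The four rows from the table above.
features = [26, 29, 30, 20]
labels = [0, 1, 1, 0]
samples = make_windows(features, labels, width=2)
print(samples)
```

Windows from several machines can then be pooled into one training set, which helps when faults are rare in any single series.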

Simple moving average for random related time values

I'm a beginner programmer looking for help with a Simple Moving Average (SMA). I'm working with two-column files, where the first column is the time and the second is the value. Both the time intervals and the values are random. The files are usually not big, but the process collects data for a long time. In the end the files look similar to this:
+-----------+-------+
| Time | Value |
+-----------+-------+
| 10 | 3 |
| 1345 | 50 |
| 1390 | 4 |
| 2902 | 10 |
| 34057 | 13 |
| (...) | |
| 898975456 | 10 |
+-----------+-------+
After the whole process the number of rows is around 60k-100k.
Then I try to "smooth" the data with some time window. For this purpose I'm using an SMA. [AWK_method]
awk -v size="$timewindow" '{mod=NR%size; if(NR<=size){count++}else{sum-=array[mod]};sum+=$1;array[mod]=$1;print sum/count}' file.dat
To make the SMA work properly with a predefined $timewindow, I create a linear time increment filled with zeros. Next, I run the script using different values of $timewindow and observe the results.
+-----------+-------+
| Time | Value |
+-----------+-------+
| 1 | 0 |
| 2 | 0 |
| 3 | 0 |
| (...) | |
| 10 | 3 |
| 11 | 0 |
| 12 | 0 |
| (...) | |
| 1343 | 0 |
| (...) | |
| 898975456 | 10 |
+-----------+-------+
For small data this was relatively comfortable, but now it is quite time-consuming, and the created files are starting to be too big. I'm also familiar with Gnuplot, but SMA there is hell...
So here are my questions:
Is it possible to change the awk solution to avoid filling the data with zeros?
Do you recommend any other solution using bash?
I have also considered learning Python because, after 6 months of learning bash, I have got to know its limitations. Will I be able to solve this in Python without creating big data?
I'll be glad for any form of help or advice.
Best regards!
[AWK_method] http://www.commandlinefu.com/commands/view/2319/awk-perform-a-rolling-average-on-a-column-of-data
You included a python tag, check out traces:
http://traces.readthedocs.io/en/latest/
Here are some other insights:
Moving average for time series with unequal intervals
http://www.eckner.com/research.html
https://stats.stackexchange.com/questions/28528/moving-average-of-irregular-time-series-data-using-r
https://en.wikipedia.org/wiki/Unevenly_spaced_time_series
key phrase in bold for more research:
In statistics, signal processing, and econometrics, an unevenly (or unequally or irregularly) spaced time series is a sequence of observation time and value pairs (tn, Xn) with strictly increasing observation times. As opposed to equally spaced time series, the spacing of observation times is not constant.
awk '{Q=$2-last;if(Q>0){while(Q>1){print "| "++i" | 0 |";Q--};print;last=$2;next};last=$2;print}' Input_file
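For the Python side of the question, a time-based window can be kept in memory without ever writing the zero rows. The sketch below assumes the samples are sorted by time and that a missing time step counts as a zero, so the sum inside the window is always divided by the full window length (the awk one-liner divides by a smaller count for the very first rows, so early values can differ). The name `time_window_sma` is made up for illustration.

```python
from collections import deque

def time_window_sma(samples, window):
    """Moving average over the last `window` time units.

    samples: iterable of (time, value) pairs sorted by time.
    Returns a list of (time, average), where the average is the sum of
    values with t - window < time <= t, divided by `window`.
    """
    recent = deque()  # samples still inside the window
    total = 0.0
    result = []
    for t, v in samples:
        recent.append((t, v))
        total += v
        # Drop samples that have fallen out of the window.
        while recent[0][0] <= t - window:
            _, old_v = recent.popleft()
            total -= old_v
        result.append((t, total / window))
    return result

data = [(10, 3), (1345, 50), (1390, 4), (2902, 10)]
smoothed = time_window_sma(data, window=100)
print(smoothed)
```

Memory use depends only on how many real samples fall inside one window, not on the length of the time axis.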

SciPy Optimization algorithm

I need to solve an optimization task with Python.
The task is following:
A factory produces desks, chairs, bureaus and cupboards. Two types of boards can be used to produce them. The factory has 1500 m of the first type and 1000 m of the second. The factory has 800 employees. What should the factory produce, and how much of each, to obtain the maximum profit?
The input values are following:
| | Products |
| | Desk | Chair | Bureau | Cupboard |
|--------------|------|-------|--------|----------|
| Board 1 type | 5 | 1 | 9 | 12 |
| Board 2 type | 2 | 3 | 4 | 1 |
| Employees | 3 | 2 | 5 | 10 |
| Profit | 12 | 5 | 15 | 10 |
Unfortunately I don't have any experience in solving optimization tasks, so I don't even know where to start. What I did:
I found the SciPy optimization package, which is supposed to solve this type of problem.
I have some idea of the input and output for my function. The input should be the amount of each type of product, and the output should be the profit. But the choice of resources (boards, employees) might also vary, and this affects the algorithm implementation.
Could you please give me at least any direction where to go? Thank you!
EDIT:
Basically @Balzola is right. It's a job for the simplex algorithm. The task can be solved with scipy.optimize.linprog, which solves linear programs like this one (older SciPy versions use simplex under the hood).
Typical https://en.wikipedia.org/wiki/Simplex_algorithm
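Looks like SciPy can do it (note that the Nelder-Mead "simplex" method linked here is a different algorithm, meant for nonlinear minimization; for a linear program like this one, scipy.optimize.linprog is the relevant function):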
https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html#nelder-mead-simplex-algorithm-method-nelder-mead
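As a starting point, the factory problem can be passed to scipy.optimize.linprog almost directly from the table. This is a minimal sketch, assuming fractional quantities are acceptable and all three resource rows are upper limits; since linprog minimizes, the profit coefficients are negated.

```python
from scipy.optimize import linprog

products = ["desk", "chair", "bureau", "cupboard"]
profit = [12, 5, 15, 10]

# Resource use per unit of each product (rows of the table).
A_ub = [
    [5, 1, 9, 12],   # board type 1
    [2, 3, 4, 1],    # board type 2
    [3, 2, 5, 10],   # employees
]
b_ub = [1500, 1000, 800]  # available amount of each resource

# Maximize profit == minimize the negated profit.
res = linprog([-p for p in profit], A_ub=A_ub, b_ub=b_ub, bounds=(0, None))

plan = dict(zip(products, res.x))
best_profit = -res.fun
print(plan, best_profit)
```

If only whole items can be produced, the result has to be treated as a relaxation, and an integer programming solver would be the next step.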

no performance difference between different number of map tasks (1, 2, 4..)

I am very new to hadoop and am testing the performance difference between different number of map tasks and reduce tasks. The file size is about 5GB and hadoop is installed on 4 core/8 core machine (hyper threading).
The map and reduce were written in python, so I specify the number of map tasks by -D mapred.map.tasks=2 and specify the number of reduce tasks by -D mapred.reduce.tasks=2.
Problem
The problem is that the results don't show any performance difference between different numbers of map tasks.
Result
+----------+----------+----------+
| map | reduce | time |
+----------+----------+----------+
| 1 | 1 | 47m 09s |
| 2 | 1 | 45m 35s |
| 4 | 1 | 46m 30s |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| 1 | 2 | 38m 37s |
| 2 | 2 | 39m 22s |
| 4 | 2 | 39m 29s |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| 1 | 4 | 38m 18s |
| 2 | 4 | 38m 48s |
| 4 | 4 | 38m 23s |
+----------+----------+----------+
It seems that there is a difference of a few minutes between using 1 reduce task and using 2 reduce tasks, but no difference when I change the number of map tasks. Is it that all the tasks are performed on only one node, and the map tasks are not running in parallel?
What could be causing this? I would appreciate any information.
Edit
I also tried specifying these values in mapred-site.xml instead of on the command line, but it didn't make any difference.
The mapred.map.tasks option is not a directive but a hint to Hadoop, so how did you check the actual number of map tasks executed? While the job is running, you can monitor running jobs in the JobTracker and running tasks in the TaskTracker. You can also ssh into your Hadoop machine and check for running map/reduce tasks, which will be Java processes.
You can try setting mapreduce.tasktracker.map.tasks.maximum in your mapred-site.xml to bound the number of mappers per node and see the benefits of parallel execution.
For more performance monitoring options, you might install Ganglia; also see this blog entry: Monitoring Hadoop beyond Ganglia
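To see why the hint may have no effect, it helps to estimate how many map tasks the input itself produces. The sketch below assumes the usual behavior of roughly one input split per HDFS block, and assumes block sizes of 64 MB and 128 MB, neither of which is stated in the question; `estimated_map_tasks` is a made-up helper name.

```python
import math

GIB = 1024 ** 3
MIB = 1024 ** 2

def estimated_map_tasks(file_size_bytes, split_size_bytes):
    """Rough number of input splits, hence map tasks, for one file."""
    return math.ceil(file_size_bytes / split_size_bytes)

# The question mentions a file of about 5 GB.
tasks_64 = estimated_map_tasks(5 * GIB, 64 * MIB)
tasks_128 = estimated_map_tasks(5 * GIB, 128 * MIB)
print(tasks_64, tasks_128)
```

Under these assumptions the job already runs dozens of map tasks regardless of whether 1, 2 or 4 were requested, which would be consistent with the timings not changing.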

Improving MySQL read time, MySQLdb

I have a table with more than a million records with the following structure:
mysql> SELECT * FROM Measurement;
+----------------+---------+-----------------+------+------+
| Time_stamp | Channel | SSID | CQI | SNR |
+----------------+---------+-----------------+------+------+
| 03_14_14_30_14 | 7 | open | 40 | -70 |
| 03_14_14_30_14 | 7 | roam | 31 | -79 |
| 03_14_14_30_14 | 8 | open2 | 28 | -82 |
| 03_14_14_30_15 | 8 | roam2 | 29 | -81 |....
I am reading data from this table into Python for plotting. The problem is that the MySQL reads are too slow, and it takes me hours to get the plots, even after using MySQLdb.cursors.SSCursor (as suggested by a few people in this forum) to speed up the task.
import MySQLdb as mdb
import MySQLdb.cursors

con = mdb.connect('localhost', 'testuser', 'conti', 'My_Freqs', cursorclass=MySQLdb.cursors.SSCursor)
cursor = con.cursor()
cursor.execute("SELECT Time_stamp FROM Measurement")
for row in cursor:
    ...  # Do processing
Will normalizing the table help me speed up the task? If so, how should I normalize it?
P.S: Here is the result for EXPLAIN
+------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| Time_stamp | varchar(128) | YES | | NULL | |
| Channel | int(11) | YES | | NULL | |
| SSID | varchar(128) | YES | | NULL | |
| CQI | int(11) | YES | | NULL | |
| SNR | float | YES | | NULL | |
+------------+--------------+------+-----+---------+-------+
The problem is probably that you are looping over the cursor instead of just dumping out all the data at once and then processing it. You should be able to dump out a couple million rows in a couple/few seconds. Try to do something like
cursor.execute("SELECT Time_stamp FROM Measurement")
data = cursor.fetchall()
for row in data:
    pass  # do some stuff...
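A middle ground between iterating row by row and loading everything with fetchall() is to pull rows in batches with the DB-API fetchmany() call. The sketch below uses the standard library sqlite3 module with an in-memory table as a stand-in so that it runs anywhere; the helper name `iter_rows`, the batch size and the sample rows are assumptions for illustration, and with MySQLdb only the connection line would differ.

```python
import sqlite3

def iter_rows(cursor, batch_size=10000):
    """Yield rows from an executed cursor, fetching them in batches."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield from rows

# Stand-in database so the example is self-contained.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Measurement (Time_stamp TEXT, Channel INTEGER)")
con.executemany(
    "INSERT INTO Measurement VALUES (?, ?)",
    [("03_14_14_30_14", 7), ("03_14_14_30_14", 8), ("03_14_14_30_15", 8)],
)

cursor = con.cursor()
cursor.execute("SELECT Time_stamp FROM Measurement")
stamps = [row[0] for row in iter_rows(cursor, batch_size=2)]
print(stamps)
```

Timing the fetch separately from the per-row processing also shows which of the two is the real bottleneck.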
Well, since you're saying the whole table has to be read, I guess you can't do much about it. It has more than 1 million records... you're not going to optimize much on the database side.
How much time does it take you to process just one record? Maybe you could try optimizing that part. But even if you got down to 1 millisecond per record, it would still take about 17 minutes per million records (1,000,000 × 1 ms = 1,000 s) to process the table. You're dealing with a lot of data.
Maybe run multiple plotting jobs in parallel? With the same metrics as above, dividing your data into 6 equal-sized jobs would (theoretically) cut that to about 3 minutes per million records.
Do your plots have to be fine-grained? You could look for ways to ignore certain values in the data, and generate a complete plot only when the user needs it (wild speculation here, I really have no idea what your plots look like)
