Loop for all possible combinations from 3 variables [duplicate] - python

This question already has answers here:
How to get the cartesian product of multiple lists
(17 answers)
Closed 3 years ago.
I have a bit of code that runs some stats for a moving 3D window of varying size. I have created a loop to do this in increments of 5 from 5 to 50, as below.
For example, first X = 5, Y = 5, Z = 5, then X = 10, Y = 10, Z = 10, etc.
This works fine, but what I'd like to do is run the loop with every possible combination of X, Y and Z in increments of 5.
For example
X Y Z
5 5 5
10 5 5
15 5 5
.. .. ..
50 5 5
5 10 5
5 15 5
5 20 5
.. .. ..
10 10 5
10 15 5
etc,
so all in all it would be 1000 possible combinations, I think.
Can I do this with something like itertools.permutations?
I'm pretty new to Python and coding, so help would be much appreciated.
# python code
sizeX = 0
sizeY = 0
sizeZ = 0
count = 0
for i in range(0, 10):
    count = count + 1
    sizeX = sizeX + 5
    sizeY = sizeY + 5
    sizeZ = sizeZ + 5
    # run the code

If you know you will have 3 variables for certain, you can use nested for loops with range:
for i in range(5, 55, 5):
    for j in range(5, 55, 5):
        for k in range(5, 55, 5):
            print(i, j, k)
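The duplicate target linked above is about the cartesian product, and the standard-library function for that is itertools.product (itertools.permutations would only reorder a single sequence). A minimal sketch, assuming all three sizes share the same 5-to-50 range; the names `sizes` and `combos` are illustrative only:

```python
from itertools import product

# Window sizes 5, 10, ..., 50 (assumed identical for X, Y and Z)
sizes = range(5, 55, 5)

# product() yields every (X, Y, Z) tuple; the last element varies fastest
combos = list(product(sizes, repeat=3))

for sizeX, sizeY, sizeZ in combos[:3]:
    print(sizeX, sizeY, sizeZ)  # the stats code would run here instead

print(len(combos))
```

This visits the same 10 x 10 x 10 = 1000 combinations as the nested loops, but the number of variables can be changed through `repeat` without adding another loop level.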

Related

How to fill a zeros matrix using for loop?

I have an array full of zeros created as A = np.zeros((m, m)) with m = 6.
I want to fill this array with specific numbers, each equal to the sum of its row and column number, such as A(x,y) = x+y.
How can I do this using a for loop and a while loop?
Method that avoids a loop with rather significant performance improvement on a large ndarray:
A = np.zeros((6,6))
m = A.shape[0]
n = A.shape[1]
x = np.transpose(np.array([*range(1,m+1)]*n).reshape(n,m))
y = np.array([*range(1,n+1)]*m).reshape(m,n)
A = x+y
print(A)
[[ 2  3  4  5  6  7]
 [ 3  4  5  6  7  8]
 [ 4  5  6  7  8  9]
 [ 5  6  7  8  9 10]
 [ 6  7  8  9 10 11]
 [ 7  8  9 10 11 12]]
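The same loop-free result can probably be reached more directly with NumPy broadcasting. This is only a sketch, assuming 1-based row and column numbers as in the output above; `rows` and `cols` are illustrative names:

```python
import numpy as np

m, n = 6, 6  # matrix shape, taken from the question

rows = np.arange(1, m + 1).reshape(-1, 1)  # column vector of 1-based row numbers
cols = np.arange(1, n + 1)                 # 1-based column numbers

# Broadcasting stretches both operands to shape (m, n) before adding
A = rows + cols
print(A)
```

Adding an (m, 1) array to an (n,) array lets NumPy build the full grid itself, so no repeated ranges or transposes are needed.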
A = np.zeros((6,6))
for i in range(0, A.shape[0]):
    for j in range(0, A.shape[1]):
        A[i][j] = i+j+2
If you want the row and column numbers to start from 1, you can use this code directly; if you want them to start from 0, remove the "+2" in line 4.
Explanation:
The outer loop traverses the rows and the inner loop traverses the columns. Each cell is accessed as A[i][j] and assigned i+j+2 (or just i+j), so the original array is filled with the new values.
Have you tried this?
for y in range(len(A)):
    for x in range(len(A[y])):
        A[y][x] = x + y

How can I evenly split up a pandas.DataFrame into n-groups? [duplicate]

This question already has answers here:
Split dataframe into relatively even chunks according to length
(2 answers)
Closed 1 year ago.
I need to perform n-fold (in my particular case, a 5-fold) cross validation on a dataset that I've stored in a pandas.DataFrame. My current way seems to rearrange the row labels;
spreadsheet1 = pd.ExcelFile("Testing dataset.xlsx")
dataset = spreadsheet1.parse('Sheet1')
data = 5 * [pd.DataFrame()]
i = 0
while(i < len(dataset)):
    j = 0
    while(j < 5 and i < len(dataset)):
        data[j] = (data[j].append(dataset.iloc[i])).reset_index(drop = True)
        i += 1
        j += 1
How can I split my DataFrame efficiently/intelligently without tampering with the order of the columns?
Use np.array_split to break it up into a list of "evenly" sized DataFrames. You can shuffle too if you sample the full DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(24).reshape(-1,2), columns=['A', 'B'])
N = 5
np.array_split(df, N)
#np.array_split(df.sample(frac=1), N) # Shuffle and split
[   A  B
 0  0  1
 1  2  3
 2  4  5,
     A   B
 3   6   7
 4   8   9
 5  10  11,
     A   B
 6  12  13
 7  14  15,
     A   B
 8  16  17
 9  18  19,
      A   B
 10  20  21
 11  22  23]
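The quotes around "evenly" matter when the row count is not divisible by N: in the output above, 12 rows are split into 5 parts of sizes 3, 3, 2, 2, 2. A rough pure-Python sketch of that sizing rule follows; the helper name `chunk_bounds` is hypothetical and not part of NumPy or pandas:

```python
def chunk_bounds(n_rows, n_chunks):
    """Return (start, stop) row positions for n_chunks nearly equal chunks."""
    base, extra = divmod(n_rows, n_chunks)
    bounds = []
    start = 0
    for k in range(n_chunks):
        # the first `extra` chunks each take one additional row
        stop = start + base + (1 if k < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

bounds = chunk_bounds(12, 5)
print(bounds)
```

Each (start, stop) pair could then be used as a positional slice such as df.iloc[start:stop], which keeps the original row labels and column order.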
I am still not sure why you want to do it in this way but here is a solution
df['fold'] = np.random.randint(1, 6, df.shape[0])
For example, your first fold is
df.loc[df['fold'] == 1]

Mixed integer program python

I have an optimization problem where I am trying to maximize column Z by picking one row for each unique value of column X, subject to the constraint that the Y values of the picked rows add up to at most 23 (in this example).
For example, I have this sample data:
X Y Z
1 9 25
1 7 20
1 5 5
2 9 20
2 7 10
2 5 5
3 9 10
3 7 5
3 5 5
The result should look like this:
X Y Z
1 9 25
2 9 20
3 5 5
This is a replica of "Set up linear programming optimization in R using LpSolve?", which has a solution, but I need the same in Python.
Those who want some help getting started with pulp in Python can refer to http://ojs.pythonpapers.org/index.php/tppm/article/view/111
The GitHub repo https://github.com/coin-or/pulp/tree/master/doc/KPyCon2009 could be handy as well.
Below is the Python code for the dummy problem asked:
import pandas as pd
import pulp
X=[1,1,1,2,2,2,3,3,3]
Y=[9,7,5,9,7,5,9,7,5]
Z=[25,20,5,20,10,5,10,5,5]
df = pd.DataFrame({'X':X,'Y':Y,'Z':Z})
allx = df['X'].unique()
possible_values = [(w,b) for w in allx for b in range(1,4)]
x = pulp.LpVariable.dicts('arr', (allx, range(1,4)),
                          lowBound = 0,
                          upBound = 1,
                          cat = pulp.LpInteger)
model = pulp.LpProblem("Optim", pulp.LpMaximize)
# Objective: total Z of the chosen rows
model += sum([x[w][b]*df[df['X']==w].reset_index()['Z'][b-1] for (w,b) in possible_values])
# Constraint: total Y of the chosen rows must not exceed 23
model += sum([x[w][b]*df[df['X']==w].reset_index()['Y'][b-1] for (w,b) in possible_values]) <= 23, \
    "Maximum_number_of_Y"
# Exactly one row per unique X (>= 1 and <= 1 together)
for value in allx:
    model += sum([x[w][b] for (w,b) in possible_values if w==value])>=1
for value in allx:
    model += sum([x[w][b] for (w,b) in possible_values if w==value])<=1
## View definition
model
model.solve()
print("The chosen rows are out of a total of %s:" % len(possible_values))
for v in model.variables():
    print(v.name, "=", v.varValue)
For solution in R
d=data.frame(x=c(1,1,1,2,2,2,3,3,3),y=c(9,7,5,9,7,5,9,7,5),z=c(25,20,5,20,10,5,10,5,3))
library(lpSolve)
all.x <- unique(d$x)
d[lp(direction = "max",
objective.in = d$z,
const.mat = rbind(outer(all.x, d$x, "=="), d$y),
const.dir = rep(c("==", "<="), c(length(all.x), 1)),
const.rhs = rep(c(1, 23), c(length(all.x), 1)),
all.bin = TRUE)$solution == 1,]
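With only three candidate rows per X, the expected result from the question can be cross-checked by brute force. This sketch is not a substitute for the solver-based answers; it assumes exactly one row per X must be picked, and all names in it (`rows`, `groups`, `best`, `Y_LIMIT`) are illustrative:

```python
from itertools import product

# (X, Y, Z) rows from the question's sample data
rows = [(1, 9, 25), (1, 7, 20), (1, 5, 5),
        (2, 9, 20), (2, 7, 10), (2, 5, 5),
        (3, 9, 10), (3, 7, 5), (3, 5, 5)]
Y_LIMIT = 23

# Group the rows by their X value
groups = {}
for row in rows:
    groups.setdefault(row[0], []).append(row)

# Try every way of picking one row per X and keep the feasible
# pick (total Y within the limit) with the largest total Z
best = max(
    (pick for pick in product(*groups.values())
     if sum(r[1] for r in pick) <= Y_LIMIT),
    key=lambda pick: sum(r[2] for r in pick),
)
print(best)
```

Enumeration grows exponentially with the number of X groups, so it only serves as a sanity check on small inputs like this one.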

I have a hard time trying to understand this [closed]

m = 0
x = 1
while x < 4:
    y = 1
    while y < 3:
        m = m + x + y
        y = y + 1
    x = x + 1
print(m)
The output is supposed to be 21, but I don't get it. What am I missing? A little help, please.
1. m = 0 and x = 1
2. Since x < 4 it goes inside the while loop where y is set to 1
3. Since y < 3 it goes inside the nested while
4. m becomes m + x + y = 0 + 1 + 1 = 2 and y becomes y + 1 = 1 + 1 = 2
5. Back to the loop condition: y < 3? Yes! Because y = 2. So it goes again inside the while
6. m becomes m + x + y = 2 + 1 + 2 = 5 and y becomes 3
7. Back again to the loop condition: y < 3? No! 3 is not less than 3, so the while is now skipped
8. x becomes x + 1 = 1 + 1 = 2
9. Back to first while condition: x < 4? Yes! Because x = 2. So it goes inside the loop again
10. Back to step 2.
11. When x finally becomes 4, the while loop will terminate and m will be printed.
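The walkthrough above can be checked by recording the state after every inner-loop step. This is the question's loop with one extra list added for tracing; `trace` is an illustrative name:

```python
m = 0
x = 1
trace = []  # (x, y, m) recorded after each inner-loop addition
while x < 4:
    y = 1
    while y < 3:
        m = m + x + y
        trace.append((x, y, m))
        y = y + 1
    x = x + 1

for step in trace:
    print("x=%d y=%d m=%d" % step)
print(m)
```

The inner loop runs twice for each of the three x values, so m grows by 5, 7 and 9 per outer iteration, giving 21.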
Let's have a "graphical" representation.
Consider:
x values starting with 1 and growing from left to right (we don't care what's after 3: while x < 4)
y values (!!! for each x !!!) starting with 1 and growing from top to bottom (we don't care what's after 2: while y < 3)
Values shown in parentheses are never computed by the program; they are placed here just for clarity's sake. What the program actually uses:
x values (x ∈ {1, 2, 3})
y values (y ∈ {1, 2})
The x row is displayed twice, since for each y, x is added to the sum
The sum:
Is under the separation line and starts from 0
Each number is the partial sum of the computed numbers in the column above it; they correspond to one x iteration and contain all y iterations for that x
At the end, we add those values, for all x iterations, and get the final value
x (→):    1    2    3   (4)  (5)  (6)  ...
y (↓):    1    1    1   (1)  ...
x (→):    1    2    3   (4)  (5)  (6)  ...
y (↓):    2    2    2   (2)  ...
y (↓):   (3)  (3)  (3)  (3)  ...
y (↓):   ...  ...  ...  ...  ...
----------------------------------------
sum: 0 +  5 +  7 +  9 = 21

Cumulative style calculations of entries in a data table using class initialisation in Python

I am trying to determine the optimum value of Z in a data table using Python. The optimum of Z occurs when the difference in Y values is greater than 10. In my code I am assigning the elements of each entry into a class. In order to determine the optimum I therefore need to access the previously calculated value of Y and subtract it from the new value. This all seems very cumbersome to me so if you know of a better way I can perform these type of calculations please let me know. My sample data table is:
X Y Z
1 5 10
2 3 20
3 4 30
4 6 40
5 12 50
6 12 60
7 34 70
8 5 80
My code so far is:
class values:
    def __init__(self, X, Y, Z):
        self.X = X
        self.Y = Y
        self.Z = Z
        #Diff = Y2 - Y1
        #if Diff > 10:
        #optimum = Z
        #else:
        #pass
        #optimum
valueLst = []
f = open('sample.txt','r')
for i in f:
    X = i.split('\t')[0]
    Y = i.split('\t')[1]
    Z = i.split('\t')[2]
    x = values(X,Y,Z)
    valueLst.append(x)
An example of the operation I would like to achieve is shown in the following table. The difference in Y values is calculated in the third column, I would like to return value of Z when the difference is 22 i.e. Z value of 70.
X  Y   Diff  Z
1  2         10
2  3   1     20
3  4   1     30
4  6   2     40
5  12  6     50
6  12  0     60
7  34  22    70
8  35  1     80
Any help would be much appreciated.
A class seems like overkill for this. Why not a list of (x, y, z) tuples?
valueLst = []
for i in f:
    # convert each field to int so the Y values can be subtracted
    valueLst.append(tuple(map(int, i.split('\t'))))
You can then determine the differences between the y values and get the last item z from the 3-tuple corresponding to the largest delta-y:
yDiffs = [0] + [valueLst[i][1] - valueLst[i-1][1]
                for i in range(1, len(valueLst))]
bestZVal = valueLst[yDiffs.index(max(yDiffs))][2]
To start, you can put the columns into list data structures:
f = open('sample.txt','r')
x, y, z = [], [], []
for i in f:
    ix, iy, iz = map(int, i.split('\t'))  # map converts each field from a string to an integer
    x.append(ix)
    y.append(iy)
    z.append(iz)
When you have data structures, you can use them together to get other data structures you want.
Then you can get each difference starting from the second y:
differences = [y[i] - y[i-1] for i in range(1, len(y))]
What you want is the z that goes with the max of the differences. Since differences[0] belongs to the second row, the index is shifted by one:
maxIndex = differences.index(max(differences)) + 1
answer = z[maxIndex]
Skipping the building of tuples and working directly on the y and z lists:
from itertools import islice
diffs = [curr - prev for curr, prev in zip(islice(y, 1, None), islice(y, len(y)-1))]
max_diff = max(diffs)
Z = z[diffs.index(max_diff)+1]
Given a file with this content:
1 5 10
2 3 20
3 4 30
4 6 40
5 12 50
6 12 60
7 34 70
8 5 80
You can read the file and convert to a list of tuples like so:
data=[]
with open('value_list.txt') as f:
    for line in f:
        x,y,z=map(int,line.split())
        data.append((x,y,z))
print(data)
Prints:
[(1, 5, 10), (2, 3, 20), (3, 4, 30), (4, 6, 40), (5, 12, 50), (6, 12, 60), (7, 34, 70), (8, 5, 80)]
Then you can use that data to find tuples that meet your criteria using a list comprehension. In this case y-previous y>10:
tgt=10
print([data[i][2] for i in range(1,len(data)) if data[i][1]-data[i-1][1]>tgt])
[70]
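The threshold filter above returns every Z whose jump in Y exceeds 10. If only the single largest jump is wanted, as in the question's example (difference 22, Z = 70), a variation on the same list could look like this sketch; the names `jumps`, `best_diff` and `best_z` are illustrative:

```python
# (X, Y, Z) tuples as printed above
data = [(1, 5, 10), (2, 3, 20), (3, 4, 30), (4, 6, 40),
        (5, 12, 50), (6, 12, 60), (7, 34, 70), (8, 5, 80)]

# Pair each row with its predecessor and keep
# (difference in Y, Z of the later row)
jumps = [(curr[1] - prev[1], curr[2]) for prev, curr in zip(data, data[1:])]

# Tuples compare on their first element first, so max() picks the largest difference
best_diff, best_z = max(jumps)
print(best_diff, best_z)
```

Zipping the list with itself shifted by one avoids index arithmetic, which is where off-by-one mistakes tend to appear in this kind of calculation.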
