I am just learning Python and I'm simply trying to print the results of a function over a range of numbers, but I am getting the error "The truth value of an array with more than one element is ambiguous."
print(t1) works and shows the range I want to use in the calculations.
print(some_function(55,t1)) produces the error
What am I missing?
Please note, I am doing this to help someone with an assignment, and they can only use commands and functions they have been shown, which is not a lot: basically just what's in the current code, plus arrays.
Thanks for any help
from pylab import *

def some_function(ff, dd):
    if dd >= 0 and dd <= 300:
        tt = (22/-90)*ff + 24
    elif dd >= 300 and dd <= 1000:
        st = (22/-90)*(ff) + 24
        gg = (st - 2)/-800
        tt = gg*dd + (gg*-1000 + 2)
    else:
        tt = 2.0
    return tt

t1 = arange(0, 12000, 1000)
print(t1)
print(some_function(55, t1))
You are only making a minor error.
t1=arange(0,12000,1000)
print(t1)
[ 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000]
You have to loop through t1 and call the function for each value in the loop.
for x in t1:
    print(some_function(55, x))
10.555555555555555
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
We are missing part of the function's behaviour because the values in t1 jump by 1000 at a time. Let's adjust the range a bit.
t1=arange(0,2000,100)
print(t1)
[ 0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300
1400 1500 1600 1700 1800 1900]
And the resultant function:
for x in t1:
    print(some_function(55, x))
10.555555555555555
10.555555555555555
10.555555555555555
10.555555555555555
8.416666666666668
7.347222222222222
6.277777777777779
5.208333333333334
4.138888888888889
3.0694444444444446
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
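As a side note, the loop can also be avoided entirely with numpy. This is only a rough sketch using np.select, which may not be among the functions allowed for the assignment:
import numpy as np

def some_function_vec(ff, dd):
    # evaluate each branch for the whole array, then pick per element
    dd = np.asarray(dd, dtype=float)
    st = (22 / -90) * ff + 24
    gg = (st - 2) / -800
    return np.select(
        [(dd >= 0) & (dd <= 300), (dd > 300) & (dd <= 1000)],
        [np.full_like(dd, st), gg * dd + (gg * -1000 + 2)],
        default=2.0,
    )

print(some_function_vec(55, np.arange(0, 2000, 100)))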
Consider a table that is created using the following code:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Reference Value' : [4.8, 2.4, 3.6, 0.6, 4.8, 5.4], 'True Result' : [8, 4, 6, 1, 8, 9]})
x = 1.5
df["Predicted Result"] = df['Reference Value'] * x
df["Error Squared"] = np.square(df["True Result"] - df["Predicted Result"])
Which, if printed, looks as follows:
Reference Value True Result Predicted Result Error Squared
0 4.8 8 7.2 0.64
1 2.4 4 3.6 0.16
2 3.6 6 5.4 0.36
3 0.6 1 0.9 0.01
4 4.8 8 7.2 0.64
5 5.4 9 8.1 0.81
The total squared error is:
print("Total Error Squared: " + str(np.sum(df["Error Squared"])))
>> Total Error Squared: 2.6199999999999997
I am trying to change x such that the total error squared in the table is minimized. Ideally, after minimization, the table should look something like this:
Reference Value True Result Predicted Result Error Squared
0 4.8 8 8.0 0.0
1 2.4 4 4.0 0.0
2 3.6 6 6.0 0.0
3 0.6 1 1.0 0.0
4 4.8 8 8.0 0.0
5 5.4 9 9.0 0.0
with x being set to 1.6666
How can I achieve this through scipy or similar? Thanks
You can use scipy.optimize.minimize:
from scipy.optimize import minimize
ref_vals = df["Reference Value"].values
true_vals = df["True Result"].values
def obj(x):
    return np.sum((true_vals - ref_vals * x)**2)
res = minimize(obj, x0=[1.0])
where res.x contains the solution 1.66666666.
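For this particular objective (a single coefficient and a squared error), there is also a closed-form least-squares solution you can use to sanity-check the optimizer: minimizing sum((true - ref*x)**2) over x gives x = sum(ref*true) / sum(ref**2). A quick check, not part of the original answer:
# closed-form least-squares coefficient, should match res.x
x_closed = np.sum(ref_vals * true_vals) / np.sum(ref_vals ** 2)
print(x_closed)  # 1.666..., matching res.x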
I am using pulp to create an allocator function that packs items into trucks based on weight and volume. It works fine (takes 10-15 seconds) for 10-15 items, but when I double the number of items it takes more than half an hour to solve.
import pulp
from pulp import LpProblem, LpMinimize, LpInteger, lpSum

def allocator(item_mass, item_vol, truck_mass, truck_vol, truck_cost, id_series):
    n_items = len(item_vol)
    set_items = range(n_items)
    n_trucks = len(truck_cost)
    set_trucks = range(n_trucks)
    print("working1")

    y = pulp.LpVariable.dicts('truckUsed', set_trucks,
                              lowBound=0, upBound=1, cat=LpInteger)
    x = pulp.LpVariable.dicts('itemInTruck', (set_items, set_trucks),
                              lowBound=0, upBound=1, cat=LpInteger)
    print("working2")

    # Model formulation
    prob = LpProblem("Truck allocation problem", LpMinimize)

    # Objective
    prob += lpSum([truck_cost[i] * y[i] for i in set_trucks])
    print("working3")

    # Constraints
    for j in set_items:
        # Every item must be taken in one truck
        prob += lpSum([x[j][i] for i in set_trucks]) == 1

    for i in set_trucks:
        # Respect the mass constraint of trucks
        prob += lpSum([item_mass[j] * x[j][i] for j in set_items]) <= truck_mass[i] * y[i]
        # Respect the volume constraint of trucks
        prob += lpSum([item_vol[j] * x[j][i] for j in set_items]) <= truck_vol[i] * y[i]
    print("working4")

    # Ensure y variables have to be set to make use of x variables:
    for j in set_items:
        for i in set_trucks:
            x[j][i] <= y[i]
    print("working5")

    s = id_series  # id_series

    prob.solve()
    print("working6")
This is the data I am running it on.
items:
Name Pid Quantity Length Width Height Volume Weight t_type
0 A 1 1 4.60 4.30 4.3 85.05 1500 Open
1 B 2 1 4.60 4.30 4.3 85.05 1500 Open
2 C 3 1 6.00 5.60 9.0 302.40 10000 Container
3 D 4 1 8.75 5.60 6.6 441.00 1000 Open
4 E 5 1 6.00 5.16 6.6 204.33 3800 Open
5 C 6 1 6.00 5.60 9.0 302.40 10000 All
6 C 7 1 6.00 5.60 9.0 302.40 10000 Container
7 D 8 1 8.75 5.60 6.6 441.00 6000 Open
8 E 9 1 6.00 5.16 6.6 204.33 3800 Open
9 C 10 1 6.00 5.60 9.0 302.40 10000 All
.... times 5
trucks (this is just the top 5 rows; I have 54 trucks in total):
  Category       Name  TruckID  Length(ft)  Breadth(ft)  Height(ft)   Volume  Weight  Price
0      LCV  Tempo 407        0         9.5          5.5         5.5  287.375    1500      1
1      LCV  Tempo 407        1         9.5          5.5         5.5  287.375    2000      1
2      LCV  Tempo 407        2         9.5          5.5         5.5  287.375    2500      2
3      LCV    13 Feet        3        13.0          5.5         7.0  500.500    3500      3
4      LCV    14 Feet        4        14.0          6.0         6.0  504.000    4000      3
where ItemId is this:
data["ItemId"] = data.index + 1
id_series = data["ItemId"].tolist()
PuLP can handle multiple solvers. See which ones you have with:
pulp.pulpTestAll()
This will give a list like:
Solver pulp.solvers.PULP_CBC_CMD unavailable.
Solver pulp.solvers.CPLEX_DLL unavailable.
Solver pulp.solvers.CPLEX_CMD unavailable.
Solver pulp.solvers.CPLEX_PY unavailable.
Testing zero subtraction
Testing continuous LP solution
Testing maximize continuous LP solution
...
* Solver pulp.solvers.COIN_CMD passed.
Solver pulp.solvers.COINMP_DLL unavailable.
Testing zero subtraction
Testing continuous LP solution
Testing maximize continuous LP solution
...
* Solver pulp.solvers.GLPK_CMD passed.
Solver pulp.solvers.XPRESS unavailable.
Solver pulp.solvers.GUROBI unavailable.
Solver pulp.solvers.GUROBI_CMD unavailable.
Solver pulp.solvers.PYGLPK unavailable.
Solver pulp.solvers.YAPOSIB unavailable.
You can then solve using, e.g.:
lp_prob.solve(pulp.COIN_CMD())
Gurobi and CPLEX are commercial solvers that tend to work quite well. Perhaps you could access them? Gurobi has a good academic license.
Alternatively, you may wish to look into an approximate solution, depending on your quality constraints.
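If you stay with the default CBC solver, you can also cap the solve time or accept a small optimality gap, so that large instances return a good-enough solution quickly instead of proving optimality. A sketch; the keyword names below assume PuLP 2.x (older releases use maxSeconds and fracGap instead):
# stop after 60 seconds or once within 5% of the proven optimum
prob.solve(pulp.PULP_CBC_CMD(msg=1, timeLimit=60, gapRel=0.05))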
I recently learned the batch gradient descent algorithm and tried implementing it in Python. I used a data set which is not random. When I run the code below, the process converges after 3 iterations but with a big error. Can someone point me in the right direction?
Sample data set (the original data set has 600 rows):
6203.75 1 173.8 43.6 0.0 183.0
6329.75 1 115.0 60.1 0.0 236.2
5830.75 1 159.5 94.1 21.0 275.8
4061.75 1 82.5 45.0 11.0 75.7
3311 1 185.5 46.1 4.0 0.0
4349.75 1 169.5 40.3 5.0 73.5
5695.25 1 138.5 68.9 6.0 204.2
5633.5 1 50.0 117.3 4.0 263.9
The first column is the output, the second column is the constant term, and the rest are features.
Thank you
import time

data = open('Data_trial.txt', 'r')
lines = data.readlines()

dataSet = []
for line in lines:
    dataSet.append(line.split())

original_output = []
features = []
for i in range(0, len(dataSet)):
    features.append([])

predict = []
grad = []
weights = [0, 0, 0, 0, 0]
learning_factor = 0.01

for i in range(0, len(dataSet)):
    for j in range(0, len(dataSet[i])):
        if j == 0:
            original_output.append(float(dataSet[i][j]))
        else:
            features[i].append(float(dataSet[i][j]))

def prediction(predict, weights, original_output, features):
    for count in range(0, len(original_output)):
        predict.append(sum(weights[i] * features[count][i]
                           for i in range(0, len(features[count]))))
    print("predicted values", predict)

def gradient(predict, grad, original_output, features):
    for count in range(0, len(weights)):
        grad.append(sum((predict[i] - original_output[i]) * features[i][count]
                        for i in range(0, len(original_output))))
    print("Gradient values", grad)

def weights_update(grad, learning_factor, weights):
    for i in range(0, len(weights)):
        weights[i] -= learning_factor * grad[i]
    print("Updated weights", weights)

if __name__ == "__main__":
    while True:
        prediction(predict, weights, original_output, features)
        gradient(predict, grad, original_output, features)
        weights_update(grad, learning_factor, weights)
        time.sleep(1)
        predict = []
        grad = []
        print()
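One likely cause of the large error is the scale of the raw features: with hundreds of rows and feature values in the hundreds, the gradient sums are huge, so a learning rate of 0.01 makes the weights overshoot. A common remedy is to scale the features before the training loop; the snippet below is only a rough sketch of standard standardization (not part of the original post), applied to the features list built above:
# rough sketch: standardize every feature column except the constant term
# (column 0) so the gradient steps stay well-behaved
n_features = len(features[0])
for j in range(1, n_features):
    col = [row[j] for row in features]
    mean = sum(col) / len(col)
    std = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5
    for row in features:
        row[j] = (row[j] - mean) / std if std else 0.0
Dividing each gradient component by the number of rows, or lowering the learning rate, helps in the same way.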
I need to combine two dataframes that contain information about train track sections. While "Line" identifies a track section, the two attributes "A" and "B" are given for subsections of the line, defined by a start point and an end point on the line; these subsections do not match between the two dataframes:
df1
Line startpoint endpoint Attribute_A
100 2.506 2.809 B-70
100 2.809 2.924 B-91
100 2.924 4.065 B-84
100 4.065 4.21 B-70
100 4.21 4.224 B-91
...
df2
Line startpoint endpoint Attribute_B
100 2.5 2.6 140
100 2.6 2.7 158
100 2.7 2.8 131
100 2.8 2.9 124
100 2.9 3.0 178
...
What I would need is a merged dataframe that gives me the combination of Attributes A and B for the respective minimal subsections where they are shared:
df3
Line startpoint endpoint Attribute_A Attribute_B
100 2.5 2.506 nan 140
100 2.506 2.6 B-70 140
100 2.6 2.7 B-70 158
100 2.7 2.8 B-70 131
100 2.8 2.809 B-70 124
100 2.809 2.9 B-91 124
100 2.9 2.924 B-91 178
100 2.924 3.0 B-84 178
...
How can I best do this in Python? I'm somewhat new to it, and while I can manage basic calculations between rows and columns, I'm at my wit's end with this problem; the approach of merging and sorting the two dataframes and calculating the respective differences between start and end points didn't get me very far, and I can't find applicable information on the forums. I'm grateful for any hint!
Here is my solution, a bit long but it works:
First step is finding the intervals:
all_start_points = set(df1['startpoint'].values.tolist() + df2['startpoint'].values.tolist())
all_end_points = set(df1['endpoint'].values.tolist() + df2['endpoint'].values.tolist())
all_points = sorted(list(all_start_points.union(all_end_points)))
intervals = [(start, end) for start, end in zip(all_points[:-1], all_points[1:])]
Then we need to find the relevant interval in each dataframe (if present):
import numpy as np
import pandas as pd

def find_interval(df, interval):
    return df[(df['startpoint'] <= interval[0]) &
              (df['endpoint'] >= interval[1])]
attr_A = [find_interval(df1, intv)['Attribute_A'] for intv in intervals]
attr_A = [el.iloc[0] if len(el)>0 else np.nan for el in attr_A]
attr_B = [find_interval(df2, intv)['Attribute_B'] for intv in intervals]
attr_B = [el.iloc[0] if len(el)>0 else np.nan for el in attr_B]
Finally, we put everything together:
out = pd.DataFrame(intervals, columns = ['startpoint', 'endpoint'])
out = pd.concat([out, pd.Series(attr_A).to_frame('Attribute_A'), pd.Series(attr_B).to_frame('Attribute_B')], axis = 1)
out['Line'] = 100
And I get the expected result:
out
Out[111]:
startpoint endpoint Attribute_A Attribute_B Line
0 2.500 2.506 NaN 140.0 100
1 2.506 2.600 B-70 140.0 100
2 2.600 2.700 B-70 158.0 100
3 2.700 2.800 B-70 131.0 100
4 2.800 2.809 B-70 124.0 100
5 2.809 2.900 B-91 124.0 100
6 2.900 2.924 B-91 178.0 100
7 2.924 3.000 B-84 178.0 100
8 3.000 4.065 B-84 NaN 100
9 4.065 4.210 B-70 NaN 100
10 4.210 4.224 B-91 NaN 100
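The snippet above hard-codes Line 100. If both frames can contain several lines, one way to generalize (a rough sketch under that assumption, not part of the original answer) is to apply the same interval logic per line and concatenate the pieces:
def merge_one_line(d1, d2):
    # build the minimal subsections for one line, then look up both attributes
    points = sorted(set(d1['startpoint']).union(d1['endpoint'],
                                                d2['startpoint'], d2['endpoint']))
    rows = []
    for start, end in zip(points[:-1], points[1:]):
        a = d1[(d1['startpoint'] <= start) & (d1['endpoint'] >= end)]['Attribute_A']
        b = d2[(d2['startpoint'] <= start) & (d2['endpoint'] >= end)]['Attribute_B']
        rows.append({'startpoint': start, 'endpoint': end,
                     'Attribute_A': a.iloc[0] if len(a) else np.nan,
                     'Attribute_B': b.iloc[0] if len(b) else np.nan})
    return pd.DataFrame(rows)

out = pd.concat(
    [merge_one_line(g1, df2[df2['Line'] == line]).assign(Line=line)
     for line, g1 in df1.groupby('Line')],
    ignore_index=True)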
I have the following dataframe:
from pandas import *
from math import *
data=read_csv('agosto.csv')
Fecha DirViento MagViento
0 2011/07/01 00:00 N 6.6
1 2011/07/01 00:15 N 5.5
2 2011/07/01 00:30 N 6.6
3 2011/07/01 00:45 N 7.5
4 2011/07/01 01:00 --- 6.0
5 2011/07/01 01:15 --- 7.1
6 2011/07/01 01:30 S 4.7
7 2011/07/01 01:45 SE 3.1
.
.
.
The first thing I want to do is convert the wind direction values to numbers in order to obtain the u and v wind components. But when I perform the operations, the missing data (---) creates conflicts.
direccion = []
for i in data['DirViento']:
    if i == 'SSW':
        dir = 202.5
    if i == 'S':
        dir = 180.0
    if i == 'N':
        dir = 360.0
    if i == 'NNE':
        dir = 22.5
    if i == 'NE':
        dir = 45.0
    if i == 'ENE':
        dir = 67.5
    if i == 'E':
        dir = 90.0
    if i == 'ESE':
        dir = 112.5
    if i == 'SE':
        dir = 135.0
    if i == 'SSE':
        dir = 157.5
    if i == 'SW':
        dir = 225.0
    if i == 'WSW':
        dir = 247.5
    if i == 'W':
        dir = 270.0
    if i == 'WNW':
        dir = 292.5
    if i == 'NW':
        dir = 315.0
    if i == 'NNW':
        dir = 337.5
    direccion.append(dir)
data['DirViento'] = direccion
I get the following:
data['DirViento'].head()
0 67.5
1 67.5
2 67.5
3 67.5
4 67.5
Is it because the missing data is assigned the value from the previous rows? I get the components with the following code:
Vviento = []
Uviento = []
for i in range(0, len(data['MagViento'])):
    Uviento.append(data['MagViento'][i] * sin((data['DirViento'][i] + 180) * (pi / 180.0)))
    Vviento.append(data['MagViento'][i] * cos((data['DirViento'][i] + 180) * (pi / 180.0)))
data['PromeU'] = Uviento
data['PromeV'] = Vviento
Now I group the data to obtain statistics:
index=data.set_index(['Fecha','Hora'],inplace=True)
g = index.groupby(level=0)
but I get an error:
IndexError: index out of range for array
Am I doing something wrong? How to perform operations without taking into account missing data?
I see one flaw in your code. Your conditional statement should be more like:
if i == 'SSW':
    dir = 202.5
elif i == 'S':
    ...
else:
    dir = np.nan
Or you can reset the dir variable at the beginning of the loop. Otherwise, dir for a row with missing data will keep the value from the previous iteration.
But I think this code could be written in a more pythonic way, for example like this.
# test DataFrame
df = pd.DataFrame({'DirViento': ['N', 'N', 'N', 'N', '--', '--', 'S', 'SE']})
DirViento
0 N
1 N
2 N
3 N
4 --
5 --
6 S
7 SE
# create points of compass list
dir_lst = ['NNE','NE','ENE','E','ESE','SE','SSE','S','SSW','SW','WSW','W','WNW','NW','NNW','N']
# create dictionary from it
dir_dict = {x: (i + 1) *22.5 for i, x in enumerate(dir_lst)}
# add a new column
df['DirViento2'] = df['DirViento'].apply(lambda x: dir_dict.get(x, None))
DirViento DirViento2
0 N 360
1 N 360
2 N 360
3 N 360
4 -- NaN
5 -- NaN
6 S 180
7 SE 135
Update: good suggestion from @DanAllan in the comments; the code becomes even shorter and even more pythonic:
df['DirViento2'] = df['DirViento'].replace(dir_dict)
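To get back to the u and v components from the question: once DirViento holds degrees with NaN for the missing rows, the explicit loop is not needed either, since numpy propagates NaN automatically. A rough sketch, assuming MagViento is already numeric:
import numpy as np

# missing directions are NaN, so the corresponding components come out as NaN
rad = np.deg2rad(data['DirViento'] + 180)
data['PromeU'] = data['MagViento'] * np.sin(rad)
data['PromeV'] = data['MagViento'] * np.cos(rad)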