Python Code for Standard Deviation with data from SQLITE3

from math import *
import sqlite3
ages=sqlite3.connect('person.sqlite3')
def main():
    ageslist=ages.execute("SELECT age from person")
    #average age
    for row in ageslist:
        row[0]
    average = (sum(row[0]))/len(row[0])
    #subtracts average x from x or opposite and square, depending on n
    for n in range(len(ageslist) - 1):
        if numbers[n] > average:
            numbers.append((ageslist[n] - average)**2)
        if numbers[n] < average:
            numbers.append((average - ageslist[n])**2)
    #takes square rt of the sum of all these numbers and divides by n-1
    Stdv = math.sqrt(sum(ageslist))/(len(ageslist)-1)
    end=time()
    print(Stdv)
main()
I am trying to find the standard deviation of the ages from an SQLite3 database. However, I am getting the following error:
average = (sum(row[0]))/len(row[0])
TypeError: 'int' object is not iterable
How can I correct this?

The query sent to the database connection returns a cursor, which is an iterator. You can only pass over that iterator once; after that it is exhausted. Here is a corrected version of your code that does what you are asking. Note that the sample standard deviation is the square root of the sum of squares divided by n-1, so the division belongs inside the square root.
import math
import sqlite3

conn = sqlite3.connect('person.sqlite3')

def main():
    ages_iterator = conn.execute("SELECT age from person")
    # this turns the iterator into an actual list, which you need for stdev
    age_list = [a[0] for a in ages_iterator]
    # average age
    average = sum(age_list) / len(age_list)
    # squared difference between each age and the average;
    # because you are squaring the difference, it does not matter if it is
    # greater or less than the average
    numbers = [(age - average) ** 2 for age in age_list]
    # divide the sum of these numbers by n-1, then take the square root
    stdv = math.sqrt(sum(numbers) / (len(numbers) - 1))
    print(stdv)

main()

Some quick comments on the code:
for row in ageslist:
    row[0] # This statement does nothing
average = (sum(row[0]))/len(row[0]) # row[0] is a single int (one age), so sum() and len() cannot be applied to it
When you execute ageslist=ages.execute("SELECT age from person"), your ageslist variable is an iterator (a cursor). Once you iterate through it you can no longer reference values in it without executing the database command again.
So I believe you should have one variable that sums the ages on every row iteration in the for loop and another variable that keeps a count of the number of entries in the database. Both can be updated in the same for loop. Although I'm sure there is a more "pythonic" way to accomplish this.
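A minimal sketch of the running sum and count described above, followed by one possible "pythonic" alternative using the standard library's statistics.stdev. The in-memory table and its three ages are made up for illustration; they stand in for person.sqlite3.

```python
import sqlite3
import statistics

# Hypothetical in-memory table standing in for person.sqlite3
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (age INTEGER)")
conn.executemany("INSERT INTO person (age) VALUES (?)", [(20,), (30,), (40,)])

# Running sum and count in a single pass over the cursor
total = 0
count = 0
for (age,) in conn.execute("SELECT age FROM person"):
    total += age
    count += 1
average = total / count

# Alternative: copy the ages into a list once, then let the
# statistics module compute the sample standard deviation (n-1)
ages = [row[0] for row in conn.execute("SELECT age FROM person")]
sample_stdev = statistics.stdev(ages)

print(average, sample_stdev)
```

Copying the ages into a list costs memory, but it lets you pass over the values as many times as you like, which the cursor itself does not allow.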

Related

Find the largest number in a pool of integers

I have been working on this code for quite a while now and frankly, I have no more ideas on how to solve this. I have searched different threads on how to do this but, unfortunately, still have no answers.
To start off, I have this pool of data that is a string but needs to be considered as a list. For example:
# empDataLT
200401003,Luisa,Jurney,Accounting,800,21,4/8/2002,;
200208006,Clorinda,Heimann,Accounting,1050,15,5/21/1994,;
200307014,Dick,Wenzinger,Admin,565,15,10/13/1973,;
200901005,Ahmed,Angalich,Purchasing,750,20,2/10/1973,;
200704013,Iluminada,Ohms,Marketing,750,16,7/13/1972,;
201701018,Joanna,Leinenbach,Finance,1050,15,11/6/1980,;
201003007,Caprice,Suell,Admin,750,18,6/28/1992
a = empRecords.strip().split(";")
This pool is in the format: Employee Number, First name, Last Name, Department, Rate per day, No. of Days Worked, Birthdate
What I have been trying to do is to compute each employee's rate per day multiplied by the number of days worked, then find which of them is the highest earning employee. I have the following code, which works decently, but of course it lacks the latter result needed (that is, the highest earning).
import empDataLT as x
def earn():
    empEarn = list() # convert module to a list
    for er in x.a:
        empErn = er.strip().split(",")
        empEarn.append(empErn)
    b = sorted(empEarn, key=lambda x: x[4])
    for e in b:
        ern = (int(e[4]) * int(e[5]))
        print(ern)
This results in something like this:
20800
14400
21600
24000
12800
24000
Which is great because I have the result (yay). However, I am unable to find the highest earning, as I usually get an error when I try max() since it's an integer. I tried converting it to a str and then using max(), but that just gives me the highest digit of each number.
I'm not really sure what to do anymore.
Try this:
empRecords = '''200401003,Luisa,Jurney,Accounting,800,21,4/8/2002,;
200208006,Clorinda,Heimann,Accounting,1050,15,5/21/1994,;
200307014,Dick,Wenzinger,Admin,565,15,10/13/1973,;
200901005,Ahmed,Angalich,Purchasing,750,20,2/10/1973,;
200704013,Iluminada,Ohms,Marketing,750,16,7/13/1972,;
201701018,Joanna,Leinenbach,Finance,1050,15,11/6/1980,;
201003007,Caprice,Suell,Admin,750,18,6/28/1992'''
a = empRecords.strip().split(";")
earn = []
for i in a:
    t = i.split(',')
    cur = int(t[4])*int(t[5])
    earn.append(cur)
    print(cur)
print("Maximum Earning :",max(earn))
You can use max, you just need to keep the previous largest-found:
highestEarning = 0
for e in b:
    highestEarning = max((int(e[4]) * int(e[5])),highestEarning)
Once the for loop is done, highestEarning will be the highest earning in the list.
max() takes an iterable, for example a list (or two or more separate arguments).
You probably tried to run max() on a single int, which doesn't work: an int is not iterable, and there is nothing to compare a single value against.
You can create a list of ern values and then use max() on this list.
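As a rough sketch of that idea: collect every earning in a list and call max() on it. If you also want to know who the top earner is, max() accepts a key function. The three records below are copied from the question's sample; the variable names are made up for illustration.

```python
# Three records in the same comma-separated layout as the question's data
emp_records = """200401003,Luisa,Jurney,Accounting,800,21,4/8/2002,;
200208006,Clorinda,Heimann,Accounting,1050,15,5/21/1994,;
200307014,Dick,Wenzinger,Admin,565,15,10/13/1973"""

employees = []
for record in emp_records.strip().split(";"):
    fields = record.strip().split(",")
    earning = int(fields[4]) * int(fields[5])  # rate per day * days worked
    employees.append((earning, fields[1], fields[2]))

# max() over a list of ints gives the highest earning
earnings = [earning for earning, _, _ in employees]
highest = max(earnings)

# max() with a key returns the whole record of the top earner
top_earning, top_first, top_last = max(employees, key=lambda emp: emp[0])

print(highest, top_first, top_last)
```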
You might use the yield keyword for your task in the following way:
import empDataLT as x
def earn():
    empEarn = list() # convert module to a list
    for er in x.a:
        empErn = er.strip().split(",")
        empEarn.append(empErn)
    b = sorted(empEarn, key=lambda x: x[4])
    for e in b:
        ern = (int(e[4]) * int(e[5]))
        print(ern)
        yield ern
highest = max(earn())
For a discussion of yield and how to use it, I suggest the Real Python tutorial.

How do I make sure all of my values are computed in my loop?

I am working on a 'keep the change' assignment where I round each purchase up to the whole dollar and add the change to the savings account. However, the loop is not going through all of the values in my external text file. It only computes the last value. I tried splitting the file but it gives me an error. What might be the issue? My external text file looks like this:
10.90
13.59
12.99
(each on different lines)
def main():
    account1 = BankAccount()
    file1 = open("data.txt","r+") # reading the file, + indicates read and write
    s = 0 # to keep track of the new savings
    for n in file1:
        n = float(n) #lets python know that the values are floats and not a string
        z= math.ceil(n) #rounds up to the whole digit
        amount = float(z-n) # subtract the actual total from the rounded sum to get change
        print(" Saved $",round(amount,2), "on this purchase",file = file1)
        s = amount + s
    x = (account1.makeSavings(s))
I'm fairly sure the reason for this is that you are printing the amount of money you have saved to the same file you are reading. In general, you don't want to alter the length of an object you are iterating over because it can cause problems.
import math

account1 = BankAccount()
file1 = open("data.txt","r+") # reading the file, + indicates read and write
s = 0 # to keep track of the new savings
amount_saved = []
for n in file1:
    n = float(n) #lets python know that the values are floats and not a string
    z= math.ceil(n) #rounds up to the whole digit
    amount = float(z-n) # subtract the actual total from the rounded sum to get change
    amount_saved.append(round(amount,2))
    s = amount + s
x = (account1.makeSavings(s))
# write the saved amounts only after the read loop has finished
for n in amount_saved:
    print(" Saved $",n, "on this purchase",file = file1)
file1.close()
This will print the amounts you have saved at the end of the file after you are finished iterating through it.
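Here is a self-contained sketch of the same read-first, write-afterwards pattern, so it can be run without the BankAccount class. The temporary file and the helper name round_up_change are made up for illustration; the three prices are the ones from the question.

```python
import math
import os
import tempfile

def round_up_change(prices):
    """Return the change saved on each purchase when rounding up to a whole dollar."""
    return [round(math.ceil(p) - p, 2) for p in prices]

# Hypothetical stand-in for data.txt, holding the question's three prices
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("10.90\n13.59\n12.99\n")

# Read every price first ...
with open(path) as f:
    prices = [float(line) for line in f if line.strip()]

changes = round_up_change(prices)
total_saved = round(sum(changes), 2)

# ... and only then append the results to the end of the file
with open(path, "a") as f:
    for change in changes:
        print("Saved $", change, "on this purchase", file=f)

print(changes, total_saved)
```

Opening the file separately for reading and for appending keeps the two steps from interfering with each other, and the with blocks close the file automatically.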

Creating a monte carlo simulation from a loop python

I am attempting to calculate the probability of a for loop returning a value lower than 10% of the initial value input using a monte-carlo simulation.
for i in range(0, period):
    if i < 1:
        r=(rtn_daily[i]+sig_daily[i]*D[i])
        stock = stock_initial * (1+r)
    elif i >=1:
        r=(rtn_daily[i]+sig_daily[i]*D[i])
        stock = stock * (1+r)
    print(stock)
This is the for-loop that I wish to run a large number of times (200000 as a rough number) and calculate the probability that:
stock < stock_initial * .9
I've found examples that define their initial loop as a function and then will use that function in the loop, so I have tried to define a function from my loop:
def stock_value(period):
    for i in range(0, period):
        if i < 1:
            r=(rtn_daily[i]+sig_daily[i]*D[i])
            stock = stock_initial * (1+r)
        elif i >=1:
            r=(rtn_daily[i]+sig_daily[i]*D[i])
            stock = stock * (1+r)
        return(stock)
This produces values for 'stock' that don't seem to fall in the same range as they did before the loop was defined as a function.
Using this code I tried to run a monte-carlo simulation:
# code to implement monte-carlo simulation
number_of_loops = 200 # lower number to run quicker
for stock_calc in range(1,period+1):
    moneyneeded = 0
    for i in range(number_of_loops):
        stock=stock_value(stock_calc)
        if stock < stock_initial * 0.90:
            moneyneeded += 1
        #print(stock) this is to check the value of stock being produced.
    stock_percentage = float(moneyneeded) / number_of_loops
    print(stock_percentage)
But this returns no results outside the 10% range even when looped 200000 times; it seems the range/spread of results gets hugely reduced in my defined function somehow.
Can anyone see a problem in my defined function 'stock_value' or can see a way of implementing a monte-carlo simulation in a way I've not come across?
My full code for reference:
#import all modules required
import numpy as np # using different notation for easier writing
import scipy as sp
import matplotlib.pyplot as plt
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#collect variables provided by the user
stock_initial = float(12000) # can be input for variable price of stock initially.
period = int(63) # can be edited to an input() command for variable periods.
mrgn_dec = .10 # decimal value of 10%, can be manipulated to produce a 10% increase/decrease
addmoremoney = stock_initial*(1-mrgn_dec)
rtn_annual = np.repeat(np.arange(0.00,0.15,0.05), 31)
sig_annual = np.repeat(np.arange(0.01,0.31,0.01), 3) #use .31 as python doesn't include the upper range value.
#functions for variables of daily return and risk.
rtn_daily = float((1/252))*rtn_annual
sig_daily = float((1/(np.sqrt(252))))*sig_annual
D=np.random.normal(size=period) # unsure of range to use for standard distribution
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# returns the final value of stock after 63rd day(possibly?)
def stock_value(period):
    for i in range(0, period):
        if i < 1:
            r=(rtn_daily[i]+sig_daily[i]*D[i])
            stock = stock_initial * (1+r)
        elif i >=1:
            r=(rtn_daily[i]+sig_daily[i]*D[i])
            stock = stock * (1+r)
        return(stock)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# code to implement monte-carlo simulation
number_of_loops = 20000
for stock_calc in range(1,period+1):
    moneyneeded = 0
    for i in range(number_of_loops):
        stock=stock_value(stock_calc)
        if stock < stock_initial * 0.90:
            moneyneeded += 1
        print(stock)
    stock_percentage = float(moneyneeded) / number_of_loops
    print(stock_percentage)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Posting an answer as I don't have the points to comment. Some queries about your code - going through these might help you find an answer:
Why have you defined rtn_annual as an array, np.repeat(np.arange(0.00,0.15,0.05), 31)? Since np.repeat just holds each of the values [0.0, 0.05, 0.1] 31 times in a row, why not define it as a function?:
def rtn_annual(i):
    vals = [0.0, 0.05, 0.1]
    return vals[i // 31]
Likewise for sig_annual, rtn_daily, and sig_daily - the contents of these are all straightforward functions of the index, so I'm not sure what the advantage could be of making them arrays.
What does D actually represent? As you've defined it, it's a random variable with mean of 0.0, and standard deviation 1.0. So around 95% of the values in D will be in the range (-2.0, +2.0) - is that what you expect?
Have you tested your stock_value() function even on small periods (e.g. from 0 to a few days) to ensure it's doing what you think it should? It's not clear from your question whether you've verified that it's ever doing the right thing, for any input, and your comment "...(possibly?)" doesn't sound very confident.
Spoiler alert - it almost certainly doesn't. In the function stock_value, your return statement is within the for loop. It will get executed the first time round, when i = 0, and the loop will never get any further than that. This would be the chief reason why the function is giving different results to the loop.
Also, where you say "returning a value lower than 10% of...", I assume you mean "returning a value at least 10% lower than...", since that's what your probability stock < stock_initial * .9 is calculating.
I hope this helps. You may want to step through your code with a debugger in your preferred IDE (IDLE, Thonny, Eclipse, whatever it may be) to see what your code is actually doing.
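To make the points above concrete, here is one way the simulation could be restructured. Treat it as a sketch under assumptions, not a drop-in replacement: the constant 5% annual return, the 20% annual volatility, the seed and the function names are all made up for illustration. Two things differ from the question's code: the return sits after the loop, and the normal shocks are drawn inside the function. In the question, D is generated once at module level, so every call to stock_value reuses the same random numbers and every run gives the same result.

```python
import numpy as np

def simulate_final_value(stock_initial, rtn_daily, sig_daily, period, rng):
    """Compound `period` daily returns, drawing fresh normal shocks on every call."""
    shocks = rng.standard_normal(period)
    stock = stock_initial
    for i in range(period):
        r = rtn_daily[i] + sig_daily[i] * shocks[i]
        stock = stock * (1 + r)
    return stock  # returned once, after the loop has finished

def probability_below(threshold, n_runs, **sim_args):
    """Fraction of simulated runs whose final value ends below `threshold`."""
    hits = sum(1 for _ in range(n_runs)
               if simulate_final_value(**sim_args) < threshold)
    return hits / n_runs

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
period = 63
stock_initial = 12000.0
rtn_daily = np.full(period, 0.05 / 252)            # assumed 5% annual return
sig_daily = np.full(period, 0.20 / np.sqrt(252))   # assumed 20% annual volatility

p = probability_below(0.9 * stock_initial, 2000,
                      stock_initial=stock_initial, rtn_daily=rtn_daily,
                      sig_daily=sig_daily, period=period, rng=rng)
print(p)
```

Starting from stock = stock_initial before the loop also removes the need for the separate i < 1 branch.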

How to limit for loop iterations within cursor?

I am using a for loop within a SearchCursor to iterate through features in a featureclass.
import arcpy
fc = r'C:\path\to\featureclass'
with arcpy.da.SearchCursor(fc, ["fieldA", "FieldB", "FieldC"]) as cursor:
    for row in cursor:
        # Do something...
I am currently troubleshooting the script and need to find a way to limit the iterations to, say, 5 rather than 3500 as it is currently configured. I know the most basic way to limit the number of iterations in a for loop is as follows:
numbers = [1,2,3,4,5]
for i in numbers[0:2]:
    print(i)
However, this approach does not work when iterating over a cursor object. What method can I use to limit the number of iterations of a for loop within a cursor object wrapped in a with statement?
You could use a list comprehension to grab everything and then keep only the first five rows. Note that this still reads every row from the cursor before the slice is applied. Check the example below:
max_rows = 5 # insert max number of iterations here
with arcpy.da.SearchCursor(fc, ["fieldA", "FieldB", "FieldC"]) as cursor:
    output = [list(row) for row in cursor][:max_rows]
It is important to note that each row is returned as a tuple, so list() is used to build a 2D list that can be used for whatever you need. Even if your dataset is 3500 rows, this should do the trick in little time. I hope this helps!
Add a counter and a conditional statement to limit the number of iterations. For example:
import sys
import arcpy
fc = r'C:\path\to\featureclass'
count = 1 # Start a counter
with arcpy.da.SearchCursor(fc, ["fieldA", "FieldB", "FieldC"]) as cursor:
    for row in cursor:
        # Do something...
        if count >= 2:
            print("Processing stopped because iterations >= 2")
            sys.exit()
        count += 1
One possible way:
for index, row in enumerate(cursor):
    if index >= x: # x is the maximum number of rows to process
        break
    # do something...
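Another option, sketched below, is itertools.islice from the standard library, which stops pulling rows once the limit is reached instead of reading everything first. I have not run this against arcpy; fake_cursor is a made-up generator standing in for the SearchCursor, on the assumption that the cursor can be iterated with a plain for loop as in the question.

```python
from itertools import islice

def fake_cursor(n):
    """Hypothetical stand-in for a cursor: yields one row tuple at a time."""
    for i in range(n):
        yield (i, "name_%d" % i)

limit = 5  # maximum number of rows to process

rows = []
for row in islice(fake_cursor(3500), limit):
    # Do something with each row; here we just collect it
    rows.append(row)

print(rows)
```

With a real cursor, the loop would become for row in islice(cursor, limit): inside the with block.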

Combining functions in one file

infilehandle = open ("receipts-10-28-13.txt", "r")
# FUNCTIONS
def stripsigns( astring ):
    """Remove dollar signs"""
    signs = "$"
    nosigns = ""
    for numbers in astring:
        if numbers not in signs:
            nosigns = nosigns + numbers
    return nosigns
def totaltake():
    """Calculate total take"""
    total = 0
    for line in infilehandle:
        values = line.split(':')
        cardnumbers = values[1]
        cardnumbers = stripsigns(cardnumbers)
        total = (total + eval(cardnumbers))
    total = round(total,2)
    return total
# more code etc
def computetax(rate, total):
    total = totaltake() - totaltip()
    taxed = total * rate
    taxed = round(taxed,2)
    return taxed
# more code etc
# VARS
total = totaltake()
tips = totaltip()
tax = computetax(rate,total)
rate = eval(input("Input the tax rate as an integer:"))
# PRINT
print("Total take: $", totaltake())
print("Total tips: $", totaltips())
print("Tax owed: $", computetax(rate,total))
I'm trying to make a file that will look at elements from a txt file and then calculate things based on the numbers in the file. The functions are all either variations of the totaltake(), which is getting numbers from the file and finding the sum, or the computetax(), which takes the numbers the other functions calculate and multiplies/divides/etc. I've tested all the functions individually and they all work, but when I try to put them together in one file, it doesn't give me the output I want, and I don't know why. It prints the value of whatever is first in the vars list, and 0 for everything else -- so for the version of code I have above, it prints
Total take: $ 6533.47
Total tips: $ 0
Tax owed: $ 0
So basically, what am I doing wrong?
See the related question "Is file object in python an iterable".
File objects are iterators, meaning that once a file has been iterated to completion, you cannot iterate over the file again unless it has been reset to the beginning of the file by calling file.seek(0).
When you only call totaltake(), because infilehandle has not yet been iterated, the for loop goes through all the lines in infilehandle, which is why you get the expected result.
However, when you put computetax() together with totaltake(), totaltake() gets called by itself, and then again in computetax(). Because infilehandle has been iterated to completion the first time totaltake() is called, the for loop is not entered the second time and the initial value for total, 0, is returned.
As the value returned by totaltake() should not change, you should modify computetax() so that you can pass total as a parameter instead of calling totaltake(). You should do the same so that it also takes tips as a parameter instead of recalculating the value.
If you cannot modify computetax(), you could also seek to the beginning of infilehandle by adding infilehandle.seek(0) to the beginning of totaltake(), to allow it to be iterated again.
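The behaviour is easy to reproduce in a few lines. This is a minimal sketch: io.StringIO stands in for the receipts file, the two receipt lines are invented, and total_take is a simplified variant of totaltake() that takes the handle as a parameter and uses float() rather than eval().

```python
import io

def total_take(handle):
    """Sum the dollar amounts that follow the ':' on every line of an open file."""
    total = 0.0
    for line in handle:
        amount = line.split(":")[1].replace("$", "")
        total += float(amount)
    return round(total, 2)

# Hypothetical receipt lines standing in for receipts-10-28-13.txt
handle = io.StringIO("card1:$10.50\ncard2:$4.25\n")

first = total_take(handle)    # reads every line
second = total_take(handle)   # handle is exhausted, so the loop body never runs
handle.seek(0)                # rewind to the start of the file
third = total_take(handle)    # reads every line again

print(first, second, third)
```

Computing each total once, storing it in a variable, and passing that variable on avoids the need to rewind at all.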
