what is the Error in int object iteration? - python

What is 'int' object is not subscriptable in this code?
import math
import os
import random
import re
import sys
# Complete the hourglassSum function below.
def hourglassSum(arr):
sum1=0
result=0
for i in range(4):
for j in range(4):
sum1=arr[i][j]+arr[i+1][j]+arr[i+2][j]+arr[i+1][j+1]+arr[i][j+2]+arr[i+1][j+2]+arr[i+2[j+2]]
if sum1>result:
result=sum1
return result
if __name__ == '__main__':
fptr = open(os.environ['OUTPUT_PATH'], 'w')
arr = []
for _ in range(6):
arr.append(list(map(int, input().rstrip().split())))
result = hourglassSum(arr)
fptr.write(str(result) + '\n')
fptr.close()

The very last part of this long line:
sum1=arr[i][j]+arr[i+1][j]+arr[i+2][j]+arr[i+1][j+1]+arr[i][j+2]+arr[i+1][j+2]+arr[i+2[j+2]]
(this part here):
arr[i+2[j+2]]
Is an error; you seem to be trying to refer to 2[j+2]. Clearly the integer 2 is not an array, so Python complains to you that it makes no sense to index an integer.
You probably want that last term to be:
arr[i+2][j+2]
Looking more closely at the long line, it seems like what you are trying to accomplish is obtain the sum of the elements in a 3x3 section of arr. But even the long line is missing some of the combinations. Rather than risk typing the list of addition problems incorrectly (because there are so many), use a set of nested loops to build up the sum of the 3x3 segment.

Related

How I automate my python script or get multiple entries in one run?

I am running the following python script:
import random
result_str = ''.join((random.choice('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!##$%^&*()') for i in range(8)))
with open('file_output.txt','a') as out:
out.write(f'{result_str}\n')
Is there a way I could automate this script to run automatically? or If I can get multiple outputs instantly?
Ex. Right now the output stores itself in the file one by one
kmfd5s6s
But if somehow I can get 1,000,000 entries in the file on one click and there is no duplication.
Same logic as given by PangolinPaws,but since you require it for a 1,000,000 entries, which is quite large, using numpy could be more effecient. Also, replacing random.choice() with random.choices() with k=8, inorder to avoid the for loop to generate the string.
import random
import numpy as np
a = np.array([])
for i in range(1000000):
str = ''.join((random.choices('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!##$%^&*()', k = 8)))
if str not in a:
a = np.append(a,str)
np.savetxt("generate_strings.csv", a, fmt='%s')
You need to nest your out.write() in a loop, something like this, to make it happen multiple times:
import random
with open('file_output.txt','a') as out:
for x in range(1000): # the number of lines you want in the output file
result_str = ''.join((random.choice('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!##$%^&*()') for i in range(8)))
out.write(f'{result_str}\n')
However, while unlikely, it is possible that you could end up with duplicate rows. To avoid this, you can generate and store your random strings in a loop and check for duplicates as you go. Once you have enough, write them all to the file outside the loop:
import random
results = []
while len(results) < 1000: # the number of lines you want in the output file
result_str = ''.join((random.choice('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!##$%^&*()') for i in range(8)))
if result_str not in results: # check if the generated result_str is a duplicate
results.append(result_str)
with open('file_output.txt','a') as out:
out.write( '\n'.join(results) )

How to solve a Linear System of Equations in Python When the Coefficients are Unknown (but still real numbers)

Im not a programer so go easy on me please ! I have a system of 4 linear equations and 4 unknowns, which I think I could use python to solve relatively easily. However my equations not of the form " 5x+2y+z-w=0 " instead I have algebraic constants c_i which I dont know the explicit numerical value of, for example " c_1 x + c_2 y + c_3 z+ c_4w=c_5 " would be one my four equations. So does a solver exist which gives answers for x,y,z,w in terms of the c_i ?
Numpy has a function for this exact problem: numpy.linalg.solve
To construct the matrix we first need to digest the string turning it into an array of coefficients and solutions.
Finding Numbers
First we need to write a function that takes a string like "c_1 3" and returns the number 3.0. Depending on the format you want in your input string you can either iterate over all chars in this array and stop when you find a non-digit character, or you can simply split on the space and parse the second string. Here are both solutions:
def find_number(sub_expr):
"""
Finds the number from the format
number*string or numberstring.
Example:
3x -> 3
4*x -> 4
"""
num_str = str()
for char in sub_expr:
if char.isdigit():
num_str += char
else:
break
return float(num_str)
or the simpler solution
def find_number(sub_expr):
"""
Returns the number from the format "string number"
"""
return float(sub_expr.split()[1])
Note: See edits
Get matrices
Now we can use that to split each expression into two parts: The solution and the equation by the "=". The equation is then split into sub_expressions by the "+" This way we would end turn the string "3x+4y = 3" into
sub_expressions = ["3x", "4y"]
solution_string = "3"
Each sub expression then needs to be fed into our find_numbers function. The End result can be appended to the coefficient and solution matrices:
def get_matrices(expressions):
"""
Returns coefficient_matrix and solutions from array of string-expressions.
"""
coefficient_matrix = list()
solutions = list()
last_len = -1
for expression in expressions:
# Note: In this solution all coefficients must be explicitely noted and must always be in the same order.
# Could be solved with dicts but is probably overengineered.
if not "=" in expression:
print(f"Invalid expression {expression}. Missing \"=\"")
return False
try:
c_string, s_string = expression.split("=")
c_strings = c_string.split("+")
solutions.append(float(s_string))
current_len = len(c_strings)
if last_len != -1 and current_len != last_len:
print(f"The expression {expression} has a mismatching number of coefficients")
return False
last_len = current_len
coefficients = list()
for c_string in c_strings:
coefficients.append(find_number(c_string))
coefficient_matrix.append(coefficients)
except Exception as e:
print(f"An unexpected Runtime Error occured at {coefficient}")
print(e)
exit()
return coefficient_matrix, solutions
Now let's write a simple main function to test this code:
# This is not the code you want to copy-paste
# Look further down.
from sys import argv as args
def main():
expressions = args[1:]
matrix, solutions = get_matrices(expressions)
for row in matrix:
print(row)
print("")
print(solutions)
if __name__ == "__main__":
main()
Let's run the program in the console!
user:$ python3 solve.py 2x+3y=4 3x+3y=2
[2.0, 3.0]
[3.0, 3.0]
[4.0, 2.0]
You can see that the program identified all our numbers correctly
AGAIN: use the find_number function appropriate for your format
Put The Pieces Together
These Matrices now just need to be pumped directly into the numpy function:
# This is the main you want
from sys import argv as args
from numpy.linalg import solve as solve_linalg
def main():
expressions = args[1:]
matrix, solutions = get_matrices(expressions)
coefficients = solve_linalg(matrix, solutions)
print(coefficients)
# This bit needs to be at the very bottom of your code to load all functions first.
# You could just paste the main-code here, but this is considered best-practice
if __name__ == '__main__':
main()
Now let's test that:
$ python3 solve.py x*2+y*4+z*0=20 x*1+y*1+z*-1=3 x*2+y*2+z*-3=3
[2. 4. 3.]
As you can see the program now solves the functions for us.
Out of curiosity: Math homework? This feels like math homework.
Edit: Had a typo "c_string" instead of "c_strings" worked out in all tests out of pure and utter luck.
Edit 2: Upon further inspection I would reccomend to split the sub-expressions by a "*":
def find_number(sub_expr):
"""
Returns the number from the format "string number"
"""
return float(sub_expr.split("*")[1])
This results in fairly readable input strings

Python/Pandas - String Comparisons

I have a list of strings/narratives which I need to compare and get a distance measure between each string. The current code I have written works but for larger lists it takes along time since I use 2 for loops. I have used the levenshtien distance to measure the distance between strings.
The list of strings/narratives is stored in a dataframe.
def edit_distance(s1, s2):
m=len(s1)+1
n=len(s2)+1
tbl = {}
for i in range(m): tbl[i,0]=i
for j in range(n): tbl[0,j]=j
for i in range(1, m):
for j in range(1, n):
cost = 0 if s1[i-1] == s2[j-1] else 1
tbl[i,j] = min(tbl[i, j-1]+1, tbl[i-1, j]+1, tbl[i-1, j-1]+cost)
return tbl[i,j]
def narrative_feature_extraction(df):
startTime = time.time()
leven_matrix = np.zeros((len(df['Narrative']),len(df['Narrative'])))
for i in range(len(df['Narrative'])):
for j in range(len(df['Narrative'])):
leven_matrix[i][j] = edit_distance(df['Narrative'].iloc[i],df['Narrative'].iloc[j])
endTime = time.time()
total = (endTime - startTime)
print "Feature Extraction (Leven) Runtime:" + str(total)
return leven_matrix
X = narrative_feature_extraction(df)
If the list has n narratives, the resulting X is a nxn matrix, where the rows are the narratives and the columns is what that narrative is compared to. For example, for the distance (i,j) it is the levenshtien distance between narrative i and j.
Is there a way to optimize this code so that there isn't a need to have so many for loops? Or is there a pythonic way of calculating this?
hard to give exact code without data/examples, but a few suggestions:
Use list comprehension, much faster than for ... in range ...
Depending on your version of pandas, "df[i][j]" indexing can be veeeery slow, instead use .iloc or .loc (if you want to mix and match use .iloc[df.index.get_loc("itemname"),df.columns.get_loc("itemname")] to convert loc to iloc properly if you have this issue. (I think it is only slow if you are getting warning flags for writing to a dataframe slice and depends a lot on what version of python/pandas you have, but have not tested extensively)
Better yet, run all calcs and then throw into dataframe in one go depending on your use case
If you like the pythonic reading of for loops, try to avoid using "in range" at least and instead use "for j in X[:,0]" for example. I find this to be faster in most cases, and you can use with enumerate to keep index values (example below)
Examples/timings:
def test1(): #list comprehension
X=np.random.normal(size=(100,2))
results=[[x*y for x in X[:,0]] for y in X[:,1]]
df=pd.DataFrame(data=np.array(results))
if __name__ == '__main__':
import timeit
print("test1: "+str(timeit.timeit("test1()", setup="from __main__ import test1",number=10)))
def test2(): #enumerate, df at end
X=np.random.normal(size=(100,2))
results=np.zeros((100,100))
for ind,i in enumerate(X[:,0]):
for col,j in enumerate(X[:,1]):
results[ind,col]=i*j
df=pd.DataFrame(data=results)
if __name__ == '__main__':
import timeit
print("test2: "+str(timeit.timeit("test2()", setup="from __main__ import test2",number=10)))
def test3(): #in range, but df at end
X=np.random.normal(size=(100,2))
results=np.zeros((100,100))
for i in range(len(X)):
for j in range(len(X)):
results[i,j]=X[i,0]*X[j,1]
df=pd.DataFrame(data=results)
if __name__ == '__main__':
import timeit
print("test3: "+str(timeit.timeit("test3()", setup="from __main__ import test3",number=10)))
def test4(): #current method
X=np.random.normal(size=(100,2))
df=pd.DataFrame(data=np.zeros((100,100)))
for i in range(len(X)):
for j in range(len(X)):
df[i][j]=(X[i,0]*X[j,1])
if __name__ == '__main__':
import timeit
print("test4: "+str(timeit.timeit("test4()", setup="from __main__ import test4",number=10)))
output:
test1: 0.0492231889643
test2: 0.0587620022106
test3: 0.123777403419
test4: 12.6396287782
so list comprehension is ~250 times faster, and enumerate is twice as fast as "for x in range". Although the real slowdown is individual indexing of your dataframe (even if using .loc or .iloc this will still be your bottleneck so I suggest working with arrays outside of the df if possible)
Hope this helps and you are able to apply to your case. I'd recommend reading up on map, filter, reduce, (maybe enumerate) functions as well as they are quite quick and might help you: http://book.pythontips.com/en/latest/map_filter.html
Unfortunately I am not really familiar with your use case though, but I don't see a reason why it wouldn't be applicable or compatible with this type of code tuning.

Having an issue with using median function in numpy

I am having an issue with using the median function in numpy. The code used to work on a previous computer but when I tried to run it on my new machine, I got the error "cannot perform reduce with flexible type". In order to try to fix this, I attempted to use the map() function to make sure my list was a floating point and got this error message: could not convert string to float: .
Do some more attempts at debugging, it seems that my issue is with my splitting of the lines in my input file. The lines are of the form: 2456893.248202,4.490 and I want to split on the ",". However, when I print out the list for the second column of that line, I get
4
.
4
9
0
so it seems to somehow be splitting each character or something though I'm not sure how. The relevant section of code is below, I appreciate any thoughts or ideas and thanks in advance.
def curve_split(fn):
with open(fn) as f:
for line in f:
line = line.strip()
time,lc = line.split(",")
#debugging stuff
g=open('test.txt','w')
l1=map(lambda x:x+'\n',lc)
g.writelines(l1)
g.close()
#end debugging stuff
return time,lc
if __name__ == '__main__':
# place where I keep the lightcurve files from the image subtraction
dirname = '/home/kuehn/m4/kepler/subtraction/detrending'
files = glob.glob(dirname + '/*lc')
print(len(files))
# in order to create our lightcurve array, we need to know
# the length of one of our lightcurve files
lc0 = curve_split(files[0])
lcarr = np.zeros([len(files),len(lc0)])
# loop through every file
for i,fn in enumerate(files):
time,lc = curve_split(fn)
lc = map(float, lc)
# debugging
print(fn[5:58])
print(lc)
print(time)
# end debugging
lcm = lc/np.median(float(lc))
#lcm = ((lc[qual0]-np.median(lc[qual0]))/
# np.median(lc[qual0]))
lcarr[i] = lcm
print(fn,i,len(files))

How to check python codes by reduction?

import numpy
def rtpairs(R,T):
for i in range(numpy.size(R)):
o=0.0
for j in range(T[i]):
o +=2*(numpy.pi)/T[i]
yield R[i],o
R=[0.0,0.1,0.2]
T=[1,10,20]
for r,t in genpolar.rtpairs(R,T):
plot(r*cos(t),r*sin(t),'bo')
This program is supposed to be a generator, but I would like to check if i'm doing the right thing by first asking it to return some values for pheta (see below)
import numpy as np
def rtpairs (R=None,T=None):
R = np.array(R)
T = np.array(T)
for i in range(np.size(R)):
pheta = 0.0
for j in range(T[i]):
pheta += (2*np.pi)/T[i]
return pheta
Then
I typed import omg as o in the prompt
x = [o.rtpairs(R=[0.0,0.1,0.2],T=[1,10,20])]
# I tried to collect all values generated by the loops
It turns out to give me only one value which is 2 pi ... I have a habit to check my codes in the half way through, Is there any way for me to get a list of angles by using the code above? I don't understand why I must use a generator structure to check (first one) , but I couldn't use normal loop method to check.
Normal loop e.g.
x=[i for i in range(10)]
x=[0,1,2,3,4,5,6,7,8,9]
Here I can see a list of values I should get.
return pheta
You switched to return instead of yield. It isn't a generator any more; it's stopping at the first return. Change it back.
x = [o.rtpairs(R=[0.0,0.1,0.2],T=[1,10,20])]
This wraps the rtpairs return value in a 1-element list. That's not what you want. If you want to extract all elements from a generator and store them in a list, call list on the generator:
x = list(o.rtpairs(R=[0.0,0.1,0.2],T=[1,10,20]))

Categories