I have a file with a function defined, which imports and organizes data into a list of lists. It returns that list of lists, and this all functions fine within the same file (if I write a main function and call the import function, no problems).
def import_viscosity_data(size_of_header):
    ...
    return (list_of_lists)
I'm trying to call this function from another file in the same directory, using the following:
import load_files
print(load_files.import_viscosity_data(7))
Unfortunately, this keeps returning 'None', and if I try to get the length of the returned array, it throws an error: TypeError: object of type 'NoneType' has no len()
I'm guessing that it's passing me a reference to the list, and the actual list gets deleted as soon as the function terminates, but I'm not sure how to resolve this problem. Any help would be greatly appreciated!
Here's the code:
import os
import tkinter
from tkinter import filedialog
#import decimal
from decimal import *
def import_viscosity_data(size_of_header):
    ### This function imports viscosity data from multiple files, skipping the
    ### header passed in, of the form shearRate '\t' viscosity, and puts it into
    ### an array of the form test_num[result_type[data]] where result_type
    ### is 0 (shearRate) or 1 (viscosity)
    header_size = size_of_header
    root = tkinter.Tk()
    root.withdraw()
    file_path = root.tk.splitlist(filedialog.askopenfilenames(
        parent=root, title='Choose a file(s):'))
    test_num = []
    result_type = []
    data_1 = []
    data_2 = []
    for file_name in file_path:
        f = open(file_name)
        ## Skip the header, which consists of header_size lines
        for i in range(header_size):
            next(f)
        lines = [line.strip() for line in f]
        f.close()
        ## For each line, slice all characters before the tab, then after the tab,
        ## convert to Decimal, and append to the data lists
        for index in range(len(lines)):
            data_1.append(Decimal(lines[index][0:lines[index].find('\t')]))
            data_2.append(Decimal(lines[index][lines[index].find('\t') + 1:]))
        result_type.append(data_1)
        result_type.append(data_2)
        test_num.append(result_type)
        data_1, data_2, result_type = [], [], []
    return test_num
Here's some sample data to try it on (any data in 2 columns with a tab in between):
0 1.2381
0.004 1.23901
0.008 1.23688
0.012 1.23734
0.016 1.23779
0.02 1.23901
0.024 1.23932
0.028 1.23886
0.032 1.23688
0.036 1.2384
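For reference, when called from a main() in the same file with a single file of the sample data above selected, the returned structure looks roughly like this (the sample has no header, hence a header size of 0):

result = import_viscosity_data(0)   # the sample data above has no header lines
# result[0]     -> the first (and only) file selected
# result[0][0]  -> shear rates:   [Decimal('0'), Decimal('0.004'), Decimal('0.008'), ...]
# result[0][1]  -> viscosities:   [Decimal('1.2381'), Decimal('1.23901'), Decimal('1.23688'), ...]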
Again, within this program (running in an IDE, or if I write a small main() function), this returns a list of lists of lists and works just fine. However, when I import the function in a different file, it returns None without throwing any errors. The function name pops up automatically in the IDE after the import load_files statement, so it seems to be importing properly.
Note
*This secondary problem was resolved. The file load_files.py was within a directory called load_files. The import statement was changed to from load_files import load_files and it now functions properly.*
Today my problem has gotten even worse. Now, I can't get any functions from the first file to be recognized in the second. Even a simple set of code like:
#load_files.py
def test_func():
    print('test successful')

#test.py
import load_files
load_files.test_func()
is throwing this error:
Traceback (most recent call last):
File "C:\Users\tmulholland\Documents\Carreau - WLF\test.py", line 8, in <module>
load_files.test_func
AttributeError: 'module' object has no attribute 'test_func'
load_files.py is in its own folder (of the same name) with a blank __init__.py file.
Note: I should add that I'm using the Pyzo IDE because I want to use the scipy library for curve fitting / optimization problems. I can't get any functions to import correctly into Pyzo today, no matter how simple. Has anybody else had this problem?
The problem was the test.py file's import statement. At first, I confused the issue by having, in the same directory as test.py, both a load_files.py and a directory called load_files which contained load_files.py as well as a blank file called __init__.py.
The original script read
import load_files
print(load_files.import_viscosity_data(7))
I eliminated the load_files.py which shared a directory with test.py. Now, I have test.py in the parent directory, then a sub-directory called load_files which contains load_files.py. The new script reads:
from load_files import load_files
print(load_files.import_viscosity_data(7))
Now, the list of lists is passed in to the local space, so a statement like
list_test = load_files.import_viscosity_data(7)
works fine.
I couldn't get things to work correctly when I just had the two .py files side by side (e.g. test.py and load_files.py in the same directory, with no sub-directory). The import statement import load_files threw an error that the module doesn't exist. Anyway, it all works well now with the above code.
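For anyone hitting the same issue, here is the layout that works (the top-level folder name is just a placeholder) and an equivalent, more explicit import:

# Directory layout that works:
#   project/
#       test.py
#       load_files/
#           __init__.py
#           load_files.py      <- defines import_viscosity_data()

from load_files import load_files                      # package.module
list_test = load_files.import_viscosity_data(7)

# equivalently, import the function itself:
from load_files.load_files import import_viscosity_data
list_test = import_viscosity_data(7)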
Special thanks to Martijn Pieters for the feedback.
The asset variable must be declared in the mother file (this is a simplified version).
The files cannot be merged.
This mother file works:
import pandas as pd
import datetime as dt
import yfinance as yf
data=yf.Ticker("^NDX")
dataHist= data.history(interval="1d",start= dt.date.today()-dt.timedelta(days = 50) ,end= dt.date.today()-dt.timedelta(days = 0))
df = pd.DataFrame(dataHist[["Open","High","Low","Close"]])
import child_file
if __name__=="__main__":
    df["expo"] = child_file.exponential_moving_average
    print(df)
With this child file:
import pandas as pd
import parent_file
exponential_moving_average = pd.Series.ewm(parent_file.df["Close"],span=12).mean()
But if I replace this in the mother file:
data=yf.Ticker("^NDX")
with this, to be able to choose the asset:
if __name__=="__main__":
    asset = "^"+str(input()).upper()
    data=yf.Ticker(asset)
It says
NameError: name 'asset' is not defined
(from the child file's point of view).
How do I reorganize the code?
I've tried a lot; this has been going on for several days now.
Since you are not defining anything new or implementing any modular functions in your child file, I would suggest moving that to the parent file and deleting the child file.
However, if you do choose to keep the two files, you should load the DataFrame in your child file, not in your parent file.
You could potentially define a function in your child file that takes parameters required for loading your ticker data, and call that function from your parent file.
Edit:
I would write a function in your child file.
def moving_average(close_price):
    return pd.Series.ewm(close_price, span=12).mean()
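The parent file would then pass the data in, rather than the child reaching back into the parent, e.g. (a sketch using the function above):

# mother/parent file (sketch)
import child_file
df["expo"] = child_file.moving_average(df["Close"])
print(df)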
I am a fairly beginner programmer with Python, without much experience in general, and currently I'm trying to parallelize a heavily CPU-bound process in my code. I'm using Anaconda to create environments and Visual Studio Code to debug.
A summary of the code is as following :
from tkinter import filedialog
import myfuncs as mf, concurrent.futures
file_path = filedialog.askopenfilename(title='Ask for a file containing data')
# import data from file_path
a = input('Ask the user for input')
Next, calculations are made from these, and I reach a stage where I need to iterate over a list of lists. These lists may contain up to two values, and calls are made to a separate file.
For example the inputs are :
sub_data1 = [test1]
sub_data2 = [test1, test2]
dataset = [sub_data1, sub_data2]
This is the stage where I use a concurrent.futures.ProcessPoolExecutor() instance and its .map() method:
with concurrent.futures.ProcessPoolExecutor() as executor:
    sm_res = executor.map(mf.process_distr, dataset)
Meanwhile, inside myfuncs.py, the mf.process_distr() function works like this:
def process_distr(tests):
    sm_reg = []
    for i in range(len(tests)):
        if i==0:
            # do stuff
            sm_reg.append(result1)
        else:
            # do stuff
            sm_reg.append(result2)
    return sm_reg
The problem is that when I execute this code from the main.py file, main.py seems to start running multiple times: the user-input prompts and the file dialog pop up multiple times (as many times as there are cores).
How can I resolve this matter?
Edit: After reading more into it, encapsulating the whole main.py code with:
if __name__ == '__main__':
did the trick. Thank you to anyone who gave time to help with my rookie problem.
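For reference, a minimal sketch of that structure: the worker processes re-import main.py, so anything at module level runs again in every worker unless it sits behind the guard.

# main.py (sketch)
from tkinter import filedialog
import concurrent.futures
import myfuncs as mf

if __name__ == '__main__':
    # interactive parts run only in the parent process
    file_path = filedialog.askopenfilename(title='Ask for a file containing data')
    a = input('Ask the user for input')
    # ... calculations that build `dataset` from the file and the input ...
    with concurrent.futures.ProcessPoolExecutor() as executor:
        sm_res = executor.map(mf.process_distr, dataset)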
I have three files, one being the main file that needs to be run, and the other two contain utility functions, as follows. All the files are in the same directory and I am running it on PyCharm.
# delta_plots.py - This is the main file
...
from delta_plots_utility_1 import *
from delta_plots_utility_2 import *
...
def print_parameter_header(params, flag):
    batch_size, epochs, lr = params[0], params[1], params[2]
    print("{} - Batch size: {}, Epochs: {}, Learning rate: {}".
          format(flag.upper(), batch_size, epochs, lr))
...
if __name__ == '__main__':
    # call the utility functions based on a condition
    if (condition1):
        utility_function_1()
    elif (condition2):
        utility_function_2()

# delta_plots_utility_1.py - Utility file 1

# this import statement is to import the print_parameter_header() function
# from the main file
from plot_delta_mp import *

def utility_function_1():
    # this function makes a call to the print_parameter_header() function
    ...
    print_parameter_header(params, flag)
    ...

# delta_plots_utility_2.py - Utility file 2
from plot_delta_mp import *

def utility_function_2():
    # this function also makes a call to the print_parameter_header() function
    ...
    print_parameter_header(params, flag)
    ...
The problem is that in the main file, if condition1 is true, then I am forced to put the import statement for utility file 1 before the one for utility file 2, and vice versa.
Otherwise, I get the following error:
NameError: name 'print_parameter_header' is not defined
I also tried importing the files as modules and then accessing the function as module.print_parameter_header(), but that does not help either.
I had the following questions regarding this:
From what I understand, the order of the import statements is not important. So why is this happening? Why does changing the order resolve the error?
Could this be because of the loop-like (circular) importing, since I am importing the main file in the utility files too?
If yes, then is it okay to define print_parameter_header() in the utility files? Although it would be redundant, is that a good practice?
It seems that all of your issues come from that initial misunderstanding: "From what I understand, the order of the import statements is not important."
In Python, an import statement:
can happen anywhere in the code (not necessarily at the beginning), so if you run into circular dependency issues it might be a good idea to import as late as possible if you have no other design choice;
creates symbols in the importing module. So from xxx import a will create a variable a locally, just like writing a = 0 would; it is exactly the same kind of name binding.
So maybe a good solution for you would be to stop using from <xxx> import * or import <xxx> at the top of the file, both of which execute the whole other module up front, and instead import selected symbols in precisely controlled places, such as from <xxx> import a, b and, later in your code, from <xxx> import c.
Sorry for not taking the time to adapt the above answer to your precise code example, but hopefully you'll get the idea; a rough sketch of the deferred-import approach follows.
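As a hypothetical adaptation (using delta_plots as the main module name, as in the first code block above): each utility module imports only the one function it needs, and does so inside the function, so the import runs at call time, after all the modules have finished loading.

# delta_plots_utility_1.py (sketch)
def utility_function_1():
    # deferred, selective import: by the time this runs, delta_plots
    # has finished loading, so print_parameter_header exists
    from delta_plots import print_parameter_header
    ...
    print_parameter_header(params, flag)
    ...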
I'm doing simulations for scientific computing, and I'm almost always going to want to be in the interactive interpreter to poke around at the output of my simulations. I'm trying to write classes to define simulated objects (neural populations), and I'd like to formalize my testing of these classes by calling a script with %run test_class_WC.py in IPython. Since the module/file containing the class is changing as I try to debug it/add features, I'm reloading it each time.
./test_class_WC.py:
import WC_class # make sure WC_class exists
reload(WC_class) # make sure it's the most current version
import numpy as np
from WC_class import WC_unit # put the class into my global namespace?
E1 = WC_unit(Iapp=100)
E1.update() # see if it works
print E1.r
So right off the bat I'm using reload to make sure I've got the most current version of the module loaded, so I've got the freshest class definition. I'm sure this is clunky as heck (and maybe more sinister?), but it saves me the trouble of doing %run WC_class.py and then a separate %run test_WC.py.
and ./WC_class.py:
class WC_unit:
    nUnits = 0
    def __init__(self, **kwargs):
        self.__dict__.update(dict(      # a bunch of params
            gee=.6,                     # i need to be able to change
            ke=.1, the=.2,              # in test_class_WC.py
            tau=100., dt=.1, r=0., Iapp=1.), **kwargs)
        WC_unit.nUnits += 1
    def update(self):
        def f(x, k=self.ke, th=self.the):   # a function i define inside a method
            return 1/(1+np.exp(-(x-th)/k))  # using some of those params
        x = self.Iapp + self.gee * self.r
        self.r += self.dt/self.tau * (-self.r + f(x))
WC_unit basically defines a bunch of default parameters and an ODE that updates using basic Euler integration. I expect that test_class_WC sets up a global namespace containing np (and WC_unit, and WC_class).
When I run it, I get the following error:
In [14]: %run test_class_WC.py
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/Users/steeles/Desktop/science/WC_sequence/test_class_WC.py in <module>()
8
9 E1 = WC_unit(Iapp=100)
---> 10 E1.update()
11
12 # if bPlot:
/Users/steeles/Desktop/science/WC_sequence/WC_class.py in update(self)
19 return 1/(1+np.exp(-(x-th)/k))
20 x = self.Iapp + self.gee * self.r
---> 21 self.r += self.dt/self.tau * (-self.r + f(x))
22
23 # #class_method
/Users/steeles/Desktop/science/WC_sequence/WC_class.py in f(x, k, th)
17 def update(self):
18 def f(x,k=self.ke,th=self.the):
---> 19 return 1/(1+np.exp(-(x-th)/k))
20 x = self.Iapp + self.gee * self.r
21 self.r += self.dt/self.tau * (-self.r + f(x))
NameError: global name 'np' is not defined
Now I can get around this by just importing numpy as np at the top of the WC_class module, or even by doing from numpy import exp in test_class_WC and changing the update() method to contain exp() instead of np.exp()... but I'm not trying to do this because it's easy, I want to learn how all this namespace/module stuff works so I stop being a python idiot. Why is np getting lost in the WC_unit namespace? Is it because I'm dealing with two different files/modules? Does the fact that the call to np.exp happens inside a nested function have anything to do with it?
I'm also open to suggestions regarding improving my workflow and file structure, as it seems to be not particularly pythonic. My background is in MATLAB if that helps anyone understand. I'm editing my .py files in SublimeText2. Sorry the code is not very minimal, I've been having a hard time reproducing the problem.
The correct approach is to do an import numpy as np at the top of your sub-module as well. Here's why:
The key thing to note is that in Python, global actually means "shared at a module-level", and the namespaces for each module exist distinct from each other except when a module explicitly imports from another module. An imported module definitely cannot reach out to its 'parent' module's namespace, which is probably a good thing all things considered, otherwise you'll have modules whose behavior depends entirely on the variables defined in the module that imports it.
So when the stack trace says global name 'np' is not defined, it's talking about it at a module level. Python does not let the WC_Class module access objects in its parent module by default.
(As an aside, effbot has a quick note on how to do inter-module globals)
Another key thing to note is that even if you have multiple import numpy as np in various modules of your code, the module actually only gets loaded (i.e. executed) once. Once loaded, modules (being Python objects themselves) can be found in the dictionary sys.modules, and if a module already exists in this dictionary, any import module_to_import statement simply lets the importing module access names in the namespace of module_to_import. So having import numpy as np scattered across multiple modules in your codebase isn't wasteful.
Edit: On deeper digging, effbot has an even deeper (but still pretty quick and simple) exploration of what actually happens in module imports. For deeper exploration of the topic, you may want to check the import system discussion newly added in the Python 3 documentation.
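Concretely, for this case the fix is just a module-level import at the top of the class file:

# WC_class.py
import numpy as np   # this module's code refers to np.exp, so it needs its own import

class WC_unit:
    ...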
It is normal in Python to import each module that is needed within each file; don't count on any 'global' imports. In fact, there isn't such a thing, with one exception. I discovered in
Do I have to specify import when Python script is being run in Ipython?
that %run -i myscript runs the script in the IPython interactive namespace. So for quick test scripts this can save a bunch of imports.
I don't see the need for this triple import
import WC_class # make sure WC_class exists
reload(WC_class) # make sure it's the most current version
...
from WC_class import WC_unit
If WC_unit is all you are using from WC_class, just use the last line.
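In other words, the test script reduces to something like this (a sketch; it drops the reload, which only matters while WC_class is being actively edited):

# test_class_WC.py (simplified)
from WC_class import WC_unit

E1 = WC_unit(Iapp=100)
E1.update()
print(E1.r)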
I'm currently writing a python script which plots a numpy matrix containing some data (which I'm not having any difficulty computing). For complicated reasons having to do with how I'm creating that data, I have to go through terminal. I've done problems like this a million times in Spyder using imshow(). So, I thought I'd try to do the same in terminal. Here's my code:
from numpy import *
from matplotlib import *

def make_picture():
    f = open("DATA2.txt")
    arr = zeros((200, 200))
    l = f.readlines()
    for i in l:
        j = i[:-1]
        k = j.split(" ")
        arr[int(k[0])][int(k[1])] = float(k[2])
    f.close()
    imshow(arr)

make_picture()
Suffice it to say, the array stuff works just fine. I've tested it, and it extracts the data perfectly well. So, I've got this 200 by 200 array of numbers floating around my RAM and I'd like to display it. When I run this code in Spyder, I get exactly what I expected. However, when I run this code in Terminal, I get an error message:
Traceback (most recent call last):
File "DATAmine.py", line 15, in <module>
make_picture()
File "DATAmine.py", line 13, in make_picture
imshow(arr)
NameError: global name 'imshow' is not defined
(My program's called DATAmine.py) What's the deal here? Is there something else I should be importing? I know I had to configure my Spyder paths, so I wonder if I don't have access to those paths or something. Any suggestions would be greatly appreciated. Thanks!
P.S. Perhaps I should mention I'm using Ubuntu. Don't know if that's relevant.
To make your life easier you can use
from pylab import *
This imports the full pylab namespace, which pulls in the matplotlib plotting functions (including imshow) and numpy.
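A sketch of the script along those lines; note that outside an interactive shell like Spyder's you also need show() for the figure window to actually appear:

from pylab import *   # brings in numpy and matplotlib.pyplot names: zeros, imshow, show, ...

def make_picture():
    arr = zeros((200, 200))
    with open("DATA2.txt") as f:
        for line in f:
            k = line.strip().split(" ")
            arr[int(k[0])][int(k[1])] = float(k[2])
    imshow(arr)
    show()   # blocks until the window is closed

make_picture()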
Cheers