The function I wrote seems to have a problem. I want to use it to read a large file in chunks, but after I call it, the variable I defined inside it is undefined.
I am running this on Google Colab.
def get_df2(file):
    mydata2 = []
    for chunk in pd.read_csv(file,chunksize=500000,header = None,sep='\t'):
        mydata2.append(chunk)
    user_data = pd.concat(mydata2,axis=0)
    names2= ['user_id','age','gender','area','status','edu','ConAbility','device','work','CType','behhavior']
    user_data.columns = names2
    return user_data
I use my function like this:
user_data_path = 'myfile' #The file here is from my cloud, its detailed definition is too long, only abbreviations are given here.
get_df2(user_data_path)
user_data.head()
Error is as follows:
NameError Traceback (most recent call last)
<ipython-input-8-da7cac3b4241> in <module>()
1 get_df2(user_data_path)
----> 2 user_data.head()
NameError: name 'user_data' is not defined
Can someone help me or give me a suggestion?
You are returning user_data, but not binding it to a name outside your function scope. You need:
user_data = get_df2(user_data_path)
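A minimal sketch of the scoping rule behind this answer, using a made-up function (load_rows is hypothetical, not part of the question's code). A name assigned inside a function is local to it; the caller only sees the returned value if it binds that value to a name of its own.

```python
def load_rows(n):
    rows = list(range(n))  # 'rows' exists only inside load_rows
    return rows


load_rows(3)  # the return value is discarded; nothing is bound outside

try:
    rows  # the local name is not visible out here
    name_visible = True
except NameError:
    name_visible = False

result = load_rows(3)  # bind the return value to a name in the caller
print(name_visible)    # False
print(result)          # [0, 1, 2]
```

The name used outside does not have to match the one used inside the function; only the returned object is shared.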
I'm getting this error: KeyError: '284882215'. I couldn't find anything via Google or on Stack Overflow; can someone assist?
Cheers!
ios_clean = []
ios_already_added = []
for app in ios:
    name = app[0]
    n_reviews = float(app[5])
    print (n_reviews)
    if n_reviews == reviews_max[name] and name not in ios_already_added:
        ios_clean.append(app)
        ios_already_added.append(name)
print (ios_clean)
print (len(ios_already_added))
KeyError Traceback (most recent call last)
<ipython-input-26-e59f5982da23> in <module>
11 n_reviews = float(app[5])
12 print (n_reviews)
---> 13 if n_reviews == reviews_max[name] and name not in ios_already_added:
14 ios_clean.append(app)
15 ios_already_added.append(name)
KeyError: '284882215'
The issue here is that you're trying to access an item in reviews_max with the key "284882215", and it does not exist.
As a solution, you can use dict.get() to look up a key safely; it returns None instead of raising an exception when the key is missing.
if n_reviews == reviews_max.get(name) and name not in ios_already_added:
Python documentation says:
exception KeyError
Raised when a mapping (dictionary) key is not found in the set of existing keys.
Meaning you tried to access a key which doesn't exist.
You can use the get() method which will either return the value found OR a default value, if you want to bypass the KeyError. This is the standard method for handling a use case like yours.
dict.get(key, default=None)
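The difference between the two lookups can be sketched as follows. The dictionary contents here are made-up sample values, not data from the question.

```python
# Made-up sample data standing in for the question's reviews_max.
reviews_max = {"app_a": 120.0, "app_b": 45.0}

print(reviews_max.get("app_a"))       # 120.0
print(reviews_max.get("missing"))     # None
print(reviews_max.get("missing", 0))  # 0

try:
    reviews_max["missing"]  # square-bracket lookup raises for unknown keys
except KeyError as exc:
    missing_key = exc.args[0]
    print("KeyError for", missing_key)
```

Note that with get(), a comparison such as n_reviews == reviews_max.get(name) is simply False for a missing key, so that entry is skipped silently. If a missing key points to a mistake in how the dictionary was built (for example, keying by a different column), it is worth checking that first rather than only suppressing the error.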
I want to get all of the objects that are related to a model instance.
Because my code is fairly generic, I pass the related table name as a string and use the eval() function to convert it to the related table class, but I get an error.
Suppose that we have an instance of a table such as self.casefile; this is part of my code:
def related_models_migration(self):
    opts = self.casefile._meta
    table_name = 'Files'
    for f in opts.many_to_many:
        name = ''.join(f.name.split('_'))
        table_name += name.capitalize()
        objects = self.casefile.eval(table_name).all()
and I got this error:
AttributeError Traceback (most recent call last)
<ipython-input-6-025484eeba97> in <module>
----> 1 obj.related_models_migration()
~/Documents/kangaroo/etl/data_migration.py in related_models_migration(self)
28 name = ''.join(f.name.split('_'))
29 table_name += name.capitalize()
---> 30 objects = self.casefile.eval(table_name).all()
31
32 for d in dir(etl.models):
AttributeError: 'FilesCasefiles' object has no attribute 'eval'
How can I pass the class name?
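You cannot use eval(..) for that: eval is a built-in function, not an attribute of your model instance, which is why the attribute lookup fails. What you probably want to use here is getattr(..):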
def related_models_migration(self):
    opts = self.casefile._meta
    table_name = 'Files'
    for f in opts.many_to_many:
        name = ''.join(f.name.split('_'))
        table_name += name.capitalize()
        objects = getattr(self.casefile, table_name).all()
I am not sure you should use table_name += … here, however, since each iteration appends more text to table_name. You likely want to use something like table_name = 'Files{}'.format(name.capitalize()).
Note: normally related fields are not capitalized. One writes users or user_set, not Users.
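As a rough illustration of getattr with a name built at runtime, here is a sketch that uses a plain Python class instead of a Django model. The Casefile class and its attribute names are hypothetical stand-ins; in Django the looked-up attribute would be a related manager on which you then call .all().

```python
class Casefile:
    """Hypothetical stand-in for a model instance with related collections."""

    def __init__(self):
        self.filescasefiles = ["a.pdf", "b.pdf"]
        self.filesnotes = ["note-1"]


casefile = Casefile()
found = {}
for suffix in ("casefiles", "notes"):
    # Rebuild the name on every iteration instead of appending with +=.
    attr_name = "files{}".format(suffix)
    found[attr_name] = getattr(casefile, attr_name)

print(found)

# A third argument gives a fallback instead of raising AttributeError.
fallback = getattr(casefile, "filesunknown", None)
print(fallback)  # None
```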
Django provides a way to do this, although you do need to specify the name of the app in which the model is defined (because two models with the same name can exist in different apps).
apps.get_model(app_label, model_name, require_ready=True)
Returns the Model with the given app_label and model_name.
As a shortcut, this method also accepts a single argument in the form
app_label.model_name. model_name is case-insensitive.
I'm completing a task set by my teacher that asks for a modular program, so I tried to create some functions with def, but I can't figure out how to pass parameters between them.
Here's the code so far. (Sorry, I don't know how to make it much neater on the site.)
import string

def Datawrite ():
    forename = []
    surname = []
    distance = []
    data = open("members.txt","r")
    for line in data:
        value = line.split(',')
        forename.append(value[0])
        surname.append(value[1])
        distance.append(value[2])
    data.close()

def FarthestWalker(distance):
    farthest_walk = distance[0]
    for counter in range(len(distance)):
        if float(distance[counter]) >= float(farthest_walk):
            farthest_walk = distance[counter]
    farthest_walk = float(farthest_walk)
    Calcandstore()

def Calcandstore(forename,surname,distance,farthest_walk):
    Results = open("Results.txt","w+")
    Results.write("The prize winnning memberes are:\n")
    seventy = 0.7*farthest_walk
    Winning = []
    for count in range(len(distance)):
        if float(distance[count]) >= float(seventy):
            Winning.append([count])
    for count in range(len(Winning)):
        Results.write(forename[count]+":")
        Results.write(surname[count]+":")
        Results.write(distance[count])
    Results.close()

Datawrite()
FarthestWalker(distance)
Calcandstore(forename,surname,distance,farthest_walk)
When I run the code it returns this.
Traceback (most recent call last):
File "E:\Assignment\Test.py", line 58, in <module>
FarthestWalker(distance)
File "E:\Assignment\Test.py", line 29, in FarthestWalker
farthest_walk = distance[0]
IndexError: list index out of range
I have been tinkering with this for a few days now and I can't get the thing to work.
Here are some issues:
1) Datawrite doesn't return anything so the lists you're building are lost in the ether.
2) You call FarthestWalker with distance which is never initialized.
3) You call Calcandstore with values that are not initialized.
To pass values between functions, return them from one function and assign the result to a name in the calling code before handing it to the next function. For example:
def make_cat():
    return 'Cat'

def print_animal(animal):
    print(animal)

c = make_cat()
print_animal(c)
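Applied to the program in the question, the same idea might look like the sketch below. It is one possible restructuring, not the only one: the function names and sample lines are invented for illustration, and the data comes from an in-memory list rather than members.txt so the example runs on its own.

```python
def read_members(lines):
    """Parse 'forename,surname,distance' lines into three parallel lists."""
    forenames, surnames, distances = [], [], []
    for line in lines:
        forename, surname, distance = line.strip().split(",")
        forenames.append(forename)
        surnames.append(surname)
        distances.append(float(distance))
    return forenames, surnames, distances


def farthest_walk(distances):
    """Return the largest distance walked."""
    return max(distances)


def prize_winners(forenames, surnames, distances, farthest):
    """Return (forename, surname, distance) for walkers at 70% or more of farthest."""
    threshold = 0.7 * farthest
    return [
        (forename, surname, distance)
        for forename, surname, distance in zip(forenames, surnames, distances)
        if distance >= threshold
    ]


# Made-up sample data in place of the contents of members.txt.
sample_lines = ["Ann,Lee,10.0\n", "Bob,Ray,6.5\n", "Cy,Fox,8.0\n"]

forenames, surnames, distances = read_members(sample_lines)
farthest = farthest_walk(distances)
winners = prize_winners(forenames, surnames, distances, farthest)
print(winners)
```

Each function receives what it needs as parameters and hands its result back with return, so nothing relies on names that only exist inside another function.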
Is there a way to write a function that creates other functions, named after the arguments passed in, to be called later?
For this example, let's pretend https://example.com/engine_list returns the following XML when I request it in get_search_engine_xml:
<engines>
<engine address="https://www.google.com/">Google</engine>
<engine address="https://www.bing.com/">Bing</engine>
<engine address="https://duckduckgo.com/">DuckDuckGo</engine>
</engines>
And here's my code:
import re
import requests
import xml.etree.ElementTree as ET

base_url = 'https://example.com'

def make_safe(s):
    s = re.sub(r"[^\w\s]", '', s)
    s = re.sub(r"\s+", '_', s)
    s = str(s)
    return s

# This is what I'm trying to figure out how to do correctly, create a function
# named after the engine returned in get_search_engine_xml(), to be called later
def create_get_engine_function(function_name, address):
    def function_name():
        r = requests.get(address)
    return function_name

def get_search_engine_xml():
    url = base_url + '/engine_list'
    r = requests.get(url)
    engines_list = str(r.content)
    engines_root = ET.fromstring(engines_list)
    for child in engines_root:
        engine_name = child.text.lower()
        engine_name = make_safe(engine_name)
        engine_address = child.attrib['address']
        create_get_engine_function(engine_name, engine_address)

## Runs without error.
get_search_engine_xml()
## But if I try to call one of the functions.
google()
I get the following error.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'google' is not defined
Defining engine_name and engine_address seems to work when I log them. So I'm fairly sure the problem lies in create_get_engine_function, where I admittedly don't know what I'm doing and was piecing things together from similar questions.
Can you name a function created by another function after an argument that's passed in? Is there a better way to do this?
You can assign them to globals():
def create_get_engine_function(function_name, address):
    def function():
        r = requests.get(address)
    function.__name__ = function_name
    function.__qualname__ = function_name # for Python 3.3+
    globals()[function_name] = function
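An offline sketch of the same technique follows. Two things here are assumptions made for illustration: the inner function returns the address instead of calling requests.get, so the example needs no network, and the target namespace is passed in explicitly (create_named_function and its namespace parameter are hypothetical names).

```python
def create_named_function(function_name, address, namespace):
    """Build a function, rename it to function_name, and store it in namespace."""
    def function():
        # Stand-in for requests.get(address) so the sketch runs offline.
        return address

    function.__name__ = function_name
    function.__qualname__ = function_name
    namespace[function_name] = function
    return function


# Register the new function under the name 'google' in the module's globals.
create_named_function("google", "https://www.google.com/", globals())

google_result = globals()["google"]()
google_name = globals()["google"].__name__
print(google_name, google_result)
```

The original attempt failed because def function_name(): defines a function literally called function_name; the parameter's string value is never used as the name, and the returned function was not bound to anything by the caller.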
Although, depending on what you're actually trying to accomplish, a better design would be to store all the engine names/addresses in a dictionary and access them as needed:
# You should probably rename this to 'parse_engines_from_xml'
def get_search_engine_xml():
    ...
    search_engines = {} # maps names to addresses
    for child in engines_root:
        ...
        search_engines[engine_name] = engine_address
    return search_engines

engines = get_search_engine_xml()
e = requests.get(engines['google'])
<do whatever>
e = requests.get(engines['bing'])
<do whatever>
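For a version of the dictionary approach that can be run without a server, here is a sketch that parses the XML from the question out of a string. SAMPLE_XML and parse_engines_from_xml are names chosen for this example; a real program would pass in the text of the HTTP response instead.

```python
import re
import xml.etree.ElementTree as ET

# The XML from the question, inlined so no HTTP request is needed.
SAMPLE_XML = """<engines>
<engine address="https://www.google.com/">Google</engine>
<engine address="https://www.bing.com/">Bing</engine>
<engine address="https://duckduckgo.com/">DuckDuckGo</engine>
</engines>"""


def make_safe(s):
    """Drop punctuation and turn runs of whitespace into underscores."""
    s = re.sub(r"[^\w\s]", "", s)
    return re.sub(r"\s+", "_", s)


def parse_engines_from_xml(xml_text):
    """Return a dict mapping a safe engine name to its address."""
    search_engines = {}
    for child in ET.fromstring(xml_text):
        engine_name = make_safe(child.text.lower())
        search_engines[engine_name] = child.attrib["address"]
    return search_engines


engines = parse_engines_from_xml(SAMPLE_XML)
print(engines["google"])  # https://www.google.com/
```

Looking engines up by key keeps the set of names as ordinary data, which is easier to inspect and iterate over than functions injected into the module namespace.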
I was trying to analyse the play-by-play data of a basketball team.
I read a CSV file into a DataFrame object.
I want to preserve the functionality of the DataFrame object while adding new attributes to it, so I wrote a class called Basketball:
from data_math import *
import pandas as pd
class Basketball(pd.DataFrame):
    def __init__(self,*args,**kargs):
        pd.DataFrame.__init__(self,*args,**kargs)
        self.FGM = calculate_FGM(pd.DataFrame)
        self.FGA = calculate_FGA(pd.DateFrame)
        self.FGP = self.FGM / self.FGA
        self.M3 = calculate_3M(pd.DataFrame)
        self.A3 = calcualte_3A(pd.DataFrame)
        self.P3 = self.M3 / self.A3
        self.FTM = calcualte_FTM(pd.DataFrame)
        self.FTA = calculate_FTA(pd.DataFrame)
        self.FTP = self.FTM / self.FTA
        # self.P = score_calculate(pd.DataFrame)
I wrote a separate file, data_math.py, to help calculate the attributes I wanted to include in the Basketball class.
from pandas import DataFrame

def score_calculate(df):
    df_pt_scored = df[((df['etype']=='shot') & (df['result']=='made'))]
    df_ft_scored = df[((df['etype']=='free throw') & (df['result']=='made'))]
    return df_pt_scored['points'].sum()+len(df_ft_scored.index)

def calculate_FGM(df):
    cond_pt = (df['etype']=='shots') & (df['results']=='made')
    cond_ft = (df['etype']=='freethrow') & (df['results']=='made')
    return len(df[cond_pt].index)+len(df[cond_ft].index)

def calculate_FGA(df):
    shot_cond= df['etype']=='shot'
    free_throw_cond = df['etype']=='free throw'
    return len(df[shot_cond].index)+len(df[free_throw_cond].index)

def calculate_3M(df):
    cond_3M= (df['etype']=='shot')&(df['type']=='3pt')&(df['result']=='made')
    return len(df[cond_3M].index)

def calcualte_3A(df):
    cond_3A = (df['etype']=='shot')&(df['type']=='3pt')
    return len(df[cond_3A].index)

def calculate_FTM(df):
    cond_FTM =(df['etype']=='free throw') & (df['result']=='made')
    return len(df[cond_FTM].index)

def calcualte_FTA(df):
    cond_FTA =(df['etype']=='free throw')
    return len(df[cond_FTA].index)
Finally, I start my program from main.py, which I hoped would give me the correct output. However, execution fails on this line:
team1= Basketball(tm1)
I received the following traceback:
Traceback (most recent call last):
File "/Users/luoyicheng/Developer/STAR-Research/data_analysis/source code/main.py", line 20, in <module>
team1= Basketball(tm1)
File "/Users/luoyicheng/Developer/STAR-Research/data_analysis/source code/Basketball.py", line 6, in __init__
self.FGM = calculate_FGM(pd.DataFrame)
File "/Users/luoyicheng/Developer/STAR-Research/data_analysis/source code/data_math.py", line 9, in calculate_FGM
cond_pt = (df['etype']=='shots') & (df['results']=='made')
TypeError: 'type' object has no attribute '__getitem__'
I am new to Python programming and could not figure out why this error occurred. As I understand it, the error means I cannot use the indexing feature of the DataFrame. However, if I write similar code in my main function, I get the output I want. I am also unclear on how to extend the existing DataFrame class so that I can still access its methods while giving the team1 object attributes such as FGM, FGA, etc.
The idea of extending this class is to let me pass any DataFrame object to Basketball() and get back an object with additional attributes and methods. I think I also lack an understanding of how __init__ and self are used.
Please excuse any lack of clarity in my description; I am not familiar with all the OOP terminology.
Thank you so much!
You're passing each function pd.DataFrame, which is the class itself and therefore of type type:
In [11]: type(pd.DataFrame)
Out[11]: type
Hence the exception message.
You mean to pass self (the instance, which is a DataFrame). The line:
self.FGM = calculate_FGM(pd.DataFrame)
...
should read:
self.FGM = calculate_FGM(self)
...
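The class-versus-instance distinction can be reproduced without pandas. In this sketch, Table is a hypothetical stand-in for DataFrame that supports column lookup with square brackets, and Team and count_made are invented names playing the roles of Basketball and the calculate_* helpers.

```python
class Table:
    """Hypothetical stand-in for a DataFrame: look up a column by name."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __getitem__(self, column):
        return [row[column] for row in self._rows]


def count_made(table):
    """Count rows whose 'result' column equals 'made'."""
    return sum(1 for value in table["result"] if value == "made")


class Team(Table):
    def __init__(self, rows):
        super().__init__(rows)
        # Pass the instance (self), which holds the rows, not the class.
        self.made = count_made(self)


rows = [{"result": "made"}, {"result": "missed"}, {"result": "made"}]
team = Team(rows)
print(team.made)  # 2

class_error = None
try:
    count_made(Table)  # the class itself has no rows to index
except TypeError as exc:
    class_error = type(exc).__name__
print(class_error)  # TypeError
```

Inside __init__, self is the object being built, so it already carries the data that was passed to the constructor; the class is only the blueprint and cannot be indexed like an instance.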