How does the nlp object work in spacy library?

From what I understand so far, it is an instance of the 'Language' class in spacy, and can process text and perform a bunch of operations on it.
import spacy
nlp = spacy.blank("en")
# Process the text
doc = nlp(
"In 1990, more than 60% of people in East Asia were in extreme poverty. "
"Now less than 4% are."
)
print(doc[0])
# prints "In"
What puzzles me is how an object can accept an argument (a string in this case) the way a class does. What is the mechanism?
I tried the following code to check whether an object can receive an argument:
class ABC:
    def __init__(self, a=1):
        self.a = a
    def printa(self):
        print(self.a)
abc = ABC()
abc(2)
abc.printa()
It gives me an error:
TypeError: 'ABC' object is not callable
spaCy seems to be doing the same thing, and it works. How?

You can allow an object to be called like a function by providing a __call__ method:
class ABC:
    def __init__(self, a=1):
        self.a = a
    def printa(self):
        print(self.a)
    def __call__(self, *args):
        print(self.a, *args)
abc = ABC()
abc(2)
abc.printa()
Output:
1 2
1
Implementing the __call__ method makes the object callable.
As for the actual type of nlp:
>>> type(nlp)
<class 'spacy.lang.en.English'>
And it does indeed have __call__:
>>> hasattr(nlp, '__call__')
True
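To make the idea concrete, here is a minimal sketch of a pipeline-like object that is configured once and then called on text. TinyPipeline and its lowercase option are hypothetical names made up for illustration; this is not how spaCy implements Language, only the same calling pattern.

```python
class TinyPipeline:
    """Hypothetical stand-in for a pipeline object; not spaCy's implementation."""

    def __init__(self, lowercase=False):
        # Configuration is stored once, when the object is created.
        self.lowercase = lowercase

    def __call__(self, text):
        # Runs every time the instance is "called" like a function.
        if self.lowercase:
            text = text.lower()
        return text.split()


nlp_like = TinyPipeline(lowercase=True)
tokens = nlp_like("In 1990 More Than 60%")  # same as nlp_like.__call__("...")
print(tokens[0])
print(callable(nlp_like))
```

The built-in callable() is a convenient way to check whether an object can be used with call syntax.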

Related

Question about my code when using Tokenizer.__init__ in a class

I would like to understand why the code below works and what it does.
Can anyone explain how Tokenizer.__init__ works here? Since we can call "__init__" on a class from a library like this, can the same be done in other ways? A second, simpler example would help me understand.
My code:
from tensorflow.python.keras.preprocessing.text import Tokenizer
# Import the Tokenizer class,
# then build a class that inherits from it.
# Note: texts, reverse, padding, truncating, np and pad_sequences are not defined in this snippet.
class DataWrapper(Tokenizer):
    def __init__(self, num_words):
        Tokenizer.__init__(self, num_words=num_words)  # I don't understand how this part works.
        self.fit_on_texts(texts)
        self.tokens = self.texts_to_sequences(texts)
        self.reverse_tokens = dict(zip(self.word_index.values(),
                                       self.word_index.keys()))
        if reverse:
            self.tokens = [list(reversed(token)) for token in self.tokens]
        self.num_tokens = [len(token) for token in self.tokens]
        self.max_num = np.mean(self.num_tokens) + 2 * np.std(self.num_tokens)
        self.max_num = int(self.max_num)
        self.sequences_padded = pad_sequences(self.tokens, maxlen=self.max_num,
                                              padding=padding,
                                              truncating=truncating)

In Python, any method that takes self can be called in two ways: obj.method(params) or cls.method(obj, params). See this example:
class A:
    def foo(self):
        print('foo')
a = A()
# now we can call foo in two ways:
a.foo()
A.foo(a)
This rule also applies to double-underscore methods such as __init__. In your case:
Tokenizer.__init__(self, num_words=num_words)
maps onto cls.method(obj, params) as follows:
cls = Tokenizer
method = __init__
obj = self # the DataWrapper instance being initialized
params = num_words
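As a simpler illustration of the same pattern, the sketch below uses a made-up Base class (standing in for a library class such as Tokenizer) and shows two ways of running the parent initializer. The class names are hypothetical; only the calling convention is the point.

```python
class Base:
    def __init__(self, num_words=None):
        self.num_words = num_words


class ExplicitChild(Base):
    def __init__(self, num_words):
        # Call the parent initializer through the class, passing self by hand.
        Base.__init__(self, num_words=num_words)
        self.ready = True


class SuperChild(Base):
    def __init__(self, num_words):
        # super() finds the parent class and passes self automatically.
        super().__init__(num_words=num_words)
        self.ready = True


explicit = ExplicitChild(100)
via_super = SuperChild(100)
print(explicit.num_words, via_super.num_words)
```

Both forms run the parent's setup before the child adds its own attributes. In Python 3 code, super() is the more common spelling.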

TypeError from a 'C' programmer very, very green WRT Python [duplicate]

If I have a class ...
class MyClass:
    def method(arg):
        print(arg)
... which I use to create an object ...
my_object = MyClass()
... on which I call method("foo") like so ...
>>> my_object.method("foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: method() takes exactly 1 positional argument (2 given)
... why does Python tell me I gave it two arguments, when I only gave one?

In Python, this:
my_object.method("foo")
... is syntactic sugar, which the interpreter translates behind the scenes into:
MyClass.method(my_object, "foo")
... which, as you can see, does indeed have two arguments - it's just that the first one is implicit, from the point of view of the caller.
This is because most methods do some work with the object they're called on, so there needs to be some way for that object to be referred to inside the method. By convention, this first argument is called self inside the method definition:
class MyNewClass:
    def method(self, arg):
        print(self)
        print(arg)
If you call method("foo") on an instance of MyNewClass, it works as expected:
>>> my_new_object = MyNewClass()
>>> my_new_object.method("foo")
<__main__.MyNewClass object at 0x29045d0>
foo
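The translation described above can be checked directly. This is a small sketch with a hypothetical Greeter class: the attribute lookup g.greet produces a bound method that remembers the instance, and calling through the class with the instance passed explicitly gives the same result.

```python
class Greeter:
    def greet(self, name):
        return f"{type(self).__name__} greets {name}"


g = Greeter()
bound = g.greet                       # a bound method: it remembers g
via_instance = g.greet("foo")         # the usual call
via_class = Greeter.greet(g, "foo")   # what the call above expands to
print(bound.__self__ is g)
print(via_instance == via_class)
```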
Occasionally (but not often), you really don't care about the object that your method is bound to, and in that circumstance, you can decorate the method with the builtin staticmethod() function to say so:
class MyOtherClass:
    @staticmethod
    def method(arg):
        print(arg)
... in which case you don't need to add a self argument to the method definition, and it still works:
>>> my_other_object = MyOtherClass()
>>> my_other_object.method("foo")
foo

In simple words:
In Python you should add self as the first parameter of every instance method you define in a class:
class MyClass:
    def method(self, arg):
        print(arg)
Then you can use your method according to your intuition:
>>> my_object = MyClass()
>>> my_object.method("foo")
foo
For a better understanding, you can also read the answers to this question: What is the purpose of self?

Something else to consider when this type of error is encountered:
I was running into this error message and found this post helpful. Turns out in my case I had overridden an __init__() where there was object inheritance.
The inherited example is rather long, so I'll skip to a simpler example that doesn't use inheritance:
class MyBadInitClass:
    def ___init__(self, name):
        self.name = name
    def name_foo(self, arg):
        print(self)
        print(arg)
        print("My name is", self.name)
class MyNewClass:
    def new_foo(self, arg):
        print(self)
        print(arg)
my_new_object = MyNewClass()
my_new_object.new_foo("NewFoo")
my_bad_init_object = MyBadInitClass(name="Test Name")
my_bad_init_object.name_foo("name foo")
Result is:
<__main__.MyNewClass object at 0x033C48D0>
NewFoo
Traceback (most recent call last):
  File "C:/Users/Orange/PycharmProjects/Chapter9/bad_init_example.py", line 41, in <module>
    my_bad_init_object = MyBadInitClass(name="Test Name")
TypeError: object() takes no parameters
PyCharm didn't catch this typo. Nor did Notepad++ (other editors/IDEs might).
Granted, this is a "takes no parameters" TypeError, but in terms of object initialization it isn't much different from "got two" when expecting one.
Addressing the topic: a custom initializer is used only if it is named exactly __init__. A misspelled one (here, three leading underscores) is just an ordinary method that Python never calls during construction, so the built-in object initializer runs instead, rejects the extra argument, and the error is raised.
The fix for the typo is simple, just correct the name of the custom initializer:
    def __init__(self, name):
        self.name = name

As a newcomer to Python, I ran into this issue when I used Python's ** feature the wrong way. I was trying to call this definition from somewhere:
def create_properties_frame(self, parent, **kwargs):
using a call without a double star was causing the problem:
self.create_properties_frame(frame, kw_gsp)
TypeError: create_properties_frame() takes 2 positional arguments but 3 were given
The solution is to add ** to the argument:
self.create_properties_frame(frame, **kw_gsp)
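The difference between the two calls can be reproduced with a plain function. In this sketch, create_frame and the options dict are hypothetical stand-ins for the method and the kw_gsp dictionary above.

```python
def create_frame(parent, **kwargs):
    # Hypothetical function: returns what it received so it can be inspected.
    return parent, kwargs


options = {"width": 200, "height": 100}

# Unpacked: each key of the dict becomes a keyword argument.
parent, received = create_frame("root", **options)

# Not unpacked: the dict is passed as a second positional argument.
try:
    create_frame("root", options)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(received)
print(error_message)
```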

As mentioned in other answers, an instance method must declare self as its first parameter, and leaving it out is the source of the error.
In addition, it is important to understand that only instance methods take self as the first parameter in order to refer to the instance.
A class method takes a cls parameter (or class_) instead, and a static method takes neither.
Please see the example below.
class City:
    country = "USA" # This is a class level attribute which will be shared across all instances (and not created PER instance)
    def __init__(self, name, location, population):
        self.name = name
        self.location = location
        self.population = population
    # This is an instance method which takes self as the first argument to refer to the instance
    def print_population(self, some_nice_sentence_prefix):
        print(some_nice_sentence_prefix + " In " + self.name + " lives " + self.population + " people!")
    # This is a class method, which is marked with the @classmethod decorator
    # All class methods must take a class argument as the first parameter. The convention is to name it "cls", but class_ is also OK
    @classmethod
    def change_country(cls, new_country):
        cls.country = new_country
Some tests to make things clearer:
# Populate objects
city1 = City("New York", "East", "18,804,000")
city2 = City("Los Angeles", "West", "10,118,800")
#1) Use the instance method: No need to pass "self" - it is passed as the city1 instance
city1.print_population("Did You Know?") # Prints: Did You Know? In New York lives 18,804,000 people!
#2.A) Use the class method through an instance
city2.change_country("Canada")
#2.B) Will be reflected in all objects
print("city1.country=",city1.country) # Prints Canada
print("city2.country=",city2.country) # Prints Canada
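To keep the three kinds of method apart, here is a compact sketch using a hypothetical Counter class. An instance method receives the instance, a class method receives the class, and a static method receives neither.

```python
class Counter:
    created = 0  # class attribute shared by all instances

    def __init__(self, label):
        self.label = label
        Counter.created += 1

    def describe(self):
        # Instance method: the first parameter is the instance.
        return f"counter {self.label}"

    @classmethod
    def how_many(cls):
        # Class method: the first parameter is the class itself.
        return cls.created

    @staticmethod
    def is_valid_label(label):
        # Static method: no implicit first parameter at all.
        return bool(label.strip())


a = Counter("a")
b = Counter("b")
print(a.describe(), Counter.how_many(), Counter.is_valid_label("x"))
```

Class methods and static methods can be called on either the class or an instance; the implicit argument, if any, is chosen by the decorator and not by how the call is written.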

This error occurs when the number of arguments you pass does not match the number of parameters that __init__() or another method expects.
For example:
class Dog:
    def __init__(self):
        print("IN INIT METHOD")
    def __unicode__(self,):
        print("IN UNICODE METHOD")
    def __str__(self):
        print("IN STR METHOD")
obj = Dog("JIMMY", 1, 2, 3, "WOOF")
When you run the above program, it gives you an error like this:
TypeError: __init__() takes 1 positional argument but 6 were given
How can we get rid of it?
Just declare the parameters that the call to __init__() is going to supply:
class Dog:
    def __init__(self, dogname, dob_d, dob_m, dob_y, dogSpeakText):
        self.name_of_dog = dogname
        self.date_of_birth = dob_d
        self.month_of_birth = dob_m
        self.year_of_birth = dob_y
        self.sound_it_make = dogSpeakText
    def __unicode__(self, ):
        print("IN UNICODE METHOD")
    def __str__(self):
        print("IN STR METHOD")
obj = Dog("JIMMY", 1, 2, 3, "WOOF")
print(id(obj))

If you want to call a method without creating an object, you can turn it into a static method.
class MyClass:
    @staticmethod
    def method(arg):
        print(arg)
MyClass.method("i am a static method")

I get this error when I'm sleep-deprived, and create a class using def instead of class:
def MyClass():
    def __init__(self, x):
        self.x = x
a = MyClass(3)
-> TypeError: MyClass() takes 0 positional arguments but 1 was given

You should actually create a class:
class accum:
    def __init__(self):
        self.acc = 0
    def accumulator(self, var2add, end):
        if not end:
            self.acc += var2add
        return self.acc

In my case, I forgot to add the ().
I was calling the method like this:
obj = className.myMethod
But it should be like this:
obj = className.myMethod()
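The difference is easy to see in a short sketch (Sensor and read are hypothetical names). Without parentheses you get the method object itself; with parentheses the method runs and you get its return value.

```python
class Sensor:
    def read(self):
        return 42


sensor = Sensor()
reference = sensor.read   # no parentheses: the bound method object
value = sensor.read()     # parentheses: the method runs and returns 42
print(callable(reference), value)
print(reference() == value)
```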

Using getattr function on self in python

I am trying to call multiple functions in a loop using getattr(...). Snippet below:
class cl1(module):
    I =1
    Name= 'name'+str(I)
    Func= 'func'+str(I)
    Namecall = gettattr(self,name)
    Namecall = getattr(self,name)()
The code I am trying to get from this is: self.name1 = self.func1()
I would like to loop over several of these, but the code is not working. Can you please advise?

First, use CapitalizedNames for classes and lowercase_names for variables, as this is easier for other Python programmers to read :)
Now, you don't need getattr() to reach an attribute whose name you already know inside the class itself.
Just do:
self.attribute
Here is an example, though:
class Foo(object): # Class Foo inherits from 'object'
    def __init__(self, a, b): # This is the initializer. Add all arguments here
        self.a = a # Setting attributes
        self.b = b
    def func(self):
        print('Hello World!' + str(self.a) + str(self.b))
>>> new_object = Foo(a=1, b=2) # Creating a new 'Foo' object called 'new_object'
>>> getattr(new_object, 'a') # Getting the 'a' attribute from 'new_object'
1
However, an easier way would just be referencing the attribute directly
>>> new_object.a
1
>>> new_object.func()
Hello World!12
Or, by using getattr():
>>> getattr(new_object, 'func')()
Hello World!12
Although I have explained the getattr() function, I don't quite understand what you want to achieve. Please post a sample of the expected output.
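If the goal is what the question describes (self.name1 = self.func1(), repeated for several numbered names), one possible reading is sketched below. The Runner class, its func1/func2 methods, and run_all are assumptions made for illustration, since the question does not show the real methods.

```python
class Runner:
    def func1(self):
        return "one"

    def func2(self):
        return "two"

    def run_all(self, count):
        for i in range(1, count + 1):
            method = getattr(self, f"func{i}")    # look up func1, func2, ...
            setattr(self, f"name{i}", method())   # store the result as name1, name2, ...


runner = Runner()
runner.run_all(2)
print(runner.name1, runner.name2)
```

getattr() is useful here because the attribute names are built at run time. When the name is fixed, plain self.attribute access is simpler.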

Python extension methods

OK, in C# we have something like:
public static string Destroy(this string s) {
    return "";
}
So basically, when you have a string you can do:
str = "This is my string to be destroyed";
newstr = str.Destroy()
# instead of
newstr = Destroy(str)
Now this is cool because in my opinion it's more readable. Does Python have something similar? I mean instead of writing like this:
x = SomeClass()
div = x.getMyDiv()
span = x.FirstChild(x.FirstChild(div)) # so instead of this
I'd like to write:
span = div.FirstChild().FirstChild() # which is more readable to me
Any suggestion?

You can just modify the class directly, sometimes known as monkey patching.
def MyMethod(self):
    return self + self
MyClass.MyMethod = MyMethod
del MyMethod  # clean up the namespace
I'm not 100% sure you can do this on a special class like str, but it's fine for your user-defined classes.
Update
You confirm in a comment my suspicion that this is not possible for a builtin like str. In which case I believe there is no analogue to C# extension methods for such classes.
Finally, the convenience of these methods, in both C# and Python, comes with an associated risk. Using these techniques can make code more complex to understand and maintain.
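Both halves of this answer can be checked with a short sketch. Text and shout are hypothetical names; the second part relies on CPython rejecting attribute assignment on built-in types with a TypeError.

```python
class Text:
    def __init__(self, value):
        self.value = value


def shout(self):
    return self.value.upper() + "!"


Text.shout = shout                    # patching a user-defined class works
patched_result = Text("hello").shout()

try:
    str.shout = shout                 # built-in types reject new attributes
    builtin_patch_error = None
except TypeError as exc:
    builtin_patch_error = exc

print(patched_result)
print(type(builtin_patch_error).__name__)
```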

You can do what you asked like this:
def extension_method(self):
    pass  # do stuff
MyClass.extension_method = extension_method  # MyClass is the class being extended

I would use the Adapter pattern here. So, let's say we have a Person class and in one specific place we would like to add some health-related methods.
from dataclasses import dataclass
@dataclass
class Person:
    name: str
    height: float # in meters
    mass: float # in kg
class PersonMedicalAdapter:
    person: Person
    def __init__(self, person: Person):
        self.person = person
    def __getattr__(self, item):
        return getattr(self.person, item)
    def get_body_mass_index(self) -> float:
        return self.person.mass / self.person.height ** 2
if __name__ == '__main__':
    person = Person('John', height=1.7, mass=76)
    person_adapter = PersonMedicalAdapter(person)
    print(person_adapter.name) # Call to Person object field
    print(person_adapter.get_body_mass_index()) # Call to wrapper object method
I consider it to be an easy-to-read, yet flexible and pythonic solution.

You can change built-in classes by monkey-patching with the help of the forbiddenfruit library.
However, installing forbiddenfruit requires a C compiler and an unrestricted environment, so it will probably not work, or will take considerable effort to get running, on Google App Engine, Heroku, etc.
I used this library to change the behaviour of the unicode class in Python 2.7 to solve the Turkish i/I uppercase/lowercase problem.
# -*- coding: utf8 -*-
# Redesigned by @guneysus
import __builtin__
from forbiddenfruit import curse
lcase_table = tuple(u'abcçdefgğhıijklmnoöprsştuüvyz')
ucase_table = tuple(u'ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ')
def upper(data):
    data = data.replace('i',u'İ')
    data = data.replace(u'ı',u'I')
    result = ''
    for char in data:
        try:
            char_index = lcase_table.index(char)
            ucase_char = ucase_table[char_index]
        except ValueError:
            # character is not in the table, so keep it unchanged
            ucase_char = char
        result += ucase_char
    return result
curse(__builtin__.unicode, 'upper', upper)
class unicode_tr(unicode):
    """For Backward compatibility"""
    def __init__(self, *args, **kwargs):
        super(unicode_tr, self).__init__(*args, **kwargs)
if __name__ == '__main__':
    print u'istanbul'.upper()

You can achieve this nicely with the following context manager that adds the method to the class or object inside the context block and removes it afterwards:
class extension_method:
    def __init__(self, obj, method):
        method_name = method.__name__
        setattr(obj, method_name, method)
        self.obj = obj
        self.method_name = method_name
    def __enter__(self):
        return self.obj
    def __exit__(self, type, value, traceback):
        # remove this if you want to keep the extension method after context exit
        delattr(self.obj, self.method_name)
Usage is as follows:
class C:
    pass
def get_class_name(self):
    return self.__class__.__name__
with extension_method(C, get_class_name):
    assert hasattr(C, 'get_class_name') # the method is added to C
    c = C()
    print(c.get_class_name()) # prints 'C'
assert not hasattr(C, 'get_class_name') # the method is gone from C

I'd like to think that extension methods in C# are pretty much the same as normal method call where you pass the instance then arguments and stuff.
instance.method(*args, **kwargs)
method(instance, *args, **kwargs) # pretty much the same as above, I don't see much benefit of it getting implemented in python.

After a week, I have a solution that is the closest to what I was looking for. It uses getattr and __getattr__. Here is an example for those who are interested.
class myClass:
    def __init__(self): pass
    def __getattr__(self, attr):
        try:
            methodToCall = getattr(myClass, attr)
            return methodToCall(myClass(), self)
        except:
            pass
    def firstChild(self, node):
        pass  # bla bla bla
    def lastChild(self, node):
        pass  # bla bla bla
x = myClass()
div = x.getMYDiv()
y = div.firstChild.lastChild
I haven't tested this example; I only give it to convey the idea to anyone who might be interested. Hope that helps.

C# implemented extension methods because it lacks first-class functions. Python has them, and they are the preferred way to "wrap" common functionality across disparate classes in Python.
There are good reasons to believe Python will never have extension methods. Simply look at the available built-ins:
len(o) calls o.__len__()
iter(o) calls o.__iter__()
next(o) calls o.__next__() (o.next() in Python 2)
format(o, s) calls o.__format__(s)
Basically, Python likes functions.
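As a rough illustration of that preference, a single plain function can serve several unrelated types without adding a method to any of them. The helper name first_and_last is hypothetical.

```python
def first_and_last(items):
    """Hypothetical helper: works on anything that supports indexing."""
    return items[0], items[-1]


from_list = first_and_last([1, 2, 3])
from_str = first_and_last("abc")
from_tuple = first_and_last(("x", "y"))
print(from_list, from_str, from_tuple)
```

The function relies only on indexing, so it works with built-in types such as str that cannot be monkey-patched.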

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
