It does not work. I want to split the data as the code below does, and get the result through the lines attribute.
class movie_analyzer:
    def __init__(self,s):
        for c in punctuation:
            import re
        moviefile = open(s, encoding = "latin-1")
        movielist = []
        movies = moviefile.readlines()
    def lines(movies):
        for movie in movies:
            if len(movie.strip().split("::")) == 4:
                a = movie.strip().split("::")
                movielist.append(a)
        return(movielist)
movie = movie_analyzer("movies-modified.dat")
movie.lines
It returns that:
You can use the @property decorator to access the result of a method as if it were an attribute. See this very simple example of how the decorator might be used:
import random
class Randomizer:
    def __init__(self, lower, upper):
        self.lower = lower
        self.upper = upper
    @property
    def rand_num(self):
        return random.randint(self.lower, self.upper)
Then, you can access it like so:
>>> randomizer = Randomizer(0, 10)
>>> randomizer.rand_num
5
>>> randomizer.rand_num
7
>>> randomizer.rand_num
3
Obviously, this is a useless example; however, you can take this logic and apply it to your situation.
Also, one more thing: you are not passing self to lines. You pass movies, which is unneeded because you can just access it using self.movies. However, if you want to access those variables using self you have to set (in your __init__ method):
self.movielist = []
self.movies = moviefile.readlines()
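Applied to the movie data, that advice might look like the sketch below. It assumes "::"-separated records with four fields, as in the question. The class name MovieAnalyzer, the choice to pass in already-read lines instead of a file name (made only to keep the example self-contained), and the sample records are all made up for illustration.

```python
class MovieAnalyzer:
    def __init__(self, raw_lines):
        # store the raw lines on self so that other methods can reach them
        self.movies = list(raw_lines)

    @property
    def lines(self):
        # build a fresh list on every access so results never accumulate
        parsed = []
        for movie in self.movies:
            fields = movie.strip().split("::")
            if len(fields) == 4:
                parsed.append(fields)
        return parsed


# made-up sample records; the second one is malformed and gets skipped
sample = [
    "1::Toy Story (1995)::Animation::5\n",
    "broken line\n",
    "2::Heat (1995)::Crime::4\n",
]
analyzer = MovieAnalyzer(sample)
print(analyzer.lines)
```

Because lines is a property, it is read without parentheses; writing analyzer.lines() would try to call the returned list and fail.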
To call a method you write movie.lines(), with parentheses (and any arguments it needs). Writing movie.lines only references the method object without running it. Also, make sure every method takes self as its first parameter, and store the values you want the object to keep as attributes on self. It is also good practice to keep your imports at the top of the file.
import re
class movie_analyzer:
    def __init__(self, s):
        # read all lines up front and close the file afterwards
        with open(s, encoding="latin-1") as moviefile:
            self.movies = moviefile.readlines()
        self.movielist = []
    def lines(self):
        # keep only records that have exactly four "::"-separated fields
        for movie in self.movies:
            fields = movie.strip().split("::")
            if len(fields) == 4:
                self.movielist.append(fields)
        return self.movielist
movie = movie_analyzer("movies-modified.dat")
movie.lines()
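The difference between referencing a method and calling it can be seen in a tiny sketch. The Greeter class is hypothetical and exists only for this illustration.

```python
class Greeter:
    def hello(self):
        return "hello"


g = Greeter()
reference = g.hello    # no parentheses: just the bound method object
result = g.hello()     # parentheses: the method actually runs

print(callable(reference))
print(result)
```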
Hello Stackoverflow folks,...
I hope this question has not already been answered.
After half a day of googling I resigned myself to asking a question here.
My problem is the following:
I want to create a class which takes some information and processes this information:
# Class definition for an instance of raw data
class raw_data():
    def __init__(self, filename_rawdata, filename_metadata,
                 file_format, path, category, df_raw, df_meta):
        self.filename_rawdata = filename_rawdata
        self.filename_metadata = filename_metadata
        self.file_format = file_format
        self.path = path
        self.category = category
        self.df_raw = getDF(self.filename_rawdata)
        self.df_meta = getDF(self.filename_metadata)
    # generator
    def parse(self, path):
        g = gzip.open(path, 'rb')
        for l in g:
            yield eval(l)
    # function that returns a pandas dataframe with the data
    def getDF(self, filename):
        i = 0
        df = {}
        for d in self.parse(filename):
            df[i] = d
            i += 1
        return pd.DataFrame.from_dict(df, orient='index')
Now I have a problem with the __init__ method: I would like to run the class method below by default when the class is instantiated, but I somehow cannot get this working. I have seen several other posts here, like "Python 3: Calling a class function inside of __init__",
but I am still not able to do it. The first question did work for me, but I would like to call the instance variable after the constructor has run.
I tried this:
class raw_data():
    def __init__(self, filename_rawdata, filename_metadata,
                 file_format, path, category):
        self.filename_rawdata = filename_rawdata
        self.filename_metadata = filename_metadata
        self.file_format = file_format
        self.path = path
        self.category = category
        getDF(self.filename_rawdata)
        getDF(self.filename_metadata)
    # generator
    def parse(self, path):
        g = gzip.open(path, 'rb')
        for l in g:
            yield eval(l)
    # function that returns a pandas dataframe with the data
    def getDF(self, filename):
        i = 0
        df = {}
        for d in self.parse(filename):
            df[i] = d
            i += 1
        return pd.DataFrame.from_dict(df, orient='index')
But I get an error because getDF is not defined (obviously).
I hope this question is not silly. I need to do it this way because afterwards I want to create some 50-60 instances, and I do not want to repeat Instance.getDF() ... for every one of them; I would rather have it called automatically.
All you need to do is call getDF like any other method, using self as the object on which it should be invoked.
self.df_raw = self.getDF(self.filename_rawdata)
That said, this class could be greatly simplified by making it a dataclass.
import gzip
from dataclasses import dataclass
import pandas as pd
@dataclass
class RawData:
    filename_rawdata: str
    filename_metadata: str
    path: str
    category: str
    def __post_init__(self):
        self.df_raw = self.getDF(self.filename_rawdata)
        self.df_meta = self.getDF(self.filename_metadata)
    @staticmethod
    def parse(path):
        # eval is kept from the question; only use it on files you trust
        with gzip.open(path, 'rb') as g:
            yield from map(eval, g)
    @staticmethod
    def getDF(filename):
        # one row per parsed record, indexed 0..n-1
        return pd.DataFrame.from_records(list(RawData.parse(filename)))
The auto-generated __init__ method will set the four defined attributes for you. __post_init__ will be called after __init__, giving you the opportunity to call getDF on the two given file names.
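To see the __init__ / __post_init__ ordering without needing gzip files or pandas, here is a reduced sketch. The Record class and its load helper are hypothetical stand-ins for RawData and getDF. Declaring the derived attributes with field(init=False) is an optional extra, not something the answer above requires; it keeps them out of the generated __init__ while still showing them in the repr.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    filename_rawdata: str
    filename_metadata: str
    # derived attributes: excluded from the generated __init__
    raw: list = field(init=False)
    meta: list = field(init=False)

    def __post_init__(self):
        # runs right after the generated __init__ has set the two file names
        self.raw = self.load(self.filename_rawdata)
        self.meta = self.load(self.filename_metadata)

    @staticmethod
    def load(filename):
        # placeholder for real parsing: just split the name on the dot
        return filename.split(".")


r = Record("reviews.json", "meta.json")
print(r.raw, r.meta)
```

Creating 50 or 60 instances then needs nothing beyond the constructor call, since the loading happens automatically for each one.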
I'm trying to split up and tokenize a poem (a haiku in this case), mostly as a way to teach myself how to use nltk and classes. When I run the code below, I get NameError: name 'psplit' is not defined, even though (to my thinking) it is defined when I return it from the split function. Can anyone help me figure out what's going wrong under the hood here?
import nltk
poem = "In the cicada's cry\nNo sign can foretell\nHow soon it must die"
class Intro():
    def __init__(self, poem):
        self.__poem = poem
    def split(self):
        psplit = (poem.split('\n'))
        psplit = str(psplit)
        return psplit
    def tokenizer(self):
        t = nltk.tokenize(psplit)
        return t
i = Intro(poem)
print(i.split())
print(i.tokenizer())
There are some issues in your code:
In the split method you have to use self.__poem to access the poem attribute of your class, as you did in the constructor.
The psplit variable in the split method is only a local variable, so it exists in that method and nowhere else. If you want to make it available in the tokenizer method, you have to either pass it as an argument or store it as an additional attribute. Note also that nltk.tokenize is a module, not a function, so the snippets here call nltk.word_tokenize instead:
...
    def tokenizer(self, psplit):
        t = nltk.word_tokenize(psplit)
        return t
...
psplit = i.split()
print(i.tokenizer(psplit))
Or:
    def __init__(self, poem):
        ...
        self._psplit = None
    ...
    def split(self):
        self._psplit = self.__poem.split('\n')
        self._psplit = str(self._psplit)
    def tokenizer(self):
        t = nltk.word_tokenize(self._psplit)
        return t
...
i.split()
print(i.tokenizer())
In addition make sure your indentation is correct.
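Putting both fixes together, a self-contained sketch could look like the following. To avoid depending on nltk and its data files, a plain whitespace split stands in for the tokenizer here; that substitution is an assumption made only for this example, and the attribute names _poem and _psplit are illustrative.

```python
poem = "In the cicada's cry\nNo sign can foretell\nHow soon it must die"


class Intro:
    def __init__(self, poem):
        self._poem = poem
        self._psplit = None

    def split(self):
        # store the result on the instance so other methods can see it
        self._psplit = self._poem.split("\n")
        return self._psplit

    def tokenizer(self):
        # make sure the poem has been split before tokenizing
        if self._psplit is None:
            self.split()
        # stand-in tokenizer: whitespace split of each line
        return [line.split() for line in self._psplit]


i = Intro(poem)
print(i.split())
print(i.tokenizer())
```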
I'm writing a program to extract some data from txt files with regular expressions.
I'm new to OOP and want to avoid repetitive code. I need to retrieve about 15 data items from each txt file, so I wrote a class definition for each item. The text to match can come in several formats, so I'll need to try several regex patterns. For now I have implemented only one regex pattern per item, but in the future I will need to try more in order to match the specific format used in a given txt file; I plan to use a list of patterns for each item.
I've written just 3 classes so far, but I've realized that I'm repeating too much code. So I believe I'm doing something wrong.
import re
import os
import glob
import csv
class PropertyNumber(object):
    pattern_text = "(?<=FINCA Nº: )\w{3,6}"
    regex_pattern = re.compile(pattern_text)
    def __init__(self, str):
        self.text_to_search = str
        self.text_found = ""
    def search_p_number(self):
        matched_p_number = PropertyNumber.regex_pattern.search(self.text_to_search)
        print(matched_p_number)
        self.text_found = matched_p_number.group()
        return self.text_found
class PropertyCoefficient(object):
    pattern_text = "(?<=Participación: )[0-9,]{1,8}"
    regex_pattern = re.compile(pattern_text)
    def __init__(self, str):
        self.text_to_search = str
        self.text_found = ""
    def search_p_coefficient(self):
        matched_p_coefficient = PropertyCoefficient.regex_pattern.search(self.text_to_search)
        print(matched_p_coefficient)
        self.text_found = matched_p_coefficient.group()
        return self.text_found
class PropertyTaxIDNumber(object):
    pattern_text = "(?<=Referencia Catastral: )\d{7}[A-Z]{2}\d{4}[A-Z]\d{4}[A-Z]{2}"
    regex_pattern = re.compile(pattern_text)
    def __init__(self, str):
        self.text_to_search = str
        self.text_found = ""
    def search_tax_id(self):
        matched_p_taxidnumber = PropertyTaxIDNumber.regex_pattern.search(self.text_to_search)
        print(matched_p_taxidnumber)
        self.text_found = matched_p_taxidnumber.group()
        return self.text_found
def scan_txt_report(fli):
    data_retrieved = []
    file_input = open(fli, mode='r', encoding='utf-8')
    property_report = file_input.read()
    property_number = PropertyNumber(property_report)
    data_retrieved.append(property_number.search_p_number())
    property_coefficient = PropertyCoefficient(property_report)
    data_retrieved.append(property_coefficient.search_p_coefficient())
    property_tax_id_number = PropertyTaxIDNumber(property_report)
    data_retrieved.append(property_tax_id_number.search_tax_id())
    return data_retrieved
def main():
    if os.path.exists("./notas_simples/ns_txt"):
        os.chdir("./notas_simples/ns_txt")
        list_txt_files = glob.glob("*.txt")
        print(list_txt_files)
        with open("..\..\listado_de_fincas.csv", mode='w', newline='') as fiout:
            file_writer = csv.writer(fiout, delimiter=';')
            for file_name_input in list_txt_files:
                data_line = scan_txt_report(file_name_input)
                file_writer.writerow(data_line)
if __name__ == '__main__':
    main()
# TODO Idufir: "(?<=IDUFIR: )\d{14}"
# TODO calle: "(?<=Calle ).*" Break down in street name and number of address
# TODO piso: "(?<=piso ).*," Break down in floor number and door number (or letter), without boundaries
# TODO titularidad: "(?<=TITULARIDAD\n\n).*" Break down in owner name, VAT number, % and domai type.
As you can see above, the 3 classes I've already written, PropertyNumber(object), PropertyCoefficient(object) and PropertyTaxIDNumber(object), have a lot of repeated code. It will only get worse when I add more regex patterns to each class.
Yes, you are repeating much of your code, and yes, it is a sign of a weak design. I'll treat this as an OOP exercise, because classes are overkill for this task.
First, we can see that the only differences between the classes are what they represent and their regex pattern. So we can have a base class which handles all the repetitive code, while each subclass simply supplies its own pattern:
class BaseProperty(object):
    def __init__(self, search_str, pattern):
        self.text_to_search = search_str
        self.text_found = ""
        self.regex_pattern = re.compile(pattern)
    def search_property(self):
        matched_property = self.regex_pattern.search(self.text_to_search)
        print(matched_property)
        self.text_found = matched_property.group()
        return self.text_found
class PropertyNumber(BaseProperty):
    def __init__(self, search_str):
        super(PropertyNumber, self).__init__(search_str, r"(?<=FINCA Nº: )\w{3,6}")
class PropertyCoefficient(BaseProperty):
    def __init__(self, search_str):
        super(PropertyCoefficient, self).__init__(search_str, r"(?<=Participación: )[0-9,]{1,8}")
Second, it doesn't appear that you're actually using the self.text_found field, so why store it? Now you can init all the properties in a single place, and make your scan_txt_report much simpler.
class BaseProperty(object):
    def __init__(self, pattern):
        self.regex_pattern = re.compile(pattern)
    def search_property(self, search_str):
        matched_property = self.regex_pattern.search(search_str)
        print(matched_property)
        return matched_property.group()
...
class PropertyCoefficient(BaseProperty):
    def __init__(self):
        super(PropertyCoefficient, self).__init__(r"(?<=Participación: )[0-9,]{1,8}")
properties = [PropertyNumber(), PropertyCoefficient(), ...]
def scan_txt_report(fli):
    file_input = open(fli, mode='r', encoding='utf-8')
    property_report = file_input.read()
    data_retrieved = [prop.search_property(property_report) for prop in properties]
    return data_retrieved
And unless you add some specific functionality for each subclass, you can even let go of the specific properties classes, and just do like this:
properties = [BaseProperty("(?<=FINCA Nº: )\w{3,6}"), BaseProperty("(?<=Participación: )[0-9,]{1,8}")]
And one last thing: please see the comment by @JonClements. It's a bad idea to use built-in names (such as str) as variable names, because doing so shadows the built-in inside that scope.
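The question also mentions wanting to try several patterns per field. One way this could be sketched is shown here, under a few assumptions: the PatternField class name is hypothetical, the second "number" pattern and the report text are made up, and returning None when nothing matches is a design choice of this sketch (the classes above would raise AttributeError on a missed match instead).

```python
import re


class PatternField:
    def __init__(self, name, patterns):
        self.name = name
        # compile every candidate pattern once, in priority order
        self.regexes = [re.compile(p) for p in patterns]

    def search(self, text):
        # return the first match found, or None when no pattern matches
        for regex in self.regexes:
            match = regex.search(text)
            if match:
                return match.group()
        return None


fields = [
    PatternField("number", [r"(?<=FINCA Nº: )\w{3,6}", r"(?<=Finca número: )\w{3,6}"]),
    PatternField("coefficient", [r"(?<=Participación: )[0-9,]{1,8}"]),
]

report = "Finca número: 12345\nParticipación: 3,25\n"  # made-up report text
row = [f.search(report) for f in fields]
print(row)
```

Adding a new format for a field then means appending one more string to that field's list; no new class is needed.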
There is no need for so many classes. It can be done with two classes.
class Property(object):
    def __init__(self, regex):
        self.regex = regex
    def prepare(self):
        # return the compiled form of the regex
        return re.compile(self.regex)
class Search(object):
    def __init__(self, compiled_regex):
        self.compiled_regex = compiled_regex
    def search(self, text):
        # same function as now
        return self.compiled_regex.search(text).group()
def scan_txt_report(fli):
    data_retrieved = []
    file_input = open(fli, mode='r', encoding='utf-8')
    # take a csv containing all the regexes
    # for every regex, call the Property and Search classes and keep appending the results
    return data_retrieved
This way the only thing that needs to change is the csv; the program itself stays intact and tested.
To add new regexes, only the csv needs to be updated.
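As a rough illustration of the CSV-driven idea, the sketch below loads patterns from CSV rows and applies them to a text. The two-column "name;pattern" layout, the helper names load_patterns and scan_text, and the sample text are all assumptions; io.StringIO stands in for a real CSV file so the example is self-contained.

```python
import csv
import io
import re

# stand-in for a patterns CSV file; the two-column layout is an assumption
patterns_csv = io.StringIO(
    "number;(?<=FINCA Nº: )\\w{3,6}\n"
    "coefficient;(?<=Participación: )[0-9,]{1,8}\n"
)


def load_patterns(handle):
    # one (field name, compiled regex) pair per CSV row
    reader = csv.reader(handle, delimiter=";")
    return [(name, re.compile(pattern)) for name, pattern in reader]


def scan_text(text, patterns):
    data_retrieved = []
    for _name, regex in patterns:
        match = regex.search(text)
        # record None instead of failing when a pattern does not match
        data_retrieved.append(match.group() if match else None)
    return data_retrieved


patterns = load_patterns(patterns_csv)
print(scan_text("FINCA Nº: 4711\nParticipación: 12,5\n", patterns))
```

A semicolon delimiter is used because the patterns themselves contain commas, for example in {3,6}.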
I am fairly new to python. I have tried to define a class, I then want to create an instance from a file, then refer to specific pieces of it, but cannot seem to. This is Python 3.3.0
Here's the class....
class Teams():
    def __init__(self, ID = None, Team = None, R = None, W = None, L = None):
        self._items = [ [] for i in range(5) ]
        self.Count = 0
    def addTeam(self, ID, Team, R=None, W = 0, L = 0):
        self._items[0].append(ID)
        self._items[1].append(Team)
        self._items[2].append(R)
        self._items[3].append(W)
        self._items[4].append(L)
        self.Count += 1
    def addTeamsFromFile(self, filename):
        inputFile = open(filename, 'r')
        for line in inputFile:
            words = line.split(',')
            self.addTeam(words[0], words[1], words[2], words[3], words[4])
    def __len__(self):
        return self.Count
Here's the code in Main
startFileName = 'file_test.txt'
filename = startFileName
###########
myTestData = Teams()
myTestData.addTeamsFromFile(startFileName)
sample data in file
100,AAAA,106,5,0
200,BBBB,88,3,2
300,CCCC,45,1,4
400,DDDD,67,3,2
500,EEEE,90,4,1
I think I am good up to here (not 100% sure), but now how do I reference this data? Am I not creating the class correctly? How do I see whether one instance is larger than another?
ie, myTestData[2][2] > myTestData[3][2] <----- this is where I get confused, as this doesn't work
Why don't you create a Team class like this:
class Team():
    def __init__(self, ID, Team, R=None, W=0, L=0):
        # set up fields here
        self.ID = ID
        self.Team = Team
        self.R = R
        self.W = W
        self.L = L
Then in Teams
class Teams():
    def __init__(self):
        self._teams = []
    def addTeam(self, ID, name, R=None, W=0, L=0):
        # the parameter is called name so that it does not shadow the Team class
        team = Team(ID, name, R, W, L)
        self._teams.append(team)
Now, if I got it right, you want to override the behaviour of the > operator.
To do that, define __gt__(self, other).
So it will be
class Team():
    # init code from above for Team
    def __gt__(self, otherTeam):
        return self.ID > otherTeam.ID # for example
Also be sure to convert those strings to numbers, because otherwise you compare strings, not numbers. Use the int function for that.
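A minimal sketch combining the two points (a per-team class with __gt__, plus int conversion) might look like this. Comparing by wins is just an example choice, and the attribute name `name` is an assumption. The conversion matters because, as strings, "106" > "88" is False.

```python
class Team:
    def __init__(self, ID, name, R=None, W=0, L=0):
        self.ID = int(ID)
        self.name = name
        # convert the text fields to numbers so comparisons are numeric
        self.R = int(R) if R is not None else None
        self.W = int(W)
        self.L = int(L)

    def __gt__(self, other):
        # compare by wins here; any other field would work the same way
        return self.W > other.W


# two rows taken from the sample data in the question
rows = ["100,AAAA,106,5,0", "200,BBBB,88,3,2"]
teams = [Team(*row.split(",")) for row in rows]
print(teams[0] > teams[1])
```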
The immediate problem you're running into is that your code to access the team data doesn't account for your myTestData value being an object rather than a list. You can fix it by doing:
myTestData._items[2][2] > myTestData._items[3][2]
Though, if you plan on doing that much, I'd suggest renaming _items to something that's obviously supposed to be public. You might also want to make the addTeamsFromFile method convert some of the values it reads to integers (rather than leaving them as strings) before passing them to the addTeam method.
An alternative would be to make your Teams class support direct member access. You can do that by creating a method named __getitem__ (and __setitem__ if you want to be able to assign values directly). Something like:
    def __getitem__(self, index):
        return self._items[index]
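With __getitem__ in place, the comparison from the question works directly on the object. The sketch below is a trimmed-down version of the Teams class, under two assumptions: all five values are always supplied, and the three numeric columns are converted with int before being stored.

```python
class Teams:
    def __init__(self):
        # one list per column: ID, Team, R, W, L
        self._items = [[] for _ in range(5)]

    def addTeam(self, ID, Team, R, W, L):
        # numeric columns are converted so that > compares numbers
        for column, value in zip(self._items, (ID, Team, int(R), int(W), int(L))):
            column.append(value)

    def __getitem__(self, index):
        # delegate indexing to the underlying list of columns
        return self._items[index]


myTestData = Teams()
for line in ["100,AAAA,106,5,0", "200,BBBB,88,3,2", "300,CCCC,45,1,4"]:
    myTestData.addTeam(*line.split(","))

print(myTestData[2][2] > myTestData[3][2])
```

Note that the first index selects a column and the second a row, so myTestData[2][2] is the R value of the third team and myTestData[3][2] is that team's W value.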
@Aleksandar's answer about making a class for the team data items is also a good one. In fact, it might be more useful to have a class for the individual teams than it is to have a class containing several. You could replace the Teams class with a list of Team instances. It depends on what you're going to be doing with it, I guess.
To be specific in my case, the class Job has a number of Task objects on which it operates.
import tasker
class Job(object):
    _name = None
    _tasks = []
    _result = None
    def __init__(self, Name):
        self._name = Name
    def ReadTasks(self):
        # read from a Json file and create a list of task objects.
        pass
    def GetNumTasks(self):
        return len(self._tasks)
    def GetNumFailedTasks(self):
        failTaskCnt = 0
        for task in self._tasks:
            if task.IsTaskFail():
                failTaskCnt += 1
        return failTaskCnt
To make GetNumFailedTasks more succinct, I would like to use filter, but I am not sure what the correct way is to pass IsTaskFail as its first parameter.
In case this is a duplicate, please mark it as such and point me to the right answer.
You can use a generator expression with sum:
failTaskCnt = sum(1 for task in self._tasks if task.IsTaskFail())
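For completeness, the filter form that the question asks about can be sketched as follows. The Task class is a hypothetical stand-in for whatever tasker provides; only the IsTaskFail method name comes from the question.

```python
from operator import methodcaller


class Task:
    # hypothetical stand-in for the real task class
    def __init__(self, failed):
        self._failed = failed

    def IsTaskFail(self):
        return self._failed


tasks = [Task(True), Task(False), Task(True)]

# generator expression with sum, as in the answer
count_sum = sum(1 for task in tasks if task.IsTaskFail())

# filter with the unbound method: each task is passed in as self
count_filter = len(list(filter(Task.IsTaskFail, tasks)))

# methodcaller looks the method up on each object, so subclasses are respected
count_caller = len(list(filter(methodcaller("IsTaskFail"), tasks)))

print(count_sum, count_filter, count_caller)
```

The generator expression avoids building an intermediate list, which is one reason it is often preferred over filter for simple counting.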