Essentially, I'm looking to build a web app where the user can input n labels for a dataset, which are then put into a dictionary with keywords for each label. I'd like the same kind of function to be created for each of the n labels, something like:
# labeling function for label 1
@labeling_function()
def lf_label_1(x):
    if x.label_1 in ["bag", "surfboard", "skis"]:
        return CARRY
    return ABSTAIN
So, I'd get a new function for each new label added by the user. Each function then feeds into a list and ends up being input for a function. For example:
# list of labeling functions
lfs = [
    lf_ride_object,
    lf_carry_object,
    lf_carry_subject,
    lf_not_person,
    lf_ydist,
    lf_dist,
    lf_area,
]
# applying label functions to create dataset
applier = PandasLFApplier(lfs)
L_train = applier.apply(df_train)
L_valid = applier.apply(df_valid)
More details, including a second approach:
I'm looking for advice on how to use snorkel in a particular way. I'd like to create a labeling function with spaCy's PhraseMatcher. Basically, I want a user to input all of the words (with the corresponding label) in a web app and send it to the PhraseMatcher. Then, match a paragraph of text inside the labeling function.
How would I go about creating labeling functions for n labels on the backend? Typically, we would hand-write n or more labeling functions to cover all the labels, but I'm trying to use snorkel in a use case where we don't know how many labels there are until the user creates them.
Is there a way around this? Basically, the Matcher would go over the input text, check which labels occur in it, and then return all the labels found. In other words, I'm trying to let users use snorkel without writing the functions themselves, only entering the (label, word(s)) combinations.
Is there a way to use the Matcher in such a way that there is only one labeling function and it uses the Matcher for all labels?
For example, there could be a single labeling function that looks like this (pseudo-code):
# labeling function for all labels
@nlp_labeling_function()  # labeling function for using spaCy
def lf_labeler(x, label_keyword_dict):
    labels = []
    doc = nlp(x)  # tokenizing the input text
    matches = matcher(doc)  # matcher is a PhraseMatcher built from label_keyword_dict; finds all the token words that match a specific label
    for match_id, start, end in matches:
        labels.append(nlp.vocab.strings[match_id])  # adds every label where there was a match
    if labels:  # runs if labels is not empty
        return labels  # this would return all the labels that were matches in the text document. Not sure if it's possible to return the list of labels like this.
    else:
        return ABSTAIN  # abstains from using this text example in the dataset creation because there was no match
So, it would basically take in a piece of text (from a dataframe), check to see if any of the users' keywords are in the text and add all of the appropriate labels as a result. Return all the matched labels or simply return "ABSTAIN" to say that there were no matches.
While typing up this question I came up with some ideas that I wrote out here, so I'll be testing them in the meantime and having a look at the snorkel source code to see if I can come up with anything else.
Thanks!
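One possible way to get a labeling function per user-defined label is a factory that closes over each (label, keywords) pair. The sketch below is plain Python: make_keyword_lf, label_keyword_dict, the text attribute and the integer label ids are all illustrative assumptions, not part of snorkel. With snorkel installed, each generated function could then be wrapped in snorkel.labeling.LabelingFunction(name=..., f=...) before the list is handed to PandasLFApplier.

```python
from types import SimpleNamespace

ABSTAIN = -1  # snorkel's convention for "no vote"


def make_keyword_lf(label_name, label_id, keywords):
    """Build one labeling function for a single user-defined label."""
    keyword_set = {k.lower() for k in keywords}

    def lf(x):
        # Vote for label_id if any keyword occurs as a word in x.text.
        words = set(x.text.lower().split())
        return label_id if words & keyword_set else ABSTAIN

    lf.__name__ = f"lf_{label_name}"
    return lf


# Hypothetical user input collected from the web app.
label_keyword_dict = {
    "carry": ["bag", "surfboard", "skis"],
    "ride": ["bike", "horse"],
}

# One function per label, however many labels the user defined.
lfs = [
    make_keyword_lf(name, label_id, keywords)
    for label_id, (name, keywords) in enumerate(label_keyword_dict.items())
]

example = SimpleNamespace(text="She carried a surfboard to the beach")
votes = [lf(example) for lf in lfs]
```

As far as I know, a snorkel labeling function is expected to return a single integer label (or ABSTAIN), so returning a list of labels from one function would not fit. Generating one function per label, each of which either votes or abstains, keeps to that contract.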
Related
I would like to group my 2 marker cluster layers, where one depends on the other and provides separate styling. Hence the second one is set to control=False.
Nevertheless, I want it to disappear when the first one is switched off.
In the new folium release, v0.14, I found a new feature that could potentially resolve my issue:
https://github.com/ikoojoshi/Folium-GroupedLayerControl
Allow only one layer at a time in Folium LayerControl
and I've applied the following code:
df = pd.read_csv("or_geo.csv")
fo = FeatureGroup(name="OR")
or_cluster = MarkerCluster(name="Or", overlay=True, visible=True).add_to(map)
or_status = MarkerCluster(overlay=True,
                          control=False,
                          visible=False,
                          disableClusteringAtZoom=16,
                          ).add_to(map)
GroupedLayerControl(
    groups={'OrB': or_cluster, 'OrC': or_status},
    collapsed=False,
).add_to(map)
and the console throws the following error:
TypeError: 'MarkerCluster' object is not iterable
How could I switch off 2 layer groups at once?
UPDATE:
The answer below provides the code, which seems to work but not in the way I need.
df = pd.read_csv("or_geo.csv")
fo = FeatureGroup(name="Or", overlay=True)
or_cluster = MarkerCluster(name="Or").add_to(map)
or_status = MarkerCluster(control=False,
                          visible=True,
                          disableClusteringAtZoom=16,
                          ).add_to(map)
# definition of or_marker
# definition of or_stat_marker
or_cluster.add_child(or_marker)
or_status.add_child(or_stat_marker)
GroupedLayerControl(
    groups={"Or": [or_cluster, or_status]},
    collapsed=False,
    exclusive_groups=False,
).add_to(map)
I get a separate box instead, but what is worse, I can only switch between one layer and the other, whereas I would like both of them to depend on the main group. The exclusive_groups option allows me to untick both of them, but I am looking for something that would let me switch both off at once (placing the tick box on the main group instead). Is it possible to have something like this?
Try passing your markerclusters as a list to the GroupedLayerControl, not one by one. This is described here:
https://nbviewer.org/github/chansooligans/folium/blob/plugins-groupedlayercontrol/examples/plugin-GroupedLayerControl.ipynb
GroupedLayerControl(
    groups={'OrB': [or_cluster, or_status]},
    collapsed=False,
).add_to(map)
Update I
I see what you mean, that was definitely nonsense, as it splits groups instead of joining them. So, back to the topic.
We had a similar discussion here and I am still convinced that the FeatureGroupSubGroup should solve this issue. I use it in exactly that way: I enable/disable a MarkerCluster in the legend, and multiple FeatureGroupSubGroups (which are attached not to the map but to the MarkerCluster) appear/disappear with it. Perhaps try that again.
I am working on Jupyter Notebook with the fastai DataBlock and Dataloaders API to prepare batches for a neural network.
Currently what I've done is:
path = Path(r'TensorFlow\workspace\images\mini_alphabet')
# created this path to the folder containing 20 images labelled A1, A2, A3, etc. and 20 labelled B, B2, B3, etc.
and when I use .ls() I can view the output in this form:
[Path('A1.jpg'),Path('A10.jpg'),Path('A11.jpg'),Path('A12.jpg'),Path('A13.jpg'),Path('A14.jpg'),Path('A15.jpg'),Path('A16.jpg'),Path('A17.jpg'),Path('A18.jpg')...]
What I want to do is make a labelling function that iterates through the mini_alphabet folder, checks to see if the image name starts with the letter A or B, and then returns that letter as the label.
So far I've written this function:
def label_alphabet(fname):
    labels = os.listdir(fname)  # <this outputs a list ['A1.jpg','A2.jpg',...]
    for l in labels:
        if l[0].startswith('A'):
            return "A"
        else:
            return "B"
Unfortunately when I use this it seems to label every single image as A. What should I do differently here?
Also, if I wanted to apply a larger number of labels (ranging from A-G), how exactly should I set up that code so it handles all of them? I was thinking of iterating with for i through a list of letters and returning the label that matches the filename.
Thanks for the help!
Try
if l.split(".")[0].startswith("A"):
instead of
if l[0].startswith('A'):
If you want a list of letters you could use
list(string.ascii_uppercase)
Also indent the code that follows the for line.
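A label function handed to fastai (for example as get_y in a DataBlock) is normally called once per file, so it arguably should not list the directory itself. In the question's version the loop returns on its first iteration, which would explain why every image gets the same label. A minimal sketch, assuming each file name starts with its class letter; the set of allowed letters is an illustrative assumption:

```python
from pathlib import Path
import string

VALID_LABELS = set(string.ascii_uppercase[:7])  # 'A' through 'G'


def label_alphabet(fname):
    """Return the first letter of the file name as its label."""
    letter = Path(fname).name[0].upper()
    if letter not in VALID_LABELS:
        raise ValueError(f"unexpected file name: {fname}")
    return letter


files = [Path("A1.jpg"), Path("A10.jpg"), Path("B2.jpg"), Path("G7.jpg")]
labels = [label_alphabet(f) for f in files]
```

Because the label is read straight from the name, the same function covers A-G without any loop over candidate letters.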
I'm making a GUI for a medical tool as a class project. Given a condition, it should output a bunch of treatment options gathered from different websites like webMD. I would like to be able to handle mouseover events on any of the treatments listed to give a little more information about the treatment (such as the category of drug, whether it is a generic or not, etc).
The labels are stored in a list, as I have no idea how many different treatments will be returned beforehand. So my question is how can I make these mouseover events work. I can't write a function definition for every single possible label, they would number in the hundreds or thousands. I'm sure there's a very pythonic way to do it, but I have no idea what.
Here's my code for creating the labels:
def search_click():
    """
    Builds the search results after the search button has been clicked
    """
    self.output_frame.destroy()  # Delete old results
    build_output()  # Rebuild output frames
    treament_list = mockUpScript.queryConditions(self.condition_entry.get())  # Get treatment data
    labels = []
    frames = [self.onceFrame, self.twiceFrame, self.threeFrame, self.fourFrame]  # holds the list of frames
    for treament in treament_list:  # For each treatment in the list
        label = ttk.Label(frames[treament[1] - 1], text=treament[0])  # Build the label for treatment
        labels.append(label)  # Add the treatment to the list
        label.pack()
and here is what the GUI looks like (don't judge [-; )
The text "Hover over drugs for information" should be changed depending on which drug your mouse is hovering over.
I can't write a function definition for every single possible label, they would number in the hundreds or thousands. I'm sure there's a very pythonic way to do it, but I have no idea what.
Check out lambda functions which are nearly identical to what you want.
In your case, something like:
def update_bottom_scroll_bar(text):
    # whatever you want to do to update the text at the bottom
    pass

for treatment in treament_list:  # For each treatment in the list
    label = ttk.Label(frames[treatment[1] - 1], text=treatment[0])  # Build the label for treatment
    label.bind("<Enter>", lambda event, t=treatment: update_bottom_scroll_bar(text=t))
    label.bind("<Leave>", lambda event: update_bottom_scroll_bar(text='Default label text'))
    labels.append(label)  # Add the treatment to the list
    label.pack()
Also please spell your variables right, I corrected treament to treatment...
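The t=treatment default argument matters here: a lambda created in a loop looks up the loop variable when it is called, not when it is defined, so without the default every handler would see the last treatment. A small sketch without any GUI, where None stands in for the tkinter event and the treatment names are made up for illustration:

```python
treatments = ["aspirin", "ibuprofen", "naproxen"]  # illustrative data

shown = []  # stands in for the status text at the bottom of the GUI


def update_status(text):
    shown.append(text)


# Late binding: each lambda reads `name` when called, after the loop has ended.
late = []
for name in treatments:
    late.append(lambda event: update_status(name))

# Default argument: the current value is captured when the lambda is defined.
bound = []
for item in treatments:
    bound.append(lambda event, t=item: update_status(t))

for handler in late:
    handler(None)  # None stands in for the tkinter event object
late_result = shown[:]

shown.clear()
for handler in bound:
    handler(None)
bound_result = shown[:]
```

Here late_result holds the last treatment three times, while bound_result holds each treatment once, which is the behaviour the hover handlers need.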
I have a list of Features (all Points) in Python. The Features are dynamic, stemming from database data which is updated every 30 minutes.
Hence I never have a static number of features.
I need to generate a Feature Collection with all Features in my list.
However (as far as I know) the syntax for creating a FeatureCollection wants you to pass it all the features.
ie:
FeatureClct = FeatureCollection(feature1, feature2, feature3)
How does one generate a FeatureCollection without knowing how many features there will be beforehand? Is there a way to append Features to an existing FeatureCollection?
According to the documentation of python-geojson (which I guess you are using, since you didn't mention it), you can also pass a list to FeatureCollection. Just put all the results into a list and you're good to go:
from geojson import Feature, FeatureCollection, Point

feature1 = Feature(geometry=Point((45, 45)))
feature2 = Feature(geometry=Point((-45, -45)))
features = [feature1, feature2]
collection = FeatureCollection(features)
https://github.com/frewsxcv/python-geojson#featurecollection
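Because a GeoJSON FeatureCollection is just an object with a "features" array, the collection can be built from a list of any length and extended later. The sketch below uses plain dictionaries and the standard json module rather than python-geojson; the rows, the coordinates and the make_point_feature helper are illustrative assumptions.

```python
import json


def make_point_feature(lon, lat, **properties):
    """Build a GeoJSON Feature wrapping a Point geometry."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


# Rows as they might come back from the database; the count is not fixed.
rows = [(45.0, 45.0, "a"), (-45.0, -45.0, "b")]

collection = {
    "type": "FeatureCollection",
    "features": [make_point_feature(lon, lat, name=name) for lon, lat, name in rows],
}

# Features can also be appended after the collection exists.
collection["features"].append(make_point_feature(10.0, 20.0, name="c"))

geojson_text = json.dumps(collection)
```

As far as I can tell, python-geojson's objects behave like dictionaries too, so appending to collection["features"] should work the same way on a FeatureCollection created with that library.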
I have defined a pyparsing rule to parse this text into a syntax-tree...
TEXT COMMANDS:
add Iteration name = "Cisco 10M/half"
append Observation name = "packet loss 1"
assign Observation results_text = 0.0
assign Observation results_bool = True
append DataPoint
assign DataPoint metric = txpackets
assign DataPoint units = packets
append DataPoint
assign DataPoint metric = txpackets
assign DataPoint units = packets
append Observation name = "packet loss 2"
append DataPoint
assign DataPoint metric = txpackets
assign DataPoint units = packets
append DataPoint
assign DataPoint metric = txpackets
assign DataPoint units = packets
SYNTAX TREE:
['add', 'Iteration', ['name', 'Cisco 10M/half']]
['append', 'Observation', ['name', 'packet loss 1']]
['assign', 'Observation', ['results_text', '0.0']]
['assign', 'Observation', ['results_bool', 'True']]
['append', 'DataPoint']
['assign', 'DataPoint', ['metric', 'txpackets']]
['assign', 'DataPoint', ['units', 'packets']]
...
I'm trying to associate all the nested key-value pairs in the syntax-tree above into a linked-list of objects... the hierarchy looks something like this (each word is a namedtuple... children in the hierarchy are on the parents' list of children):
Log: [
    Iteration: [
        Observation:
            [DataPoint, DataPoint],
        Observation:
            [DataPoint, DataPoint]
    ]
]
The goal of all this is to build a generic test data-acquisition platform to drive the flow of tests against network gear, and record the results. After the data is in this format, the same data structure will be used to build a test report. To answer the question in the comments below, I chose a linked list because it seemed like the easiest way to sequentially dequeue the information when writing the report. However, I would rather not assign Iteration or Observation sequence numbers before finishing the tests... in case we find problems and insert more Observations in the course of conducting the test. My theory is that the position of each element in the list is sufficient, but I'm willing to change that if it's part of the problem.
The problem is that I'm getting lost trying to assign Key-Values to objects in the linked list after it's built. For instance, after I insert an Observation namedtuple into the first Iteration, I have trouble reliably handling the update of assign Observation results_bool = True in the example above.
Is there a generalized design pattern to handle this situation? I have googled this for a while, but I can't seem to make the link between parsing the text (which I can do) and managing the data hierarchy (the main problem). Hyperlinks or small demo code is fine... I just need pointers to get on the right track.
I am not aware of an actual design pattern for what you're looking for, but I have a great passion for the issue at hand. I work heavily with network devices and parsing and organizing the data is a large ongoing challenge for me.
It's clear that the problem is not parsing the data, but what you do with it afterwards. This is where you need to think about the meaning you are attaching to the data you have parsed. The nested-list method might work well for you if the objects containing the lists are also meaningful.
Namedtuples are great for quick-and-dirty class-ish behavior, but they fall flat when you need them to do anything outside of basic attribute access, especially considering that as tuples they are immutable. It seems to me that you'll want to replace certain namedtuple objects with full-blown classes. This way you can highly customize the behavior and methods available.
For example, you know that an Iteration will always contain 1 or more Observation objects which will then contain 1 or more DataPoint objects. If you can accurately describe the relationships, this sets you on the path to handling them.
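One way to sketch this is with small mutable classes plus a lookup of the most recently created node of each kind, so that an assign always updates the newest object of that type. This is only an illustration under assumptions: Node, PARENT_OF and build_tree are hypothetical names, and both add and append are assumed to create a new node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                      # e.g. 'Iteration', 'Observation'
    attrs: dict = field(default_factory=dict)      # key-value pairs from the commands
    children: list = field(default_factory=list)   # nested nodes, in insertion order


# Which kind of node each kind is attached to (assumed from the question).
PARENT_OF = {"Iteration": "Log", "Observation": "Iteration", "DataPoint": "Observation"}


def build_tree(commands):
    """Fold parsed commands into a tree, tracking the newest node of each kind."""
    log = Node("Log")
    latest = {"Log": log}
    for verb, kind, *pairs in commands:
        if verb in ("add", "append"):
            node = Node(kind)
            latest[PARENT_OF[kind]].children.append(node)
            latest[kind] = node
        elif verb == "assign":
            node = latest[kind]
        else:
            raise ValueError(f"unknown command: {verb}")
        for key, value in pairs:
            node.attrs[key] = value
    return log


commands = [
    ["add", "Iteration", ["name", "Cisco 10M/half"]],
    ["append", "Observation", ["name", "packet loss 1"]],
    ["assign", "Observation", ["results_bool", "True"]],
    ["append", "DataPoint"],
    ["assign", "DataPoint", ["metric", "txpackets"]],
    ["append", "Observation", ["name", "packet loss 2"]],
    ["append", "DataPoint"],
]
log = build_tree(commands)
```

Position in each children list then serves as the sequence number, which matches the idea of not numbering Observations until the tests are finished.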
I wound up using textfsm, which allows me to keep state between different lines while parsing the configuration file.