Can association rules in Python provide 2 items set in antecedents

Can association rules in Python provide 2 items set in antecedents - python

Can association rules in Python provide 2 items set in antecedents?
(A->B) -> C
I have written the code below:
rules1 = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.01)
The result shows antecendents and consequent e.g., A -> B, I would like to see (A->B) -> C

Related

How do I remove unwanted parts from strings in a Python DataFrame column

Based on the script originally suggested by u/commandlineluser at reddit, I (as a Python novice) attempted to revise the original code to remove unwanted parts that vary across column values. The Python script involves creating a dictionary with keys and values and using a list comprehension with str.replace.
(part of the original script by u/commandlineluser at reddit)
extensions = "dat", "ssp", "dta", "v9", "xlsx"
(The next line is my revision to the above part, and below is the complete code block)
extensions = "dat", "ssp", "dta", "20dta", "u20dta", "f1dta", "f2dta", "v9", "xlsx"
Some of the results are different than what I desire. Please see below (what I tried).
import pandas as pd
import re
data = {"full_url": ['https://meps.ahrq.gov/data_files/pufs/h225/h225dat.zip',
'https://meps.ahrq.gov/data_files/pufs/h51bdat.zip',
'https://meps.ahrq.gov/data_files/pufs/h47f1dat.zip',
'https://meps.ahrq.gov/data_files/pufs/h225/h225ssp.zip',
'https://meps.ahrq.gov/data_files/pufs/h220i/h220if1dta.zip',
'https://meps.ahrq.gov/data_files/pufs/h220h/h220hv9.zip',
'https://meps.ahrq.gov/data_files/pufs/h220e/h220exlsx.zip',
'https://meps.ahrq.gov/data_files/pufs/h224/h224xlsx.zip',
'https://meps.ahrq.gov/data_files/pufs/h036brr/h36brr20dta.zip',
'https://meps.ahrq.gov/data_files/pufs/h036/h36u20dta.zip',
'https://meps.ahrq.gov/data_files/pufs/h197i/h197if1dta.zip',
'https://meps.ahrq.gov/data_files/pufs/h197i/h197if2dta.zip']}
df = pd.DataFrame(data)
extensions = ["dat", "ssp", "dta", "20dta", "u20dta", "f1dta", "f2dta", "v9", "xlsx"]
replacements = dict.fromkeys((f"{ext}[.]zip$" for ext in extensions), "")
df["file_id"] = df["full_url"].str.split("/").str[-1].replace(replacements, regex=True)
print(df["file_id"])
Annotated output
0 h225 (looks good)
1 h51b (looks good)
2 h47f1 (h47 -> desired)
3 h225 (looks good)
4 h220if1 (h220i -> desired)
5 h220h (looks good)
6 h220e (looks good)
7 h224 (looks good)
8 h36brr20 (h36brr -> desired)
9 h36u20 (h36 -> desired)
10 h197if1 (h197i -> desired)
11 h197if2 (h197i -> desired)

You have two issues here, and they are all in this line:
extensions = ["dat", "ssp", "dta", "20dta", "u20dta", "f1dta", "f2dta", "v9", "xlsx"]
First issue
The first issue is in the order of the elements of this list. "dat" and "dta" are substrings of other elements in this string and they are at the front of this list. Let's take an example: h47f1dat.zip needs to become h47. But in these lines:
replacements = dict.fromkeys((f"{ext}[.]zip$" for ext in extensions), "")
df["file_id"] = df["full_url"].str.split("/").str[-1].replace(replacements, regex=True)
You keep the order, meaning that you'll first be filtering with the "dat" string, which becomes h47f1. This can be easily fixed by reordering your list.
Second issue
You missed an entry in your extensions list: if you want h47f1dat.zip to become h47 you need to have "f1dat" in your list but you only have "f1dta".
Conclusion
You were almost there! There was simply a small issue with the order of the elements and one extension was missing (or you have a typo in your URLs).
The following extensions list:
extensions = ["ssp", "20dta", "u20dta", "f1dat", "f1dta", "f2dta", "v9", "dat", "dta", "xlsx"]
Together with the rest of your code gives you the result you want:
0 h225
1 h51b
2 h47
3 h225
4 h220i
5 h220h
6 h220e
7 h224
8 h36brr
9 h36u
10 h197i
11 h197i

Good catch about the issue of the order of the elements and the missing extension! Thank you.
Question 1: Do you mean the list extensions is not sorted alphabetically? Can I not use the Python sort() method to sort the list? I have over one thousand rows in the actual dataframe, and I prefer to sort the list programmatically. I hope I do not misunderstand your comments.
Question 2: I don't understand why I am getting h36u instead of the desired value h36 in the output even after reordering the list as you suggested. Any thoughts?
I have tried another approach (code below) using Jupyter Lab, which provides the output in which the first two values are different from the desired output (also shown below), but the other values seem to be what I desire including h36.
df["file_id"] = df["full_url"].str.split("/").str[-1].str.replace(r'(\dat.zip \
|f1dat.zip|dta.zip|f1dta.zip|f2dta.zip|20dta.zip|u20dta.zip|xlsx.zip|v9.zip|ssp.zip)' \
,'', regex=True)
print(df["file_id"])
Output (annotated)
0 h225dat.zip (not desired; h225 desired)
1 h51bdat.zipn (not desired; h51b desired)
2 h47
3 h225
4 h220i
5 h220h
6 h220e
7 h224
8 h36brr
9 h36
10 h197i
11 h197i
Question 3: Any comments on the above alternative code snippets?

z3 python change logic gate children

My algorithm needs to modify the children() of the existing logic gate. Suppose i have the following code
a = Bool('a')
b = Bool('b')
c = Bool('c')
or_gate = Or(a, b)
I want to modify or_gate to be Or(a, c).
I have tried the following:
or_gate.children()[1] = c
print(or_gate)
The above code doesn't work, or_gate is still Or(a, b). So how do i change the children of a logic gate in z3?
Edit: i can't create new Or() that contains the children i want due to nested gate. For example consider the following
a = Bool('a')
b = Bool('b')
c = Bool('c')
d = Bool('d')
e = Bool('e')
or_gate_one = Or(a, b)
or_gate_two = Or(or_gate_one, c)
or_gate_three = Or(or_gate_two, d)
If we create new object instead of directly modifying the children of or_gate_one, then or_gate_two and or_gate_three should also be modified, which is horrible for large scale. I need to modify the children of or_gate_one directly.

Use the substitute method:
from z3 import *
a = Bool('a')
b = Bool('b')
c = Bool('c')
or_gate = Or(a, b)
print(or_gate)
or_gate_c = substitute(or_gate, *[(a, c)])
print(or_gate_c)
This prints:
Or(a, b)
Or(c, b)
I wouldn't worry about efficiency, unless you observe it to be a problem down the road: (1) z3 will "share" the internal AST when you use substitute as much as it can, and (2) in most z3 programming, run-time is dominated by solving the constraints, not building them.
If you find that building the constraints is very costly, then you should switch to the lower-level APIs (i.e., C/C++), instead of using Python so you can avoid the extra-level of interpretation in between. But again, only do this if you have run-time evidence that building is the bottleneck, not the solving. (Because in the common case of the latter, switching to C/C++ isn't going to save you much anyhow.)

Implementing multiple branching rules in the same branch and bound tree in PySCIPOpt

I would like to implement a custom branching rule initially (for a few nodes in the top of the tree), and then use Scip's implementation of vanilla full strong branching rule (or some other rule like pseudocost). Is this possible to do using/by extending
PySCIPOpt?
import pyscipopt as scip
import random
class oddevenbranch(scip.Branchrule):
def branchexeclp(self, allowaddcons):
'''
This rule uses the branching rule I have defined if the node number is odd,
and should use strong branching otherwise.
'''
node_ = self.model.getCurrentNode()
num = node_.getNumber()
if num % 2 == 1:
candidate_vars, *_ = self.model.getLPBranchCands()
branch_var_idx = random.randint(0,len(candidate_vars)-1)
branch_var = candidate_vars[branch_var_idx]
self.model.branchVar(branch_var)
result = scip.SCIP_RESULT.BRANCHED
return {"result": result}
else:
print(num, ': Did not branch')
result = scip.SCIP_RESULT.DIDNOTRUN
return {"result": result}
if __name__ == "__main__":
m1 = scip.Model()
m1.readProblem('xyz.mps') # Used to read the instance
m1.setIntParam('branching/fullstrong/priority', 11000)
branchrule = oddevenbranch()
m1.includeBranchrule(branchrule=branchrule,
name="CustomRand", # name of the branching rule
desc="", # description of the branching rule
priority=100000, # priority: set to this to make it default
maxdepth=-1, # maximum depth up to which it will be used, or -1 for no restriction
maxbounddist=1) # maximal relative distance from current node's dual bound to primal
m1.optimize()
I wonder what is causing this behavior. Does it need to call branching multiple times to perform strong branching?
2 : Did not branch
2 : Did not branch
2 : Did not branch
2 : Did not branch
2 : Did not branch
2 : Did not branch
2 : Did not branch
2 : Did not branch
2 : Did not branch
2 : Did not branch

I believe you can achieve this effect. SCIP has several branching rules and executes them one by one according to their priorities until one of them produces a result.
The default rule "relpscost" has a priority of 10000, so you should create a custom rule with a higher priority.
If you do not want to use your own rule at a node (deeper in the tree), you can
decide in the branchexeclp(allowedcons) callback of your rule to return a dict
{'result': pyscipopt.SCIP_RESULT.DIDNOTRUN}
which should inform SCIP that your rule did not execute and should be skipped, in which case the "relpscost" rule should take over (see here).
Edit: It is not quite obvious to me how your instance looks like so I am guessing a bit here: I think you are right in assuming that the multiple calls at node 2 are due to the strong branching. You can check by switching to another backup strategy, such as "mostinf".
Additionally, it is really helpful to check out the statistics by calling model.printStatistics() after the optimization has finished. I tried your code on the "cod105" instance of the MIPlIB benchmarks and similarly found a set of "Did not branch" outputs. The (abbreviated) statistics regarding branching rules are the following:
Branching Rules : ExecTime SetupTime BranchLP BranchExt BranchPS Cutoffs DomReds Cuts Conss Children
CustomRand : 0.00 0.00 1 0 0 0 0 0 0 2
fullstrong : 5.32 0.00 7 0 0 0 7 0 3 0
So the fallback rule is in fact called several times. Apparently, the "fullstrong" rule calls "CustomRand" internally, though this is not reflected in the actual statistics...

SWRL rules in OWL 2

I'm currently discovering all the possibilities of the Owlready library.
Right now I'm trying to process some SWRL rules and so far it's been going very good, but I'm stuck at one point.
I've defined some rules in my ontology and now I want to see all the results (so, everything inferred from a rule).
For example, if I had a rule
has_brother(David, ?b) ^ has_child(?b, ?s) -> has_uncle(?s, David)
and David has two brothers, John and Pete, and John's kid is Anna, Pete's kid is Simon, I would like too see something like:
has_brother(David, John) ^ has_child(John, Anna) -> has_uncle(Anna, David)
has_brother(David, Pete) ^ has_child(Pete, Simon) -> has_uncle(Simon, David)
Is this possible in any way?
I thought that maybe if I run the reasoner, I could see it in its output, but I can't find this anywhere.
I appreciate any help possible!

This is my solution:
import owlready2 as owl
onto = owl.get_ontology("http://test.org/onto.owl")
with onto:
class Person(owl.Thing):
pass
class has_brother(owl.ObjectProperty, owl.SymmetricProperty, owl.IrreflexiveProperty):
domain = [Person]
range = [Person]
class has_child(Person >> Person):
pass
class has_uncle(Person >> Person):
pass
rule1 = owl.Imp()
rule1.set_as_rule(
"has_brother(?p, ?b), has_child(?p, ?c) -> has_uncle(?c, ?b)"
)
# This rule gives "irreflexive transitivity",
# i.e. transitivity, as long it does not lead to has_brother(?a, ?a)"
rule2 = owl.Imp()
rule2.set_as_rule(
"has_brother(?a, ?b), has_brother(?b, ?c), differentFrom(?a, ?c) -> has_brother(?a, ?c)"
)
david = Person("David")
john = Person("John")
pete = Person("Pete")
anna = Person("Anna")
simon = Person("Simon")
owl.AllDifferent([david, john, pete, anna, simon])
david.has_brother.extend([john, pete])
john.has_child.append(anna)
pete.has_child.append(simon)
print("Uncles of Anna:", anna.has_uncle) # -> []
print("Uncles of Simon:", simon.has_uncle) # -> []
owl.sync_reasoner(infer_property_values=True)
print("Uncles of Anna:", anna.has_uncle) # -> [onto.Pete, onto.David]
print("Uncles of Simon:", simon.has_uncle) # -> [onto.John, onto.David]
Notes:
One might think has_brother is
symmetric, i.e. has_brother(A, B) ⇒ has_brother(B, A)
transitive, i.e. has_brother(A, B) + has_brother(B, C) ⇒ has_brother(A, C)
irreflexive, i.e. no one is his own brother.
However, transitivity only holds if the unique name assumption holds. Otherwise A could be the same individual as C and this conflicts irreflexivity. Thus I used a rule for this kind of "weak transitivity".
Once, has_brother works as expected the uncle rule also does. Of course, the reasoner must run before.
Update: I published the solution in this Jupyter notebook (which also contains the output of the execution).

How can I implement a makefile-style algorithm in python

I am implementing an acoustic feature extraction system in python, and I need to implement a makefile-style algorithm to ensure that all blocks in the feature extraction system are run in the correct order, and without repeating any feature extractions stages.
The input to this feature extraction system will be a graph detailing the links between the feature extraction blocks, and I'd like to work out which functions to run when based upon the graph.
An example of such a system might be the following:
,-> [B] -> [D] ----+
input --> [A] ^ v
`-> [C] ----+---> [E] --> output
and the function calls (assuming each block X is a function of the form output = X(inputs) might be something like:
a = A(input)
b = B(a)
c = C(a)
d = D(b,c) # can't call D until both b and c are ready
output = E(d,c) # can't call E until both c and d are ready
I already have the function graph loaded in the form of a dictionary with each dictionary entry of the form (inputs, function) like so:
blocks = {'A' : (('input'), A),
'B' : (('A'), B),
'C' : (('A'), C),
'D' : (('B','C'), D),
'output' : (('D','C'), E)}
I'm just currently drawing a blank on what the makefile algorithm does exactly, and how to go about implementing it. My google-fu appears to be not-very-helpful here too. If someone at least can give me a pointer to a good discussion of the makefile algorithm that would probably be a good start.

Topological sorting

blocks is basically an adjacency list representation of the (acyclic) dependency graph. Hence, to get the correct order to process the blocks, you can perform a topological sort.

As the other helpful answerers have already pointed out, what I'm after is a topological sort, but I think my particular case is a little simpler because the function graph must always start at input and end at output.
So, here is what I ended up doing (I've edited it slightly to remove some context-dependent stuff, so it may not be completely correct):
def extract(func_graph):
def _getsignal(block,signals):
if block in signals:
return signals[block]
else:
inblocks, fn = func_graph[block]
inputs = [_getsignal(i,signals) for i in inblocks]
signals[block] = fn(inputs)
return signals[block]
def extract_func (input):
signals = dict(input=input)
return _getsignal('output',signals)
return extract_func
So now given I can set up the function with
fn = extract(blocks)
And use it as many times as I like:
list_of_outputs = [fn(i) for i in list_of_inputs]
I should probably also put in some checks for loops, but that is a problem for another day.

For code in many languages, including Python try these Rosetta code links: Topological sort, and Topological sort/Extracted top item.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.