Linear Mixed Model (LMM) with statsmodels library (syntax suggestion) - python

I have a dataset that I want to analyse with a linear mixed-effects model (LMM). The fixed effect is a binary classification (H and L), and the block effect is Region (A, B and C). In other words, I want to see if Test1 and Test2 predict Code.
Conditions:
Category as the fixed effect and Region as the block effect
Fitting method: REML
My dataset (df):
columns = ['Region', 'Code', 'Code_new', 't/acre', 'Test1', 'Test2']
values = [['FN', 'H', 1, 15.0, 0.712862151688503, 23.5568811267605],
['BOP', 'L', 0, 8.7, 0.587254318046456, 36.4475715254237],
['MN', 'H', 1, 21.4, 0.569632310916364, 36.528769122807],
['FN', 'H', 1, 17.9, 0.79394644935972, 21.3874086075949],
['FN', 'H', 1, 15.0, 0.841279669762641, 23.974678095238],
['BOP', 'L', 0, 8.2, 0.587428337428337, 36.4475715254237],
['MN', 'H', 1, 15.0, 0.613690151101401, 35.8121337704918],
['BOP', 'L', 0, 14.6, 0.679920477137176, 20.078494117647],
['BOP', 'L', 0, 11.9, 0.608206746892878, 32.4547462295081],
['BOP', 'L', 0, 11.1, 0.606286033103961, 31.0558347540983],
['BOP', 'H', 1, 18.3, 0.667314418966187, 31.3314411940298],
['BOP', 'H', 1, 26.5, 0.734909244406922, 26.1845567123287],
['MN', 'H', 1, 16.4, 0.623185442649764, 45.9700361290322],
['FN', 'H', 1, 16.6, 0.849115302352712, 16.8659350588235],
['BOP', 'H', 1, 18.1, 0.5, 13.3120415999999],
['BOP', 'H', 1, 17.6, 0.588509606416713, 23.4305816949152],
['BOP', 'L', 0, 9.4, 0.628374497415278, 26.0064304761904],
['BOP', 'L', 0, 12.3, 0.567562452687358, 23.3544589473684],
['BOP', 'L', 0, 12.0, 0.610088763801688, 15.1082439344262],
['FN', 'H', 1, 19.0, 0.745795716055939, 17.9769450666666],
['BOP', 'H', 1, 19.6, 0.619527896995708, 35.2345187096774],
['BOP', 'L', 0, 11.9, 0.629380902413431, 29.7990349206349],
['MN', 'H', 1, 16.6, 0.627708209103557, 37.3842438095238],
['MN', 'H', 1, 18.9, 0.63996043521266, 29.0667574999999],
['BOP', 'L', 0, 11.5, 0.650010453690152, 38.0719138461538],
['BOP', 'H', 1, 18.1, 0.626588465298142, 21.6720253968253],
['BOP', 'L', 0, 10.1, 0.643243243243243, 31.466765],
['BOP', 'L', 0, 7.6, 0.594365678512244, 36.4475715254237],
['BOP', 'L', 0, 11.0, 0.595460614152203, 34.986776]]
I have been using the statsmodels library; I would appreciate help writing the correct syntax.
My code is :
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
import researchpy as rp
import statsmodels.api as sm
import scipy.stats as stats
model = smf.mixedlm("Code ~ Test + C(Code)", df, groups= 'Region').fit()
model.summary()
The expected p-value for Test1 is 0.66, and for Test2 is 0.15. Those were calculated in Matlab by another person.

Related

List that shows which element has the largest value

I am trying to make a function that will take elements and values input by the user, and list whichever element has the highest value. For example,
['H', 14.5, 'Be', 2.5, 'C', 50.5, 'O', 22.5, 'Mg', 4.0, 'Si', 6.0]
the correct answer is 'C'. I can't seem to figure out how to get this to work. I don't have any code yet, unfortunately.
You can zip the list with itself to get alternating tuples. If you put the number first, you can just use max() to get the largest. This assumes there are no ties:
l = ['H', 14.5, 'Be', 2.5, 'C', 50.5, 'O', 22.5, 'Mg', 4.0, 'Si', 6.0]
num, symbol = max(zip(l[1::2], l[::2]))
# (50.5, 'C')
This works because tuples are compared in order and zipping alternating values gives a collection of tuples like:
list(zip(l[1::2], l[::2]))
# [(14.5, 'H'), (2.5, 'Be'), (50.5, 'C'), (22.5, 'O'), (4.0, 'Mg'), (6.0, 'Si')]
Given:
li=['H', 14.5, 'Be', 2.5, 'C', 50.5, 'O', 22.5, 'Mg', 4.0, 'Si', 6.0]
Create tuples and then take max of those tuples:
>>> max(((li[i],li[i+1]) for i in range(0,len(li),2)), key=lambda t: t[1])
('C', 50.5)
Welcome to StackOverflow!
As far as I understand your question, you are trying to find the maximum element in a list that contains both strings (e.g., 'C') as keys and numbers (e.g., 50.5) as values. For this purpose, a dictionary is more convenient:
dictionary = {'H': 14.5, 'Be': 2.5, 'C': 50.5, 'O': 22.5, 'Mg': 4.0, 'Si': 6.0}
max_key = max(dictionary, key=dictionary.get)
print(max_key)
# 'C'
I hope it helps.
Assuming that each numerical value is positive:
lst = ['H', 14.5, 'Be', 2.5, 'C', 50.5, 'O', 22.5, 'Mg', 4.0, 'Si', 6.0]
index, _ = max(enumerate(lst), key=lambda p: p[1] if isinstance(p[1], float) else 0)
el = lst[index-1]
print(el)
Otherwise, first filter by type and then get the index of the maximal value:
_, index = max(((v, i) for i, v in enumerate(lst) if isinstance(v, float)))
el = lst[index-1]
print(el)
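If ties should be reported rather than assumed away, the slicing idea from the answers above can be wrapped in a small function. This is only a sketch: the name largest_elements and the tie handling are assumptions for illustration, and the input is assumed to alternate symbol, value.

```python
def largest_elements(flat):
    """Return (symbols, value) for the largest value in a flat
    [symbol, value, symbol, value, ...] list. Ties return every symbol."""
    if len(flat) % 2 != 0:
        raise ValueError("expected alternating symbol/value entries")
    symbols = flat[::2]   # even positions hold the element symbols
    values = flat[1::2]   # odd positions hold the numbers
    if not values:
        raise ValueError("empty input")
    top = max(values)
    return [s for s, v in zip(symbols, values) if v == top], top


lst = ['H', 14.5, 'Be', 2.5, 'C', 50.5, 'O', 22.5, 'Mg', 4.0, 'Si', 6.0]
print(largest_elements(lst))
# (['C'], 50.5)
```

Returning a list of symbols keeps the result shape the same whether or not there is a tie.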

Python Plotnine (ggplot) add mean line per color to plot?

Using plotnine in python, I'd like to add dashed horizontal lines to my plot (a scatterplot, but preferably an answer compatible with other plot types) representing the mean for every color separately. I'd like to do so without manually computing the mean values myself or adapting other parts of the data (e.g. adding columns for color values etc).
Additionally, the original plot is generated via a function (make_plot below) and the mean lines are to be added afterwards, yet need to have the same color as the points from which they are derived.
Consider the following as a minimal example:
import pandas as pd
import numpy as np
from plotnine import *
df = pd.DataFrame({'MSE': [0.1, 0.7, 0.5, 0.2, 0.3, 0.4, 0.8, 0.9, 1.0, 0.4, 0.7, 0.9],
                   'Size': ['S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL'],
                   'Number': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]})
def make_plot(df, var_x, var_y, var_fill):
    plot = ggplot(df) + aes(x='Number', y='MSE', fill='Size') + geom_point()
    return plot
plot = make_plot(df, 'Number', 'MSE', 'Size')
I'd like to add 4 lines, one for each Size. The exact same can be done in R using ggplot, as shown by this question. Adding geom_line(stat="hline", yintercept="mean", linetype="dashed") to plot however results in an error PlotnineError: "'stat_hline' Not in Registry. Make sure the module in which it is defined has been imported." that I am unable to resolve.
Answers that can resolve the aforementioned issue, or propose another working solution entirely, are greatly appreciated.
You can do it by first defining the means as a vector and then passing it to geom_hline inside your function:
import pandas as pd
import numpy as np
from plotnine import *
from random import randint
df = pd.DataFrame({'MSE': [0.1, 0.7, 0.5, 0.2, 0.3, 0.4, 0.8, 0.9, 1.0, 0.4, 0.7, 0.9],
                   'Size': ['S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL'],
                   'Number': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]})
a = df.groupby(['Size'])['MSE'].mean()  ### Defining your means
a = list(a)
def make_plot(df, var_x, var_y, var_fill):
    plot = ggplot(df) + aes(x='Number', y='MSE', fill='Size') + geom_point() + geom_hline(yintercept=a, linetype="dashed")
    return plot
plot = make_plot(df, 'Number', 'MSE', 'Size')
which gives:
Note that two of the lines coincide:
a = [0.6666666666666666, 0.5, 0.4666666666666666, 0.6666666666666666]
To add different colors to each dashed line, you can do this:
import pandas as pd
import numpy as np
from plotnine import *
from random import randint
df = pd.DataFrame({'MSE': [0.1, 0.7, 0.5, 0.2, 0.3, 0.4, 0.8, 0.9, 1.0, 0.4, 0.7, 0.9],
                   'Size': ['S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL'],
                   'Number': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]})
### Generate a list of colors of the same length as your categories (Sizes)
color = []
n = len(list(set(df.Size)))
for i in range(n):
    color.append('#%06X' % randint(0, 0xFFFFFF))
######################################################
def make_plot(df, var_x, var_y, var_fill):
    plot = ggplot(df) + aes(x='Number', y='MSE', fill='Size') + geom_point() + geom_hline(yintercept=list(df.groupby(['Size'])['MSE'].mean()), linetype="dashed", color=color)
    return plot
plot = make_plot(df, 'Number', 'MSE', 'Size')
which returns:

Save dataframe as CSV in Python

I am trying to save the result of this code as a CSV file:
import pandas as pd
df = pd.DataFrame({'ID': ['a01', 'a01', 'a01', 'a01', 'a01', 'a01', 'a01', 'a01', 'a01', 'b02', 'b02','b02', 'b02', 'b02', 'b02', 'b02'],
'Row': [1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 3, 3, 3],
'Col': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 3, 1, 3, 1, 2, 3],
'Result': ['p', 'f', 'p', 'p', 'p', 'f', 'p', 'p', 'p', 'p', 'p', 'p', 'f', 'p', 'p', 'p']})
dfs = {}
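for n, g in df.groupby('ID'):
    # Keyword arguments: recent pandas versions no longer accept positional arguments to pivot
    dfs[n] = g.pivot(index='Row', columns='Col', values='Result').fillna('')
    print(f'ID: {n}')
    print(dfs[n])
    print('\n')
    print(dfs[n].stack().value_counts().to_dict())
    print('\n')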
I found several methods and tried to save the output (dictionary form) into a CSV file, but without success. Any thoughts?
P.S. This is one of the methods I found, but I didn't know how to name the column based on my output?
with open("Output.csv", "w", newline="") as csv_file:
    cols = ["???????????"]
    writer = csv.DictWriter(csv_file, fieldnames=cols)
    writer.writeheader()
    writer.writerows(data)
df.to_csv('Output.csv', index = False)
For more details, see:
https://datatofish.com/export-dataframe-to-csv/
https://www.geeksforgeeks.org/saving-a-pandas-dataframe-as-a-csv/
Use the to_csv method provided by the pandas DataFrame object:
df.to_csv()
You can use df.to_csv() to convert your data to csv.
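If the goal is to save the per-ID counts (the dictionaries printed by the loop in the question) rather than the original frame, the column names can be taken from the dictionary keys. Below is a minimal sketch with the standard csv module, assuming the counts have already been collected into a dict of dicts. The names counts_by_id and counts_to_csv and the sample numbers are illustrative, not taken from the question.

```python
import csv
import io


def counts_to_csv(counts_by_id, out):
    """Write one row per ID; columns are 'ID' plus every key seen in the counts."""
    keys = sorted({k for counts in counts_by_id.values() for k in counts})
    # restval=0 fills in a zero when an ID has no entry for some key
    writer = csv.DictWriter(out, fieldnames=["ID"] + keys, restval=0)
    writer.writeheader()
    for id_, counts in counts_by_id.items():
        writer.writerow({"ID": id_, **counts})


# Illustrative counts in the shape produced by value_counts().to_dict()
counts_by_id = {"a01": {"p": 7, "f": 2}, "b02": {"p": 6, "f": 1}}

buffer = io.StringIO()  # swap for open("Output.csv", "w", newline="") to write a file
counts_to_csv(counts_by_id, buffer)
print(buffer.getvalue())
```

Collecting the keys first means the header does not have to be typed by hand, which is the part the question was stuck on.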

Pandas - Add a column level to multi index

I would like to add a sublevel (L4) in my dataframe, based on a list of values:
x = [0.01, 0.01, 0.01, 0.02, 0.02, 0.02]
The df.columns returns me this:
MultiIndex(levels=[['Foo', 'Bar'], ['A', 'B', 'C'], ['a']],
labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2], [0, 0, 0, 0, 0, 0]],
names=['L1', 'L2', 'L3'])
So far I have tried that:
df = pd.concat([df], keys=x, names=['L4'], axis=1).swaplevel(i='L4', j='L1', axis=1).swaplevel(i='L4', j='L2', axis=1).swaplevel(i='L4', j='L3', axis=1)
but it doesn't give the right values: it repeats the first list value (0.01).
Do you have any idea how I can do this?
Thanks
Here's a way:
cols = pd.MultiIndex(levels=[['Foo', 'Bar'], ['A', 'B', 'C'], ['a']],
                     codes=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2], [0, 0, 0, 0, 0, 0]],  # 'labels' was renamed to 'codes' in newer pandas
                     names=['L1', 'L2', 'L3'])
pd.DataFrame(columns = cols).T\
.assign(x = [0.01, 0.01, 0.01, 0.02, 0.02, 0.02])\
.set_index('x', append=True).T
Output:
You can create a DataFrame with the column index as the Index, and the data being the level you want to add, as set_index(append=True) is only defined for the row Index. Then assign it with df.columns = ...
import pandas as pd
idx = pd.MultiIndex(levels=[['Foo', 'Bar'], ['A', 'B', 'C'], ['a']],
codes=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2], [0, 0, 0, 0, 0, 0]],
names=['L1', 'L2', 'L3'])
x = [0.01, 0.01, 0.01, 0.02, 0.02, 0.02]
pd.DataFrame(x, index=idx, columns=['L4']).set_index('L4', append=True).index
#MultiIndex([('Foo', 'A', 'a', 0.01),
# ('Foo', 'B', 'a', 0.01),
# ('Foo', 'C', 'a', 0.01),
# ('Bar', 'A', 'a', 0.02),
# ('Bar', 'B', 'a', 0.02),
# ('Bar', 'C', 'a', 0.02)],
# names=['L1', 'L2', 'L3', 'L4'])
Under the hood set_index just recreates the entire MultiIndex when appending, so a more hands-on approach is
arrays = []
for i in range(idx.nlevels):
    arrays.append(idx.get_level_values(i))
arrays.append(pd.Index(x, name='L4')) # Add the new level
new_idx = pd.MultiIndex.from_arrays(arrays)
#MultiIndex([('Foo', 'A', 'a', 0.01),
# ('Foo', 'B', 'a', 0.01),
# ('Foo', 'C', 'a', 0.01),
# ('Bar', 'A', 'a', 0.02),
# ('Bar', 'B', 'a', 0.02),
# ('Bar', 'C', 'a', 0.02)],
# names=['L1', 'L2', 'L3', 'L4'])

Code appends term to both list for no apparent reason [duplicate]

This question already has answers here:
List of lists changes reflected across sublists unexpectedly
(17 answers)
Closed 6 years ago.
I am trying to write some code that prints something, but it keeps printing something else. Below is the code, what it prints, and what I want it to print.
def speech2text(phonemes, bigrams, trigrams, alpha, topn=10):
    phoneme_list = phonemes.split()
    beam2 = [[['^'],1.0]]
    i = 0
    for phoneme in phoneme_list:
        beam = beam2*len(bigrams[phoneme])
        for value in bigrams[phoneme]:
            beam[i][0].append(value)
            if i == len(beam)-1:
                i = 0
            else:
                i += 1
            print(beam)
from collections import defaultdict
bigrams = defaultdict(dict, {'AH': {'u': 0.4, 'l': 0.2, 'ous': 0.2, 'e': 0.2}, 'IH': {'y': 0.16666666666666666, 'i': 0.6666666666666666, 'e': 0.16666666666666666}, 'AE': {'a': 1.0}, 'K': {'c': 0.4, 'x': 0.2, 'q': 0.2, 'ch': 0.2}, 'H': {}, 'G': {'g': 1.0}, 'SH': {'sh': 1.0}, 'Z': {'se': 1.0}, 'AA': {'o': 1.0}, 'JH': {'ge': 1.0}, 'W': {'u': 0.5, 'w': 0.5}, 'V': {'v': 1.0}, 'M': {'me': 0.2, 'm': 0.8}, 'N': {'ne': 0.2, 'n': 0.8}, 'F': {'f': 1.0}, 'B': {'b': 1.0}, 'D': {'de': 0.16666666666666666, 'dd': 0.16666666666666666, 'd': 0.6666666666666666}, 'OW': {'o': 1.0}, 'L': {'l': 0.8333333333333334, 'e': 0.16666666666666666}, 'T': {'te': 0.16666666666666666, 'tt': 0.08333333333333333, 't': 0.75}, 'EH': {'ea': 0.3333333333333333, 'a': 0.3333333333333333, 'e': 0.3333333333333333}, 'S': {'ss': 0.125, '_': 0.25, 's': 0.625}, 'R': {'re': 0.16666666666666666, 'r': 0.8333333333333334}, 'ER': {'or': 0.25, 'er': 0.75}, 'EY': {'ai': 0.2, 'a': 0.8}, 'P': {'p': 1.0}, 'IY': {'y': 0.5, 'e': 0.5}, 'AY': {'i': 1.0}})
trigrams = defaultdict(dict, {('T', 'u'): {'tt': 1.0}, ('S', '^'): {'s': 1.0}, ('D', '^'): {'d': 1.0}, ('K', 'e'): {'x': 1.0}, ('M', '^'): {'m': 1.0}, ('T', 'a'): {'te': 1.0}, ('S', 'x'): {'_': 1.0}, ('T', 'o'): {'t': 1.0}, ('T', 's'): {'t': 1.0}, ('AA', 'm'): {'o': 1.0}, ('IH', '^'): {'i': 0.6666666666666666, 'e': 0.3333333333333333}, ('D', 'n'): {'d': 1.0}, ('B', 'o'): {'b': 1.0}, ('IY', 'f'): {'e': 1.0}, ('K', 'i'): {'c': 1.0}, ('K', '^'): {'c': 0.3333333333333333, 'ch': 0.3333333333333333, 'q': 0.3333333333333333}, ('IH', 't'): {'i': 1.0}, ('S', 'or'): {'s': 1.0}, ('R', 'ch'): {'r': 1.0}, ('D', 'l'): {'d': 1.0}, ('IY', 'r'): {'y': 0.5, 'e': 0.5}, ('IH', 'm'): {'y': 1.0}, ('L', 'c'): {'l': 1.0}, ('EH', 'd'): {'a': 0.5, 'e': 0.5}, ('G', 'o'): {'g': 1.0}, ('V', 'n'): {'v': 1.0}, ('AE', 's'): {'a': 1.0}, ('S', 'y'): {'s': 1.0}, ('OW', 'r'): {'o': 1.0}, ('L', 'e'): {'l': 1.0}, ('N', 'i'): {'ne': 0.3333333333333333, 'n': 0.6666666666666666}, ('OW', 'l'): {'o': 1.0}, ('Z', 'n'): {'se': 1.0}, ('ER', 'm'): {'er': 1.0}, ('P', '^'): {'p': 1.0}, ('IH', 'u'): {'i': 1.0}, ('R', 'a'): {'re': 1.0}, ('R', '^'): {'r': 1.0}, ('T', 'e'): {'t': 1.0}, ('L', 'l'): {'e': 1.0}, ('EY', 't'): {'ai': 0.5, 'a': 0.5}, ('AY', 'l'): {'i': 1.0}, ('EY', 'b'): {'a': 1.0}, ('IY', 't'): {'y': 1.0}, ('ER', 'n'): {'er': 1.0}, ('OW', '^'): {'o': 1.0}, ('M', 'o'): {'me': 1.0}, ('S', 'u'): {'s': 1.0}, ('OW', 'g'): {'o': 1.0}, ('W', 'q'): {'u': 1.0}, ('T', '^'): {'t': 1.0}, ('S', 'ous'): {'_': 1.0}, ('AH', 'b'): {'u': 1.0}, ('EH', 'l'): {'ea': 1.0}, ('OW', 'm'): {'o': 1.0}, ('M', 'e'): {'m': 1.0}, ('EY', 'v'): {'a': 1.0}, ('EY', 'p'): {'a': 1.0}, ('AH', 'er'): {'ous': 1.0}, ('JH', 'er'): {'ge': 1.0}, ('ER', 'tt'): {'er': 1.0}, ('R', 't'): {'r': 1.0}, ('L', '^'): {'l': 1.0}, ('B', 'e'): {'b': 1.0}, ('SH', '^'): {'sh': 1.0}, ('ER', 'w'): {'or': 1.0}, ('W', '^'): {'w': 1.0}, ('T', 'i'): {'t': 1.0}, ('L', 'o'): {'l': 1.0}, ('B', '^'): {'b': 1.0}, ('F', '^'): {'f': 1.0}, ('AH', 'r'): {'u': 1.0}, ('L', 'ai'): 
{'l': 1.0}, ('N', 'ea'): {'n': 1.0}, ('AH', 'dd'): {'l': 1.0}, ('S', 'a'): {'ss': 0.5, 's': 0.5}, ('AH', 'd'): {'e': 1.0}, ('N', 'o'): {'n': 1.0}, ('AE', 'b'): {'a': 1.0}, ('AA', 'sh'): {'o': 1.0}, ('D', 'a'): {'de': 0.5, 'dd': 0.5}})
speech2text("M IH T", bigrams, trigrams, alpha=0.5)
Here is what it prints
[[['^', 'm'], 1.0], [['^', 'm'], 1.0]]
[[['^', 'm', 'me'], 1.0], [['^', 'm', 'me'], 1.0]]
...... and so on
Here is what I want it to print
[[['^', 'm'], 1.0], [['^', 'me'], 1.0]]
...... and so on
Basically, why is it appending the term to both lists? I thought it had something to do with beam and beam2 'pointing' to the same list, so I tried beam2 = beam2*len(bigrams[phoneme]) followed by beam = list(beam2), which I believe makes them point to two separate lists in memory, but maybe it does not?
Thanks for your help
EDIT:
So after some help from Gassa, my code now looks like this but I have a new problem:
def speech2text(phonemes, bigrams, trigrams, alpha, topn=10):
    phoneme_list = phonemes.split()
    beam2 = [[['^'],1.0]]
    i = 0
    for phoneme in phoneme_list:
        beam = [[[['^'],1.0]] for k in range (len(bigrams[phoneme]))]
        for value in bigrams[phoneme]:
            beam[i][0].append(value)
            if i == len(beam)-1:
                i = 0
            else:
                i += 1
        beam2 = beam
        print(beam2)
Here it prints beam2, which contains two sets, then three, then three, when really I need it to contain two, then six, then 18 sets. That would work with this code:
def speech2text(phonemes, bigrams, trigrams, alpha, topn=10):
    phoneme_list = phonemes.split()
    beam2 = [[['^'],1.0]]
    i = 0
    for phoneme in phoneme_list:
        beam = [beam2 for k in range (len(bigrams[phoneme]))]
        for value in bigrams[phoneme]:
            beam[i][0].append(value)
            if i == len(beam)-1:
                i = 0
            else:
                i += 1
        beam2 = beam
        print(beam2)
But then of course we are back to the original problem.
Thanks again for your help!
The line
beam = beam2*len(bigrams[phoneme])
creates the list beam as len(bigrams[phoneme]) references to one and the same list beam2[0].
You can instead use a line like
beam = [[['^'],1.0] for k in range (len(bigrams[phoneme]))]
Note that beam2 is no longer used.
This way, you get the output
[[['^', 'me'], 1.0], [['^'], 1.0]]
[[['^', 'me'], 1.0], [['^', 'm'], 1.0]]
...
Which is not exactly what you want, but at least the contents of beam are different lists now.
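The difference between the two constructions can be checked directly with identity comparisons. A small sketch, independent of the question's data (the names shared and separate are illustrative):

```python
beam2 = [[['^'], 1.0]]

# List multiplication repeats references to the same inner object.
shared = beam2 * 2
print(shared[0] is shared[1])      # True: both slots are the same list

shared[0][0].append('m')
print(shared)                      # the change shows up in both slots

# A comprehension evaluates the literal once per iteration,
# so every slot gets its own fresh lists.
separate = [[['^'], 1.0] for _ in range(2)]
print(separate[0] is separate[1])  # False

separate[0][0].append('m')
print(separate)                    # only the first slot changes
```

Note that list(beam2) makes only a shallow copy: the outer list is new, but the inner lists are still shared, which is why that attempt did not help.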
EDIT:
As for the second part of your problem, this code seems to do what you want:
import copy

def speech2text(phonemes, bigrams, trigrams, alpha, topn=10):
    phoneme_list = phonemes.split()
    beam2 = [[['^'],1.0]]
    i = 0
    for phoneme in phoneme_list:
        # One independent copy of every existing entry per candidate value
        beam = [copy.deepcopy (j) for j in beam2 for k in range (len(bigrams[phoneme]))]
        for j in range (len (beam2)):
            for value in bigrams[phoneme]:
                beam[i][0].append(value)
                if i == len(beam)-1:
                    i = 0
                else:
                    i += 1
        beam2 = beam
        print(beam2)
The copy.deepcopy part ensures that all lists inside lists are copied properly, and you don't have to deal with the copying yourself.
The for j in beam2 for k in range part is to put all the contents into the same list, not as a list of lists.
The new for j in range (len (beam2)): part is to apply your changes to the whole beam, not only to its prefix.
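The same expansion can also be written without the running index i, by building each new path from an old path plus one candidate. This is an alternative sketch rather than the code from the answer: the function name expand_beam is hypothetical, the scores are carried over unchanged, and only a three-phoneme excerpt of the question's bigrams is used.

```python
def expand_beam(phonemes, bigrams):
    """Extend every path in the beam with every candidate spelling."""
    beam = [[['^'], 1.0]]
    for phoneme in phonemes.split():
        # path + [value] builds a new list, so no two entries share state.
        beam = [
            [path + [value], score]
            for path, score in beam
            for value in bigrams[phoneme]
        ]
        print(len(beam))
    return beam


# Excerpt of the bigrams table from the question
bigrams = {
    'M': {'me': 0.2, 'm': 0.8},
    'IH': {'y': 0.16666666666666666, 'i': 0.6666666666666666, 'e': 0.16666666666666666},
    'T': {'te': 0.16666666666666666, 'tt': 0.08333333333333333, 't': 0.75},
}
beam = expand_beam("M IH T", bigrams)
# prints 2, then 6, then 18
```

Because concatenation creates a new list each time, no deepcopy is needed here. That would change if the paths held mutable items instead of strings.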
