.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_gallery/stat_univ_lab_brain-volume.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_gallery_stat_univ_lab_brain-volume.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_gallery_stat_univ_lab_brain-volume.py:


Hands-On: Brain volumes study
=============================

The study provides the brain volumes of grey matter (gm), white matter (wm)
and cerebrospinal fluid) (csf) of 808 anatomical MRI scans.

.. GENERATED FROM PYTHON SOURCE LINES 8-29

.. code-block:: Python


    import os
    import tempfile
    import urllib.request

    import pandas as pd

    # Stat
    import statsmodels.formula.api as smfrmla
    import statsmodels.api as sm
    import scipy.stats

    # Plot
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Plot parameters
    plt.style.use('seaborn-v0_8-whitegrid')
    fig_w, fig_h = plt.rcParams.get('figure.figsize')
    plt.rcParams['figure.figsize'] = (fig_w, fig_h * .5)


.. GENERATED FROM PYTHON SOURCE LINES 30-32

Manipulate data
---------------

.. GENERATED FROM PYTHON SOURCE LINES 34-38

Set the working directory within a directory called "brainvol"

Create 2 subdirectories: `data` that will contain downloaded data and
`reports` for results of the analysis.

.. GENERATED FROM PYTHON SOURCE LINES 38-48

.. code-block:: Python


    WD = os.path.join(tempfile.gettempdir(), "brainvol")
    os.makedirs(WD, exist_ok=True)
    #os.chdir(WD)

    # use cookiecutter file organization
    # https://drivendata.github.io/cookiecutter-data-science/
    os.makedirs(os.path.join(WD, "data"), exist_ok=True)
    #os.makedirs("reports", exist_ok=True)


.. GENERATED FROM PYTHON SOURCE LINES 49-57

**Fetch data**

- Demographic data `demo.csv` (columns: `participant_id`, `site`, `group`,
  `age`, `sex`) and tissue volume data: `group` is Control or Patient.
  `site` is the recruiting site.
- Gray matter volume `gm.csv` (columns: `participant_id`, `session`, `gm_vol`)
- White matter volume `wm.csv` (columns: `participant_id`, `session`, `wm_vol`)
- Cerebrospinal Fluid `csf.csv` (columns: `participant_id`, `session`, `csf_vol`)

.. GENERATED FROM PYTHON SOURCE LINES 57-75

.. code-block:: Python


    base_url = 'https://github.com/duchesnay/pystatsml/raw/master/datasets/brain_volumes/%s'
    data = dict()
    for file in ["demo.csv", "gm.csv", "wm.csv", "csf.csv"]:
        urllib.request.urlretrieve(base_url % file, os.path.join(WD, "data", file))

    # Read all CSV in one line
    # dicts = {k: pd.read_csv(os.path.join(WD, "data", "%s.csv" % k))
    #          for k in ["demo", "gm", "wm", "csf"]}

    demo = pd.read_csv(os.path.join(WD, "data", "demo.csv"))
    gm = pd.read_csv(os.path.join(WD, "data", "gm.csv"))
    wm = pd.read_csv(os.path.join(WD, "data", "wm.csv"))
    csf = pd.read_csv(os.path.join(WD, "data", "csf.csv"))

    print("tables can be merge using shared columns")
    print(gm.head())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    tables can be merge using shared columns
      participant_id session    gm_vol
    0    sub-S1-0002  ses-01  0.672506
    1    sub-S1-0002  ses-02  0.678772
    2    sub-S1-0002  ses-03  0.665592
    3    sub-S1-0004  ses-01  0.890714
    4    sub-S1-0004  ses-02  0.881127


.. GENERATED FROM PYTHON SOURCE LINES 76-77

**Merge tables** according to `participant_id`

.. GENERATED FROM PYTHON SOURCE LINES 77-81

.. code-block:: Python


    brain_vol = pd.merge(pd.merge(pd.merge(demo, gm), wm), csf)
    assert brain_vol.shape == (808, 9)


.. GENERATED FROM PYTHON SOURCE LINES 82-83

**Drop rows with missing values**

.. GENERATED FROM PYTHON SOURCE LINES 83-87

.. code-block:: Python


    brain_vol = brain_vol.dropna()
    assert brain_vol.shape == (766, 9)


.. GENERATED FROM PYTHON SOURCE LINES 88-90

**Compute Total Intra-cranial volume**
`tiv_vol` = `gm_vol` + `csf_vol` + `wm_vol`.

.. GENERATED FROM PYTHON SOURCE LINES 90-94

.. code-block:: Python


    brain_vol["tiv_vol"] = brain_vol["gm_vol"] + \
        brain_vol["wm_vol"] + brain_vol["csf_vol"]


.. GENERATED FROM PYTHON SOURCE LINES 95-97

**Compute tissue fractions**
`gm_f = gm_vol / tiv_vol`, `wm_f  = wm_vol / tiv_vol`.

.. GENERATED FROM PYTHON SOURCE LINES 97-101

.. code-block:: Python


    brain_vol["gm_f"] = brain_vol["gm_vol"] / brain_vol["tiv_vol"]
    brain_vol["wm_f"] = brain_vol["wm_vol"] / brain_vol["tiv_vol"]


.. GENERATED FROM PYTHON SOURCE LINES 102-103

**Save in a excel file** `brain_vol.xlsx`

.. GENERATED FROM PYTHON SOURCE LINES 103-107

.. code-block:: Python


    brain_vol.to_excel(os.path.join(WD, "data", "brain_vol.xlsx"),
                       sheet_name='data', index=False)


.. GENERATED FROM PYTHON SOURCE LINES 108-110

Descriptive Statistics
----------------------

.. GENERATED FROM PYTHON SOURCE LINES 112-113

Load excel file `brain_vol.xlsx`

.. GENERATED FROM PYTHON SOURCE LINES 113-121

.. code-block:: Python


    brain_vol = pd.read_excel(os.path.join(WD, "data", "brain_vol.xlsx"),
                              sheet_name='data')
    # Round float at 2 decimals when printing
    pd.options.display.float_format = '{:,.2f}'.format


.. GENERATED FROM PYTHON SOURCE LINES 122-125

**Descriptive statistics**
Most of participants have several MRI sessions (column `session`)
Select on rows from session one "ses-01"

.. GENERATED FROM PYTHON SOURCE LINES 125-131

.. code-block:: Python


    brain_vol1 = brain_vol[brain_vol.session == "ses-01"]
    # Check that there are no duplicates
    assert len(brain_vol1.participant_id.unique()) == len(brain_vol1.participant_id)


.. GENERATED FROM PYTHON SOURCE LINES 132-133

Global descriptives statistics of numerical variables

.. GENERATED FROM PYTHON SOURCE LINES 133-138

.. code-block:: Python


    desc_glob_num = brain_vol1.describe()
    print(desc_glob_num)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

             age  gm_vol  wm_vol  csf_vol  tiv_vol   gm_f   wm_f
    count 244.00  244.00  244.00   244.00   244.00 244.00 244.00
    mean   34.54    0.71    0.44     0.31     1.46   0.49   0.30
    std    12.09    0.08    0.07     0.08     0.17   0.04   0.03
    min    18.00    0.48    0.05     0.12     0.83   0.37   0.06
    25%    25.00    0.66    0.40     0.25     1.34   0.46   0.28
    50%    31.00    0.70    0.43     0.30     1.45   0.49   0.30
    75%    44.00    0.77    0.48     0.37     1.57   0.52   0.31
    max    61.00    1.03    0.62     0.63     2.06   0.60   0.36


.. GENERATED FROM PYTHON SOURCE LINES 139-140

Global Descriptive statistics of categorical variable

.. GENERATED FROM PYTHON SOURCE LINES 140-150

.. code-block:: Python


    desc_glob_cat = brain_vol1[["site", "group", "sex"]].describe(include='all')
    print(desc_glob_cat)

    print("Get count by level")
    desc_glob_cat = pd.DataFrame({col: brain_vol1[col].value_counts().to_dict()
                                 for col in ["site", "group", "sex"]})
    print(desc_glob_cat)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

           site    group  sex
    count   244      244  244
    unique    7        2    2
    top      S7  Patient    M
    freq     65      157  155
    Get count by level
             site  group    sex
    S7      65.00    NaN    NaN
    S5      62.00    NaN    NaN
    S8      59.00    NaN    NaN
    S3      29.00    NaN    NaN
    S4      15.00    NaN    NaN
    S1      13.00    NaN    NaN
    S6       1.00    NaN    NaN
    Patient   NaN 157.00    NaN
    Control   NaN  87.00    NaN
    M         NaN    NaN 155.00
    F         NaN    NaN  89.00


.. GENERATED FROM PYTHON SOURCE LINES 151-152

Remove the single participant from site 6

.. GENERATED FROM PYTHON SOURCE LINES 152-160

.. code-block:: Python


    brain_vol = brain_vol[brain_vol.site != "S6"]
    brain_vol1 = brain_vol[brain_vol.session == "ses-01"]
    desc_glob_cat = pd.DataFrame({col: brain_vol1[col].value_counts().to_dict()
                                 for col in ["site", "group", "sex"]})
    print(desc_glob_cat)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

             site  group    sex
    S7      65.00    NaN    NaN
    S5      62.00    NaN    NaN
    S8      59.00    NaN    NaN
    S3      29.00    NaN    NaN
    S4      15.00    NaN    NaN
    S1      13.00    NaN    NaN
    Patient   NaN 157.00    NaN
    Control   NaN  86.00    NaN
    M         NaN    NaN 155.00
    F         NaN    NaN  88.00


.. GENERATED FROM PYTHON SOURCE LINES 161-162

Descriptives statistics of numerical variables per clinical status

.. GENERATED FROM PYTHON SOURCE LINES 162-166

.. code-block:: Python

    desc_group_num = brain_vol1[["group", 'gm_vol']].groupby("group").describe()
    print(desc_group_num)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

            gm_vol                                   
             count mean  std  min  25%  50%  75%  max
    group                                            
    Control  86.00 0.72 0.09 0.48 0.66 0.71 0.78 1.03
    Patient 157.00 0.70 0.08 0.53 0.65 0.70 0.76 0.90


.. GENERATED FROM PYTHON SOURCE LINES 167-179

Statistics
----------

Objectives:

1. Site effect of gray matter atrophy
2. Test the association between the age and gray matter atrophy in the control
   and patient population independently.
3. Test for differences of atrophy between the patients and the controls
4. Test for interaction between age and clinical status, ie: is the brain
   atrophy process in patient population faster than in the control population.
5. The effect of the medication in the patient population.

.. GENERATED FROM PYTHON SOURCE LINES 182-193

**1 Site effect on Grey Matter atrophy**

The model  is Oneway Anova gm_f ~ site
The ANOVA test has important assumptions that must be satisfied in order
for the associated p-value to be valid.

- The samples are independent.
- Each sample is from a normally distributed population.
- The population standard deviations of the groups are all equal.
  This property is known as homoscedasticity.


.. GENERATED FROM PYTHON SOURCE LINES 195-196

Plot

.. GENERATED FROM PYTHON SOURCE LINES 196-201

.. code-block:: Python

    sns.violinplot(x="site", y="gm_f", data=brain_vol1)
    # sns.violinplot(x="site", y="wm_f", data=brain_vol1)

    brain_vol1.groupby('site')['age'].describe()


.. image-sg:: /auto_gallery/images/sphx_glr_stat_univ_lab_brain-volume_001.png
   :alt: stat univ lab brain volume
   :srcset: /auto_gallery/images/sphx_glr_stat_univ_lab_brain-volume_001.png
   :class: sphx-glr-single-img


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>count</th>
          <th>mean</th>
          <th>std</th>
          <th>min</th>
          <th>25%</th>
          <th>50%</th>
          <th>75%</th>
          <th>max</th>
        </tr>
        <tr>
          <th>site</th>
          <th></th>
          <th></th>
          <th></th>
          <th></th>
          <th></th>
          <th></th>
          <th></th>
          <th></th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>S1</th>
          <td>13.00</td>
          <td>32.31</td>
          <td>11.59</td>
          <td>20.00</td>
          <td>25.00</td>
          <td>29.00</td>
          <td>33.00</td>
          <td>57.00</td>
        </tr>
        <tr>
          <th>S3</th>
          <td>29.00</td>
          <td>38.28</td>
          <td>13.09</td>
          <td>18.00</td>
          <td>25.00</td>
          <td>37.00</td>
          <td>51.00</td>
          <td>59.00</td>
        </tr>
        <tr>
          <th>S4</th>
          <td>15.00</td>
          <td>37.20</td>
          <td>13.78</td>
          <td>20.00</td>
          <td>23.50</td>
          <td>37.00</td>
          <td>50.00</td>
          <td>56.00</td>
        </tr>
        <tr>
          <th>S5</th>
          <td>62.00</td>
          <td>35.42</td>
          <td>12.61</td>
          <td>18.00</td>
          <td>25.25</td>
          <td>31.50</td>
          <td>44.00</td>
          <td>61.00</td>
        </tr>
        <tr>
          <th>S7</th>
          <td>65.00</td>
          <td>34.51</td>
          <td>11.85</td>
          <td>19.00</td>
          <td>24.00</td>
          <td>31.00</td>
          <td>45.00</td>
          <td>59.00</td>
        </tr>
        <tr>
          <th>S8</th>
          <td>59.00</td>
          <td>31.59</td>
          <td>10.69</td>
          <td>18.00</td>
          <td>24.00</td>
          <td>29.00</td>
          <td>34.50</td>
          <td>60.00</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 202-203

Stats with scipy

.. GENERATED FROM PYTHON SOURCE LINES 203-208

.. code-block:: Python


    fstat, pval = scipy.stats.f_oneway(*[brain_vol1.gm_f[brain_vol1.site == s]
                                       for s in brain_vol1.site.unique()])
    print("Oneway Anova gm_f ~ site F=%.2f, p-value=%E" % (fstat, pval))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Oneway Anova gm_f ~ site F=14.82, p-value=1.188136E-12


.. GENERATED FROM PYTHON SOURCE LINES 209-210

Stats with statsmodels

.. GENERATED FROM PYTHON SOURCE LINES 210-218

.. code-block:: Python


    anova = smfrmla.ols("gm_f ~ site", data=brain_vol1).fit()
    # print(anova.summary())
    print("Site explains %.2f%% of the grey matter fraction variance" %
          (anova.rsquared * 100))

    print(sm.stats.anova_lm(anova, typ=2))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Site explains 23.82% of the grey matter fraction variance
              sum_sq     df     F  PR(>F)
    site        0.11   5.00 14.82    0.00
    Residual    0.35 237.00   NaN     NaN


.. GENERATED FROM PYTHON SOURCE LINES 219-221

**2. Test the association between the age and gray matter atrophy** in the
control and patient population independently.

.. GENERATED FROM PYTHON SOURCE LINES 223-224

Plot

.. GENERATED FROM PYTHON SOURCE LINES 224-230

.. code-block:: Python


    sns.lmplot(x="age", y="gm_f", hue="group", data=brain_vol1)

    brain_vol1_ctl = brain_vol1[brain_vol1.group == "Control"]
    brain_vol1_pat = brain_vol1[brain_vol1.group == "Patient"]


.. image-sg:: /auto_gallery/images/sphx_glr_stat_univ_lab_brain-volume_002.png
   :alt: stat univ lab brain volume
   :srcset: /auto_gallery/images/sphx_glr_stat_univ_lab_brain-volume_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 231-232

Stats with scipy

.. GENERATED FROM PYTHON SOURCE LINES 232-251

.. code-block:: Python


    print("--- In control population ---")
    beta, beta0, r_value, p_value, std_err = \
        scipy.stats.linregress(x=brain_vol1_ctl.age, y=brain_vol1_ctl.gm_f)

    print("gm_f = %f * age + %f" % (beta, beta0))
    print("Corr: %f, r-squared: %f, p-value: %f, std_err: %f"\
          % (r_value, r_value**2, p_value, std_err))

    print("--- In patient population ---")
    beta, beta0, r_value, p_value, std_err = \
        scipy.stats.linregress(x=brain_vol1_pat.age, y=brain_vol1_pat.gm_f)

    print("gm_f = %f * age + %f" % (beta, beta0))
    print("Corr: %f, r-squared: %f, p-value: %f, std_err: %f"\
          % (r_value, r_value**2, p_value, std_err))

    print("Decrease seems faster in patient than in control population")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    --- In control population ---
    gm_f = -0.001181 * age + 0.529829
    Corr: -0.325122, r-squared: 0.105704, p-value: 0.002255, std_err: 0.000375
    --- In patient population ---
    gm_f = -0.001899 * age + 0.556886
    Corr: -0.528765, r-squared: 0.279592, p-value: 0.000000, std_err: 0.000245
    Decrease seems faster in patient than in control population


.. GENERATED FROM PYTHON SOURCE LINES 252-253

Stats with statsmodels

.. GENERATED FROM PYTHON SOURCE LINES 253-266

.. code-block:: Python


    print("--- In control population ---")
    lr = smfrmla.ols("gm_f ~ age", data=brain_vol1_ctl).fit()
    print(lr.summary())
    print("Age explains %.2f%% of the grey matter fraction variance" %
          (lr.rsquared * 100))

    print("--- In patient population ---")
    lr = smfrmla.ols("gm_f ~ age", data=brain_vol1_pat).fit()
    print(lr.summary())
    print("Age explains %.2f%% of the grey matter fraction variance" %
          (lr.rsquared * 100))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    --- In control population ---
                                OLS Regression Results                            
    ==============================================================================
    Dep. Variable:                   gm_f   R-squared:                       0.106
    Model:                            OLS   Adj. R-squared:                  0.095
    Method:                 Least Squares   F-statistic:                     9.929
    Date:               jeu., 08 mai 2025   Prob (F-statistic):            0.00226
    Time:                        02:16:34   Log-Likelihood:                 159.34
    No. Observations:                  86   AIC:                            -314.7
    Df Residuals:                      84   BIC:                            -309.8
    Df Model:                           1                                         
    Covariance Type:            nonrobust                                         
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    Intercept      0.5298      0.013     40.350      0.000       0.504       0.556
    age           -0.0012      0.000     -3.151      0.002      -0.002      -0.000
    ==============================================================================
    Omnibus:                        0.946   Durbin-Watson:                   1.628
    Prob(Omnibus):                  0.623   Jarque-Bera (JB):                0.782
    Skew:                           0.233   Prob(JB):                        0.676
    Kurtosis:                       2.962   Cond. No.                         111.
    ==============================================================================

    Notes:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
    Age explains 10.57% of the grey matter fraction variance
    --- In patient population ---
                                OLS Regression Results                            
    ==============================================================================
    Dep. Variable:                   gm_f   R-squared:                       0.280
    Model:                            OLS   Adj. R-squared:                  0.275
    Method:                 Least Squares   F-statistic:                     60.16
    Date:               jeu., 08 mai 2025   Prob (F-statistic):           1.09e-12
    Time:                        02:16:34   Log-Likelihood:                 289.38
    No. Observations:                 157   AIC:                            -574.8
    Df Residuals:                     155   BIC:                            -568.7
    Df Model:                           1                                         
    Covariance Type:            nonrobust                                         
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    Intercept      0.5569      0.009     60.817      0.000       0.539       0.575
    age           -0.0019      0.000     -7.756      0.000      -0.002      -0.001
    ==============================================================================
    Omnibus:                        2.310   Durbin-Watson:                   1.325
    Prob(Omnibus):                  0.315   Jarque-Bera (JB):                1.854
    Skew:                           0.230   Prob(JB):                        0.396
    Kurtosis:                       3.268   Cond. No.                         111.
    ==============================================================================

    Notes:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
    Age explains 27.96% of the grey matter fraction variance


.. GENERATED FROM PYTHON SOURCE LINES 267-270

Before testing for differences of atrophy between the patients ans the controls
**Preliminary tests for age x group effect** (patients would be older or
younger than Controls)

.. GENERATED FROM PYTHON SOURCE LINES 272-273

Plot

.. GENERATED FROM PYTHON SOURCE LINES 273-276

.. code-block:: Python


    sns.violinplot(x="group", y="age", data=brain_vol1)


.. image-sg:: /auto_gallery/images/sphx_glr_stat_univ_lab_brain-volume_003.png
   :alt: stat univ lab brain volume
   :srcset: /auto_gallery/images/sphx_glr_stat_univ_lab_brain-volume_003.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <Axes: xlabel='group', ylabel='age'>


.. GENERATED FROM PYTHON SOURCE LINES 277-278

Stats with scipy

.. GENERATED FROM PYTHON SOURCE LINES 278-281

.. code-block:: Python


    print(scipy.stats.ttest_ind(brain_vol1_ctl.age, brain_vol1_pat.age))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TtestResult(statistic=np.float64(-1.2155557697674162), pvalue=np.float64(0.225343592508479), df=np.float64(241.0))


.. GENERATED FROM PYTHON SOURCE LINES 282-283

Stats with statsmodels

.. GENERATED FROM PYTHON SOURCE LINES 283-287

.. code-block:: Python


    print(smfrmla.ols("age ~ group", data=brain_vol1).fit().summary())
    print("No significant difference in age between patients and controls")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

                                OLS Regression Results                            
    ==============================================================================
    Dep. Variable:                    age   R-squared:                       0.006
    Model:                            OLS   Adj. R-squared:                  0.002
    Method:                 Least Squares   F-statistic:                     1.478
    Date:               jeu., 08 mai 2025   Prob (F-statistic):              0.225
    Time:                        02:16:34   Log-Likelihood:                -949.69
    No. Observations:                 243   AIC:                             1903.
    Df Residuals:                     241   BIC:                             1910.
    Df Model:                           1                                         
    Covariance Type:            nonrobust                                         
    ====================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------------
    Intercept           33.2558      1.305     25.484      0.000      30.685      35.826
    group[T.Patient]     1.9735      1.624      1.216      0.225      -1.225       5.172
    ==============================================================================
    Omnibus:                       35.711   Durbin-Watson:                   2.096
    Prob(Omnibus):                  0.000   Jarque-Bera (JB):               20.726
    Skew:                           0.569   Prob(JB):                     3.16e-05
    Kurtosis:                       2.133   Cond. No.                         3.12
    ==============================================================================

    Notes:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
    No significant difference in age between patients and controls


.. GENERATED FROM PYTHON SOURCE LINES 288-290

**Preliminary tests for sex x group** (more/less males in patients than
in Controls)

.. GENERATED FROM PYTHON SOURCE LINES 290-300

.. code-block:: Python


    crosstab = pd.crosstab(brain_vol1.sex, brain_vol1.group)
    print("Obeserved contingency table")
    print(crosstab)

    chi2, pval, dof, expected = scipy.stats.chi2_contingency(crosstab)

    print("Chi2 = %f, pval = %f" % (chi2, pval))
    print("No significant difference in sex between patients and controls")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Obeserved contingency table
    group  Control  Patient
    sex                    
    F           33       55
    M           53      102
    Chi2 = 0.143253, pval = 0.705068
    No significant difference in sex between patients and controls


.. GENERATED FROM PYTHON SOURCE LINES 301-302

**3. Test for differences of atrophy between the patients and the controls**

.. GENERATED FROM PYTHON SOURCE LINES 302-307

.. code-block:: Python


    print(sm.stats.anova_lm(smfrmla.ols("gm_f ~ group", data=brain_vol1).fit(),
                            typ=2))
    print("No significant difference in atrophy between patients and controls")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

              sum_sq     df    F  PR(>F)
    group       0.00   1.00 0.01    0.92
    Residual    0.46 241.00  NaN     NaN
    No significant difference in atrophy between patients and controls


.. GENERATED FROM PYTHON SOURCE LINES 308-309

This model is simplistic we should adjust for age and site

.. GENERATED FROM PYTHON SOURCE LINES 309-314

.. code-block:: Python


    print(sm.stats.anova_lm(smfrmla.ols(
            "gm_f ~ group + age + site", data=brain_vol1).fit(), typ=2))
    print("No significant difference in GM between patients and controls")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

              sum_sq     df     F  PR(>F)
    group       0.00   1.00  1.82    0.18
    site        0.11   5.00 19.79    0.00
    age         0.09   1.00 86.86    0.00
    Residual    0.25 235.00   NaN     NaN
    No significant difference in GM between patients and controls


.. GENERATED FROM PYTHON SOURCE LINES 315-316

Observe age effect

.. GENERATED FROM PYTHON SOURCE LINES 318-320

**4. Test for interaction between age and clinical status**, ie: is the brain
atrophy process in patient population faster than in the control population.

.. GENERATED FROM PYTHON SOURCE LINES 320-332

.. code-block:: Python


    ancova = smfrmla.ols("gm_f ~ group:age + age + site", data=brain_vol1).fit()
    print(sm.stats.anova_lm(ancova, typ=2))

    print("= Parameters =")
    print(ancova.params)

    print("%.3f%% of grey matter loss per year (almost %.1f%% per decade)" %
          (ancova.params.age * 100, ancova.params.age * 100 * 10))

    print("grey matter loss in patients is accelerated by %.3f%% per decade" %
          (ancova.params['group[T.Patient]:age'] * 100 * 10))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

               sum_sq     df     F  PR(>F)
    site         0.11   5.00 20.28    0.00
    age          0.10   1.00 89.37    0.00
    group:age    0.00   1.00  3.28    0.07
    Residual     0.25 235.00   NaN     NaN
    = Parameters =
    Intercept               0.52
    site[T.S3]              0.01
    site[T.S4]              0.03
    site[T.S5]              0.01
    site[T.S7]              0.06
    site[T.S8]              0.02
    age                    -0.00
    group[T.Patient]:age   -0.00
    dtype: float64
    -0.148% of grey matter loss per year (almost -1.5% per decade)
    grey matter loss in patients is accelerated by -0.232% per decade


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 2.668 seconds)


.. _sphx_glr_download_auto_gallery_stat_univ_lab_brain-volume.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: stat_univ_lab_brain-volume.ipynb <stat_univ_lab_brain-volume.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: stat_univ_lab_brain-volume.py <stat_univ_lab_brain-volume.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: stat_univ_lab_brain-volume.zip <stat_univ_lab_brain-volume.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_