.. note:: :class: sphx-glr-download-link-note Click :ref:`here ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_packages_statistics_auto_examples_solutions_plot_brain_size.py: Relating Gender and IQ ======================= Going back to the brain size + IQ data, test if the VIQ of male and female are different after removing the effect of brain size, height and weight. Notice that here 'Gender' is a categorical value. As it is a non-float data type, statsmodels is able to automatically infer this. .. code-block:: python import pandas from statsmodels.formula.api import ols data = pandas.read_csv('../brain_size.csv', sep=';', na_values='.') model = ols('VIQ ~ Gender + MRI_Count + Height', data).fit() print(model.summary()) # Here, we don't need to define a contrast, as we are testing a single # coefficient of our model, and not a combination of coefficients. # However, defining a contrast, which would then be a 'unit contrast', # will give us the same results print(model.f_test([0, 1, 0, 0])) .. rst-class:: sphx-glr-script-out Out: .. code-block:: none OLS Regression Results ============================================================================== Dep. Variable: VIQ R-squared: 0.246 Model: OLS Adj. R-squared: 0.181 Method: Least Squares F-statistic: 3.809 Date: Thu, 18 Aug 2022 Prob (F-statistic): 0.0184 Time: 10:40:21 Log-Likelihood: -172.34 No. Observations: 39 AIC: 352.7 Df Residuals: 35 BIC: 359.3 Df Model: 3 Covariance Type: nonrobust ================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------- Intercept 166.6258 88.824 1.876 0.069 -13.696 346.948 Gender[T.Male] 8.8524 10.710 0.827 0.414 -12.890 30.595 MRI_Count 0.0002 6.46e-05 2.615 0.013 3.78e-05 0.000 Height -3.0837 1.276 -2.417 0.021 -5.674 -0.494 ============================================================================== Omnibus: 7.373 Durbin-Watson: 2.109 Prob(Omnibus): 0.025 Jarque-Bera (JB): 2.252 Skew: 0.005 Prob(JB): 0.324 Kurtosis: 1.823 Cond. No. 2.40e+07 ============================================================================== Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. [2] The condition number is large, 2.4e+07. This might indicate that there are strong multicollinearity or other numerical problems. Here we plot a scatter matrix to get intuitions on our results. This goes beyond what was asked in the exercise .. code-block:: python # This plotting is useful to get an intuitions on the relationships between # our different variables from pandas.tools import plotting import matplotlib.pyplot as plt # Fill in the missing values for Height for plotting data['Height'].fillna(method='pad', inplace=True) # The parameter 'c' is passed to plt.scatter and will control the color # The same holds for parameters 'marker', 'alpha' and 'cmap', that # control respectively the type of marker used, their transparency and # the colormap plotting.scatter_matrix(data[['VIQ', 'MRI_Count', 'Height']], c=(data['Gender'] == 'Female'), marker='o', alpha=1, cmap='winter') fig = plt.gcf() fig.suptitle("blue: male, green: female", size=13) plt.show() .. image:: /packages/statistics/auto_examples/solutions/images/sphx_glr_plot_brain_size_001.png :class: sphx-glr-single-img **Total running time of the script:** ( 0 minutes 0.328 seconds) .. _sphx_glr_download_packages_statistics_auto_examples_solutions_plot_brain_size.py: .. only :: html .. container:: sphx-glr-footer :class: sphx-glr-footer-example .. container:: sphx-glr-download :download:`Download Python source code: plot_brain_size.py ` .. container:: sphx-glr-download :download:`Download Jupyter notebook: plot_brain_size.ipynb ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_