doc/contributing.rst (1 addition & 1 deletion)
@@ -66,7 +66,7 @@ The RSMTool codebase enforces a certain code style via pre-commit checks and thi
Rather than doing this grouping and sorting manually, we use the `isort <https://pycqa.github.io/isort/>`_ pre-commit hook to achieve this.
- #. All classes, functions, and methods in the main code files have `numpy-formatted docstrings <https://numpydoc.readthedocs.io/en/latest/format.html>`_ that comply with `PEP 257 <https://www.python.org/dev/peps/pep-0257/>`_. This is enforced via the `pydocstyle <http://www.pydocstyle.org/en/stable/>`_ pre-commit check. Additionally, when writing docstrings, make sure to use the appropriate quotes when referring to argument names vs. argument values. As an example, consider the docstring for the `train_skll_model <https://rsmtool.readthedocs.io/en/main/api.html#rsmtool.modeler.Modeler.train_skll_model>`_ method of the ``rsmtool.modeler.Modeler`` class. Note that string argument values are enclosed in double quotes (e.g., "csv", "neg_mean_squared_error") whereas values of other built-in types are written as literals (e.g., ``True``, ``False``, ``None``). Note also that if one had to refer to an argument name in the docstring, this referent should be written as a literal. In general, we strongly encourage looking at the docstrings in the existing code to make sure that new docstrings follow the same practices.
+ #. All classes, functions, and methods in the main code files have `numpy-formatted docstrings <https://numpydoc.readthedocs.io/en/latest/format.html>`_ that comply with `PEP 257 <https://peps.python.org/pep-0257/>`_. This is enforced via the `pydocstyle <http://www.pydocstyle.org/en/stable/>`_ pre-commit check. Additionally, when writing docstrings, make sure to use the appropriate quotes when referring to argument names vs. argument values. As an example, consider the docstring for the `train_skll_model <https://rsmtool.readthedocs.io/en/main/api.html#rsmtool.modeler.Modeler.train_skll_model>`_ method of the ``rsmtool.modeler.Modeler`` class. Note that string argument values are enclosed in double quotes (e.g., "csv", "neg_mean_squared_error") whereas values of other built-in types are written as literals (e.g., ``True``, ``False``, ``None``). Note also that if one had to refer to an argument name in the docstring, this referent should be written as a literal. In general, we strongly encourage looking at the docstrings in the existing code to make sure that new docstrings follow the same practices.
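
To make the quoting convention above concrete, here is a minimal numpy-style docstring for a hypothetical helper function (the function and its parameters are invented for illustration and are not part of the RSMTool codebase): string values appear in double quotes, while other built-in values and argument names appear as literals.

.. code-block:: python

    def save_frame(df, file_format="csv", index=False):
        """
        Save a data frame to disk.

        Parameters
        ----------
        df : pandas.DataFrame
            The data frame to save.
        file_format : str, optional
            The output format. One of "csv", "tsv", or "xlsx".
            Defaults to "csv".
        index : bool, optional
            Whether to also write out the row index; note that ``index``
            here refers to the argument name and is therefore written
            as a literal.
            Defaults to ``False``.
        """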
doc/evaluation.rst (7 additions & 7 deletions)
@@ -110,7 +110,7 @@ Note that in this case the variances and covariance are computed by dividing by
QWK is computed using :ref:`rsmtool.utils.quadratic_weighted_kappa<qwk_api>` with ``ddof`` set to ``0``.
- See `Haberman (2019) <https://onlinelibrary.wiley.com/doi/abs/10.1002/ets2.12258>`_ for the full derivation of this formula. The discrete case is simply treated as a special case of the continuous one.
+ See `Haberman (2019) <https://doi.org/10.1002/ets2.12258>`_ for the full derivation of this formula. The discrete case is simply treated as a special case of the continuous one.
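
For readers following along with the API, here is a minimal sketch of how the QWK helper referenced in this hunk might be called; the import path and argument names below are assumptions based on the ``rsmtool.utils.quadratic_weighted_kappa`` reference and may differ across RSMTool versions.

.. code-block:: python

    import numpy as np

    # assumed import path; some versions expose this function directly
    # under ``rsmtool.utils`` instead
    from rsmtool.utils.metrics import quadratic_weighted_kappa

    human_scores = np.array([1, 2, 3, 4, 3, 2])
    system_scores = np.array([1.2, 2.1, 2.8, 3.9, 3.2, 2.4])

    # ddof=0 matches the population-style variance and covariance
    # computation described in this section
    qwk = quadratic_weighted_kappa(human_scores, system_scores, ddof=0)
    print(round(qwk, 3))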
.. note::
@@ -149,7 +149,7 @@ SMD between system and human scores is computed using :ref:`rsmtool.utils.standa
.. note::
- In RSMTool v6.x and earlier, SMD was computed with the ``method`` argument set to ``"williamson"`` as described in `Williamson et al. (2012) <https://onlinelibrary.wiley.com/doi/full/10.1111/j.1745-3992.2011.00223.x>`_. The values computed by RSMTool starting with v7.0 will be *different* from those computed by earlier versions.
+ In RSMTool v6.x and earlier, SMD was computed with the ``method`` argument set to ``"williamson"`` as described in `Williamson et al. (2012) <https://eric.ed.gov/?id=EJ959585>`_. The values computed by RSMTool starting with v7.0 will be *different* from those computed by earlier versions.
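
As a sketch of what the note above means for API users, pre-v7.0 SMD values can presumably be reproduced by passing ``method="williamson"`` explicitly; the import path and argument order below are assumptions and should be checked against the API documentation for your RSMTool version.

.. code-block:: python

    import numpy as np

    # assumed import path; some versions expose this function directly
    # under ``rsmtool.utils`` instead
    from rsmtool.utils.metrics import standardized_mean_difference

    human_scores = np.array([2, 3, 4, 3, 2, 4])
    system_scores = np.array([2.4, 2.9, 3.7, 3.3, 2.1, 3.8])

    # reproduce the pre-v7.0 behavior described in the note above by
    # setting the ``method`` argument explicitly
    smd_legacy = standardized_mean_difference(
        human_scores, system_scores, method="williamson"
    )
    print(smd_legacy)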
.. _mse:
@@ -179,7 +179,7 @@ Accuracy Metrics (True score)
According to test theory, an observed score is a combination of the true score :math:`T` and a measurement error. The true score cannot be observed, but its distribution parameters can be estimated from observed scores. Such an estimation requires that two human scores be available for *at least a* subset of responses in the evaluation set since these are necessary to estimate the measurement error component.
- Evaluating the system against the true score produces performance estimates that are robust to errors in human scores and remain stable even when human-human agreement varies (see `Loukina et al. (2020) <https://www.aclweb.org/anthology/2020.bea-1.2/>`_).
+ Evaluating the system against the true score produces performance estimates that are robust to errors in human scores and remain stable even when human-human agreement varies (see `Loukina et al. (2020) <https://aclanthology.org/2020.bea-1.2/>`_).
The true score evaluations computed by RSMTool are available in the :ref:`intermediate file<rsmtool_true_score_eval>` ``true_score_eval``.
@@ -208,7 +208,7 @@ and :math:`\sigma_T^2` is estimated as:
- The PRMSE formula implemented in RSMTool is more general and can also handle the case where the number of available ratings varies across the responses (e.g. **only a subset of responses is double-scored**). While ``rsmtool`` and ``rsmeval`` only support evaluations with two raters, the implementation of the PRMSE formula available via the :ref:`API<prmse_api>` supports cases where some of the responses have **more than two** ratings available. The formula was derived by Matt S. Johnson and is explained in more detail in `Loukina et al. (2020) <https://www.aclweb.org/anthology/2020.bea-1.2/>`_.
+ The PRMSE formula implemented in RSMTool is more general and can also handle the case where the number of available ratings varies across the responses (e.g. **only a subset of responses is double-scored**). While ``rsmtool`` and ``rsmeval`` only support evaluations with two raters, the implementation of the PRMSE formula available via the :ref:`API<prmse_api>` supports cases where some of the responses have **more than two** ratings available. The formula was derived by Matt S. Johnson and is explained in more detail in `Loukina et al. (2020) <https://aclanthology.org/2020.bea-1.2/>`_.
In this case, the variance of rater errors is computed as a pooled variance estimator.
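
A rough sketch of how the more general PRMSE computation described above might be invoked through the API when only a subset of responses is double-scored; the function name, import path, and exact signature are assumptions based on the API reference and may differ across RSMTool versions.

.. code-block:: python

    import numpy as np
    import pandas as pd

    # assumed import path; check the PRMSE section of the API documentation
    from rsmtool.utils.prmse import prmse_true

    system_scores = pd.Series([2.1, 3.4, 1.8, 3.9])

    # the second human rating is only available for some responses
    human_scores = pd.DataFrame(
        {"h1": [2, 3, 2, 4], "h2": [3, np.nan, 2, np.nan]}
    )

    # the variance of human rater errors is estimated internally, pooled
    # across the responses that have more than one rating
    print(prmse_true(system_scores, human_scores))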
@@ -262,7 +262,7 @@ In some cases, it may be appropriate to compute variance of human errors using a
Fairness
~~~~~~~~
- Fairness of automated scores is an important component of RSMTool evaluations (see `Madnani et al., 2017 <https://www.aclweb.org/anthology/W17-1605/>`_).
+ Fairness of automated scores is an important component of RSMTool evaluations (see `Madnani et al., 2017 <https://aclanthology.org/W17-1605/>`_).
When defining an experiment, the RSMTool user has the option of specifying which subgroups should be considered for such evaluations using the :ref:`subgroups<subgroups_rsmtool>` field. These subgroups are then used in all fairness evaluations.
@@ -308,7 +308,7 @@ DSM is computed using :ref:`rsmtool.utils.difference_of_standardized_means<dsm_a
Additional fairness evaluations
+++++++++++++++++++++++++++++++
- Starting with v7.0, RSMTool includes additional fairness analyses suggested in `Loukina, Madnani, & Zechner, 2019 <https://www.aclweb.org/anthology/W19-4401/>`_. The computed metrics from these analyses are available in :ref:`intermediate files<rsmtool_fairness_eval>` ``fairness_metrics_by_<SUBGROUP>``.
+ Starting with v7.0, RSMTool includes additional fairness analyses suggested in `Loukina, Madnani, & Zechner, 2019 <https://aclanthology.org/W19-4401/>`_. The computed metrics from these analyses are available in :ref:`intermediate files<rsmtool_fairness_eval>` ``fairness_metrics_by_<SUBGROUP>``.
These include:
@@ -372,4 +372,4 @@ Therefore, SMD between two human scores is computed using :ref:`rsmtool.utils.st
.. note::
- In RSMTool v6.x and earlier, SMD was computed with the ``method`` argument set to ``"williamson"`` as described in `Williamson et al. (2012) <https://onlinelibrary.wiley.com/doi/full/10.1111/j.1745-3992.2011.00223.x>`_. Starting with v7.0, the values computed by RSMTool will be *different* from those computed by earlier versions.
+ In RSMTool v6.x and earlier, SMD was computed with the ``method`` argument set to ``"williamson"`` as described in `Williamson et al. (2012) <https://eric.ed.gov/?id=EJ959585>`_. Starting with v7.0, the values computed by RSMTool will be *different* from those computed by earlier versions.
- Automated scoring of written and spoken responses is a growing field in educational natural language processing. Automated scoring engines employ machine learning models to predict scores for such responses based on features extracted from the text/audio of these responses. Examples of automated scoring engines include `MI Write <https://measurementinc.com/miwrite>`_ for written responses and `SpeechRater <https://www.ets.org/research/policy_research_reports/publications/report/2008/hukv>`_ for spoken responses.
+ Automated scoring of written and spoken responses is a growing field in educational natural language processing. Automated scoring engines employ machine learning models to predict scores for such responses based on features extracted from the text/audio of these responses. Examples of automated scoring engines include `MI Write <https://measurementinc.com/miwrite>`_ for written responses and `SpeechRater <https://www.ets.org/research/policy_research_reports/publications/report/2008/hukv.html>`_ for spoken responses.
RSMTool is a python package which automates and combines in a *single* :doc:`pipeline <pipeline>` multiple analyses that are commonly conducted when building and evaluating automated scoring models. The output of RSMTool is a comprehensive, customizable HTML statistical report that contains the outputs of these multiple analyses. While RSMTool does make it really simple to run this set of standard analyses using a single command, it is also fully customizable and allows users to easily exclude unneeded analyses, modify the standard analyses, and even include custom analyses in the report.
doc/intermediate_files_rsmeval.rst.inc (1 addition & 1 deletion)
@@ -136,7 +136,7 @@ Evaluations based on test theory
Additional fairness analyses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- These files contain the results of additional fairness analyses suggested in `Loukina, Madnani, & Zechner, 2019 <https://www.aclweb.org/anthology/W19-4401/>`_.
+ These files contain the results of additional fairness analyses suggested in `Loukina, Madnani, & Zechner, 2019 <https://aclanthology.org/W19-4401/>`_.
- ``<METRICS>_by_<SUBGROUP>.ols``: a serialized object of type ``pandas.stats.ols.OLS`` containing the fitted model for estimating the variance attributed to a given subgroup membership for a given metric. The subgroups are defined by the :ref:`configuration file<subgroups_eval>`. The metrics are ``osa`` (overall score accuracy), ``osd`` (overall score difference), and ``csd`` (conditional score difference).
doc/intermediate_files_rsmtool.rst.inc (1 addition & 1 deletion)
@@ -260,7 +260,7 @@ Evaluations based on test theory
Additional fairness analyses
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- These files contain the results of additional fairness analyses suggested in `Loukina, Madnani, & Zechner, 2019 <https://www.aclweb.org/anthology/W19-4401/>`_.
+ These files contain the results of additional fairness analyses suggested in `Loukina, Madnani, & Zechner, 2019 <https://aclanthology.org/W19-4401/>`_.
- ``<METRICS>_by_<SUBGROUP>.ols``: a serialized object of type ``pandas.stats.ols.OLS`` containing the fitted model for estimating the variance attributed to a given subgroup membership for a given metric. The subgroups are defined by the :ref:`configuration file<subgroups_rsmtool>`. The metrics are ``osa`` (overall score accuracy), ``osd`` (overall score difference), and ``csd`` (conditional score difference).
doc/internal/release_process.rst (1 addition & 1 deletion)
@@ -34,7 +34,7 @@ This process is only meant for the project administrators, not users and develop
#. Build the PyPI source and wheel distributions using ``python setup.py sdist build`` and ``python setup.py bdist_wheel build`` respectively. Note that you should delete the ``build`` directory after running the ``sdist`` command and before running the ``bdist_wheel`` command.
- #. Upload the source and wheel distributions to TestPyPI using ``twine upload --repository testpypi dist/*``. You will need to have the ``twine`` package installed and set up your ``$HOME/.pypirc`` correctly. See details `here <https://packaging.python.org/guides/using-testpypi/>`__. You will need to have the appropriate permissions for the ``ets`` organization on TestPyPI.
+ #. Upload the source and wheel distributions to TestPyPI using ``twine upload --repository testpypi dist/*``. You will need to have the ``twine`` package installed and set up your ``$HOME/.pypirc`` correctly. See details `here <https://packaging.python.org/en/latest/guides/using-testpypi/>`__. You will need to have the appropriate permissions for the ``ets`` organization on TestPyPI.
doc/who.rst (1 addition & 1 deletion)
@@ -5,7 +5,7 @@ Who is RSMTool for?
We expect the primary users of RSMTool to be researchers working on developing new automated scoring engines or on improving existing ones. Here's the most common scenario.
- A group of researchers already *has* a set of responses such as essays or recorded spoken responses which have already been assigned numeric scores by human graders. They have also processed these responses and extracted a set of (numeric) features using systems such as `Coh-Metrix <http://cohmetrix.com/>`_, `TextEvaluator <https://textevaluator.ets.org/TextEvaluator/>`_, `OpenSmile <https://www.audeering.com/research/opensmile/>`_, or using their own custom text/speech processing pipeline. They wish to understand how well the set of chosen features can predict the human score.
+ A group of researchers already *has* a set of responses such as essays or recorded spoken responses which have already been assigned numeric scores by human graders. They have also processed these responses and extracted a set of (numeric) features using systems such as `Coh-Metrix <https://soletlab.asu.edu/coh-metrix/>`_, `TextEvaluator <https://textevaluator.ets.org/TextEvaluator/>`_, `OpenSmile <https://www.audeering.com/research/opensmile/>`_, or using their own custom text/speech processing pipeline. They wish to understand how well the set of chosen features can predict the human score.
They can then run an RSMTool "experiment" to build a regression-based scoring model (using one of many available regressors) and produce a report. The report includes descriptive statistics for all their features, diagnostic information about the trained regression model, and a comprehensive evaluation of model performance on a held-out set of responses.
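
To make this scenario concrete, here is a minimal sketch of such an experiment run through the Python API; the file and column names are invented, the configuration fields shown are common ones, and the exact ``run_experiment`` signature should be checked against the API documentation for your RSMTool version.

.. code-block:: python

    from rsmtool import run_experiment  # assumed top-level import

    # hypothetical CSV files containing one row per response, with an ID
    # column, a human score column, and the extracted numeric features
    config = {
        "experiment_id": "toy_scoring_experiment",
        "description": "Predict human scores from extracted features",
        "model": "LinearRegression",
        "train_file": "train_features.csv",
        "test_file": "test_features.csv",
        "id_column": "response_id",
        "train_label_column": "human_score",
        "test_label_column": "human_score",
    }

    # writes the HTML report and intermediate files under "toy_output";
    # the same experiment can also be run from the command line with a
    # JSON configuration file (``rsmtool config.json``)
    run_experiment(config, "toy_output")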