
Commit c1dd2f5

damien2012eng authored and desilinguist committed
Fix redirected/broken links
1 parent ae8af05 commit c1dd2f5

13 files changed: +21 -23 lines

doc/config_rsmeval.rst.inc

Lines changed: 1 addition & 1 deletion
@@ -94,7 +94,7 @@ RSMTool provides pre-defined sections for ``rsmeval`` (listed below) and, by def
 
 - ``evaluation by group``: Shows barplots with the main evaluation metrics by each of the subgroups specified in the configuration file.
 
-- ``fairness_analyses``: Additional :ref:`fairness analyses <fairness_extra>` suggested in `Loukina, Madnani, & Zechner, 2019 <https://www.aclweb.org/anthology/W19-4401/>`_. The notebook shows:
+- ``fairness_analyses``: Additional :ref:`fairness analyses <fairness_extra>` suggested in `Loukina, Madnani, & Zechner, 2019 <https://aclanthology.org/W19-4401/>`_. The notebook shows:
 
   - percentage of variance in squared error explained by subgroup membership
   - percentage of variance in raw (signed) error explained by subgroup membership

doc/config_rsmtool.rst.inc

Lines changed: 1 addition & 1 deletion
@@ -148,7 +148,7 @@ RSMTool provides pre-defined sections for ``rsmtool`` (listed below) and, by def
 
 - ``evaluation_by_group``: Shows barplots with the main evaluation metrics by each of the subgroups specified in the configuration file.
 
-- ``fairness_analyses``: Additional :ref:`fairness analyses <fairness_extra>` suggested in `Loukina, Madnani, & Zechner, 2019 <https://www.aclweb.org/anthology/W19-4401/>`__. The notebook shows:
+- ``fairness_analyses``: Additional :ref:`fairness analyses <fairness_extra>` suggested in `Loukina, Madnani, & Zechner, 2019 <https://aclanthology.org/W19-4401/>`__. The notebook shows:
 
   - percentage of variance in squared error explained by subgroup membership
   - percentage of variance in raw (signed) error error explained by subgroup membership

doc/contributing.rst

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ The RSMTool codebase enforces a certain code style via pre-commit checks and thi
 
    Rather than doing this grouping and sorting manually, we use the `isort <https://pycqa.github.io/isort/>`_ pre-commit hook to achieve this.
 
-#. All classes, functions, and methods in the main code files have `numpy-formatted docstrings <https://numpydoc.readthedocs.io/en/latest/format.html>`_ that comply with `PEP 257 <https://www.python.org/dev/peps/pep-0257/>`_. This is enforced via the `pydocstyle <http://www.pydocstyle.org/en/stable/>`_ pre-commit check. Additionally, when writing docstrings, make sure to use the appropriate quotes when referring to argument names vs. argument values. As an example, consider the docstring for the `train_skll_model <https://rsmtool.readthedocs.io/en/main/api.html#rsmtool.modeler.Modeler.train_skll_model>`_ method of the ``rsmtool.modeler.Modeler`` class. Note that string argument values are enclosed in double quotes (e.g., "csv", "neg_mean_squared_error") whereas values of other built-in types are written as literals (e.g., ``True``, ``False``, ``None``). Note also that if one had to refer to an argument name in the docstring, this referent should be written as a literal. In general, we strongly encourage looking at the docstrings in the existing code to make sure that new docstrings follow the same practices.
+#. All classes, functions, and methods in the main code files have `numpy-formatted docstrings <https://numpydoc.readthedocs.io/en/latest/format.html>`_ that comply with `PEP 257 <https://peps.python.org/pep-0257/>`_. This is enforced via the `pydocstyle <http://www.pydocstyle.org/en/stable/>`_ pre-commit check. Additionally, when writing docstrings, make sure to use the appropriate quotes when referring to argument names vs. argument values. As an example, consider the docstring for the `train_skll_model <https://rsmtool.readthedocs.io/en/main/api.html#rsmtool.modeler.Modeler.train_skll_model>`_ method of the ``rsmtool.modeler.Modeler`` class. Note that string argument values are enclosed in double quotes (e.g., "csv", "neg_mean_squared_error") whereas values of other built-in types are written as literals (e.g., ``True``, ``False``, ``None``). Note also that if one had to refer to an argument name in the docstring, this referent should be written as a literal. In general, we strongly encourage looking at the docstrings in the existing code to make sure that new docstrings follow the same practices.
 
 RSMTool tests
 -------------
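
As a quick illustration of the quoting convention this hunk describes (string argument values in double quotes, other built-in values as RST literals), here is a made-up numpy-style docstring; the method is hypothetical, not an actual RSMTool one:

def save_frame(self, output_format="csv", overwrite=False):
    """
    Save the data frame to disk in the given format.

    Parameters
    ----------
    output_format : str
        The output format, one of "csv", "tsv", or "xlsx".
        Defaults to "csv".
    overwrite : bool
        If ``True``, overwrite an existing file; if ``False``, raise
        an error instead. Defaults to ``False``.
    """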

doc/evaluation.rst

Lines changed: 7 additions & 7 deletions
@@ -110,7 +110,7 @@ Note that in this case the variances and covariance are computed by dividing by
 
 QWK is computed using :ref:`rsmtool.utils.quadratic_weighted_kappa<qwk_api>` with ``ddof`` set to ``0``.
 
-See `Haberman (2019) <https://onlinelibrary.wiley.com/doi/abs/10.1002/ets2.12258>`_ for the full derivation of this formula. The discrete case is simply treated as a special case of the continuous one.
+See `Haberman (2019) <https://www.sciencedirect.com/science/article/pii/S0093691X10000233>`_ for the full derivation of this formula. The discrete case is simply treated as a special case of the continuous one.
 
 .. note::
 
@@ -149,7 +149,7 @@ SMD between system and human scores is computed using :ref:`rsmtool.utils.standa
 
 .. note::
 
-   In RSMTool v6.x and earlier SMD was computed with the ``method`` argument set to ``"williamson"`` as described in `Williamson et al. (2012) <https://onlinelibrary.wiley.com/doi/full/10.1111/j.1745-3992.2011.00223.x>`_. The values computed by RSMTool starting with v7.0 will be *different* from those computed by earlier versions.
+   In RSMTool v6.x and earlier SMD was computed with the ``method`` argument set to ``"williamson"`` as described in `Williamson et al. (2012) <https://eric.ed.gov/?id=EJ959585>`_. The values computed by RSMTool starting with v7.0 will be *different* from those computed by earlier versions.
 
 
 .. _mse:
@@ -179,7 +179,7 @@ Accuracy Metrics (True score)
 
 According to test theory, an observed score is a combination of the true score :math:`T` and a measurement error. The true score cannot be observed, but its distribution parameters can be estimated from observed scores. Such an estimation requires that two human scores be available for *at least a* subset of responses in the evaluation set since these are necessary to estimate the measurement error component.
 
-Evaluating system against true score produces performance estimates that are robust to errors in human scores and remain stable even when human-human agreeement varies (see `Loukina et al. (2020) <https://www.aclweb.org/anthology/2020.bea-1.2/>`_.
+Evaluating system against true score produces performance estimates that are robust to errors in human scores and remain stable even when human-human agreeement varies (see `Loukina et al. (2020) <https://aclanthology.org/2020.bea-1.2/>`_.
 
 The true score evaluations computed by RSMTool are available in the :ref:`intermediate file<rsmtool_true_score_eval>` ``true_score_eval``.
 
@@ -208,7 +208,7 @@ and :math:`\sigma_T^2` is estimated as:
 
 :math:`\sigma_T^2 = \sigma_{\hat{H}}^2 - \displaystyle\frac{1}{2}\sigma_{e}^2`
 
-The PRMSE formula implemented in RSMTool is more general and can also handle the case where the number of available ratings varies across the responses (e.g. **only a subset of responses is double-scored**). While ``rsmtool`` and ``rsmeval`` only support evaluations with two raters, the implementation of the PRMSE formula available via the :ref:`API<prmse_api>` supports cases where some of the responses have **more than two** ratings available. The formula was derived by Matt S. Johnson and is explained in more detail in `Loukina et al. (2020) <https://www.aclweb.org/anthology/2020.bea-1.2/>`_.
+The PRMSE formula implemented in RSMTool is more general and can also handle the case where the number of available ratings varies across the responses (e.g. **only a subset of responses is double-scored**). While ``rsmtool`` and ``rsmeval`` only support evaluations with two raters, the implementation of the PRMSE formula available via the :ref:`API<prmse_api>` supports cases where some of the responses have **more than two** ratings available. The formula was derived by Matt S. Johnson and is explained in more detail in `Loukina et al. (2020) <https://aclanthology.org/2020.bea-1.2/>`_.
 
 In this case, the variance of rater errors is computed as a pooled variance estimator.
 
@@ -262,7 +262,7 @@ In some cases, it may be appropriate to compute variance of human errors using a
 Fairness
 ~~~~~~~~
 
-Fairness of automated scores is an important component of RSMTool evaluations (see `Madnani et al, 2017 <https://www.aclweb.org/anthology/W17-1605/>`_).
+Fairness of automated scores is an important component of RSMTool evaluations (see `Madnani et al, 2017 <https://aclanthology.org/W17-1605/>`_).
 
 When defining an experiment, the RSMTool user has the option of specifying which subgroups should be considered for such evaluations using :ref:`subgroups<subgroups_rsmtool>` field. These subgroups are then used in all fairness evaluations.
 
@@ -308,7 +308,7 @@ DSM is computed using :ref:`rsmtool.utils.difference_of_standardized_means<dsm_a
 Additional fairness evaluations
 +++++++++++++++++++++++++++++++
 
-Starting with v7.0, RSMTool includes additional fairness analyses suggested in `Loukina, Madnani, & Zechner, 2019 <https://www.aclweb.org/anthology/W19-4401/>`_. The computed metrics from these analyses are available in :ref:`intermediate files<rsmtool_fairness_eval>` ``fairness_metrics_by_<SUBGROUP>``.
+Starting with v7.0, RSMTool includes additional fairness analyses suggested in `Loukina, Madnani, & Zechner, 2019 <https://aclanthology.org/W19-4401/>`_. The computed metrics from these analyses are available in :ref:`intermediate files<rsmtool_fairness_eval>` ``fairness_metrics_by_<SUBGROUP>``.
 
 These include:
 
@@ -372,4 +372,4 @@ Therefore, SMD between two human scores is computed using :ref:`rsmtool.utils.st
 
 .. note::
 
-   In RSMTool v6.x and earlier, SMD was computed with the ``method`` argument set to ``"williamson"`` as described in `Williamson et al. (2012) <https://onlinelibrary.wiley.com/doi/full/10.1111/j.1745-3992.2011.00223.x>`_. Starting with v7.0, the values computed by RSMTool will be *different* from those computed by earlier versions.
+   In RSMTool v6.x and earlier, SMD was computed with the ``method`` argument set to ``"williamson"`` as described in `Williamson et al. (2012) <https://eric.ed.gov/?id=EJ959585>`_. Starting with v7.0, the values computed by RSMTool will be *different* from those computed by earlier versions.
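
The true-score formulas touched by this file are compact enough to sanity-check numerically. The sketch below implements our reading of the documented two-rater case: sigma_e^2 estimated from double-scored responses, sigma_T^2 = sigma_Hbar^2 - sigma_e^2 / 2, and PRMSE = 1 - MSE / sigma_T^2. It is an illustration with illustrative ddof choices, not RSMTool's implementation (the general, variable-rater version lives in rsmtool.utils.prmse):

import numpy as np

# Two-rater PRMSE following the formulas above; our illustration only.
# h1, h2: the two human scores; system: the machine scores.
def prmse_two_raters(system, h1, h2):
    h_bar = (h1 + h2) / 2                         # average human score H-hat
    var_e = np.var(h1 - h2, ddof=1) / 2           # rater error variance sigma_e^2
    var_true = np.var(h_bar, ddof=1) - var_e / 2  # sigma_T^2 = sigma_Hbar^2 - sigma_e^2 / 2
    mse = np.mean((system - h_bar) ** 2) - var_e / 2  # MSE against the true score
    return 1 - mse / var_true

h1 = np.array([2.0, 3.0, 4.0, 2.0, 3.0])
h2 = np.array([3.0, 3.0, 4.0, 2.0, 4.0])
system = np.array([2.8, 2.5, 3.6, 2.4, 3.0])
print(prmse_two_raters(system, h1, h2))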

doc/index.rst

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ Rater Scoring Modeling Tool (RSMTool)
 
 .. image:: assets/spacer.png
 
-Automated scoring of written and spoken responses is a growing field in educational natural language processing. Automated scoring engines employ machine learning models to predict scores for such responses based on features extracted from the text/audio of these responses. Examples of automated scoring engines include `MI Write <https://measurementinc.com/miwrite>`_ for written responses and `SpeechRater <https://www.ets.org/research/policy_research_reports/publications/report/2008/hukv>`_ for spoken responses.
+Automated scoring of written and spoken responses is a growing field in educational natural language processing. Automated scoring engines employ machine learning models to predict scores for such responses based on features extracted from the text/audio of these responses. Examples of automated scoring engines include `MI Write <https://measurementinc.com/miwrite>`_ for written responses and `SpeechRater <https://www.ets.org/research/policy_research_reports/publications/report/2008/hukv.html>`_ for spoken responses.
 
 RSMTool is a python package which automates and combines in a *single* :doc:`pipeline <pipeline>` multiple analyses that are commonly conducted when building and evaluating automated scoring models. The output of RSMTool is a comprehensive, customizable HTML statistical report that contains the outputs of these multiple analyses. While RSMTool does make it really simple to run this set of standard analyses using a single command, it is also fully customizable and allows users to easily exclude unneeded analyses, modify the standard analyses, and even include custom analyses in the report.

doc/intermediate_files_rsmeval.rst.inc

Lines changed: 1 addition & 1 deletion
@@ -136,7 +136,7 @@ Evaluations based on test theory
 Additional fairness analyses
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-These files contain the results of additional fairness analyses suggested in suggested in `Loukina, Madnani, & Zechner, 2019 <https://www.aclweb.org/anthology/W19-4401/>`_.
+These files contain the results of additional fairness analyses suggested in suggested in `Loukina, Madnani, & Zechner, 2019 <https://aclanthology.org/W19-4401/>`_.
 
 - ``<METRICS>_by_<SUBGROUP>.ols``: a serialized object of type ``pandas.stats.ols.OLS`` containing the fitted model for estimating the variance attributed to a given subgroup membership for a given metric. The subgroups are defined by the :ref:`configuration file<subgroups_eval>`. The metrics are ``osa`` (overall score accuracy), ``osd`` (overall score difference), and ``csd`` (conditional score difference).

doc/intermediate_files_rsmtool.rst.inc

Lines changed: 1 addition & 1 deletion
@@ -260,7 +260,7 @@ Evaluations based on test theory
 Additional fairness analyses
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-These files contain the results of additional fairness analyses suggested in suggested in `Loukina, Madnani, & Zechner, 2019 <https://www.aclweb.org/anthology/W19-4401/>`_.
+These files contain the results of additional fairness analyses suggested in suggested in `Loukina, Madnani, & Zechner, 2019 <https://aclanthology.org/W19-4401/>`_.
 
 - ``<METRICS>_by_<SUBGROUP>.ols``: a serialized object of type ``pandas.stats.ols.OLS`` containing the fitted model for estimating the variance attributed to a given subgroup membership for a given metric. The subgroups are defined by the :ref:`configuration file<subgroups_rsmtool>`. The metrics are ``osa`` (overall score accuracy), ``osd`` (overall score difference), and ``csd`` (conditional score difference).

doc/internal/release_process.rst

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ This process is only meant for the project administrators, not users and develop
 
 #. Build the PyPI source and wheel distributions using ``python setup.py sdist build`` and ``python setup.py bdist_wheel build`` respectively. Note that you should delete the ``build`` directory after running the ``sdist`` command and before running the ``bdist_wheel`` command.
 
-#. Upload the source and wheel distributions to TestPyPI using ``twine upload --repository testpypi dist/*``. You will need to have the ``twine`` package installed and set up your ``$HOME/.pypirc`` correctly. See details `here <https://packaging.python.org/guides/using-testpypi/>`__. You will need to have the appropriate permissions for the ``ets`` organization on TestPyPI.
+#. Upload the source and wheel distributions to TestPyPI using ``twine upload --repository testpypi dist/*``. You will need to have the ``twine`` package installed and set up your ``$HOME/.pypirc`` correctly. See details `here <https://packaging.python.org/en/latest/guides/using-testpypi/>`__. You will need to have the appropriate permissions for the ``ets`` organization on TestPyPI.
 
 #. Install the TestPyPI package as follows::

doc/who.rst

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ Who is RSMTool for?
 
 We expect the primary users of RSMTool to be researchers working on developing new automated scoring engines or on improving existing ones. Here's the most common scenario.
 
-A group of researchers already *has* a set of responses such as essays or recorded spoken responses which have already been assigned numeric scores by human graders. They have also processed these responses and extracted a set of (numeric) features using systems such as `Coh-Metrix <http://cohmetrix.com/>`_, `TextEvaluator <https://textevaluator.ets.org/TextEvaluator/>`_, `OpenSmile <https://www.audeering.com/research/opensmile/>`_, or using their own custom text/speech processing pipeline. They wish to understand how well the set of chosen features can predict the human score.
+A group of researchers already *has* a set of responses such as essays or recorded spoken responses which have already been assigned numeric scores by human graders. They have also processed these responses and extracted a set of (numeric) features using systems such as `Coh-Metrix <https://soletlab.asu.edu/coh-metrix/>`_, `TextEvaluator <https://textevaluator.ets.org/TextEvaluator/>`_, `OpenSmile <https://www.audeering.com/research/opensmile/>`_, or using their own custom text/speech processing pipeline. They wish to understand how well the set of chosen features can predict the human score.
 
 They can then run an RSMTool "experiment" to build a regression-based scoring model (using one of many available regressors) and produce a report. The report includes descriptive statistics for all their features, diagnostic information about the trained regression model, and a comprehensive evaluation of model performance on a held-out set of responses.

rsmtool/fairness_utils.py

Lines changed: 1 addition & 1 deletion
@@ -167,7 +167,7 @@ def get_fairness_analyses(
     df, group, system_score_column, human_score_column="sc1", base_group=None
 ):
     """
-    Compute analyses from `Loukina et al. 2019 <https://www.aclweb.org/anthology/W19-4401/>`_.
+    Compute analyses from `Loukina et al. 2019 <https://aclanthology.org/W19-4401/>`_.
 
     The function computes how much variance group membership explains in
     overall score accuracy (osa), overall score difference (osd),
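
For context, the "osa" analysis named in this docstring boils down to an OLS regression of squared error on subgroup membership, with R-squared as the variance explained. A toy sketch with made-up column names and values; the real computation is in get_fairness_analyses:

import pandas as pd
import statsmodels.formula.api as smf

# Toy illustration of the "osa" (overall score accuracy) analysis: how much
# of the variance in squared error does subgroup membership explain?
df = pd.DataFrame(
    {
        "sc1": [2, 3, 4, 2, 3, 4],
        "system": [2.5, 2.9, 3.6, 2.1, 3.4, 3.8],
        "group": ["a", "a", "a", "b", "b", "b"],
    }
)
df["sq_err"] = (df["system"] - df["sc1"]) ** 2
fit = smf.ols("sq_err ~ C(group)", data=df).fit()
print(fit.rsquared)  # proportion of variance explained by the subgroup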

rsmtool/modeler.py

Lines changed: 2 additions & 1 deletion
@@ -357,7 +357,8 @@ def train_rebalanced_lr(self, df_train, feature_columns):
         Train a "RebalancedLR" model.
 
         This model balances empirical weights by changing betas (adapted
-        from `here <https://stats.stackexchange.com/q/30876>`_).
+        from `here <https://stats.stackexchange.com/questions/30876/
+        how-to-convert-standardized-coefficients-to-unstandardized-coefficients>`_).
 
         Parameters
         ----------
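
The Cross Validated question now linked by its full slug is about converting standardized regression coefficients back to unstandardized ones. A one-line sketch of that conversion (our paraphrase of the linked discussion, not the RebalancedLR code itself):

import numpy as np

# Standardized -> raw-scale coefficients: b_j = beta_j * sd(y) / sd(x_j).
# Illustrative only; intercept handling is omitted.
def unstandardize(betas, X, y):
    return betas * np.std(y, ddof=1) / np.std(X, axis=0, ddof=1)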

rsmtool/utils/metrics.py

Lines changed: 2 additions & 3 deletions
@@ -398,15 +398,13 @@ def difference_of_standardized_means(
     # We only check for mean since the function requires
     # both of these to be set or both to be None
     if population_y_true_observed_mn is None:
-
         warnings.warn(warning_msg.format("y_true_observed"))
         (population_y_true_observed_sd, population_y_true_observed_mn) = (
             np.std(y_true_observed, ddof=ddof),
             np.mean(y_true_observed),
         )
 
     if population_y_pred_mn is None:
-
         warnings.warn(warning_msg.format("y_pred"))
         (population_y_pred_sd, population_y_pred_mn) = (
             np.std(y_pred, ddof=ddof),
@@ -446,7 +444,8 @@ def quadratic_weighted_kappa(y_true_observed, y_pred, ddof=0):  # noqa: D301
 
     The formula to compute quadratic-weighted kappa for continuous values
     was developed at ETS by Shelby Haberman.
-    See `Haberman (2019) <https://onlinelibrary.wiley.com/doi/abs/10.1002/ets2.12258>`_
+    See `Haberman (2019) <https://eric.ed.gov/?q=Measures+of+Agreement+Versus
+    +Measures+of+Prediction+Accuracy&id=EJ1238497>`_
     for the full derivation. The discrete case is simply treated as a
     special case of the continuous one.
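
The continuous-case QWK that this docstring attributes to Haberman (2019) can be written in a few lines of numpy. A sketch using the ddof=0 convention noted in doc/evaluation.rst above (variances and covariance divided by n); not a copy of RSMTool's code:

import numpy as np

# Continuous quadratic-weighted kappa (Haberman, 2019):
# 2 * cov(H, S) / (var(H) + var(S) + (mean(H) - mean(S))^2), with ddof=0.
def qwk_continuous(human, system):
    cov = np.cov(human, system, ddof=0)[0, 1]
    denom = np.var(human) + np.var(system) + (np.mean(human) - np.mean(system)) ** 2
    return 2 * cov / denom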

rsmtool/utils/prmse.py

Lines changed: 1 addition & 3 deletions
@@ -192,7 +192,6 @@ def mse_true(system, human_scores, variance_errors_human=None):
         return None
 
     else:
-
         # get total number of scores for each response
         n_scores = get_n_human_scores(human_scores)
         mean_scores = np.nanmean(human_scores, axis=1)
@@ -213,7 +212,7 @@ def prmse_true(system, human_scores, variance_errors_human=None):
     PRMSE = Proportional Reduction in Mean Squared Error.
     The formula to compute PRMSE implemented in RSMTool
     was derived at ETS by Matthew S. Johnson. See
-    `Loukina et al. (2020) <https://www.aclweb.org/anthology/2020.bea-1.2.pdf>`_
+    `Loukina et al. (2020) <https://aclanthology.org/2020.bea-1.2.pdf>`_
     for further information about PRMSE.
 
     Parameters
@@ -253,7 +252,6 @@ def prmse_true(system, human_scores, variance_errors_human=None):
         return None
 
     else:
-
         variance_true = true_score_variance(human_scores, variance_errors_human)
 
         mse = mse_true(system, human_scores, variance_errors_human)
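
Based on the signature and the np.nanmean(human_scores, axis=1) call visible in this diff, human_scores appears to be a responses-by-raters array with NaN marking missing ratings. A hypothetical call, assuming that reading is correct:

import numpy as np
from rsmtool.utils.prmse import prmse_true

# Hypothetical usage: 4 responses, up to 2 ratings each (NaN = missing).
system = np.array([2.1, 3.0, 3.9, 2.5])
human_scores = np.array([[2, 3], [3, np.nan], [4, 4], [2, 3]], dtype=float)
print(prmse_true(system, human_scores))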
