Commit 1d9d1c1

gururaj1512 and kgryte authored
feat: add stats/strided/covarmtk
PR-URL: #7659
Co-authored-by: Athan Reines <kgryte@gmail.com>
Reviewed-by: Athan Reines <kgryte@gmail.com>
1 parent 0baa2d3 commit 1d9d1c1

15 files changed

+2234
-0
lines changed

Lines changed: 258 additions & 0 deletions
@@ -0,0 +1,258 @@
<!--

@license Apache-2.0

Copyright (c) 2025 The Stdlib Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-->

<!-- lint disable maximum-heading-length -->

# covarmtk

> Calculate the [covariance][covariance] of two strided arrays provided known means and using a one-pass textbook algorithm.

<section class="intro">

The population [covariance][covariance] of two finite size populations of size `N` is given by

<!-- <equation class="equation" label="eq:population_covariance" align="center" raw="\operatorname{\mathrm{cov_N}} = \frac{1}{N} \sum_{i=0}^{N-1} (x_i - \mu_x)(y_i - \mu_y)" alt="Equation for the population covariance."> -->

```math
\mathop{\mathrm{cov_N}} = \frac{1}{N} \sum_{i=0}^{N-1} (x_i - \mu_x)(y_i - \mu_y)
```

<!-- </equation> -->

where the population means are given by

<!-- <equation class="equation" label="eq:population_mean_for_x" align="center" raw="\mu_x = \frac{1}{N} \sum_{i=0}^{N-1} x_i" alt="Equation for the population mean for first array."> -->

```math
\mu_x = \frac{1}{N} \sum_{i=0}^{N-1} x_i
```

<!-- </equation> -->

and

<!-- <equation class="equation" label="eq:population_mean_for_y" align="center" raw="\mu_y = \frac{1}{N} \sum_{i=0}^{N-1} y_i" alt="Equation for the population mean for second array."> -->

```math
\mu_y = \frac{1}{N} \sum_{i=0}^{N-1} y_i
```

<!-- </equation> -->

Often in the analysis of data, the true population [covariance][covariance] is not known _a priori_ and must be estimated from samples drawn from population distributions. Applying the formula for the population [covariance][covariance] to a sample yields a **biased sample covariance**. To compute an **unbiased sample covariance** for samples of size `n`,

<!-- <equation class="equation" label="eq:unbiased_sample_covariance" align="center" raw="\operatorname{\mathrm{cov_n}} = \frac{1}{n-1} \sum_{i=0}^{n-1} (x_i - \bar{x})(y_i - \bar{y})" alt="Equation for computing an unbiased sample covariance."> -->

```math
\mathop{\mathrm{cov_n}} = \frac{1}{n-1} \sum_{i=0}^{n-1} (x_i - \bar{x})(y_i - \bar{y})
```

<!-- </equation> -->

where sample means are given by

<!-- <equation class="equation" label="eq:sample_mean_for_x" align="center" raw="\bar{x} = \frac{1}{n} \sum_{i=0}^{n-1} x_i" alt="Equation for the sample mean for first array."> -->

```math
\bar{x} = \frac{1}{n} \sum_{i=0}^{n-1} x_i
```

<!-- </equation> -->

and

<!-- <equation class="equation" label="eq:sample_mean_for_y" align="center" raw="\bar{y} = \frac{1}{n} \sum_{i=0}^{n-1} y_i" alt="Equation for the sample mean for second array."> -->

```math
\bar{y} = \frac{1}{n} \sum_{i=0}^{n-1} y_i
```

<!-- </equation> -->

The use of the term `n-1` is commonly referred to as Bessel's correction. Depending on the characteristics of the population distributions, other correction factors (e.g., `n-1.5`, `n+1`, etc.) can yield better estimators.
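
As a worked illustration of the formulas above, the following plain JavaScript sketch evaluates both the population and the unbiased sample covariance for a small data set. The function name `covariance` is hypothetical and is not part of this package; unlike `covarmtk`, it computes the means itself rather than accepting them as arguments.

```javascript
/**
* Computes a covariance directly from the defining formula (illustrative sketch only).
*/
function covariance( x, y, correction ) {
    var mx = 0.0;
    var my = 0.0;
    var n = x.length;
    var C = 0.0;
    var i;

    // First pass: compute the means...
    for ( i = 0; i < n; i++ ) {
        mx += x[ i ];
        my += y[ i ];
    }
    mx /= n;
    my /= n;

    // Second pass: accumulate the products of deviations...
    for ( i = 0; i < n; i++ ) {
        C += ( x[ i ]-mx ) * ( y[ i ]-my );
    }
    return C / ( n-correction );
}

var x = [ 1.0, -2.0, 2.0 ];
var y = [ 2.0, -2.0, 1.0 ];

var population = covariance( x, y, 0 ); // divisor: N
var sample = covariance( x, y, 1 ); // divisor: n-1 (Bessel's correction)

console.log( population.toFixed( 4 ) );
// => '2.5556'

console.log( sample.toFixed( 4 ) );
// => '3.8333'
```

The two results differ only in the divisor: the same sum of products (`69/9`) is divided by `3` in the first case and by `2` in the second.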

</section>

<!-- /.intro -->

<section class="usage">

## Usage

```javascript
var covarmtk = require( '@stdlib/stats/strided/covarmtk' );
```

#### covarmtk( N, correction, meanx, x, strideX, meany, y, strideY )

Computes the [covariance][covariance] of two strided arrays provided known means and using a one-pass textbook algorithm.

```javascript
var x = [ 1.0, -2.0, 2.0 ];
var y = [ 2.0, -2.0, 1.0 ];

var v = covarmtk( x.length, 1, 1.0/3.0, x, 1, 1.0/3.0, y, 1 );
// returns ~3.8333
```

The function has the following parameters:

-   **N**: number of indexed elements.
-   **correction**: degrees of freedom adjustment. Setting this parameter to a value other than `0` has the effect of adjusting the divisor during the calculation of the [covariance][covariance] according to `N-c` where `c` corresponds to the provided degrees of freedom adjustment. When computing the population [covariance][covariance], setting this parameter to `0` is the standard choice (i.e., the provided arrays contain data constituting entire populations). When computing the unbiased sample [covariance][covariance], setting this parameter to `1` is the standard choice (i.e., the provided arrays contain data sampled from larger populations; this is commonly referred to as Bessel's correction).
-   **meanx**: mean of `x`.
-   **x**: first input [`Array`][mdn-array] or [`typed array`][mdn-typed-array].
-   **strideX**: stride length for `x`.
-   **meany**: mean of `y`.
-   **y**: second input [`Array`][mdn-array] or [`typed array`][mdn-typed-array].
-   **strideY**: stride length for `y`.

The `N` and stride parameters determine which elements in the strided arrays are accessed at runtime. For example, to compute the [covariance][covariance] of every other element in `x` and `y`,

```javascript
var x = [ 1.0, 2.0, 2.0, -7.0, -2.0, 3.0, 4.0, 2.0 ];
var y = [ 2.0, 1.0, 2.0, 1.0, -2.0, 2.0, 3.0, 4.0 ];

var v = covarmtk( 4, 1, 1.25, x, 2, 1.25, y, 2 );
// returns 5.25
```
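
To make the element selection concrete, the following sketch lists the indices visited for `N = 4` and a stride of `2`, and confirms that the mean of the selected elements of `x` is the `1.25` passed as `meanx` above. The helper `accessedIndices` is hypothetical (it is not part of this package) and assumes a non-negative stride.

```javascript
/**
* Lists the indices visited for `N` elements and a given stride (hypothetical helper).
*/
function accessedIndices( N, stride ) {
    var out = [];
    var i;
    for ( i = 0; i < N; i++ ) {
        out.push( i*stride ); // indexing is relative to the first index
    }
    return out;
}

var x = [ 1.0, 2.0, 2.0, -7.0, -2.0, 3.0, 4.0, 2.0 ];

var idx = accessedIndices( 4, 2 );
console.log( idx );
// => [ 0, 2, 4, 6 ]

var selected = idx.map( function pick( i ) {
    return x[ i ];
});
console.log( selected );
// => [ 1, 2, -2, 4 ]

// Mean of the selected elements:
var mean = selected.reduce( function add( acc, v ) {
    return acc + v;
}, 0.0 ) / selected.length;
console.log( mean );
// => 1.25
```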

Note that indexing is relative to the first index. To introduce an offset, use [`typed array`][mdn-typed-array] views.

<!-- eslint-disable stdlib/capitalized-comments -->

```javascript
var Float64Array = require( '@stdlib/array/float64' );

var x0 = new Float64Array( [ 2.0, 1.0, 2.0, -2.0, -2.0, 2.0, 3.0, 4.0 ] );
var y0 = new Float64Array( [ 2.0, -2.0, 2.0, 1.0, -2.0, 4.0, 3.0, 2.0 ] );

var x1 = new Float64Array( x0.buffer, x0.BYTES_PER_ELEMENT*1 ); // start at 2nd element
var y1 = new Float64Array( y0.buffer, y0.BYTES_PER_ELEMENT*1 ); // start at 2nd element

var v = covarmtk( 4, 1, 1.25, x1, 2, 1.25, y1, 2 );
// returns ~1.9167
```
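
A view created this way does not copy data; it shares the underlying buffer with the original array. The following sketch illustrates this using the built-in global `Float64Array` (an assumption made here to keep the snippet free of dependencies; the example above uses `@stdlib/array/float64`).

```javascript
var x0 = new Float64Array( [ 2.0, 1.0, 2.0, -2.0, -2.0, 2.0, 3.0, 4.0 ] );

// Create a view which skips the first element (the offset is specified in bytes):
var x1 = new Float64Array( x0.buffer, x0.BYTES_PER_ELEMENT*1 );

console.log( x1.length );
// => 7

console.log( x1[ 0 ] === x0[ 1 ] );
// => true

// Writes through the view are visible in the original array:
x1[ 0 ] = 5.0;
console.log( x0[ 1 ] );
// => 5
```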

#### covarmtk.ndarray( N, correction, meanx, x, strideX, offsetX, meany, y, strideY, offsetY )

Computes the [covariance][covariance] of two strided arrays provided known means and using a one-pass textbook algorithm and alternative indexing semantics.

```javascript
var x = [ 1.0, -2.0, 2.0 ];
var y = [ 2.0, -2.0, 1.0 ];

var v = covarmtk.ndarray( x.length, 1, 1.0/3.0, x, 1, 0, 1.0/3.0, y, 1, 0 );
// returns ~3.8333
```

The function has the following additional parameters:

-   **offsetX**: starting index for `x`.
-   **offsetY**: starting index for `y`.

While [`typed array`][mdn-typed-array] views mandate a view offset based on the underlying buffer, the offset parameters support indexing semantics based on starting indices. For example, to calculate the [covariance][covariance] for every other element in `x` and `y` starting from the second element,

```javascript
var x = [ 2.0, 1.0, 2.0, -2.0, -2.0, 2.0, 3.0, 4.0 ];
var y = [ -7.0, 2.0, 2.0, 1.0, -2.0, 2.0, 3.0, 4.0 ];

// Selected elements: x => 1, -2, 2, 4 (mean 1.25); y => 2, 1, 2, 4 (mean 2.25)
var v = covarmtk.ndarray( 4, 1, 1.25, x, 2, 1, 2.25, y, 2, 1 );
// returns ~2.9167
```
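
Starting indices select the same elements as the typed array views shown earlier, without creating any views. The following plain JavaScript sketch reuses the data from the typed array example and reproduces its result. The function `stridedCovariance` is a hypothetical stand-in which mirrors the `ndarray` argument order; it is not this package's implementation and omits input validation and accessor array support.

```javascript
/**
* Accumulates products of deviations from known means over strided elements (illustrative sketch only).
*/
function stridedCovariance( N, correction, meanx, x, strideX, offsetX, meany, y, strideY, offsetY ) {
    var ix = offsetX;
    var iy = offsetY;
    var C = 0.0;
    var i;
    for ( i = 0; i < N; i++ ) {
        C += ( x[ ix ]-meanx ) * ( y[ iy ]-meany );
        ix += strideX;
        iy += strideY;
    }
    return C / ( N-correction );
}

var x = [ 2.0, 1.0, 2.0, -2.0, -2.0, 2.0, 3.0, 4.0 ];
var y = [ 2.0, -2.0, 2.0, 1.0, -2.0, 4.0, 3.0, 2.0 ];

// Visit indices 1, 3, 5, and 7 in both arrays:
var v = stridedCovariance( 4, 1, 1.25, x, 2, 1, 1.25, y, 2, 1 );
console.log( v.toFixed( 4 ) );
// => '1.9167'
```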

</section>

<!-- /.usage -->

<section class="notes">

## Notes

-   If `N <= 0`, both functions return `NaN`.
-   If `N - c` is less than or equal to `0` (where `c` corresponds to the provided degrees of freedom adjustment), both functions return `NaN`.
-   Both functions support array-like objects having getter and setter accessors for array element access (e.g., [`@stdlib/array/base/accessor`][@stdlib/array/base/accessor]).
-   Depending on the environment, the typed versions ([`dcovarmtk`][@stdlib/stats/strided/dcovarmtk], [`scovarmtk`][@stdlib/stats/strided/scovarmtk], etc.) are likely to be significantly more performant.
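
The first two notes can be summarized as a guard on the divisor. The following sketch shows one way such a guard might look; the helper `divisor` is hypothetical and only illustrates the documented behavior, not the package's internal code.

```javascript
/**
* Returns the divisor `N-c`, or `NaN` when a covariance cannot be computed (hypothetical helper).
*/
function divisor( N, correction ) {
    var d = N - correction;
    if ( N <= 0 || d <= 0.0 ) {
        return NaN;
    }
    return d;
}

console.log( divisor( 3, 1 ) );
// => 2

console.log( divisor( 0, 0 ) );
// => NaN

console.log( divisor( 1, 1 ) );
// => NaN

console.log( divisor( 3, 3.5 ) );
// => NaN
```

One practical consequence is that requesting an unbiased sample covariance (`correction = 1`) for a single observation (`N = 1`) yields `NaN`.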

</section>

<!-- /.notes -->

<section class="examples">

## Examples

<!-- eslint no-undef: "error" -->

```javascript
var discreteUniform = require( '@stdlib/random/array/discrete-uniform' );
var covarmtk = require( '@stdlib/stats/strided/covarmtk' );

var opts = {
    'dtype': 'generic'
};
var x = discreteUniform( 10, -50, 50, opts );
console.log( x );

var y = discreteUniform( 10, -50, 50, opts );
console.log( y );

var v = covarmtk( x.length, 1, 0.0, x, 1, 0.0, y, 1 );
console.log( v );
```

</section>

<!-- /.examples -->

* * *

<section class="references">

</section>

<!-- /.references -->

<!-- Section for related `stdlib` packages. Do not manually edit this section, as it is automatically populated. -->

<section class="related">

</section>

<!-- /.related -->

<!-- Section for all links. Make sure to keep an empty line after the `section` element and another before the `/section` close. -->

<section class="links">

[covariance]: https://en.wikipedia.org/wiki/Covariance

[mdn-typed-array]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray

[mdn-array]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array

[@stdlib/array/base/accessor]: https://github.com/stdlib-js/stdlib/tree/develop/lib/node_modules/%40stdlib/array/base/accessor

[@stdlib/stats/strided/dcovarmtk]: https://github.com/stdlib-js/stdlib/tree/develop/lib/node_modules/%40stdlib/stats/strided/dcovarmtk

[@stdlib/stats/strided/scovarmtk]: https://github.com/stdlib-js/stdlib/tree/develop/lib/node_modules/%40stdlib/stats/strided/scovarmtk

</section>

<!-- /.links -->
Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
/**
* @license Apache-2.0
*
* Copyright (c) 2025 The Stdlib Authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
*    http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

'use strict';

// MODULES //

var bench = require( '@stdlib/bench' );
var uniform = require( '@stdlib/random/array/uniform' );
var isnan = require( '@stdlib/math/base/assert/is-nan' );
var pow = require( '@stdlib/math/base/special/pow' );
var pkg = require( './../package.json' ).name;
var covarmtk = require( './../lib/main.js' );


// VARIABLES //

var options = {
	'dtype': 'generic'
};


// FUNCTIONS //

/**
* Creates a benchmark function.
*
* @private
* @param {PositiveInteger} len - array length
* @returns {Function} benchmark function
*/
function createBenchmark( len ) {
	var x = uniform( len, -10.0, 10.0, options );
	return benchmark;

	function benchmark( b ) {
		var v;
		var i;

		b.tic();
		for ( i = 0; i < b.iterations; i++ ) {
			v = covarmtk( x.length, 1, 0.0, x, 1, 0.0, x, 1 );
			if ( isnan( v ) ) {
				b.fail( 'should not return NaN' );
			}
		}
		b.toc();
		if ( isnan( v ) ) {
			b.fail( 'should not return NaN' );
		}
		b.pass( 'benchmark finished' );
		b.end();
	}
}


// MAIN //

/**
* Main execution sequence.
*
* @private
*/
function main() {
	var len;
	var min;
	var max;
	var f;
	var i;

	min = 1; // 10^min
	max = 6; // 10^max

	for ( i = min; i <= max; i++ ) {
		len = pow( 10, i );
		f = createBenchmark( len );
		bench( pkg+':len='+len, f );
	}
}

main();
