This repository was archived by the owner on Apr 26, 2024. It is now read-only.
Researching existing code corpora #37
An issue to collect discussions/results around researching existing code to extract data that may help with the design/scope of this proposal.
Array Method Usage
This uses the GitHub dataset on BigQuery to scan for usage of existing array methods.
Number of repositories that include a particular method call
Query
SELECT
  SUM(1) AS js_repos,
  SUM(CAST(has_map AS INT64)) AS has_map,
  SUM(CAST(has_filter AS INT64)) AS has_filter,
  SUM(CAST(has_reduce AS INT64)) AS has_reduce,
  SUM(CAST(has_copy_within AS INT64)) AS has_copy_within,
  SUM(CAST(has_fill AS INT64)) AS has_fill,
  SUM(CAST(has_pop AS INT64)) AS has_pop,
  SUM(CAST(has_push AS INT64)) AS has_push,
  SUM(CAST(has_reverse AS INT64)) AS has_reverse,
  SUM(CAST(has_shift AS INT64)) AS has_shift,
  SUM(CAST(has_sort AS INT64)) AS has_sort,
  SUM(CAST(has_slice AS INT64)) AS has_slice,
  SUM(CAST(has_slice_default AS INT64)) AS has_slice_default,
  SUM(CAST(has_slice_pop AS INT64)) AS has_slice_pop,
  SUM(CAST(has_slice_shift AS INT64)) AS has_slice_shift,
  SUM(CAST(has_splice AS INT64)) AS has_splice,
  SUM(CAST(has_unshift AS INT64)) AS has_unshift,
FROM (
  SELECT
    repo_name,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.map\(")) AS has_map,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.filter\(")) AS has_filter,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.reduce\(")) AS has_reduce,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.copyWithin\(")) AS has_copy_within,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.fill\(")) AS has_fill,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.pop\( ?\)")) AS has_pop,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.push\(")) AS has_push,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.reverse\( ?\)")) AS has_reverse,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.shift\( ?\)")) AS has_shift,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.sort\(")) AS has_sort,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.slice\(")) AS has_slice,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.slice\(\)")) AS has_slice_default,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.slice\( ?0 ?, ?-1 ?\)")) AS has_slice_pop,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.slice\( ?1 ?\)")) AS has_slice_shift,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.splice\(")) AS has_splice,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.unshift\(")) AS has_unshift,
  FROM (
    SELECT
      repo_name,
      content
    FROM
      `bigquery-public-data.github_repos.files`
    INNER JOIN
      `bigquery-public-data.github_repos.contents`
    USING
      (id)
    WHERE
      ENDS_WITH(path, '.js')
      AND NOT REGEXP_CONTAINS(path, r"\d\.\d"))
  GROUP BY
    repo_name )
The goal is to get a general sense of the relative usage of the methods for which we are proposing non-mutating versions, with other methods such as .map included as a benchmark.
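The query aggregates in two steps: the inner SELECT collapses each repository's files into one boolean per method, and the outer SELECT counts the repositories where that boolean is true. A minimal JavaScript sketch of the same logic over an in-memory list follows. The `files` sample rows and the `countReposUsing` helper are hypothetical, and only two of the method patterns are shown.

```javascript
// Hypothetical sample rows standing in for the joined files/contents tables.
const files = [
  { repo_name: "a/one", path: "src/index.js", content: "xs.map(f); xs.push(1);" },
  { repo_name: "a/one", path: "src/util.js", content: "xs.push(2);" },
  { repo_name: "b/two", path: "main.js", content: "ys.filter(g);" },
  { repo_name: "c/three", path: "lib/jquery-1.2.js", content: "zs.map(h);" },
];

// A subset of the patterns used in the query.
const patterns = {
  has_map: /\.map\(/,
  has_push: /\.push\(/,
};

function countReposUsing(rows) {
  // Step 1 (inner SELECT): one entry per repo, OR-ing matches across its files.
  const perRepo = new Map();
  for (const { repo_name, path, content } of rows) {
    // Same filter as the WHERE clause: .js files whose path has no "digit.digit".
    if (!path.endsWith(".js") || /\d\.\d/.test(path)) continue;
    const flags = perRepo.get(repo_name) ?? {};
    for (const [name, re] of Object.entries(patterns)) {
      flags[name] = Boolean(flags[name]) || re.test(content);
    }
    perRepo.set(repo_name, flags);
  }
  // Step 2 (outer SELECT): count the repos where each flag is true.
  const totals = { js_repos: perRepo.size };
  for (const name of Object.keys(patterns)) totals[name] = 0;
  for (const flags of perRepo.values()) {
    for (const name of Object.keys(patterns)) {
      if (flags[name]) totals[name] += 1;
    }
  }
  return totals;
}

const totals = countReposUsing(files);
console.log(totals);
```

Because each repository contributes at most one to a method's count, a repository with a single call weighs the same as one with thousands.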
- Tries to exclude files that are copies of libraries by excluding file paths that appear to contain a version number (paths matching /\d\.\d/).
- Does not exclude forks (tbc).
- Includes false positives: .map could be Observable.prototype.map and .slice could be String.prototype.slice.
- Dataset is from 2016 (tbc).
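The caveats above can be checked directly against the patterns in the query. In this sketch the sample paths and source strings are made up for illustration, and `looksVersioned` is a hypothetical helper name; the regular expressions are the ones from the query.

```javascript
// Path filter from the WHERE clause: skip paths that look versioned.
const looksVersioned = (path) => /\d\.\d/.test(path);

console.log(looksVersioned("vendor/jquery-3.1.0.min.js")); // true, excluded
console.log(looksVersioned("src/app.js"));                 // false, kept
// A vendored copy without a version in its path is not excluded.
console.log(looksVersioned("vendor/jquery.min.js"));       // false, kept

// The method patterns only look at text, so they cannot tell what the receiver is.
const sliceCall = /\.slice\(/;
const mapCall = /\.map\(/;

const stringSlice = 'const ext = "file.js".slice(-3);'; // String.prototype.slice
const observableMap = "clicks$.map(e => e.clientX);";   // e.g. an Observable's map

console.log(sliceCall.test(stringSlice));  // true, a false positive
console.log(mapCall.test(observableMap));  // true, a false positive
```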
Category | Count | % |
---|---|---|
All repos with .js | 1,187,155 | 100 |
.push(... | 813,437 | 69 |
.slice(... | 625,390 | 53 |
.map(... | 624,565 | 53 |
.filter(... | 571,678 | 48 |
.sort(... | 536,098 | 45 |
.splice(... | 533,482 | 45 |
.shift() | 500,447 | 42 |
.pop() | 497,781 | 42 |
.unshift(... | 434,742 | 37 |
.reverse() | 403,688 | 34 |
.reduce(... | 248,995 | 21 |
.fill(... | 141,936 | 12 |
.copyWithin(... | 6,034 | 0.5 |
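The % column is each count divided by the total number of repositories with .js files. A quick sketch of that arithmetic, using counts taken from the table (the `percentOfRepos` helper name is an assumption for illustration):

```javascript
const totalRepos = 1187155;

// Share of repos as a percentage, rounded to the given number of decimals.
function percentOfRepos(count, decimals = 0) {
  const factor = 10 ** decimals;
  return Math.round((count / totalRepos) * 100 * factor) / factor;
}

console.log(percentOfRepos(813437));  // .push(...       -> 69
console.log(percentOfRepos(248995));  // .reduce(...     -> 21
console.log(percentOfRepos(6034, 1)); // .copyWithin(... -> 0.5
```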
Slice usage
.withPopped() is ~equivalent to .slice(0, -1) and .withShifted() is ~equivalent to .slice(1), so we can also look for those particular patterns.
Category | Count | % |
---|---|---|
All repos with .js | 1,187,155 | 100 |
.slice(... | 625,390 | 53 |
.slice(1) | 434,558 | 37 |
.slice(0, -1) | 264,466 | 22 |
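The equivalence behind these two patterns can be sketched with plain functions. `withPopped` and `withShifted` here are hypothetical stand-alone helpers written for illustration, not the proposal's actual method implementations. They only show that the slice forms return a new array and leave the original untouched, unlike pop and shift.

```javascript
// Hypothetical helpers; the proposal describes these as array methods.
const withPopped = (arr) => arr.slice(0, -1); // everything except the last element
const withShifted = (arr) => arr.slice(1);    // everything except the first element

const original = [1, 2, 3];

console.log(withPopped(original));  // [ 1, 2 ]
console.log(withShifted(original)); // [ 2, 3 ]
console.log(original);              // [ 1, 2, 3 ], unchanged

// The mutating counterparts change the array in place and return the removed element.
const mutated = [1, 2, 3];
console.log(mutated.pop());   // 3
console.log(mutated.shift()); // 1
console.log(mutated);         // [ 2 ]

// Patterns from the query that look for the two slice forms.
const slicePop = /\.slice\( ?0 ?, ?-1 ?\)/;
const sliceShift = /\.slice\( ?1 ?\)/;
console.log(slicePop.test("parts.slice(0, -1)")); // true
console.log(sliceShift.test("args.slice( 1 )"));  // true
console.log(sliceShift.test("args.slice(1, 3)")); // false
```

The patterns only catch the literal spellings, so a call such as `xs.slice(0, xs.length - 1)` would not be counted.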