Merge in updated redist #197
base: dev
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,4 +21,4 @@ builder.sh | |
^explore$ | ||
^\.github$ | ||
^LICENSE\.md$ | ||
^CRAN-SUBMISSION$ | ||
^CRAN-SUBMISSION$ |
philipwosull marked this conversation as resolved.
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,45 @@ | ||
# 5.0.0 | ||
* Replaces the old SMC weights with optimal weights, which generally have lower variance. | ||
* Adds the option to add Mergesplit MCMC steps at any point during an SMC run. | ||
Adding mergesplit steps can help achieve convergence for plans with a larger | ||
number of districts without increasing the sample size. | ||
* Improves SMC and Mergesplit MCMC performance by pre-allocating and reusing as | ||
much memory as possible while drawing spanning trees. | ||
* Introduces new methods for sampling plans for both SMC and Mergesplit MCMC. | ||
The final output is still plans from the same distribution as before, but the | ||
new sampling spaces and splitting methods may perform better in some | ||
scenarios. | ||
* Introduces a new method for splitting in SMC: generalized region splits. | ||
Instead of splitting off one district at a time, this allows for splitting into | ||
two arbitrarily sized regions. For an equal sample size, generalized region splits | ||
tend to converge more slowly, but they are typically much faster to run (up to twice | ||
as fast or more) since on average they draw spanning trees on smaller subgraphs than | ||
single district splits. | ||
* Adds support for sampling multimember district plans with both SMC and | ||
mergesplit MCMC under some mild conditions. The district seat sizes (how many | ||
legislators a district can have) must be a range of values e.g. (3,4,5) and no | ||
district seat size can be the sum of two others. | ||
* When counties are used `redist_mergesplit` now samples from the same target | ||
distribution as `redist_smc` (it guarantees no more than the number of districts | ||
minus 1 splits). | ||
* `redist_mergesplit` inputs now work differently. | ||
CoryMcCartan marked this conversation as resolved.
|
||
* `nsims` is now the number of plans saved. | ||
* `warmup` is the number of steps to run the chain for before collecting any samples. | ||
* `thin` means the chain is run for `thin - 1` steps between saved plans. | ||
* Overall, the chain will be run for `warmup + nsims * thin` steps and return `nsims` plans. | ||
* Adds the option to incorporate rejection sampling for all constraints in SMC | ||
and mergesplit MCMC. Any constraint can now include a threshold argument `thresh`: | ||
for a newly split plan, if either of the two new regions has a raw score | ||
greater than or equal to `thresh`, the plan is automatically rejected. | ||
This amounts to giving plans where any region has a score at or above `thresh` a | ||
probability of 0. | ||
* Updates the target distribution when counties are turned on. For more details | ||
Can you give any more details here, since the paper is not available?
We will have a preprint by the time this is released. |
||
see the forthcoming working paper. | ||
* The mergesplit backend for `redist_shortburst` now uses uniform edge sampling | ||
with forest space instead of sampling with graph space, and all | ||
`k`-related parameters have been removed. | ||
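To make the new `redist_mergesplit` accounting in the 5.0.0 notes concrete, the arithmetic can be sketched as follows. This is only an illustration of the `warmup + nsims * thin` rule stated above; the helper `mergesplit_chain_length()` is hypothetical and is not part of redist.

```r
# Hypothetical helper (not part of redist): total number of chain steps
# implied by the 5.0.0 description, warmup + nsims * thin.
mergesplit_chain_length <- function(nsims, warmup, thin) {
    stopifnot(nsims >= 1, warmup >= 0, thin >= 1)
    warmup + nsims * thin
}

# 500 warmup steps, then 1000 saved plans with thin = 5.
total_steps <- mergesplit_chain_length(nsims = 1000, warmup = 500, thin = 5)
total_steps  # 5500
```

With `thin = 1` and `warmup = 0`, the chain length equals `nsims`, so every step is saved.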
|
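The `thresh` rejection rule from the 5.0.0 notes can be sketched in a few lines. This is a minimal illustration of the stated rule only (reject when either new region's raw score is greater than or equal to `thresh`); the function `split_is_rejected()` and the example scores are assumptions, not redist's implementation.

```r
# Hypothetical helper (not redist's implementation): a split is rejected when
# either new region's raw constraint score is at or above `thresh`.
split_is_rejected <- function(region_scores, thresh) {
    stopifnot(length(region_scores) == 2, is.numeric(thresh))
    any(region_scores >= thresh)
}

split_is_rejected(c(0.2, 0.7), thresh = 0.5)  # TRUE: second region reaches the threshold
split_is_rejected(c(0.2, 0.4), thresh = 0.5)  # FALSE: both regions are below it
split_is_rejected(c(0.5, 0.1), thresh = 0.5)  # TRUE: the comparison is >=, not >
```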
||
|
||
# 4.3.0 | ||
* Improves SMC performance by pre-allocating some memory while drawing spanning trees. | ||
* Replaces SMC label-counting adjustments (exact and importance-sampling-based) with a new backward kernel that eliminates approximation error and requires far less computation | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -53,6 +53,14 @@ dist_dist_diff <- function(p, i_dist, j_dist, x_center, y_center, x, y) { | |
.Call(`_redist_dist_dist_diff`, p, i_dist, j_dist, x_center, y_center, x, y) | ||
} | ||
|
||
get_region_multigraph <- function(adj_list, region_ids) { | ||
.Call(`_redist_get_region_multigraph`, adj_list, region_ids) | ||
} | ||
|
||
get_region_laplacian <- function(adj_list, region_ids) { | ||
.Call(`_redist_get_region_laplacian`, adj_list, region_ids) | ||
} | ||
|
||
log_st_map <- function(g, districts, counties, n_distr) { | ||
.Call(`_redist_log_st_map`, g, districts, counties, n_distr) | ||
} | ||
|
@@ -69,6 +77,52 @@ calcPWDh <- function(x) { | |
.Call(`_redist_calcPWDh`, x) | ||
} | ||
|
||
#' | ||
do we need to export these from c++ to R if they're not used by any package R code? if we are going to keep them exported (from c++) but internal, then I'd like to change the names, because people can still see these functions with |
||
#' @returns A list with the following | ||
#' - `uncut_tree`: The spanning tree drawn on the region stored as a | ||
#' 0-indexed directed edge adjacency graph. | ||
#' - `num_attempts`: The number of attempts it took to draw the tree. | ||
#' | ||
#' @keywords internal | ||
#' @noRd | ||
draw_a_tree_on_a_region <- function(adj_list, counties, pop, ndists, num_regions, num_districts, region_id_to_draw_tree_on, lower, upper, region_ids, region_sizes, verbose) { | ||
.Call(`_redist_draw_a_tree_on_a_region`, adj_list, counties, pop, ndists, num_regions, num_districts, region_id_to_draw_tree_on, lower, upper, region_ids, region_sizes, verbose) | ||
} | ||
|
||
#' Splits a multidistrict into two new regions within population bounds | ||
#' | ||
#' Splits a multidistrict into two new valid regions by drawing spanning | ||
#' trees uniformly at random and attempting to find an edge to cut until | ||
#' a successful cut is made. | ||
#' | ||
#' @title Split a multidistrict into two regions | ||
#' | ||
#' @inheritParams run_redist_smc | ||
#' @noRd | ||
perform_a_valid_multidistrict_split <- function(adj_list, counties, pop, ndists, num_regions, num_districts, region_id_to_split, target, lower, upper, region_ids, region_sizes, split_dval_min, split_dval_max, split_district_only, verbose = FALSE, k_param = 1L) { | ||
.Call(`_redist_perform_a_valid_multidistrict_split`, adj_list, counties, pop, ndists, num_regions, num_districts, region_id_to_split, target, lower, upper, region_ids, region_sizes, split_dval_min, split_dval_max, split_district_only, verbose, k_param) | ||
} | ||
|
||
draw_trees_on_a_region <- function(adj_list, counties, pop, ndists, region_id_to_draw_tree_on, region_size, lower, target, upper, region_ids, num_tree, num_threads, verbose) { | ||
.Call(`_redist_draw_trees_on_a_region`, adj_list, counties, pop, ndists, region_id_to_draw_tree_on, region_size, lower, target, upper, region_ids, num_tree, num_threads, verbose) | ||
} | ||
|
||
attempt_splits_on_a_region <- function(adj_list, counties, pop, ndists, init_num_regions, region_id_to_split, lower, target, upper, region_ids, region_sizes, splitting_schedule_str, k_param, num_plans, num_threads, verbose) { | ||
.Call(`_redist_attempt_splits_on_a_region`, adj_list, counties, pop, ndists, init_num_regions, region_id_to_split, lower, target, upper, region_ids, region_sizes, splitting_schedule_str, k_param, num_plans, num_threads, verbose) | ||
} | ||
|
||
compute_log_unnormalized_target_density_components <- function(adj_list, counties, pop, constraints, pop_temper, compute_pop_temper, rho, ndists, total_seats, num_regions, district_seat_sizes, lower, target, upper, region_ids, region_sizes, output_type, num_threads) { | ||
.Call(`_redist_compute_log_unnormalized_target_density_components`, adj_list, counties, pop, constraints, pop_temper, compute_pop_temper, rho, ndists, total_seats, num_regions, district_seat_sizes, lower, target, upper, region_ids, region_sizes, output_type, num_threads) | ||
} | ||
|
||
compute_plans_log_optimal_weights <- function(adj_list, counties, pop, constraints, pop_temper, rho, splitting_schedule_str, ndists, total_seats, district_seat_sizes, num_regions, lower, target, upper, region_ids, region_sizes, num_threads) { | ||
.Call(`_redist_compute_plans_log_optimal_weights`, adj_list, counties, pop, constraints, pop_temper, rho, splitting_schedule_str, ndists, total_seats, district_seat_sizes, num_regions, lower, target, upper, region_ids, region_sizes, num_threads) | ||
} | ||
|
||
compute_plans_log_simple_weights <- function(adj_list, counties, pop, constraints, pop_temper, rho, splitting_schedule_str, ndists, total_seats, district_seat_sizes, num_regions, lower, target, upper, region_ids, region_sizes, num_threads) { | ||
.Call(`_redist_compute_plans_log_simple_weights`, adj_list, counties, pop, constraints, pop_temper, rho, splitting_schedule_str, ndists, total_seats, district_seat_sizes, num_regions, lower, target, upper, region_ids, region_sizes, num_threads) | ||
} | ||
|
||
group_pct_top_k <- function(m, group_pop, total_pop, k, n_distr) { | ||
.Call(`_redist_group_pct_top_k`, m, group_pop, total_pop, k, n_distr) | ||
} | ||
|
@@ -89,20 +143,32 @@ prec_cooccur <- function(m, idxs, ncores = 0L) { | |
.Call(`_redist_prec_cooccur`, m, idxs, ncores) | ||
} | ||
|
||
group_pct <- function(m, group_pop, total_pop, n_distr) { | ||
.Call(`_redist_group_pct`, m, group_pop, total_pop, n_distr) | ||
group_pct <- function(plans_mat, group_pop, total_pop, n_distr, ncores = 0L) { | ||
.Call(`_redist_group_pct`, plans_mat, group_pop, total_pop, n_distr, ncores) | ||
} | ||
|
||
pop_tally <- function(districts, pop, n_distr, ncores = 0L) { | ||
.Call(`_redist_pop_tally`, districts, pop, n_distr, ncores) | ||
} | ||
|
||
pop_tally <- function(districts, pop, n_distr) { | ||
.Call(`_redist_pop_tally`, districts, pop, n_distr) | ||
infer_region_seats <- function(region_pops, lower, upper, total_seats, num_threads = 0L) { | ||
.Call(`_redist_infer_region_seats`, region_pops, lower, upper, total_seats, num_threads) | ||
} | ||
|
||
max_dev <- function(districts, pop, n_distr) { | ||
.Call(`_redist_max_dev`, districts, pop, n_distr) | ||
max_dev <- function(districts, pop, n_distr, multimember_districts = FALSE, nseats = -1L, seats_matrix = matrix(1,1), num_threads = 1L) { | ||
.Call(`_redist_max_dev`, districts, pop, n_distr, multimember_districts, nseats, seats_matrix, num_threads) | ||
} | ||
|
||
ms_plans <- function(N, l, init, counties, pop, n_distr, target, lower, upper, rho, constraints, control, k, thin, verbosity) { | ||
.Call(`_redist_ms_plans`, N, l, init, counties, pop, n_distr, target, lower, upper, rho, constraints, control, k, thin, verbosity) | ||
order_district_stats <- function(district_stats, ndists, num_threads) { | ||
.Call(`_redist_order_district_stats`, district_stats, ndists, num_threads) | ||
} | ||
|
||
order_columns_by_district <- function(df, columns, ndists, num_threads = 0L) { | ||
.Call(`_redist_order_columns_by_district`, df, columns, ndists, num_threads) | ||
} | ||
|
||
ms_plans <- function(nsims, warmup, thin, ndists, total_seats, district_seat_sizes, adj_list, counties, pop, target, lower, upper, rho, init_plan, init_seats, sampling_space_str, pair_rule, control, constraints, verbosity = 3L, diagnostic_mode = FALSE) { | ||
.Call(`_redist_ms_plans`, nsims, warmup, thin, ndists, total_seats, district_seat_sizes, adj_list, counties, pop, target, lower, upper, rho, init_plan, init_seats, sampling_space_str, pair_rule, control, constraints, verbosity, diagnostic_mode) | ||
} | ||
|
||
pareto_dominated <- function(x) { | ||
|
@@ -125,6 +191,105 @@ resample_lowvar <- function(wgts) { | |
.Call(`_redist_resample_lowvar`, wgts) | ||
} | ||
|
||
maximum_input_sizes <- function() { | ||
.Call(`_redist_maximum_input_sizes`) | ||
} | ||
|
||
#' Checks a matrix of seat counts is valid | ||
#' | ||
#' Checks that a matrix of seat counts associated with a plan is valid | ||
#' meaning that every region has a positive seat value and for each plan | ||
#' the sum of seats is equal to the total number of seats (`nseats`). | ||
#' If anything is not correct an error will be thrown. | ||
#' | ||
#' @param init_seats A matrix of 1-indexed plans | ||
#' @param num_regions The number of regions in the plan. | ||
#' @param nseats The total number of seats in the map | ||
#' @param seats_range Vector of number of seats a district is allowed to have | ||
#' @param split_districts_only Whether to check that all but the last region are | ||
#' districts. (Allows for the possibility that the last region is a district too.) | ||
#' @param num_threads The number of threads to use. Defaults to number of machine threads. | ||
#' | ||
#' @details Modifications | ||
#' - None | ||
#' | ||
#' @keywords internal | ||
#' @noRd | ||
validate_init_seats_cpp <- function(init_seats, num_regions, nseats, seats_range, split_districts_only, num_threads = 1L) { | ||
invisible(.Call(`_redist_validate_init_seats_cpp`, init_seats, num_regions, nseats, seats_range, split_districts_only, num_threads)) | ||
} | ||
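The two checks described in the documentation above (every seat count is positive, and each plan's seats sum to `nseats`) can be sketched in plain R. This is not the C++ routine: `seats_are_valid()` is a hypothetical helper, it returns `TRUE`/`FALSE` rather than throwing an error, and it assumes the matrix stores one region per row and one plan per column.

```r
# Hypothetical pure-R analogue; assumes rows = regions, columns = plans.
seats_are_valid <- function(init_seats, nseats) {
    # Every region needs a positive seat count, and every plan (column)
    # must account for exactly `nseats` seats.
    all(init_seats > 0) && all(colSums(init_seats) == nseats)
}

ok  <- cbind(c(3L, 4L, 5L), c(4L, 4L, 4L))  # both columns sum to 12
bad <- cbind(c(3L, 4L, 5L), c(4L, 4L, 5L))  # second column sums to 13

seats_are_valid(ok, nseats = 12L)
seats_are_valid(bad, nseats = 12L)
```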
|
||
#' Get canonically relabeled plans matrix | ||
#' | ||
#' Given a matrix of 1-indexed plans (or partial plans) this function | ||
#' returns a new plans matrix with all the plans labeled canonically. | ||
#' The canonical labelling of a plan is the one where the region of the | ||
#' first vertex gets mapped to 1, the region of the next smallest vertex | ||
#' in a different region than the first gets mapped to 2, and so on. This | ||
#' is guaranteed to result in the same labelling for any plan where the | ||
#' region ids have been permuted. | ||
#' | ||
#' | ||
#' @param plans_mat A matrix of 1-indexed plans | ||
#' @param num_regions The number of regions in the plan | ||
#' @param num_threads The number of threads to use. Defaults to number of machine threads. | ||
#' | ||
#' @details Modifications | ||
#' - None | ||
#' | ||
#' @returns A matrix of canonically labelled plans | ||
#' | ||
#' @keywords internal | ||
#' @noRd | ||
get_canonical_plan_labelling <- function(plans_mat, num_regions, num_threads = 0L) { | ||
.Call(`_redist_get_canonical_plan_labelling`, plans_mat, num_regions, num_threads) | ||
} | ||
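The canonical labelling described above (labels assigned in order of first appearance) has a compact pure-R analogue. This is a sketch for intuition, not the C++ code behind `get_canonical_plan_labelling`; `canonical_labels()` is a hypothetical name, and the example assumes each column of the matrix is one plan.

```r
# Hypothetical helper: unique() keeps labels in order of first appearance,
# and match() maps each label to its position in that order.
canonical_labels <- function(plan) {
    match(plan, unique(plan))
}

# Two labellings of the same partition of five vertices.
plans <- cbind(c(3L, 3L, 1L, 2L, 1L),
               c(2L, 2L, 3L, 1L, 3L))

relabelled <- apply(plans, 2, canonical_labels)
relabelled[, 1]  # 1 1 2 3 2
relabelled[, 2]  # 1 1 2 3 2
```

Both columns map to the same vector, which is the property the documentation relies on: any permutation of region ids yields the same canonical labelling.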
|
||
#' Count how many times each plan appears in a plans matrix | ||
#' | ||
#' Given a matrix of 1-indexed plans (or partial plans) this function | ||
#' returns a list mapping each plan vector, stored as one concatenated string, to | ||
#' the number of times the plan appears. | ||
#' | ||
#' If `use_canonical_ordering` is set to true then the plans will be | ||
#' reordered using the canonical reordering function | ||
#' `get_canonical_plan_labelling`. This guarantees that the same plan | ||
#' will not be incorrectly counted if there are different permutations | ||
#' of its labels. If `use_canonical_ordering` is not set to true then | ||
#' it's possible the count will be incorrect because of different | ||
#' permutations of the same underlying plan. | ||
#' | ||
#' | ||
#' @param plans_mat A matrix of 1-indexed plans | ||
#' @param num_regions The number of regions in the plan | ||
#' @param use_canonical_ordering Whether or not to reorder the plans using the | ||
#' canonical ordering on plans. | ||
#' @param num_threads The number of threads to use. Defaults to number of machine threads. | ||
#' | ||
#' @details Modifications | ||
#' - None | ||
#' | ||
#' @returns A list mapping plans (each stored as a concatenated string) to | ||
#' how many times they appear in the matrix | ||
#' | ||
#' @keywords internal | ||
#' @noRd | ||
get_plan_counts <- function(input_plans_mat, num_regions, use_canonical_ordering = TRUE, num_threads = 0L) { | ||
.Call(`_redist_get_plan_counts`, input_plans_mat, num_regions, use_canonical_ordering, num_threads) | ||
} | ||
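The effect of `use_canonical_ordering` on the counts can be seen in a small pure-R sketch. It is an assumption-laden illustration rather than the exported routine: `count_plans()` is a hypothetical name, it assumes one plan per column, and it returns a named integer vector instead of a list.

```r
# Hypothetical pure-R analogue of the counting idea; assumes one plan per column.
count_plans <- function(plans_mat, use_canonical_ordering = TRUE) {
    if (use_canonical_ordering) {
        # Relabel each plan by order of first appearance so that label
        # permutations of the same partition become identical vectors.
        plans_mat <- apply(plans_mat, 2, function(p) match(p, unique(p)))
    }
    # Collapse each column into a single string key and tally the keys.
    keys <- apply(plans_mat, 2, paste, collapse = ",")
    counts <- table(keys)
    stats::setNames(as.integer(counts), names(counts))
}

# Columns 1 and 2 describe the same partition with swapped labels.
plans <- cbind(c(1L, 1L, 2L, 2L), c(2L, 2L, 1L, 1L), c(1L, 2L, 2L, 2L))

with_relabel <- count_plans(plans)
without_relabel <- count_plans(plans, use_canonical_ordering = FALSE)
```

With relabelling, the first two columns share the key `"1,1,2,2"` and are counted together; without it, all three columns are counted as distinct plans.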
|
||
resample_plans_lowvar <- function(normalized_weights, plans_mat, region_pops_mat, region_sizes_mat, reorder_sizes_mat) { | ||
.Call(`_redist_resample_plans_lowvar`, normalized_weights, plans_mat, region_pops_mat, region_sizes_mat, reorder_sizes_mat) | ||
} | ||
|
||
get_log_number_linking_edges <- function(adj_list, counties, constraints, ndists, nseats, num_regions, region_ids) { | ||
.Call(`_redist_get_log_number_linking_edges`, adj_list, counties, constraints, ndists, nseats, num_regions, region_ids) | ||
} | ||
|
||
get_merged_log_number_linking_edges <- function(adj_list, counties, constraints, ndists, nseats, num_regions, region_ids, region1_id, region2_id) { | ||
.Call(`_redist_get_merged_log_number_linking_edges`, adj_list, counties, constraints, ndists, nseats, num_regions, region_ids, region1_id, region2_id) | ||
} | ||
|
||
plan_joint <- function(m1, m2, pop) { | ||
.Call(`_redist_plan_joint`, m1, m2, pop) | ||
} | ||
|
@@ -149,8 +314,34 @@ k_biggest <- function(x, k = 1L) { | |
.Call(`_redist_k_biggest`, x, k) | ||
} | ||
|
||
smc_plans <- function(N, l, counties, pop, n_distr, target, lower, upper, rho, districts, n_drawn, n_steps, constraints, control, verbosity = 1L) { | ||
.Call(`_redist_smc_plans`, N, l, counties, pop, n_distr, target, lower, upper, rho, districts, n_drawn, n_steps, constraints, control, verbosity) | ||
#' Run SMC (optionally with Merge Split steps too) | ||
#' | ||
#' Uses the SMC method with optimal weights and merge-split steps to generate a sample of `nsims` plans in `c++`. | ||
#' | ||
#' | ||
#' Using the procedure outlined in <PAPER HERE> this function uses Sequential | ||
#' Monte Carlo (SMC) methods to generate a sample of `nsims` plans. | ||
#' | ||
#' | ||
#' @param ndists The number of districts the final plans will have | ||
#' @param adj_list A 0-indexed adjacency list representing the undirected graph | ||
#' which represents the underlying map the plans are to be drawn on | ||
#' @param counties Vector of county labels of each vertex in `adj_list` | ||
#' @param pop A vector of the population associated with each vertex in `adj_list` | ||
#' @param target Ideal population of a valid district. Population deviation is calculated | ||
#' relative to this value. | ||
#' @param lower Acceptable lower bounds on a valid district's population | ||
#' @param upper Acceptable upper bounds on a valid district's population | ||
#' @param nsims The number of plans (samples) to draw | ||
#' @param k_param The k parameter from the SMC algorithm: the edge to cut is chosen from among the top `k_param` edges | ||
#' @param control Named list of additional parameters. | ||
#' @param num_threads The number of threads the threadpool should use | ||
#' @param verbosity What level of detail to print out while the algorithm is | ||
#' running <ADD OPTIONS> | ||
#' @keywords internal | ||
#' @noRd | ||
run_redist_smc <- function(nsims, total_seats, ndists, district_seat_sizes, initial_num_regions, adj_list, counties, pop, step_types, target, lower, upper, rho, sampling_space_str, control, constraints, verbosity, diagnostic_level, region_id_mat, region_sizes_mat, log_weights) { | ||
.Call(`_redist_run_redist_smc`, nsims, total_seats, ndists, district_seat_sizes, initial_num_regions, adj_list, counties, pop, step_types, target, lower, upper, rho, sampling_space_str, control, constraints, verbosity, diagnostic_level, region_id_mat, region_sizes_mat, log_weights) | ||
} | ||
|
||
splits <- function(dm, community, nd, max_split) { | ||
|