-
Notifications
You must be signed in to change notification settings - Fork 2
Add binning functions #210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prelim review -- thanks!
test/test_array_ops.py
Outdated
np.testing.assert_allclose(binned_signal_sigma_so3g, binned_signal_sigma, atol=tolerance) | ||
np.testing.assert_allclose(bin_counts_so3g, bin_counts_dets, atol=tolerance) | ||
|
||
def test_02_binning_flags_float64(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please reduce duplicate code in the test by using a helper function and calling it from test_01 and test_02.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved most of the duplicated code in tests to standalone functions.
src/array_ops.cxx
Outdated
void _histogram(const T* data, const T* weights, int* histogram, | ||
const T* bin_edges, const int nsamps, const int nbins, | ||
const T lower, const T upper) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation is very brute-force -- looping over ~every bin for every sample. A really basic optimization would be to assume correlation in data -- i.e. if sample i was in bin j, start your search for i+1 at bin j (and probably you'll find it in j, j-1, or j+1).
But really this function is used in one place ... where one already has "bin_indices" ... so perhaps this function isn't needed at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this function and replaced with code that uses bin_indices
and confirmed it is equivalent. While that added step can also be removed and just use the existing code in the samps
loop, it is faster to do it up front like this then for check for each sample if possible.
src/array_ops.cxx
Outdated
} | ||
} | ||
// Edge case to match np.histogram | ||
if (data[i] == bin_edges[nbins] && data[i] <= upper) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that ==
right here? Wow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, technically it should have been >=
since we would want to add the values between bin_edges[nbins]
and upper
though in some cases this would just limit to only ==
. Function has been removed though, so now irrelevant.
src/array_ops.cxx
Outdated
" lower: lower bin range (float64)\n" | ||
" upper: upper bin range (float64)\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point of these two args is unclear. And they're only propagated through to _histogram
, which as discussed above I think should be perhaps be canned...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are to allow for excluding values outside of a different (perhaps more restricted) range that is not strictly defined by the bin edges. I added these here to reproduce the range
arguments in numpy
's bin_edges
and np.histogram
. While I've removed the histogram
function, I've kept these in the new code that uses bin_indices
in order to make it consistent with the sotodlib
version that has range
as an input. I have also made the docstring a bit more explanative. I tested limiting the range in the tests to less than the full x array range and find that they both agree.
The sotodlib
version doesn't use np.histogram
when there are flags so that doesn't actually limit based on the range
argument (other than in np.histogram_bin_edges
) in the way that it does when there are no flags. I've matched that difference here, but perhaps we might want to make these two cases consistent across both versions.
Adds binning functions that replicate most of the functionality of
sotodlib's
bin_signal
(https://github.com/simonsobs/sotodlib/blob/master/sotodlib/tod_ops/binning.py). There are two functions,bin_signal
andbin_flagged_signal
since binning with flags is much slower. Setting the bin edges is left up to python since it has algorithms to find the optimum bin numbers, but might be added later. The weight and flag arrays may be either 1D (nsamps
) or 2D (ndets
,nsamps
) as in thesotodlib
version.