Skip to content

Fixes + callback updates #2278

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
May 19, 2025
Merged

Conversation

mathgeekcoder
Copy link
Contributor

@mathgeekcoder mathgeekcoder commented Apr 4, 2025

Changed internal callback code to use C++ rather than C for increased memory safety and ownership. e.g., avoid memory leaks, etc.

  • Includes a small breaking change for the C API: The HighsCallbackDataIn::user_solution memory pointer is allocated and managed by HiGHS. The user can modify the elements of the array - but they should not try to reallocate the memory.

  • Added setSolution and repairSolution functions that can accept partial solutions and (naively) find feasible solutions

  • Added usability changes to highspy callbacks + a setObjective function that should've been previously added.

  • Minor breaking change to highspy, the mip_solution no longer needs/supports to_array(n) - as it is now a numpy array.

QUESTIONS:

  • Should we change kHighsUndefined so that it is different to kHighsInf? Is it important to be able to tell the difference?
  • Is there already a function that finds a feasible solution that I can call in repairSolution?
  • Are you happy with repairSolution naming, or would you prefer Gurobi's useSolution?

jajhall and others added 5 commits April 3, 2025 11:47
…y safety and ownership. e.g., avoid out-of-bounds, especially after changes to the model.

Includes a small breaking change for the C API: renamed HighsCallbackDataOut to be Highs*C*CallbackDataOut.

Similar with HighsCCallbackDataIn, but it also requires user_solution_size to be set.  We will only access elements in memory up to user_solution_size, however we won't do anything unless user_solution_size == num_cols.

Also added minor usability changes to highspy callbacks + a setObjective function that should've been previously added.

Minor breaking change to highspy, the mip_solution no longer needs/supports to_array(n).
@odow
Copy link
Collaborator

odow commented Apr 4, 2025

Includes a small breaking change for the C API

No breaking changes please. People are using HiGHS in a variety of contexts, so it's really hard to know what the impact of this is.

@mathgeekcoder mathgeekcoder marked this pull request as draft April 4, 2025 23:25
@mathgeekcoder
Copy link
Contributor Author

No breaking changes please. People are using HiGHS in a variety of contexts, so it's really hard to know what the impact of this is.

Absolutely understood, but the current memory access with callbacks is quite fragile. I don't know how to improve this without any breaking changes.

The smallest breaking change would be to require a user_solution_size for HighsCallbackDataIn. That's probably not too bad?

Even with that, the C++ / highspy API would be less robust since the C-style structs ignore enum types and pass around raw pointers. We'd want to make sure the code is suitably documented about who owns the pointers etc. so we don't have memory leaks or access violations.

Happy to take suggestions?

Copy link

codecov bot commented Apr 5, 2025

Codecov Report

Attention: Patch coverage is 87.28324% with 44 lines in your changes missing coverage. Please review.

Project coverage is 79.33%. Comparing base (b58fb8e) to head (3e8f160).
Report is 89 commits behind head on latest.

Files with missing lines Patch % Lines
highs/lp_data/HighsCallback.cpp 81.93% 28 Missing ⚠️
highs/interfaces/highs_c_api.cpp 0.00% 15 Missing ⚠️
check/TestCallbacks.cpp 98.11% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           latest    #2278    +/-   ##
========================================
  Coverage   79.32%   79.33%            
========================================
  Files         345      346     +1     
  Lines       84413    84631   +218     
========================================
+ Hits        66959    67139   +180     
- Misses      17454    17492    +38     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@odow
Copy link
Collaborator

odow commented Apr 5, 2025

Perhaps I missed something in a different issue, but what's the problem with the current callback?

@mathgeekcoder
Copy link
Contributor Author

Perhaps I missed something in a different issue, but what's the problem with the current callback?

It's related to #2270, updating support for callbacks with highspy. When adding support for user_solution I noticed that the consuming code assumes length is num_col (see code below).

if (mipsolver.callback_->data_in.user_solution) {
    std::vector<double> user_solution(mipsolver.orig_model_->num_col_);

    for (HighsInt iCol = 0; iCol < mipsolver.orig_model_->num_col_; iCol++)
      user_solution[iCol] = mipsolver.callback_->data_in.user_solution[iCol];

This is a reasonable assumption, but there's no bound checks - and there's a lot of assumptions about memory ownership. Unless the user really knows what they're doing, it'd be easy to make a mistake (especially if making changes to the model, e.g., adding columns).

By converting the callback structure to C++, many of these potential issues are avoided. In addition, this additional copy (above) is no longer needed for at least C++/highspy users since highs keeps ownership of the memory.

Alternatively - at a minimum, by having a user_solution_size, we make the user declare how much memory they've allocated to avoid potential access violations. This is clearly not foolproof but might help avoid some issues. Honestly, it'd be better to have the C++ callback structure.

Similarly, with HighsCallbackDataOut, there's mip_solution, cutpool_start, cutpool_index, cutpool_value, cutpool_lower, and cutpool_upper are currently raw pointers.

@odow
Copy link
Collaborator

odow commented Apr 5, 2025

Unless the user really knows what they're doing, it'd be easy to make a mistake

I think a strong assumption is that callbacks are extras for experts. Callback authors are potentially novice, but typically very experienced because they understand what a callback is, when it gets called by the solver, and why it might be useful.

highspy could check bounds etc.

Before we make changes, let me try and wrap the callback with HiGHS.jl and I'll report back.

@odow
Copy link
Collaborator

odow commented Apr 5, 2025

Oh. I see what you mean. The ownership of user_solution is particularly problematic for GC based languages.

We're also missing a kHighsCallbackMipUserSolution constant in the C API.

I'd settle for breaking changes to the way we provide user_solution in the next release. @jajhall can comment, but it looks like this was added for someone particular without thinking through the broader challenges?

Other questions: how can I provide a partial solution? Is there an equivalent to GRB_UNDEFINED?

Broader: I'd be in favor of a new callback interface that followed Gurobi. You cannot interact with the in and out pointers except via things like Highs_getDoubleCallbackDataItem and Highs_setDoubleCallbackDataItem.

@mathgeekcoder
Copy link
Contributor Author

mathgeekcoder commented Apr 5, 2025

Thanks, I appreciate the 2nd opinion!

I'm not quite happy with my approach in the PR. I feel like HiGHS should consistently either own the memory, or the user does.

Looking at Gurobi API, from the users perspective, they own the memory - and they can delete at end of scope since they get/set the solution with function calls. Likely these calls are just copying to/from the internally owned buffers in Gurobis cbdata struct.

With this in mind, and since HiGHS currently owns the out solution data, it probably should also own the in solution data.

If we do this, it's easy to support a set solution() function syntax (or the user can modify the buffer directly). And I agree - it would be great to support partial solutions with UNDEFINED etc!

I'll have a crack at it this week, but I'd certainly appreciate @jajhall thoughts too!

@jajhall
Copy link
Member

jajhall commented Apr 5, 2025

Thanks for your work on this, @mathgeekcoder and @odow

it looks like this was added for someone particular without thinking through the broader challenges?

Yes, the user solution callback was created by me for NVIDIA. My C++/C/Python skills aren't up to thinking through broader challenges in this case.

it would be great to support partial solutions with UNDEFINED etc!

At the moment, partial solutions can only be defined by passing arrays of values and indices for the defined values. However, it looks simple to allow a full-length solution to contain values set to kHighsUndefined.

I'm OK with breaking changes to the user_solution callback, as it's only recently been introduced. I'm not 100% happy with having it anyway, as it generally makes the HiGHS MIP solver non-deterministic.

I'm happy with you guys doing what you think is best with HiGHS callbacks, as I've already shown that I don't have the skills to code them up the best way, although please keep @galabovaa in the loop.

In passing, I'm curious that values of data_in set in (eg) examples/call_highs_from_python.py that are passed back through the parameter list of user_interrupt_callback are recognised by the C++ method (ultimately) calling the callback, since I didn't think Python could pass parameters by reference. Maybe I should just accept that it works, and not try to understand why!

@odow
Copy link
Collaborator

odow commented Apr 6, 2025

or the user can modify the buffer directly

Yes, if HiGHS provided a valid HighsCallbackDataIn that we could modify, that would be great.

it looks simple to allow a full-length solution to contain values set to kHighsUndefined

👍

@mathgeekcoder
Copy link
Contributor Author

Just FYI, I'm still working on this (made good progress already), but had some other commitments earlier this week.

On my local clone, I've rolled back most of my breaking changes to the callback C API, the main one now is that user_solution is "owned" by HiGHS. I've been playing with some other design choices (can elaborate more when I update the PR). Would love to get feedback on it.

I'm happy with you guys doing what you think is best with HiGHS callbacks, as I've already shown that I don't have the skills to code them up the best way, although please keep @galabovaa in the loop.

Honestly, this stuff is hard to get "right" for everyone - especially since what you've written will work and can be very efficient. It just makes me nervous to ensure that users "do the right thing".

I'm also worried about how this will work when we have multiple threads etc. I've some ideas, but haven't really thought too much ahead yet. @jajhall - I know you've started looking at the parallel branch and bound - do you know if this might impact our approach? We might need to synchronize the threads or have multiple copies of the data.

In passing, I'm curious that values of data_in set in (eg) examples/call_highs_from_python.py that are passed back through the parameter list of user_interrupt_callback are recognised by the C++ method (ultimately) calling the callback, since I didn't think Python could pass parameters by reference. Maybe I should just accept that it works, and not try to understand why!

I swear there's some arcane magic behind the scenes with pybind! I probably don't understand half of it, but it does a great job.

@jajhall
Copy link
Member

jajhall commented Apr 9, 2025

A callback to feed user solutions to HiGHS is very low priority, so don't compromise other things to make it work. Feeding even the optimal solution to an exact MIP solver frequently doesn't help much - and tossing in user solutions makes the MIP solver non-deterministic. If only NVIDIA hadn't suggested it would be useful when we first spoke to them!

…s now managed by HiGHS instead of the user

* added `setSolution` and `repairSolution` members that can accept partial solutions and find feasible solutions
* added kHighsCallbackCallbackMipUserSolution to C API
* changed C++ API to support both C API and new `HighsCallbackOutput`/`HighsCallbackInput` classes that manage memory
* Updated C/C++/highspy unit tests
* Updated highspy callback interface.  All callback arrays use numpy with read_only pointers, but contents can be changed safely.
* Some formatting changes (to conform with build checks)
@odow
Copy link
Collaborator

odow commented Apr 15, 2025

On latest I've added a Julia CI job: https://github.com/ERGO-Code/HiGHS/blob/latest/.github/workflows/julia-tests-ubuntu.yml. I'm okay with any changes that don't break that build. I don't see it showing up in the current jobs so I guess this branch needs rebasing to pick up the commit.

mathgeekcoder and others added 3 commits May 5, 2025 13:15
* Added some helper functions to access cuts in highspy
* Added another highspy callback example
…y safety and ownership. e.g., avoid out-of-bounds, especially after changes to the model.

Includes a small breaking change for the C API: renamed HighsCallbackDataOut to be Highs*C*CallbackDataOut.

Similar with HighsCCallbackDataIn, but it also requires user_solution_size to be set.  We will only access elements in memory up to user_solution_size, however we won't do anything unless user_solution_size == num_cols.

Also added minor usability changes to highspy callbacks + a setObjective function that should've been previously added.

Minor breaking change to highspy, the mip_solution no longer needs/supports to_array(n).
…s now managed by HiGHS instead of the user

* added `setSolution` and `repairSolution` members that can accept partial solutions and find feasible solutions
* added kHighsCallbackCallbackMipUserSolution to C API
* changed C++ API to support both C API and new `HighsCallbackOutput`/`HighsCallbackInput` classes that manage memory
* Updated C/C++/highspy unit tests
* Updated highspy callback interface.  All callback arrays use numpy with read_only pointers, but contents can be changed safely.
* Some formatting changes (to conform with build checks)
* Added some helper functions to access cuts in highspy
* Added another highspy callback example
@mathgeekcoder mathgeekcoder changed the base branch from fix-2270 to latest May 5, 2025 20:35
@galabovaa
Copy link
Contributor

Thank you for getting back to this @mathgeekcoder!

Just a note, the valgrind test is fixed on a branch not yet merged in latest, and I'm still tracking the occational bazel tsan failure

@mathgeekcoder
Copy link
Contributor Author

Thank you for getting back to this @mathgeekcoder!

Just a note, the valgrind test is fixed on a branch not yet merged in latest, and I'm still tracking the occational bazel tsan failure

Thanks @galabovaa! Sorry it's taken me so long to get back to it.

I appreciate the note! I was going to investigate that today... you've saved me some time :) I'll try to track down the issue in julia, but afterwards I'd love to get your opinion on the PR.

…s the first two fields of HighsCallbackDataIn to be: interrupt, solution.

I had rearranged the fields in HighsCallbackDataIn, which caused the break.
@mathgeekcoder mathgeekcoder marked this pull request as ready for review May 15, 2025 14:35
@mathgeekcoder
Copy link
Contributor Author

Sorry @galabovaa, I forgot it was still in "draft". It was ready for initial review last week. Whenever you get the chance to have a look.

The additional failing checks seem to be a github error rather than code, but I couldn't see how to re-run them without pushing new changes.

@galabovaa galabovaa self-requested a review May 15, 2025 15:02
Copy link
Contributor

@galabovaa galabovaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Perhaps @jajhall should have a final look too before we merge this.

@galabovaa
Copy link
Contributor

A comment on the CI:
I have more work to do on the bazel tsan test and the valgrind fix will be merged soon.
I have just re-run the two other failing tests

Copy link
Member

@jajhall jajhall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all this @mathgeekcoder

I thought that using pointers for callback data input/output would make life easier in C, but clearly not!

@jajhall jajhall merged commit 06c92d1 into ERGO-Code:latest May 19, 2025
138 of 140 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants