Skip to content

feat(send_queue): report progress for media uploads #5008

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

Johennes
Copy link
Contributor

@Johennes Johennes commented May 6, 2025

This makes it possible to listen to the media upload progress when using the send queue. The progress is communicated via EventSendState::NotSentYet.

I haven't yet fixed / enhanced the tests or added a changelog because I'm not 100% sure about this approach and would like to get some preliminary feedback first.

  • Public API changes documented in changelogs (optional)


Edit by @Hywan:

@Johennes Johennes marked this pull request as ready for review May 26, 2025 18:40
@Johennes Johennes requested a review from a team as a code owner May 26, 2025 18:40
@Johennes Johennes requested review from Hywan and removed request for a team May 26, 2025 18:40
Copy link
Member

@Hywan Hywan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A really nice PR! I've a bit of feedback, but otherwise we are on the right direction.

We (the Matrix Rust team) were discussing the fact the timeline item is going to be recreated for each transmission progress update. Maybe we want a bit of throttle when broadcasting the transmission progress so that we reduce the amount of data sent to the apps. People at Element will start playing with it inside Element X, let's see how it goes.

Copy link
Member

@bnjbvr bnjbvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, and nice that we could in theory have this! I've got a few questions about the approach below. We'd like to try it out before validating the overall approach, because of the remark related to memory usage across the FFI, but other than that, it would be a great addition!

@jmartinesp
Copy link
Contributor

jmartinesp commented May 28, 2025

I just tested this on EXA and it seems like there's a memory peak when receiving the upload progress:

Grabacion.de.pantalla.2025-05-28.a.las.9.54.21.mov

Keep in mind though that this may also be caused by the upload process itself.


I tried measuring the allocations now and I got this:

image

EventSendProgress$MediaUpload is the binding version of the SDK record, MediaUploadProgress is what we map it to. I think we have 2x the SDK items because we don't map them in the pinned events timeline, so they're discarded.

The range in the image shows an spike in general allocations, specially in the Java Heap. It's a shame we can't filter out items for the graph and just get the values for these allocations, we have no easy way of knowing whether these allocations come mainly from the EventSendProgress objects and their mapping process or there's something else happening at the same time.

@Hywan
Copy link
Member

Hywan commented May 28, 2025

@jmartinesp Thanks! Can you measure if the memory peak maps to the size of the media, or is it something else?

@jmartinesp
Copy link
Contributor

image

I'd say it looks quite similar without these changes, the image above being a bit more stretched.

@Johennes
Copy link
Contributor Author

@jmartinesp thanks a bunch for taking the time to test the impact of this on Android! ❤️

bnjbvr pushed a commit that referenced this pull request Jun 4, 2025
Addendum to #5125 to
allow sending galleries from the FFI crate. This is the final PR for
galleries (apart from possible enhancements to report the upload status
in #5008).

This is somewhat lengthy again, apologies. Most of the changes are just
wrappers and type mapping, however. So I was hoping that it's relatively
straightforward to review in bulk. Happy to try and elaborate on the
changes or break them up into smaller commits if that helps, however.

---------

Signed-off-by: Johannes Marbach <n0-0ne+github@mailbox.org>
@bnjbvr bnjbvr self-requested a review June 12, 2025 16:09
Copy link
Member

@bnjbvr bnjbvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could avoid storing the sizes in persistent storage; after all, we're only interested in the aggregate of the remaining files to upload, so keeping this information in memory ought to be sufficient (plus, all the previously uploaded medias should still be in the event cache store until the media event is sent, at the end of the process). Let's try to make it more stateless instead.

This PR is a bit hard to review; if you introduce renamings, please keep the commit history tidy (no merge commits, only rebase/fixup! commits) and put renamings into individual commits. Also, would be sweet to group changes to the same layer (send queue vs timeline vs FFI) in commits, so as to give me a "direction" where to start from, and where to look at, in the end. You don't have to do it for this very specific PR, but for what it's worth, this is the kind of empathetic changes that usually helps (thus speeds up) reviews :)

I'd like to give this more thought and another look. Also, please add at least one end-to-end test so as to validate the approach :)

Johennes added 7 commits July 7, 2025 19:58
Signed-off-by: Johannes Marbach <n0-0ne+github@mailbox.org>
…QueueThumbnailInfo struct for easier extension

Signed-off-by: Johannes Marbach <n0-0ne+github@mailbox.org>
…ting later

Signed-off-by: Johannes Marbach <n0-0ne+github@mailbox.org>
…of unstable-msc4274 to use it during media progress reporting later

Signed-off-by: Johannes Marbach <n0-0ne+github@mailbox.org>
…_request

Signed-off-by: Johannes Marbach <n0-0ne+github@mailbox.org>
…pseudo units

Signed-off-by: Johannes Marbach <n0-0ne+github@mailbox.org>
…Upload to prepare it for communicating upload progress

Signed-off-by: Johannes Marbach <n0-0ne+github@mailbox.org>
@Johennes Johennes force-pushed the johannes/send-queue-media-progress branch from 857dcd2 to 934038e Compare July 7, 2025 17:58
Copy link
Member

@bnjbvr bnjbvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks A LOT for breaking this into digestible commits; this made the review so much simpler. Overall this starts to look great, I've got a few non-trivial improvement requests that warrant another look, sorry about this. But we're reaching a nice state now. Really appreciate the test changes too 👏

Usually at this point, I'd try to help with the merging of this PR, by addressing the comments myself, but turns out I'll be away starting today and for a week or so, so I don't have much time to do this. If you're bored with this PR, I'll be happy to address my own comments when I'm back. Alternatively, another reviewer can make sure that my comments have been addressed in the meanwhile (although it'd be a bit of a stretch for them to read all the prior work).

Thanks a bunch, though; excited to get this soon!


/// In-memory mapping of media transaction IDs to thumbnail sizes for the
/// purpose of progress reporting.
thumbnail_size_cache: Arc<Mutex<HashMap<OwnedTransactionId, Vec<Option<usize>>>>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Could we explain in the code comment the oddities about the value:
    1. Why do we need to store multiple thumbnail sizes for a single media transaction? As far as I understand, the transaction id is for the media event (either a regular (single) media event or a gallery event)
    2. Why can it be None? Could we store a Vec<usize> instead, and filter out the cases where there's no thumbnails? (and not even insert into this map, in this case)
  2. Could we use a sync mutex here, aka matrix_sdk_common::locks::Mutex? we might not need the async part, if the lock is only held across sync portions of the code
  3. Out of curiosity, would it make sense to have this in send_queue::StoreLock, which already contains some information about the current request? (in the form of BeingSent)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Fixed in cf51ac9.
  2. Fixed in 97a553f.
  3. I will have to look into this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed 3 with 971dcda and ad963c7. I had to switch back to the async Mutex though because that has lock_owned and I couldn't figure out the lifetimes without it. 🙈


/// Determine the size of a pending file upload, if this is a thumbnail
/// upload or return 0 otherwise.
async fn get_dependent_pending_file_upload_size(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, in the worst case, we'll have to take the event cache store lock twice, and the state store lock once, only to retrieve data that was in the stores. Since we already duplicate some of this data in the in-memory cache, we might make more use of this in-memory cache, by preemptively storing all the useful information in there, such that we don't need extra store accesses here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could also store the other needed sizes there, yes. I think we would still need the current look-ups from persisted data though because in the case of galleries, we would otherwise not have anything to report progress for the remaining gallery items?

Johennes added 20 commits July 9, 2025 19:14
…s updates

Control send queue progress reporting via the send queue rather than via the client
…dQueueUpdate::MediaUpload

Enable upload progress via the send queue rather than via the client builder
…tSendState::NotSentYet

Enable upload progress via the send queue rather than via the client builder
…dQueueUpdate::MediaUpload

Correct comments of assert_update macros
…dQueueUpdate::MediaUpload

Split fields of uploaded_with_progress across several lines
…dQueueUpdate::MediaUpload

Clarify that index will always be zero outside of galleries
…tSendState::NotSentYet

Clarify that index will always be zero outside of galleries
…::NotSentYet

Clarify that index will always be zero outside of galleries
…dQueueUpdate::MediaUpload

Also assert the index when asserting upload progress
…dQueueUpdate::MediaUpload

Clarify what progress means for gallery uploads
…tSendState::NotSentYet

Clarify what progress means for gallery uploads
…dQueueUpdate::MediaUpload

Rename uploaded and pending for better clarity
…dQueueUpdate::MediaUpload

Only check if upload progress should be reported before starting uploads
…dQueueUpdate::MediaUpload

Reuse MediaUploadInfo in function parameters
…s reporting later

Expand doc comment for thumbnail_size_cache
…s reporting later

Move thumbnail size cache into StoreLock
…dQueueUpdate::MediaUpload

Read cached thumbnail size from StoreLock
@Johennes
Copy link
Contributor Author

Johennes commented Jul 10, 2025

Usually at this point, I'd try to help with the merging of this PR, by addressing the comments myself, but turns out I'll be away starting today and for a week or so, so I don't have much time to do this. If you're bored with this PR, I'll be happy to address my own comments when I'm back. Alternatively, another reviewer can make sure that my comments have been addressed in the meanwhile (although it'd be a bit of a stretch for them to read all the prior work).

Thanks for the review. 🙏 Letting this sit until you return seems fine from my perspective. I have addressed all comments apart from those I left unresolved.

PS: There was a conflict from a change on main. I wasn't sure how to resolve this but the folks in #matrix-rust-sdk-dev:flipdot.org said a merge should be ok.

@Johennes Johennes requested a review from bnjbvr July 10, 2025 13:16
Copy link
Member

@bnjbvr bnjbvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it seems that taking a week of vacation gave me a fresh pair of eyes, and I spotted many new surprising things I hadn't seen before.

I'd like this PR to be split between the timeline changes and the send queue changes, otherwise we're likely to remain stuck in review land for too long, as there are still many changes to address at both layers.

I've posted a few more comments below, but does it sound reasonable to break the PR apart in smaller chunks again, please? One thing I could do, as a proof of goodwill, is try to hack on the send queue related changes myself, by making PRs against a branch that only included the send queue related changes.

Comment on lines +227 to +250
/// Progress of an operation in abstract units.
///
/// Contrary to [`TransmissionProgress`], this allows tracking the progress
/// of sending or receiving a payload in estimated pseudo units representing a
/// percentage. This is helpful in cases where the exact progress in bytes isn't
/// known, for instance, because encryption (which changes the size) happens on
/// the fly.
#[derive(Clone, Copy, uniffi::Record)]
pub struct AbstractProgress {
/// How many units were already transferred.
pub current: u64,
/// How many units there are in total.
pub total: u64,
}

impl From<matrix_sdk::AbstractProgress> for AbstractProgress {
fn from(value: matrix_sdk::AbstractProgress) -> Self {
Self {
current: value.current.try_into().unwrap_or(u64::MAX),
total: value.total.try_into().unwrap_or(u64::MAX),
}
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this over to the other FFI file, since it's only used there.

Comment on lines +147 to +150
assert_let_timeout!(
Duration::from_secs(3),
Some(VectorDiff::Set { index: 1, value: item }) = timeline_stream.next()
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it expected that we get this new update, despite not enabling the send queue progress here?

// Eventually, the media is updated with the final MXC IDs…
{
assert_let_timeout!(
Duration::from_secs(3),
Some(VectorDiff::Set { index: 1, value: item }) = timeline_stream.next()
);
assert_let!(Some(msg) = item.content().as_message());
assert_matches!(item.send_state(), Some(EventSendState::NotSentYet));
assert_matches!(item.send_state(), Some(EventSendState::NotSentYet { progress: None }));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's something I don't understand: just above the progress was set with current == total, and now it's undefined again? That might cause a flash in UIs, where the event is marked as sent, and then in unknown progress again (until the sent echo comes back to us).

@@ -329,7 +329,7 @@ async fn test_local_reaction_to_local_echo() {

let item = item.as_event().unwrap();
assert!(item.is_local_echo());
assert_matches!(item.send_state(), Some(EventSendState::NotSentYet));
assert_matches!(item.send_state(), Some(EventSendState::NotSentYet { progress: None }));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do realize only now that it's a bit weird that message events also get a progress optional value here, despite this option being never set. This suggests that some variant EventSendState::Uploading that we could use for medias only, maybe?

Comment on lines +68 to +71
NotSentYet {
/// The progress of the sending operation, if any is available.
progress: Option<EventSendProgress>,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on other comments: we might prefer to have a new variant NotUploadedYet, for medias. The progress field isn't used, for non-media uploads.

return None;
}

if let Some(related_txn_id) = related_txn_id {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's avoid one indent level, since this function returns an Option :)

let related_txn_id = related_txn_id?;

Comment on lines +642 to +653
let media_upload_info =
RoomSendQueue::try_create_media_upload_info(&queued_request, &room, &queue)
.await
.unwrap_or_default();
let progress = RoomSendQueue::try_create_media_upload_progress_observable(
&report_media_upload_progress,
&media_upload_info,
&related_txn_id,
room_id,
&global_update_sender,
&update_sender,
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code will always create a progress listener, even for sending non-medias. I don't think we should do this, so maybe it is time to split the code in such a way that sending a plain event or a media go through different code paths. Let's not act on it quite yet; i'd like to think more about this.


// Get the size of the file being uploaded from the event cache.
let bytes = if let Ok(cache) = room.client().event_cache_store().lock().await {
if let Ok(Some(content)) = cache.get_media_content(cache_key).await {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really can't just ignore errors, as they're unexpected. We need proper error handling in this function.

}

if let Ok(cache) = room.client().event_cache_store().lock().await {
if let Ok(Some(content)) = cache.get_media_content(&cache_key).await {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need proper error handling here too.

@Johennes
Copy link
Contributor Author

I've posted a few more comments below, but does it sound reasonable to break the PR apart in smaller chunks again, please?

I have opened #5453 for the send queue part and #5454 for the timeline part (which will include the send queue part until #5453 lands). I'll close this to retain your comments that still need to be addressed.

@Johennes Johennes closed this Jul 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants