Thank you for sharing this nice work.
In the paper, it is stated that "The CLIP loss incentivizes the video and text encoders to make the embeddings of paired videos and reports as similar as possible, while making the embeddings of unpaired videos and reports as different as possible (Fig. 1a)."
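In medical imaging, I assume that patients with the same disease often have very similar reports, so an unpaired video and report in the same batch may describe the same findings yet still be treated as a negative pair and pushed apart. How did you address this issue when sampling batch data or calculating the loss function?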
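To make the concern concrete, here is a minimal NumPy sketch of a standard symmetric CLIP-style contrastive loss. It is not the paper's code: the function name `clip_loss`, the temperature of 0.07, and the toy embeddings are all my own assumptions for illustration. It compares a batch where every report is distinct with a batch where two patients have identical report embeddings.

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_loss(video_emb, text_emb, temperature=0.07):
    # Hypothetical helper: symmetric InfoNCE over a batch of paired embeddings.
    # Row i of video_emb is assumed to be paired with row i of text_emb.
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature  # cosine similarities scaled by temperature
    idx = np.arange(len(v))
    loss_v2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # video -> text
    loss_t2v = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> video
    return 0.5 * (loss_v2t + loss_t2v)

# Toy case 1: three patients with clearly different reports.
distinct = np.eye(3)
loss_distinct = clip_loss(distinct, distinct)

# Toy case 2: patients 0 and 1 have the same disease and identical reports,
# so each one's correct pair is indistinguishable from an in-batch "negative".
shared = np.array([[1.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
loss_shared = clip_loss(shared, shared)

print(loss_distinct, loss_shared)
```

In this toy setup the encoders are already perfectly aligned in both cases, yet the loss for the second batch cannot go to zero, because the two identical reports compete with each other in the softmax. That is the false-negative effect I am asking about.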