A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Why do you think that https://github.com/salesforce/ALPRO is a good alternative to ONE-PEACE?