Research Intern (Multimodal GenAI and World Models)
IRG_M3S_T5I_2025_020
Project Overview
We are seeking highly motivated research interns to help advance the state of the art in multimodal generative AI and world models at SMART, within the program Mens, Manus, and Machina: How AI empowers people and the city in Singapore (M3S).
Building on the rapid development of Multimodal Large Language Models (MLLMs), research interns will assist researchers in investigating how to develop next-generation multimodal generative AI and world models that combine vision, language, video, and structured data. Topics of interest include, but are not limited to, Transformers, diffusion models, Bayesian theory, world models, vision–language models, and spatial intelligence, along with their applications.
Responsibilities
- Implement and benchmark state-of-the-art multimodal GenAI and world models.
- Tune network architectures and hyperparameters to improve the performance of proposed research ideas.
- Contribute to experimental design, model evaluation, and reproducible research codebases.
- Collaborate with other research staff and students to publish research results in top journals and conferences.
Requirements
- Currently pursuing a Bachelor’s or Master’s degree in Engineering, Data Science, Computer Science, Automation, ML/AI, or a related field.
- Strong motivation to learn, explore, and work collaboratively in the multimodal AI and GenAI field.
- Experience with Python and deep learning pipelines (e.g., PyTorch, Transformers, timm).
- Experience with multimodal GenAI models and world models, or with related areas (e.g., computer vision, NLP, diffusion or autoregressive models).
The internship must be approved by the University.
