This book constitutes the refereed proceedings of the International Workshop MUSCLE 2011 on Computational Intelligence for Multimedia Understanding, organized by the ERCIM working group in Pisa, Italy, in December 2011. The 18 revised full papers were carefully reviewed and selected from numerous submissions. The papers cover the following topics: multisensor systems, multimodal analysis, cross-modal data analysis and clustering, mixed-reality applications, activity and object detection and recognition, text and speech recognition, multimedia labelling, semantic annotation and metadata, multimodal indexing and searching in very large databases, and case studies.