Machine Learning and Computer Vision (teaching machines to solve tasks and, in particular, to see) have reached a point where they power numerous practical applications and tools. The resulting algorithms make it possible to search enormous volumes of data and to retrieve relevant text, images, videos, or music from large collections or the internet. Moreover, computers are now making progress at “understanding” text and images to the point that they can answer queries in natural language, synthesize a musical tune, or create images and videos. Recent breakthroughs in image synthesis such as Stable Diffusion already generate images from a mere user description of the content. Consequently, AI is transforming the abilities of computers: they can now support users in ever more complex tasks, and do so with much greater ease and flexibility. But is the machinery underneath actually understanding the content? What is missing, and what limitations will remain in the future?
This course will review the Machine Learning and Computer Vision fundamentals underlying the present revolution in this field. We will discuss how some important basic Machine Learning algorithms work in practice and how they can be employed in real applications. We will study what is necessary for a computer to “see”. Then we will review current state-of-the-art approaches for text and image retrieval, and demonstrate algorithms for generating text, music, images, and video.
Participants will study topics from the literature, present the respective works in the course, and summarize them in a written report afterwards.
Please consult with LSF and/or your examination office for crediting this course.