Graphical Probabilistic Modeling for Video and Image Content Processing

Xiao-Ping Zhang


With the rapidly growing popularity of multimedia content creation and sharing, enabled by high-quality mobile capture devices and wired or wireless broadband connections, the volume of new content available to us exceeds our capacity to consume it. For example, by 2012 the rate at which video was uploaded to the YouTube video sharing service had increased from about 48 hours per minute a year earlier to about 72 hours per minute. Tools that can automatically find the most relevant content according to our interests, whether specified manually or learned from our viewing history, are therefore needed. Multimedia content processing and understanding is an indispensable component of such tools and of many other multimedia applications and services. It has been an active research area for the last two decades, and it has evolved into a combination and interconnection of many subjects, such as audio and music processing, image and video processing, natural language processing, computer vision, and machine learning.

Graphical models combine probability theory and graph theory. They represent the joint distribution over all of the random variables as a product of factors, each depending only on a subset of the variables. Such models are flexible and scale well, so they can capture complex dependencies among a large number of random variables. Many existing statistical models and methods, for example hidden Markov models, Markov random fields, and the Kalman filter, find a unifying representation in graphical models. As an emerging framework in machine learning with considerable potential, graphical models have been introduced in the multimedia content processing area and are widely adopted in many applications. The intrinsic characteristics of multimedia content processing tasks, including the high dimensionality of the low-level features, the rich prior knowledge about the structure of multimedia content, and the complex spatio-temporal correlation among multiple modalities, are a natural match for graphical models. The application of graphical models in multimedia content processing is expected to keep growing.
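The factorization idea can be made concrete with a minimal sketch. The example below assumes a hypothetical three-variable chain A -> B -> C over binary variables, with made-up probability tables; the names `p_a`, `p_b_given_a`, `p_c_given_b`, `joint`, and `marginal_c` are illustrative only and are not taken from the tutorial. The joint distribution is written as a product of three small factors, each depending on at most two variables.

```python
from itertools import product

# Hypothetical toy chain A -> B -> C over binary variables.
# All probability values below are invented for illustration.
p_a = {0: 0.6, 1: 0.4}                                    # p(a)
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p(b | a), indexed [a][b]
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # p(c | b), indexed [b][c]


def joint(a, b, c):
    """Joint probability as a product of local factors: p(a) p(b|a) p(c|b)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def marginal_c(c):
    """Marginal p(c), obtained by summing the joint over a and b."""
    return sum(joint(a, b, c) for a, b in product((0, 1), repeat=2))


# The factorized joint is a valid distribution: it sums to one
# over all eight assignments of (a, b, c).
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
```

In this toy case the factorized form needs 5 free parameters (1 + 2 + 2), whereas a full joint table over three binary variables needs 7; the gap widens quickly as more variables are added to the chain.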

The purpose of this tutorial is twofold: to introduce graphical models as a new framework for machine learning, and to demonstrate applications of graphical models in the multimedia content processing domain. The tutorial is intended for researchers in the multimedia content processing and understanding area as well as professionals working in related fields. Fundamentals of both graphical models and multimedia content processing will be covered, and there are no special prerequisites for the audience. The tutorial is designed to present a refreshing, broad perspective on graphical models, together with in-depth examples of their application in multimedia content processing. It will be valuable to experts in constituent technologies, such as multimedia indexing and search and content-based processing, who are looking to broaden their knowledge beyond their current areas of expertise. Specifically, the audience will: 1) understand the basics of graph theory and graphical models; 2) learn specific graphical models, including hidden Markov models, Markov random fields, and conditional random fields; 3) become familiar with the general approaches used in multimedia content processing systems; 4) study a few examples of how to apply graphical models to multimedia content processing tasks, such as video event detection, video activity recognition, video sequence matching, and image labeling; and 5) be able to determine which topics in multimedia content processing are of interest to them for further study.


Xiao-Ping Zhang received B.S. and Ph.D. degrees from Tsinghua University, in 1992 and 1996, respectively, both in Electronic Engineering. He holds an MBA in Finance, Economics and Entrepreneurship with Honors from the University of Chicago Booth School of Business, Chicago, IL.

Since Fall 2000, he has been with the Department of Electrical and Computer Engineering, Ryerson University, where he is now a Professor and Director of the Communication and Signal Processing Applications Laboratory (CASPAL). Prior to joining Ryerson, he was a Senior DSP Engineer at SAM Technology, Inc., San Francisco, and a consultant at the San Francisco Brain Research Institute. He held research and teaching positions at the Communication Research Laboratory, McMaster University, and worked as a postdoctoral fellow at the Beckman Institute, the University of Illinois at Urbana-Champaign, and the University of Texas, San Antonio. His research interests include multimedia content analysis, multimedia communications and signal processing, sensor networks and electronic systems, computational intelligence and pattern classification, and applications in bioinformatics, finance, and marketing. He is a frequent consultant for biotech companies and investment firms. He is cofounder and CEO of EidoSearch, an Ontario-based company offering a content-based search and analysis engine for financial data.

Dr. Zhang is a registered Professional Engineer in Ontario, Canada, a Senior Member of IEEE, and a member of the Beta Gamma Sigma Honor Society. He was the publicity co-chair for ICME'06 and program co-chair for ICIC'05 and ICIC'10. He served as guest editor for the Multimedia Tools and Applications Journal and the International Journal of Semantic Computing. He is currently an Associate Editor for IEEE Transactions on Signal Processing, IEEE Transactions on Multimedia, IEEE Signal Processing Letters, and the Journal of Multimedia.