@misc{oai:ir.soken.ac.jp:00000856, author = {LE, Duy-Dinh and リ, ドュイディン and LE, Duy-Dinh}, month = {2016-02-17, 2016-02-17}, note = {Human faces play an important role in efficiently indexing and accessing video content, especially in large-scale broadcast news video databases. This is because faces are associated with the people involved in key events and activities happening all over the world. Many applications use face information as a key ingredient, for example video mining, video indexing and retrieval, and person identification. However, face appearance in real environments exhibits many variations such as pose changes, facial expressions, aging, illumination changes, low resolution and occlusion, making it difficult for current state-of-the-art face processing techniques to obtain reasonable retrieval results. This thesis studies human face processing techniques that can be applied efficiently within a general framework for large-scale video mining and indexing. In this framework, faces are first extracted, filtered and normalized from video sequences by using a fast and robust face detector. Next, similar faces are grouped into clusters. Then, these face clusters are labeled with the person names extracted from the video transcripts. To extract faces from video, we propose a multi-stage approach that uses cascades of classifiers in a coarse-to-fine strategy that significantly reduces detection time while maintaining a high detection rate. This approach is distinguished from previous work by two features. First, we use a cascade of AdaBoost classifiers that is trained to be invariant to translation up to 25% of the original window size to quickly detect face candidate regions. Second, we use SVM classifiers that reuse the features selected by AdaBoost in the previous stage for robust classification and simple training. 
Reusing these features brings two advantages: (i) the features do not need to be re-evaluated because they have already been evaluated; (ii) because SVM classifiers generalize well, using too many features in the cascade is avoided, which saves training time and avoids over-fitting.

Furthermore, to help reduce the training time, we propose two feature selection methods that quickly select a small and optimal subset of features by using mutual information and feature variance. In the feature selection method using mutual information, we propose a more efficient discretization method that uses the minimum description length principle (MDLP) to estimate the probability densities of continuous random variables. This approach can be considered a generalization of previous ones, which mainly use a single threshold for discretization. In the other feature selection method, features are selected based on their distances to the principal components computed by PCA (principal component analysis) from the data distribution. Using this approach, the final classifier runs faster than one using the traditional PCA-based feature extraction method, since it avoids the computational cost of the subspace projection. These proposed feature selection methods are integrated seamlessly and efficiently into the multi-stage framework for face detection described above.

The extracted faces are usually organized automatically by using a clustering method. In many video indexing applications, k-means clustering is very common. However, it suffers from a number of serious drawbacks. For example, it cannot be applied to general similarity measures; the number of clusters must be provided in advance; it generates many bad clusters when the input data is noisy; and it does not scale to large datasets. Instead, we propose using the relevant set correlation (RSC) clustering model, from which the GreedyRSC clustering heuristic is derived. This clustering model can help to avoid all of these problems of k-means clustering. Furthermore, it is very efficient in finding high-quality clusters in noisy datasets such as face datasets extracted from video. These high-quality clusters, along with person names extracted from video transcripts, are useful for identifying important people who appear frequently in video databases; this identification can be done by an association method based on statistical machine translation. The proposed techniques are integrated into a video indexing and retrieval system that helps users access and navigate content in news video databases easily and quickly. The system can show representative names and faces appearing in videos, ranked by their occurrence frequency, and give access to related news stories by using these faces or names. Furthermore, it can show possible associations between names and faces. Our approach is generic and has the potential to handle very large scale video datasets effectively and efficiently., application/pdf, 総研大甲第1001号}, title = {HUMAN FACE PROCESSING TECHNIQUES WITH APPLICATION TO LARGE SCALE VIDEO INDEXING}, year = {} }