http://swrc.ontoware.org/ontology#Thesis
A 3-Dimensional Extension of Parallel Coordinate Plot
en
本多 啓介
ホンダ ケイスケ
HONDA Keisuke
SOKENDAI degree number: Kō No. 1150 (総研大甲第1150号)
The three-dimensional parallel coordinate plot (3-D PCP) is a visualization method that uses human spatial perception to reveal hidden information in data. The 3-D PCP proposed in this dissertation can display several characteristics of multiple variables simultaneously. First, it shows all the information of each observation at a glance. Second, it makes some non-linear structures in the data clear. Finally, it is useful for finding piecewise linear relationships between variables and the conditions under which they hold.

The basic idea of the 3-D PCP already appears in the parallel coordinate plot, which visualizes multidimensional data on a two-dimensional plane. The axes of all variables are placed in parallel. In the standard form of the parallel coordinate plot, the bottom of each axis corresponds to the minimum value of its variable and the top to the maximum value. Each observation corresponds to one set of connected line segments. A parallel coordinate plot directly shows the relationship between the variables on two adjacent axes. If two variables have a correlation coefficient of 1, the lines representing the observations are horizontal. If two variables have a correlation coefficient of -1, their lines cross at a single point midway between the two axes.

However, relationships between two variables whose axes lie more than two axes apart are not immediately clear. Another serious problem of the static parallel coordinate plot is that it is hard to distinguish one observation from another when the number of observations is large. To solve these problems, several interactive techniques have been developed, including highlighting by brushing. By extending the parallel coordinate plot into 3-dimensional space, the 3-D PCP achieves the same effect as highlighting by brushing.
We choose one variable as a reference variable, usually a response variable. The 3-D PCP places the connected lines representing observations in 3-dimensional space, sorted according to the values of the reference variable. This observation-wise 3-D PCP representation is useful for illustrating characteristics of observations, such as outliers.

The 3-D PCP has another representation in which the values of the observations on each variable are connected by lines. It is called the variable-wise connection representation and is useful for seeing relationships between the reference variable and the other variables. For example, the connected lines representing a variable that has a strong linear relationship with the reference variable lie close to the straight line representing the reference variable. It is well known that the scatterplot matrix can show relationships between variables clearly. However, if the number of variables is large, the individual scatterplots become too small to be seen properly. The 3-D PCP can show more variables than a scatterplot matrix, and it is sometimes more suitable than a static scatterplot matrix for showing characteristics of data simultaneously.

We note that the 3-D PCP can detect particular non-linear relationships, namely interactions between two variables, through the observation-wise connection representation. Such a relationship is detected from a special pattern in the angles of the connected lines representing observations. As explained earlier, a correlation coefficient near 1 or -1 between two variables on adjacent axes produces parallel or crossing patterns in parallel coordinate plots. Similar patterns can be detected in the 3-D PCP.
If we find that such patterns change at specific values of the reference variable, we conclude that the structure of the data differs across regions of the reference variable. In such cases, it is natural to divide the data into several groups within which the structure no longer changes. We propose drawing the 3-D PCPs corresponding to these groups simultaneously in a lattice layout, which places multiple graphs on a grid. By analyzing simulated and real data, we show that this display is useful for identifying the interaction between two variables.

We implement our 3-D PCP in the Java language. Java has several advantages for implementing modern data visualization methods. It is an object-oriented programming language with well-designed standard graphics libraries that support 2-D and 3-D graphics and interactive graphical user interfaces. These libraries can serve as components of statistical graphics and be incorporated by using so-called design patterns. Design patterns are suggested solutions to common problems that often appear in object-oriented software development. Our implementation is based on several design patterns for generality and reusability. Our software enables us to analyze data by using the advanced interactive operations provided by Java. We expect the 3-D PCP and the software to lead to new achievements in the field of data visualization.

This dissertation is organized as follows. Chapter 1 surveys issues in information visualization and the basic statistical graphics used in multivariate data visualization, such as the scatterplot, scatterplot matrix, 3-D scatterplot, and parallel coordinate plot. It also discusses dynamic techniques of data visualization and existing software products for data visualization, such as Mondrian, ParallAX, and GGobi.
Chapter 2 discusses 2-D parallel coordinate plots and considers important issues in visual data analysis with the parallel coordinate plot. We also introduce several existing works that extend the parallel coordinate plot into 3-dimensional space. In Chapter 3, we discuss our extension of the parallel coordinate plot into 3-dimensional space and several of its characteristics. In Chapter 4, we show the usefulness of the lattice layout for displaying several 3-D PCPs at a time. Chapter 5 analyzes three data sets by using our 3-D PCP. In Chapter 6, we explain the details of our software design. Finally, concluding remarks are given in Chapter 7.
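
The axis conventions described in this abstract can be illustrated with a minimal sketch. It is written in Python for brevity and is not the dissertation's Java implementation. The function names (`min_max_scale`, `segment_height_at`, `observation_wise_polylines`) are hypothetical, and using the rank of the reference variable as the depth coordinate is an assumption of this sketch, since the abstract only states that observations are sorted by the reference variable.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Map a variable onto its axis: minimum -> 0.0 (bottom), maximum -> 1.0 (top)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant variable: put every observation mid-axis (assumption of this sketch).
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def segment_height_at(y_left: float, y_right: float, t: float) -> float:
    """Height of the segment joining two adjacent axes at fraction t
    (t = 0 is the left axis, t = 1 is the right axis)."""
    return (1.0 - t) * y_left + t * y_right


def observation_wise_polylines(
    columns: Sequence[Sequence[float]], reference: int
) -> List[List[Point3D]]:
    """Return one polyline per observation as (axis index, scaled height, depth) points.

    Observations are ordered by the reference variable; the depth is the
    normalized rank in that order (an assumption, not the thesis's definition).
    """
    scaled = [min_max_scale(col) for col in columns]
    n = len(columns[0])
    order = sorted(range(n), key=lambda i: columns[reference][i])
    polylines = []
    for rank, i in enumerate(order):
        z = rank / (n - 1) if n > 1 else 0.0
        polylines.append(
            [(float(axis), scaled[axis][i], z) for axis in range(len(columns))]
        )
    return polylines
```

Under these assumptions, two variables with correlation 1 receive identical scaled heights, so every segment between their axes is horizontal. For correlation -1 the two scaled heights sum to 1, so `segment_height_at(y_left, y_right, 0.5)` is 0.5 for every observation, which is the single crossing point midway between the axes.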