Speaker: Dr. Chinyi Cheng (Research Associate, Imperial College London)
Title: The knowledge-based/intelligent system and applications: From the concept-based approach to the data/knowledge visualization
Formal concept analysis-based document representation and its application on document clustering / Term relationship extraction for the legal document / Feature-opinion pair mining on sentiment classification
A typical knowledge-based/intelligent system or application contains modules for data source collection/input, data entity feature extraction and representation, and data or knowledge storage and utilization. The data handled by such systems can be text, images, video, voice, 3D objects or scenes, and so on. State-of-the-art techniques can be applied not only within a single module but also across several modules.
In conventional methodologies, data processing is based on signal processing. For example, in text mining and information retrieval, documents are typically represented by vectors in which the dimensions are the terms of the whole document set and the values are the occurrence frequencies of those terms in each document. The disadvantage of these methods is that they ignore the relations between the features of the data. Concept-based methods were therefore developed to discover the relations among features and to organize them in a knowledge structure or ontology.
In the research “Formal concept analysis-based document representation and its application on document clustering”, we developed an automatically constructed thesaurus approach based on Formal Concept Analysis (FCA). We applied FCA to construct a term ontology that captures hierarchical conceptual relationships together with synonym-like relationships for the document sets. We also developed a document representation method that uses the ontology to represent documents as concept-based vectors. To evaluate the usability and effectiveness of our method, we used document clustering as the application for assessing the generated concept-based vectors.
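As a rough illustration of the FCA idea described above, the following minimal sketch enumerates the formal concepts of a tiny document-term context. The documents, the terms, and the function names (`common_terms`, `docs_having`, `formal_concepts`) are hypothetical and chosen only for this example; the brute-force enumeration is suitable for toy data and is not the thesaurus construction method used in the research.

```python
from itertools import combinations

# Hypothetical toy context: each document is mapped to the set of terms it contains.
CONTEXT = {
    "d1": {"court", "judge", "appeal"},
    "d2": {"court", "judge"},
    "d3": {"court", "contract"},
}


def common_terms(docs, context):
    """Terms shared by every document in docs (all terms if docs is empty)."""
    shared = set().union(*context.values())
    for doc in docs:
        shared &= context[doc]
    return frozenset(shared)


def docs_having(terms, context):
    """Documents that contain every term in terms."""
    return frozenset(doc for doc, doc_terms in context.items() if terms <= doc_terms)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of documents.

    Brute force: exponential in the number of documents, so toy-sized input only.
    """
    concepts = set()
    docs = sorted(context)
    for size in range(len(docs) + 1):
        for subset in combinations(docs, size):
            intent = common_terms(subset, context)
            extent = docs_having(intent, context)
            concepts.add((extent, intent))
    return concepts


if __name__ == "__main__":
    # Print concepts from the most general (largest extent) to the most specific.
    for extent, intent in sorted(formal_concepts(CONTEXT), key=lambda c: -len(c[0])):
        print(sorted(extent), sorted(intent))
```

In this toy context, the concept whose intent is {court, judge} sits below the more general concept whose intent is {court}; this ordering of concepts is the kind of hierarchical relationship that a term ontology can be built from.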
Similar techniques were applied to an SBIR (Small Business Innovation Research) project in Taiwan. In this project, we cooperated with one of the largest online legal service companies in Taiwan and used a large legal document set to extract and discover term relationships in legal documents. One application of the research results was keyword suggestion, which helps users of the online service retrieve information more easily.
Another research project, “Feature-opinion pair mining on sentiment classification”, focused on extracting features of positive and negative opinions from users’ feedback. In this research, we first applied machine learning techniques to extract positive and negative terms from the document set and then used these terms for document representation. Finally, we applied machine learning techniques to construct a classifier for document classification.
Image processing and pattern recognition
The cooperative research in the field of image processing and pattern recognition covered two topics: using image processing techniques for human behavior detection, and applying 3D object models to identify objects in 2D images or videos. Part of the research results was presented at the International Conference on Computer, Information, and Telecommunication Systems (CITS 2015), Gijón, Spain, and the 29th International Conference on Image and Vision Computing (IVCNZ 2014), New Zealand.
Delight map viewer
The SIP (Cross-ministerial Strategic Innovation Promotion Program) Innovative design/manufacturing technologies Delight Design Platform Project aimed to develop an innovative methodology and procedure that support product design and manufacturing. The Delight map viewer is one of the major outcomes of this project. This software tool is a visualization-based decision and design support tool that supports product designers and teams in the early stage of the product design cycle.
The Delight map viewer contains two major data sets: the Kansei index, which contains product measures such as sound, shape and touch, and the Delight index, which contains semantic measures from customers. The Delight map viewer first applied dimension reduction methods such as PCA and autoencoders to reduce the Kansei index to two dimensions and then used those two dimensions to visualize the Kansei index data in a 2D graph. To evaluate the correlation between the Kansei index and the Delight index, a non-linear statistical method such as a multiple regression model was applied. Finally, the Delight map viewer was released to many Japanese enterprises.
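The dimension reduction step can be sketched as follows for the PCA case only. The measurement values in `KANSEI_ROWS` and all function names are hypothetical, and the power-iteration eigen-solver is a standard-library stand-in for what a numerical library would normally provide; this is not the Delight map viewer's actual implementation.

```python
import math

# Hypothetical Kansei measurements: one row per product, columns = (sound, shape, touch).
KANSEI_ROWS = [
    [2.0, 1.0, 0.5],
    [4.0, 2.1, 0.4],
    [6.0, 2.9, 0.7],
    [8.0, 4.2, 0.3],
    [10.0, 5.0, 0.6],
]


def center(rows):
    """Subtract the column mean from every value."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iterations=500):
    """Dominant eigenvalue and eigenvector by power iteration.

    Assumes the fixed start vector is not orthogonal to the dominant eigenvector.
    """
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Normalize so the Rayleigh quotient below is valid even after an early exit.
    length = math.sqrt(sum(x * x for x in v))
    v = [x / length for x in v]
    value = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d))
    return value, v


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    components = []
    for _ in range(2):
        value, vector = top_eigenpair(cov)
        components.append(vector)
        # Deflation: remove the component just found so the next one can surface.
        cov = [[cov[i][j] - value * vector[i] * vector[j] for j in range(len(vector))]
               for i in range(len(vector))]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]


if __name__ == "__main__":
    for x, y in pca_2d(KANSEI_ROWS):
        print(f"{x:8.3f} {y:8.3f}")
```

Each product ends up as one (x, y) point, which is the form a 2D map needs. The sign of each axis is arbitrary in PCA, so only the relative positions of the points are meaningful.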
Data visualization for patient ward movement
In this project, a data set containing patient ward stay records covering over 3 years from three hospitals managed by Imperial College London was collected. The purpose of the project is to use data visualization techniques to find the pathways of patients’ ward movements and to identify potential risks or errors in ward movement decisions. To display the more than thirty thousand data nodes and the links between them, the largest Data Observatory in Europe was used. The data were first cleaned and preprocessed, and then reorganized into the format defined by the software tool.
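One plausible way to reorganize ward stay records into nodes and links is to count ward-to-ward transitions per patient, as in the minimal sketch below. The record layout, the ward names, and the function `ward_transitions` are assumptions made for illustration; the real data format and the format expected by the visualization tool are not described here.

```python
from collections import Counter, defaultdict

# Hypothetical ward stay records: (patient_id, ward, order_of_stay).
RECORDS = [
    ("p1", "A&E", 1), ("p1", "Surgery", 2), ("p1", "Recovery", 3),
    ("p2", "A&E", 1), ("p2", "Surgery", 2),
    ("p3", "A&E", 1), ("p3", "Cardiology", 2), ("p3", "Recovery", 3),
]


def ward_transitions(records):
    """Count how often patients move directly from one ward to another.

    Returns a Counter mapping (from_ward, to_ward) to the number of moves,
    which can serve as the weighted links of a ward movement graph.
    """
    stays = defaultdict(list)
    for patient, ward, order in records:
        stays[patient].append((order, ward))

    links = Counter()
    for visits in stays.values():
        # Sort each patient's stays by order, then pair consecutive wards.
        wards = [ward for _, ward in sorted(visits)]
        links.update(zip(wards, wards[1:]))
    return links


if __name__ == "__main__":
    for (source, target), count in ward_transitions(RECORDS).most_common():
        print(f"{source} -> {target}: {count}")
```

Rarely used links in such a graph are natural candidates for closer inspection when looking for unusual ward movement decisions.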