Ebook: Information Modelling and Knowledge Bases XXIX
Information modelling and knowledge bases have become ever more essential in recent years because of the need to handle and process the vast amounts of data which now form part of everyday life. The machine to machine communication of the Internet of Things (IoT), in particular, can generate unexpectedly large amounts of raw data.
This book presents the proceedings of the 27th International Conference on Information Modelling and Knowledge Bases (EJC2017), held in Krabi, Thailand, in June 2017. The EJC conferences originally began in 1982 as a co-operative initiative between Japan and Finland, but have since become a world-wide research forum bringing together researchers and practitioners in information modelling and knowledge bases for the exchange of scientific results and achievements. Of the 42 papers submitted, 29 were selected for publication here, and these cover a wide range of information-modelling topics, including the theory of concepts, semantic computing, data mining, context-based information retrieval, ontological technology, image databases, temporal and spatial databases, document data management, software engineering, cross-cultural computing, environmental analysis, social networks, and WWW information.
The book will be of interest to all those whose work involves dealing with large amounts of data.
The 27th International Conference on Information Modelling and Knowledge Bases (EJC2017) continues a series of events that started in 1982 as a co-operation initiative between Japan and Finland. Later, in 1991, the geographical scope of the conferences expanded to cover the whole of Europe and other countries in the region. To extend interest in information modelling and knowledge bases to regions outside Europe and Japan, the EJC2017 conference was organized in Thailand for the first time.
In recent years, information modelling and knowledge bases have become more and more essential because of the rapid growth of data in cyberspace. Machine-to-machine communication in Internet of Things (IoT) implementations can generate unexpectedly large amounts of raw data. The issue has already attracted many research communities in the area of big data storage and analytics. The EJC2017 conference provided a forum for exchanging information and updating results.
The EJC conferences constitute a world-wide research forum for the exchange of scientific results and experience achieved in computer science and other related disciplines using innovative methods and progressive approaches. Accordingly, a platform has been established to bring together researchers as well as practitioners concerned with information modelling and knowledge bases. The main topics of the EJC conferences focus on multidisciplinary research in information modelling, conceptual analysis, design and specification of information systems, multimedia information modelling, multimedia systems, ontology, software engineering, knowledge and process management, knowledge bases, cross-cultural communication and context modelling. We also aim at applying new progressive theories. To this end, much attention is also paid to theoretical disciplines, including cognitive science, artificial intelligence, logic, linguistics and analytical philosophy.
In order to achieve the EJC targets, the international program committee selected 29 of the 42 submissions for publication in this book. The selected papers cover a wide range of information-modelling topics, namely the theory of concepts, semantic computing, data mining, context-based information retrieval, ontological technology, image databases, temporal and spatial databases, document data management, software engineering, cross-cultural computing, environmental analysis, social networks, WWW information management, and many others.
The conference could not have been a success without the efforts of our colleagues and the supporting organizations. In the program committee, reputable researchers devoted great effort to the review process in order to select the best papers and create the EJC2017 program, and we would like to express our sincere gratitude for their work. Yasushi Kiyoki and Bernhard Thalheim served as co-chairs of the program committee. Naofumi Yoshida, as program coordinator, managed the review process and the conference program. Virach Sornlertlamvanich, Petchporn Chawakitchareon, Aran Hansuebsai, and their colleagues managed the conference venue and the local arrangements. Professor Hannu Jaakkola took care of the general organizational matters necessary for organizing the conference series annually and, moreover, for arranging the conference proceedings in the form of a book printed by IOS Press, Amsterdam. The conference was supported by SIIT at Thammasat University, Chulalongkorn University, Khon Kaen University, Burapha University, Keio University, the Tampere University of Technology and Christian Albrechts University at Kiel. We gratefully appreciate the efforts of all the supporters.
Systems engineering concerns the complete process for the development of complex systems comprising hardware, software, facilities and personnel. Such systems are hybrid, as some components are characterised by continuous behaviour, whereas the behaviour of others is discrete. SysML provides graphical notations capturing requirements for systems engineering. Here we present a concise conceptual model for systems engineering that is capable of capturing the gist of SysML. The structural part is based on the common higher-order entity-relationship model, extended by continuous functions and various flow constraints. The behavioural part is then formalised by a hybrid extension of Abstract State Machines. The use of the conceptual model is illustrated by a working example concerning the landing gear of an aircraft.
The omnichannel strategy allows the customer to have the same experience through multiple different channels, which can greatly improve the customer experience and thus benefit businesses. This paper studies the evaluation of customer experience in the omnichannel environment. Because omnichannel is quite a new business model, very little research on the topic exists; therefore, a systematic literature review was conducted to find as much relevant data as possible and to analyse the results systematically. In total, 42 channels, 97 customer-experience factors and 34 tools or methods were identified in 19 different studies and categorized accordingly. The connections between the extracted data were discovered by processing it with a relation-visualization tool, which makes it possible to see how everything is connected and which topics are the most common in the included studies. Finally, from these findings, the paper proposes a conceptual model linking all the parts of this research together to create a basic outline for customer-experience evaluation in omnichannel environments.
It is very difficult to choose between the star and snowflake data warehouse schemas. The topic is part of a broader dilemma in the data warehousing community: which approach to use, Kimball's or Inmon's. There are advocates for each approach, and the fierce “war” between them is still going on. However, very few empirical studies exist that give either side an advantage; in the past, the approaches were selected based on organizational, resource or goal-specific parameters. The goal of this case study was to examine which implementation of a data warehouse yields better results in the observed scenario – a data warehouse for monitoring energy consumption in public buildings – from the perspective of the ETL process. We implemented two versions of the DW, one based on the star and the other on the snowflake schema model, and measured the performance of the ETL process. Our goal was to find out whether there is a difference in duration between the two implementations and, if the difference exists, how it changes as the data in the operational database grows. A series of tests was conducted to evaluate the implemented solutions. Statistical analysis showed that, for the observed scenarios, the implementation based on the snowflake schema performs better: the ETL execution time is shorter and the size of the DW is smaller. An important observation is that the target data warehouse size is linearly dependent on the amount of operational data.
Environmental-semantic space integration is a promising approach to realizing deep analysis of environmental phenomena and situations. The essential computation in environmental study is context-dependent differential computation to analyze the changes of various situations (air, water, CO2, living places, sea level, coral areas, etc.). It is important to realize a global environmental-computing methodology for analyzing the difference and diversity of nature and living things in a context-dependent way, with a large amount of information resources on global environments. In the design of environment-analysis systems, one of the most important issues is how to integrate several environmental aspects and analyze environmental data resources with semantic interpretations. In this paper, we present an environmental-semantic computing system that realizes integration of, and semantic search among, environmental-semantic spaces with water-quality and image databases. We have previously presented the concept of a “Semantic Computing System” for analyzing and interpreting environmental phenomena and changes occurring in the oceans and rivers of the world. We also introduce the concept of “SPA (Sensing, Processing and Analytical Actuation Functions)” for realizing a global environmental system, and apply it to the Multi-dimensional World Map (5-Dimensional World Map) System. This concept is effective and advantageous for designing environmental systems with physical-cyber integration: detecting environmental phenomena as real data resources in a physical space (real space), mapping them to cyberspace for analytical and semantic computing, and actuating the analytically computed results back to the real space with visualizations expressing environmental phenomena, causalities and influences. This paper presents integration and semantic-analysis methods for two environmental-semantic spaces with water-quality and image databases.
We have implemented an actual space-integration system for accessing environmental information resources with water-quality and image analysis. We clarify the feasibility and effectiveness of our method and system by showing several experimental results for environmental medical document databases.
This paper presents a new analysis method and functions for multi-dimensional sensing data, including multi-parameter sensor data and series of sensing images, for a collaborative knowledge-creation system called the 5D World Map System, together with applications in the field of multidisciplinary environmental research. The main feature of the 5D World Map System is to provide a collaborative-work platform for users to perform global analysis of sensing data in a physical space along with the related multimedia data in a cyber space, on a single view of time-series maps based on spatiotemporal and semantic correlation calculations. The concrete target data of the proposed method and functions for world-wide evaluation are (1) multi-parameter sensor data such as water quality, air quality, soil quality, etc., and (2) multispectral and natural-colour image data taken by moving cameras, such as UAV or car-mounted cameras or mobile phones, for environmental monitoring. The proposed world-wide evaluation functions enable multiple remote users to acquire real-time sensing data from multiple sites around the world, to perform analytical visualizations of the acquired sensing data according to a selected world environmental standard in order to discover incidental phenomena, and to deliver the analysed results to related users' terminal equipment automatically. These new functions realize a new multi-dimensional data analysis and knowledge sharing for a collaborative environment. In particular, the world-wide evaluation function applies the concept of “semantic computing” to determine the environmental-quality levels of multiple places around the world. The results can be analysed by the time-series difference of the value of each place, the differences between the values of multiple places in a focused area, and the time-series differences between the values of multiple places, and calculated as a “world ranking” to detect and predict environmental irregularities and incidents.
In our world-wide evaluation method, we define environmental impacts as the “semantics” of an environmental condition. The originality of our method lies in (1) an interpreter that converts a numerical environmental-quality level into qualitative impacts/meanings, expressed as sentences or sets of words that even non-specialists or ordinary people are able to understand, and (2) a visualizer that realizes a global comparison and “world ranking” with semantic computing, targeting the multi-parameter sensing values of multiple sites around the world.
Deforestation is still a major environmental phenomenon in our society. For assessing the effects of deforestation, satellite remote sensing provides fundamental observational data. While new remote-sensing technologies are able to produce high-resolution forest mapping, their application is still limited to detecting and mapping deforested areas. In this paper, we propose a new method for retrieving the information contained in satellite multispectral images in order to interpret the effects of deforestation in the context of soil degradation. Our idea is to interpret the reflected “substances (materials)” of bare soil in deforested areas, observed in the spectral domain, into human language. The objectives of this paper are to (1) recognize deforestation activity automatically, (2) identify the causes of deforestation and examine its effects in relation to those causes, (3) scrutinize the effects of deforestation on soil degradation, and (4) represent knowledge of deforestation effects in nature by performing calculations for semantic retrieval, bringing clear and comprehensible knowledge even to people who are not familiar with forestry. Semantic retrieval is performed by understanding queries and presenting query results based on semantic calculation. For the experimental study, the Riau tropical forest was selected as the study area; the multispectral data were acquired by the Landsat 8 satellite between 2013 and 2014, a period in which forest fires and logging activities were reported and detected.
This paper presents alum-dosage prediction for the coagulation process using the Weka data mining software. The data in this research were collected from the Dongmarkkaiy Water Treatment Plant (DWTP), Vientiane capital, Lao PDR, from 1st January 2008 to 31st October 2016, giving a total of 2,891 records. In this research, we compared the results of the multilayer perceptron (MLP), M5Rules, M5P and REPTree methods using the root mean square error (RMSE) and mean absolute error (MAE) values. Three independent input variables were used, i.e. turbidity, pH and alkalinity; the dependent variable was the alum dose added in the coagulation process. Our experimental results indicated that the MLP method yielded the highest precision in predicting the alum dosage.
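The RMSE and MAE criteria used to rank the regression methods are standard error measures; a minimal sketch of how such a model comparison can be scored (the dose values below are made up for illustration, not the paper's data):

```python
import math

def rmse(actual, predicted):
    """Root mean square error: penalizes large errors quadratically."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: average magnitude of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical alum doses (mg/L) and predictions from two candidate models
actual = [30.0, 25.0, 40.0, 35.0]
model_a = [28.0, 26.0, 41.0, 33.0]
model_b = [35.0, 20.0, 45.0, 30.0]

# The model with the lower error is preferred, as in the paper's comparison
best = min([("A", model_a), ("B", model_b)], key=lambda m: rmse(actual, m[1]))[0]
```

The same two functions score every candidate on a common test set, so the "highest precision" claim reduces to picking the model with the smallest RMSE/MAE.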
This research investigated the amount of methane emitted in an organic paddy field (wet-season rice) under the direct-seeding method around Tung Kula Rong Hai, in Northeast Thailand, during the planting and harvest seasons, and examined influential variables such as growth, soil type, temperature, rice-plant height, rice-plant density and the water level in the paddy field. The study was conducted in paddy fields of Khao Dawk Mali 105 at 4 stations. The results of the study revealed the following: in period 1, from seeding to tillering, the CH4 flux was in the range 0.193–1.383 mg CH4 m−2 hr−1, or 0.728 ± 0.429 mg CH4 m−2 hr−1 on average; in period 2, from tillering to ripening, the CH4 flux was 1.311–2.446 mg CH4 m−2 hr−1, with an average of 1.938 ± 0.359 mg CH4 m−2 hr−1; in period 3, from ripening to harvesting, the CH4 flux was 0.873–2.235 mg CH4 m−2 hr−1, with an average of 1.575 ± 0.351 mg CH4 m−2 hr−1.
These results show that the amount of methane flux increased with the age and growth of the rice. In addition, several factors affected the escalation of gas release: the water level in the paddy field, meaning that emission decreases when the water is drained prior to the harvest stage; and the height and density of the rice plants, since the fewer and shorter the plants, the less methane is emitted. Regarding temperature, the higher the temperature, the more methane is released.
Consequently, the study of methane flux from rice paddies and its influencing factors contributes to farmers' ability to manage their paddy fields effectively. This will lead to low-organic-carbon farming and reduced greenhouse gas (GHG) emissions in the agricultural sector.
Three species of corals at Sichang Island, i.e. Acropora sp., Goniopora sp. and Pavona sp., were subjected to a stress test with low and normal salinity at concentrations of 10, 20 and 30 psu, respectively. Underwater photographs and eye observations of coral activity were recorded at 12, 24 and 48 hours. The entropy (surface roughness) and percent polyp activity were analyzed in comparison with the eye observations of coral activity. The experiment was carried out with the water temperature and underwater light intensity continuously controlled. The results indicated that the “healthy” entropy values for Acropora sp. are 1.57–1.62 and for Goniopora sp. 4.26–4.46. In contrast, for Pavona sp., a short-polyp coral, no “healthy” entropy value resulted from any photographic assessment in this study. The “healthy” value of Acropora sp. evaluated from percent active polyps was more than 52.4. In conclusion, the entropy and percent-active-polyp values were suitable tools for identifying the health of Acropora sp. and of Pavona sp., a short-polyp coral, whereas eye observation of coral activity, both percent polyp extension and percent polyp length, was suitable for Goniopora sp., a long-polyp coral.
The rapid temporal and spatial changes of the human population and of technology development have resulted in the expansion of agriculture, industrial activity and deforestation. These massive expansions affect water resources, especially rivers. Since rivers are the main source of water for human life, it is crucial to analyze river water quality in order to detect water contamination. In this paper, we present an automatic system for water-quality analysis using several databases and different contexts with dynamic sub-space selection. The system obtains information resources by transforming sensor-value information into language information. It aims to monitor, analyze and evaluate global water quality by using semantic-ordering functions with both single and multiple parameters. Semantic ordering is used for spatially dynamic environmental changes in multiple contexts (the aquatic-life, agricultural, drinking, fishing, industrial-usage and irrigation contexts). For the experimental study, four places were selected as study areas: (1) Hawaii (USA), (2) Pori (Finland), (3) Riga (Latvia) and (4) Vientiane (Laos). The data were acquired from March to September 2016. The results show that the system is able to analyze and identify the ordering of the different water qualities of different places from a global point of view and to present a global-scale ranking of water quality.
Evapotranspiration (ET) is the sum of evaporation and transpiration from the surface to the atmosphere. ET is a component of the hydrologic cycle and is important in agricultural water management, for example in scheduling irrigation water requirements, irrigation-system design and watershed management. Evapotranspiration can be measured or estimated by several methods. The objectives of this study were to provide simple methods and quick analyses for estimating crop coefficients (Kc) and daily actual evapotranspiration (ETa) from the Normalized Difference Vegetation Index (NDVI) derived from Landsat 8 and Sentinel-2 satellite images, and subsequently to compare the ETa from the two satellites at Ban Nong Bua, Amphoe Ban Fang, Khon Kaen Province, Thailand. The results showed a good correlation between NDVI and ETa for both satellite images (coefficients of determination (R2) of 0.85 and 0.87, respectively). The two image sources can be used separately or combined for temporal and spatial analysis. Sentinel-2 provides a finer resolution of 10 m in the VIS-NIR bands than Landsat 8 and is suitable for precision farming or site-specific management, while Landsat 8 has the advantage of its TIRS (thermal) bands.
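NDVI itself is computed from the red and near-infrared reflectances, and Kc is commonly estimated from NDVI via a linear relation whose coefficients are calibrated per crop, with ET then approximated as Kc times a reference evapotranspiration. The abstract does not give the paper's coefficients, so the numbers in this sketch are placeholders:

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from band reflectances."""
    return (nir - red) / (nir + red)

def kc_from_ndvi(ndvi_value, a=1.25, b=0.1):
    """Linear Kc-NDVI relation; a and b are crop-specific calibration
    coefficients (placeholder values, not from the paper)."""
    return a * ndvi_value + b

def et_estimate(kc, et0):
    """Evapotranspiration (mm/day) approximated as the crop coefficient
    times the reference evapotranspiration ET0."""
    return kc * et0

v = ndvi(nir=0.45, red=0.15)                 # 0.5 for this pixel
daily_et = et_estimate(kc_from_ndvi(v), et0=5.0)
```

Per-pixel NDVI maps from the Landsat 8 and Sentinel-2 bands would feed this same chain, which is why the two sensors' ETa estimates can be compared directly.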
The global environmental analysis system is a new platform for analyzing environmental multimedia data acquired from natural resources. The system aims to realize and interpret environmental phenomena and changes occurring on a world-wide scale. Semantic computing is an important and promising approach to multispectral semantic-image analysis for various environmental aspects and contexts in the physical world. In a previous study, we proposed a new system for agricultural monitoring and analysis based on the semantic-computing concept, which realizes the interpretation of agricultural health conditions at the level of human interpretation. In this paper, we propose a new analytical method for global agricultural comparisons, to recognize crop conditions across several places on a global scale. The multispectral semantic-image space for agricultural analysis can be utilized for global crop-health monitoring by comparing crop conditions among different places. Our method applies semantic distance calculation to measure the similarity among multispectral image data and expresses crop health conditions as a ranking. Based on this new analytical method, we demonstrate a prototype implementation for the case of rye farms in Latvia and Finland. The prototype implementation shows an analysis for the case where the image data have the same crop type and conditions.
Environment-friendly production is an emerging trend in industrial production. Enhancing environmental protection while ensuring a company's profitability has gained great attention from many manufacturers. The use of Overall Equipment Effectiveness (OEE) to reflect the efficiency of production has been widely applied in manufacturing processes. However, the full advantage of OEE as a measure that simultaneously facilitates the reduction of carbon dioxide (CO2) emissions and sustains existing profitability has not been exploited. This study aimed to investigate the use of OEE to improve the manufacturing process of corrugated-paper production using two indicators: CO2 emission and profitability. The research presents a model for measuring OEE, CO2 emission and profitability. The obtained results indicated that impact 1 and impact 2 could maintain the same profitability (9%), but the OEE and CFP from impact 1 (37% and 26%, respectively) were higher than those from impact 2 (12% and 8%, respectively). The overall results revealed that, by improving OEE, the CO2 emission per product (1 m2) could be significantly reduced while profitability could be increased in the production of corrugated paper.
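The abstract does not define how OEE is computed; the standard formulation, which the study presumably follows, multiplies three ratios: availability, performance and quality. A worked sketch with illustrative shift figures (assumed, not taken from the study):

```python
def oee(availability, performance, quality):
    """Overall Equipment Effectiveness as the product of the three
    standard factors, each expressed as a fraction in [0, 1]."""
    return availability * performance * quality

# Illustrative shift figures for a corrugator line (assumed numbers)
availability = 400 / 480            # run time / planned production time (min)
performance = (1000 * 0.4) / 400    # (units * ideal cycle time) / run time
quality = 950 / 1000                # good units / total units produced

score = oee(availability, performance, quality)
```

Because losses in any of the three factors mean energy spent without saleable output, raising the OEE score is what links the measure to lower CO2 emission per square metre of product.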
The purpose of this study was to develop forecasting models for four kinds of waste (contaminated materials, monomers, used solvents and wastewater) and to apply the forecast outputs in an Excel application to plan, implement and control the assets, physical facilities and monetary investments supporting waste disposal and transportation for both Company A and four service providers. The selected method is the Box-Jenkins method, with data from January 2008 to December 2016 (a series of 108 data points). After studying these data (four waste types) using Minitab, the fitted models generating the best forecast values are ARIMA(1, 0, 1) for contaminated-materials waste, ARIMA(1, 0, 0), i.e. AR(1), for monomer waste, ARIMA(1, 0, 2), i.e. ARMA(1, 2), for used-solvents waste and ARIMA(1, 1, 0), i.e. ARI(1, 1), for wastewater. The forecasts of the wastes in Company A had RMSE (root mean square error) values (0.388, 0.047, 0.060 and 0.043, respectively) lower than that of another research paper (1.305). With suitable forecasting models, the company and its service providers can generate valuable forecasts for utilizing their budget, assets and facilities in the Excel application.
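As an illustration of the simplest of the fitted models, ARIMA(1, 0, 0) = AR(1), the one-step forecast is ŷ(t+1) = c + φ·y(t). A minimal least-squares fit in plain Python (Minitab's maximum-likelihood estimates would differ slightly; the series below is made up):

```python
def fit_ar1(series):
    """Estimate AR(1) parameters (intercept c, slope phi) by ordinary
    least squares on the lag-1 pairs (y[t-1], y[t])."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    phi = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    c = my - phi * mx
    return c, phi

def forecast_ar1(series, c, phi):
    """One-step-ahead forecast from the last observation."""
    return c + phi * series[-1]

# Made-up monthly waste volumes (tonnes), not the paper's data
series = [5.0, 5.4, 5.1, 5.6, 5.3, 5.8, 5.5]
c, phi = fit_ar1(series)
next_value = forecast_ar1(series, c, phi)
```

The other fitted models add moving-average terms (ARMA) or a differencing step (ARI), but the forecasting loop is the same: fit on the 108-point history, then roll the one-step forecast forward.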
This paper presents global sharing, analysis and visualization of water quality by the 5D World Map (5DWM) system. The data in this research were collected at Sichang Island, Chonburi province, Thailand, during 1990–2002 and 2010–2017, at 21 stations situated around the island. Six water-quality parameters were selected for 1990–2002, i.e. chlorophyll a, ammonia, nitrite, nitrate, phosphate and. Ten parameters were selected for 2010–2016, i.e. temperature, salinity, dissolved oxygen (DO), pH, ammonia, nitrite, nitrate, phosphate, silicate and alkalinity, while eight parameters were selected in February and July 2017, i.e. temperature, pH, oxidation-reduction potential (ORP), conductivity, turbidity, dissolved oxygen (DO), total dissolved solids (TDS) and salinity. All water-quality parameters were added to, and displayed by, 5D World Map in order to visualize and share the water quality. Our results showed that 5D World Map can be applied to environmental analysis and semantic computing. We apply its dynamic evaluation and mapping functions with multiple views of temporal-spatial metrics, and integrate the results of semantic evaluation to analyze environmental multimedia information resources. The use of the 5D World Map System for world-wide viewing in the global environmental analysis of water quality around Sichang Island, Thailand, is reported in this study.
Understanding of data-quality problems, of algorithms to reduce their impact on data mining, and of the preprocessing process in general has accumulated to the point that intelligent applications are emerging to automate previously manual preprocessing work. Application development efforts have, however, been constrained by gaps in the identification of system components, especially regarding extensibility. The research question addressed is: what are the components of an intelligent data-preprocessing agent? The task environment of the agent is characterized against five principal criteria by drawing on empirical studies in the business-performance-measurement-system domain, and the feasibility of component candidates is assessed. A component model consisting of autonomous components and their interactions is presented, together with design alternatives. Although execution time was unsatisfactory without the long-term memory component, the partially implemented model provided near-optimal results. The presented model is found to be a useful support in the design and study of intelligent data-preprocessing agents.
In the context of business intelligence, data warehousing is often perceived as an integral component of concrete business intelligence solutions. Since the nature of a traditional data warehouse is accumulative – data from operational systems is fed into the system when it is ready for inclusion – and since the data from different component systems is interrelated, operations involving data warehouses have traditionally been considered tedious and delicate: distinct steps take place one after another in a predefined, next to unalterable sequence. In this paper, we present an alternative model for dealing with data warehouses, where the goal is to apply the principles of continuous software engineering in the domain of business intelligence. To validate the methodology, we present a tool chain that has been used in a real-life implementation of a business intelligence solution, together with experiences from its operation.
A method for computing the complete meaning of sentences with anaphoric reference is presented, that is, the method for implementing the substitution of an appropriate antecedent to accompany the anaphoric reference. Our method is similar to the one applied in general by Hans Kamp's Discourse Representation Theory (DRT). ‘DRT’ is an umbrella term for a collection of logical and computational linguistic methods developed for a dynamic interpretation of natural language, where each sentence is interpreted within a certain discourse, which is a sequence of sentences uttered by a group of speakers. Interpretation conditions are given via instructions for updating the discourse representation. Yet these methods are mostly based on first-order logics. Thus, only expressions denoting individuals (indefinite or definite noun phrases) can introduce so-called discourse referents, which are free variables that are updated when interpreting the discourse. Our background theory is Pavel Tichý's Transparent Intensional Logic with its procedural rather than set-theoretic model semantics. Since our semantics is procedural, hence hyperintensional and higher-order, not only individuals, but entities of any type, like properties of individuals, propositions and hyperpropositions, relations-in-intension, and even constructions (i.e., meanings of antecedent expressions), can be linked to anaphoric variables. Moreover, the thoroughgoing typing of the universe of TIL makes it possible to determine the respective type-theoretically appropriate antecedent, which is one of the novel contributions of this paper. The second novelty is the specification of the algorithm for dynamic discourse representation within TIL.
The paper deals with the fundamental computational rule of functional programming languages, namely the rule of beta conversion. This rule specifies the way in which a function f is applied to its argument a. There are two possible ways of executing the conversion, to wit ‘by name’ and ‘by value’. It has been proved that these two ways are not operationally equivalent and, what is worse, that execution by name is not a denotationally equivalent transformation in the logic of partial functions. Since Transparent Intensional Logic (TIL) is a partial, typed lambda calculus, we examine the validity of the rule in TIL, or rather in its computational variant, the TIL-Script language. We show that there are contexts in which the rule by name can be validly applied. The main result is the specification of such contexts, together with a comparison with reduction by value. To this end, we present a tool that recognizes the context in which a formal parameter of a given calling procedure occurs and interactively navigates the user to a correct way of reduction. In the case of an invalid reduction, the program informs the user about the problem, warns against undesirable side effects, and proposes executing the rule by value instead.
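The operational non-equivalence of the two execution strategies in the presence of partial functions can be illustrated outside TIL as well. In this hypothetical Python sketch, a by-name argument is modelled as a thunk (a zero-argument closure) that is forced only if it is actually used:

```python
def partial_fn(x):
    """A partial function: undefined (raises) for x == 0."""
    if x == 0:
        raise ValueError("undefined for 0")
    return 1 / x

def const_by_value(arg):
    """Call-by-value: the caller evaluates arg before the body runs."""
    return 42

def const_by_name(thunk):
    """Call-by-name: the argument arrives as an unevaluated thunk and is
    never forced here, so an undefined argument does no harm."""
    return 42

# By name: the undefined argument is never evaluated, so this succeeds
ok = const_by_name(lambda: partial_fn(0))

# By value: evaluating the argument fails before the body is even entered
try:
    const_by_value(partial_fn(0))
    failed = False
except ValueError:
    failed = True
```

The same program thus terminates under by-name execution but aborts under by-value execution, which is the kind of divergence the paper's tool is designed to detect and warn about.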
Website design is often performed either by technology-biased people or by artisans, and thus results either in the domination of technological solutions or in fancy artistic solutions. Observing, however, which websites have evolved successfully over a dozen years, we realise that they are the ones that properly match their layout and playout to the expectations, and especially the cultures, of their users. A website may be characterised by six dimensions: presentation, content, functionality, stories to be played, context, and user intentions and needs. The last two dimensions depend heavily on the culture of the website's customers. Culture studies have developed models of passivity versus multi-tasking, and of general patterns of thinking, feeling and acting of users. In this paper we study the desiderata of low- and high-context cultures for website design and compare them with the Lewis and Hofstede models.
With the rapid growth of image and video data, as well as the fast spread of user-generated content in social media and cloud services, it has become increasingly difficult for users to access and manage their digital content efficiently and effectively. In this paper we present a novel integrated open-source multimedia content management and access framework, called VisualLabel, that enables smart photo services based on automated visual content analysis, annotation, search and retrieval, using state-of-the-art analysis back ends for services such as Facebook and Flickr. The paper includes detailed descriptions of the high-level architecture of the VisualLabel framework and proof-of-concept implementations of a front-end service, three analysis back ends and a web client, all of which demonstrate the basic functionality provided by the framework.
This research deals with monitoring techniques that use normal and multi-spectral cameras to assess coral health. Acropora corals were investigated in a laboratory experiment in which the investigated parameter was ammonia contamination, ranging from 0.01 ppm to 10.0 ppm. Under these conditions, images of the corals were captured and analyzed, and entropy and coral indices of the obtained images are proposed. The important findings of this study concern three aspects of coral-health feature extraction: (1) event detection, (2) percentage degradation, and (3) the relationship between coral index and entropy. For event detection, the trends of water-quality parameters such as temperature and salinity were considered. The obtained entropy values yielded the percentage degradation of the corals, and the GRCI index was found to correlate with the corresponding entropy values of the coral images.
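As a rough sketch of the entropy feature used above: image entropy is typically the Shannon entropy of the pixel-intensity histogram, so a visually uniform (degraded, bleached) region scores low while a textured, healthy region scores high. The snippet below is a generic illustration with synthetic pixel data, not the paper's coral imagery or its GRCI formulation.

```python
# Hedged sketch: Shannon entropy of a grayscale intensity histogram,
# the kind of texture feature that can track image degradation.
import math
from collections import Counter

def image_entropy(pixels):
    """Shannon entropy (in bits) of a flat sequence of intensity values."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

uniform = [0] * 64           # a flat region carries no information
varied = list(range(64))     # 64 distinct intensities: maximal entropy

print(image_entropy(uniform))   # prints 0.0
print(image_entropy(varied))    # prints 6.0 (= log2 of 64 distinct values)
```

Under this reading, a drop in entropy over time for the same coral region is the kind of signal that can be mapped to a percentage-degradation estimate.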
Enormous quantities of data, collected and stored in numerous large data repositories, go unused or underused today simply because people are unable to visualize the quantities and relationships involved. This huge amount of data has far exceeded our human capacity for comprehension without powerful tools. Information visualization and visual data mining can help to deal with this flood of information: we can take advantage of visualization techniques to discover data relationships that are not easily observable by looking at the raw data. Visualization adds significant value when trying to understand not only the raw data available in large software archives but also the results of data mining. These data are especially valuable in software maintenance activities and in understanding software evolution and the socio-technical aspects of software development. Data mining and visualization are thus focal enablers for information recognition and knowledge discovery from data repositories of any size. This paper presents the results of a survey that reviews some of the most common visual data mining (VDM) techniques and their usage in the software engineering field. The results indicate which aspects of the software engineering process are studied using VDM methods, as well as the VDM methods most commonly used in the software engineering context.
In the last few years, Concept Similarity Measures (CSMs) have become important for biomedical ontologies as a means of finding adaptable treatments for conceptually similar diseases. Primitive concepts, however, are not fully defined in an ontology, so taxonomical path-based similarity measures cannot give the correct similarity for them. In this paper, we propose a new primitive-concept name similarity measure based on natural language processing, which improves concept similarity measurement through the analysis of noun-phrase construction. We conduct experiments on the standard clinical ontology SNOMED CT and compare the taxonomical path-based measure and our proposed similarity measure against human expert results, showing that our proposed measure outperforms existing approaches for primitive concepts.
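For readers unfamiliar with the baseline being compared against, a taxonomical path-based measure scores two concepts by the length of the shortest is-a path between them. The sketch below shows one common form of such a measure on a tiny invented taxonomy; it is illustrative only, not SNOMED CT and not the paper's proposed NLP-based measure.

```python
# Hedged sketch of a taxonomical path-based similarity measure:
# similarity decreases with the shortest is-a path between concepts.
# The toy is-a hierarchy below is invented for illustration.
parent = {
    "heart disease": "disease",
    "lung disease": "disease",
    "myocarditis": "heart disease",
    "pneumonia": "lung disease",
}

def ancestors(concept):
    """The concept itself followed by its chain of is-a ancestors."""
    chain = [concept]
    while concept in parent:
        concept = parent[concept]
        chain.append(concept)
    return chain

def path_similarity(a, b):
    """1 / (1 + length of the shortest is-a path between a and b)."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    lca = next(c for c in anc_a if c in anc_b)  # lowest common ancestor
    dist = anc_a.index(lca) + anc_b.index(lca)
    return 1.0 / (1.0 + dist)

print(path_similarity("myocarditis", "heart disease"))  # prints 0.5
print(path_similarity("myocarditis", "pneumonia"))      # prints 0.2
```

The paper's point is that for primitive concepts such paths are incomplete in the ontology, which is why a measure based on the linguistic structure of the concept names themselves can do better.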