Public Health Informatics (in collaboration with AI Lab)
- System Design and Evaluation: Motivated by the importance of infectious disease informatics (IDI) and the challenges of IDI system development and data sharing, we designed and implemented BioPortal, a Web-based IDI system that integrates cross-jurisdictional data to support information sharing, analysis, and visualization in public health. A paper published in IEEE Transactions on Information Technology in Biomedicine describes BioPortal's architecture and functionalities and highlights encouraging evaluation results from a controlled experiment that focused on analysis accuracy, task performance efficiency, user information satisfaction, system usability, usefulness, and ease of use.
- Surveillance Systems Survey: We completed a survey study that presents a detailed analysis of 50 local, state, national, and international syndromic surveillance systems and a review of about 200 academic publications. The survey discusses in depth the technical challenges, applicable approaches and solutions, and the current state of implementation and adoption for each component of a syndromic surveillance system: system architecture, data collection and sharing, data analysis, and data access and visualization.
Spatial-Temporal and Network Data Mining
- Spatial-Temporal Cross-Correlation: We aim to develop a new statistical measure to identify significant correlations among multiple events with spatial and temporal components. This new measure, K(r, t), is defined by adding the temporal dimension to Ripley's K(r) function. Empirical studies show that the use of K(r, t) can lead to a more discriminating and flexible spatial-temporal data analysis framework. This measure also helps identify the causal events whose occurrences induce those of other events.
- RFID Data Management and Mining: We are investigating a number of data management and mining issues specific to RFID applications in a wide range of domains, including marketing and healthcare. RFID devices generate real-time data streams with prominent spatial and temporal elements. In one study, we focus on analyzing the movement paths of people between store aisles to enable effective in-store marketing and shelf design. In another study, we model detailed healthcare-seeking behaviors. We are also interested in managing and mining such movement-related data in border security applications.
- Burst Detection: Many current and emerging applications require support for online analysis of rapidly changing data streams. These streams run continuously, and there is a critical need to identify interesting patterns in the dynamically arriving data. Burst detection, the automatic identification of time regions that contain unusually frequent data features, can provide useful signals of imminent changes and in turn facilitate timely decision making. Our research aims to analyze burst patterns across multiple dependent data streams, which can be collectively modeled by a probabilistic network, the Factorial Hidden Markov Model (FHMM). Such a modeling framework applies in many domains where co-evolving data streams are common; stock price co-movements and traffic flow through adjacent intersections are two readily available examples. From the modeling perspective, we must treat the mutual dependencies among streams as an integral part of the framework. These dependencies are crucial for understanding the causal chains of burst events and can potentially make burst detection more holistic and robust.
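Ripley's K(r) is conventionally defined as 1/λ times the expected number of further events within distance r of a typical event, where λ is the event intensity. As a minimal sketch of how the temporal dimension in the Spatial-Temporal Cross-Correlation item might be added for two event types, the Python below counts type-b events that fall within distance r of, and within t time units after, each type-a event. This is an illustrative assumption, not the group's published definition of K(r, t): the function names, the one-sided time window, and the omission of edge correction are choices made here for brevity.

```python
import math

def cross_k_rt(events_a, events_b, r, t, area, duration):
    """Naive estimate of a space-time cross K function K_ab(r, t).

    events_a, events_b: lists of (x, y, time) tuples for two event types.
    area, duration: size of the spatial study region and of the time window.
    Counts pairs (a, b) where the b event lies within distance r of the
    a event and occurs 0..t time units after it. The one-sided time window
    and the absence of edge correction are simplifying assumptions.
    """
    if not events_a or not events_b:
        raise ValueError("both event lists must be non-empty")
    close_pairs = 0
    for xa, ya, ta in events_a:
        for xb, yb, tb in events_b:
            if math.hypot(xa - xb, ya - yb) <= r and 0 <= tb - ta <= t:
                close_pairs += 1
    # Intensity of b events per unit area per unit time.
    lambda_b = len(events_b) / (area * duration)
    # Mean number of nearby, later b events per a event, divided by lambda_b.
    return close_pairs / (len(events_a) * lambda_b)

def csr_benchmark(r, t):
    # Expected value of the estimator above (ignoring edge effects) when b
    # events occur completely at random in space and time, independent of a.
    return math.pi * r ** 2 * t
```

Comparing `cross_k_rt(...)` with `csr_benchmark(r, t)` over a grid of (r, t) values gives a rough signal: a ratio well above 1 suggests that b events cluster near and shortly after a events, which is the kind of directional association the item describes.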
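The Burst Detection item describes modeling several dependent streams jointly. The sketch below is a simplified stand-in, not the FHMM itself: each stream has a hypothetical two-state chain (baseline or burst) with Poisson-distributed counts, the chains are flattened into one product state space, and a made-up `coupling` factor makes it more likely that streams burst together. Viterbi decoding then recovers the most likely joint burst sequence. The rates, `p_switch`, and `coupling` are illustrative assumptions.

```python
import itertools
import math

def log_poisson(x, lam):
    # log P(X = x) for X ~ Poisson(lam)
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def build_transitions(states, p_switch, coupling):
    """Log transition probabilities between joint states.

    Each stream switches state with probability p_switch; the hypothetical
    `coupling` factor (> 1) up-weights joint states in which all streams
    agree, a crude stand-in for inter-stream dependency.
    """
    log_a = {}
    for prev in states:
        weights = {}
        for nxt in states:
            w = 1.0
            for a, b in zip(prev, nxt):
                w *= p_switch if a != b else 1.0 - p_switch
            if len(set(nxt)) == 1:
                w *= coupling
            weights[nxt] = w
        total = sum(weights.values())
        for nxt, w in weights.items():
            log_a[prev, nxt] = math.log(w / total)
    return log_a

def detect_bursts(counts, base_rate, burst_rate, p_switch=0.1, coupling=2.0):
    """Viterbi decoding of joint burst states for several count streams.

    counts: list of tuples, one count per stream at each time step.
    Returns one joint state per time step; 1 marks a burst in that stream.
    """
    n_streams = len(counts[0])
    states = list(itertools.product((0, 1), repeat=n_streams))
    rates = (base_rate, burst_rate)
    log_a = build_transitions(states, p_switch, coupling)

    def emit(state, obs):
        return sum(log_poisson(x, rates[s]) for s, x in zip(state, obs))

    # Uniform initial distribution over joint states.
    score = {s: -math.log(len(states)) + emit(s, counts[0]) for s in states}
    back = []
    for obs in counts[1:]:
        new_score, pointers = {}, {}
        for nxt in states:
            best_prev = max(states, key=lambda p: score[p] + log_a[p, nxt])
            new_score[nxt] = score[best_prev] + log_a[best_prev, nxt] + emit(nxt, obs)
            pointers[nxt] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back the most likely joint state sequence.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Because the joint state space has 2^K states for K streams, exact decoding in this flattened form becomes expensive quickly; factorial HMMs are usually paired with structured or approximate inference for that reason.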
Random Graph Analysis of Software Packages
- Open Source Software Packages Architecture Analysis: A sound design of large-scale software systems can confer numerous benefits on developers and users in terms of extensibility and usability. Unfortunately, the complexity of software systems makes evaluating their structure a difficult task. Using the tools and techniques of random graph theory, we seek to develop a quantitative means of analyzing the structure of complex software systems. Within this framework, we propose a two-phase network growth model aimed at characterizing the software development process. We evaluate this model in a case study of function call graphs extracted from real software packages. The empirical results indicate that our two-phase model can predict a number of topological properties of real software systems, some of which cannot be explained by existing work in random graphs and complex systems analysis. Our approach may lead to a deeper understanding of how software structures form and evolve and of the related software development practices.
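Checking a growth model against real function call graphs typically starts by extracting topological statistics such as degree distributions. A minimal sketch, assuming the call graph is already available as (caller, callee) pairs; the function name and input format are hypothetical, and the two-phase model itself is not reproduced here.

```python
from collections import Counter

def degree_distributions(edges):
    """In- and out-degree histograms of a directed function call graph.

    edges: iterable of (caller, callee) pairs; repeated calls count once.
    Returns (in_hist, out_hist), each mapping degree -> number of functions.
    """
    edge_set = set(edges)
    nodes = {n for edge in edge_set for n in edge}
    out_deg = Counter(caller for caller, _ in edge_set)
    in_deg = Counter(callee for _, callee in edge_set)
    # Counter returns 0 for missing keys, so leaf and root functions are included.
    out_hist = Counter(out_deg[n] for n in nodes)
    in_hist = Counter(in_deg[n] for n in nodes)
    return dict(in_hist), dict(out_hist)
```

Histograms like these, computed for both a real package and graphs generated by a candidate model, can then be compared directly, for example on a log-log plot.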
Social Computing
- Broadly stated, social computing takes a computational approach to the study and modeling of social interactions and communications. It also encompasses the development of information and communications technologies (ICTs) that support these interactions. In recent years, social computing has influenced numerous technology fields, attracting significant interest not only from researchers in computing and the social sciences but also from software and online game vendors, Web entrepreneurs, political analysts, and digital-government practitioners, among others.
- Collaborative Tagging and Folksonomies: Collaborative tagging sites are an important component of today's Web 2.0 applications. On such sites, viewers of Web-accessible content (pages, videos, photos, etc.) provide tags describing what they have viewed, and these tags are shared across the viewer community. We have collected several large datasets from popular collaborative tagging websites and are analyzing them empirically within a complex systems analysis framework. In addition, we are conducting data mining research to enable more effective social search. In our initial study, we found that incorporating tagging information significantly improves (by 25%) the effectiveness of predicting user Web search behavior in the context of automated recommendation.
- Coordination of Decisions Across Multiple Markets
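For the Collaborative Tagging and Folksonomies item, one simple way tagging information can feed a recommender is to represent users and items as tag-frequency vectors and rank candidate items by cosine similarity to the user's profile. This is a generic content-based sketch offered as an assumption, not the method behind the reported 25% improvement; all function names are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    # Cosine similarity between two sparse tag-count vectors.
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def user_profile(tagged_items):
    """tagged_items: list of tag lists the user applied to content they viewed."""
    profile = Counter()
    for tags in tagged_items:
        profile.update(tags)
    return profile

def rank_items(profile, candidates):
    """candidates: dict item_id -> Counter of tags applied by the community."""
    scores = {item: cosine(profile, tags) for item, tags in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Because community tags describe items in the vocabulary users themselves choose, even this simple matching can surface relevant items that share few words with the original content.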