A generic parallel processing model for facilitating data mining and integration

Han, LX; Liew, CS; van Hemert, J; Atkinson, M (2011) A generic parallel processing model for facilitating data mining and integration. Parallel Computing, 37 (3). pp. 157-171. ISSN 0167-8191

Full text not available from this repository.


To facilitate data mining and integration (DMI) processes in a generic way, we investigate a parallel pipeline streaming model. We model a DMI task as a streaming data-flow graph: a directed acyclic graph (DAG) of Processing Elements (PEs). The composition mechanism links PEs via data streams, which may be held in memory, buffered on disk, or carried as data flows between computers. This makes it possible to build arbitrary DAGs that combine pipelining with both data and task parallelism, leaving room for performance enhancement. We have applied this approach to a real DMI case in the life sciences and implemented a prototype. To demonstrate the feasibility of the modelled DMI task and assess the efficiency of the prototype, we have also built a performance evaluation model. The experimental results show that, in this case study, speedup grows linearly with the number of distributed computing nodes.
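The idea of PEs linked by data streams can be illustrated with a minimal Python sketch. It is not the authors' prototype and does not use OGSA-DAI; it assumes the simplest case of a linear chain of PEs connected by bounded in-memory queues, with one thread per PE so that stages overlap in time (pipelining). The names `processing_element`, `run_pipeline`, `END`, and `buffer_size` are hypothetical and chosen only for this illustration.

```python
import queue
import threading

# Sentinel object marking the end of a stream (hypothetical convention).
END = object()


def processing_element(func, inbox, outbox):
    """Run one PE: read items from inbox, apply func, write results to outbox."""
    while True:
        item = inbox.get()
        if item is END:
            outbox.put(END)  # propagate end-of-stream downstream
            return
        outbox.put(func(item))


def run_pipeline(items, funcs, buffer_size=4):
    """Connect the PEs in funcs into a linear chain and stream items through it."""
    # One bounded in-memory stream before each PE, plus one for the final output.
    streams = [queue.Queue(maxsize=buffer_size) for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=processing_element,
                         args=(func, streams[i], streams[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for worker in workers:
        worker.start()

    def feed():
        # Feed from a separate thread so the bounded buffers cannot
        # block the consumer loop below.
        for item in items:
            streams[0].put(item)
        streams[0].put(END)

    feeder = threading.Thread(target=feed)
    feeder.start()

    results = []
    while True:
        out = streams[-1].get()
        if out is END:
            break
        results.append(out)

    feeder.join()
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    # Two PEs: the second can start on early items while the first is still working.
    print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2]))
```

Because each stream is a first-in, first-out queue with a single producer and a single consumer, the output order matches the input order. A general DAG would also need fan-out and fan-in between PEs, and the disk-buffered or inter-computer streams mentioned in the abstract would replace the in-memory queues; neither is attempted here.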

Item Type: Article
  1. Han, LX
  2. Liew, CS (University of Malaya)
  3. van Hemert, J
  4. Atkinson, M
Journal or Publication Title: Parallel Computing
Uncontrolled Keywords: Pipeline streaming; Parallelism; Data mining and data integration (DMI); Workflow; Life sciences; OGSA-DAI
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Depositing User: Zanaria Saupi Udin
Date Deposited: 26 Aug 2011 15:45
Last Modified: 26 Aug 2011 15:45
URI: http://eprints.um.edu.my/id/eprint/2071
