A generic parallel processing model for facilitating data mining and integration

Han, L.X. and Liew, C.S. and van Hemert, J. and Atkinson, M. (2011) A generic parallel processing model for facilitating data mining and integration. Parallel Computing, 37 (3). pp. 157-171. ISSN 0167-8191

Full text not available from this repository.


To facilitate data mining and integration (DMI) processes in a generic way, we investigate a parallel pipeline streaming model. We model a DMI task as a streaming data-flow graph: a directed acyclic graph (DAG) of Processing Elements (PEs). The composition mechanism links PEs via data streams, which may be held in memory, buffered on disk, or carried as inter-computer data flows. This makes it possible to build arbitrary DAGs that combine pipelining with both data and task parallelism, leaving room for performance enhancement. We have applied this approach to a real DMI case in the life sciences and implemented a prototype. To demonstrate the feasibility of the modelled DMI task and to assess the efficiency of the prototype, we have also built a performance evaluation model. The experimental results show that, in this case study, speedup is linear as the number of distributed computing nodes increases.
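
The streaming data-flow graph described above can be illustrated with a minimal sketch. It is not the authors' prototype: it assumes in-memory streams only, models each PE as a Python generator, and runs in a single process, so it shows the composition of PEs into a DAG rather than any actual parallel execution. All PE names (source, clean, upper, length, merge) and the helper run_graph are hypothetical and chosen only for illustration.

```python
from itertools import tee


def source(records):
    """PE with no input stream: emits each record onto its output stream."""
    for record in records:
        yield record


def clean(stream):
    """PE: strips surrounding whitespace and drops empty records."""
    for record in stream:
        record = record.strip()
        if record:
            yield record


def upper(stream):
    """PE: upper-cases each record (one branch of the DAG)."""
    for record in stream:
        yield record.upper()


def length(stream):
    """PE: emits the length of each record (the other branch)."""
    for record in stream:
        yield len(record)


def merge(left, right):
    """PE with two input streams: pairs up corresponding items."""
    for a, b in zip(left, right):
        yield (a, b)


def run_graph(records):
    """Wire the PEs into a small DAG: source -> clean -> {upper, length} -> merge."""
    cleaned = clean(source(records))
    # tee duplicates one stream so two downstream PEs can consume it.
    branch_a, branch_b = tee(cleaned)
    return list(merge(upper(branch_a), length(branch_b)))


if __name__ == "__main__":
    print(run_graph([" acgt ", "", "tt\n"]))
```

Because each PE only reads from and writes to streams, the same graph could in principle be remapped so that streams are disk buffers or network connections and PEs run on separate nodes; that mapping is outside the scope of this sketch.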

Item Type: Article
Uncontrolled Keywords: Pipeline streaming; Parallelism; Data mining and data integration (DMI); Workflow; Life sciences; OGSA-DAI
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Depositing User: Zanaria Saupi Udin
Date Deposited: 26 Aug 2011 07:45
Last Modified: 26 Dec 2014 02:22
URI: http://eprints.um.edu.my/id/eprint/2071
