Han, LX; Liew, CS; van Hemert, J; Atkinson, M (2011) A generic parallel processing model for facilitating data mining and integration. Parallel Computing, 37 (3). pp. 157-171. ISSN 0167-8191
Full text not available from this repository.
To facilitate data mining and integration (DMI) processes in a generic way, we investigate a parallel pipeline streaming model. We model a DMI task as a streaming data-flow graph: a directed acyclic graph (DAG) of Processing Elements (PEs). The composition mechanism links PEs via data streams, which may be held in memory, buffered on disk, or carried as data flows between computers. This makes it possible to build arbitrary DAGs that combine pipelining with both data and task parallelism, creating opportunities for performance improvement. We applied this approach to a real DMI case in the life sciences and implemented a prototype. To demonstrate the feasibility of the modelled DMI task and to assess the efficiency of the prototype, we also built a performance evaluation model. In this case study, the experimental results show linear speedup as the number of distributed computing nodes increases.
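The abstract describes PEs composed into a DAG through streams but does not give the prototype's interface. The following is a minimal Python sketch of the general idea, assuming threads stand in for PEs and bounded in-memory queues stand in for streams; disk-buffered and inter-computer streams are not modelled. The names `Stream`, `start_pe`, `clean`, `classify` and `dmi_pipeline` are hypothetical and are not the paper's or OGSA-DAI's API. The sketch shows pipelining (downstream PEs consume items while upstream PEs are still producing) and data parallelism (several replicas of one PE share an input stream). Because of CPython's global interpreter lock, the threads illustrate the structure rather than the speedup, which in a real deployment would come from placing PEs on separate processes or computing nodes.

```python
# Hypothetical sketch: the names below are illustrative only, not the
# paper's prototype or OGSA-DAI's API.
import queue
import threading
from typing import Any, Callable, Iterable, List, Tuple

_END = object()  # sentinel marking the end of a stream


class Stream:
    """In-memory data stream linking PEs. The queue is bounded, so a fast
    producer blocks when consumers fall behind (simple back-pressure)."""

    def __init__(self, producers: int = 1, consumers: int = 1, maxsize: int = 64):
        self._q: "queue.Queue[Any]" = queue.Queue(maxsize=maxsize)
        self._open_producers = producers
        self._consumers = consumers
        self._lock = threading.Lock()

    def put(self, item: Any) -> None:
        self._q.put(item)

    def close(self) -> None:
        # Each producer calls close() once; the last one to finish sends one
        # end-of-stream sentinel per consumer, after all data items.
        with self._lock:
            self._open_producers -= 1
            last = self._open_producers == 0
        if last:
            for _ in range(self._consumers):
                self._q.put(_END)

    def __iter__(self):
        while True:
            item = self._q.get()
            if item is _END:
                return
            yield item


def start_pe(fn: Callable[[Any], Iterable[Any]], inp: Iterable[Any], out: Stream) -> threading.Thread:
    """Start one PE instance: apply `fn` to each item from `inp` (a Stream or
    any iterable) and send its zero or more results to `out` immediately."""
    def work() -> None:
        try:
            for item in inp:
                for result in fn(item):
                    out.put(result)
        finally:
            out.close()
    t = threading.Thread(target=work)
    t.start()
    return t


def clean(record: str) -> List[str]:
    """Normalise a record; blank records are filtered out (emit nothing)."""
    text = record.strip().lower()
    return [text] if text else []


def classify(record: str) -> List[Tuple[str, bool]]:
    """Toy 'mining' step: label each record by a trivial rule."""
    return [(record, len(record) > 3)]


def dmi_pipeline(records: List[str], workers: int = 4) -> List[Tuple[str, bool]]:
    # DAG: source -> clean (x workers) -> classify -> sink
    raw = Stream(producers=1, consumers=workers)
    cleaned = Stream(producers=workers, consumers=1)
    labelled = Stream(producers=1, consumers=1)

    # Source PE: forwards the input list unchanged into the first stream.
    threads = [start_pe(lambda r: [r], records, raw)]
    # Data parallelism: replicas of the same PE share one input stream.
    threads += [start_pe(clean, raw, cleaned) for _ in range(workers)]
    # Pipelining: classify consumes cleaned items while cleaning continues.
    threads.append(start_pe(classify, cleaned, labelled))

    results = list(labelled)  # the calling thread acts as the sink PE
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    data = ["  GeneA ", "", "abc", "Protein42", "   "]
    # Replicas run concurrently, so output order is not guaranteed; sort it.
    for row in sorted(dmi_pipeline(data)):
        print(row)
```

The end-of-stream sentinels are counted per consumer so that a stream fed by several PE replicas closes only after the last replica finishes; task parallelism could be sketched in the same style by fanning one stream out to two different PEs.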
|Journal or Publication Title:||Parallel Computing|
|Uncontrolled Keywords:||Pipeline streaming; Parallelism; Data mining and data integration (DMI); Workflow; Life sciences; OGSA-DAI|
|Subjects:||Q Science > QA Mathematics > QA75 Electronic computers. Computer science|
|Depositing User:||Zanaria Saupi Udin|
|Date Deposited:||26 Aug 2011 15:45|
|Last Modified:||26 Aug 2011 15:45|