E-Book Overview
Contains four refereed papers covering important classes of data mining algorithms: classification, clustering, association rule discovery, and learning Bayesian networks. Srivastava et al. present a detailed analysis of the parallelization strategy for tree induction algorithms. Xu et al. present a parallel clustering algorithm for distributed-memory machines. Cheung et al. cover a new scalable algorithm for association rule discovery and survey other strategies. The final paper, by Xiang et al., describes an algorithm for parallel learning of Bayesian networks. The papers aim to take a practical approach to large-scale mining applications and to increase usable knowledge of high-performance computing technology. Lacks a subject index.
E-Book Content
HIGH PERFORMANCE DATA MINING: Scaling Algorithms, Applications and Systems
The algorithm considers all possible tests that can split the data set and selects the test that gives the best information gain. For each discrete attribute, it considers one test with as many outcomes as the attribute has distinct values. For each continuous attribute, binary tests i