| DISCONA: Distributed Sample Compression for Nearest Neighbor Algorithm |
| Type | Amount | VAT | Currency | Share | Status | Cost centre |
| Hybrid-OA | 0.00 | 0.00 | EUR | (DEAL) | ZB | |
| Sum | 0.00 | 0.00 | EUR | | | |
| Total | 0.00 | | | | | |
| Journal Article | FZJ-2023-01561 |
2023
Springer Science + Business Media B.V.
Dordrecht [u.a.]
Please use a persistent id in citations: http://hdl.handle.net/2128/34242 doi:10.1007/s10489-023-04482-y
Abstract: Sample compression using epsilon nets effectively reduces the number of labeled instances required for accurate classification with nearest neighbor algorithms. However, one-shot construction of an epsilon net can be extremely challenging in large-scale distributed data sets. We explore two approaches for distributed sample compression: one where a local epsilon net is constructed for each data partition and the local nets are then merged during an aggregation phase, and one where a single backbone of an epsilon net is constructed from one partition and aggregates target label distributions from the other partitions. Both approaches are applied to the problem of malware detection in a complex, real-world data set of Android apps using the nearest neighbor algorithm. Examination of the compression rate, computational efficiency, and predictive power shows that a single backbone of an epsilon net attains favorable performance while achieving a compression rate of 99%.
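The two approaches named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the nets are built, so the sketch assumes a simple greedy construction, Euclidean distance, and a majority vote over the aggregated label counts. All function names (`greedy_epsilon_net`, `merge_local_nets`, `backbone_net`, `predict`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def greedy_epsilon_net(points, eps):
    """Indices of a greedy eps-net: every point lies within eps of a chosen center."""
    chosen = []
    for i, p in enumerate(points):
        # Keep p only if no already chosen center covers it.
        if all(math.dist(p, points[c]) > eps for c in chosen):
            chosen.append(i)
    return chosen


def nearest(p, centers):
    """Index of the center closest to p (first one wins on ties)."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))


def label_counts(centers, X, y, counts=None):
    """Add each labeled point of one partition to its nearest center's counter."""
    if counts is None:
        counts = [Counter() for _ in centers]
    for p, lab in zip(X, y):
        counts[nearest(p, centers)][lab] += 1
    return counts


def majority(counts):
    """Majority label per center; None for a center that received no points."""
    return [c.most_common(1)[0][0] if c else None for c in counts]


def merge_local_nets(partitions, eps):
    """Assumed variant of approach 1: one local net per partition, then merge."""
    local = []
    for X, _ in partitions:
        local += [X[i] for i in greedy_epsilon_net(X, eps)]
    # Aggregation phase: thin the union of local centers with another greedy pass.
    merged = [local[i] for i in greedy_epsilon_net(local, eps)]
    counts = None
    for X, y in partitions:
        counts = label_counts(merged, X, y, counts)
    return merged, majority(counts)


def backbone_net(partitions, eps):
    """Assumed variant of approach 2: net from one partition, labels from all."""
    X0, _ = partitions[0]
    centers = [X0[i] for i in greedy_epsilon_net(X0, eps)]
    counts = None
    for X, y in partitions:
        counts = label_counts(centers, X, y, counts)
    return centers, majority(counts)


def predict(p, centers, labels):
    """1-nearest-neighbor prediction on the compressed sample."""
    return labels[nearest(p, centers)]


# Toy data (made up): two partitions of labeled 2-D points.
partitions = [
    ([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)], ["benign", "benign", "malware"]),
    ([(0.05, 0.1), (5.1, 5.0), (4.9, 5.1)], ["benign", "malware", "malware"]),
]
centers, labels = backbone_net(partitions, eps=1.0)
print(centers, labels, predict((4.8, 4.9), centers, labels))
```

In this sketch the backbone variant computes a net only once and streams the remaining partitions through a nearest-center lookup, whereas the merge variant computes a net per partition plus one more pass over the union of local centers.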