Invited Papers


Field | Schedule | Conference | Paper Title | Presenter | Affiliation | Advisor
CS | Dec. 22 (Fri) 14:00~17:30 | ISCA 2017 | Hybrid TLB Coalescing: Improving TLB Translation Coverage under Diverse Fragmented Memory Allocations | 허태경 | KAIST | 허재혁
CS | Dec. 22 (Fri) 14:00~17:30 | MobiSys 2017 | Mobile Plus: Multi-device Mobile Platform for Cross-device Functionality Sharing | 오상은 | KAIST | 신인식
CS | Dec. 20 (Wed) 09:00~09:40 | MICRO 2017 | GPUpd: A Fast and Scalable Multi-GPU Architecture Using Cooperative Projection and Distribution | 김영석 | Seoul Nat'l Univ. | 김장우
CS | Dec. 22 (Fri) 14:00~17:30 | FAST 2017 | WORT: Write Optimal Radix Tree for Persistent Memory Storage Systems | 이세권 | UNIST | 노삼혁
PL | Dec. 22 (Fri) 11:00~12:00 | ASE 2017 | All about Activity Injection: Threats, Semantics, and Detection | 이성호 | KAIST | 류석영
SE | Dec. 21 (Thu) 14:00~17:30 | ASE 2017 | Testing Intermediate Representations for Binary Analysis | 김수민 | KAIST | 차상길
DB | Dec. 20 (Wed) 09:00~09:40 | KDD 2017 | PAMAE: Parallel k-Medoids Clustering with High Accuracy and Efficiency | 송환준 | KAIST | 이재길
DB | Dec. 22 (Fri) 14:00~17:30 | ICDM 2017 | Supervised Belief Propagation: Scalable Supervised Inference on Attributed Networks | 유재민 | Seoul Nat'l Univ. | 강유
DB | Dec. 20 (Wed) 09:00~09:40 | VLDB 2018 | Efficient Haar+ Synopsis Construction for the Maximum Absolute Error Measure | 김진현 | Seoul Nat'l Univ. | 심규석
AI | Dec. 22 (Fri) 14:00~17:30 | ICML 2017 | Faster Greedy MAP Inference for Determinantal Point Processes | 한인수 | KAIST | 신진우
AI | Dec. 22 (Fri) 14:00~17:30 | ICML 2017 | Combined Group and Exclusive Sparsity for Deep Neural Networks | 윤재홍 | UNIST | 황성주
AI | Dec. 22 (Fri) 14:00~17:30 | NIPS 2017 | Overcoming Catastrophic Forgetting by Incremental Moment Matching | 이상우 | Seoul Nat'l Univ. | 장병탁



[Computer Systems]
Paper Title: Hybrid TLB Coalescing: Improving TLB Translation Coverage under Diverse Fragmented Memory Allocations
Presenter: 허태경 (KAIST)
Abstract:
To mitigate excessive TLB misses in large memory applications, techniques such as large pages, variable-length segments, and HW coalescing increase the coverage of limited hardware translation entries by exploiting contiguous memory allocation. However, recent studies show that in non-uniform memory systems, using large pages often leads to performance degradation, or allocating large chunks of memory becomes more difficult due to memory fragmentation. Although each of the prior techniques favors its own best chunk size, diverse contiguity of memory allocation in real systems cannot always provide the optimal chunk of each technique. Under such fragmented and diverse memory allocations, this paper proposes a novel HW-SW hybrid translation architecture, which can adapt to different memory mappings efficiently. In the proposed hybrid coalescing technique, the operating system encodes memory contiguity information in a subset of page table entries, called anchor entries. During address translation through TLBs, an anchor entry provides translation for contiguous pages following the anchor entry. As a smaller number of anchor entries can cover a large portion of virtual address space, the efficiency of TLB can be significantly improved. The most important benefit of hybrid coalescing is its ability to change the coverage of the anchor entry dynamically, reflecting the current allocation contiguity status. By using the contiguity information directly set by the operating system, the technique can provide scalable translation coverage improvements with minor hardware changes, while allowing the flexibility of memory allocation. Our experimental results show that across diverse allocation scenarios with different distributions of contiguous memory chunks, the proposed scheme can effectively reap the potential translation coverage improvement from the existing contiguity.
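
To make the anchor-entry mechanism concrete, here is a minimal software model of the translation path. It assumes an invented layout (ANCHOR_DIST, anchor_table, regular_page_table are illustrative names, not the paper's hardware design):

```python
# Toy model of anchor-based translation (illustrative only; field names and
# the anchor-distance policy are assumptions, not the paper's design).

ANCHOR_DIST = 8  # one anchor every 8 virtual pages, set by the OS

# anchor virtual page -> (physical frame of the anchor page, number of
# physically contiguous pages following it, as recorded by the OS)
anchor_table = {
    0: (1000, 8),   # VPs 0..7  map to frames 1000..1007
    8: (4096, 3),   # VPs 8..10 map to frames 4096..4098; 11..15 are fragmented
}

regular_page_table = {11: 7777, 12: 9001}  # fall-back per-page mappings

def translate(vpn):
    """Return the physical frame for virtual page vpn."""
    anchor = vpn - (vpn % ANCHOR_DIST)          # nearest preceding anchor
    frame, contig = anchor_table.get(anchor, (None, 0))
    if frame is not None and vpn - anchor < contig:
        return frame + (vpn - anchor)           # one anchor entry covers many pages
    return regular_page_table[vpn]              # fragmented region: per-page walk

assert translate(5) == 1005
assert translate(9) == 4097
assert translate(12) == 9001
```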

----------------------------------------------------------------------------------------------------------------------------------

Paper Title: Mobile Plus: Multi-device Mobile Platform for Cross-device Functionality Sharing
Presenter: 오상은 (KAIST)
Abstract:
In recent years, the explosion of diverse smart devices such as mobile phones, TVs, watches, and even cars has completely changed our lives. We communicate with friends through social network services (SNSs) whenever we want, buy stuff without visiting shops, and enjoy multimedia wherever we are, thanks to these devices. However, these smart devices cannot simply interact with each other even though they are right next to each other. For example, when you want to read a PDF stored on a smartphone on a larger TV screen, you need to do complicated work or plug in a bunch of cables. In this paper, we introduce M+, an extension of Android that supports cross-device functionality sharing in a transparent manner. As a platform-level solution, M+ enables unmodified Android applications to utilize not only application functionalities but also system functionalities across devices, as if they were using them inside the same device. In addition to secure connection setup, M+ also performs permission checks for remote applications in the same way as for local ones. Our experimental results show that M+ enables transparent cross-device sharing for various functionalities and achieves performance close to that of within-device sharing unless a large amount of data is transferred.
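
The platform-level sharing can be pictured with a small remote-proxy sketch. The RemoteDevice and Proxy classes below are hypothetical stand-ins for M+'s binder-level machinery, shown only to illustrate how an app-visible call is forwarded to another device:

```python
# Minimal remote-proxy sketch of the cross-device sharing idea (illustrative
# only; M+ works at the Android platform/binder level, not in Python).

class RemoteDevice:
    """Stands in for another device's service registry (e.g., a TV)."""
    def __init__(self):
        self.services = {"display.show_pdf": lambda path: f"showing {path} on TV"}

    def call(self, service, *args):
        # In M+ this crossing would also go through the remote permission check.
        return self.services[service](*args)

class Proxy:
    """Local object that makes a remote functionality look local to the app."""
    def __init__(self, device, service):
        self.device, self.service = device, service

    def __call__(self, *args):
        return self.device.call(self.service, *args)

tv = RemoteDevice()
show_pdf = Proxy(tv, "display.show_pdf")   # the app just calls a "local" function
print(show_pdf("/sdcard/paper.pdf"))       # -> showing /sdcard/paper.pdf on TV
```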

----------------------------------------------------------------------------------------------------------------------------------

Paper Title: GPUpd: A Fast and Scalable Multi-GPU Architecture Using Cooperative Projection and Distribution
Presenter: 김영석 (Seoul Nat'l Univ.)
Abstract:
Graphics Processing Unit (GPU) vendors have been scaling single-GPU architectures to satisfy the ever-increasing user demands for faster graphics processing. However, as it gets extremely difficult to further scale single-GPU architectures, the vendors are aiming to achieve the scaled performance by simultaneously using multiple GPUs connected with newly developed, fast inter-GPU networks (e.g., NVIDIA NVLink, AMD XDMA). With fast inter-GPU networks, it is now promising to employ split frame rendering (SFR) which improves both frame rate and single-frame latency by assigning disjoint regions of a frame to different GPUs. Unfortunately, the scalability of current SFR implementations is seriously limited as they suffer from a large amount of redundant computation among GPUs. This paper proposes GPUpd, a novel multi-GPU architecture for fast and scalable SFR. With small hardware extensions, GPUpd introduces a new graphics pipeline stage called Cooperative Projection & Distribution (C-PD) where all GPUs cooperatively project 3D objects to 2D screen and efficiently redistribute the objects to their corresponding GPUs. C-PD not only eliminates the redundant computation among GPUs, but also incurs minimal inter-GPU network traffic by transferring object IDs instead of mid-pipeline outcomes between GPUs. To further reduce the redistribution overheads, GPUpd minimizes inter-GPU synchronizations by implementing batching and runahead-execution of draw commands. Our detailed cycle-level simulations with 8 real-world game traces show that GPUpd achieves a geomean speedup of 4.98x in single-frame latency with 16 GPUs, whereas the current SFR implementations achieve only 3.07x geomean speedup which saturates on 4 or more GPUs.
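
A rough software model of the C-PD idea, under simplifying assumptions (a screen split into vertical strips, bounding boxes standing in for full projection, and Python dictionaries instead of hardware queues), looks like this:

```python
# Sketch of the Cooperative Projection & Distribution (C-PD) idea
# (illustrative data structures only; the real GPUpd stage is hardware).

NUM_GPUS = 2
SCREEN_W = 100  # screen split into NUM_GPUS vertical strips, one per GPU

def projected_bbox(obj):
    # Full 3D -> 2D projection is elided; we just read a precomputed x-range.
    return obj["bbox_x"]

def owners(xmin, xmax):
    """GPUs whose screen strip overlaps the projected bounding box."""
    strip = SCREEN_W // NUM_GPUS
    return {g for g in range(NUM_GPUS)
            if xmin < (g + 1) * strip and xmax >= g * strip}

objects = {0: {"bbox_x": (5, 20)}, 1: {"bbox_x": (40, 80)}, 2: {"bbox_x": (60, 95)}}

# Phase 1: objects are split across GPUs; each GPU projects only its share
# and forwards object IDs (not mid-pipeline data) to the owning GPUs.
per_gpu_input = {0: [0, 1], 1: [2]}
redistribution = {g: [] for g in range(NUM_GPUS)}
for gpu, ids in per_gpu_input.items():
    for oid in ids:
        xmin, xmax = projected_bbox(objects[oid])
        for owner in sorted(owners(xmin, xmax)):
            redistribution[owner].append(oid)

# Phase 2: each GPU runs the full graphics pipeline only for its strip's objects.
print(redistribution)   # {0: [0, 1], 1: [1, 2]}
```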

----------------------------------------------------------------------------------------------------------------------------------

Paper Title: WORT: Write Optimal Radix Tree for Persistent Memory Storage Systems
Presenter: 이세권 (UNIST)
Abstract:
Recent interest in persistent memory (PM) has stirred development of index structures that are efficient in PM. Recent such developments have all focused on variations of the B-tree. In this paper, we show that the radix tree, which is another less popular indexing structure, can be more appropriate as an efficient PM indexing structure. This is because the radix tree structure is determined by the prefix of the inserted keys and also does not require tree rebalancing operations and node granularity updates. However, the radix tree as-is cannot be used in PM. As another contribution, we present three radix tree variants, namely, WORT (Write Optimal Radix Tree), WOART (Write Optimal Adaptive Radix Tree), and ART+CoW. Of these, the first two are optimal for PM in the sense that they only use one 8-byte failure-atomic write per update to guarantee the consistency of the structure and do not require any duplicate copies for logging or CoW. Extensive performance studies show that our proposed radix tree variants perform considerably better than recently proposed B-tree variants for PM such as NVTree, wB+Tree, and FPTree for synthetic workloads as well as in implementations within Memcached.
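
The core consistency argument, building new nodes off the critical path and publishing each with a single pointer store, can be sketched as follows. This is a plain fixed-depth radix tree in Python rather than WORT's path-compressed PM layout, and the persist/flush steps are only indicated in comments:

```python
# Sketch of the "single 8-byte atomic update" idea behind WORT (Python stand-in;
# real code would flush/fence the new node to PM before the one pointer store).

RADIX_BITS = 4                       # 4-bit chunks of the key select a child

class Node:
    def __init__(self):
        self.children = [None] * (1 << RADIX_BITS)

def insert(root, key, value, key_bits=16):
    node = root
    for shift in range(key_bits - RADIX_BITS, -1, -RADIX_BITS):
        idx = (key >> shift) & ((1 << RADIX_BITS) - 1)
        if shift == 0:
            node.children[idx] = value          # the single publishing store
            return
        if node.children[idx] is None:
            new_child = Node()                  # build off-path first...
            # (on PM: persist new_child here, then do the 8-byte store below)
            node.children[idx] = new_child      # ...then publish atomically
        node = node.children[idx]

root = Node()
insert(root, 0xBEEF, "v1")
node = root
for shift in (12, 8, 4):
    node = node.children[(0xBEEF >> shift) & 0xF]
assert node.children[0xBEEF & 0xF] == "v1"
```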

----------------------------------------------------------------------------------------------------------------------------------

[Programming Languages]
Paper Title: All about Activity Injection: Threats, Semantics, and Detection
Presenter: 이성호 (KAIST)
Abstract:
Android supports seamless user experience by maintaining activities from different apps in the same activity stack. While such close inter-app communication is essential in the Android framework, the powerful inter-app communication contains vulnerabilities that can inject malicious activities into a victim app's activity stack to hijack user interaction flows. In this paper, we demonstrate activity injection attacks with a simple malware, and formally specify the activity activation mechanism using operational semantics. Based on the operational semantics, we develop a static analysis tool, which analyzes Android apps to detect activity injection attacks. Our tool is fast enough to analyze real-world Android apps in 6 seconds on average, and our experiments found that 1,761 apps out of 129,756 real-world Android apps inject their activities into other apps' tasks.
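
As a loose illustration of what such a detector looks for (not the paper's semantics-based static analysis), the heuristic below flags activities whose declared taskAffinity points at another package, one common ingredient of activity injection. The manifest dictionary shape is hypothetical:

```python
# Simplified heuristic in the spirit of the detection (NOT the paper's analysis):
# flag activities that declare a taskAffinity targeting another app's package.

def suspicious_activities(package_name, activities):
    """activities: dicts parsed from AndroidManifest.xml (hypothetical shape)."""
    flagged = []
    for act in activities:
        # Android's default task affinity is the app's own package name.
        affinity = act.get("taskAffinity", package_name)
        if affinity and not affinity.startswith(package_name):
            flagged.append((act["name"], affinity))
    return flagged

app = {
    "package": "com.example.malware",
    "activities": [
        {"name": ".Main"},                                      # benign default
        {"name": ".Phish", "taskAffinity": "com.victim.bank"},  # targets another task
    ],
}
print(suspicious_activities(app["package"], app["activities"]))
# -> [('.Phish', 'com.victim.bank')]
```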

----------------------------------------------------------------------------------------------------------------------------------

[Software Engineering]
Paper Title: Testing Intermediate Representations for Binary Analysis
Presenter: 김수민 (KAIST)
Abstract:
Binary lifting, which is to translate a binary executable to a high-level intermediate representation, is a primary step in binary analysis. Despite its importance, there are only a few existing approaches to testing the correctness of binary lifters. Furthermore, the existing approaches suffer from low test coverage, because they largely depend on random test case generation. In this paper, we present the design and implementation of the first systematic approach to testing binary lifters. We have evaluated the proposed system on 3 state-of-the-art binary lifters, and found 24 previously unknown semantic bugs. Our result demonstrates that writing a precise binary lifter is extremely difficult even for heavily tested projects.
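
The underlying idea of checking a lifter's IR against the instruction's real semantics can be sketched as a tiny differential test. The lifters, IR format, and oracle below are invented toy examples, not the paper's system:

```python
# Differential-testing sketch of the lifter-checking idea (simplified; the real
# system compares the semantics of lifted IR against actual instruction behavior).

# Hypothetical lifters: each maps one instruction to a tiny IR, a list of
# (dst, op, src1, src2) tuples. A buggy lifter is included on purpose.
def lifter_a(insn):
    if insn == "add eax, ebx":
        return [("eax", "+", "eax", "ebx")]

def lifter_buggy(insn):
    if insn == "add eax, ebx":
        return [("eax", "+", "ebx", "ebx")]   # wrong source operand

def evaluate(ir, state):
    """Run the toy IR on a concrete register state."""
    state = dict(state)
    for dst, op, s1, s2 in ir:
        if op == "+":
            state[dst] = (state[s1] + state[s2]) & 0xFFFFFFFF
    return state

def reference(insn, state):
    """Oracle for the instruction's real semantics (hand-written here)."""
    state = dict(state)
    if insn == "add eax, ebx":
        state["eax"] = (state["eax"] + state["ebx"]) & 0xFFFFFFFF
    return state

insn, state = "add eax, ebx", {"eax": 1, "ebx": 2}
for name, lifter in [("A", lifter_a), ("buggy", lifter_buggy)]:
    ok = evaluate(lifter(insn), state) == reference(insn, state)
    print(name, "matches oracle:", ok)   # A -> True, buggy -> False
```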

----------------------------------------------------------------------------------------------------------------------------------

[Databases]
Paper Title: PAMAE: Parallel k-Medoids Clustering with High Accuracy and Efficiency
Presenter: 송환준 (KAIST)
Abstract:
The k-medoids algorithm is one of the best-known clustering algorithms. Despite this, however, it is not as widely used for big data analytics as the k-means algorithm, mainly because of its high computational complexity. Many studies have attempted to solve the efficiency problem of the k-medoids algorithm, but all such studies have improved efficiency at the expense of accuracy. In this paper, we propose a novel parallel k-medoids algorithm, which we call PAMAE, that achieves both high accuracy and high efficiency. We identify two factors, "global search" and "entire data", that are essential to achieving high accuracy, but are also very time-consuming if considered simultaneously. Thus, our key idea is to apply them individually through two phases: parallel seeding and parallel refinement, neither of which is costly. The first phase performs global search over sampled data, and the second phase performs local search over the entire data. Our theoretical analysis proves that this serial execution of the two phases leads to an accurate solution that would be achieved by global search over the entire data. In order to validate the merit of our approach, we implement PAMAE on Spark as well as Hadoop and conduct extensive experiments using various real-world data sets on 12 Microsoft Azure machines (48 cores). The results show that PAMAE significantly outperforms most recent parallel algorithms and, at the same time, produces clustering quality comparable to that of the previous most-accurate algorithm. The source code and data are available at https://github.com/jaegil/k-Medoid.
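
A single-machine sketch of the two phases is shown below, with the sample-level global search approximated by random restarts and distances taken over 1-D points; the real system runs both phases in parallel on Spark/Hadoop:

```python
# Single-machine sketch of PAMAE's two phases (sampling sizes are arbitrary and
# the per-sample "global search" is approximated by random restarts).
import random

def cost(medoids, data):
    return sum(min(abs(x - m) for m in medoids) for x in data)

def phase1_seeding(data, k, num_samples=5, sample_size=40):
    """Global search, but only over small random samples."""
    best = None
    for _ in range(num_samples):
        sample = random.sample(data, sample_size)
        cand = min((random.sample(sample, k) for _ in range(200)),
                   key=lambda m: cost(m, sample))
        if best is None or cost(cand, data) < cost(best, data):
            best = cand
    return best

def phase2_refinement(data, medoids):
    """Local search over the ENTIRE data: re-pick each cluster's best medoid."""
    clusters = {i: [] for i in range(len(medoids))}
    for x in data:
        clusters[min(range(len(medoids)), key=lambda i: abs(x - medoids[i]))].append(x)
    return [min(c, key=lambda m: sum(abs(x - m) for x in c))
            for c in clusters.values() if c]

random.seed(0)
data = [random.gauss(c, 1.0) for c in (0, 10, 20) for _ in range(100)]
seeds = phase1_seeding(data, k=3)
print(sorted(round(m, 1) for m in phase2_refinement(data, seeds)))  # medoids near 0, 10, 20
```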

----------------------------------------------------------------------------------------------------------------------------------

Paper Title: Supervised Belief Propagation: Scalable Supervised Inference on Attributed Networks
Presenter: 유재민 (Seoul Nat'l Univ.)
Abstract:
Given an undirected network where some of the nodes are labeled, how can we classify the unlabeled nodes with high accuracy? Loopy Belief Propagation (LBP) is an inference algorithm widely used for this purpose, with various applications including fraud detection, malware detection, web classification, and recommendation. However, previous methods based on LBP have problems in modeling complex structures of attributed networks because they manually and heuristically select the most important parameter, the propagation strength. In this paper, we propose Supervised Belief Propagation (SBP), a scalable and novel inference algorithm which automatically learns the optimal propagation strength by supervised learning. SBP is generally applicable to attributed networks, including weighted and signed networks. Through extensive experiments, we demonstrate that SBP generalizes previous LBP-based methods and outperforms previous LBP-based and RWR-based methods in real-world networks.
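
The role of the propagation strength can be seen in a plain LBP forward pass like the one below. SBP's contribution, learning eps from the labeled nodes, is omitted, and the 4-node graph and priors are made up for illustration:

```python
# Loopy belief propagation with a scalar propagation strength eps (sketch; SBP
# LEARNS eps from labeled nodes by supervised learning instead of hand-tuning it).
import numpy as np

# Tiny example: 4 nodes in a ring; nodes 0 and 3 are labeled.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
priors = {0: np.array([0.9, 0.1]),   # node 0: likely class 0
          1: np.array([0.5, 0.5]),
          2: np.array([0.5, 0.5]),
          3: np.array([0.1, 0.9])}   # node 3: likely class 1

def lbp_beliefs(eps, iters=20):
    psi = np.array([[0.5 + eps, 0.5 - eps],    # homophily: neighbors tend to
                    [0.5 - eps, 0.5 + eps]])   # share the same label
    msgs = {(i, j): np.ones(2) / 2 for i in neighbors for j in neighbors[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            prod = priors[i].copy()
            for k in neighbors[i]:
                if k != j:
                    prod = prod * msgs[(k, i)]
            m = psi.T @ prod
            new[(i, j)] = m / m.sum()
        msgs = new
    beliefs = {}
    for n in neighbors:
        b = priors[n].copy()
        for k in neighbors[n]:
            b = b * msgs[(k, n)]
        beliefs[n] = b / b.sum()
    return beliefs

# With a hand-set eps, the unlabeled nodes 1 and 2 lean toward the label of
# their nearer labeled neighbor; SBP would instead fit eps to the labeled nodes.
for n, b in lbp_beliefs(eps=0.1).items():
    print(n, b.round(3))
```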

----------------------------------------------------------------------------------------------------------------------------------

Paper Title: Efficient Haar+ Synopsis Construction for the Maximum Absolute Error Measure
Presenter: 김진현 (Seoul Nat'l Univ.)
Abstract:
Several wavelet synopsis construction algorithms were previously proposed based on dynamic programming for unrestricted Haar wavelet synopses as well as Haar+ synopses. However, they find an optimal synopsis for every incoming value in each node of a coefficient tree, even if different incoming values share an identical optimal synopsis. To alleviate this limitation, we present novel algorithms for the error-bounded synopsis problem which keep only a minimal set of distinct optimal synopses in each node of the tree. Furthermore, we propose methods to restrict the coefficient values considered when computing the optimal synopses in each node. In addition, by partitioning all optimal synopses in each node into a set of groups such that every group can be represented by a compact representation, we significantly improve the performance of the proposed algorithms.
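
For background, the sketch below shows what a wavelet synopsis under the maximum-absolute-error measure is: keep B detail coefficients and measure the worst-case reconstruction error. It uses the plain Haar basis and a naive largest-B heuristic, not the Haar+ coefficient dictionary or the paper's optimal dynamic-programming algorithms:

```python
# Background sketch only: plain Haar synopsis + max-absolute-error measure.
import numpy as np

def haar(data):
    """Unnormalized Haar decomposition of a length-2^k array."""
    avg, details = np.asarray(data, dtype=float), []
    while len(avg) > 1:
        pairs = avg.reshape(-1, 2)
        details.append((pairs[:, 0] - pairs[:, 1]) / 2)   # detail coefficients
        avg = pairs.mean(axis=1)
    return avg, details[::-1]                             # overall average, coarse-to-fine

def inverse(avg, details):
    vals = avg
    for d in details:
        vals = np.column_stack([vals + d, vals - d]).reshape(-1)
    return vals

def synopsis_max_error(data, B):
    """Keep the B largest-magnitude detail coefficients; report max absolute error."""
    avg, details = haar(data)
    flat = np.concatenate(details)
    kept = np.zeros_like(flat)
    idx = np.argsort(-np.abs(flat))[:B]
    kept[idx] = flat[idx]
    out, pos = [], 0
    for d in details:
        out.append(kept[pos:pos + len(d)])
        pos += len(d)
    approx = inverse(avg, out)
    return float(np.max(np.abs(np.asarray(data, dtype=float) - approx)))

data = [1, 3, 5, 7, 9, 2, 4, 4]
print([synopsis_max_error(data, B) for B in range(8)])   # error generally shrinks with B
```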

----------------------------------------------------------------------------------------------------------------------------------

[Artificial Intelligence]
Paper Title: Faster Greedy MAP Inference for Determinantal Point Processes
Presenter: 한인수 (KAIST)
Abstract:
Determinantal point processes (DPPs) are popular probabilistic models that arise in many machine learning tasks, where distributions of diverse sets are characterized by matrix determinants. In this paper, we develop fast algorithms to find the most likely configuration (MAP) of large-scale DPPs, which is NP-hard in general. Due to the submodular nature of the MAP objective, greedy algorithms have been used with empirical success. Greedy implementations require computation of log-determinants, matrix inverses or solving linear systems at each iteration. We present faster implementations of the greedy algorithms by utilizing the complementary benefits of two log-determinant approximation schemes: (a) first-order expansions to the matrix log-determinant function and (b) high-order expansions to the scalar log function with stochastic trace estimators. In our experiments, our algorithms are significantly faster than their competitors for large-scale instances, while sacrificing only marginal accuracy.
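
As a baseline for what the paper accelerates, standard greedy MAP inference with exact log-determinants can be written as below; the kernel L here is a random low-rank matrix chosen for illustration, and the paper's contribution is replacing the per-iteration slogdet work with faster approximations:

```python
# Baseline greedy MAP inference for a DPP: repeatedly add the item with the
# largest positive log-determinant gain (exact slogdet; not the faster schemes).
import numpy as np

def greedy_dpp_map(L, max_items=None):
    n = L.shape[0]
    selected, current_logdet = [], 0.0          # log det of the empty minor is 0
    while True:
        best_gain, best_item = 0.0, None
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign <= 0:
                continue
            gain = logdet - current_logdet
            if gain > best_gain:                # require a strictly positive gain
                best_gain, best_item = gain, i
        if best_item is None or (max_items and len(selected) >= max_items):
            break
        selected.append(best_item)
        current_logdet += best_gain
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
L = X @ X.T + 1e-6 * np.eye(8)                  # PSD kernel: similar rows -> low det
print(greedy_dpp_map(L))                        # picks a small, diverse subset
```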

----------------------------------------------------------------------------------------------------------------------------------

Paper Title: Combined Group and Exclusive Sparsity for Deep Neural Networks
Presenter: 윤재홍 (UNIST)
Abstract:
The number of parameters in a deep neural network is usually very large, which helps with its learning capacity but also hinders its scalability and practicality due to memory/time inefficiency and overfitting. To resolve this issue, we propose a sparsity regularization method that exploits both positive and negative correlations among the features to force the network to be sparse, and at the same time remove any redundancies among the features to fully utilize the capacity of the network. Specifically, we propose to use an exclusive sparsity regularization based on the (1,2)-norm, which promotes competition for features between different weights, thus forcing them to fit to disjoint sets of features. We further combine the exclusive sparsity with the group sparsity based on the (2,1)-norm, to promote both sharing and competition for features in training of a deep neural network. We validate our method on multiple public datasets, and the results show that our method can obtain more compact and efficient networks while also improving the performance over the base networks with full weights, as opposed to existing sparsity regularizations that often obtain efficiency at the expense of prediction accuracy.
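
The combined regularizer itself is simple to state. The sketch below computes it for one weight matrix, assuming the groups are the input features (columns) and that the mixing weight mu is a hyperparameter; both are assumptions made for illustration:

```python
# Sketch of the combined group + exclusive sparsity regularizer for one layer.
import numpy as np

def exclusive_sparsity(W):
    """(1,2)-norm: sum over groups of the squared L1 norm -> within-group competition."""
    return 0.5 * np.sum(np.sum(np.abs(W), axis=0) ** 2)

def group_sparsity(W):
    """(2,1)-norm: sum over groups of the L2 norm -> whole groups switched off."""
    return np.sum(np.sqrt(np.sum(W ** 2, axis=0)))

def combined_regularizer(W, mu=0.5):
    return mu * exclusive_sparsity(W) + (1.0 - mu) * group_sparsity(W)

# In training, this term would be added to the task loss, e.g.
#   loss = cross_entropy(...) + lambda_reg * combined_regularizer(W)
W = np.random.default_rng(0).normal(size=(64, 32))
print(combined_regularizer(W, mu=0.5))
```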

----------------------------------------------------------------------------------------------------------------------------------

Paper Title: Overcoming Catastrophic Forgetting by Incremental Moment Matching
Presenter: 이상우 (Seoul Nat'l Univ.)
Abstract:
Catastrophic forgetting is the problem in which a neural network loses the information learned for a first task after being trained on a second task. Here, we propose a method, incremental moment matching (IMM), to resolve this problem. IMM incrementally matches the moments of the posterior distributions of the neural network trained on the first and the second task, respectively. To make the search space of the posterior parameters smooth, the IMM procedure is complemented by various transfer learning techniques, including weight transfer, the L2-norm of the old and the new parameters, and a variant of dropout with the old parameters. We analyze our approach on a variety of datasets including the MNIST, CIFAR-10, Caltech-UCSD-Birds, and Lifelog datasets. The experimental results show that IMM achieves state-of-the-art performance by balancing the information between an old and a new network.
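
Mean-IMM, the simplest variant, reduces to a weighted average of the two trained networks' parameters. The sketch below shows that step only, with toy parameter dictionaries; mode-IMM's Fisher-information weighting and the transfer-learning components from the abstract are omitted:

```python
# Sketch of mean-IMM: approximate each task's posterior over weights as a
# Gaussian and match the first moment by averaging the parameters.
import numpy as np

def mean_imm(params_task1, params_task2, alpha=0.5):
    """Merge two networks with identical architecture by averaging parameters."""
    return {name: alpha * params_task1[name] + (1.0 - alpha) * params_task2[name]
            for name in params_task1}

# Toy "networks": one weight matrix and one bias vector per task.
rng = np.random.default_rng(0)
theta1 = {"W": rng.normal(size=(4, 4)), "b": rng.normal(size=4)}   # after task 1
theta2 = {"W": rng.normal(size=(4, 4)), "b": rng.normal(size=4)}   # after task 2
merged = mean_imm(theta1, theta2, alpha=0.5)
print(merged["b"])
# The transfer techniques in the abstract (weight transfer, L2 regularization
# toward the old weights, a dropout variant) smooth the loss surface so that
# this simple averaging lands in a region that is good for both tasks.
```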

----------------------------------------------------------------------------------------------------------------------------------
