Invited Paper

  • 홈ˆ
  • 프로그램
  • Invited Paper

한국정보과학회가 선정한 SW분야 최우수학술대회에서 발표된 논문의 저자를 초청하여, 아래와 같이 해당 분야 Oral세션에서 핵심연구내용 소개와 질의&응답을 통해 학생들의 연구의욕을 고취하는 프로그램을 마련하였습니다. 관심 있는 분들의 많은 참여 바랍니다.      

No.

대회명

논문제목

발표자

지도교수 발표일시

1

  AAAI 2018

gOCCF: Graph-Theoretic One-Class Collaborative Filtering Based on Uninteresting Items

이연창

김상욱 6.21 09:00-12:00
데이터베이스 O2세션

2

  SIGMOD 2018

RP-DBSCAN: A Superfast Parallel DBSCAN Algorithm Based on Random Partitioning

송환준

이재길 6.21 09:00-12:00
데이터베이스 O2세션

3

  WWW 2018

How to Impute Missing Ratings? Claims, Solution, and Its Application to Collaborative Filtering

이영남

김상욱 6.21 09:00-12:00
데이터베이스 O2세션

4

  VLDB 2018

SQL statement logging for making SQLite truly lite 박종혁 이상원 6.21 13:30-15:30
데이터베이스 O3세션

5

  WWW 2017

Constructing and Evaluating a Novel Crowdsourcing-based Paraphrased Opinion Spam Dataset 박동현 강재우

6.21 13:30-15:30

데이터베이스 O3세션

6

  ICDE 2018

A Graph-based Database Partitioning Method for Parallel OLAP Query Processing

남윤민 김민수 6.22 09:00-12:00
데이터베이스 O2세션

7

  SIGMOD 2018

TurboGraph++: A Scalable and Fast Graph Analytics System 고성윤 한욱신 6.22 09:00-12:00
데이터베이스 O2세션

8

  SIGMOD 2018

TurboFlux: A Fast Continuous Subgraph Matching System for Streaming Graph Data 김경민 한욱신 6.22 09:00-12:00
데이터베이스 O2세션

9

  ICSE 2018

Prioritizing Browser Environments for Web Application Test Execution 권정현 고인영 6.21 13:30-15:30
소프트웨어공학 O1세션

10

  FSE2017

Applying deep learning based automatic bug triager to industrial projects 이선로 이찬근 6.22 09:00-12:00
소프트웨어공학 O7세션

11

  NIPS2017

Deep Recurrent Neural Network-Based Identification of Precursor microRNAs 민선우 윤성로 6.21 09:00-12:00
인공지능 O10세션

12

  AAAI2018

Quantized Memory-Augmented Neural Networks 박성식 윤성로 6.21 09:00-12:00
인공지능 O9세션

13

  IJCAI2017

Name Nationality Classification with Recurrent Neural Networks 이진혁 강재우 6.21 13:30-15:30
인공지능 O10세션

14

  IJICAI2018

Hybrid Approach of Relation Network and Localized Graph Convolutional Filtering for Breast Cancer Subtype Classification 이성민 김선 6.22 09:00-12:00
인공지능 O9세션

15

  ISCA2017

Hybrid TLB Coalescing: Improving TLB Translation Coverage under Diverse Fragmented Memory Allocations 박창현 허재혁 6.21 13:30-15:30
컴퓨터시스템 O9세션

16

  FAST2018

RFLUSH: Rethink the Flush 연제성 이은지 6.22 09:00-12:00
컴퓨터시스템 O8세션

17

  FAST2018

Endurable Transient Inconsistency in Byte-Addressable Persistent B+-Tree 황득연 남범석 6.22 09:00-12:00
컴퓨터시스템 O8세션

18

  PLDI 2017

Taming Undefined Behavior in LLVM 이준영 허충길 6.21 13:30-15:30
프로그래밍언어 O6세션

19

  OOPSLA 2017

Data-Driven Context-Sensitivity for Points-to Analysis 전민석
정세훈
오학주 6.21 13:30-15:30
프로그래밍언어 O6세션

20

  ICSE 2018

Automatically Generating Search Heuristics for Concolic Testing

차수영

오학주 6.21 13:30-15:30
프로그래밍언어 O6세션


[데이터베이스]

논문제목: A Graph-based Database Partitioning Method for Parallel OLAP Query Processing

대회명: IEEE ICDE 2018

발표자: 남윤민(DGIST)

지도교수: 김민수 교수

저자: Yoon-Min Nam, Min-Soo Kim, Donghyoung Han

논문초록: As the amount of data to process increases, a scalable and efficient horizontal database partitioning method becomes more important for OLAP query processing in parallel database platforms. Existing partitioning methods have a few major drawbacks such as a large amount of data redundancy and not supporting join processing without shuffle in many cases despite their large data redundancy. We elucidate the drawbacks arise from their tree-based partitioning schemes and propose a novel graph-based database partitioning method called GPT that improves query performance with lower data redundancy. Through extensive experiments using three benchmarks, we show that GPT significantly outperforms the state-of-the-art method in terms of both storage overhead and query performance.

 

논문제목: gOCCF: Graph-Theoretic One-Class Collaborative Filtering Based on Uninteresting Items

대회명: AAAI 2018

발표자: 이연창(한양대)

지도교수: 김상욱 교수

저자: 이연창, 김상욱, 이동원

논문초록: We investigate how to address the shortcomings of the popular One-Class Collaborative Filtering (OCCF) methods in handling challenging “sparse” dataset in one-class setting (e.g., clicked or bookmarked), and propose a novel graph-theoretic OCCF approach, named as gOCCF, by exploiting both positive preferences (derived from rated items) as well as negative preferences (derived from unrated items). In capturing both positive and negative preferences as a bipartite graph, further, we apply the graph shattering theory to determine the right amount of negative preferences to use. Then, we develop a suite of novel graph-based OCCF methods based on the random walk with restart and belief propagation methods. Through extensive experiments using 3 real-life datasets, we show that our gOCCF effectively addresses the sparsity challenge and significantly outperforms all of 8 competing methods in accuracy on very sparse datasets while providing comparable accuracy to the best performing OCCF methods on less sparse datasets.

 

논문제목: Constructing and Evaluating a Novel Crowdsourcing-based Paraphrased Opinion Spam Dataset

대회명: WWW 2017

발표자: 박동현(고려대)

지도교수: 강재우 교수

저자: 김성순*, 이성운*, 박동현, 강재우

논문초록: Opinion spam, intentionally written by spammers who do not have actual experience with services or products, has recently become a factor that undermines the credibility of information online. In recent years, studies have attempted to detect opinion spam using machine learning algorithms. However, limitations of gold-standard spam datasets still prove to be a major obstacle in opinion spam research. In this paper, we introduce a novel dataset called Paraphrased OPinion Spam (POPS), which contains a new type of review spam that imitates real human opinions using crowdsourcing. To create such a seemingly truthful review spam dataset, we asked task participants to paraphrase truthful reviews, and include factual information and domain knowledge in their reviews. The classification experiments and semantic analysis results show that our POPS dataset most linguistically and semantically resembles truthful reviews. We believe that our new deceptive opinion spam dataset will help advance opinion spam research.

 

논문제목: SQL statement logging for making SQLite truly lite

대회명: VLDB 2018

발표자: 박종혁(성균관대)

지도교수: 이상원 교수

저자: 박종혁, 오기환, 이상원

논문초록: The lightweight codebase of SQLite was helpful in making it become the de-facto standard database in most mobile devices, but, at the same time, forced it to take less complicated transactional schemes, such as physical page logging, journaling, and force commit, which in turn cause excessive write amplification. Thus, the write IO cost in SQLite is not lightweight at all. In this paper, to make SQLite truly lite in terms of IO efficiency for the transactional support, we propose SQLite/SSL, a per-transaction SQL statement logging scheme: when a transaction commits, SQLite/SSL ensures its durability by storing only SQL statements of small size, thus writing less and performing faster at no compromise of transactional solidity. Our main contribution is to show that, based on the observation that mobile transactions tend to be short and exhibit strong update locality, logical logging can, though long discarded, become an elegant and perfect fit for SQLite-based mobile applications. Further, we leverage the WAL journal mode in vanilla SQLite as a transaction-consistent checkpoint mechanism which is indispensable in any logical logging scheme. In addition, we show for the first time that byte-addressable NVM (non-volatile memory) in host-side can realize the full potential of logical logging because it allows to store fine-grained logs quickly. We have prototyped SQLite/SSL by augmenting vanilla SQLite with a transaction-consistent checkpoint mechanism and a redo-only recovery logic, and have evaluated its performance using a set of synthetic and real workloads. When a real NVM board is used as its log device, SQLite/SSL can outperform vanilla SQLite’s WAL mode by up to 300x and also outperform the state-of-the-arts SQLite/PPL scheme by several folds in terms of IO time.

 

논문제목: TurboGraph++: A Scalable and Fast Graph Analytics System

대회명: SIGMOD 2018

발표자: 고성윤(포항공대)

지도교수: 한욱신 교수

저자: 고성윤 (포항공대), 한욱신 (포항공대)

논문초록: Existing distributed graph analytics systems are categorized into two main groups: those that focus on efficiency with a risk of out-of-memory error and those that focus on scale-up with a fixed memory budget and a sacrifice in performance. While the former group keeps a partitioned graph resident in memory of each machine and uses an in-memory processing technique, the latter stores the partitioned graph in external memory of each machine and exploits a streaming processing technique. Gemini and Chaos are the state-of-the-art distributed graph systems in each group, respectively.

We present TurboGraph++, a scalable and fast graph analytics system which efficiently processes large graphs by exploiting external memory for scale-up without compromising efficiency. First, TurboGraph++ provides a new graph processing abstraction for efficiently supporting neighborhood analytics that requires processing multi-hop neighborhoods of vertices, such as triangle counting and local clustering coefficient computation, with a fixed memory budget. Second, TurboGraph++ provides a balanced and buffer-aware partitioning scheme for ensuring balanced workloads across machines with reasonable cost. Lastly, TurboGraph++ leverages three-level parallel and overlapping processing for fully utilizing three hardware resources, CPU, disk, and network, in a cluster. Extensive experiments show that TurboGraph++ is designed to scale well to very large graphs, like Chaos, while its performance is comparable to Gemini.

 

논문제목: TurboFlux: A Fast Continuous Subgraph Matching System for Streaming Graph Data

대회명: SIGMOD 2018

발표자: 김경민(포항공대)

지도교수: 한욱신 교수

저자:Kyongmin Kim, In Seo, Wook-Shin Han, Jeong-Hoon Lee, Sungpack Hong, Hassan Chafi, Hyungyu Shin, Geonhwa Jeong

논문초록: A dynamic graph is defined by an initial graph and a graph update stream consisting of edge insertions and deletions. Identifying and monitoring critical patterns in the dynamic graph is important in various application domains such as fraud detection, cyber security, and emergency response. Given a dynamic data graph and a query graph, a continuous subgraph matching system reports positive matches for an edge insertion and reports negative matches for an edge deletion. Previous systems show significantly low throughput due to either repeated subgraph matching for each edge update or expensive overheads in maintaining enormous intermediate results. We present a fast continuous subgraph matching system called TurboFlux which provides high throughput over a fast graph update stream. TurboFlux employs a concise representation of intermediate results, and its execution model allows fast incremental maintenance. Our empirical evaluation shows that TurboFlux significantly outperforms existing competitors by up to six orders of magnitude.

 

논문제목: RP-DBSCAN: A Superfast Parallel DBSCAN Algorithm Based on Random Partitioning

대회명: SIGMOD 2018

발표자: 송환준(KAIST)

지도교수: 이재길 교수

저자: 송환준, 이재길(KAIST)

논문초록:

In most parallel DBSCAN algorithms, neighboring points are assigned to the same data partition for parallel processing to facilitate calculation of the density of the neighbors. This data partitioning scheme causes a few critical problems including load imbalance between data partitions, especially in a skewed data set. To remedy these problems, we propose a cell-based data partitioning scheme, pseudo random partitioning, that randomly distributes small cells rather than the points themselves. It achieves high load balance regardless of data skewness while retaining the data contiguity required for DBSCAN. In addition, we build and broadcast a highly compact summary of the entire data set, which we call a two-level cell dictionary, to supplement random partitions. Then, we develop a novel parallel DBSCAN algorithm, Random Partitioning-DBSCAN (shortly, RP-DBSCAN), that uses pseudo random partitioning together with a two-level cell dictionary. The algorithm simultaneously finds the local clusters to each data partition and then merges these local clusters to obtain global clustering. To validate the merit of our approach, we implement RP-DBSCAN on Spark and conduct extensive experiments using various real-world data sets on 12 Microsoft Azure machines (48 cores). In RP-DBSCAN, data partitioning and cluster merging are very light, and clustering on each split is not dragged out by a specific worker. Therefore, the performance results show that RP-DBSCAN significantly outperforms the state-of-the-art algorithms by up to 180 times.

 

논문제목: How to Impute Missing Ratings? Claims, Solution, and Its Application to Collaborative Filtering
대회명: WWW 2018
발표자: 이영남(한양대)
지도교수:김상욱 교수
저자:이영남,김상욱,박선주, Xing Xie
논문초록:

Data sparsity is one of the biggest problems faced by collaborative filtering used in recommender systems. Data imputation alleviates the data sparsity problem by inferring missing ratings and imputing them to the original rating matrix. In this paper, we identify the limitations of existing data imputation approaches and suggest three new claims that all data imputation approaches should follow to achieve high recommendation accuracy. Furthermore, we propose a deep-learning based approach to compute imputed values that satisfies all three claims. Based on our hypothesis that most pre-use preferences (e.g., impressions) on items lead to their post-use preferences (e.g., ratings), our approach tries to understand via deep learning how pre-use preferences lead to post-use preferences differently depending on the characteristics of users and items. Through extensive experiments on real-world datasets, we verify our three claims and hypothesis, and also demonstrate that our approach significantly outperforms existing state-of-the-art approaches.
 

 

[소프트웨어공학]

논문제목: Prioritizing Browser Environments for Web Application Test Execution

대회명: ICSE 2018

발표자: 권정현(KAIST)

지도교수: 고인영 교수

저자: 권정현, 고인영, Gregg Rothermel

논문초록: When testing client-side web applications, it is important to consider different web-browser environments. Different properties of these environments such as web-browser types and underlying platforms may cause a web application to exhibit different types of failures. As web applications evolve, they must be regression tested across these different environments. Because there are many environments to consider this process can be expensive, resulting in delayed feedback about failures in applications. In this work, we propose six techniques for providing a developer with faster feedback on failures when regression testing web applications across different web-browser environments. Our techniques draw on methods used in test case prioritization; however, in our case we prioritize web-browser environments, based on information on recent and frequent failures. We evaluated our approach using four non-trivial and popular open-source web applications. Our results show that our techniques outperform two baseline methods, namely, no ordering and random ordering, in terms of the cost-effectiveness. The improvement rates ranged from -12.24% to 39.05% for no ordering, and from -0.04% to 45.85% for random ordering.

 

논문제목: Applying deep learning based bug triager to industrial projects

대회명: FSE2017

발표자: 이선로(중앙대)

지도교수: 이찬근 교수

저자: 이선로, 허민재, 이찬근, 김밀한, 정가을

논문초록: Finding the appropriate developer for a bug report, so called ‘Bug Triage’, is one of the bottlenecks in the bug resolution process. To address this problem, many approaches have proposed various automatic bug triage techniques in recent studies. We argue that most previous studies focused on open source projects only and did not consider deep learning techniques. In this paper, we propose to use Convolutional Neural Network and word embedding to build an automatic bug triager. The results of the experiments applied to both industrial and open source projects reveal benefits of the automatic approach and suggest co-operation of human and automatic triagers. Our experience in integrating and operating the proposed system in an industrial development environment is also reported.

 

[인공지능]

논문제목: Name Nationaltiy Classification with Recurrent Neural Networks

대회명: IJCAI2017

발표자: 이진혁(고려대)

지도교수: 강재우 교수

저자: 이진혁, 김현재, 고미영, 최동희, 최재훈, 강재우

논문초록: Personal names tend to have many variations differing from country to country. Though there exists a large amount of personal names on the Web, nationality prediction solely based on names has not been fully studied due to its difficulties in extracting subtle character level features. We propose a recurrent neural network based model which predicts nationalities of each name using automatic feature extraction. Evaluation of Olympic record data shows that our model achieves greater accuracy than previous feature based approaches in nationality prediction tasks. We also evaluate our proposed model and baseline models on name ethnicity classification task, again achieving better or comparable performances. We further investigate the effectiveness of character embeddings used in our proposed model.

 

논문제목: Deep Recurrent Neural Network-Based Identification of Precursor microRNAs

대회명: NIPS 2017

발표자: 민선우(서울대)

지도교수: 윤성로 교수

저자: 박승현, 민선우, 최현수, 윤성로

논문초록: MicroRNAs (miRNAs) are small non-coding ribonucleic acids (RNAs) which play key roles in post-transcriptional gene regulation. Direct identification of mature miRNAs is infeasible due to their short lengths, and researchers instead aim at identifying precursor miRNAs (pre-miRNAs). Many of the known pre-miRNAs have distinctive stem-loop secondary structure and structure-based filtering is usually the first step to predict the possibility of a given sequence being a pre-miRNA. To identify new pre-miRNAs that often have non-canonical structure, however, we need to consider additional features other than structure. To obtain such additional characteristics, existing computational methods rely on manual feature extraction, which inevitably limits the efficiency, robustness, and generalization of computational identification. To address the limitations of existing approaches, we propose a pre-miRNA identification method that incorporates (1) a deep recurrent neural network (RNN) for automated feature learning and classification, (2) multimodal architecture for seamless integration of prior knowledge (secondary structure), (3) an attention mechanism for improving long-term dependence modeling, and (4) an RNN-based class activation mapping for highlighting the learned representations that can contrast pre-miRNAs and non-pre-miRNAs. In our experiments with recent benchmarks, the proposed approach outperformed the compared state-of-the-art alternatives in terms of various performance metrics.

 

논문제목: Quantized Memory-Augmented Neural Networks

대회명: AAAI 2018

발표자: 박성식(서울대)

지도교수: 윤성로 교수

저자: 박성식, 김세준, 이세일, 배호, 윤성로

논문초록: Memory-augmented neural networks (MANNs) refer to a class of neural network models equipped with external memory (such as neural Turing machines and memory networks). These neural networks outperform conventional recurrent neural networks (RNNs) in terms of learning long-term dependency, allowing them to solve intriguing AI tasks that would otherwise be hard to address. This paper concerns the problem of quantizing MANNs. Quantization is known to be effective when we deploy deep models on embedded systems with limited resources. Furthermore, quantization can substantially reduce the energy consumption of the inference procedure. These benefits justify recent developments of quantized multilayer perceptrons, convolutional networks, and RNNs. However, no prior work has reported the successful quantization of MANNs. The in-depth analysis presented here reveals various challenges that do not appear in the quantization of the

other networks. Without addressing them properly, quantized MANNs would normally suffer from excessive quantization error which leads to degraded performance. In this paper, we identify memory addressing (specifically, content-based addressing) as the main reason for the performance degradation and propose a robust quantization method for MANNs to address the challenge. In our experiments, we achieved a computation-energy gain of 22× with 8-bit fixed-point and binary quantization compared to the floating-point implementation. Measured on the bAbI dataset, the resulting model, named the quantized MANN (Q-MANN), improved the error rate by 46% and 30% with 8-bit fixed-point and binary quantization, respectively, compared to the MANN quantized using

conventional techniques.

 

 

논문제목: Hybrid Approach of Relation Network and Localized Graph Convolutional Filtering for Breast Cancer Subtype Classification

대회명: IJICAI2018

발표자: 이성민(서울대)

지도교수: 김선 교수

저자: 이성민, 서석준, 김선

논문초록: Network biology has been successfully used to help reveal complex mechanisms of disease, especially cancer. On the other hand, network biology requires in-depth knowledge to construct diseasespecific networks, but our current knowledge is very limited even with the recent advances in human cancer biology. Deep learning has shown an ability to address the problem like this. However, it conventionally used grid-like structured data, thus application of deep learning technologies to the human disease subtypes is yet to be explored. To overcome the issue, we propose a hybrid model, which integrates two key components 1) graph convolution neural network (graph CNN) and 2) relation network (RN). Experimental results on synthetic data and breast cancer data demonstrate that our proposed method shows better performances than existing methods.

 

[컴퓨터시스템/스토리지]

논문제목: Hybrid TLB Coalescing: Improving TLB Translation Coverage under Diverse Fragmented Memory Allocations

대회명: ISCA2017

발표자: 박창현(KAIST)

지도교수: 허재혁 교수

저자: 박창현, 허태경, 정준기, 허재혁

논문초록: To mitigate excessive TLB misses in large memory applications, techniques such as large pages, variable length segments, and HW coalescing, increase the coverage of limited hardware translation entries by exploiting the contiguous memory allocation. However, recent studies show that in non-uniform memory systems, using large pages often leads to performance degradation, or allocating large chunks of memory becomes more difficult due to memory fragmentation. Although each of the prior techniques favors its own best chunk size, diverse contiguity of memory allocation in real systems cannot always provide the optimal chunk of each technique. Under such fragmented and diverse memory allocations, this paper proposes a novel HW-SW hybrid translation architecture, which can adapt to different memory mappings efficiently. In the proposed hybrid coalescing technique, the operating system encodes memory contiguity information in a subset of page table entries, called anchor entries. During address translation through TLBs, an anchor entry provides translation for contiguous pages following the anchor entry. As a smaller number of anchor entries can cover a large portion of virtual address space, the efficiency of TLB can be significantly improved. The most important benefit of hybrid coalescing is its ability to change the coverage of the anchor entry dynamically, reflecting the current allocation contiguity status. By using the contiguity information directly set by the operating system, the technique can provide scalable translation coverage improvements with minor hardware changes, while allowing the flexibility of memory allocation. Our experimental results show that across diverse allocation scenarios with different distributions of contiguous memory chunks, the proposed scheme can effectively reap the potential translation coverage improvement from the existing contiguity.

 

논문제목: RFLUSH: Rethink the Flush

대회명: USENIX FAST2018

발표자: 연제성(충북대)

지도교수: 이은지 교수

저자: 연제성 정민성 이성진 이은지

논문초록: A FLUSH command has been used for decades to enforce persistence and ordering of updates in a storage device. The command forces all the data in the volatile buffer to non-volatile media to achieve persistency. This lumpsum approach to flushing has two performance consequences. First, it slows down non-volatile materialization of the writes that actually need to be flushed. Second, it deprives the writes that need not to be flushed of an opportunity for absorbing future writes and coalescing. We attempt to characterize the problems of this semantic gap of flushing in storage devices and propose RFLUSH that allows a fine-grained control over flushing in them. The RFLUSH command delivers a range of LBAs that need to be flushed and thus enables the storage device to force only a subset of data in its buffer. We implemented this fine-grained flush command in a storage device using an open-source flash development platform and modified the F2FS file system to make use of the command in processing fsync requests as a case study. Performance evaluation using the prototype implementation shows that the inclusion of RFLUSH improves the throughput by up to 5.6x; reduces the write traffic by up to 43%; and eliminates the long tail in the response time.

 

논문제목: Endurable Transient Inconsistency in Byte-Addressable Persistent B+-Tree

대회명: USENIX FAST2018

발표자: 황득연(UNIST)

지도교수: 남범석 교수

저자: Deukyeon Hwang and Wook-Hee Kim, UNIST; Youjip Won, Hanyang University; Beomseok Nam, UNIST

논문초록: With the emergence of byte-addressable persistent memory (PM), a cache line, instead of a page, is expected to be the unit of data transfer between volatile and nonvolatile devices, but the failure-atomicity of write operations is guaranteed in the granularity of 8 bytes rather than cache lines. This granularity mismatch problem has generated interest in redesigning block-based data structures such as B+-trees, and attempts have been made to use in-memory data structures for PM. However, various methods of modifying B+-trees for PM degrade the efficiency of B+-trees due to the additional metadata and high rebalancing overhead caused by logging methods. In this study, we develop Failure-Atomic ShifT (FAST) and Failure-Atomic In-place Rebalance (FAIR) algorithms. FAST and FAIR modify legacy B+-trees in a byte-addressable fashion but solves the granularity mismatch problem. Every 8-byte store instruction used in the FAST and FAIR algorithms transforms a B+-tree into another consistent state or a transient inconsistent state that read operations can tolerate. By making read operations transient inconsistency, we can eliminate expensive copy-on-write, logging, and even the necessity of read latches so that read transactions can be non-blocking. Our experimental results show that legacy B+-trees with FAST and FAIR schemes outperform the state-of-the-art persistent indexing structures by a large margin.

 

[프로그래밍언어]

논문제목: Taming Undefined Behavior in LLVM

대회명: PLDI 2017

발표자: 이준영(서울대)

지도교수: 허충길 교수

저자: Juneyoung Lee, Yoonseung Kim, YoungJu Song, Chung-Kil Hur (Seoul National University),

Sanjoy Das (Azul Systems), David Majnemer (Google), John Regehr (University of Utah),

Nuno P. Lopes (Microsoft Research)

논문초록: A central concern for an optimizing compiler is the design of its intermediate representation (IR) for code. The IR should make it easy to perform transformations, and should also afford efficient and precise static analysis.

In this paper we study an aspect of IR design that has received little attention: the role of undefined behavior. The IR for every optimizing compiler we have looked at, including GCC, LLVM, Intel’s, and Microsoft’s, supports one or more forms of undefined behavior (UB), not only to reflect the semantics of UB-heavy programming languages such as C and C++, but also to model inherently unsafe low-level operations such as memory stores and to avoid over-constraining IR semantics to the point that desirable transformations become illegal. The current semantics of LLVM’s IR fails to justify some cases of loop unswitching, global value numbering, and other important “textbook” optimizations, causing long-standing bugs.

We present solutions to the problems we have identified in LLVM’s IR and show that most optimizations currently in LLVM remain sound, and that some desirable new transformations become permissible. Our solutions do not degrade compile time or performance of generated code.

 

논문제목: Data-Driven Context-Sensitivity for Points-to Analysis

대회명: OOPSLA 2017

발표자: 전민석(고려대)

지도교수: 오학주 교수

저자: 정세훈, 전민석, 차성덕, 오학주

논문초록: We present a new data-driven approach to achieve highly cost-effective context-sensitive points-to analysis for Java. While context-sensitivity has greater impact on the analysis precision and performance than any other precision-improving techniques, it is difficult to accurately identify the methods that would benefit the most from context-sensitivity and decide how much context-sensitivity should be used for them. Manually designing such rules is a nontrivial and laborious task that often delivers suboptimal results in practice. To overcome these challenges, we propose an automated and data-driven approach that learns to effectively apply context-sensitivity from codebases. In our approach, points-to analysis is equipped with a parameterized and heuristic rules, in disjunctive form of properties on program elements, that decide when and how much to apply context-sensitivity. We present a greedy algorithm that efficiently learns the parameter of the heuristic rules. We implemented our approach in the Doop framework and evaluated using three types of context-sensitive analyses: conventional object-sensitivity, selective hybrid object-sensitivity, and type-sensitivity. In all cases, experimental results show that our approach significantly outperforms existing techniques.

 

논문제목: Automatically Generating Search Heuristics for Concolic Testing

대회명: ICSE 2018

발표자: 차수영(고려대)

지도교수: 오학주 교수

저자: 차수영, 홍성준, 이준희, 오학주

논문초록: We present a technique to automatically generate search heuristics for concolic testing.

A key challenge in concolic testing is how to effectively explore the program’s execution paths to achieve high code coverage in a limited time budget.

Concolic testing employs a search heuristic to address this challenge, which favors exploring particular types of paths that are most likely to maximize the final coverage.

However, manually designing a good search heuristic is nontrivial and typically ends up with suboptimal and unstable outcomes.

The goal of this paper is to overcome this shortcoming of concolic testing by automatically generating search heuristics.

We define a class of search heuristics, namely a parameterized heuristic, and present an algorithm that efficiently finds an optimal heuristic for each subject program.

Experimental results with open-source C programs show that our technique successfully generates search heuristics that significantly outperform existing manually-crafted

heuristics in terms of branch coverage and bug-finding.

 


서울시 서초구 방배로 76 (방배동, 머리재빌딩 401호) 우)06704 | (Tel)1588-2728 | (Fax)02-521-1352 | 고유번호 : 114-82-03170 | 대표 : 엄영익

Copyright (c) KIISE. All rights reserved.