A survey of machine learning for big code and naturalness

M Allamanis, ET Barr, P Devanbu… - ACM Computing Surveys …, 2018 - dl.acm.org
Research at the intersection of machine learning, programming languages, and software
engineering has recently taken important steps in proposing learnable probabilistic models …

A survey of Android application and malware hardening

V Sihag, M Vardhan, P Singh - Computer Science Review, 2021 - Elsevier
In the age of increasing mobile and smart connectivity, malware poses an ever evolving
threat to individuals, societies and nations. Anti-malware companies are often the first and …

Unsupervised translation of programming languages

B Roziere, MA Lachaux… - Advances in neural …, 2020 - proceedings.neurips.cc
A transcompiler, also known as source-to-source translator, is a system that converts source
code from a high-level programming language (such as C++ or Python) to another …

Learning to represent programs with graphs

M Allamanis, M Brockschmidt, M Khademi - arXiv preprint arXiv …, 2017 - arxiv.org
Learning tasks on source code (i.e., formal languages) have been considered recently, but
most work has tried to transfer natural language methods and does not capitalize on the …

DroidCat: Effective Android malware detection and categorization via app-level profiling

H Cai, N Meng, B Ryder, D Yao - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
Most existing Android malware detection and categorization techniques are static
approaches, which suffer from evasion attacks, such as obfuscation. By analyzing program …

Language-agnostic representation learning of source code from structure and context

D Zügner, T Kirschstein, M Catasta, J Leskovec… - arXiv preprint arXiv …, 2021 - arxiv.org
Source code (Context) and its parsed abstract syntax tree (AST; Structure) are two
complementary representations of the same computer program. Traditionally, designers of …

A general path-based representation for predicting program properties

U Alon, M Zilberstein, O Levy, E Yahav - ACM SIGPLAN Notices, 2018 - dl.acm.org
Predicting program properties such as names or expression types has a wide range of
applications. It can ease the task of programming, and increase programmer productivity. A …

DOBF: A deobfuscation pre-training objective for programming languages

MA Lachaux, B Roziere… - Advances in Neural …, 2021 - proceedings.neurips.cc
Recent advances in self-supervised learning have dramatically improved the state of the art
on a wide variety of tasks. However, research in language model pre-training has mostly …

Machine learning in compiler optimization

Z Wang, M O'Boyle - Proceedings of the IEEE, 2018 - ieeexplore.ieee.org
In the last decade, machine-learning-based compilation has moved from an obscure
research niche to a mainstream activity. In this paper, we describe the relationship between …

Learning natural coding conventions

M Allamanis, ET Barr, C Bird, C Sutton - Proceedings of the 22nd ACM …, 2014 - dl.acm.org
Every programmer has a characteristic style, ranging from preferences about identifier
naming to preferences about object relationships and design patterns. Coding conventions …