Lando Löper edited this page Aug 3, 2020 · 5 revisions

Welcome to the code-embeddings wiki!

Motivation

Documentation plays an important role in the process of software development. It helps other developers understand the software's source code and enables them to build on each other's ideas.

The aim of this project is to improve the experience of developers working with poorly documented code when reaching out to the original authors is not an option.

Methodology

When working with third-party or legacy code, it is often more important to understand what a piece of code does than how it does it. Based on this assumption, and on the premise that a function's name offers a comprehensive description of what the function does, the high-level approach to the research question is modeled as follows:

We extract function names and their bodies from a large number of GitHub repositories and train a machine learning model to predict each function's name from the code in its body.
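The extraction step can be sketched as below. This is only an illustration under stated assumptions: this page does not say which programming language the repositories are written in, so the sketch assumes Python source and uses the standard-library `ast` module (Python 3.9+ for `ast.unparse`). Cloning repositories from GitHub is not shown, and `extract_functions` is a hypothetical helper name, not part of the project.

```python
import ast


def extract_functions(source: str) -> list[tuple[str, str]]:
    """Return (function_name, body_source) pairs for every function in `source`.

    Assumption: the input is Python code; the project's target language is
    not stated on this page.
    """
    tree = ast.parse(source)
    pairs = []
    # ast.walk visits nodes breadth-first, so methods nested in classes are found too.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # The name becomes the prediction target; the body becomes the model input.
            body = "\n".join(ast.unparse(stmt) for stmt in node.body)
            pairs.append((node.name, body))
    return pairs


if __name__ == "__main__":
    example = (
        "def add(a, b):\n"
        "    return a + b\n"
        "\n"
        "class Stack:\n"
        "    def is_empty(self):\n"
        "        return len(self.items) == 0\n"
    )
    for name, body in extract_functions(example):
        print(name, "->", body)
```

Each pair is one training example: the body is the input and the name is the label the model should learn to predict.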

1. Representation

Before the source code of the function bodies can be fed into the model, it has to be brought into a suitable representation. Following the proposal of Uri Alon et al., the source code is represented as a set of compositional paths over its abstract syntax tree (AST). The idea behind the choice of this representation is that
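A minimal sketch of a path-based AST representation, in the spirit of the path-contexts described by Alon et al., may help make the idea concrete. Each context is a triple (start token, path, end token), where the path climbs from one leaf up to the lowest common ancestor and descends to another leaf. Several choices here are assumptions, not the project's confirmed setup: Python's `ast` module as the parser, `Name`, `arg` and `Constant` nodes as the leaves, the `↑`/`↓` path notation, and the hypothetical `max_length` limit.

```python
import ast
from itertools import combinations


def _collect_leaves(root: ast.AST) -> list[tuple[str, list[ast.AST]]]:
    """Return (token, root-to-leaf node path) for identifier and literal leaves.

    Assumption: identifiers (Name), parameters (arg) and literals (Constant)
    are treated as the terminals of the AST.
    """
    leaves = []

    def visit(node: ast.AST, ancestors: list[ast.AST]) -> None:
        path = ancestors + [node]
        if isinstance(node, ast.Name):
            leaves.append((node.id, path))
        elif isinstance(node, ast.arg):
            leaves.append((node.arg, path))
        elif isinstance(node, ast.Constant):
            leaves.append((repr(node.value), path))
        for child in ast.iter_child_nodes(node):
            visit(child, path)

    visit(root, [])
    return leaves


def path_contexts(func_source: str, max_length: int = 8) -> list[tuple[str, str, str]]:
    """Return (start token, AST path, end token) triples for one function.

    Assumes `func_source` contains a single function definition.
    `max_length` (hypothetical default) caps the number of nodes on a path.
    """
    func = ast.parse(func_source).body[0]
    contexts = []
    for (tok_a, path_a), (tok_b, path_b) in combinations(_collect_leaves(func), 2):
        # Length of the shared root prefix; its last node is the lowest common ancestor.
        k = 0
        while k < min(len(path_a), len(path_b)) and path_a[k] is path_b[k]:
            k += 1
        up = [type(n).__name__ for n in reversed(path_a[k:])]
        top = type(path_a[k - 1]).__name__
        down = [type(n).__name__ for n in path_b[k:]]
        if len(up) + 1 + len(down) > max_length:
            continue  # skip overly long paths
        path = "".join(f"{t}↑" for t in up) + top + "".join(f"↓{t}" for t in down)
        contexts.append((tok_a, path, tok_b))
    return contexts


if __name__ == "__main__":
    for context in path_contexts("def add(a, b):\n    return a + b"):
        print(context)
```

For `def add(a, b): return a + b`, the four leaves yield six contexts, one of which is `('a', 'Name↑BinOp↓Name', 'b')`. Because the function's own name is stored as a plain string on the `FunctionDef` node rather than as a leaf, it never appears in the contexts, which keeps the label out of the model input.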
