SPEGQL: OpenAI Scholars final project

The OpenAI Scholars program requires a final open-source project. I've been working on mine eagerly, and I'm happy to present it now: Semantic Parsing English-to-GraphQL (SPEGQL). It comes with this blog post, my final presentation, code, and an academic paper.

I introduced this project in my final project proposal, so in this post I'll cover only some of the highlights. The full details can be found in this paper.

Background

GraphQL

GraphQL is a query language for your API.

It has become very popular recently for several reasons: it represents the schema as a graph, nested relations of any depth can be queried easily, it can aggregate data from multiple data sources, and its responses are predictable, among other things.

Semantic Parsing

Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning. In this case I wanted to semantically parse English to GraphQL.

Why this project?

I had a few reasons to work on this project:

  • I wanted to understand the limits of general language models for semantic parsing
  • This project could potentially ease the learning curve for developers new to GraphQL
  • It could serve as tooling for non-technical data users, such as managers, to gain insights into their data

Objective

Given an English prompt:

“What is the name and date of the song released most recently?”

And a GraphQL schema:

type song {
 artist: artist
 artist_name: String
 country: String
 f_id: Int
 file: files
 genre: genre
 genre_is: String
 languages: String
 rating: Int
 releasedate: String
 resolution: Int
 song_name: String
}

...

Find a corresponding GraphQL query:

query {
 song(limit: 1, order_by: {releasedate: desc}) {
   song_name
   releasedate
 }
}
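
One way to picture this objective is as a text-to-text task: the question and the schema are flattened into a single input string, and the query is the target string. The sketch below is only an illustration under that assumption. The function name `build_model_input`, the task prefix, and the `schema:` separator are hypothetical and are not taken from the project's code.

```python
def build_model_input(question: str, schema: str) -> str:
    """Flatten a question and a GraphQL schema into one input string.

    The prefix and the "schema:" separator are hypothetical choices made
    for illustration; the project's real input format may differ.
    """
    # Collapse all whitespace so the schema fits on a single line.
    flat_schema = " ".join(schema.split())
    return f"translate English to GraphQL: {question} schema: {flat_schema}"


question = "What is the name and date of the song released most recently?"

# A shortened version of the schema above, to keep the example small.
schema = """
type song {
  song_name: String
  releasedate: String
}
"""

# The string the model would be trained to produce for this input.
target_query = (
    "query { song(limit: 1, order_by: {releasedate: desc}) "
    "{ song_name releasedate } }"
)

model_input = build_model_input(question, schema)
print(model_input)
```

A training example is then just the pair `(model_input, target_query)`.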

This objective can be tested by passing the prompt and schema through a model that outputs a query.

Methods

The process required multiple steps:

  1. Create an English to GraphQL dataset
  2. Run experiments on Encoder-Decoder Transformer models (BART and T5)
  3. Collect data and results
  4. Implement a graphical interface to interact with the model

Results

  • 46-50% exact set matching accuracy on the GraphQL validation dataset
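
To give a feel for what exact set matching measures, here is a simplified sketch. It assumes the metric treats the fields selected at each level of a query as an unordered set, so two queries match even if their fields are listed in a different order. This is not the evaluation script used in the project: the helper names are hypothetical, and the toy parser ignores aliases, fragments, directives, and string values containing spaces, and it still treats argument order as significant.

```python
import re

# Braces, parentheses and colons are single tokens. Commas are skipped,
# since GraphQL treats them as insignificant.
TOKEN = re.compile(r"[{}():]|[^\s,{}():]+")


def parse_selection(tokens, pos):
    """Parse '{ field ... }' starting at tokens[pos] == '{'.

    Returns (frozenset of (name, args, subselection), next position).
    """
    pos += 1  # skip the opening '{'
    fields = set()
    while tokens[pos] != "}":
        name = tokens[pos]
        pos += 1
        args = ""
        if tokens[pos] == "(":
            # Collect everything up to the matching ')' as one string.
            depth, start = 0, pos
            while True:
                if tokens[pos] == "(":
                    depth += 1
                elif tokens[pos] == ")":
                    depth -= 1
                pos += 1
                if depth == 0:
                    break
            args = " ".join(tokens[start:pos])
        sub = frozenset()
        if tokens[pos] == "{":
            sub, pos = parse_selection(tokens, pos)
        fields.add((name, args, sub))
    return frozenset(fields), pos + 1


def normalize(query: str):
    """Turn a query string into a nested, order-insensitive structure."""
    tokens = TOKEN.findall(query)
    # Drop the optional leading 'query' keyword (anonymous queries only).
    if tokens and tokens[0] == "query":
        tokens = tokens[1:]
    if not tokens or tokens[0] != "{":
        raise ValueError("expected a selection set")
    selection, _ = parse_selection(tokens, 0)
    return selection


def exact_set_match(predicted: str, gold: str) -> bool:
    """True if both queries select the same fields, in any order."""
    try:
        return normalize(predicted) == normalize(gold)
    except (ValueError, IndexError):
        # A prediction that cannot be parsed counts as a miss.
        return False


gold = "query { song(limit: 1, order_by: {releasedate: desc}) { song_name releasedate } }"
reordered = "query { song(limit:1, order_by:{releasedate:desc}) { releasedate song_name } }"
incomplete = "query { song { song_name } }"

predictions = [reordered, incomplete]
accuracy = sum(exact_set_match(p, gold) for p in predictions) / len(predictions)
print(accuracy)
```

Here the reordered prediction counts as a match and the incomplete one does not, so the accuracy over the two predictions is one half.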

A couple of example videos help show the results as well:

A query is generated for a schema and question.
The model is able to generalize to new, different schemas it was not trained on. 

That is a short overview of my project.

Here is the main repo for creating and validating the dataset:

https://github.com/acarrera94/sql-to-graphql

I also created an example notebook for anyone who wants to try out the model. This model is fine-tuned on GraphQL and SQL and can create queries for both languages:

https://colab.research.google.com/drive/1l1h8RlEl-IS0XfkDh66qikH4UsD19KF6?usp=sharing

I'm currently working on a paper that details the whole process; it's linked here.

And finally, I also gave a presentation about my project at OpenAI:

OpenAI Scholars Spring 2020: Final Projects