Semantic Parsing English to GraphQL

openai Jul 2, 2020

For the last few months I've been eagerly working on building a Natural Language Processing model. I present Semantic Parsing English-to-GraphQL (SPEGQL).

In this post, I'll cover some of the highlights, and include my presentation at OpenAI.

Background

GraphQL

GraphQL is a query language for your API.

GraphQL has recently gained a lot of popularity. Some of its main benefits: it represents the API's schema as a graph, nested relations of any depth can be queried easily, it can aggregate data over multiple data sources, and its responses are predictable.

Semantic Parsing

Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning. In this case I wanted to semantically parse from English to GraphQL.

Why this project?

I had a few reasons to work on this project:

I wanted to understand the limits of general language models for Semantic Parsing
This project could potentially ease the learning curve for new developers of GraphQL
It has potential use as tooling for non-technical data users, such as managers, to gain insights into their data

Objective

Given an English prompt:

“What is the name and date of the song released most recently?”

And a GraphQL schema:

type song {
 artist: artist
 artist_name: String
 country: String
 f_id: Int
 file: files
 genre: genre
 genre_is: String
 languages: String
 rating: Int
 releasedate: String
 resolution: Int
 song_name: String
}

...

Find a corresponding GraphQL Query:

query {
 song(limit: 1, order_by: {releasedate: desc}) {
   song_name
   releasedate
 }
}

This objective can be tested by passing the prompt and schema through a model (in this case T5), which outputs a query.
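As a rough illustration of what "passing the prompt and schema through a model" can look like for a text-to-text model, here is a minimal sketch that flattens the question and schema into a single input string. The task prefix, the `<schema>` separator and the function name `build_model_input` are assumptions made for this example; the exact input format used in the project is not described in this post.

```python
def build_model_input(question: str, schema: str,
                      prefix: str = "translate English to GraphQL") -> str:
    """Flatten a question and a GraphQL schema into one text-to-text input.

    The prefix and the "<schema>" separator are hypothetical choices,
    not the format used to train the model described in this post.
    """
    # Collapse all whitespace so the multi-line schema becomes one line.
    flat_schema = " ".join(schema.split())
    return f"{prefix}: {question.strip()} <schema> {flat_schema}"


question = "What is the name and date of the song released most recently?"
schema = """type song {
 song_name: String
 releasedate: String
}"""

model_input = build_model_input(question, schema)
print(model_input)
```

The resulting string would then be tokenized and fed to the encoder, with the decoder generating the GraphQL query token by token.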

Methods

The process required multiple steps:

Create an English to GraphQL dataset
Run experiments on Encoder-Decoder Transformer models (Bart and T5)
Collect data and results
Implement a graphical interface to interact with the model
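To give a feel for the first step, here is a sketch of what a single English-to-GraphQL training example could look like when stored as JSON lines. The record layout, the field names (`question`, `schema_id`, `query`) and the `"music"` schema identifier are hypothetical and chosen only for illustration; the actual dataset format lives in the repository linked under Other Notes.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Example:
    question: str   # English prompt
    schema_id: str  # hypothetical key naming the schema the question targets
    query: str      # target GraphQL query


def to_jsonl(examples):
    """Serialize examples as one JSON object per line."""
    return "\n".join(json.dumps(asdict(e)) for e in examples)


example = Example(
    question="What is the name and date of the song released most recently?",
    schema_id="music",
    query="query { song(limit: 1, order_by: {releasedate: desc}) "
          "{ song_name releasedate } }",
)

line = to_jsonl([example])
print(line)
```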

Results

46-50% exact set matching accuracy on the GraphQL validation dataset
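To make the metric more concrete: the idea behind exact set matching is that two queries count as equal if they select the same fields, regardless of the order in which the fields are written. Below is a minimal sketch of such a comparison. It is an illustration only, not the evaluation script behind the numbers above, and it makes simplifying assumptions: arguments are compared as whitespace-stripped text (so reordered arguments count as a mismatch), and nested parentheses, aliases, fragments and variables are not handled.

```python
import re

# A token is a brace, or a field name optionally followed by "(...)" arguments.
TOKEN = re.compile(r"[{}]|[^\s{}()]+(?:\s*\([^)]*\))?")


def parse_selection(tokens, pos):
    """Parse fields up to the matching '}' and return (frozenset, next_pos)."""
    fields = set()
    while pos < len(tokens) and tokens[pos] != "}":
        name = tokens[pos]
        pos += 1
        children = frozenset()
        if pos < len(tokens) and tokens[pos] == "{":
            children, pos = parse_selection(tokens, pos + 1)
        fields.add((name, children))
    return frozenset(fields), pos + 1  # skip the closing '}'


def normalize(query):
    """Turn a query into nested frozensets so that field order is ignored."""
    tokens = [re.sub(r"\s+", "", t) for t in TOKEN.findall(query)]
    if tokens and tokens[0] == "query":  # optional operation keyword
        tokens = tokens[1:]
    if not tokens or tokens[0] != "{":
        raise ValueError("expected a selection set")
    tree, _ = parse_selection(tokens, 1)
    return tree


def exact_set_match(predicted, gold):
    """True if both queries select the same set of fields at every level."""
    try:
        return normalize(predicted) == normalize(gold)
    except ValueError:
        return False


def accuracy(pairs):
    """Fraction of (predicted, gold) pairs that match."""
    return sum(exact_set_match(p, g) for p, g in pairs) / len(pairs)


gold = """query {
 song(limit: 1, order_by: {releasedate: desc}) {
   song_name
   releasedate
 }
}"""
reordered = "query { song(limit: 1, order_by: {releasedate: desc}) { releasedate song_name } }"
wrong = "query { song(limit: 1, order_by: {releasedate: desc}) { song_name rating } }"

print(exact_set_match(reordered, gold))  # field order differs, same set
print(exact_set_match(wrong, gold))      # different fields selected
```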

A couple of example videos will help show results as well

A query is generated for a schema and question.

The model is able to generalize to new, different schemas it was not trained on.

Other Notes

Here is the main Repo for creating and validating the Dataset:

https://github.com/acarrera94/sql-to-graphql

I also created an example notebook for anyone who wants to try out the model. This model is fine-tuned on GraphQL and SQL and can create queries for both languages:

https://colab.research.google.com/drive/1l1h8RlEl-IS0XfkDh66qikH4UsD19KF6?usp=sharing

I'm currently working on a paper that details the whole process, which is linked here.

And finally, I also gave a presentation about my project at OpenAI: