Semantic Parsing English to GraphQL

openai Jul 2, 2020

For the last few months I've been eagerly working on building a Natural Language Processing model. I present Semantic Parsing English-to-GraphQL (SPEGQL).

In this post, I'll cover some of the highlights, and include my presentation at OpenAI.



GraphQL is a query language for your API.

GraphQL has recently gained a lot of popularity. Some of the main benefits I'll note: it represents the API's schema as a graph, nested relations of any depth can be queried easily, it can aggregate data over multiple data sources, and responses are predictable, among other things.

Semantic Parsing

Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning. In this case, I wanted to semantically parse from English to GraphQL.

Why this project?

I had a few reasons to work on this project:

  • I wanted to understand the limits of general language models for Semantic Parsing
  • This project could potentially ease the learning curve for new developers of GraphQL
  • It has potential use as tooling for non-technical data users, such as managers, to gain insights into their data


Given an English prompt:

“What is the name and date of the song released most recently?”

And a GraphQL schema:

type song {
 artist: artist
 artist_name: String
 country: String
 f_id: Int
 file: files
 genre: genre
 genre_is: String
 languages: String
 rating: Int
 releasedate: String
 resolution: Int
 song_name: String


Find a corresponding GraphQL Query:

query {
 song(limit: 1, order_by: {releasedate: desc}) {

This objective can be tested by passing the prompt and schema through a model (in this case T5) to output a query.
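To make the setup concrete, here is a minimal sketch of how a question and a schema might be flattened into a single input string for a text-to-text model such as T5. The function name `serialize_example`, the task prefix, and the `schema:` separator are assumptions made for illustration, not the format used in this project, and the schema is a trimmed, closed version of the one shown earlier.

```python
def serialize_example(question: str, schema: str) -> str:
    """Flatten a question and a GraphQL schema into one model input string.

    The prefix and separator below are hypothetical; the real project may
    format its inputs differently.
    """
    # Collapse all runs of whitespace so the multi-line schema fits on one line.
    flat_schema = " ".join(schema.split())
    return f"translate English to GraphQL: {question.strip()} schema: {flat_schema}"


# Trimmed schema for illustration only (a subset of the fields of `song`).
schema = """
type song {
 artist_name: String
 releasedate: String
 song_name: String
}
"""
question = "What is the name and date of the song released most recently?"

model_input = serialize_example(question, schema)
print(model_input)
```

The resulting string would then be tokenized and passed to the encoder, with the decoder generating the query text token by token.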


The process required multiple steps:

  1. Create an English to GraphQL dataset
  2. Run experiments on Encoder-Decoder Transformer models (Bart and T5)
  3. Collect data and results
  4. Implement a graphical interface to interact with the model


  • 46-50% exact set matching accuracy on the GraphQL validation dataset
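The post does not spell out how exact set matching is computed, so the following is only a simplified sketch of the idea: two queries count as a match when they select the same set of fields at every nesting level, regardless of field order or whitespace. The helper names (`parse_selection`, `canonical`, `exact_set_match`) and the sample queries are hypothetical, arguments are compared as normalized text (so argument order still matters), and features such as aliases, fragments, and variables are not handled.

```python
import re

# Braces and parentheses are single tokens; anything else is a run of
# non-space, non-bracket characters.
TOKEN = re.compile(r"[{}()]|[^\s{}()]+")


def parse_selection(tokens, pos):
    """Parse fields up to the closing '}' and return (frozenset, next_pos)."""
    fields = set()
    while pos < len(tokens) and tokens[pos] != "}":
        name = tokens[pos]
        pos += 1
        args = ""
        if pos < len(tokens) and tokens[pos] == "(":
            # Consume everything up to the matching ')' as the argument text.
            depth, start = 0, pos
            while True:
                if tokens[pos] == "(":
                    depth += 1
                elif tokens[pos] == ")":
                    depth -= 1
                pos += 1
                if depth == 0:
                    break
            args = " ".join(tokens[start:pos])
        children = frozenset()
        if pos < len(tokens) and tokens[pos] == "{":
            children, pos = parse_selection(tokens, pos + 1)
        fields.add((name, args, children))
    return frozenset(fields), pos + 1  # step past the closing '}'


def canonical(query):
    """Turn a query string into an order-insensitive nested structure."""
    tokens = TOKEN.findall(query)
    if tokens and tokens[0] == "query":
        tokens = tokens[1:]
    if not tokens or tokens[0] != "{":
        raise ValueError("expected a selection set")
    tree, _ = parse_selection(tokens, 1)
    return tree


def exact_set_match(predicted, gold):
    """Return True when both queries select the same fields at every level."""
    try:
        return canonical(predicted) == canonical(gold)
    except (ValueError, IndexError):
        return False  # malformed queries never count as a match


# Illustrative queries; the selected fields are assumptions for this example.
gold = "query { song(limit: 1, order_by: {releasedate: desc}) { song_name releasedate } }"
pred = "query {\n song(limit: 1, order_by: {releasedate: desc}) {\n  releasedate\n  song_name\n }\n}"
wrong = "query { song(limit: 1) { song_name } }"

print(exact_set_match(pred, gold))
print(exact_set_match(wrong, gold))
```

Accuracy over a validation set would then be the fraction of (predicted, gold) pairs for which a check like this succeeds.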

A couple of example videos help show the results as well.

A query is generated for a schema and question.
The model is able to generalize to new, different schemas it was not trained on. 

Other Notes

Here is the main repo for creating and validating the dataset:

I also created an example notebook for anyone who wants to try out the model. This model is fine-tuned on GraphQL and SQL and can create queries for both languages:

I'm currently working on a paper, linked here, that details the whole process.

And finally, I also gave a presentation about my project at OpenAI:

OpenAI Scholars Spring 2020: Final Projects
Our third class of OpenAI Scholars [/blog/openai-scholars-spring-2020/] presented their final projects at virtual Demo Day, showcasing their research results from over the past five months. These projects investigated problems such as analyzing how GPT-2 represents grammar, measuring the interpreta…