Text-to-GraphQL, OpenAI Project Proposal

openai Apr 20, 2020

The OpenAI Scholars program ends with an open-source final project as its deliverable. I'll be working on this project from now until the program ends at the start of July. This post gives a brief overview of the project and a link to my proposal.


The core goal of this project is simple: given a natural English question and a GraphQL schema, produce an equivalent GraphQL query. The question and the schema are passed through a model, and the model returns a query that correctly represents the question.
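Sequence-to-sequence models such as BART and T5 take a single text sequence as input, so the question and the schema have to be flattened into one string before they reach the model. Below is a minimal sketch of one way to do that. It assumes the schema is simplified to a dict mapping type names to field names, and the `question | type: fields` separator format is hypothetical, not necessarily the format this project will use. The type and field names are also illustrative.

```python
from typing import Dict, List


def serialize_input(question: str, schema: Dict[str, List[str]]) -> str:
    """Flatten a question and a simplified GraphQL schema into one model input string.

    The schema is assumed (hypothetically) to map each type name to its field
    names; the " | " and ": " separators are illustrative, not a fixed convention.
    """
    # Sort types so the same schema always produces the same input string.
    schema_parts = [
        f"{type_name}: {' '.join(fields)}"
        for type_name, fields in sorted(schema.items())
    ]
    return f"{question.strip()} | " + " | ".join(schema_parts)


if __name__ == "__main__":
    # Hypothetical schema with two types.
    schema = {"players": ["id", "name", "team_id"], "teams": ["id", "name"]}
    print(serialize_input("How many players do we have?", schema))
```

Sorting the types keeps the encoding deterministic, so the model never sees two different strings for the same schema.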

For example, the question:

How many players do we have?

should return the following query:

query {
  players_aggregate {
    aggregate {
      count
    }
  }
}

This query could then be sent to the GraphQL endpoint and would return a valid result.
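Sending a generated query to an endpoint is a plain HTTP POST with a JSON body whose `query` key holds the query string, following the common GraphQL-over-HTTP convention. The sketch below uses only the Python standard library. The endpoint URL is hypothetical (a local server with a Hasura-style `/v1/graphql` path), and the query is the Hasura-style aggregate count from the example. `build_graphql_request` only builds the request, and `run_query` would actually send it.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with your own GraphQL server's URL.
ENDPOINT = "http://localhost:8080/v1/graphql"

# Hasura-style aggregate query counting rows in a "players" table.
QUERY = """
query {
  players_aggregate {
    aggregate {
      count
    }
  }
}
"""


def build_graphql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a GraphQL query as JSON."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(endpoint: str, query: str) -> dict:
    """Send the query and return the decoded JSON response (a "data" key on success)."""
    with urllib.request.urlopen(build_graphql_request(endpoint, query)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_graphql_request(ENDPOINT, QUERY)
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to check a generated query's payload without a live server.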

There are a few deliverables leading up to and following this main objective:


Dataset creation (building a GraphQL dataset based on the Spider SQL dataset)

Training multiple models (BART, Transformer, T5, Reformer)

Interaction tool (input questions into a GUI)

Submitting a paper (including information on my dataset and results from the models)
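Dataset creation means pairing each Spider question with a GraphQL query in place of its SQL query. A real conversion has to handle joins, filters, and nesting. The toy sketch below handles only two patterns, `SELECT count(*) FROM t` and `SELECT a, b FROM t`, and returns `None` for everything else. It is only meant to show the shape of the mapping. The Hasura-style `t_aggregate` naming and the function name `sql_to_graphql` are assumptions for illustration, not the project's converter.

```python
import re
from typing import Optional

# Only two illustrative SQL shapes are supported; anything else returns None.
COUNT_RE = re.compile(
    r"^\s*SELECT\s+count\(\*\)\s+FROM\s+(\w+)\s*;?\s*$", re.IGNORECASE
)
SELECT_RE = re.compile(
    r"^\s*SELECT\s+([\w\s,]+?)\s+FROM\s+(\w+)\s*;?\s*$", re.IGNORECASE
)


def sql_to_graphql(sql: str) -> Optional[str]:
    """Translate a tiny subset of SQL into a Hasura-style GraphQL query string."""
    match = COUNT_RE.match(sql)
    if match:
        # COUNT(*) maps to the aggregate field of the "<table>_aggregate" root.
        table = match.group(1)
        return f"query {{ {table}_aggregate {{ aggregate {{ count }} }} }}"
    match = SELECT_RE.match(sql)
    if match:
        # A plain column list maps to a selection set on the table's root field.
        columns = [c.strip() for c in match.group(1).split(",")]
        table = match.group(2)
        return f"query {{ {table} {{ {' '.join(columns)} }} }}"
    return None  # Unsupported SQL (e.g. WHERE, JOIN, *): would need a real parser.


if __name__ == "__main__":
    print(sql_to_graphql("SELECT count(*) FROM players"))
    print(sql_to_graphql("SELECT name, age FROM players"))
```

A rule-based converter like this can only cover simple queries. Spider's joins and nested conditions would call for parsing the SQL into a tree first.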

The full proposal can be found here:

OpenAI final project proposal: Text-to-GraphQL, Andre Carrera, April 6, 2020

I'd love to hear feedback! Please reach out to me on Twitter @AndreCarreraPaz.


So why this project?

The main reasons, which I'll explain below, are:

  1. This project sits at the intersection of my interests: NLP, GraphQL, deep learning, and engineering.
  2. I think this will be a useful contribution to the community.

NLP is a field that has advanced significantly in the last few years. With GPT-2, for example, a network can generate text from any prompt. It's fun to play with, and there are even games built on it. BERT is another NLP model that helps with tasks such as text classification, categorization, and even question answering.

GraphQL is an amazing developer tool. It's a query language for your API, which means you can put a typed schema on top of any API, whatever business logic and databases sit underneath. It's nice that data is returned in a hierarchy, and the tooling that has come with GraphQL is incredible. For example, GraphiQL provides a GUI for exploring a schema, and the Apollo libraries can generate typed code for your frontends.

I also like the idea of combining expert systems with deep learning models to get the best of both worlds. Broadly, this means we can use an unstructured question to get results from structured data. That, combined with how easy GraphQL is to learn, makes for an interesting project. Hopefully, with the release of the dataset and code accompanying this project, future work will improve upon my results.

Stay tuned; the following blog posts will cover my progress on the final project.