Friday, November 4th, 2022

Natural Language Processing

Madeline Drake

GFT (General Fine-Tuning)

Deep nets have recently made significant progress, producing exciting results. Much of this work has been reported in leading media outlets and at academic conferences such as ACL and NeurIPS. We have developed a “little” language, GFT, that makes deep nets look like regression. GFT is accessible to a broad audience, and it is so easy to use that programmers and non-programmers alike can replicate many of these results.

Although there have been many promising recent results, it is important to set appropriate expectations. One goal of GFT is to demystify deep nets. No one would suggest that regression-like methods are “magical” or even artificially intelligent. While deep nets can do a lot, many classic problems, open for decades in AI and for centuries in linguistics, go beyond what regression-like methods can do.

Better Together: Text + Context

We are building embeddings of 200M documents from Semantic Scholar. Some embeddings capture text (titles and abstracts), and others capture context (properties of other documents, such as citations). Embeddings are N by K matrices, where N is the number of documents and K is the number of hidden dimensions. The similarity of two documents can be computed as the cosine of their two embedding vectors, each a row of length K. Text embeddings are typically based on BERT, and context embeddings are based on node2vec encodings of citation graphs.
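A minimal sketch of the cosine computation, written in Python with only the standard library. It assumes an embedding matrix is held as a list of N rows, each a list of K floats; the toy values are made up for illustration, and the helper names `cosine` and `most_similar` are hypothetical, not part of any Semantic Scholar or GFT API. At the scale of 200M documents one would likely use vectorized libraries and approximate nearest-neighbor search rather than this brute-force loop.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: N rows (documents) by K columns (hidden dimensions).
Matrix = List[List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two length-K vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length K")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


def most_similar(emb: Matrix, i: int, top: int = 3) -> List[Tuple[int, float]]:
    """Rank all other documents by cosine similarity to document i (brute force)."""
    scores = [(j, cosine(emb[i], row)) for j, row in enumerate(emb) if j != i]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top]


if __name__ == "__main__":
    # Toy N=4, K=3 "text" embeddings; real ones would come from a BERT-style encoder.
    text_emb = [
        [1.0, 0.0, 0.0],
        [0.9, 0.1, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ]
    print(most_similar(text_emb, 0, top=2))
```

Because text and context embeddings are both N by K matrices, the same `cosine` function applies to either kind. Normalizing every row to unit length once, up front, would turn each cosine into a plain dot product.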

At the Institute for Experiential AI, we are committed to further exploring and advancing Natural Language Processing research.