
What Are Large Language Models? A Q&A with Walid Saba

April 16, 2024

What exactly is a large language model, and is it accurate to say it “models” language? As part of the Expeditions in Experiential AI Series, Walid Saba explained how language models only model “statistical regularities” in language, and that a better approach could be found in what he calls “bottom-up reverse-engineering of language at scale.”

After the talk, which you can find here, Walid stuck around to answer questions from the audience and was kind enough to write down his responses to those we didn’t have time for.

How do the words originally get encoded? (e.g. Man and Grandfather) - was that done manually at some point by someone in the past?

No, words are not encoded manually. There are many techniques to represent a word as a vector of real numbers (essentially a point in a multi-dimensional space). The basic idea is to process a large corpus and, for each word w, consider a window of size n (the n words before and the n words after w). These contexts are encoded and used to train a neural network, so that the various contexts in which w occurs “teach” the network where w usually appears - that’s what makes the vectors of ‘apple’ and ‘orange’ similar, because ‘apple’ and ‘orange’ occur in similar contexts. Once the output layer of the network starts predicting words with some accuracy, the weights of the layer just before the output are read off, and they make up the vector corresponding to that word. See this, this, and this.
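
Below is a toy sketch of that idea using the gensim library’s Word2Vec implementation; the corpus and parameters are made up for illustration, and this is not the exact procedure used by any particular model.

from gensim.models import Word2Vec

# A tiny made-up corpus: "apple" and "orange" appear in near-identical contexts.
corpus = [
    ["she", "ate", "an", "apple", "for", "lunch"],
    ["she", "ate", "an", "orange", "for", "lunch"],
    ["he", "bought", "an", "apple", "at", "the", "market"],
    ["he", "bought", "an", "orange", "at", "the", "market"],
]

# window=2 means the two words before and after the target word form its context.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, epochs=200)

print(model.wv["apple"][:5])                   # first 5 dimensions of the learned vector
print(model.wv.similarity("apple", "orange"))  # tends to be high: the contexts overlap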

I am creating a RAG for an unknown therapeutic where there is little proven science (meaning little beyond phase I clinical data). I believe the data I am supplementing has 20-30% of its content that is not very relevant to the solution. Do you believe this will still be a good approach to find a novel solution?  If not, is there a better approach?

The short answer is no: if only 20-30% of the content is relevant to the solution, that is not enough. However, the main question here is what your end goal is - is it answering questions? If so, then you do not have enough relevant content. To answer “is there a better approach?”, I would need to know the end goal: prediction, answering questions, etc. RAG works best when you want to answer questions and you have enough relevant content - content that contains answers to most of the questions being asked.
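
As a minimal illustration of keeping irrelevant supplementary content out of the prompt, the sketch below scores candidate passages against the question and keeps only those above a relevance threshold before anything is sent to the LLM; the documents and threshold are hypothetical, and scikit-learn’s TF-IDF stands in for whatever retriever is actually used.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Phase I trial results for the candidate therapeutic.",
    "Dosing and safety observations from the first cohort.",
    "Unrelated material about the company retreat.",
]
question = "What safety data exists for the therapeutic?"

vectorizer = TfidfVectorizer()
doc_vecs = vectorizer.fit_transform(documents)
query_vec = vectorizer.transform([question])

# Keep only passages whose similarity to the question clears a (purely illustrative) threshold.
scores = cosine_similarity(query_vec, doc_vecs)[0]
relevant = [d for d, s in zip(documents, scores) if s > 0.1]
print(relevant)  # only these passages would be placed in the LLM's context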

Can you provide more recommendations for using transformers in Python?

Check out this and this. Example below.

First:

pip install transformers

Then, in Python:

from transformers import pipeline

sentiment = pipeline("sentiment-analysis")
res = sentiment(["My new phone has an amazing camera."])
print(res)

Moving forward, how do we include “common knowledge base” and/or “common sense” into LLMs? Thanks for the great talk, finally a critical view on a subject with an annoyingly high amount of hype around it. Follow up: Would you see the return of inferential logic (i.e. second order and higher order logic) as a possible way to add symbolic representation of information encoded (somehow) inside an LLM?

I see many logics eventually being used to build true AGI - we cannot do planning or commonsense reasoning without various modal and higher-order logics. I also do not think we can do this in the current paradigm of LLMs, namely neural networks; we need to employ symbolic representations. But in my opinion this will not happen in what are currently called “hybrid” or “neuro-symbolic” approaches. Instead, we need to build a “modular society of mind” with highly specialized modules, where the system as a whole is hybrid but no specific module is. See this (I think there’s a PDF available online somewhere!).

Thanks Walid, very interesting presentation! As you suggest “factual information” is decreasing, does this mean that in the future researchers may only want to use data from before recent years, such as 2023, to train LLMs (LLMs generated a lot of content last year and this year)? Will this cause some crucial issues down the AI road?

First of all, we do have factual information in corporate databases, in knowledge graphs, in documents/manuals, etc. LLMs will not be the source of our knowledge, nor will we rely on the answers they provide in mission-critical systems. We will eventually build true language understanding that will interface with factual sources. The current dominant tech (LLMs/ChatGPT/etc.) will be replaced by more sound technology in the future.

Thank you, very timely and insightful. Wall Street Journal published an article this morning about the potential hype surrounding AI, and referenced Elon Musk's prediction that AI will surpass all of human intelligence in 5 years and Vinod Khosla's prediction that AI will take over 80% of 80% of the jobs that exist today within 10 years. While these predictions may seem optimistic, the scale of capital, compute, and human capital dedicated to AGI is unprecedented relative to any other period in human history. How do you balance where AI is today with the potential of where AI can go given the exponential increase in resources?

Yes, the hype is unprecedented, and, unfortunately, it is being fueled by (i) greed and (ii) newcomers with little knowledge who do not fully understand the tech. New studies have shown that while $50 billion has been invested in the past few years (due to the hype), all these start-ups have generated only $3 billion in return (50 in, only 3 out!). A bubble will soon burst, and I fear that if we do not stop the misguided hype, we will hit another AI winter. That’s what I (and a few others) have been sounding the alarm about for several years: tone down the hype and stop making unscientific and unrealistic claims. Hopefully some sanity will start to take over the field before the industry experiences a huge disappointment.

Thank you for this very interesting presentation. How about using an ontology like WordNet - would this be an approach to follow?

WordNet can be, and has been, used in many NLP/NLU projects (I myself used it quite a bit). Today, however, while we can still use WordNet, we can do a lot more than that and use not just basic semantic relations between concepts but more sophisticated ontologies and knowledge graphs. Actually, one ambitious goal of mine (and others) is to build THE ontology that seems to be universal and implicit in all natural languages (the universal ontology of the Lingua Universalis!).
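
For readers who want to try WordNet’s basic semantic relations, here is a small sketch using NLTK’s standard WordNet interface (the word “dog” is just an arbitrary example):

import nltk
nltk.download("wordnet", quiet=True)   # fetch the WordNet data on first use
from nltk.corpus import wordnet as wn

dog = wn.synsets("dog")[0]     # first sense of "dog"
print(dog.definition())        # the gloss for that sense
print(dog.hypernyms())         # more general concepts (e.g., canine, domestic animal)
print(dog.hyponyms()[:3])      # a few more specific concepts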

Do you think LLM based generative models can be used for automated software development? If so, what would be the risky points?

While there are many claims that this can be done, I must say that what can be done is just an “aid” to software engineering - which was the old dream of CASE: Computer-Aided Software Engineering. But that is all that can be done, regardless of any other (misguided and misinformed) claim. We can NEVER fully automate software development in the sense of having an AI come up - from scratch - with a novel solution to a novel problem. All we can do is retrieve relevant components that we can stitch together to solve some problems. Genuinely novel solutions require “human intuition” - that oracle, that eureka, that no AI we have can come up with. Unless, that is, we one day build an AI that has “intuition” and true “creativity”. So far, this is science fiction!

Hello, in how many years should we expect new LLMs based on symbolic reasoning?

I hope it will not take long, as I am working exactly on that problem🙂

In your thinking, can linguistic symbolic grounding methods be used for general (non-linguistic) symbolic grounding? If so, would reasoning functions be the same?

Very tough question, and the answer will take much longer than we have here. The short answer is that a lot of the symbolic knowledge needed for language is also needed in non-linguistic reasoning. For example, the “Containment” cognitive template is needed to understand that when some object x (say a glove) is contained in some object y (say a bag), the location of x is always the location of y. This basic symbolic knowledge is needed in language understanding, in planning, in problem solving, and in general commonsense reasoning. Actually, without an encoding of such knowledge we will never have true AI. GREAT QUESTION 🙂
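
To make the containment template concrete, here is a toy encoding in plain Python; the objects and the single rule are hypothetical and stand in for the richer symbolic machinery a real system would need.

# If x is contained in y, then the location of x is the location of y.
contained_in = {"glove": "bag", "bag": "car"}   # x -> the object that contains x
locations = {"car": "garage"}                   # directly known locations

def location(obj):
    """Follow containment links until an object with a known location is reached."""
    while obj not in locations:
        obj = contained_in[obj]
    return locations[obj]

print(location("glove"))   # garage - the glove inherits the location of its containers

The same rule answers the linguistic question “the glove is in the bag and the bag is in the car - where is the glove?” and serves a planner deciding where to go to fetch the glove.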

Has there been any integration of causal inference in LLMs and then testing causal assumptions?

There are attempts to “induce” causality through some smart prompting, but I personally think these are very limited, if not hopeless, efforts. Modeling causality requires complex modeling of world knowledge, and all of that requires going beyond LLMs and neural networks.

Where can we learn more about symbolic reasoning and category theory?

Symbolic reasoning is a huge subject that involves linguistics, cognitive science, logic, philosophy, etc. I would start with a couple of recent books that are a good introduction to why we need symbolic reasoning, and that have, in turn, good references: Rebooting AI: Building Artificial Intelligence We Can Trust (here) and Machines Like Us: Toward AI with Common Sense (here).

Is there any information about how chat platforms like ChatGPT, Gemini, Claude, etc. determine how to respond to a specific prompt? Is it always the most likely next token, or do they also take into account other factors, such as credibility, reliability, relevance, etc.?

“Next token prediction” is what the LLM does, but chat platforms like ChatGPT are more than an LLM: a chat and dialogue engine is built on top of the LLM, which recursively predicts the next word and thus generates text. The chat system on top of the LLM does many things, such as remembering the context of the discussion and doing some post-processing on the LLM’s predictions. So we should not confuse the chatbot with the LLM the bot uses. Now, as to credibility, reliability, relevance, etc.: these bots are good with relevance - no doubt! They can always stay on topic (using textual similarity!). But credibility (reliability) is another issue, and here they cannot be relied on - LLMs do not have any access to ground truth, and they are not even concerned with truth and common knowledge.
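
The core “recursively predict the next word” loop can be sketched in a few lines; this assumes the Hugging Face transformers library and the small public "gpt2" checkpoint, and uses greedy decoding purely for illustration - real chat systems add sampling, safety filters, conversation state, and post-processing on top.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
for _ in range(10):                       # generate 10 tokens, one at a time
    logits = model(ids).logits            # a score for every vocabulary token
    next_id = logits[0, -1].argmax()      # greedy choice: the most likely next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))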

I am from the German Aerospace Center (DLR). I would like to ask: if we want to capture the structural patterns of code, which approach would you suggest? I am currently using Abstract Syntax Trees, but during vectorization I think the structure is not retained (AST -> word2vec -> training). Would you kindly share your thoughts on the topic, please?

Code is not just structure - that is just the syntactic part. Code also has semantics (both denotational and procedural). I am afraid that vectorization (embeddings) cannot do the job. Also, remember that all you can do with vectors is similarity, but the semantics of code/programs is a lot more complex than similarity. All similarity can do is retrieve similar code (or code concerned with a similar problem).
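
Here is a toy illustration of why structure alone is not enough, using only the standard-library ast module (it makes no claims about the asker’s AST-to-word2vec pipeline): the two functions below have syntax trees that differ in a single node, so any purely structural similarity measure will score them as nearly identical, yet their behaviour is opposite.

import ast

src_a = "def f(x, y):\n    return x + y\n"
src_b = "def f(x, y):\n    return x - y\n"

# The dumps differ only in one operator node (Add vs Sub).
print(ast.dump(ast.parse(src_a)))
print(ast.dump(ast.parse(src_b)))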

How do you see the way forward for applying ontologies to LLMs?

Ontologies are very important - and I mean here ontologies that are not just taxonomies or concept hierarchies, as the term “ontology” is often used in AI today. I am referring to ontology in the metaphysical sense - i.e., the universal ontology of the conceptual structure that underlies ALL natural languages. That’s actually what I work on. It has very little to do with current LLMs, but it will eventually be very relevant to SYMBOLIC LLMs, once we develop those.

Galactica, Codex, and Sparrow for Mathematical Reasoning. Where are we now and where will we be in 2-5 years?

Galactica was touted as being able to do a lot more than it can (or ever will be able to) do. The best we can say about Galactica, Codex, and Sparrow is that they are domain-specific LLMs - LLMs fine-tuned on scientific content. But the idea that LLMs can eventually be used in scientific discovery is science fiction (IMHO). And any talk of automatic programming (or of an AI software engineer) is silly, to say the least. Automatic programming cannot happen unless we build human-like machines that have genuine “creativity” and “intuition” - machines that can experience a eureka (“I got it”) moment!

When the model becomes explainable, are we reaching true NLU?

Explainability is required in any case, not just for NLU, and on its own it does not mean we have NLU. But if we do have true understanding in a symbolic and conceptual setting, then yes, we will have fully explainable AI.

How will Quantum Computing influence the different approaches?

In my opinion, advances in Quantum Computing will only affect performance, not algorithmic advances in AI. Quantum Computing is a new substrate for computing, but it does not change the class of computable functions, so its impact will be mostly on the speed of computation, which - if all predictions come true - will be orders of magnitude faster than current computers. But these advances are not relevant to discoveries in the conceptual and theoretical models related to cognition (Quantum Computing is about the brain, and our dark matter is still the mind).

You can watch Walid’s full talk here or read a recap here.