A site devoted mostly to everything related to Information Technology under the sun - among other things.

Sunday, March 2, 2025

Automated Exploration of Knowledge with Large Language Models, Concept Extraction, and Cyc - A Proposal

In a previous post, I outlined how LLMs, Concept Extraction tools, and Cyc could be combined to create a more structured, semantically rich representation of LLM-generated text.

In this post, I would like to build on that proposal and outline a way to automatically generate follow-up queries to an LLM, based on the text it produces in response to a user's question.

The process is as follows:

Overview of Steps:

  1. User poses a question to the LLM
  2. LLM generates text
  3. Concept Extraction is applied to the LLM text
  4. Cyc uses those extracted concepts to find related concepts (see Step 4 below)
  5. Questions are created from those related concepts
  6. Those questions are posed to the LLM, and the answers are summarized

I have already discussed the substance of steps 1 to 3 in my previous post. Here I am going to outline steps 4 and 5.
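The six steps can be sketched as one Python function in which every stage is a pluggable callable. All names here (`run_pipeline` and the `fake_*` stand-ins) are hypothetical placeholders for whatever LLM API, concept extractor, Cyc client, and summarizer one actually uses; the toy stand-ins exist only so the sketch runs without external services.

```python
from __future__ import annotations

from typing import Callable, Iterable


def run_pipeline(
    question: str,
    ask_llm: Callable[[str], str],
    extract_concepts: Callable[[str], set[str]],
    expand_concepts: Callable[[set[str]], set[str]],
    make_questions: Callable[[Iterable[str]], list[str]],
    summarize: Callable[[list[str]], str],
) -> str:
    """Steps 1-6: ask, extract, expand via Cyc, generate questions, ask again, summarize."""
    initial_text = ask_llm(question)              # steps 1-2
    seeds = extract_concepts(initial_text)        # step 3
    related = expand_concepts(seeds) - seeds      # step 4: keep only newly found concepts
    follow_ups = make_questions(sorted(related))  # step 5
    answers = [ask_llm(q) for q in follow_ups]    # step 6: pose the new questions
    return summarize(answers)                     # step 6: summarize the answers


# Toy stand-ins (hypothetical) so the sketch runs end to end.
def fake_llm(prompt: str) -> str:
    return f"Answer to: {prompt}"


def fake_extract(text: str) -> set[str]:
    return {c for c in ("Cow", "Milk", "Farm") if c in text}


def fake_expand(seeds: set[str]) -> set[str]:
    return (seeds | {"Dairy"}) if "Milk" in seeds else set(seeds)


def fake_questions(concepts: Iterable[str]) -> list[str]:
    return [f"What is {c}?" for c in concepts]


def fake_summarize(answers: list[str]) -> str:
    return " | ".join(answers)


result = run_pipeline("Where does Milk come from?", fake_llm, fake_extract,
                      fake_expand, fake_questions, fake_summarize)
print(result)
```

Passing each stage in as a function keeps the LLM, the extractor, and the summarizer swappable, which matches the flexible architecture the post calls for later.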

Step 4:

Applying Concept Extraction tools to the LLM's text output yields a set of concepts. To generate more concepts from this initial set by navigating to their "nearest" or related concepts, you can follow the steps below using Cyc's knowledge base. The goal is to explore the relationships between concepts and expand the collection with the concepts most closely connected to each starting concept.

Overview of Steps:

  1. Start with a collection of initial concepts.
  2. For each concept, find its nearest related concepts based on the relationships in the Cyc knowledge base (e.g., “isa,” “part-of,” “related-to”).
  3. Navigate to these related concepts, recursively, if necessary, to explore further connections.
  4. Generate new concepts by expanding from the nearest neighbors.
  5. Visualize or compile the resulting concepts.

Key Queries in CycL for Navigating Concepts:

  • Isa (Is-A) Relationships: This tells you what class a concept belongs to. It’s often used to explore a concept’s category.
  • Part-Of: This is useful to find the parts or components that make up a concept.
  • Related-To: This can be used to find other concepts that are semantically related to a given concept.
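  • Sub-collection (genls): This relates a collection to its broader collections, e.g., (genls Cow ?X) finds the collections that include Cow as a sub-collection. To find all instances of a specific concept, use isa with the variable first, e.g., (isa ?X Cow).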
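As a small illustration, the query patterns listed above can be generated as CycL text for any concept. The names `RELATIONS`, `build_query`, and `queries_for` are hypothetical; `relatedTo` and `partOf` follow this post's illustrative vocabulary and may not match the predicate names in a given Cyc release, and how the queries are actually submitted depends on the Cyc installation, so this only builds the query strings.

```python
# Map the relationship kinds listed above to CycL predicate names.
# "relatedTo" and "partOf" are this post's illustrative names (assumption);
# "isa" and "genls" are Cyc's instance-of and sub-collection predicates.
RELATIONS = {
    "is-a": "isa",
    "related-to": "relatedTo",
    "part-of": "partOf",
    "sub-collection": "genls",
}


def build_query(relation: str, concept: str, var: str = "?X") -> str:
    """Return a CycL query asking for everything linked to `concept` by `relation`."""
    predicate = RELATIONS[relation]
    return f"({predicate} {concept} {var})"


def queries_for(concept: str) -> list[str]:
    """One query per relationship kind, in the order of RELATIONS."""
    return [build_query(r, concept) for r in RELATIONS]


for q in queries_for("Cow"):
    print(q)
```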

Example: Navigation from Concepts

Let's say you have a collection of starting concepts, such as Cow, Milk, and Farm.

You can navigate the relationships by querying Cyc for the nearest concepts based on different relationships. We’ll perform a step-by-step expansion for each of these concepts, showing how to retrieve the "nearest" related concepts.

Step A: Start with a Collection of Concepts

Let’s say you have the following concepts to begin with:

  • Cow
  • Milk
  • Farm

These are the seed concepts from which you want to expand.

Step B: Define Queries to Explore Nearest Concepts

Now, you can create queries for each starting concept to find the nearest related concepts. Here’s how you can perform some basic queries in CycL for this task.

Query 1: Find the "Is-A" (Classification) Relationships

This query identifies what category the concept belongs to. For example, for Cow, you can check for its broader classification.

(isa Cow ?X)

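This asks, “What categories does Cow belong to?” and would return categories such as Mammal and DomesticAnimal. (Strictly speaking, because Cow is itself a collection in Cyc, broader categories such as Mammal are reached through genls, as in (genls Cow ?X); isa relates an individual, such as one particular cow, to the collections it is an instance of.)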

Query 2: Find "Related-To" Concepts

Next, you can explore relationships with other concepts. For example, you could ask:

(relatedTo Cow ?X)

This will give you concepts that are semantically related to Cow, such as Milk, Udder, Farm, etc.

Query 3: Find "Part-Of" Relationships

The Part-Of relationship is useful for exploring components or parts of a concept. For example:

(partOf Cow ?X)

This will return things that are part of a Cow, like Udder or Hoof.

Step C: Process the Results and Expand

Once you have the results from these queries, you can expand the set of concepts by finding the nearest related concepts from each of the starting concepts.

Example Expansion (for the concept "Cow"):

  1. Starting Concept: Cow
    • Query: (isa Cow ?X)
      • Result: Mammal, DomesticAnimal
    • Query: (relatedTo Cow ?X)
      • Result: Milk, Farm, Udder
    • Query: (partOf Cow ?X)
      • Result: Udder, Hoof

From Cow, the nearest related concepts are:

    • Mammal
    • DomesticAnimal
    • Milk
    • Farm
    • Udder
    • Hoof
  2. Starting Concept: Milk
    • Query: (isa Milk ?X)
      • Result: SubstanceProducedByMammals
    • Query: (relatedTo Milk ?X)
      • Result: Dairy, Farm
    • Query: (partOf Milk ?X)
      • Result: Cow

From Milk, the nearest related concepts are:

    • SubstanceProducedByMammals
    • Dairy
    • Farm
  3. Starting Concept: Farm
    • Query: (isa Farm ?X)
      • Result: AgriculturalFacility
    • Query: (relatedTo Farm ?X)
      • Result: Milk, Cattle, DairyFarm
    • Query: (partOf Farm ?X)
      • Result: Field, Barn

From Farm, the nearest related concepts are:

    • AgriculturalFacility
    • Cattle
    • DairyFarm
    • Milk
    • Field
    • Barn
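Steps B and C can be simulated with a small in-memory table holding the example results above. `MOCK_KB`, `nearest_concepts`, and `expand` are hypothetical names, and the table is only a stand-in for live Cyc answers. Collecting the answers of every query for each seed and taking their union gives the expanded concept set; for Cow, this reproduces the list above.

```python
# Example answers from the walkthrough above, keyed by (relation, concept).
# This dictionary stands in for a live Cyc knowledge base (assumption).
MOCK_KB = {
    ("isa", "Cow"): ["Mammal", "DomesticAnimal"],
    ("relatedTo", "Cow"): ["Milk", "Farm", "Udder"],
    ("partOf", "Cow"): ["Udder", "Hoof"],
    ("isa", "Milk"): ["SubstanceProducedByMammals"],
    ("relatedTo", "Milk"): ["Dairy", "Farm"],
    ("partOf", "Milk"): ["Cow"],
    ("isa", "Farm"): ["AgriculturalFacility"],
    ("relatedTo", "Farm"): ["Milk", "Cattle", "DairyFarm"],
    ("partOf", "Farm"): ["Field", "Barn"],
}
RELATIONS = ("isa", "relatedTo", "partOf")


def nearest_concepts(concept):
    """Collect the answers of every relation query for one concept, without duplicates."""
    seen = []
    for rel in RELATIONS:
        for result in MOCK_KB.get((rel, concept), []):
            if result not in seen:
                seen.append(result)
    return seen


def expand(seeds):
    """Union of the nearest concepts of all seeds, plus the seeds themselves."""
    expanded = set(seeds)
    for concept in seeds:
        expanded.update(nearest_concepts(concept))
    return expanded


print(nearest_concepts("Cow"))
print(sorted(expand(["Cow", "Milk", "Farm"])))
```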

Step D: Recursively Expand to New Concepts

If you want to explore even further from each of the nearest concepts, you can repeat the same queries for each new concept that was found. For instance:

  • If Milk led you to Dairy, you can now run:
      (relatedTo Dairy ?X)
    This might give you related concepts like Cheese, Butter, Yogurt, etc.
  • Similarly, if Cattle was found under Farm, you could run:
      (relatedTo Cattle ?X)
    This might lead you to concepts like Beef, Livestock, etc.
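Step D amounts to a breadth-first traversal with a depth limit, so the expansion stops instead of walking the whole knowledge base. This sketch assumes a hypothetical `neighbors` lookup over a tiny table built from the relatedTo examples in this post; a real system would issue the Cyc queries at that point.

```python
from collections import deque

# relatedTo answers taken from the examples in this post; a stand-in for Cyc (assumption).
RELATED_TO = {
    "Milk": ["Dairy", "Farm"],
    "Farm": ["Milk", "Cattle", "DairyFarm"],
    "Dairy": ["Cheese", "Butter", "Yogurt"],
    "Cattle": ["Beef", "Livestock"],
}


def neighbors(concept):
    """Hypothetical lookup; replace with real Cyc queries."""
    return RELATED_TO.get(concept, [])


def expand_recursively(seeds, max_depth):
    """Breadth-first expansion; returns {concept: depth at which it was first reached}."""
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        if depth[current] == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in neighbors(current):
            if nxt not in depth:  # skip concepts already visited
                depth[nxt] = depth[current] + 1
                queue.append(nxt)
    return depth


print(expand_recursively(["Milk"], max_depth=2))
```

Tracking the depth at which each concept was first reached also gives a natural stopping rule and a rough measure of how "near" each concept is to the seeds.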

Step E: Visualization or Compilation

Once you’ve collected a large set of related concepts, you can organize or visualize them as a concept graph or network. This lets you see how the concepts are connected and discover new insights based on their relationships. It also enables users to prune the concepts and keep those that are germane to their interests.
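One simple way to support Step E is to record each discovered link as a (source, relation, target) edge, let the user prune concepts, and emit the result in Graphviz DOT format for rendering. The function names and the choice of DOT are assumptions for illustration, not part of the original proposal.

```python
def prune(edges, keep):
    """Drop every edge that touches a concept the user did not keep."""
    return [(a, rel, b) for a, rel, b in edges if a in keep and b in keep]


def to_dot(edges):
    """Render (source, relation, target) triples as a Graphviz DOT digraph."""
    lines = ["digraph concepts {"]
    for a, rel, b in edges:
        lines.append(f'  "{a}" -> "{b}" [label="{rel}"];')
    lines.append("}")
    return "\n".join(lines)


edges = [
    ("Cow", "isa", "Mammal"),
    ("Cow", "relatedTo", "Milk"),
    ("Milk", "relatedTo", "Dairy"),
    ("Cow", "partOf", "Hoof"),
]
kept = prune(edges, keep={"Cow", "Milk", "Dairy"})
print(to_dot(kept))
```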

Example Flow:

  • Start with: Cow
  • Expand: Mammal, DomesticAnimal, Milk, Farm, Udder, Hoof
  • Next: Explore Milk → Dairy, SubstanceProducedByMammals, Farm
  • Next: Explore Farm → AgriculturalFacility, DairyFarm, Cattle, Field, Barn
  • Continue expanding based on relations and parts...

Final Notes:

  • Recursion: You can use recursion to keep expanding from one set of related concepts to the next. For each concept, you check its is-a, related-to, and part-of relationships to find more concepts.
  • Prioritization: Depending on your task, you might prioritize certain relationships (e.g., “is-a” vs. “related-to”) to focus on more hierarchical or semantic links.
  • Reasoning: The ability to reason about relationships (e.g., if "A produces B" and "B is needed for C", then "A may be needed for C") helps enrich the exploration.
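The reasoning note can be illustrated with one hand-written rule applied to a set of facts: whenever (A, produces, B) and (B, neededFor, C) both hold, a tentative (A, mayBeNeededFor, C) fact is added. The predicate names and facts are hypothetical; in Cyc itself such inferences would normally come from rules in the knowledge base and its inference engine rather than from Python code.

```python
def apply_rule(facts):
    """Derive (A, 'mayBeNeededFor', C) from (A, 'produces', B) and (B, 'neededFor', C)."""
    produces = [(a, b) for a, rel, b in facts if rel == "produces"]
    needed_for = [(b, c) for b, rel, c in facts if rel == "neededFor"]
    derived = {
        (a, "mayBeNeededFor", c)
        for a, b in produces
        for b2, c in needed_for
        if b == b2  # the produced thing must be the needed thing
    }
    return derived - set(facts)  # report only facts that are actually new


facts = {
    ("Cow", "produces", "Milk"),
    ("Milk", "neededFor", "Cheese"),
}
print(apply_rule(facts))
```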

Step 5:

Next, we need to automatically create new questions to pose to the LLM. The simplest way is to begin with the "5W" question words: "why," "when," "how," "what," "where," and "who" (strictly, five Ws plus "how"). We create new questions by applying these question words to the newly discovered nearest-neighbor concepts. To do so meaningfully, we use the Cyc Query Language (CQL).

CQL can help identify which of the "5W" questions apply to a concept by evaluating the nature of the concept and its relationships within the Cyc knowledge base. Here is how CQL could help decide which questions apply:

  1. What: This is generally applied to objects, entities, or types. If the concept is a class or object, "what" could be used to query its characteristics or define what it is. For instance, "What is a dog?" or "What are the properties of a cat?"
  2. Where: Applied when the concept involves a location, spatial information, or events tied to geographical or spatial contexts. If the concept is related to a place or location, "where" would likely apply. For instance, "Where is the Eiffel Tower located?"
  3. Who: Applied to concepts that refer to people, individuals, or specific agents. If the concept represents a person or an actor in a particular event, "who" would apply. Example: "Who is the president of the United States?"
  4. When: Used for temporal concepts, events, or instances that are related to time. If the concept has a temporal dimension (such as an event or an occurrence), then "when" would apply. Example: "When did World War II start?"
  5. How: Typically used when the concept is associated with processes, methods, or causes. If the concept is a process, causal relationship, or method of doing something, "how" would be applicable. Example: "How does photosynthesis work?"
  6. Why: Applied for causal explanations or reasons. This question type is most often relevant to concepts related to reasons, causes, or justifications. For instance, "Why do leaves turn yellow in the fall?"
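The mapping above can be written as a lookup from concept categories to question templates. The category labels (`Thing`, `Location`, `Person`, `Event`, `Process`, `HasCause`) and the templates are hypothetical simplifications; deciding which categories a concept actually belongs to is the job of the Cyc queries described next.

```python
# Hypothetical category -> (question word, template) table following the list above.
QUESTION_TEMPLATES = {
    "Thing": ("What", "What is {c}?"),
    "Location": ("Where", "Where is {c} located?"),
    "Person": ("Who", "Who is {c}?"),
    "Event": ("When", "When did {c} happen?"),
    "Process": ("How", "How does {c} work?"),
    "HasCause": ("Why", "Why does {c} occur?"),
}


def questions_for(concept, categories):
    """Return the questions whose category applies to the concept, in table order."""
    return [
        template.format(c=concept)
        for category, (_word, template) in QUESTION_TEMPLATES.items()
        if category in categories
    ]


print(questions_for("photosynthesis", {"Thing", "Process", "HasCause"}))
```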

To determine which of these questions can be applied, you would need to check the class or type of the concept and its relationships in the knowledge base, and then build a query accordingly. For example:

  • What could be queried with: (#$isa #$myConcept #$Thing)
  • Where could be queried with: (#$isa #$myConcept #$Location)
  • Who could be queried with: (#$isa #$myConcept #$Person)
  • When could be queried with: (#$isa #$myConcept #$Time)
  • How could be queried with: (#$isa #$myConcept #$Process)
  • Why could be queried with: (#$cause #$myConcept ?Cause)
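The applicability tests above can be generated for any Cyc constant and then filtered by whichever tests succeed. The collection names are copied from the list above and may not match the constants in a particular Cyc release; `ask` stands for a hypothetical Cyc client call that returns True when a query has at least one answer, replaced here by a toy function.

```python
# Applicability test for each question word, following the list above.
# #$Location, #$Time, and #$Process are taken from this post (assumption).
APPLICABILITY = {
    "What": "(#$isa {c} #$Thing)",
    "Where": "(#$isa {c} #$Location)",
    "Who": "(#$isa {c} #$Person)",
    "When": "(#$isa {c} #$Time)",
    "How": "(#$isa {c} #$Process)",
    "Why": "(#$cause {c} ?Cause)",
}


def applicability_queries(constant):
    """Build one CycL test per question word for a constant such as 'myConcept'."""
    c = f"#${constant}"
    return {word: pattern.format(c=c) for word, pattern in APPLICABILITY.items()}


def applicable_words(constant, ask):
    """Keep the question words whose test query succeeds according to `ask`."""
    return [w for w, q in applicability_queries(constant).items() if ask(q)]


# Toy stand-in for a Cyc client: pretend only the Thing and Process tests succeed.
def fake_ask(query):
    return "#$Thing" in query or "#$Process" in query


print(applicable_words("Photosynthesis", fake_ask))
```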

In this manner, we can programmatically create basic new textual questions to pose to LLMs (via their APIs). We could also use "And" and "Or" to combinatorially compose those basic questions into new compound questions, which we can then pose to the LLMs to generate new results.
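Compound questions can be produced with `itertools.combinations`, joining each group of basic questions with "and" or "or". The function name, the wording of the joined question, and the `max_questions` cap are assumptions for illustration; a cap like this matters because the number of combinations grows quickly with the number of basic questions.

```python
from itertools import combinations, islice


def compound_questions(basic, group_size=2, connectives=("and", "or"), max_questions=None):
    """Join every group of `group_size` basic questions with each connective.

    `max_questions` caps the output (None means no cap).
    """
    def generate():
        for group in combinations(basic, group_size):
            # Drop each trailing '?' and lower-case the first letter of every
            # question after the first, so the joined text reads as one question.
            stems = [group[0].rstrip("?")]
            stems += [q[0].lower() + q[1:].rstrip("?") for q in group[1:]]
            for conj in connectives:
                yield f" {conj} ".join(stems) + "?"

    # islice stops generating once max_questions results have been produced.
    return list(islice(generate(), max_questions))


basic = ["What is Milk?", "How is Cheese made?", "Where is a Farm?"]
for q in compound_questions(basic, max_questions=4):
    print(q)
```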

One can programmatically invoke a summarization tool (e.g., QuillBot, Grammarly, Ahrefs, Google Gemini, Vertex AI PaLM API, Document AI, TLDR This, Notta, Semrush, ChatGPT, ClickUp, Get Digest, Scribbr, Summary Generator, Paraphraser.io, Jasper, and Writesonic) to present a summary of the answers to the user.

Any system built on these ideas would also need a user interface that lets the user:

  • specify the depth and extent of the search for new concepts;
  • enable canned questions (in addition to the "5W") as well as user-defined questions;
  • set calibration-like parameters, such as when to stop the search and how many combinatorial questions to generate.

It would also need a flexible architecture in which LLMs, Concept Extraction tools, and summarization tools can be seamlessly swapped in and out.
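The swappable architecture and user-facing parameters could be expressed with `typing.Protocol` interfaces for the interchangeable components and a dataclass for the search settings. All class names, fields, and default values here are hypothetical placeholders, not recommendations; any object with the right methods can be plugged in, as the toy stand-ins show.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class LLM(Protocol):
    def ask(self, prompt: str) -> str: ...


class ConceptExtractor(Protocol):
    def extract(self, text: str) -> set[str]: ...


class Summarizer(Protocol):
    def summarize(self, texts: list[str]) -> str: ...


@dataclass
class ExplorationSettings:
    max_depth: int = 2                # how far to expand from the seed concepts
    max_concepts: int = 50            # stop once this many concepts are collected
    max_compound_questions: int = 20  # cap on And/Or combinations
    use_5w: bool = True               # include the built-in 5W questions
    canned_questions: list[str] = field(default_factory=list)
    user_questions: list[str] = field(default_factory=list)


@dataclass
class ExplorationSystem:
    llm: LLM
    extractor: ConceptExtractor
    summarizer: Summarizer
    settings: ExplorationSettings = field(default_factory=ExplorationSettings)


# Toy stand-ins (hypothetical) that satisfy the protocols structurally.
class EchoLLM:
    def ask(self, prompt: str) -> str:
        return prompt.upper()


class CapitalizedWords:
    def extract(self, text: str) -> set[str]:
        return {w for w in text.split() if w.istitle()}


class JoinSummarizer:
    def summarize(self, texts: list[str]) -> str:
        return " ".join(texts)


system = ExplorationSystem(EchoLLM(), CapitalizedWords(), JoinSummarizer(),
                           ExplorationSettings(max_depth=3))
print(system.settings)
```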

I think one application of this approach could be exploring the Philosophy of Science: for example, we start with an ontology from a subfield of science such as Physics, use the concepts from that ontology to generate new questions via Cyc, and then pose those questions to a corpus of Thomist and Neo-Thomist philosophy to see what we get.

About Me

I was a senior software developer at HP and GM. I am interested in intelligent and scientific computing. I am passionate about computers as enablers for human imagination. The contents of this site are not in any way, shape, or form endorsed, approved, or otherwise authorized by HP, its subsidiaries, or its officers and shareholders.
