Setting up
To go through this guide, you will need Docker and Python 3.x installed. To quickly run Memgraph Platform (Memgraph database + MAGE library + Memgraph Lab) for the first time, run the quick-start installation command for your operating system. On Linux/macOS, it runs a script that starts the memgraph-mage and memgraph-lab Docker services in two separate containers. Now you have Memgraph up and running! Read more about the installation process in the Memgraph documentation.
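On Linux/macOS the quick-start typically looks like the sketch below; the install-script URL is an assumption based on the Memgraph documentation, so verify it there if the command fails:

```shell
# Downloads a Docker Compose file and starts the memgraph-mage and
# memgraph-lab Docker services in two separate containers.
curl https://install.memgraph.com | sh
```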
To use LangChain, install and import all the necessary packages. We’ll use the package manager pip, along with the --user
flag, to ensure proper permissions. If you’ve installed Python 3.4 or a later version, pip
is included by default. You can install all the required packages using the following command:
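A plausible install command is sketched below; the exact package list is an assumption (we assume the Memgraph integration ships as langchain-memgraph and the graph-construction helpers live in langchain-experimental):

```shell
# --user installs into the per-user site-packages to avoid permission issues
pip install --user langchain langchain-openai langchain-memgraph langchain-experimental
```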
Natural language querying
Memgraph’s integration with LangChain includes natural language querying. To utilize it, first do all the necessary imports; we will discuss them as they appear in the code.
First, instantiate MemgraphGraph. This object holds the connection to the running Memgraph instance. Make sure to set all the environment variables properly.
The refresh_schema parameter is initially set to False because there is no data in the database yet, and we want to avoid unnecessary database calls.
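A minimal sketch of the instantiation, assuming the integration is importable from the langchain_memgraph package and Memgraph is listening on the default Bolt port:

```python
import os

from langchain_memgraph.graphs.memgraph import MemgraphGraph

# Connection details for the local Memgraph instance; adjust as needed.
url = os.environ.get("MEMGRAPH_URI", "bolt://localhost:7687")
username = os.environ.get("MEMGRAPH_USERNAME", "")
password = os.environ.get("MEMGRAPH_PASSWORD", "")

# refresh_schema=False skips the schema call while the database is still empty.
graph = MemgraphGraph(
    url=url, username=username, password=password, refresh_schema=False
)
```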
Populating the database
To populate the database, first make sure it’s empty. The most efficient way to do that is to switch to the in-memory analytical storage mode, drop the graph, and switch back to the in-memory transactional mode. Learn more about Memgraph’s storage modes. The data we’ll add to the database is about video games of different genres, available on various platforms and related to publishers.
The graph object exposes the query method. That method executes a query in Memgraph, and it is also used by the MemgraphQAChain to query the database.
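A sketch of the drop-and-reload sequence, reusing the graph object from above; the dataset itself is an illustrative example, so adapt it to your own data:

```python
# Switch to analytical mode, drop everything, then switch back.
graph.query("STORAGE MODE IN_MEMORY_ANALYTICAL")
graph.query("DROP GRAPH")
graph.query("STORAGE MODE IN_MEMORY_TRANSACTIONAL")

# A small video-game dataset: one game, its platforms, genres and publisher.
graph.query(
    """
    MERGE (g:Game {name: "Baldur's Gate 3"})
    WITH g,
         ["PlayStation 5", "Mac OS", "Windows", "Xbox Series X/S"] AS platforms,
         ["Adventure", "Role-Playing Game", "Strategy"] AS genres
    FOREACH (platform IN platforms |
        MERGE (p:Platform {name: platform})
        MERGE (g)-[:AVAILABLE_ON]->(p))
    FOREACH (genre IN genres |
        MERGE (gn:Genre {name: genre})
        MERGE (g)-[:HAS_GENRE]->(gn))
    MERGE (p:Publisher {name: "Larian Studios"})
    MERGE (g)-[:PUBLISHED_BY]->(p)
    """
)
```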
Refresh graph schema
Since new data has been created in Memgraph, it is necessary to refresh the schema. The generated schema will be used by the MemgraphQAChain to instruct the LLM to generate better Cypher queries.
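With the data in place, the cached schema can be rebuilt and inspected (a sketch, reusing the graph object from above):

```python
# Rebuild the cached schema after the import, then print it for inspection.
graph.refresh_schema()
print(graph.get_schema)
```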
Querying the database
To interact with the OpenAI API, you must configure your API key as an environment variable; this ensures proper authorization for your requests. You can find more information on obtaining your API key here. To configure the API key, you can use the Python os package.
Then, create the MemgraphQAChain, which will be utilized in the question-answering process based on your graph data. The temperature parameter is set to zero to ensure predictable and consistent answers. You can set the verbose parameter to True to receive more detailed messages regarding query generation.
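A hedged sketch of the chain setup; the import path, the model_name and allow_dangerous_requests parameters, and the "result" output key are assumptions based on the langchain-memgraph integration's typical usage:

```python
import os

from langchain_memgraph.chains.graph_qa import MemgraphQAChain
from langchain_openai import ChatOpenAI

os.environ["OPENAI_API_KEY"] = "your-api-key"  # replace with your own key

chain = MemgraphQAChain.from_llm(
    ChatOpenAI(temperature=0),  # temperature=0 for consistent answers
    graph=graph,
    model_name="gpt-4-turbo",
    allow_dangerous_requests=True,
    verbose=True,  # log the generated Cypher query and retrieved context
)
print(chain.invoke("Which platforms is Baldur's Gate 3 available on?")["result"])
```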
Chain modifiers
To modify the behavior of your chain and obtain more context or additional information, you can modify the chain’s parameters.
Return direct query results
The return_direct modifier specifies whether to return the direct results of the executed Cypher query or the processed natural language response.
Return query intermediate steps
The return_intermediate_steps chain modifier enhances the returned response by including the intermediate steps of the query in addition to the initial query result.
Limit the number of query results
The top_k modifier can be used when you want to restrict the maximum number of query results.
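The three modifiers can be combined in one chain, as in this sketch (parameter names are assumptions carried over from LangChain's graph QA chains):

```python
# return_direct=True: return the raw Cypher results instead of an LLM summary.
# return_intermediate_steps=True: also include the generated query and context.
# top_k=2: cap the number of query results.
chain = MemgraphQAChain.from_llm(
    ChatOpenAI(temperature=0),
    graph=graph,
    model_name="gpt-4-turbo",
    allow_dangerous_requests=True,
    return_direct=True,
    return_intermediate_steps=True,
    top_k=2,
)
```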
Advanced querying
As the complexity of your solution grows, you might encounter different use cases that require careful handling. Ensuring your application’s scalability is essential to maintain a smooth user flow without any hitches. Let’s instantiate our chain once again and attempt to ask some questions that users might potentially ask.
Prompt refinement
A user may, for example, refer to a platform by an abbreviation such as PS5, while the database stores it under its full name, so the generated Cypher query finds no match. To address this, we can adjust the initial Cypher prompt of the QA chain. This involves adding guidance to the LLM on how users can refer to specific platforms, such as PS5 in our case. We achieve this using the LangChain PromptTemplate, creating a modified initial prompt. This modified prompt is then supplied as an argument to our refined MemgraphQAChain instance.
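A sketch of such a refinement; the template text is illustrative, and the cypher_prompt parameter name is an assumption based on LangChain's graph QA chains:

```python
from langchain_core.prompts import PromptTemplate

MEMGRAPH_GENERATION_TEMPLATE = """Your task is to translate a natural language
question into a precise, executable Cypher query for a Memgraph database,
using only the provided schema. Users may refer to the PlayStation 5 platform
as PS5; in the database it is stored as "PlayStation 5".

Schema:
{schema}

Question: {question}"""

cypher_prompt = PromptTemplate(
    input_variables=["schema", "question"],
    template=MEMGRAPH_GENERATION_TEMPLATE,
)

chain = MemgraphQAChain.from_llm(
    ChatOpenAI(temperature=0),
    graph=graph,
    cypher_prompt=cypher_prompt,  # the refined Cypher generation prompt
    model_name="gpt-4-turbo",
    allow_dangerous_requests=True,
)
```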
Constructing a knowledge graph
Transforming unstructured data into structured data is not an easy or straightforward task. This guide will show how LLMs can help us there and how to construct a knowledge graph in Memgraph. Once the knowledge graph is created, you can use it for your GraphRAG application. The steps of constructing a knowledge graph from text are:- Extracting structured information from text: an LLM is used to extract structured graph information from text in the form of nodes and relationships.
- Storing into Memgraph: storing the extracted structured graph information into Memgraph.
Extracting structured information from text
Besides all the imports in the setup section, import LLMGraphTransformer and Document, which will be used to extract structured information from text.
Next, create an instance of LLMGraphTransformer from the desired LLM and convert the document to the graph structure.
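A sketch of the extraction step; the import paths follow the langchain-experimental and langchain-openai packages, and the sample text is illustrative:

```python
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0, model_name="gpt-4-turbo")
llm_transformer = LLMGraphTransformer(llm=llm)

text = """
Charles Robert Darwin was an English naturalist, geologist, and biologist,
widely known for his contributions to evolutionary biology.
"""
documents = [Document(page_content=text)]
# Each resulting GraphDocument holds the nodes and relationships the LLM extracted.
graph_documents = llm_transformer.convert_to_graph_documents(documents)
```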
Storing into Memgraph
Once you have the data ready in the form of GraphDocument objects, that is, nodes and relationships, you can use the add_graph_documents method to import it into Memgraph. That method transforms the list of graph_documents into the appropriate Cypher queries and executes them in Memgraph. Once that’s done, the knowledge graph is stored in Memgraph.
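The import itself is a single call, reusing the graph object and the extracted graph_documents from above (a sketch):

```python
# Convert the extracted GraphDocument objects into Cypher and run it in Memgraph.
graph.add_graph_documents(graph_documents)
```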
The graph construction process is non-deterministic, since the LLM used to generate nodes and relationships from unstructured data is non-deterministic.
Additional options
Additionally, you have the flexibility to define specific types of nodes and relationships for extraction according to your requirements. You can also set the baseEntityLabel option, which adds an __Entity__ label to all nodes; that label is indexed for faster retrieval.
Another option is to set include_source to True; then the source document is stored and linked to the nodes in the graph using the MENTIONS relationship.
Notice that an id property is generated for the source document, since the document didn’t have any id.
You can combine having both the __Entity__ label and the document source. Still, be aware that both take up memory, especially the included source, due to the long strings stored for the content.
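A combined sketch of these options, reusing the llm, documents, and graph objects from above; the allowed type lists are illustrative, and the parameter names are assumptions based on LangChain's graph-transformer API:

```python
# Restrict extraction to specific node and relationship types.
llm_transformer_filtered = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=["Person", "Country", "Organization"],
    allowed_relationships=["NATIONALITY", "WORKED_AT"],
)
graph_documents_filtered = llm_transformer_filtered.convert_to_graph_documents(documents)

graph.add_graph_documents(
    graph_documents_filtered,
    baseEntityLabel=True,  # extra __Entity__ label on every node, indexed for retrieval
    include_source=True,   # store the source document, linked via MENTIONS
)
```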
In the end, you can query the knowledge graph, as explained in the sections before.
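For example, reusing the MemgraphQAChain setup from the natural language querying section (a sketch; refresh the schema first so the chain sees the newly constructed graph):

```python
graph.refresh_schema()  # pick up the schema of the newly built knowledge graph

chain = MemgraphQAChain.from_llm(
    ChatOpenAI(temperature=0),
    graph=graph,
    model_name="gpt-4-turbo",
    allow_dangerous_requests=True,
)
print(chain.invoke("Which entities are mentioned in the graph?")["result"])
```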