A Generative AI Engineer is designing a RAG application for answering user questions on technical regulations as they learn a new sport.
What are the steps needed to build this RAG application and deploy it?
A. Ingest documents from a source -> Index the documents and save to Vector Search -> User submits queries against an LLM -> LLM retrieves relevant documents -> Evaluate model -> LLM generates a response -> Deploy it using Model Serving
B. Ingest documents from a source -> Index the documents and save to Vector Search -> User submits queries against an LLM -> LLM retrieves relevant documents -> LLM generates a response -> Evaluate model -> Deploy it using Model Serving
C. Ingest documents from a source -> Index the documents and save to Vector Search -> Evaluate model -> Deploy it using Model Serving
D. User submits queries against an LLM -> Ingest documents from a source -> Index the documents and save to Vector Search -> LLM retrieves relevant documents -> LLM generates a response -> Evaluate model -> Deploy it using Model Serving
Answer: B
Explanation:
The Generative AI Engineer needs to follow a methodical pipeline to build and deploy a Retrieval-Augmented Generation (RAG) application. The steps outlined in option B accurately reflect this process:
Ingest documents from a source: This is the first step, where the engineer collects documents (e.g., technical regulations) that will be used for retrieval when the application answers user questions.
Index the documents and save to Vector Search: Once the documents are ingested, they are converted into vector embeddings with a pre-trained embedding model and stored in a vector database such as Databricks Vector Search. This enables fast similarity-based retrieval of documents relevant to a user query.
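The indexing step can be sketched in miniature. The toy bag-of-words embedding and in-memory list below are illustrative stand-ins for a real embedding model and Vector Search, not the Databricks API:

```python
from collections import Counter

def embed(text, vocab):
    # Toy bag-of-words vector; a real pipeline would call an embedding model
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

def build_index(documents):
    # Store (vector, text) pairs in memory - a stand-in for Vector Search
    vocab = sorted({word for doc in documents for word in doc.lower().split()})
    index = [(embed(doc, vocab), doc) for doc in documents]
    return vocab, index

docs = ["the offside rule in football", "scoring rules in tennis"]
vocab, index = build_index(docs)
print(len(index))  # 2
```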
User submits queries against an LLM: Users interact with the application by submitting questions, which are passed to the LLM.
LLM retrieves relevant documents: The application queries the vector store to find the documents whose vector representations are most similar to the embedded query; these documents are supplied to the LLM as context.
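The retrieval step amounts to a nearest-neighbor search over the stored vectors. A minimal sketch using cosine similarity over a hand-built index (function names and vectors are illustrative):

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, k=1):
    # Rank indexed (vector, text) pairs by similarity to the query vector
    ranked = sorted(index, key=lambda pair: cosine(query_vec, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# Tiny index: vectors over the vocabulary ["football", "offside", "tennis"]
index = [([1, 1, 0], "offside rule in football"),
         ([0, 0, 1], "scoring in tennis")]
print(retrieve([0, 1, 0], index))  # ['offside rule in football']
```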
LLM generates a response: Using the retrieved documents, the LLM generates a response that is tailored to the user’s question.
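In practice, generation is grounded by wrapping the retrieved documents and the question into a single prompt before calling the LLM. The sketch below assumes a hypothetical `build_prompt` helper; the prompt wording is illustrative:

```python
def build_prompt(question, retrieved_docs):
    # Assemble a grounded prompt; a real app would send this string to an LLM endpoint
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

prompt = build_prompt("What is offside?", ["A player is offside when..."])
print("Context" in prompt)  # True
```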
Evaluate model: After generating responses, the system must be evaluated to ensure the retrieved documents are relevant and the generated response is accurate. Metrics such as accuracy, relevance, and user satisfaction can be used for evaluation.
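One simple retrieval metric is hit rate: the fraction of evaluation queries whose expected document appears in the top-k results. The retriever and evaluation set below are hypothetical, for illustration only:

```python
def retrieval_hit_rate(eval_set, retrieve_fn, k=3):
    # Fraction of queries whose expected document appears in the top-k results
    hits = sum(1 for query, expected in eval_set if expected in retrieve_fn(query, k))
    return hits / len(eval_set)

# Hypothetical retriever for illustration: always returns the same document
def toy_retrieve(query, k):
    return ["offside rule in football"][:k]

eval_set = [("what is offside?", "offside rule in football"),
            ("how is tennis scored?", "scoring in tennis")]
print(retrieval_hit_rate(eval_set, toy_retrieve))  # 0.5
```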
Deploy it using Model Serving: Once the RAG pipeline is ready and evaluated, it is deployed using a model-serving platform such as Databricks Model Serving. This enables real-time inference and response generation for users.
By following these steps, the Generative AI Engineer ensures that the RAG application is both efficient and effective for the task of answering technical regulation questions.