langchain-Beam-transforms
This section provides a conceptual guide on the transforms offered by Langchain-Beam library. For explanation about what is transform (PTransform) and other Apache Beam-related concepts, please refer to this page
LLM Transform
The LLM transform integrates Large Language Models as PTransform
The LangchainBeam
is an Apache Beam PTransform
that uses LangChain’s ChatModel interface,
to integrate LLMs from various providers like OpenAI. The transform accepts a String
as input, where each element is processed using a specified language model via a LangchainModelHandler
and yields a LangchainBeamOutput
, containing the model’s responses for each input element.
Input and Output
- Input: A
PCollection<String>
representing the input strings to be processed by the language model. - Output: A
PCollection<LangchainBeamOutput>
containing the model’s responses and the corresponding input elements.
Usage in Pipeline
LangchainBeam
transform can be instantiated in pipeline using the run()
method and it takes a
LangchainModelHandler
as input.
LangchainBeam.run(handler);
the LangchainModelHandler
stores the instruction prompt and model options, which together define how the language model will process each input element. Here’s how these inputs can be provided:
-
Instruction Prompt: The instruction prompt is a string that specifies the task or operation the language model should perform on the input data. It provides the necessary context and instructions for the model to generate appropriate outputs. For example, a prompt might instruct the model to “Categorize the product review as Positive or Negative.”
-
Model Options: A configuration object that specifies the provide, language model to use and provides other options of the model.
Example:
// Define the instruction prompt for LLM
String instructionPrompt = "Categorize the product review as Positive or Negative.";
// Create model options
OpenAiModelOptions modelOptions = OpenAiModelOptions.builder()
.modelName("gpt-4o-mini")
.apiKey(apiKey)
.build();
// Initialize LangchainModelHandler with the prompt and model options
LangchainModelHandler handler = new LangchainModelHandler(instructionPrompt, modelOptions);
//create transform
LangchainBeam.run(handler);
Execution:
During pipeline execution the transform will use the model to process the input element based on the
provided instruction prompt and the transform with output a LangchainBeamOutput
object, which encapsulates the The model's response and the The input element that was processed.
LangchainBeamOutput out;
out.getOutput() // returns model's output
out.getInputElement() // returns input element
LangchainBeamOutput
is a Serializable class and it is serialized using default coder provided by Beam. So, the object can be directly passed on to next transform without any aditional coder step.
Embedding Transform
This transform integrates Embedding models as PTransform
to generate vector embeddings for text in beam pipeline.