Posts tagged 'Pipeline'

2 posts tagged 'Pipeline'

2024.01.04 [LangChain] Composing multiple prompts together using Pipeline
2021.09.26 [Kubeflow] Conceptual overview of Kubeflow Pipelines

[LangChain] Composing multiple prompts together using Pipeline

Deep Learning (TF, Keras, PyTorch)/Natural Language Processing 2024. 1. 4. 23:53

When you build an LLM application with LangChain, you sometimes need to reuse several existing prompts and combine them into one final prompt. LangChain's PipelinePromptTemplate is made for this. A PipelinePromptTemplate consists of two parts: the final prompt and the pipeline prompts.

- (1) Final prompt: the prompt that is ultimately returned.

- (2) Pipeline prompts: a list of tuples, each made up of a string name and a prompt template. Each prompt is formatted and then passed to the later templates in the pipeline as a variable with the same name.
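To make the two parts concrete, here is a plain-Python sketch of the idea. It is not LangChain's implementation: each (name, template) pair is formatted in order with the values known so far, and the result is stored under that name so later templates, and finally the final template, can use it. The function compose_pipeline and the sample templates are hypothetical.

```python
def compose_pipeline(final_template: str,
                     pipeline_prompts: list[tuple[str, str]],
                     **inputs: str) -> str:
    """Toy model of pipeline prompt composition (not LangChain's code)."""
    variables = dict(inputs)  # user-supplied values, e.g. topic="Math"
    for name, template in pipeline_prompts:
        # Format this sub-template with every value known so far, then
        # expose the result to later templates under the tuple's name.
        variables[name] = template.format(**variables)
    # The final template sees both the user inputs and the sub-prompt results.
    return final_template.format(**variables)


final_template = "{introduction}\n\n{start}"
pipeline_prompts = [
    ("introduction", "You are an expert on {topic}."),
    ("start", "Now, do this for real.\n\nInput: {input}\nOutput:"),
]

prompt = compose_pipeline(final_template, pipeline_prompts,
                          topic="Math", input="What is 6 x 3?")
print(prompt)
```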

Here is a simple example.

- (1) Final prompt: final_prompt

- (2) Pipeline prompts: introduction_prompt, example_prompt, start_prompt

- (3) Composing all together:

PipelinePromptTemplate(
    final_prompt=final_prompt,
    pipeline_prompts=[
        ("introduction", introduction_prompt),
        ("example", example_prompt),
        ("start", start_prompt),
    ],
)

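First, install the langchain and openai modules with pip. (The leading `!` runs the command from a Jupyter notebook cell; drop it when typing the command in a terminal.)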

! pip install langchain openai

Next, import the modules needed for this walkthrough.

from langchain.prompts.pipeline import PipelinePromptTemplate
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

We will generate answers with an OpenAI chat model, so register the OpenAI API key as an environment variable.

import os 

os.environ["OPENAI_API_KEY"]="sk-xxxx..."

(1) Create the final prompt

Write the final prompt, whose components are several sub-prompts. (Later, each variable name is mapped to its sub-prompt in a tuple, and the tuples are collected into a single list.)

# Final template and prompt
final_template = """{introduction}

{example}

{start}"""

final_prompt = ChatPromptTemplate.from_template(final_template)

(2) Create the pipeline prompts

(2-1) Create introduction_prompt

# Introduction template and prompt
introduction_template = """You are an expert on {topic}."""

introduction_prompt = ChatPromptTemplate.from_template(introduction_template)

(2-2) Create example_prompt

# Example template and prompt
example_template = """Here is an example of an interaction:

Input: {example_input}
Output: {example_output}"""

example_prompt = ChatPromptTemplate.from_template(example_template)

(2-3) Create start_prompt

# Start template and prompt
start_template = """Now, do this for real. 

Input: {input}
Output:"""

start_prompt = ChatPromptTemplate.from_template(start_template)

(3) Combine the pipeline prompts into a single pipeline with PipelinePromptTemplate (Composing all together)

# Composing multiple prompts together using PipelinePromptTemplate()
input_prompts = [
    ("introduction", introduction_prompt), 
    ("example", example_prompt), 
    ("start", start_prompt), 
]

pipeline_prompt = PipelinePromptTemplate(
    final_prompt=final_prompt, pipeline_prompts=input_prompts
)


pipeline_prompt.input_variables
# ['example_input', 'example_output', 'input', 'topic']
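The four variables listed above appear to be every variable used by the final prompt or by any pipeline prompt, minus the names the pipeline fills in itself (introduction, example, start). The plain-Python sketch below reproduces that rule with string.Formatter; the helper names are hypothetical, and it models the observed result rather than LangChain's own code.

```python
from string import Formatter


def template_variables(template: str) -> set[str]:
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so filter it out.
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def pipeline_input_variables(final_template: str,
                             pipeline_prompts: list[tuple[str, str]]) -> list[str]:
    produced = {name for name, _ in pipeline_prompts}  # filled in by the pipeline
    needed = template_variables(final_template)
    for _, template in pipeline_prompts:
        needed |= template_variables(template)
    return sorted(needed - produced)  # what the caller still has to supply


final_template = "{introduction}\n\n{example}\n\n{start}"
pipeline_prompts = [
    ("introduction", "You are an expert on {topic}."),
    ("example", "Here is an example of an interaction:\n\n"
                "Input: {example_input}\nOutput: {example_output}"),
    ("start", "Now, do this for real. \n\nInput: {input}\nOutput:"),
]

print(pipeline_input_variables(final_template, pipeline_prompts))
# ['example_input', 'example_output', 'input', 'topic']
```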

I filled in values for the variables of each sub-prompt that makes up the final pipeline_prompt and printed the result. The prompt is now ready.

print(
    pipeline_prompt.format(
        topic="Math", 
        example_input="It costs 2 dollar per apple. How much do you need in total to buy 10 apples?", 
        example_output="Since 2 x 10 = 20, you need a total of $20.", 
        input="How much does it cost to buy 6 cans of beer that cost $3 each?",
    )
)


# Human: [HumanMessage(content='You are an expert on Math.')]

# [HumanMessage(content='Here is an example of an interaction:\n\nInput: It costs 2 dollar per apple. How much do you need in total to buy 10 apples?\nOutput: Since 2 x 10 = 20, you need a total of $20.')]

# [HumanMessage(content='Now, do this for real. \n\nInput: How much does it cost to buy 6 cans of beer that cost $3 each?\nOutput:')]
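The `[HumanMessage(...)]` wrappers in this output appear because every sub-prompt here is a ChatPromptTemplate, which the pipeline formats into a list of messages rather than a string; that list's text representation is then pasted into the final template. The sketch below reproduces the effect in plain Python with a hypothetical HumanMessage stand-in. If you want clean text instead, one option worth checking against your LangChain version is to build the sub-prompts with PromptTemplate.from_template, which formats to a plain string.

```python
from dataclasses import dataclass


@dataclass
class HumanMessage:
    """Hypothetical stand-in for a chat message; only its repr matters here."""
    content: str


# A chat prompt formats to a *list of messages*, not to a plain string.
intro_messages = [HumanMessage(content="You are an expert on Math.")]

# Substituting the list into a string template falls back to str(list),
# which shows each element's repr.
chat_style = "{introduction}".format(introduction=intro_messages)

# Substituting a plain string inserts just the text.
string_style = "{introduction}".format(introduction="You are an expert on Math.")

print(chat_style)    # [HumanMessage(content='You are an expert on Math.')]
print(string_style)  # You are an expert on Math.
```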

(4) Chain pipeline_prompt + Chat Model + Output Parser

# Chat Model
model = ChatOpenAI(temperature=0, model='gpt-4')

# Output Parser
parser = StrOutputParser()

# Chaining
chain = pipeline_prompt | model | parser
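The `|` operator connects the steps so that each step's output becomes the next step's input. The toy sketch below is not LangChain's Runnable implementation; it models the idea with a hypothetical Step class and fake prompt, model, and parser functions, so it runs without an API key.

```python
from typing import Any, Callable


class Step:
    """Toy runnable: wraps a function and supports `a | b` composition."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # The composed step feeds this step's output into `other`.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# Hypothetical stand-ins for the prompt, the chat model and the output parser.
prompt = Step(lambda d: f"Input: {d['input']}\nOutput:")
fake_model = Step(lambda text: {"content": text + " 18"})  # pretends to answer
parser = Step(lambda message: message["content"])          # extracts the text

chain = prompt | fake_model | parser
print(chain.invoke({"input": "6 x 3"}))
```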

(5) Run the chat model with chain.invoke() and get the result

I ran it on two questions, and the model answers in exactly the format taught by the example.

chain.invoke({
    "topic": "Math", 
    "example_input": "It costs 2 dollar per apple. How much do you need in total to buy 10 apples?", 
    "example_output": "Since 2 x 10 = 20, you need a total of $20.", 
    "input": "How much does it cost to buy 6 cans of beer that cost $3 each?"
})

# 'Since 6 x 3 = 18, you need a total of $18.'



chain.invoke({
    "topic": "Math",
    "example_input": "It costs 2 dollar per apple. How much do you need in total to buy 10 apples?",
    "example_output": "Since 2 x 10 = 20, you need a total of $20.",
    # Adjacent string literals are joined without the stray indentation
    # that a backslash line continuation inside the string would keep.
    "input": ("How much does it cost in total to buy 6 cans of beer that cost $3 each "
              "and two snacks that cost $5 each?"),
})

# 'Since (6 x $3) + (2 x $5) = $18 + $10 = $28, you need a total of $28.'
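Since only the question changes between the two calls, the fixed few-shot values can live in one dictionary that is merged with each new question. A minimal sketch: FEW_SHOT_INPUTS and build_inputs are hypothetical names, and the commented-out chain.batch() line assumes the chain built above (LCEL runnables provide a batch() method) plus a valid OpenAI key.

```python
# Few-shot values that stay the same for every call.
FEW_SHOT_INPUTS = {
    "topic": "Math",
    "example_input": "It costs 2 dollar per apple. How much do you need in total to buy 10 apples?",
    "example_output": "Since 2 x 10 = 20, you need a total of $20.",
}


def build_inputs(question: str) -> dict:
    # Build a new dict so the shared few-shot defaults are never mutated.
    return {**FEW_SHOT_INPUTS, "input": question}


questions = [
    "How much does it cost to buy 6 cans of beer that cost $3 each?",
    "How much does it cost in total to buy 6 cans of beer that cost $3 each "
    "and two snacks that cost $5 each?",
]
batch_inputs = [build_inputs(q) for q in questions]

# With the chain from step (4):
# answers = chain.batch(batch_inputs)
```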

[ Reference ]

* LangChain - Pipeline: https://python.langchain.com/docs/modules/model_io/prompts/pipeline

I hope this post was helpful.

Be a happy data scientist. :-)


Creative Commons license: Attribution, NonCommercial, NoDerivatives


Posted by Rfriend

[Kubeflow] Conceptual overview of Kubeflow Pipelines

Kubeflow 2021. 9. 26. 23:12

In the previous post, I showed how to install and use kubectl, the Kubernetes command-line tool (https://rfriend.tistory.com/684).

In this post, I introduce, conceptually and without code, the building blocks of Pipelines, one of the major benefits of using Kubeflow. (* This is a translation of the Pipelines pages on Kubeflow.org.)

(1) Conceptual overview of Kubeflow Pipelines

(2) Components in Kubeflow Pipelines

(3) Graph in Kubeflow Pipelines

(4) Experiment in Kubeflow Pipelines

(5) Run and Recurring Run in Kubeflow Pipelines

(6) Run Trigger in Kubeflow Pipelines

(7) Step in Kubeflow Pipelines

(8) Output Artifact in Kubeflow Pipelines

(1) Conceptual overview of Kubeflow Pipelines

A pipeline is a description of a machine learning (ML) workflow. It includes all of the components in the workflow and represents how they relate to each other as a graph. You configure a pipeline by defining the input parameters required to run it and the inputs and outputs of each component.
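As a rough mental model, and nothing Kubeflow-specific, a pipeline can be pictured as functions wired together: a pipeline parameter goes in, and each component's output becomes the next component's input, which corresponds to one edge in the graph. The component names and their logic below are hypothetical.

```python
def preprocess(raw: list[float]) -> list[float]:
    return [x for x in raw if x >= 0]  # drop invalid (negative) values


def transform(clean: list[float], scale: float) -> list[float]:
    return [x * scale for x in clean]  # uses a pipeline parameter


def train(features: list[float]) -> float:
    return sum(features) / len(features)  # toy "model": the mean


def pipeline(raw: list[float], scale: float = 2.0) -> float:
    clean = preprocess(raw)             # edge: preprocess -> transform
    features = transform(clean, scale)  # edge: transform -> train
    return train(features)


print(pipeline([1.0, -5.0, 3.0], scale=2.0))  # mean of [2.0, 6.0] = 4.0
```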

When you run a pipeline, the system launches one or more Kubernetes Pods corresponding to the steps of the ML workflow. The Pods start Docker containers, and the containers in turn start the programs in your workflow.

Once you have developed a pipeline, you can upload it with the Kubeflow Pipelines UI or the Kubeflow Pipelines SDK.

The Kubeflow Pipelines platform consists of:

- (a) A user interface (UI) for managing and tracking experiments, jobs, and runs

- (b) An engine for scheduling multi-step ML workflows

- (c) An SDK (software development kit) for defining and manipulating pipelines and components

- (d) Notebooks for interacting with the system using the SDK

The goals of Kubeflow Pipelines are:

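- (a) End-to-end orchestration: enabling and simplifying the orchestration of the entire ML pipeline.

- (b) Easy experimentation: making it easy to try out numerous ideas and techniques and to manage the results of your various experiments.

- (c) Easy re-use: letting you reuse existing components and pipelines to quickly assemble end-to-end solutions without rebuilding them each time.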

(2) Components in Kubeflow Pipelines

A pipeline component is a self-contained set of code that performs one step in the ML workflow, such as data preprocessing, data transformation, or model training. A component is analogous to a function: it has a name, parameters, return values, and a body (block of code). It may help to think of each component as a single Lego brick.

A component specification defines the following:

- (a) Component interface: inputs and outputs

- (b) Component implementation: the container image and the command to execute

- (c) Component metadata: name and description
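The three parts of a specification can be sketched as a small Python data structure. The to_dict() layout is loosely modelled on the KFP v1 component.yaml format (name, description, inputs, outputs, implementation.container with image and command) as I understand it; verify it against the Kubeflow docs for your version. The image and command values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentSpec:
    # (c) metadata
    name: str
    description: str
    # (a) interface
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    # (b) implementation
    image: str = ""
    command: list[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        # Layout loosely modelled on a KFP v1 component.yaml (assumption).
        return {
            "name": self.name,
            "description": self.description,
            "inputs": [{"name": n} for n in self.inputs],
            "outputs": [{"name": n} for n in self.outputs],
            "implementation": {
                "container": {"image": self.image, "command": self.command}
            },
        }


train_spec = ComponentSpec(
    name="train-model",
    description="Trains a model on preprocessed data.",
    inputs=["training_data", "learning_rate"],
    outputs=["model"],
    image="example.registry/train:latest",  # hypothetical image
    command=["python", "train.py"],         # hypothetical entry point
)
print(train_spec.to_dict())
```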

(3) Graph in Kubeflow Pipelines

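A graph is a pictorial representation, in the Kubeflow Pipelines UI, of a pipeline's execution, drawn as nodes and edges. The graph shows the steps that a pipeline run has executed or is executing, with arrows indicating the parent/child relationships between the components represented by each step. The graph is viewable as soon as the run begins. Each node in the graph corresponds to a step in the pipeline and is labeled accordingly.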

[ Example of a pipeline graph ]

In the example graph above, each node has an icon at its top right that indicates the status of that step: running, succeeded, failed, or skipped. A child node can end up skipped when its parent contains a conditional clause, depending on how that condition evaluates.
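The parent/child graph and the status icons can be imitated with a small executor: steps run in topological order (parents before children), and a child whose conditional clause evaluates to false is marked skipped. This is a toy model, not how Kubeflow or Argo actually schedules Pods; the steps, the conditions, and the 0.9 threshold are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical graph: each step maps to the set of its parents
# (the UI arrows point from parent to child).
PARENTS: dict[str, set[str]] = {
    "train": set(),
    "deploy": {"train"},
    "retrain": {"train"},
}

# Hypothetical conditional clauses on child steps, evaluated on earlier outputs.
CONDITIONS: dict[str, Callable[[dict], bool]] = {
    "deploy": lambda out: out["train"] >= 0.9,
    "retrain": lambda out: out["train"] < 0.9,
}


def run_pipeline(accuracy: float) -> dict[str, str]:
    # Hypothetical step bodies; "train" just reports the given accuracy.
    bodies: dict[str, Callable[[dict], object]] = {
        "train": lambda out: accuracy,
        "deploy": lambda out: f"deployed (accuracy={out['train']})",
        "retrain": lambda out: "retraining scheduled",
    }
    status: dict[str, str] = {}
    outputs: dict[str, object] = {}
    # static_order() yields every parent before its children.
    for step in TopologicalSorter(PARENTS).static_order():
        if any(status.get(p) != "succeeded" for p in PARENTS[step]):
            continue  # a parent did not succeed, so this step never starts
        condition = CONDITIONS.get(step)
        if condition is not None and not condition(outputs):
            status[step] = "skipped"  # the conditional clause was not met
            continue
        try:
            outputs[step] = bodies[step](outputs)
            status[step] = "succeeded"
        except Exception:
            status[step] = "failed"
    return status


print(run_pipeline(0.72))
```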

(4) Experiment in Kubeflow Pipelines

An experiment is a workspace where you can try different configurations of your pipelines. On the experiment screen, you can organize runs with different configurations into logical groups. An experiment can contain arbitrary runs, including recurring runs.

[ Kubeflow Pipelines - Experiment UI ]

(5) Run and Recurring Run in Kubeflow Pipelines

A run is a single execution of a pipeline. Runs form an immutable log of all the experiments you attempt, and each run is designed to be self-contained to allow for reproducibility. You can track a run's progress on its details page in the Kubeflow Pipelines UI, where you can also see its runtime graph, its output artifacts, and the logs for each step of the run.

A recurring run, called a job in the Kubeflow Pipelines backend APIs, is a repeatable run of a pipeline. The configuration for a recurring run includes a copy of the pipeline with all parameter values specified, plus a run trigger. You can start a recurring run inside any experiment, and it will periodically start a new copy of the run configuration. You can enable or disable a recurring run from the Kubeflow Pipelines UI (Run Type: One-off vs. Recurring at the bottom of the Experiment UI above). You can also specify a maximum number of concurrent runs to limit how many runs are launched in parallel. This is useful when the pipeline is expected to run for a long time and is triggered frequently.
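The concurrency limit can be pictured as a fixed number of slots that a run must acquire before starting. The sketch below uses threading.Semaphore with a hypothetical limit of 2 and five runs fired back to back. Here the excess runs simply wait for a free slot; that queueing is an assumption of the toy, not necessarily what Kubeflow does when a trigger fires while the limit is reached.

```python
import threading
import time

MAX_CONCURRENT_RUNS = 2  # hypothetical "maximum concurrent runs" setting
slots = threading.Semaphore(MAX_CONCURRENT_RUNS)
counter_lock = threading.Lock()
active = 0  # runs executing right now
peak = 0    # highest number of runs seen executing at once


def start_run() -> None:
    global active, peak
    with slots:  # block until one of the run slots is free
        with counter_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # stand-in for a long-running pipeline
        with counter_lock:
            active -= 1


# The recurring run's trigger fires five times in quick succession.
threads = [threading.Thread(target=start_run) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"peak concurrent runs: {peak} (limit {MAX_CONCURRENT_RUNS})")
```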

(6) Run Trigger in Kubeflow Pipelines

A run trigger is a flag that tells the system when a recurring run configuration should spawn a new run. Two types of run trigger are available:

-. Periodic: for interval-based scheduling of runs, for example every 2 hours or every 45 minutes.

-. Cron: for specifying cron semantics when you want runs scheduled at explicit times.
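The difference between the two trigger types can be sketched with datetime arithmetic: a periodic trigger keeps adding a fixed interval, while a cron-style trigger fires at fixed wall-clock times (for example 09:00 every day, which a standard five-field cron expression writes as `0 9 * * *`). Both helper functions are hypothetical illustrations, not Kubeflow APIs, and the cron format Kubeflow itself accepts may differ from the five-field form.

```python
from datetime import datetime, timedelta


def periodic_trigger_times(start: datetime, interval: timedelta,
                           count: int) -> list[datetime]:
    # Periodic: start, start + interval, start + 2 * interval, ...
    return [start + i * interval for i in range(count)]


def next_daily_trigger(now: datetime, hour: int, minute: int) -> datetime:
    # Cron-like: the next occurrence of a fixed wall-clock time.
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    return candidate if candidate > now else candidate + timedelta(days=1)


times = periodic_trigger_times(datetime(2021, 9, 26, 0, 0), timedelta(minutes=45), 4)
print([t.strftime("%H:%M") for t in times])  # ['00:00', '00:45', '01:30', '02:15']
print(next_daily_trigger(datetime(2021, 9, 26, 10, 0), 9, 0))  # 2021-09-27 09:00:00
```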

(7) Step in Kubeflow Pipelines

A step is an execution of one of the components in the pipeline. The relationship between a step and its component is one of instantiation (steps are component instances), much like the relationship between a run and its pipeline.

In a complex pipeline, a component can execute multiple times in a loop, or execute conditionally after an if/else-like clause in the pipeline code has been resolved.
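The idea that a step is an instance of a component can be illustrated by recording one step each time a component is called: a loop produces several steps from the same component, and an if clause decides whether another component produces a step at all. The component helper and the names below are hypothetical.

```python
steps: list[str] = []  # one entry per step, i.e. per component execution


def component(name: str):
    def call(*args):
        steps.append(f"{name}-{len(steps) + 1}")  # record this execution as a step
        return args
    return call


train = component("train")
evaluate = component("evaluate")


def pipeline(learning_rates: list[float], run_eval: bool) -> None:
    for lr in learning_rates:  # loop: the same component runs several times
        train(lr)
    if run_eval:               # if/else: a component runs only conditionally
        evaluate()


pipeline([0.1, 0.01, 0.001], run_eval=False)
print(steps)  # ['train-1', 'train-2', 'train-3']
```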

(8) Output Artifact in Kubeflow Pipelines

An output artifact is an output emitted by a pipeline component that the Kubeflow Pipelines UI understands and can render as a rich visualization. Including artifacts in your pipeline components is useful for evaluating model performance, making quick decisions about a run, or comparing different runs. Artifacts also make it possible to understand how the pipeline's various components work. An artifact can range from a plain textual view of the data to a rich interactive visualization.
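In KFP v1, as I understand it, a component produces UI artifacts by writing a JSON file named mlpipeline-ui-metadata.json that holds an `outputs` list of viewer entries such as `markdown`. The sketch below writes one inline markdown entry into a temporary directory; check the exact file location and the supported viewer types in the Kubeflow docs for your version. The accuracy value is hypothetical.

```python
import json
import os
import tempfile


def write_ui_metadata(path: str, accuracy: float) -> None:
    # Structure modelled on KFP v1's mlpipeline-ui-metadata.json (assumption).
    metadata = {
        "outputs": [
            {
                "type": "markdown",   # viewer type
                "storage": "inline",  # content embedded directly in "source"
                "source": f"# Evaluation\n\nAccuracy: **{accuracy:.2%}**",
            }
        ]
    }
    with open(path, "w") as f:
        json.dump(metadata, f)


path = os.path.join(tempfile.mkdtemp(), "mlpipeline-ui-metadata.json")
write_ui_metadata(path, 0.9134)

with open(path) as f:
    print(json.load(f)["outputs"][0]["source"])
```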