Gemini Deep Research Agent API Released

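Google has released the Gemini Deep Research Agent as an API. Deep Research takes a user's question, has the AI draw up its own search plan, then browses, compares, and synthesizes multiple web pages to automatically produce a long-form report with cited sources. Until now it was available only through the Google AI Studio web UI; developers can now integrate it directly into their own apps through the Interactions API, a new asynchronous interface. Unlike a conventional generate_content call, the task runs in the background for several minutes, so the client sends the request and then either checks periodically for completion (polling) or receives progress updates via streaming.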

Available Models

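deep-research-preview-04-2026 : Focused on speed and efficiency. Well suited to streaming results to a client UI in real time.
deep-research-max-preview-04-2026 : Aims for maximum comprehensiveness. Strong at automated context gathering and synthesis.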

Key Features

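Collaborative Planning : Instead of running the research immediately, the agent first proposes a plan. Full execution begins only after the user has reviewed, revised, and approved it.
Automatic charts and infographics : With the visualization="auto" option enabled, the agent creates charts and graphs on its own and returns them as base64-encoded images.
MCP server integration : Supports the Model Context Protocol (an open standard for connecting external tools to LLMs), so tools from external services such as financial data providers can be attached to the agent.
Extended tool set : Google Search, URL content reading, and code execution are built in. File search (over uploaded documents) and MCP servers can be added optionally.
Multimodal input : In addition to text, images, PDFs, and audio files can be passed in as research context.
Real-time streaming and thinking summaries : Research progress can be streamed in real time, and enabling thinking_summaries="auto" exposes the agent's intermediate reasoning in summarized form.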

Key Code Examples

The most basic usage: start an asynchronous task with background=True, then poll for completion every 10 seconds.

import time

from google import genai

client = genai.Client()

# Start the research task in the background
interaction = client.interactions.create(
    input="Research the history of Google TPUs.",
    agent="deep-research-preview-04-2026",
    background=True,
)

# Poll every 10 seconds until the task completes or fails
while True:
    interaction = client.interactions.get(interaction.id)
    if interaction.status == "completed":
        print(interaction.outputs[-1].text)
        break
    elif interaction.status == "failed":
        print(f"Research failed: {interaction.error}")
        break
    time.sleep(10)
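The loop above polls indefinitely at a fixed interval. For jobs that run for minutes, a reusable helper with an overall timeout and a growing delay may be easier to manage. The following is a minimal sketch, not part of the SDK: `wait_for_completion`, its parameters, and the default timings are hypothetical, and it relies only on the `status` values ("completed", "failed") and the `error` attribute shown in the example above.

```python
import time
from typing import Any, Callable


def wait_for_completion(
    fetch: Callable[[], Any],
    timeout_s: float = 1800.0,
    initial_delay_s: float = 5.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() until its result reports a terminal status.

    Returns the completed interaction, raises RuntimeError on "failed",
    and raises TimeoutError once timeout_s has elapsed.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        interaction = fetch()
        if interaction.status == "completed":
            return interaction
        if interaction.status == "failed":
            error = getattr(interaction, "error", None)
            raise RuntimeError(f"Research failed: {error}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Research did not finish within {timeout_s} seconds")
        sleep(delay)
        # Double the wait after each check, capped at max_delay_s
        delay = min(delay * 2, max_delay_s)
```

With the client from the basic example, a call might look like `done = wait_for_completion(lambda: client.interactions.get(interaction.id))`. Passing `fetch` as a callable keeps the helper independent of the SDK, so it can be exercised with a stub.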

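The collaborative planning flow. First request only a plan with collaborative_planning=True, then give feedback, and finally switch the flag to False; only then does the actual research start. Note that sending just the text "go ahead" without changing the flag will not produce a report.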

# Assumes `time` and `client` are set up as in the basic example.

# Step 1: request a plan
plan = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Research Google TPUs vs competitor hardware.",
    agent_config={"type": "deep-research", "collaborative_planning": True},
    background=True,
)
while (result := client.interactions.get(id=plan.id)).status != "completed":
    time.sleep(5)
print(result.outputs[-1].text)  # proposed plan

# Step 2: revise the plan (continue the conversation via previous_interaction_id)
refined = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Add a section comparing power efficiency.",
    agent_config={"type": "deep-research", "collaborative_planning": True},
    previous_interaction_id=plan.id,
    background=True,
)
while (result := client.interactions.get(id=refined.id)).status != "completed":
    time.sleep(5)
print(result.outputs[-1].text)  # revised plan

# Step 3: approve and run (collaborative_planning must be switched to False)
report = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Plan looks good!",
    agent_config={"type": "deep-research", "collaborative_planning": False},
    previous_interaction_id=refined.id,
    background=True,
)
while (result := client.interactions.get(id=report.id)).status != "completed":
    time.sleep(5)
print(result.outputs[-1].text)  # final report
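Because forgetting to flip the flag silently yields another plan instead of a report, it can help to derive the flag from a single "approve" decision rather than setting it by hand in each call. The sketch below is one way to do that; `planning_request` and the `AGENT` constant are hypothetical names, and the keyword arguments simply mirror the ones used in the three steps above.

```python
from typing import Any, Dict, Optional

AGENT = "deep-research-preview-04-2026"


def planning_request(
    text: str,
    approve: bool,
    previous_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for one turn of the planning conversation.

    approve=False keeps the agent in planning mode; approve=True switches
    collaborative_planning off so the agent actually runs the research.
    """
    kwargs: Dict[str, Any] = {
        "agent": AGENT,
        "input": text,
        "agent_config": {
            "type": "deep-research",
            "collaborative_planning": not approve,
        },
        "background": True,
    }
    # Only follow-up turns carry a reference to the earlier interaction
    if previous_id is not None:
        kwargs["previous_interaction_id"] = previous_id
    return kwargs
```

The approval step would then read `client.interactions.create(**planning_request("Plan looks good!", approve=True, previous_id=refined.id))`, which makes the intent explicit at the call site.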

Examples of chart generation and multimodal input. Keep the visualization option on, and state in the prompt exactly which charts you want; doing so tends to give better results.

# Research with charts
interaction = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Analyze global semiconductor market trends. Include charts showing market share changes.",
    agent_config={"type": "deep-research", "visualization": "auto"},
    background=True,
)

# Multimodal research that passes a PDF paper as context
interaction = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input=[
        {"type": "text", "text": "What has been the impact of this research paper?"},
        {"type": "document", "uri": "https://arxiv.org/pdf/1706.03762", "mime_type": "application/pdf"},
    ],
    background=True,
)
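Since generated charts come back as base64-encoded images, the client has to decode them before displaying or storing them. The article does not show how image outputs are laid out in the response, so the sketch below deliberately avoids guessing at that: it takes the base64 string and MIME type as plain arguments. `save_base64_image` and the extension table are hypothetical helpers, built only on the standard library.

```python
import base64
from pathlib import Path

# Minimal MIME-type to file-extension table; unknown types fall back to ".bin"
_EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def save_base64_image(data_b64: str, mime_type: str, stem: str, out_dir: str = ".") -> Path:
    """Decode a base64-encoded image and write it to out_dir/stem.<ext>."""
    # validate=True rejects characters outside the base64 alphabet
    raw = base64.b64decode(data_b64, validate=True)
    path = Path(out_dir) / f"{stem}{_EXTENSIONS.get(mime_type, '.bin')}"
    path.write_bytes(raw)
    return path
```

How the base64 string and MIME type are pulled out of `interaction.outputs` depends on the actual response schema, which should be checked against the official documentation.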

An example that connects an MCP server to give the agent external financial data. allowed_tools can also be used to restrict which tools the agent is permitted to call.

interaction = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Research how recent geopolitical events influenced USD interest rates",
    tools=[
        {
            "type": "mcp_server",
            "name": "Finance Data Provider",
            "url": "https://finance.example.com/mcp",
            "headers": {"Authorization": "Bearer my-token"},
        }
    ],
    background=True,
)
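The example above hard-codes the bearer token and does not show allowed_tools in use. The sketch below builds the same tool entry while reading the token from an environment variable and optionally adding an allow-list. Several things here are assumptions rather than confirmed API details: that `allowed_tools` sits inside the tool entry as a list of tool names, the environment variable name `FINANCE_MCP_TOKEN`, the helper `mcp_server_tool`, and the tool name `get_interest_rates`.

```python
import os
from typing import Any, Dict, List, Optional


def mcp_server_tool(
    name: str,
    url: str,
    token_env: str = "FINANCE_MCP_TOKEN",  # hypothetical variable name
    allowed_tools: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build an MCP server tool entry for the `tools` list."""
    tool: Dict[str, Any] = {"type": "mcp_server", "name": name, "url": url}
    token = os.environ.get(token_env)
    if token:
        # Keep the secret out of source code by reading it at call time
        tool["headers"] = {"Authorization": f"Bearer {token}"}
    if allowed_tools is not None:
        # ASSUMPTION: placement and format of allowed_tools are not confirmed by the article
        tool["allowed_tools"] = list(allowed_tools)
    return tool
```

It would be passed as `tools=[mcp_server_tool("Finance Data Provider", "https://finance.example.com/mcp", allowed_tools=["get_interest_rates"])]`, once the field layout has been verified against the official reference.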

What Sets It Apart

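What stands out is that this is neither simple RAG (retrieval-augmented generation, the technique of retrieving external documents and passing them to an LLM) nor a one-shot question and answer: a single API automates a long-running research workflow of planning, searching, and synthesizing. Collaborative planning in particular reflects a design philosophy in which the agent does the work but a person sets the direction.
Because public web search and private document search can be combined purely through tool configuration, the door is also open to research grounded in internal company materials.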

Implications

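With AI research agents now available at the API level, "deep research" functionality can be integrated directly into applications without a separate agent framework. That said, the asynchronous, polling-based API structure demands a shift in design patterns from developers used to synchronous LLM calls, and how to handle response latency measured in minutes at the UX level is likely to be the central challenge in real-world adoption.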
