Generative AI Document Summarization

Description

Tagline

Create summaries of a large corpus of documents using Generative AI.

Detailed

This solution shows how to summarize a large corpus of documents using Generative AI. It is an end-to-end demonstration that starts from raw documents, detects the text in them with Cloud Vision Optical Character Recognition (OCR), summarizes them on demand with the Vertex AI LLM APIs, and stores the summaries in BigQuery.

PreDeploy

To deploy this blueprint you must have an active billing account and billing permissions.

Architecture

Document Summarization using Generative AI

  1. The developer follows a tutorial in a Jupyter notebook, running in either Vertex AI Workbench or Colaboratory, and uploads a PDF from it.
  2. The uploaded PDF file is sent to a function running on Cloud Functions. This function handles PDF file processing.
  3. The Cloud Functions function uses Cloud Vision to extract all text from the PDF file.
  4. The Cloud Functions function stores the extracted text inside a Cloud Storage bucket.
  5. The Cloud Functions function uses Vertex AI’s LLM API to summarize the extracted text.
  6. The Cloud Functions function stores the text summaries of PDFs in BigQuery tables.
  7. As an alternative to uploading PDF files through the Jupyter notebook, the developer can upload a PDF file directly to a Cloud Storage bucket, for instance through the Console UI or gcloud.
  8. The direct upload to Cloud Storage causes Eventarc to trigger the Document Processing phase, which is handled by Cloud Functions.
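
The processing flow in steps 2 through 6 can be sketched as a small pipeline. This is a minimal sketch and not the solution's actual Cloud Functions code: the names `SummarizationPipeline`, `PipelineResult`, and `process` are hypothetical, and the four steps are injected as plain callables so the control flow can run locally. In the deployed solution those callables would be backed by Cloud Vision, Cloud Storage, the Vertex AI LLM API, and BigQuery. Here they are in-memory stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PipelineResult:
    """What one processed PDF produces (hypothetical shape)."""
    filename: str
    extracted_text: str
    summary: str


@dataclass
class SummarizationPipeline:
    """Wires the processing steps together; each step is an injected callable."""
    extract_text: Callable[[bytes], str]       # step 3: OCR on the PDF
    store_text: Callable[[str, str], None]     # step 4: save the extracted text
    summarize: Callable[[str], str]            # step 5: LLM summary of the text
    store_summary: Callable[[str, str], None]  # step 6: save the summary

    def process(self, filename: str, pdf_bytes: bytes) -> PipelineResult:
        """Run steps 3 to 6 in order for a single uploaded file."""
        text = self.extract_text(pdf_bytes)
        self.store_text(filename, text)
        summary = self.summarize(text)
        self.store_summary(filename, summary)
        return PipelineResult(filename, text, summary)


# In-memory stand-ins for the Cloud Storage bucket and the BigQuery table.
text_bucket: Dict[str, str] = {}
summary_table: Dict[str, str] = {}

pipeline = SummarizationPipeline(
    extract_text=lambda pdf: pdf.decode("utf-8"),     # pretend OCR
    store_text=text_bucket.__setitem__,
    summarize=lambda text: text.split(".")[0] + ".",  # pretend LLM: first sentence
    store_summary=summary_table.__setitem__,
)

result = pipeline.process("report.pdf", b"First point. Second point.")
print(result.summary)
```

Both upload paths, the notebook upload and the direct Cloud Storage upload through Eventarc, would end in the same `process` call. Only the trigger differs.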

Documentation

Deployment Duration

Configuration: 1 min
Deployment: 10 mins

Cost

Cost Details

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| bucket_name | The name of the bucket to create | string | "genai-webhook" | no |
| gcf_timeout_seconds | GCF execution timeout | number | 900 | no |
| project_id | The Google Cloud project ID to deploy to | string | n/a | yes |
| region | Google Cloud region | string | "us-central1" | no |
| time_to_enable_apis | Wait time to enable APIs in new projects | string | "180s" | no |
| webhook_name | Name of the webhook | string | "webhook" | no |
| webhook_path | Path to the webhook directory | string | "webhook" | no |
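
As an illustration of how these inputs fit together, the following sketch builds a `terraform.tfvars.json` payload from the defaults listed above. Terraform reads variable values from JSON files of this kind. The helper names `build_tfvars` and `to_tfvars_json` and the project ID `my-project-id` are assumptions made for the example, not part of this module.

```python
import json

# Defaults copied from the Inputs table; project_id has no default and is required.
DEFAULTS = {
    "bucket_name": "genai-webhook",
    "gcf_timeout_seconds": 900,
    "region": "us-central1",
    "time_to_enable_apis": "180s",
    "webhook_name": "webhook",
    "webhook_path": "webhook",
}


def build_tfvars(project_id: str, **overrides) -> dict:
    """Merge the required project_id, the defaults, and any overrides."""
    if not project_id:
        raise ValueError("project_id is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input(s): {sorted(unknown)}")
    return {"project_id": project_id, **DEFAULTS, **overrides}


def to_tfvars_json(values: dict) -> str:
    """Serialize the values in the JSON form Terraform accepts for variable files."""
    return json.dumps(values, indent=2, sort_keys=True)


# Hypothetical project ID; only the region is overridden here.
tfvars = build_tfvars("my-project-id", region="europe-west1")
print(to_tfvars_json(tfvars))
```

Writing the printed JSON to a file named `terraform.tfvars.json` in the working directory lets Terraform load it automatically. Any input left out of the overrides keeps its default from the table.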

Outputs

| Name | Description |
|------|-------------|
| genai_doc_summary_colab_url | The URL to launch the notebook tutorial for the Generative AI Document Summarization solution |
| neos_walkthrough_url | The URL to launch the in-console tutorial for the Generative AI Document Summarization solution |
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->

Requirements

These sections describe requirements for using this module.

Software

The following dependencies must be available:

Service Account

A service account with the following roles must be used to provision the resources of this module:

APIs

A project with the following APIs enabled must be used to host the resources of this module:

Contributing

Refer to the contribution guidelines for information on contributing to this module.

Security Disclosures

Please see our security disclosure process.