[Audio] MCAP SMART SCAN – a solution designed to process and validate documents.
[Audio] The MCAP OPS team currently verifies Proof of Engagement (POE) documents manually, which is time-consuming. The team frequently encounters errors caused by duplicate documents being uploaded: partners often enter the same details for different customers attending a particular event. As a result, the operations team can end up approving the same document multiple times. The manual nature of the verification also increases the risk of human error, which can lead to inaccuracies and delays in the overall process. The MCAP OPS team therefore needs an AI-driven solution that streamlines the verification of POE documents by detecting duplicate information among partner-uploaded documents.
[Audio] Solution Overview: The solution leverages Azure OpenAI with GPT-4. Here's how it works:
1. Document Extraction: When a partner uploads a document, it is first processed using a PDF library to extract its content into a string format.
2. Prompt Processing with GPT-4: The extracted text is then passed to Azure OpenAI GPT-4. We have configured GPT-4 to accept specific prompts, allowing it to analyze the document and extract relevant data. The output is structured information, such as the questions and answers extracted from the POE document.
3. Knowledge Base Integration: If the uploaded document is new (that is, it is being processed for the first time), the extracted data (questions, answers, and document information) is stored in Azure Search. Azure Search acts as a knowledge base for all processed documents. The information is stored in a vector database within Azure.
4. Duplicate Detection: Each newly uploaded document is cross-checked against the knowledge base using the stored vector data. The system compares the new document's questions and answers with the previously stored ones to identify similarities. If a match is found, the solution flags the document as a potential duplicate.
5. Similarity Scoring: To determine whether a document is a duplicate, a similarity score is calculated. In the current implementation, any document with a similarity score greater than 5 is flagged as a duplicate. This threshold is configurable and can be adjusted based on requirements. If the similarity score is below 5, the document is treated as unique.
6. Feedback: Once a duplicate is detected, the system provides a detailed response to GPT-4, including a summary of similar files, their content, and the specific questions and answers that overlap.
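The duplicate check in steps 3 to 5 can be sketched in a few lines of Python. This is an illustration only, not the team's implementation: the in-memory `KnowledgeBase` stands in for Azure Search, and `similarity_score` is a toy word-overlap measure on an assumed 0 to 10 scale, since the talk does not say how the real score is computed or what its range is. All class and function names here are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Configurable threshold; the talk flags scores greater than 5 as duplicates.
DUPLICATE_THRESHOLD = 5.0


@dataclass
class QAPair:
    question: str
    answer: str


@dataclass
class Document:
    name: str
    qa_pairs: list[QAPair]


def _tokens(doc: Document) -> set[str]:
    """Collect the lowercase words from every question and answer."""
    words: set[str] = set()
    for pair in doc.qa_pairs:
        words.update(f"{pair.question} {pair.answer}".lower().split())
    return words


def similarity_score(a: Document, b: Document) -> float:
    """Toy stand-in score: Jaccard word overlap scaled to 0-10 (assumed scale)."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return 10.0 * len(ta & tb) / len(ta | tb)


@dataclass
class KnowledgeBase:
    """In-memory stand-in for the Azure Search knowledge base."""

    documents: list[Document] = field(default_factory=list)

    def process_upload(self, doc: Document,
                       threshold: float = DUPLICATE_THRESHOLD) -> dict:
        # Compare the upload against every stored document.
        matches = []
        for stored in self.documents:
            score = similarity_score(doc, stored)
            if score > threshold:
                matches.append((stored.name, score))
        if matches:
            # Duplicate: report the similar files and do not store the upload.
            return {"duplicate": True, "similar_files": matches}
        # Unique: add it to the knowledge base for future checks.
        self.documents.append(doc)
        return {"duplicate": False, "similar_files": []}
```

Calling `process_upload` twice with the same questions and answers under different file names stores the first document and flags the second, which mirrors the behaviour described for the two demo screens. Passing a different `threshold` shows how the cut-off could be tuned.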
[Audio] This is what the proposed system looks like. Users can select and upload a document in PDF format. This example shows the summary after the upload and processing are complete. No duplicate or similar documents were found in the knowledge base, so the uploaded document is treated as unique and added to the knowledge base for future reference. The "Summary" section confirms that no matching reference files or similar questions were identified.
[Audio] This screen shows the scenario where a duplicate or similar document is detected upon upload. The "Similar Questions" section on the right identifies overlapping or duplicate content from previously uploaded documents. Based on the analysis, the system concludes that the uploaded file contains duplicate or similar information to what already exists in the knowledge base. Hence, the file is not added to the knowledge base, as it contributes no new, unique data.
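As a rough illustration of how a list of overlapping questions like the one on this screen could be assembled, the sketch below compares each question-and-answer pair from the upload with each stored pair. It is not the actual implementation: the function name `similar_questions`, the `min_ratio` cut-off of 0.8, and the use of Python's standard `difflib.SequenceMatcher` in place of vector search are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def similar_questions(uploaded, stored, min_ratio=0.8):
    """Return overlapping Q&A pairs between two documents.

    `uploaded` and `stored` are lists of (question, answer) tuples.
    `min_ratio` is a hypothetical cut-off on difflib's 0-1 similarity ratio.
    """
    overlaps = []
    for q_new, a_new in uploaded:
        text_new = f"{q_new} {a_new}".lower()
        for q_old, a_old in stored:
            text_old = f"{q_old} {a_old}".lower()
            # Character-level similarity of the combined question and answer.
            ratio = SequenceMatcher(None, text_new, text_old).ratio()
            if ratio >= min_ratio:
                overlaps.append({
                    "question": q_new,
                    "answer": a_new,
                    "matched_question": q_old,
                    "matched_answer": a_old,
                    "ratio": ratio,
                })
    return overlaps
```

An empty result would correspond to the "no similar questions" summary on the previous screen; a non-empty one gives the entries to display alongside the names of the matching files.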
[Audio] Feature Expansion: Planned expansions include support for extracting data from various document types using Azure Document Intelligence and other tools. The system will also handle multiple templates with predefined question prompts via Azure OpenAI Assistant for accurate document processing. Additionally, a backend process will upload historical documents to a vector store for improved search and similarity matching.