Skip to content
@ArchAIve-Project

ArchAIve

An AI-powered artefact digitisation platform for the preservation and proliferation of heritage and culture. Developed for SCCCI.

The ArchAIve Project

Welcome to the ArchAIve GitHub organisation - this digital space withholds source code repositories for the ArchAIve project. 🏛️

ArchAIve is an AI-powered artefact digitisation platform for the preservation and proliferation of heritage and culture. It has been designed and developed by the ArchAIve team in response to a real problem statement by the Singapore Chinese Chamber of Commerce and Industry (SCCCI). The project was developed as part of the IT3100 AI Applications Project module in the NYP Diploma in IT course. 📖📚

Skip to a section:

Introduction

Screenshot of ArchAIve Website Homepage

ArchAIve is an AI-powered artefact digitisation platform for the preservation and proliferation of heritage and culture.

With custom-built and highly specialised AI models and ML pipelines, the platform enables the use of AI to automate interpretation and documentation of historical artefacts through a streamlined experience.

Since it was the team's first time developing a solution for a real-world client, we poured in much effort into making a robust, powerful, and efficient system design. The result is truly remarkable - ArchAIve is a project bigger than any we've done before.

The ArchAIve team comprises of:

  • Prakhar Trivedi (Team Lead, Image Captioning, Automatic Categorisation, User Management)
  • Joon Jun Han (Traditional Chinese OCR, AI Transcription Processing, Catalogue Browsing)
  • Toh Zheng Yu (Image Classification, Archivus Chatbot, Data Import & Processing)

The Problem

We picked Problem Statement #19 of the SCCCI x NYP collaboration. After examining problem descriptions and conversations with the company, we surmise the problem statement as follows:

How might we use Artificial Intelligence and software technologies to efficiently and effectively recognise, understand, preserve and catalog Chinese historical artefacts digitally?

Several thousands of historical meeting minutes written in Chinese calligraphy have been scanned with no index and no intuitive digital mapping of the information.

Additionally, photographs of many eras and key moments in history are available, but the tedium of manual recognition of faces and parties present makes it really difficult and time-consuming to understand a larger timeline.

This is the bipolar nature of the target data ArchAIve aims to digitise.

The following key pain points can be identified:

  1. Difficulty in interpreting Traditional Chinese calligraphy writing in old meeting minutes
  2. Tedium of converting handwritten text into digital form
  3. Challenges in recognising individuals in photographs
  4. Absence of a unified platform for digital artefact management

Our Solution

Our solution is what we endearingly call ArchAIve, a highly intelligent artefact digitisation platform. ArchAIve is a platform that maintains a nuanced interplay of AI, data, and users – several complex models, pipelines and processing is condensed and streamlined into a densely-packed user experience, offering simplicity while having no compromise on performance.

Image Gallery

Login

Login Page

It all starts here. Since ArchAIve is meant for SCCCI staff, only authorised people with valid accounts (provisioned by an admin) can login and access the suite of features. Security is robust throughout the user experience, with any privilege violation resulting in a safe redirect to the homepage.


Data Processing

Data Processing

Data processing page

Artefact Upload

Artefact upload modal

The typical user flow involves navigating to the Data Processing page through the sidebar. Here, artefact images are grouped up into "batches"; each batch has a distinct stage that it is in. This structured format makes the data importing procedure much smoother and intuitive.

Batches can be in one of the following stages:

  • Pending: Artefact images (be it handwritten traditional Chinese calligraphy meeting minute scans or event photos) can still be uploaded to a batch in this stage. No processing has been carried out.
  • Processing: A batch has been "confirmed" (no more artefact images can be uploaded) and a highly-optimised background workflow comprising of a densely-orchestrated pipeline of specialised ML models and LLM calls is underway. Artefacts in a batch are processed through the AI pipeline simultaneously, resulting in unbelievably fast processing performance.
  • Vetting: In this stage, all artefacts have been completely processed. Based on AI interpretations of image data, freshly generated metadata is associated with each artefact. Putting into practice a human-in-the-loop approach, ArchAIve now invites users to look through each artefact's metadata and make any corrections if necessary. Though artefact metadata can be edited later on, this stage ensures that a batch is rich with accurate and comprehensive metadata information, resulting in better AI-based catalogue integration performance in a later stage.
  • Integration: In this background workflow, an LLM automatically categorises event photo artefacts into appropriate categories based on their metadata information, resulting in a highly-organised catalogue. The LLM has the ability to associate event photo artefacts with existing categories, revise an existing category's name or description, or propose a new category entirely if it is promising that more artefacts would be associated down the road. Meeting minute artefacts are not involved in this stage, as they are sequential artefacts that need to be manually added to "books".
  • Completion: At this stage, all artefacts in the batch have been integrated into the catalogue successfully. Research and reference efforts can now be carried out smoothly.

It took our team quite some time to design this intricate framework for data importing. Since it is one of the most critical parts of the system, we designed the process to be as frictionless and intuitive as possible.


Data Studio

MM Data Studio

Clear, intuitive meeting minute or event photo artefact editing in the Data Studio. Easy navigation with left and right arrows.

Metadata information generated:

  • Meeting minute artefacts
    • Traditional Chinese Transcription: Custom OCR pipeline process and model. Refined with a Visual Language model.
    • Simplified Chinese Transcription: Translation through an LLM.
    • English Transcription: Translation through an LLM.
    • Summary: Based on English translation with an LLM.
    • Key Entity Detection: NER carried out with NLTK on English translation. Extracted entities speed up reference and indexing.
  • Event photo artefacts
    • Caption: Custom image captioning model fine-tuned on SCCCI data. Refined with a Visual Language model.
    • Key Figure Headshots: Face recognition system detects faces, matches against existing face data or creates if face has not been seen before. Figure is represented by a cropped headshot crop of the original artefact image.

Group View

Group View

A generic group view, allowing users to easily view books/categories/batches.

Manage Associations

Intuitively manage any artefact's associations with a Spotify-esque modal with simple toggle switches.

Group Creation

Simple, one-click book/category creation in the Catalogue Browser.

A completed batch, book, or category can be viewed in the Group View. View all artefacts in an organised, grid-like layout. Carry out research and reference, edit group details, and manage artefact associations all in one place.

Groups (specifically books/categories) can be created through a modal in the Catalogue Browser. Batch group view can be accessed by clicking on the batch name in Data Processing (applicable for batches in specific stages only).


Catalogue Browser

Catalogue Browser

The epicentre of the entire ArchAIve experience: Catalogue Browser.

Event Photo Artefact Details

Viewing of event photo artefacts.

Meeting Minute Artefact Details

Viewing of meeting minute artefacts.

A Netflix-esque Catalogue Browser serves as the main interface for users to easily access the entire stored catalogue. Meeting minutes are encapsulated within books within the first section of the page. Categories follow vertically, with the name of every category presented neatly. Artefact preview cards have a satisfying hover effect, with smart and natural navigation through arrows on either sides.

Clicking an artefact preview card opens up the artefact detail popover. This seamless experience can be controlled intuitively through arrow keys, with natural navigation along a book/category's artefact sequence. Data updates dynamically, providing an extremely smooth and comfortable research and reference experience for SCCCI users.


Archivus

Archivus Chatbot

Engage with digitised artefacts into a wholly unique and innovative manner with Archivus. Powered by LLMs, Archivus is a chatbot that has been supplied with metadata information for any artefact. Ask questions related to the artefact, engage in roleplay, and carry out research and reference efforts in an engaging, exciting manner thanks to AI.

Archivus is intuitvely placed in the artefact detail popover in Catalogue Browser. Chats are conversational, contextual and ephemeral.


Key Figures

Key Figures

A key figures page consolidates all faces detected by ArchAIve's face recognition system across the catalogue in one place. Users are able to easily rename/delete figures here too.


Account Management

User Management

Grid-like format with each user card shown with profile picture, name and role.

Superuser Account Management

Superuser-view of a regular account. Account activity (logged timely when actions are carried out by the user), profile information and critical actions are available.

A superuser (a pre-determined account with special admin privileges) can view, manage, and create user accounts through the User Management page in the Admin Console conveniently.

Superusers are able to invalidate login sessions, delete profile pictures (moderation), update account information, reset user passwords, and delete accounts entirely all in one place. Superuser actions are also logged as account activity to minimise confusion for the target user.

Architecture

Tech stack diagram:

Architecture Diagram White BG

Workflow Diagram:

ArchAIve System Overview

Final Thoughts

Our journey in developing ArchAIve was everything but stable. Change was the only constant. Updates from stakeholders and demanding deliverables plagued our journey. The unpredictability of training and refining AI models was frustrating. We iterated constantly.

We are proud to have persevered through many setbacks by staying steadfastly focused on our vision and goals. We absolutely love the end result, and we hope you loved it too. We hope you have a newfound interest in the enablement of heritage and culture preservation with technology.


Dive deeper into the nuanced business logic or explore the smooth user experience! Thank you for reading about ArchAIve! 🔥🤩

©️ 2025 The ArchAIve Team. All rights reserved.

Pinned Loading

  1. Frontend Frontend Public

    An intuitive, pleasant interface to interacting with the backend system, thanks to consistent design practices and one-click wonders.

    JavaScript 1

  2. Backend Backend Public

    A complex Flask API system empowered by custom ML models, LLMs and processing to facilitate artefact digitisation.

    Python 1

Repositories

Showing 3 of 3 repositories

People

This organization has no public members. You must be a member to see who’s a part of this organization.

Top languages

Loading…

Most used topics

Loading…