1. Certificate Data Extraction.
Vision: Automating certificate data extraction and classification.
Mission:
● Developed OCR Data Extraction Solution: Led the design and development of an automated solution using Tesseract OCR to extract key certificate details such as Certificate Number, Provider, and Validity Dates.
● Data Cleaning and Structuring: Cleaned and processed raw OCR output into structured fields such as Certificate Name, Address, and Description, and stored the results as JSON documents in MongoDB.
● Machine Learning Model Integration: Trained a machine learning model to classify certificate names accurately based on the extracted data, improving data management efficiency.
● Deployed in Docker Containers: Deployed the solution in Docker containers, ensuring scalability, portability, and efficient execution of asynchronous tasks.
● Streamlined Data Management: Improved the process of certificate data extraction and classification, reducing manual effort and improving data retrieval accuracy.
Tech Stack: Python, FastAPI, MongoDB, OCR (Tesseract), Machine Learning, Docker, Linux.
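Illustrative sketch: the field extraction step described above might look roughly like the following. This is a minimal example under stated assumptions, not the project's actual code. The field patterns, the sample text, and the `extract_fields` helper are hypothetical, and the OCR text is hard-coded here where in practice it would come from Tesseract.

```python
import json
import re
from typing import Optional

# Hypothetical patterns for illustration only; real certificate layouts
# vary by provider and would need their own rules.
FIELD_PATTERNS = {
    "certificate_number": re.compile(
        r"Certificate\s*(?:Number|No\.?)\s*[:\-]?\s*([A-Z0-9\-/]+)", re.IGNORECASE
    ),
    "provider": re.compile(r"(?:Issued\s*by|Provider)\s*[:\-]?\s*(.+)", re.IGNORECASE),
    "valid_from": re.compile(r"Valid\s*from\s*[:\-]?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "valid_until": re.compile(r"Valid\s*(?:until|to)\s*[:\-]?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
}


def extract_fields(ocr_text: str) -> dict[str, Optional[str]]:
    """Clean raw OCR text and pull out the fields named in FIELD_PATTERNS."""
    # Collapse runs of whitespace and drop empty lines left by the OCR pass.
    cleaned = "\n".join(
        " ".join(line.split()) for line in ocr_text.splitlines() if line.strip()
    )
    record: dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(cleaned)
        # Missing fields are kept as None so every record has the same keys.
        record[name] = match.group(1).strip() if match else None
    return record


# Assumed sample of OCR output; not taken from a real certificate.
SAMPLE_OCR_TEXT = """
  Certificate No:   ISO-9001/2023-0042
  Issued by:  Example Certification Body
  Valid from: 2023-01-15
  Valid until: 2026-01-14
"""

record = extract_fields(SAMPLE_OCR_TEXT)
# The resulting dict is JSON-serializable, so it could be stored as a document.
print(json.dumps(record, indent=2))
```

Keeping unmatched fields as `None` rather than dropping them is one possible design choice; it makes incomplete extractions easy to query later.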
2. Pragya360 (Backend).
Vision: Centralized nugget data management system.
Mission:
● Developed Pragya360: Led the design and development of Pragya360, a centralized web application to streamline the management of nugget information across multiple platforms.
● Implemented Verification Workflow: Designed and implemented a verification system in which nugget requests created by end users must be approved before they become searchable, so that only verified information is accessible.
● Enhanced Search Functionality: Integrated NLP-based search capabilities to enable efficient searching across text, improving nugget discoverability.
● System Integration: Successfully integrated the application with existing document management systems, centralizing nugget data and enhancing collaboration across platforms.
● Introduced Feedback Mechanism: Developed a feedback system to gather user input for continuous improvement, so that the platform evolves with user needs.
Tech Stack: Python, FastAPI, MongoDB, Elasticsearch, SQL, Docker, Linux.
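Illustrative sketch: the approve-before-searchable workflow described above could be modeled along these lines. This is a simplified in-memory example under stated assumptions, not the Pragya360 implementation. The `NuggetStore` and `NuggetRequest` names and the three statuses are hypothetical, and the plain substring match stands in for the NLP-based search.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class NuggetRequest:
    request_id: int
    title: str
    body: str
    status: Status = Status.PENDING


class NuggetStore:
    """Hypothetical in-memory store; a real system would use a database."""

    def __init__(self) -> None:
        self._requests: dict[int, NuggetRequest] = {}
        self._next_id = 1

    def submit(self, title: str, body: str) -> NuggetRequest:
        # Every new request starts as PENDING and is not yet searchable.
        request = NuggetRequest(self._next_id, title, body)
        self._requests[request.request_id] = request
        self._next_id += 1
        return request

    def review(self, request_id: int, approve: bool) -> NuggetRequest:
        request = self._requests[request_id]
        if request.status is not Status.PENDING:
            raise ValueError(f"request {request_id} was already reviewed")
        request.status = Status.APPROVED if approve else Status.REJECTED
        return request

    def search(self, term: str) -> list[NuggetRequest]:
        # Only approved nuggets are visible; the substring match is a
        # placeholder for a real search backend.
        needle = term.lower()
        return [
            r
            for r in self._requests.values()
            if r.status is Status.APPROVED
            and (needle in r.title.lower() or needle in r.body.lower())
        ]


store = NuggetStore()
first = store.submit("Docker tips", "Use multi-stage builds")
print(len(store.search("docker")))  # pending requests are not searchable
store.review(first.request_id, approve=True)
print([r.title for r in store.search("docker")])
```

Filtering on status inside `search` keeps the rule in one place, so unreviewed or rejected requests never reach search results.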