Arthur Silva
Portfolio
Ad-hoc vs. LLM-based System for Information Retrieval in Large Tabular Data: A Comparative Study in Public [...]
● Background: Auditing is key when dealing with public expenses. Despite its importance, auditing efforts must often prioritize a few targets due to a lack of human resources. However, supporting the auditing process with a system that automatically processes large documents is feasible.
● Problem: The Information Retrieval (IR) problem considered in this work relies on two components: (i) the text to be searched and (ii) the data source where the required information is expected to be. The first component is not standardized, which presents a challenge to an automated solution. The second component is structured; however, it is available only in a large data source, which may be an obstacle for some automated IR methods. Specifically, given a drug specification, our system must find all available products that match this description in a large data source.
● Solution: This work investigates two different information retrieval solutions. The first approach relies on a priori knowledge of the problem to preprocess the text and compute word similarity. The second approach leverages a powerful LLM to search the same data source.
● IS Theory: Information Processing Theory
● Research Method: Proof of Concept
● Experimental Results: The results show that the proposed ad-hoc method reaches accuracies from 72.4% up to 86.9%, while the LLM-based approach struggles to find satisfactory results, mainly because of its non-deterministic behavior and the hallucination problem.
● Contribution: For industry, the developed system has the potential to significantly improve the quality and scale of auditing processes. For academia, the present work unveils limitations of using LLM-based approaches for searching large structured tabular data (approximately 25,000 rows).
● Keywords: Information Retrieval, Tabular data, Natural Language Processing, LLM, Audit, Public Procurement, Medicine.
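The first (ad-hoc) approach described above, preprocessing the text and then computing word similarity, could look roughly like the sketch below. It is only an illustration: the normalization rules, the use of `difflib.SequenceMatcher` as the similarity measure, the 0.85 threshold, and the sample catalog are all assumptions, not the method or data used in the paper.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip accents, split digits from units, and tokenize.

    These rules are assumed for illustration; the paper's actual
    preprocessing is not described in the abstract.
    """
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    text = re.sub(r"(\d)([a-z])", r"\1 \2", text)  # "500mg" -> "500 mg"
    text = re.sub(r"[^a-z0-9]+", " ", text)        # punctuation -> space
    return text.split()


def token_similarity(query_tokens, candidate_tokens):
    """Average, over query tokens, of the best character-level match
    found among the candidate tokens (a value between 0 and 1)."""
    if not query_tokens or not candidate_tokens:
        return 0.0
    best = [
        max(SequenceMatcher(None, q, c).ratio() for c in candidate_tokens)
        for q in query_tokens
    ]
    return sum(best) / len(best)


def find_matches(spec, catalog, threshold=0.85):
    """Return catalog entries whose similarity to the free-text drug
    specification reaches the (hypothetical) threshold, best first."""
    query = normalize(spec)
    scored = [(token_similarity(query, normalize(item)), item) for item in catalog]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for score, item in scored if score >= threshold]


# Hypothetical catalog rows standing in for the large structured data source.
catalog = [
    "Paracetamol 500 mg comprimido",
    "Ibuprofeno 600 mg comprimido",
    "Dipirona 500 mg/ml solucao",
]
matches = find_matches("PARACETAMOL 500mg comp.", catalog)
```

Because every step is deterministic, the same specification always yields the same matches, which is the property the LLM-based approach is reported to lack.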
IR-MED
● IR-Med - An Ad Hoc Information Retrieval Approach for Medicines’ Purchasing Public Notices.
A Tool for Assisting the Audit of Municipal Public Notices for Medicines Purchasing
● Public health managers publish requirements for the public procurement of medicines, containing lists of pharmaceuticals and their respective unit prices, desired quantities and total purchase values. Auditors need to identify whether the declared pricing is in line with market values. This is a manual and error-prone task.
● This paper proposes a solution to automate the control of public procurement of medicines.
● Given a pharmaceutical and dosage, the proposed tool was able to identify similar products available on the market and thus, based on tax invoice data, assess whether the price quoted in the tender falls within the distribution of market prices.
● The results obtained show that the tool is able to carry out part of the audit for a large number of the items.
● Keywords: audit, public notice, medicines, machine learning.
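The price check described above, deciding whether a quoted price belongs to the market price distribution, can be illustrated with a minimal sketch. The abstract does not say which statistical test the tool uses, so the interquartile-range fence below, the function name `price_within_market`, the parameter `k`, and the sample prices are assumptions chosen for illustration.

```python
import statistics


def price_within_market(quoted_price, invoice_prices, k=1.5):
    """Return True if the quoted unit price lies inside the range
    [Q1 - k*IQR, Q3 + k*IQR] computed from invoice unit prices.

    The IQR fence is an assumed criterion, not the one from the paper.
    """
    if len(invoice_prices) < 4:
        raise ValueError("need at least 4 invoice prices to estimate quartiles")
    q1, _median, q3 = statistics.quantiles(invoice_prices, n=4)
    iqr = q3 - q1
    return (q1 - k * iqr) <= quoted_price <= (q3 + k * iqr)


# Hypothetical unit prices taken from tax invoices for one product.
invoices = [0.10, 0.12, 0.11, 0.13, 0.12, 0.14, 0.11, 0.12]
fair = price_within_market(0.12, invoices)      # price close to the market
suspect = price_within_market(0.45, invoices)   # price far above the market
```

A fence based on quartiles is less sensitive to a few extreme invoices than a mean-and-standard-deviation rule, which is why it is used in this sketch.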