Registration: 30.07.2025

Arthur Silva

Specialization: Data Scientist
● I hold a bachelor’s degree in Computer Science and have 2.5 years of experience in data analysis.
● My professional experience includes 1.5 years at Oi S/A, where I supported the supervision of ETL processes in a complex production environment.
● I also spent one year as a data mining intern at the Court of Accounts of the State of Pernambuco (TCE-PE), contributing to the development of a tool for the automated validation of pricing in public procurement notices for medicines, using Python, SQL, and natural language processing techniques.
● I am currently expanding my skill set by studying Power BI and AWS.
● Additionally, I participated in a five-month academic exchange program in the United States, supported by an initiative of the state government of Pernambuco, Brazil.

Portfolio

Ad-hoc vs. LLM-based System for Information Retrieval in Large Tabular Data: A Comparative Study in Public [...]

● Background: Auditing is key when dealing with public expenses. Despite its importance, auditing efforts must often prioritize a few targets because of limited human resources. However, scaling up the auditing process with a system that automatically processes large documents is feasible.
● Problem: The Information Retrieval (IR) problem considered in this work relies on two components: (i) the text to be searched and (ii) the data source where the required information is expected to be. The first component is not standardized, which is a challenge for an automated solution. The second component is structured; however, it sits in a large data source, which may be an obstacle for some automated IR methods. Specifically, given a drug specification, our system must find all available products that match this description in a large data source.
● Solution: This work investigates two different information retrieval solutions. The first approach relies on a priori knowledge of the problem to preprocess the text and compute word similarity. The second approach leverages a powerful LLM to search the same data source.
● IS Theory: Information Processing Theory
● Research Method: Proof of Concept
● Experimental Results: The results show that the proposed ad hoc method reaches accuracies from 72.4% up to 86.9%, while the LLM-based approach struggles to produce satisfactory results, mainly because of its non-deterministic behavior and hallucinations.
● Contribution: For industry, the developed system has the potential to significantly improve the quality and scale of auditing processes. For academia, the present work reveals limitations of LLM-based approaches for searching large structured tabular data (approximately 25,000 rows).
● Keywords: Information Retrieval, Tabular data, Natural Language Processing, LLM, Audit, Public Procurement, Medicine.
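
The ad hoc approach summarized above preprocesses text and computes word similarity. The study's actual algorithm is not given here, so the following is only a minimal sketch under stated assumptions: it normalizes text by lowercasing and stripping accents, then scores products with Jaccard token overlap. The function names, the 0.5 threshold, and the sample product strings are hypothetical.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents, and return the set of alphanumeric tokens."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    return set(re.findall(r"[a-z0-9]+", ascii_text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_matches(specification, products, threshold=0.5):
    """Return products whose token overlap with the specification meets
    the threshold, best match first. The threshold is an assumed value."""
    spec_tokens = normalize(specification)
    scored = [(jaccard(spec_tokens, normalize(p)), p) for p in products]
    return [p for score, p in sorted(scored, reverse=True) if score >= threshold]


# Hypothetical catalog rows, for illustration only.
products = [
    "Dipirona sódica 500 mg comprimido",
    "Paracetamol 750 mg comprimido",
    "Dipirona 500mg/ml solução oral",
]
matches = find_matches("dipirona sodica 500 mg comprimido", products)
```

Token overlap is deterministic and cheap to run over tens of thousands of rows, which is the property the abstract contrasts with the LLM-based approach.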

IR-MED

● IR-Med - An Ad Hoc Information Retrieval Approach for Medicines’ Purchasing Public Notices.

A Tool for Assisting the Audit of Municipal Public Notices for Medicines Purchasing

● Public health managers publish procurement requirements for medicines, listing the pharmaceuticals with their unit prices, desired quantities, and total purchase values. Auditors need to verify whether the declared prices are in line with market values. This task is manual and error-prone.
● This paper proposes a solution to automate the control of public procurement of medicines.
● Given a pharmaceutical and its dosage, the proposed tool was able to identify similar products available on the market and then, based on tax invoice data, assess whether the price quoted in the tender falls within the distribution of market prices.
● The results show that the tool is capable of carrying out part of the audit for a large number of the items.
● Keywords: audit, public notice, medicines, machine learning.
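
The price check described above can be illustrated with a small sketch. The paper's actual criterion is not stated here, so this example assumes a common interquartile-range (IQR) rule: a quoted price counts as consistent with the market if it lies within 1.5 IQRs of the quartiles of the invoice prices. The function name, the multiplier `k`, and the sample prices are hypothetical.

```python
from statistics import quantiles


def price_within_market(quoted_price, invoice_prices, k=1.5):
    """Return True if quoted_price lies inside [Q1 - k*IQR, Q3 + k*IQR],
    where the quartiles come from observed invoice prices.
    The IQR rule and k=1.5 are assumptions, not the paper's stated method."""
    q1, _median, q3 = quantiles(invoice_prices, n=4)
    iqr = q3 - q1
    return (q1 - k * iqr) <= quoted_price <= (q3 + k * iqr)


# Hypothetical unit prices taken from tax invoices for one product.
invoice_prices = [0.10, 0.12, 0.11, 0.13, 0.12, 0.11, 0.14, 0.10]

typical_quote_ok = price_within_market(0.12, invoice_prices)
inflated_quote_ok = price_within_market(0.50, invoice_prices)
```

A quartile-based rule is less sensitive to a few extreme invoices than a mean and standard deviation check, which is why it is used in this sketch.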

Skills

Python
Machine Learning
Data Mining
ETL
Matplotlib
NumPy
Pandas
SQL
JavaScript
Java
C
C#
R
Artificial Intelligence
Genetic Algorithms
LangChain
LLM
NLP
Neural Networks
Reinforcement Learning
Scikit-Learn
Google Sheets
Microsoft Excel
Power BI
Text Mining
Neo4j
AWS
Docker
Git
GitHub
HTML/CSS
Statistics

Work experience

Data Scientist
05.2023 - 03.2024 |Tribunal de Contas do Estado de Pernambuco
Data Mining, Git, GitHub, Google Sheets, Microsoft Excel, LangChain, LLM, Machine Learning, Matplotlib, NLP, NumPy, Pandas, Python, Scikit-Learn, SQL, Statistics, Text Mining
● Contributed to a project focused on analyzing electronic invoices for medicine sales to evaluate pricing practices in public procurement notices.
● Developed Python scripts for data extraction, transformation, and analysis, including a custom algorithm for the automatic identification of active ingredients mentioned in the notices.
● The workflows were integrated with a PostgreSQL database for structured data storage and querying.
Digital Relationship Analyst
03.2022 - 06.2022 |Oi S/A
ETL, Google Sheets, Microsoft Excel, SQL, PostgreSQL, MariaDB
● Assisted in supervising the execution of daily, weekly, and monthly ETL processes in a complex production environment involving multiple integrated SQL databases (PostgreSQL, MariaDB, and others) and additional sources such as Hadoop.
● Provided support in error detection, troubleshooting, and responding to inquiries related to the production environment.
● Verified that prerequisites were met for creating and updating projects within the production environment.
Digital Resident
12.2020 - 01.2022 |Oi S/A
ETL, Google Sheets, Microsoft Excel, SQL
● Digital Resident is a position at Oi S/A, a Brazilian telecom company, established through a partnership with the high school I attended.
● This one-year trainee program gives residents the opportunity to learn about the company and the specific sector they will be working in.

Educational background

Computer Science (Bachelor’s Degree)
2019 - 2025
Universidade Federal Rural de Pernambuco

Languages

English: Advanced
Portuguese: Native