Research

Research projects and papers across software engineering, security, blockchain, and applied machine learning.

Research Agenda

Trustworthy AI for software engineering. How can AI systems used by developers expose their assumptions, behave more predictably, and remain understandable enough to support real engineering workflows?

Security and reliability of LLM-based systems. I study failure modes created when agentic systems, model hubs, and tool-integrated LLM applications meet unsafe defaults or weak contracts.

Empirical software engineering. Repository mining, developer discussion analysis, and deployment-aware evidence gathering are central to how I frame and validate research questions.

Projects and Papers

An Empirical Study on Remote Code Execution in ML Model Hosting Ecosystems

June 2025 - Oct 2025 · Submitted to TOSEM 2026

Cross-platform study of ~45,000 repositories across five ML platforms (Hugging Face, ModelScope, OpenCSG, OpenMMLab, PyTorch Hub) with co-authors Mohammad Latif Siddiq and Joanna C. Santos. Detected security issues using static analyzers (Bandit, CodeQL, Semgrep) and YARA malware signatures: found CWE-502 (unsafe deserialization) in 74.54% and CWE-95 (eval injection) in 15.02% of affected repositories; 10.41% of Hugging Face repos contain security smells. Analyzed 600+ developer discussions to build a taxonomy of security misconceptions; found 6.6% SafeTensors adoption and heavy trust_remote_code usage. Submitted to TOSEM 2026.

PythonBanditCodeQLSemgrepYARACWE Analysis

The Choice Can Be the Attack: Auditing Aligned Backdoors in LLM Agents

August 2025 - Present · In Progress

Agentic LLM systems are vulnerable to aligned backdoors: triggered behaviors that still satisfy the user's instruction but systematically steer which acceptable option the agent selects. This ongoing work introduces CCA, an endpoint-black-box audit with environment instrumentation, constraint-preserving randomized counterfactual environments, and pooled discrete-choice estimation over eligible options. The current evaluation covers synthetic WebShop stress tests and a real 3B WebShop endpoint case study to measure trigger-dependent steering, identify suspect-only effects, and clarify when counterfactual randomization alone is sufficient versus when feature-controlled choice modeling is necessary.

CCACCIAWebShopCounterfactual AuditingDiscrete Choice Modeling

VeriSchema: Multi-Agent Framework for Generating Relational DB Schema & ERD

July 2025 - Present · Targeting PVLDB

Extended SchemaAgent with Dr. Sukarna Barua (BUET) using LangGraph StateGraph architecture with conditional routing and a three-tier auto-repair system. Designed a six-stage pipeline with specialized agents for entity extraction, relationship mining, and normalization with Z3 formal verification. Implemented component-level retry logic with violation analysis, reducing redundant LLM calls by 80%.

Collaborating with Dr. Sukarna Barua, Assistant Professor at BUET, who specializes in software engineering, data science, and machine learning applications.

PythonLangGraphStateGraphZ3 SolverSQLAlchemyText2Schema

Sentiment Analysis of Anonymous Crisis Reports in Bangladesh

Sep 2024 - Nov 2024

Developed uReporter, Bangladesh's first anonymous reporting system during the 2024 national crisis. Analyzed 124 crowd-sourced reports using six transformer models with a multilingual NLP pipeline for Bengali and Romanized Bengali. Demonstrated how anonymous crowd-sourcing can help understand socio-political dynamics in the Global South.

Collaborated with the uReporter team at BUET. The project has received coverage from international media including BBC Bengali, Saudi Gazette, and Global Voices.

PythonBERTXLM-RoBERTaTransformersBengali NLPFlask

Patient-Centric Blockchain Framework for EHR Management

June 2022 - May 2023

Undergraduate Thesis. Designed a blockchain framework with encrypted off-chain IPFS storage and on-chain Ethereum access control under Professor ASM Latiful Hoque (BUET). Implemented ERC-721 based patient records with AES-GCM encryption, ECIES key wrapping, and EIP-712 signed permissions. Evaluated performance and security on 10,000 synthetic patient records to assess scalability and privacy.

Supervised by Professor ASM Latiful Hoque from BUET (my undergraduate thesis supervisor), who specializes in data warehousing, data mining, big data analytics, and database technologies.

SolidityEthereumIPFSReactWeb3.jsAES-GCMECIES