Research

Research projects and papers across software engineering, security, blockchain, and applied machine learning.

Research Agenda

Trustworthy AI for software engineering. How can AI systems used by developers expose their assumptions, behave more predictably, and remain understandable enough to support real engineering workflows?

Security and reliability of LLM-based systems. I study failure modes created when agentic systems, model hubs, and tool-integrated LLM applications meet unsafe defaults or weak contracts.

Empirical software engineering. Repository mining, developer discussion analysis, and deployment-aware evidence gathering are central to how I frame and validate research questions.

Projects and Papers

An Empirical Study on Remote Code Execution in ML Model Hosting Ecosystems

Cross-platform study, with co-authors Mohammad Latif Siddiq and Joanna C. Santos, of ~45,000 repositories across five ML model hosting platforms (Hugging Face, ModelScope, OpenCSG, OpenMMLab, PyTorch Hub). Detected security issues with static analyzers (Bandit, CodeQL, Semgrep) and YARA malware signatures. Among affected repositories, 74.54% contain CWE-502 (deserialization of untrusted data) and 15.02% contain CWE-95 (eval injection); 10.41% of Hugging Face repositories contain security smells. Analyzed 600+ developer discussions to build a taxonomy of security misconceptions. Also found 6.6% SafeTensors adoption and heavy use of trust_remote_code. Submitted to TOSEM 2026.

Python · Bandit · CodeQL · Semgrep · YARA · CWE Analysis
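
Static detectors of this kind work by matching call patterns in source code. Below is a minimal sketch of the idea that uses only Python's standard `ast` module. It assumes a small, hypothetical rule set (`pickle.load`/`pickle.loads`, `torch.load`, `eval`, `exec`, and the keyword `trust_remote_code=True`) and does not reproduce the Bandit, CodeQL, Semgrep, or YARA rules used in the study. `RISKY_CALLS`, `Finding`, and `find_smells` are illustrative names.

```python
import ast
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy rule table: dotted call name -> label.
# The study itself relied on Bandit, CodeQL, Semgrep, and YARA, not these rules.
RISKY_CALLS = {
    "pickle.load": "CWE-502",
    "pickle.loads": "CWE-502",
    "torch.load": "CWE-502",
    "eval": "CWE-95",
    "exec": "CWE-95",
}


@dataclass
class Finding:
    line: int
    call: str
    label: str


def dotted_name(node: ast.expr) -> Optional[str]:
    """Turn a Name/Attribute chain such as torch.load into 'torch.load'."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. calls on subscripts or on other call results


def find_smells(source: str) -> List[Finding]:
    """Flag risky calls by parsing (never executing) the source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = dotted_name(node.func)
        if name in RISKY_CALLS:
            findings.append(Finding(node.lineno, name, RISKY_CALLS[name]))
        # trust_remote_code=True lets a loader run code shipped with the repository.
        for kw in node.keywords:
            if (kw.arg == "trust_remote_code"
                    and isinstance(kw.value, ast.Constant)
                    and kw.value.value is True):
                findings.append(Finding(node.lineno, name or "<call>",
                                        "trust_remote_code=True"))
    return sorted(findings, key=lambda f: f.line)


sample = '''
import pickle, torch
from transformers import AutoModel
state = torch.load("model.bin")
obj = pickle.loads(blob)
model = AutoModel.from_pretrained("org/repo", trust_remote_code=True)
'''

for f in find_smells(sample):
    print(f.line, f.call, f.label)
```

On `sample`, the sketch reports lines 4, 5, and 6. The toy rules are deliberately naive: they miss aliased imports such as `from pickle import loads`, and they flag every `torch.load` call even when `weights_only=True` is passed, which is why production analyzers track imports and data flow.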

The Choice Can Be the Attack: Auditing Aligned Backdoors in LLM Agents

Agentic LLM systems are vulnerable to aligned backdoors: triggered behaviors that still satisfy the user's instruction but systematically steer which acceptable option the agent selects. This ongoing work introduces CCA, an audit that treats the agent endpoint as a black box and combines environment instrumentation, constraint-preserving randomized counterfactual environments, and pooled discrete-choice estimation over the eligible options. The current evaluation covers synthetic WebShop stress tests and a case study on a real 3B WebShop endpoint. It measures trigger-dependent steering, identifies effects that appear only in the suspect agent, and clarifies when counterfactual randomization alone suffices and when feature-controlled choice modeling is necessary.

CCA · CCIA · WebShop · Counterfactual Auditing · Discrete Choice Modeling
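
Pooled discrete-choice estimation can be illustrated with a standard conditional (McFadden) logit. The sketch below is a toy under stated assumptions, not CCA's estimator. Each episode offers several eligible options, each described by one generic feature `x` and an indicator that is 1 only when the trigger is present and the option carries a hypothetical target attribute. Randomizing which option carries that attribute, and drawing features independently per episode, loosely mimics randomized counterfactual environments. The coefficient `gamma` on the indicator then estimates trigger-dependent steering while holding `x` fixed. `simulate` and `fit_conditional_logit` are illustrative names; the fit uses Newton's method with step halving.

```python
import math
import random
from typing import List, Tuple

# One episode = (options, chosen_index). Each option is z = (x, trigger*target):
# x is a generic option feature; the second entry is 1.0 only if the trigger is
# present AND the option carries the (hypothetical) target attribute.
Episode = Tuple[List[Tuple[float, float]], int]


def _utils(theta, options):
    return [theta[0] * z[0] + theta[1] * z[1] for z in options]


def log_lik(theta, episodes: List[Episode]) -> float:
    """Conditional-logit log-likelihood, computed with log-sum-exp for stability."""
    total = 0.0
    for options, chosen in episodes:
        u = _utils(theta, options)
        m = max(u)
        total += u[chosen] - m - math.log(sum(math.exp(v - m) for v in u))
    return total


def grad_hess(theta, episodes: List[Episode]):
    """Gradient and Hessian of the log-likelihood with respect to (beta, gamma)."""
    g = [0.0, 0.0]
    h = [[0.0, 0.0], [0.0, 0.0]]
    for options, chosen in episodes:
        u = _utils(theta, options)
        m = max(u)
        w = [math.exp(v - m) for v in u]
        s = sum(w)
        p = [wi / s for wi in w]
        zbar = [sum(pj * z[r] for pj, z in zip(p, options)) for r in range(2)]
        for r in range(2):
            g[r] += options[chosen][r] - zbar[r]
            for c in range(2):
                ezz = sum(pj * z[r] * z[c] for pj, z in zip(p, options))
                h[r][c] -= ezz - zbar[r] * zbar[c]
    return g, h


def fit_conditional_logit(episodes: List[Episode], iters=50, tol=1e-6):
    """Maximum-likelihood fit by Newton's method with step halving."""
    theta = [0.0, 0.0]
    ll = log_lik(theta, episodes)
    for _ in range(iters):
        g, h = grad_hess(theta, episodes)
        if max(abs(g[0]), abs(g[1])) < tol:
            break
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        # Newton direction d = -H^{-1} g via the closed-form 2x2 inverse.
        d = [-(h[1][1] * g[0] - h[0][1] * g[1]) / det,
             -(h[0][0] * g[1] - h[1][0] * g[0]) / det]
        step = 1.0
        while step > 1e-8:
            cand = [theta[0] + step * d[0], theta[1] + step * d[1]]
            cand_ll = log_lik(cand, episodes)
            if cand_ll >= ll:
                theta, ll = cand, cand_ll
                break
            step /= 2
        else:
            break  # no improving step: treat as converged
    return theta


def simulate(n, beta, gamma, n_options=4, seed=0) -> List[Episode]:
    """Synthetic episodes drawn from a known conditional logit (illustration only)."""
    rng = random.Random(seed)
    episodes = []
    for _ in range(n):
        trigger = rng.random() < 0.5
        target = rng.randrange(n_options)  # randomized position of the target option
        options = [(rng.gauss(0.0, 1.0), 1.0 if (trigger and j == target) else 0.0)
                   for j in range(n_options)]
        weights = [math.exp(beta * x + gamma * t) for x, t in options]
        chosen = rng.choices(range(n_options), weights=weights)[0]
        episodes.append((options, chosen))
    return episodes


episodes = simulate(3000, beta=1.0, gamma=1.5)
beta_hat, gamma_hat = fit_conditional_logit(episodes)
print(f"beta={beta_hat:.2f} gamma={gamma_hat:.2f}")
```

On simulated data with a known `gamma`, the estimate should land near the true value for large `n`, and with `gamma=0` it should hover around zero. Controlling for `x` inside the model is the point of the feature-controlled variant: if the target attribute were correlated with `x`, a raw comparison of selection rates would conflate the two. A real audit would also need standard errors and richer option features.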

VeriSchema: Multi-Agent Framework for Generating Relational DB Schema & ERD

Extended SchemaAgent with Dr. Sukarna Barua (BUET) into a LangGraph StateGraph architecture with conditional routing and a three-tier auto-repair system. Designed a six-stage pipeline whose specialized agents handle entity extraction, relationship mining, and normalization, backed by Z3 formal verification. Implemented component-level retry logic driven by violation analysis, reducing redundant LLM calls by 80%.

Python · LangGraph · StateGraph · Z3 Solver · SQLAlchemy · Text2Schema
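
Component-level retry means re-running only the stage whose output violates a check, instead of restarting the whole pipeline. Below is a minimal plain-Python sketch of that control flow. It does not use LangGraph's API, no Z3 checks are shown, and the stage functions are hypothetical stubs standing in for LLM-backed agents. `Component`, `RunLog`, `run_pipeline`, and the stub stages are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
# A stage takes the current state plus violation feedback and returns state updates.
StageFn = Callable[[State, List[str]], State]
# A check inspects the state and returns violation messages (empty list = OK).
CheckFn = Callable[[State], List[str]]


@dataclass
class Component:
    name: str
    run: StageFn
    check: CheckFn


@dataclass
class RunLog:
    calls: Dict[str, int] = field(default_factory=dict)


def run_pipeline(components: List[Component], state: State,
                 max_retries: int = 3) -> Tuple[State, RunLog]:
    log = RunLog()
    for comp in components:
        feedback: List[str] = []
        for _ in range(max_retries + 1):
            log.calls[comp.name] = log.calls.get(comp.name, 0) + 1
            state = {**state, **comp.run(state, feedback)}
            feedback = comp.check(state)  # violation analysis for this stage only
            if not feedback:
                break  # stage passed; earlier stages are never re-run
        else:
            raise RuntimeError(f"{comp.name} still violates: {feedback}")
    return state, log


# Hypothetical stub stages standing in for LLM-backed agents.
def extract_entities(state, feedback):
    return {"entities": ["Student", "Course"]}


def mine_relationships(state, feedback):
    # First attempt references an unknown entity; feedback triggers a repair.
    if not feedback:
        return {"relationships": [("Student", "enrolls_in", "Class")]}
    return {"relationships": [("Student", "enrolls_in", "Course")]}


def check_relationships(state):
    known = set(state["entities"])
    return [f"unknown entity: {e}"
            for (a, _, b) in state["relationships"]
            for e in (a, b) if e not in known]


def normalize(state, feedback):
    return {"tables": sorted(state["entities"])}


pipeline = [
    Component("extract_entities", extract_entities, lambda s: []),
    Component("mine_relationships", mine_relationships, check_relationships),
    Component("normalize", normalize, lambda s: []),
]
final, log = run_pipeline(pipeline, {"requirements": "Students enroll in courses."})
print(final["relationships"], log.calls)
```

In this run `mine_relationships` is called twice and the other stages once each. Restarting the whole pipeline on the violation would have repeated `extract_entities` as well, which is the kind of redundant call this design avoids.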

Sentiment Analysis of Anonymous Crisis Reports in Bangladesh

Developed uReporter, Bangladesh's first anonymous reporting system, during the 2024 national crisis. Analyzed 124 crowd-sourced reports with six transformer models in a multilingual NLP pipeline covering Bengali and Romanized Bengali. The work demonstrates how anonymous crowd-sourcing can help researchers understand socio-political dynamics in the Global South.

Python · BERT · XLM-RoBERTa · Transformers · Bengali NLP · Flask

Patient-Centric Blockchain Framework for EHR Management

Undergraduate thesis, supervised by Professor ASM Latiful Hoque (BUET). Designed a blockchain framework that combines encrypted off-chain IPFS storage with on-chain Ethereum access control. Implemented ERC-721-based patient records with AES-GCM encryption, ECIES key wrapping, and EIP-712 signed permissions. Evaluated performance and security on 10,000 synthetic patient records to assess scalability and privacy.

Solidity · Ethereum · IPFS · React · Web3.js · AES-GCM · ECIES