
Opening the black box. Learn about explainable AI tools

This is an excerpt from one of my latest articles published through Technological Forecasting and Social Change. Its content has been adapted for this blog.

Suggested citation: Camilleri, M.A. (2026). Opening the black box: Operational principles, tools and frameworks that advance explainable artificial intelligence (XAI) models. Technological Forecasting and Social Change. https://doi.org/10.1016/j.techfore.2026.124710

Explainable artificial intelligence (XAI) has emerged as a critical area of AI research. This may be due to the growing number of stakeholders who are exerting pressure on practitioners to be as accountable as possible during the development and maintenance of their AI models. XAI concepts span from foundational notions in artificial intelligence and machine learning to specialized constructs such as interpretability, transparency and human-centered design. Additionally, XAI research is informed by insights from human-computer interaction and decision science, which ultimately emphasize user engagement and trust. Appendix A presents concise definitions of the core terminology underpinning XAI. It offers a conceptual grounding for scholars, practitioners and regulatory stakeholders seeking to enhance their knowledge and understanding of this evolving field. The findings from this review exercise indicate that stakeholders are genuinely concerned about the complexity and opacity of modern AI systems, as they are aware that AI technologies are being integrated into critical decision-making environments, ranging from healthcare and medical systems to finance, legal services and public administration.

This systematic review confirms that stakeholders expect practitioners to develop explainable AI systems that are not only accurate, but also interpretable, transparent and trustworthy. The findings suggest that XAI seeks to bridge the gap between technical performance and human understanding by providing meaningful explanations for outputs generated by machine learning models, especially those that function as “black boxes”. Several commentators indicated that XAI aims to foster user trust, support accountability and ensure ethical and regulatory compliance.

The findings from this study confirm that the growing use of ML in sensitive areas like healthcare, finance, education and employment has sparked the stakeholders’ concerns over the opacity of black-box models and the possible liabilities of practitioners who research, develop and maintain AI-driven solutions. Generally, XAI practices can be divided into two broad categories: (i) inherently interpretable models, such as decision trees and linear regressions, and (ii) post-hoc interpretability methods for more complex black-box models like deep neural networks. The latter generate both local and global explanations through feature attribution, perturbation analysis and visualization.
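To make the distinction concrete, the following sketch fits a shallow decision tree, an example of category (i), and prints its decision rules directly. The dataset and hyperparameters are illustrative choices rather than anything specified in the article.

```python
# A minimal sketch of an inherently interpretable model: a shallow decision tree
# whose learned logic can be printed as human-readable if/else rules, so no
# post-hoc explanation layer is needed. Dataset and depth are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each path from root to leaf is a transparent decision rule.
print(export_text(tree, feature_names=list(X.columns)))
```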

For the time being, several complex ML models operate as black boxes, as they hinder the ability of their users and regulators to understand, contest or improve their outputs. XAI addresses these contentious issues by providing tools, methodologies and frameworks that are intended to enhance the interpretability of AI systems through substantive compliance mechanisms, ethical standards and normative guidelines.

Indeed, this research indicates that inherently interpretable models, counterfactual reasoning, ongoing fairness audits, human-in-the-loop (HITL) approaches as well as post-hoc explanations may contribute to improving the transparency and trustworthiness of ML algorithms (Mosqueira-Rey et al., 2023; Panigutti et al., 2021). Counterfactual explanations enable practitioners to explore “what-if” scenarios and offer actionable insights that can improve a model’s comprehensibility for decision subjects, while regular fairness audits analyze model outcomes across demographic groups and may identify potential biases (Holzinger, 2021). In addition, human-in-the-loop (HITL) approaches and post-hoc explanations (as well as retrospective interpretability techniques) can enhance contextual accuracy and accountability.
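As an illustration of the “what-if” logic behind counterfactual explanations, the hypothetical sketch below perturbs a single actionable feature of a rejected instance until the model’s prediction flips; the model, feature index and step size are placeholders rather than a reference to any particular counterfactual library.

```python
# A hedged counterfactual ("what-if") sketch: nudge one actionable feature of a
# rejected instance until the classifier's decision changes. Works with any
# fitted scikit-learn-style classifier; all names here are hypothetical.
import numpy as np

def simple_counterfactual(model, x, feature_idx, step, max_steps=100):
    """Increase x[feature_idx] until the predicted class changes (or give up)."""
    x_cf = np.asarray(x, dtype=float).copy()
    original = model.predict(x_cf.reshape(1, -1))[0]
    for _ in range(max_steps):
        x_cf[feature_idx] += step
        if model.predict(x_cf.reshape(1, -1))[0] != original:
            return x_cf  # a small, actionable change that alters the outcome
    return None  # no counterfactual found within the search budget
```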

Practitioners may avail themselves of a range of tools and libraries to implement XAI techniques, including open-source options like SHAP, LIME, ELI5 and Alibi, among others, which offer model-agnostic interpretability. Moreover, they may use IBM’s AIX360 and Microsoft’s InterpretML to support the explainability of their datasets and machine learning models throughout the AI application lifecycle. Both resources include a diverse set of algorithms, code, guides, tutorials and demos that can help users better understand and explain AI models. Furthermore, they may utilize Google’s What-If Tool (WIT), an interactive visual interface designed to help data scientists, machine learning practitioners and AI ethicists explore, analyze and explain ML models. Such tools enable non-experts, researchers and practitioners to assess model fairness, evaluate performance, deploy responsible systems and make alternative predictions.
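The snippet below is a minimal, illustrative sketch of how SHAP and LIME are commonly invoked on a tabular classifier; the gradient-boosting model, dataset and class names are assumptions for demonstration, and both libraries need to be installed separately.

```python
# A hedged sketch of model-agnostic post-hoc explanation with SHAP and LIME
# (pip install shap lime scikit-learn). Model and data are illustrative.
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# SHAP: per-prediction feature attributions that can also be aggregated
# into a global view of feature importance.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)

# LIME: a local surrogate explanation for one individual prediction.
lime_explainer = LimeTabularExplainer(
    X.values, feature_names=list(X.columns),
    class_names=["malignant", "benign"], mode="classification")
explanation = lime_explainer.explain_instance(
    X.values[0], model.predict_proba, num_features=5)
print(explanation.as_list())
```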

Currently, there are a number of XAI frameworks and evaluation standards that can institutionalize transparency. These include initiatives like the United States Government’s Defense Advanced Research Projects Agency (DARPA) XAI Program, which aims to produce AI systems whose decision-making processes can be understood and trusted by humans. DARPA funded a variety of interdisciplinary teams, including academia, industry and national labs, that explored human-AI interactions (interfaces and feedback loops to improve user trust and usability), interpretable ML models, post-hoc visual and symbolic explanation methods for black-box models like deep neural networks, and the integration of cognitive psychology to design explanations that align with how humans reason and make sense of information. Similarly, Microsoft’s Prediction-Decision-Recommendation (PDR) framework offers an operational and predictive model for building trustworthy AI recommendation systems that are aligned with human values. PDR was introduced as part of Microsoft’s efforts in responsible AI, particularly in enterprise and applied settings.

Both DARPA’s XAI Program and Microsoft’s PDR (Prediction-Decision-Recommendation) framework can incorporate quantitative and qualitative assessments in their XAI evaluation. For example, the quantitative assessment of DARPA-funded XAI projects involves an examination of the models’ (i) fidelity of their logic, (ii) completeness, and (iii) simplicity. They use performance metrics to evaluate task accuracy, latency or robustness under explanation constraints. The qualitative assessment emphasizes human-centered evaluation, as it investigates perceptions about task effectiveness as well as perceived trustworthiness, user expectations and user satisfaction levels with AI models.

Furthermore, standards such as IEEE P7003 (Algorithmic Bias Considerations), which is part of the Institute of Electrical and Electronics Engineers (IEEE) P7000 series of standards for Ethically Aligned Design in autonomous and intelligent systems (AIS), are aimed at providing technical guidance for identifying, documenting and mitigating algorithmic bias in AI systems throughout their design, development and deployment. Other tools like Fairlearn and Testing with Concept Activation Vectors (TCAV), a post-hoc explainability method, help assess model behavior against abstract social concepts. They are intended to assist developers and data scientists in assessing and improving the fairness of ML technologies, including ensemble methods and deep neural networks.
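By way of example, the sketch below shows how a basic group-level fairness check might be run with Fairlearn’s MetricFrame; the labels, predictions and sensitive attribute are fabricated placeholders purely for illustration.

```python
# A hedged sketch of a fairness audit with Fairlearn (pip install fairlearn):
# compare accuracy and selection rate across groups defined by a sensitive
# attribute. All data below are made-up placeholders.
import pandas as pd
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, selection_rate, demographic_parity_difference

y_true = pd.Series([1, 0, 1, 1, 0, 0, 1, 0])                  # ground-truth outcomes
y_pred = pd.Series([1, 0, 0, 1, 0, 1, 1, 0])                  # model decisions
group = pd.Series(["F", "F", "F", "M", "M", "M", "F", "M"])   # sensitive attribute

frame = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true, y_pred=y_pred, sensitive_features=group)

print(frame.by_group)  # per-group accuracy and selection rate
print(demographic_parity_difference(y_true, y_pred, sensitive_features=group))
```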

XAI challenges and methodological limitations

While deep learning infrastructures like black-box AI models often exhibit remarkable predictive performance, they suffer from a lack of interpretability, as it is difficult to understand the internal logic or rationale behind their decision-making processes and predictions. The lack of transparency and trustworthiness of black-box models undermines the stakeholders’ efforts to audit or assign accountability for model-driven actions. It may prove hard to hold AI developers and systems administrators answerable for model-driven actions when errors or harm occur, especially in industry sectors like healthcare diagnostics, financial services like credit scoring, and welfare allocations, among others. In these contexts, ML technologies’ decisions may have profound and potentially irreversible effects on individuals’ lives. Without insight into how decisions are made, affected parties have limited avenues for recourse or equitable remedies, thereby undermining procedural fairness, which could erode public trust in algorithmic systems.

ML systems are typically trained on datasets that may embed historical or structural biases, thereby posing risks of perpetuating inequitable outcomes in automated decision-making. This may result in a situation where a decision-making process or algorithm disproportionately and negatively affects vulnerable or underrepresented groups in society, even if there is no explicit intent to discriminate against them, particularly if their data may be under-sampled or misrepresented in training sets.

AI models’ predictive accuracy and fairness may degrade over time due to the effects of data drift on the performance of machine learning models. Shifts in the underlying data distribution or changes in real-world contexts (e.g. political, economic, social, technological and/or ethical issues) can cause models to produce less reliable or biased outcomes, thereby necessitating continuous monitoring, periodic retraining and fairness audits to ensure sustained performance and regulatory compliance. Such changes may occur either gradually (concept drift) or abruptly (covariate shift). Consequently, models trained on historical data may no longer generalize well. Their ML systems may yield suboptimal outcomes that can impact the livelihoods of individuals and specific groups in society. For instance, a financial institution that relies on a credit-scoring model trained before major economic fluctuations (e.g. inflation, recession and/or rises in taxes, duties and tariffs) could penalize individuals from economically disrupted regions without accounting for recent changes in income dynamics. Alternatively, low-income or minority borrowers, including single mothers, immigrants or disabled persons (among other vulnerable groups in society), could be denied fair access to bank credit, as AI systems may fail to reflect new socioeconomic changes in the labor market. As a result, AI systems risk perpetuating or exacerbating existing inequalities without adequate mechanisms that keep them fair and up to date with the latest developments.
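A simple way to operationalize this kind of monitoring is sketched below: a per-feature two-sample Kolmogorov-Smirnov test compares a reference (training-time) sample with recent production data and flags features whose distribution has shifted. The threshold, data and feature semantics are illustrative assumptions.

```python
# A hedged drift-monitoring sketch: flag features whose production distribution
# has drifted away from the training (reference) distribution, which may
# trigger retraining or a fairness re-audit. Data and threshold are illustrative.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference, current, alpha=0.01):
    """Return indices of columns whose distribution appears to have shifted."""
    drifted = []
    for j in range(reference.shape[1]):
        _, p_value = ks_2samp(reference[:, j], current[:, j])
        if p_value < alpha:
            drifted.append(j)
    return drifted

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(1000, 3))
current = reference.copy()
current[:, 0] += 0.5   # simulate a shift in one feature (e.g. income levels)
print(detect_drift(reference, current))  # -> [0]
```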

It is imperative that AI practitioners conduct fairness auditing on a regular basis. They need to evaluate and appraise algorithmic outputs across various demographic groups to identify and correct any disparate impacts. Such audits must go beyond one-time assessments and need to become an integral part of the AI lifecycle, in order to ensure that models evolve in ways that uphold ethical standards and regulatory requirements. When combined, explainability, monitoring and fairness auditing can establish a trustworthy AI that is clearly aligned with societal expectations of justice, equity and accountability.

Indeed, XAI techniques can help address ethical and performance-related concerns by providing transparency into model behavior, as stakeholders including regulatory bodies, AI developers, auditors and affected individuals have a legitimate right to understand how specific outcomes are generated. Practitioners who maintain AI systems ought to regularly monitor the models to identify early warning signs of degradation. They are expected to recalibrate them before harmful consequences arise.

AI practitioners are encouraged to advance interpretable and efficient models that are responsive to the diverse needs of users, including data scientists, domain experts and end-users. Their human-centred evaluation of XAI methods usually focuses on the development of comprehensible explanations. Hence, they refer to common metrics including: (i) sparsity (meaning that explanations highlight only the most important factors); (ii) explanation complexity (referring to how simple or complicated an explanation is); (iii) simulatability (the extent to which practitioners can anticipate the model’s decision after seeing the explanation); and (iv) coverage (which indicates how many cases an explanation applies to). In addition, user-centered outcomes such as trust in the system, improved task performance and the time required to understand the explanation are also considered by AI administrators.
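The toy functions below illustrate, under simplifying assumptions, how two of these metrics could be quantified for a feature-attribution vector and a rule-based explanation; the threshold and example values are hypothetical.

```python
# Illustrative (simplified) metric sketches: sparsity of an attribution vector
# and coverage of a rule-based explanation. Threshold and data are hypothetical.
import numpy as np

def sparsity(attributions, threshold=0.01):
    """Fraction of features whose attribution is negligible (higher = sparser)."""
    attributions = np.abs(np.asarray(attributions, dtype=float))
    return float(np.mean(attributions < threshold))

def coverage(rule_applies):
    """Fraction of instances to which a rule-based explanation applies."""
    return float(np.mean(np.asarray(rule_applies, dtype=bool)))

print(sparsity([0.62, 0.30, 0.004, 0.0, 0.002]))   # 0.6 -> fairly sparse
print(coverage([True, False, True, True]))          # 0.75
```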

More importantly, their systems ought to be legally and ethically justifiable as well as socially defensible. They are required to comply with relevant regulatory frameworks governing the deployment of their models and to meet the transparency and auditability standards set by specific jurisdictions, such as the European Union’s General Data Protection Regulation (GDPR) and its AI Act (2024), among others. Table 1 features a comparison matrix that provides a non-exhaustive list of XAI tools. It outlines their strengths, weaknesses / limitations and identifies potential domains in which these tools can be applied.

Table 1. A comparison matrix of XAI tools that specifies their key metrics, strengths, weaknesses/limitations and domain fit.

| XAI tool | Type | Core metrics | Supporting / human-centered metrics | Strengths | Weaknesses / limitations | Possible domains |
| --- | --- | --- | --- | --- | --- | --- |
| Inherently interpretable models (decision trees, linear/logistic regression, rule-based models) | Model class | Sparsity; explanation length / complexity; rule length; simulatability; time-to-understanding | Coverage (model-wide); user trust score | Transparent, easy to explain; supports regulatory compliance; high interpretability without post-hoc tools | Limited predictive power for complex patterns; may oversimplify high-dimensional data | Finance (credit scoring); public administration; healthcare triage; education and HR screening |
| Post-hoc interpretability (general category) | Methodological class | Explanation length / complexity; sparsity; coverage; visualization clarity | User trust score; time-to-understanding | Allows explanation of black-box models; generates local and global explanations; broad domain applicability | Risk of misleading explanations; does not make the model itself interpretable; may be computationally intensive | Deep learning applications; high-stakes decisions needing model transparency |
| Counterfactual explanations | Method | Sparsity; explanation length; time-to-understanding; user trust score | Coverage; task performance improvement | Intuitive “what-if” reasoning; actionable for decision subjects; enhances user agency and contestability | May propose unrealistic or infeasible scenarios; sensitive to feature correlations | Finance (loan decisions); hiring and admissions; healthcare prognosis |
| Fairness audits (ongoing) | Governance mechanism | Coverage; visualization clarity | User trust score; task performance improvement | Detects structural biases; essential for compliance (e.g., EU AI Act, GDPR); supports trust and equity | Requires access to sensitive demographic data; needs continuous monitoring; may uncover issues that require costly remediation | Public sector decision systems; finance (credit scoring); policing algorithms; welfare allocation |
| Human-in-the-loop (HITL) | Operational approach | User trust score; task performance improvement; time-to-understanding | Visualization clarity; explanation length | Enhances accountability; reduces automation bias; supports hybrid decision-making | Slows automation; human reviewers require training; may introduce human bias | Healthcare diagnosis; legal assessments; safety-critical systems |
| SHapley Additive exPlanations (SHAP) | Post-hoc; model-agnostic | Sparsity; explanation length / complexity; coverage; visualization clarity | Simulatability; time-to-understanding | Theoretically grounded (game theory); local and global explanations; widely adopted, rich visualization tools | High computational cost for large models; can overwhelm non-experts with detail | Tabular/structured data; finance, insurance, healthcare |
| Local interpretable model-agnostic explanations (LIME) | Post-hoc; model-agnostic | Sparsity; explanation length; coverage | Simulatability; time-to-understanding | Simple, intuitive local explanations; lightweight and fast; works across model types | Instability of explanations; locality sampling may be misleading | Real-time decisions; early-stage diagnostics of ML models |
| ELI5 | Model-agnostic toolkit | Sparsity; explanation length; visualization clarity | Simulatability | Easy-to-use API; supports debugging and visualization; transparent feature and weight analysis | Less comprehensive than SHAP/LIME; limited deep-learning support | Education, prototyping, model debugging |
| Alibi | Model-agnostic library | Sparsity; explanation length; coverage; rule length (Anchors) | Time-to-understanding; user trust score | Covers counterfactuals, anchors, adversarial detection; strong support for fairness evaluation | Requires technical expertise; less widely documented | Enterprise ML pipelines; sensitive domains requiring fairness |
| IBM AIX360 | Comprehensive XAI framework | Explanation length / complexity; rule length; simulatability; coverage | User trust score | Extensive algorithms and documentation; open-source, enterprise-ready; supports dataset and model explainability | Large and complex ecosystem; potential steep learning curve | Regulated industries (finance, healthcare); enterprises needing governance support |
| Microsoft InterpretML | Comprehensive XAI framework | Simulatability; explanation length; visualization clarity; coverage | Time-to-understanding | Supports interpretable models (EBMs); unified dashboard for explanations; strong community support | Less tailored for deep learning; integration mainly in the Python ecosystem | Healthcare, HR, education; systems needing interpretable boosting models |
| Google What-If Tool (WIT) | Visual interface | Visualization clarity; coverage | Task performance improvement; user trust score | No-code/low-code exploration; intuitive fairness and performance evaluation; highly accessible | Limited support for large-scale or custom DL architectures; requires TensorBoard integration | Ethical AI reviews; education and training; exploratory fairness analysis |
| DARPA XAI Program | Research & evaluation framework | User trust score; task performance improvement; time-to-understanding | Explanation satisfaction; mental model accuracy | Integrates cognitive psychology and human reasoning; supports interpretable ML and post-hoc methods; strong evaluation criteria (fidelity, completeness, simplicity) | Research-oriented, less plug-and-play; high complexity, diverse methodologies | Defense, critical infrastructures; human-AI collaboration research |
| Microsoft Prediction-Decision-Recommendation (PDR) framework | AI governance & workflow framework | Task performance improvement; time-to-understanding; user trust score | Visualization clarity | Aligns predictions with human values; designed for enterprise-scale recommender systems; supports qualitative and quantitative metrics | Tailored to recommendation ecosystems; limited uptake outside Microsoft platforms | Recommender systems (retail, media); decision-support platforms |
| IEEE P7003 algorithmic bias standard | Ethical & technical standard | Coverage; documentation completeness | User trust score (organizational) | Provides an actionable framework for bias mitigation; widely recognized ethical standard; supports documentation and governance | Not a technical tool, so it needs developer interpretation; compliance may require significant restructuring | Public sector AI; HR and recruitment systems; safety-critical decision systems |
| Fairlearn | Fairness assessment & mitigation library | Coverage; visualization clarity | User trust score | Provides disparity metrics; offers mitigation algorithms; integrates with common ML pipelines | Requires demographic data; does not explain models, focuses on fairness only | Credit scoring, insurance, hiring; any domain requiring fairness constraints |
| Testing with Concept Activation Vectors (TCAV), implemented in Captum | Concept-based explainability | Simulatability (concept-level); explanation length; sparsity (concept selection) | Time-to-understanding; user trust score | Explains models using human-understandable concepts; helps detect stereotype-driven patterns | Requires well-defined concepts; limited to deep models with embeddings | Computer vision; medical imaging; NLP conceptual bias detection |
| Model monitoring for drift (concept drift, covariate shift) | Governance & operational process | Coverage; visualization clarity | Task performance improvement (operational); time-to-understanding (alerts) | Essential for long-term reliability; supports proactive correction; aligns with regulatory expectations | Requires continuous data pipelines; resource-intensive in large-scale systems | Finance (risk models); healthcare (diagnostics); dynamic environments (e-commerce) |

A conceptual framework for XAI

Explainable artificial intelligence (XAI) has become a very important area of inquiry for the promotion of responsible AI governance. Regulators, organizations and end-users are increasingly demanding that ML systems are transparent, accountable and fair. Beyond technical performance, these technologies are now expected to protect users’ privacy, safety and security, while remaining inclusive and accessible for the benefit of diverse socio-demographic groups in society, regardless of their age, gender, ability or ethnicity. As a result, XAI is no longer a peripheral consideration; rather, it has become a normative requirement as it advances ethical, trustworthy, and socially legitimate AI systems.

Accordingly, the objectives of XAI, whether explainable ML designs are driven by regulatory compliance, operational transparency policies or trust-building purposes, ought to be embedded across the entire AI lifecycle. The explainability of AI plays a critical role during the research and development phase, from data collection and preprocessing to model training, deployment, monitoring and maintenance. AI systems are better positioned to achieve accountability, reliability and ethical alignment when explainability is treated as an integral component of process innovation rather than a retrospective add-on.

However, there are instances during model development where practitioners may have to balance trade-offs between the predictive performance of AI systems and their interpretability. Hence, evaluation criteria need to extend beyond accuracy and efficiency. They should consider the extent to which models generate explanations that are meaningful, accessible and appropriate for different user groups. Data-related practices are particularly influential at this stage. Transparent data provenance, systematic bias auditing, as well as input features that are presented in a manner that is easily understandable to humans (i.e. human-readable feature engineering), can substantially enhance model interpretability and user trust. In this respect, inherently interpretable models such as decision trees and generalized additive models (GAMs) offer direct insights into decision logic, in contrast to complex black-box models that rely on post-hoc explanation techniques.

XAI systems require ongoing governance and maintenance once they have been deployed. This includes version control, retraining protocols that are guided by explainability objectives, as well as user feedback mechanisms that support continuous learning and improvement outcomes. The extant literature clearly distinguishes between ante-hoc and post-hoc approaches to explainability. Inherently interpretable models such as linear regression, rule-based systems, decision trees, GAMs and Bayesian models are transparent by design. Such models enable users to directly understand their modus operandi, operational logic and decision-making processes. By contrast, black-box models, including deep neural networks, necessitate post-hoc interpretability methods. Techniques such as SHAP and LIME provide feature-attribution and local explanations, while counterfactual reasoning, fairness audits and human-in-the-loop (HITL) approaches are increasingly employed to enhance transparency, accountability and equity in high-stakes contexts.

This review confirms that SHAP offers model-agnostic explanations by quantifying the contribution of individual features to model outputs, whereas LIME explains specific predictions by locally approximating complex models with interpretable surrogates. In addition, other open-source tools (e.g. ELI5, Alibi) and commercial platforms (e.g. IBM AIX360, Microsoft InterpretML, Google’s What-If Tool) have expanded the XAI ecosystem. Methodological approaches such as counterfactual explanations further support understanding by exploring “what-if” scenarios, while ongoing fairness audits evaluate model behaviors across demographic groups, to identify and mitigate bias. Human-in-the-loop (HITL) approaches complement these techniques by embedding human oversight throughout the AI lifecycle, thereby strengthening contextual accuracy and accountability.

Additionally, several institutional initiatives have led to the formalization of XAI assessment and evaluation standards. For instance, the DARPA XAI Program features quantitative metrics (such as fidelity, completeness, simplicity, robustness and performance), as well as qualitative ones (including human-centered evaluations that examine perceived usefulness, trust, satisfaction and task effectiveness). Yet, despite these advances, many existing XAI approaches remain technique-specific, as they exclusively focus on post-hoc explanations, fairness audits or concept-based methods, often resulting in fragmented evaluation practices.

Against this backdrop, this research puts forward an easy-to-understand, user-centric XAI framework for black-box models. This conceptual framework raises awareness on human-centered evaluation metrics and integrates them as a unifying analytical lens across the AI lifecycle (rather than assessing explainability in isolation). It explicitly links data practices, model design choices and explanation interfaces to measurable user outcomes, as illustrated in Fig. 1.


Fig. 1. A user-centric explainable artificial intelligence (XAI) framework for black box models.

Firstly, this user-centric XAI framework emphasizes transparent, inclusive and secure training data as a foundation for explainability and trust. While governance-oriented tools and standards (e.g. Fairlearn, IEEE P7003) primarily support compliance and bias detection, this model suggests that applying inclusiveness, transparency, safety and security metrics during the training phase ensures that models are developed in a manner that is fair, interpretable, robust and trustworthy.

The inclusiveness metrics help detect and mitigate biases in training data and model behavior, thereby promoting fairness. They ensure objective and consistent performance of AI systems across diverse user groups. Hence, they lead to explanations that are meaningful and relevant to all stakeholders. The transparency metrics are meant to evaluate how clearly the model’s internal decision-making processes can be understood by their users. During training, these metrics guide the development of models that produce interpretable and accessible explanations, in order to improve user comprehension and trust.

The safety metrics monitor the model’s behavior under various conditions, including unusual, rare or unexpected situations (a.k.a. edge cases) that challenge the system’s robustness, to prevent harmful or unintended outcomes. The integration of safety considerations into training enhances the systems’ reliability, as it ensures that explanations reflect typical contexts as well as exceptional (or even risky) scenarios. Similarly, the security metrics assess vulnerabilities to adversarial attacks or data manipulation. When security metrics are included in training, models become more robust, and their explanations enhance confidence levels and reduce potential risks, thereby fostering greater user assurance.

Secondly, the framework incorporates an accountable ante-hoc model layer grounded in inherently interpretable models. Consistent with decision trees and rule-based systems, this layer prioritizes sparsity, simulatability and explanation conciseness. It facilitates quick understanding and mental simulation of decisions. In doing so, it strengthens transparency and accountability beyond what post-hoc methods alone can achieve. The accountability metrics reinforce predictability and can strengthen the trustworthiness and governance of AI systems by: (i) evaluating whether the model’s decision logic can be audited and traced, thereby ensuring each prediction can be explained and justified to stakeholders; (ii) ensuring compliance with ethical and legal standards; (iii) assessing stakeholder understanding and acceptance; and (iv) facilitating error and bias detection.

There is scope for practitioners to incorporate accountability metrics if they want their inherently interpretable models to become more auditable, responsible and trustworthy. At the same time, they can enhance the value of ante-hoc explainability by adopting privacy metrics that safeguard sensitive information throughout the interpretability process. Though inherently interpretable models are transparent by design, privacy metrics would ensure that this transparency does not compromise sensitive data by: (i) measuring the risk of exposing sensitive (personal) information; (ii) enforcing data minimization principles to ensure that the model uses only indispensable data, thereby reducing privacy risks; (iii) balancing interpretability and data protection (e.g. through anonymization techniques) to maintain explainability while respecting privacy constraints; and (iv) supporting compliance with data protection regulations (e.g. the GDPR or other relevant privacy laws).

Thirdly, this framework integrates fair and robust post-hoc explanations with interpretable user interfaces. While tools such as SHAP, LIME, Alibi, and TCAV are commonly evaluated using metrics such as sparsity, complexity, and visualization clarity, this framework extends their application by explicitly prioritizing trust calibration and task performance improvement, particularly when AI systems are employed for decision support in human-in-the-loop (HITL) settings. This emphasis aligns with human-centered evaluation principles advocated in initiatives such as DARPA XAI and Microsoft’s Prediction–Decision–Recommendation (PDR) framework.

Post-hoc explanation methods are applied after a black-box model has been trained and has generated predictions (e.g., SHAP, LIME and counterfactual explanations). Fairness metrics (e.g., demographic parity, equalized odds and disparate impact) quantify whether the model’s decisions are biased or discriminatory across different demographic groups, while robustness metrics assess stability under perturbations. In this context, robustness refers both to the stability of the model’s predictions and to the consistency of the generated explanations under slight variations in the input data.
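One hedged way to quantify explanation robustness in this sense is sketched below: an attribution function (for instance, a wrapper around SHAP or LIME) is applied to an input and to slightly perturbed copies of it, and the average cosine similarity between the resulting attribution vectors is reported. The explanation function, noise scale and trial count are illustrative placeholders.

```python
# A hedged explanation-robustness sketch: explanations of an instance and of
# slightly perturbed copies should remain similar. explain_fn is a placeholder
# for any attribution method returning one value per feature.
import numpy as np

def explanation_robustness(explain_fn, x, noise_scale=0.01, n_trials=20, seed=0):
    """Mean cosine similarity between base and perturbed attribution vectors."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    base = np.asarray(explain_fn(x), dtype=float)
    sims = []
    for _ in range(n_trials):
        x_noisy = x + rng.normal(0.0, noise_scale, size=x.shape)
        pert = np.asarray(explain_fn(x_noisy), dtype=float)
        denom = np.linalg.norm(base) * np.linalg.norm(pert) + 1e-12
        sims.append(float(np.dot(base, pert) / denom))
    return float(np.mean(sims))  # values near 1.0 indicate stable explanations
```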

The fairness and robustness metrics build user trust and enhance XAI in post-hoc settings. They reveal biases and validate the reliability of explanations, since explanations are not expected to change significantly when robustness criteria are met. They also guide explanation refinement: by monitoring fairness and robustness metrics, developers can fine-tune post-hoc methods to produce explanations that are accurate and fairly representative of the model’s decision logic. Finally, they improve interface transparency and support regulatory compliance as well as ethical standards that foster increased transparency and accountability of AI systems.

Overall, this conceptual framework offers a coherent, user-oriented benchmark for assessing explainability across data, models, and interfaces, thereby extending existing XAI frameworks developed by technology firms and standards bodies. It implies that ante-hoc (inherently interpretable) models can inform and calibrate post-hoc explanation methods and their associated interfaces. Ante-hoc models may serve as interpretable baselines against which the fidelity and consistency of post-hoc explanations from black-box models are assessed. Therefore, the integration of ante-hoc and black-box models can support the development of more trustworthy systems, particularly by enabling interpretable interfaces to be trained or tested against transparent model logic before deployment in more complex settings. Accordingly, this framework positions ante-hoc models as an intermediary layer between training data and post-hoc explanations. This enables explanation methods and interfaces to be validated against interpretable model logic before being applied to complex black-box systems.
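A minimal sketch of this fidelity check is given below: a shallow decision tree is fitted to the black-box model’s own predictions, and the share of instances on which the surrogate reproduces those predictions is reported. The surrogate depth and model interface are assumptions.

```python
# A hedged sketch of using an ante-hoc surrogate as an interpretable baseline:
# fidelity = how often a shallow decision tree, trained to mimic the black box,
# reproduces its predictions. Depth and model interface are assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def surrogate_fidelity(black_box, X, max_depth=3):
    """Fit an interpretable surrogate to the black box and report its fidelity."""
    bb_preds = black_box.predict(X)
    surrogate = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    surrogate.fit(X, bb_preds)
    fidelity = float(np.mean(surrogate.predict(X) == bb_preds))
    return fidelity, surrogate
```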

Conclusions

This research synthesizes key contributions in XAI to underline its essential role in promoting responsible governance in the research, development and maintenance of machine learning systems. It discusses XAI tools, describes their metrics, identifies their strengths as well as their weaknesses / limitations, and reports their possible application domains. It addresses ethical concerns related to black-box models. Hence, it emphasizes the need for documentation practices that establish the normative and technical baselines for accountability, upon which performance tracking and continuous monitoring are built. One has to consider that robust drift detection and fairness auditing depend on these baselines and operate iteratively throughout deployment in order to maintain reliable, transparent and equitable XAI systems.

This contribution’s user-centric XAI framework with its interpretable interfaces that bridge technical innovation and stakeholder ethics are intended to foster responsible AI and ensure that ML models remain interpretable, trustworthy and compliant with ethical and legal standards like GDPR and the EU AI Act, throughout their lifecycle.

Theoretical implications

This research adds value to the extant academic literature focused on XAI. It clarifies key notions and explains the meanings of different terms related to model interpretability, data drift, concept drift and fairness. Moreover, it clarifies how practitioners can build and maintain trustworthy AI systems. It clearly indicates that interpretability is a crucial mechanism for fostering user trust, not just through technical explanations, but also by adhering to clear governance structures and established communication channels. This reasoning aligns with emerging theories related to human-computer interaction and with technology adoption frameworks drawn from the social sciences literature, which highlight the importance of transparency and accountability in building user confidence in complex systems.

This research builds on the foundations of established theoretical underpinnings by integrating explainable AI within broader models of technology acceptance, trust and socio-technical dynamics. For example, some elements of this contribution’s conceptual framework are related to the Technology Acceptance Model’s (TAM) key constructs, including perceived usefulness and perceived ease of use, as these factors clearly align with XAI’s goals of enhancing the transparency and interpretability of ML models to foster user adoption. The framework also draws on Trust in Automation theories, particularly where they highlight the rationale for the development of explainable AI systems, to enhance user trust, and to prevent their misuse or disuse. In a similar vein, some commentators argue that XAI literature is grounded in Socio-Technical Systems (STS) theory. They contend that this theory provides a holistic lens by emphasizing the interplay between technological artifacts and social contexts, thereby reinforcing the need for inclusive, ethical and transparent AI design. Other colleagues maintain that XAI literature is rooted in Responsible Research and Innovation (RRI) frameworks as they raise awareness about anticipatory governance, stakeholder engagement and ethical reflexivity, all of which are operationalized through user-centric and transparent approaches. Together, these models serve as a theoretical basis for this study’s conceptual framework, as they bridge technical, human, ethical and regulatory dimensions to support trustworthy AI ecosystems.

This timely contribution promotes transparent and fair forms of AI knowledge generation, as the reasoning behind ML decisions and predictions ought to be continuously scrutinized and validated. It puts forward a comprehensive framework that synthesizes key dimensions of XAI into a cohesive model. It reports how, why, where and when explainability is evolving within generative AI systems, generally by linking design choices to measurable user outcomes across the AI lifecycle. Unlike prior models that are narrowly focused on interpretability techniques, this framework integrates lifecycle governance with human-centered evaluation metrics. It supports the practical implementation of responsible AI principles. By doing so, it advances theoretical understanding while offering actionable guidance for developers, policymakers and stakeholders committed to trustworthy AI.

In sum, it provides a comprehensive explanation of XAI systems for the benefit of their users including AI developers, data scientists, domain experts, business stakeholders, regulators and auditors, end users as well as academic researchers, among others. It enables them to better understand the modus operandi of deep neural networks and complex learning models. It promotes post-hoc explanation techniques and methods that provide explanations for the decisions made by machine learning models after they have been trained. This is particularly important for opaque black box models, ensemble methods or support vector machines, which offer high predictive accuracy but are not clear enough on how they arrive at specific outputs. It identifies XAI tools that can help practitioners assess the validity and reliability of ML models.

This research emphasizes the dynamic challenges of AI deployment. It makes reference to model drift and to data distribution shifts, as they can have a negative impact on the reliability and fairness of explanations over time. This perspective moves beyond static evaluations of XAI. It highlights the need for continuous monitoring and adaptation of AI models. It considers the needs and challenges faced not only by AI developers but also by system administrators and non-expert users. It recognizes that effective XAI must cater to diverse levels of technical understanding and operational requirements.

This article also offers novel, integrated and up-to-date syntheses of both academic research as well as practitioner-oriented tools and frameworks. It bridges the gap between theoretical advancements and their real-world applications across the entire AI lifecycle. It refers to technical aspects including XAI specific tools and techniques, data monitoring, fairness assurance and stakeholder engagement, thereby providing a timely and holistic view of the current XAI landscape.

Practical implications

This research offers guidance for a wide range of stakeholders involved in the development, deployment and governance of AI systems. It provides actionable insights for developers and system administrators for implementing XAI. It describes specific tools (e.g. SHAP, LIME, ELI5) and platforms that offer concrete entry points for integrating interpretability into their workflows. This article highlights a comparison matrix of leading XAI tools. It outlines their key metrics, strengths, limitations and domain suitability to support informed managerial decision-making.

Additionally, it proposes a user-centric XAI framework tailored for black-box models. This framework offers practical guidance on aligning explainability techniques with organizational capabilities, stakeholder expectations, and contextual constraints. The novel framework provides a tangible structure that embeds responsible AI practices from the initial design phase through ongoing monitoring and updates. It is intended to support practitioners in the development of more robust, reliable and trustworthy AI applications. Its recommendations for the integration of interpretability, regular bias monitoring and fairness auditing (through standardized reporting frameworks, such as model cards and data sheets, combined with automated drift detection tools) can inform policy makers as well as practitioners who seek to advance XAI systems. Hence, the development of internal policies, quasi-substantive rules and workflows is intended to advance responsible AI development and deployment. This may ultimately lead to virtuous outcomes that foster a culture of ethical AI innovation, enhance public trust and understanding of XAI systems, and increase user adoption in different domains.

Limitations and future research directions

Despite its contributions, this study also has its inherent limitations. The systematic review involved the analysis of recent, high-impact academic publications focused on “explainable artificial intelligence” or “explainable AI” or “XAI”. This selection approach, while ensuring relevance and quality, introduces the risk of citation bias, where frequently cited or well-known studies receive disproportionate attention, potentially overshadowing emerging, less-cited, or interdisciplinary work. Consequently, some innovative advancements or niche applications in XAI may not have been fully captured. Additionally, the quickly evolving nature of the field means new developments could have emerged after the review period. Furthermore, the evaluation of XAI tools and frameworks relied on publicly available information and academic studies, which often lack empirical depth or comprehensive real-world validation, thereby limiting the scope for fully assessing practical performance and impact of interpretable models.

Future research can address these limitations and explore plausible areas of study related to XAI. For example, there is scope for conducting longitudinal studies to examine the long-term impact of XAI adoption on system performance, user trust and on the fairness of AI outputs in real-world scenarios. Moreover, other research is required to develop standardized metrics that can evaluate the “quality” of explanations and their effectiveness for different user groups hailing from diverse contexts. Perhaps, prospective researchers can build on this seminal article by promoting the integration of XAI techniques with other responsible AI governance frameworks, such as privacy-secure AI methodologies, robust AI, as well as inclusive, bias-free AI systems in the near future. In addition, they may analyze human-computer interaction aspects of XAI, including how different types of explanations are perceived and understood by diverse stakeholders. It is imperative that developers design effective and interpretable user-centric XAI solutions. Further research in these fields of study will contribute to the continued advancement and to the responsible adoption of explainable AI, as shown in Table 2.

Table 2. Future research directions related to explainable AI (XAI).

| Future research area | Rationale | Potential impact |
| --- | --- | --- |
| Context-specific XAI | To investigate user backgrounds, domain knowledge of XAI and cultural contexts. | Increases usability and accessibility of XAI systems. |
| Human-computer interaction (HCI) in XAI | To explore how different stakeholders perceive, interpret and interact with different types of AI explanations. | Improves the design of user-centric and interpretable XAI solutions. |
| Focus on niche and emerging XAI applications | To examine XAI applications in specialized domains (e.g., healthcare, finance, autonomous systems). | Expands XAI applicability and domain-specific innovations. |
| Integration of XAI with responsible AI governance frameworks | To better understand how XAI can be associated with privacy-preserving, robust and bias-free AI methodologies, to advance holistic AI governance frameworks. | Promotes trustworthy, fair and secure XAI deployment. |
| Empirical validation of XAI tools and frameworks | In-depth and broad empirical studies will shed light on the effectiveness of current XAI tools in real-world applications. | Bridges the gap between theoretical models and practical uses of XAI. |
| Longitudinal studies on XAI adoption | To analyze the long-term effects of XAI on system performance, user trust and fairness in real-world contexts. | Advances knowledge on sustained benefits and risks of XAI use. |
| Ethical and social implications of XAI | To demonstrate the societal impacts, ethical challenges and policy considerations arising from XAI adoption. | Guides responsible AI governance deployment that respects societal norms. |
| Development of standardized evaluation metrics | To create standardized, reliable metrics that can assess XAI quality and its effectiveness across diverse users. | Enables consistent benchmarking and comparison of XAI tools. |

Appendix A. Key concepts in explainable artificial intelligence research.

Accountability: Accountability ensures that individuals or organizations can be held responsible for the outcomes and impacts of AI systems, especially in critical applications where errors or biases could have significant consequences. Individuals and organizations ought to be supported by clear, interpretable explanations that enable oversight and compliance with ethical or regulatory standards.
Artificial Intelligence (AI): AI is a broad field in computer science focused on creating machines that can perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception and decision-making.
Black box / black-box model: A black-box model is an AI model, such as a deep neural network, whose decision-making processes are opaque; users, including the model’s own developers, may not be in a position to understand its modus operandi. While such models can usually achieve high accuracy, they may not be transparent about how they process data and how they produce a specific output.
Counterfactual explanations: Counterfactual explanations are a type of model-agnostic explanation technique used in interpretable and explainable AI (XAI). They describe how an input instance would need to be altered minimally for a machine learning model to yield a different (usually a desired) outcome.
Decision making: Decision-making in the context of AI refers to the process whereby an AI system uses computational techniques to analyze data, identify patterns and determine optimal courses of action or choices from a set of alternatives. Unlike human decision-making, which can rely on intuition, experience or emotion, AI decision-making is data-driven and based on algorithms.
Decision support systems (DSS): DSS are applications that analyze data and provide valuable insights. They are designed to assist humans in making informed choices. In XAI contexts, explainability is integrated into such systems, transforming them from “black boxes” into transparent tools that users can understand and trust, especially in sensitive domains like healthcare, among others.
Deep Learning (DL): DL is a subset of machine learning that focuses on utilizing multilayered (deep) neural networks to learn patterns and representations directly from raw data, to discover intricate features and perform tasks such as classification, regression and representation learning.
Evaluation metrics: Evaluation metrics relate to how AI and XAI systems are assessed and measured. They enable practitioners to objectively evaluate their effectiveness as well as the quality of explanations generated by AI systems. While AI models are typically evaluated on their predictive performance (e.g., in terms of accuracy), XAI evaluation metrics go beyond this to measure how well explanations help users understand, trust and interact with AI systems. In this case, evaluation metrics may include human-centered metrics (e.g. the users’ trust and satisfaction levels vis-à-vis XAI) as well as quantitative metrics (like measuring the model’s accuracy and comprehensiveness).
Explainable Artificial Intelligence / Explainable AI (XAI): XAI explores methods that provide humans with the ability and intellectual oversight to understand AI outputs. The rationale behind XAI is to increase the interpretability and transparency of AI decisions, actions and predictions. In other words, XAI is intended to answer the “why” and “how” behind AI systems, as they often function as black boxes.
Feature attribution: Feature attribution refers to the process of quantifying the contribution or importance of each input feature in a machine learning model’s prediction. It helps explain how much each feature influences a particular decision made by the model. This is especially valuable in interpretable machine learning and explainable AI (XAI), as understanding why an AI model advances a certain prediction is as important as the prediction itself.
Human-AI interaction (HAII) / Human-computer interaction (HCI): HAII and HCI concepts and their variations emphasize the user-centricity aspects of AI. In the context of XAI, both notions suggest that humans are more likely to engage, communicate and collaborate with intuitive and explainable AI interfaces.
Human-in-the-Loop (HITL): HITL approaches refer to systems or processes in AI and ML where human judgement and intervention are actively integrated into the decision-making loop. This involvement can occur at various stages, through data collection, labeling and annotation (often with human input), data preprocessing and curation, model training, model evaluation and validation (with human oversight, especially in high-stakes domains), model deployment, as well as during monitoring and maintenance phases. The underlying goal of HITL is to combine the strengths of human intuition, contextual understanding and ethical reasoning with the efficiency and scale of automated systems.
Interpretability: Interpretability relates to the degree to which a human can understand a model’s internal mechanics, in terms of the cause-effect relationships underlying its decision-making processes. This construct suggests that users tend to interact with transparent and trustworthy XAI technologies because they facilitate the interpretation of their outputs.
Local Interpretable Model-agnostic Explanations (LIME): LIME is a technique that explains individual predictions by approximating a complex model, in a localized setting, with an interpretable one such as a linear model. LIME highlights which features could influence a specific decision (by perturbing input data to observe how predictions change), thereby making black-box models more understandable to users without requiring access to their internal structures.
Machine learning (ML): ML is a field in AI concerned with the development and study of algorithms that can identify patterns within data. This allows them to learn from the data and to make decisions as well as predictions. Such systems can perform tasks without explicit instructions and could improve their performance over time, as they are exposed to more data.
Mental models / shared mental models: Shared mental models refer to the mutual understanding and common representation of knowledge between humans and AI agents regarding their respective roles, capabilities and the task at hand. Essentially, they refer to the extent to which there is a shared understanding of how the AI system operates and how it aligns with the overall task.
Neural networks (models): Neural networks are complex machine learning architectures with interconnected “layers” used to learn patterns from data and to perform specific tasks like predictions or classifications. The role of XAI is to provide explanations about how such opaque networks/models work. It clarifies how inputs influence outputs and reveals what the AI model has learned.
Perturbation analysis: Perturbation analysis involves systematically altering (perturbing) one or more features of the input data and observing how the model’s output changes.
Post-hoc explanations: Post-hoc explanations are retrospective interpretability techniques that are used to explain the predictions of already trained machine learning models after they have made a decision. Post-hoc explanations are generated after model training and are not part of the original learning process. They aim to interpret how or why a model made a specific decision, without altering the model itself.
SHapley Additive exPlanations (SHAP): SHAP is a method based on Shapley values from cooperative game theory. It is used to explain the output of machine learning models. Basically, SHAP offers consistent and theoretically grounded insights into how individual features contribute to a model’s decisions, by assigning each feature an “importance value” for a specific prediction. Features with positive SHAP values positively impact the prediction, while those with negative values have a negative impact. The magnitude is a measure of how strong the effect is.
Transparency: Transparency refers to the clarity and understandability of an AI system’s internal workings and decision-making processes. Hence, it allows humans to learn how AI systems process data and make decisions.
Trust: Trust refers to the confidence levels that users place in XAI systems’ decisions. The individuals’ willingness to avail themselves of XAI technologies relies on the reliability, clarity, consistency and usefulness of their explanations. XAI aims to foster appropriate levels of trust by helping users to better understand how and why AI models make certain decisions, predictions or outcomes.
User behavior / user study: User behavior focuses on how individuals interact with, perceive and respond to the explanations provided by AI systems. The persons’ cognitive processes, trust, decision-making and reliance on AI systems can influence their engagement levels with XAI technologies.

About the author

Mark Anthony CAMILLERI, Ph.D. (Edinburgh) is an Associate Professor in the Department of Corporate Communication at the University of Malta. He was a Fulbrighter at Northwestern University in Evanston, U.S.A (in 2022). Prof. Camilleri was featured among the world’s top 2% scientists in Elsevier’s “Updated science-wide author databases of standardized citation indicators” (in the past four years). In 2023, he achieved a global rank (ns) of 3854, and was listed 124th among business & management researchers. He serves as a scientific expert and reviewer for various European research councils. He was recognized for his outstanding reviews by Publons and by Emerald (as he received a Literati award in 2022 and 2023). He is an Associate Editor of Business Strategy and the Environment; Sustainable Development and of International Journal of Hospitality Management, among others.
