Tag Archives: AI governance

Opening the black box. Learn about explainable AI tools

This is an excerpt from one of my latest articles published through Technological Forecasting and Social Change. Its content has been adapted for this blog.

Suggested citation: Camilleri, M.A. (2026). Opening the black box: Operational principles, tools and frameworks that advance explainable artificial intelligence (XAI) models. Technological Forecasting and Social Change. https://doi.org/10.1016/j.techfore.2026.124710

Explainable artificial intelligence (XAI) has emerged as a critical area of AI research. This may be attributed to the growing number of stakeholders who are pressuring practitioners to be as accountable as possible during the development and maintenance of their AI models. XAI concepts span from foundational notions in artificial intelligence and machine learning to specialized constructs such as interpretability, transparency and human-centered design. Additionally, XAI research is informed by insights from human-computer interaction and decision science, which ultimately emphasize user engagement and trust. Appendix A presents concise definitions of the core terminology underpinning XAI. It offers a conceptual grounding for scholars, practitioners and regulatory stakeholders seeking to enhance their knowledge and understanding of this evolving field. The findings from this review indicated that stakeholders are genuinely concerned about the complexity and opacity of modern AI systems, as they are aware that AI technologies are being integrated into critical decision-making environments, ranging from healthcare and medical systems to finance, legal and public administration contexts.

This systematic review confirms that stakeholders expect practitioners to develop explainable AI systems that are not only accurate, but also interpretable, transparent and trustworthy. The findings suggest that XAI seeks to bridge the gap between technical performance and human understanding by providing meaningful explanations for outputs generated by machine learning models, especially those that function as “black boxes”. Several commentators indicated that XAI aims to foster user trust, support accountability and ensure ethical and regulatory compliance.

The findings from this study confirm that the growing use of ML in sensitive areas like healthcare, finance, education and employment has sparked stakeholders’ concerns over the opacity of black boxes and over the possible liabilities of practitioners who research, develop and maintain AI-driven solutions. Generally, XAI practices can be divided into two broad categories: (i) inherently interpretable models, such as decision trees and linear regressions, and (ii) post-hoc interpretability methods for more complex black-box models like deep neural networks. The latter generates both local and global explanations through feature attribution, perturbation analysis and visualizations.
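The first category above can be illustrated with a minimal, library-free sketch: a hand-readable rule list for a hypothetical credit-screening task. Every decision traces to an explicit, auditable rule, so no post-hoc explainer is needed. The feature names and thresholds are illustrative assumptions, not values from the study.

```python
# A sketch of an inherently interpretable model: a transparent rule list.
# All feature names and thresholds are hypothetical.

def interpretable_credit_rules(applicant: dict) -> tuple:
    """Return (decision, reason); each outcome cites the rule that fired."""
    if applicant["income"] < 20_000:
        return "reject", "income below 20,000 threshold"
    if applicant["debt_ratio"] > 0.6:
        return "reject", "debt-to-income ratio above 0.6"
    if applicant["missed_payments"] == 0:
        return "approve", "no missed payments on record"
    return "review", "borderline case routed to a human reviewer"

decision, reason = interpretable_credit_rules(
    {"income": 35_000, "debt_ratio": 0.3, "missed_payments": 0}
)
print(decision, "-", reason)  # approve - no missed payments on record
```

Because the model is the explanation, the `reason` string doubles as the audit trail that regulators or decision subjects can inspect.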

For the time being, several complex ML models operate as black boxes that hinder the ability of their users and regulators to understand, contest or improve their outputs. XAI addresses these contentious issues by providing tools, methodologies and frameworks that are intended to enhance the interpretability of AI systems through substantive compliance mechanisms, ethical standards and normative guidelines.

Indeed, this research indicates that inherently interpretable models, counterfactual reasoning, ongoing fairness audits, human-in-the-loop (HITL) approaches as well as post-hoc explanations may contribute to improving the transparency and trustworthiness of ML algorithms (Mosqueira-Rey et al., 2023; Panigutti et al., 2021). While counterfactual explanations enable practitioners to explore “what-if” scenarios and offer actionable insights that improve a model’s comprehensibility for decision subjects, regular fairness audits could analyze model outcomes across demographic groups and possibly identify potential biases (Holzinger, 2021). In addition, human-in-the-loop (HITL) approaches and post-hoc explanations (as well as retrospective interpretability techniques) can enhance contextual accuracy and accountability.
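The “what-if” reasoning behind counterfactual explanations can be sketched in a few lines: search for the smallest change to one feature that flips a model’s decision. The stand-in scoring function, feature names and step size below are illustrative assumptions, not an actual deployed model.

```python
# A minimal counterfactual-explanation sketch: find the smallest change to
# one feature that flips a (hypothetical) model's decision.

def model_approves(x: dict) -> bool:
    # Stand-in black box: approve when income comfortably exceeds debt load.
    return x["income"] - 100_000 * x["debt_ratio"] > 50_000

def counterfactual(x, feature, step, max_steps=100):
    """Nudge one feature upward until the decision flips; None if it never does."""
    for i in range(1, max_steps + 1):
        candidate = dict(x, **{feature: x[feature] + i * step})
        if model_approves(candidate) != model_approves(x):
            return candidate
    return None

applicant = {"income": 40_000, "debt_ratio": 0.4}
cf = counterfactual(applicant, "income", step=1_000)
print(f"Decision flips if income rises to {cf['income']}")
```

The returned counterfactual is exactly the kind of actionable insight the text describes: it tells the decision subject what would have to change for the outcome to differ.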

Practitioners may avail themselves of a range of tools and libraries to implement XAI techniques, including open-source options like SHAP, LIME, ELI5 and Alibi, among others, that offer model-agnostic interpretability. Moreover, they may use IBM’s AIX360 and Microsoft’s InterpretML to support the explainability of their datasets and machine learning models throughout the AI application lifecycle. Both resources include a diverse set of algorithms, code, guides, tutorials and demos that can help users better understand and explain AI models. Furthermore, they may utilize Google’s What-If Tool (WIT), an interactive visual interface designed to help data scientists, machine learning practitioners and AI ethicists explore, analyze and explain ML models. Such tools enable non-experts, researchers and practitioners to assess model fairness, evaluate performance, deploy responsible systems and make alternative predictions.
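The model-agnostic feature attribution that SHAP and LIME provide can be illustrated, without those libraries, by a simple occlusion method: measure how much the model output changes when each feature is replaced by a baseline value. This is only a sketch in the spirit of those tools (the scoring model and baseline are hypothetical), not the shap or lime packages themselves.

```python
# Library-free, model-agnostic feature attribution by occlusion:
# attribution[f] = model(x) - model(x with feature f set to its baseline).

def black_box(x):
    # Hypothetical opaque scoring model.
    return 0.7 * x["age"] / 100 + 0.2 * x["tenure"] / 10 - 0.4 * x["defaults"]

def occlusion_attribution(model, x, baseline):
    """Score each feature by the output drop when it is occluded."""
    full = model(x)
    return {f: full - model(dict(x, **{f: baseline[f]})) for f in x}

x = {"age": 50, "tenure": 8, "defaults": 1}
baseline = {"age": 0, "tenure": 0, "defaults": 0}
attr = occlusion_attribution(black_box, x, baseline)

# Features ranked by absolute influence on this single prediction:
ranked = sorted(attr, key=lambda f: abs(attr[f]), reverse=True)
print(ranked)
```

Real libraries refine this idea considerably (SHAP averages over feature coalitions; LIME fits a local surrogate), but the ranking of locally influential features is the shared output.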

Currently, there are a number of XAI frameworks and evaluation standards that can institutionalize transparency. These include initiatives like the United States Government’s Defense Advanced Research Projects Agency (DARPA) XAI Program, which sought to develop AI systems whose decision-making processes can be understood and trusted by humans. DARPA funded a variety of interdisciplinary teams, including academia, industry and national labs, that explored human-AI interactions (interfaces and feedback loops to improve user trust and usability), interpretable ML models, post-hoc visual and symbolic explanation methods for black-box models like deep neural networks, and the integration of cognitive psychology to design explanations that align with how humans reason and make sense of information. Similarly, Microsoft’s Prediction-Decision-Recommendation (PDR) framework offers an operational and predictive model for building trustworthy AI recommendation systems that are aligned with human values. PDR was introduced as part of Microsoft’s efforts in responsible AI, particularly in enterprise and applied settings.

Both DARPA’s XAI Program and Microsoft’s PDR (Prediction-Decision-Recommendation) framework can incorporate quantitative and qualitative assessments in their XAI evaluation. For example, in DARPA-funded XAI projects, the quantitative assessment examines the models’ (i) fidelity of logic, (ii) completeness, and (iii) simplicity. They use performance metrics to evaluate task accuracy, latency or robustness under explanation constraints. The qualitative assessment emphasizes human-centered evaluation, as it investigates perceptions about task effectiveness as well as user trust, expectations and satisfaction levels with AI models.

Furthermore, the IEEE P7003 Standard for Algorithmic Bias Considerations, part of the Institute of Electrical and Electronics Engineers (IEEE) P7000 series of standards for Ethically Aligned Design in autonomous and intelligent systems (AIS), aims to provide technical guidance for identifying, documenting and mitigating algorithmic bias in AI systems throughout their design, development and deployment. Other tools like Fairlearn and Testing with Concept Activation Vectors (TCAV), a post-hoc explainability method, help assess model behavior against abstract social concepts. They are intended to assist developers and data scientists in assessing and improving the fairness of ML technologies, including ensemble methods and deep neural networks.

XAI challenges and methodological limitations

While deep learning infrastructures and other black-box AI models often exhibit remarkable predictive performance, they suffer from a lack of interpretability, as it is difficult to understand the internal logic or rationale behind their decision-making processes and predictions. The lack of transparency and trustworthiness of black-box models undermines stakeholders’ efforts to audit or assign accountability for model-driven actions. It may prove hard to hold AI developers and systems administrators answerable for model-driven actions, when and if errors or harm occur, especially in industry sectors like healthcare diagnostics, financial services such as credit scoring, and welfare allocation, among others. In these contexts, ML-driven decisions may have profound and potentially irreversible effects on individuals’ lives. Without insight into how decisions are made, affected parties have limited avenues for recourse or equitable remedies, thereby undermining procedural fairness and eroding public trust in algorithmic systems.

ML systems are typically trained on datasets that may embed historical or structural biases, thereby posing risks of perpetuating inequitable outcomes in automated decision-making. This may result in a situation where a decision-making process or algorithm disproportionately and negatively affects vulnerable or underrepresented groups in society, even without any explicit intent to discriminate against them, particularly when their data are under-sampled or misrepresented in training sets.

AI models’ predictive accuracy and fairness may degrade over time due to the effects of data drift on the performance of machine learning models. Shifts in the underlying data distribution or changes in real-world contexts (e.g. political, economic, social, technological and/or ethical issues) can cause the models to produce less reliable or biased outcomes, thereby necessitating continuous monitoring, periodic retraining and fairness audits to ensure sustained performance and regulatory compliance. Such changes may occur gradually (concept drift) or abruptly (covariate shift). Consequently, models trained on historical data may no longer generalize well, and ML systems may yield suboptimal outcomes that can impact the livelihoods of individuals and specific groups in society. For instance, a financial institution that relies on a credit-scoring model that was trained before major economic fluctuations (e.g. inflation, recession and/or rises in taxes, duties and tariffs) could penalize individuals from economically disrupted regions without accounting for recent changes in income dynamics. Alternatively, low-income or minority borrowers, including single mothers, immigrants or disabled persons (among other vulnerable groups in society), could be denied fair access to bank credit, as AI systems may fail to reflect new socioeconomic changes in the labor market. As a result, AI systems risk perpetuating or exacerbating existing inequalities without adequate mechanisms to ensure that they remain fair and up to date with the latest developments.
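A basic drift monitor of the kind described above can be sketched with the Population Stability Index (PSI), which compares a feature’s live distribution against its training distribution. The bin edges, the toy income samples and the 0.2 alert threshold are illustrative assumptions (0.2 is a common industry rule of thumb, not a value from this study).

```python
# A minimal drift-monitoring sketch using the Population Stability Index.
import math

def psi(expected, actual, edges):
    """PSI over shared bins; higher values signal stronger distribution drift."""
    def share(values, lo, hi):
        count = sum(1 for v in values if lo <= v < hi)
        return max(count / len(values), 1e-6)  # floor to avoid log(0)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        e, a = share(expected, lo, hi), share(actual, lo, hi)
        total += (a - e) * math.log(a / e)
    return total

train_income = [20, 25, 30, 35, 40, 45, 50, 55]  # hypothetical training sample
live_income  = [35, 40, 45, 50, 55, 60, 65, 70]  # hypothetical post-inflation sample
edges = [0, 30, 50, 100]

drift_score = psi(train_income, live_income, edges)
needs_retraining = drift_score > 0.2  # common alert threshold (rule of thumb)
print(round(drift_score, 3), needs_retraining)
```

In the credit-scoring scenario from the text, such a monitor would flag the post-inflation income shift before the stale model quietly penalizes borrowers from disrupted regions.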

It is imperative that AI practitioners conduct fairness auditing on a regular basis. They need to evaluate and appraise algorithmic outputs across various demographic groups to identify and correct any disparate impacts. Such audits must go beyond one-time assessments and need to become an integral part of the AI lifecycle, in order to ensure that models evolve in ways that uphold ethical standards and regulatory requirements. When combined, explainability, monitoring and fairness auditing can establish a trustworthy AI that is clearly aligned with societal expectations of justice, equity and accountability.
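A recurring fairness audit of the kind called for above can be sketched as a comparison of approval rates across demographic groups, flagged against the “four-fifths” disparate-impact rule of thumb. The group labels, decision log and threshold are illustrative assumptions.

```python
# A minimal fairness-audit sketch: group-wise approval rates and the
# disparate-impact ratio (four-fifths rule of thumb).

def approval_rates(records):
    rates = {}
    for group in {r["group"] for r in records}:
        members = [r for r in records if r["group"] == group]
        rates[group] = sum(r["approved"] for r in members) / len(members)
    return rates

def disparate_impact(rates):
    """Ratio of the lowest to the highest group approval rate."""
    return min(rates.values()) / max(rates.values())

# Hypothetical decision log: group A approved 8/10, group B approved 4/10.
audit_log = (
    [{"group": "A", "approved": 1}] * 8 + [{"group": "A", "approved": 0}] * 2 +
    [{"group": "B", "approved": 1}] * 4 + [{"group": "B", "approved": 0}] * 6
)

rates = approval_rates(audit_log)
ratio = disparate_impact(rates)  # 0.4 / 0.8 = 0.5
flagged = ratio < 0.8            # four-fifths rule violated
print(rates, round(ratio, 2), flagged)
```

Run on every scoring batch rather than once, this check becomes the lifecycle-embedded audit the paragraph advocates.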

Indeed, XAI techniques can help address ethical and performance-related concerns by providing transparency into model behavior, as stakeholders including regulatory bodies, AI developers, auditors and affected individuals have a legitimate right to understand how specific outcomes are generated. Practitioners who maintain AI systems ought to monitor their models regularly to identify early warning signs of degradation. They are expected to recalibrate them before harmful consequences arise.

AI practitioners are encouraged to advance interpretable and efficient models that are responsive to the diverse needs of different users, including data scientists, domain experts and end-users. Their human-centred evaluation of XAI methods usually focuses on the development of comprehensible explanations. Hence, they refer to common metrics including: (i) sparsity (meaning that explanations highlight only the most important factors); (ii) explanation complexity (referring to how simple or complicated an explanation is); (iii) simulatability (the extent to which practitioners can anticipate the model’s decision after seeing the explanation); and (iv) coverage (which indicates how many cases an explanation applies to). In addition, user-centered outcomes such as trust in the system, improved task performance and the time required to understand the explanation are also considered by AI administrators.
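Two of the metrics listed above, sparsity and coverage, can be computed directly; the attribution vector and the explanation rule below are illustrative assumptions used only to show the arithmetic.

```python
# A sketch of two explanation metrics: sparsity (how few features an
# explanation uses) and coverage (how many cases a rule applies to).

def sparsity(attribution, eps=1e-6):
    """Fraction of features with (near-)zero attribution; higher = sparser."""
    zeros = sum(1 for a in attribution if abs(a) < eps)
    return zeros / len(attribution)

def coverage(rule, cases):
    """Fraction of cases to which the explanation rule applies."""
    return sum(rule(c) for c in cases) / len(cases)

attribution = [0.8, 0.0, 0.0, 0.1, 0.0]        # explanation uses 2 of 5 features
rule = lambda c: c["income"] > 30_000          # "applies when income > 30k"
cases = [{"income": v} for v in (10_000, 25_000, 40_000, 50_000)]

s = sparsity(attribution)   # 3 of 5 features are zero -> 0.6
c = coverage(rule, cases)   # rule fires for 2 of 4 cases -> 0.5
print(s, c)
```

Simulatability and time-to-understanding, by contrast, require human studies and cannot be computed from the artifacts alone.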

More importantly, their systems ought to be legally and ethically justifiable as well as socially defensible. They are required to comply with relevant regulatory frameworks governing the deployment of their models and to meet the transparency and auditability standards set by specific jurisdictions, such as the European Union’s General Data Protection Regulation (GDPR) and its AI Act (2024), among others. Table 1 features a comparison matrix that provides a non-exhaustive list of XAI tools. It outlines their strengths and weaknesses/limitations, and identifies potential domains in which these tools can be applied.

Table 1. A comparison matrix of XAI tools that specifies their key metrics, strengths, weaknesses/limitations and domain fit.

Inherently interpretable models (decision trees, linear/logistic regression, rule-based models)
Type: Model class
Core metrics: Sparsity; Explanation length/complexity; Rule length; Simulatability; Time-to-understanding
Supporting/human-centered metrics: Coverage (model-wide); User trust score
Strengths: Transparent, easy to explain; Supports regulatory compliance; High interpretability without post-hoc tools
Weaknesses/limitations: Limited predictive power for complex patterns; May oversimplify high-dimensional data
Possible domains: Finance (credit scoring); Public administration; Healthcare triage; Education and HR screening

Post-hoc interpretability (general category)
Type: Methodological class
Core metrics: Explanation length/complexity; Sparsity; Coverage; Visualization clarity
Supporting/human-centered metrics: User trust score; Time-to-understanding
Strengths: Allows explanation of black-box models; Generates local and global explanations; Broad domain applicability
Weaknesses/limitations: Risk of misleading explanations; Does not make the model itself interpretable; May be computationally intensive
Possible domains: Deep learning applications; High-stakes decisions needing model transparency

Counterfactual explanations
Type: Method
Core metrics: Sparsity; Explanation length; Time-to-understanding; User trust score
Supporting/human-centered metrics: Coverage; Task performance improvement
Strengths: Intuitive “what-if” reasoning; Actionable for decision subjects; Enhances user agency and contestability
Weaknesses/limitations: May propose unrealistic or infeasible scenarios; Sensitive to feature correlations
Possible domains: Finance (loan decisions); Hiring and admissions; Healthcare prognosis

Fairness audits (ongoing)
Type: Governance mechanism
Core metrics: Coverage; Visualization clarity
Supporting/human-centered metrics: User trust score; Task performance improvement
Strengths: Detects structural biases; Essential for compliance (e.g. EU AI Act, GDPR); Supports trust and equity
Weaknesses/limitations: Requires access to sensitive demographic data; Needs continuous monitoring; May uncover issues that require costly remediation
Possible domains: Public sector decision systems; Finance (credit scoring); Policing algorithms; Welfare allocation

Human-in-the-loop (HITL)
Type: Operational approach
Core metrics: User trust score; Task performance improvement; Time-to-understanding
Supporting/human-centered metrics: Visualization clarity; Explanation length
Strengths: Enhances accountability; Reduces automation bias; Supports hybrid decision-making
Weaknesses/limitations: Slows automation; Human reviewers require training; May introduce human bias
Possible domains: Healthcare diagnosis; Legal assessments; Safety-critical systems

SHapley Additive exPlanations (SHAP)
Type: Post-hoc; model-agnostic
Core metrics: Sparsity; Explanation length/complexity; Coverage; Visualization clarity
Supporting/human-centered metrics: Simulatability; Time-to-understanding
Strengths: Theoretically grounded (game theory); Local and global explanations; Widely adopted, rich visualization tools
Weaknesses/limitations: High computational cost for large models; Can overwhelm non-experts with detail
Possible domains: Tabular/structured data; Finance, insurance, healthcare

Local interpretable model-agnostic explanations (LIME)
Type: Post-hoc; model-agnostic
Core metrics: Sparsity; Explanation length; Coverage
Supporting/human-centered metrics: Simulatability; Time-to-understanding
Strengths: Simple, intuitive local explanations; Lightweight and fast; Works across model types
Weaknesses/limitations: Instability of explanations; Locality sampling may be misleading
Possible domains: Real-time decisions; Early-stage diagnostics of ML models

ELI5
Type: Model-agnostic toolkit
Core metrics: Sparsity; Explanation length; Visualization clarity; Simulatability
Supporting/human-centered metrics: —
Strengths: Easy-to-use API; Supports debugging and visualization; Transparent feature and weight analysis
Weaknesses/limitations: Less comprehensive than SHAP/LIME; Limited deep-learning support
Possible domains: Education, prototyping, model debugging

Alibi
Type: Model-agnostic library
Core metrics: Sparsity; Explanation length; Coverage; Rule length (Anchors)
Supporting/human-centered metrics: Time-to-understanding; User trust score
Strengths: Covers counterfactuals, anchors, adversarial detection; Strong support for fairness evaluation
Weaknesses/limitations: Requires technical expertise; Less widely documented
Possible domains: Enterprise ML pipelines; Sensitive domains requiring fairness

IBM AIX360
Type: Comprehensive XAI framework
Core metrics: Explanation length/complexity; Rule length; Simulatability; Coverage
Supporting/human-centered metrics: User trust score
Strengths: Extensive algorithms and documentation; Open-source, enterprise-ready; Supports dataset and model explainability
Weaknesses/limitations: Large and complex ecosystem; Potential steep learning curve
Possible domains: Regulated industries (finance, healthcare); Enterprises needing governance support

Microsoft InterpretML
Type: Comprehensive XAI framework
Core metrics: Simulatability; Explanation length; Visualization clarity; Coverage
Supporting/human-centered metrics: Time-to-understanding
Strengths: Supports interpretable models (EBMs); Unified dashboard for explanations; Strong community support
Weaknesses/limitations: Less tailored for deep learning; Integration mainly in the Python ecosystem
Possible domains: Healthcare, HR, education; Systems needing interpretable boosting models

Google What-If Tool (WIT)
Type: Visual interface
Core metrics: Visualization clarity; Coverage
Supporting/human-centered metrics: Task performance improvement; User trust score
Strengths: No-code/low-code exploration; Intuitive fairness and performance evaluation; Highly accessible
Weaknesses/limitations: Limited support for large-scale or custom DL architectures; Requires TensorBoard integration
Possible domains: Ethical AI reviews; Education and training; Exploratory fairness analysis

DARPA XAI Program
Type: Research and evaluation framework
Core metrics: User trust score; Task performance improvement; Time-to-understanding
Supporting/human-centered metrics: Explanation satisfaction; Mental model accuracy
Strengths: Integrates cognitive psychology and human reasoning; Supports interpretable ML and post-hoc methods; Strong evaluation criteria (fidelity, completeness, simplicity)
Weaknesses/limitations: Research-oriented, less plug-and-play; High complexity, diverse methodologies
Possible domains: Defense, critical infrastructures; Human-AI collaboration research

Microsoft Prediction-Decision-Recommendation (PDR) framework
Type: AI governance and workflow framework
Core metrics: Task performance improvement; Time-to-understanding; User trust score
Supporting/human-centered metrics: Visualization clarity
Strengths: Aligns predictions with human values; Designed for enterprise-scale recommender systems; Supports qualitative and quantitative metrics
Weaknesses/limitations: Tailored to recommendation ecosystems; Limited uptake outside Microsoft platforms
Possible domains: Recommender systems (retail, media); Decision-support platforms

IEEE P7003 algorithmic bias standard
Type: Ethical and technical standard
Core metrics: Coverage; Documentation completeness
Supporting/human-centered metrics: User trust score (organizational)
Strengths: Provides an actionable framework for bias mitigation; Widely recognized ethical standard; Supports documentation and governance
Weaknesses/limitations: Not a technical tool, needs developer interpretation; Compliance may require significant restructuring
Possible domains: Public sector AI; HR and recruitment systems; Safety-critical decision systems

Fairlearn
Type: Fairness assessment and mitigation library
Core metrics: Coverage; Visualization clarity
Supporting/human-centered metrics: User trust score
Strengths: Provides disparity metrics; Offers mitigation algorithms; Integrates with common ML pipelines
Weaknesses/limitations: Requires demographic data; Does not explain models, focuses on fairness only
Possible domains: Credit scoring, insurance, hiring; Any domain requiring fairness constraints

Testing with concept activation vectors (TCAV) (implemented in Captum)
Type: Concept-based explainability
Core metrics: Simulatability (concept-level); Explanation length; Sparsity (concept selection)
Supporting/human-centered metrics: Time-to-understanding; User trust score
Strengths: Explains models using human-understandable concepts; Helps detect stereotype-driven patterns
Weaknesses/limitations: Requires well-defined concepts; Limited to deep models with embeddings
Possible domains: Computer vision; Medical imaging; NLP conceptual bias detection

Model monitoring for drift (concept drift, covariate shift)
Type: Governance and operational process
Core metrics: Coverage; Visualization clarity
Supporting/human-centered metrics: Task performance improvement (operational); Time-to-understanding (alerts)
Strengths: Essential for long-term reliability; Supports proactive correction; Aligns with regulatory expectations
Weaknesses/limitations: Requires continuous data pipelines; Resource-intensive in large-scale systems
Possible domains: Finance (risk models); Healthcare (diagnostics); Dynamic environments (e-commerce)

A conceptual framework for XAI

Explainable artificial intelligence (XAI) has become a very important area of inquiry for the promotion of responsible AI governance. Regulators, organizations and end-users are increasingly demanding that ML systems are transparent, accountable and fair. Beyond technical performance, these technologies are now expected to protect users’ privacy, safety and security, while remaining inclusive and accessible for the benefit of diverse socio-demographic groups in society, regardless of their age, gender, ability or ethnicity. As a result, XAI is no longer a peripheral consideration; rather, it has become a normative requirement as it advances ethical, trustworthy, and socially legitimate AI systems.

Accordingly, the objectives of XAI, whether explainable ML designs are driven by regulatory compliance, operational transparency policies or trust-building purposes, ought to be embedded across the entire AI lifecycle. The explainability of AI plays a critical role during the research and development phase, from data collection and preprocessing to model training, deployment, monitoring and maintenance. AI systems are better positioned to achieve accountability, reliability and ethical alignment when explainability is treated as an integral component of process innovation rather than a retrospective add-on.

However, there are instances during model development where practitioners may have to balance trade-offs between the predictive performance of AI systems and their interpretability. Hence, evaluation criteria need to extend beyond accuracy and efficiency. They should consider the extent to which models generate explanations that are meaningful, accessible and appropriate for different user groups. Data-related practices are particularly influential at this stage. Transparent data provenance, systematic bias auditing, and input features that are presented in a manner that is easily understandable to humans (i.e. human-readable feature engineering) can substantially enhance model interpretability and user trust. In this respect, inherently interpretable models, such as decision trees and generalized additive models (GAMs), offer direct insights into decision logic, in contrast to complex black-box models that rely on post-hoc explanation techniques.

XAI systems require ongoing governance and maintenance once they have been deployed. This includes version control, retraining protocols that are guided by explainability objectives, as well as user feedback mechanisms that support continuous learning and improvement outcomes. The extant literature clearly distinguishes between ante-hoc and post-hoc approaches to explainability. Inherently interpretable models such as linear regression, rule-based systems, decision trees, GAMs and Bayesian models are transparent by design. Such models enable users to directly understand their modus operandi, operational logic and decision-making processes. By contrast, black-box models, including deep neural networks, necessitate post-hoc interpretability methods. Techniques such as SHAP and LIME provide feature-attribution and local explanations, while counterfactual reasoning, fairness audits and human-in-the-loop (HITL) approaches are increasingly employed to enhance transparency, accountability and equity in high-stakes contexts.

This review confirms that SHAP offers model-agnostic explanations by quantifying the contribution of individual features to model outputs, whereas LIME explains specific predictions by locally approximating complex models with interpretable surrogates. In addition, other open-source tools (e.g. ELI5, Alibi) and commercial platforms (e.g. IBM AIX360, Microsoft InterpretML, Google’s What-If Tool) have expanded the XAI ecosystem. Methodological approaches such as counterfactual explanations further support understanding by exploring “what-if” scenarios, while ongoing fairness audits evaluate model behaviors across demographic groups, to identify and mitigate bias. Human-in-the-loop (HITL) approaches complement these techniques by embedding human oversight throughout the AI lifecycle, thereby strengthening contextual accuracy and accountability.

Additionally, several institutional initiatives have led to the formalization of XAI assessment and evaluation standards. For instance, the DARPA XAI Program features quantitative metrics (such as fidelity, completeness, simplicity, robustness and performance), as well as qualitative ones (including human-centered evaluations that examine perceived usefulness, trust, satisfaction and task effectiveness). Yet, despite these advances, many existing XAI approaches remain technique-specific, as they exclusively focus on post-hoc explanations, fairness audits or concept-based methods, often resulting in fragmented evaluation practices.

Against this backdrop, this research puts forward an easy-to-understand, user-centric XAI framework for black-box models. This conceptual framework raises awareness on human-centered evaluation metrics and integrates them as a unifying analytical lens across the AI lifecycle (rather than assessing explainability in isolation). It explicitly links data practices, model design choices and explanation interfaces to measurable user outcomes, as illustrated in Fig. 1.


Fig. 1. A user-centric explainable artificial intelligence (XAI) framework for black box models.

Firstly, this user-centric XAI framework emphasizes transparent, inclusive and secure training data as a foundation for explainability and trust. While governance-oriented tools and standards (e.g. Fairlearn, IEEE P7003) primarily support compliance and bias detection, this model suggests that applying inclusiveness, transparency, safety and security metrics during the training phase ensures that models are developed in a manner that is fair, interpretable, robust and trustworthy.

The inclusiveness metrics help detect and mitigate biases in training data and model behavior, thereby promoting fairness. They ensure objective and consistent performance of AI systems across diverse user groups. Hence, they lead to explanations that are meaningful and relevant to all stakeholders. The transparency metrics are meant to evaluate how clearly the model’s internal decision-making processes can be understood by their users. During training, these metrics guide the development of models that produce interpretable and accessible explanations, in order to improve user comprehension and trust.

The safety metrics monitor the model’s behavior under various conditions, including unusual, rare or unexpected situations (a.k.a. edge cases) that challenge the system’s robustness, to prevent harmful or unintended outcomes. The integration of safety considerations in training enhances the systems’ reliability, as it ensures that explanations reflect typical contexts as well as exceptional (or even risky) scenarios. Similarly, the security metrics assess vulnerabilities to adversarial attacks or data manipulation. When security metrics are included in training, models become more robust, and their explanations enhance confidence levels and reduce potential risks, thereby fostering greater user assurance.

Secondly, the framework incorporates an accountable ante-hoc model layer grounded in inherently interpretable models. Consistent with decision trees and rule-based systems, this layer prioritizes sparsity, simulatability and explanation conciseness. It facilitates quick understanding and mental simulation of decisions. In doing so, it strengthens predictability and accountability beyond what post-hoc methods alone can achieve. The accountability metrics reinforce predictability and can strengthen the trustworthiness and governance of AI systems by: (i) evaluating whether the model’s decision logic can be audited and traced, thereby ensuring each prediction can be explained and justified to stakeholders; (ii) ensuring compliance with ethical and legal standards; (iii) assessing stakeholder understanding and acceptance; and (iv) facilitating error and bias detection.

There is scope for practitioners to incorporate accountability metrics if they want their inherently interpretable models to become more auditable, responsible and trustworthy. At the same time, they can enhance the value of ante-hoc explainability by adopting privacy metrics that safeguard sensitive information throughout the interpretability process. Though inherently interpretable models are transparent by design, the privacy metrics would ensure that this transparency does not compromise sensitive data by: (i) measuring the risk of sensitive (personal) information exposure; (ii) enforcing data minimization principles to ensure that the model uses only the indispensable data to reduce privacy risks; (iii) balancing interpretability and data protection (e.g. through anonymization techniques) to maintain explainability while respecting privacy constraints; and (iv) supporting compliance with data protection regulations (e.g. the GDPR or other relevant privacy laws).

Thirdly, this framework integrates fair and robust post-hoc explanations with interpretable user interfaces. While tools such as SHAP, LIME, Alibi, and TCAV are commonly evaluated using metrics such as sparsity, complexity, and visualization clarity, this framework extends their application by explicitly prioritizing trust calibration and task performance improvement, particularly when AI systems are employed for decision support in human-in-the-loop (HITL) settings. This emphasis aligns with human-centered evaluation principles advocated in initiatives such as DARPA XAI and Microsoft’s Prediction–Decision–Recommendation (PDR) framework.

Post-hoc explanation methods are applied after a black-box model has been trained and has generated predictions (e.g. SHAP, LIME and counterfactual explanations). Fairness metrics (e.g. demographic parity, equalized odds and disparate impact) quantify whether the model’s decisions are biased or discriminatory across different demographic groups, while robustness metrics assess stability under perturbations, covering both predictive robustness (i.e. the stability of model outputs) and explanation robustness (i.e. the consistency of explanations under slight input variations).
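One of the fairness metrics mentioned above, equalized odds, can be sketched as a comparison of true-positive rates across groups: a large gap means the model errs unevenly. All predictions, labels and group sizes below are illustrative assumptions.

```python
# A minimal equalized-odds check: the gap in true-positive rates (TPR)
# between demographic groups. A gap near 0 indicates parity of errors.

def true_positive_rate(rows):
    positives = [r for r in rows if r["label"] == 1]
    return sum(r["pred"] for r in positives) / len(positives)

def tpr_gap(rows):
    groups = {r["group"] for r in rows}
    tprs = {g: true_positive_rate([r for r in rows if r["group"] == g])
            for g in groups}
    return max(tprs.values()) - min(tprs.values())

# Hypothetical log: group A caught 9/10 true positives, group B only 6/10.
rows = (
    [{"group": "A", "label": 1, "pred": 1}] * 9 +
    [{"group": "A", "label": 1, "pred": 0}] * 1 +
    [{"group": "B", "label": 1, "pred": 1}] * 6 +
    [{"group": "B", "label": 1, "pred": 0}] * 4
)

gap = tpr_gap(rows)  # 0.9 - 0.6 = 0.3
print(round(gap, 2))
```

A full equalized-odds audit would apply the same comparison to false-positive rates; this sketch shows the TPR half of the criterion.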

Fairness and robustness metrics build user trust and enhance XAI in post-hoc settings in several ways. They reveal biases and validate explanation reliability, since explanations are not expected to change significantly when robustness targets are met. They also guide explanation refinement: by monitoring fairness and robustness metrics, developers can fine-tune post-hoc methods to produce explanations that are accurate and fairly representative of the model’s decision logic. Finally, they improve interface transparency and support regulatory compliance as well as ethical standards that foster greater transparency and accountability in AI systems.
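The notion of explanation robustness can likewise be sketched as a simple stability check: perturb an input slightly and verify that the resulting attributions barely change. The linear model and its attribution rule below are hypothetical stand-ins for a real explainer:

```python
import numpy as np

# Hypothetical linear model; per-feature attribution is simply
# weight * input value (an illustrative explainer, not SHAP/LIME).
w = np.array([0.8, -0.5, 0.3])

def attribution(x):
    return w * x                          # per-feature contribution

rng = np.random.default_rng(1)
x = np.array([1.0, 2.0, -1.0])
base = attribution(x)

# Compare attributions under many slight input perturbations.
sims = []
for _ in range(100):
    x_pert = x + rng.normal(scale=0.01, size=3)
    a = attribution(x_pert)
    cos = a @ base / (np.linalg.norm(a) * np.linalg.norm(base))
    sims.append(cos)

print("worst-case cosine similarity:", min(sims))
```

A worst-case cosine similarity close to 1.0 indicates that the explanation is stable under small input variations, which is exactly what the explanation-robustness metrics described above are meant to certify.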

Overall, this conceptual framework offers a coherent, user-oriented benchmark for assessing explainability across data, models, and interfaces, thereby extending existing XAI frameworks developed by technology firms and standards bodies. It implies that ante-hoc (inherently interpretable) models can inform and calibrate post-hoc explanation methods and their associated interfaces. Ante-hoc models may serve as interpretable baselines against which the fidelity and consistency of post-hoc explanations from black-box models are assessed. Therefore, the integration of ante-hoc and black-box models can support the development of more trustworthy systems, particularly by enabling interpretable interfaces to be trained or tested against transparent model logic before deployment in more complex settings. Accordingly, this framework positions ante-hoc models as an intermediary layer between training data and post-hoc explanations. This enables explanation methods and interfaces to be validated against interpretable model logic before being applied to complex black-box systems.

Conclusions

This research synthesizes key contributions in XAI to underline its essential role in promoting responsible governance in the research, development and maintenance of machine learning systems. It discusses XAI tools, describes their metrics, identifies their strengths and limitations, and reports their possible application domains. It addresses ethical concerns related to black-box models. Hence, it emphasizes the need for documentation practices that establish the normative and technical baselines for accountability, upon which performance tracking and continuous monitoring are built. Robust drift detection and fairness auditing depend on these baselines and operate iteratively throughout deployment to maintain reliable, transparent and equitable XAI systems.

This contribution’s user-centric XAI framework, with its interpretable interfaces that bridge technical innovation and stakeholder ethics, is intended to foster responsible AI and to ensure that ML models remain interpretable, trustworthy and compliant with ethical and legal standards, such as the GDPR and the EU AI Act, throughout their lifecycle.

Theoretical implications

This research adds value to the extant academic literature focused on XAI. It clarifies key notions and explains the meanings of different terms related to model interpretability, data drift, concept drift and fairness. Moreover, it clarifies how practitioners can build and maintain trustworthy AI systems. It clearly indicates that interpretability is a crucial mechanism for fostering user trust, not just through technical explanations, but also by adhering to clear governance structures and established communication channels. This reasoning aligns with emerging theories in human-computer interaction and with technology adoption frameworks drawn from the social sciences literature, which highlight the importance of transparency and accountability in building user confidence in complex systems.

This research builds on the foundations of established theoretical underpinnings by integrating explainable AI within broader models of technology acceptance, trust and socio-technical dynamics. For example, some elements of this contribution’s conceptual framework are related to the Technology Acceptance Model (TAM) and its key constructs, including perceived usefulness and perceived ease of use, as these factors clearly align with XAI’s goals of enhancing the transparency and interpretability of ML models to foster user adoption. The framework also draws on Trust in Automation theories, particularly where they highlight the rationale for developing explainable AI systems to enhance user trust and to prevent their misuse or disuse. In a similar vein, some commentators argue that the XAI literature is grounded in Socio-Technical Systems (STS) theory. They contend that this theory provides a holistic lens by emphasizing the interplay between technological artifacts and social contexts, thereby reinforcing the need for inclusive, ethical and transparent AI design. Other colleagues maintain that the XAI literature is rooted in Responsible Research and Innovation (RRI) frameworks, as these raise awareness about anticipatory governance, stakeholder engagement and ethical reflexivity, all of which are operationalized through user-centric and transparent approaches. Together, these models serve as a theoretical basis for this study’s conceptual framework, as they bridge technical, human, ethical and regulatory dimensions to support trustworthy AI ecosystems.

This timely contribution promotes transparent and fair forms of AI knowledge generation, as the reasoning behind ML decisions and predictions ought to be continuously scrutinized and validated. It puts forward a comprehensive framework that synthesizes key dimensions of XAI into a cohesive model. It reports how, why, where and when explainability is evolving within generative AI systems, linking design choices to measurable user outcomes across the AI lifecycle. Unlike prior models that are narrowly focused on interpretability techniques, this framework integrates lifecycle governance with human-centered evaluation metrics and supports the practical implementation of responsible AI principles. By doing so, it advances theoretical understanding while offering actionable guidance for developers, policymakers and stakeholders committed to trustworthy AI.

In sum, it provides a comprehensive explanation of XAI systems for the benefit of their users, including AI developers, data scientists, domain experts, business stakeholders, regulators and auditors, end users, as well as academic researchers, among others. It enables them to better understand the modus operandi of deep neural networks and complex learning models. It promotes post-hoc explanation techniques and methods that provide explanations for the decisions made by machine learning models after they have been trained. This is particularly important for opaque black-box models, ensemble methods or support vector machines, which offer high predictive accuracy but do not make clear how they arrive at specific outputs. It identifies XAI tools that can help practitioners assess the validity and reliability of ML models.

This research emphasizes the dynamic challenges of AI deployment. It makes reference to model drift and to data distribution shifts, as they can have a negative impact on the reliability and fairness of explanations over time. This perspective moves beyond static evaluations of XAI. It highlights the need for continuous monitoring and adaptation of AI models. It considers the needs and challenges faced not only by AI developers but also by system administrators and non-expert users. It recognizes that effective XAI must cater to diverse levels of technical understanding and operational requirements.

This article also offers novel, integrated and up-to-date syntheses of both academic research as well as practitioner-oriented tools and frameworks. It bridges the gap between theoretical advancements and their real-world applications across the entire AI lifecycle. It refers to technical aspects including XAI specific tools and techniques, data monitoring, fairness assurance and stakeholder engagement, thereby providing a timely and holistic view of the current XAI landscape.

Practical implications

This research offers guidance for a wide range of stakeholders involved in the development, deployment and governance of AI systems. It provides actionable insights for developers and system administrators for implementing XAI. It describes specific tools (e.g. SHAP, LIME, ELI5) and platforms that offer concrete entry points for integrating interpretability into their workflows. This article highlights a comparison matrix of leading XAI tools. It outlines their key metrics, strengths, limitations and domain suitability to support informed managerial decision-making.

Additionally, it proposes a user-centric XAI framework tailored for black-box models. This framework offers practical guidance on aligning explainability techniques with organizational capabilities, stakeholder expectations, and contextual constraints. The novel framework provides a tangible structure that embeds responsible AI practices from the initial design phase through ongoing monitoring and updates. It is intended to support practitioners in the development of more robust, reliable and trustworthy AI applications. Its recommendations for integrating interpretability, regular bias monitoring and fairness auditing (through standardized reporting frameworks, such as model cards and datasheets, combined with automated drift detection tools) can inform policy makers as well as practitioners seeking to advance XAI systems. Hence, the development of internal policies, quasi-substantive rules and workflows can advance responsible AI development and deployment. This may ultimately foster a culture of ethical AI innovation that enhances public trust and understanding of XAI systems, leading to increased user adoption across domains.
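One widely used drift-detection statistic that such automated monitoring tools often report is the Population Stability Index (PSI). The sketch below is a minimal illustration under stated assumptions: the feature distributions are synthetic, and the 0.2 threshold is a convention commonly cited in monitoring practice rather than a value from this framework.

```python
import numpy as np

# Illustrative drift check using the Population Stability Index (PSI).
def psi(expected, actual, bins=10):
    # Bin edges from the baseline's quantiles; open-ended outer bins.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(42)
baseline = rng.normal(0.0, 1.0, 5000)    # training-time feature values
shifted  = rng.normal(0.5, 1.0, 5000)    # drifted production values
stable   = rng.normal(0.0, 1.0, 5000)    # unchanged production values

drift_score = psi(baseline, shifted)
stable_score = psi(baseline, stable)
print(drift_score, stable_score)         # PSI > 0.2 often flags drift
```

In a deployment pipeline, such a score would be computed per feature on a schedule, and crossing the chosen threshold would trigger retraining or a fairness re-audit.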

Limitations and future research directions

Despite its contributions, this study has inherent limitations. The systematic review involved the analysis of recent, high-impact academic publications focused on “explainable artificial intelligence” or “explainable AI” or “XAI”. This selection approach, while ensuring relevance and quality, introduces the risk of citation bias, where frequently cited or well-known studies receive disproportionate attention, potentially overshadowing emerging, less-cited, or interdisciplinary work. Consequently, some innovative advancements or niche applications in XAI may not have been fully captured. Additionally, the rapidly evolving nature of the field means new developments could have emerged after the review period. Furthermore, the evaluation of XAI tools and frameworks relied on publicly available information and academic studies, which often lack empirical depth or comprehensive real-world validation, thereby limiting the scope for fully assessing the practical performance and impact of interpretable models.

Future research can address these limitations and explore plausible areas of study related to XAI. For example, there is scope for conducting longitudinal studies to examine the long-term impact of XAI adoption on system performance, user trust and the fairness of AI outputs in real-world scenarios. Moreover, other research is required to develop standardized metrics that can evaluate the “quality” of explanations and their effectiveness for different user groups hailing from diverse contexts. Prospective researchers can build on this article by promoting the integration of XAI techniques with other responsible AI governance frameworks, such as privacy-preserving AI methodologies, robust AI, as well as inclusive, bias-free AI systems. In addition, they may analyze human-computer interaction aspects of XAI, including how different types of explanations are perceived and understood by diverse stakeholders. It is imperative that developers design effective and interpretable user-centric XAI solutions. Further research in these fields of study will contribute to the continued advancement and responsible adoption of explainable AI, as shown in Table 2.

Table 2. Future research directions related to explainable AI (XAI).

Future research area | Rationale | Potential impact
Context-specific XAI | To investigate user backgrounds, domain knowledge of XAI and cultural contexts. | Increases usability and accessibility of XAI systems.
Human-computer interaction (HCI) in XAI | To explore how different stakeholders perceive, interpret and interact with different types of AI explanations. | Improves the design of user-centric and interpretable XAI solutions.
Focus on niche and emerging XAI applications | To examine XAI applications in specialized domains (e.g., healthcare, finance, autonomous systems). | Expands XAI applicability and domain-specific innovations.
Integration of XAI with responsible AI governance frameworks | To better understand how XAI can be associated with privacy-preserving, robust and bias-free AI methodologies, to advance holistic AI governance frameworks. | Promotes trustworthy, fair, and secure XAI deployment.
Empirical validation of XAI tools and frameworks | In-depth and broad empirical studies will shed light on the effectiveness of current XAI tools in real-world applications. | Bridges the gap between theoretical models and practical uses of XAI.
Longitudinal studies on XAI adoption | To analyze the long-term effects of XAI on system performance, user trust and fairness in real-world contexts. | Advances knowledge on sustained benefits and risks of XAI use.
Ethical and social implications of XAI | To demonstrate the societal impacts, ethical challenges and policy considerations arising from XAI adoption. | Guides responsible AI governance deployment that respects societal norms.
Development of standardized evaluation metrics | To create standardized, reliable metrics that can assess XAI quality and its effectiveness across diverse users. | Enables consistent benchmarking and comparison of XAI tools.

Appendix A. Key concepts in explainable artificial intelligence research.

XAI key term | Description
Accountability | Accountability ensures that individuals or organizations can be held responsible for the outcomes and impacts of AI systems, especially in critical applications where errors or biases could have significant consequences. Individuals and organizations ought to be supported by clear, interpretable explanations that enable oversight and compliance with ethical or regulatory standards.
Artificial Intelligence (AI) | AI is a broad field in computer science focused on creating machines that can perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception and decision-making.
Black box / Black-box model | The black box (model) refers to the opacity of the decision-making processes of various AI models, such as deep neural networks, whose modus operandi users (including their developers) may not be in a position to understand. While such models can usually achieve high accuracy, they may not be transparent about how they process data and how they produce a specific output.
Counterfactual explanations | Counterfactual explanations are a type of model-agnostic explanation technique used in interpretable and explainable AI (XAI). They describe how an input instance would need to be altered minimally for a machine learning model to yield a different (usually a desired) outcome.
Decision making | Decision-making in the context of AI refers to the process where an AI system uses computational techniques to analyze data, identify patterns, and determine optimal courses of action or choices from a set of alternatives. Unlike human decision-making, which can rely on intuition, experience or emotion, AI decision-making is data-driven and based on algorithms.
Decision support systems (DSS) | DSS are applications that analyze data and provide valuable insights. They are designed to assist humans in making informed choices. In XAI contexts, explainability is integrated into such systems, transforming them from “black boxes” into transparent tools that users can understand and trust, especially in sensitive domains like healthcare.
Deep Learning (DL) | DL is a subset of machine learning that focuses on utilizing multilayered (deep) neural networks to learn patterns and representations directly from raw data, to discover intricate features and perform tasks such as classification, regression and representation learning.
Evaluation metrics | Evaluation metrics relate to how AI and XAI systems are assessed and measured. They enable practitioners to objectively evaluate their effectiveness as well as the quality of explanations generated by AI systems. While AI models are typically evaluated on their predictive performance (e.g., in terms of their accuracy), XAI evaluation metrics go beyond this to measure how well explanations help users understand, trust and interact with AI systems. Such metrics may include human-centered metrics (e.g., users’ trust and satisfaction levels vis-à-vis XAI) as well as quantitative metrics (like measuring the model’s accuracy and comprehensiveness).
Explainable Artificial Intelligence / Explainable AI (XAI) | XAI explores methods that provide humans with the ability and intellectual oversight to understand AI outputs. The rationale behind XAI is to increase the interpretability and transparency of AI decisions, actions and predictions. In other words, XAI is intended to answer the “why” and “how” behind AI systems, as they often function as black boxes.
Feature attribution | Feature attribution refers to the process of quantifying the contribution or importance of each input feature in a machine learning model’s prediction. It helps explain how much each feature influences a particular decision made by the model. This is especially valuable in interpretable machine learning and explainable AI (XAI), as understanding why an AI model advances a certain prediction is as important as the prediction itself.
Human-AI interaction (HAII) / Human-computer interaction (HCI) | HAII and HCI concepts and their variations emphasize the user-centricity aspects of AI. In the context of XAI, both notions suggest that humans are more likely to engage, communicate and collaborate with intuitive and explainable AI interfaces.
Human-in-the-Loop (HITL) | HITL approaches refer to systems or processes in AI and ML where human judgement and intervention are actively integrated into the decision-making loop. This involvement can occur at various stages: data collection, labeling and annotation (often with human input), data preprocessing and curation, model training, model evaluation and validation (with human oversight, especially in high-stakes domains), model deployment, as well as monitoring and maintenance. The underlying goal of HITL is to combine the strengths of human intuition, contextual understanding and ethical reasoning with the efficiency and scale of automated systems.
Interpretability | Interpretability is the degree to which a human can understand a model’s internal mechanics, i.e. the cause-effect relationships underlying its decision-making processes. This construct suggests that users tend to interact with transparent and trustworthy XAI technologies because these facilitate the interpretation of their outputs.
Local Interpretable Model-agnostic Explanations (LIME) | LIME is a technique that explains individual predictions by approximating a complex model, in a localized setting, with an interpretable one, such as a linear model. LIME highlights which features influence a specific decision by perturbing input data and observing how predictions change, thereby making black-box models more understandable to users without requiring access to their internal structures.
Machine learning (ML) | ML is a field in AI concerned with the development and study of algorithms that can identify patterns within data. This allows them to learn from these patterns and to make decisions as well as predictions. Such systems can perform tasks without explicit instructions and can improve their performance over time, as they are exposed to more data.
Mental models / Shared mental models | Shared mental models refer to the mutual understanding and common representation of knowledge between humans and AI agents regarding their respective roles, capabilities and the task at hand. Essentially, they refer to the extent to which there is a shared understanding of how the AI system operates and how it aligns with the overall task.
Neural networks (models) | Neural networks are machine learning architectures with interconnected “layers” used to learn patterns from data and to perform specific tasks like prediction or classification. The role of XAI is to provide explanations about how such opaque networks/models work: it clarifies how inputs influence outputs and reveals what the AI model has learned.
Perturbation analysis | Perturbation analysis involves systematically altering (perturbing) one or more features of the input data and observing how the model’s output changes.
Post-hoc explanations | Post-hoc explanations are retrospective interpretability techniques that are used to explain the predictions of already trained machine learning models after they have made a decision. Post-hoc explanations are generated after model training and are not part of the original learning process. They aim to interpret how or why a model made a specific decision, without altering the model itself.
SHapley Additive exPlanations (SHAP) | SHAP is a method based on Shapley values from cooperative game theory. It is used to explain the output of machine learning models. SHAP offers consistent and theoretically grounded insights into how individual input features contribute to a model’s decisions, by assigning each feature an “importance value” for a specific prediction. Features with positive SHAP values push the prediction upwards, while those with negative values push it downwards; the magnitude measures how strong the effect is.
Transparency | Transparency refers to the clarity and understandability of an AI system’s internal workings and decision-making processes. Hence, it allows humans to learn how AI systems process data and make decisions.
Trust | Trust refers to the confidence users place in XAI systems’ decisions. Individuals’ willingness to avail themselves of XAI technologies relies on the reliability of these systems, in terms of the clarity, consistency and usefulness of their explanations. XAI aims to foster appropriate levels of trust by helping users to better understand how and why AI models make certain decisions or predictions, or generate outcomes.
User behavior / User study | User behavior focuses on how individuals interact with, perceive and respond to the explanations provided by AI systems. Individuals’ cognitive processes, trust, decision-making and reliance on AI systems can influence their engagement with XAI technologies.
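The counterfactual-explanation and perturbation-analysis entries above can be illustrated with a toy search: find the smallest change to one feature that flips a model's decision. The scoring function, its weights and the search grid below are all hypothetical, chosen only to make the idea concrete.

```python
import numpy as np

# Hypothetical loan-scoring model: approval when score >= 0.5.
def score(income, debt):
    return 1.0 / (1.0 + np.exp(-(0.08 * income - 0.12 * debt - 2.0)))

income, debt = 30.0, 20.0                 # a rejected applicant
assert score(income, debt) < 0.5

# Search for the smallest income increase that flips the decision,
# i.e. a minimal counterfactual along one feature axis.
delta_star = None
for delta in np.arange(0.0, 50.0, 0.5):
    if score(income + delta, debt) >= 0.5:
        delta_star = delta
        break

print("approve if income rises by", delta_star)
```

Real counterfactual methods search over all features under plausibility and proximity constraints, but the output has the same shape: an actionable "what would need to change" statement rather than a raw feature-importance score.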

About the author

Mark Anthony CAMILLERI, Ph.D. (Edinburgh) is an Associate Professor in the Department of Corporate Communication at the University of Malta. He was a Fulbrighter at Northwestern University in Evanston, U.S.A. (in 2022). Prof. Camilleri was featured among the world’s top 2% scientists in Elsevier’s “Updated science-wide author databases of standardized citation indicators” (in the past four years). In 2023, he achieved a global rank (ns) of 3854, and was listed 124th among business & management researchers. He serves as a scientific expert and reviewer for various European research councils. He was recognized for his outstanding reviews by Publons and by Emerald (as he received a Literati award in 2022 and 2023). He is an Associate Editor of Business Strategy and the Environment, Sustainable Development, and the International Journal of Hospitality Management, among others.


Filed under AI, artificial intelligence, Explainable AI, Responsible AI

The Service Industries Journal: Call for papers focused on ethical AI

Special Issue: Ethical implications of artificial intelligence (AI) and automation in service industries: Addressing algorithmic bias, opacity and unclear accountability mechanisms

Overview

Artificial intelligence (AI) and automation technologies are transforming service industries, including finance, healthcare, hospitality, retail, education, public services and digital platforms. While algorithmic decision-making systems, service robots, chatbots, predictive analytics and automated workflows offer enhanced efficiencies, personalization possibilities and scalability potential, these technologies also raise profound ethical concerns related to their modus operandi and the explainability of their outputs (Camilleri, 2024; Hu & Min, 2023).

As AI-driven service systems increasingly mediate interactions between organisations and their stakeholders, ethical failures and bias have the potential to reinforce existing social inequalities and to undermine trustworthiness, service quality, organisational legitimacy and broader societal well-being (Camilleri et al., 2024). Moreover, opaque “black-box” models reduce transparency and could erode user trust in these machine learning technologies (Kordzadeh & Ghasemaghaei, 2022). Unclear accountability structures may obscure responsibility for service failures or might facilitate unintended harmful outcomes (Novelli et al., 2024). These challenges are particularly evident in service contexts where human–AI interactions are frequent, relational and consequential.

Such concerns are clearly illustrated in healthcare services (Procter et al., 2023), where AI-driven diagnostic and triage systems are increasingly used to support clinical decision-making. When these technologies rely on biased or unrepresentative training data, they may systematically underdiagnose or misclassify specific demographic groups. Given the high-stakes and the relational nature of healthcare encounters, limited transparency and explainability can significantly diminish patient trust while raising serious ethical and accountability concerns.

Similar issues arise in financial and insurance services (Oke & Cavus, 2025), where automated credit scoring, loan approval and underwriting systems directly influence individuals’ financial inclusion and long-term economic prospects. Algorithmic opacity makes it difficult for customers to understand, question or contest adverse decisions. Therefore, biased models may perpetuate or amplify socioeconomic inequalities. Such an outcome is particularly problematic in service relationships characterised by long-term dependency and trust.

Ethical challenges are also conspicuous in customer service and frontline interactions (Han et al., 2023), where chatbots and virtual assistants handle large volumes of customer inquiries across retail, telecommunications and travel services (Lv et al., 2022). Although these systems offer efficiency and scalability benefits, there are instances where they fail to recognise emotional distress, cultural differences, or exceptional circumstances. Excessive automation can therefore undermine relational service quality, especially when customers are unable to escalate complex or sensitive issues to human agents (Yang et al., 2022).

In public service contexts, governments are progressively deploying AI systems (Willems et al., 2023) to allocate welfare benefits, assess eligibility and detect fraud. In such settings, automated decisions can have profound implications for citizens’ livelihoods and their inclusion in cohesive societies. Ethical concerns become particularly acute when accountability is diffused between public agencies and technology providers, as well as when affected individuals lack meaningful mechanisms for appeal, explanation or redress.

Likewise, platform-based and gig economy services are increasingly relying on algorithmic management systems to assign tasks, evaluate performance and compute remunerations (Kadolkar et al., 2025). These systems often operate as “black boxes,” leaving workers uncertain about how ratings, penalties or income calculations are determined. The resulting lack of transparency and clear accountability structures can weaken trust, exacerbate power asymmetries and intensify worker vulnerability within ongoing service relationships.

Moreover, a growing number of human resource management and recruitment specialists are adopting AI-enabled tools for résumé screening and for assessing candidates’ credentials (Soleimani et al., 2025). Possible bias embedded within these systems may disadvantage certain social groups, and their limited transparency can prevent applicants from understanding how hiring decisions are made. Such practices raise important ethical questions concerning fairness, informed consent and procedural justice within professional service contexts.

This special issue seeks to advance novel insights into the above ethical implications of AI and automation in services industries. The guest editors look forward to receiving original, interdisciplinary contributions that critically examine how ethical principles can be embedded into the design, governance, implementation and evaluation of AI-enabled service systems.

Aims and scope

The special issue aims to:

·        Deepen understanding of ethical risks and dilemmas associated with AI and automation in service industries.

·        Explore mechanisms for bias detection, mitigation and governance in service algorithms.

·        Examine transparency, explainability and accountability in AI-enabled service encounters.

·        Advance responsible, human-centered and sustainable approaches to AI-driven service innovation.

Conceptual, theoretical and empirical contributions are all welcome, including qualitative, quantitative, mixed-methods, experimental, design science, as well as critical and/or reflexive approaches.

Indicative themes and topics

Submissions may address, but are not limited to, the following topics:

·        Algorithmic bias and discrimination in service delivery;

·        Ethical design of AI-enabled service systems;

·        Transparency and explainability in automated service decisions;

·        Accountability and responsibility in human–AI service interactions;

·        AI ethics governance, regulation, and standards in service industries;

·        Trust, legitimacy and customer perceptions of AI-driven services;

·        Ethical implications of service robots and conversational agents;

·        Human oversight and hybrid human–AI service models;

·        Data privacy, surveillance and consent in digital service platforms;

·        Fairness and inclusion in AI-based personalisation and targeting;

·        Responsible AI and ESG considerations in service organisations;

·        Cross-cultural and institutional perspectives on AI ethics in services;

·        Ethical failures, service recovery and crisis communication involving AI;

·        Methodological advances for studying ethics in AI-enabled services.

References

Camilleri, M. A., Zhong, L., Rosenbaum, M. S. & Wirtz, J. (2024). Ethical considerations of service organizations in the information age. The Service Industries Journal, 44(9-10), 634-660.

Camilleri, M. A. (2024). Artificial intelligence governance: Ethical considerations and implications for social responsibility. Expert Systems, 41(7), e13406.

Hu, Y., & Min, H. K. (2023). The dark side of artificial intelligence in service: The “watching-eye” effect and privacy concerns. International Journal of Hospitality Management, 110, 103437.

Kadolkar, I., Kepes, S., & Subramony, M. (2025). Algorithmic management in the gig economy: A systematic review and research integration. Journal of Organizational Behavior, 46(7), 1057-1080.

Kordzadeh, N., & Ghasemaghaei, M. (2022). Algorithmic bias: Review, synthesis, and future research directions. European Journal of Information Systems, 31(3), 388-409.

Lv, X., Yang, Y., Qin, D., Cao, X., & Xu, H. (2022). Artificial intelligence service recovery: The role of empathic response in hospitality customers’ continuous usage intention. Computers in Human Behavior, 126, 106993.

Novelli, C., Taddeo, M., & Floridi, L. (2024). Accountability in artificial intelligence: What it is and how it works. AI & Society, 39(4), 1871-1882.

Procter, R., Tolmie, P., & Rouncefield, M. (2023). Holding AI to account: Challenges for the delivery of trustworthy AI in healthcare. ACM Transactions on Computer-Human Interaction, 30(2), 1-34.

Soleimani, M., Intezari, A., Arrowsmith, J., Pauleen, D. J., & Taskin, N. (2025). Reducing AI bias in recruitment and selection: An integrative grounded approach. The International Journal of Human Resource Management, 1-36.

Willems, J., Schmid, M. J., Vanderelst, D., Vogel, D., & Ebinger, F. (2023). AI-driven public services and the privacy paradox: Do citizens really care about their privacy? Public Management Review, 25(11), 2116-2134.

Yang, Y., Liu, Y., Lv, X., Ai, J., & Li, Y. (2022). Anthropomorphism and customers’ willingness to use artificial intelligence service agents. Journal of Hospitality Marketing & Management, 31(1), 1-23.

Submission Instructions

Submission guidelines

Manuscripts should be prepared according to The Service Industries Journal’s author guidelines and submitted via the journal’s online submission system. During submission, authors should select the special issue title:

“Ethical implications of artificial intelligence (AI) and automation in service industries: Addressing algorithmic bias, opacity and unclear accountability mechanisms”.

All submissions will undergo a double-blind peer review process in accordance with the journal’s standards and policies of Taylor & Francis.

Important dates

  • Full paper submission deadline: 31st January 2027
  • First round of reviews: 31st March 2027
  • Revised manuscript submission: 31st May 2027
  • Final acceptance: 31st August 2027
  • Expected publication: 30th November 2027

Contact Information: For informal enquiries regarding the fit of manuscripts or the scope of the special issue, please contact the Leading Guest Editor via Mark.A.Camilleri@um.edu.mt.


An artificial intelligence governance framework

This is an excerpt from my latest contribution on responsible artificial intelligence (AI).

Suggested citation: Camilleri, M. A. (2023). Artificial intelligence governance: Ethical considerations and implications for social responsibility. Expert Systems, e13406. https://doi.org/10.1111/exsy.13406

The term “artificial intelligence governance” or “AI governance” integrates the notions of “AI” and “corporate governance”. AI governance is based on formal rules (including legislative acts and binding regulations) as well as on voluntary principles that are intended to guide practitioners in their research, development and maintenance of AI systems (Butcher & Beridze, 2019; Gonzalez et al., 2020). Essentially, it represents a regulatory framework that can support AI practitioners in their strategy formulation and in day-to-day operations (Erdélyi & Goldsmith, 2022; Mullins et al., 2021; Schneider et al., 2022). The rationale behind responsible AI governance is to ensure that automated systems, including ML/DL technologies, support individuals and organizations in achieving their long-term objectives, whilst safeguarding the interests of all stakeholders (Corea et al., 2023; Hickok et al., 2022).

AI governance requires that organizational leaders comply with relevant legislation, hard laws and regulations (Mäntymäki et al., 2022). Moreover, they are expected to follow ethical norms, values and standards (Koniakou, 2023). Practitioners ought to be trustworthy, diligent and accountable in how they handle their intellectual capital and other resources, including their information technologies, finances and members of staff, in order to overcome challenges and minimize uncertainties, risks and negative repercussions (e.g., decreased human oversight in decision-making) (Agbese et al., 2023; Smuha, 2019).

Procedural governance mechanisms ought to be in place to ensure that AI technologies and ML/DL models operate in a responsible manner. Figure 1 features some of the key elements that are required for the responsible governance of artificial intelligence. The following principles are intended to provide guidelines for the modus operandi of AI practitioners (including ML/DL developers).

Figure 1. A Responsible Artificial Intelligence Governance Framework

Accountability and transparency

“Accountability” refers to the stakeholders’ expectations about the proper functioning of AI systems at all stages, including design, creation, testing and deployment, in accordance with relevant regulatory frameworks. It is imperative that AI developers are held accountable for the smooth operation of AI systems throughout their lifecycle (Raji et al., 2020). Stakeholders expect them to be accountable by keeping a track record of their AI development processes (Mäntymäki et al., 2022).

The transparency notion refers to the extent to which end-users can understand how AI systems work (Andrada et al., 2020; Hollanek, 2020). AI transparency is associated with the degree of comprehension of algorithmic models in terms of “simulatability” (an understanding of the AI’s overall functioning), “decomposability” (an understanding of how individual components work), and algorithmic transparency (the visibility of the algorithms themselves).

In reality, it is difficult to understand how AI systems, including deep learning models and their neural networks, learn (as they acquire, process and store data) during training phases. They are often considered black-box models. It may prove hard to translate algorithmically derived concepts into human-understandable terms, even though developers may use certain jargon to explain their models’ attributes and features. Many legislators are striving to compel AI actors to describe the algorithms they use in automated decision-making, yet the publication of algorithms is of little use if outsiders cannot access the data of the AI model.

Explainability and interpretability

Explainability is the concept that sheds light on how AI models work, in a way that is comprehensible to a human being. Arguably, the explainability of AI systems could improve their transparency, trustworthiness and accountability. At the same time, it can reduce bias and unfairness. The explainability of artificial intelligence systems could clarify how they reached their decisions (Arya et al., 2019; Keller & Drake, 2021). For instance, an explainable system could show how and why an autonomous car decided to stop or slow down when there were pedestrians or other vehicles in front of it.

Explainable AI systems might improve consumer trust and may enable engineers to develop other AI models, as they are in a position to track the provenance of every process, ensure reproducibility, and enable checks and balances (Schneider et al., 2022). Similarly, interpretability refers to the level of accuracy of machine learning programs in terms of linking causes to effects (John-Mathews, 2022).
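One widely used family of post-hoc explanation techniques estimates how much each input feature contributes to a model’s predictions. As an illustration (not drawn from the article itself), the following minimal sketch computes permutation importances for a hypothetical risk-scoring model: the model, its weights and the data are invented for the example, but the logic (shuffle one feature and measure how much the prediction error grows) is the standard permutation-importance idea.

```python
import random

# Hypothetical toy "model": predicts a risk score from three features.
# The weights are illustrative only, not taken from any real system.
def model(features):
    income, debt, age = features
    return 0.6 * debt - 0.3 * income + 0.1 * age

# Small illustrative dataset: ([income, debt, age], observed outcome).
data = [([0.9, 0.2, 0.4], 0.0), ([0.3, 0.8, 0.5], 0.4),
        ([0.5, 0.5, 0.6], 0.2), ([0.1, 0.9, 0.3], 0.5)]

def mse(dataset):
    """Mean squared error of the model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in dataset) / len(dataset)

def permutation_importance(dataset, feature_idx, seed=0):
    """Importance = increase in error after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = mse(dataset)
    column = [x[feature_idx] for x, _ in dataset]
    rng.shuffle(column)  # break the feature's link to the outcome
    shuffled = [(x[:feature_idx] + [v] + x[feature_idx + 1:], y)
                for (x, y), v in zip(dataset, column)]
    return mse(shuffled) - baseline

for i, name in enumerate(["income", "debt", "age"]):
    print(name, round(permutation_importance(data, i), 4))
```

A large positive value signals that scrambling the feature degrades the model, i.e. the model relies on it; libraries such as SHAP and LIME provide more principled, per-prediction variants of the same intuition.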

Fairness and inclusiveness

Responsible AI’s fairness dimension refers to practitioners’ attempts to correct algorithmic biases that may (voluntarily or involuntarily) be included in their automation processes (Bellamy et al., 2019; Mäntymäki et al., 2022). AI systems can be affected by their developers’ biases, which could include preferences or antipathies toward specific demographic variables like gender, age group and ethnicity, among others (Madaio et al., 2020). Currently, there is no universal definition of AI fairness.

However, many multinational corporations have recently developed instruments that are intended to detect bias and reduce it as much as possible (John-Mathews et al., 2022). In many cases, AI systems learn from the data that are fed to them. If the data are skewed and/or contain implicit biases, they may produce inappropriate outputs.

Fair AI systems rely on unbiased data (Wu et al., 2020). For this reason, many companies, including Facebook, Google, IBM and Microsoft, among others, are striving to recruit members of staff hailing from diverse backgrounds. These technology conglomerates are trying to become as inclusive and as culturally aware as possible in order to minimize bias in their AI processes. Previous research reported that AI bias may result in inequality, discrimination and the loss of jobs (Butcher & Beridze, 2019).
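The bias-detection instruments mentioned above typically start from simple group-level metrics. As a hedged illustration (the group labels and decisions below are invented for the example), this sketch computes the demographic parity difference, i.e. the gap in positive-decision rates between two groups, which is one of the standard fairness metrics implemented in toolkits such as IBM’s AI Fairness 360 and Microsoft’s Fairlearn:

```python
# Each record: (group label, model's binary decision).
# Hypothetical outcomes for illustration only.
decisions = [("A", 1), ("A", 1), ("A", 0), ("A", 1),
             ("B", 1), ("B", 0), ("B", 0), ("B", 0)]

def selection_rate(records, group):
    """Share of positive decisions received by one group."""
    outcomes = [d for g, d in records if g == group]
    return sum(outcomes) / len(outcomes)

def demographic_parity_difference(records, group_a, group_b):
    """Absolute gap in positive-decision rates between two groups."""
    return abs(selection_rate(records, group_a)
               - selection_rate(records, group_b))

gap = demographic_parity_difference(decisions, "A", "B")
print(round(gap, 2))  # prints 0.5 (rates of 0.75 vs 0.25)
```

A gap of zero would mean both groups are selected at the same rate; auditors usually set a tolerance threshold and investigate any model that exceeds it.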

Privacy and safety for consumers

Consumers are increasingly concerned about the privacy of their data. They have a right to control who has access to their personal information. Data that are collected or used by third parties without individuals’ authorization or voluntary consent violate their privacy (Zhu et al., 2020; Wu et al., 2022).

AI-enabled products, including dialogue systems like chatbots and virtual assistants, digital assistants (e.g., Siri, Alexa or Cortana), and/or wearable technologies such as smart watches and sensorial smart socks, among others, are increasingly capturing and storing large quantities of consumer information. The benefits that these interactive technologies deliver may be offset by a number of challenges. The technology businesses that developed these products are responsible for protecting their consumers’ personal data (Rodríguez-Barroso et al., 2020). Their devices are capable of holding a wide variety of information on their users. They continuously gather textual, visual, audio, verbal and other sensory data from consumers. In many cases, customers are not aware that they are sharing personal information with them.

For example, facial recognition technologies are increasingly being used in different contexts. Individuals may use them to access websites and social media in a secure manner, and even to authorize payments through banking and financial services applications. Employers may rely on such systems to track and monitor their employees’ attendance. Marketers can utilize such technologies to target digital advertisements to specific customers. Police and security departments may use them for surveillance and to investigate criminal cases. The adoption of these technologies has often raised privacy and security concerns. According to several data privacy laws that have been enacted in different jurisdictions, organizations are bound to inform users that they are gathering and storing their biometric data. The businesses that employ such technologies are not authorized to use their consumers’ data without their consent.

Companies are expected to communicate their data privacy policies to their target audiences (Wong, 2020). They have to reassure consumers that the data collected with their consent are protected, and are bound to inform them that they may use their information to improve customized services. Technology giants can reward consumers for sharing sensitive information, for example by offering improved personalized services, among other incentives, in return for their data. In addition, consumers may be allowed to access their own information and could be given more control (or other reasonable options) over how their personal details are managed.

The security and robustness of AI systems

AI algorithms are vulnerable to cyberattacks by malicious actors. Therefore, it is in the interest of AI developers to secure their automated systems and to ensure that they are robust enough against any risks and attempts to hack them (Gehr et al., 2018; Li et al., 2020).

Access to AI models ought to be monitored at all times during their development and deployment (Bertino et al., 2021). There may be instances when AI models encounter incidental adversities, leading to the corruption of data. Alternatively, they might encounter intentional adversities when they are sabotaged by hackers. In both cases, the AI model will be compromised, which can result in system malfunctions (Papagiannidis et al., 2023).

Such contingent issues have to be prevented from happening. Developers are responsible for improving the robustness of their automated systems, and for making them as secure as possible, to reduce the chances of threats, including inadvertent irregularities and information leakages, as well as privacy violations like data breaches, contamination and poisoning by malicious actors (Agbese et al., 2023; Hamon et al., 2020).

AI developers should have preventive policies and measures related to the monitoring and control of their data. They ought to invest in security technologies including authentication and/or access systems with encryption software as well as firewalls for their protection against cyberattacks. Routine testing can increase data protection, improve security levels and minimize the risks of incidents.

Conclusions

This review indicates that academics as well as practitioners are increasingly devoting their attention to AI as they elaborate on its potential uses, opportunities and threats. It reported that its proponents are raising awareness of the benefits of AI systems for individuals as well as for organizations. At the same time, it suggests that a number of scholars and other stakeholders, including policy makers, are raising concerns about its possible perils (e.g., Berente et al., 2021; Gonzalez et al., 2020; Zhang & Lu, 2021).

Many researchers have identified some of the risks of AI (Li et al., 2021; Magas & Kiritsis, 2022). In many cases, they warned that AI could disseminate misinformation; foster prejudice, bias and discrimination; raise privacy concerns; and lead to the loss of jobs (Butcher & Beridze, 2019). A few commentators argue about the “singularity”, or the point at which machine learning technologies could even surpass human intelligence (Huang & Rust, 2022). They predict that a critical shift could occur if humans are no longer in a position to control AI.

In this light, this article sought to explore the governance of AI. It sheds light on substantive regulations, as well as on reflexive principles and guidelines, that are intended for practitioners who are researching, testing, developing and implementing AI models. It explains how institutions, non-governmental organizations and technology conglomerates are introducing protocols (including self-regulations) to prevent contingencies arising from inappropriate AI governance.

Debatably, the voluntary or involuntary mishandling of automated systems can expose practitioners to operational disruptions and to significant risks, including to their corporate image and reputation (Watts & Adriano, 2021). The nature of AI requires practitioners to develop guardrails to ensure that their algorithms work as they should (Bauer, 2022). It is imperative that businesses comply with relevant legislation and follow ethical practices (Buhmann & Fieseler, 2023). Ultimately, it is in their interest to operate their companies in a responsible manner and to implement AI governance procedures. This way, they can minimize unnecessary risks and safeguard the well-being of all stakeholders.

This contribution has addressed its underlying research objectives. Firstly, it raised awareness of AI governance frameworks that were developed by policy makers and other organizations, including by businesses themselves. Secondly, it scrutinized the extant academic literature focused on AI governance and on the intersection of AI and CSR. Thirdly, it discussed essential elements for the promotion of socially responsible behaviors and ethical dispositions among AI developers. In conclusion, it put forward an AI governance conceptual model for practitioners.

This research made reference to regulatory instruments that are intended to govern AI expert systems. It reported that, at the moment, only a few jurisdictions have formalized their AI policies and governance frameworks. Hence, this article urges laggard governments to plan, organize, design and implement regulatory instruments that ensure that individuals and entities are safe when they utilize AI systems for personal, educational and/or commercial purposes.

Arguably, one has to bear in mind that, in many cases, policy makers face a “pacing problem”, as innovation proliferates much more quickly than legislation. As a result, governments tend to be reactive in implementing regulatory interventions relating to innovations. They may be unwilling to hold back the development of disruptive technologies in their societies. Notwithstanding, they may face criticism from a wide array of stakeholders in this regard, as these may have conflicting objectives and expectations.

Governments typically regulate business and industry to establish technical, safety and quality standards, and to monitor compliance. Yet, they may consider introducing forms of regulation other than traditional “command and control” mechanisms. They may opt for performance-based and/or market-based incentive approaches, co-regulation and self-regulation schemes, among others (Hepburn, 2009), in order to foster technological innovations.

This research has shown that a number of technology giants, including IBM and Microsoft, among others, are anticipating the regulatory interventions of the different governments where they operate. It reported that they are communicating about their responsible AI governance initiatives as they share information on policies and practices that are meant to certify, explain and audit their AI developments. Evidently, these companies, among others, are voluntarily self-regulating as they promote accountability, fairness, privacy and robust AI systems. These two organizations, in particular, are raising awareness of their AI governance frameworks to increase their CSR credentials with stakeholders.

Likewise, AI developers who work for other businesses are expected to forge relationships with external stakeholders, including policy makers as well as individuals and organizations who share similar interests in AI. Innovative clusters and network developments may result in better AI systems and can also decrease the chances of possible risks. Indeed, practitioners can be in a better position if they cooperate with stakeholders on the development of trustworthy AI and if they increase their human capacity to improve the quality of their intellectual properties (Camilleri et al., 2023). This way, they can enhance their competitiveness and growth prospects (Troise & Camilleri, 2021). Arguably, it is in their interest to continuously engage with internal stakeholders (and employees), and to educate them about AI governance dimensions that are intended to promote accountable, transparent, explainable, interpretable, reproducible, fair, inclusive and secure AI solutions. Hence, they could maximize AI benefits while minimizing risks as well as associated costs.

Future research directions

Academic colleagues are invited to raise more awareness of AI governance mechanisms as well as of verification and monitoring instruments. They can investigate what, how, when and where protocols could be used to protect and safeguard individuals and entities from the possible risks and dangers of AI.

The “what” question involves the identification of AI research and development processes that require regulatory or quasi-regulatory instruments (in the absence of relevant legislation) and/or necessitate revisions in existing statutory frameworks.

The “how” question is related to the substance and form of AI regulations, in terms of their completeness, relevance, and accuracy. This argumentation is synonymous with the true and fair view concept applied in the accounting standards of financial statements.

The “when” question is concerned with the timeliness of the regulatory intervention. Policy makers ought to ensure that stringent rules do not hinder or delay the advancement of technological innovations.

The “where” question is meant to identify the context where mandatory regulations or the introduction of soft laws, including non-legally binding principles and guidelines are/are not required.

Future researchers are expected to investigate these four questions in greater depth and breadth. This research indicated that most contributions on AI governance were discursive in nature and/or involved literature reviews. Hence, there is scope for academic colleagues to conduct primary research and to utilize different research designs, methodologies and sampling frames to better understand the implications of planning, organizing, implementing and monitoring AI governance frameworks in diverse contexts.

The full article is also available here: https://www.researchgate.net/publication/372412209_Artificial_intelligence_governance_Ethical_considerations_and_implications_for_social_responsibility
