brill.com

Regulating Online Hate Speech through the Prism of Human Rights Law: The Potential of Localised Content Moderation

️Ayako Hatano
️Mon Oct 23 2023

Online Publication Date:

23 Oct 2023

Abstract

This article explores whether international human rights standards can provide insights into the following questions: who can and should define what constitutes ‘hate speech’ for online moderation, and how can hate speech be detected and moderated on social media platforms? Regarding who should moderate hateful content, the article underscores the responsibility of social media companies, in reference to international human rights law and principles on business and human rights. This paper advocates for a multistakeholder approach where companies work in partnership with and under the monitoring of state actors, civil society and other relevant stakeholders. Given the complexity of what constitutes hate speech, the article proposes the localisation of terms of service and guidelines of social media companies. This approach, in line with human rights standards, enables the meaningful involvement of local experts and civil society groups in formulating, implementing and monitoring online community rules. Addressing the question of how social media companies can detect and moderate hate speech, the article argues that a globally standardised approach reliant on AI content moderation is limited in detecting contextual nuances of hate speech. Drawing from international human rights tools like the Rabat Plan of Action, the article suggests that social media companies should consider diverse local contexts for better interpretation and effective detection, with qualified human moderators with local knowledge and oversight boards with local expertise. By taking a human-centered, localised approach to content moderation and collaborating with relevant stakeholders, businesses can contribute to creating a digital space that upholds fundamental rights, prevents harm, and encourages inclusivity and diversity.

Abstract

1 Introduction^{^*}

In today’s digital age, social media has become a breeding ground for hate speech and xenophobia, which is sweeping across the world.^¹ The spread of hate, prejudice and incitement to violence, and atrocities against minorities is exacerbated by successive waves of technological advancements that have profoundly transformed informational and communicative realities throughout the world. Users of online platforms are often exposed to information and perspectives that align with their beliefs, values, and ideologies, while being shielded from opposing or diverse viewpoints, often due to algorithmic curation. This results in the formation of filtered bubbles or digital echo chambers leading to polarisation, radicalisation, and extremism online,^² with derogatory and dehumanising expressions being amplified and perpetuated in a negative and pervasive spiral of hateful expressions.

Online inflammatory speech has far-reaching consequences beyond the virtual realm, often leading to real-world harm, including fatal outcomes.^³ For instance, in Myanmar, Facebook was weaponised by military leaders and nationalists to incite ethnic tensions, resulting in brutal violence against Rohingya Muslims during a campaign of ethnic cleansing in 2018.^⁴ Similarly, the ethnic cleansing of and violence against the Tigrayan people in Ethiopia in 2021 was fuelled by hate speech on Facebook.^⁵ Additionally, during the COVID-19 pandemic, online incidents of hate speech related to the virus exacerbated hate crimes against marginalised populations, such as people of Asian descent around the world.^⁶ Global social media companies, including Facebook,^⁷ Twitter, YouTube, and others, have faced criticism for their failure to effectively remove harmful content and for erroneous take-down decisions. These incidents have highlighted the profound threats posed by hate speech to the fundamental rights of individuals and to public goods such as peace and social stability exacerbated in the digital age.^⁸ As a result, the need to strike a balance between protecting freedom of expression and privacy while regulating hate speech has come to a critical crossroads that requires urgent attention.

Online hate speech, while not intrinsically different from expressions of hate in offline settings, is characterised by its anonymity, the speed of spread, itinerancy, permanence, and complex cross-jurisdictional character.^⁹ Unlike the dissemination of hate speech through conventional channels, victims of online hate speech may face difficulties ascertaining the identities of the anonymous perpetrators in the real world.^¹⁰ Anonymous or pseudonymous characters can also easily accelerate the destructive behaviour of people engaging in online activities.^¹¹ Further, online information travels easily across multiple platforms.^¹² Even if a post is taken down or a website is forced to close, online perpetrators can still maintain their existence by migrating to another place, possibly in another jurisdiction with less stringent regulations on hate speech. Thus, incidents of hate speech can grow exponentially online, and in many cases harmful content can remain online indefinitely.

These characteristics of online hate speech make it challenging for governments to regulate online hate speech beyond their limited jurisdictional reaches. It is, in particular, difficult as different countries have diverse hate speech laws rooted in their historical, philosophical, and constitutional traditions.^¹³ Despite efforts for cooperation among multilateral countries, particularly in Europe through criminal law,^¹⁴ regulatory circumvention and attempts to evade legal liability for hateful content persist in numerous jurisdictions. State-centered efforts to regulate through national or regional criminal law have therefore had limited effectiveness in combatting online hate speech.^¹⁵

Furthermore, governments or political leaders who are also users of social media platforms are not always virtuous regulators, and may mobilise hate speech for their own purpose.^¹⁶ Some governments view hate speech legislation as a means to limit speech they dislike and silence their critics.^¹⁷ Even in a case where a government may have benevolent intentions, strong regulation of hate speech may have unintended consequences such as silencing marginalised people. Governments may also face challenges in effectively enforcing national legislation for regulating content on social media and other online spaces due to the technological and legal complexities involved, as well as the resources and budget required to address these issues.

Given the limitations of state-centered national and regional approaches, businesses that operate beyond national or regional borders, in particular big social media companies which also often have a large transnational reach, may offer a more effective means of addressing online hate speech. Codes of Conduct or Terms of Service (‘ToS’) agreements allow social media companies to conduct content moderation by reviewing and monitoring user-generated content on their online platforms to ensure certain standards and guidelines are satisfied. These platforms can regulate content that originates from jurisdictions with little or no regulation on hate speech, take measures such as removing content that violates their community guidelines or cancelling services of users who break the ToS agreements.^¹⁸

This paper emphasises the role of businesses that operate beyond national or regional borders in the effective and legitimate regulation of online hate speech. It specifically focuses on the content regulation practices employed by large social media companies, which rely on a combination of algorithmic tools, user reporting and human review to detect and enforce their content moderation rules. There are several modes of moderation employed by these companies.^¹⁹ Human moderation, or manual moderation, involves humans manually monitoring and screening user-generated content submitted to an online platform, and following platform-specific rules and guidelines. Automated moderation, on the other hand, automatically accepts, refuses, or sends user-generated content to human moderation based on the platform’s specific rules and guidelines. This often involves the use of artificial intelligence (‘AI’) content moderation, which employs machine learning models built from platform-specific data to efficiently detect and handle unwanted user-generated content.

As online hate speech continues to be a pervasive human rights concern worldwide, this paper delves into the potential for international human rights standards to offer effective regulatory guidance for content moderation and oversight on social media platforms. While human rights-based legislative and policy debates on this issue are US- and Euro-centric, the scope of the problem extends beyond these jurisdictions.^²⁰ Hence, it is crucial to broaden the discourse to encompass diverse national, regional and international environments. This study explores whether human rights standards can provide insights into pertinent questions regarding hate speech online, such as who can and should define what constitutes ‘hate speech’ to be moderated, and how it can be regulated on these platforms.

Against this background, this paper examines the role of content moderation carried out by social media platforms in combatting hate speech through an international human rights law lens. Based on a critical review of the centralised and automated approach for content moderation, the paper explores the possibility for community and human-centered content moderation in alignment with international human rights standards. By examining the challenges and potential solutions in regulating hate speech on social media platforms from a human rights perspective, this study aims to contribute to a nuanced understanding of the complex issues involved, and foster a more inclusive and rights-based approach to content moderation beyond areas of national jurisdiction.

2 Content Moderation through the International Human Rights Framework^²¹

2.1 Who Defines the ‘Hate Speech’ to Be Moderated—Social Media Companies as a Prominent Stakeholder in International Law

International law has traditionally been conceptualised to be binding principally on sovereign states, meaning that non-state actors such as transnational companies lack corresponding international legal obligations, despite their significant de facto economic, financial, institutional or lawmaking power.^²² However, in the 21st century, due to the growing role of businesses in both human rights infringements and protection, several international rules and standards have been developed to set forth the ‘responsibilities’ of businesses in human rights protection. The UN Guiding Principles on Business and Human Rights (‘UNGPs’), endorsed by the UN Human Rights Council in June 2011, suggest that ‘[b]usiness enterprises should respect human rights’ and ‘they should avoid infringing on the human rights of others and should address adverse human rights impacts with which they are involved’.^²³ These principles emphasise that businesses’ obligation to uphold human rights supersedes their adherence to domestic laws and regulations concerning the protection of human rights.^²⁴ Private companies are particularly expected to establish policies to address human rights issues, and to identify, prevent, mitigate and account for how they address the adverse human rights impacts of their activities, in partnership with relevant outside actors.^²⁵

Deriving from the UNGPs, David Kaye, the then UN Special Rapporteur on the promotion and protection of the right to freedom of opinion and expression, proposed in his 2018 report that social media companies’ content moderation should be aligned with international human rights standards.^²⁶ He urged social network service providers to review their terms of use to ensure the consistency of those terms with international human rights law.^²⁷ He argued that this would lead to the establishment of a mechanism to regulate user content on these platforms in line with human rights protections.^²⁸ Furthermore, Ahmed Shaheed, the former UN Special Rapporteur on freedom of religion or belief, emphasised in his 2020 report the influential role of technology (‘tech’) companies and social media in combatting hate speech.^²⁹ Shaheed stressed the significance of implementing the international human rights guidance into practical measures, underscoring the importance of these entities in upholding human rights.^³⁰

Alongside these legal experts’ efforts to engage private companies as a stakeholder for combatting hate speech, states and intergovernmental organisations have also committed to tackling the problem, including in 2015 when developing the UN’s 17 Sustainable Development Goals (‘SDGs’).^³¹ The SDGs affirm the right to protection from harassment and the protection of freedom of expression. The SDGs also promote a multistakeholder approach, recognising that the business sector is an important stakeholder in implementing the agenda necessary to support achievement of the SDGs.^³² In June 2019, the UN Secretary-General launched the Strategy and Plan of Action on Hate Speech. The Strategy leads the organisation’s effort to combat discriminatory statements nationally and globally including by leveraging partnerships with tech and social media companies, in particular social media platforms, media outlets, and other private sector and civil society actors.^³³ Further, the UN Guidance Note on Addressing and Countering COVID-19 related Hate Speech, published in May 2020, urges social media and tech companies to jointly work with states to address discriminatory statements.^³⁴ It encourages social media and tech companies to implement strategies to uphold human rights in line with UN guidance and framework documents, including by undertaking assessments of potential risks of human rights infringement associated with the dissemination of COVID-19 related hateful content.^³⁵

It is commendable that service providers have responded to a great extent positively to the need for aligning their rules with human rights law as they offer an internationally recognised framework.^³⁶ Their voluntary commitments, however, may not guarantee full compliance with international human rights principles, as their profit-making priorities may impact their adherence to these standards. In essence, hosting dangerous and socially objectionable content can sometimes be more financially lucrative for these companies than moderating it. In particular, emotionally charged content, including hateful or divisive content, often drives more engagement, keeping users on social media platforms for longer periods of time.^³⁷ This is accelerated by algorithms designed to maximise engagement and keep users on the platform for as long as possible.^³⁸ This increased engagement can lead to higher views of advertisements and engagement (for example, through clicks or subscriptions), thereby generating more revenue for the company hosting the platform.^³⁹ In Myanmar, Facebook’s algorithm is criticised for having proactively amplified anti-Rohingya content.^⁴⁰ According to Amnesty International, Facebook’s function of auto-playing recommended videos contributed to increasing the number of views of anti-Rohingya videos.^⁴¹ The Mozilla Foundation has also reported that YouTube is recommending videos with misinformation, violent content and hate speech, along with scams.^⁴²

Companies often fail to take concrete action in response to social problems caused by their services unless they face significant criticism directly impacting their business. A notable example is Facebook, which only increased the number of human moderators with local language expertise in Myanmar and Ethiopia after severe criticism regarding the atrocities in those countries.^⁴³ YouTube announced improvements to its system for removing inappropriate videos and indicated it intended to increase the number of human reviewers monitoring content.^⁴⁴ However, such action occurred only after strong criticism including the pulling of advertising from the platform by major companies in an effort to avoid being associated with extreme content capable of tarnishing their brand image.^⁴⁵ The danger of business entities prioritising market mechanisms is illustrated in Twitter’s approach towards human rights, which transformed significantly after Elon Musk, a US business magnate and investor, took ownership of the company in 2022. Musk attempted to push forward plans to loosen moderation guidelines, resulting in the exacerbation and proliferation of hate speech on Twitter.^⁴⁶ He also laid off the entire team dedicated to human rights monitoring, prompting the Office of the High Commissioner for Human Rights (‘OHCHR’) to issue an open letter urging the company under his leadership to uphold human rights.^⁴⁷ This serves as a cautionary example of the inherent risks posed by profit-driven approaches to the operation of business entities, particularly when corporate decisions are concentrated in the hands of an individual. In such cases, the impartiality and effectiveness of online content moderation practices are particularly likely to be at risk.

Given this situation, strong concerns remain that engaging the private sector may result in delegating the responsibility for defining hate speech, which leaves arbitrary censorship decisions to private companies.^⁴⁸ However, international human rights principles establish that states and state parties have the obligation to ensure that businesses act in accordance with human rights standards. General Comment 34 of the Human Rights Committee states that ‘the obligation [to respect and protect human rights] also requires States parties to ensure that persons are protected from any acts by private persons or entities that would impair the enjoyment of the freedoms of opinion and expression’.^⁴⁹ The International Convention on the Elimination of all Forms of Racial Discrimination (‘ICERD’) also emphasises the role of private actors in preventing hateful discrimination and demands state parties to implement legislative measures holding private actors accountable.^⁵⁰ The General Recommendation No 35 (‘GR35’)^⁵¹ issued in 2013 by the Committee on the Elimination of Racial Discrimination (‘CERD’) recognises ‘(i)nformed, ethical and objective media, including social media and the Internet, have an essential role in promoting responsibility in the dissemination of ideas and opinions’ and emphasises ‘[i]n addition to putting in place appropriate legislation for the media in line with international standards, States parties should encourage the public and private media to adopt codes of professional ethics and press codes that incorporate respect for the principles of the Convention and other fundamental human rights standards’.^⁵² The UNGPs also highlight the obligations on states under international law to protect against human rights abuse by third parties, including business enterprises.^⁵³ Additionally, in response to a resolution by the Human Rights Council, an intergovernmental working group was established in 2014 to develop an international legally binding instrument to regulate the activities of transnational corporations and other business enterprises in accordance with international human rights law.^⁵⁴ Moreover, recent observations and jurisprudence urge states to regulate the extra-territorial impacts of private business activities.^⁵⁵ Hence, permitting platform companies to unilaterally establish, interpret and enforce ToS in a capricious manner, resulting in the breach of international human rights standards, may also constitute a failure on the part of states to fulfill their positive obligations to safeguard fundamental rights and prevent human rights violations.^⁵⁶

Furthermore, although platform companies assert that they are not legally bound by international human rights law,^⁵⁷ there is growing legal framework and jurisprudence at the national and regional level that supports binding private companies to such standards and holding them accountable for their breach. For example, the European Union (‘EU’) has set rules for the global digital economy which require companies to act in accordance with international human rights standards. The General Data Protection Regulation (‘GDPR’) of 2016, for example, obliges tech firms to be transparent with their data usage practices.^⁵⁸ The EU’s Code of Conduct on Countering Illegal Hate Speech Online, agreed with major Internet intermediaries including Facebook, Microsoft, Twitter and YouTube in 2016, sets guidelines for online platforms to tackle illegal hate speech in a timely manner.^⁵⁹ The EU’s Digital Services Act, which entered into force in November 2022, similarly sets high standards for effective intervention to mitigate harmful speech, due process and the protection of fundamental rights online.^⁶⁰ Even though they are only applicable to services offered to users in the EU,^⁶¹ the impact of these laws may extend beyond Europe. The significant market size of the EU means, for instance, that businesses often cannot afford to ignore EU regulations, and global companies tend to conform to the most stringent standards to avoid harsh penalties and the costs of developing separate business models for different jurisdictions.^⁶² EU laws can also work as a standard setter for international regulations.

Company compliance with international human rights law is, moreover, not solely led by state actors, but also involves global and local civil societies. The role for such groups is supported by various resolutions of the UN Human Rights Council, including Resolution 17/4 on human rights and transnational corporations and other business enterprises, which acknowledges the important roles for, and expertise of, civil societies in promoting and protecting human rights in the context of business activities.^⁶³ The UNGPs also emphasise the vital role of civil society, including non-governmental organisations and human rights defenders, in fostering respect for human rights on the part of businesses, and for holding both states and businesses accountable for transgressions of human rights standards.^⁶⁴

Furthermore, there are emerging initiatives to incentivise platform companies to take action through cooperative frameworks between governments, tech companies and civil society. One such initiative, called the Christchurch Call to Action (the ‘Call’), was established in 2019 by then Prime Minister Ardern of New Zealand and President Macron of France.^⁶⁵ The Call pursues the goal of eliminating terrorist and violent extremist content online that transcends borders and platforms. The Call has gained support from more than 120 countries, tech companies, and civil society organisations, including tech giants such as Amazon, Facebook, Google, YouTube, LINE and Twitter.^⁶⁶ Grounded in support for regulatory or policy measures consistent with a free, open and secure Internet and with international human rights law, the Call supports signatories to share crisis response protocols on an interlinked communications network that enables a rapid and coordinated response to online incidents.^⁶⁷ This kind of collaborative platform, which consists of multiple stakeholders in different sectors, helps to promote an environment where social media platform companies recognise their social responsibility and invest in achieving certain regulatory goals. In this regard, the Santa Clara Principles, developed in 2018 and revised in 2021, are also commendable.^⁶⁸ The Principles emerged from a collaborative endeavour involving human rights organisations, advocates, and academic experts. They provide a set of standards for social media platforms, emphasising the need for meaningful transparency and accountability in content moderation, guided by a human rights-centered approach.^⁶⁹ It is notable that major social media companies have endorsed these principles.^⁷⁰

In summary, international human rights law, extending beyond being mere state obligations in a legal sense, serves as essential legal and policy instruments accessible to businesses, governments, civil society organisations, individuals and other relevant stakeholders. The internationally recognised human rights framework has offered a common, universal language to promote the exchange and partnership among those stakeholders.^⁷¹ It created an environment where businesses are increasingly acknowledging the importance of actively addressing online hate speech. The recognition of the need for proactive action against hate speech reflects the impact it has on individuals and communities and the responsibility of businesses to uphold human rights principles in their operations. By aligning with these standards and collaborating with relevant stakeholders, business can contribute to the effective regulation of online hate speech and the promotion of a safer and more inclusive digital environment. This collective effort is essential for creating a digital space that upholds fundamental rights, prevents harm, and encourages inclusivity and diversity.

2.2 What Constitutes ‘Hate Speech’—A Human Rights Law Approach for Balancing Regulation of Hate Speech with the Protection of Freedom of Expression

What constitutes hate speech is one of the most controversial and complex questions in international law. International law and its subsets do not provide a definition of hate speech but have developed rules and standards to regulate speech which incites discrimination, hostility or violence. The Convention on the Prevention and Punishment of the Crime of Genocide prohibits ‘direct and public incitement to commit genocide’.^⁷² Article 20(2) of the International Covenant on Civil and Political Rights (‘ICCPR’) prohibits ‘any advocacy of national, racial or religious hatred that constitutes incitement to discrimination, hostility or violence’.^⁷³ In terms of racial hate speech, the ICERD in article 4 prohibits ‘dissemination of ideas based on racial superiority or hatred, incitement to racial discrimination, as well as all acts of violence or incitement to such acts against any race or group of persons of another colour or ethnic origin’.^⁷⁴ These provisions can guide platform companies in setting their ToS for moderating hate speech.

An international human rights-based approach toward content regulation would help social media platforms to strike a balance between the protection of freedom of expression and the regulation of hate speech. Article 19(2) of ICCPR guarantees freedom of expression in any form and through any media of choice.^⁷⁵ Article 19(3) states that this freedom may be subject to certain restrictions that are provided by law and are necessary ‘(a) for the respect of rights or reputations of others’ and ‘(b) protection of national security or public order, or public health or morals.’^⁷⁶ This means that regulations of speech must meet three conditions: (a) legality, (b) legitimacy, and (c) necessity and proportionality.^⁷⁷ GR 35 of CERD on combating racist hate speech provides a detailed interpretation of articles 4, 5 and 7 of ICERD to outline a number of actions to effectively limit racially hateful statements while ensuring individuals have the right to express their opinions.^⁷⁸ Notably, in an effort to strike a balance between article 19 and article 20 of the ICCPR, the OHCHR in February 2013 launched the Rabat Plan of Action on the prohibition of advocacy of national, racial or religious hatred that constitutes incitement to discrimination, hostility or violence.^⁷⁹ The Rabat Plan sets a high threshold for restricting the freedom of speech, requiring an analysis of six elements to identify whether prohibitions on speech are consistent with this principle, namely: (1) the social and political context, (2) status of the speaker, (3) intent to incite the audience against a target group, (4) content and form of the speech, (5) extent of its dissemination and (6) likelihood of harm, including imminence.^⁸⁰ In his 2019 report, Kaye called on companies, in regulating incitement, to consider the six factors suggested in the Rabat Plan.^⁸¹

However, according to international law scholar Evelyn Aswad, current regulatory measures on hate speech by social media platforms is primarily driven by economic interests, rather than human rights concerns.^⁸² In her study, Aswad examined Twitter’s approach to hate speech and freedom of expression and explored the implications of aligning company codes of conduct with the UNGP and with international human rights law. Her research reveals that Twitter’s rules on hate speech do not fully adhere to the tripartite test outlined in the ICCPR, particularly in terms of the requirement that restrictions on speech should be clearly defined and not ambiguous.^⁸³ Thus, the effort to make sure that the definition of hate (or regulated) speech within a company’s ToS in alignment with international human rights standards holds significant importance.

To ensure that the definition of hate speech in a company’s ToS takes into account international human rights standards such as the six factors identified in the Rabat Plan, it is crucial for these companies to involve local human rights experts and civil society groups in the development of their ToS. These individuals and groups have a better understanding of the ‘social and political context’ in the community and can help create localised ToS which take into account the unique factors of the community. Developing localised ToS with the active and meaningful participation of local human rights experts and civil society groups in a transparent and inclusive manner can help ensure that ToS align with international human rights standards. Such development of ToS can increase the expertise, transparency, accountability, representation, and legitimacy of the resulting ToS.^⁸⁴ Meaningful participation in decision-making processes that affect one’s life is fundamental to international human rights principles.^⁸⁵ Thus, it is crucial for individuals and communities to be actively involved in the formulation, implementation, and monitoring of the rules of the online community that affect their rights and well-being.

In summary, international human rights law and principles can provide meaningful guidance for platform companies to determine the actual content or categories of hate speech that should be moderated in their ToS. However, this guidance raises further questions, such as how to effectively detect and address such speech.

2.3 How to Detect and Moderate ‘Hate Speech’—Content Moderation through a Human Rights Approach

Although social media companies can establish rules for moderating content on their platforms through their ToS, they cannot provide a list of hateful content that applies globally. This is due to the nuanced and multifaceted nature of hate speech, which varies across historical, cultural, legal, political, economic and linguistic contexts, as well as the power imbalance between speakers and targeted groups. Political science scholar Alexandra A. Siegel underscores the challenges involved in identifying hate speech when it is expressed through nuanced language aimed at denigrating a particular group.^⁸⁶ Such subtleties often elude casual observers, making detection even more difficult. This issue is particularly pronounced in the realm of online communication, where speech patterns are rapidly evolving and can adopt a highly specialised form. Siegel also emphasises that online communities frequently employ code words as substitutes for explicit racial slurs, further adding complexity to the detection of hate speech.^⁸⁷ Thus, to identify and address hate speech effectively and legitimately, it is crucial to consider the factors outlined in the Rabat Plan.^⁸⁸

While addressing the six factors outlined in the Rabat Plan is crucial for properly regulating hate speech, how to detect hate speech in interpreting and applying a company’s ToS in a concrete case remains a challenge. This section examines how social media companies can detect and moderate online hate speech in a more transparent, legitimate and effective way that aligns with international human rights standards, highlighting some challenges in global and automated approaches, and the possibility of localised, human-centered oversight mechanisms.

2.3.1 Challenges in Global Content Moderation

In response to the growing demand for more transparent, legitimate and effective content moderation, Facebook created a quasi-independent Oversight Board (‘Board’), which is funded by a trust established and financed by Facebook.^⁸⁹ The Board consists of a dozen members, including legal experts,^⁹⁰ and is responsible for making ‘binding’ decisions on content moderation cases for Facebook and Instagram.^⁹¹ On the one hand, the Board has been met with excitement and praise as a positive step for social media governance.^⁹² On the other hand, it is criticised as being intended to earn Facebook public goodwill and deflect blame for lax content moderation.^⁹³ Its decisions in practice have been more inclined towards upholding freedom of speech and expression. This in part is due to its limited remit, but also due to the fact that it includes powerful voices of constitutional lawyers from the US, a state home to the most speech-protective jurisprudence in the world.^⁹⁴

Concerns with platform-controlled content moderation are linked to the unavoidable issues associated with centralised approaches applicable to content moderation across multinational and multicultural user bases. Social media companies operate across jurisdictions where national legislative systems take different approaches to freedom of speech and the interpretation of ‘hate speech’ based on their legal, political, historical, cultural, societal and linguistical contexts. Because of this, it is not easy to reach a shared understanding of what constitutes hate speech or grounds for the removal of content, and when and how anti-hate speech interventions could and should take place. Given the volume of reports, most of which require complex and careful consideration due to the differing socio-cultural contexts in which they arise, and the limited capacity and number of board members and staff, it is almost impossible for a centralised body, including the so-called ‘Supreme Court of Facebook’, to deal effectively with all global complaints related to speech on a platform.^⁹⁵ Thus, globally-centralised content moderation may not be feasible, at least in the foreseeable future, as no entity is currently capable of taking up this task outside major social media platforms.

The limitations of global approaches to content moderation become particularly pronounced in relation to the moderation of non-English language content. Facebook supports posts in 110 languages; however, it only has the capacity to review content in 70 languages.^⁹⁶ While the number of removals of hate speech-containing content more than doubled globally in 2020, data shows that almost 90% of Facebook’s expenditure on misinformation is spent on English-language content.^⁹⁷ In regards to Twitter, for example, despite Japan having the second largest Twitter user population, Japanese content on Twitter has been exposed to lax moderation.^⁹⁸ Twitter’s content moderation system, which is built on AI, does not flag a number of taunting words, which a Japanese speaker can note at a glance as being highly problematic, or harmful.^⁹⁹ Some of those words which can constitute hate speech when used in connection with certain words referring to characteristics of people or groups of people, thus prompting the need for more careful, context- based scrutiny.

The unequal allocation of resources for content moderation was also evident in the case of Facebook’s failures in content moderation in Myanmar and Ethiopia. The incitement of violence or even genocide by prominent figures, in clear violation of international human rights law and Facebook’s community standards, remained visible to millions of people on the platform. This was primarily due to a shortage of content moderators with a comprehensive understanding of the local socio-linguistic contexts.^¹⁰⁰ According to research conducted by an NGO, Facebook’s ability to identify hate speech in the major languages of Ethiopia is severely lacking.^¹⁰¹ The same issue was revealed in terms of Myanmar.^¹⁰²

Incorrect or inappropriate content removals are also frequent in Facebook’s moderation of content in the Arabic language, even though Arabic is among the most common languages used on Facebook’s platforms, with millions of users worldwide.^¹⁰³ Such difficulties arise in part because Arabic dialects are unique to each region and country, with their vocabularies influenced by different historical backgrounds and cultural contexts. They pose challenges to both human moderators and automated moderation systems, which are unable to catch harmful content in different dialects requiring interpretation in localised contexts.^¹⁰⁴

2.3.2 Challenges in AI-Based Automated Content Moderation

Automated AI content moderation has been successful to some extent in the timely identification and removal of hateful content. Major social networking services increasingly rely on cost-saving ‘machine learning’ systems for content moderation.^¹⁰⁵ However, the technology is not yet capable of identifying nuanced elements, such as the six factors outlined in the Rabat Plan, in certain contexts of speech. Despite progress in AI technology, social, linguistic, legal and political challenges will persist. One of the challenges is that AI moderation is criticised for its inability to read all languages used on social media platforms (including major languages) and its difficulties in interpreting words in context, especially with newer types of content.^¹⁰⁶ Furthermore, automated solutions do not yet exist to capture all hate speech conveyed through various forms of expression, including voices, images, cartoons, memes, art objects, gestures and symbols.^¹⁰⁷ Additionally, hateful expressions are dynamic and evolving, and can use tropes such as slang, circumlocutions, and sarcasm, which are difficult for AI to accurately capture.^¹⁰⁸ AI content moderation can also be biased due to inadequacies in the dataset’s representativeness or a lack of social diversity in the group of annotators.^¹⁰⁹ This can result in a failure to comply with non-discrimination standards outlined in international human rights law.^¹¹⁰

The limitations of AI technology in content moderation highlights the need for human judgement to enable more nuanced approaches.^¹¹¹ While both AI and humans are capable of making mistakes or biased judgements, AI decision-making is black-boxed, while human errors can be examined and analysed for future solutions. To enable more effective and transparent content moderation, there is a need to recruit diverse and well-trained human content moderators who understand not only universal core principles of human rights, but also how content can be perceived in different demographics, cultures, geographies and languages. To enhance the accurate and timely detection of hate speech, social media companies cannot only rely on human content moderators, but also need to prioritise reports from human ‘trusted flaggers’ who possess expertise in identifying and reporting content that is either unlawful or violates the ToS.^¹¹²

Implementing this approach will involve investing in the recruitment of skilled human content moderators who possess local knowledge and language proficiency, enabling them to make context-sensitive judgements. Investment must also be made in the training of moderators to build their capacity to identify hate speech, including, for example, forensic investigation based on human rights principles and context-awareness.^¹¹³ With an increased number of qualified human content moderators, automated systems could synergise with human-centered content moderation efforts. Under such a model, AI systems could swiftly flag potentially harmful content for review by human moderators rather than automatically removing it. This approach requires significant investment from platform companies, but it might eventually lead to better platform performance, user engagement and satisfaction.^¹¹⁴

2.3.3 The Possibility of a Local Oversight Mechanism of Content Moderation—Process of Participation

The previous sections have highlighted the limitations of current content moderation mechanisms from a human rights perspective. The global, one-size-fits-all, automated approach has proven inadequate in dealing with the highly contextualised nature of hateful content. In order to swiftly and accurately identify hateful content, and to implement effective measures for its reduction and prevention of its spread, a more systematic approach is essential. This can be achieved through the establishment of a local oversight mechanism for content moderation.^¹¹⁵

The proposed model for content moderation involves collaboration between different bodies at the global, regional, national and community levels. Local oversight boards would be independent and consist of representatives from various groups such as social media companies, locally trained content moderators, local NGOs focused on freedom of speech and hate issues, media and journalists, academics, youth and social minorities who are at risk of being victims of hate speech. These board members would meet regularly, either online or offline, to review individual content moderation decisions made by social media platforms, in particular focusing on content related to the people in their community, if not necessarily generated within the jurisdiction where the local oversight board is located. Upon receiving a report requiring a nuanced local approach to content moderation, the global team would swiftly reach out to a local oversight board at the national, regional, or community level, to review the content on the basis of international human rights standards and using their expertise in local contexts. Such groups could also be tasked with analysing harmful or derogatory speech on social media and updating local community sub-guidelines. Those guidelines may include examples of hateful speech or expressions specific to their local context. This knowledge in turn could be used to improve AI-based content moderation systems. By employing this mechanism, platform companies could ensure that the security of individuals targeted or adversely affected by online hate speech is given due consideration in the course of moderation.

The nuanced analysis, advice, and decisions of local oversight boards would enable platform companies to take differentiated responses based on the scale and depth of the issues. This would ensure a full range of responses to hate speech beyond the traditional binary approach of allowing or removing the content at issue, such as disabling or limiting monetisation, giving lower ranks and visibility to dangerous content, blocking access from certain geographical areas, or promoting messages against hate speech.^¹¹⁶ This approach would help platform companies meet the necessity and proportionality test under Article 19 of ICCPR, allowing for bespoke and context-sensitive solutions.

Social media companies could also take remedial measures, including education and training of those involved, to increase awareness of relevant standards. The sensitisation of community members would also enable local advocates and civil societies to effectively mobilise counter speech, in particular against coordinated hate movements.^¹¹⁷ They could, for instance, apply a spectrum of counter-interventions to lower the visibility of hate speech, provide support and protection for affected individuals or groups, and prevent hateful or derogatory speech from escalating into something more dangerous, such as incitement to discrimination, hostility, and violence.^¹¹⁸ Presenting an alternative narrative in various forms, from media campaigns to artistic expression, could help to challenge hate narratives with humour, warnings of consequences, and empathy, rather than responding with more offending speech in the opposite direction.^¹¹⁹ This approach levels the field of expression and is more likely to result in de-radicalisation and the peaceful resolution of conflicts.^¹²⁰

The proposed localised monitoring system would not only improve the identification and management of harmful content, but would also enhance access to effective remedies for human rights violations, a crucial component of the UNGPs.^¹²¹ By establishing a locally based transparent decision-making process, the system would enable a more accessible and fair appeals mechanism for users whose content has been removed or blocked, or whose report was not heard. Such a system is currently lacking on many social media platforms. While judicial review remains an important last resort option, the process can be lengthy and costly, and may vary depending on local legal and political systems. Therefore, social media companies should develop remedial systems with local oversight boards, including to document prior cases in order to achieve a balance between regulating hateful content and upholding freedom of speech and expression in a transparent and accountable manner. Such measures would also prevent arbitrary decisions by platform owners.

In sum, local oversight mechanism could contribute to making content moderation a human-centered and localised means of combatting hate speech, based on international human rights standards. The proposed local oversight boards comprised with multiple local stakeholders and experts may have the functions of research, analysis, advice, and content review as well as engaging relevant people in localised ToS development processes and enhancing the accessibility of remedial processes. Thus, the local oversight boards would act as local think tanks, content moderation reviewers, early warning mechanisms and educators for local communities with the aim of strengthening the social media platforms’ ability to identify and address the risks of hate speech. Such coordinated albeit localised approaches can also counter the drawbacks of centralised and automated content moderation.

Establishing these functional oversight boards may entail a substantial investment from social media companies. Nevertheless, such an investment is indispensable to ensure that platforms prioritise human rights and avoid contributing to social issues. While certain platforms may prioritise profits over social responsibility, it is crucial to acknowledge that neglecting human rights concerns can lead to detrimental consequences for their business and invite severe criticism. Therefore, investing in local oversight boards and remedial processes can ultimately prove advantageous for both the platforms and their users.

3 Conclusion

Online hate speech poses significant threats to the lives and dignity of targeted individuals and groups, with potential real-life consequences and the ability to harm public goods in society. The unique characteristics of online hate speech, such as anonymity, speed of dissemination, itinerancy, permanence, and cross-jurisdictional reach, present unprecedented challenges for states to regulate online hate speech, particularly on social media platforms.

Given that online hate speech continues to be a pervasive human rights concern worldwide, this paper explored whether human rights standards can provide insights into pertinent questions regarding hate speech online, including: who can and should define what constitutes ‘hate speech’ to be moderated, and how it can be detected and moderated on social media platforms.

In relation to who should define hate speech, the article highlighted the role of private social media companies as content moderators, recommending that such definitions be developed by reference to the international law and human rights framework, particularly recently developed principles on business and human rights. It moreover recognised the responsibility of social media companies as significant stakeholders for upholding human rights in content moderation activities. The approach to regulation would ensure that companies can take action against hate speech online whilst adhering to international human rights standards and working in partnership with and under the monitoring of state actors, civil society, and other relevant stakeholders.

Given the complexity of what constitutes hate speech, and its susceptibility to legal, political, socio-economic, historical, cultural, linguistic and psychological contexts, the paper proposed localising ToS and guidelines. This approach, in alignment with human rights standards, aims to ensure the legitimacy and efficacy of content moderation. It also facilitates the meaningful engagement of local experts and civil society groups in formulating, implementing, and monitoring online community rules that directly impact their rights and well-being.

This paper also addressed the question of how social media companies can detect and moderate hate speech in a transparent, effective, and legitimate manner. It argues against a globally standardised approach that relies solely or largely on AI content moderation, as it fails to detect contextual meaning and account for the localised nuances of hate speech. Instead, the paper draws insights from international human rights tools like the Rabat Plan of Action, emphasising the importance of social media companies being aware of diverse legal, political, historical, socio-economic, cultural, linguistic, and psychological contexts to interpret and detect hate speech more effectively. The paper advocates for a human-centered and localised approach to content moderation, which involves collaboration between social media companies, local experts, civil society groups, and the government. By considering the expertise and perspectives of these stakeholders, more effective and legitimate measures can be implemented to combat hate speech. This approach encourages the application of broad range of moderation tools and various counter-interventions to address hate speech in a proactive, constructive, and nuanced manner.

This paper has examined the responsibility of social networking companies to moderate hate speech on their platforms through the lens of international human rights law. It highlighted the necessity for more legitimate and nuanced online content moderation, which requires these companies to contribute to achieving greater transparency and accountability at global and local levels through a multistakeholder approach. The roles of relevant actors and stakeholders, including social media and tech companies, governments, civil society organisations, international and regional organisations, journalists, academic institutions, and global and local experts in this collaborative project, require further discussion. There is also a need to clarify the application of international law to non-state actors and address the issue of fragmentation in local oversight board decisions. However, it is crucial to acknowledge the vital role of businesses as key stakeholders in the effort to moderate and regulate hate speech. As technology continues to rapidly evolve, ongoing research and collaboration between stakeholders are pressing needs to ensure the effective and appropriate moderation of hate speech on social media platforms in alignment with international human rights standards.

Acknowledgements

The draft of this paper was presented at the Four Societies Conference in August 2022 organised by the American Society of International Law (ASIL), the Canadian Council on International Law (CCIL), the Australian and New Zealand Society of International Law (ANZSIL) and the Japanese Society of International Law (JSIL). The author expresses gratitude for the opportunity to present this work in the conference and acknowledges the valuable thoughts and comments received on an earlier draft of this article. Special appreciation is extended to Atsuko Kanehara, Charles-Emmanuel Côté, Cymie Payne, Donald Rothwell, Gib van Ert, Gregory Shaffer, Karen Scott, Keiko Ko, Kinji Akashi, Koji Teraya, Mark Agrast, Matthew Schaefer, Shuichi Furuya, and Wes Rist. The author also extends her sincere appreciation to Diyi Liu, Ikuru Nogami, Kate O’Regan, Lisa Hsin, Richard Mackenzie-Gray Scott and Takashi Shimizu for their kind support, comments, and insights. The author is also immensely thankful to the editors of the Australian Year Book of International Law, Esmé Shirlow and Don Rothwell; the publication assistants, Jessica Taylor, Ava Cadee and Harry Fenton; and the Student Editors, Mark Bell, Himani Khatter, Tamara Richardson, and Ruby Wong. The author is also thankful to the two anonymous peer reviewers and all collaborators for their valuable comments and revisions.

	All Time	Past 365 days	Past 30 Days
Abstract Views	0	0	0
Full Text Views	6477	4855	321
PDF Views & Downloads	7554	5416	464

Online Publication Date:

23 Oct 2023

Abstract

1 Introduction^{^*}

2 Content Moderation through the International Human Rights Framework^²¹

2.1 Who Defines the ‘Hate Speech’ to Be Moderated—Social Media Companies as a Prominent Stakeholder in International Law

2.2 What Constitutes ‘Hate Speech’—A Human Rights Law Approach for Balancing Regulation of Hate Speech with the Protection of Freedom of Expression

2.3 How to Detect and Moderate ‘Hate Speech’—Content Moderation through a Human Rights Approach

2.3.1 Challenges in Global Content Moderation

2.3.2 Challenges in AI-Based Automated Content Moderation

2.3.3 The Possibility of a Local Oversight Mechanism of Content Moderation—Process of Participation

3 Conclusion

Acknowledgements

Content Metrics

	All Time	Past 365 days	Past 30 Days
Abstract Views	0	0	0
Full Text Views	6477	4855	321
PDF Views & Downloads	7554	5416	464

Regulating Online Hate Speech through the Prism of Human Rights Law: The Potential of Localised Content Moderation

Abstract

Abstract

1 Introduction*

2 Content Moderation through the International Human Rights Framework21

2.1 Who Defines the ‘Hate Speech’ to Be Moderated—Social Media Companies as a Prominent Stakeholder in International Law

2.2 What Constitutes ‘Hate Speech’—A Human Rights Law Approach for Balancing Regulation of Hate Speech with the Protection of Freedom of Expression

2.3 How to Detect and Moderate ‘Hate Speech’—Content Moderation through a Human Rights Approach

2.3.1 Challenges in Global Content Moderation

2.3.2 Challenges in AI-Based Automated Content Moderation

2.3.3 The Possibility of a Local Oversight Mechanism of Content Moderation—Process of Participation

3 Conclusion

Acknowledgements

Abstract

1 Introduction*

2 Content Moderation through the International Human Rights Framework21

2.1 Who Defines the ‘Hate Speech’ to Be Moderated—Social Media Companies as a Prominent Stakeholder in International Law

2.2 What Constitutes ‘Hate Speech’—A Human Rights Law Approach for Balancing Regulation of Hate Speech with the Protection of Freedom of Expression

2.3 How to Detect and Moderate ‘Hate Speech’—Content Moderation through a Human Rights Approach

2.3.1 Challenges in Global Content Moderation

2.3.2 Challenges in AI-Based Automated Content Moderation

2.3.3 The Possibility of a Local Oversight Mechanism of Content Moderation—Process of Participation

3 Conclusion

Acknowledgements

Content Metrics

1 Introduction^{^*}

2 Content Moderation through the International Human Rights Framework^²¹

1 Introduction^{^*}

2 Content Moderation through the International Human Rights Framework^²¹