Gaps & Opportunities for Inclusive Multilingual Data Science

By Yanina Bellini Saibene in English Community Education

February 13, 2022

On February 11 I participated in a panel called “Gaps & Opportunities for Inclusive Multilingual Data Science”, a Fireside Chat hosted by The Turing Way.

The abstract for the panel said:

Panellists for this fireside chat come from diverse backgrounds and experiences across academia, industry and communities in research and data science where the translation aspect is one of their main focus areas. As members of multiple open sciences and research groups across the Global South and North, they will explore a range of questions including:

  • why we should invest in translation work?
  • what challenges do researchers in multilingual communities face and how do they address them?
  • what danger of exclusion or imbalanced representation from different communities exists if we only promote resources written in or translated from English?
  • how we can explicitly support different forms of knowledge production and exchange across multilingual communities?
  • how to emphasise the importance of contextualisation rather than “just translation” of resources in research and data science?
  • what technical as well as ethical requirements we must consider when working in the translation and multilingual research space?

They will bring their combined perspectives as community builders, educators, trainers, researchers and engineers working on projects at both local and global scales.

I shared the live panel with David Pérez-Suárez, Camila Rangel-Smith, Bobby Shabangu and Batool Almarzouq. Anelda van der Walt and Malvika Sharan chaired the panel.

Here is a summary of my participation on the panel:


My name is Yanina Bellini Saibene. I am a Latin American cis woman; my native language is Spanish, and I learned English as a consequence of wanting to learn to code. I come from a lower-middle-class family and we couldn’t afford the cost of learning other languages, but libraries and free school were my tools for learning. I have a degree in computer science and a master’s degree in data mining. I am a scientist. I am also a teacher. And I am not a professional translator. However, I am part of several communities of practice where we have done collaborative, voluntary translations from English to Spanish: for example, guides for participating in the communities, lessons on programming tools, and books such as Teaching Tech Together and R for Data Science. We have created R packages whose documentation and data are in Spanish, and we have encouraged other communities to accept packages and papers in our language for review. I have also helped organize trilingual conferences, such as LatinR, and conferences where we accepted talks in any language, such as useR! 2021. I co-founded MetaDocencia (which means MetaTeaching in English), where we create teaching materials in Spanish and are translating them into Portuguese. Because of my personal and community path, I consider that having material in one’s native language is a fundamental and necessary (but not sufficient) condition for giving access to knowledge and allowing the creation of new knowledge. Translations are one of the tools to achieve this. Another tool is having these conversations. Many thanks to The Turing Way for inviting me to be part of this one.

Tell us about the importance of contextualisation rather than “just translation” of resources in research and data science.

Translating is not just taking words and writing them in another language. In collaborative translation processes we have had to make decisions and reach agreements. For example: which voice will we use, academic or conversational? Which dialect or regional variation of the language will we use? Which technical terms will we translate and which will we leave in English?

In Spanish we also have to decide how to handle gender. Should we use non-sexist language, or inclusive language? We decided to be gender neutral: we adjust the wording to avoid having to assign a gender, and when we can’t, we use feminine-masculine or masculine-feminine splits. To show that there is no particular hierarchy, we alternate between the feminine and the masculine from chapter to chapter, keeping the choice consistent within each chapter.

For bibliographic references, the original title/name is left in English with the words in italics; a translation of the title in Spanish is added in parentheses. When a reference to a Wikipedia entry, a Carpentries lesson, or other online resource occurs, the Spanish version is added to the link if it exists. If one cannot be found, the reference is left in English.

We also translate figures and diagrams, add video subtitles, and even record new videos in Spanish with English subtitles, for example to illustrate some teaching practices.

Beyond that, we change the examples to regional ones: books, cities, rivers, sports, people’s names, songs, and the names of variables and constants in code. Phrases, analogies and humor can be very tricky, and to make them work you may have to change them completely from the original.

With each of these decisions we are giving a message to our readers, not just the specific content of the book. The good thing about doing it as a community is that we can learn from previous experience and reuse and improve the agreements reached in those experiences.

Can you give us some examples?

We replace some analogies with regional examples. We choose books by Brazilian authors and replace coffee with regional drinks like tereré or mate. We also use regional songs and limericks by María Elena Walsh in coding examples. We change the names of variables and constants in the code examples. We use regional names when the text mentions people.

How was the author’s participation and was it important?

In the case of the R for Data Science book, I was a translator and reviewer and had no interaction with the author. In the case of Teaching Tech Together, the author set up a Slack space and was available at all times for queries, clarifications, changes, and so on. The other important detail is that the material has a license that allows derivative works, such as a translation, and that it has the rights to be published in other languages. Without this, translations cannot be made.

Where should funders, or we as researchers, be investing in multilinguality in data?

I hope we can have translations from any language into English, because we have a lot to share with and teach to those who do not speak our languages. I hope we can also have funding for initiatives that are in more than one language from the beginning, like LatinR. And I would love an infrastructure that allows people with little knowledge of computer tools to participate. We leave a lot of people out if, for example, they have to know git to be able to contribute.

You can see the video in my talk session.
