Building a Feminist Data Set for a Feminist AI

Caroline Sinders / San Francisco, CA and New York, USA — Oct. 20, 2017

For the past five years, I’ve been researching online harassment and protest within social networks, using machine learning and ethnography as research methods. For the web residency, I want to further my work exploring technology and activism by creating a feminist data set, focusing on illustrating, designing, and visualizing the speculative architecture and data this project needs. The residency work will center on speculative design materials, culminating in a written piece with gifs that illustrate how the data can be analyzed by an AI system. The result will be images and text created, annotated, and presented within an essay.

My professional and artistic practice focuses on machine learning and artificial intelligence interventions, specifically on mitigating and confronting bias and harm within technology. To remove bias within machine learning, that ‘removal of bias’ has to be manifested as a ‘thing’ that can teach or sway the algorithms. To move machine learning and artificial intelligence forward, and to create equity and equality, I want to find a way to digitize equality by creating a collaborative feminist language data set to feed to these systems. This data set will create and define feminist words and interactions, along with their definitions, their origins, and, potentially, their creators. It will exist as a source that can be applied to data and technology, but also as a living document that pushes cyber feminism forward. This project is a kind of practical poetry: it creates a missing slice of how technology understands conversations, context, and words.

This project was inspired by the work I was doing as a BuzzFeed/Eyebeam fellow studying the alt-right and hate speech. I was looking at so much violent content, and I created a hate speech dictionary to better bring to light the hateful language, hidden in plain sight, that the alt-right was using in social networks. As I created this dictionary, I realized I was building the structure to capture hate speech at scale, and that this structure could gather enough data to train a machine learning system. As an activist, I wondered: what could I create now to counteract this? What kind of art could I make to protest the inequality and lack of equity that exist inside social networks?

Hate speech has many paradigms, or ‘slices.’ It can be generic or specific to a kind of political ideology: it can broadly express an ideology, such as antisemitism, or reference a specific hate group, such as a hyper-specific neo-Nazi chapter or a specific political leader known for a particular slogan. If we can analyze hate, can we create systems that engender spaces of equity, of equality, of protest? What does digital protest within AI look like? In a post-Tay world (Tay being the Microsoft AI that went awry after Twitter taught it to be a neo-Nazi), can we create an AI that doesn’t harm? Is it possible to create an antifa AI, and what would that data set look like? Can data collection be art?

The Feminist Data Set project investigates varying methods of data collection to create a feminist data set. The project explores the potential to disrupt larger systems by generating new forms of agency, and asks: can data collection itself function as an artwork? Can it act as a form of protest? The creation of a feminist data set will act as a means to combat bias and introduce the possibility of data collection as a feminist practice, aiming to produce a slice of data that intervenes in larger civic and private networks. It involves participation and collaboration. So far, I have held one workshop, from October 6 to 8 at SPACE Art and Technology in London, where we began building this collaborative data set. Participants were invited to bring digital feminist content to build the data set for a feminist AI. Objects included “I Love Dick,” Donna Haraway essays like the Cyborg Manifesto, “I Want a Dyke for President,” Peaches lyrics, Mary Shelley’s Frankenstein and a feminist essay on Frankenstein, as well as other documents.

The first part of the methodology for building this data set is to set parameters. What is feminism? Ours was designed to be intersectional and post-third wave. This meant having a conversation about what ‘intersectional’ meant and what our definition was, as a group. From there, we had to create parameters, lists, and themes: themes we were interested in exploring, themes we were interested in combating, and questions we had. This led to a Post-it note thought experiment in which the group wrote out ideas, provocations, and topics. That data looked like this:

embodied concepts/thematic- gender fluidity, anthropocene, nutting, a growing community, society issue, honor, structure: vulnerable, structure: speculative, strength, freedom, diversity, truth, feminism & tech, environment, improvement, misogyny, structure: empathetic, psychedelic, virtual spheres, equity, online presence

action: mothering, bossy, manipulation, occupy, reform, debate, multitasking, threaten, aggressive, agenda, disrupt, political implications, challenge, delicate, engender affect, redefine, virtue

ideologies: feminine, informatics of domination, critical, queer theory, social feminism, counterculture, gender (crossed out), visibility, personal is political, transwomen are women, sex worker rights, implicit bias, liberation(?), no meritocracy, professional

content: CAT, miles franklin, mary shelley, saucepan, gherkin, old maid, scarlet woman, objects of desire, strong character, bitch, trolls/trolling, gender studies, cyborg manifesto, misinformation, donna haraway, occupation, chronology, cyber feminism, I want a dyke for president, a rape in cyberspace, rebecca walker, bell hooks, gender violence, weather data, bodies

currently undefined: value, laws + legality, virginity, color, pretty, trendy, anecdotes, territories, confuse

All of the above is a form of data that goes beyond words: things, locations, items, words, verbs, and movements.

We then defined our structure and our terms, and wrote a small manifesto to help guide the data collection process. Every participant brought in multiple forms of data, often in text form, and we debated how to label this data, whether it should be included, and what it meant.
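The labeling step described above can be imagined as a small data structure. What follows is a minimal, hypothetical Python sketch, assuming each contributed item is stored as a record carrying category labels drawn from the Post-it groupings, a record of the group's discussion, and an inclusion decision that stays open while it is still being debated. The names `TAXONOMY`, `Contribution`, and `tag`, and the participant label, are illustrative assumptions, not the project's actual format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical category names based on the workshop's Post-it groupings.
# The term lists are a small illustrative subset, not the full set of notes.
TAXONOMY: Dict[str, Set[str]] = {
    "thematic": {"gender fluidity", "equity", "feminism & tech", "misogyny"},
    "action": {"disrupt", "reform", "redefine", "challenge"},
    "ideology": {"queer theory", "personal is political", "sex worker rights"},
    "content": {"cyborg manifesto", "bell hooks", "cyber feminism"},
}


@dataclass
class Contribution:
    """One item a participant brings in, plus the group's labeling decisions."""

    title: str
    contributor: str
    labels: Dict[str, Set[str]] = field(default_factory=dict)  # category -> terms
    included: Optional[bool] = None  # None means the group is still debating
    notes: List[str] = field(default_factory=list)  # record of the discussion

    def tag(self, category: str, term: str) -> None:
        # Accept only terms the group has already defined for that category;
        # anything else is parked under "undefined" for later discussion,
        # mirroring the "currently undefined" Post-its.
        if term in TAXONOMY.get(category, set()):
            self.labels.setdefault(category, set()).add(term)
        else:
            self.labels.setdefault("undefined", set()).add(term)


# Usage: label one contributed text (hypothetical participant name).
item = Contribution(title="A Cyborg Manifesto", contributor="participant-1")
item.tag("content", "cyborg manifesto")
item.tag("action", "disrupt")
item.tag("thematic", "virginity")  # not yet defined by the group
item.notes.append("Central to our theme of dominance.")
item.included = True
```

Keeping undefined terms in their own bucket, rather than rejecting them, is one way to preserve the group's open questions as data in their own right.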

In the end, our data set explored dominance as an idea and theme. How do we relate to dominance, and what does dominance mean for data? We created our call to arms:

OUR TERMS AND OUR DATA:
^dominance- as a theme, as a verb, as a noun, and also as a method of control?
the different expressions of dominance in fictions, theory, music and art.

our data ’set’ expands: books and works of fiction and theory, poetry, music, and video, created in the late 1800s, in the 1960s, and from the 1980s to now.

OUR INITIAL INTENTION:
to create a data set that provides a resource that can be used to train an AI to locate feminist and other intersectional ways of thinking across digital media distributed online.

OUR INTENTIONS IN PRACTICE: over the course of two days, we created a data set that questions, examines, and explores themes of dominance. Inspired by the Cyborg Manifesto, we intend to add ambivalence and to disrupt the unities of truth/information, mediated by algorithmic computation, when it comes to expressing power structures in forms of domination, particularly in relation to intersectional feminism.

OUR FUTURE INTENTIONS are to create inputs for an artificial intelligence that challenge dominance by engaging with new materials and with others. We are collaboratively building a collection.

Through collaboration, we created a new way to augment intelligence and augmented intelligence systems, instead of focusing on autonomous systems.

OUR MAIN TERMS: disrupt, dominance

MANIFESTO: we are creating a space/thing/data set/capsule/art to question dominance.

For the web residency, I want to take the bones of the above and start to illustrate, imagine, diagram, and speculate about what this data set could be when larger, and what it could be in practice. What will the AI look like or sound like, and how will it process images or glitch art when this data set is complete, many years from now? My first iteration of the Feminist Data Set at SPACE has given me the structure to better imagine what is being built, along with its potential manifestations and future iterations.

Caroline Sinders is an artist and designer living in San Francisco, CA. Caroline is a Creative Dissent fellow at the Yerba Buena Center for the Arts. She has previously held fellowships with BuzzFeed Open Lab and Eyebeam, the Studio for Creative Inquiry, and the International Center of Photography. Her current work explores emotional data in machine learning. She’s also a design researcher with the Anti-Harassment Tools Team at the Wikimedia Foundation, focusing on how design can mitigate harassment.

Caroline holds a BFA in photography and imaging from NYU’s Tisch School of the Arts, as well as an MPS from NYU’s Interactive Telecommunications Program, also in Tisch.