LING 575 — Ethics in NLP: Including Society in Discourse and Design

Course Info

- Lecture: Thursdays, 3:30-5:50 in SAV 130
- Zoom Link: https://washington.zoom.us/my/lingzoom

Instructor Info

Ryan Georgi
Office Hours: Wednesdays 12:00-2:00.
Office: GUG 418-D

Syllabus

Description

As systems involving NLP technology become increasingly prevalent in people’s lives, it is more important than ever to consider the short- and long-term societal impacts of our research in academia and of the systems we implement in industry. As much of the technology developed for machine learning and NLP becomes further democratized, trained linguists are no longer the only ones implementing systems that rely upon NLP.

The goal of this course is to better understand the ethical considerations in the field of NLP, both in our own conduct and in how we communicate these issues inside and outside the research community. Additionally, since morality and ethics arise from societies, we will look at how to treat science communication as a bidirectional process: listening to the concerns of various stakeholders and using these external perspectives to inform our work.

We will start with foundations in ethics, and then move to the current and growing research literature on ethics in NLP and allied fields, before considering specific NLP tasks, data sets and training methodologies through the lens of the ethical considerations identified. Course projects are expected to take the form of a term paper analyzing some particular NLP task or data set in terms of the concepts developed through the quarter and looking forward to how ethical best practices could be developed for that task/data set. In particular, I hope to find answers to the following guiding questions over the course of the term:

- What ethical considerations arise in the design and deployment of NLP technologies?
- Which of these are specific to NLP (as opposed to AI or technology more generally)?
- What best practices can/should NLP developers deploy in light of the ethical concerns identified?
- What is the best way to communicate effectively with different stakeholders, and what are our responsibilities as listeners?

Note: To request academic accommodations due to a disability, please contact Disability Resources for Students (DRS), 448 Schmitz, 206-543-8924 (V/TTY). If you have a letter from DRS indicating that you have a disability that requires academic accommodations, please present the letter to the instructor so we can discuss the accommodations you might need in this class.

Grades

KWLA Paper (~2 pages)	15%
Weekly Reading Check-in + Discussion Participation	15%
SciComm Assignment (1-2 pages)	20%
Term Project (6-8 pages)	50%

Schedule of Topics and Assignments (subject to change)

Date	Topic	Reading	Due	Slides
1/10	Introduction, organization. Why are we here? What do we hope to accomplish?	Hovy and Spruit 2016, plus at least one other paper or article listed under Overviews/Calls to Action (or just one, if you pick something particularly long)		1 – Intro
1/16			KWLA papers: K & W
1/17	What is Ethics? Philosophical foundations	2 items from Philosophical Foundations, at least one of which comes from an author whose perspective varies greatly from your own life experience. Be prepared to discuss the following: What is the main thesis of the reading? What is their definition of ethics? In what ways do they contrast their definition with others? How does this reading relate to ethics in NLP?		2 – Philosophical Underpinnings
1/24	Value Sensitive Design	Read any two other papers from Value Sensitive Design. Reading questions: How could you apply VSD theoretical constructs and methods to the NLP tasks you are most concerned with? Prepare two or three concrete examples. How do VSD theoretical constructs and methods build on or provide counterpoint to what you read in Philosophical Underpinnings? In addition, for an NLP project you are interested in: Make a list of the direct and indirect stakeholders. Identify how each stakeholder group you identify might benefit or be harmed by the technology you are considering. For those who choose the paper by Nathan et al. on value scenarios, write a value scenario like those illustrated in the paper for the technology you are interested in investigating.		3 – VSD
1/31	Accountability: Institutional and Professional Incentivization	Read 1-2 papers from the Human Subjects/Professional Code of Conduct section and Hal Daumé III’s proposed ethics guidelines for the ML and NLP communities.		4 – HSD/Professional Ethics
2/7	Science Communication	Read two papers from the Science Communication section of the bibliography, and consider the following questions (also in the W5 Reading Questions): What specific impediments/problems does the author discuss with regard to science communication? What types of alternatives or best practices does the author suggest? Give an example of one NLP application that you have heard/seen people misunderstand, or express apprehension about not understanding.		5 – SciComm
2/14	Data Collection and Human Subjects: Social Media & Crowdsourcing	Read two papers (or book chapters) from the Social Media & Human Subjects section of the bibliography. W6 Reading Questions are here, as well as below: What kind of “disruption” to the methods of data collection does the paper address? If one is not directly addressed, come up with an example related to the concerns of the paper (e.g., “publishing” your words on Twitter vs. in 1970; informed consent vs. ToS). Are there any specific issues that you encountered in the paper that are relevant to the principles of the Belmont Report (Respect for Persons, Beneficence, Justice)? Which ones? (e.g., Does this data collection potentially focus on a protected group? A coerced one?) Give one suggestion of a “best practice” or rule of thumb that might be used to improve the data collection in the domain discussed in the paper.	Term Paper Proposal Due 2/18	6 – Social Media & Crowdsourcing
2/21	How Bias in NLP Data Arises; Treating Language Data as Ground Truth	Read two papers from the Bias — How It Emerges; Language Data as Ground Truth section of the bibliography and fill out the W6 Reading Q’s. Think about the following questions while reading: If the paper describes an NLP application: What is the objective that the task is seeking to achieve? What biases (what variables) does it end up encoding in the model? How were these latent variables encoded in the training data? If the paper is a more theoretical study of language use: What latent issue(s) do(es) the author describe as being contained within language data? What takeaways for researchers working with language data do these issues present?	SciComm Assignment Due 2/25	7 – Bias
2/28		Read two papers from the Addressing Bias – Algorithmic Fairness section.		8 – Algorithmic Fairness
3/4		Read two papers from the Abusive Language section.	Term Paper Outline Due	9 – Abusive Language
3/11		Read two further papers of interest from the other topics/best practices.	Term Paper Due 3/20	10 – Wrapup