Those readers who are members of the LSA will have received an invitation to vote on two resolutions. One of these is directly relevant to the ethics committee:
Whereas modern computing technology has the potential of advancing linguistic science by enabling linguists to work with datasets at a scale previously unimaginable; and
Whereas this will only be possible if such data are made available and standards ensuring interoperability are followed; and
Whereas data collected, curated, and annotated by linguists forms the empirical base of our field; …
Therefore, be it resolved at the annual business meeting on 8 January 2010 that the Linguistic Society of America encourages members and other working linguists to:
- make the full data sets behind publications available, subject to all relevant ethical and legal concerns; …
- work towards assigning academic credit for the creation and maintenance of linguistic databases and computational tools; and
- when serving as reviewers, expect full data sets to be published (again subject to legal and ethical considerations) and expect claims to be tested against relevant publicly available datasets.
We’d like readers to identify potential ethical concerns raised by this resolution, so that we can develop a set of more explicit guidelines. Here are a few to start with:
- Did the research participants give permission for the raw material to be made available?
- Is there personal or identifying information in the data set? If so, are participants happy for it to be made available?
- Is the material coded with appropriate metadata?
- Is the material available in a form that’s actually usable to other linguists?