YouTube videos as language data?

Hi everyone. I’m posting on behalf of Lise Dobrin, based on an interesting query from Marianne Mithun regarding the status of language data from the internet. I wonder if anyone has encountered any institutional policies regarding the use of YouTube videos as language data, in cases where the video is freely available with no restrictions. My understanding is that this would not count as human subjects research under the usual IRB guidelines, since there are no interactions or interventions with the people depicted in the video, and since the material is freely available to the public. It might, however, fall under human subjects regulations if the people depicted in the video do not know that the video is up there. Also, regardless of regulations, we were wondering how to determine whether use of such a video for research poses any additional risks to the people depicted in the video. Thanks for your thoughts! Elaine

This entry was posted on Wednesday, February 13th, 2013 at 9:59 am and is filed under General.

6 Responses to YouTube videos as language data?

Claire Bowern says:

February 13, 2013 at 10:26 am

My immediate reaction is that if the people don’t know that videos of them are online, that’s a problem for the person who put the video up, not a problem for subsequent IRBs. If the YouTube video is published under a Creative Commons license (as many are), it counts as publication, and is therefore not subject to IRB regulations.
Shari Speer says:

February 13, 2013 at 10:38 am

I have not seen policy on the use of YouTube video as data, but because it is publicly available without restriction, there is a good argument that using it does not fall under the category of human subjects research. The closest policy I’m aware of (here at Ohio State) treats use of the public level of Facebook (the page level that anyone can see) as not human subjects research, but would require consent for the use of other levels (e.g. a Facebook “friend” of the researcher would need to consent to participate in research in order for data from pages accessible due to the friend status to be used). The argument in the Facebook case is that there is a reasonable assumption of privacy on the part of a Facebook user due to 1. the Facebook terms of use, which state that the data isn’t to be used for research (by anyone other than Facebook), 2. the fact that users can restrict the level of privacy to certain other users, and 3. the fact that the purpose of the site and postings on it is not research.
Mike Cahill says:

February 13, 2013 at 10:45 am

I don’t know how many videos are posted online without the subject’s knowledge, but it seems common enough that comic strips are taking notice of the practice! I believe that this practice is indeed unethical if a non-public figure’s actions are posted without their knowledge. But yes, it seems like the ethics violation belongs to the person who posted the video, not to someone who wants to tap it.
James Crippen says:

February 13, 2013 at 1:51 pm

I agree with Claire’s & Mike’s comments about people being posted without their knowledge. For the researcher, using such data would not be *institutionally* unethical since it would be considered something akin to published materials and hence exempted from review. But we easily forget that institutional review is not the end of ethical responsibility. A researcher knowingly using online recordings of a person who is unaware of those recordings would still be acting unethically on a personal level, even when the recordings were published by some independent third party. In that sort of situation it’s the researcher’s own judgement and reputation at stake, and the institutional review can’t really do much if the institution already has an exemption policy for published materials (which is usually the case).
Natasha Warner says:

February 18, 2013 at 7:05 pm

Our Human Subjects Office confirmed for us in a training recently that publicly available material, publicly available at the time you submit a research proposal to the office, is not human subjects data. I think you can sort of view it as either exempt or not even human subjects data (and therefore not even needing review to determine it being exempt). We had a case once of someone using publicly available posts to an online dating site that was specifically for gay people. The Human Subjects office had the researcher remove names (including fictitious account names) from the data, in case someone subsequently gets a new job where it’s dangerous to be out as gay and removes their posts. The current Human Subjects office personnel tell us the decision is based on whether the material is public at the time of the proposal submission, not whether it might become non-public later, so that material would still be exempt.
As for the personal ethics of using material on YouTube, if someone in a YouTube video said something embarrassing or incriminating or risk-inducing, and also said their name (or their video and voice seemed identifiable), would you really choose to publish exactly that material, or show it in a talk? I doubt we would. If the content of what they said was important, but not the phonetics, you could present the written transcript with names removed. If the phonetics was what was important, you could probably find a different token of whatever was important about it, that didn’t include risky information.
And if someone doesn’t know they’re on some video on YouTube, but the content isn’t at all problematic or sensitive, then it seems OK to use it. In most cases, the researcher would have no way to determine whether the speaker knows the video has been made publicly available.
Using Facebook posts as data is a separate question: our Human Subjects office, in the recent training, told us that Facebook itself has a much more restrictive policy about using posts (even public ones) as data than Human Subjects offices do. They say that researchers are obligated to follow any ethical guidelines established by the company that makes the data available, even if those exceed the federal standards or the local university’s IRB’s standards. The person doing this training is going to send us a link to the page that states this Facebook policy.
Kristine Hildebrandt says:

April 3, 2013 at 10:49 am

I am the P.I. on a currently running project where YouTube video posts (with subtitles) are one kind of output (for public access to the languages and the stories that representative speakers tell). Since the project involves data collection of different types (discourse, interviews, phonetics data), in the original IRB process I created an (oral) consent process, worded to be understandable/relevant to these communities, that enabled speakers to decide what (if any) of their data could be released publicly on the Internet. We have not had any real problems with this process. In one case, when a speaker gave permission and then told a story that included some bad-mouthing of a relative, we simply edited that section out of the story before preparing it for YouTube. So in this sense, by the time our discourses make it to the Internet, they have gone through that process. I have learned that on YouTube (and in the construction of our webpage) there are some settings one can adjust for Creative Commons licensing options. Having said this, I am always open to hearing/learning more about how we can continue and build on our sense of social responsibility while simultaneously tapping into the tremendous resources that these different Internet platforms can offer us in language data access and language promotion.