BIAS DETECTION
Survey
We conducted our survey through Prolific, an online survey platform popular for educational and research studies. We chose a controlled environment (the survey) over social media data for two main reasons:
- Through the survey, we were able to target the Gen-Z age group (ages 18-24) specifically. We also ensured that all of our participants were English speakers, so that they understood each question we asked and we could analyze their responses accurately.
- Since gender was one of our research areas and had a strong effect on the survey results, a controlled environment allowed us to use a blocking method to split our participants evenly between genders. After screening and qualification, 675 of 799 participants were deemed valid and included in our analysis, with a balanced male/female ratio of 338/337.
Here you can see the breakdown of the screening and qualification we did to filter out unqualified responses:

The first questions that our survey participants answered were demographic questions about their age, race, gender, and employment status. We had 425 white participants, 93 Asian, 59 mixed, 57 black, and 41 other. Here you can see the breakdown of the racial demographics of our survey participants:

As you can see, our participants’ racial distribution was fairly diverse and reasonably resembled the racial distribution of the most recent U.S. Census in 2022. Since we had not specifically requested a racially representative sample, this suggests that Prolific's participant pool is reasonably representative. Here you can see a comparison of the 2022 U.S. Census versus the demographics of our survey participants:

After answering the demographic questions, our survey participants answered five questions:
- What words are indicative of race?
- What words are indicative of high income?
- What words are indicative of low income?
- What words are indicative of females?
- What words are indicative of males?
Each of these questions was carefully worded to be as neutral as possible; we did not want to influence respondents toward positive or negative answers. For each question, we asked participants to provide at least five words. Partial responses (fewer than five words for any question) and responses that did not answer the question were removed during screening and qualification.
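The five-word completeness rule described above can be illustrated with a minimal sketch. This is not our actual screening script: the function names, the question keys, and the assumption that entries are separated by commas, semicolons, or newlines are all hypothetical. The check for responses that did not answer the question required judgment and is not covered here.

```python
import re

MIN_WORDS = 5  # minimum number of words requested per question


def split_answer(answer: str) -> list[str]:
    # Assumption: entries are separated by commas, semicolons, or newlines.
    parts = re.split(r"[,;\n]+", answer)
    return [p.strip() for p in parts if p.strip()]


def is_complete(response: dict[str, str], questions: list[str]) -> bool:
    # A response passes only if every question has at least MIN_WORDS entries;
    # a missing question counts as an empty answer.
    return all(
        len(split_answer(response.get(q, ""))) >= MIN_WORDS for q in questions
    )
```

For example, a participant who lists only three words for one of the five questions would be flagged as a partial response, even if the other four answers are complete.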
For each question, responses were free-form, so respondents could provide as many words as they wanted beyond the five-word minimum. As a result, a term like "video game" could have been entered by a survey participant as “videogames” or “video-game”. To make sure that these different forms were counted as the same word, we pre-processed the data using stemming and lemmatization.
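To illustrate why this normalization matters, here is a minimal sketch that collapses spelling variants before counting. It is a naive stand-in, not a real stemmer or lemmatizer and not our actual pipeline: it only lowercases, removes spaces and punctuation, and strips a single plural "s". The function names are hypothetical.

```python
import re
from collections import Counter


def normalize(word: str) -> str:
    # Lowercase and drop everything except ASCII letters and digits, so that
    # "video-game" and "Video Game" both collapse to "videogame".
    key = re.sub(r"[^a-z0-9]", "", word.lower())
    # Crude plural handling (an assumption, not a real stemmer): drop one
    # trailing "s" from keys longer than three characters, unless the key
    # ends in "ss" (e.g. "dress").
    if len(key) > 3 and key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key


def count_normalized(words: list[str]) -> Counter:
    # Count responses by their normalized form.
    return Counter(normalize(w) for w in words)
```

With this sketch, "videogames", "video-game", and "Video Game" are all counted under the single key "videogame". A production pipeline would more likely rely on an established stemmer or lemmatizer from an NLP library, since the suffix rule here mishandles many irregular forms.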