Data Collection and Risk Flagging
Participant Demographics and Mental Health Assessment:
Youth aged 13–21 completed a web-based survey reporting demographic details (gender and sexual identity).
Mental health and well-being were assessed using validated scales:
Short Warwick-Edinburgh Mental Well-Being Scale (SWEMWBS)
Patient Health Questionnaire (PHQ-9) for depression
Inventory of Statements about Self-Injury (ISAS)
All constructs were measured on a 5-point Likert scale.
Data Upload and Review Process:
Participants uploaded their Instagram data through a secure PHP-based web system that used AWS for encryption and storage.
The data arrived as JSON files, which were converted and stored securely on AWS.
Message Review and Risk Classification:
Participants reviewed their Instagram direct messages (DMs) in reverse chronological order through a web interface.
Messages marked as risky were categorized by type (e.g., sexual content, harassment, self-injury) and by severity (low, medium, or high).
Dataset Overview:
The dataset included 32,055 DM conversations, totaling over 6 million messages.
2,515 conversations were flagged as risky, with 3,023 messages categorized by risk type and severity:
402 high-level risks
821 medium-level risks
1,813 low-level risks
Methodology
We employed a mixed-methods approach, combining both quantitative and qualitative analyses to comprehensively examine the online risk experiences and mental health outcomes of LGBTQ+ versus heterosexual youth.
Chi-Square Tests
To compare online risk experiences between LGBTQ+ and heterosexual youth, in terms of both the types and the severity of flagged risks, we used chi-square tests of independence. This test is well suited to examining relationships between categorical variables, so it let us detect differences in the distribution of flagged risk types and severity levels between the two groups. Standardized residuals then showed which cells contributed most to each result, indicating whether LGBTQ+ youth flagged particular risk types or severity levels more often than expected.
Variables:
Sexual identity: LGBTQ+ or heterosexual.
Types of flagged messages: Categories such as sexual content, harassment, self-injury.
Risk severity levels: Low, medium, and high, based on participants' perceptions of each flagged message.
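The chi-square comparison described above can be sketched in a few lines. This is a minimal illustration, not the study's analysis code: the counts are made up, the function name is hypothetical, and the residuals are computed as adjusted standardized residuals, which is an assumption because the summary does not say which residual variant was used. In practice a library routine such as scipy.stats.chi2_contingency would also return the p-value.

```python
import math


def chi_square_independence(table):
    """Chi-square test of independence for an r x c table of counts.

    Returns the statistic, degrees of freedom, expected counts, and
    adjusted standardized residuals (one per cell).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)

    # Adjusted residual: (O - E) / sqrt(E * (1 - row share) * (1 - column share)).
    residuals = [
        [
            (o - e) / math.sqrt(e * (1 - r / n) * (1 - c / n))
            for o, e, c in zip(obs_row, exp_row, col_totals)
        ]
        for obs_row, exp_row, r in zip(table, expected, row_totals)
    ]
    return chi2, dof, expected, residuals


# Hypothetical counts (NOT study data).
# Rows: LGBTQ+, heterosexual. Columns: low, medium, high severity.
counts = [[20, 30, 40], [40, 30, 20]]
chi2, dof, expected, residuals = chi_square_independence(counts)
print(f"chi2 = {chi2:.3f}, dof = {dof}")
for label, row in zip(["LGBTQ+", "heterosexual"], residuals):
    print(label, [round(v, 2) for v in row])
```

A residual with an absolute value above roughly 1.96 is commonly read as a cell that departs from independence at the 5% level, which is how individual risk types or severity levels can be singled out after a significant overall test.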
Multiple Linear Regression Analysis
To understand how different types of online risks related to mental health outcomes, we used multiple linear regression models. These models estimate the predictive relationship between flagged message types and mental health measures while accounting for the moderating role of sexual identity. We built a separate regression model for each mental health outcome, with flagged message types as predictors. Interacting each message type with sexual identity allowed us to test whether a given message type was associated with different outcomes for LGBTQ+ youth than for heterosexual youth.
Variables:
Predictor variables: Types of flagged messages, including sexual messages, harassment, and self-injury.
Dependent variables: Three key mental health outcomes:
Depression: Measured using a modified version of the Patient Health Questionnaire (PHQ-9).
Self-Harm and Injury: Assessed with the Inventory of Statements about Self-Injury (ISAS).
Mental Well-Being: Measured using the Short Warwick-Edinburgh Mental Well-Being Scale (SWEMWBS).
Interaction term: Sexual identity (LGBTQ+ or heterosexual), interacted with each flagged message type to explore whether the relationship between risk types and mental health varied by group.
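To make the moderation idea concrete, the sketch below fits one regression with an interaction term using only the Python standard library. Everything in it is an assumption for illustration: the data are synthetic and noise-free, a single predictor (a harassment count) stands in for the full set of message types, and the helper name fit_ols is hypothetical. A statistics package with a formula interface (for example a model of the form outcome ~ harassment * lgbtq in statsmodels) would be the more usual route and would also report standard errors and p-values.

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix

    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Synthetic, noise-free data (NOT study data).
# harassment: number of flagged harassment messages; lgbtq: 1 = LGBTQ+, 0 = heterosexual.
rows = [(harassment, lgbtq) for lgbtq in (0, 1) for harassment in range(6)]

# Design matrix columns: intercept, harassment, lgbtq, harassment x lgbtq.
X = [[1.0, h, g, h * g] for h, g in rows]

# Outcome built from known coefficients so the fit can be checked by eye.
y = [1.0 + 0.5 * h + 2.0 * g + 0.3 * h * g for h, g in rows]

coefficients = fit_ols(X, y)
names = ["intercept", "harassment", "lgbtq", "harassment:lgbtq"]
for name, value in zip(names, coefficients):
    print(f"{name:>18}: {value:.3f}")
```

The coefficient on the product column is the moderation effect: it is the extra change in the outcome per flagged harassment message for the LGBTQ+ group relative to the heterosexual group. A value near zero would mean the slope is the same in both groups.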
Welch’s Two-Sample t-Test
To compare mental health scores between LGBTQ+ and heterosexual youth, we used Welch’s two-sample t-test. This variant of the t-test does not assume equal variances in the two groups, which matched our dataset, where the group variances differed. For each mental health outcome, the test compared the mean score of the two groups.
Variables:
Group: LGBTQ+ or heterosexual.
Mental health measures: Mean scores on depression, self-harm, and mental well-being.
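The Welch comparison above can be sketched as follows. This is an illustrative stdlib-only version with made-up scores and a hypothetical function name; it returns the t statistic and the Welch-Satterthwaite degrees of freedom but not a p-value, which would normally come from a library call such as scipy.stats.ttest_ind with equal_var=False.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b

    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Made-up scores (NOT study data), e.g. depression scores per participant.
lgbtq_scores = [2, 4, 6, 8]
hetero_scores = [1, 2, 3]

t_stat, dof = welch_t(lgbtq_scores, hetero_scores)
print(f"t = {t_stat:.3f}, df = {dof:.3f}")
```

Unlike the pooled-variance t-test, the degrees of freedom here are generally not an integer, because they are estimated from the two separate variances rather than fixed at n_a + n_b - 2.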
Qualitative Analysis: Thematic Analysis of Flagged Messages
For our third research question, we adopted a qualitative approach to understand the specific nature of online risks faced by LGBTQ+ youth. We conducted a thematic analysis on flagged Instagram direct messages, allowing us to identify patterns and themes within the content. This method extended beyond quantitative frequency counts to capture the deeper, lived experiences of LGBTQ+ youth in private online spaces.
We began by familiarizing ourselves with 1,468 flagged messages across 808 conversations, creating initial codes based on message content. Two researchers refined these codes iteratively, organizing them into thematic categories that reflected LGBTQ+ youth experiences.
Findings
RQ1: Frequency and Severity of Online Risks for LGBTQ+ Youth
Higher Severity but Not Frequency: LGBTQ+ youth did not report more online risk experiences overall compared to heterosexual youth. However, the risks they flagged were of significantly higher severity, often involving sexual or self-injurious content.
Greater Exposure to Sexual and Self-Injury Content: LGBTQ+ youth faced a higher prevalence of messages with sexual content or self-injurious themes, contrasting with heterosexual youth, who frequently reported low-risk content like general harassment.
RQ2: Mental Health Implications of Negative Online Experiences
Worse Mental Health Outcomes for LGBTQ+ Youth: LGBTQ+ youth reported higher depression and self-harm levels and lower well-being than their heterosexual peers. Regression analysis and Welch’s t-tests confirmed these differences, which point to heightened vulnerability to online risks.
Stronger Impact of Harassment: Harassment was more strongly linked to self-harm behaviors among LGBTQ+ youth, suggesting identity-based abuse intensifies mental health challenges for this group.
Complex Role of Sexual Content: Surprisingly, some LGBTQ+ youth associated sexual content with slightly improved well-being, indicating that certain interactions, despite risks, may offer affirming experiences for identity exploration.
RQ3: Unique Nature of Online Risks in LGBTQ+ Youth’s Private Conversations
Sexual Undertones and Identity Exploration: Many risky conversations flagged by LGBTQ+ youth had sexual undertones, even when intent was harassment, posing both threats and opportunities for identity exploration.
Supportive Peer Interactions: LGBTQ+ youth frequently used DMs to share struggles with peers, reflecting the dual role of social media in amplifying risks while offering supportive networks.
These findings emphasize the need for targeted interventions that balance mitigating harmful content with preserving positive online spaces for identity exploration and peer support for LGBTQ+ youth.