DATA SCIENCE LANGUAGES AND TOOLS: R, STATA, SPSS, PYTHON, TABLEAU

"A Sentiment Analysis of 'Hormonal' in the Context of Reproductive Health Messaging" in R/RStudio


This project performed a sentiment analysis of the term "hormonal" in the context of reproductive health messaging. By examining how "hormonal" is discussed within this domain, we can determine whether the term carries positive, negative, or neutral sentiment and how it is perceived in relation to reproductive health. The analysis seeks to inform stakeholders in healthcare and communication fields about the language surrounding hormonal health, with potential applications in improving public health messaging and reducing stigma.

Methods

  • Data Collection: Data was collected from various reproductive health messaging sources containing the word "hormonal."

  • Sentiment Analysis: Utilized the tidytext package to tokenize the text and perform sentiment analysis using lexicons such as Bing and NRC.

  • Data Visualization: Visualized the sentiment distribution with ggplot2 to show whether "hormonal" is more commonly associated with positive, negative, or neutral sentiment.


Packages: tidyverse, tidytext, textdata, ggplot2, syuzhet, SentimentAnalysis, lexicon
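The tokenize-and-match approach described in the Methods can be sketched in a few lines. This is a minimal Python illustration, not the project's R code: `TOY_LEXICON` is a hypothetical six-word stand-in for a real lexicon such as Bing, and labeling each message by its net count of positive minus negative words (neutral when they balance or when no lexicon words appear) is one common convention, assumed here.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon standing in for a real one such as Bing;
# the actual Bing lexicon labels thousands of words.
TOY_LEXICON = {
    "balanced": "positive",
    "effective": "positive",
    "safe": "positive",
    "imbalance": "negative",
    "risk": "negative",
    "pain": "negative",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (one token per word)."""
    return re.findall(r"[a-z']+", text.lower())


def classify_message(text, lexicon=TOY_LEXICON):
    """Label a message by net sentiment: positive word count minus negative."""
    counts = Counter(lexicon[t] for t in tokenize(text) if t in lexicon)
    net = counts["positive"] - counts["negative"]
    if net > 0:
        return "positive"
    if net < 0:
        return "negative"
    return "neutral"


def sentiment_distribution(messages, term="hormonal"):
    """Tally sentiment labels for the messages that mention the target term."""
    relevant = [m for m in messages if term in tokenize(m)]
    return Counter(classify_message(m) for m in relevant)


if __name__ == "__main__":
    # Hypothetical example messages, not data from the project.
    messages = [
        "Hormonal birth control is safe and effective for many people.",
        "A hormonal imbalance can raise the risk of irregular cycles.",
        "Talk to your provider about hormonal options.",
        "Exercise supports overall health.",
    ]
    print(sentiment_distribution(messages))
```

The last message is skipped because it never mentions "hormonal", mirroring the project's focus on messages containing the term.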

"Semen Quality Analysis Over a Period of 20 Years: Time Series" in R/RStudio



"A Statistical Interpretation of Willingness Toward Childbearing Among Various Age Groups" in SPSS


This project explores the relationship between socioeconomic factors and fertility intentions across age groups using data from the Pew Social Trends survey. The analysis examines whether socioeconomic resources, represented by variables such as income, education, and employment status, significantly predict childbearing intentions, and how these relationships vary with age.


Statistical Methods:

  • Descriptive Statistics: Computed summary statistics (means, medians, standard deviations) to give an overview of key socioeconomic variables (income, education, employment) and their distribution across age groups.

  • Cross-tabulation: Generated contingency tables to examine the distribution of fertility intentions across socioeconomic categories (e.g., income quintiles) and age groups.

  • Chi-Square Tests of Independence: Used chi-square tests to assess whether fertility intentions are independent of categorical socioeconomic variables (e.g., income level, education). Each test evaluated the null hypothesis that no association exists between socioeconomic status and fertility intentions.

  • Correlation Analysis: Calculated Pearson correlation coefficients to examine linear relationships between continuous socioeconomic variables (e.g., income, age) and fertility intentions, which were coded on a continuous or ordinal scale (e.g., number of children desired).

  • Multivariable Logistic Regression: Built logistic regression models to evaluate how socioeconomic variables (e.g., income, education) predict the probability of intending to have children, controlling for potential confounders such as age, marital status, and gender. Exponentiating the model coefficients gave odds ratios quantifying the influence of each predictor.

  • Age Stratification: Conducted subgroup analyses by stratifying the data into age categories to assess how the impact of socioeconomic factors on fertility intentions varies across different life stages.
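The chi-square and correlation steps above can be illustrated with a small Python sketch. It is not the project's SPSS workflow: the 2 x 2 table (income group by intention to have children) and all values are hypothetical, the p-value helper covers only the 1-degree-of-freedom case that a 2 x 2 table produces, and `pearson_r` is the textbook Pearson formula.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees_of_freedom, expected_counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, expected


def chi_square_p_value_df1(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    A chi-square(1) variable is a squared standard normal, so
    P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical counts: rows = lower / higher income,
    # columns = intends to have children (yes / no).
    table = [[30, 20], [20, 30]]
    stat, dof, _ = chi_square_independence(table)
    print(stat, dof, chi_square_p_value_df1(stat))

    # Hypothetical paired values: income (in thousands) vs. children desired.
    income = [25, 40, 55, 70, 90]
    desired = [3, 2, 2, 2, 1]
    print(pearson_r(income, desired))
```

For the hypothetical table every expected count is 25, so the statistic is 4 with 1 degree of freedom and a p-value of about 0.0455. Tables larger than 2 x 2 need a general chi-square distribution function, which a statistics package provides.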

"Significance of Hormonal Imbalances & Reproductive Disorders in Relation to Weight: An Analysis" in R/RStudio


Conducted an in-depth analysis of the Menstrual Health and PCOS Risk Detection Dataset, focusing on the interplay between demographic, anthropometric, and reproductive variables to assess PCOS risk. Explored relationships such as the association between weight and menstrual health indicators, including cycle length and luteal phase length.

 

Key R packages used:

  • dplyr: Used for data wrangling, including multilevel subsetting by age and weight categories to examine non-linear trends in menstrual cycle variability.

  • ggplot2: Built visualizations of the correlations between weight, cycle irregularities, and reproductive health metrics, using faceted plots for subgroup analysis.

  • summarytools: Used for statistical summaries and cross-sectional analysis of menstrual health metrics, helping identify patterns across weight and age cohorts.

  • plotly: Used to create interactive plots displaying multiple data metrics.
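The subset-and-summarize pattern used with dplyr (restricting to an age range, grouping by weight category, then summarizing cycle length) looks roughly like this in plain Python. The field names `age`, `bmi`, and `cycle_length`, the records, and the BMI-based weight categories (using the common WHO adult cutoffs of 18.5, 25, and 30) are assumptions for illustration, not the dataset's actual schema.

```python
from collections import defaultdict
from statistics import mean, median


def weight_category(bmi):
    """Hypothetical weight binning based on the common WHO adult BMI cutoffs."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def summarize_cycle_length(records, min_age=None, max_age=None):
    """Subset records by an optional age range, group them by weight category,
    and summarize cycle length within each group (n, mean, median)."""
    groups = defaultdict(list)
    for rec in records:
        if min_age is not None and rec["age"] < min_age:
            continue
        if max_age is not None and rec["age"] > max_age:
            continue
        groups[weight_category(rec["bmi"])].append(rec["cycle_length"])
    return {
        category: {"n": len(values), "mean": mean(values), "median": median(values)}
        for category, values in groups.items()
    }


if __name__ == "__main__":
    # Hypothetical records, not rows from the PCOS dataset.
    records = [
        {"age": 22, "bmi": 21.0, "cycle_length": 28},
        {"age": 25, "bmi": 23.5, "cycle_length": 30},
        {"age": 30, "bmi": 31.2, "cycle_length": 35},
        {"age": 40, "bmi": 32.0, "cycle_length": 39},
    ]
    print(summarize_cycle_length(records))
    print(summarize_cycle_length(records, max_age=35))
```

In dplyr the same steps would be a `filter()` on age, a `mutate()` creating the weight category, and `group_by()` followed by `summarise()`.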

"Hormones, Diabetes, Blood Pressure Dynamics, and Impacts on Pregnancy" in R/RStudio


Conducted in-depth research on a dataset of individuals with Skin Thickness > 50, comparing blood pressure levels between women with one and two pregnancies. Permutation tests and z-tests comparing the two groups revealed no significant difference. Glucose levels and BMI were also compared across diabetes outcomes among individuals over 20, and these comparisons showed significant differences. The findings challenge existing understandings, emphasizing the complexities of maternal physiology and the importance of rigorous statistical analysis in healthcare research. This project underscores the need for further exploration and interdisciplinary collaboration to enhance patient care outcomes and develop personalized treatment strategies in diabetes management.


Skills: Permutation Tests, Z-Tests, P/Z-Value Calculations
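Both tests named above compare two group means. The Python sketch below is illustrative only: the function names and readings are hypothetical, the seeded `random.Random` shuffle with an add-one correction is one common way to estimate a permutation p-value, and the z-test uses the large-sample normal approximation with each group's sample variance.

```python
import math
import random
from statistics import mean, stdev


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Shuffles the pooled values, re-splits them into groups of the original
    sizes, and counts how often the shuffled mean difference is at least as
    extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


def two_sample_z_test(a, b):
    """Two-sided z-test for a difference in means using sample variances.

    The normal approximation is only reasonable for fairly large samples.
    Returns (z, p) where p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


if __name__ == "__main__":
    # Hypothetical blood pressure readings for two groups (not project data).
    group_a = [72, 70, 78, 64, 74, 68]
    group_b = [70, 76, 66, 72, 80, 62]
    print(permutation_test(group_a, group_b))
    print(two_sample_z_test(group_a, group_b))
```

The permutation test makes no normality assumption, which is why it pairs well with a z-test as a cross-check when group sizes are small.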
