Case studies
Anonymised examples of real projects — the challenges, analytical approach, and final outcomes.
Identifying drought-tolerance loci in wheat using GWAS
A PhD student had sequencing data for 480 wheat lines phenotyped in two environments but lacked the bioinformatics capacity to run quality control (QC), population structure correction, and association testing. The dataset had a high rate of missing genotype calls and unknown population stratification.
Three significant loci were identified on chromosomes 4A, 6B, and 7D, two of them novel. The student submitted a first-author manuscript to a plant science journal within 6 months of our engagement.
- Genotype QC: MAF filtering, LD pruning, and missingness thresholds via PLINK
- Population structure: PCA and ADMIXTURE, testing K = 2–6 ancestral populations
- GWAS: EMMAX mixed-model association testing, with a kinship matrix to control false positives arising from relatedness
- Functional annotation of top SNPs using plant gene databases
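As a rough illustration of the first two steps, the sketch below filters SNPs on minor allele frequency and missingness and then computes principal-component scores for population structure. It is not the project's pipeline (QC was run in PLINK, and LD pruning, ADMIXTURE, and EMMAX are not reproduced here). It assumes genotypes are held in a NumPy array of allele dosages (0, 1, 2, with NaN for a missing call); the thresholds and function names are hypothetical.

```python
import numpy as np


def qc_mask(genotypes, maf_min=0.05, max_missing=0.10):
    """Boolean mask of SNPs passing minor-allele-frequency and missingness filters.

    genotypes: (n_lines, n_snps) float array of allele dosages 0/1/2; NaN = missing call.
    The default thresholds are illustrative assumptions, not the project's values.
    """
    missing_rate = np.isnan(genotypes).mean(axis=0)
    # Allele frequency is estimated from observed calls only.
    p = np.nanmean(genotypes, axis=0) / 2.0
    maf = np.minimum(p, 1.0 - p)
    return (missing_rate <= max_missing) & (maf >= maf_min)


def genotype_pcs(genotypes, n_pcs=2):
    """Top principal-component scores of each line, for use as structure covariates."""
    # Mean-impute missing calls per SNP, then centre every SNP column.
    col_means = np.nanmean(genotypes, axis=0)
    g = np.where(np.isnan(genotypes), col_means, genotypes)
    g = g - g.mean(axis=0)
    # SVD of the centred matrix: U scaled by the singular values gives PC scores.
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]
```

In a real analysis the PCs would be computed on an LD-pruned subset of QC-passing SNPs (for example `genotype_pcs(G[:, qc_mask(G)])` after pruning) and then enter the association model as covariates alongside the kinship matrix.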
Predictors of 30-day mortality in ICU patients with sepsis
A senior resident had a 5-year retrospective dataset of 340 ICU patients with incomplete records, informative censoring, and 18 potential confounders. Standard logistic regression produced unstable estimates because there were few events relative to the number of candidate variables.
Four independent predictors of 30-day mortality were confirmed. The study was accepted in a Q2 clinical journal with minor revisions — the reviewers specifically commended the statistical rigour.
- Multiple imputation for missing values using MICE in R
- Penalised logistic regression (LASSO) for variable selection
- Cox proportional hazards model with time-to-event outcome
- Calibration and discrimination assessment (Hosmer-Lemeshow, AUC)
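For a binary outcome (death within 30 days, yes or no), the calibration and discrimination checks can be sketched as follows, given predicted probabilities from a fitted model such as the penalised logistic regression. This is an illustrative Python version, not the study's own code; it assumes NumPy and SciPy, ten risk groups of roughly equal size, and hypothetical function names. The Hosmer-Lemeshow test as written applies to binary-outcome models rather than to the Cox model.

```python
import numpy as np
from scipy.stats import chi2, rankdata


def auc(y, p):
    """Area under the ROC curve via the Mann-Whitney rank formula (ties get average ranks)."""
    y = np.asarray(y, dtype=bool)
    ranks = rankdata(p)
    n_pos, n_neg = y.sum(), (~y).sum()
    return (ranks[y].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic and p-value, grouping patients by predicted risk.

    Assumes no group has a mean predicted risk of exactly 0 or 1.
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    stat = 0.0
    # Split patients, sorted by predicted risk, into groups of near-equal size.
    for idx in np.array_split(order, groups):
        n_g = len(idx)
        observed = y[idx].sum()
        expected = p[idx].sum()
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    # Reference distribution: chi-squared with (groups - 2) degrees of freedom.
    return stat, chi2.sf(stat, groups - 2)
```

AUC measures how well the model ranks patients who died above those who survived, while a small Hosmer-Lemeshow p-value signals disagreement between predicted and observed event counts across risk groups, i.e. poor calibration.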
Efficacy of probiotics in reducing antibiotic-associated diarrhoea: a meta-analysis
A faculty member needed an NMC-eligible publication for promotion. Time was short, and earlier meta-analysis submissions had been rejected for methodological issues, including inadequate heterogeneity assessment and a lack of PRISMA compliance.
The paper was published in a PubMed-indexed journal with an impact factor of 3.2, and the promotion was approved. The faculty member has since commissioned a second meta-analysis with us.
- PROSPERO registration and PRISMA 2020-compliant protocol
- Systematic search across PubMed, Embase, and Cochrane
- Data extraction by two independent reviewers, with agreement measured by Cohen's kappa
- Random-effects meta-analysis with I² and prediction intervals
- Subgroup analysis by probiotic strain and patient age
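The pooling step can be sketched as follows. The case study does not say which between-study variance estimator was used; this example assumes the DerSimonian-Laird method-of-moments estimator, one common choice (REML is another), and takes per-study effects on an additive scale (for example log risk ratios) together with their sampling variances. The function name and return keys are hypothetical.

```python
import numpy as np
from scipy.stats import norm, t


def random_effects_dl(effects, variances, level=0.95):
    """DerSimonian-Laird random-effects pooling with I² and a prediction interval.

    effects: per-study effect sizes on an additive scale (e.g. log risk ratios).
    variances: their within-study sampling variances.
    """
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    k = len(y)
    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = 1.0 / v
    theta_fe = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - theta_fe) ** 2)
    df = k - 1
    # Method-of-moments between-study variance, truncated at zero.
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau² to each study's own variance.
    w_re = 1.0 / (v + tau2)
    theta = np.sum(w_re * y) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    z = norm.ppf(0.5 + level / 2)
    # I²: percentage of variation across studies beyond what chance would produce.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Approximate prediction interval: t distribution with k - 2 degrees of freedom.
    if k >= 3:
        half = t.ppf(0.5 + level / 2, k - 2) * np.sqrt(tau2 + se ** 2)
        pi = (theta - half, theta + half)
    else:
        pi = (np.nan, np.nan)
    return {"theta": theta, "ci": (theta - z * se, theta + z * se),
            "tau2": tau2, "i2": i2, "q": q, "pi": pi}
```

The confidence interval describes uncertainty in the average effect, whereas the prediction interval describes where the effect in a new, comparable study would be expected to fall; under substantial heterogeneity it is much wider.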
Genomic prediction of milk yield in Murrah buffalo using machine learning
A buffalo breeding programme had 1,200 genotyped animals across three generations, but its BLUP-based genomic selection model was underperforming. The programme wanted machine learning (ML) models benchmarked against traditional GBLUP without losing interpretability.
XGBoost achieved a 9% improvement in predictive accuracy over GBLUP for 305-day milk yield. A pilot genomic selection programme was launched using the new model, with an agreed 12-month evaluation period.
- GBLUP baseline with genomic relationship matrix (GRM)
- Random forest and gradient boosting models, with feature importance ranking
- XGBoost model tuning with 5-fold cross-validation
- Shapley value interpretation to explain model predictions to breeders
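The GBLUP baseline and the cross-validation scheme can be sketched in NumPy. This is an illustration under stated assumptions, not the programme's code: it assumes a VanRaden (method 1) GRM, a known variance ratio `lam` rather than one estimated by REML, and uses the correlation between predicted values and observed phenotypes in the held-out fold as the accuracy measure. All function names are hypothetical.

```python
import numpy as np


def vanraden_grm(markers):
    """VanRaden method-1 genomic relationship matrix.

    markers: (n_animals, n_snps) array of genotypes coded 0/1/2.
    """
    m = np.asarray(markers, dtype=float)
    p = m.mean(axis=0) / 2.0
    z = m - 2.0 * p  # centre each SNP by twice its allele frequency
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(grm, y, train, test, lam):
    """Predict genomic values of `test` animals from phenotypes of `train` animals.

    lam = residual variance / genetic variance, assumed known in this sketch.
    """
    v = grm[np.ix_(train, train)] + lam * np.eye(len(train))
    ones = np.ones(len(train))
    # Generalised least-squares estimate of the overall mean.
    mu = ones @ np.linalg.solve(v, y[train]) / (ones @ np.linalg.solve(v, ones))
    # BLUP: g_test = G[test, train] (G[train, train] + lam I)^-1 (y_train - mu).
    alpha = np.linalg.solve(v, y[train] - mu)
    return grm[np.ix_(test, train)] @ alpha


def cv_accuracy(grm, y, lam, k=5, seed=0):
    """Mean Pearson correlation of predicted vs observed phenotypes over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = gblup_predict(grm, y, train, test, lam)
        scores.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(scores))
```

Fitting the random forest, gradient boosting, or XGBoost models on exactly the same folds keeps the comparison with GBLUP like-for-like.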
Have a similar project? Let's talk through what's possible with your data.