
Closed
I’m running a comparative genomic-selection study, benchmarking about ten prediction models that range from rrBLUP and GBLUP to Random Forest, LightGBM, CNN and ElasticNet. The experiments are in motion but the dataset keeps growing, so I’m looking for a research-minded intern who can jump in, work entirely in R, and keep me posted with regular, concise updates.

Here’s where I need your help:
• Data preprocessing & cleaning – imputation, SNP quality filters, population structure checks.
• Model development & tuning – implement or refine the models above, explore hyper-parameter grids and suggest new algorithms when they make sense.
• Performance evaluation & analysis – rigorous cross-validation, predictive-ability metrics, clear visualisations and short interpretive write-ups.

We’ll collaborate through a shared Git repo; reproducible R scripts, well-commented notebooks and a comparative results table/figures are the acceptance criteria. If you’re comfortable working in R and excited by genomic prediction research rather than a routine service job, let me know how you’ve tackled similar pipelines and we can get started.
Project ID: 40277389
5 proposals
Remote project
Active 10 days ago
5 freelancers are bidding on average ₹401 INR/hour for this job

Hello, I’ve read your comparative genomic-selection plan and I’m confident I can step in and keep the pipeline reproducible and moving. I have hands-on experience building end-to-end genomic prediction workflows in R: quality filtering and imputation (including PLINK and snp-ready R workflows), population-structure checks (PCA, kinship), and fitting/benchmarking rrBLUP, GBLUP, Random Forest, LightGBM, ElasticNet and CNN-style models adapted for SNP matrices. I’ll implement robust hyper-parameter grids, stratified cross-validation schemes, and clear predictive-ability metrics, then deliver well-commented R scripts, notebooks and comparative tables/figures in the shared Git repo. I’ll provide concise progress updates and short interpretive notes with each experiment batch to keep decisions evidence-driven. For next steps I can begin by auditing the current repo and running a baseline pipeline on a held-out subset within the first week. What are the target traits, current sample size and genotype format (VCF/PLINK/binary matrix), and do you have GPU access for CNN training? Thanks, Fabian
₹916 INR in 16 days

Hello,

Your genomic-selection project sounds very interesting. While my primary expertise is in Python-based data science and machine learning, I have strong experience working with data preprocessing, model benchmarking, and reproducible ML pipelines, which are directly relevant to this study.

Relevant Experience
--------------------------------
• Data preprocessing and cleaning (missing data handling, feature filtering, exploratory analysis)
• Machine learning models such as Random Forest, Gradient Boosting (LightGBM/XGBoost), ElasticNet, and deep learning models
• Designing cross-validation experiments and model comparison pipelines
• Producing clear visualizations, performance metrics, and experiment summaries

How I Can Contribute
----------------------------------
• Help design and manage the model benchmarking workflow
• Run hyperparameter tuning and comparative evaluation of prediction models
• Maintain clean, reproducible scripts and experiment tracking via Git
• Provide concise analysis and visual reports of model performance

I am comfortable quickly adapting to new research workflows and tools, and I can also pick up R-based scripts if needed while contributing to the modeling and analysis process. I’d be glad to learn more about your dataset and current pipeline.
₹250 INR in 40 days

Your project on comparative genomic selection immediately caught my attention. Working with models like rrBLUP, GBLUP, Random Forest, LightGBM, CNN, and ElasticNet in a continuously expanding dataset is exactly the type of research-oriented pipeline I enjoy contributing to. I have experience building reproducible data science workflows in R, especially for experiments where datasets evolve over time and model benchmarking must remain consistent. For genomic-style pipelines, I typically structure the workflow into clear stages so results stay reproducible and easy to interpret
₹250 INR in 40 days

Hi, I've analyzed your requirements and I am confident I can deliver high-quality results for this project. Let's discuss details.
₹491 INR in 40 days

Stillwater, United States
Payment method verified
Member since Aug 13, 2025