We are trying to fit linear models in R on relatively large datasets, which we now realize is beyond what R can comfortably handle in memory without something like sparklyr, which we want to avoid.
So, we would like to use a batched training approach that merges the results of model training performed on smaller subsets of the data. Using small batches goes MUCH faster and doesn't run out of memory. However, we need a validated process to merge the results by averaging the fitted coefficients across all batches.
We've found a package called MuMIn which promises to do the averaging for us. However, we need help to validate the process, craft an alternative if MuMIn doesn't work as planned, and hand over a working example that we can leverage in our work.
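For reference, here is a minimal sketch of how MuMIn might be applied here. It assumes a hypothetical list `batches` of data frames (one per batch) and a toy formula `y ~ x1 + x2`; those names are placeholders, not part of our actual data. Whether `model.avg()` behaves sensibly when given several identically specified models fitted to different subsets (its usual use case is averaging different model specifications fitted to the same data) is exactly the kind of thing we need validated, and by default it weights models by AICc rather than taking a simple mean.

    # Sketch only -- 'batches' (a list of data frames) and the formula are placeholders.
    library(MuMIn)

    fits <- lapply(batches, function(b) lm(y ~ x1 + x2, data = b))

    # model.avg() is normally used to average different model specifications fitted
    # to the same data; whether it handles identical formulas fitted to different
    # batches as we intend is the open question to validate.
    avg <- model.avg(fits)

    coef(avg)     # model-averaged coefficients (AICc-weighted by default, not a simple mean)
    summary(avg)  # shows the per-model weights used in the average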
The approach we have in mind is as follows:
(1) break up source data set into equal-sized batches
(2) save resulting fitted models from each batch
(3) average the coefficients across all fitted models to create an overall model that is equivalent (or at least a close approximation) to one trained over the entire dataset at once
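As a fallback that does not depend on MuMIn, here is a minimal base-R sketch of steps (1)-(3). The data frame `dat`, the formula `y ~ x1 + x2`, and the batch size of 10,000 rows are all placeholders for illustration.

    batch_size <- 10000                                      # placeholder batch size
    batch_id   <- ceiling(seq_len(nrow(dat)) / batch_size)   # (1) cut into equal-sized batches
    batches    <- split(dat, batch_id)                       #     (last batch may be smaller)

    fits <- lapply(batches, function(b) lm(y ~ x1 + x2, data = b))   # (2) keep each batch's fitted model

    coef_mat  <- sapply(fits, coef)    # one column of coefficients per batch model
    avg_coefs <- rowMeans(coef_mat)    # (3) simple unweighted average across batches
    avg_coefs

Note that an unweighted mean treats every batch equally, which only makes sense if the batches are the same size and drawn randomly from the source data; measuring how close this gets to a full-data fit is part of the validation we are asking for.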