Collaborative network training is a method for training network parameters using a collection of networks that work together. It was designed to 1) enable task objective functions that are not directly differentiable with respect to the network output; 2) generate continuous-space actions; 3) optimize more directly for the desired task; 4) learn parameters when a process for measuring performance is available but labeled data is not. The procedure involves three randomly initialized, independent networks that use ranking to train one another on a single task. The method combines qualities of ensemble learning and reinforcement learning with gradient-free optimization methods such as Nelder-Mead.

All networks are randomly initialized, and each network's loss is determined by ranking the networks' performance on the task according to their outputs for a given input.
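The ranking-to-loss step can be sketched in a few lines. The exact update rule is not spelled out above, so the sketch below makes one plausible assumption: the lower-ranked networks take a gradient step toward a noisy copy of the best-ranked network's output. Note that the task score is only used for ranking and is never differentiated; the networks, the score function, and the noise scale here are all illustrative choices, and the "networks" are single linear layers for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three randomly initialized single-linear-layer "networks" (kept tiny for
# brevity; the method as described uses full neural networks).
weights = [rng.normal(size=(1, 1)) for _ in range(3)]

def forward(w, x):
    return x @ w

def task_score(y, x):
    # Black-box performance measure: used only to rank the networks,
    # never differentiated with respect to the network output.
    return float(np.mean((y - np.sin(2.0 * x)) ** 2))

x = rng.uniform(-1.0, 1.0, size=(64, 1))
initial_best = min(task_score(forward(w, x), x) for w in weights)

for step in range(300):
    outputs = [forward(w, x) for w in weights]
    scores = [task_score(y, x) for y in outputs]
    best = int(np.argmin(scores))  # rank the networks on the task
    # Assumption: lower-ranked networks regress toward a noisy copy of the
    # best network's output; the noise keeps the ensemble exploring.
    target = outputs[best] + rng.normal(scale=0.2, size=outputs[best].shape)
    for i in range(3):
        if i == best:
            continue
        # Gradient of MSE(output_i, target) w.r.t. w_i -- this surrogate is
        # differentiable even though task_score itself need not be.
        grad = 2.0 * x.T @ (outputs[i] - target) / len(x)
        weights[i] -= 0.5 * grad

final_best = min(task_score(forward(w, x), x) for w in weights)
```

Because the current best network's weights are left untouched each step, the best score in the ensemble can only improve or stay the same as training proceeds.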

The method can be used both for function fitting and for a form of black-box optimization. See below how the process finds the minima of a variety of non-convex test functions.
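In the black-box setting, each network's output reduces to a candidate point, and ranking drives the ensemble downhill. The sketch below is a minimal, assumed instantiation on the Rastrigin test function (a standard non-convex benchmark): the lower-ranked candidates move toward the winner while exploring with annealed noise, in the spirit of the Nelder-Mead comparison above. The pull rate, noise schedule, and step count are illustrative choices, not values from the original method.

```python
import numpy as np

rng = np.random.default_rng(1)

def rastrigin(p):
    # Classic non-convex test function; global minimum of 0 at the origin.
    return float(10 * p.size + np.sum(p ** 2 - 10 * np.cos(2 * np.pi * p)))

# Three "networks" reduced to their outputs: candidate points in 2-D.
points = [rng.uniform(-5.12, 5.12, size=2) for _ in range(3)]
initial_best = min(rastrigin(p) for p in points)

steps = 500
for step in range(steps):
    scores = [rastrigin(p) for p in points]
    best = int(np.argmin(scores))  # rank candidates by function value
    noise_scale = 0.5 * (1 - step / steps) + 0.01  # annealed exploration
    for i in range(3):
        if i == best:
            continue
        # Assumption: losers contract toward the winner plus exploration
        # noise, so the winner is replaced whenever a loser surpasses it.
        points[i] = (points[i]
                     + 0.5 * (points[best] - points[i])
                     + rng.normal(scale=noise_scale, size=2))

final_best = min(rastrigin(p) for p in points)
```

As in the ranking sketch, the winning candidate is never perturbed, so the best function value found is monotonically non-increasing over iterations.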

And here is an example of function fitting for kinematic processes, compared directly against supervised training. Collaborative training outperforms supervised training because the networks are optimized directly for the task rather than for a surrogate such as mean squared error on labels.
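The advantage of optimizing directly for the task can be made concrete with a toy kinematic example (an assumed illustration, not the experiment above). For a hypothetical one-link arm, many joint angles reach the same end-effector position; a task-level score measures position error and treats all of them as equally correct, whereas supervised MSE on angle labels penalizes a perfectly valid solution that differs from the label.

```python
import numpy as np

def fk(theta):
    # Hypothetical 1-link arm: forward kinematics maps a joint angle to an
    # end-effector position on the unit circle.
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)

def task_score(pred_theta, goal_xy):
    # Task-level score used to rank networks: mean end-effector distance
    # to the goal position. No angle labels are required.
    return float(np.mean(np.linalg.norm(fk(pred_theta) - goal_xy, axis=-1)))

theta = np.array([0.3, 1.2])          # "label" angles
goal = fk(theta)                      # corresponding goal positions
wrapped = theta + 2 * np.pi           # a different but equally valid solution

angle_mse = float(np.mean((wrapped - theta) ** 2))
```

Here `task_score(wrapped, goal)` is essentially zero while the label-space MSE is large, which is the sense in which direct task optimization can beat supervised fitting on such problems.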

The method can also be used to learn policies. Here, a policy is learned to control a robotic drumstick, generating the trajectories needed to produce a desired rhythm.