Children acquire their linguistic competence from performance data, which often include fast speech errors and other non-trivial distortions of the teacher's competence. Learning should therefore be robust to performance phenomena.
We have examined the consequences of this obvious but often ignored fact for learning algorithms of Optimality Theory (OT) and of Harmonic Grammar (HG). Linguistic competence was modelled either as an OT grammar or as an HG grammar, whereas performance was modelled by different implementations of these grammars: exhaustive search (producing grammatical forms only) versus simulated annealing (also producing errors). The learning data thus produced served as the input to the Gradual Learning Algorithm (GLA), a standard online learning algorithm in OT. Within GLA, Paul Boersma's standard update rules were contrasted with Giorgio Magri's recent proposal. For each combination of grammar type, performance type and learning rule, the number of learning steps until convergence was measured. We required convergence of the performance patterns, measured using the Jensen--Shannon divergence of samples produced by the learner and by the teacher, and not the convergence of the underlying grammars themselves.
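The convergence criterion on performance patterns can be illustrated with a minimal sketch. It assumes that each sample is simply a list of output forms and that the Jensen--Shannon divergence is computed with base-2 logarithms on the relative frequencies of those forms. The function names `js_divergence` and `has_converged` and the threshold value are hypothetical, chosen for illustration only; they are not taken from the study.

```python
from collections import Counter
from math import log2


def js_divergence(sample_p, sample_q):
    """Jensen-Shannon divergence (base 2) between the empirical
    distributions of two samples of output forms.

    Returns a value between 0 (identical distributions) and
    1 (distributions with no form in common).
    """
    if not sample_p or not sample_q:
        raise ValueError("both samples must be non-empty")

    p_counts, q_counts = Counter(sample_p), Counter(sample_q)
    n_p, n_q = len(sample_p), len(sample_q)

    total = 0.0
    for form in set(p_counts) | set(q_counts):
        p = p_counts[form] / n_p   # relative frequency in the first sample
        q = q_counts[form] / n_q   # relative frequency in the second sample
        m = (p + q) / 2            # mixture distribution
        # Terms with zero probability contribute nothing (0 * log 0 = 0).
        if p > 0:
            total += 0.5 * p * log2(p / m)
        if q > 0:
            total += 0.5 * q * log2(q / m)
    return total


def has_converged(teacher_sample, learner_sample, threshold=0.05):
    """Hypothetical stopping test: the learner's performance pattern is
    close enough to the teacher's. The threshold is an assumed value."""
    return js_divergence(teacher_sample, learner_sample) < threshold
```

Under this sketch, a learner counts as converged once the forms it produces are distributed like the teacher's, even if its constraint ranking or weights differ from the teacher's grammar.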
The results show that Magri's update rule learns more quickly than Boersma's update rule. Moreover, simulated annealing produces performance patterns that can be more efficiently learnt in the OT case than in the HG case -- a difference not observed with exhaustive search.