In addition to formal grammatical inference, which focuses on formal proofs of learnability of classes of languages under certain conditions, there is also research in the area of empirical grammatical inference. In this field, the aim is to learn a particular grammar (not necessarily a class of grammars). Additional difficulties, such as noise in the training data, may also be introduced.
To test the practical possibilities (and limits) of empirical grammatical inference systems, several data sets are available, many of which come from competitions.
Recent Competitions
- The Omphalos competition was a competition on context-free grammar inference.
http://www.irisa.fr/Omphalos/
- The Tenjinno competition looked at learning transducers from synthetic data sets motivated by machine translation.
http://web.science.mq.edu.au/~tenjinno/
- The PAutomaC challenge was about learning probability distributions over strings. The (artificial) data were generated using either hidden Markov models (HMMs) or probabilistic automata (PAs), deterministic or not.
http://ai.cs.umbc.edu/icgi2012/challenge/Pautomac/
- SPiCe was an online competition about guessing the next element in a sequence of symbols. The training data sets consisted of whole sequences, and the aim was to learn a model that ranks the candidate next symbols for a given prefix, that is, that lists the most likely options for the single next symbol.
http://spice.lif.univ-mrs.fr/
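PAutomaC and SPiCe both revolve around probabilistic models of strings. As a rough illustration (a minimal sketch, not code from either competition), the following defines a small probabilistic finite automaton: the forward algorithm gives the probability of a whole string, and normalizing the forward vector of a prefix gives a distribution over the next symbol that can be ranked in the SPiCe style. The class name `PFA`, its parameter layout, and the `<end>` marker are hypothetical choices made for this example.

```python
from typing import Dict, List, Sequence, Tuple

END = "<end>"  # hypothetical marker for "the string stops here"


class PFA:
    """Probabilistic finite automaton (illustrative sketch).

    trans[a][q][r] is the probability of emitting symbol a while moving from
    state q to state r; final[q] is the probability of stopping in state q.
    For every q, final[q] plus all outgoing transition probabilities must sum to 1.
    """

    def __init__(self, init: Sequence[float], final: Sequence[float],
                 trans: Dict[str, List[List[float]]]) -> None:
        self.init = list(init)
        self.final = list(final)
        self.trans = trans
        self.n = len(self.init)

    def forward(self, prefix: str) -> List[float]:
        # alpha[q] = probability of emitting `prefix` and ending up in state q
        alpha = list(self.init)
        for a in prefix:
            m = self.trans[a]
            alpha = [sum(alpha[q] * m[q][r] for q in range(self.n))
                     for r in range(self.n)]
        return alpha

    def string_prob(self, w: str) -> float:
        # Probability of generating exactly w and then stopping
        alpha = self.forward(w)
        return sum(alpha[q] * self.final[q] for q in range(self.n))

    def next_symbol_ranking(self, prefix: str) -> List[Tuple[str, float]]:
        alpha = self.forward(prefix)
        # Assumes the PFA stops with probability 1, so sum(alpha) is the
        # probability that a generated string starts with `prefix`.
        total = sum(alpha)
        if total == 0.0:
            return []
        dist = {END: sum(alpha[q] * self.final[q] for q in range(self.n)) / total}
        for a, m in self.trans.items():
            dist[a] = sum(alpha[q] * sum(m[q]) for q in range(self.n)) / total
        # Most likely next symbol first
        return sorted(dist.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    pfa = PFA(init=[1.0, 0.0], final=[0.2, 0.6],
              trans={"a": [[0.5, 0.0], [0.1, 0.0]],
                     "b": [[0.0, 0.3], [0.0, 0.3]]})
    print(pfa.string_prob("ab"))         # about 0.09
    print(pfa.next_symbol_ranking("a"))  # a, then b, then <end>
```

Using the normalized forward vector means one pass over the prefix suffices to score every candidate next symbol, which is why this formulation is convenient for a ranking task.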
DFA Learning
The problem of learning a target DFA from labeled examples has been studied extensively in the literature for more than three decades. A variety of symbolic, connectionist, and hybrid techniques have been proposed to address this difficult problem. A few benchmark data sets were available, and most new algorithms were tested against them.
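Many DFA learners, state-merging algorithms in particular, start from a prefix tree acceptor: a tree-shaped automaton with exactly one state per prefix of the training strings, where each state is marked accepting, rejecting, or unknown according to the labeled examples. The sketch below builds such a tree from labeled strings; the names `PrefixTreeAcceptor` and `build_pta` are hypothetical, and the generalization step a real learner would perform afterwards (merging compatible states) is not shown.

```python
from typing import Dict, List, Optional, Tuple


class PrefixTreeAcceptor:
    """Tree-shaped automaton with one state per prefix of the training strings.

    label[s] is True (accept), False (reject) or None (no example ends here).
    """

    def __init__(self) -> None:
        self.delta: List[Dict[str, int]] = [{}]  # state 0 is the root (empty prefix)
        self.label: List[Optional[bool]] = [None]

    def add(self, word: str, accepted: bool) -> None:
        state = 0
        for sym in word:
            nxt = self.delta[state].get(sym)
            if nxt is None:  # create a new state for this unseen prefix
                nxt = len(self.delta)
                self.delta.append({})
                self.label.append(None)
                self.delta[state][sym] = nxt
            state = nxt
        if self.label[state] is not None and self.label[state] != accepted:
            raise ValueError(f"conflicting labels for {word!r}")
        self.label[state] = accepted

    def classify(self, word: str) -> Optional[bool]:
        state = 0
        for sym in word:
            if sym not in self.delta[state]:
                return None  # prefix never seen in the training data
            state = self.delta[state][sym]
        return self.label[state]


def build_pta(samples: List[Tuple[str, bool]]) -> PrefixTreeAcceptor:
    pta = PrefixTreeAcceptor()
    for word, accepted in samples:
        pta.add(word, accepted)
    return pta


if __name__ == "__main__":
    pta = build_pta([("", True), ("ab", True), ("a", False), ("abb", False)])
    print(len(pta.delta))        # 4 states: "", "a", "ab", "abb"
    print(pta.classify("b"))     # None: no information about "b"
```

The tree reproduces the training labels exactly but says nothing about unseen strings; that is precisely the gap that the learning algorithms benchmarked in the competitions below try to fill.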
- The Abbadingo One Learning Competition
http://www-bcl.cs.may.ie/
In 1996, Barak Pearlmutter and Kevin Lang posted a set of challenging DFA learning problems designed to allow researchers to test their favorite learning algorithms: this was the Abbadingo One Learning Competition. Although the problems were still artificially generated (i.e., the target DFAs were randomly generated), several researchers participated in the competition. The eventual winners developed algorithms that significantly improved on the existing methods for learning DFAs.
- The Gowachin DFA Learning Benchmark
http://www.irisa.fr/Gowachin/
Following the success of Abbadingo One, Kevin Lang, Barak Pearlmutter, and François Coste teamed up to launch the Gowachin Learning Competition. Users are allowed to generate their own problems by specifying the size of the target DFA, the number of training examples, and the noise level.
- The GECCO: Learning DFA from Noisy Samples Competition
http://cswww.essex.ac.uk/staff/sml/gecco/NoisyDFA.html
- The STAMINA Competition
http://stamina.chefbe.net/
The competition was about learning regular languages with large alphabets.
- The ZULU Competition
http://labh-curien.univ-st-etienne.fr/zulu/
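An interactive learning competition, in which the learner could query an oracle about strings rather than only learning from a fixed sample.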
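Gowachin let users choose the size of the target DFA, the number of training examples, and the noise level. A generator driven by those three parameters might look like the following sketch. It is a hypothetical illustration, not the Gowachin or Abbadingo generator: the uniformly random transitions, the fair-coin choice of accepting states, the uniformly random string lengths, and the label-flipping noise model are all assumptions made for this example.

```python
import random
from typing import Dict, List, Tuple

DFA = Tuple[List[Dict[str, int]], List[bool]]  # (transitions, accepting flags)


def random_dfa(n_states: int, alphabet: str, rng: random.Random) -> DFA:
    # Assumption: each transition target is drawn uniformly, and each state
    # is accepting with probability 1/2.
    delta = [{a: rng.randrange(n_states) for a in alphabet} for _ in range(n_states)]
    accepting = [rng.random() < 0.5 for _ in range(n_states)]
    return delta, accepting


def run_dfa(dfa: DFA, word: str) -> bool:
    delta, accepting = dfa
    q = 0  # state 0 is the start state
    for a in word:
        q = delta[q][a]
    return accepting[q]


def make_problem(n_states: int, n_examples: int, noise: float,
                 max_len: int = 12, alphabet: str = "01",
                 seed: int = 0) -> Tuple[DFA, List[Tuple[str, bool]]]:
    """Return a hidden target DFA and labeled (possibly noisy) training strings."""
    rng = random.Random(seed)
    target = random_dfa(n_states, alphabet, rng)
    samples: List[Tuple[str, bool]] = []
    for _ in range(n_examples):
        length = rng.randint(0, max_len)  # inclusive bounds
        word = "".join(rng.choice(alphabet) for _ in range(length))
        label = run_dfa(target, word)
        if rng.random() < noise:  # flip the label with probability `noise`
            label = not label
        samples.append((word, label))
    return target, samples
```

A call such as `make_problem(64, 2000, 0.05)` would return a 64-state target together with 2,000 labeled strings, roughly 5% of which carry a flipped label; a learner sees only the samples and is judged on how well it recovers the target.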