The training and test data files provided here were used for the model presented in figure 7 of Knight, C.G., et al., Array-based evolution of DNA aptamers allows modelling of an explicit sequence-fitness landscape. Nucleic Acids Res, 2009. 37(1): e6. The files were not included in the original publication and are not explicitly referenced there, but are provided here for use by others. The paper is also available from http://dbkgroup.org/Papers/knight_nar09.pdf

When uncompressed, training.dat.zip is a tab-delimited text file, training.dat, with 49,999 rows (the assay failed for one of the 50,000 experimental probes tested) and 305 columns. The testHC.dat file has 5,500 rows and 305 columns. Only data on experimental probes used for modelling (training or testing) are included.

In both files, the columns are 'probe' (the sequence name), 'sequence' (given 5' to 3' as a text string), 'seqScore' (the in vitro score used in the paper) and 302 explanatory variables (not in the same order in the two files). The details of the explanatory variables are provided in supplementary file 3 of the original paper, which contains supplementary table 4, KnightST4R1.xls, and is also available from http://dbkgroup.org/Papers/KnightST4R1.xls

Undefined data are indicated by 'NA' (e.g. medpos_C, the median position of a C base along the sequence, is undefined for sequences not containing any C bases). In the model presented in the paper, however, NA values were substituted by the column median for continuous variables and the column mode for factors, breaking ties at random.

Chris Knight
Chris.knight@manchester.ac.uk
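
For readers who want to reproduce the NA substitution described above, the following is a minimal sketch in Python using only the standard library. It is not the code used for the paper. The function names read_table and impute_column are hypothetical, and the sketch assumes that the first row of each file holds the column names and that the caller decides which columns are continuous and which are factors (that information is in the supplementary table, not in the data files). For a column with an even number of defined values, statistics.median averages the two middle values, which may or may not match the original analysis.

```python
import csv
import random
from collections import Counter
from statistics import median

NA = "NA"  # marker for undefined values, as described in this README


def read_table(path):
    """Read a tab-delimited file into a list of dicts, one per row.

    Assumption: the first row holds the column names.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def impute_column(values, continuous, rng=random):
    """Return a copy of one column with 'NA' entries filled in.

    Continuous columns get the median of the defined values; factor
    columns get the most frequent defined value, with ties between
    equally frequent values broken at random using rng.
    """
    observed = [v for v in values if v != NA]
    if not observed:
        return list(values)  # nothing to impute from, leave unchanged
    if continuous:
        fill = median(float(v) for v in observed)
        return [fill if v == NA else float(v) for v in values]
    counts = Counter(observed)
    top = max(counts.values())
    # Sorting the tied candidates makes a seeded rng reproducible.
    candidates = sorted(k for k, c in counts.items() if c == top)
    fill = rng.choice(candidates)
    return [fill if v == NA else v for v in values]


# Small illustration on made-up values (not taken from the data files).
print(impute_column(["1", "NA", "3"], continuous=True))  # [1.0, 2.0, 3.0]
```

To apply this to a whole file, one could read it with read_table("training.dat"), pull out each explanatory column as a list, and pass it to impute_column with the appropriate continuous flag. Passing rng=random.Random(0) (or any other seed) makes the random tie-breaking repeatable.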