Workflows
|
boxplotWithstats_pdf1.xml (New 15/03/07)
|
Taverna now has a new plugin system that lets users add new functionality to the software. I've been learning how to use it by developing a renderer for PDF documents that report a workflow's results and are generated automatically as part of the workflow itself. The workflow below uses R to draw a box plot of, and summarise, some microarray data retrieved from MaxD. These results are combined into a PDF document created by a beanshell script using iText, a free Java PDF library. N.B. For this workflow to run successfully, the iText jar must be downloaded and placed in the lib folder of your local .taverna directory. This is usually $HOME/.taverna/lib on Linux or c:/Documents and Settings/MyUser/Application Data/Taverna/lib on Windows. In addition, the workflow only runs with the latest update of Taverna, 1.5.1.6.
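If you are unsure where the iText jar should go, the small Java sketch below derives the lib directory from the two locations given above. The class name `TavernaLibDir` and the OS test (an `os.name` starting with "Windows") are assumptions for illustration; on Windows XP `user.home` is typically `C:\Documents and Settings\MyUser`. It only reports the directory; it does not download or check for the jar, and it covers no layouts other than the two described.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TavernaLibDir {

    // Hypothetical helper: maps an OS name and home directory onto the two
    // lib locations described in the text. Only those two layouts are covered.
    static Path tavernaLibDir(String osName, String userHome) {
        if (osName.startsWith("Windows")) {
            // e.g. c:/Documents and Settings/MyUser/Application Data/Taverna/lib
            return Paths.get(userHome, "Application Data", "Taverna", "lib");
        }
        // e.g. $HOME/.taverna/lib on Linux
        return Paths.get(userHome, ".taverna", "lib");
    }

    public static void main(String[] args) {
        Path lib = tavernaLibDir(System.getProperty("os.name"),
                                 System.getProperty("user.home"));
        System.out.println("Place the iText jar in: " + lib);
        System.out.println("Directory exists: " + Files.isDirectory(lib));
    }
}
```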
The result of this workflow is a PDF document, which can be displayed using the PDF renderer plugin. The plugin can be installed as follows:
Once the plugin has been installed, the workflow can be enacted, which will hopefully produce results similar to those shown below. The box plots show the spread of gene expression values from chip measurements. The text below the box plots shows some miscellaneous statistics about the data.
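The boxes in plots like these are normally drawn from Tukey's five-number summary: R's default `boxplot()` takes its hinges from `fivenum()`. The Java port of `fivenum()` below is a sketch for checking the numbers outside R; it assumes the workflow's R script uses the default `boxplot()`, which is not confirmed here, and it does not compute whiskers or outliers (the 1.5 times IQR rule). The class name `BoxStats` is hypothetical.

```java
import java.util.Arrays;

public class BoxStats {

    // Tukey's five-number summary, following R's fivenum():
    // minimum, lower hinge, median, upper hinge, maximum.
    static double[] fivenum(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        double[] x = values.clone();
        Arrays.sort(x);
        int n = x.length;
        double n4 = Math.floor((n + 3) / 2.0) / 2.0;
        // 1-based (possibly half-integer) positions of the five statistics
        double[] d = {1, n4, (n + 1) / 2.0, n + 1 - n4, n};
        double[] summary = new double[5];
        for (int i = 0; i < d.length; i++) {
            // average the order statistics on either side of a half-integer position
            int lo = (int) Math.floor(d[i]) - 1;
            int hi = (int) Math.ceil(d[i]) - 1;
            summary[i] = 0.5 * (x[lo] + x[hi]);
        }
        return summary;
    }

    public static void main(String[] args) {
        // Illustrative values only, not data from MaxD
        double[] expression = {2.1, 3.4, 0.8, 5.6, 4.2, 3.9};
        System.out.println(Arrays.toString(fivenum(expression)));
    }
}
```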
The PDF document can be viewed using the PDF renderer and, if required, saved to your file system. Addendum: Running this workflow requires a username and password for the R processor and for accessing the data in the MaxD database.
|
calcGeneExpFreq.xml (New 27/02/07)
|
R can be used to analyse data from Taverna workflows via an Rserve server, which Ingo Wassink has wrapped for use as a Taverna processor. The workflow below shows how data retrieved from the MaxD database can be analysed in a simple R script that calculates the frequencies of gene expression values from a given experiment.
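The R script itself is not reproduced here. As a rough illustration of what calculating the frequencies involves, the Java sketch below bins values into equal-width intervals spanning the data range and counts them. It is not the workflow's R code: R's `hist()` chooses its breaks with Sturges' rule and uses right-closed intervals by default, whereas this sketch uses left-closed intervals `[a, b)`, so counts can differ at bin boundaries. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class ExpressionHistogram {

    // Counts values into `bins` equal-width, left-closed intervals [a, b)
    // spanning the data range; the maximum value is placed in the last bin.
    static int[] histogram(double[] values, int bins) {
        if (values.length == 0 || bins < 1) {
            throw new IllegalArgumentException("need values and at least one bin");
        }
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / bins;
        int[] counts = new int[bins];
        for (double v : values) {
            // every value lands in bin 0 when all values are identical
            int i = width == 0 ? 0 : (int) ((v - min) / width);
            counts[Math.min(i, bins - 1)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative values only, not data from MaxD
        double[] expression = {0.5, 1.2, 1.9, 2.3, 2.8, 3.1, 4.7};
        System.out.println(Arrays.toString(histogram(expression, 4)));
    }
}
```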
The results of this workflow are shown below as a histogram of the gene expression value frequencies calculated by the R script.
Whilst this is a simple analysis, it shows what can be achieved once distributed microarray data is in a form accessible to Taverna and R. Addendum: Running this workflow requires a username and password for the R processor and for accessing the data in the MaxD database.
|
queryMaxD.xml (New 26/02/07)
|
There are a number of projects developing distributed systems for analysing microarray data, e.g. GEMEPS and the Extensible MicroArray Analysis System. Such systems require a repository for storing the data and metadata generated by microarray experiments. Here in Manchester, Andy Brass's group has developed MaxD, a MIAME-compliant database for storing gene expression data from microarray experiments. Work by Giles in the DBK group has provided a web services interface to maxdBrowse, the MaxD browsing client, which Taverna can access. It's still easier to browse an experiment's data and metadata in MaxD from a web browser, but once you know the experiment and associated measurements you are interested in, you can retrieve the gene expression values in Taverna for further analysis.
The above workflow retrieves the gene expression values for a given measurement identifier from an experiment stored in MaxD and transforms them into a comma-separated value (CSV) string.
It's often useful to transform the data into a CSV string, as it can then be analysed further by applications that consume data for mathematical analysis, such as R and Matlab. Addendum: Running this workflow requires a username and password for accessing the data in the MaxD database.
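A minimal Java sketch of that transformation, assuming the values arrive as an ordered map from a feature identifier to an expression value. This input shape, the `id,value` header and the class name `ExpressionCsv` are all hypothetical; the MaxD service's actual output format is not described here. Fields containing commas, quotes or newlines are quoted RFC 4180 style, which R's `read.csv()` understands.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExpressionCsv {

    // Quote a field only if it contains a comma, double quote or newline,
    // doubling any embedded quotes (RFC 4180 style).
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Hypothetical input shape: identifier -> expression value, in insertion order.
    static String toCsv(Map<String, Double> expression) {
        StringBuilder csv = new StringBuilder("id,value\n");
        for (Map.Entry<String, Double> e : expression.entrySet()) {
            csv.append(escape(e.getKey())).append(',').append(e.getValue()).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> values = new LinkedHashMap<>();
        values.put("probe_1", 2.5);
        values.put("probe_2", 7.0);
        System.out.print(toCsv(values));
    }
}
```

Using a `LinkedHashMap` keeps rows in the order the values were retrieved, so the CSV lines up with the measurement's original ordering.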
|
interproscan5.xml (New 26/02/07)
|
Following on from the workflow below, it's also possible to merge the results from the EBI's InterProScan service into the SwissProt file associated with the input protein. This lets the user view the locations of the functional motifs identified by InterProScan in relation to the whole protein sequence, again using SeqVista.
The addInterproFeatures processor uses BioJava to merge the InterProScan results into the input SwissProt file as features. The resulting SwissProt file can then be rendered by SeqVista:
The sequence above shows a highlighted fibronectin type III domain (fn3), which InterProScan identified using HMMPfam.
|
emblmerge3.xml (New 25/02/07)
|
One of the problems with the Graves' disease scenario was the multitude of outputs generated by the gene annotation workflow. This made the results difficult to interpret, especially when a list of genes is annotated by the workflow. One way to ease this problem is to integrate the results into a data model that can be parsed by applications able to render the data graphically, which aids interpretation. For example, it might be useful to view a set of features on a gene, e.g. the locations of Affymetrix probe sequences, to understand where they lie. One way to do this is to integrate the range locations of the Affymetrix probes into the EMBL file of their corresponding gene, as achieved by the workflow below.
|
|
Using BioJava, the addAffyProbeSeqFeatures processor inserts a range location feature for the Affymetrix probe into the EMBL file associated with the probe set identifier:
Once it is tagged with the chemical/x-embl-dl-nucleotide MIME type, the EMBL file can be rendered as a sequence diagram using SeqVista. For this gene, the Affymetrix probe sequences are situated at its 3' end.
The addAffyProbeSeqFeatures processor is a full WSDL web service operation, which is overkill since it only adds new features to an EMBL file. Ideally this work would be done in a beanshell script, but unfortunately beanshell has problems dealing with the inner classes that BioJava uses to create and manipulate sequence files containing features.
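The BioJava code inside addAffyProbeSeqFeatures is not shown, so the following is only a sketch of the underlying calculation: finding where a probe sequence falls on the gene sequence and writing the result in EMBL location notation (1-based and inclusive, e.g. `4..7`, with `complement(...)` for a match on the reverse strand). It assumes exact matching on plain A/C/G/T sequences and uses no BioJava classes; the names `ProbeLocator` and `locate` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ProbeLocator {

    // Reverse complement of a DNA sequence; non-ACGT characters become N.
    static String reverseComplement(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            switch (Character.toUpperCase(seq.charAt(i))) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'G': sb.append('C'); break;
                case 'C': sb.append('G'); break;
                default:  sb.append('N');
            }
        }
        return sb.toString();
    }

    // EMBL-style locations (1-based, inclusive) of every exact match of the
    // probe on the forward strand, then on the reverse strand of the gene.
    static List<String> locate(String gene, String probe) {
        String g = gene.toUpperCase();
        String fwd = probe.toUpperCase();
        String rev = reverseComplement(probe);
        List<String> locations = new ArrayList<>();
        for (int i = g.indexOf(fwd); i >= 0; i = g.indexOf(fwd, i + 1)) {
            locations.add((i + 1) + ".." + (i + fwd.length()));
        }
        for (int i = g.indexOf(rev); i >= 0; i = g.indexOf(rev, i + 1)) {
            locations.add("complement(" + (i + 1) + ".." + (i + rev.length()) + ")");
        }
        return locations;
    }

    public static void main(String[] args) {
        // Toy sequences for illustration only
        System.out.println(locate("AAACCCGGGTTT", "CCCG"));
    }
}
```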
|
graves-pdbsnpnested4.xml (New 26/02/07)
|
After I left Newcastle University, the gene annotation workflow written for the Graves' disease scenario broke because the services it required were no longer maintained. With the help of Katy Wolstencroft, it has since been rewritten to work with new services and to serve as a use case for a paper submitted to DILS.
|
|
junger-final.xml (Updated 12/09/06)
|
This Scufl file implements, as a Taverna workflow, the process by which Junger et al. (2003) identified homologs of the transcription factor FOXO in Drosophila using BLAST and a multiple sequence alignment. The input to the workflow is the human FOXO3a protein sequence, accession number NP_001446, which is used as the query for a tBlastn search against the non-redundant nucleotide database at the NCBI. The nucleotide sequences identified by tBlastn are translated and analysed using 'emma', which generates a multiple alignment that is displayed using 'prettyplot'. Emma and prettyplot are two EMBOSS tools deployed as Soaplab services for use in this workflow. The workflow also fetches the amino acid sequence of the human FOXO protein from the RefSeq database using the Sequence Retrieval System (SRS) hosted by the European Bioinformatics Institute. SRS is accessed through a Soaplab service interface located at http://dbk-ed.mib.man.ac.uk:8080/axis/services/.
Perhaps the most interesting part of this workflow is a beanshell script called MultipleSelectWorker. This processor displays a Swing window that lets the user interact with the running workflow, selecting the BLAST hits he or she wants to include in the multiple alignment with the FOXO protein.
N.B. This workflow requires the NCBI BLAST DTD and associated MOD files to be present in the root directory of your Taverna installation so that the two XPath processors can extract the BLAST accession numbers and definitions.
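The two XPath processors' actual expressions are not given here, so the Java sketch below only illustrates the idea using the standard `javax.xml.xpath` API and the `Hit_accession` and `Hit_def` elements of NCBI's BLAST XML output. The expressions `//Hit/Hit_accession` and `//Hit/Hit_def`, and the class name `BlastHits`, are assumptions. The sample report deliberately has no DOCTYPE line: a real report declares the NCBI DTD, which a default Java XML parser will typically try to load, consistent with the requirement noted above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BlastHits {

    // Evaluates an XPath expression against a BLAST XML report and returns
    // the trimmed text content of every matching node, in document order.
    static List<String> select(String blastXml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(blastXml)),
                    XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Made-up, heavily trimmed report with no DOCTYPE line
        String xml = "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
                + "<Hit><Hit_accession>ACC1</Hit_accession><Hit_def>first hit</Hit_def></Hit>"
                + "<Hit><Hit_accession>ACC2</Hit_accession><Hit_def>second hit</Hit_def></Hit>"
                + "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>";
        System.out.println(select(xml, "//Hit/Hit_accession"));
        System.out.println(select(xml, "//Hit/Hit_def"));
    }
}
```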
|
|
|
|
|