In this practical we will use the consensus sequences of several well-known
binding sites to search for potential binding sites in a set of putative promoter
regions from genes found to be coexpressed in a DNA microarray experiment.
 
Consensus sequences are taken from:
P. Bucher. Journal of Molecular Biology 212: 563-578 (1990).
ADVICE: It is very useful to open two or more browser windows, keeping this
text in one of them and running the exercise in another.
Input sequences:
Putative promoter regions of 6 Drosophila melanogaster genes.
WWW tools:
Regulatory Sequence Analysis Tools (RSA) by Jacques van Helden
from SCMBB - Service de Conformation des Macromolécules Biologiques et de Bioinformatique (Université Libre de Bruxelles).
Step 1: Exact matches of TATA-box consensus: STATAAAWR
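The consensus STATAAAWR is written in the IUPAC degenerate nucleotide code (S = C or G, W = A or T, R = A or G), so it stands for 2 × 2 × 2 = 8 distinct 9-letter words. As a rough offline illustration of what an exact-match search does, the Python sketch below translates the consensus into a regular expression and scans both strands. It is not the RSA tools implementation: the function names and the D/R strand labels are our own choices, and coordinates are assumed to be 1-based on the direct strand.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def iupac_to_regex(pattern: str) -> str:
    """Translate a degenerate consensus into a regex, e.g. 'S' -> '[CG]'."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def exact_matches(pattern: str, seq: str):
    """Return (strand, start, end, word) for every exact match on both strands.

    Coordinates are 1-based on the direct strand; 'D' = direct, 'R' = reverse
    complement. The lookahead lets overlapping occurrences be reported too.
    """
    regex = re.compile(f"(?=({iupac_to_regex(pattern)}))")
    seq = seq.upper()
    n, k = len(seq), len(pattern)
    hits = [("D", m.start() + 1, m.start() + k, m.group(1))
            for m in regex.finditer(seq)]
    for m in regex.finditer(reverse_complement(seq)):
        # Map the reverse-strand window back onto direct-strand coordinates.
        start = n - (m.start() + k) + 1
        hits.append(("R", start, start + k - 1, m.group(1)))
    return hits


if __name__ == "__main__":
    print(exact_matches("STATAAAWR", "GGCTATAAATAGCC"))
    # [('D', 3, 11, 'CTATAAATA')]
```

The zero-width lookahead is used instead of a plain pattern so that two occurrences sharing some bases are both reported.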
Step 2: Partial matches of TATA-box consensus: STATAAAWR
-  Repeat the search, increasing the number of allowed mismatches in
the pattern (try 1, 2 and 3 in the Substitutions box).
-  Results are displayed below the header line
"PatID Strand Pattern SeqID Start End Matching_word Score"
-  Click the Feature map button to open a new form with options for
plotting the results.
-  Click the Go button to obtain a graphical map of the reported matches,
then browse the interactive map.
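Allowing substitutions means a 9-letter window is reported when at most the chosen number of its positions fall outside the corresponding IUPAC class. The brute-force Python sketch below illustrates that idea; it is an assumption about the matching rule rather than the tool's actual code, it leaves out the Score column (whose definition is not given here), and all function names are hypothetical.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_mismatches(pattern: str, word: str) -> int:
    """Positions where the base in `word` is not allowed by the IUPAC code."""
    return sum(1 for code, base in zip(pattern, word) if base not in IUPAC[code])


def approximate_matches(pattern: str, seq: str, max_subst: int):
    """Every window on both strands with at most `max_subst` mismatches.

    Returns (strand, start, end, word, mismatches), 1-based on the direct strand.
    """
    pattern, seq = pattern.upper(), seq.upper()
    n, k = len(seq), len(pattern)
    rc = seq.translate(COMPLEMENT)[::-1]
    hits = []
    for strand, text in (("D", seq), ("R", rc)):
        for i in range(n - k + 1):
            word = text[i:i + k]
            d = count_mismatches(pattern, word)
            if d <= max_subst:
                # Reverse-strand windows are mapped back to direct coordinates.
                start = i + 1 if strand == "D" else n - (i + k) + 1
                hits.append((strand, start, start + k - 1, word, d))
    return hits


if __name__ == "__main__":
    for s in range(4):  # 0 = exact, then 1, 2 and 3 substitutions
        print(s, len(approximate_matches("STATAAAWR", "GGCTATAAGTAGCC", s)))
```

Checking every window is slow for long sequences but makes the effect of the Substitutions setting easy to see: the number of hits can only grow as `max_subst` increases.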
 
Questions:
-  Real TATA boxes are expected to lie roughly 20-30 bp upstream of the
Transcription Start Site (TSS). How many of the TATA boxes found are in this range?
NOTE: TSS annotations can contain errors; where they do, these ranges and distances
are not meaningful.
-  How many occurrences do you get if you allow 9 substitutions (i.e. every
position of the pattern)? Do you get one occurrence at every position of the sequence?
Why not? Think about the "prevent overlapping matches" option, and try switching it off.
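For the distance question, hit coordinates have to be converted into positions relative to the TSS. The Python sketch below assumes, hypothetically, that each input sequence ends on the base immediately upstream of the TSS (so its last base is position -1); if the sequences were retrieved with a different convention, the conversion must be adapted. The function names, the window bounds and the example hits are all made up for illustration.

```python
from typing import List, Tuple

# (strand, start, end, word) with 1-based direct-strand coordinates.
Hit = Tuple[str, int, int, str]


def tss_relative(pos: int, seq_len: int) -> int:
    """Position relative to the TSS, ASSUMING the sequence ends on base -1."""
    return pos - seq_len - 1


def hits_near_tss(hits: List[Hit], seq_len: int,
                  min_dist: int, max_dist: int) -> List[Tuple[str, int, str]]:
    """Keep hits whose most upstream base lies between -max_dist and -min_dist."""
    kept = []
    for strand, start, _end, word in hits:
        rel = tss_relative(start, seq_len)
        if -max_dist <= rel <= -min_dist:
            kept.append((strand, rel, word))
    return kept


if __name__ == "__main__":
    # Made-up hits in a hypothetical 100 bp upstream sequence.
    hits = [("D", 70, 78, "CTATAAATA"), ("R", 10, 18, "CTATAAATA")]
    # Illustrative window: 20 to 35 bp upstream of the TSS.
    print(hits_near_tss(hits, 100, 20, 35))  # [('D', -31, 'CTATAAATA')]
```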
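With 9 substitutions allowed on a 9-letter pattern every window matches, whatever the sequence content, so a sequence of length L yields L - 8 windows per strand (the last 8 positions cannot start a 9-letter window). The Python sketch below counts those windows and then applies one possible overlap filter, a greedy left-to-right pass on each strand. This is an assumption about how "prevent overlapping matches" behaves, not a description of the tool's actual rule, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, int, int]  # (strand, start, end), 1-based


def all_windows(seq_len: int, k: int) -> List[Hit]:
    """Every k-letter window on both strands: what k substitutions on a
    k-letter pattern will match."""
    return [(strand, start, start + k - 1)
            for strand in ("D", "R")
            for start in range(1, seq_len - k + 2)]


def drop_overlaps(hits: List[Hit]) -> List[Hit]:
    """Hypothetical greedy filter: scanning each strand left to right, keep a
    hit only if it starts after the last kept hit on that strand ends."""
    kept: List[Hit] = []
    last_end: Dict[str, int] = {}
    for strand, start, end in sorted(hits):
        if start > last_end.get(strand, 0):
            kept.append((strand, start, end))
            last_end[strand] = end
    return kept


if __name__ == "__main__":
    windows = all_windows(100, 9)
    print(len(windows), len(drop_overlaps(windows)))  # 184 22
```

For a 100 bp sequence this gives 92 windows per strand without the filter and only 11 per strand with it. The tool's real rule may differ (for example, it might consider both strands together), so compare these numbers with what the web form reports.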
 
Results: