Gene Discovery in DNA:
Read the given file into a variable genome as follows:
fid = fopen(filename,'rt'); % Open the input file if fid < 0 % Quit if file is not opened error(['File ' filename ' not found']); end genome = fgetl(fid); % Read the file into genome fclose(fid); % Close the input filewhere filename was obtained as an argument to the pattern function.
The genome variable is a string of characters. That means it is easy to access each individual character: for example, the tenth character is obtained using genome(10).
Open an output file named outfile.txt for writing like this:
fout = fopen('outfil.txt','wt'); % Open the output file if fout < 0 % Quit if file is not opened error('File outfile.txt cannot be opened for writing'); end
Write to the output file using fprintf like this:
fprintf(fout,'DNA file %s.dat output produced for %s on %s\n\n', ... filename, identity, date());which is a reasonable output file header.
Use an outer loop to locate the beginning of a gene, then use an inner loop to locate the end. Suppose variable i is used in the outside loop to obtain gene-starting characters in the genome string and j is used in the inside loop to obtain gene-ending characters. The value of i is fixed while looking for a gene-ending sequence in the inner loop. When a gene-ending sequence is discovered in the inner loop the value of i must be updated to be the current value of j plus 3. Thus, the outer loop must be a while loop (a for i=... loop does not allow changing i in its body).
Gene start and end sequences must be on three pair boundaries. That is, the test for a start codon can occur only for i=1,4,7,10,13,16,... and the test for an end codon can occur only for j=4,7,10,13,16,... This means the inner for loop should look like this:
for j=i+3:3:length(genome)-2 ... endThe test for the start codon looks like this:
genome(i) == 'T' && genome(i+1) == 'A' && genome(i+2) == 'C'The test for an end codon looks like this:
genome(j) == 'A' && genome(j+1) == 'T' && genome(j+2) == 'T' ||... genome(j) == 'A' && genome(j+1) == 'T' && genome(j+2) == 'C' ||... genome(j) == 'A' && genome(j+1) == 'C' && genome(j+2) == 'T'