The file dna_seq_template_strand.txt is quite small, as you can see by using your file manager or Julia:
#in 'code_snippets' folder use "./transcription/dna_seq_template_strand.txt"
#in 'transcription' folder use "./dna_seq_template_strand.txt"
filePath = "./code_snippets/transcription/dna_seq_template_strand.txt"
filesize(filePath)
4449
Here we defined filePath, the path to our file. Next, we checked its size with filesize and found it to be 4449 bytes, slightly more than 4 kibibytes (KiB). Such a small file can easily be swallowed whole by read (the recommended way is shown below) and returned as one long Str (a type alias for String).
dna = open(filePath) do file
    read(file, Str)
end
dna[1:75]
gagctccccg gatctgtaac gggaggtctc tctcgtgggt tgtgggaggt ccgaactggc\ncggtcccac
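To make the reading step concrete, here is a minimal sketch on a throwaway file created with mktemp (the file content and the names tmpPath, viaOpen and viaRead are made up for illustration; String is used instead of the Str alias so that the snippet runs on its own). It suggests that passing the path directly to read gives the same result as the open ... do form, and shows the bytes-to-KiB arithmetic from above.

```julia
# hypothetical toy file standing in for dna_seq_template_strand.txt
tmpPath, tmpIo = mktemp()
write(tmpIo, "gagctccccg gatctgtaac\n")
close(tmpIo)

# the open ... do form closes the file for us when the block ends
viaOpen = open(tmpPath) do file
    read(file, String)
end

# read also accepts a path directly
viaRead = read(tmpPath, String)

# 20 bases + 1 space + 1 newline = 22 bytes
(filesize(tmpPath), viaOpen == viaRead)

# 4449 bytes expressed in KiB (1 KiB = 1024 bytes)
sizeKiB = round(4449 / 1024, digits=2)
```

Both forms should return the whole file content as a single string, newline included.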
Note. For large files you should probably read the file line by line with something like
for line in eachline(file) #= do sth with line =# end
or use a dedicated library.
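As a rough sketch of the line-by-line approach, assuming a small temporary file in place of a genuinely large one (the file content and the function name countBases are invented for this example), we could count the bases without ever holding the whole file in memory:

```julia
# hypothetical toy file; a real use case would be a much larger file
tmpPath, tmpIo = mktemp()
write(tmpIo, "gagctccccg gatctgtaac\ncggtcccaca ggggatgtgg\n")
close(tmpIo)

# counts non-whitespace characters, one line at a time
function countBases(path::String)::Int
    nBases = 0
    for line in eachline(path) # the trailing newline is dropped by default
        nBases += count(!isspace, line)
    end
    return nBases
end

countBases(tmpPath)
```

Only one line is processed at a time here, so memory use should depend on the longest line rather than on the file size.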
The nucleotide bases (a, c, t, g) are grouped by 10. Moreover, notice the \n character on the right. It is a newline character that tells the computer to print the subsequent characters from the beginning of a new line. We need to splice the sequence at positions 2424-2610 and 3397-3542, so let’s get rid of those extra characters to make the counting easier.
dna = replace(dna, " " => "", "\n" => "")
dna[1:75]
gagctccccggatctgtaacgggaggtctctctcgtgggttgtgggaggtccgaactggccggtcccacagggga
This couldn’t be simpler: we just use replace with the itIs => shouldBe syntax. Both the spaces (" ") and the newlines ("\n") are replaced with nothing ("", an empty string). Effectively, this removes them from our dna string.
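For illustration, here is the same clean-up on a short, made-up fragment (the names fragment, cleaned1, cleaned2 and cleaned3 are hypothetical). Passing several pairs to a single replace call requires Julia 1.7 or newer, so the sketch also shows two alternatives that should give the same result.

```julia
# toy fragment with the same layout as the file (groups of 10, then a newline)
fragment = "gagctccccg gatctgtaac\ncggtcccac"

# several pairs in one call (needs Julia 1.7 or newer)
cleaned1 = replace(fragment, " " => "", "\n" => "")

# chained single-pair calls, which also work on older versions
cleaned2 = replace(replace(fragment, " " => ""), "\n" => "")

# dropping every whitespace character with filter
cleaned3 = filter(!isspace, fragment)

(cleaned1, cleaned1 == cleaned2 == cleaned3)
```

Note that the filter variant removes all whitespace (tabs included), which is a bit broader than the two replace variants.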
String splicing is easily done with indexing and the string concatenation operator (*), like so.
dnaExonsOnly = dna[2424:2610] * dna[3397:3542]
dnaExonsOnly[1:75]
taccgggacacctacgcggaggacggggacgaccgcgacgaccgggagacccctggactgggtcggcgtcggaaa
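The indexing rules are easier to see on a toy example. The sketch below uses a made-up 12-base sequence (toySeq and the other names are hypothetical) to show that ranges are 1-based and include both ends, and then computes how many bases the two ranges used above cover.

```julia
toySeq = "aaacccgggttt" # hypothetical 12-base sequence

# ranges are 1-based and include both ends
part1 = toySeq[1:3]    # "aaa"
part2 = toySeq[10:12]  # "ttt"

# * glues the two pieces together
spliced = part1 * part2 # "aaattt"

# an inclusive range a:b holds b - a + 1 elements
nBasesKept = length(2424:2610) + length(3397:3542) # 187 + 146 = 333

(spliced, nBasesKept)
```

Indexing a String like this is by byte position, which is safe here because the sequence contains only single-byte ASCII characters.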
All that’s left to do is to transcribe it to mRNA using the complementarity rule mentioned above. First, let’s rewrite that rule as a Julia dictionary.
dna2mrna = Dict(
    'a' => 'u',
    'c' => 'g',
    'g' => 'c',
    't' => 'a'
)
And now the transcription itself.
function transcribe(nucleotideBase::Char,
                    complementarityMap::Dict{Char, Char} = dna2mrna)::Char
    return get(complementarityMap, nucleotideBase, nucleotideBase)
end
(
    transcribe('a'),
    transcribe('g'),
    transcribe('x')
)
('u', 'c', 'x')
Our transcribe function takes a character called nucleotideBase (of type Char; a String is built of individual characters) and a complementarityMap that defaults to dna2mrna. It uses get to return the base complementary to nucleotideBase (its second argument) or a default (its third argument, in this case nucleotideBase itself) if no match is found.
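The behavior of get can be checked in isolation. The sketch below uses a copy of the mapping under a made-up name (toyMap) so that it runs on its own, and a '?' placeholder as the default purely for illustration.

```julia
# same mapping as dna2mrna, redefined so the snippet is self-contained
toyMap = Dict('a' => 'u', 'c' => 'g', 'g' => 'c', 't' => 'a')

found = get(toyMap, 'a', '?')    # key present, so the stored value 'u'
missed = get(toyMap, 'x', '?')   # key absent, so the default '?'
hasX = haskey(toyMap, 'x')       # false

# plain indexing, toyMap['x'], would throw a KeyError instead
(found, missed, hasX)
```

Returning the input character as the default, as transcribe does, means that unexpected symbols pass through unchanged instead of stopping the program with an error.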
All that’s left to do is to write a transcribe
function for the whole string (dnaExonsOnly
).
function transcribe(dnaSeq::Str)::Str
    return map(transcribe, dnaSeq)
end
mRna = transcribe(dnaExonsOnly)
(
    dnaExonsOnly[1:10],
    mRna[1:10]
)
("taccgggaca", "auggcccugu")
Here the map function applies the previously defined transcribe to every character of dnaSeq and glues the resulting characters into a string.
Instead of the above two functions we could have just written
mRna = map(base -> get(dna2mrna, base, base), dnaExonsOnly)
(
    dnaExonsOnly[1:10],
    mRna[1:10]
)
("taccgggaca", "auggcccugu")
with the same result, but I felt that the longer version was clearer.
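One detail of map worth keeping in mind: when applied to a string, it expects the mapped function to return a character, and it then builds a new string from the results. The sketch below (with made-up names and a toy 4-base input) shows this, together with two alternatives for cases where that condition does not hold.

```julia
# a Char -> Char function keeps the result a String
upperSeq = map(uppercase, "tacc")      # "TACC"

# when the function does not return characters, collect the string first
codes = map(Int, collect("tacc"))      # [116, 97, 99, 99]

# join over a generator is another way to rebuild a string
joined = join(uppercase(c) for c in "tacc")

(upperSeq, codes, joined)
```

Since transcribe for a single base is declared to return a Char, it fits the first case, which is why map(transcribe, dnaSeq) gives back a string.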