Solution - Build SH*T with Julia

17.2 Solution

The file dna_seq_template_strand.txt is quite small, as you can see by using your file manager or Julia:

#in 'code_snippets' folder use "./transcription/dna_seq_template_strand.txt"
#in 'transcription' folder use "./dna_seq_template_strand.txt"
filePath = "./code_snippets/transcription/dna_seq_template_strand.txt"
filesize(filePath)

4449

Here we stored the path to our file in filePath. Next, we checked its size with filesize and found it to be 4449 bytes, a little over 4 kibibytes (KiB). Such a small file can be easily swallowed whole by read (the recommended way, below) and returned as one long Str (a type alias for String).

dna = open(filePath) do file
    read(file, Str)
end
dna[1:75]

gagctccccg gatctgtaac gggaggtctc tctcgtgggt tgtgggaggt ccgaactggc\ncggtcccac

Note. Large files should probably be read line by line with something like for line in eachline(file) #do sth with line# end, or with a dedicated library.
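The line-by-line approach from the note could look roughly like the sketch below. It assumes the same layout as our file (bases separated by spaces). The helper name readSeqByLine and the temporary toy file are made up for illustration and are not part of the original solution.

```julia
# hypothetical helper (not from the original solution):
# reads a sequence file one line at a time and drops the spaces
function readSeqByLine(path::String)::String
    buffer = IOBuffer()
    for line in eachline(path) # eachline strips the trailing "\n"
        print(buffer, replace(line, " " => ""))
    end
    return String(take!(buffer))
end

# toy file standing in for a large one
tmpPath, tmpIo = mktemp()
write(tmpIo, "gagctccccg gatctgtaac\ncggtcccac\n")
close(tmpIo)

toySeq = readSeqByLine(tmpPath)
rm(tmpPath) # clean up the temporary file
```

Since eachline hands over one line at a time, only the current line and the growing buffer need to be held in memory.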

The nucleotide bases (a, c, t, g) are grouped by 10. Moreover, notice the \n character on the right. It is a newline character that tells the computer to print the subsequent characters from the beginning of a new line. We need to splice the sequence at positions 2424-2610 and 3397-3542, so let’s get rid of those extra characters to make the counting easier.

dna = replace(dna, " " => "", "\n" => "")
dna[1:75]

gagctccccggatctgtaacgggaggtctctctcgtgggttgtgggaggtccgaactggccggtcccacagggga

This couldn’t be simpler: we just use replace and the itIs => shouldBe syntax. The spaces (" ") are replaced with nothing ("", an empty string) and so are the newlines ("\n"). Effectively, this removes them from our dna string.
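Here is a small sketch of the same clean-up on a toy string (toyDna is a made-up example, not data from the file). Note that passing several pairs to a single replace call requires Julia 1.7 or newer; the other two variants are alternatives that should also work on older versions.

```julia
toyDna = "gagct ccccg\ngatct"

# several itIs => shouldBe pairs in one call (Julia >= 1.7)
cleaned = replace(toyDna, " " => "", "\n" => "")

# alternative 1: chain single-pair calls
cleanedChained = replace(replace(toyDna, " " => ""), "\n" => "")

# alternative 2: keep only the characters that are not whitespace
cleanedFiltered = filter(!isspace, toyDna)

(cleaned, cleaned == cleanedChained == cleanedFiltered)
```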

String splicing is easily done with indexing and the string concatenation operator (*), like so.

dnaExonsOnly = dna[2424:2610] * dna[3397:3542]
dnaExonsOnly[1:75]

taccgggacacctacgcggaggacggggacgaccgcgacgaccgggagacccctggactgggtcggcgtcggaaa
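As a quick sanity check, the sketch below shows how inclusive ranges behave on a toy string (toySeq is a made-up example) and computes the lengths of the two ranges used above. Plain integer indexing is safe here because our sequence contains only one-byte ASCII characters; strings with multi-byte characters would need more care.

```julia
toySeq = "aaacccgggttt"

# ranges are 1-based and inclusive on both ends
exon1 = toySeq[1:3]   # first three characters
exon2 = toySeq[10:12] # last three characters
spliced = exon1 * exon2

# lengths of the ranges from our task
len1 = length(2424:2610)
len2 = length(3397:3542)
totalLen = len1 + len2

(spliced, len1, len2, totalLen, totalLen % 3)
```

The total length is divisible by 3, which is what we would expect from a sequence that is going to be read in nucleotide triplets.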

All that’s left to do is to transcribe to mRNA using the complementarity rule mentioned above. First, let’s rewrite it to Julia’s dictionary.

dna2mrna = Dict(
    'a' => 'u',
    'c' => 'g',
    'g' => 'c',
    't' => 'a'
)

And now the transcription itself.

function transcribe(nucleotideBase::Char,
    complementarityMap::Dict{Char, Char} = dna2mrna)::Char
    return get(complementarityMap, nucleotideBase, nucleotideBase)
end

(
    transcribe('a'),
    transcribe('g'),
    transcribe('x')
)

('u', 'c', 'x')

Our transcribe function takes a character called nucleotideBase (Char; a String is built of individual characters) and a complementarityMap that defaults to dna2mrna. It uses get to look up nucleotideBase (its second argument) in the map and return the complementary base, or a default (its third argument, here nucleotideBase itself) if no match is found.

All that’s left to do is to write a transcribe function for the whole string (dnaExonsOnly).

function transcribe(dnaSeq::Str)::Str
    return map(transcribe, dnaSeq)
end

mRna = transcribe(dnaExonsOnly)
(
    dnaExonsOnly[1:10],
    mRna[1:10]
)

("taccgggaca",
 "auggcccugu")

Here the map function applies the previously defined transcribe to every character of dnaSeq and glues the obtained characters into a string.
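To see the "glueing" in isolation, consider the sketch below. It uses toy names (toyMap, toyTranscribe) so that it can be run on its own; they merely mirror dna2mrna and transcribe from above. When the mapped function returns a Char, map over a string gives back a string. A generator with join is one possible alternative.

```julia
# toy stand-ins for dna2mrna and transcribe
toyMap = Dict('a' => 'u', 'c' => 'g', 'g' => 'c', 't' => 'a')
toyTranscribe(base::Char)::Char = get(toyMap, base, base)

# map over a String returns a String (the function must return a Char)
viaMap = map(toyTranscribe, "tacx")

# alternative: transcribe each character, then join the results
viaJoin = join(toyTranscribe(base) for base in "tacx")

(viaMap, viaJoin, typeof(viaMap))
```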

Instead of the above two functions we could have just written

mRna = map(base -> get(dna2mrna, base, base), dnaExonsOnly)
(
    dnaExonsOnly[1:10],
    mRna[1:10]
)

("taccgggaca",
 "auggcccugu")

with the same result, but I felt that the longer version was clearer.


CC BY-NC-SA 4.0 Bartlomiej Lukaszuk