MB DNA Analysis
(3rd of April 2006)
Copyright © 2002-2006 by Oleg Simakov
This file was last reviewed on the 3rd of April 2006. We recommend printing this document for easier reading.
MB is a FREE Windows program for DNA analysis.
- restriction sites search and mapping
- plasmid/linear DNA drawing, with a possibility of changing fonts of annotations and restriction sites on the DNA map and mark unique restriction sites with a specified color
- plotting the GC percentage and ORFs for 1 selected frame on the DNA map
- mapping of enzymes’ cuts positions on the map
- ability to save all restriction analysis reports into one HTM file
- multiple sequence alignment (method of hierarchical clustering), various amino acid substitution matrices are included
- phylogenetic tree building
- amino acid analysis (translation, chemical properties, prediction of the secondary structure of the protein using Chou-Fasman method)
- codon usage table calculation for selected ORFs
- primer design (self-hybridisation, combination of 2 primers and homology search within a given sequence)
- open reading frame search for all forward and reverse frames
- molecular weight calculation for single-stranded and double-stranded DNA
- dot plot, two algorithms are included: window/stringency and minimal length of a line
- ability to search for 4 promoter sequences (TATA, Pribnow, -35-Region, CAAT)
- helix analysis tool: make a helical graph and hydrophobicity graph
The program features an amino acid library.
All results can be saved in different ways or sent to printer.
The package includes 690 restriction sites (including isoschizomers) and some test DNA sequences.
You can easily add new restriction sites, DNA and amino acid sequences to the database. It is also possible to import DNA files from the Internet (MB supports GenBank and FASTA formats).
Updates for the program can be easily obtained via the automatic update feature.
MB runs under all Windows systems (95, 98, NT, Me, 2000, XP). Linux or Mac support may follow in the future, but this is not yet decided; please write me an email if you want these systems to be supported. MB can be run on Linux with the WINE emulator; instructions are available on the official MB homepage.
Copyright and license: MB is freeware and may be distributed freely. The program is provided AS IS, without any warranty.
If you have any questions or criticism, I would appreciate it if you wrote to me (see “Contact information”).
- compatibility with other systems
- improvements to the helix analysis tool
Send your suggestions to email@example.com. Thank you!
You can see the progress of the development on the following page:
The following table represents the basic structure of program files.
"MB.EXE" - program file
"README.TXT" - Readme file
“MB Homepage” – a link to the homepage of the program
“mbhelp.pdf” – main help file
“index.htm” – short info on the features of the program
amino acid pictures and data file ("aa.dat")
directory with DLL and *.wav files, needed to run the program
- DNA files ("*.PRT") with 5'-3' sequences (nucleotides are written in small letters!)
- Amino acid sequence file (*.AMI) containing only AA sequence in “one-letter-code”
- Restriction sites database ("resenz.dat") with 5'-3' nucleotide sequences in small letters
- Description file of DNA/amino sequences: "PROTEIN.0". You can edit it as you wish.
- “PK.DAT”-file contains pK-values for charged amino acids and used for pI estimation
- "SEARCH.0" – this file contains search list of restriction enzymes. It is used in restriction analysis.
- "CONFIG.INI", “PARAMETERS.INI” - configuration files
- “plasmid_map.txt” – this file contains settings for the DNA map. The value in the first line represents the height of the map in pixels. The value in the second line is the width-value. If you cannot see all restriction sites on the map, then increase the numbers. This will enlarge the size of the map and it will become possible to see all annotations.
- “performance.txt” – see Chapter “Configuration”
- All reports have extension "REP" or “HTM”
<ANY DIRECTORY NAME> - exported restriction analysis report files
This directory contains the “plugins.installed” file, which stores the names of all plugins downloaded through the automatic update. The program uses it to determine which plugins are already installed on your computer, so that you do not download the same plugin a second time.
You may also find some temporary files in this directory; they are used by the automatic update.
Directory for plugins (external programs, usually with *.dll extension)
An explanation of the sequence file format can be found in the chapter “How to add new sequences to the database”.
Recognition sequences database is accessible from the main panel of the program, as well as the DNA/amino acid sequence database:
Restriction database contains all restriction enzymes’ recognition sequences. The enzymes are sorted into 4 groups: 4 nt cutters, exact (no ambiguous nt) 6 nt cutters, all 6 nt cutters and cutters with more than 8 nt.
You can add enzymes to the search list by clicking the “Add selected enzyme” button. The enzymes in this list will be used in the restriction analysis.
The DNA/amino acid sequence database contains all DNA files (file extension .prt) and amino acid files (.ami). Other file types will not be detected. To view a sequence, click the “Show Sequence” button. In the window that appears it is possible to edit the sequence.
To add new DNA sequence...
From the program:
... go to Main Menu - Database - Import.. >> - DNA/amino acid sequence.
Enter the nucleotide sequence (all characters in file MUST BE IN LOWER CASE, NO SPACES, ONLY NUCLEOTIDES: ‘a’, ’c’, ’g’, ‘t’!).
Then click “save sequence”, specify that the sequence is DNA and click “add”-button. You can also add a description of the sequence. It will be then saved to the “protein.0” file in the database directory.
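The format rule above (lower case, no spaces, only 'a', 'c', 'g', 't') can be checked before a sequence is saved. The following is only a minimal sketch in Python, not part of MB; the function names are hypothetical, and stripping whitespace and digits from pasted text is an assumption about what a typical clean-up would do.

```python
def is_valid_prt(text: str) -> bool:
    """Return True if text is non-empty and contains only a, c, g, t in lower case."""
    return len(text) > 0 and set(text) <= set("acgt")


def normalize_sequence(raw: str) -> str:
    """Lower-case the input and drop whitespace and digits (assumed clean-up rule)."""
    cleaned = "".join(
        ch for ch in raw.lower() if not ch.isspace() and not ch.isdigit()
    )
    if not is_valid_prt(cleaned):
        raise ValueError("sequence contains characters other than a, c, g, t")
    return cleaned


print(normalize_sequence("ATG cat\n10 GGA"))
```

Note that a trailing newline also counts as an invalid character for is_valid_prt, so the text should be cleaned before it is checked.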
From the World Wide Web:
To add a new DNA sequence from the WWW (the file should be in FASTA or GenBank format):
Main Menu - Database - Import >> - DNA from Web-Site. In the window that appears, enter the path to the file containing the sequence. The name of the sequence is the file name under which it will appear in the database. The program will automatically find the description of the sequence in the source file (for example, in the case of GenBank this is the sentence after the word "DEFINITION").
To add new amino acid sequence...
To add new restriction site...
... go to Main Menu - Database – Import >> - ... Restriction Site.
Fill the fields specifying the name of the restriction site and its sequence. Press 'Add'.
Open the “RESENZ.DAT” file in the DATABASE directory. Add the restriction site at the bottom of the file. Please do not change the existing “table” structure. The number at the end of the line specifies the cut position of the endonuclease.
If the recognition sequence of the enzyme contains ambiguous bases, they are written according to the IUB-IUPAC standard code (see 6.1.1. Basics).
1. To launch this tool go to: Main Menu – Analysis – Restriction Analysis.
2. In the window that appears, choose a DNA file to analyze.
3. Choose some promoter sequences to search for (TATA-box, Pribnow-box, -35-Region, or CAAT-box).
If these sequences are found, the program will calculate the approximate position of the transcription regions.
4. Choose whether to create a plasmid or linear DNA map.
5. Check the box “Display Enzymes with less than … cuts” to limit the number of enzymes which will be displayed on the restriction and DNA maps.
6. Click “Search for enzymes which do not cut the region”, then click on “Define” to define the region. The program will then make a special section in the report with a list of the enzymes which have their cutting sequences somewhere in the DNA, but NOT in the specified region.
7. Check “Plot GC percentage on the map” to see the GC content of every 10 bp block for sequences under 7000 bp. If the sequence is longer, then the program will adjust the length of the block. The program will assign a specific color to every percentage level.
8. Check the “Display ORFs on the map” box to see the ORFs on the DNA map. You will have to define a minimal length (200 by default) and the frame to be analyzed (1, 2 or 3). The forward and reverse ORFs will be plotted on the map in different colors (by default forward is green and reverse is red, but you can change the colors in the options).
9. Click the OK button.
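The GC plot from step 7 can be illustrated with a short Python sketch. This is not MB's own code: the function name is hypothetical, the block size of 10 bp comes from the text above, and how MB treats a shorter final block is an assumption (here it is simply scored on its own length).

```python
def gc_percent_blocks(seq: str, block: int = 10) -> list[float]:
    """Return the GC percentage of each consecutive block of the sequence."""
    seq = seq.lower()
    result = []
    for start in range(0, len(seq), block):
        chunk = seq[start:start + block]
        gc = chunk.count("g") + chunk.count("c")
        # A shorter final block is scored on its own length (assumption).
        result.append(100.0 * gc / len(chunk))
    return result


print(gc_percent_blocks("ggggggggggaaaaaaaaaagc"))
```

Each value in the returned list would then be mapped to one color of the palette.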
The program will count the bases and their percentage.
Melting temperature of the DNA will be calculated using the following formula:
Tm = (0.41*(nc+ng)/length)+59.9-600/length
“nc”, “ng” are the amounts of cytosine and guanine
“length” is the length of the DNA sequence
The program will search for the restriction sites stored in the search list (click the 'View list…' button to open the restriction sites database on the main panel, where you can manage the list). The “main report” (NOT the restriction map) will be saved to the file you defined in the analysis window.
If some restriction sites contain ambiguous bases (for example, if the enzyme recognises either an A or a G at one position), they are coded according to the IUB-IUPAC standard code:
a A (Adenine)
c C (Cytosine)
g G (Guanine)
t T (Thymine)
r A or G
y C or T
s G or C
w A or T
k G or T
m A or C
b C, G or T
d A, G or T
h A, C or T
v A, C or G
n A, C, G or T
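One simple way to search for a recognition sequence written in this code is to turn it into a regular expression. The Python sketch below is only an illustration of the idea and is not taken from MB; the function names are hypothetical, and only the given strand is searched.

```python
import re

# IUB-IUPAC codes from the table above, mapped to regex character classes.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "[ag]", "y": "[ct]", "s": "[gc]", "w": "[at]",
    "k": "[gt]", "m": "[ac]", "b": "[cgt]", "d": "[agt]",
    "h": "[act]", "v": "[acg]", "n": "[acgt]",
}


def site_to_regex(site: str) -> str:
    """Translate a recognition sequence with ambiguity codes into a regex."""
    return "".join(IUPAC[ch] for ch in site.lower())


def find_sites(dna: str, site: str) -> list[int]:
    """Return 1-based start positions of all (possibly overlapping) matches."""
    # The lookahead makes overlapping matches visible to finditer.
    pattern = re.compile("(?=" + site_to_regex(site) + ")")
    return [m.start() + 1 for m in pattern.finditer(dna.lower())]


print(find_sites("ttgaattcaagtcgacgtcaac", "gtyrac"))
```

The returned numbers are positions of the recognition sequence, not cut positions.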
Please note that only restriction sites “positions” are given in your report, and not the cut positions of enzymes, as on the vector map.
If you have chosen to create a plasmid/linear DNA map, a new window will appear.
You can add annotations to the map by clicking “Add annotation”.
Then you specify whether to add a promoter/origin or a gene to the sequence.
Please note that the start position must always be lower than the stop position, and that the highest allowed stop position is the number of the last nucleotide of the DNA sequence.
You can also save your annotation list to a file by clicking “save to file”. When saving, please specify the file extension (for example “my_annotations.TXT”), otherwise you will not be able to load the file again.
When loading a list from a file, please make sure that none of the given positions exceeds the length of the DNA.
MB does not check these values, so wrong positions can lead to bad results.
To change fonts, click the “Fonts” menu item in the “Plasmid Map” window.
Please remember that it is absolutely necessary to have some annotations on the map before changing their font properties.
In the window that appears you can change the size, style and character spacing of the annotations only. To change the font parameters of the restriction sites, please go to the “Options” menu – “DNA Drawing” tab.
If not all restriction sites are displayed, please increase the size of the vector map in the “Options” menu – “DNA Drawing” tab – “Plasmid Map Sizes”.
To view the GC palette and the colors definitions of the ORFs please go to: “Map – GC Palette and ORF Colors”. In the window it is possible to copy the GC palette to the clipboard for further editing in an external editor.
Users who are new to MB DNA Analysis may find the appearance of the DNA map a little confusing, so here is an explanation of the information such a map can contain:
The linear DNA map has the same structure.
You can export the list of restriction sites and their cut positions to Microsoft Excel.
A file named “cuts.map” in the “DATABASE” directory contains all the information. Open this file in Excel and choose “space” as the column separator. The data will then be imported and you can sort it as you wish.
Be aware that cuts.map is a temporary file. It will be overwritten when you run a new analysis.
To launch this tool go to: Main Menu – Analysis – Protein Analysis.
In the window that appears, choose a DNA file to analyze.
Set the name of the report file.
Choose whether to make a prediction of the secondary structure (after Chou-Fasman, see below).
Check the “Calculate the codon usage only for ORFs of first frame” box to get the codon usage table only for the ORFs that were found. This will also display the translation of the ORFs in the “translation” results. Please set the minimal length of an ORF.
If the box is not checked, the translation window will contain the translation of the entire sequence.
The main report with the codon usage table and amino acid count is always saved to the file, which you have previously defined in the starting window. Other reports (like translation window, secondary structure prediction) should be saved separately.
The amino acids are divided into 4 groups:
- hydrophobic: alanine, valine, phenylalanine, proline, methionine, isoleucine, leucine
- charged: aspartic acid, glutamic acid, lysine, arginine
- polar: serine, threonine, tyrosine, histidine, cysteine, asparagine, glutamine, tryptophan
- glycine: glycine only
The program counts all these residues. It also calculates the molar mass of the protein and the maximal possible number of the disulfide bridges.
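The residue count can be sketched in a few lines of Python. This is an illustration, not MB's implementation: the names are hypothetical, the one-letter codes are the standard ones for the amino acids listed above, and taking the maximal number of disulfide bridges as half the number of cysteines (rounded down) is my assumption about how that number is obtained.

```python
# The four groups described above, written in one-letter code.
GROUPS = {
    "hydrophobic": "AVFPMIL",
    "charged": "DEKR",
    "polar": "STYHCNQW",
    "glycine": "G",
}


def classify_residues(protein: str) -> dict[str, int]:
    """Count how many residues of the sequence fall into each group."""
    protein = protein.upper()
    return {
        name: sum(protein.count(aa) for aa in letters)
        for name, letters in GROUPS.items()
    }


def max_disulfide_bridges(protein: str) -> int:
    """Each bridge needs two cysteines (assumed rule: cysteine count // 2)."""
    return protein.upper().count("C") // 2


print(classify_residues("MKCGACDE"), max_disulfide_bridges("MKCGACDE"))
```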
The graph, which will be displayed after the analysis process, shows the amino acid sequence with the “propensity” of being in one of the 3 known configurations.
A propensity of more than one means that the amino acid is more likely to be in a given configuration (in other words to be helix-, sheet- or loop-forming).
You can then apply the Chou-Fasman method to determine the configuration:
1.) A cluster of four helix-forming residues out of six contiguous residues will nucleate a helix. The helix segment propagates in both directions until the average value of propensity for the alpha structure falls below 1.00 for a tetrapeptide. The average values can be approximately determined from the graph. Proline however can only occur at the N-terminus of an alpha helix.
2.) A cluster of three beta sheet formers (propensity > 1) out of five contiguous residues nucleates a sheet. The sheet is propagated in both directions until the average value of propensity for the beta structure falls below 1.00 for a tetrapeptide.
3.) For regions containing both alpha and beta forming sequences, the overlapping region is predicted to be helical if its average value of propensity for the alpha structure is greater than the average value of propensity for the beta sheet.
You can use this method “manually” to determine the secondary structure from the propensity graph. The method itself is not implemented in the program due to its inaccuracy: its reliability is only 50%, in the best case 80%.
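If you prefer not to read the values off the graph by eye, rule 1 can be approximated with a small script. The Python sketch below is not part of MB; it assumes you already have the alpha-helix propensity of every residue as a list of numbers (the propensity table itself is not reproduced here), and the function names are hypothetical.

```python
def helix_nuclei(p_alpha: list[float], window: int = 6, needed: int = 4) -> list[int]:
    """Return 0-based starts of windows with at least `needed` helix formers."""
    starts = []
    for i in range(len(p_alpha) - window + 1):
        # A residue counts as a helix former if its propensity exceeds 1.
        formers = sum(1 for p in p_alpha[i:i + window] if p > 1.0)
        if formers >= needed:
            starts.append(i)
    return starts


def tetrapeptide_averages(p: list[float], span: int = 4) -> list[float]:
    """Average propensity of every run of four residues (used for propagation)."""
    return [sum(p[i:i + span]) / span for i in range(len(p) - span + 1)]


values = [0.5, 1.2, 1.3, 0.9, 1.1, 1.4, 0.6, 0.7, 0.5]
print(helix_nuclei(values))
print(tetrapeptide_averages(values))
```

A helix would be extended from a nucleus for as long as the tetrapeptide average stays at or above 1.00; the same two functions can be used for rule 2 with window=5, needed=3 and the beta-sheet propensities.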
It is possible to search for forward first frame ORFs. All detected ORFs will be displayed in the translation window. The codon usage will be calculated only for the region of the DNA, which is occupied by the ORFs.
It is also possible to make a separate codon usage count for a single selected ORF. To do this, open the “Extract ORF sequence” window, choose an ORF and click the “Codon usage for the selected ORF” button. The extra window with the codon usage table has to be saved separately.
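Counting codons for a single ORF amounts to cutting the sequence into triplets and tallying them. A minimal Python sketch (hypothetical function name, not MB's code; leftover bases at the end are ignored by assumption):

```python
from collections import Counter


def codon_usage(orf: str) -> Counter:
    """Count each triplet of the ORF, reading from its first base."""
    orf = orf.lower()
    usable = len(orf) - len(orf) % 3  # ignore an incomplete trailing codon
    return Counter(orf[i:i + 3] for i in range(0, usable, 3))


for codon, count in sorted(codon_usage("atggctgcttaa").items()):
    print(codon, count)
```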
To launch this tool go to: Main Menu – Analysis – Dot Plot.
Select the two DNA or amino acid sequences that you want to analyze.
MB has two dot plot algorithms. With the first (the default), you analyze the sequences by giving the window and stringency parameters. With the second, the program finds matching characters and traces the match until the first mismatch; if the length of this trace is greater than or equal to the value you specified in “minimal line length”, the program draws a line between the first and the last points of the trace.
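The window/stringency idea can be sketched as follows. This Python example illustrates the general technique and is not MB's code; the function name and the default values are hypothetical. A dot is placed at (i, j) when the windows starting at position i of the first sequence and position j of the second share at least `stringency` identical characters.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 5, stringency: int = 4) -> list[tuple[int, int]]:
    """Return 0-based (i, j) pairs whose windows match in >= stringency positions."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= stringency:
                dots.append((i, j))
    return dots


print(dot_plot("acgtacgt", "acgt", window=4, stringency=4))
```

A larger window with a lower stringency tolerates mismatches and suppresses random single-character hits.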
Hint: Click on the map or hold the cursor to display the current position.
To launch this tool go to: Main Menu – Analysis – Molecular Weight Calculator.
In the window that appears, choose the DNA sequence to analyze, set the limits (start position, stop position), and choose whether to calculate the molar weight for double-stranded or single-stranded DNA.
The results will be saved to "[date].rep" file.
To launch this tool go to: Main Menu – Analysis – Find ORF.
Select the DNA file first, then choose whether you want to search for 6 (forward and reverse) or just 3 (forward) frames and the minimal length of a frame.
The program will list all detected ORFs. Please do not forget to save the results if you want to keep them.
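A six-frame ORF search can be sketched in Python as below. This is only an illustration under stated assumptions, not MB's algorithm: an ORF is taken to run from an "atg" to the next in-frame stop codon (taa, tag, tga), the minimal length is counted in nucleotides including the stop codon, and all names are hypothetical. Positions on the reverse strand refer to the reverse complement.

```python
STOPS = {"taa", "tag", "tga"}
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read 5'-3'."""
    return dna.translate(COMPLEMENT)[::-1]


def orfs_in_strand(dna: str, min_len: int) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) with 0-based start and exclusive end."""
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "atg":
                start = i
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:
                    found.append((frame + 1, start, i + 3))
                start = None
    return found


def find_orfs(dna: str, min_len: int = 30, both_strands: bool = True):
    """Search 3 forward frames and, optionally, 3 reverse frames."""
    dna = dna.lower()
    result = [("forward",) + o for o in orfs_in_strand(dna, min_len)]
    if both_strands:
        rc = reverse_complement(dna)
        result += [("reverse",) + o for o in orfs_in_strand(rc, min_len)]
    return result


print(find_orfs("ccatggcttaagg", min_len=9))
```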
To launch this tool go to: Main Menu – Analysis – Calculate Isoelectric Point
Please select the amino acid sequence first, then click “OK”. The amino acid sequence should contain only one-letter codes and no spaces. For stop codons you may use the “-” sign.
File “test.ami” (name and extension may vary):
If you have only the nucleotide sequence of a peptide and want to determine its pI, simply translate the sequence to an amino acid sequence using the “Sequence Translator” tool in Main Menu – Extras. A file with the extension .ami will be created, containing your AA sequence.
The results are shown as a graph, which can optionally be saved to a bitmap file (*.bmp).
Please note: if the pI value is not displayed in the results, you have to change some settings (see section “Configuration” – “Isoelectric Point”).
The algorithm for isoelectric point estimation is as follows (taken from David L. Tabb):
1.) MB counts the amino acid residues that can affect the pI value: lysine, arginine, histidine (basic side chains), aspartic and glutamic acids, cysteine, tyrosine (acidic side chains). The N- and C-termini also affect the total charge of the molecule.
2.) Charge determination.
Z (total charge) = Nterm + Cterm + K + R + H + D + E+ C+ Y
The letters, together with Nterm and Cterm, represent the charges contributed by each residue type or terminus. The program assumes that every residue is independent of the others (an approximation).
3.) pI determination. Whether a group takes a positive or negative charge is determined by its pK value. MB uses the following data from the “PK.DAT” data file in the DATABASE directory:
The next step is the determination of the concentration ratio (CR). For positive groups this is:
CR = 10^(pK - pH)
For negative groups:
CR = 10^(pH - pK)
Once the CR is generated, the partial charge (PC) is calculated using:
PC = CR/(CR+1)
The summation of the partial charges is done using the formula for Z (total charge, see above). For the C-terminus, aspartic acid, glutamic acid, cysteine and tyrosine the PC is defined as negative, while the charges from the N-terminus, lysine, arginine and histidine are assumed to be positive.
The pI value is then the pH where the total charge of the molecule is zero.
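The three steps can be put together in a short Python sketch. It is not MB's code and the names are hypothetical. Important: the pK values below are illustrative placeholders that I chose for the example; they are an assumption and not necessarily the contents of PK.DAT. The pH is scanned upwards in fixed steps and the first pH at which the total charge is no longer positive is reported.

```python
# Illustrative pK values (ASSUMPTION, not read from PK.DAT).
PK_POSITIVE = {"Nterm": 8.0, "K": 10.0, "R": 12.0, "H": 6.08}
PK_NEGATIVE = {"Cterm": 3.1, "D": 3.9, "E": 4.07, "C": 8.28, "Y": 10.46}


def total_charge(protein: str, ph: float) -> float:
    """Z = sum of partial charges PC = CR / (CR + 1) over all charged groups."""
    counts = {aa: protein.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    z = 0.0
    for group, pk in PK_POSITIVE.items():
        cr = 10 ** (pk - ph)          # CR for positive groups
        z += counts[group] * cr / (cr + 1)
    for group, pk in PK_NEGATIVE.items():
        cr = 10 ** (ph - pk)          # CR for negative groups
        z -= counts[group] * cr / (cr + 1)
    return z


def estimate_pi(protein: str, step: float = 0.01, max_ph: float = 13.0):
    """Return the first scanned pH with total charge <= 0, or None."""
    protein = protein.upper()
    for n in range(int(max_ph / step) + 1):
        ph = n * step
        if total_charge(protein, ph) <= 0:
            return round(ph, 4)
    return None


print(estimate_pi("MKCGACDE"))
```

A smaller step gives a finer estimate at the cost of more iterations.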
You can edit the “pk.dat” file as you wish, but please keep the table structure. We recommend making a backup copy of the file first.
It is possible to calculate the melting point of a primer using the following formula:
Tm = 81.5 + Na + 0.41*(GC) - 675/length - 0.65*[FORMAMIDE]
Where Na = 16.6* log10([Na])
GC = GC Percentage in the sequence
[FORMAMIDE] = formamide concentration
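As a worked example, the formula can be written as a small Python function. This is a sketch, not MB's code: the function name and default values are hypothetical, and the units (sodium concentration in mol/l, formamide in percent) are my assumption, since they are not stated above.

```python
import math


def primer_tm(primer: str, na_molar: float = 0.05, formamide_percent: float = 0.0) -> float:
    """Tm = 81.5 + 16.6*log10([Na]) + 0.41*(GC%) - 675/length - 0.65*[FORMAMIDE]."""
    primer = primer.lower()
    length = len(primer)
    gc_percent = 100.0 * (primer.count("g") + primer.count("c")) / length
    na_term = 16.6 * math.log10(na_molar)
    return (81.5 + na_term + 0.41 * gc_percent
            - 675.0 / length - 0.65 * formamide_percent)


# 20-mer with 50 % GC at 1 mol/l sodium: 81.5 + 0 + 20.5 - 33.75
print(round(primer_tm("gcgcgcgcgcatatatatat", na_molar=1.0), 2))
```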
The calculated primers’ combinations contain the highest number of complementary bases in a row.
You can also search for homology within a given sequence (template). This means, that the program will try to add the primer to specific positions of the template and check whether the hybridisation is possible. You can adjust the percent of complementary bases (80 by default).
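The homology search can be pictured as sliding the primer along the template and scoring each position. The Python sketch below is an illustration only: the names are hypothetical, and it assumes that the primer binds antiparallel, so the template is compared with the reverse complement of the primer. Whether MB scores positions in exactly this way is not stated above.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def binding_sites(primer: str, template: str, min_percent: float = 80.0):
    """Return (1-based position, percent complementary) for every hit."""
    primer, template = primer.lower(), template.lower()
    # Sequence on the template that would pair perfectly with the primer.
    target = primer.translate(COMPLEMENT)[::-1]
    n = len(target)
    hits = []
    for pos in range(len(template) - n + 1):
        window = template[pos:pos + n]
        matches = sum(1 for a, b in zip(target, window) if a == b)
        percent = 100.0 * matches / n
        if percent >= min_percent:
            hits.append((pos + 1, percent))
    return hits


print(binding_sites("aacgt", "ggacgttgg"))
```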
The Sequence Translator tool can be used to convert mRNA to 5'-3' DNA and 3'-5' DNA to 5'-3' DNA. You can also use it as a “protein encoder”: it translates a sequence of amino acid residues to a 5'-3' DNA sequence, although it produces only one of the many possible combinations.
You can also translate a DNA sequence to an amino acid sequence. This will create an output file with the AA sequence in the database directory, with the extension “*.ami”.
In each case you have to enter the name of the file in which the sequence will be written.
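The conversions can be sketched in Python as follows. This is not MB's code and the names are hypothetical. The translation uses the standard genetic code; writing stop codons as "-" and treating the 3'-5' to 5'-3' conversion as a plain reversal of the same strand are my assumptions.

```python
from itertools import product

# Standard genetic code, codons ordered t, c, a, g; "-" marks a stop codon.
BASES = "tcag"
AMINO = "FFLLSSSSYY--CC-WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def mrna_to_dna(mrna: str) -> str:
    """Replace uracil by thymine."""
    return mrna.lower().replace("u", "t")


def reverse_strand(dna_3_to_5: str) -> str:
    """Rewrite a strand given 3'-5' in 5'-3' order (assumed: plain reversal)."""
    return dna_3_to_5.lower()[::-1]


def translate(dna: str) -> str:
    """Translate 5'-3' DNA in frame 1; an incomplete last codon is dropped."""
    dna = dna.lower()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(translate(mrna_to_dna("auggcuuaa")))
```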
This is a library of amino acids.
You can choose an amino acid by typing in a DNA sequence or by clicking on the list.
The program will display the properties of the molecule and show its structure.
The data file for amino acids 'aa.dat' is found in your 'aa' directory. The jpg files of amino acids are also found in this directory.
Choose any DNA sequence from DATABASE directory and the program will read it for you.
Use it to explore the Internet!
External programs (plugins) are placed in a special folder called “PLUGINS”. They all have a *.DLL file extension and are automatically recognized by the program at startup. If you experience problems starting the plugins, please write to me (see contact information).
Plugins have their own help files, which can be updated separately.
Use it to configure whether the amino acid sequence is displayed and, if so, in which form: full name (Alanine), 3-letter name (ala) or 1-letter name (A).
[Count bases before restriction sites search]. By clicking this option you will enable the base count in "Restriction Site Analysis" feature.
[Open report files with Windows Notepad]. If you wish, the program will open the reports with the NOTEPAD program (if installed on your computer).
[Computer performance:]. Here you can choose between “high”, “middle” and “low”. Choose the high level when you analyse a sequence with more than 200 restriction sites in the search list. The process will run a little slower, but the program won't crash. If you choose “low” instead, the program will not be able to handle such a large number of recognition sequences and you will get an “Access violation error”.
Analysing sequences with only 50 or so recognition sequences does not require huge arrays, so you can choose the middle or low level. If you still get an “access violation”, choose “high”.
By default the restriction sites and enzyme cuts mapping is enabled.
The maximal number of sites per line of the map works like the “computer performance” setting. If you are analysing a sequence with 200 enzymes, the program has to plot all of them on the map and therefore needs more memory to store the information. In that case enter a number of about 200. Note: very large numbers can slow the analysis down by a few seconds.
The maximal number of sites per line of the map is 300. The default is 50.
The url mentioned is the page for MB update.
[Set pH changes value (max 1):]. This option concerns the pI calculation. The way MB calculates the pI value is quite simple: the program calculates the total charge of the peptide at a given pH, then increases the pH by the value set in the advanced 2 options. The process continues until the pH reaches 13. If you want a (relatively) precise result, enter a value of about 0.0001.
Please note: with larger values it may happen that the pI is not calculated, so it is recommended to leave the default setting unchanged.
[Mark pI-point with [X] on the graph]. You can also choose whether to display the green X marker on the graph.
The URL mentioned is the page for MB update.
Some setups will require proxy to be enabled. You can ask your system administrator about the required information (host name, port address). If the server requires authentication, it is necessary to define the username and password.
It is possible to switch on the automatic update reminder, which will remind you of possible updates in X-given days.
Define the color of the unique restriction sites on the DNA map.
[Restriction Font Size]
The size of restriction sites’ font
You can choose between normal, bold, bold and italic.
[Plasmid Map Sizes]
You can change the size of the plasmid map in order to see all of the restriction sites
Specify the color for the ORFs on the DNA map. By default the forward ORFs are green, the reverse ORFs are red.
Here you can manage your default windows printers.
To update the program:
go to: Main Menu - Help - Check for updates.
Please note: automatic update within the program is only available in the registered copy of the program (see section “Registration of the program”).
The program will then connect to the web-site and search for possible updates. To change the proxy settings please go to Main Menu “Options” – “Update” – “Proxy”. You can get the proxy information from your system administrator.
You will see the following window, from which you can choose which of the updates you want to download:
Choose which of the updates you want to download, click on the download button and follow the instructions.
The program can actually install the updates automatically. You can try this out, but sometimes it may not function properly. So here is the manual way of the installation:
After the download is finished, close the program and remember the following:
If you have chosen to download an update for the main program executables, then execute the .exe file from the UPDATE directory and enter the path to mb.exe (like C:\programs\mb)
If you have downloaded a newer version of ResEnz.DAT (restriction database), then copy this file from the UPDATE directory to the DATABASE directory, overwriting the existing file.
If you have downloaded a plugin, then you can find it in the UPDATE directory. You can then install it to any other folder. The name of the plugin is saved in “plugins.installed” file, so the next time you start the automatic update it will not inform you that the already installed plugin is available for download.
In any case, please RESTART the program for the changes to take effect!
You can also download the updates from our web-site: http://www.molbiosoft.de following “Updates”-Link. But this will only list all possible downloads without providing you with the information, whether your version is up-to-date.
If you want MB to remain a freeware program then please take some time to register it.
Registration of the program is absolutely free of charge. By registering your copy you will help me in the further development of this software. You are free to contribute your suggestions.
Please note that the MB project is the work of one person (me). I have spent a lot of my spare time on it, so it is always nice to hear from users.
· After the registration you will be automatically added to the mailing list
· You can receive technical support
· You will be able to search for new updates or plugins and automatically install them (“Check for updates” feature)
· You can contact the author and ask any question about the program
- Entering the registration key
After sending the registration form via the official homepage, you will receive an email with a key (within a few seconds). Simply enter that key in the right order in the “Enter Key” tab of the “About” box and click OK. A confirmation message should appear.
Please do not share this key with other people; it is for your personal use only.
If an error message is displayed after you click the “OK” button, please contact me at firstname.lastname@example.org.
- “Access violation error” during restriction analysis: refer to 5. Configuration (“advanced options”) for detailed explanation
- “Wrong character, process will be terminated” and I/O error 103 during restriction sites search: the *.prt file you are analysing has the wrong format. Prt-files are DNA files containing only the nucleotide sequence. FORMAT: no spaces, all letters in lower case, and the only letters allowed in the file are ‘a’, ’g’, ’c’, ’t’.
- Not enough room for all annotations on a plasmid map: you may try to reduce the number of recognition sequences or the font size (see section “Configuration”) OR enlarge the map size parameters from the menu “Options” – “DNA Drawing” – “Plasmid Map Sizes”.
Please report any bugs to email@example.com
Please supply the following information: your name, the version number of the application (MB) and the text of the error message.
You can write me a letter at: firstname.lastname@example.org
For the latest news please visit my web-site: http://www.molbiosoft.de.
If you want to know, when new updates are available, please subscribe to the mailing list at: http://www.molbiosoft.de/mailing.htm
13. Recent changes
MB Version 6.82 changes:
· restriction analysis ends with a list of non-cutters
· user can now define a region and search for enzymes that cut the DNA but not the selected region
· added a link to the "Enter new sequence" window from the restriction/protein analysis menu
· improved and corrected user manual
· new plugin: helix.dll (analysis of alpha-helices)
· amino acid database: reported problems were fixed
· dot plot: window/stringency algorithm added
· added proxy support for the automatic update feature
View the whole program history at