Build A MAC (Mavericks - OS X 10.9.5)

There are a number of steps needed to take a newly purchased Mac computer to a fully functioning analysis machine. We previously provided detailed instructions on SEQanswers, but going forward we will use this page to track new builds as Mac OS versions change.

Build Mac OS X 10.9.5 computer (Nov 2014)

1) Setup Basic Environment

Make a Local Directory and Binary Directory

# Open a new terminal window; this ensures the current directory is your $HOME directory

mkdir -p local/bin

Create a Profile and set $HOME/local/bin as a path directory

# Open profile file (a hidden file that tells terminal what to do)

vim .profile

# add the following lines to profile (press "i" to enter insert mode) [see vim page on vim usage]

# This adds the local/bin directory to the path
export PATH=$HOME/local/bin:$PATH
# This alias allows you to type "ll" in the terminal to execute "ls -l"
alias ll='ls -l'
# These aliases are examples that open connections to a computation resource (standard or ftp connections shown)
alias computer='ssh jkeats@computer.tgen.org'
alias cftp='sftp jkeats@computer.tgen.org'

# save updated profile (press "esc" then ":w" then ":q")

2) Install Xcode

This is software provided by Apple that is used for building applications for Mac computers or iOS devices. It is an excellent code editor but, more importantly, it contains the application "make" that you will need to install most NGS applications.

Open the APP STORE application on your Mac

Search for Xcode (current version is v6.1, which is compatible with OS X 10.9.4+)

Download Xcode (it can take a long time since it's 2.4 GB)

Click on "Launchpad" and open the Xcode application

Follow the prompts to complete the primary install

# Open a new terminal window and enter the following command

xcode-select --install

This will open a dialogue window that will ask if you want to install the command line tools, select "Install"

3) Update JAVA version to 1.7

Many of the tools you will use leverage JAVA and some need the 1.7 JDK to be available. To determine which version you have currently do the following

# Open a new terminal window
# Type the following
java -version
# Check the version output
java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)

Assuming you have not already updated the Java version for another reason, you will need to download an updated version from Oracle. To download, search google for "MAC Java Development Kit". I followed the "JAVA SE Development Kit 7 - Downloads | Oracle" link to download the "jdk-7u71-macosx-x64.dmg" file. To install, first accept the license at the top of the page, then click on the file to initiate the download. Once complete, click on the download, then click on the install package and follow the prompts. Then close all terminal windows, make sure the Terminal application has quit, open a new window, and check the version.

java -version
# The output should now indicate the updated 1.7 version is available as default
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)
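
If you want to script this check rather than read the output by eye, something like the following should work. This is only a sketch: the function name java_at_least is made up for this page, and it assumes the old "1.x" version numbering shown above.

```shell
# Hypothetical helper: succeed if a `java -version` banner reports at least
# the requested 1.x release (for example 7 for Java 1.7).
# Assumes the "1.x.y_zz" numbering scheme shown in the output above.
java_at_least() {
    banner=$1      # text such as: java version "1.7.0_71"
    wanted=$2      # minor number to require, e.g. 7
    # Take the quoted version on the first line and keep its second
    # dot-separated field ("7" in "1.7.0_71").
    minor=$(printf '%s\n' "$banner" | sed -n '1s/.*"\([0-9]*\)\.\([0-9]*\).*/\2/p')
    [ -n "$minor" ] && [ "$minor" -ge "$wanted" ]
}

# Usage (java prints its banner on stderr, hence the 2>&1):
# if java_at_least "$(java -version 2>&1)" 7; then echo "Java 1.7+ found"; fi
```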

4) Install R

R is free software that supports a multitude of statistical and graphic applications. To download, search google for "R" and follow the links to a CRAN download page of your choosing.

Download by clicking the "Download R for (Mac) OS X" link to the pre-compiled binary. (current version is R-3.1.2, Pumpkin Helmet, 2014-10-31).

To install open the install package (R-3.1.2-mavericks.pkg) and follow the prompts.

Once installed open the application (Applications>R)

Then install the following using the Package Installer (Packages & Data > Package Installer).

NOTE: It is a good habit to select the "Install Dependencies" radio button

CRAN (binaries)

ggplot2

MASS

scales

gdata

gridExtra

Hmisc

VennDiagram, also (bvenn, colorfulVennPlot, eVenn, venneuler)

psych

PBSmapping

data.table

maptools

maps

Bioconductor (binaries)

DESeq

DESeq2

DNAcopy
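
If you would rather install the CRAN packages from the terminal than click through the Package Installer, the sketch below builds the equivalent install.packages() call. It assumes Rscript was put on your path by the R installer and uses the cloud.r-project.org mirror as an example; the helper name r_vector and the RUN_INSTALL switch are made up for this page. The Bioconductor packages are not covered here. By default it only prints the R command so you can check it first.

```shell
# CRAN package names, spelled as on CRAN (R package names are case-sensitive).
cran_packages="ggplot2 MASS scales gdata gridExtra Hmisc VennDiagram psych PBSmapping data.table maptools maps"

# Hypothetical helper: turn its arguments into an R character vector, c("a","b",...)
r_vector() {
    out=''
    for pkg in "$@"; do
        out="${out:+$out,}\"$pkg\""
    done
    printf 'c(%s)' "$out"
}

# $cran_packages is left unquoted on purpose so each name becomes one argument.
r_expr="install.packages($(r_vector $cran_packages), dependencies = TRUE, repos = \"https://cloud.r-project.org\")"

# Dry run by default: print the R call. Set RUN_INSTALL=1 to actually run it.
if [ "${RUN_INSTALL:-0}" = 1 ]; then
    Rscript -e "$r_expr"
else
    printf '%s\n' "$r_expr"
fi
```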

5) Install RStudio

RStudio is a nice graphical interface for using R. It makes R much less scary to use for making graphs, as you get instantaneous feedback. To download, search google for "R studio" and follow the links to download the Open Source version of RStudio Desktop.

To install, click on the download (RStudio-0.98.1091.dmg) and drag the RStudio application to your applications folder.

6) Install/Update Python

Macs ship with python 2.7 but if you are starting out it is not a bad idea to update to python 3.x. To download the newer version google "python" and follow the links to download the "Mac OS X 64-bit/32-bit installer".

To install, click on the install package (python-3.4.2-macosx10.6.pkg).

NOTE/WARNING: Several essential Mac applications use python 2.7, so you need to leave it as default but you can easily use the 3.4 environment

Open the python interpreter in the terminal

python3

Exit the interpreter

exit()

In a script use the following header line

#!/usr/bin/env python3
OR CALL SCRIPT WITH
python3 YourSuperCoolScript.py

There are a number of very valuable packages you might want for scientific computing included in the scipy.org library such as numpy and pandas. They can be installed using pip as follows:

pip3.4 install numpy
pip3.4 install ipython
pip3.4 install scipy
pip3.4 install matplotlib
pip3.4 install sympy
pip3.4 install pandas
pip3 install jupyter
# samtools in python
pip3.4 install pysam
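
To confirm the packages landed in the python3 environment and not the system python 2.7, you can try importing each one. A minimal sketch, assuming python3 is on your path; the helper name has_module and the PYTHON variable are made up for this page. Note that ipython is imported as "IPython".

```shell
# Interpreter to test; override with e.g. PYTHON=python3.4 if needed.
PYTHON=${PYTHON:-python3}

# Hypothetical helper: succeed if the named module can be imported.
has_module() {
    "$PYTHON" -c "import $1" >/dev/null 2>&1
}

# Report the import status of each package installed above.
for mod in numpy IPython scipy matplotlib sympy pandas pysam; do
    if has_module "$mod"; then
        echo "ok       $mod"
    else
        echo "missing  $mod"
    fi
done
```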

7) Install MacPorts

There are many unix commands that do not come pre-loaded on Macs like dos2unix, wget or md5sum. To install these and many other tools MacPorts is an excellent resource. To download, google "MacPorts" and follow the "Installing MacPorts" instructions for your OS version (Make sure Xcode is available).

To install, click on the download (MacPorts-2.3.3-10.9-Mavericks.pkg)

Test the install by opening a new terminal window and typing the following

port version

Assuming this printed a version number rather than "command not found", you are ready to install the needed ports as follows in a terminal window:

sudo port install wget
sudo port install dos2unix
sudo port install md5deep
sudo port install gawk
sudo port install cairo ## Used to install Pairoscope
sudo port install doxygen ## Used to install Pairoscope
sudo port install cmake ## Used to install Pairoscope
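
Once the ports finish installing, a quick way to confirm the new commands are visible in your terminal is to look each one up on your $PATH. This is a small sketch; the function name missing_tools is made up for this page, and cairo is left out because it is a library rather than a command.

```shell
# Hypothetical helper: print every command in the argument list that is not
# on $PATH, and return non-zero if at least one is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return $status
}

# Check the commands provided by the ports installed above.
missing_tools wget dos2unix md5deep gawk doxygen cmake \
    || echo "The commands listed above were not found, re-run: sudo port install <name>"
```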

8) Install IGV

Integrative Genomics Viewer (IGV) is the de facto standard for a lightweight GUI application that allows you to visualize a multitude of genomic data formats. There are several versions available on the IGV website (http://www.broadinstitute.org/igv/). We generally use the binary version as it can be configured for your exact system, unlike the Java Web Start versions. The file can be downloaded through your web browser, but since we have now installed "wget" we will do it from the command line.

Right click on the binary download link (you may have to login first) and select "copy link address"

# Open a new terminal window
cd local
wget http://www.broadinstitute.org/igv/projects/downloads/IGV_2.3.40.zip    ## This was the version at time of writing
unzip IGV_2.3.40.zip
cd IGV_2.3.40
# Update the igv.command file to change -Xmx2000m to the appropriate value for your system
# Xmx is the maximum amount of memory you will allow IGV to use, I typically recommend 50-75% of the available RAM
vim igv.command
# Find the line starting with "exec java" and change the -Xmx value, for example:
exec java -Xmx4000m
# Now add an alias to your .profile so you can open IGV by typing IGV into the terminal
vim ~/.profile
# Add the following alias
alias IGV='$HOME/local/IGV_2.3.40/igv.command'
# Now to start IGV open a terminal window and type:
IGV
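
If you are not sure what number to put after -Xmx, the sketch below turns the 50-75% rule of thumb into a flag. The function name xmx_flag is made up for this page; on OS X the installed RAM in bytes can be read with "sysctl -n hw.memsize".

```shell
# Hypothetical helper: print a JVM heap flag sized to a percentage of the
# given amount of RAM (in bytes).
xmx_flag() {
    ram_bytes=$1
    percent=$2
    ram_mb=$((ram_bytes / 1048576))          # bytes -> megabytes
    echo "-Xmx$((ram_mb * percent / 100))m"
}

# Example: an 8 GB machine at 50%
xmx_flag 8589934592 50    # prints -Xmx4096m

# On your own Mac:
# xmx_flag "$(sysctl -n hw.memsize)" 50
```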

9) Install Sequence Analysis Tools

a) Samtools - This is the base toolset for most manipulations of sequence/binary alignment maps (SAM/BAM)

# Download the current version from (www.htslib.org)

# Right click on the download and select "copy link address", then use wget in the terminal to download

# Open a new terminal window
cd local
# You might need to delete "/download" from the end of the link for proper function
wget http://sourceforge.net/projects/samtools/files/samtools/1.1/samtools-1.1.tar.bz2
# Uncompress the package
tar xvjf samtools-1.1.tar.bz2
# Build the application and install it under $HOME/local (the binaries land in $HOME/local/bin, which is in your $PATH)
cd samtools-1.1
make
make prefix=$HOME/local install

b) Bcftools - This package is part of the samtools distribution. It is required for variant calling with samtools

# Download the current version from (www.htslib.org)

# Right click on the download and select "copy link address", then use wget in the terminal to download

cd ~/local
wget http://sourceforge.net/projects/samtools/files/samtools/1.1/bcftools-1.1.tar.bz2
tar xvjf bcftools-1.1.tar.bz2
cd bcftools-1.1
make
make prefix=$HOME/local install

c) Htslib - This package is part of the samtools distribution and installing it will make tabix and bgzip available

# Download the current version from (www.htslib.org)

# Right click on the download and select "copy link address", then use wget in the terminal to download

cd ~/local
wget http://sourceforge.net/projects/samtools/files/samtools/1.1/htslib-1.1.tar.bz2
tar xvjf htslib-1.1.tar.bz2
cd htslib-1.1
make
make prefix=$HOME/local install
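
Samtools, bcftools and htslib all follow the same download, unpack, make, install pattern, so the three can be wrapped in one small function. This is only a sketch: the names run, install_tarball and DRY_RUN are made up for this page, and it assumes every archive ends in .tar.bz2. By default it just prints the commands; set DRY_RUN=0 to actually run them.

```shell
# With DRY_RUN=1 (the default here) each step is printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print or run a single command.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: repeat the steps shown above for one tarball URL.
install_tarball() {
    url=$1
    archive=${url##*/}            # strip everything up to the last "/"
    srcdir=${archive%.tar.bz2}    # unpacked folder name, e.g. htslib-1.1
    run cd "$HOME/local" &&
    run wget "$url" &&
    run tar xvjf "$archive" &&
    run cd "$srcdir" &&
    run make &&
    run make prefix="$HOME/local" install
}

for pkg in samtools bcftools htslib; do
    install_tarball "http://sourceforge.net/projects/samtools/files/samtools/1.1/$pkg-1.1.tar.bz2"
done
```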

d) BWA - This is likely the most heavily used and broadly supported next-generation sequencing aligner. To download search google for "bwa aligner" and follow the links to download the current version

# Right click on the download and select "copy link address", then use wget in the terminal to download

cd ~/local
wget http://sourceforge.net/projects/bio-bwa/files/bwa-0.7.10.tar.bz2
tar xvjf bwa-0.7.10.tar.bz2
cd bwa-0.7.10
make
cp bwa ~/local/bin

e) Picard - This set of JAVA applications provides a series of essential tools required for a multitude of sequencing analysis steps. To download, search google for "picard tools" and follow the "Latest Download" link to download the current version

# Right click on the download and select "copy link address", then use wget in the terminal to download

cd ~/local
wget https://github.com/broadinstitute/picard/releases/download/1.125/picard-tools-1.125.zip
unzip picard-tools-1.125.zip
cd picard-tools-1.125
cp picard.jar ~/local/bin/

NOTE: This example version is dramatically different from previous versions as it adopts a GATK-like command format: all the applications are embedded in a single .jar file, as opposed to the independent .jar files for each application seen in previous versions. Clearly this will only work with updated pipelines.
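
To illustrate the difference in command format, here is a small wrapper for the single-jar layout. It is a sketch only: the function name picard and the JAVA and PICARD_JAR variables are made up for this page, while SortSam and its INPUT/OUTPUT/SORT_ORDER options are Picard's own. The file names in the usage lines are placeholders.

```shell
# Both settings can be overridden; the defaults match the install steps above.
JAVA=${JAVA:-java}
PICARD_JAR=${PICARD_JAR:-$HOME/local/bin/picard.jar}

# Hypothetical wrapper: the tool name is the first argument after the jar.
picard() {
    "$JAVA" -jar "$PICARD_JAR" "$@"
}

# Old layout (one jar per tool):
#   java -jar SortSam.jar INPUT=in.bam OUTPUT=sorted.bam SORT_ORDER=coordinate
# New layout (one jar, tool name as an argument):
#   picard SortSam INPUT=in.bam OUTPUT=sorted.bam SORT_ORDER=coordinate
```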

f) GATK - This JAVA application contains a number of "best-practices" applications for post-processing alignment files. It also has a large number of additional applications that you will use frequently. To download, search google for "GATK tools" and follow the download links. Since the Broad Institute has commercialized the toolset with Appistry you will need to first register/login to accept the usage agreement. If you are using it for non-commercial research purposes you can use the free version.

#Due to the license restrictions the only download mechanism is via your web browser. By default the file lands in your "Downloads" folder.

cd ~/Downloads
# List the files sorted by time with the most recent download at the bottom, so you can easily see the actual file name
ls -lhtr
mv GenomeAnalysisTK-3.3-0.tar.bz2 ~/local/
cd ~/local
tar xvjf GenomeAnalysisTK-3.3-0.tar.bz2
# For some reason they don't create a versioned subfolder, Argh...
mkdir GenomeAnalysisTK-3.3-0
mv GenomeAnalysisTK.jar GenomeAnalysisTK-3.3-0/
mv resources/ GenomeAnalysisTK-3.3-0/
cd GenomeAnalysisTK-3.3-0
cp GenomeAnalysisTK.jar ~/local/bin/

g) snpEff/snpSift - These are two very handy tools. snpEff will add annotations for your genome of interest to any VCF file you create. snpSift can be used to manipulate VCF files for filtering, selection, and conversion to flat files (excel, text.txt, etc....). To download, google "snpeff" and follow the links to the download page (http://sourceforge.net/projects/snpeff/files/).

# Right click on the most recent download, I don't do the "latest" as I like to have the versioned download, then select "copy link address".

cd ~/local
wget http://sourceforge.net/projects/snpeff/files/snpEff_v4_0_core.zip
unzip snpEff_v4_0_core.zip
# The download is versioned but the unzipped folder is NOT, so change the folder name to include a version (MAKE SURE TO VERSION CORRECTLY)
mv snpEff snpEff-4.0
cd snpEff-4.0
cp *.jar ~/local/bin

# Now that you have them installed you will need to configure snpEff (we are assuming you are still in the snpEff folder)

# Open the config file and update the data.dir line from (data.dir = ./data/   TO   data.dir = ~/local/snpEff-4.0/data/)
vim snpEff.config
# Find the data.dir line and update it, this makes sure the data folders are connected to the version you are using.
data.dir = ~/local/snpEff-4.0/data/
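
If you prefer not to edit the file by hand, the same change can be made with sed. A sketch, assuming the config contains a single line starting with "data.dir"; the function name set_snpeff_data_dir is made up for this page. The original file is kept with a .bak extension.

```shell
# Hypothetical helper: rewrite the data.dir line of a snpEff config file in
# place, keeping the original as <file>.bak.
# The new value must not contain the characters "|" or "&".
set_snpeff_data_dir() {
    config=$1
    data_dir=$2
    sed -i.bak "s|^data\.dir *=.*|data.dir = $data_dir|" "$config"
}

# Usage, from inside the snpEff-4.0 folder (single quotes keep the ~ literal,
# as in the line shown above):
# set_snpeff_data_dir snpEff.config '~/local/snpEff-4.0/data/'
```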

# Now you are ready to download the appropriate database(s) for your work (we are assuming you are still in the snpEff folder)

# There are thousands of genome and annotation versions available, we use the GRCh37 reference with ensembl 74 annotations
# Check the snpEff website for additional instructions if needed
# WARNING - This can take a long time and there is no progress window so be patient!!
java -jar snpEff.jar download GRCh37.74

h) Circos - This tool is used to produce the fancy circular diagrams that show mutations, structural differences, and copy number differences. You can fine tune these plots to the nth degree. Unfortunately, this is one of the most difficult programs to get installed but the documentation is excellent making it well worth the effort. To download, google "circos plot" or follow the link to (circos.ca) and follow the links to download the core distribution and tools.

#Right click on the most recent core version and select "copy link address", then paste after wget in terminal to download

cd ~/local/
wget http://circos.ca/distribution/circos-0.67-3.tgz
tar xvzf circos-0.67-3.tgz
# Now for the fun part of getting everything setup...  I promise it works
cd circos-0.67-3/bin/
# Step1 - Update the "circos" file to point to the correct environment directory change (#!/bin/env perl  To  #!/usr/bin/env perl)
# make a backup copy
cat circos > OLD_circos
# Open the circos file to edit in vim
vim circos
# Change the first line to
#!/usr/bin/env perl
# Now that you have saved the updated version test to determine which required modules are not available
./circos -module
## This is the output that was produced on my machine
ok       1.26 Carp
missing            Clone
missing            Config::General
ok    3.39_02 Cwd
ok   2.135_06 Data::Dumper
ok       2.51 Digest::MD5
ok       2.84 File::Basename
ok    3.39_02 File::Spec::Functions
ok       0.22 File::Temp
ok       1.51 FindBin
missing            Font::TTF::Font
missing            GD
missing            GD::Polyline
ok       2.38 Getopt::Long
ok       1.16 IO::File
missing            List::MoreUtils
ok       1.25 List::Util
missing            Math::Bezier
ok      1.997 Math::BigFloat
missing            Math::Round
missing            Math::VecStat
ok       1.02 Memoize
ok       1.30 POSIX
missing            Params::Validate
ok       1.51 Pod::Usage
missing            Readonly
missing            Regexp::Common
missing            SVG
missing            Set::IntSpan
missing            Statistics::Basic
ok       2.34 Storable
ok       1.16 Sys::Hostname
ok       2.02 Text::Balanced
missing            Text::Format
ok       1.9725 Time::HiRes
## Now to work on installing all the missing modules using the CPAN downloader in the terminal
sudo perl -MCPAN -e shell
install GD
# You will be asked if you want to install dependencies as it goes, I answered "yes" to all of these queries
install Clone
install Config::General
install Font::TTF::Font
install List::MoreUtils
install Math::Bezier
install Math::Round
install Math::VecStat
install Params::Validate
install Readonly
install Regexp::Common
install SVG
install Set::IntSpan
install Statistics::Basic
install Text::Format
exit
## Now test if all the required modules are now available
./circos -module
# If all went well each and every module should now be marked as 'ok'
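
With this many modules it is easy to miss one, so it can help to pull just the "missing" names out of the module report. A sketch, assuming the two-column "missing  Name" layout shown above; the function name missing_modules is made up for this page.

```shell
# Hypothetical helper: read the circos module report on stdin and print only
# the names flagged "missing", one per line.
missing_modules() {
    awk '$1 == "missing" { print $2 }'
}

# Usage, from the circos bin folder:
# ./circos -module | missing_modules
# Each printed name can then be given to "install" in the CPAN shell.
```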

#Download the circos tools

cd ~/local/
wget http://circos.ca/distribution/circos-tools-0.20.tgz
tar xvzf circos-tools-0.20.tgz

#Download the tutorials and Test the Install

cd ~/local/
wget http://circos.ca/distribution/circos-tutorials-0.67.tgz
tar xvzf circos-tutorials-0.67.tgz
cd circos-tutorials-0.67/tutorials/2/2/
~/local/circos-0.67-3/bin/circos -conf circos.conf

You should see a series of messages print to the terminal screen. If you navigate to the folder using the finder window you should see a new file called "circos.png". Open the image file and check to ensure it produced a circular image of each human chromosome with each chromosome in a different color.

#To make it easier to use circos we will add an alias to our profile

# Open your profile and add an alias to the circos binary
vim ~/.profile
alias CIRCOS='$HOME/local/circos-0.67-3/bin/circos'
# Close the terminal application and reopen to test the alias function
cd ~/local/circos-tutorials-0.67/tutorials/2/2/
CIRCOS -conf circos.conf

As our pipeline developer would say... "Much Success!!"

i) Pairoscope - This one was painful to sort out... yet again. But if you follow these instructions it should work well. You will use "git" to get the download, but to see where it comes from you can visit the Wash U tools website (http://tvap.genome.wustl.edu/tools/) and follow the links to the pairoscope download.

cd ~/local/
git clone https://github.com/genome/pairoscope.git
mkdir pairoscope/build
cd pairoscope/build
cmake ../
make -j
# If make fails with an error like it did for me, following these steps can save you a day of effort
cd vendor/src/gtest160/include/gtest/internal
# Now we need to edit the "gtest-port.h" file to change a 1 to a 0 in the middle of the file (I have no idea why this works but it does)
vim gtest-port.h
# You will need to edit line number 437 from  # define GTEST_HAS_TR1_TUPLE 1   TO    # define GTEST_HAS_TR1_TUPLE 0
# To view line numbers
:set nu
# To navigate straight to line 437
:437
# Enter edit mode by typing "i" and then edit the line so it looks like:
# define GTEST_HAS_TR1_TUPLE 0
# Save the changes, you might have to force the save after you exit edit mode by hitting <esc>
:w!
:q
# Okay, now let's try it again
cd ~/local/pairoscope/build/
make -j
# Assuming that worked, now move the binary to the local/bin directory
cd bin
cp pairoscope ~/local/bin
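
The gtest-port.h edit can also be done without opening vim. A sketch, assuming the define sits on line 437 as it did on my machine (check with ":set nu" first if unsure); the function name disable_tr1_tuple is made up for this page. The original file is kept with a .bak extension.

```shell
# Hypothetical helper: change GTEST_HAS_TR1_TUPLE from 1 to 0 on one specific
# line of a header, keeping the original as <file>.bak.
disable_tr1_tuple() {
    header=$1
    line=$2
    sed -i.bak "${line}s/GTEST_HAS_TR1_TUPLE 1/GTEST_HAS_TR1_TUPLE 0/" "$header"
}

# Usage, from the pairoscope build folder:
# disable_tr1_tuple vendor/src/gtest160/include/gtest/internal/gtest-port.h 437
```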

j) FASTQC - This is an excellent program for visually checking the quality of your sequencing reads before alignment

cd ~/local/
wget http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.2.zip
unzip fastqc_v0.11.2.zip
cd FastQC
chmod +x fastqc
# Add a symbolic link to the $HOME/local/bin $PATH directory so you can call the application easily
ln -s ~/local/FastQC/fastqc ~/local/bin/fastqc

k) BedTools - This program is very useful for manipulating and counting genomic intervals (for example in BED, GFF/GTF, VCF and BAM files)

cd ~/local/
wget https://github.com/arq5x/bedtools2/releases/download/v2.22.0/bedtools-2.22.0.tar.gz
tar xvzf bedtools-2.22.0.tar.gz
cd bedtools2
make
cd bin
cp bedtools ~/local/bin

l) SeqTK - This program allows you to manipulate fasta and fastq files

cd ~/local/
git clone https://github.com/lh3/seqtk.git
cd seqtk
make
cp seqtk ~/local/bin
  1. Bowtie
  2. Tophat
  3. STAR
  4. HT-Seq
  5. FeatureCounts
  6. Sailfish
  7. FASTX
  8. SAMBLASTER (LINUX ONLY?)