Unix Basics

Characters with Special Meaning -

*

wildcard matching any number of characters

?

wildcard matching exactly one character

.

a single period can be used to represent the current directory

>

this redirects the output of a command to a file defined after the >

|

pipe, this sends the output of one command into the next command (i.e. puts two steps together); see the examples below
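
A few examples putting these special characters together (the file names here are just placeholders):

    # List every file ending in .txt (* matches any number of characters)
    ls *.txt
    # List file1.txt, file2.txt, etc. but not file10.txt (? matches exactly one character)
    ls file?.txt
    # Save a listing of the current directory (.) to a file
    ls . > listing.txt
    # Pipe the listing into wc -l to count how many entries it contains
    ls . | wc -l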

Short Cuts -

<tab>

Type the first unique characters of a file or folder name and tab will complete it (case sensitive)

<control><A>

Jump cursor to beginning of the current command

<control><E>

Jump cursor to end of current command

Basic Command List -

ls

lists all folders and files in current directory

pwd

prints the current directory (full path) to the screen

    # Get the current folder without the full path
    echo "${PWD##*/}"

cd

allows you to move between directories

Examples:

cd /misc

goes to the misc folder that sits one level below the root directory (the leading / means the path starts at the root, so this works from anywhere)

cd ..

takes you up one directory level

cd ../..

takes you up two directory levels

cd ../misc

takes you up one directory and then down into the misc folder

cd or cd ~

takes you to your home directory

man

brings up manual for each command

Examples:

man ls

manual for list command

man cd

manual for cd command

man man

manual for manual command

** <space> scrolls down, <b> goes back a page, <q> quits **

mkdir

makes a new sub-directory in the current directory

Examples:

mkdir -p Misc1/Misc2

Creates two new directories, the second inside the first (-p creates any missing parent directories).

rmdir

removes empty sub-directories

history

lists all commands typed in current session
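
A couple of handy uses (assuming bash; the search term is just an example):

    # Show the last 20 commands typed
    history | tail -n 20
    # Search the history for a previous rsync command
    history | grep rsync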

mv

moves files or folders

Usage: mv <source> <target>

Examples:

mv file1 file2

renames file1 to file2, overwriting any existing file2 (file1 no longer exists)

mv file1 Misc2

moves file1 to subdirectory Misc2 that exists in the current directory

rm

remove *USE WITH CAUTION*

Usage: rm <file> (deletes file, IMMEDIATELY)

Best Usage: rm -i <file> (yes/no, query before deletion)

Remove all sub-folders: rm -rf <folder> (deletes indicated folder and any sub-folders or files it contains)

cp

copy function

Usage: cp <source> <target>

Examples:

cp file1 file2

creates duplicate of file1 called file2 in current directory

cp file1 ../

places a copy of the file one directory up the file tree

wc

word count (outputs the number of lines, words, and bytes in the file)

Usage: wc <file>

Examples:

wc -l file.txt

outputs the number of lines in the file

cut

cuts out columns from a tab-delimited file

Usage: cut -f<columns> <file>

Examples:

cut -f3 File.txt

cuts out the third column of the file

sort

sorts lines alphanumerically
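
A few commonly used flags (the file names are placeholders; the $'\t' tab separator assumes bash):

    # Sort a file and save the result
    sort file.txt > sorted.txt
    # Sort numerically instead of alphabetically
    sort -n numbers.txt
    # Sort a tab-delimited file by its second column, numerically
    sort -t$'\t' -k2,2n file.txt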

uniq

identifies the unique lines (it only collapses adjacent duplicates, so sort first)
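
Because uniq only compares neighbouring lines it is almost always run after sort, for example (file name is a placeholder):

    # Count how many times each line occurs, most frequent first
    sort file.txt | uniq -c | sort -rn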

cat

Prints the contents of a text file to the screen. Can be used to open a file and pass it to another operation, or to concatenate files together

# Watch out for the Useless Use of Cat Award (UUCA) (i.e. cat file.txt | cut -f3 versus cut -f3 file.txt)

Examples: cat file1.txt file2.txt file3.txt > merged_file.txt

# Creates a new file consisting of all file1.txt rows followed by those in file2.txt and then those in file3.txt

less

To view a longer text file, use the less program, which allows you to page forward through the file using the <SPACE> bar and page backward using the b key. When you are done looking at the file, type q to quit less.

head

To view the top of a file (default first 10 rows/lines)

Examples: head -n 50 file.txt

#shows the first 50 rows/lines of the file

tail

To view the end of a file (default last 10 rows/lines)

Examples: tail -n 75 file.txt

#shows the last 75 rows/lines of the file
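
head and tail can also be piped together to pull out a specific slice of a file (the line numbers here are arbitrary):

    # Show lines 41-50 of the file (first 50 lines, then the last 10 of those)
    head -n 50 file.txt | tail -n 10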

grep

search function

Examples: grep "Query" file.txt

#outputs rows/lines that contain the string Query (row/line specific not column specific)

grep -w "Query" file.txt

#outputs rows/lines containing the query as a whole word (i.e. finds Query but not labQueryResult)
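
A few other grep flags that come up constantly (the file name and query are placeholders):

    # Case-insensitive search
    grep -i "query" file.txt
    # Count the matching lines instead of printing them
    grep -c "Query" file.txt
    # Print the lines that do NOT contain the query
    grep -v "Query" file.txt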

sed

replace function

Examples:

sed 's/Old/New/g' File.txt

substitutes/replaces Old with New in File.txt and prints the result to the screen
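
Since sed prints to the screen rather than changing the file, redirect to a new file to keep the edit (file names are placeholders):

    # Replace Old with New and write the result to a new file
    sed 's/Old/New/g' File.txt > File_fixed.txt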

bc

calculator function ( add (+) ; subtract (-) ; multiply (*) ; divide (/) ; power (^) ; square-root [ sqrt(value) ] )

Examples:

> echo 10/5 | bc

2

> echo 5/10 | bc

0 [WRONG: bc does integer division by default, you need to tell the calculator how many decimals to return]

> echo " scale=3 ; 5/10" | bc

.500

paste

joins files/columns together

file1.txt

Jonathan
Kristi

file2.txt

Keats
Allen

paste file1.txt file2.txt

Jonathan    Keats
Kristi    Allen
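
By default paste joins the files with a tab; -d changes the delimiter (a comma here, to make a simple CSV):

    # Join the two columns with a comma instead of a tab
    paste -d',' file1.txt file2.txt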

Useful Unix One Liners I've Come Across:

# Convert a tab-delimited (.txt) file saved from Excel on a Mac to a Unix-formatted tab-delimited file so it can be manipulated with Unix tools

tr '\r' '\n' < MacExcelFormat.txt > UnixFormat.txt

# Or alternatively...

cat MacExcelFormat.txt | tr '\r' '\n' > UnixFormat.txt

# Better solution

dos2unix -c mac MacExcelFormat.txt

# Replace some feature in a file with a tab-delimiter (There are a number of possible solutions but this one seems to work on both Mac and Unix machines)

sed 's/featureToReplace/\'$'\t''/g'
awk '{gsub("featureToReplace","\t",$0); print;}'

# Convert a single-column file into a single space-delimited line

cat File_with_Single_Column_List | xargs > Single_Line_File_Space_Deliminated

# Replace ENSG with Gene BUT ONLY on the first line

sed '1 s/ENSG/Gene/' input > output

# Edit the header lines of a fasta file (not completely sure how this works, but I read it as "replace all characters on a line after a '>' until an 'E' is encountered")

Example.txt

>123 chr1:1234-5678 ENST0123456789
AGCTAGCT

sed 's/[^=>]*E/E/' Example.txt

>ENST0123456789
AGCTAGCT

# Extract every 4th line starting on line 2 of a file (i.e. extract the read sequence from a fastq file)

sed -n '2~4p' data.fastq
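
# The first~step address (2~4) is a GNU sed extension; if it is not available (e.g. stock Mac sed) an awk equivalent should work

awk 'NR % 4 == 2' data.fastq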

# Remove the 7th column from a file with 20 columns (use 8- instead of 8-20 if you don't know the last column number, as it grabs column 8 and everything after it)

cut -f1-6,8- infile.txt > outfile.txt

# Copy all files with a specific extension from all sub-folders to a single folder

find . -name "*.pdf" -exec cp {} /target/directory/ \;

# Move all files with a specific extension from all sub-folders to a single folder

find . -name "*.fastq" -exec mv {} /target/directory/ \;

# Find all files with a specific extension that are NOT symbolic links (-type f restricts the results to regular files only)

find . -name "*S4U_*bwa.final.bam" -type f

# Copy all files with a specific extension to a new location, maintaining the folder structure

rsync -av --include '*.pdf' --include '*/' --exclude '*' current/directory destination/directory

# Copy Illumina BCL conversion stats to another directory; keep only the fastq.gz files from the demultiplexed samples, excluding all other files and the Undetermined_indices fastq.gz files

rsync -vaiz --exclude '*/Undetermined_indices/*' --include '*.fastq.gz' --include '*/' --exclude '*' --prune-empty-dirs /Source/Directory /Target/Directory

# Find lines of a file that contain a line from a second file (i.e. find trainees of Bergsagel who trained at Mayo Clinic Arizona)

Bergsagel_Trainees.txt

Keats
Trudel

Mayo_Clinic_Arizona_Trainees.txt

Chng
Schop
Keats
Tiedemann
Sebag
Braggio
Henry

This command works but can be very slow as the size of the two files increases

grep -f Bergsagel_Trainees.txt Mayo_Clinic_Arizona_Trainees.txt > results.txt

This command is much faster (30 seconds versus 3 hours+)

fgrep -f Bergsagel_Trainees.txt Mayo_Clinic_Arizona_Trainees.txt > results.txt

results.txt

Keats
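
fgrep is equivalent to grep -F (treat the patterns as fixed strings rather than regular expressions); if partial matches are a worry, something like the following (with -x to require whole-line matches) should behave the same but more strictly

grep -F -x -f Bergsagel_Trainees.txt Mayo_Clinic_Arizona_Trainees.txt > results.txt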

# Find the unique entries in a column of a file

cut -f3 MyFile.txt | sort | uniq > Myfile2.txt

This reads MyFile.txt, cuts out the third tab-delimited column, passes it to the sort command, then to the uniq command to find the unique entries, and prints the results to Myfile2.txt

NOTE: You have to sort first because uniq only collapses adjacent duplicate lines (i.e. Jon, Jon, Esteban, Jon, Rodger, Rodger, Esteban outputs Jon, Esteban, Jon, Rodger, Esteban)
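
sort -u does the sorting and duplicate removal in one step, so this should give the same result:

cut -f3 MyFile.txt | sort -u > Myfile2.txt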

# Change all letters in a file from upper-case to lower-case

cat example.fa | tr 'A-Z' 'a-z' > example2.txt

This reads example.fa, sends it to the tr (translate) command which changes upper case to lower case, then writes the output to a file called example2.txt

# Alphabetize allele calls

sed 's/CA/AC/g' MCA0554_chr3_filter0_minalign1800_maq20.txt | sed 's/GA/AG/g' | sed 's/GC/CG/g' | sed 's/TA/AT/g' | sed 's/TC/CT/g' | sed 's/TG/GT/g' > MCA0554_orderd.txt

# AWK is Awesome!! (see the awk page; learn it, I would have done my post-doc in half the time.... doh...)

awk '{print $1, $2, $3, $3+60, "DUMMY", $4, $5, $6, $7, $8}' file.txt > output.txt

This reads in file.txt and prints out a new file called output.txt that contains columns 1-8 from the original file, with two new columns inserted after column 3: the column 3 value + 60 and the text DUMMY