
Unix Basics


Characters with Special Meaning -

*
wildcard matching any number of characters (including none)
?
wildcard matching exactly one character
.
a single period represents the current directory
>
redirects the output of a command into the file named after the >
|
pipe, sends the output of one command into the next command (i.e. chains two steps together)
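A quick sketch tying these characters together (the file names are made up for illustration):

```shell
# Work in a throwaway directory so nothing real is touched
cd "$(mktemp -d)"
touch sample1.txt sample2.txt sample10.txt notes.md

ls sample*.txt       # * matches any run of characters: all three .txt files
ls sample?.txt       # ? matches exactly one character: sample1.txt sample2.txt

ls . > listing.txt   # . is the current directory; > sends the output to a file
cat listing.txt | wc -l   # | pipes one command's output into the next
```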

Shortcuts -

<tab>
Type the first few unique characters and tab will complete the file or folder name (case sensitive)
<control><A>
Jump cursor to beginning of the current command
<control><E>
Jump cursor to end of current command

Basic Command List -

ls
lists all folders and files in current directory
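A sketch of the standard ls flags worth knowing from day one (the file names are made up):

```shell
cd "$(mktemp -d)"
touch visible.txt .hidden.txt

ls           # names only; "dot" files are skipped
ls -a        # -a also shows hidden dot files (plus . and ..)
ls -l        # -l long listing: permissions, owner, size, date
ls -lh       # -h prints sizes in human-readable units (K, M, G)
```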
pwd
prints the full path of the current working directory to the screen
    #Get the current folder without the full path
    echo "${PWD##*/}"

cd
allows you to move between directories
    Examples:    
            cd /misc
            goes to the folder /misc; a leading / makes this an absolute path from the root directory, so it works from anywhere (deeper paths work too, e.g. cd /usr/local/bin)
            cd ..
            takes you up one directory level
            cd ../..
            takes you up two directory levels
            cd ../misc
            takes you up one directory and then down into the misc folder
            cd or cd ~
            takes you to your home directory
man
brings up manual for each command
    Examples:
            man ls
            manual for list command
            man cd
            manual for cd command
            man man
            manual for manual command
  ** <space> scrolls down a page, <b> goes back a page, <q> quits **
mkdir
make a new sub-directory in current directory
    Examples:
            mkdir -p Misc1/Misc2
            Creates two new directories, the second inside the first (-p makes any missing parent directories)
rmdir
removes empty sub-directories
history
lists all commands typed in current session
mv
moves files or folders
    Usage: mv <file> <NewDirectory>
    Examples:
            mv file1 file2
            renames file1 to file2 (file1 no longer exists; if a file2 already existed it is overwritten)
            mv file1 Misc2
            moves file1 to subdirectory Misc2 that exists in the current directory
rm
remove *USE WITH CAUTION*
    Usage: rm <file> (deletes file, IMMEDIATELY)
    Best Usage: rm -i <file> (yes/no, query before deletion)
    Remove all sub-folders: rm -rf <folder> (deletes indicated folder and any sub-folders or files it contains)
cp
copy function
    Usage: cp <source> <target>
    Examples:
            cp file1 file2
            creates duplicate of file1 called file2 in current directory
            cp file1 ../
            places a copy of the file one directory up the file tree
wc
word count (outputs the number of lines, words, and bytes in the file)
    Usage: wc <file>
    Examples:
            wc -l file.txt
            outputs the number of lines in the file
cut
cuts out columns from a tab-delimited file
    Usage: cut -f<columns> <file>
    Examples:
            cut -f3 File.txt
            cuts out the third column of the file
sort
sorts lines alphanumerically
uniq
collapses adjacent duplicate lines (typically used after sort)
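The two are almost always used together; a minimal sketch (names.txt is made up), including the standard -c flag which counts how many times each line appeared:

```shell
cd "$(mktemp -d)"
printf 'Jon\nEsteban\nJon\nRodger\n' > names.txt

sort names.txt              # alphanumeric sort: Esteban, Jon, Jon, Rodger
sort names.txt | uniq       # duplicates collapsed after sorting
sort names.txt | uniq -c    # -c prefixes each line with its count
```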
cat
Spits the contents of a text file on the screen.  Can be used to open a file to pass to another operation or to concatenate files together
    # Watch out for the Useless Use of Cat Award (UUCA) (i.e. cat file.txt | cut -f3 versus cut -f3 file.txt)
    Examples:    cat file1.txt file2.txt file3.txt > merged_file.txt
                        # Creates a new file consisting of all file1.txt rows followed by those in file2.txt and then those in file3.txt
less
To view a longer text file, use the less program, which allows you to page forward through the file using the <SPACE> bar and page backward using the b key. When you are done looking at the file, type q to quit less.
head
To view the top of a file (default first 10 rows/lines)
    Examples:    head -n 50 file.txt
                        #shows the first 50 rows/lines of the file
tail
To view the end of a file (default last 10 rows/lines)
    Examples:    tail -n 75 file.txt
                        #shows the last 75 rows/lines of the file
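head and tail chain together nicely to pull a slice out of the middle of a file; a sketch using a made-up numbers.txt:

```shell
cd "$(mktemp -d)"
seq 1 100 > numbers.txt              # test file: the numbers 1-100, one per line

head -n 20 numbers.txt | tail -n 5   # lines 16-20: the last 5 of the first 20
```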
grep
search function
    Examples:    grep "Query" file.txt
                        #outputs rows/lines that contain the string Query (row/line specific not column specific)
                        grep -w "Query" file.txt
                        #outputs rows/lines containing the query as a whole word (i.e. finds "Query" but not "labQueryresult")
sed
replace function
    Examples:
            sed 's/Old/New/g' File.txt
            substitutes/replaces Old with New in File.txt

bc
calculator function ( add (+) ; subtract (-) ; multiply (*) ; divide (/) ; power (^) ; square-root [ sqrt(value) ] )
    Examples:
            > echo 10/5 | bc
            2
            > echo 5/10 | bc
            0 [THIS IS WRONG, YOU NEED TO TELL THE CALCULATOR THE DECIMALS TO RETURN]
            > echo " scale=3 ; 5/10" | bc
            .500

paste
joins files/columns together
file1.txt
Jonathan
Kristi

file2.txt
Keats
Allen

paste file1.txt file2.txt
Jonathan    Keats
Kristi    Allen
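The example above as a runnable sketch, plus the standard -d flag to change the delimiter from the default tab:

```shell
cd "$(mktemp -d)"
printf 'Jonathan\nKristi\n' > file1.txt
printf 'Keats\nAllen\n'     > file2.txt

paste file1.txt file2.txt       # corresponding lines joined with a tab
paste -d, file1.txt file2.txt   # -d changes the delimiter (comma here)
```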

Useful Unix One Liners I've Come Across:

# Convert a tab-delimited (.txt) file saved in Excel on a Mac to a Unix-formatted tab-delimited file so it can be manipulated with Unix apps
tr '\r' '\n' < MacExcelFormat.txt > UnixFormat.txt
# Or alternatively...
cat MacExcelFormat.txt | tr '\r' '\n' > UnixFormat.txt

#Better solution
dos2unix -c mac MacExcelFormat.txt

# Replace some feature in a file with a tab-delimiter (There are a number of possible solutions but this one seems to work on both Mac and Unix machines)
sed 's/featureToReplace/\'$'\t''/g'
awk '{gsub("featureToReplace","\t",$0); print;}'

# Convert a single-column file into a single line with space delimiting
cat File_with_Single_Column_List | xargs > Single_Line_File_Space_Deliminated

# Replace ENSG with Gene BUT ONLY on the first line
sed '1 s/ENSG/Gene/' input > output

# Edit header lines of a fasta file (not completely sure how this works, but I read it as: replace the longest run of characters, other than "=" or ">", that ends in an "E" with just "E")
Example.txt
>123 chr1:1234-5678 ENST0123456789
AGCTAGCT

sed 's/[^=>]*E/E/' Example.txt


>ENST0123456789
AGCTAGCT

# Extract every 4th line starting on line 2 of a file (ie. extract the read sequence from a fastq file)
sed -n '2~4p' data.fastq
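Note the first~step address is a GNU sed extension and fails on BSD/macOS sed; the same extraction can be written in portable awk (NR is the current line number), sketched here with a tiny made-up fastq:

```shell
cd "$(mktemp -d)"
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGG\n+\nIIII\n' > data.fastq

# Print every line whose number is 2 modulo 4 (the sequence lines)
awk 'NR % 4 == 2' data.fastq
```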

# Remove the 7th column from a file with 20 columns (Use 8- instead of 8-20, in case you don't know the last column number, as it grabs all after 8)
cut -f1-6,8- infile.txt > outfile.txt

# Copy all files with a specific extension from all sub-folders to a single folder
find . -name "*.pdf" -exec cp {} /target/directory/ \;

# Move all files with a specific extension from all sub-folders to a single folder
find . -name "*.fastq" -exec mv {} /target/directory/ \;

# Find all files with a specific extension that are NOT symbolic links (-type f, forces to regular files only)
find . -name "*S4U_*bwa.final.bam" -type f

# Copy all files with specific extension and associated folder structure to new location maintaining the folder structure
rsync -av --include '*.pdf' --include '*/' --exclude '*' current/directory destination/directory


#Copy Illumina BCL conversion stats to another directory, keep only the fastq.gz from the demultiplexed samples excluding all other files and the undetermined_indices fastq.gz files
rsync -vaiz --exclude '*/Undetermined_indices/*' --include '*.fastq.gz' --include '*/' --exclude '*' --prune-empty-dirs /Source/Directory /Target/Directory

# Find lines of a file that contain a line from a second file (ie.  Find trainees of Bergsagel that trained at Mayo Clinic Arizona)
Bergsagel_Trainees.txt
Keats
Trudel

Mayo_Clinic_Arizona_Trainees.txt
Chng
Schop
Keats
Tiedemann
Sebag
Braggio
Henry

This command works but can be very slow as the size of the two files increases
grep -f Bergsagel_Trainees.txt Mayo_Clinic_Arizona_Trainees.txt > results.txt
This command is much faster (30 seconds versus 3 hours+)
fgrep -f Bergsagel_Trainees.txt Mayo_Clinic_Arizona_Trainees.txt > results.txt
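A runnable sketch of the same idea with tiny files (note fgrep is the older spelling of grep -F, which treats the patterns as fixed strings rather than regular expressions, hence the speed):

```shell
cd "$(mktemp -d)"
printf 'Keats\nTrudel\n' > Bergsagel_Trainees.txt
printf 'Chng\nKeats\nHenry\n' > Mayo_Clinic_Arizona_Trainees.txt

# -f reads the patterns from a file; -F matches them as fixed strings
grep -F -f Bergsagel_Trainees.txt Mayo_Clinic_Arizona_Trainees.txt
```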

results.txt
Keats

# Find the unique entries in a column of a file
cut -f3 MyFile.txt | sort | uniq > Myfile2.txt
This reads MyFile.txt, cuts out the third tab-delimited column, passes it to the sort command, then to the uniq command to collapse duplicates, and writes the results to Myfile2.txt
NOTE: You have to sort first because uniq only collapses adjacent duplicate lines (i.e. Jon, Jon, Esteban, Jon, Rodger, Rodger, Esteban) outputs (Jon, Esteban, Jon, Rodger, Esteban)
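The whole pipeline as a runnable sketch on a tiny made-up tab-delimited file:

```shell
cd "$(mktemp -d)"
printf 'a\tb\tJon\nc\td\tEsteban\ne\tf\tJon\n' > MyFile.txt

# Column 3, sorted so duplicates become adjacent, then collapsed by uniq
cut -f3 MyFile.txt | sort | uniq > Myfile2.txt
cat Myfile2.txt
```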

# Change all letters in a file from upper-case to lower-case
cat example.fa | tr 'A-Z' 'a-z' > example2.txt
This reads example.fa, sends it to the tr "translate" command which changes upper case to lower case, then sends it to an output file called example2.txt

Alphabetize Allele Calls

sed 's/CA/AC/g' MCA0554_chr3_filter0_minalign1800_maq20.txt | sed 's/GA/AG/g' | sed 's/GC/CG/g' | sed 's/TA/AT/g' | sed 's/TC/CT/g' | sed 's/TG/GT/g' > MCA0554_ordered.txt

# AWK is Awesome!! (see the awk page, and learn it; I would have done my post-doc in half the time... doh...)
awk '{print $1, $2, $3, $3+60, "DUMMY", $4, $5, $6, $7, $8}' file.txt > output.txt
This reads in file.txt and prints out a new file called output.txt containing columns 1-8 from the original file, with two new columns inserted after column 3: the value of column 3 plus 60, and the literal text DUMMY



