Unix Basics
Characters with Special Meaning -
*
wildcard matching any number of characters
?
wildcard matching exactly 1 character
.
a single period can be used to represent the current directory
>
this redirects the output of a command to a file defined after the >
|
pipe, this pipes a previous command to a new command (i.e. put two steps together)
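A quick sketch showing several of these characters together; the filenames and scratch directory here are hypothetical examples, not anything from a real project:

```shell
# Scratch directory so the wildcards only see these test files
mkdir -p /tmp/unix_basics_demo && cd /tmp/unix_basics_demo
touch log1.txt log2.txt logA.txt notes.md

ls log*.txt               # * : matches log1.txt log2.txt logA.txt
ls log?.txt               # ? : each name differs by exactly one character
ls log*.txt > files.txt   # > : redirect the listing into files.txt
ls | grep notes           # | : pipe the listing into grep, prints notes.md
```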
Short Cuts -
<tab>
Type the first unique characters and tab will complete the file or folder name (case sensitive)
<control><A>
Jump cursor to beginning of the current command
<control><E>
Jump cursor to end of current command
Basic Command List -
ls
lists all folders and files in current directory
pwd
prints the current working directory (full path) on screen
#Get the current folder without the full path
echo "${PWD##*/}"
cd
allows you to move between directories
Examples:
cd /misc
goes to the misc folder directly below the root directory (a path starting with / is an absolute path, so this works from anywhere)
cd ..
takes you up one directory level
cd ../..
takes you up two directory levels
cd ../misc
takes you up one directory and then down into the misc folder
cd or cd ~
takes you to your home directory
man
brings up manual for each command
Examples:
man ls
manual for list command
man cd
manual for cd command
man man
manual for manual command
** <space> scrolls down, <b> go back page, <q> quits **
mkdir
make a new sub-directory in current directory
Examples:
mkdir -p Misc1/Misc2
Creates two new directories, second inside the first.
rmdir
removes empty sub-directories
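A minimal sketch of mkdir -p and rmdir together (the Misc1/Misc2 names are just the hypothetical example from above):

```shell
cd /tmp
mkdir -p Misc1/Misc2   # -p creates parent and child in one step
rmdir Misc1/Misc2      # rmdir only removes empty directories,
rmdir Misc1            # so delete the deepest one first
```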
history
lists the commands typed in the current session (plus any saved history)
mv
moves files or folders
Usage: mv <file> <NewDirectory>
Examples:
mv file1 file2
renames file1 to file2 (file1 no longer exists; any existing file2 is overwritten)
mv file1 Misc2
moves file1 to subdirectory Misc2 that exists in the current directory
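Both mv behaviors in one runnable sketch (directory and file names are hypothetical):

```shell
mkdir -p /tmp/mv_demo/Misc2 && cd /tmp/mv_demo
echo "hello" > file1
mv file1 file2   # rename: file1 is gone afterwards
mv file2 Misc2   # move the renamed file into the subdirectory
```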
rm
remove *USE WITH CAUTION*
Usage: rm <file> (deletes file, IMMEDIATELY)
Best Usage: rm -i <file> (yes/no, query before deletion)
Remove all sub-folders: rm -rf <folder> (deletes indicated folder and any sub-folders or files it contains)
cp
copy function
Usage: cp <source> <target>
Examples:
cp file1 file2
creates duplicate of file1 called file2 in current directory
cp file1 ../
places a copy of the file one directory up the file tree
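The same two cp examples, runnable end to end (paths are hypothetical):

```shell
mkdir -p /tmp/cp_demo/sub && cd /tmp/cp_demo/sub
echo "data" > file1
cp file1 file2   # duplicate in the current directory
cp file1 ../     # copy one level up the tree
```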
wc
word count (outputs the number of lines, words, and bytes in the file)
Usage: wc <file>
Examples:
wc -l file.txt
outputs the number of lines in the file
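A small sketch of the line and word counts on a made-up two-line file:

```shell
printf 'one two\nthree\n' > /tmp/wc_demo.txt
wc -l /tmp/wc_demo.txt   # 2 lines
wc -w /tmp/wc_demo.txt   # 3 words
```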
cut
cuts out columns from a tab-delimited file
Usage: cut -f<columns> <file>
Examples:
cut -f3 File.txt
cuts out the third column of the file
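Sketched on a made-up two-row tab-delimited file; -f also accepts lists and ranges:

```shell
printf 'a\tb\tc\nd\te\tf\n' > /tmp/cut_demo.txt
cut -f3 /tmp/cut_demo.txt     # third tab-delimited column: c then f
cut -f1,3 /tmp/cut_demo.txt   # columns 1 and 3 together
```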
sort
sorts lines alphanumerically
uniq
collapses adjacent duplicate lines (sort the input first to get the truly unique lines)
cat
Prints the contents of a text file to the screen. Can be used to open a file to pass to another operation or to concatenate files together
# Watch out for the Useless Use of Cat Award (UUCA) (ie. cat file.txt | cut -f3 versus cut -f3 file.txt)
Examples: cat file1.txt file2.txt file3.txt > merged_file.txt
# Creates a new file consisting of all file1.txt rows followed by those in file2.txt and then those in file3.txt
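The same merge, runnable with three made-up one-row files:

```shell
cd /tmp
printf 'row1\n' > file1.txt
printf 'row2\n' > file2.txt
printf 'row3\n' > file3.txt
cat file1.txt file2.txt file3.txt > merged_file.txt   # rows appear in file order
```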
less
To view a longer text file, use the less program, which allows you to page forward through the file using the <SPACE> bar and page backward using the b key. When you are done looking at the file, type q to quit less.
head
To view the top of a file (default first 10 rows/lines)
Examples: head -n 50 file.txt
#shows the first 50 rows/lines of the file
tail
To view the end of a file (default last 10 rows/lines)
Examples: tail -n 75 file.txt
#shows the last 75 rows/lines of the file
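head and tail side by side on a generated 100-line file:

```shell
seq 1 100 > /tmp/nums.txt
head -n 3 /tmp/nums.txt   # first 3 lines: 1 2 3
tail -n 2 /tmp/nums.txt   # last 2 lines: 99 100
```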
grep
search function
Examples: grep "Query" file.txt
#outputs rows/lines that contain the string Query (row/line specific not column specific)
grep -w "Query" file.txt
#outputs rows/lines containing the query as a whole word (ie finds query but not labqueryresult)
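The query/labqueryresult distinction from above, made runnable on a two-line test file:

```shell
printf 'query\nlabqueryresult\n' > /tmp/grep_demo.txt
grep "query" /tmp/grep_demo.txt      # matches both lines
grep -w "query" /tmp/grep_demo.txt   # whole-word match: only the first line
```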
sed
replace function
Examples:
sed 's/Old/New/g' File.txt
substitutes/replaces Old with New in File.txt
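A sketch showing what the g flag does, on a made-up line containing Old twice:

```shell
printf 'Old wine in Old bottles\n' > /tmp/sed_demo.txt
sed 's/Old/New/g' /tmp/sed_demo.txt   # g replaces every occurrence on each line
sed 's/Old/New/' /tmp/sed_demo.txt    # without g only the first per line changes
```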
bc
calculator function ( add (+) ; subtract (-) ; multiply (*) ; divide (/) ; power (^) ; square-root [ sqrt(value) ] )
Examples:
> echo 10/5 | bc
2
> echo 5/10 | bc
0 [THIS IS WRONG, YOU NEED TO TELL THE CALCULATOR THE DECIMALS TO RETURN]
> echo " scale=3 ; 5/10" | bc
.500
paste
joins files/columns together
file1.txt
Jonathan
Kristi
file2.txt
Keats
Allen
paste file1.txt file2.txt
Jonathan Keats
Kristi Allen
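The same example runnable end to end; by default paste joins with a tab, and -d swaps in another delimiter:

```shell
cd /tmp
printf 'Jonathan\nKristi\n' > first_names.txt
printf 'Keats\nAllen\n' > last_names.txt
paste first_names.txt last_names.txt         # joined line-by-line with a tab
paste -d' ' first_names.txt last_names.txt   # -d swaps the tab for a space
```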
Useful Unix One Liners I've Come Across:
# Convert a tab-delimited (.txt) file saved in Excel on a Mac to a Unix-formatted tab-delimited file so it can be manipulated with Unix apps
tr '\r' '\n' < MacExcelFormat.txt > UnixFormat.txt
# Or alternatively...
cat MacExcelFormat.txt | tr '\r' '\n' > UnixFormat.txt
# Better solution
dos2unix -c mac MacExcelFormat.txt
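The tr version sketched on a tiny fabricated file (dos2unix is not always installed, tr always is):

```shell
printf 'a\rb\rc\r' > /tmp/mac_format.txt   # \r line endings, as old Mac Excel writes
tr '\r' '\n' < /tmp/mac_format.txt > /tmp/unix_format.txt
```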
# Replace some feature in a file with a tab-delimiter (There are a number of possible solutions but this one seems to work on both Mac and Unix machines)
sed 's/featureToReplace/\'$'\t''/g'
awk '{gsub("featureToReplace","\t",$0); print;}'
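The awk form sketched on a hypothetical |-delimited line; the | has to be escaped here because gsub's first argument is a regular expression:

```shell
printf 'a|b|c\n' > /tmp/pipe_demo.txt
awk '{gsub("\\|","\t",$0); print;}' /tmp/pipe_demo.txt   # a<tab>b<tab>c
```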
# Convert a single column file into a single line with space delimiters
cat File_with_Single_Column_List | xargs > Single_Line_File_Space_Deliminated
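Runnable on a made-up three-entry column (xargs with no command runs echo, which is what flattens the lines):

```shell
printf 'alpha\nbeta\ngamma\n' > /tmp/single_col.txt
xargs < /tmp/single_col.txt > /tmp/single_line.txt
cat /tmp/single_line.txt   # alpha beta gamma
```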
# Replace ENSG with Gene but ONLY on the first line
sed '1 s/ENSG/Gene/' input > output
# Edit header lines of a fasta file (Not completely sure how this works but I read it as "Replace all characters on a line after a ">" until an "E" is encountered")
Example.txt
>123 chr1:1234-5678 ENST0123456789
AGCTAGCT
sed 's/[^=>]*E/E/' Example.txt
>ENST0123456789
AGCTAGCT
# Extract every 4th line starting on line 2 of a file (ie. extract the read sequence from a fastq file; the first~step address syntax requires GNU sed)
sed -n '2~4p' data.fastq
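A portable sketch of the same extraction using awk (the ~ address form is GNU-sed-only, so the BSD sed on a Mac lacks it), shown on a made-up two-read fastq:

```shell
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGG\n+\nJJJJ\n' > /tmp/demo.fastq
awk 'NR % 4 == 2' /tmp/demo.fastq   # keep lines 2, 6, 10, ... (the sequences)
```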
# Remove the 7th column from a file with 20 columns (Use 8- instead of 8-20, in case you don't know the last column number, as it grabs all after 8)
cut -f1-6,8- infile.txt > outfile.txt
# Copy all files with a specific extension from all sub-folders to a single folder
find . -name "*.pdf" -exec cp {} /target/directory/ \;
# Move all files with a specific extension from all sub-folders to a single folder
find . -name "*.fastq" -exec mv {} /target/directory/ \;
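The find/-exec pattern sketched on a fabricated tree (keeping the target directory outside the search root avoids find seeing the files it just copied):

```shell
mkdir -p /tmp/find_src/a /tmp/find_src/b /tmp/find_dst
echo one > /tmp/find_src/a/one.pdf
echo two > /tmp/find_src/b/two.pdf
cd /tmp/find_src
find . -name "*.pdf" -exec cp {} /tmp/find_dst/ \;   # one cp per file found
```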
# Find all files with a specific extension that are NOT symbolic links (-type f restricts to regular files only)
find . -name "*S4U_*bwa.final.bam" -type f
# Copy all files with specific extension and associated folder structure to new location maintaining the folder structure
rsync -av --include '*.pdf' --include '*/' --exclude '*' current/directory destination/directory
#Copy Illumina BCL conversion stats to another directory, keep only the fastq.gz from the demultiplexed samples excluding all other files and the undetermined_indices fastq.gz files
rsync -vaiz --exclude '*/Undetermined_indices/*' --include '*.fastq.gz' --include '*/' --exclude '*' --prune-empty-dirs /Source/Directory /Target/Directory
# Find lines of a file that contain a line from a second file (ie. Find trainees of Bergsagel that trained at Mayo Clinic Arizona)
Bergsagel_Trainees.txt
Keats
Trudel
Mayo_Clinic_Arizona_Trainees.txt
Chng
Schop
Keats
Tiedemann
Sebag
Braggio
Henry
This command works but can be very slow as the size of the two files increases
grep -f Bergsagel_Trainees.txt Mayo_Clinic_Arizona_Trainees.txt > results.txt
This command is much faster (30 seconds versus 3 hours+)
fgrep -f Bergsagel_Trainees.txt Mayo_Clinic_Arizona_Trainees.txt > results.txt
results.txt
Keats
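A runnable miniature of the trainee example above (grep -F is the modern spelling of fgrep: patterns are treated as fixed strings, not regexes, which is where the speed comes from):

```shell
cd /tmp
printf 'Keats\nTrudel\n' > patterns.txt
printf 'Chng\nSchop\nKeats\nTiedemann\n' > roster.txt
grep -F -f patterns.txt roster.txt > results.txt
cat results.txt   # Keats
```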
# Find the unique entries in a column of a file
cut -f3 MyFile.txt | sort | uniq > Myfile2.txt
This reads MyFile.txt, cuts out the third tab-delimited column, passes it to the sort command, which is passed to the uniq command to find unique entries, and prints the results to Myfile2.txt
NOTE: You have to sort first because uniq only collapses adjacent duplicates (ie. Jon, Jon, Esteban, Jon, Rodger, Rodger, Esteban outputs Jon, Esteban, Jon, Rodger, Esteban)
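The NOTE made runnable, using the Jon/Esteban/Rodger sequence from above:

```shell
printf 'Jon\nJon\nEsteban\nJon\nRodger\nRodger\nEsteban\n' > /tmp/names.txt
uniq /tmp/names.txt             # 5 lines: only adjacent duplicates collapse
sort /tmp/names.txt | uniq      # 3 lines: the truly unique entries
sort /tmp/names.txt | uniq -c   # -c prefixes each entry with its count
```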
# Change all letters in a file from upper-case to lower-case
cat example.fa | tr 'A-Z' 'a-z' > example2.txt
This reads file example.fa, sends it to the tr "transliterate" command, which changes upper case to lower case, then sends the result to an output file called example2.txt
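The same conversion on a made-up sequence (input redirection avoids the extra cat):

```shell
printf 'ACGTacgt\n' > /tmp/example.fa
tr 'A-Z' 'a-z' < /tmp/example.fa > /tmp/example2.txt
cat /tmp/example2.txt   # acgtacgt
```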
Alphabetize Allele Calls
sed 's/CA/AC/g' MCA0554_chr3_filter0_minalign1800_maq20.txt | sed 's/GA/AG/g' | sed 's/GC/CG/g' | sed 's/TA/AT/g' | sed 's/TC/CT/g' | sed 's/TG/GT/g' > MCA0554_orderd.txt
# AWK is Awesome!! (see the awk page, but learn it, I would have done my post-doc in half the time.... doh...)
awk '{print $1, $2, $3, $3+60, "DUMMY", $4, $5, $6, $7, $8}' file.txt > output.txt
This reads in file.txt and prints out a new file called output.txt that contains columns 1-8 from the original file, inserting a new column 4 (column 3 value + 60) and a new column 5 (DUMMY).
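The same awk print on a fabricated single-row, 8-column file:

```shell
printf '1\t2\t100\tA\tB\tC\tD\tE\n' > /tmp/awk_demo.txt
awk '{print $1, $2, $3, $3+60, "DUMMY", $4, $5, $6, $7, $8}' /tmp/awk_demo.txt
# -> 1 2 100 160 DUMMY A B C D E
```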