advanced awk/gawk
The scripts below rely on awk's built-in variables (such as NR, FNR and OFS). They are written with gawk in mind, but these variables are also defined by POSIX awk, so the examples should run in other awk implementations too.
Pair two files based on a key column in each file
BED file of gene locations (chromosome, start, stop, gene name)
File1.bed
chr1 11869 14409 DDX11L10
chr1 14363 29570 WASH5P
chr1 34554 36081 FAM138F
chr1 34554 36081 FAM138B
chr1 34554 36081 FAM138A
chr1 69055 70108 OR4F5
List of genes whose chromosome, start and stop positions you want to look up
File2.txt
WASH5P patient_1234 A T
FAM138B patient_2345 A C
OR4F5 Patient_3456 G A
Awk code
$ awk 'FNR==NR { a[$4]=$0;next } ($1 in a) { OFS = "\t" ; print a[$1],$2,$3,$4 }' File1.bed File2.txt
Output
chr1 14363 29570 WASH5P patient_1234 A T
chr1 34554 36081 FAM138B patient_2345 A C
chr1 69055 70108 OR4F5 Patient_3456 G A
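To reproduce the output above, the sketch below recreates both sample files in a temporary directory and runs the same awk command on them. It assumes the columns are tab-separated (the original files may use spaces; awk's default field splitting handles either), and the temporary directory is only for illustration.

```shell
# Sketch: recreate the two sample files in a temporary directory
# (tab-separated here, an assumption) and run the same awk command.
tmp=$(mktemp -d)

# File1.bed: chromosome, start, stop, gene name
printf '%s\t%s\t%s\t%s\n' \
  chr1 11869 14409 DDX11L10 \
  chr1 14363 29570 WASH5P \
  chr1 34554 36081 FAM138F \
  chr1 34554 36081 FAM138B \
  chr1 34554 36081 FAM138A \
  chr1 69055 70108 OR4F5 > "$tmp/File1.bed"

# File2.txt: gene name followed by three more columns
printf '%s\t%s\t%s\t%s\n' \
  WASH5P patient_1234 A T \
  FAM138B patient_2345 A C \
  OR4F5 Patient_3456 G A > "$tmp/File2.txt"

# Same command as above; File1.bed must come first.
result=$(awk 'FNR==NR { a[$4]=$0;next } ($1 in a) { OFS = "\t" ; print a[$1],$2,$3,$4 }' \
  "$tmp/File1.bed" "$tmp/File2.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The order of the file arguments matters: swapping them would build the array from File2.txt instead.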
Explanation:
FNR==NR
FNR is the record (row/line) number within the file awk is currently reading; it restarts at 1 for each new input file. NR is the total number of records awk has read so far across all input files. The two are equal only while awk is reading the first file, so the pattern FNR==NR (a comparison, not an assignment) is true only for lines of the first input file. The action in {} immediately following it therefore runs only on the first file, and the rules after it handle the second file.
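A quick way to see this is to print FILENAME, NR and FNR for every line of two small throwaway files. The file names first.txt and second.txt are hypothetical. One known caveat of the idiom: if the first file is empty, FNR==NR is also true for the second file, because nothing has incremented NR beforehand.

```shell
# Sketch: trace FILENAME, NR and FNR across two hypothetical files.
tmp=$(mktemp -d)
printf 'a\nb\n'    > "$tmp/first.txt"
printf 'c\nd\ne\n' > "$tmp/second.txt"

# cd inside the command substitution so FILENAME shows just the base name.
trace=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR, (FNR==NR ? "first-file" : "later-file") }' first.txt second.txt)

printf '%s\n' "$trace"
rm -rf "$tmp"
```

Expected trace: `first.txt 1 1 first-file`, `first.txt 2 2 first-file`, then `second.txt 3 1 later-file`, `second.txt 4 2 later-file`, `second.txt 5 3 later-file`. FNR drops back to 1 at the start of second.txt while NR keeps counting.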
{ a[$4]=$0;next }
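This part builds an associative array (awk's equivalent of a hash table) called "a". Each entry uses the 4th field ($4) of File1.bed, the gene name, as the key and stores the entire record ($0) as the value. If the same key appears more than once, the later line overwrites the earlier one. next then tells awk to stop processing the current record and move on to the next one, so none of the later rules are applied to lines of File1.bed. awk repeats this for every record until it reaches the end of File1.bed.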
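The sketch below checks two things on the sample data (recreated tab-separated, as an assumption): what the array holds for one gene, and how many records reach the second rule with and without `next`. The counting rule `{ seen++ }` is a hypothetical stand-in for the real second rule.

```shell
# Sketch: the sample files, recreated in a temporary directory.
tmp=$(mktemp -d)
printf '%s\t%s\t%s\t%s\n' chr1 11869 14409 DDX11L10 chr1 14363 29570 WASH5P \
  chr1 34554 36081 FAM138F chr1 34554 36081 FAM138B chr1 34554 36081 FAM138A \
  chr1 69055 70108 OR4F5 > "$tmp/File1.bed"
printf '%s\t%s\t%s\t%s\n' WASH5P patient_1234 A T FAM138B patient_2345 A C \
  OR4F5 Patient_3456 G A > "$tmp/File2.txt"

# Build the array from File1.bed alone, then look one gene up in END.
stored=$(awk 'FNR==NR { a[$4]=$0; next } END { print a["WASH5P"] }' "$tmp/File1.bed")

# Count the records that reach the second rule, with and without next.
with_next=$(awk 'FNR==NR { a[$4]=$0; next } { seen++ } END { print seen+0 }' \
  "$tmp/File1.bed" "$tmp/File2.txt")
without_next=$(awk 'FNR==NR { a[$4]=$0 } { seen++ } END { print seen+0 }' \
  "$tmp/File1.bed" "$tmp/File2.txt")

printf 'stored:       %s\n' "$stored"
printf 'with next:    %s\n' "$with_next"     # only the 3 lines of File2.txt
printf 'without next: %s\n' "$without_next"  # 6 + 3 = 9 lines
rm -rf "$tmp"
```

Without `next`, the second rule would also run on every line of File1.bed, which is why the idiom needs it.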
($1 in a)
This is a test (pattern): it checks whether the first field of the current record of File2.txt is a key in the array "a". If it is, awk performs the action in {} immediately after it; otherwise the line is skipped.
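Using `($1 in a)` rather than referencing `a[$1]` directly matters: in awk, merely referencing a missing element creates it with an empty value, while `in` only tests membership. The sketch below shows the difference with two tiny hypothetical files, keys.txt and queries.txt, counting matches and the number of keys left in the array.

```shell
# Sketch: "in" versus a direct reference, on hypothetical files.
tmp=$(mktemp -d)
printf '%s\n' 'k1 v1' 'k2 v2' > "$tmp/keys.txt"     # lookup table
printf '%s\n' 'k1' 'zz'       > "$tmp/queries.txt"  # "zz" has no match

# Membership test with "in": the array keeps its 2 keys.
with_in=$(awk 'FNR==NR { a[$1]=$2; next }
  ($1 in a) { hits++ }
  END { for (k in a) n++; print hits+0, n+0 }' "$tmp/keys.txt" "$tmp/queries.txt")

# Direct reference: looking up "zz" silently adds it as an empty element.
with_ref=$(awk 'FNR==NR { a[$1]=$2; next }
  (a[$1] != "") { hits++ }
  END { for (k in a) n++; print hits+0, n+0 }' "$tmp/keys.txt" "$tmp/queries.txt")

printf 'in:        %s\n' "$with_in"   # matches, keys in array
printf 'reference: %s\n' "$with_ref"
rm -rf "$tmp"
```

Both versions find one match, but the direct reference leaves 3 keys in the array instead of 2.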
{ OFS = "\t" ; print a[$1],$2,$3,$4 }
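The first statement tells awk to separate output fields with a tab (OFS is the output field separator). The print statement then prints the value stored in "a" under the key taken from the first field of File2.txt (the whole matching line of File1.bed), followed by fields 2, 3 and 4 of File2.txt. OFS is only inserted between the arguments of print, so the stored line from File1.bed keeps whatever delimiters it originally had.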
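Setting OFS inside the action works, but it is reassigned on every matching line. A sketch of two equivalent ways to set it once, using a hypothetical space-separated a.bed so the effect on the stored line is visible: `-v OFS='\t'` on the command line (awk processes the `\t` escape in `-v` assignments) or a `BEGIN` block.

```shell
# Sketch: three ways to set OFS; the input files are hypothetical.
tmp=$(mktemp -d)
printf 'chr1 100 200 GENE1\n' > "$tmp/a.bed"   # space-separated on purpose
printf 'GENE1 p1 A T\n'       > "$tmp/b.txt"

prog='FNR==NR { a[$4]=$0; next } ($1 in a) { print a[$1], $2, $3, $4 }'

# OFS set inside the action, as in the original command.
inline=$(awk 'FNR==NR { a[$4]=$0; next } ($1 in a) { OFS = "\t"; print a[$1], $2, $3, $4 }' \
  "$tmp/a.bed" "$tmp/b.txt")

# OFS set once on the command line.
opt=$(awk -v OFS='\t' "$prog" "$tmp/a.bed" "$tmp/b.txt")

# OFS set once in a BEGIN block.
begin=$(awk "BEGIN { OFS = \"\\t\" } $prog" "$tmp/a.bed" "$tmp/b.txt")

printf '%s\n' "$inline" "$opt" "$begin"
rm -rf "$tmp"
```

All three print the stored line unchanged (still space-separated) followed by a tab and the tab-separated fields from b.txt.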