-
I have two files. I'd like to join File 2 with/on File 1.

File 1:
File 2:
The problem: The entry
I'd like the join to retain all the stuff after

I'm thinking a join might not even work in this instance. I might have to loop over each string in each line of File 2, compare it to each line in File 1, and print the entire line from File 2 if there's a match. Anyone have any suggestions? Shell preferred, but I would take other solutions.
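The loop described above (check each line of File 2 against every File 1 entry, and print the File 2 line on a match) can be sketched in awk. This is a minimal sketch, assuming File 1 holds exactly one entry per line and that a plain substring match is what counts as a "match". The function name `filter_by_substring` and the sample file contents are hypothetical. It uses awk's `index()` rather than a regex match, so characters such as `.` in an entry are taken literally.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): print every line of the second file
# that contains at least one line of the first file as a literal substring.
filter_by_substring() {
  awk '
    # First file: remember each non-empty line as an array key.
    NR == FNR { if ($0 != "") keys[$0]; next }
    # Second file: compare the line with every stored entry.
    # index() is a literal substring test, so "." or "|" are not special.
    {
      for (k in keys) {
        if (index($0, k) > 0) { print; break }  # print the whole line once
      }
    }
  ' "$1" "$2"
}

# Usage on made-up sample files.
tmp=$(mktemp -d)
printf 'P12345\nQ9XYZ1\n' > "$tmp/file1.txt"
printf 'sp|P12345|ABC_HUMAN extra text\nsp|O00000|DEF_HUMAN\nQ9XYZ1;Q9XYZ2 more\n' > "$tmp/file2.txt"
filter_by_substring "$tmp/file1.txt" "$tmp/file2.txt"
# prints:
# sp|P12345|ABC_HUMAN extra text
# Q9XYZ1;Q9XYZ2 more
rm -r "$tmp"
```

Every File 2 line is compared with every stored entry, so the cost grows with the product of the two file sizes. One caveat of the `NR == FNR` idiom: if File 1 is empty, awk treats File 2 as the first file and prints nothing.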
-
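How about fuzzyjoin? Example from my nb-2022 repo:

```r
library(dplyr)      # %>%
library(stringr)    # str_detect()
library(fuzzyjoin)  # fuzzy_inner_join()

# Keep pairs of rows where ncbiP$V2, used as a regex, is found in spur$V2
betterspur <- spur %>%
  fuzzy_inner_join(ncbiP, by = "V2", match_fun = str_detect)
```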
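For comparison, roughly the same join can be approximated in awk. This is a sketch, assuming tab-separated files whose join column is column 2 (the `V2` above): a pair of rows is kept when the y file's column 2, used as a regular expression, matches somewhere in the x file's column 2, which is what `str_detect` does in the R call. awk uses POSIX extended regular expressions rather than stringr's ICU engine, so unusual patterns may behave differently. The function name `fuzzy_inner_join_col2` and the sample data are hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): nested-loop "fuzzy inner join" on column 2,
# loosely mirroring fuzzy_inner_join(x, y, by = "V2", match_fun = str_detect).
# $1 = y file (its column 2 values are used as regex patterns, held in memory)
# $2 = x file (streamed; its column 2 values are searched)
# Assumes tab-separated input; awk uses POSIX EREs, not stringr/ICU regex.
fuzzy_inner_join_col2() {
  awk -v FS='\t' -v OFS='\t' '
    # First file (y): store every row and its column 2 pattern.
    NR == FNR { pat[FNR] = $2; row[FNR] = $0; n = FNR; next }
    # Second file (x): emit "x row <TAB> y row" for every matching pattern.
    {
      for (i = 1; i <= n; i++) {
        if ($2 ~ pat[i]) print $0, row[i]
      }
    }
  ' "$1" "$2"
}

# Usage on made-up sample files.
tmp=$(mktemp -d)
printf 'a\tP12345-2\nb\tQ99999\n' > "$tmp/x.tsv"
printf 'n1\tP12345\nn2\tO11111\n' > "$tmp/y.tsv"
fuzzy_inner_join_col2 "$tmp/y.tsv" "$tmp/x.tsv"
# prints one tab-separated row: a  P12345-2  n1  P12345
rm -r "$tmp"
```

Like any nested-loop join, this compares every x row with every y pattern; an empty pattern matches everything.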
-
Potential solution:

```shell
awk ' BEGIN { FS = "^AC" } \
FNR == NR \
{ array[$0]; next } \
{ for ( item in array ) { if ( match ( $0,item ) ) { print $2 } } } ' \
File1.txt File2.txt \
| sed 's/[[:space:]]*//g'
```

Output:
Explanation:
-
The input file examples above are slightly different from what the solution below used, but they are extremely similar (the biggest difference is the number of columns in each file). Here's the awk solution I ended up using:

```shell
awk \
  -v FS='[;[:space:]]+' \
  'NR==FNR \
  {array[$1]=$0; next} \
  ($1 in array) \
  {print $2"\t"array[$1]}' \
  File02.txt File01.txt \
  > "${joined_output}"
```
Credit goes to the help I got on StackOverflow.
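To try the awk command above on small inputs, here is the same program wrapped in a shell function, with the two file names passed as separate arguments. The function name `hash_join` and the sample rows are hypothetical. As in the answer, the first file is held in memory keyed by its first field, and the second file is streamed against it.

```shell
#!/bin/sh
# The awk program from the answer above, wrapped in a hypothetical function.
# $1 = file held in memory (keyed by its first field)
# $2 = file streamed against it
hash_join() {
  awk -v FS='[;[:space:]]+' '
    # First file: key = first field, value = the whole line.
    NR == FNR { array[$1] = $0; next }
    # Second file: if its first field was seen, print its second field,
    # a tab, then the stored line from the first file.
    ($1 in array) { print $2 "\t" array[$1] }
  ' "$1" "$2"
}

# Made-up rows; both ";" and whitespace act as field separators.
tmp=$(mktemp -d)
printf 'P12345;\tkinase A\nQ67890;\tsynthase B\n' > "$tmp/File02.txt"
printf 'P12345 geneX\nO11111 geneY\n' > "$tmp/File01.txt"
hash_join "$tmp/File02.txt" "$tmp/File01.txt"
# prints: geneX<TAB>P12345;<TAB>kinase A
rm -r "$tmp"
```

Because `FS` is a regex rather than the default single space, a line that starts with whitespace or `;` gets an empty `$1` and will not match.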
And, here's the code explanation:
- `awk -v FS='[;[:space:]]+'`: Sets the field separator variable to handle the `;` in UniProt accessions (the separator is any run of semicolons and/or whitespace). Allows for proper searching.
- `FNR == NR`: Restricts the next block (designated by `{}`) to work only on the first input file.
- `{array[$1]=$0; next}`: Adds the entire line (`$0`) of the first file to t…