I am aware of many one-liners in sed, perl and awk, but none are working. I have something like this:
awk '/>/ID|1|training">1'
Kindly help.
asked Sep 1, 2022 at 9:20
Where is the |1| coming from? Is that an incrementing value that will change for each line? Is it the PE=1? Or maybe the SV=1?
Commented Sep 4, 2022 at 10:19
awk 'BEGIN { FS=OFS="|" } /^>/ { print ">" $2, "1", "training"; next } 1' file.fa
This sets the field separator (FS) and the output field separator (OFS) to a pipe character. Then for each header line, we can just print the fields we want. The output fields will be delimited with pipe characters since we already set OFS to a pipe character in the BEGIN block. We can use next here to force awk to immediately stop processing the current record and move on to the next record, which is just the next line, since, by default, awk's record separator is a newline character.
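A small worked run may make this concrete. The sketch below assumes UniProt-style headers; the header text, the sequence, and the file names sample.fa and renamed.fa are made up for illustration, and only the accession A0A075B6G3 comes from the question.

```shell
# Build a tiny test file. The header and sequence are assumed examples,
# not data from the original question.
printf '%s\n' \
  '>tr|A0A075B6G3|A0A075B6G3_HUMAN Example protein OS=Homo sapiens PE=1 SV=1' \
  'MKTAYIAKQR' > sample.fa

# Header lines (starting with ">") are rebuilt from field 2 and printed;
# "next" skips the trailing "1", so only non-header lines reach it.
awk 'BEGIN { FS=OFS="|" } /^>/ { print ">" $2, "1", "training"; next } 1' sample.fa > renamed.fa

cat renamed.fa
# >A0A075B6G3|1|training
# MKTAYIAKQR
```

Writing to a separate output file keeps the original intact while you check the result.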
Since the 1 at the end always evaluates to true, awk performs the default action for it, which is to print the current record stored in $0. It is just the same as:
awk 'BEGIN { FS=OFS="|" } /^>/ { print ">" $2, "1", "training"; next } { print $0 }'
Which is the same as:
awk 'BEGIN { FS=OFS="|" } /^>/ { print ">" $2, "1", "training"; next } { print }'
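If you want to convince yourself that the three spellings behave identically, a quick comparison along these lines should do it. This is only a sketch: the input file equiv.fa and its contents are invented for the check.

```shell
# Assumed two-line test input (hypothetical header and sequence).
printf '%s\n' \
  '>tr|A0A075B6G3|A0A075B6G3_HUMAN Example protein PE=1 SV=1' \
  'MKTAYIAKQR' > equiv.fa

# Shared part of the program; single quotes keep $2 away from the shell.
prog='BEGIN { FS=OFS="|" } /^>/ { print ">" $2, "1", "training"; next }'

# Three ways of spelling "print every remaining record".
awk "$prog 1"             equiv.fa > out_1.txt
awk "$prog { print \$0 }" equiv.fa > out_print0.txt
awk "$prog { print }"     equiv.fa > out_print.txt

# cmp is silent and exits 0 when two files are byte-for-byte identical.
cmp out_1.txt out_print0.txt && cmp out_1.txt out_print.txt && echo "identical"
```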
answered Sep 1, 2022 at 11:19
Thank you @Steve.
Commented Sep 1, 2022 at 12:41
Nice! You could shorten it to something like awk -F'|' '/>/
perl -p -i -e 's/>[a-z]+?\|([A-Z]+[^\|]+?\|).*SV=([0-9]+)/>$1$2\|training/g' myfilename.fa
Not tested yet. Just check that the sequence ID starts with a capital A-Z character. It is probably wise to test it without the -i first.
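One way to do that dry run is to drop -i, send the result to a separate file, and diff it against the input. The sketch below assumes a UniProt-style header; the file names myfilename_copy.fa and preview.fa and the header text are hypothetical.

```shell
# Assumed test input; only the accession comes from the question.
printf '%s\n' \
  '>tr|A0A075B6G3|A0A075B6G3_HUMAN Example protein PE=1 SV=1' \
  'MKTAYIAKQR' > myfilename_copy.fa

# Same substitution as above, but without -i: the input file is left
# untouched and the rewritten text goes to preview.fa instead.
perl -p -e 's/>[a-z]+?\|([A-Z]+[^\|]+?\|).*SV=([0-9]+)/>$1$2\|training/g' \
  myfilename_copy.fa > preview.fa

# diff exits non-zero when the files differ, which is expected here.
diff myfilename_copy.fa preview.fa || true
```

Only the header line should show up in the diff; once it looks right, the -i version can be run on the real file.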
answered Sep 1, 2022 at 12:39
Many thanks @M_
Commented Sep 1, 2022 at 12:42
Thanks @user1738234, I'll test it later. I assume that the 1 refers to the SV number. Is that correct?
Commented Sep 1, 2022 at 12:46
Here's another simple perl solution:
$ perl -lpe 's/\|.*(\d+)/|$1|training/ if s/>.+?\|/>/' file.fa
>A0A075B6G3|1|training
Note that you didn't tell us where the 1 came from, so I am taking the last number on the line, the SV=1 in your case. If you want to hard-code the 1, you can do so easily with:
perl -lpe 's/\|.*/|1|training/ if s/>.+?\|/>/' file.fa
The -lpe options mean "print each line, adding a newline character to each print call, after applying the script given by -e". So this will read the file line by line, do its thing and print the result.
The s/old/new/ is the substitution operator and will replace old with new. In this case, we are replacing a | character (\|) and everything after it (.*) up to the trailing digit on the line, which we capture and save as $1 with (\d+). Because .* is greedy, (\d+) is left with only the final digit, which is enough for a single-digit value such as SV=1. This substitution will only happen if we have already successfully substituted the leading > and everything until the first | (>.+?\|) with just a >.
If you want to edit the original file in place, use the -i flag, but give it a value (.bak in the example below, attached directly to -i with no space), since that ensures a backup file with the original data and that extension will be created:
perl -i.bak -lpe 's/\|.*(\d+)/|$1|training/ if s/>.+?\|/>/' file.fa