You could use awk to quickly build such a BED6 file: $ awk -vFS='' '($0~/^>/)' in.fa > out.bed6 This script looks at the sixth column of your BED file of chromosome, start, and stop positions, to decide whether to do a reverse complement of the queried sequence. If you're okay with using a Perl script, you could convert your headers into stranded BED6 and use the following script with samtools-indexed FASTA to get back stranded sequence: This will produce the following file when run on your data: $ cat subseqs.faįinally, remove the per-contig fasta files rm Contig*fa Now, feed that into fastafetch and fastarevcomp tmpFile= $(mktemp) įastasubseq "$cont".fa $start $len > $tmpFile The fastasubseq program I will use below starts counting from 0 and not from 1, so we also need to subtract 1 from the coordinates to get them right. So, that will print the target contig's name, the start coordinate of the target sequence, the length of the target sequence and a - if it is on the reverse strand. If you're using Debian or Ubuntu or one of their derivatives, it might be enough to run sudo apt install exonerateĬreate one fasta file for each sequence in genome.fasta fastaexplode genome.fasta First, install exonerate as described here. Here, we are interested in fastasubseq and fasterevcomp. The exonerate tool comes with some very handy utilities for sequence manipulation. So, please help me to automate this process. I have larger fasta file and plenty of coordinates sequences to be extracted, So I need to automate this process. I need to extract those sequences from Contig1:12-3 and Contig3:15-7 coordinates and also I want to reverse complement them. But Contig3:15-7 and Contig1:12-3 coordinates are in inverse/reverse order, So samtools could not fetch those sequences. Because, the coordinates of Contig2:5-10is in proper order. It has extracted only Contig2:5-10sequence. I have used the following codes to extract the sequences from genome.fasta based on the coordinates in id.txt by using the following command xargs samtools faidx genome.fasta result.fasta The genome.fasta file is shown below, >Contig1 The coordinates of id.txt file is shown below, Contig3:15-7 See the Tutorial on how to create reverse complement sequence.I have the series of coordinates in id.txt file, whose coordinates sequences are in genome.fasta file. Any entered U are automatically converted into T. Uridine (U) is the replacement for Thymidine (T) in RNA. Reverse complement: CTATGGTCATCCATCTCTACGTGTCAGCCTATACGT Reverse sequence: GATACCAGTAGGTAGAGATGCACAGTCGGATATGCAĬomplement sequence: TGCATATCCGACTGTGCATCTCTACCTACTGGTATC Original sequence: ACGTATAGGCTGACACGTAGAGATGGATGACCATAG The reverse complement sequence is the sequence of the lower strand in the direction of its 5′- to its 3′-end. The reverse sequence is the sequence of the upper strand in the direction from its 3′- to its 5′-end. The complementary sequence is thus the sequence of the lower (antisense) strand in the same direction as the upper strand. This counterpart is called its complementary nucleotide.ĭouble stranded DNA sequences are represented by the upper (sense) strand sequence going in the direction from its 5′- to 3′-end. See also how to create a reverse complement sequence.Įach nucleotide in a double stranded DNA molecule is paired with its Watson-Crick counterpart. Go to the main site of GeneWarrior What is a Reverse Complement sequence? This page is part of the GeneWarrior Documentation.
0 Comments
Leave a Reply. |