Problem

In DNA strings, symbols ‘A’ and ‘T’ are complements of each other, as are ‘C’ and ‘G’.

The reverse complement of a DNA string s is the string s^c formed by reversing the symbols of s, then taking the complement of each symbol (e.g., the reverse complement of “GTCA” is “TGAC”).

Given: A DNA string s of length at most 1000 bp.

Return: The reverse complement s^c of s.

Sample Dataset

AAAACCCGGT

Sample Output

ACCGGGTTTT
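
As a quick check of the definition, the following Python sketch applies the two steps in order (reverse, then complement) to both examples above. The names reverse_complement and COMPLEMENT are illustrative and not part of the problem statement, and the sketch assumes the input contains only the symbols A, C, G and T.

```python
# Complement of each DNA symbol, as defined in the problem statement.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(s):
    """Return the reverse complement of s (hypothetical helper name).

    Assumes s contains only A, C, G and T; any other symbol raises KeyError.
    """
    reversed_s = s[::-1]  # step 1: reverse the symbols
    return "".join(COMPLEMENT[c] for c in reversed_s)  # step 2: complement each


print(reverse_complement("GTCA"))        # example from the definition: TGAC
print(reverse_complement("AAAACCCGGT"))  # sample dataset: ACCGGGTTTT
```

Applying the function twice returns the original string, since both reversing and complementing undo themselves.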

Solution

This problem asks for the reverse complement of a DNA string.

C version

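For the C version I used a linked list. While reading the DNA sequence, each base is complemented and pushed onto the front of the list, so the list already holds the reverse complement; after that, it is just a matter of traversing the list and printing it.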

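#include <stdio.h>
#include <stdlib.h>

typedef struct ntNode {
  char NT; /* nucleotide */
  struct ntNode *next;
} ntNode;

int main(void) {
  FILE *INFILE = fopen("DATA/rosalind_revc.txt", "r");
  if (INFILE == NULL) {
    perror("DATA/rosalind_revc.txt");
    return 1;
  }

  ntNode *head = NULL, *curr;
  int nt; /* int rather than char, so EOF is distinguishable from data */
  while ((nt = fgetc(INFILE)) != EOF) {
    /* complement the base; skip anything else, such as the trailing newline */
    switch (nt) {
      case 'A':
        nt = 'T';
        break;
      case 'C':
        nt = 'G';
        break;
      case 'G':
        nt = 'C';
        break;
      case 'T':
        nt = 'A';
        break;
      default:
        continue;
    }

    /* push onto the front of the list, which reverses the order */
    curr = malloc(sizeof(ntNode));
    if (curr == NULL) {
      perror("malloc");
      return 1;
    }
    curr->NT = (char)nt;
    curr->next = head;
    head = curr;
  }
  fclose(INFILE);

  /* walk the list, printing and freeing each node */
  while (head) {
    curr = head;
    printf("%c", curr->NT);
    head = curr->next;
    free(curr);
  }
  printf("\n");
  return 0;
}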

Python version

Python makes this much easier: seq[::-1] reverses the sequence, and a dictionary maps each base to its complement.

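#!/usr/bin/env python3

# read the sequence, dropping the trailing newline
with open("DATA/rosalind_revc.txt", "r") as fh:
    seq = fh.read().strip()

# complement of each base (named "complement" to avoid shadowing the built-in dict)
complement = {'A': 'T',
              'T': 'A',
              'C': 'G',
              'G': 'C'}

res = ''.join(complement[c] for c in seq[::-1])
print(res)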
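
As an alternative to the dictionary lookup above, the complement step can also be done with Python's built-in str.maketrans and str.translate. This is only a sketch of a different technique, not the approach used in the solution; the names COMPLEMENT_TABLE and revc are made up for illustration.

```python
# Translation table mapping each base to its complement.
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def revc(seq):
    """Reverse complement via str.translate (illustrative helper name)."""
    # Complement every symbol first, then reverse; the order of the two
    # steps does not matter because complementing works symbol by symbol.
    return seq.translate(COMPLEMENT_TABLE)[::-1]


print(revc("AAAACCCGGT"))  # sample dataset
```

One difference worth noting: str.translate leaves characters that are not in the table unchanged, whereas the dictionary lookup raises a KeyError on them, so this variant will silently pass through unexpected symbols.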