Install from GitHub

This module can be cloned from the GitHub repo to your local machine.

$ git clone https://github.com/rutendos/base_content.git

Todo: Add to conda and PyPI

Requirements

base_content is implemented in Python 3 (version 3.6.3).

This tool also relies on bedtools (https://bedtools.readthedocs.io/en/latest/), which is not a Python package and must be installed separately.

Python modules:

argparse (https://docs.python.org/3/howto/argparse.html) is part of the Python 3 standard library, so it does not need to be installed with pip.

pandas (https://pandas.pydata.org/)

$ pip install pandas

numpy (http://www.numpy.org/)

$ pip install numpy

matplotlib (https://matplotlib.org/)

$ pip install matplotlib
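To confirm that the requirements above are in place before a run, a quick check such as the following may help. This is a minimal sketch, not part of base_content: the helper name `missing_requirements` is hypothetical, and it only checks that each Python module can be found and that a `bedtools` executable is on your PATH (it does not check versions).

```python
import importlib.util
import shutil

# Python modules listed in the requirements (argparse ships with Python 3).
REQUIRED_MODULES = ["argparse", "pandas", "numpy", "matplotlib"]
# External command-line tools the module relies on.
REQUIRED_TOOLS = ["bedtools"]


def missing_requirements(modules=REQUIRED_MODULES, tools=REQUIRED_TOOLS):
    """Return names of modules that cannot be found and tools not on PATH.

    Hypothetical helper for illustration only; versions are not checked.
    """
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            missing.append(tool)
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All requirements found.")
```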

Input Files

The Python module takes a BED file from TFEA (ranked_file.center.sorted.bed) and plots the base content at each position across the regions.
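As a rough illustration of what "base content across positions" means, the sketch below computes the fraction of A, C, G and T at each position over a set of equal-length sequences. It is not the module's own code: it assumes the windows have already been extracted from the reference genome (for example with a tool such as bedtools getfasta), the function name `base_content_by_position` is hypothetical, and N or other ambiguity codes are simply left out of the per-position totals.

```python
from collections import Counter

BASES = "ACGT"


def base_content_by_position(sequences):
    """Return a dict mapping each base to a list of per-position fractions.

    All sequences must have the same length. Sequences are upper-cased first,
    so soft-masked (lowercase) bases count normally. At each position the
    fractions are computed over the A/C/G/T bases observed there, so N and
    other ambiguity codes are ignored.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")

    upper = [seq.upper() for seq in sequences]
    content = {base: [] for base in BASES}
    for pos in range(length):
        # Count the bases found at this position across all sequences.
        counts = Counter(seq[pos] for seq in upper)
        total = sum(counts[base] for base in BASES)
        for base in BASES:
            # Guard against positions where every sequence has an N.
            content[base].append(counts[base] / total if total else 0.0)
    return content


if __name__ == "__main__":
    windows = ["ACGT", "AGGT", "ACNT"]
    result = base_content_by_position(windows)
    print(result["G"])  # [0.0, 0.3333333333333333, 1.0, 0.0]
```

Plotting each list in the returned dict against position gives one line per base across the window.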

The module also accepts Tfit BED files in the format shown below.

In addition to a BED file of coordinates, base_content requires a FASTA file of the reference genome. The FASTA file must be indexed: an indexed FASTA file has a companion index file with a .fai extension (for example, hg38.fa.fai). The chromosome names in the FASTA file must also follow the same convention as the BED file, either:

UCSC format (chr1:start-stop)
Ensembl/NCBI format (1:start-stop)
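Both requirements can be checked before a run. The sketch below is a hypothetical helper, not part of base_content: it assumes the index sits next to the FASTA file as `<fasta>.fai`, reads the sequence names from the first column of the index, and reports any chromosome in column 1 of the BED file that the index does not list. A mismatch such as chr1 in the BED file but 1 in the FASTA usually means the two files use different naming conventions.

```python
import os


def read_fai_names(fai_path):
    """Return the set of sequence names listed in a FASTA index (.fai)."""
    names = set()
    with open(fai_path) as handle:
        for line in handle:
            if line.strip():
                # The first tab-separated field of a .fai line is the sequence name.
                names.add(line.split("\t", 1)[0])
    return names


def read_bed_chroms(bed_path):
    """Return the set of chromosome names in column 1 of a BED file."""
    chroms = set()
    with open(bed_path) as handle:
        for line in handle:
            # Skip blank, comment, and track/browser header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chroms.add(line.split()[0])
    return chroms


def check_inputs(fasta_path, bed_path):
    """Return a list of problems found (an empty list means none were found)."""
    fai_path = fasta_path + ".fai"
    if not os.path.exists(fai_path):
        return ["FASTA index not found: " + fai_path]
    missing = sorted(read_bed_chroms(bed_path) - read_fai_names(fai_path))
    if missing:
        return ["chromosomes in BED but not in FASTA index: " + ", ".join(missing)]
    return []
```

For example, `check_inputs("/path/to/reference/hg38.fa", "./my_bedfile.bed")` returns an empty list when the index exists and every BED chromosome appears in it.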

Algorithm Overview

Running base_content

Running in the command line

To run base_content on a TFEA BED file from the command line, include the -t flag:

python base_content -r /path/to/reference/hg38.fa -b ./my_bedfile.bed -o /output/dir/ -w 1500 -s experiment_name -t

To run base_content on a Tfit or other BED file, omit the -t flag:

python base_content -r /path/to/reference/hg38.fa -b ./my_bedfile.bed -o /output/dir/ -w 1500 -s experiment_name
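The flags used in the two commands above are not described here. The sketch below is a hypothetical argparse parser that accepts the same flags, to make one reading of them explicit: the destination names, the help text, and treating every value flag as required are assumptions inferred from the examples (for instance, -w appears to set the window width in bp and -t marks TFEA input), not the module's actual option definitions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags in the example commands."""
    parser = argparse.ArgumentParser(prog="base_content")
    # All value flags are treated as required here; this is an assumption.
    parser.add_argument("-r", dest="reference", required=True,
                        help="indexed reference genome FASTA (e.g. hg38.fa)")
    parser.add_argument("-b", dest="bed", required=True,
                        help="BED file of regions")
    parser.add_argument("-o", dest="outdir", required=True,
                        help="output directory")
    parser.add_argument("-w", dest="width", type=int, required=True,
                        help="window width in bp (assumed meaning)")
    parser.add_argument("-s", dest="sample", required=True,
                        help="experiment name used to label outputs")
    # A boolean switch: present for TFEA BED files, absent for Tfit/other files.
    parser.add_argument("-t", dest="tfea", action="store_true",
                        help="input is a TFEA BED file")
    return parser


if __name__ == "__main__":
    example = ["-r", "/path/to/reference/hg38.fa", "-b", "./my_bedfile.bed",
               "-o", "/output/dir/", "-w", "1500", "-s", "experiment_name", "-t"]
    args = build_parser().parse_args(example)
    print(args.width, args.tfea)  # 1500 True
```

Modelling -t as `store_true` matches how the examples use it: the flag takes no value and its absence selects the non-TFEA path.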

Running on Fiji

Because the default Python on Fiji is still Python 2, load a Python 3 environment first:

$ module load python/3.6.3

Once the Python 3 environment is loaded, install the required Python modules into your user site-packages (the --user option):

$ pip3 install numpy pandas matplotlib --user

An example sbatch script for a TFEA bed file is shown below.


#!/bin/bash

###Name the job
#SBATCH --job-name=Allen2014_ATGC

###Specify the queue
#SBATCH -p short

###Specify WallTime
#SBATCH --time=0:20:00

### Specify the number of nodes/cores
#SBATCH --nodes=1
#SBATCH --ntasks=1

### Allocate the amount of memory needed
#SBATCH --mem=2gb

### Write standard error and standard output to files (%x is the job name)
#SBATCH --error /scratch/Users/rusi2317/projects/gc_content/e_and_o/%x.err
#SBATCH --output /scratch/Users/rusi2317/projects/gc_content/e_and_o/%x.out

### Set your email address
#SBATCH --mail-type=ALL
#SBATCH --mail-user=rutendo.sigauke@ucdenver.edu

module purge
module load python/3.6.3
module load python/3.6.3/numpy
module load python/3.6.3/matplotlib
module load python/3.6.3/pandas
module load bedtools/2.25.0

### Paths to the code, output, reference genome, and TFEA files (edit for your setup)
BIN=/scratch/Users/rusi2317/projects/gc_content/bin

OUTDIR=/scratch/Users/rusi2317/projects/gc_content/analysis/Allen2014_v2

GENOME=/scratch/Users/rusi2317/projects/gc_content/genome

BED=/scratch/Users/rusi2317/projects/tfea/output/Allen2014/TFEA_DMSO_1hr-Nutlin_1hr_3/temp_files

NAME=Allen2014_width1000

### Run base_content (-t because the input is a TFEA BED file)

python3 ${BIN}/base_content/base_content -r ${GENOME}/hg19.fa -b ${BED}/ranked_file.center.sorted.bed -o ${OUTDIR}/ -w 1000 -s ${NAME} -t

A BED file with about 6,000 regions should take no more than 5 minutes on Fiji.