ChIP-seq Pipeline

All ChIP-seq libraries in the database were processed by a unified Python pipeline.
To get started, download the script below and prepare a CSV file with the columns Sample Name and Sample Fastq, plus Control Name and Control Fastq if an input control is available.
The output consists of two folders: "peakcall" (narrowPeak files) and "BigWig" (bw files).

Overview of the Pipeline

Tool Versions Used in the Pipeline

Tool	Version
sratoolkit	2.11.2
pfastq-dump	0.1.6
trim-galore	0.6.4
cutadapt	2.10
fastqc	0.11.8
bowtie2	2.2.6
samtools	1.10
macs2	2.2.7.1
deeptools	3.4.3

Run the ChIP-seq Pipeline (Docker Version, Recommended)

The Docker version of the ChIP-seq pipeline requires no environment configuration.

Step 1: Pull the Docker Image

$ docker pull dppss90008/qhistone-pipeline

Step 2: Download bowtie2 Index File

$ wget https://genome-idx.s3.amazonaws.com/bt/TAIR10.zip
$ unzip TAIR10.zip

Step 3: Create a CSV file

This file lists the samples to be processed.
1. The CSV file must be named Peak-ProcessTable.csv.
Example of Peak-ProcessTable.csv
2. If a sample has multiple SRR files, separate them with ";".
3. If a sample has no input control, leave the control fields blank.
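
The rules above can be sketched in Python as follows. This is a minimal illustration, not part of the pipeline: the header spelling is taken from the column names mentioned at the top of this page and is an assumption, as is the idea that the Fastq columns hold SRR accessions. The sample names and accessions are hypothetical.

```python
import csv

# Header names are an assumption based on the columns named in the introduction.
COLUMNS = ["Sample Name", "Sample Fastq", "Control Name", "Control Fastq"]


def write_process_table(rows, path="Peak-ProcessTable.csv"):
    """Write one CSV row per sample.

    Each row is (sample_name, sample_runs, control_name, control_runs).
    Multiple SRR accessions are joined with ';' and a missing control
    is written as blank fields.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        for sample, sample_runs, control, control_runs in rows:
            writer.writerow([
                sample,
                ";".join(sample_runs),
                control or "",           # blank when there is no input control
                ";".join(control_runs),  # empty list gives an empty field
            ])


# Hypothetical samples and accessions, for illustration only.
example_rows = [
    ("H3K4me3_rep1", ["SRR0000001", "SRR0000002"], "Input_rep1", ["SRR0000003"]),
    ("H3K27me3_rep1", ["SRR0000004"], None, []),
]
```

Calling write_process_table(example_rows) would produce a two-sample table in which the second sample has empty control fields.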

Step 4: Prepare the ChIP-seq data

This file lists the SRR files to be downloaded.
The file must be named SRAfile.txt.
Example of SRAfile.txt
$ docker run -it --rm \
-v /path/of/SRAfiles/:/SRAFiles/ \
dppss90008/qhistone-pipeline \
prefetch --option-file /SRAFiles/SRAfile.txt --output-directory /SRAFiles
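
One way to keep SRAfile.txt consistent with Peak-ProcessTable.csv is to derive it from the table. The sketch below assumes that the table headers are "Sample Fastq" and "Control Fastq", that those columns hold SRR accessions, and that SRAfile.txt contains one accession per line. None of these details is confirmed by this page, so adjust them to match your files.

```python
import csv


def collect_accessions(table_path):
    """Return the unique run accessions in the process table, in file order."""
    seen = []
    with open(table_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Column names are assumptions; see the note above.
            for column in ("Sample Fastq", "Control Fastq"):
                for accession in (row.get(column) or "").split(";"):
                    accession = accession.strip()
                    if accession and accession not in seen:
                        seen.append(accession)
    return seen


def write_sra_list(accessions, path="SRAfile.txt"):
    """Write one accession per line (assumed format of SRAfile.txt)."""
    with open(path, "w") as handle:
        handle.write("\n".join(accessions) + "\n")
```

For example, write_sra_list(collect_accessions("Peak-ProcessTable.csv")) would list every sample and control run exactly once.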

Step 5: Run the code

In the command below, --cores sets the number of CPU cores used to speed up the pipeline, and --bowtie2index names the bowtie2 index downloaded in Step 2.
$ docker run -it --rm \
-v /path/of/Peak-ProcessTable.csv:/source \
-v /path/of/TAIR10-Bowtie2Index:/Bowtie2Index \
-v /path/of/SRAfiles/:/SRAFiles/ \
dppss90008/qhistone-pipeline \
conda run -n chipseq python /PeakCallingPipeline/Peak-Calling-Pipeline-Version4.py \
--cores 70 \
--bowtie2index TAIR10 \
--wkdir /source
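
If you launch the container from a script, the same command can be assembled as an argument list. The sketch below only builds and prints the command and does not run Docker. The function name and its parameters are hypothetical, and defaulting the core count to the number of CPUs on the host is a choice made here, not something the pipeline prescribes.

```python
import os
import shlex

IMAGE = "dppss90008/qhistone-pipeline"
SCRIPT = "/PeakCallingPipeline/Peak-Calling-Pipeline-Version4.py"


def build_docker_command(source_mount, index_dir, sra_dir, cores=None,
                         index_name="TAIR10"):
    """Return the docker invocation from Step 5 as a list of arguments."""
    cores = cores or os.cpu_count() or 1  # fall back to the host CPU count
    return [
        "docker", "run", "-it", "--rm",
        "-v", f"{source_mount}:/source",      # mounted at /source, used as --wkdir
        "-v", f"{index_dir}:/Bowtie2Index",   # bowtie2 index from Step 2
        "-v", f"{sra_dir}:/SRAFiles/",        # SRR files from Step 4
        IMAGE,
        "conda", "run", "-n", "chipseq", "python", SCRIPT,
        "--cores", str(cores),
        "--bowtie2index", index_name,
        "--wkdir", "/source",
    ]


command = build_docker_command(
    "/path/of/Peak-ProcessTable.csv",
    "/path/of/TAIR10-Bowtie2Index",
    "/path/of/SRAfiles/",
    cores=8,
)
print(shlex.join(command))  # shell-quoted form, ready to paste into a terminal
```

Building the list first makes it easy to inspect the mounts before anything is executed.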

Run the ChIP-seq Pipeline (Source Version)

Environment Setup

Please check that all of the following programs are installed before running the pipeline.

Step 1: Install sratoolkit

$ wget https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-ubuntu64.tar.gz
$ tar zxvf sratoolkit.current-ubuntu64.tar.gz
$ echo 'export PATH=$PATH:/path/to/sratoolkit.3.0.10-ubuntu64/bin' >> ~/.bashrc
$ source ~/.bashrc

Step 2: Install pfastq-dump

$ git clone https://github.com/inutano/pfastq-dump
$ cd pfastq-dump
$ chmod a+x bin/pfastq-dump

Step 3: Install trim-galore

$ pip install cutadapt
$ wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.12.1.zip
$ unzip fastqc_v0.12.1.zip
$ cd FastQC
$ chmod 755 fastqc
$ sudo ln -s /path/to/FastQC/fastqc /usr/local/bin/fastqc
$ cutadapt --version # Check that cutadapt is installed
$ fastqc -v # Check that FastQC is installed
$ curl -fsSL https://github.com/FelixKrueger/TrimGalore/archive/0.6.10.tar.gz -o trim_galore.tar.gz
$ tar xvzf trim_galore.tar.gz

Step 4: Install bowtie2

$ conda install -c bioconda bowtie2

Step 5: Install samtools

$ wget https://github.com/samtools/samtools/releases/download/1.19/samtools-1.19.tar.bz2
$ tar jxvf samtools-1.19.tar.bz2
$ cd samtools-1.19
$ ./configure
$ make
$ make install

Step 6: Install macs2

$ pip install macs2

Step 7: Install deeptools

$ pip install deeptools
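
After the installation steps, a quick check that the programs can be found on PATH might look like the sketch below. The executable names are assumptions based on the tools installed above. pfastq-dump and trim_galore are not added to PATH by the steps as written, so they may be reported as missing even when correctly installed; the pipeline can still reach them through full paths.

```python
import shutil

# Executable names are assumptions based on the tools installed above.
REQUIRED = [
    "prefetch", "pfastq-dump", "trim_galore", "cutadapt", "fastqc",
    "bowtie2", "samtools", "macs2", "bamCoverage",
]


def find_missing(programs, which=shutil.which):
    """Return the programs that cannot be located on PATH."""
    return [name for name in programs if which(name) is None]


missing = find_missing(REQUIRED)
print("missing:", ", ".join(missing) if missing else "none")
```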

Run the ChIP-seq Pipeline

Step 8: Download the ChIP-seq pipeline

Download the source code of the pipeline.
Step 9: Prepare the ChIP-seq data

1. Download the SRR files of the libraries.
2. Store the SRR files in a folder.

Step 10: Create a CSV file

Step 11: Configure the pipeline

$ nano Peak-Calling-Pipeline-Version4.py
Change the locations of the programs in the INFO dictionary:
INFO = {
"prefetch":"/path/to/sratoolkit.3.0.10-ubuntu64/bin/prefetch",
"pfastq-dump":"/path/to/pfastq-dump/bin/pfastq-dump",
"trim-galore":"/path/to/TrimGalore/trim_galore",
"bowtie2":"/path/to/bowtie2",
"samtools":"/path/to/samtools",
"macs2":"/path/to/macs2",
"bamCoverage":"/path/to/bamCoverage",
"GenomeIndex":"/path/to/Bowtie2Index/genome>
# Index file of the genome
"SRA_FILES":"/path/to/ncbi/",
# Folder where the SRR files are stored
"cores": 70,
# CPU cores for speeding up the pipeline
}

Save the file and exit the editor.
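
Before running the pipeline, it can help to confirm that every configured location exists. The sketch below is not part of the pipeline; check_info is a hypothetical helper that takes a dictionary shaped like INFO. It assumes that GenomeIndex is a bowtie2 index prefix (so that a file ending in .1.bt2 or .1.bt2l exists next to it) and that every other string entry is the path to an executable.

```python
import os


def check_info(info):
    """Return a list of problems found in an INFO-style dictionary."""
    problems = []
    for key, value in info.items():
        if key == "cores":
            available = os.cpu_count() or 1
            if not isinstance(value, int) or value < 1:
                problems.append(f"cores: expected a positive integer, got {value!r}")
            elif value > available:
                problems.append(f"cores: {value} requested, {available} available")
        elif key == "SRA_FILES":
            if not os.path.isdir(value):
                problems.append(f"{key}: directory not found: {value}")
        elif key == "GenomeIndex":
            # Assumption: the value is an index prefix shared by the .bt2 files.
            if not any(os.path.exists(value + suffix)
                       for suffix in (".1.bt2", ".1.bt2l")):
                problems.append(f"{key}: no bowtie2 index with prefix {value}")
        elif not (os.path.isfile(value) and os.access(value, os.X_OK)):
            problems.append(f"{key}: not an executable file: {value}")
    return problems
```

An empty list means every entry passed. Note that 70 cores will be flagged on machines with fewer CPUs, which is a hint to lower the value.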

Step 12: Run the code

$ python Peak-Calling-Pipeline-Version4.py
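
Once the run finishes, the narrowPeak files in the "peakcall" folder can be inspected with a few lines of Python. narrowPeak is a tab-delimited BED6+4 format whose first three columns are chromosome, start and end. The sketch below counts peaks per chromosome and sums their widths. The function name is hypothetical, and the exact file names written by the pipeline are not documented here, so pass in the path yourself.

```python
import csv
from collections import Counter


def summarize_narrowpeak(path):
    """Return (peaks per chromosome, total base pairs covered by peaks)."""
    counts = Counter()
    total_width = 0
    with open(path, newline="") as handle:
        for fields in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and optional track/browser/comment lines.
            if not fields or fields[0].startswith(("#", "track", "browser")):
                continue
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
            counts[chrom] += 1
            total_width += end - start
    return counts, total_width
```

For example, counts, width = summarize_narrowpeak("/path/to/peakcall/sample.narrowPeak") gives a quick sanity check on the number and size of the called peaks.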