Professional transcription factor binding site prediction using Position Weight Matrices from JASPAR, TRANSFAC, and HOCOMOCO databases.
The Position Weight Matrix (PWM) score S for a sequence of length L is calculated as $S = \sum_{k=1}^{L} \log_2 \frac{P(b_k \mid \mathrm{TF})}{P(b_k \mid \mathrm{background})}$, where $b_k$ is the base at position k. Scores are converted to p-values using an extreme value distribution approximation.
The PWM score S for a sequence segment s of length L aligned with a PWM M is calculated as:
$S = \sum_{i=1}^{L} \log_2 \frac{M_{i,s_i}}{b_{s_i}}$
where $M_{i,b}$ is the frequency of base b at position i in the aligned binding sites, $s_i$ is the base of s at position i, and $b_b$ is the background frequency of base b.
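As a minimal sketch of the log-odds score described above, the following Python computes S for one segment and slides the matrix along a longer sequence. The function names (`pwm_score`, `scan`), the toy matrix, and the uniform background are illustrative assumptions, not this tool's implementation; only the forward strand is scanned.

```python
import math

BASES = "ACGT"

def pwm_score(segment, freqs, background):
    """Log-odds score of `segment` against a frequency matrix.

    freqs: list of dicts, one per motif position, mapping base -> frequency.
    background: dict mapping base -> background frequency.
    """
    if len(segment) != len(freqs):
        raise ValueError("segment length must equal motif length")
    return sum(
        math.log2(freqs[i][base] / background[base])
        for i, base in enumerate(segment)
    )

def scan(sequence, freqs, background):
    """Score every window of motif length along `sequence` (forward strand only)."""
    motif_len = len(freqs)
    return [
        (start, pwm_score(sequence[start:start + motif_len], freqs, background))
        for start in range(len(sequence) - motif_len + 1)
    ]

# Toy 3-position motif. The frequencies are made up and contain no zeros,
# so every logarithm is defined.
toy_freqs = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
uniform_bg = {b: 0.25 for b in BASES}

hits = scan("TTAGCAT", toy_freqs, uniform_bg)
best_start, best_score = max(hits, key=lambda h: h[1])
```

In this toy input the best-scoring window is the one that spells AGC, the consensus of the made-up matrix. A real scan would usually also score the reverse complement and handle ambiguous bases such as N.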
P-values are calculated using the theoretical distribution of PWM scores under the null hypothesis:
Extreme Value Distribution (EVD) approximation:
P(S ≥ x) ≈ 1 - exp(-K * L * exp(-λx))
where λ and K are parameters estimated from the PWM and background model.
E-value calculation: E = N * P(S ≥ x), where N is the number of tests.
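The EVD approximation and the E-value above can be sketched in a few lines of Python. The parameter values used here (λ = 1.0, K = 0.1, L = 10, N = 1000) are placeholders chosen only to make the example run; in practice λ and K would be estimated from the PWM and background model as stated above.

```python
import math

def evd_pvalue(x, lam, K, L):
    """P(S >= x) ~ 1 - exp(-K * L * exp(-lam * x)).

    -expm1(-y) equals 1 - exp(-y) but stays accurate when y is tiny,
    which is the usual case for high-scoring sites.
    """
    return -math.expm1(-K * L * math.exp(-lam * x))

def e_value(p, n_tests):
    """Expected number of hits at least this good among n_tests tests."""
    return n_tests * p

# Placeholder parameters, not values fitted to any real matrix.
p = evd_pvalue(x=10.0, lam=1.0, K=0.1, L=10)
e = e_value(p, n_tests=1000)
```

Because the p-value shrinks exponentially with the score, a small increase in x can change the E-value by orders of magnitude.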
We apply rigorous multiple testing corrections to control false discoveries:
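Which corrections are applied is not listed here. Purely as an illustration, and as an assumption rather than a description of this tool, the sketch below implements two standard choices: Bonferroni (controls the family-wise error rate) and Benjamini-Hochberg (controls the false discovery rate). Function names and input values are hypothetical.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * n / rank over its own rank and all higher ranks.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values for four candidate sites.
raw = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
```

Bonferroni is the more conservative of the two; Benjamini-Hochberg typically retains more candidate sites when many positions are tested.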
Transcription Factor Binding Site (TFBS) prediction is a computational method to identify specific DNA sequences where transcription factors (TFs) are likely to bind and regulate gene expression. These predictions are essential for understanding gene regulatory networks and identifying potential regulatory elements in genomes.
This tool uses Position Weight Matrices (PWMs) derived from experimentally validated transcription factor binding sites. The primary sources include JASPAR, TRANSFAC, and HOCOMOCO.
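To illustrate how a matrix can be derived from aligned binding sites, the sketch below counts bases per column and converts the counts to frequencies. The function name `frequency_matrix`, the pseudocount of 1, and the four example sites are assumptions for illustration; the databases distribute matrices in their own formats, and their construction procedures may differ.

```python
def frequency_matrix(sites, pseudocount=1.0):
    """Per-position base frequencies from equal-length aligned sites.

    A pseudocount is added to every base so that no frequency is zero,
    which keeps later log-odds scores finite.
    """
    bases = "ACGT"
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(length):
        counts = {b: pseudocount for b in bases}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        matrix.append({b: counts[b] / total for b in bases})
    return matrix

# Hypothetical aligned sites, not taken from any database.
example_sites = ["AGC", "AGC", "TGC", "AGA"]
freqs = frequency_matrix(example_sites)
```

With these four sites, A is observed three times in the first column, so its frequency with the pseudocount is (3 + 1) / (4 + 4) = 0.5.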
Computational predictions should be validated experimentally. Common validation methods include: