Welcome to pytq Documentation

pytq (Python Task Queue) is a task scheduler library.

The problem we solve:

  1. You have N tasks to do.
  2. Each task has input_data; after it has been processed, you get output_data.

pytq provides these features out-of-the-box (and all of them are customizable):

  1. Save output_data to a data-persistence system.
  2. Filter out duplicate input_data.
  3. Built-in multi-thread processor to boost speed.
  4. Nice built-in logging system.
  5. And it's easy to define how you:
    • process your input_data
    • integrate with your data-persistence system
    • filter duplicate input_data
    • retrieve output_data
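
The workflow those features automate can be sketched in plain Python. This is only an illustration of the pattern (process each input once, skip duplicates, persist the output); `process` and the `results` dict are stand-ins for your own processing logic and persistence layer, not pytq APIs:

```python
def process(input_data):
    """Hypothetical processing step: here, just upper-case the input."""
    return input_data.upper()

results = {}  # stand-in for a data-persistence system (key -> output_data)

def run(input_data_queue):
    for input_data in input_data_queue:
        if input_data in results:  # filter out already-processed inputs
            continue
        results[input_data] = process(input_data)

run(["a", "b", "a"])  # "a" appears twice but is only processed once
```

pytq replaces the dict with a real backend (e.g. sqlite) and runs the loop with multiple threads.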


Suppose you have some URLs to crawl, you don't want to re-crawl URLs you have already crawled successfully, and you want to save the crawled data in a database.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
This script implements a multi-thread-safe, sqlite-backed task queue scheduler.
"""

from pytq import SqliteDictScheduler

# Define your input_data model
class UrlRequest(object):
    def __init__(self, url, context_data=None):
        self.url = url  # the url you want to crawl
        self.context_data = context_data  # optional context data to use

class Scheduler(SqliteDictScheduler):
    # (Required) define how you process your data
    def user_process(self, input_data):
        # you need to implement get_html_from_url yourself
        html = get_html_from_url(input_data.url)

        # you need to implement parse_html yourself
        output_data = parse_html(html)
        return output_data

s = Scheduler(user_db_path="example.sqlite")

# let's have some urls (placeholders; use your own)
input_data_queue = [
    UrlRequest(url="https://www.example.com/page-1"),
    UrlRequest(url="https://www.example.com/page-2"),
    UrlRequest(url="https://www.example.com/page-3"),
]

# execute multi-thread process
s.do(input_data_queue, multiprocess=True)

# print output
for id, output_data in s.items():
    print(id, output_data)
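
The example above leaves `get_html_from_url` and `parse_html` to you. A minimal sketch of what they might look like, using only the standard library (the `parse_html` here just extracts the `<title>` text; your real parser will differ):

```python
import re
from urllib.request import urlopen

def get_html_from_url(url):
    """Fetch the raw HTML of a page. Real code would add retries,
    timeouts tuning and error handling."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")

def parse_html(html):
    """Toy parser: pull out the <title> text. Replace with your own
    extraction logic."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None
```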


You can customize more of the behavior:

class Scheduler(SqliteDictScheduler):
    # (Optional) define the identifier of input_data (used for duplicate detection)
    def user_hash_input(self, input_data):
        return input_data.url

    # (Optional) define how you save output_data to the database
    # Here we just use the default one
    def user_post_process(self, task):
        return self._default_post_process(task)

    # (Optional) define how you skip already-crawled urls
    # Here we just use the default one
    def user_is_duplicate(self, task):
        return self._default_is_duplicate(task)
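
If raw URLs are too long or unwieldy to use as duplicate-detection keys, `user_hash_input` could return a fixed-length digest instead. A sketch (md5 chosen arbitrarily; `url_fingerprint` is a hypothetical helper, not a pytq API):

```python
import hashlib

def url_fingerprint(url):
    """Return a fixed-length hex digest of a URL, usable as a
    duplicate-detection key in place of the raw URL string."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()

# Inside your Scheduler subclass you would then write:
#     def user_hash_input(self, input_data):
#         return url_fingerprint(input_data.url)
```

The digest is deterministic, so the same URL always maps to the same key.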

TODO: more examples are coming.


pytq is released on PyPI, so all you need is:

$ pip install pytq

To upgrade to latest version:

$ pip install --upgrade pytq

About the Author

(\ (\
( -.-)o    I am a lovely Rabbit!

Sanhe Hu has been a very active Python developer since 2010, now working at Whiskerlabs as a Data Scientist. Research areas include Machine Learning, Big Data Infrastructure, Blockchain, Business Intelligence, Open Cloud, and Distributed Systems. Loves photography, vocals, the outdoors, arts, games, and of course Python.