seesaw Package

ArchiveTeam seesaw kit

config Module

Configuration value manipulation.

class seesaw.config.ConfigInterpolation(s, c)[source]

Bases: object

realize(item)[source]
class seesaw.config.ConfigValue(name, title='', description='', default=None, editable=True, advanced=True)[source]

Bases: object

Configuration value validator.

The collection methods are useful for providing user configurable settings at run time. For example, when a pipeline file is executed by the warrior, the additional config values are presented in the warrior configuration panel.
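
One way to picture the collection methods is a class-level list that records every instance created between start_collecting() and stop_collecting(). The sketch below is a self-contained stand-in, not the seesaw implementation: SketchConfigValue is a hypothetical name, and returning the collected list from stop_collecting() is an assumption made for the example.

```python
class SketchConfigValue(object):
    """Hypothetical stand-in for ConfigValue; not the seesaw code."""

    # While this is a list, every newly created instance is appended to it.
    collector = None

    def __init__(self, name, title='', default=None):
        self.name = name
        self.title = title
        self.value = default
        if SketchConfigValue.collector is not None:
            SketchConfigValue.collector.append(self)

    @classmethod
    def start_collecting(cls):
        cls.collector = []

    @classmethod
    def stop_collecting(cls):
        # Assumption: hand back whatever was collected and stop recording.
        collected = cls.collector
        cls.collector = None
        return collected


# A pipeline file would create its values between the two calls.
SketchConfigValue.start_collecting()
SketchConfigValue('downloader', title='Your nickname')
SketchConfigValue('concurrent', title='Concurrent items', default=2)
collected = SketchConfigValue.stop_collecting()

ignored = SketchConfigValue('late')  # created after collection stopped
collected_names = [value.name for value in collected]
```

A caller such as a configuration panel could then iterate over the collected values to build its form.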

check_value(value)[source]
collector = None
convert_value(value)[source]
is_valid()[source]
realize(dummy)[source]
set_value(value)[source]
classmethod start_collecting()[source]
classmethod stop_collecting()[source]
class seesaw.config.NumberConfigValue(*args, **kwargs)[source]

Bases: seesaw.config.ConfigValue

check_value(value)[source]
convert_value(value)[source]
class seesaw.config.StringConfigValue(*args, **kwargs)[source]

Bases: seesaw.config.ConfigValue

check_value(value)[source]
seesaw.config.realize(v, item=None)[source]

Makes objects contain concrete values from an item.

A silly example:

class AddExpression(object):
    def realize(self, item):
        return item['x'] + item['y']

pipeline = Pipeline(ComputeMath(AddExpression()))

In the example, we want to compute an addition expression. The values are defined in the Item.
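
To make the idea concrete, here is a minimal sketch of how such a function could work. It is an illustration only: realize_sketch is a hypothetical name, and the recursion into lists and dicts is an assumption about how nested arguments might be handled, not a statement about the seesaw source.

```python
def realize_sketch(value, item=None):
    """Replace objects that have a realize() method with concrete values."""
    if isinstance(value, dict):
        # Assumption: both keys and values may need realizing.
        return {realize_sketch(k, item): realize_sketch(v, item)
                for k, v in value.items()}
    if isinstance(value, list):
        return [realize_sketch(v, item) for v in value]
    if hasattr(value, 'realize'):
        return value.realize(item)
    # Plain values (strings, numbers, ...) pass through unchanged.
    return value


class AddExpression(object):
    def realize(self, item):
        return item['x'] + item['y']


item = {'x': 2, 'y': 3}
args = ['--sum', AddExpression(), {'label': 'total'}]
realized = realize_sketch(args, item)
```

Deferring evaluation this way lets a pipeline be declared once and then filled in per item at run time.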

event Module

Actor model.

class seesaw.event.Event[source]

Bases: object

Lightweight event system.

Example:

my_event_system = Event()
my_event_system += my_listener_callback_function
my_event_system(my_event_data)
fire(*args, **kargs)[source]
getHandlerCount()[source]
handle(handler)[source]
unhandle(handler)[source]
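
The usage shown in the example can be reproduced with a small stand-in class. MiniEvent below is a hypothetical sketch that mirrors the method names listed above; it is not the seesaw implementation, and details such as firing handlers in registration order are assumptions.

```python
class MiniEvent(object):
    """Hypothetical stand-in for a lightweight event object."""

    def __init__(self):
        self.handlers = []

    def handle(self, handler):
        self.handlers.append(handler)
        return self  # returning self makes `event += handler` work

    def unhandle(self, handler):
        self.handlers.remove(handler)
        return self

    def fire(self, *args, **kwargs):
        # Iterate over a copy so handlers may unregister themselves.
        for handler in list(self.handlers):
            handler(*args, **kwargs)

    def getHandlerCount(self):
        return len(self.handlers)

    __iadd__ = handle
    __call__ = fire


received = []


def listener(data):
    received.append(data)


on_data = MiniEvent()
on_data += listener        # same as on_data.handle(listener)
on_data('first')           # same as on_data.fire('first')
on_data.unhandle(listener)
on_data('second')          # no listeners left, so nothing is recorded
```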

externalprocess Module

Running subprocesses asynchronously.

class seesaw.externalprocess.AsyncPopen(*args, **kwargs)[source]

Bases: object

Asynchronous version of subprocess.Popen.

Deprecated.

classmethod ignore_sigint()[source]
run()[source]
class seesaw.externalprocess.AsyncPopen2(*args, **kwargs)[source]

Bases: object

Adapter for the legacy AsyncPopen.

run()[source]
stdin
class seesaw.externalprocess.CurlUpload(target, filename, connect_timeout='60', speed_limit='1', speed_time='900', max_tries=None)[source]

Bases: seesaw.externalprocess.ExternalProcess

Upload with Curl process runner.

class seesaw.externalprocess.ExternalProcess(name, args, max_tries=1, retry_delay=30, accept_on_exit_code=None, retry_on_exit_code=None, env=None)[source]

Bases: seesaw.task.Task

External subprocess runner.

enqueue(item)[source]
handle_process_error(exit_code, item)[source]
handle_process_result(exit_code, item)[source]
on_subprocess_end(item, returncode)[source]
on_subprocess_stdout(pipe, item, data)[source]
process(item)[source]
stdin_data(item)[source]
class seesaw.externalprocess.RsyncUpload(target, files, target_source_path='./', bwlimit='0', max_tries=None, extra_args=None)[source]

Bases: seesaw.externalprocess.ExternalProcess

Upload with Rsync process runner.

stdin_data(item)[source]
class seesaw.externalprocess.WgetDownload(args, max_tries=1, accept_on_exit_code=None, retry_on_exit_code=None, env=None, stdin_data_function=None)[source]

Bases: seesaw.externalprocess.ExternalProcess

Download with Wget process runner.

stdin_data(item)[source]
seesaw.externalprocess.cleanup()[source]

item Module

Managing work units.

class seesaw.item.Item(pipeline, item_id, item_number, keep_data=False, prepare_data_directory=True, **kwargs)[source]

Bases: seesaw.item.ItemData

A thing, or work unit, that needs to be downloaded.

It has properties that are filled by the Task.

An Item behaves like a mutable mapping.

Note

State belonging to an item should be stored on the item itself. That is, do not store variables on a Task unless you know what you are doing.

class ItemState[source]

Bases: object

State of the item.

canceled = 'canceled'
completed = 'completed'
failed = 'failed'
running = 'running'
class TaskStatus[source]

Bases: object

Status of what happened on a task.

completed = 'completed'
failed = 'failed'
running = 'running'
cancel()[source]
canceled
clear_data_directory()[source]
complete()[source]
completed
description()[source]
end_time
fail()[source]
failed
finished
item_id
item_number
item_state
log_error(task, *args)[source]
log_output(data, full_line=True)[source]
pipeline
prepare_data_directory()[source]
set_task_status(task, status)[source]
start_time
task_status
class seesaw.item.ItemData(properties=None)[source]

Bases: _abcoll.MutableMapping

Base item data property container.

Args:

properties (dict): Original dict.

on_property (Event): Fired whenever a property changes.

Callback accepts:

  1. self
  2. key
  3. new value
  4. old value
properties
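
The callback contract above can be sketched with a small mapping class. SketchItemData is a hypothetical stand-in, not the seesaw class: it takes a plain callable instead of an Event, and it assumes the callback fires only when the stored value actually changes.

```python
from collections.abc import MutableMapping


class SketchItemData(MutableMapping):
    """Hypothetical property container that reports changes."""

    def __init__(self, properties=None, on_property=None):
        self.properties = properties if properties is not None else {}
        self.on_property = on_property  # plain callable, an assumption

    def __getitem__(self, key):
        return self.properties[key]

    def __setitem__(self, key, value):
        old_value = self.properties.get(key)
        self.properties[key] = value
        if self.on_property is not None and old_value != value:
            # Arguments follow the documented order:
            # self, key, new value, old value.
            self.on_property(self, key, value, old_value)

    def __delitem__(self, key):
        del self.properties[key]

    def __iter__(self):
        return iter(self.properties)

    def __len__(self):
        return len(self.properties)


changes = []


def record_change(data, key, new_value, old_value):
    changes.append((key, new_value, old_value))


data = SketchItemData({'url': 'http://example.com/'},
                      on_property=record_change)
data['status'] = 'queued'
data['status'] = 'done'
data['status'] = 'done'  # unchanged, so the callback is not fired
```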
class seesaw.item.ItemInterpolation(s)[source]

Bases: object

Formats a string using the percent operator during realize().

realize(item)[source]
class seesaw.item.ItemValue(key)[source]

Bases: object

Get an item’s value during realize().

fill(item, value)[source]
realize(item)[source]
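
A minimal sketch of both helpers, assuming the item can be used as a mapping on the right-hand side of the percent operator. The Sketch-prefixed classes and the item keys are hypothetical and only mirror the descriptions above; fill() is left out.

```python
class SketchItemInterpolation(object):
    """Hypothetical stand-in: format a string from the item's values."""

    def __init__(self, s):
        self.s = s

    def realize(self, item):
        # '%(key)s' placeholders are looked up in the item mapping.
        return self.s % item


class SketchItemValue(object):
    """Hypothetical stand-in: look up a single key on the item."""

    def __init__(self, key):
        self.key = key

    def realize(self, item):
        return item[self.key]


item = {'item_name': 'user-42', 'data_dir': '/tmp/data'}
template = SketchItemInterpolation('%(data_dir)s/%(item_name)s.warc.gz')
path = template.realize(item)
name = SketchItemValue('item_name').realize(item)
```

Because the template is only formatted inside realize(), the same object can be reused for every item that passes through a pipeline.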

pipeline Module

class seesaw.pipeline.Pipeline(*tasks)[source]

Bases: object

The sequence of steps that complete a Task.

Your pipeline will probably be something like this:

  1. Request an assignment from the tracker.
  2. Run Wget to download the file.
  3. Upload the downloaded file with rsync.
  4. Tell the tracker that the assignment is done.
add_task(task)[source]
cancel_items()[source]
enqueue(item)[source]
ui_task_list()[source]
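
The four steps listed above can be sketched as a chain of callables that each read and update the item. This is a deliberately simplified, synchronous illustration: SketchPipeline and the four step functions are hypothetical placeholders, and the real Pipeline works with Task objects and may run them asynchronously.

```python
class SketchPipeline(object):
    """Hypothetical stand-in that runs its steps one after another."""

    def __init__(self, *tasks):
        self.tasks = list(tasks)

    def add_task(self, task):
        self.tasks.append(task)

    def enqueue(self, item):
        # Each step receives the same item and stores its results on it.
        for task in self.tasks:
            task(item)
        return item


def get_assignment(item):
    item['item_name'] = 'example-item'  # placeholder for a tracker reply


def download(item):
    item['file'] = item['item_name'] + '.warc.gz'  # placeholder for Wget


def upload(item):
    item['uploaded'] = True  # placeholder for an rsync upload


def report_done(item):
    item['done'] = True  # placeholder for the tracker notification


pipeline = SketchPipeline(get_assignment, download, upload)
pipeline.add_task(report_done)
result = pipeline.enqueue({})
```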

project Module

Project information.

class seesaw.project.Project(title=None, project_html=None, utc_deadline=None)[source]

Bases: object

Briefly describes a project's metadata.

This class defines the title of the project, a short description with an optional project logo and an optional deadline. The information will be shown in the web interface when the project is running.

data_for_json()[source]

runner Module

Pipeline execution.

class seesaw.runner.Runner(stop_file=None, concurrent_items=1, max_items=None, keep_data=False)[source]

Bases: object

Executes and manages the lifetime of Pipeline instances.

add_items()[source]
check_stop_file()[source]
is_active()[source]
keep_running()[source]
set_current_pipeline(pipeline)[source]
should_stop()[source]
start()[source]
stop_file_changed()[source]
stop_file_mtime()[source]
stop_gracefully()[source]
class seesaw.runner.SimpleRunner(pipeline, stop_file=None, concurrent_items=1, max_items=None, keep_data=False)[source]

Bases: seesaw.runner.Runner

Executes a single Pipeline instance.

forced_stop()[source]
start()[source]

task Module

Managing steps in a work unit.

class seesaw.task.ConditionalTask(condition_function, inner_task)[source]

Bases: seesaw.task.Task

Runs a task optionally.

enqueue(item)[source]
fill_ui_task_list(task_list)[source]
class seesaw.task.LimitConcurrent(concurrency, inner_task)[source]

Bases: seesaw.task.Task

Restricts the number of tasks of the same type that can be run at once.

enqueue(item)[source]
fill_ui_task_list(task_list)[source]
class seesaw.task.PrintItem[source]

Bases: seesaw.task.SimpleTask

Output the name of the Item.

process(item)[source]
class seesaw.task.SetItemKey(key, value)[source]

Bases: seesaw.task.SimpleTask

Set a value onto an item.

process(item)[source]
class seesaw.task.SimpleTask(name)[source]

Bases: seesaw.task.Task

A subclassable Task that should do one small thing well.

Example:

class MyTask(SimpleTask):
    def process(self, item):
        item['my_message'] = 'hello world!'
enqueue(item)[source]
process(item)[source]
class seesaw.task.Task(name)[source]

Bases: object

A step in the download process of an Item.

complete_item(item)[source]
fail_item(item)[source]
fill_ui_task_list(task_list)[source]
start_item(item)[source]
task_cwd(**kwds)[source]

tracker Module

Contacting the work unit server.

A Tracker refers to the Universal Tracker (https://github.com/ArchiveTeam/universal-tracker).

class seesaw.tracker.GetItemFromTracker(tracker_url, downloader, version=None)[source]

Bases: seesaw.tracker.TrackerRequest

Get information about a single work unit from the Tracker.

data(item)[source]
process_body(body, item)[source]
class seesaw.tracker.PrepareStatsForTracker(defaults=None, file_groups=None, id_function=None)[source]

Bases: seesaw.task.SimpleTask

Apply statistical values on the item.

process(item)[source]
class seesaw.tracker.SendDoneToTracker(tracker_url, stats)[source]

Bases: seesaw.tracker.TrackerRequest

Inform the Tracker that the work unit has been completed.

data(item)[source]
process_body(body, item)[source]
class seesaw.tracker.TrackerRequest(name, tracker_url, tracker_command, may_be_canceled=False)[source]

Bases: seesaw.task.Task

Represents a request to a Tracker.

DEFAULT_RETRY_DELAY = 60
data(item)[source]
enqueue(item)[source]
handle_response(item, response)[source]
increment_retry_delay(max_delay=300)[source]
process_body(body, item)[source]
reset_retry_delay()[source]
schedule_retry(item, message='')[source]
send_request(item)[source]
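
The retry-delay methods suggest a capped backoff between failed requests. The sketch below shows one way this could behave; SketchRetryDelay is a hypothetical class, and the doubling factor is an assumption, since the growth rule is not stated here. Only DEFAULT_RETRY_DELAY = 60 and max_delay=300 come from the signatures above.

```python
class SketchRetryDelay(object):
    """Hypothetical stand-in for the retry-delay bookkeeping."""

    DEFAULT_RETRY_DELAY = 60

    def __init__(self):
        self.retry_delay = self.DEFAULT_RETRY_DELAY

    def reset_retry_delay(self):
        self.retry_delay = self.DEFAULT_RETRY_DELAY

    def increment_retry_delay(self, max_delay=300):
        # Assumed growth factor of 2, capped at max_delay.
        self.retry_delay = min(self.retry_delay * 2, max_delay)


delays = []
backoff = SketchRetryDelay()
for _ in range(4):
    delays.append(backoff.retry_delay)  # delay used for this attempt
    backoff.increment_retry_delay()     # request failed, wait longer next time

backoff.reset_retry_delay()             # a request finally succeeded
```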
class seesaw.tracker.UploadWithTracker(tracker_url, downloader, files, version=None, rsync_target_source_path='./', rsync_bwlimit='0', rsync_extra_args=[], curl_connect_timeout='60', curl_speed_limit='1', curl_speed_time='900')[source]

Bases: seesaw.tracker.TrackerRequest

Upload work unit results.

One of the inner tasks is used, depending on where the Tracker's response says to upload:

  • RsyncUpload
  • CurlUpload
data(item)[source]
process_body(body, item)[source]

util Module

Miscellaneous functions.

seesaw.util.find_executable(name, version, paths, version_arg='-V')[source]

Returns the path of a matching executable.

seesaw.util.test_executable(name, version, path, version_arg='-V')[source]

Try to run an executable and check its version.
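
A rough sketch of how the two functions could fit together, built on the standard subprocess and re modules. The function names are hypothetical, the name parameter is omitted, and treating the version as a regular expression matched against the combined output is an assumption rather than the documented behaviour.

```python
import re
import subprocess
import sys


def check_executable_sketch(version_pattern, path, version_arg='-V'):
    """Return True if running `path version_arg` prints a matching version."""
    try:
        result = subprocess.run([path, version_arg],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                timeout=30)
    except (OSError, subprocess.SubprocessError):
        # Missing file, permission problem, or timeout: not usable.
        return False
    output = result.stdout.decode('utf-8', 'replace')
    return re.search(version_pattern, output) is not None


def find_executable_sketch(version_pattern, paths, version_arg='-V'):
    """Return the first path whose version output matches, else None."""
    for path in paths:
        if check_executable_sketch(version_pattern, path, version_arg):
            return path
    return None


# A path that does not exist is skipped rather than raising an error.
missing = find_executable_sketch(r'1\.0', ['/nonexistent/bin/tool'])

# The running interpreter serves as a convenient real executable to probe.
found = find_executable_sketch(r'Python \d',
                               ['/nonexistent/bin/tool', sys.executable])
```

Checking several candidate paths in order lets a project prefer a bundled binary and fall back to a system-wide one.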

seesaw.util.unique_id_str()[source]

Returns a unique string suitable for IDs.

warrior Module

The warrior server.

The warrior phones home to Warrior HQ (https://github.com/ArchiveTeam/warrior-hq).

class seesaw.warrior.BandwidthMonitor(device)[source]

Bases: object

Extracts the bandwidth usage from the system stats.

current_stats()[source]
devre = <_sre.SRE_Pattern object>
update()[source]
class seesaw.warrior.ConfigManager(config_file)[source]

Bases: object

Manages the configuration.

add(config_value)[source]
all_valid()[source]
editable_values()[source]
load()[source]
remove(name)[source]
save()[source]
set_value(name, value)[source]
class seesaw.warrior.Warrior(projects_dir, data_dir, warrior_hq_url, real_shutdown=False, keep_data=False)[source]

Bases: object

The warrior god object.

class Status[source]

Bases: object

INVALID_SETTINGS = 'INVALID_SETTINGS'
NO_PROJECT = 'NO_PROJECT'
REBOOTING = 'REBOOTING'
RESTARTING_PROJECT = 'RESTARTING_PROJECT'
RUNNING_PROJECT = 'RUNNING_PROJECT'
SHUTTING_DOWN = 'SHUTTING_DOWN'
STARTING_PROJECT = 'STARTING_PROJECT'
STOPPING_PROJECT = 'STOPPING_PROJECT'
SWITCHING_PROJECT = 'SWITCHING_PROJECT'
UNINITIALIZED = 'UNINITIALIZED'
bandwidth_stats()[source]
check_project_has_update(**kwargs)[source]
clone_project(project_name, project_path)[source]
collect_install_output(data)[source]
find_lat_lng()[source]
fire_status()[source]
forced_reboot()[source]
forced_stop()[source]
handle_lat_lng(response)[source]
handle_runner_finish(runner)[source]
install_project(**kwargs)[source]
keep_running()[source]
load_pipeline(pipeline_path, context)[source]
max_age_reached()[source]
reboot_gracefully()[source]
schedule_forced_reboot()[source]
select_project(**kwargs)[source]
start()[source]
start_selected_project(**kwargs)[source]
stop_gracefully()[source]
update_project(**kwargs)[source]
update_warrior_hq(**kwargs)[source]
warrior_status()[source]
seesaw.warrior.is_executable(path)[source]
seesaw.warrior.set_file_executable(path)[source]
seesaw.warrior.system_reboot()[source]
seesaw.warrior.system_shutdown()[source]

web Module

The warrior web interface.

class seesaw.web.ApiHandler(application, request, **kwargs)[source]

Bases: seesaw.web_util.BaseWebAdminHandler

Processes API requests.

get(command)[source]
get_template_path()[source]

Override to customize template path for each handler.

By default, we use the template_path application setting. Return None to load templates relative to the calling file.

initialize(warrior=None, runner=None)[source]

Hook for subclass initialization. Called for each request.

A dictionary passed as the third argument of a url spec will be supplied as keyword arguments to initialize().

Example:

class ProfileHandler(RequestHandler):
    def initialize(self, database):
        self.database = database

    def get(self, username):
        ...

app = Application([
    (r'/user/(.*)', ProfileHandler, dict(database=database)),
    ])
post(command)[source]
class seesaw.web.IndexHandler(application, request, **kwargs)[source]

Bases: seesaw.web_util.BaseWebAdminHandler

Shows the index.html.

get()[source]
class seesaw.web.ItemMonitor(item)[source]

Bases: object

Pushes item states and information to the client.

handle_item_cancel(item)[source]
handle_item_complete(item)[source]
handle_item_fail(item)[source]
handle_item_output(item, data)[source]
handle_item_property(item, key, new_value, old_value)[source]
handle_item_task_status(item, task, new_status, old_status)[source]
item_for_broadcast()[source]
item_status()[source]
class seesaw.web.SeesawConnection(session)[source]

Bases: sockjs.tornado.conn.SockJSConnection

A WebSocket server that communicates the state of the warrior.

classmethod broadcast(event, message)[source]

Broadcast a message to one or more clients. Use this method if you want to send the same message to many clients, as it contains several optimizations and will run faster than a loop in your own code.

clients: Clients iterable.

message: Message to send.
classmethod broadcast_bandwidth()[source]
classmethod broadcast_project_refresh()[source]
classmethod broadcast_projects()[source]
classmethod broadcast_timestamp()[source]
clients = set([])
emit(event_name, message)[source]

tornadoio to sockjs adapter.

classmethod handle_broadcast_message(warrior, message)[source]
classmethod handle_finish_item(runner, pipeline, item)[source]
classmethod handle_project_installation_failed(warrior, project, output)[source]
classmethod handle_project_installed(warrior, project, output)[source]
classmethod handle_project_installing(warrior, project)[source]
classmethod handle_project_refresh(warrior, project, runner)[source]
classmethod handle_project_selected(warrior, project)[source]
classmethod handle_projects_loaded(warrior, projects)[source]
classmethod handle_runner_status(runner, status)[source]
classmethod handle_start_item(runner, pipeline, item)[source]
classmethod handle_warrior_status(warrior, new_status)[source]
instance_id = '27192-0.681964'
item_monitors = {}
on_close()[source]

Default on_close handler.

on_message(message)[source]

Default on_message handler. Must be overridden in your application.

on_open(info)[source]

Default on_open() handler.

Override when you need to do some initialization or request validation. If you return False, connection will be rejected.

You can also throw Tornado HTTPError to close connection.

request: ConnectionInfo object which contains the caller's IP address, query string parameters, and cookies associated with this request (if any).
project = None
runner = None
warrior = None
seesaw.web.hash_string(text)[source]

Generate a digest for broadcast message.

seesaw.web.start_runner_server(project, runner, bind_address='localhost', port_number=8001, http_username=None, http_password=None)[source]

Starts a web interface for a manually run pipeline.

Unlike start_warrior_server(), this UI does not contain a configuration or project management panel.

seesaw.web.start_warrior_server(warrior, bind_address='localhost', port_number=8001, http_username=None, http_password=None)[source]

Starts the warrior web interface.

web_util Module

class seesaw.web_util.BaseWebAdminHandler(application, request, **kwargs)[source]

Bases: tornado.web.RequestHandler

prepare()[source]

Called at the beginning of a request before get/post/etc.

Override this method to perform common initialization regardless of the request method.

Asynchronous support: Decorate this method with gen.coroutine or return_future to make it asynchronous (the asynchronous decorator cannot be used on prepare). If this method returns a Future, execution will not proceed until the Future is done.

New in version 3.1: Asynchronous support.