User Guide

Before You Start

You must be connected to your cluster using its IP and port.

  • Local deployments: run kubectl get svc pachyderm-proxy to find the IP and port.
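
If you prefer to script this lookup, the sketch below pulls candidate host:port pairs out of the JSON printed by kubectl get svc pachyderm-proxy -o json. The function name service_endpoints and the sample values are hypothetical, and which address actually reaches your cluster depends on your deployment, so treat the result as a starting point rather than a definitive answer.

```python
import json


def service_endpoints(service_json):
    """Return "host:port" strings found in a Kubernetes Service JSON document."""
    svc = json.loads(service_json)
    spec = svc.get("spec", {})
    # Prefer an external load balancer address when one has been assigned.
    ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    hosts = [entry.get("ip") or entry.get("hostname") for entry in ingress]
    hosts = [host for host in hosts if host]
    # Assumption: fall back to the cluster IP, which is usually reachable
    # only from inside the cluster.
    if not hosts and spec.get("clusterIP"):
        hosts = [spec["clusterIP"]]
    return [f"{host}:{port['port']}" for host in hosts for port in spec.get("ports", [])]


# Made-up Service document standing in for the real kubectl output.
sample = json.dumps({
    "spec": {"clusterIP": "10.96.0.20", "ports": [{"port": 80}]},
    "status": {"loadBalancer": {"ingress": [{"hostname": "localhost"}]}},
})
print(service_endpoints(sample))
```

To use it against a real cluster, pass the output of the kubectl command above (with -o json added) to service_endpoints instead of the sample document.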

How to Explore Resources

Select a Project + Repo

  1. Navigate to the Pachyderm Mount > Explore tab.
  2. Select a project/repo combination from the first dropdown.

At this point, you should see a corresponding folder populate in the /pfs/ directory in the file browser.

Switch Between Repo Branches

  1. Navigate to the Pachyderm Mount > Explore tab.
  2. Select the second dropdown and choose one of the existing branches (e.g., master, main, dev, staging).

Explore Directories & Files

  1. Navigate to the Pachyderm Mount > Explore tab.
  2. Select a project/repo combination from the first dropdown.
  3. Scroll to the /pfs/ file browser to view the contents of your repository.
    • These repositories are read-only.
    • File formats that are supported by JupyterLab can be viewed by double-clicking the file.
    • Files and directories can be downloaded to the current working directory (CWD) by right-clicking the file or directory and selecting Download. This can be useful for testing your code against your Pachyderm data.
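
The mounted repositories can also be inspected from a notebook cell. The sketch below lists every file under the mount, assuming the mount is visible to the notebook at /pfs (the directory shown in the file browser); the helper name list_mounted_files is hypothetical. It only reads, which is consistent with the repositories being read-only.

```python
from pathlib import Path


def list_mounted_files(mount_root="/pfs"):
    """Return sorted paths, relative to mount_root, of all files below it."""
    root = Path(mount_root)
    if not root.is_dir():
        # Nothing is mounted (or the path differs in your environment).
        return []
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )


for name in list_mounted_files():
    print(name)
```

If your environment exposes the mount somewhere else, pass that directory as mount_root.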

How to Create Resources

Create a Repo & Repo Branch

  1. Open the JupyterLab UI.
  2. Open a Terminal from the launcher.
  3. Input the following:
    pachctl create repo demo
    pachctl create branch demo@master

You can now switch to your project’s demo repo and master branch in the Pachyderm Mount > Explore tab.

Create a Pipeline

Mount Project & Repo

Before we start defining the user code of our pipeline, we should mount the project and repo that we want to work with.

  1. Open the JupyterLab Mount Extension UI.
  2. Navigate to the Pachyderm Mount > Explore tab.
  3. Select a project/repo combination from the first dropdown.
  4. Select a branch from the second dropdown.

Define Input Spec & Load Datums

Now that we have mounted our project and repo, we can define the input spec for our pipeline. This enables us to:

  • Leverage certain input patterns such as a cross, union, or join.
  • Target specific datums in our repository based on a glob pattern.

  1. Navigate to the Pachyderm Mount > Test tab.
  2. Review the default input spec:
    pfs:
       repo: demo
       branch: master
       glob: /*
  3. Update the input spec to match the datums you wish to focus on. For example:
    # Each file or directory directly under /images/2022
    pfs:
       name: default_demo_master
       repo: demo
       glob: /images/2022/*

    # Every .png file at any depth under /images
    pfs:
       name: default_demo_master
       repo: demo
       branch: master
       glob: /images/**.png

    # Every combination of datums from two repos
    cross:
       - pfs:
          name: default_test-data
          repo: test-data
          glob: /*
       - pfs:
          name: default_train-model
          repo: test-model
          glob: /
  4. Select Load Datums.
  5. Traverse the file browser to view the datums that match your glob pattern.
  6. Iterate through this process until you have a glob pattern that matches the datums you wish to focus on.
  7. Select Download Datums to download the datums to your local machine. This makes them available to your notebook.
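
To build intuition for how a glob pattern narrows the datums, and how a cross input pairs them, here is a simplified Python sketch. It is an approximation and not Pachyderm's own matcher: it treats * as matching within one path segment and ** as matching across segments, and it only looks at file paths (it ignores the case where a whole directory becomes one datum). The function names and the file listing are made up for illustration.

```python
import re
from itertools import product


def glob_to_regex(pattern):
    """Translate a glob to a regex: '**' crosses '/', '*' stays in one segment."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def select_paths(paths, pattern):
    """Keep only the paths that the whole pattern matches."""
    regex = glob_to_regex(pattern)
    return [path for path in paths if regex.fullmatch(path)]


def cross(left, right):
    """Pair every datum on the left with every datum on the right."""
    return list(product(left, right))


# Made-up repository listing.
files = [
    "/images/2022/a.png",
    "/images/2022/b.jpg",
    "/images/2021/c.png",
    "/readme.md",
]
print(select_paths(files, "/images/2022/*"))
print(select_paths(files, "/images/**.png"))
print(len(cross(select_paths(files, "/images/2022/*"), ["/model.bin"])))
```

Loading datums in the Test tab remains the reliable check; the sketch is only meant to help you reason about a pattern before you try it.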

Define User Code

  1. Launch a new notebook.
  2. Define the user code for your pipeline.
  3. Run the code to ensure it works as expected.
  4. Iterate to refine your code as needed.
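
As a starting point for steps 2 and 3, the sketch below shows one possible shape for user code: a function that reads files from an input directory and writes a result to an output directory. The function name write_manifest and the manifest format are hypothetical. The paths in the usage note are assumptions too: /pfs/demo is the mounted demo repo from earlier in this guide, and /pfs/out is the conventional output directory inside a running Pachyderm pipeline.

```python
from pathlib import Path


def write_manifest(input_dir, output_dir):
    """Write manifest.txt listing each input file and its size in bytes."""
    src, dst = Path(input_dir), Path(output_dir)
    dst.mkdir(parents=True, exist_ok=True)
    lines = [
        f"{path.relative_to(src).as_posix()}\t{path.stat().st_size}"
        for path in sorted(src.rglob("*"))
        if path.is_file()
    ]
    manifest = dst / "manifest.txt"
    # One tab-separated line per file; empty input gives an empty manifest.
    manifest.write_text("\n".join(lines) + ("\n" if lines else ""))
    return manifest


# Example call (not run here):
# write_manifest("/pfs/demo", "/pfs/out")
```

While iterating in a notebook, point output_dir at a scratch directory, since the mounted repositories are read-only. Keeping the directories as parameters lets the same function run against the mount now and against the pipeline's own paths later.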

Publish a Pipeline

  1. Navigate to the Pachyderm Mount > Publish tab.
  2. Provide or validate inputs for all of the following:
    • Pipeline Name: The name of your pipeline.
    • Pipeline Project Name: The project where your pipeline will be created.
    • Container Image Name: The container image that will be used to run your pipeline.
    • Requirements File: The path to a requirements file that will be used to install dependencies in your container.
    • External Files: Any external files that you want to include in your pipeline.
    • Port: The port that your pipeline will run on.
    • Pipeline Input Spec: The input spec that you defined earlier in the Test tab.
    • GPU Mode: Whether or not your pipeline requires a GPU.
  3. Select Run.

You should see your pipeline appear and begin to run in the Console UI.
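
For orientation, several of the Publish fields map onto fields of a hand-written Pachyderm pipeline spec. The sketch below assembles such a spec as a Python dictionary and prints it as JSON. It is an analogue under stated assumptions, not the spec the extension generates: the helper build_pipeline_spec, the pipeline name, the project name, the image, and the command are all hypothetical values.

```python
import json


def build_pipeline_spec(name, project, image, repo, branch, glob, cmd):
    """Assemble a minimal pipeline spec with a single pfs input."""
    return {
        "pipeline": {"name": name, "project": {"name": project}},
        "transform": {"image": image, "cmd": list(cmd)},
        "input": {"pfs": {"repo": repo, "branch": branch, "glob": glob}},
    }


spec = build_pipeline_spec(
    name="demo-pipeline",             # hypothetical pipeline name
    project="default",                # hypothetical project name
    image="python:3.11",              # hypothetical container image
    repo="demo",                      # repo created earlier in this guide
    branch="master",
    glob="/*",                        # default input spec glob
    cmd=["python3", "/app/main.py"],  # hypothetical entry point
)
print(json.dumps(spec, indent=2))
```

Comparing this output with what appears in the Console UI after you select Run can help you confirm that the name, project, image, and input spec were picked up as intended.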