Skip to content

Inputs


master.yaml

overview
Refchef uses YAML files that are composed of nested entry and value pairs -- for example, the entry and value pair common_name: yeast. The spacing and indentation of the entries and values are meaningful - Refchef uses the convention of using 2 spaces to indent each subsequent level of the entries and values in the YAML and a : and space are between each entry and value. Some entries in the yaml will have a preceeding - and a space before them (such as - component: and the commands under the commands header), which are required for Refchef to properly process the YAML.

See the master.yaml file specifications for more information.

Example master.yaml before processing:

S_cerevisiae:
  metadata:
    name: S_cerevisiae
    common_name: yeast
    ncbi_taxon_id: 4932
    organism: Saccharomyces cerevisiae
    organization: ensembl
    custom: no
    description: corresponds to genbank id GCA_000146045.2
    downloader: joselynn wallace
    ensembl_release_number: 87
    accession:
      genbank:
      refseq:
  levels:
    references:
    - component: primary
      complete:
        status: false
      commands:
      - wget ftp://ftp.ensembl.org/pub/release-87/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz
      - wget ftp://ftp.ensembl.org/pub/release-87/fasta/saccharomyces_cerevisiae/dna/CHECKSUMS
      - md5 *.gz > postdownload-checksums.md5
      - gunzip *.gz
      - md5 *.* > final_checksums.md5

The string of text entered in the key field (S_cerevisiae in the above example) will be used to create a folder inside the directory you specify as your output in your config file (cfg.ini or cfg.yaml) or refchef-cook arguments. In the previous quickstart example, we used /Users/jwalla12/references as the output directory for refchef-cook. Here is the collapsed file tree that refchef created, note that the folder containing the primary reference is nested inside a folder named S_cerevisiae based on the key.

./Users/jwalla12/references #this directory is specified in refchef-cook or the config files
└── S_cerevisiae
    ├── bowtie2_index
    ├── bwa_index
    ├── gtf
    └── primary

master.yaml metadata
The metadata section of master.yaml contains information about the references, including the organism name, taxon_id, etc.

Caution

When running a new YAML file to add additional information to a primary reference, metadata entries present in the initial master.yaml file can be omitted (for example, ncbi_taxon_id:, common_name:). When adding indices or annotations to a primary reference already in master.yaml, the metadata in master.yaml will be overwritten by the metadata in the new.yaml file. This could be helpful in situations where you want to update the metadata fields.

master.yaml levels
The levels section contains higher level information about the references, including when they were downloaded and the exact commands used to download and process the references.

Caution

The entry status must be set to false for Refchef to exeecute the commands in the code block. If it is set to true, the code will not execute (even if the -e flag is set). After a code block is executed, the false flag will flip to true automatically and the time: entry will appear under the status header. The time: header will be populated with the datetime stamp the reference was downloaded.

master.yaml commands
This portion of the master.yaml should be populated with the specific commands you want to execute to download and process your reference. Each command should be prepended with a - and a space.

Caution

Each time files are processed using a set of commands in the YAML, the last command must run md5 on all of the files and direct the output to a file called final_checksums.md5.


cfg.yaml

overview
Refchef requires configuration information, which can be passed as arguments or specified in a configuration file. A cfg.yaml is one option for configuration and should contain the following fields. Also indicated below: If filling out the field is required, their expected format, and a brief description of their contents.

See the cfg.yaml file specifications for more information.

example:

config-yaml:
  path-settings:
    reference-directory: /Users/jwalla12/references
    git-directory: /Users/jwalla12/remote_references
    remote-repository: jrwallace/remote_references
  log-settings:
    log: 'yes'

cfg.ini

overview
Refchef requires configuration information, which can be passed as arguments or specified in a configuration file. A cfg.ini is one option for configuration and should contain the following fields. Also indicated below: If filling out the field is required, their expected format, and a brief description of their contents.

See the cfg.ini file specifications for more information.

example:

[path-settings]
reference-directory=/Users/jwalla12/references
git-directory=/Users/jwalla12/remote_references
remote-repository=jrwallace/remote_references
[log-settings]
log=yes
[runtime-settings]
break-on-error=yes
verbose=yes