Developing new algorithms on large-scale applications can be overwhelming. Starting with small data subsets is essential: it lets us learn about our data quickly before scaling up the analysis.
Before engaging with data manipulation, we need a place to host it. You
can use build_project()
to set up a standardized folder
structure together with a README file describing the purpose of each folder.
This structure mirrors the infrastructure of the MAS database. The more
members of the lab use it, the easier it will be to exchange data and
code.
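A minimal sketch of this first step (the argument name is an assumption; check the function's help page for the actual interface):

```r
# Hypothetical call: the path argument is an assumption,
# see ?build_project for the real signature.
build_project("my_new_analysis")

# Expected result: a standardized folder tree under
# "my_new_analysis/", each folder with a README on its purpose.
```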
Before integrating new, messy data into your project, you might want to look into the MAS data catalog. This registry offers an overview of each dataset in the MAS database and guides us to existing documentation with details on how the data were produced.
build_sandbox()
helps us extract a subset of the MAS
database that follows its standards. This way, we
can test our algorithms locally and later migrate that code into
the main database to scale up our analysis
(click
here to learn about system variables and relative paths).
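As a hedged sketch, extracting a sandbox might look like this (both argument names are assumptions, not the documented interface; see ?build_sandbox):

```r
# Hypothetical arguments: which datasets to extract and where to
# place the local copy -- check ?build_sandbox for the real ones.
build_sandbox(
  datasets = c("plots", "trees"),  # assumed selector for MAS datasets
  path     = "my_new_analysis"     # assumed destination folder
)

# The extracted subset keeps the MAS standards, so code developed
# here can later run unchanged against the parent database.
```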
Once we have our sandbox, we can use compile_metadata()
to
register all collected datasets. This function creates a portable flat
database identical to the one used in the parent infrastructure. With this
database, every MAS tool works in the sandbox just as it would in
the parent database.
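The last step could then be sketched as follows (again, the argument name is an assumption; consult ?compile_metadata):

```r
# Hypothetical call: registers every dataset found in the sandbox
# and writes the portable flat database alongside them.
compile_metadata("my_new_analysis")

# From here on, MAS tools pointed at "my_new_analysis" should behave
# as they would against the parent database.
```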