MESA is a prototype web-based database solution for the large volumes of initial data generated by microarray analysis. It was developed by the Bioinformatics Shared Resource at the Burnham Institute for Medical Research on the Torrey Pines Mesa, San Diego.
MESA is designed to follow the initial workflow of microarray analysis:
1. Manage raw data and images
2. Enable searching and slicing of data
3. Provide initial data filtering and normalization
4. Export data in predefined formats for use in existing analysis software
The design is based on NCBI’s Gene Expression Omnibus (GEO) software, and we have adopted some of its terminology: http://www.ncbi.nlm.nih.gov/geo/
Platform: A list of elements (target IDs) that may be detected and quantified in a microarray experiment (e.g., cDNAs, oligonucleotide probesets).
Currently the software is compatible with some Illumina platforms (e.g., Sentrix Human-6 BeadChip).
Sample: A Sample record describes the conditions under which an individual Sample was handled, the manipulations it underwent, and the abundance measurement of each element derived from it. Each Sample record is assigned a unique ID and must reference exactly one Platform.
Series: A Series ID links together a group of related Samples.
A Sample Record Contains:
Annotation Data (e.g., ID, series, platform, description, date, contact name): These fields are used for searching and retrieving data.
Raw Data: A table with one row per target ID, holding its signal and quality values. (The currently supported platform contains about 47,000 rows of data.) The table can be exported in predefined formats.
TargetID AVG_Signal Detection-value
GI_10047089-S 71.0 0.11865524
GI_10047091-S 5957.3 1.00000000
GI_10047093-S 581.4 0.99670402
GI_10047099-S 351.1 0.99077126
GI_10047103-S 2012.8 1.00000000
GI_10047105-S 141.2 0.88595913
GI_10047121-S 82.0 0.34541859
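As a rough illustration of how such a raw data table could be read and given an initial filter, here is a minimal Python sketch. It is not MESA's own code (MESA uses Perl gizmos). The names `ProbeRow`, `parse_raw_table` and `filter_by_detection` are hypothetical, and the 0.99 detection cutoff is an assumed example value, not a setting taken from MESA.

```python
from typing import List, NamedTuple

# The example rows shown above, as whitespace-separated text.
SAMPLE_TABLE = """\
TargetID AVG_Signal Detection-value
GI_10047089-S 71.0 0.11865524
GI_10047091-S 5957.3 1.00000000
GI_10047093-S 581.4 0.99670402
GI_10047099-S 351.1 0.99077126
GI_10047103-S 2012.8 1.00000000
GI_10047105-S 141.2 0.88595913
GI_10047121-S 82.0 0.34541859
"""


class ProbeRow(NamedTuple):
    target_id: str
    avg_signal: float
    detection: float


def parse_raw_table(text: str) -> List[ProbeRow]:
    """Parse a whitespace-separated table with the three columns shown above."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    if header != ["TargetID", "AVG_Signal", "Detection-value"]:
        raise ValueError("unexpected header: %r" % (header,))
    rows = []
    for ln in lines[1:]:
        target_id, signal, detection = ln.split()
        rows.append(ProbeRow(target_id, float(signal), float(detection)))
    return rows


def filter_by_detection(rows: List[ProbeRow], min_detection: float = 0.99) -> List[ProbeRow]:
    """Keep rows whose detection value is at least min_detection (assumed cutoff)."""
    return [r for r in rows if r.detection >= min_detection]


rows = parse_raw_table(SAMPLE_TABLE)
kept = filter_by_detection(rows)
print(len(rows), "rows parsed;", len(kept), "pass the assumed cutoff")
```

With the seven example rows, four have a detection value of at least 0.99 and would be kept under this assumed cutoff.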
Data Files (e.g., cell/microarray images): These need to be archived for future reference.
Currently the system exports data in raw/GCT (Gene Cluster Text file) format.
The GCT format can be used as input for widely used analysis software, including:
Mike Eisen’s Clustering Software
MATISSE - Integrated Analysis of Functional Modules and High-Throughput Data
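For readers unfamiliar with GCT, the sketch below writes a small expression table in the tab-delimited GCT version 1.2 layout: a `#1.2` line, a line with the row and column counts, a header line beginning with `Name` and `Description`, then one line per target ID. It illustrates the file layout only and is not MESA's export code. The function name `to_gct`, the sample ID `Sample_1` and the `na` description placeholder are assumptions.

```python
from typing import Dict, List, Optional


def to_gct(sample_ids: List[str],
           signals: Dict[str, List[float]],
           descriptions: Optional[Dict[str, str]] = None) -> str:
    """Render signals (target ID -> one value per sample) as GCT v1.2 text."""
    descriptions = descriptions or {}
    lines = ["#1.2", "%d\t%d" % (len(signals), len(sample_ids))]
    lines.append("\t".join(["Name", "Description"] + list(sample_ids)))
    for target_id, values in signals.items():
        if len(values) != len(sample_ids):
            raise ValueError("row %s has %d values, expected %d"
                             % (target_id, len(values), len(sample_ids)))
        # "na" is an assumed placeholder when no description is available.
        desc = descriptions.get(target_id, "na")
        lines.append("\t".join([target_id, desc] + [str(v) for v in values]))
    return "\n".join(lines) + "\n"


# Two of the example rows shown earlier, for one hypothetical sample.
gct_text = to_gct(
    ["Sample_1"],
    {"GI_10047089-S": [71.0], "GI_10047091-S": [5957.3]},
)
print(gct_text, end="")
```

Only the signal values go into the GCT matrix. Detection values would have to be applied as a filter before export or exported separately.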
Uses Metadot, an open source portal software that can be installed in one click and is easy to customize.
MySQL database backend.
Perl plug-in scripts (called GIZMOs).
Runs on Windows or Linux machines.
Sample Annotation Table – Searchable fields describing each sample.
Platform Table – Each platform has a blob of target IDs. (A BLOB is a binary large object that can hold a variable amount of data. Our current platforms have about 47,000 target IDs.)
Chip Data Table – Each sample has a blob of data containing signal and p-values for each target ID. The blob is retrieved and parsed after a user requests the data.
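The three-table layout can be sketched as follows. This is a guess at the general shape, not MESA's actual schema. It uses Python's built-in `sqlite3` as a stand-in for the MySQL backend. All table and column names are hypothetical, and the blob encoding (newline-separated rows, tab-separated values, UTF-8) is an assumption, since the real encoding is not described here.

```python
import sqlite3

SCHEMA = """
CREATE TABLE platform (
    platform_id TEXT PRIMARY KEY,
    target_ids  BLOB NOT NULL          -- one target ID per line
);
CREATE TABLE sample_annotation (
    sample_id    TEXT PRIMARY KEY,
    series_id    TEXT,
    platform_id  TEXT NOT NULL REFERENCES platform(platform_id),
    description  TEXT,
    sample_date  TEXT,
    contact_name TEXT
);
CREATE TABLE chip_data (
    sample_id TEXT PRIMARY KEY REFERENCES sample_annotation(sample_id),
    data      BLOB NOT NULL            -- "signal<TAB>detection" per line
);
"""


def pack_values(pairs):
    """Encode (signal, detection) pairs into one blob (assumed text encoding)."""
    return "\n".join("%r\t%r" % (s, d) for s, d in pairs).encode("utf-8")


def unpack_values(blob):
    """Decode a blob written by pack_values back into (signal, detection) pairs."""
    return [tuple(float(x) for x in line.split("\t"))
            for line in blob.decode("utf-8").splitlines()]


def load_sample(conn, sample_id):
    """Fetch both blobs for one sample and pair each target ID with its values."""
    row = conn.execute(
        "SELECT p.target_ids, c.data "
        "FROM chip_data c "
        "JOIN sample_annotation a ON a.sample_id = c.sample_id "
        "JOIN platform p ON p.platform_id = a.platform_id "
        "WHERE c.sample_id = ?",
        (sample_id,),
    ).fetchone()
    if row is None:
        raise KeyError(sample_id)
    target_ids = row[0].decode("utf-8").split("\n")
    return [(t, s, d) for t, (s, d) in zip(target_ids, unpack_values(row[1]))]
```

Storing one blob per sample keeps the searchable annotation table small. The cost is that individual target IDs cannot be queried in SQL, so the blob has to be parsed after retrieval, as described above.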
INSTALLATION & ACCESS
MESA can be installed on a local server in a microarray laboratory, or on a central server.
The server is accessed via any web browser using personal login accounts with access levels assigned by an administrator.
Install the Metadot package: One click installs Apache, MySQL and the Metadot content management system.
Metadot is available free of charge at: http://www.metadot.com/
Install the Perl script: Copy it to the Gizmo directory, restart the server and use the ‘Manage’ menu to add the gizmo to the system.
Start adding data into the system.
INSERTING DATA INTO THE DATABASE:
Annotation Data and file upload/update for each sample may be done manually using a specific form.
Raw data in the ‘BeadStudio’ format may be uploaded in batches using the ‘Upload’ menu.
Data may NOT be deleted from the database by a regular user.
SEARCHING FOR DATA:
Data may be retrieved by searching on zero or more fields of the annotation data.
When no field is selected, all data is retrieved.
Free-text fields use the SQL ‘LIKE’ operator, so data can be queried without typing the exact phrase. This is a good way to retrieve mistyped entries.
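The search behaviour described above can be sketched as a small query builder. This is an illustration only and not the gizmo's actual code. The table name `sample_annotation`, the field names and the split between exact-match and free-text fields are all assumptions. The `?` placeholders follow the style used by Perl DBI and by Python's `sqlite3`; other database drivers may use a different placeholder syntax.

```python
# Hypothetical field names, loosely based on the annotation fields listed earlier.
EXACT_FIELDS = {"sample_id", "series_id", "platform_id", "sample_date"}
FREE_TEXT_FIELDS = {"description", "contact_name"}


def build_search(criteria):
    """Build a parameterized SELECT from zero or more field -> value criteria."""
    clauses, params = [], []
    for field, value in sorted(criteria.items()):
        if field in FREE_TEXT_FIELDS:
            # Substring match: the user does not need to type the exact phrase.
            clauses.append("%s LIKE ?" % field)
            params.append("%" + value + "%")
        elif field in EXACT_FIELDS:
            clauses.append("%s = ?" % field)
            params.append(value)
        else:
            raise ValueError("unknown search field: %s" % field)
    sql = "SELECT * FROM sample_annotation"
    if clauses:
        # With no criteria the WHERE clause is omitted, so all rows are returned.
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


print(build_search({}))
print(build_search({"description": "liver", "series_id": "SER1"}))
```

Field names are checked against a fixed list and values are passed as parameters, so user input is never spliced into the SQL text.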
Future versions should add more user-defined functionality.
Search Results Screen
Update Sample Annotation