Information Systems Study

Prepared for American Zoo and Aquarium Association
Dave Mausner, director, Business Intelligence

August 23, 2000

Introduction

On June 23, 2000, the American Zoo and Aquarium Association (“AZA”) engaged Braun Consulting, Inc. (“Braun”) to perform a study of the current information systems and procedures in use by AZA membership.

 This engagement was prompted by concerns within AZA that its members’ information management technology ignored recent advances in the software art; that it was restraining staff efficiency and data accuracy; that medical and aquatic specialties were incompletely supported; and that independently-developed solutions threatened the success of data exchange. 

This study focused on the data collection and analytical requirements of several subject areas of collection management:

  • Behavioral observation
  • Husbandry
  • Population management
  • Collection planning
  • Veterinary
  • Aquariums
  • “Colonial” animal phenomena

This study included an analysis of the past, present and future software products and services of the ISIS organization. 

After amalgamating all of the collected information, considering the most likely outcome of the status quo, and incorporating its expertise in large-scale information management, Braun presents its analysis of AZA Information Systems and its recommendations for AZA action. The analysis includes abstracts of several key findings and recommendations.

As requested, Braun presents a scope description and feasibility analysis of a replacement information management technology which solves the identified issues. This includes descriptions of several alternative scenarios. 

As requested, Braun presents an assessment of the future participation of ISIS.

Participants in this study

This study is based on information collected during meetings with these professionals and their institutions.

Persons                   Institution, Location

Paul Sieswerda            Aquarium For Wildlife Cons., NYC
Ruth Allard, MEM          AZA, Silver Spring MD
Brandy Smith              AZA, Silver Spring MD
Tom Meehan                Brookfield Zoo, Chicago
Bob Lacy                  Brookfield Zoo, Chicago
Tom Schneider             Detroit Zoo
Sue Dubois                Disney’s Animal Kingdom, Orlando
Mark Stetter, DVM         Disney’s Animal Kingdom, Orlando
Jill Mellen, PhD          Disney’s Animal Kingdom, Orlando
John Lehnhardt            Disney’s Animal Kingdom, Orlando
Ruth Mazak                Disney’s Animal Kingdom, Orlando
Nate Plesness             ISIS, Minneapolis
Paul Scobe                ISIS, Minneapolis
Mike Kelly                ISIS, Minneapolis
Kim Hastings              ISIS, Minneapolis
Steven Thompson, PhD      Lincoln Park Zoo, Chicago
Joanne Earnhart           Lincoln Park Zoo, Chicago
Robyn Barbiers            Lincoln Park Zoo, Chicago
Karin Schwartz            Milwaukee County Zoo
Kevin Willis              Minnesota State Zoo, Minneapolis
Brian Banks               Monterey Aquarium
Hans Keller               National Aquarium, Baltimore
Richard Lerner            Ocean Journey, Denver
Ingrid Porton             Saint Louis Zoo
Jerry French              Saint Louis Zoo
Jeff Boehm, DVM           Shedd Aquarium, Chicago
Bob Van Valkenburg        Shedd Aquarium, Chicago
Andy Odum                 The Toledo Zoo
Jay Hemdal                The Toledo Zoo
Skip Young                Vancouver Aquarium
Robert Cook, VMD          Wildlife Conservation Society, NYC
Bruce Bohmke              Woodland Park Zoo, Seattle

Abstract of top findings

Data organization discourages correlation  For purely historical reasons, data gathered by different specialists resides in separate databases, managed by independent software applications. Interviews with zoo and aquarium professionals show that their greatest dissatisfaction is the inability to easily relate different specialists’ data to each other, in order to understand the effects of human intervention and enrichment on animal behavior, over time. Staff frequently cited examples of species experts whose knowledge will be lost, eventually, due to their inability to re-create the inferences learned during a lifetime of management or analysis. 

Institutional isolation discourages data conformity  Local observation and data collection standards, when they exist, vary widely. The use and interpretation of observation codes have not produced significant volumes of comparable data. Variations in transmission times of local data to the ISIS collection point, and the infrequent release of the global collection, ensure that every observer sees a different picture of the specimen population. Studbook inaccuracy is the result of deducing relationships from event data of widely-varying quality and consistency.

Over-emphasis on software solutions  Data collection and analysis solutions focus on the development of software. Each major specimen management issue has an associated stack of applications which contain specialized procedures for treating particular data sets. Each solution possesses just the bare minimum of analytical and visualization capabilities. This leads to unproductive data management time to extract, reformat, and import data into external analysis tools. The reliance on software as a bridge between specialties produces a corollary effect: reluctance to reconsider database designs.

Uncoordinated local software solutions  In order to meet immediate needs, institutions with the appropriate resources are undertaking their own software developments. Among AZA members, many will duplicate each other’s efforts, but inevitably neither input nor output formats will be compatible, owing to an absence of standards. Furthermore, the use of non-robust desktop software development tools and amateur software engineers will result in designs which increase unproductive raw-data management time.

Low priority given to data collection  Low expectations of return value on time invested in data collection result in low levels of standards compliance. The use of volunteer observers, a lack of educational resources, and the high work load of registrars compromise significant amounts of historical data. There is considerable disagreement on which data deserve digital storage, and which should be globally- or locally-administered. 

Abstract of top recommendations 

Centralize collection data  AZA should create a central, enterprise-wide, World-Wide-Web-accessible, relational database meeting commercial standards of robustness, security, and flexibility, for use as a benefit of membership. It should include built-in data consistency and accuracy standards, automatically imposed upon all software applications. 

Disengage data from software  AZA members should meet changing administrative and analytical requirements through mediated amendments to the central, operational database design. AZA should encourage the use of off-the-shelf analytical and visualization tools as a replacement for custom software solutions. 

Recognize operational and historical data  AZA should meet the often-cited needs to know current populations; to solve accession and movement data control problems; and to allow for better trend/correlation analysis. To do this, the AZA data model should incorporate both operational, current-status and historical, summarized accumulations of specimen transactions. 

Recognize local and shared data  AZA should establish standards with respect to the residence of data: local data for institutional administration, such as personnel and material inventory, versus shared data for specimen and species inventory and analysis. Clearly, it will continue to be essential to gather both kinds of data. This study mainly addresses the most urgent information management needs based on sharing the specimen inventory and the computable attributes of the various specimen-related subject areas.

Employ automation to improve quality  AZA should encourage members to take advantage of data collection technology’s state of the art, in order to encourage compliance, reduce error, and simplify procedures. AZA should devise a long-term, universal data collection approach, including the use of (for example) standard data entry forms, hand-held computing devices, and enclosure environment monitoring devices.

Data Collection Requirements

This chapter summarizes our findings on data collection within the various subject areas.

Findings 

Data collection is almost universally hand-written. This is widely recognized as a major deficiency, because it is the origin of most of the nonproductive data re-entry performed by registrars and curators. A number of operational aspects of collection management also require data re-entry. 

Data is almost always out-of-date by the time it becomes available to workers seeking universal collection comparison or correlation. This is due to the ISIS data dissemination schedule, and also to the backlog of data entry within institutions. 

Accession and genealogy data are widely believed to be inaccurate due to the decentralized approach to data collection. Loaner and borrower may make conflicting statements about where a specimen resides. This data is subject to varying interpretation in order to produce inventories or even to assess the identity of individuals. 

Family and behavior/accession/event history is recorded in separate databases, leading to inconsistency. Registrar and curator decisions affect the inclusion of data into the universal summary; these decisions are not governed by a universal standard. For example, there is no consensus on the definition or needfulness of:

  • intra-enclosure specimen location;
  • bio-mass consumption in caloric, composition, or mass-units of measure;
  • waste-mass removed from enclosure or water filtration system;
  • aggression typology and criteria of measurement; or
  • training and enrichment response typology and criteria of measurement.

Medical records are extracted from a separate local database for delivery along with the specimen; these records may or may not be in a compatible, machine-readable format for the benefit of the destination staff. These records are generally not accessible while performing a husbandry or collection planning study; staff time must be expended on written surveys or telephone assaults. 

Operational and medical data specifically related to aquatic habitats are collected haphazardly, and are not directly comparable in any universal forum. Some aquariums have taken matters into their own hands, but the cost of programming, and the absence of agreement on data collection standards, have isolated these efforts.

Egg/larva data are collected in various media, but generally are not organized for historical study, nor is parentage consistently recorded. There is no ability to produce a meaningful universal egg inventory. 

There are no universal data collection or accuracy standards to facilitate planning, comparison, or historical study; for example, with respect to:

  • biological life stages,
  • aggressions,
  • sexual characteristics,
  • individual or group behaviors,
  • collection plan characterization and group/individual suitability,
  • training or enrichment response,
  • water treatment assessment,
  • pathology, necropsy, laboratory evaluation, drug administration reports,
  • fish and invertebrate taxa measurements,
  • enclosure characterization, measurement, and population status,
  • group immigrations and emigrations, or
  • accession status while specimen in-transit, quarantined, etc.

Software tool relationships 

Data collection includes a substantial amount of raw data movement and data re-entry due to a lack of true interoperability among the leading software products. In order to recognize the existence of a specimen in an application, it is currently necessary to import the specimen definition from another application. In order to perform custom analysis on any collected data, it is currently necessary to export data sets into flat files, and then import those into analytical tools. 

The current data management situation is illustrated by the following figure.

[Figure: current data management among the leading software products. Application labels: CMS (inventory), ARKS (inventory), MedARKS (veterinary), REGASP (collection plan), SPARKS (genealogy). Flow labels: Raw Data Entry, Upgrade, Data Export, External Tools.]

Generalized data collection requirements 

From the findings it is clear that the first order of business is to establish AZA standards for data collection, definition, and measurement accuracy. AZA must decide which specific subjects the universal database will support. Then, it must separate the universe of possible data points into those which will add the greatest value to the largest proportion of its membership, and those which are, for the present, marginal. 

The greatest urgency is to collect accurate data in several subjects so that they may be compared. The selected subjects should be drawn from those which traditionally have been separated by software walls. The second urgency is to strictly limit the collected data items to the indispensable.

AZA collection procedure standards 

Then, AZA should produce a book of procedures to guide the physical data collection process for major groups of species. This should specify the methods of observation and collection to be used. For example, it could stipulate specific events, interactions, measurements, nomenclatures, and other criteria for data inclusion. It might also include standard forms to be reproduced and used in the field to minimize handwriting and variation among observers. 

Data dictionary 

The next step should be the production of a data dictionary, which may be published in book form, or on the AZA Web site. This dictionary would become the standard used for settling all arguments as to what is collected, and how it is stored, in the universal database. The dictionary might be organized in order to show, for every datum:

  • its official name, for discussion and publication purposes,
  • its internal database name,
  • its internal data format,
  • the limits upon its value,
  • the accuracy standard which governs its measurement,
  • the interactions between this datum and any others,
  • who controls or authorizes its definition,
  • who should collect the datum, and
  • which standard procedure governs its observation and collection.
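A dictionary entry of this kind could be represented as a simple record. The following Python sketch is illustrative only: the class name, field names, and the sample water-temperature entry are assumptions made for this example and do not come from any AZA or ISIS standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DictionaryEntry:
    """One datum in the data dictionary; fields mirror the list above."""
    official_name: str            # name used in discussion and publication
    database_name: str            # internal database name
    data_format: str              # internal data format, e.g. "REAL" or "TEXT"
    minimum: Optional[float]      # lower limit on the value, if any
    maximum: Optional[float]      # upper limit on the value, if any
    accuracy: str                 # accuracy standard governing measurement
    related_data: tuple           # other data this datum interacts with
    authority: str                # who controls or authorizes the definition
    collector: str                # who should collect the datum
    procedure: str                # standard procedure governing collection

    def accepts(self, value: float) -> bool:
        """Return True if value lies within the entry's stated limits."""
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


# Hypothetical entry: every value below is invented for illustration.
water_temperature = DictionaryEntry(
    official_name="Enclosure water temperature",
    database_name="water_temp_c",
    data_format="REAL",
    minimum=-2.0,
    maximum=40.0,
    accuracy="to the nearest 0.5 degrees Celsius",
    related_data=("enclosure_id", "observation_time"),
    authority="hypothetical standards committee",
    collector="aquarist",
    procedure="hypothetical procedure W-1",
)
```

With such a record, a data entry form could call `water_temperature.accepts(24.5)` before storing a reading, so that the limits published in the dictionary are the same limits enforced at entry.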

Single data entry 

The primary data collection system should be a user interface to the universal database. Changes made to this database should be available instantly to all authorized users. If effective procedures and a data dictionary are available, tedious review of standardized field notes can be eliminated. 

Accession, collection planning, survival planning 

Conflicting specimen status, in-transit “limbo”, and manual surveys of suitable collection candidates could be eliminated by using centralized user interfaces and generalized data query capabilities. AZA procedures, together with a compatibly-designed data-entry form, enable certainty as to the current location and owner of a specimen or group.

Inventory 

On-site, off-site, regional, and universal inventories of specimens will be instantaneously updated as changes occur in the database. This is the most requested and beneficial aspect of a replacement system. An analogy to databases such as airline seat reservations is instructive, since both supply large user communities with centralized access control, instant updating, concurrent inquiry, and security. 

Veterinary 

Medical status of every group or individual will be instantly updated as changes occur in the database. Special security provisions may be applied to preserve the anonymity of institutions, when reporting certain classes of medical notes. 

Overall satisfaction with the comprehensive data collection capabilities of MedARKS is high. This suggests that it should serve as a model for inclusion in the AZA system.

Husbandry 

Parentage and progeny will be recorded in the same database for individuals and groups. AZA standards will enable tracking individuals which immigrate and emigrate from groups, and tracking groups which divide and recombine. Sufficient data elements will be retained to deduce statistical levels of in-breeding and probability of lineage from specific ancestors. 
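To illustrate why recorded parentage is sufficient to deduce in-breeding levels, the sketch below applies the textbook recursive kinship calculation to a small pedigree. It is not a description of how SPARKS or any existing product computes these values; the pedigree, the identifiers, and the function names are hypothetical, and unknown parents are assumed to be unrelated founders.

```python
from functools import lru_cache

# Hypothetical pedigree: specimen id -> (sire id, dam id); None = unknown parent.
PEDIGREE = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("C", "D"),   # offspring of full siblings C and D
}


def parents(ident):
    """Return (sire, dam); anyone not in the pedigree is treated as a founder."""
    return PEDIGREE.get(ident, (None, None))


@lru_cache(maxsize=None)
def depth(ident):
    """Generation depth: founders are 0, offspring are one more than the deeper parent."""
    if ident is None:
        return -1
    sire, dam = parents(ident)
    return 1 + max(depth(sire), depth(dam))


@lru_cache(maxsize=None)
def kinship(a, b):
    """Probability that alleles drawn at random from a and b are identical by descent."""
    if a is None or b is None:
        return 0.0
    if a == b:
        sire, dam = parents(a)
        return 0.5 * (1.0 + kinship(sire, dam))
    # Expand the deeper individual, which cannot be an ancestor of the other.
    if depth(a) < depth(b):
        a, b = b, a
    sire, dam = parents(a)
    return 0.5 * (kinship(sire, b) + kinship(dam, b))


def inbreeding(ident):
    """Inbreeding coefficient of an individual = kinship of its parents."""
    sire, dam = parents(ident)
    return kinship(sire, dam)
```

For this pedigree, `inbreeding("E")` evaluates to 0.25, the expected value for the offspring of full siblings. The point of the sketch is that the calculation needs nothing beyond consistently recorded sire and dam identities.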

Imagery 

Provision must be made for uploading and retaining digital imagery of specimens, identity markers, enclosures, or medical procedures. 

Unshared-local data collection 

This study divides the world of data collection into shared-universal and unshared-local hemispheres. We claim that this strategy solves certain classes of management, accession, and analysis issues; but there is also a practical need to reduce the scope of an initial replacement system development so that it can be designed and deployed in a reasonable amount of time, at an acceptable cost. What is to be done with data which does not meet the criteria for sharing in a universal database? 

Braun makes no claim with respect to such data, owing to the huge number of local administrative requirements. If data are not sharable because they are of strictly-local interest, not standardized, or not related to animal management, then there is little incentive to pay the cost of making such data sharable. For example, some participants suggested keeping shared inventories of consumable goods and foodstuffs; costs of materials; and budgetary and tax information. 

Braun distinguishes (further on) between computable and non-computable data types. There is no direct relationship between the computability of data and its likelihood to be sharable.

Information Analysis Requirements 

This chapter summarizes our findings on information analysis as a function of zoo and aquarium administration. 

Findings 

There is little analytical reporting or statistical processing available to professionals using the most common zoo and aquarium applications. Almost all of the visualization capabilities present today pertain to operational status details. A few specialized products can produce analytical output. SPARKS can produce some population-age tree graphics, for example. 

The standard, packaged reports that professionals use today tend to be sparsely formatted, character-cell output better suited to the impact printers of previous decades than to today’s inkjet and video technologies. Because they use page space inefficiently, detailed specimen inventory and medical history reports tend to become huge documents.

The most common application software offers only awkward means of customizing those reports. It permits the user to create data views based on certain criteria, but it does not give complete freedom to express complex data selection criteria in a general way, nor to mix and match databases used by different applications.

Specialists in the various subject-areas repeated a call for reporting of notion X sorted by notion Y, with selection criterion Z. Every possible combination came up for discussion in the interview sessions. Some of the key requests are summarized below. 

Keepers / trainers / animal husbandry professionals 

  • Specimen inventory by individual, location, group, taxon, egg clutch, contraception history.
  • Operational status by taxon, age, breeding success, geographic location.
  • Collection plan by surpluses, taxon, location, enclosure availability.
  • Egg inventory by taxon, date, parents.
  • Comparisons of local and global behavioral/life-cycle events by taxon, age.

Collection planners 

  • Behavior observation by taxon, enclosure, social group, inter-species group, medical history.
  • Breeding success by enclosure, population density, medical history, behavior observations.
  • Medical contra-indications by taxon by collection plan.

Population management professionals 

  • Taxon distribution by institution, geographic region.
  • Operational status by births today, deaths today, progeny by age.
  • Inventory by taxon, life stage, gestation stage.
  • History of accession and enclosure movement by taxon, group, individual compared to breeding success.
  • Comparison of mortality, birth rate to geographic location, medical history, nutrition history, water condition.

Species survival programmers 

  • Inventory by kinship, probability of relationship by taxon, group, individual.
  • Comparison of mortality, birth rate to geographic location, kinship.

Aquatic specialists 

  • Inventory by taxon, group, individual by accession, nutrition, enclosure, medical indication, age, water treatment.
  • Trend analysis of water quality by taxon, group, individual by medical indications, breeding success, mortality, photoperiod.
  • History of accession and enclosure movement by taxon, group, individual by disease outbreak, drug application, nutrition.
  • Surplus inventory by universal collection plans.

Veterinarians 

  • Pathology, necropsy, microbiology, drug application, lab report summaries by individual history.
  • Comparisons of drug or procedure effectiveness by taxon, drug, procedure.
  • Comparison of medical history to location climatic or water chemistry conditions.
  • Reporting of non-accessioned (fieldwork) procedures.

Generalized reporting requirements 

From the findings we can see that the most frequent request is that the walls between application databases should be pulled down. Most interviewees also voiced a need for more generalized reporting capabilities, to enable the creation of comparisons when inspiration strikes. From comments such as these, we can establish some high-level analytical requirements. 

Managed queries

The most successful strategy to meet the vast majority of perceived and future needs is the use of a managed query application. This is a generalized, graphical user interface to a relational database, which invites the user to select from a list of data items. The user can build a report or visualization by specifying any data items as column values or selection criteria. 

The query management application devises a relational database query behind the scenes, and returns the results to the user. The system user does not need to know how to program, nor even the exact details of how the data is stored in the database. As a result, the user has complete freedom of selection over the entire spectrum of zoo and aquarium data. Moreover, the user can specify any set of data item values as selection criteria, for eliminating unwanted detail or reducing the scope of an investigation. 
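As a rough illustration of what happens behind the scenes, the following Python sketch turns a user's picks into a parameterized SQL query against an in-memory SQLite database. The table, the column names, the catalogue of data items, and the sample rows are all assumptions invented for this example; a commercial managed query product would be far more capable.

```python
import sqlite3

# Hypothetical catalogue: friendly data-item name -> column of a hypothetical table.
DATA_ITEMS = {
    "Taxon": "taxon",
    "Institution": "institution",
    "Age (years)": "age_years",
    "Sex": "sex",
}


def build_query(columns, criteria):
    """Turn picked data items and {item: value} criteria into SQL plus parameters.

    Only names present in DATA_ITEMS are accepted (others raise KeyError),
    and values are passed as parameters rather than pasted into the SQL.
    """
    select = ", ".join(DATA_ITEMS[name] for name in columns)
    sql = f"SELECT {select} FROM specimen"
    params = []
    if criteria:
        clauses = []
        for name, value in criteria.items():
            clauses.append(f"{DATA_ITEMS[name]} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    sql += f" ORDER BY {select}"
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE specimen (taxon TEXT, institution TEXT, age_years REAL, sex TEXT)"
)
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?, ?, ?)",
    [
        ("Panthera leo", "Zoo A", 4.0, "F"),
        ("Panthera leo", "Zoo B", 7.5, "M"),
        ("Gorilla gorilla", "Zoo A", 12.0, "F"),
    ],
)

# The user picks two columns and one selection criterion from lists.
sql, params = build_query(["Institution", "Age (years)"], {"Taxon": "Panthera leo"})
rows = conn.execute(sql, params).fetchall()
```

The user in this sketch never sees SQL or table names; the catalogue of friendly names is the only vocabulary needed.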

It is possible to create managed query environments which highlight the detailed data items most-frequently used by specialists. Each group of professionals could use its own “view” of the database for convenient access to concepts and measures unique to their discipline. 

Correlate anything 

The database must place all subject areas within logical reach of each other, so that investigators may correlate any item with any other, at will. The ability to quickly discover functional relationships is a key to understanding practical interactions. 

This can be accomplished by centralizing all subject areas; using common identities for individuals and groups within the entire database; using standard descriptive indicators for behaviors and medical indications; and by employing a data model which separates every specimen event of interest into a separate, computable data category. 
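The value of common identities can be shown with a small sketch. Here, behavioral and medical records held in separate tables are brought side by side through a shared specimen identifier, using Python and an in-memory SQLite database. All table names, column names, and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE specimen (specimen_id TEXT PRIMARY KEY, taxon TEXT);
    CREATE TABLE behavior_observation (specimen_id TEXT, behavior TEXT);
    CREATE TABLE medical_event (specimen_id TEXT, indication TEXT);

    INSERT INTO specimen VALUES ('S1', 'Pan troglodytes'), ('S2', 'Pan troglodytes');
    INSERT INTO behavior_observation VALUES
        ('S1', 'aggression'), ('S1', 'aggression'), ('S2', 'grooming');
    INSERT INTO medical_event VALUES ('S1', 'laceration'), ('S2', 'dental');
    """
)

# Because both subject areas use the same specimen_id and standard indicators,
# one query can place behavioral and medical counts next to each other.
rows = conn.execute(
    """
    SELECT s.specimen_id,
           (SELECT COUNT(*) FROM behavior_observation b
             WHERE b.specimen_id = s.specimen_id
               AND b.behavior = 'aggression') AS aggressions,
           (SELECT COUNT(*) FROM medical_event m
             WHERE m.specimen_id = s.specimen_id
               AND m.indication = 'laceration') AS lacerations
      FROM specimen s
     ORDER BY s.specimen_id
    """
).fetchall()
```

Each row pairs a specimen's aggression count with its laceration count, a comparison that requires export and re-import when the two subject areas live in separate application databases.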

Analytical measurements 

The database interface should permit the user to create sums, averages, counts, standard deviations, variances, or other statistical indications, at will, over any range of comparable data items. Furthermore, it should permit users to create and experiment with ad-hoc formulae using any combination of numerical data or measurements, employing conventional arithmetic, trigonometric, or logarithmic operations.
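As a minimal sketch of the measurements described, the following Python fragment computes the named statistics over one range of comparable values and applies an ad-hoc logarithmic formula. The weights are invented sample data, and the variance and standard deviation shown are the sample (n - 1) forms, which is an assumption about what a user would want.

```python
import math
import statistics

# Hypothetical body weights, in kilograms, for one group of specimens.
weights_kg = [4.0, 5.0, 6.0, 5.0]

summary = {
    "count": len(weights_kg),
    "sum": sum(weights_kg),
    "mean": statistics.mean(weights_kg),
    "variance": statistics.variance(weights_kg),   # sample variance (n - 1)
    "stdev": statistics.stdev(weights_kg),         # sample standard deviation
}

# An ad-hoc formula a user might experiment with: base-10 logarithm of each weight.
log_weights = [math.log10(w) for w in weights_kg]
```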

User formatting 

The database users must be able to create report and visualization formats to suit their taste or specific needs. For example, it should be possible to create denser inventory reports that use paper real-estate better, reducing the physical bulk of such output. It should also be possible to use color and fonts to create reports suitable for presentation and publication. 

Visual trend analysis: bars, lines, scatters. 

Most professionals requested a means to understand trends. This is most easily accomplished by graphical visualization of data using a variety of formats widely available in database analytical applications. It should not be necessary to export data sets into other applications. Built-in reporting capabilities should include basic visualization in the form of circle, bar, and line charts and scattergrams. 

Data extraction in common formats 

Even if the basic database interface can supply correlations, measurements, formatting, and basic visualizations, there may still be a need to move data into other utilities for processing. The database interface must permit the user to export data sets, derived through the managed query process, into common data formats. 

Some reasonable formats to consider include comma-delimited text, Excel spreadsheets, and dBASE files.
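Of these formats, comma-delimited text is the simplest to produce. The sketch below, using Python's standard `csv` module, writes a query result to comma-delimited text; the function name, column headings, and rows are hypothetical.

```python
import csv
import io


def export_csv(header, rows):
    """Render a query result as comma-delimited text, quoting fields where needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical result set from a managed query.
text = export_csv(
    ["taxon", "institution", "count"],
    [
        ("Panthera leo", "Zoo A", 3),
        ("Spheniscus demersus", "Aquarium B, East", 12),
    ],
)
```

Note that the institution name containing a comma is quoted automatically, so the file can still be read back into three columns by a spreadsheet.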

Specialized Requirements 

This chapter describes some requirements which are particular to the AZA membership’s information management environment. 

Specialized technical observations and vocabularies 

Behavioral observation, environmental enrichment results, training results, and medical-veterinary notation received similar reviews in the interview sessions. Professionals agreed that recording animal behavior is essential to long-term understanding of individuals and species, but that using observations with statistical rigor is virtually impossible. Without rigor, comparisons and evaluations become meaningless, and data entry is devalued. 

Most attempts at rigor thus far have followed the lead of ARKS event coding. The two most frequently cited problems with codes are that they are too general, because their qualifying attributes are non-computable, ad-lib text fields; and that they are too vague in their functional definition, inviting misinterpretation and misapplication.

The other case of observation pertains to veterinary pathology and necropsy notes. One practitioner described these as the DVM’s thought-process. Although valuable as a research resource, and to some degree searchable, the notes are rarely coded consistently with respect to spelling or selection of synonymous terms. Hence, lab notes are not computable, either. 

By computable, we mean that a data field can be treated either as a number, upon which a calculation, such as summation or average, can be performed; or as a textual token, bearing a particular meaning in all contexts, which can be compared or counted in order to derive the frequency of an observable notion. 

Computable observations have the very important characteristic that they can be searched for and reliably used as record selection criteria. Professionals generally trust a database retrieval if they sense that searches for a notion will return all instances thereof. This is not currently possible as, for example, a search for “laceration” will miss “wound”. 

While it’s true that a code is a textual token, it does not possess an obvious meaning in human language, leading to its deficiencies. And while ad-lib notes do possess meaning, they are not tokens, hence, not computable. 

The following suggested solutions to these issues may incur research and development costs. 

Greater database complexity 

Observational coding can be avoided by rationalizing each code event as a special kind of transaction, carrying its own set of attributes. Each transaction will be stored in a database table containing only that transaction type. This model will greatly increase the number of database structures, but also will eliminate the deficiencies of coding. Since every transaction attribute is a controlled database field, all such transaction data becomes computable and searchable. This forces all observation events to be rigorous. 
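A sketch of one such transaction table follows, written in Python against an in-memory SQLite database. The event type, its attributes, and the permitted values are assumptions chosen for illustration; the actual set would be defined by AZA standards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per transaction type; every attribute is a controlled field.
conn.execute(
    """
    CREATE TABLE aggression_event (
        specimen_id TEXT NOT NULL,
        observed_on TEXT NOT NULL,
        target_id   TEXT NOT NULL,
        kind        TEXT NOT NULL CHECK (kind IN ('display', 'chase', 'contact')),
        injury      INTEGER NOT NULL CHECK (injury IN (0, 1))
    )
    """
)
conn.executemany(
    "INSERT INTO aggression_event VALUES (?, ?, ?, ?, ?)",
    [
        ("S1", "2000-08-01", "S2", "chase", 0),
        ("S1", "2000-08-02", "S2", "contact", 1),
        ("S3", "2000-08-02", "S1", "chase", 0),
    ],
)

# Controlled fields are computable: frequencies can be counted directly.
counts = conn.execute(
    "SELECT kind, COUNT(*) FROM aggression_event GROUP BY kind ORDER BY kind"
).fetchall()

# An ad-lib description is refused by the database rather than stored.
try:
    conn.execute(
        "INSERT INTO aggression_event VALUES (?, ?, ?, ?, ?)",
        ("S1", "2000-08-03", "S2", "was a bit rough", 0),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The sketch also makes the cost visible: adding a new event type means adding a new table definition of this kind, which is the administrative burden discussed next.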

The deficiency in this solution is the complexity of creating new events. An administrator must review the suggested event, then change the database to contain a new structure corresponding to the new event, with its computable fields. Moreover, he or she must create the corresponding data entry form. 

It may be feasible to automate some or all of this process, the cost of which is unknown. AZA could restrain the initial cost by developing a standard, reduced set of observation events which all institutions agree to track rigorously in the global database; all other events would then be regarded as local administration data.

Restricted vocabularies and thesauri 

The vagaries of ad-lib note-taking can be avoided by restricting the vocabulary, and even the grammar, of notes, to a specialized dictionary. A very limited solution has been implemented as a series of drop-down menus. This is very laborious to use when creating lengthy descriptions. Another vision of this solution would call for an auto-correction feature which rejects typed words not present in a dictionary, and replaces valid words with an approved synonym found in a thesaurus. Since every observation would be auto-corrected into a series of standard, iconic notions, all observations become computable and searchable. This forces all observation notes to be rigorous. 
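The auto-correction idea can be sketched in a few lines of Python. The approved vocabulary and thesaurus below are tiny, invented examples, and the function name is hypothetical; a real vocabulary would be developed by the professionals of each specialty.

```python
# Hypothetical approved vocabulary and thesaurus for one specialty.
APPROVED = {"laceration", "left", "right", "forelimb", "hindlimb", "swelling"}
THESAURUS = {
    "wound": "laceration",
    "cut": "laceration",
    "gash": "laceration",
    "edema": "swelling",
}


def normalize(note):
    """Map each typed word to its approved term; collect words that are rejected."""
    accepted, rejected = [], []
    for word in note.lower().split():
        word = word.strip(".,;:")
        if not word:
            continue
        term = THESAURUS.get(word, word)   # replace synonyms with the approved term
        if term in APPROVED:
            accepted.append(term)
        else:
            rejected.append(word)          # not in the dictionary: refuse it
    return accepted, rejected


accepted, rejected = normalize("Wound on left forelimb, slight edema")
```

Here the note is reduced to the tokens laceration, left, forelimb, and swelling, while "on" and "slight" are rejected. A later search for “laceration” would therefore find this observation even though the observer typed “wound”.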

The deficiency of this vision is in selecting the vocabulary. Since professionals use notes to express their thoughts, it may be distressing to see their every other written word auto-corrected into the approved terms of a robot. Some human testing would be required to determine whether the solution is survivable. 

Nevertheless, this technology is available today in the leading desktop word processor. The cost of obtaining, setting up, and deploying this technology is unknown. AZA could restrain the cost by inviting professionals to develop standard, limited vocabularies for each specialty, which would be recorded rigorously in the global database; other, ad-lib notations would be regarded as local administration or non-computable global database fields. 

Automated data collection 

An unusual problem of AZA data collection is that behavioral observations or special notations tend to produce too much raw data. Generally, a curator or registrar filters out comments of local interest before transcribing data to ARKS. This double-entry procedure is time-consuming, and produces a double opportunity for loss of useable data. At the same time, it is all too easy for volunteer observers to bypass important matters that professionals may be expecting to study. 

Braun recommends that AZA seriously consider hand-held devices with programmable data entry forms. Although perhaps still too expensive for every institution, the prices of suitable devices are plummeting. 

Palm-tops have some desirable characteristics. The tiny forms solicit yes-no or pick-list data, eliminating handwriting and spelling as variables. The subject matter of the forms regulates the content of the data collection to just those items deemed worthy of digital recording. By following a simple procedure, the observer can be directed to collect exactly those items pertaining to daily rounds or special studies. This encourages the widespread collection of comparable, computable data. 

An AZA task force might recommend a series of standard forms for various taxon group observers, assuring that member data collections are of comparable levels of detail. 

The major disadvantage is that the observer will not be invited to collect random commentary in the digital medium. One outcome might be a failure to collect a warning sign or other unanticipated event. A backup medium such as handwritten notes might still be required, although before long it is possible that hand-held devices will possess a digital voice recording capability as well.

Vision of AZA System 

This chapter describes the scope of a replacement for the current software patchwork.

General 

The AZA system will be centralized; that is, all global data transactions will occur at the same database. It will consist of an information management Web server and a relational database server. A single logical database will contain all data structures which support the AZA Web applications. This includes data item descriptions and validation rules, which govern the content of individual fields; and data integrity rules, which govern the relationships between tables (equivalent to ARKS files). 
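The difference between validation rules and integrity rules can be sketched with Python's built-in sqlite3 module. SQLite merely stands in here for whatever database server is eventually chosen, and the table names, column names, and permitted values are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE institution (
    institution_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE
);
CREATE TABLE specimen (
    specimen_id    INTEGER PRIMARY KEY,
    -- Integrity rule: every specimen must belong to a known institution.
    institution_id INTEGER NOT NULL REFERENCES institution(institution_id),
    -- Validation rules: constrain the content of individual fields.
    sex            TEXT NOT NULL CHECK (sex IN ('M', 'F', 'U')),
    weight_kg      REAL CHECK (weight_kg IS NULL OR weight_kg > 0)
);
""")


def try_insert(params):
    """Attempt one specimen insert and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO specimen (institution_id, sex, weight_kg) VALUES (?, ?, ?)",
                params,
            )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


with conn:
    conn.execute("INSERT INTO institution (institution_id, name) VALUES (1, 'Example Zoo')")

results = [
    try_insert((1, "F", 4.2)),   # satisfies every rule
    try_insert((1, "X", 4.2)),   # violates the validation rule on sex
    try_insert((99, "M", 4.2)),  # violates the integrity rule (unknown institution)
]
```

Because the rules live in the database rather than in any one application, every program that writes to it is held to the same standard.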

Entry to the applications will be by way of a hyperlink from the AZA Web site. The browser will demand authentication by institution name, user name, and password before granting access to the applications. 

The Web applications will consist of data entry forms and data analysis portals. The forms will be used for entry and review of raw data, much as is done today with such programs as ARKS. The portals will be used for accessing preformatted, standard reports, or for creating ad-hoc managed query reports. 

Relational database management system 

The AZA system will employ a robust, industrial-strength, multi-user-capable, relational database management system. This has the important characteristic that any data item in any table can be compared with any item in any other table, in order to find a correlation or other relationship between subjects. There are no restrictions inherent to this principle – behavioral, medical, enrichment, lifecycle, accession, enclosure, and ancestry, for example, could be simultaneously compared in order to perform a multi-dimensional analysis. 
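As a small illustration of comparing items across tables, the sketch below joins enclosure records with medical records to count diagnoses per enclosure. It uses Python's built-in sqlite3 module as a stand-in; the two tables, their columns, and the sample rows are invented for the example and do not come from ARKS or MEDARKS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enclosure_history (specimen_id INTEGER, enclosure TEXT);
CREATE TABLE medical_event     (specimen_id INTEGER, diagnosis TEXT);

INSERT INTO enclosure_history VALUES
    (1, 'North pool'), (2, 'North pool'), (3, 'South pool');
INSERT INTO medical_event VALUES
    (1, 'skin lesion'), (2, 'skin lesion'), (3, 'routine exam');
""")

# Relate two subject areas through the shared specimen identifier.
rows = conn.execute("""
    SELECT e.enclosure, m.diagnosis, COUNT(*) AS cases
    FROM enclosure_history AS e
    JOIN medical_event AS m ON m.specimen_id = e.specimen_id
    GROUP BY e.enclosure, m.diagnosis
    ORDER BY e.enclosure, m.diagnosis
""").fetchall()

for enclosure, diagnosis, cases in rows:
    print(enclosure, diagnosis, cases)
```

The same join pattern extends to further tables (behavior, ancestry, enrichment) as long as they share a common specimen key.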

A wisely-chosen, forward-looking, relational data model is essential to success. The characteristics of animal observation and planning will change little in years to come. A data model which properly reflects those characteristics will survive new computer technologies, whereas software frequently does not. 

The relational database will be the back-end of the AZA system; the user will not access it directly. 

Transactional, operational, and historical data models 

The AZA database will comprise distinct models: transactional, operational, and historical. These designs serve different purposes: 

The transactional portion will receive the raw data changes as they are performed by institutions. It is organized specifically to enhance the performance, security, and self-validation of data entry and updating. A transaction in this context corresponds to any change in the state of a specimen; what ARKS calls an event. The database will contain every kind of data description which represents an animal, its ownership, location, measurement, identification, gender, program suitability, etc. – as of right now. 

The operational portion will contain one or more subject-oriented data tables which record the current state of animal operations, in a more compact form, more suitable to analysis, visualization, and statistical evaluation. It is organized specifically to enhance its performance during query and retrieval. Transactions arising from data entry will be automatically reformatted and transcribed to the operational database. Some transaction descriptions may be omitted from the operational database for brevity, or if there is no regular need to perform analysis upon them. The database is used to obtain fast snapshots of the current state of the global population. 

The historical portion, called the warehouse, will contain one or more subject-oriented data tables which record the summarized, previous state of operations. It is also organized specifically to enhance query performance. Because it is all about discovering trends and correlations, the warehouse generally sacrifices some details, and automatically summarizes some values. 

For example, the transaction portion might record every new local identity given a specimen which is frequently loaned; the operational portion might record this specimen only by its internal database number; the warehouse might summarize certain numerical facts about its species, but the individual might not appear as such. This is not a limitation; it is designed to promote simplicity of reference to the specific levels of detail needed to perform various kinds of data analysis. Those levels will be selected during detailed analysis and design of the physical solution. 

In any event, all of the data models will be accessible all the time – for special queries, transactional, operational, and historical detail will be mixed and matched to produce the exact results required with minimal computation. 
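The relationship between the three portions can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the Event fields, the sample data, and the two derivation functions are hypothetical, and the real levels of detail would be chosen during detailed design.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One transactional record: a change in the state of a specimen."""
    specimen_id: int
    species: str
    date: str      # ISO format, so string order equals chronological order
    kind: str      # e.g. "accession" or "transfer"
    location: str


# Transactional portion: every raw change, as entered.
events = [
    Event(1, "Panthera leo", "1998-03-01", "accession", "Zoo A"),
    Event(1, "Panthera leo", "1999-07-15", "transfer", "Zoo B"),
    Event(2, "Panthera leo", "1999-02-10", "accession", "Zoo A"),
]


def operational_snapshot(events):
    """Operational portion: current location per specimen (latest event wins)."""
    current = {}
    for ev in sorted(events, key=lambda e: e.date):
        current[ev.specimen_id] = ev.location
    return current


def warehouse_summary(events):
    """Historical portion: event counts per species, year, and kind.

    Individual specimens no longer appear; only summarized values remain.
    """
    return Counter((ev.species, ev.date[:4], ev.kind) for ev in events)


snapshot = operational_snapshot(events)
summary = warehouse_summary(events)
```

Here the snapshot answers "where is each animal now", while the summary answers "how many accessions and transfers occurred per species per year".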

Raw data entry and review forms 

The AZA system will employ custom Web-based forms for data entry; the forms will communicate with the relational transactional database. Whereas some rudimentary validation will be performed in the form, such as field navigation and picking values from list-boxes, all complex field validation or data integrity rules will be enforced by the relational database server. Violations will be reported on the Web form. 

In general, the same forms will also be used for examining raw data records in a query-by-example mode. In this mode, when values are entered in any fields, the form can be directed to perform a query restricted to records having the same values in the same table fields. Hence, any field which can be changed can also be used as a retrieval condition. 
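Query-by-example can be sketched as follows: whichever fields the user fills in become equality conditions, and empty fields are ignored. The table, its columns, and the function name are assumptions for illustration, and Python's built-in sqlite3 module stands in for the eventual database server.

```python
import sqlite3

# Hypothetical list of form fields that may serve as retrieval conditions.
QUERYABLE_FIELDS = ("species", "sex", "enclosure")


def query_by_example(conn, example):
    """Return specimen rows matching every non-empty field in `example`."""
    filled = {k: v for k, v in example.items() if v not in (None, "")}
    unknown = set(filled) - set(QUERYABLE_FIELDS)
    if unknown:
        raise ValueError(f"not a form field: {sorted(unknown)}")
    sql = "SELECT specimen_id, species, sex, enclosure FROM specimen"
    if filled:
        # Field names come from the fixed list above; values are bound as parameters.
        sql += " WHERE " + " AND ".join(f"{name} = ?" for name in filled)
    sql += " ORDER BY specimen_id"
    return conn.execute(sql, list(filled.values())).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE specimen (specimen_id INTEGER PRIMARY KEY,
                       species TEXT, sex TEXT, enclosure TEXT);
INSERT INTO specimen VALUES
    (1, 'Panthera leo', 'F', 'Savanna'),
    (2, 'Panthera leo', 'M', 'Savanna'),
    (3, 'Spheniscus demersus', 'F', 'Penguin Coast');
""")

lions = query_by_example(conn, {"species": "Panthera leo", "sex": "", "enclosure": None})
females = query_by_example(conn, {"sex": "F"})
everything = query_by_example(conn, {})
```

An empty form retrieves every record, which matches the idea that the entry form doubles as a review form.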

There will be one or more Web forms for each subject area that requires data entry; equivalent in purpose to the various ARKS or MEDARKS displays. Access to the several forms will be limited according to the security grants associated with the user’s authentication. 

The Web forms will be the front-end of the AZA system; the user interacts directly with them. 

Data analysis portals 

The AZA system will employ reporting and analysis portals. These are Web page hyperlinks which lead directly to pre-formatted reports, which may be viewed or printed; or to a managed-query interface, which may be used dynamically to create custom reports involving any combination of fields in the database. 

Pre-formatted reports will be designed and programmed during Web site development. They will run on a pre-selected schedule, so that their contents will be automatically updated. Reports could be devised which mimic current ARKS, SPARKS, and MEDARKS reports, for users who prefer those formats. 

Managed-query reports are dynamic. The user will interact with a generalized interface which presents classes of database fields, derived from the operational database or the historical warehouse, from which may be selected just those items of interest. The user will drag and drop them into a report creation window. A report generator on the database server will convert the user’s graphical representation of the report into a relational query. Upon the completion of that query, it will display the data on the Web browser. This process may be repeated until the user is satisfied with the ad-hoc report. 

The managed-query report generator is the middle tier of the AZA system; it interacts with both the user and the database server.
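The conversion from a user's field selection to a relational query might look like the sketch below. The field catalog, the table name specimen_current, and the function are hypothetical, and Python's built-in sqlite3 module stands in for the database server; a commercial report generator would do considerably more.

```python
import sqlite3

# Hypothetical catalog: user-facing field labels mapped to columns of an
# assumed operational table named specimen_current.
FIELD_CATALOG = {"Species": "species", "Sex": "sex", "Enclosure": "enclosure"}


def build_report_query(selected_labels, count_rows=False):
    """Translate the user's field selection into a SQL SELECT statement.

    Labels not in the catalog raise KeyError, so only known columns
    ever reach the generated query.
    """
    if not selected_labels:
        raise ValueError("select at least one field")
    column_list = ", ".join(FIELD_CATALOG[label] for label in selected_labels)
    if count_rows:
        return (
            f"SELECT {column_list}, COUNT(*) AS specimens "
            f"FROM specimen_current GROUP BY {column_list} ORDER BY {column_list}"
        )
    return f"SELECT {column_list} FROM specimen_current ORDER BY {column_list}"


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE specimen_current (specimen_id INTEGER PRIMARY KEY,
                               species TEXT, sex TEXT, enclosure TEXT);
INSERT INTO specimen_current (species, sex, enclosure) VALUES
    ('Panthera leo', 'F', 'Savanna'),
    ('Panthera leo', 'F', 'Savanna'),
    ('Panthera leo', 'M', 'Savanna');
""")

report_sql = build_report_query(["Species", "Sex"], count_rows=True)
report_rows = conn.execute(report_sql).fetchall()
```

The user never sees the generated SQL; the browser shows only the resulting rows, and the selection can be revised and rerun.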

Development and Deployment Scenarios 

This chapter evaluates several conventional development and deployment scenarios. 

Development options 

The selection of the development method affects the time between project initiation and the first visible signs of the finished product, and thus, the morale of the user community. This discussion isolates the major methods which could be applied to an AZA system development. Please note that combinations of these methods in various degrees are also possible. 

Big Bang development 

This method assumes that every possible user requirement and design specification has been collected. Then, the development team locks itself into a room and works until it is done. The sponsor sees the entire system, as delivered, at one time. 

Although it may sound laughable, this method is frequently employed in the development of complex software products with long development times and many programmers; the work is commonly shipped off-shore where communication may be difficult, but fee rates are very low. 

The advantage is that the sponsor can reap a huge reduction in development time and cost. The disadvantages are that the specifications must be extremely accurate, leaving nothing to chance; and some fine-grained project control is lost. 

This method should not be used if the specifications evolve during development, or if there is a desire for sponsor review of developing interfaces. The interruptions for communication tend to put the work behind schedule. 

Rapid Application development 

This method assumes that the sponsor takes an active role in the development and will be closely involved in evolving and approving low-level portions of the work. The developers produce a series of quick prototypes, each of which is reviewed, and whose specifications can be tuned by the sponsor. Each successive prototype cycle includes a greater portion of the desired functionality. At some point, the sponsor concludes the cycle and moves the work to production. 

This method is frequently employed in the development of data entry applications having a large number of user interface forms, in which the final look and feel needs adjustment. It’s also used when the sponsor has a good general idea of how the final product should behave, but does not possess detailed specifications. 

The advantage of this method is that the sponsor is frequently made aware of all visible progress of development. This has a major, positive effect on everyone’s enthusiasm. The disadvantage is that there’s a tendency to tweak the specifications past the point of adequacy to the task, greatly increasing cost. 

Parallel development 

This method can be used in addition to either of the above. The sponsor identifies a number of independent components of the overall system design, for example, medical and behavioral; these become separate development tracks. At some point, enough of each track is completed to warrant an integration test in which both tracks are permitted to interact. 

This method is frequently employed in large projects when a definite deployment date must be met. A larger-than-normal development staff is assigned to the various tracks. When the components are truly independent, little communication between groups is necessary. Thus, the extra staff can reduce the overall development time. However, it stands to reason that the overall cost will be no less than for single-track development. 

The database on day one 

The AZA database will require data, which has to come from somewhere. One frequent issue in the interview sessions concerned the disagreement between studbooks, global ISIS data, and local ARKS data. The fundamental problem is that the facts in the databases often do not agree with facts on the ground. Therefore, AZA must confront the initial data scenario. There are several possibilities. 

Start from zero 

When starting from zero, we mean that the database is opened for business without any specimen inventory at all. By agreement, all participants survey their collections and perform a comprehensive data entry. Moreover, they will have to examine their records to locate and enter parentage lines, behavior logs, collection plans, enclosure/tank histories, accession histories, group affiliations, and so on, as far back as necessary. 

The advantage of this undertaking is that the database will be clean. Every animal and group known to exist by an institution will be recorded by that institution; the facts on the ground will be represented by the data. The built-in data validation rules would be activated to catch data errors upon entry. 

The disadvantages are that it is a lot of hard work to perform the initial data entry; and the typing skill of the operator will have a major effect on the outcome. There is a low likelihood that staff everywhere will have sufficient free time to perform data entry. Another option is to temporarily employ local data entry specialists to transcribe written records. Of course, someone will have to produce those records. 

Start from ISIS, add artificial intelligence 

In this scenario, we will transform the latest ISIS databases into the AZA database model, applying programmatic techniques to correct as much missing or incorrect data as possible. This is not to imply that ISIS hasn’t already performed much of this itself; there may be cases in which additional, external data sources can be applied in order to make corrections. For example, local ARKS or SPARKS data sets may contain needed facts. 

The advantage of this method is that the data on day one will be no worse than the ISIS databases, and may be better in certain subject areas. AZA members will, at least, be free to view their specimen inventory on-line and make corrections. The built-in data validation rules would be activated to catch data errors upon entry. 

The disadvantage is that after all is said and done, no amount of computer intelligence will re-create certain data where there is none. Some omissions are forever. 

Start from ISIS, supplement with survey 

This scenario begins like the previous one. Instead of attempting automated corrections, we assume that manual inspection and correction will be the best way to corroborate the facts. Instead of a comprehensive survey, staff can carry new inventory lists on their rounds, to perform a snap survey in the course of their duties, in order to make only obvious corrections. 

The advantage of this method is the same as the previous one. The disadvantage is the need to inspect the inventory lists and perform corrections. Once again, the availability or willingness of staff to undertake the data entry is a limitation of all scenarios. 

Deployment options 

It is hard to separate development from deployment, since both require advance planning, and there can be a strong relationship between them. The success of the AZA system depends on how the user community is initially exposed to it. Note well: for all scenarios, it is essential that AZA supply a help desk staffed for the entire range of time zones in which users will have system access. 

All-sites rollout 

Similar to a Big Bang, on a certain date, the system is turned on and end-users are welcomed to connect to it. This kind of rollout assumes that comprehensive training has taken place beforehand, that on-line help is constantly at hand, or that the system is transparently simple to operate. 

The advantage of this scenario is that it is simple and quick to unleash. The problem is that user backlash in the form of questions or problems can completely overwhelm the service and support side of the system. This condition occurs rather frequently in industry, as when, for example, a new billing system mails a completely new, confusing format to befuddled customers. 

The choice of the all-sites method depends on the complexity of the system and the sophistication (and tolerance) of its users. For example, Web crawlers and Web information gateways frequently change their layouts and functions, with no warning and very little user confusion, apparently because such Web pages are self-explanatory; and the expectation is that commercial sites change to suit corporate self-image. 

It is unlikely that this scenario would please AZA members. 

Bellwether rollout 

This method opens the new system to selected power users or certain progressive, bellwether institutions. They are initially prepared for the experience with training and perhaps even exposure to the system during development. Production usage begins with this small subset of AZA membership, so that human engineering faults can be caught, assessed, and repaired without swamping the support staff or disenchanting the entire community. 

The advantage of this scenario is that it ramps up system usage and simultaneously overcomes glitches, so that by the time all members are on-line, no major issues exist. Its disadvantage is that it can somewhat lengthen the development-test cycle, while the user community is gradually enlarged. 

Another issue is that it complicates the selection of “database day one” strategies. If AZA rolls out a complete system to the entire membership, it can adopt the start from zero or start from pure ISIS approaches, because member use will rapidly increase the data accuracy. But if AZA rolls out only to bellwether members, then the entire database should initially be totally up-to-date, or, one might argue, it is of no great use to those early adopters. 

Hosting options 

As mentioned above, a central AZA system will require centralized administration, maintenance, and support. However, it matters little where each of those functions is located, provided that they are on call when the system is in use. 

There are several basic functions that must be supplied. 

  • Physical location. The system consists of some computing, storage, and network gear, which must be physically housed in a specialized environment. It must be redundant, in order to survive ordinary hardware problems; hardened, to survive local weather conditions; powered to survive interruptions in the electrical grid; connected to the Internet with enough bandwidth to meet peak user demands; and secure against theft, vandalism, and hacking.
  • Physical backup. The system is all about data, and the data must be periodically, systematically copied to another secure medium in case an act of God destroys the entire local infrastructure. The backup medium can be used to swiftly restore the system to operation.
  • Administration. The system must be secure against casual or methodical hacking. This will require a security administrator to oversee the creation of accounts and passwords which grant entry to the application and the back-end database. The administrator might feasibly be responsible for evaluating requests for system enhancement, and periodically engaging services for system modification.
  • Maintenance. The system will be built out of many cooperative software products supplied by a number of vendors. Corrections to those underlying products may be required from time to time in order to meet support agreements. Staff to install these corrections must be available. Furthermore, administrative decisions to enhance the system will require maintenance activity to perform modifications in the database or the front-end applications.
  • Support. The user community will need a source of answers when their data entry or analysis needs cannot be met, or result in confusion. The number of people involved in direct-to-user support must be calculated from the maximum number of users during reasonable working hours across the country. AZA must consider toll-free phone with voice-mail, fax, e-mail, and Web-based “chat” technologies. AZA must establish support responsiveness guidelines for user satisfaction.

How might AZA supply these features?

Self-hosting

By far, the most expensive and risky approach is for AZA to host its own system. If AZA has no current experience with highly interactive Web hosting, then this may not be a reasonable choice. Every staff position and hardware type described above should be considered a minimum roster, and all would need to be hired, leased, or purchased for full-time operation. AZA would require the resources to fund all of those new staff positions. 

Although it is possible to contract for turn-key installation of a complete hardware and software environment, that option depends on AZA possessing the necessary real estate in which to house it. 

The lease cost for the telecommunications infrastructure is impressive. For a user community based upon 150-200 locations, nothing less than a T1-equivalent could be considered; its cost is around $2,000 per month (±30%). This does not include the lease cost of office space or Internet interface electronics, such as the firewall. 
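For budgeting purposes, the quoted estimate can be expanded into a monthly and annual range. The short Python calculation below uses only the $2,000 figure and the 30 percent band stated above; the twelve-month annualization is a simple assumption that ignores installation charges and price changes.

```python
MONTHLY_ESTIMATE_USD = 2000  # T1-equivalent figure quoted in the text
TOLERANCE_PERCENT = 30       # the stated uncertainty band

# Integer arithmetic keeps the dollar amounts exact.
low_monthly = MONTHLY_ESTIMATE_USD * (100 - TOLERANCE_PERCENT) // 100
high_monthly = MONTHLY_ESTIMATE_USD * (100 + TOLERANCE_PERCENT) // 100

low_annual = 12 * low_monthly
high_annual = 12 * high_monthly

print(f"Monthly: ${low_monthly:,} to ${high_monthly:,}")
print(f"Annual:  ${low_annual:,} to ${high_annual:,}")
```

That is, roughly $1,400 to $2,600 per month, or $16,800 to $31,200 per year, for the connection alone.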

The main advantage of self-hosting is total control of the environment to suit AZA’s exact specifications. 

Remote hosting 

A lower-cost option involves the leasing of space in a hosting service facility. These operators supply secure, hardened building space, electronics racks, a specified amount of un-interruptible power, a specified amount of Internet connection bandwidth, and a specified frequency of storage backup to secure offsite media. They incorporate service guarantees to meet specific levels of responsiveness to disaster scenarios. 

They do not supply security administration, software maintenance, or application support. Anything that cannot be automated, or is not a hardware function, is the lessee’s (AZA’s) responsibility. They do not sell hardware, although they may impose regulations on what rack size or power consumption levels they can accommodate. 

The basic infrastructure package can be leased for $3,000 to $30,000 per month, depending on service options. This does not include the cost of AZA-owned hardware installed in the remote location. 

The main advantage of remote hosting is that the infrastructure development is left to experts, whose cost is shared by their other customers. 

Conclusion 

Zoo management and data analysis are fundamentally different from commerce, but information management works essentially the same way for everyone. Each interaction between service and client – in this case, Animalia – throws off data of many kinds. Time, location, identity, and value indication are common to all such interactions. The trick is to capture just enough appropriate detail to help the service improve the next interaction.

That real-time collection management and data analysis will be conducted with Internet technologies is a given; it is a technological imperative. AZA must define a data domain which will supply its members with the maximum, near-horizon benefit; this is an informational imperative. 

The greatest benefit will be derived from data which is well defined, widely understood, assiduously updated, and easily accessed. Technology will supply the access; members will supply the updates; AZA must supply the data definitions and facilitate understanding. These requirements are the foundation of an information delivery system which allows professionals to convert raw data into historical insight. 

Within a short time, curators, veterinarians, and keepers will be able to feed back historical insight directly to planners. As the data come full-circle, technology’s promise will be realized.