Difference between revisions of "Metadata"

From Marspedia
 
{{pp-move-indef|small=yes}}
Metadata is information about [[Information Infrastructure|information]], embedded in data files.
{{For|the page on metadata about Wikipedia| Wikipedia:Metadata}}
 
The term '''metadata''' is ambiguous: it is used for two fundamentally different concepts ([http://en.wikipedia.org/wiki/#Metadata types|types]). Although the expression "data about data" is often used, it does not apply to both in the same way. Structural metadata, the design and specification of data structures, cannot be about data, because at design time the application contains no data. In this case the correct description would be "data about the containers of data". Descriptive metadata, on the other hand, is about individual instances of application data, the data content. In this case, a useful description (resulting in a disambiguating [http://en.wikipedia.org/wiki/neologism]) would be "data about data content" or "content about content", hence [http://en.wikipedia.org/wiki/Meta Content Framework|metacontent]. Descriptive metadata, guide metadata and the [http://en.wikipedia.org/wiki/National Information Standards Organization] concept of administrative metadata are all subtypes of metacontent.
 
  
Metadata (metacontent) is traditionally found in the [http://en.wikipedia.org/wiki/library catalog|card catalogs] of [http://en.wikipedia.org/wiki/library|libraries]. As information has become increasingly digital, metadata is also used to describe digital data using [http://en.wikipedia.org/wiki/metadata standards] specific to a particular discipline. By describing the [http://en.wikipedia.org/wiki/Content (media)|contents] and [http://en.wikipedia.org/wiki/Context (computing)|context] of [http://en.wikipedia.org/wiki/computer file|data files], the quality of the original data/files is greatly increased. For example, a [http://en.wikipedia.org/wiki/webpage] may include metadata specifying what language it's written in, what tools were used to create it, and where to go for more on the subject, allowing browsers to automatically improve the experience of users.
An article on this topic exists at Wikipedia, [http://en.wikipedia.org/wiki/Metadata '''Metadata''']
  
== Definition ==
{{Subminimal}}
Metadata (metacontent) is defined as data providing information about one or more aspects of the data, such as:
 
* Means of creation of the data
 
* Purpose of the data
 
* Time and date of creation
 
* Creator or author of data
 
* Location on a [http://en.wikipedia.org/wiki/computer network] where the data was created
 
* [http://en.wikipedia.org/wiki/Technical standard|Standards] used
 
  
For example, a [http://en.wikipedia.org/wiki/digital image] may include metadata that describes how large the picture is, the color depth, the image resolution, when the image was created, and other data. A text document's metadata may contain information about how long the document is, who the author is, when the document was written, and a short summary of the document.
[[Category:Information Media Sources]]
 
 
Metadata is data. As such, metadata can be stored and managed in a [http://en.wikipedia.org/wiki/database], often called a [http://en.wikipedia.org/wiki/Metadata registry] or [http://en.wikipedia.org/wiki/Metadata repository].<ref>Hüner, K.; Otto, B.; Österle, H.: Collaborative management of business metadata, in: International Journal of Information Management, 2011</ref> However, without context and a point of reference, it can be impossible to identify metadata just by looking at it.<ref>{{cite web|url=http://www.bls.gov/ore/pdf/st000010.pdf |title=Metadata Standards And Metadata Registries: An Overview |format=PDF |date= |accessdate=2011-12-23}}</ref> For example: by itself, a database containing several numbers, all 13 digits long could be the results of calculations or a list of numbers to plug into an equation - without any other context, the numbers themselves can be perceived as the data. But if given the context that this database is a log of a book collection, those 13-digit numbers may now be [http://en.wikipedia.org/wiki/ISBN]s - information that refers to the book, but is not itself the information within the book.
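The role of context in the ISBN example can be sketched in code. The function below (the name `looks_like_isbn13` is hypothetical, not part of any library) applies the ISBN-13 check-digit rule to a 13-digit number. Passing the check only suggests that a number could be an ISBN; knowing that the database is a log of a book collection is still what settles the question. The first test value is the ISBN given in this article's citation of Kimball's book.

```python
def looks_like_isbn13(number: str) -> bool:
    """Return True if `number` has 13 digits and a valid ISBN-13 check digit."""
    digits = number.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    # Weights alternate 1, 3, 1, 3, ...; a valid ISBN-13 sums to a multiple of 10.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


print(looks_like_isbn13("978-0-470-14977-5"))
print(looks_like_isbn13("1234567890123"))
```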
 
 
 
The term "metadata" was coined in 1968 by Philip Bagley, in his book "Extension of programming language concepts" <ref >{{Citation
 
| title = Extension of programming language concepts
 
| url = http://www.dtic.mil/dtic/tr/fulltext/u2/680815.pdf
 
}}</ref> where it is clear that he uses the term in the ISO 11179 "traditional" sense, which is  "structural metadata" i.e. "data about the containers of data"; rather than the alternate sense "content about individual instances of data content" or metacontent, the type of data usually found in library catalogues. <ref >{{Citation
 
|last=Bagley
 
|first=Philip
 
|title=Extension of programming language concepts
 
|year=1968
 
|month=Nov
 
|publisher=University City Science Center
 
|location=Philadelphia
 
}}</ref><ref >"The notion of "metadata" introduced by Bagley". {{Citation
 
| last = Solntseff
 
| first = N
 
| last2 = Yezerski
 
| first2 = A
 
| year = 1974
 
| title = A survey of extensible programming languages
 
| series = Annual Review in Automatic Programming
 
| publisher = Elsevier Science Ltd
 
| volume = 7
 
| pages = 267–307
 
| doi = 10.1016/0066-4138(74)90001-9
 
}}</ref> Since then the fields of information management, information science, information technology, librarianship and GIS have widely adopted the term. In these fields the word metadata is defined as "data about data".<ref name=NISO >{{Cite book
 
| last = NISO
 
| authorlink =NISO
 
| title = Understanding Metadata
 
| publisher = NISO Press
 
| date =
 
| url = http://www.niso.org/publications/press/UnderstandingMetadata.pdf
 
| isbn = 1-880124-62-9
 
| accessdate = 5 January 2010 }}
 
</ref> While this is the generally accepted definition, various disciplines have adopted their own more specific explanation and uses of the term.
 
 
 
 
 
=== Libraries ===
 
Metadata has been used in various forms as a means of cataloging archived information. The [http://en.wikipedia.org/wiki/Dewey Decimal System] employed by libraries for the classification of library materials is an early example of metadata usage. Library catalogues used 3x5 inch cards to display a book's title, author, subject matter, and a brief plot synopsis along with an abbreviated [http://en.wikipedia.org/wiki/Alphanumeric|alpha-numeric] identification system which indicated the physical location of the book within the library's shelves.
 
Such data helps classify, aggregate, identify, and locate a particular book. Another older form of metadata collection is the US Census Bureau's use of what is known as the "Long Form." The Long Form asks questions that are used to create demographic data and to find patterns of distribution.<ref >{{cite web
 
| title = AGLS Metadata Element Set - Part 2: Usage Guide - A non-technical guide to using AGLS metadata for describing resources
 
| author = National Archives of Australia
 
| year = 2002
 
| url = http://www.naa.gov.au/records-management/publications/agls-element.aspx
 
| accessdate = 17 March 2010}}
 
</ref>
 
For the purposes of this article, an "object" refers to any of the following:
 
*A physical item such as a book, CD, DVD, map, chair, table, flower pot, etc.
 
*An electronic file such as a digital image, digital photo, document, program file, database table, etc.
 
 
 
=== Photographs ===
 
Metadata may be written into a digital photo file that will identify who owns it, copyright & contact information, what camera created the file, along with exposure information and descriptive information such as keywords about the photo, making the file searchable on the computer and/or the Internet. Some metadata is written by the camera and some is input by the photographer and/or software after downloading to a computer. However, not all digital cameras enable you to edit metadata<ref>{{cite web|last=Rutter|first=Chris|title=What is metadata: copyright photos in 4 steps|url=http://www.digitalcameraworld.com/2012/02/28/what-is-metadata-copyright-photos-in-4-steps/|work=Digital Camera Magazine|publisher=Future Publishing}}</ref>; this functionality has been available on most Nikon DSLRs since the [http://en.wikipedia.org/wiki/Nikon D3] and on most new Canon cameras since the [http://en.wikipedia.org/wiki/Canon EOS 7D].
 
 
 
Photographic metadata standards are developed and maintained by several organizations. They include, but are not limited to:
 
*[http://en.wikipedia.org/wiki/IPTC Information Interchange Model] IIM (International Press Telecommunications Council),
 
*IPTC Core Schema for XMP
 
*[http://en.wikipedia.org/wiki/Extensible Metadata Platform|XMP] – Extensible Metadata Platform (an Adobe standard)
 
*[http://en.wikipedia.org/wiki/Exif] – Exchangeable image file format, Maintained by CIPA (Camera & Imaging Products Association) and published by JEITA (Japan Electronics and Information Technology Industries Association)
 
*[http://en.wikipedia.org/wiki/Dublin Core] (Dublin Core Metadata Initiative – DCMI)
 
*PLUS (Picture Licensing Universal System).
 
 
 
=== Video ===
 
Metadata is particularly useful in video, where the contents are not directly understandable by a computer but efficient search is desirable. Information about the contents, such as transcripts of conversations and text descriptions of scenes, makes the video searchable.
 
 
 
=== Web pages ===
 
Web pages often include metadata in the form of [http://en.wikipedia.org/wiki/Meta element|meta tags]. Description and keywords meta tags are commonly used to describe the Web page's content. Most search engines use this data when adding pages to their search index.
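As a minimal sketch of how a program might read such meta tags, the example below uses Python's standard `html.parser` module. The class name `MetaTagCollector`, the helper `extract_meta` and the sample page are assumptions made for illustration; real search engines use their own, far more elaborate pipelines.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """<html><head>
<title>Metadata</title>
<meta name="description" content="Information about information">
<meta name="keywords" content="metadata, catalog, Dublin Core">
</head><body><p>Body text.</p></body></html>"""


def extract_meta(html_text):
    """Parse a page and return its meta tags as a dictionary."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.meta


print(extract_meta(SAMPLE_PAGE))
```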
 
 
 
=== Creation of metadata ===
 
Metadata can be created either by automated information processing or by manual work. Elementary metadata captured by computers can include information about when a file was created, who created it, when it was last updated, file size and file extension.
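Automatically captured file metadata can be read back with Python's standard library, as in the sketch below. The function name `elementary_metadata` and the dictionary keys are assumptions for illustration. Creation time and file owner are left out because their availability differs between operating systems.

```python
import os
import tempfile
from datetime import datetime, timezone


def elementary_metadata(path):
    """Return basic metadata that the operating system records for a file."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "extension": os.path.splitext(path)[1],
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("hello metadata")
    sample_path = handle.name

record = elementary_metadata(sample_path)
print(record)
os.remove(sample_path)
```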
 
 
 
== Metadata types ==
 
Metadata is applied in a large variety of fields, and only specialised, well-accepted models exist for specifying types of metadata.  Bretherton & Singley (1994) distinguish between two distinct classes: structural/control metadata and guide metadata.<ref >{{Cite conference
 
| first1 = F. P. | last1 = Bretherton
 
|first2 = P.T. | last2 = Singley
 
| title = Metadata: A User's View, Proceedings of the International Conference on Very Large Data Bases (VLDB)
 
| pages = 1091–1094
 
| publisher =
 
| year = 1994}}
 
</ref> '''Structural metadata''' is used to describe the structure of computer systems such as tables, columns and indexes. '''Guide metadata''' is used to help humans find specific items and is usually expressed as a set of keywords in a natural language. According to [http://en.wikipedia.org/wiki/Ralph Kimball], metadata can be divided into two similar categories: technical metadata and business metadata. '''Technical metadata''' corresponds to internal metadata, '''business metadata''' to external metadata. Kimball adds a third category named '''process metadata'''. NISO, on the other hand, distinguishes between three types of metadata: descriptive, structural and administrative.<ref name=NISO/> '''Descriptive metadata''' is the information used to search and locate an object such as title, author, subjects, keywords, publisher; '''structural metadata''' gives a description of how the components of the object are organised; and '''administrative metadata''' refers to the technical information including file type. Two sub-types of administrative metadata are rights management metadata and preservation metadata.
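Structural metadata about tables and columns can be inspected without any data being present. The sketch below uses Python's built-in `sqlite3` module and SQLite's `PRAGMA table_info`; the `book` table and its columns are hypothetical. The table is deliberately left empty to show that the structure exists independently of any content.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# Hypothetical table; it is deliberately left empty.
connection.execute(
    "CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL, year INTEGER)"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, default_value, pk)
columns = connection.execute("PRAGMA table_info(book)").fetchall()
structure = [(name, col_type) for _cid, name, col_type, *_rest in columns]
row_count = connection.execute("SELECT COUNT(*) FROM book").fetchone()[0]

print(structure)
print(row_count)
connection.close()
```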
 
 
 
== Metadata structures ==
 
Metadata (metacontent), or more correctly, the vocabularies used to assemble metadata (metacontent) statements, is typically structured according to a standardized concept using a well-defined metadata scheme, including: [http://en.wikipedia.org/wiki/metadata standards] and [http://en.wikipedia.org/wiki/Metadata modeling|metadata models]. Tools such as [http://en.wikipedia.org/wiki/Controlled vocabulary|controlled vocabularies], [http://en.wikipedia.org/wiki/Taxonomy|taxonomies], [http://en.wikipedia.org/wiki/thesauri], [http://en.wikipedia.org/wiki/Data Dictionary|data dictionaries] and [http://en.wikipedia.org/wiki/Metadata registry|metadata registries] can be used to apply further standardization to the metadata.  Structural metadata commonality is also of paramount importance in [http://en.wikipedia.org/wiki/data model] development and in [http://en.wikipedia.org/wiki/database design].
 
 
 
=== Metadata syntax ===
 
Metadata (metacontent) syntax refers to the rules created to structure the fields or elements of metadata (metacontent).<ref >{{cite web
 
| last = Cathro
 
| first = Warwick
 
| authorlink =
 
| title = Metadata: an overview
 
| year = 1997
 
| url = http://www.nla.gov.au/nla/staffpaper/cathro3.html
 
| accessdate = 6 January 2010
 
}}</ref> A single metadata scheme may be expressed in a number of different markup or programming languages, each of which requires a different syntax. For example, Dublin Core may be expressed in plain text, [http://en.wikipedia.org/wiki/HTML], [http://en.wikipedia.org/wiki/XML] and [http://en.wikipedia.org/wiki/Resource Description Framework|RDF].<ref >{{cite web
 
| last = DCMI
 
| authorlink =Dublin_Core_Metadata_Initiative
 
| title = Semantic Recommendations
 
| date =5 Oct 2009
 
| url = http://dublincore.org/specifications/
 
| accessdate = 6 January 2010
 
}}</ref>
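To illustrate one scheme in two syntaxes, the sketch below writes the same small record as HTML meta tags and as XML. The record values, the function names and the `metadata` wrapper element are assumptions for illustration. The `DC.` prefix and the namespace URI follow common Dublin Core usage, but this is not a complete or validated encoding.

```python
import html
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record used only for illustration.
record = {"title": "Understanding Metadata", "creator": "NISO", "language": "en"}


def to_html_meta(rec):
    """Express the record as HTML <meta> tags named DC.<element>."""
    return "\n".join(
        f'<meta name="DC.{element}" content="{html.escape(value, quote=True)}">'
        for element, value in rec.items()
    )


def to_xml(rec):
    """Express the record as XML elements in the Dublin Core namespace."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for element, value in rec.items():
        ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


print(to_html_meta(record))
print(to_xml(record))
```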
 
 
 
A common example of (guide) metacontent is the bibliographic classification, the subject, the [http://en.wikipedia.org/wiki/List of Dewey Decimal classes|Dewey Decimal class number]. There is always an implied statement in any "classification" of some object. When an object is classified as, for example, Dewey class number 514 (Topology) (e.g. a book has this number on the spine), the implied statement is: "<book><subject heading><514>". This is a subject-predicate-object triple, or more importantly, a class-attribute-value triple. The first two elements of the triple (class, attribute) are pieces of some structural metadata having a defined semantic. The third element is a value, preferably from some controlled vocabulary, some reference (master) data. The combination of the metadata and master data elements results in a statement which is a metacontent statement, i.e. "metacontent = metadata + master data". All these elements can be thought of as "vocabulary". Both metadata and master data are vocabularies which can be assembled into metacontent statements. There are many sources of these vocabularies, both meta and master data: UML, EDIFACT, XSD, Dewey/UDC/LoC, SKOS, ISO-25964, Pantone, Linnaean Binomial Nomenclature etc. Using controlled vocabularies for the components of metacontent statements, whether for indexing or finding, is endorsed by [http://www-personal.umich.edu/~kdow/ISO_CD_25964-1(E).pdf ISO-25964]: "If both the indexer and the searcher are guided to choose the same term for the same concept, then relevant documents will be retrieved." This is particularly relevant when considering that the behemoth of the internet, Google, simply indexes and then matches text strings; no intelligence or "inferencing" occurs.
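The class-attribute-value triple can be sketched as a small data structure. In the example below, `SCHEMA` stands in for structural metadata and `DEWEY_CLASSES` for master data; both are tiny illustrative excerpts, and all names are hypothetical. The function refuses values outside the controlled vocabulary, which is one simple way to guide indexer and searcher to the same term.

```python
from typing import NamedTuple

# Structural metadata: the class and attribute that give the statement its meaning.
SCHEMA = {("book", "subject heading")}

# Master (reference) data: a tiny illustrative excerpt of Dewey class numbers.
DEWEY_CLASSES = {"510": "Mathematics", "514": "Topology"}


class Statement(NamedTuple):
    cls: str
    attribute: str
    value: str


def make_statement(cls, attribute, value):
    """Assemble a metacontent statement, rejecting terms outside the vocabularies."""
    if (cls, attribute) not in SCHEMA:
        raise ValueError(f"unknown class/attribute pair: {cls}/{attribute}")
    if value not in DEWEY_CLASSES:
        raise ValueError(f"value not in controlled vocabulary: {value}")
    return Statement(cls, attribute, value)


statement = make_statement("book", "subject heading", "514")
print(statement, DEWEY_CLASSES[statement.value])
```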
 
 
 
=== Hierarchical, linear and planar schemata ===
 
Metadata schemata can be hierarchical in nature, where relationships exist between metadata elements and elements are nested so that parent-child relationships exist between them.

An example of a hierarchical metadata schema is the [http://en.wikipedia.org/wiki/Learning object metadata|IEEE LOM] schema, in which metadata elements may belong to a parent metadata element.

Metadata schemata can also be one-dimensional, or linear, where each element is completely discrete from other elements and classified according to one dimension only.

An example of a linear metadata schema is the [http://en.wikipedia.org/wiki/Dublin Core Metadata Initiative|Dublin Core] schema.

Metadata schemata are often two-dimensional, or planar, where each element is completely discrete from other elements but classified according to two orthogonal dimensions.<ref >{{cite web
 
| title = Types of Metadata
 
|publisher = [http://en.wikipedia.org/wiki/University of Melbourne]
 
| date =15 August 2006
 
| url = http://www.infodiv.unimelb.edu.au/metadata/add_info.html
 
| accessdate = 6 January 2010 }} {{Dead link|date=October 2010|bot=H3llBot
 
}}</ref>
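The difference between hierarchical and linear records can be sketched with nested and flat dictionaries. The element names below are hypothetical and simplified; they are not taken from the published IEEE LOM or Dublin Core element sets.

```python
# Hypothetical, simplified element names; not taken from any published schema.
hierarchical = {
    "general": {"title": "Intro to Topology", "language": "en"},
    "lifecycle": {"version": "1.0", "contribute": {"role": "author"}},
}

linear = {"title": "Intro to Topology", "language": "en", "creator": "A. Author"}


def flatten(record, prefix=""):
    """Turn nested parent-child elements into dotted paths."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def depth(record):
    """Nesting depth: 1 for a linear record, more for a hierarchical one."""
    children = [depth(v) for v in record.values() if isinstance(v, dict)]
    return 1 + max(children, default=0)


print(flatten(hierarchical))
print(depth(linear), depth(hierarchical))
```

Flattening keeps the parent-child relationship in the dotted path, so no information is lost when a hierarchical record has to be stored in a flat structure.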
 
 
 
=== Metadata hypermapping ===
 
In all cases where the metadata schemata exceed the planar depiction, some type of [http://en.wikipedia.org/wiki/hypermap]ping is required to display and view the metadata according to a chosen aspect and to serve special views. Hypermapping frequently applies to layering of geographical and geological information overlays.<ref>[http://www.isprs.org/proceedings/XXXII/part4/www.ifp.uni.../kuebler51.pdf THE DESIGN AND DEVELOPMENT OF A GEOLOGIC HYPERMAP PROTOTYPE]{{dead link|date=December 2011}}</ref>
 
 
 
=== Granularity ===
 
Granularity is a term that applies to data as well as to metadata. The degree to which metadata is structured is referred to as its [http://en.wikipedia.org/wiki/Granularity#Data granularity|granularity].  Metadata with a high granularity allows for deeper structured information and enables greater levels of technical manipulation; a lower level of granularity, however, means that metadata can be created at considerably lower cost but will not provide information that is as detailed. The major impact of granularity is not only on creation and capture but, even more, on maintenance. As soon as the metadata structures become outdated, access to the referred data becomes outdated as well. Hence the choice of granularity should take into account the effort to create as well as the effort to maintain the metadata.
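As a small, hypothetical illustration of this trade-off, the records below describe the same creators at two levels of granularity. The field names `given` and `family` are assumptions. The fine-grained form costs more to create and maintain, but it supports operations, such as sorting by family name, that the coarse form does not.

```python
# Two hypothetical record sets describing the same creators at different granularity.
coarse = [{"creator": "Ralph Kimball"}, {"creator": "Warwick Cathro"}]
fine = [
    {"creator": {"given": "Ralph", "family": "Kimball"}},
    {"creator": {"given": "Warwick", "family": "Cathro"}},
]

# The fine-grained form supports sorting by family name directly ...
by_family = sorted(fine, key=lambda rec: rec["creator"]["family"])

# ... while the coarse form can only be sorted on the whole string.
by_string = sorted(coarse, key=lambda rec: rec["creator"])

print([rec["creator"]["family"] for rec in by_family])
print([rec["creator"] for rec in by_string])
```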
 
 
 
== Metadata standards ==
 
International standards apply to metadata. Much work is being accomplished in the national and international standards communities, especially [http://en.wikipedia.org/wiki/ANSI] (American National Standards Institute) and [http://en.wikipedia.org/wiki/ISO] (International Organization for Standardization) to reach consensus on standardizing metadata and registries.
 
 
 
The core standard is [http://en.wikipedia.org/wiki/ISO]/[http://en.wikipedia.org/wiki/International Electrotechnical Commission|IEC] 11179-1:2004 <ref >{{cite web
 
  |url=http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=39438
 
  |title=ISO/IEC 11179-1:2004 Information technology - Metadata registries (MDR) - Part 1: Framework
 
  |publisher=Iso.org |date=2009-03-18 |accessdate=2011-12-23
 
}}</ref> and subsequent standards (see [http://en.wikipedia.org/wiki/ISO/IEC 11179]). All registrations published so far according to this standard cover just the definition of metadata and serve neither the structuring of metadata storage or retrieval nor any administrative standardisation. It is important to note that this standard refers to metadata as data about containers of data and not to metadata (metacontent) as data about data contents. It should also be noted that this standard describes itself originally as a "data element" registry, describing disembodied data elements, and explicitly disavows the capability of containing complex structures. Thus the original term "data element" is more applicable than the later applied buzzword "metadata".
 
 
 
== Metadata usage ==
 
=== Data Virtualization ===
 
{{main|Data Virtualization}}
 
Data Virtualization has emerged as the new software technology to complete the virtualization stack in the enterprise. Metadata is used in Data Virtualization servers, which are enterprise infrastructure components alongside database and application servers. Metadata in these servers is saved in a persistent repository and describes business objects in various enterprise systems and applications.  Structural metadata commonality is also important to support data virtualization and [http://en.wikipedia.org/wiki/data federation].
 
 
 
=== SVN Checkout Metadata ===
 
A Subversion (SVN) checkout creates hidden .svn directories; when these are left in a web root folder, they can reveal crucial information about the code repository.
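A simple check for such leftover SVN metadata can be sketched as a directory walk. The function name `find_svn_dirs` and the temporary folder layout are hypothetical; a real audit would also need to confirm whether the web server actually serves these paths.

```python
import os
import tempfile


def find_svn_dirs(web_root):
    """Return paths of .svn directories found under web_root."""
    found = []
    for current, dirnames, _filenames in os.walk(web_root):
        if ".svn" in dirnames:
            found.append(os.path.join(current, ".svn"))
            dirnames.remove(".svn")  # do not descend into the metadata itself
    return sorted(found)


# Build a throwaway folder tree so the example is self-contained.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "site", ".svn"))
    os.makedirs(os.path.join(root, "site", "css"))
    exposed = [os.path.relpath(path, root) for path in find_svn_dirs(root)]

print(exposed)
```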
 
 
 
=== Statistics and census services ===
 
Standardization work has had a large impact on efforts to build metadata systems in the statistical community. Several metadata standards are described, and their importance to statistical agencies is discussed. Applications of the standards at the Census Bureau, Environmental Protection Agency, Bureau of Labor Statistics, Statistics Canada, and many others are described. Emphasis is on the impact a metadata registry can have in a statistical agency.
 
 
 
=== Library and information science ===
 
[http://en.wikipedia.org/wiki/library|Libraries] employ metadata in [http://en.wikipedia.org/wiki/library catalog]ues, most commonly as part of an [http://en.wikipedia.org/wiki/Library management system|Integrated Library Management System]. Metadata is obtained by [http://en.wikipedia.org/wiki/Library cataloguing#Cataloging rules|cataloguing] resources such as books, periodicals, DVDs, web pages or digital images. This data is stored in the integrated library management system, [http://en.wikipedia.org/wiki/Library management system|ILMS], using the [http://en.wikipedia.org/wiki/MARC standards|MARC] metadata standard. The purpose is to direct patrons to the physical or electronic location of items or areas they seek as well as to provide a description of the item/s in question.
 
 
 
More recent and specialized instances of library metadata include the establishment of [http://en.wikipedia.org/wiki/Digital library|digital libraries] including [http://en.wikipedia.org/wiki/eprint|e-print] repositories and digital image libraries. While often based on library principles, the focus on non-librarian use, especially in providing metadata, means they do not follow traditional or common cataloging approaches. Given the custom nature of included materials, metadata fields are often specially created, e.g. taxonomic classification fields, location fields, keywords or copyright statement. Standard file information such as file size and format is usually automatically included.<ref>Solodovnik, I. (2011). "[http://leo.cilea.it/index.php/jlis/article/view/4663 Metadata issues in Digital Libraries: key concepts and perspectives]". ''JLIS.It'', 2(2). doi:10.4403/jlis.it-4663</ref>
 
 
 
Standardization for library operation has been a key topic in international standardization ([http://en.wikipedia.org/wiki/ISO]) for decades. Standards for metadata in digital libraries include [http://en.wikipedia.org/wiki/Dublin Core], [http://en.wikipedia.org/wiki/METS], [http://en.wikipedia.org/wiki/Metadata Object Description Schema|MODS], [http://en.wikipedia.org/wiki/Data Documentation Initiative|DDI], [http://en.wikipedia.org/wiki/Digital Object Identifier|ISO standard Digital Object Identifier (DOI)], [http://en.wikipedia.org/wiki/Uniform Resource Name|ISO standard Uniform Resource Name (URN)], [http://en.wikipedia.org/wiki/Preservation Metadata: Implementation Strategies (PREMIS)|PREMIS] schema, [http://en.wikipedia.org/wiki/Ecological Metadata Language], and [http://en.wikipedia.org/wiki/Open Archives Initiative Protocol for Metadata Harvesting|OAI-PMH]. Leading libraries in the world give hints on their metadata standards strategies.<ref >{{cite web
 
  |author=Library of Congress Network Development and MARC Standards Office |url=http://www.loc.gov/standards/metadata.html |title=Library of Congress Washington DC on metadata |publisher=Loc.gov |date=2005-09-08 |accessdate=2011-12-23}}</ref><ref>[http://www.d-nb.de/standardisierung/.../metadaten.htm Deutsche Nationalbibliothek Frankfurt on metadata]{{dead link|date=December 2011}}</ref>
 
 
 
=== Metadata and the law ===
 
==== United States ====
 
Problems involving metadata in [http://en.wikipedia.org/wiki/litigation] in the [http://en.wikipedia.org/wiki/United States] are becoming widespread.{{when|date=February 2011}} Courts have looked at various questions involving metadata, including the discoverability of metadata by parties. Although the Federal Rules of Civil Procedure have only specified rules about electronic documents, subsequent case law has elaborated on the requirement of parties to reveal metadata.<ref >{{Cite journal
 
  | last = Gelzer  | first = Reed D.
 
  | title = Metadata, Law, and the Real World: Slowly, the Three Are Merging
 
  | journal = Journal of AHIMA
 
  | volume = 79
 
  | issue = 2
 
  | pages = 56–57, 64
 
  | publisher = American Health Information Management Association
 
  | date = February 2008
 
  | url = http://library.ahima.org/xpedio/groups/public/documents/ahima/bok1_036537.hcsp?dDocName=bok1_036537
 
  | accessdate = 8 January 2010}}</ref> In October 2009, the [http://en.wikipedia.org/wiki/Arizona Supreme Court] ruled that metadata records are public record.<ref >{{Cite news
 
  | last = Walsh  | first = Jim
 
  | title = Ariz. Supreme Court rules electronic data is public record
 
  | newspaper = The Arizona Republic
 
  | location = Arizona, United States
 
  | date = 30 October 2009
 
  | url = http://www.azcentral.com/arizonarepublic/local/articles/2009/10/30/20091030metadata1030.html
 
  | accessdate = 8 January 2010
 
}}</ref>
 
 
 
Document metadata has proven particularly important in legal environments where litigants request metadata, which can include sensitive information detrimental to a party in court.
 
 
 
Using [http://en.wikipedia.org/wiki/metadata removal tool]s to "clean" documents can mitigate the risk of unwittingly sending sensitive data. This process partially (see [http://en.wikipedia.org/wiki/Data remanence]) protects law firms from potentially damaging leaks of sensitive data through [http://en.wikipedia.org/wiki/Electronic Discovery].
 
 
 
=== Metadata in healthcare ===
 
Australian medical researchers initiated much of the work on metadata definition for applications in health care. That approach offers the first recognized attempt to adhere to international standards in medical sciences instead of defining a proprietary standard under the WHO umbrella first.



The medical community has not yet accepted the need to follow metadata standards, despite research on the subject.<ref>M. Löbe, M. Knuth, R. Mücke [http://ceur-ws.org/Vol-559/Paper1.pdf TIM: A Semantic Web Application for the Specification of Metadata Items in Clinical Research], CEUR-WS.org, urn:nbn:de:0074-559-9</ref>
 
 
 
=== Metadata and data warehousing ===
 
A [http://en.wikipedia.org/wiki/Data warehouse] (DW) is a repository of an organization's electronically stored data. Data warehouses are designed to manage and store the data, whereas [http://en.wikipedia.org/wiki/Business Intelligence] (BI) focuses on the usage of data to facilitate reporting and analysis.<ref>Inmon, W.H. Tech Topic: What is a Data Warehouse? Prism Solutions. Volume 1. 1995.</ref>
 
 
 
The purpose of a data warehouse is to house standardized, structured, consistent, integrated, correct, cleansed and timely data, extracted from various operational systems in an organization. The extracted data is integrated in the [http://en.wikipedia.org/wiki/data warehouse] environment in order to provide an enterprise-wide perspective, one version of the truth. Data is structured in a way that specifically addresses the reporting and analytic requirements.  The design of structural metadata commonality using a [http://en.wikipedia.org/wiki/data modeling] method such as [http://en.wikipedia.org/wiki/entity relationship model] diagramming is very important in any data warehouse development effort.
 
 
 
An essential component of a [http://en.wikipedia.org/wiki/data warehouse]/[http://en.wikipedia.org/wiki/business intelligence] system is the metadata and tools to manage and retrieve metadata. [http://en.wikipedia.org/wiki/Ralph Kimball]<ref >{{Cite book
 
  |last=Kimball  |first=Ralph
 
  |authorlink=Ralph Kimball
 
  |title=The Data Warehouse Lifecycle Toolkit
 
  |edition=Second Edition
 
  |location=New York  |publisher=Wiley 
 
  |year=2008
 
  |isbn=978-0-470-14977-5
 
  |ref=harv
 
  |pages=10, 115–117, 131–132, 140, 154–155
 
}}</ref>  describes metadata as the DNA of the data warehouse as metadata defines the elements of the [http://en.wikipedia.org/wiki/data warehouse] and how they work together.
 
 
 
[http://en.wikipedia.org/wiki/Ralph Kimball|Kimball] et al.<ref >{{harvnb|Kimball|2008|pages=116–117}}</ref> refer to three main categories of metadata: technical metadata, business metadata and process metadata. Technical metadata is primarily [http://en.wikipedia.org/wiki/definitional] while business metadata and process metadata are primarily descriptive. Keep in mind that the categories sometimes overlap.
 
 
 
* '''Technical metadata''' defines the objects and processes in a DW/BI system, as seen from a technical point of view. The technical metadata includes the system metadata which defines the data structures such as: Tables, fields, data types, indexes and partitions in the relational engine, and databases, dimensions, measures, and data mining models. Technical metadata defines the data model and the way it is displayed for the users, with the reports, schedules, distribution lists and user security rights.
 
 
 
* '''Business metadata''' is content from the data warehouse described in more user-friendly terms. The business metadata tells you what data you have, where it comes from, what it means and what its relationship is to other data in the data warehouse. Business metadata may also serve as documentation for the DW/BI system. Users who browse the data warehouse are primarily viewing the business metadata.
 
 
 
* '''Process metadata''' is used to describe the results of various operations in the data warehouse. Within the [http://en.wikipedia.org/wiki/Extract, transform, load|ETL] process all key data from tasks are logged on execution. This includes start time, end time, CPU seconds used, disk reads, disk writes and rows processed. When troubleshooting the ETL or [http://en.wikipedia.org/wiki/Information retrieval|query] process, this sort of data becomes valuable. Process metadata is the fact measurement when building and using a DW/BI system. Some organizations make a living out of collecting and selling this sort of data to companies - in that case the process metadata becomes the business metadata for the fact and dimension tables. Process metadata is of interest to business people, who can use the data to identify the users of their products, which products they are using and what level of service they are receiving.
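Capturing process metadata for one ETL step can be sketched as a wrapper that records a log entry when the step runs. The function `run_logged`, the field names and the sample transform are hypothetical. Disk reads and writes are omitted because the Python standard library offers no portable way to measure them.

```python
import time
from datetime import datetime, timezone


def run_logged(task_name, task, rows):
    """Run one ETL step and return (result, process-metadata record)."""
    started = datetime.now(timezone.utc)
    cpu_start = time.process_time()
    result = task(rows)
    log_entry = {
        "task": task_name,
        "start_time": started,
        "end_time": datetime.now(timezone.utc),
        "cpu_seconds": time.process_time() - cpu_start,
        "rows_processed": len(result),
    }
    return result, log_entry


# Hypothetical transform step: normalise titles to upper case.
cleaned, entry = run_logged(
    "uppercase_titles", lambda rs: [r.upper() for r in rs], ["a", "b"]
)
print(entry["task"], entry["rows_processed"])
```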
 
 
 
=== Metadata on the Internet ===
 
The [http://en.wikipedia.org/wiki/HTML] format used to define web pages allows for the inclusion of a variety of types of metadata, from basic descriptive text, dates and keywords to further advanced metadata schemes such as the [http://en.wikipedia.org/wiki/Dublin Core], [http://en.wikipedia.org/wiki/e-GMS], and AGLS<ref>National Archives of Australia, AGLS Metadata Standard, accessed 7 January 2010, [http://www.naa.gov.au/records-management/create-capture-describe/describe/AGLS/index.aspx]</ref> standards. Pages can also be [http://en.wikipedia.org/wiki/geotagging|geotagged] with [http://en.wikipedia.org/wiki/Geographic coordinate system|coordinates]. Metadata may be included in the page's header or in a separate file. [http://en.wikipedia.org/wiki/Microformat]s allow metadata to be added to on-page data in a way that users do not see, but computers can readily access.
 
 
 
Interestingly, many search engines are cautious about using metadata in their ranking algorithms due to exploitation of metadata and the practice of search engine optimization, [http://en.wikipedia.org/wiki/Search engine optimization|SEO], to improve rankings. See [http://en.wikipedia.org/wiki/Meta element] article for further discussion. Studies show that search engines respond to web pages with metadata implementations<ref>The impact of webpage content characteristics on webpage visibility in search engine results http://web.simmons.edu/~braun/467/part_1.pdf</ref>.
 
 
 
=== Metadata in the broadcast industry ===

In the [http://en.wikipedia.org/wiki/broadcast] industry, metadata are linked to audio and video [http://en.wikipedia.org/wiki/Broadcast media] to:
 
* ''identify'' the media: [http://en.wikipedia.org/wiki/Media clip|clip] or [http://en.wikipedia.org/wiki/playlist] names, duration, [http://en.wikipedia.org/wiki/timecode], etc.
 
* ''describe'' the content: notes regarding the quality of video content, rating, description (for example, during a sporting event, [http://en.wikipedia.org/wiki/keywords] like ''goal'' or ''red card'' will be associated with some clips)

* ''classify'' media: metadata make it possible to sort the media or to find video content quickly and easily (a [http://en.wikipedia.org/wiki/TV news] programme could urgently need some [http://en.wikipedia.org/wiki/archiving|archive content] for a subject). For example, the BBC has a large subject classification system, [http://en.wikipedia.org/wiki/Lonclass], a customized version of the more general-purpose [http://en.wikipedia.org/wiki/Universal Decimal Classification].
 
 
 
These metadata can be linked to the video media through the [http://en.wikipedia.org/wiki/Video server#Broadcast automation|video servers]. Recent [http://en.wikipedia.org/wiki/broadcast] sporting events such as the [http://en.wikipedia.org/wiki/FIFA World Cup] or the [http://en.wikipedia.org/wiki/Olympic Games] use these metadata to distribute their video content to [http://en.wikipedia.org/wiki/TV station]s through [http://en.wikipedia.org/wiki/Index term|keywords]. The host broadcaster<ref>{{cite web|url=http://www.hbs.tv/hostbroadcasting/ |title=HBS is the FIFA host broadcaster |publisher=Hbs.tv |date=2011-08-06 |accessdate=2011-12-23}}</ref> is often in charge of organizing metadata through its ''International Broadcast Centre'' and its [http://en.wikipedia.org/wiki/Video server#Broadcast automation|video servers]. These metadata are recorded with the images and are entered by metadata operators (''loggers''), who associate metadata from ''metadata grids'' with the footage in real time using [http://en.wikipedia.org/wiki/software] such as [http://en.wikipedia.org/wiki/Multicam(LSM)] or [http://en.wikipedia.org/wiki/IPDirector], both used during the [http://en.wikipedia.org/wiki/FIFA World Cup] and the [http://en.wikipedia.org/wiki/Olympic Games].<ref>[http://www.evs-global.com/01/MyDocuments/CS_BOB_EVScontributon_0808_ENG.pdf Host Broadcast Media Server and Related Applications]{{dead link|date=December 2011}}</ref><ref>{{cite web|url=http://broadcastengineering.com/worldcup/fifa-world-cup-techonlogy-0610/ |title=logs during sport events |publisher=Broadcastengineering.com |date= |accessdate=2011-12-23}}</ref>
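
The identify, describe and classify roles listed in this section can be sketched as a small data structure. The following Python example is only an illustration: the <code>Clip</code> record, its field names and the sample clips are hypothetical and do not reflect the data model of any broadcast product named here.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Clip:
    """Hypothetical clip record: identifying fields plus descriptive keywords."""
    name: str
    timecode: str       # hours:minutes:seconds:frames
    duration_s: float
    keywords: set = field(default_factory=set)


def build_keyword_index(clips):
    """Map each keyword to the names of the clips tagged with it."""
    index = defaultdict(list)
    for clip in clips:
        for keyword in clip.keywords:
            index[keyword.lower()].append(clip.name)
    return index


# Sample data invented for this example.
clips = [
    Clip("match01_clip001", "00:12:31:05", 14.0, {"goal"}),
    Clip("match01_clip002", "00:40:02:10", 9.5, {"red card"}),
    Clip("match01_clip003", "01:15:44:00", 21.0, {"goal", "penalty"}),
]
index = build_keyword_index(clips)
print(index["goal"])
```

A keyword index of this kind is what lets a newsroom find every clip tagged ''goal'' without viewing the footage.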
 
 
 
=== Geospatial metadata ===
 
Metadata that describe geographic objects (such as datasets, maps, features, or simply documents with a geospatial component) have a history dating back to at least 1994 (see the [http://libraries.mit.edu/guides/subjects/metadata/standards/fgdc.html MIT Library page on FGDC Metadata]). This class of metadata is described more fully on the [http://en.wikipedia.org/wiki/Geospatial metadata] page.
 
 
 
=== Ecological & environmental metadata ===
 
Ecological and environmental metadata are intended to document the who, what, when, where, why, and how of data collection for a particular study. Metadata should be generated in a format commonly used by the most relevant science community, such as [http://en.wikipedia.org/wiki/Darwin Core], [http://en.wikipedia.org/wiki/Ecological Metadata Language],<ref>http://knb.ecoinformatics.org/software/eml/eml-2.0.1/index.html</ref> or [http://en.wikipedia.org/wiki/Dublin Core]. Metadata editing tools exist to facilitate metadata generation (e.g. Metavist,<ref>{{cite web|url=http://metavist.djames.net/ |title=Metavist 2 |publisher=Metavist.djames.net |date= |accessdate=2011-12-23}}</ref> [http://en.wikipedia.org/wiki/Mercury: Metadata Search System], Morpho<ref>{{cite web|url=http://knb.ecoinformatics.org/morphoportal.jsp |title=KNB Data :: Morpho |publisher=Knb.ecoinformatics.org |date=2009-05-20 |accessdate=2011-12-23}}</ref>). Metadata should describe the [http://en.wikipedia.org/wiki/data provenance|provenance] of the data (where it originated, as well as any transformations the data underwent) and how to give credit for (cite) the data products.
 
 
 
=== Metadata on CDs and DVDs ===
 
CDs, such as music recordings, carry a layer of metadata about the recordings, such as dates, artist, genre and copyright owner. This metadata is not normally displayed by CD players, but it can be accessed and displayed by specialized music playback or editing applications.
 
 
 
=== Cloud applications ===
 
With the availability of [http://en.wikipedia.org/wiki/Cloud computing|cloud] applications, including applications for adding metadata to content, metadata is increasingly available over the Internet.
 
 
 
== Metadata administration and management ==
 
=== Metadata storage ===
 
{{unreferenced section|date=June 2010}}
 
Metadata can be stored either ''internally'',<ref name=id3>{{cite web |author=Dan O'Neill |url=http://id3.org |title=ID3.org}}</ref> in the same file as the data, or ''externally'', in a separate file. Metadata that is embedded with content is called ''embedded metadata''. A data repository typically stores the metadata ''detached'' from the data. Both ways have advantages and disadvantages:
 
* Internal storage allows transferring metadata together with the data it describes; thus, metadata is always at hand and can be manipulated easily. However, this method creates high redundancy and does not allow metadata to be kept together in one place.

* External storage allows bundling metadata, for example in a database, for more efficient searching. There is no redundancy, and metadata can be transferred simultaneously when using [http://en.wikipedia.org/wiki/Streaming media|streaming]. However, as most formats use [http://en.wikipedia.org/wiki/Uniform Resource Identifier|URIs] to link metadata to its data, the linking method should be treated with care. What if a resource does not have a URI (for example, resources on a local hard disk or web pages that are created on the fly by a content management system)? What if metadata can only be evaluated when there is a connection to the Web, especially when using [http://en.wikipedia.org/wiki/Resource Description Framework|RDF]? How can one detect that a resource has been replaced by another with the same name but different content?
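
One possible answer to the last question is to store a fingerprint of the content alongside externally stored metadata. The following Python sketch keeps metadata in a separate "sidecar" file and uses a SHA-256 hash to detect that a file of the same name now holds different content. The <code>.meta.json</code> suffix, the function names and the record layout are assumptions made for this example, not an established convention.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_sidecar(data_path, metadata):
    """Store metadata externally, next to the data, with a content fingerprint."""
    data_path = Path(data_path)
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    record = {"sha256": sha256_of(data_path), "metadata": metadata}
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


def metadata_still_applies(data_path):
    """Return True if the data file still matches the fingerprint in its sidecar."""
    data_path = Path(data_path)
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    record = json.loads(sidecar.read_text(encoding="utf-8"))
    return record["sha256"] == sha256_of(data_path)


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "report.txt"
    data_file.write_text("first version", encoding="utf-8")
    write_sidecar(data_file, {"title": "Report", "language": "en"})
    unchanged = metadata_still_applies(data_file)

    # Same file name, different content: the fingerprint no longer matches.
    data_file.write_text("different content, same name", encoding="utf-8")
    after_replacement = metadata_still_applies(data_file)

print(unchanged, after_replacement)
```

The hash only detects that the content changed; deciding whether the old metadata is still valid for the new content remains a separate question.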
 
 
 
Moreover, there is the question of data format: storing metadata in a human-readable format such as [http://en.wikipedia.org/wiki/XML] can be useful because users can understand and edit it without specialized tools. On the other hand, these formats are not optimized for storage capacity; it may be useful to store metadata in a binary, non-human-readable format instead to speed up transfer and save memory.
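
The trade-off between the two kinds of format can be illustrated by encoding the same small record both ways. The following Python sketch is an assumption-laden example: the record, its field names and the fixed binary layout are invented for illustration, and real formats differ in detail.

```python
import struct
import xml.etree.ElementTree as ET

# Invented sample record.
record = {"track": 7, "year": 1994, "duration_s": 215.5}

# Human-readable encoding: one XML element per field.
root = ET.Element("metadata")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
xml_bytes = ET.tostring(root, encoding="utf-8")

# Binary encoding with a fixed little-endian layout:
# unsigned 16-bit track, unsigned 16-bit year, 32-bit float duration.
LAYOUT = "<HHf"
binary_bytes = struct.pack(
    LAYOUT, record["track"], record["year"], record["duration_s"]
)

print(xml_bytes.decode("utf-8"))
print(len(xml_bytes), "bytes as XML,", len(binary_bytes), "bytes as binary")

# Reading the binary form back requires knowing the layout in advance.
track, year, duration_s = struct.unpack(LAYOUT, binary_bytes)
```

The XML form names every field and can be edited in a text editor, while the binary form is much smaller but cannot be interpreted without the layout definition.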
 
 
 
=== Database management ===
 
Each [http://en.wikipedia.org/wiki/relational database] system has its own mechanisms for storing metadata. Examples of relational-database metadata include:
 
* Tables of all tables in a database, their names, sizes and number of rows in each table.
 
* Tables of columns in each database, what tables they are used in, and the type of data stored in each column.
 
In database terminology, this set of metadata is referred to as the [http://en.wikipedia.org/wiki/database catalog|catalog]. The [http://en.wikipedia.org/wiki/SQL] standard specifies a uniform means to access the catalog, called the [http://en.wikipedia.org/wiki/information schema], but not all databases implement it, even if they implement other aspects of the SQL standard. For an example of database-specific metadata access methods, see [http://en.wikipedia.org/wiki/Oracle metadata]. Programmatic access to metadata is possible using APIs such as [http://en.wikipedia.org/wiki/JDBC] or SchemaCrawler.<ref name=schemacrawler>{{cite web |author=Sualeh Fatehi |url=http://schemacrawler.sourceforge.net/ |title=SchemaCrawler |work=SourceForge}}</ref>
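
As a small illustration of database-specific catalog access, the following Python sketch reads table names, column types and row counts from an in-memory SQLite database. SQLite is used here only because it ships with Python; it exposes its catalog through the <code>sqlite_master</code> table and <code>PRAGMA table_info</code> rather than through the information schema. The <code>customer</code> and <code>invoice</code> tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

# SQLite keeps its catalog in the sqlite_master table.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
# Table names come from the catalog itself, so interpolating them is safe here.
columns = {
    table: [(col[1], col[2]) for col in conn.execute(f"PRAGMA table_info({table})")]
    for table in tables
}
row_counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in tables
}
conn.close()

for table in tables:
    print(table, columns[table], row_counts[table])
```

On a system that does implement the information schema, the same facts would be obtained with ordinary queries against its views instead of vendor-specific statements.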
 
 
 
== See also ==

{{col-begin}}
 
{{col-break}}
 
* [http://en.wikipedia.org/wiki/Agris: International Information System for the Agricultural Sciences and Technology]
 
* [http://en.wikipedia.org/wiki/Classification scheme]
 
* [http://en.wikipedia.org/wiki/Crosswalk (metadata)]
 
* [http://en.wikipedia.org/wiki/DataONE]
 
* [http://en.wikipedia.org/wiki/Data Dictionary] (aka metadata repository)
 
* [http://en.wikipedia.org/wiki/Dublin Core]
 
* [http://en.wikipedia.org/wiki/Folksonomy]
 
* [http://en.wikipedia.org/wiki/GEOMS – Generic Earth Observation Metadata Standard]
 
* [http://en.wikipedia.org/wiki/IPDirector]
 
* [http://en.wikipedia.org/wiki/ISO/IEC 11179]
 
* [http://en.wikipedia.org/wiki/Knowledge tag]
 
* [http://en.wikipedia.org/wiki/Mercury: Metadata Search System]
 
* [http://en.wikipedia.org/wiki/Meta element]
 
* [http://en.wikipedia.org/wiki/IF-MAP|Metadata Access Point Interface]
 
* [http://en.wikipedia.org/wiki/Metadata discovery]
 
* [http://en.wikipedia.org/wiki/Metadata facility for Java]
 
* [http://en.wikipedia.org/wiki/Wikiversity:4-b: Metadata|Metadata from Wikiversity]
 
{{col-break}}
 
* [http://en.wikipedia.org/wiki/Metadata publishing]
 
* [http://en.wikipedia.org/wiki/Metadata registry]
 
* [http://en.wikipedia.org/wiki/METAFOR] Common Metadata for Climate Modelling Digital Repositories
 
* [http://en.wikipedia.org/wiki/Microcontent]
 
* [http://en.wikipedia.org/wiki/Microformat]
 
* [http://en.wikipedia.org/wiki/Multicam(LSM)]
 
* [http://en.wikipedia.org/wiki/Ontology (computer science)]
 
* [http://en.wikipedia.org/wiki/Official statistics]
 
* [http://en.wikipedia.org/wiki/Paratext]
 
* [http://en.wikipedia.org/wiki/Preservation Metadata]
 
* [http://en.wikipedia.org/wiki/SDMX]
 
* [http://en.wikipedia.org/wiki/Semantic Web]
 
* [http://en.wikipedia.org/wiki/SGML]
 
* [http://en.wikipedia.org/wiki/The Metadata Company]
 
* [http://en.wikipedia.org/wiki/Universal Data Element Framework]
 
* [http://en.wikipedia.org/wiki/Vocabulary OneSource]
 
* [http://en.wikipedia.org/wiki/XSD]
 
{{col-end}}
 
 
 
== References ==
 
{{Reflist|colwidth=30em}}
 
 
 
== External links ==
 
{{Wiktionarypar|metadata}}
 
* [http://mercury.ornl.gov/ornldaac Mercury: Metadata Management, Data Discovery and Access], managed by Oak Ridge National Laboratory [http://en.wikipedia.org/wiki/Distributed Active Archive Center]
 
* [http://www.well.com/~doctorow/metacrap.htm Metacrap: Putting the torch to seven straw-men of the meta-utopia] – [http://en.wikipedia.org/wiki/Cory Doctorow]'s opinion on the limitations of metadata on the [http://en.wikipedia.org/wiki/Internet], 2001
 
* [http://www.anonwatch.com/?p=9 Retrieving Meta Data from Documents and Pictures Online] - AnonWatch
 
* [http://www.niso.org/publications/press/UnderstandingMetadata.pdf Understanding Metadata] - [http://en.wikipedia.org/wiki/NISO], 2004
 
* [http://www.dataone.org DataONE] Investigator Toolkit
 
* {{Cite journal
 
  | journal = Journal of Library Metadata
 
  | publisher = Routledge, Taylor & Francis Group
 
  | url = http://www.informaworld.com/openurl?genre=journal&issn=1938-6389
 
  | issn = 1937-5034
 
  | accessdate = 8 January 2010}}
 
* {{Cite journal
 
  | journal = International Journal of Metadata, Semantics and Ontologies  (IJMSO)
 
  | publisher = Inderscience Publishers
 
  | url = http://www.inderscience.com/ijmso
 
  | issn = 1744-263X
 
  | accessdate = 8 January 2010}}
 
* [https://gcic.af.mil/onesource AFC2IC Vocabulary OneSource Tool]
 
* [http://www.metalounge.org/_literature_52579/Stephen_Machin_%E2%80%93_ON_METADATA_AND_METACONTENT On metadata and metacontent]
 
* [http://library.caltech.edu/laura/ Managing Metadata] blog
 
 
 
{{Software engineering}}
 
{{Data warehouse}}
 
 
 
[http://en.wikipedia.org/wiki/Category:Data management]
 
[http://en.wikipedia.org/wiki/Category:Knowledge representation]
 
[http://en.wikipedia.org/wiki/Category:Library cataloging and classification]
 
[http://en.wikipedia.org/wiki/Category:Metadata| ]
 
[http://en.wikipedia.org/wiki/Category:Technical communication]
 
 
 
 
 
 
 
 

Latest revision as of 18:01, 10 November 2020

Metadata is information about information, embedded in data files.

An article on this topic exists at Wikipedia, Metadata
