Linked Data Blog Aggregator

January 24, 2012

AI3:::Adaptive Information (Mike Bergman)

Give Me a Sign: What Do Things Mean on the Semantic Web?

The Triadic of Signs

Coca-Cola, Toucans and Charles Sanders Peirce

The crowning achievement of the semantic Web is the simple use of URIs to identify data. Further, if the URI identifier can resolve to a representation of that data, it now becomes an integral part of the HTTP access protocol of the Web while providing a unique identifier for the data. These innovations provide the basis for distributed data at global scale, all accessible via Web devices such as browsers and smartphones that are now a ubiquitous part of our daily lives.

Yet, despite these profound and simple innovations, the semantic Web’s designers and early practitioners and advocates have been mired in a muddled, metaphysical argument of at least a decade over what these URIs mean, what they reference, and what their actual true identity is. These muddles about naming and identity, it might be argued, are due to computer scientists and programmers trying to grapple with issues more properly the domain of philosophers and linguists. But that would be unfair. For philosophers and linguists themselves have for centuries also grappled with these same conundrums [1].

As I argue in this piece, part of the muddle results from attempting to do too much with URIs while another part results from not doing enough. I am also not trying to directly enter the fray of current standards deliberations. (Despite a decade of controversy, I optimistically believe that the messy process of argument and consensus building will work itself out [2].) What I am trying to do in this piece, however, is to look to one of America’s pre-eminent philosophers and logicians, Charles Sanders Peirce (pronounced “purse”), to inform how these controversies of naming, identity and meaning may be dissected and resolved.

‘Identity Crisis’, httpRange-14, and Issue 57

The Web began as a way to hyperlink between documents, generally Web pages expressed in the HTML markup language. These initial links were called URLs (uniform resource locators), and each pointed to various kinds of electronic resources (documents) that could be accessed and retrieved on the Web. These resources could be documents written in HTML or other encodings (PDFs, other electronic formats), images, streaming media like audio or videos, and the like [3].

All was well and good until the idea of the semantic Web, which postulated that information about the real world — concepts, people and things — could also be referenced and made available for reasoning and discussion on the Web. With this idea, the scope of the Web was massively expanded from electronic resources that could be downloaded and accessed via the Web to now include virtually any topic of human discourse. The rub, of course, was that ideas such as abstract concepts or people or things could not be “dereferenced” nor downloaded from the Web.

One of the first things that needed to change was to define a broader concept of a URI “identifier” above the more limited concept of a URL “locator”, since many of these new things that could be referenced on the Web went beyond electronic resources that could be accessed and viewed [3]. But, since what the referent of the URI now actually might be became uncertain — was it a concept or a Web page that could be viewed or something else? — a number of commentators began to note this uncertainty as the “identity crisis” of the Web [4]. The topic took on much fervor and metaphysical argument, such that by 2003, Sandro Hawke, a staffer of the standards-setting W3C (World Wide Web Consortium), was able to say, “This is an old issue, and people are tired of it” [5].

Yet, for many of the reasons described more fully below, the issue refused to go away. The Technical Architecture Group (TAG) of the W3C took up the issue, under a rubric that came to be known as httpRange-14 [6]. The issue was first raised in March 2002 by Tim Berners-Lee, accepted for TAG deliberations in February 2003, with then a resolution offered in June 2005 [7]. (Refer to the original resolution and other information [6] to understand the nuances of this resolution, since particular commentary on that approach is not the focus of this article.) Suffice it to say here, however, that this resolution posited an entirely new distinction of Web content into “information resources” and “non-information resources”, and also recommended the use of the HTTP 303 redirect code for when agents requesting a URI should be directed to concepts versus viewable documents.
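As a rough sketch of what the resolution asks of a requesting agent, the rule might be written in Python as below. It assumes my reading of the TAG's 2005 text: a 2xx response signals an information resource, a 303 says the URI could identify any resource, and a 4xx leaves the nature of the resource unknown. The function name and the returned labels are illustrative only and are not part of any specification.

```python
def classify_by_http_status(status: int) -> str:
    """Return what a GET response status is taken to say about a URI's resource.

    Hypothetical helper: the labels are illustrative, and the mapping is one
    reading of the 2005 httpRange-14 resolution, not normative text.
    """
    if 200 <= status <= 299:
        # Successful response: the URI is taken to identify an information resource.
        return "information resource"
    if status == 303:
        # "See Other": the URI could identify any resource, viewable or not.
        return "any resource"
    if 400 <= status <= 499:
        # Client error: the response says nothing about the resource's nature.
        return "unknown"
    # Other status codes are not addressed by this sketch.
    return "not covered"
```

Note that the rule inspects only the response code; nothing in it tells an agent what the resource actually is.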

This “resolution” has been anything but. Not only can no one clearly distinguish these de novo classes of “information resources” [19], but the whole approach felt arbitrary and kludgy.

Meanwhile, the confusions caused by the “identity crisis” and httpRange-14 continued to perpetuate themselves. In 2006, a major workshop on “Identity, Reference and the Web” (IRW 2006) was held in conjunction with the Web’s major WWW2006 conference in Edinburgh, Scotland, on May 23, 2006 [8]. The various presentations and its summary (by Harry Halpin) are very useful to understand these issues. What was starting to jell at this time was the understanding that the basis of identity and meaning on the Web posed new questions, and ones that philosophers, logicians and linguists needed to be consulted to help inform.

The fiat of the TAG’s 2005 resolution has failed to take hold. Over the ensuing years, various eruptions have occurred on mailing lists and within the TAG itself (now expressed as Issue 57) to revisit these questions and bring the steps moving forward into some coherent new understanding. Though linked data has been premised on best-practice implementation of these resolutions [9], and has been a qualified success, many (myself included) would claim that the extra steps and inefficiencies required from the TAG’s httpRange-14 guidance have been hindrances, not facilitators, of the uptake of linked data (or the semantic Web).

Today, despite the efforts of some to claim the issue closed, it is not. Issue 57 and the periodic bursts from notable semantic Web advocates such as Ian Davis [10], Pat Hayes and Harry Halpin [11], Ed Summers [12], Xiaoshu Wang [13], David Booth [14] and TAG members themselves, such as Larry Masinter [15] and Jonathan Rees [16], point to continued irresolution and discontent within the advocate community. Issue 57 currently remains open. Meanwhile, I think, all of us interested in such matters can express concern that linked data, the semantic Web and interoperable structured data have seen less uptake than any of us had hoped or wanted over the past decade. As I have stated elsewhere, unclear semantics and muddled guidelines help to undercut potential use.

As each of the eruptions over these identity issues has occurred, the competing camps have often been characterized as “talking past one another”; that is, not communicating in such a way as to help resolve to consensus. While it is hardly my position to do so, I try to encapsulate below the various positions and prejudices as I see them in this decade-long debate. I also try to share my own learning that may help inform some common ground. Forgive me if I overly simplify these vexing issues by returning to what I see as some first principles . . . .

What’s in a Name?

Original Coca-Cola bottle

One legacy of the initial document Web is the perception that Web addresses have meaning. We have all heard of the multi-million dollar purchasing of domains [17] and the adjudication that may occur when domains are hijacked from their known brands or trademark owners. This legacy has tended to imbue URIs with a perceived value. It is not by accident, I believe, that many within the semantic Web and linked data communities still refer to “minting” URIs. Some believe that ownership and control over URIs may be equivalent to grabbing up valuable real estate. It is also the case that many believe the “name” given to a URI acts to name the referent to which it refers.

This perception is partially true, partially false, but moreover incomplete in all cases. We can illustrate these points with the global icon, “Coca-Cola”.

As for the naming aspects, let’s dissect what we mean when we use the label “Coca-Cola” (in a URI or otherwise). Perhaps the first thing that comes to mind is “Coca-Cola,” the beverage (which has a description on Wikipedia, among other references). Because of its ubiquity, we may also recognize the image of the Coca-Cola bottle to the left as a symbol for this same beverage. (Though, in the hilarious movie, The Gods Must Be Crazy, Kalahari Bushmen, who had no prior experience of Coca-Cola, took the bottle to be magical with evil powers [18].) Yet even as reference to the beverage, the naming aspects are a bit cloudy since we could also use the fully qualified synonyms of “Coke”, “Coca-cola” (small C), “Classic Coke” and the hundreds of language variants worldwide.

On the other hand, the label “Coca-Cola” could just as easily conjure The Coca-Cola Company itself. Indeed, the company web site is the location pointed to by the URI of http://www.thecoca-colacompany.com/. But, even that URI, which points to the home Web page of the company, does not do justice to conveying an understanding or description of the company. For that, additional URIs may need to be invoked, such as the description at Wikipedia, the company’s own company description page, plus perhaps the company’s similar heritage page.

Of course, even these links and references only begin to scratch the surface of what the company Coca-Cola actually is: headquarters, manufacturing facilities, 140,000 employees, shareholders, management, legal entities, patents and Coke recipe, and the like. Whether in human languages or URIs, in any attempt to signify something via symbols or words (themselves another form of symbol), we risk ambiguity and incompleteness.

URI shorteners also undercut the idea that a URI necessarily “names” something. Using the service bitly, we can shorten the link to the Wikipedia description of the Coke beverage to http://bit.ly/xnbA6 and we can shorten the link to The Coca-Cola Company Web site to http://bit.ly/9ojUpL. I think we can fairly say that neither of these shortened links “names” its referent. The most we can say about a URI is that it points to something. With the vagaries of meaning in human languages, we might also say that URIs refer to something, denote something or identify (but not in the sense of completely define) something.

From this discussion, we can assert with respect to the use of URIs as “names” that:

  1. In all cases, URIs are pointers to a particular referent
  2. In some cases, URIs do act to “name” some things
  3. Yet, even when used as “names,” there can be ambiguity as to what exactly the referent is that is denoted by the name
  4. Resolving what such “names” mean is a matter of context and reference to further information or links, and
  5. Because URIs may act as “names”, it is appropriate to consider social conventions and contracts (e.g., trademarks, brands, legal status) in adjudicating who can own the URI.

In summary, I think we can say that URIs may act as names, but not in all or most cases, and when used as such are often ambiguous. Absolutely associating URIs as names is way too heavy a burden, and incorrect in most cases.

What is a Resource?

The “name” discussion above masks that in some cases we are talking about a readable Web document or image (such as the Wikipedia description of the Coke beverage or its image) versus the “actual” thing in the real world (the Coke beverage itself or even the company). This distinction is what led to the so-called “identity crisis”, for which Ian Davis has used a toucan as his illustrative thing [10].

Keel-billed Toucan

As I note in the conclusion, I like Davis’ approach to the identity conundrum insofar as Web architecture and linked data guidance are concerned. But here my purpose is more subtle: I want to tease apart still further the apparent distinction between an electronic description of something on the Web and the “actual” something. Like Davis, let’s use the toucan.

In our strawman case, we too use a description of the toucan (on Wikipedia) to represent our “information resource” (the accessible, downloadable electronic document). We contrast to that a URI that we mean to convey the actual physical bird (a “non-information resource” in the jumbled jargon of httpRange-14), which we will designate via the URI of http://example.com/toucan.

Despite the tortured (and newly conjured) distinction between “information resource” and “non-information resource”, the first blush reaction is that, sure, there is a difference between an electronic representation that can be accessed and viewed on the Web and its true, “actual” thing. Of course people can not actually be rendered and downloaded on the Web, but their bios and descriptions and portrait images may. While in the abstract such distinctions appear true and obvious, in the specifics that get presented to experts, there is surprising disagreement as to what is actually an “information resource” v. a “non-information resource” [19]. Moreover, as we inspect the real toucan further, even that distinction is quite ambiguous.

When we inspect what might be a definitive description of “toucan” on Wikipedia, we see that the term more broadly represents the family of Ramphastidae, which contains five genera and forty different species. The picture we are showing to the right is of but one of those forty species, that of the keel-billed toucan (Ramphastos sulfuratus). Viewing the images of the full list of toucan species shows just how divergent these various “physical birds” are from one another. Across all species, average sizes vary by more than a factor of three with great variation in bill sizes, coloration and range. Further, if I assert that the picture to the right is actually that of my pet keel-billed toucan, Pretty Bird, then we can also understand that this representation is for a specific individual bird, and not the physical keel-billed toucan species as a whole.

The point of this diversion is not a lecture on toucans, but an affirmation that distinctions between “resources” occur at multiple levels and dimensions. Just as there are no self-evident criteria as to what constitutes an “information resource”, there is also not a self-evident and fully defining set of criteria as to what is the physical “toucan” bird. The meaning of what we call a “toucan” bird is not embodied in its label or even its name, but in the context and accompanying referential information that place the given referent into a context that can be communicated and understood. A URI points to (“refers to”) something that causes us to conjure up an understanding of that thing, be it a general description of a toucan, a picture of a toucan, an understanding of a species of toucan, or a specific toucan bird. Our understanding or interpretation results from the context and surrounding information accompanying the reference.

In other words, a “resource” may be anything, which is just the way the W3C has defined it. There is not a single dimension which, magically, like “information” and “non-information,” can cleanly and definitely place a referent into some state of absolute understanding. To assert that such magic distinctions exist is a flaw of Cartesian logic, which can only be reconciled by looking to more defensible bases in logic [20].

Peirce and the Logic of Signs

The logic behind these distinctions and nuances leads us to Charles Sanders Peirce (1839 – 1914). Peirce (pronounced “purse”) was an American logician, philosopher and polymath of the first rank. Along with Frege, he is acknowledged as the father of predicate calculus and the notation system that formed the basis of first-order logic. His symbology and approach arguably provide the logical basis for description logics and other aspects underlying the semantic Web building blocks of the RDF data model and, eventually, the OWL language. Peirce is the acknowledged founder of pragmatism, the philosophy of linking practice and theory in a process akin to the scientific method. He was also the first formulator of existential graphs, an essential basis to the whole field now known as model theory. Though often overlooked in the 20th century, Peirce has lately been enjoying a renaissance with his voluminous writings still being deciphered and published.

The core of Peirce’s world view is based in semiotics, the study and logic of signs. In his seminal writing on this, “What is in a Sign?” [21], he wrote that “every intellectual operation involves a triad of symbols” and “all reasoning is an interpretation of signs of some kind”. Peirce had a predilection for expressing his ideas in “threes” throughout his writings.

Semiotics is often split into three branches: 1) syntactics – relations among signs in formal structures; 2) semantics – relations between signs and the things to which they refer; and 3) pragmatics – relations between signs and the effects they have on the people or agents who use them.

Peirce’s logic of signs in fact is a taxonomy of sign relations, in which signs get reified and expanded via still further signs, ultimately leading to communication, understanding and an approximation of “canonical” truth. Peirce saw the scientific method as itself an example of this process.

A given sign is a representation amongst the triad of the sign itself (which Peirce called a representamen, the actual signifying item that stands in a well-defined kind of relation to the two other things), its object and its interpretant. The object is the actual thing itself. The interpretant is how the agent or the perceiver of the sign understands and interprets the sign. Depending on the context and use, a sign (or representamen) may be either an icon (a likeness), an indicator or index (a pointer or physical linkage to the object) or a symbol (understood convention that represents the object, such as a word or other meaningful signifier).

An interpretant in its barest form is a sign’s meaning, implication, or ramification. For a sign to be effective, it must represent an object in such a way that it is understood and used again. This makes the assignment and use of signs a community process of understanding and acceptance [20], as well as a truth-verifying exercise of testing and confirming accepted associations.

John Sowa has done much to help make some of Peirce’s obscure language and terminology more accessible to lay readers [22]. He has expressed Peirce’s basic triad of sign relations as follows, based around the Yojo animist cat figure used by the character Queequeg in Herman Melville’s Moby-Dick:

The Triangle of Meaning

In this figure, object and symbol are the same as the Peirce triad; concept is the interpretant in this case. The use of the word ‘Yojo’ conjures the concept of cat.

This basic triad representation has been used in many contexts, with various replacements or terms at the nodes. Its basic form is known as the Meaning Triangle, as was popularized by Ogden and Richards in 1923 [23].

The key aspect of signs for Peirce, though, is the ongoing process of interpretation and reference to further signs, a process he called semiosis. A sign of an object leads to interpretants, which, as signs, then lead to further interpretants. In the Sowa example below, we show how meaning triangles can be linked to one another, in this case by abstracting that the triangles themselves are concepts of representation; we can abstract the ideas of both concept and symbol:

Representing an Object by a Concept

We can apply this same cascade of interpretation to the idea of the sign (or representamen), which in this case shows that a name can be related to a word symbol, which in itself is a combination of characters in a string called ‘Yojo’:

Representing Signs of Signs of Signs

According to Sowa [22]:

“What is revolutionary about Peirce’s logic is the explicit recognition of multiple universes of discourse, contexts for enclosing statements about them, and metalanguage for talking about the contexts, how they relate to one another, and how they relate to the world and all its events, states, and inhabitants.
“The advantage of Peircean semiotics is that it firmly situates language and logic within the broader study of signs of all types. The highly disciplined patterns of mathematics and logic, important as they may be for science, lie on a continuum with the looser patterns of everyday speech and with the perceptual and motor patterns, which are organized on geometrical principles that are very different from the syntactic patterns of language or logic.”

Catherine Legg [20] notes that the semiotic process is really one of community involvement and consensus. Each understanding of a sign and each subsequent interpretation helps come to a consensus of what a sign means. It is a way of building a shared understanding that aids communication and effective interpretation. In Peirce’s own writings, the process of interpretation can lead to validation and an eventual “canonical” or normative interpretation. The scientific method itself is an extreme form of the semiotic process, leading ultimately to what might be called accepted “truths”.

Peircean Semiotics of URIs

So, how do Peircean semiotics help inform us about the role and use of URIs? Does this logic help provide guidance on the “identity crisis”?

The Peircean taxonomy of signs has three levels with three possible sign roles at each level, leading to a possible 27 combinations of sign representations. However, because not all sign roles are applicable at all levels, Peirce actually postulated only ten distinct sign representations.
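The arithmetic can be sketched in a few lines of Python. The sketch assumes the rule usually given in the Peirce literature for pruning 27 combinations to ten: ranking the three roles at each level as 1, 2 and 3, the rank of the sign in itself must be at least the rank of its relation to the object, which in turn must be at least the rank of its relation to the interpretant. The level labels are the ones commonly used for Peirce's three trichotomies, and the function names are purely illustrative.

```python
from itertools import product

# The three levels, each listing its three roles in rank order (index 0 = rank 1).
SIGN_ITSELF = ["qualisign", "sinsign", "legisign"]
RELATION_TO_OBJECT = ["icon", "index", "symbol"]
RELATION_TO_INTERPRETANT = ["rheme", "dicent", "argument"]


def all_combinations():
    """Every pairing of one role per level: 3 x 3 x 3 rank triples."""
    return list(product(range(3), repeat=3))


def is_admissible(sign_rank, object_rank, interpretant_rank):
    """Assumed pruning rule: ranks may not increase from one level to the next."""
    return sign_rank >= object_rank >= interpretant_rank


def ten_classes():
    """Name the combinations that survive the pruning rule."""
    return [
        (SIGN_ITSELF[s], RELATION_TO_OBJECT[o], RELATION_TO_INTERPRETANT[i])
        for s, o, i in all_combinations()
        if is_admissible(s, o, i)
    ]


if __name__ == "__main__":
    print(len(all_combinations()), "possible combinations")
    for sign_class in ten_classes():
        print(" / ".join(sign_class))
```

Under this rule a symbol must always be a legisign, while an index may be either a sinsign or a legisign.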

Common to all roles, the URI “sign” is best seen as an index: the URI is a pointer to a representation of some form, be it electronic or otherwise. This representation bears a relation to the actual thing that this referent represents, as is true for all triadic sign relationships. However, in some contexts, again in keeping with additional signs interpreting signs in other roles, the URI “sign” may also play the role of a symbolic “name” or even as a signal that the resource can be downloaded or accessed in electronic form. In other words, by virtue of the conventions that we choose to assign to our signs, we can supply additional information that augments our understanding of what the URI is, what it means, and how it is accessed.

Of course, in these regards, a URI is no different than any other sign in the Peircean world view: it must reside in a triadic relationship to its actual object and an interpretation of that object, with further understanding only coming about by the addition of further signs and interpretations.

In shortened form, this means that a URI, acting alone, can at most play the role of a pointer between an object and its referent. A URI alone, without further signs (information), can not inform us well about names or even what type of resource may be at hand. For these interpretations to be reliable, more information must be layered on, either by accepted convention of the current signs or the addition of still further signs and their interpretations. Since the attempt to deal with the nature of a URI resource by fiat, as stipulated by httpRange-14, meets the standards of neither consensus nor empirical validity, it can not by definition become “canonical”. This does not mean that httpRange-14 and its recommended practices can not help in providing more information and aiding interpretation for what the nature of a resource may be. But it does mean that httpRange-14 acting alone is insufficient to resolve ambiguity.

Moreover, what we see in the general nature of Peirce’s logic of signs is the usefulness of adding more “triads” of representation as the process to increase understanding through further interpretation. Kind of sounds like adding on more RDF triples, does it not?

Global is Neither Indiscriminate Nor Unambiguous

Names, references, identity and meaning are not absolutes. They are not philosophically, and they are not in human language. To expect machine communications to hold to different standards and laws than human communications is naive. To effect machine communications our challenge is not to devise new rules, but to observe and apply the best rules and practices that human communications instruct.

There has been an unstated hope at the heart of the semantic Web enterprise that simply expressing statements in the right way (syntax) and in the right form (RDF) is sufficient to facilitate machine communications. But this hope, too, is naive and silly. Just as we do not accept all human utterances as truth, neither will we accept all machine transmissions as reliable. Some of the information will be posted in error; some will be wrong or ill-fitting to our world view; some will be malicious or intended to deceive. Spam and occasionally lousy search results on the Web tell us that Web documents are subject to these sources of unsuitability; why would the same not be true of data?

Thus, global data access via the semantic Web is not — and can never be — indiscriminate nor unambiguous. We need to understand and come to trust sources and provenance; we need interpretation and context to decide appropriateness and validity; and we need testing and validation to ensure messages as received are indeed correct. Humans need to do these things in their normal courses of interaction and communication; our machine systems will need to do the same.

These confirmations and decisions as to whether the information we receive is actionable or not will come about via still more information. Some of this information may come about via shared conventions. But most will come about because we choose to provide more context and interpretation for the core messages we hope to communicate.

A Go-Forward Approach

Nearly five years ago Hayes and Halpin put forth a proposal to add ex:refersTo and ex:describedBy to the standard RDF vocabulary as a way for authors to provide context and explanation for what constituted a specific RDF resource [11]. In various ways, many of the other individuals cited in this article have come to similar conclusions. The simple redirect suggestions of both Ian Davis [10] and Ed Summers [12] appear particularly helpful.
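To make the idea concrete, here is a minimal Python sketch of statements using the two predicates. It models triples as plain tuples rather than using an RDF library. The namespace behind the ex: prefix, the direction of each predicate, the species URI and the choice of the Wikipedia page as the describing document are all assumptions made for illustration; they are not taken from the Hayes and Halpin proposal itself.

```python
# Assumed namespace for the "ex:" prefix; the proposal's actual namespace may differ.
EX = "http://example.org/ns#"

TOUCAN = "http://example.com/toucan"                  # URI meant to denote the bird
TOUCAN_PAGE = "http://en.wikipedia.org/wiki/Toucan"   # assumed describing document
KEEL_BILLED = "http://example.com/species/keel-billed-toucan"  # hypothetical URI

# Each statement is a (subject, predicate, object) triple.
statements = [
    (TOUCAN, EX + "describedBy", TOUCAN_PAGE),
    (TOUCAN, EX + "refersTo", KEEL_BILLED),
]


def objects_of(subject, predicate, triples):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


if __name__ == "__main__":
    print("described by:", objects_of(TOUCAN, EX + "describedBy", statements))
    print("refers to:", objects_of(TOUCAN, EX + "refersTo", statements))
```

The point of the sketch is only that the author, not the HTTP response code, states which document describes the thing and which thing is meant.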

Over time, we will likely need further representations about resources regarding such things as source, provenance, context and other interpretations that would help remove ambiguities as to how the information provided by that resource should be consumed or used. These additional interpretations can mechanically be provided via referenced ontologies or embedded RDFa (or similar). These additional interpretations can also be aided by judicious, limited additions of new predicates to basic language specifications for RDF (such as the Hayes and Halpin suggestions).

In the end, of course, any frameworks that achieve consensus and become widely adopted will be simple to use, easy to understand, and straightforward to deploy. The beauty of best practices in predicates and annotations is that failures to provide are easy to test. Parties that wish to have their data consumed have incentive to provide sufficient information so as to enable interpretation.
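A test of this kind might look like the following Python sketch, which reports the subjects in a set of triples that lack an expected annotation. The list of required predicates is a hypothetical convention chosen for illustration (ex:describedBy from the proposal above and dcterms:source for provenance), and the sample data is invented.

```python
# Hypothetical convention: predicates every published resource is expected to carry.
REQUIRED_PREDICATES = ("ex:describedBy", "dcterms:source")


def missing_annotations(triples, required=REQUIRED_PREDICATES):
    """Map each subject to the required predicates it does not carry."""
    predicates_by_subject = {}
    for subject, predicate, _ in triples:
        predicates_by_subject.setdefault(subject, set()).add(predicate)

    report = {}
    for subject, predicates in predicates_by_subject.items():
        missing = [p for p in required if p not in predicates]
        if missing:
            report[subject] = missing
    return report


# Invented sample data: the toucan is fully annotated, the Coke resource is not.
sample = [
    ("http://example.com/toucan", "ex:describedBy", "http://example.com/toucan.html"),
    ("http://example.com/toucan", "dcterms:source", "http://example.com/about"),
    ("http://example.com/coke", "rdfs:label", "Coca-Cola"),
]

if __name__ == "__main__":
    for subject, missing in missing_annotations(sample).items():
        print(subject, "lacks", ", ".join(missing))
```

A consumer could run such a check before deciding whether a source supplies enough context to be trusted.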

There is absolutely no reason that these additions can not co-exist with the current httpRange-14 approach. By adding a few other options and making clear the optional use of httpRange-14, we would be very Peirce-like in our go-forward approach: We are being pragmatic while adding more means to improve our interpretations of what a Web resource is and is meant to be.


[1] Throughout intellectual history, a number of prominent philosophers and logicians have attempted to describe naming, identity and reference of objects and entities. Here are a few that you may likely encounter in various discussions of these topics in reference to the semantic Web; many are noted philosophers of language:

  • Aristotle (384 BC – 322 BC) – founder of formal logic; formulator and proponent of categorization; believed in the innate “universals” of various things in the natural world
  • Rudolf Carnap (1891 – 1970) – proposed a logical syntax that provided a system of concepts, a language, to enable logical analysis via exact formulas; a basis for natural language processing; rejected the idea and use of metaphysics
  • René Descartes (1596 – 1650) – posited a boundary between mind and the world; the meaning of a sign is the intension of its producer, and is private and incorrigible
  • Friedrich Ludwig Gottlob Frege (1848 – 1925) – one of the formulators of first-order logic, though syntax not adopted; advocated shared senses, which can be objective and sharable
  • Kurt Gödel (1906 – 1978) – his two incompleteness theorems are some of the most important logic contributions of all time; they establish inherent limitations of all but the most trivial axiomatic systems capable of doing arithmetic, as well as for computer programs
  • David Hume (1711 – 1776) – embraced natural empiricism, but kept the Descartes concept of an “idea”
  • Immanuel Kant (1724 – 1804) – one of the major philosophers in history, argued that experience is purely subjective without first being processed by pure reason; a major influence on Peirce
  • Saul Kripke (1940 – ) – proposed the causal theory of reference and what proper names mean via a “baptism” by the namer
  • Gottfried Wilhelm Leibniz (1646 – 1716) – the classic definition of identity is Leibniz’s Law, which states that if two objects have all of their properties in common, they are identical and so only one object
  • Richard Montague (1930 – 1971) – wrote much on logic and set theory; student of Tarski; pioneered a logical approach to natural language semantics; associated with model theory, model-theoretic semantics
  • Charles Sanders Peirce (1839 – 1914) – see main text
  • Willard Van Orman Quine (1908 – 2000) – noted analytical philosopher, advocated the “radical indeterminacy of translation” (can never really know)
  • Bertrand Russell (1872 – 1970) – proposed the direct theory of reference and what it means to “ground in references”; adopted many Peirce arguments without attribution
  • Ferdinand de Saussure (1857 – 1913) – also proposed an alternative view to Peirce of semiotics, one grounded in sociology and linguistics
  • John Rogers Searle (1932 – ) – argues that consciousness is a real physical process in the brain and is subjective; has argued against strong AI (artificial intelligence)
  • Alfred Tarski (1901 – 1983) – analytic philosopher focused on definitions of models and truth; great admirer of Peirce; associated with model theory, model-theoretic semantics
  • Ludwig Josef Johann Wittgenstein (1889 – 1951) – he disavowed his earlier work, arguing that philosophy needed to be grounded in ordinary language, recognizing that the meaning of words is dependent on context, usage, and grammar.
Also, Umberto Eco has been a noted proponent and popularizer of semiotics.
[2] As any practitioner ultimately notes, standards development is a messy, lengthy and trying process. Not all individuals can handle the messiness and polemics involved. Personally, I prefer to try to write cogent articles on specific issues of interest, and then leave it to others to slug it out in the back rooms of standards making. Where the process works well, standards get created that are accepted and adopted. Where the process does not work well, the standards are not embraced as exhibited by real-world use.
[3] Tim Berners-Lee, 2007. What Do HTTP URIs Identify?
This article does not discuss the other sub-category of URIs, URNs (for names). URNs may refer to any standard naming scheme (such as ISBNs for books) and have no direct bearing on any network access protocol, unlike URLs and URIs when they are referenceable. Further, URNs are little used in practice.
[4] Kendall Clark was one of the first to question “resource” and other identity ambiguities, noting the tautology between URI and resource as “anything that has identity.” See Kendall Clark, 2002. “Identity Crisis,” in XML.com, Sept 11 2002; see http://www.xml.com/pub/a/2002/09/11/deviant.html. From the topic map community, one notable contribution was from Steve Pepper and Sylvia Schwab, 2003. “Curing the Web’s Identity Crisis,” found at http://www.ontopia.net/topicmaps/materials/identitycrisis.html.
[5] Sandro Hawke, 2003. Disambiguating RDF Identifiers. W3C, January 2003. See http://www.w3.org/2002/12/rdf-identifiers/.
[6] The issue was framed as what is the proper “range” for HTTP referrals and was also the 14th major TAG issue recorded, hence the name. See further the httpRange-14 Webography.
[7] See W3C, “httpRange-14: What is the range of the HTTP dereference function?”; see http://www.w3.org/2001/tag/issues.html#httpRange-14.
[9] Leo Sauermann and Richard Cyganiak, eds., 2008. Cool URIs for the Semantic Web, W3C Interest Group Note, December 3, 2008. See http://www.w3.org/TR/cooluris/.
[10] Ian Davis, 2010. Is 303 Really Necessary? Blog post, November 2010, accessed 20 January 2012. (See http://blog.iandavis.com/2010/11/04/is-303-really-necessary/.) A considerable thread resulted from this post; see http://markmail.org/thread/mkoc5kxll6bbjbxk.
[11] See first Harry Halpin, 2006. “Identity, Reference and Meaning on the Web,” presented at WWW 2006, May 23, 2006. See http://www.ibiblio.org/hhalpin/irw2006/hhalpin.pdf. This was then followed up with greater elaboration by Patrick J. Hayes and Harry Halpin, 2007. “In Defense of Ambiguity,” http://www.ibiblio.org/hhalpin/homepage/publications/indefenseofambiguity.html.
[12] Ed Summers, 2010. Linking Things and Common Sense, blog post of July 7, 2010. See http://inkdroid.org/journal/2010/07/07/linking-things-and-common-sense/.
[13] Xiaoshu Wang, 2007. URI Identity and Web Architecture Revisited, Word document posted on posterous.com, November 2007. (Former Web documents have been removed.)
[14] David Booth, 2006. “URIs and the Myth of Resource Identity,” see http://dbooth.org/2006/identity/.
[15] See Larry Masinter, 2012. “The ‘tdb’ and ‘duri’ URI Schemes, Based on Dated URIs,” 10th version, IETF Network Working Group Internet-Draft, January 12, 2012. See http://tools.ietf.org/html/draft-masinter-dated-uri-10.
[16] Jonathan Rees has been the scribe and author for many of the background documents related to Issue 57. A recent mailing list entry provides pointers to four relevant documents in this entire discussion. See Jonathan A Rees, 2012. Guide to ISSUE-57 (httpRange-14) document suite, January 21, 2012.
[17] At least twenty domain names, led by insure.com, have sold for more than $2 million each; see this Wikipedia listing.
[18] In the wonderful movie, The Gods Must Be Crazy, Bushmen in the Kalahari Desert one day find an unbroken glass Coke bottle that had been thrown out of an airplane. Initially, this strange artifact seems to be another boon from the gods, and the Bushmen find many uses for it. But unlike anything that they have had before, there is only one bottle to go around. This creates jealousy, envy, anger, hatred, even violence. The protagonist, Xi, decides that the bottle is an evil thing and must be thrown off the edge of the world. The hilarity of the movie comes from that premise and Xi’s encounters with the modern world as he pursues his quest with the magic bottle.
[19] Wang [13] rhetorically asked which of the following things would be categorized as an “information resource”:
  1. A book
  2. A clock
  3. The clock on the wall of my bedroom
  4. A gene
  5. The sequence of a gene
  6. A software
  7. A service
  8. A namespace
  9. An ontology
  10. A language
  11. A number
  12. A concept, such as Dublin Core’s creator.

See the 2007 thread on this issue, mostly by Sean Palmer and Noah Mendelsohn, the latter acknowledging that various experts may only agree on 85% of the items.

[20] See further Catherine Legg, 2010. “Pragmaticism on the Semantic Web,” in Bergman, M., Paavola, S., Pietarinen, A.-V., & Rydenfelt, H. eds., Ideas in Action: Proceedings of the Applying Peirce Conference, pp. 173–188. Nordic Studies in Pragmatism 1. Helsinki: Nordic Pragmatism Network. See http://www.nordprag.org/nsp/1/Legg.pdf.
[21] Charles Sanders Peirce, 1894. “What is in a Sign?”, see http://www.iupui.edu/~peirce/ep/ep2/ep2book/ch02/ep2ch2.htm.
[22] The figures in particular are from John F. Sowa, 2000. “Ontology, Metadata, and Semiotics,” presented at ICCS 2000 in Darmstadt, Germany, on August 14, 2000; published in B. Ganter & G. W. Mineau, eds., Conceptual Structures: Logical, Linguistic, and Computational Issues, Lecture Notes in AI #1867, Springer-Verlag, Berlin, 2000, pp. 55-81. May be found at http://www.jfsowa.com/ontology/ontometa.htm. Also see John F. Sowa, 2006. “Peirce’s Contributions to the 21st Century,” presented at International Conference on Conceptual Structures, Aalborg, Denmark, July 17, 2006. See http://www.jfsowa.com/pubs/csp21st.pdf.
[23] C.K. Ogden and I. A. Richards, 1923. The Meaning of Meaning, Harcourt, Brace, and World, New York, 8th edition 1946.

by Mike Bergman at January 24, 2012 03:52 PM

December 28, 2011

Frederick Giasson's Weblog

Open Semantic Framework Running on Micro Instances

After releasing the new Open Semantic Framework Installer, we started to test it on machines with all kinds of different specifications: different CPU limits, different amounts of memory, etc. One of the setups that caught our attention was Amazon’s EC2 Micro Instance.

The Micro Instance is a virtual server type that was introduced by Amazon a little more than a year ago. As described by Amazon, Micro Instances are:

Instances of this family provide a small amount of consistent CPU resources and allow you to burst CPU capacity when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically.

We were intrigued by this particular type of instance because we wanted to know how the complete Open Semantic Framework stack could operate on such a small server instance.

Micro Instance Specifications

The Micro Instance’s specifications are as follows:

  • 613 MB memory
  • Up to 2 EC2 Compute Units (for short periodic bursts)
  • 32-bit or 64-bit platform
  • I/O Performance: Low

Note that an EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.

Installing The Stack

Installing the stack on the Amazon Micro Instance, using the OSF Installer, is not the fastest experience in the world. In fact, installing the complete stack takes up to 10 hours (5 minutes of your time, but compiling and installing everything takes about 10 hours of CPU time).

The problem is that installing OSF is a CPU-intensive task, while the Micro Instance is not built for sustained CPU load. The Micro Instance can handle small CPU bursts, but it can’t sustain the creation and compilation of the entire stack. That means the CPU cycles needed won’t be available to the instance, and Amazon will throttle its CPU consumption, which significantly slows down the installation process.

However, as you will see below, once OSF is installed on the Micro instance, the complete stack responds perfectly to all queries sent to it.

Creating an AMI

The only time you have to spend 10 hours to install the OSF stack on an Amazon Micro Instance is the first time. After that, you would only have to create an Amazon AMI from that vanilla OSF instance for future use. If you proceed that way, you will lower the installation time from 10 hours to a few minutes.

Reading and Searching Data

The testing we did for reading and searching data from structWSF shows that performance is as good as what you would get from a small instance with a normal workload. The Crud: Read and the Search structWSF endpoints are fully responsive and operational.

Creating, Updating and Deleting Data

Our testing showed that creating, updating and deleting entire datasets takes more time than with a small instance, even when the instance is dedicated to that task alone, with no other queries being processed at the same time. This decrease in performance is due to the CPU throttling done by Amazon for this kind of more CPU-intensive task. However, since creating, updating or deleting individual records only produces short “CPU peaks”, such isolated create/update/delete queries don’t greatly affect the overall performance of the instance.

What Is This Type Of Instance Good For?

We found that such small instances are perfect for data collection activities performed by a single person or a small group of collaborators. We also found that they can be used for low-traffic websites such as personal web portals, personal blogs, etc. The complete OSF stack is fully responsive, and our analysis shows that resource usage (CPU and memory) remains stable under a normal workload.

Conclusion

Such a small server instance can easily be used to create a personal data collection endpoint, or a personal, or small, data presentation portal such as Mike’s semantic web Sweet Tools. It is well suited for data portals that require reading and searching of data with occasional data changes (addition, removal and modification of instance records).

by Frederick Giasson at December 28, 2011 08:45 PM

December 21, 2011

Frederick Giasson's Weblog

Volkswagen UK’s Search Engine Powered by structWSF

It is now official: Volkswagen UK’s search engine is powered by structWSF. Their new contextual search engine was released last Friday. I covered the underlying architecture in one of my recent blog posts: Volkswagen's RDF Data Management Workflow.

 

 

John Streit, head of technology at Tribal DDB, described the two key advantages of using structWSF (part of the Open Semantic Framework, OSF) for their website in an interview with Wired UK:

The first is that it gives you a single place to access data. Streit explains: “Applications often need to retrieve data from multiple sources which adds complexity and development time. By using this technology we can get everything we need from a single place which drastically lowers development time and running costs.” Furthermore the exposure of data improves search and means that it can be repurposed in new and imaginative ways.

by Frederick Giasson at December 21, 2011 06:23 PM

December 14, 2011

Frederick Giasson's Weblog

The Open Semantic Framework Installer

We are excited to introduce the first Open Semantic Framework installation script. This new installer application will install and configure the entire Open Semantic Framework stack for you. It will take about 10 minutes of your time, and will process in the background for a few hours while everything necessary to build the OSF stack is downloaded and compiled.

The only thing you have to do to run the OSF Installer is to issue the few commands outlined below, and then to answer a few questions in the process (which, since most of them use the standard default values, is pretty easy).

The OSF Installer is a major addition to the Open Semantic Framework since it now enables a greater number of people (mere mortals) to install and use the stack, and it enables much faster deployment of the system.

The full installation manual, where each of the steps performed by the installer is explained in detail, is available as a reference here.

Requirements

The current version of the Open Semantic Framework Installer is fully operational on:

  1. Ubuntu 10.04 (Lucid)
  2. 32-bit operating system
  3. Internet access from the server
  4. 5 GB of disk space on the partition where you are installing OSF

Eventually this installer will be upgraded for 64-bit operating systems, and for other Linux distributions. Also, the current installer should work on newer versions of Ubuntu, but it has only been tested to date on the latest LTS version.

Installing the Open Semantic Framework

The only manual steps needed to install the Open Semantic Framework are to:

  1. Create a folder in which to install OSF on your server
  2. Download the osf-install.zip installation package
  3. Make the osf-install.sh installation script executable
  4. Run the osf-install.sh installation script
  5. Answer the questions asked by the installer

Here are the commands you have to run:

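# Move to the folder where OSF will be installed (here, /mnt/)
cd /mnt/
# Download and unpack the installation package
sudo wget https://github.com/downloads/structureddynamics/Open-Semantic-Framework-Installer/osf-installer-v1.0a3.zip
sudo unzip osf-installer-v1.0a3.zip
# Enter the extracted folder, whose name starts with "structureddynamics"
cd `ls -d structureddynamics*/`
# Make the installation script executable, then run it
sudo chmod 755 osf-install.sh
./osf-install.sh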

conStruct and structWSF Upgrades

In the process, both conStruct and structWSF have been enhanced to enable automatic upgrading in the future. Starting with structWSF version 1.0a92 and conStruct version 6.x-1.0-beta9, future upgrades should be done automatically using automatic upgrading procedures.

However, to enable this, existing users will have to upgrade their current versions manually to establish the new automatic upgrades baseline.

Next Steps

Once you have installed the OSF stack, you can query the structWSF Web service endpoints and import datasets using conStruct. Here are a few things you can do to start exploring the Open Semantic Framework:

  1. Start exploring structWSF
  2. Start exploring conStruct
  3. Start exploring Ontologies usage in OSF
  4. Start importing and manipulating datasets
  5. Start exploring the Open Semantic Framework architecture
  6. Start playing with the structWSF web service endpoints

Since everything is installed on your server, you only have to play with the stack now. If you break something, just ping us on the mailing list, or re-install it without worrying about each installation step!

Help

You may experience some issues with this new OSF Installer. If that is the case, I suggest you reach out to the Open Semantic Web Mailing List so that we can fix it in the Git repository.

Just write an email that includes the specifications of the server on which you are trying to install OSF. Then tell us where the issue happens in the installation process. Also add any logs that could be helpful in debugging the issue.

Conclusion

This is the first version of the OSF installer, but it is already a real balm for installing OSF. As noted, this installer will eventually be upgraded to support 64-bit servers and other Linux distributions. Also, any help improving this installer from Bash wizards would naturally be greatly welcomed.

by Frederick Giasson at December 14, 2011 12:10 AM

December 12, 2011

AI3:::Adaptive Information (Mike Bergman)

The State of Tooling for Semantic Technologies

Number of Semantic Web Tools Passes 1000 for First Time; Many Other Changes

We have been maintaining Sweet Tools, AI3’s listing of semantic Web and -related tools, for a bit over five years now. Though we have switched to a structWSF-based framework that allows us to update it on a more regular, incremental schedule [1], like all databases, the listing needs to be reviewed and cleaned up on a periodic basis. We have just completed the most recent cleaning and update. We are also now committing to do so on an annual basis.

Thus, this is the inaugural ‘State of Tooling for Semantic Technologies’ report, and, boy, is it a humdinger. There have been more changes (and more important changes) in this past year than in all four previous years combined. I think it fair to say that semantic technology tooling is now reaching a mature state, the trends of which likely point to future changes as well.

In this past year more tools have been added, more tools have been dropped (or abandoned), and more tools have taken on a professional, sophisticated nature. Further, for the first time, the number of semantic technology and -related tools has passed 1000. This is remarkable, given that more tools have been abandoned or retired than ever before.

Click here to browse the Sweet Tools listing. There is also a simple listing of URL links and categories only.

We first present our key findings and then overall statistics. We conclude with a discussion of observed trends and implications for the near term.

Key Findings

Some of the key findings from the 2011 State of Tooling for Semantic Technologies are:

  • As of the date of this article, there are 1010 tools in the Sweet Tools listing, the first time it has passed 1000 total tools
  • A total of 158 new tools have been added to the listing in the last six months, an increase of 17%
  • 75 tools have been abandoned or retired, the most removed at any period over the past five years
  • A further 6%, or 55 tools, have been updated since the last listing
  • Though open source has always been an important component of the listing, it now constitutes more than 80% of all listings; with dual licenses, open source availability is about 83%. Online systems contribute another 9%
  • Key application areas for growth have been in SPARQL, ontology-related areas and linked data
  • Java continues to dominate as the most important language.

Many of these points are elaborated below.
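As a rough cross-check of the percentages in the key findings, the short sketch below backs out the size of the prior listing from the reported counts. It rests on two assumptions that the findings do not state: that the prior total equals the current total minus additions plus removals, and that the percentages are taken against that prior total.

```python
# Counts reported in the key findings
current_total = 1010
added = 158
removed = 75
updated = 55

# Assumption: prior listing size = current total - additions + removals
prior_total = current_total - added + removed

# Assumption: the reported percentages are relative to the prior listing
added_pct = 100 * added / prior_total
updated_pct = 100 * updated / prior_total

print(prior_total)         # 927
print(round(added_pct))    # 17
print(round(updated_pct))  # 6
```

Under those assumptions the reported 17% and 6% figures are consistent with the raw counts.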

The Statistical Picture

The updated Sweet Tools listing now includes nearly 50 different tools categories. The most prevalent categories, each with over 6% of the total, are information extraction, general RDF tools, ontology tools, browser tools (RDF, OWL), and parsers or converters. The relative share by category is shown in this diagram:

Since the last listing, the fastest growing categories have been SPARQL, linked data, knowledge bases and all things related to ontologies. The relative changes by tools category are shown in this figure:

Though it is true that some of this growth is the result of discovery, based on our own tool needs and investigations, we have also been monitoring this space for some time and serendipity is not a compelling explanation alone. Rather, I think that we are seeing both an increase in practical tools (such as for querying), plus the trends of linked data growth matched with greater sophistication in areas such as ontologies and the OWL language.

The languages these tools are written in have also been pretty constant over the past couple of years, with Java remaining dominant. Java has represented half of all tools in this space, which continues with the most recent tools as well (see below). More than a dozen programming or scripting languages have at least some share of the semantic tooling space:

Sweet Tools Languages

With only 160 new tools it is hard to draw firm trends, but it does appear that some languages (Haskell, XSLT) have fallen out of favor, while popularity has grown for Flash/Flex (from a small base), Python and Prolog (with the growth of logic tools):

PHP will likely continue to see some emphasis because of relations to many content management systems (WordPress, Drupal, etc.), though both Python and Ruby seem to be taking some market share in that area.

New Tools

The newest tools added to the listing show somewhat similar trends. Again, Java is the dominant language, but with much increased use of JavaScript, Python and Prolog:

Sweet Tools Languages

The higher incidence of Prolog is likely due to the parallel increase in reasoners and inference engines associated with ontology (OWL) tools.

The increase in comprehensive tool suites and use of Eclipse as a development environment would appear to secure Java’s dominance for some time to come.

Trends and Observations

These dry statistics tend to mask the feel one gets when looking at most of the individual tools across the board. Older academic and government-funded project tools are finally getting cleaned out and abandoned. Those tools that remain have tended to get some version upgrades and improved Web sites to accompany them.

The general feel one gets with regard to semantic technology tooling at the close of 2011 has these noticeable trends:

  • A three-tiered environment – the tools seem to segregate into: 1) a bottom tier of tools (largely) developed by individuals or small groups, now most often found on Google Code or Github; 2) a middle-tier of (largely) government-funded projects, sometimes with multiple developers, often older, but with no apparent driving force for ongoing improvements or commercialization; and 3) a top-tier of more professional and (often) commercially-oriented tools. The latter category is the most noticeable with respect to growth and impact
  • Professionalism – the tools in the apparent top tiers appear to have more professionalism and better (and more attractive) packaging. This professionalism is especially true for the frameworks and composite applications. But it also applies to many of the EU-funded projects from Europe, which has always been a huge source of new tool developments
  • More complete toolsets – similarly, the upper levels of tools are oriented to pragmatic problems and problem-solving, which often means they embody multiple functions and more complete tooling environments. This category actually appears to be the most visible one exhibiting growth
  • Changing nature of academic releases – yet, even the academic releases seem to be increasing in professionalism and completeness. Though in the lowest tier it is still possible to see cursory or experimental tool releases, newer academic releases (often) seem to be more strategically oriented and parts of broader programmatic emphases. Programs like AKSW from the University of Leipzig or the Freie Universität Berlin or Finland’s Semantic Computing Research Group (SeCo), among many others, tend to be exemplars of this trend
  • Rise of commercial interests and enterprise adoption – the growing maturity of semantic technologies is also drawing commercial interest, and the incubation of new start-ups by academic and research institutions acts to reinforce the above trends. Promising projects and tools are now much more likely to be spun off as potential ventures, with accompanying better packaging, documentation and business models
  • Multiple languages and applications – with this growing complexity and sophistication has also come more complicated apps, combining multiple languages and functions. In fact, for some time the Sweet Tools listing has been justifiably criticized by some as overly “simplifying” the space by classifying tools under (largely) single applications or single languages. By the 2012 survey, it will likely be necessary to better classify the tools using multiple assignments
  • Google code over SourceForge for open source (and an increase in Github, as well) – virtually all projects on SourceForge now feel abandoned or less active. The largest source of open source projects in the semantic technology space is now clearly Google Code. Though of a smaller footprint today, we are also seeing many of the newer open source projects also gravitate to Github. Open source hosting environments are clearly in flux.

I have said this before, and been wrong about it before, but it is hard to see the tooling growth curve continue at its current slope into the future. I think we will see many individual tools spring up on the open source hosting sites like Google and Github, perhaps at relatively the same steady release rate. But, old projects I think will increasingly be abandoned and older projects will not tend to remain available for as long a time. While a relatively few established open source standards, like Solr and Jena, will be the exception, I think we will see shorter shelf lives for most open source tools moving forward. This will lead to a younger tools base than was the case five or more years ago.

I also think we will continue to see the dominance of open source. Proprietary software has increasingly been challenged in the enterprise space. And, especially in semantic technologies, we tend to see many open source tools that are as capable as proprietary ones, and generally more dynamic as well. The emphasis on open data in this environment also tends to favor open source.

Yet, despite the professionalism, sophistication and complexity trends, I do not yet see massive consolidation in the semantic technology space. While we are seeing a rapid maturation of tooling, I don’t think we have yet seen a similar maturation in revenue and business models. While notable semantic technology start-ups like Powerset and Siri have been acquired and are clear successes, these wins still remain much in the minority.


[1] Please use the comments section of this post for suggesting new or overlooked tools. We will incrementally add them to the Sweet Tools listing. Also, please see the About tab of the Sweet Tools results listing for prior releases and statistics.

by Mike Bergman at December 12, 2011 02:29 PM

December 05, 2011

Frederick Giasson's Weblog

Role and Use of Ontologies in the Open Semantic Framework

Ontologies are to the Open Semantic Framework what humans were to the Mechanical Turk. The hidden human in the Mechanical Turk was orchestrating each and every chess move. However, to observers, the automated chess machine looked like just that: a new kind of intelligent machine. The year was 1770.

Ontologies play exactly the same role for the Open Semantic Framework (OSF): they orchestrate each and every move for all the pieces within OSF. They are what instruct structWSF, the Semantic Components, conStruct, and all other derivative user interface pieces how to behave.

In this (lengthy) blog post, I will present the main ontologies that have an impact on different parts of OSF. We will see how different ontology classes and properties, and how the description of the records indexed in the system, can impact the behaviors of OSF.

In addition to this post, Mike has also published a blog post today that overviews the overall OSF ontology modularization and architecture.

Constituent Ontologies

Let’s take a look at the core ontologies used by the Open Semantic Framework. All these ontologies have been developed in relation to OSF. These, and other external ontologies, have the same role in OSF as the human does in the Mechanical Turk: they instruct the system how to behave.

Here is the list of the core ontologies:

  1. The SCO Ontology (Semantic Component Ontology)
  2. The WSF Ontology (Web Service Framework Ontology)
  3. The AGGR Ontology (Aggregation Ontology)
  4. The irON Ontology (Instance Record and Object Notation Ontology)
  5. One or more domain ontologies, to capture the concepts and relationships for the purposes of a given OSF installation, and
  6. Possibly UMBEL or other upper-level concept ontologies, used for linkages to external systems.

(Note: the internal wiki links to each of these ontologies also provide links to the actual ontology specifications on Github.)

A useful discussion of these ontologies and their interactions in an OSF instance is provided by the ontology modularization document. This current document focuses primarily on the specific properties and roles associated with them in an OSF installation.

Depending on the specific OSF installation, of course, multiple external ontologies may also be employed. Some of the common external ones used in an OSF installation are described by the external ontologies document. These external ontologies are important (indeed essential in order to ensure linkage to the external world) but have little to do with internal OSF control structures. That is why the rest of this discussion focuses on internal ontologies only.

Summary Ontology Roles

Ontologies play pivotal roles across all parts of the framework. In a broad sense, the internal OSF ontologies are used for annotations, guiding interactions or relating concepts and information to other information. In specific terms, OSF ontologies may play one or more of these dozen or so roles:

  1. Define record descriptions
  2. Inform interface displays
  3. Integrate different data sources
  4. Define component selections
  5. Define component behaviors
  6. Guide template selection
  7. Provide reasoning and inference
  8. Guide content filtering (with and without inference)
  9. Tag concepts in text documents
  10. Help organize and navigate Web portals
  11. Manage datasets and ontologies, and
  12. Set access permissions and registrations.

In the remainder of this post, for each of these roles, we will see how ontologies affect numerous different parts of the OSF framework. These sections are presented in the order above.

Define Record Descriptions

A central role of ontologies in the Open Semantic Framework is their use to describe any kind of record that gets indexed and managed by the system. Since the framework indexes everything into the RDF data model, ontologies are needed as a schema to describe these RDF resources.

The irON ontology is specifically designed for record descriptions and notations. It interacts with all of the domain ontologies and, if used, upper-level ontologies such as UMBEL.

Inform Interface Displays

Ontologies have an impact on most of the user interfaces that display record information. The property that has the most impact is iron:prefLabel, which is used to display the label within the user interface that refers to a record or record attributes (properties). This label can be used within text, in a list control, in a tree control, or in any other kind of control that displays references to records.

Note: there are also other properties that are considered as fallbacks to iron:prefLabel if a record has no triples using the iron:prefLabel property. These include rdfs:label, dcterms:title, foaf:name, etc.
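The fallback behavior described above can be sketched as a simple ordered lookup. This is only an illustration, not OSF's actual code: the function name, the dictionary-based record shape, the order among the fallback properties, and the final fallback to the record URI are all assumptions.

```python
# iron:prefLabel comes first, followed by the fallback properties named in
# the text. The relative order of the fallbacks is an assumption.
LABEL_PROPERTIES = ["iron:prefLabel", "rdfs:label", "dcterms:title", "foaf:name"]


def display_label(record, uri):
    """Return the first available label for a record (hypothetical helper)."""
    for prop in LABEL_PROPERTIES:
        values = record.get(prop)
        if values:
            return values[0]
    # Assumption: with no label at all, show the record URI instead
    return uri


# A record with no iron:prefLabel, described as property -> list of values
record = {"foaf:name": ["Bob"], "dcterms:title": ["Bob's page"]}
print(display_label(record, "http://example.org/bob"))  # Bob's page
print(display_label({}, "http://example.org/empty"))    # http://example.org/empty
```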

General User Interface Labels and Descriptions

There are a few properties that have an impact on most of the components of the OSF stack, most of which come from the irON ontology. Here is the list of these irON properties that impact other parts of the system, mainly related to different user interfaces:

irON property – impact on the different user interfaces:

  • iron:prefLabel – Preferred label to refer to an instance record or specific attribute (property). This impacts most of the user interfaces. As soon as a record is described using this property, the user interface uses it to refer to that record (as a link, in a list, as a word, etc.)
  • iron:altLabel – Alternative label to refer to an instance record or specific attribute (property). This impacts most of the user interfaces. As soon as a record is described using this property, and the user interface needs more than one label to refer to that record, it is displayed in the user interface (as a link, in a list, as a word, etc.)
  • iron:hiddenLabel – Hidden labels are labels that shouldn’t be displayed in any user interface, but that may be used by different systems for indexing purposes. This impacts different indexing systems such as Scones. As soon as a record is described using this property, and a system needs more words (synonyms) to describe that record without displaying them in any user interface, these hidden labels are used.
  • iron:description – Description of an instance record. This impacts most of the user interfaces. As soon as a record is described using this property, and the user interface needs a description to refer to that record, it is displayed in the user interface (as a link, in a list, as a word, etc.)
  • iron:prefURL – Preferred URL for an instance record. This impacts most of the user interfaces. As soon as a record is described using this property, and the user interface needs a web page URL to refer to that record, it is displayed in the user interface (as a link)

User Interface ‘Short’ Labels

There are a few properties that impact most of the components of the OSF stack. Here is the list of SCO properties that impact other parts of the system, mainly related to different user interfaces:

SCO Property Impact on the different user interfaces
sco:shortLabel The short label is used to display a short version of the label of an attribute/type where it has to be displayed in a constrained region of a component. This impacts many kinds of user interfaces (including the semantic components): if the user interface knows that the space available to display the label is limited, it uses the sco:shortLabel value before any other label values that may be defined for that record.

Hierarchical Displays

The way ontologies define a class or a property structure also has an impact on different kinds of hierarchical displays. An example of this is the “Filter by Kinds” section of the structSearch and structBrowse modules. The possible filters that may be applied to a search query will be displayed to the user according to the hierarchy as defined in the ontologies.

Integrate Heterogeneous Data Sources

The principal reason why the Open Semantic Framework uses RDF and ontologies to describe all the data it indexes and manages is to facilitate data integration from multiple and heterogeneous data sources. The premise of using RDF and ontologies is:

The RDF framework, along with using ontologies as schema, is the most flexible means currently available to describe any kind of data. The RDF-ontology combination can be used to represent any data coming from any other source, data management system, format, or unstructured to structured basis for describing information. (See further the Advantages and Myths of RDF.)

This foundation leads to the extreme flexibility of the Open Semantic Framework. The rationale behind this flexibility, and its benefits, has been described in many locations within this wiki. You may also want to see this article on One of the Semantic Web's Core Added Value.

Ontologies have a dramatic — and positive — impact on the data integration and presentation tasks within an OSF instance.

Define Component Selections

A key aspect of the SCO Ontology is its use as the means to define what semantic components (or widgets) display what types of information within data records.

These assignments are done via the sControl component. The properties for this component define what components may display what type (class) of data records. Here is the list of SCO properties that impact the sControl’s behaviors:

SCO Property Impact on the sControl component
sco:displayControl Annotate a class or a property to reference it to a display control. This indicates which semantic components can normally be used to display some information about a record of a certain type, or a record that is described using some property. This property impacts the behavior of the sControl component in the sense that for a given record’s description, and a given ontology, the sControl component will select different semantic controls for displays. The actual information displayed, and with what widget(s), depends on the type of the record and the properties that are used to describe it.
sco:comparableWith It is possible to specify a “comparableWith” relation between two predicates. These comparable attributes have the same allowedValue(s), and the semantics of the predicates that are deemed comparable are the same. Since the kinds of values, and their semantics, are the same, they are then considered comparable. This property is normally applied when it is desirable, for example, to plot values of different attributes describing similar records on some visualization component (for example, a linear chart). This property impacts the behavior of the sControl component in the sense that for a given record’s description, and a given ontology, the sControl component will display information about multiple input records depending on whether the values of some of the properties used to describe them are comparable.
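
As a rough illustration of how sco:displayControl annotations could drive component selection, the sketch below looks up the controls annotated on a record’s types and properties. The annotation table, the mapping of particular terms to particular components, and the function name are all hypothetical; the actual sControl logic is not described in enough detail here to reproduce.

```python
# Hypothetical sco:displayControl annotations: class or property -> controls.
DISPLAY_CONTROLS = {
    "foaf:Person": ["sRelationBrowser"],
    "wgs84:lat": ["sWebMap"],
    "sco:storyTextUri": ["sStory"],
}


def select_controls(record_types, record_properties):
    """Collect the controls annotated on the record's types and properties."""
    selected = []
    for term in list(record_types) + list(record_properties):
        for control in DISPLAY_CONTROLS.get(term, []):
            if control not in selected:
                selected.append(control)
    return selected


print(select_controls(["foaf:Person"], ["wgs84:lat", "wgs84:long"]))
```

The point of the sketch is that the selection is data-driven: changing the annotations in the ontology changes which components are offered, with no change to the code.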

Define Component Behaviors

In the Open Semantic Framework, one of the most important roles of ontologies is to define the interaction between different pieces of the system. Because of the extent of these interactions, this section is the longest and most detailed amongst all of the dozen or so ontology roles.

The SCO ontology can have multiple effects on multiple parts of an OSF instance. This section describes those interactions.

sMap Component

The sMap component has different behaviors depending on how its input record is described. Here is the list of SCO properties that have an impact on the sMap’s behaviors:

SCO Property Impact on the sMap component
sco:gisMap Reference a map binary file created from a ShapeFile map file and ClearMapsBuilder. The referenced map file is a serialized ActionScript object. The sco:gisMap defines the first layout that is related to a given resource. Normally, this resource is part of the map related by the gisMap predicate. Read more about maps in the sMap documentation page. There is only one gisMap relationship per resource; other relationships should be made with the sco:relatedGisMap predicate. This property impacts the behavior of the sMap component in the sense that it is the property, in the record’s description, that tells the component what map to render to the user.
sco:relatedGisMap Reference a map binary file created from a ShapeFile map file and ClearMapsBuilder. The referenced map file is a serialized ActionScript object. The sco:relatedGisMap defines a related map layout that is related to a given resource. The resource is related to that map layer in some way, but it is not necessarily part of the layer. Read more about maps in the sMap documentation page. This property impacts the behavior of the sMap component in the sense that it is the property, in the record’s description, that tells the component what map to render to the user.

sWebMap Component

The sWebMap component has different behaviors depending on how its input record is described. Here is the list of SCO and WGS84 properties that impact the behavior of an sWebMap:

SCO/WGS84 Property Impact on the sWebMap component
sco:polygonCoordinates Defines the coordinates of a polygon shape that represents a geographic area determined by a record. Coordinates are defined as coordinates in KML. This property impacts the behavior of the sWebMap component in the sense that for a given resultset of records, polygon shapes are displayed on a World map for each of the records described with the property.
sco:polylineCoordinates Defines the coordinates of a polyline shape that represents a record on a map. Coordinates are defined as coordinates in KML. This property impacts the behavior of the sWebMap component in the sense that for a given resultset of records, polylines are displayed on a World map for each of the records described with the property.
sco:mapMarkerImageUrl URL of an icon image to use as a marker on a web map. Normally, this property is used to annotate a Class description. All of the records belonging to that class are marked on a map using this icon image. This property impacts the behavior of the sWebMap component in the sense that for a given resultset of records, and a given ontology description, all records of a given class are displayed with the marker icon found at the URL specified for sco:mapMarkerImageUrl.
wgs84:lat Latitude coordinate of a record on a World map. This property impacts the behavior of the sWebMap component in the sense that for a given resultset of records, each record with a wgs84:lat property is displayed on the sWebMap at that latitude coordinate.
wgs84:long Longitude coordinate of a record on a World map. This property impacts the behavior of the sWebMap component in the sense that for a given resultset of records, each record with a wgs84:long property is displayed on the sWebMap at that longitude coordinate.
wgs84:alt Altitude of a record on a World map. This property impacts the behavior of the sWebMap component in the sense that for a given resultset of records, each record with a wgs84:alt property is displayed on the sWebMap at that altitude.
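
To make the coordinate properties more concrete, here is a minimal Python sketch of how a map client might read them. It assumes the polygon and polyline values use the standard KML coordinates notation (whitespace-separated "lon,lat[,alt]" tuples) and that records are plain dictionaries; both function names are hypothetical and are not part of sWebMap.

```python
def parse_kml_coordinates(text):
    """Parse a KML coordinates string into (lon, lat, alt) tuples.

    KML writes each point as "lon,lat[,alt]" and separates points with
    whitespace; a missing altitude is read as 0.0 in this sketch.
    """
    points = []
    for chunk in text.split():
        parts = [float(value) for value in chunk.split(",")]
        if len(parts) == 2:
            parts.append(0.0)
        if len(parts) != 3:
            raise ValueError(f"bad coordinate tuple: {chunk!r}")
        points.append(tuple(parts))
    return points


def marker_position(record):
    """Return (lat, long) for a record, or None if either value is missing."""
    if "wgs84:lat" in record and "wgs84:long" in record:
        return (float(record["wgs84:lat"]), float(record["wgs84:long"]))
    return None


print(parse_kml_coordinates("-97.1,49.9,0 -97.2,49.8"))
print(marker_position({"wgs84:lat": "49.9", "wgs84:long": "-97.1"}))
```

Note the ordering difference: KML lists longitude first, while the WGS84 properties are usually read as a latitude/longitude pair.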

sStory Component

The sStory component has different behaviors depending on how its input record is described. Here is the list of SCO properties that impact an sStory component:

SCO Property Impact on the sStory component
sco:storyUrl URL reference to a webpage representation of the story that got indexed into Scones. This property impacts the behavior of the sStory component in the sense that for a given record’s description, the sStory component refers users to the original webpage URL that got processed by Scones and is displayed in the sStory component.
sco:storyTextUri URI reference to the text of a story. This property impacts the behavior of the sStory component in the sense that for a given record’s description, the sStory component uses the text document referenced by this property to display in the text display of the sStory component.
sco:storyAnnotatedTextUri URI reference to the annotated text of a story. Annotations are serialized in XML using the GATE format. This property impacts the behavior of the sStory component in the sense that for a given record’s description, the sStory component uses the GATE annotated text document referenced by this property to display the tagged concepts in the tags section of the sStory viewer, and also uses it to highlight the tagged terms within the text viewer.

sBarChart and sLinearChart Components

The sBarChart and the sLinearChart components exhibit different behaviors depending on how the input records that are enabled for these component types are described. Here is the list of the SCO properties that impact this behavior:

SCO Property Impact on the sBarChart and the sLinearChart components
sco:comparableWith It is possible to specify a “comparableWith” relation between two predicates. These comparable attributes have the same allowedValue(s), and the semantics of the predicates that are deemed comparable are the same. Since the kinds of values, and their semantics, are the same, they are then considered comparable. This property is normally applied when it is desirable, for example, to plot values of different attributes describing similar records on some visualization component (for example, a linear chart). This property impacts the behavior of the sLinearChart component in the sense that for a given record’s description, and a given ontology, the sLinearChart component will display the values of the comparable attributes on a linear chart.
sco:unitType URI reference to a unit type ontology. The sco:unitType property is used to determine the type of unit referenced by a property. For example, if a data property has xsd:float as range, then sco:unitType determines what kind of thing this number refers to. The semantic components make all of the properties that share the same sco:unitType comparable (so, possibly displayable on the same semantic component, such as the sBarChart and the sLinearChart). This property impacts the behavior of the sLinearChart and the sBarChart components in the sense that for a given record’s description, and a given ontology, the sLinearChart or the sBarChart can be selected and used to display the values with the same unit type on one of these charts.
sco:orderingValue The value of the sco:orderingValue predicate is used to order the predicates of a set of comparable predicates. This set of comparable predicates is normally created from the set composed of all comparableWith predicates. This is normally used to plot, and order, values of different attributes describing similar records on some visualization component (for example, a linear chart). This property impacts the behavior of the sLinearChart and the sBarChart components in the sense that for a given record’s description, and a given ontology, the sLinearChart or the sBarChart will order the values of comparable properties on the charts, according to the ordering value defined for each property.
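
The interplay between sco:unitType and sco:orderingValue can be sketched as a grouping step followed by a sort. In the Python below, the property names, unit URIs and ordering values are invented for illustration, and the annotations are held in a plain dictionary rather than read from an ontology.

```python
from collections import defaultdict

# Hypothetical property annotations; names and values are illustrative only.
PROPERTIES = {
    "ex:population2001": {"sco:unitType": "ex:Persons", "sco:orderingValue": 1},
    "ex:population2011": {"sco:unitType": "ex:Persons", "sco:orderingValue": 3},
    "ex:population2006": {"sco:unitType": "ex:Persons", "sco:orderingValue": 2},
    "ex:areaKm2": {"sco:unitType": "ex:SquareKilometres", "sco:orderingValue": 1},
}


def comparable_groups(properties):
    """Group properties by sco:unitType, each group sorted by sco:orderingValue."""
    groups = defaultdict(list)
    for name, annotations in properties.items():
        groups[annotations["sco:unitType"]].append(name)
    return {
        unit: sorted(names, key=lambda n: properties[n]["sco:orderingValue"])
        for unit, names in groups.items()
    }


print(comparable_groups(PROPERTIES)["ex:Persons"])
```

Each resulting group is a candidate series for a single chart: the three population properties share a unit and plot together in order, while the area property stays on its own.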

sRelationBrowser

The sRelationBrowser component exhibits different behaviors depending on how its input record is described. Here is the list of SCO properties that impact the sRelationBrowser component:

SCO Property Impact on the sRelationBrowser component
sco:relationBrowserNodeType Reference to a relation browser node type used to skin a node according to its type. This should be a reference to a type URI defined in a relation browse nodes skins configuration file. If a record is defined with this property, the relation browser tries to find a node of that type to apply to it as a skin.This property impacts the behavior of the sRelationBrowser component in the sense that for a given record’s description, and a given ontology, the sRelationBrowser component uses the skin specified by the sco:relationBrowserNodeType attribute to display the record in the sRelationBrowser component.

sDashboard

The sDashboard component exhibits different behaviors depending on how its input record is described. Here is the list of SCO properties that impact the sDashboard component:

SCO Property Impact on the sDashboard component
sco:dashboardSessionFileUri URI reference to the Dashboard session accessible on the Web. This property impacts the behavior of the sDashboard component in the sense that for a given record’s description, and a given ontology, the sDashboard component loads the Dashboard session referenced by this property.

Guide Visualization Template Selection

One of the core features of the conStruct set of Drupal modules is the ability to use different display templates depending on the types of records available. The selection of these templates is based on the types of those records and the type hierarchies described by the OSF ontologies. This section describes how these ontologies guide template selections.

As a refresher on templates and their use, see the Building conStruct Templates document. It describes how the templating engine works and how to create various templates.

Template Selection

Template selection is the action of binding an instance record to a display template based on its type. Three things are required to make this happen:

  1. Instance records have to be typed
  2. An ontological structure of type relationships (via subClasses) has to exist in one or more OSF ontology(ies), and
  3. A template has to exist for the type of the instance record.

(Note: a specific template by type is not strictly required, since lacking a specific template for the target type, the system will invoke the nearest template up the parental chain in the governing ontology structure, eventually getting to the most generic template available, that for “thing”.)

Impact of Ontologies on Template Selection

conStruct’s templating engine selects record display templates based on the class hierarchy loaded on a OSF instance. It also uses inference on types to select the proper template for a given record.

Let’s say that we try to display information about a foaf:Person instance record. What the system attempts to do is to find a template that displays information about this kind of instance record. First, the foaf:Person type (class) has to be defined in the ontological structure of the OSF instance; if it is not, then no specific template will be selected and the system will default to using the owl_thing.html template (see below). If the type (class) is found, the system will next check to see if a template exists for that specific type. If one exists, it will use the matching template. If one does not, it will next select the parent class of the type and try the match again. If it again fails, it will continue its test up the parental chain. If all tests fail, it will use the default owl_thing.html template. Whichever template is selected then becomes the basis for formatting and presenting the visual record display.

We can use a simple class hierarchy, matched to a simple set of available template files, to illustrate how ontologies impact the conStruct templating system.

Loaded Class Hierarchy:

owl:Thing
   |
   |
    --> foaf:Agent
            |
            |
            |--> foaf:Person
            |
            |
             --> foaf:Organization

Available template files:

  owl_thing.tpl
  foaf_agent.tpl
  foaf_organization.tpl

Now, let’s say that our OSF portal is about to display information about a foaf:Person record. As we can notice, there is no foaf_person.tpl template available for a foaf:Person. However, because of the ontology structure, the system next attempts to select a template from a parent class of that foaf:Person.

What the system would do is to check if there is a template available for a record of type foaf:Person. Since there is none, it would try to find one for a parent type, so in this case the foaf:Agent class. In our example, there is now a match. The templating engine thus uses the foaf_agent.tpl template to display information about the foaf:Person record.

Were the foaf_agent.tpl not to exist, then the templating engine would fall back to the owl_thing.tpl template, which is considered to be the “generic record display template”, or the template of last resort.
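
The walk up the parental chain described above can be sketched in Python as follows. This is a minimal illustration, not conStruct’s actual code: the hierarchy is reduced to a single-parent dictionary, and the rule that turns a class name into a file name (foaf:Person to foaf_person.tpl) is an assumption inferred from the example file names.

```python
# Parent of each class in the example hierarchy (single inheritance is
# assumed to keep the sketch short).
PARENT = {
    "foaf:Agent": "owl:Thing",
    "foaf:Person": "foaf:Agent",
    "foaf:Organization": "foaf:Agent",
}
TEMPLATES = {"owl_thing.tpl", "foaf_agent.tpl", "foaf_organization.tpl"}
DEFAULT_TEMPLATE = "owl_thing.tpl"


def template_name(class_name):
    """Hypothetical naming rule: foaf:Person -> foaf_person.tpl."""
    return class_name.replace(":", "_").lower() + ".tpl"


def select_template(record_type):
    """Return the most specific available template for a record type."""
    if record_type != "owl:Thing" and record_type not in PARENT:
        return DEFAULT_TEMPLATE  # type is not defined in the ontology
    current = record_type
    while current is not None:
        candidate = template_name(current)
        if candidate in TEMPLATES:
            return candidate
        current = PARENT.get(current)  # move one step up the hierarchy
    return DEFAULT_TEMPLATE


print(select_template("foaf:Person"))
print(select_template("foaf:Organization"))
```

With the example files, the first call falls back to foaf_agent.tpl and the second finds its own foaf_organization.tpl. Adding foaf_person.tpl to the TEMPLATES set would change the first result without touching the selection logic.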

This design means that if:

  1. the ontological structure changes over time, or
  2. new templates get added to the system

then there may be an impact on how the record gets displayed.

The major advantage of this design is that more and more specific formatting templates may be added to an OSF installation over time, both improving the tailored look of results displays and accommodating more structure and relationships as they evolve.

Provide Reasoning and Inference

A standard use of ontologies is for reasoning and inference, and those used by OSF are no exception.

By extension, however, we can also use these same capabilities to check on data consistency and coherence. This is an important feature, since the system can detect logical inconsistencies or incoherencies that a system administrator may have introduced during ontology growth and development. Having coherent and consistent ontologies means that we have the proper foundations to create consistent and coherent datasets of instance records.

See further the discussion on reasoning using Protégé.

Guide Content Filtering

Filtering data is the action of getting a subset of records from a complete dataset based on some selection criteria. In OSF, the predominant share of filtering is done using the structWSF Search Web service endpoint. A minority of filtering is done using the SPARQL endpoint. It is also possible to filter via the AGGR aggregation ontology.

Possible filtering criteria for the Search endpoint are:

  1. Filtering by type(s)
  2. Filtering by attribute(s)
  3. Filtering by attribute(s)/value(s)
  4. Filtering by geo-localization (within a given geographical area)

These filtering activities are performed by different tools of the stack, such as:

  • structSearch
  • structBrowse
  • sWebMap

These tools are impacted by the definition of the loaded ontologies. The filtering of the values by types, attributes and attributes/values requires an ontology class or an ontology property as filtering criteria.

Filtering with Inference

Also, any Search query can be performed with inference enabled. Just as with the template selection described above, inference can have a big impact on the number and nature of returned results. Let’s consider this example class structure:

Loaded Class Hierarchy:

owl:Thing
   |
   |
    --> bibo:Document
            |
            |
            |
             --> bibo:Image
                     |
                     |
                     |--> muni:HeritageImage
                     |        |
                     |        |
                     |         --> muni:ParkHeritageImage
                     |
                     |
                     |--> muni:NeighborhoodImage
                     |
                     |
                     |
                      --> muni:ParkImage

Indexed Records:

  <1> a bibo:Image .
  <2> a muni:HeritageImage .
  <3> a muni:HeritageImage .
  <4> a muni:ParkHeritageImage .
  <5> a muni:ParkImage .

This class structure shows a hierarchy of images where the leaf classes are topical image classes (that is, classes whose individuals are considered images representing one of the topics: Heritage, Neighborhood and Park). Now let’s see how this class structure impacts Search queries, and returned results, by different tools (structSearch, structBrowse, sWebMap and others).

Here is a series of Search queries sent to a structWSF instance that has this class hierarchy loaded, using the sample specification noted above. This table shows the results potentially returned by the Search endpoint with and without inferencing turned on:

Use Case #1: type filter muni:HeritageImage, inference off. Returned results:
  <2> a muni:HeritageImage .
  <3> a muni:HeritageImage .

Use Case #2: type filter muni:HeritageImage, inference on. Returned results:
  <2> a muni:HeritageImage .
  <3> a muni:HeritageImage .
  <4> a muni:ParkHeritageImage .

Use Case #3: type filter bibo:Image, inference off. Returned results:
  <1> a bibo:Image .

Use Case #4: type filter bibo:Image, inference on. Returned results:
  <1> a bibo:Image .
  <2> a muni:HeritageImage .
  <3> a muni:HeritageImage .
  <4> a muni:ParkHeritageImage .
  <5> a muni:ParkImage .

In Use Case #1, the user requests all of the muni:HeritageImage records without inferencing. The Search endpoint therefore returns only the records that have been explicitly typed as muni:HeritageImage: records <2> and <3>.

Use Case #2 is a variant of Use Case #1, only now with inferencing enabled. In this use case, the Search endpoint returns all of the muni:HeritageImage records plus all of the records typed with one of its subtypes (in this case, muni:ParkHeritageImage). For this query, records <2>, <3> and <4> are returned. This case shows where ontologies can have a dramatic impact on the system. If we modified the class hierarchy to make muni:ParkHeritageImage a direct subclass of bibo:Image, then Use Case #2 would return the same results as Use Case #1.

With Use Case #3, inferencing is disabled, so the endpoint returns only record <1>, the sole record explicitly typed as bibo:Image. None of the records typed with a subclass of bibo:Image is returned.

Use Case #4 is a variant of Use Case #3 where inferencing is enabled. The endpoint returns all the image records because all of them are bibo:Image by inference on type.
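
The effect of inference on a type filter can be reproduced with a small Python sketch. It is not the structWSF Search endpoint: the hierarchy and records are copied into dictionaries, the search function and its inference flag are hypothetical, and record <5> is taken to be a muni:ParkImage, the class that appears in the hierarchy.

```python
# Direct superclass of each class in the example hierarchy.
SUBCLASS_OF = {
    "bibo:Document": "owl:Thing",
    "bibo:Image": "bibo:Document",
    "muni:HeritageImage": "bibo:Image",
    "muni:ParkHeritageImage": "muni:HeritageImage",
    "muni:NeighborhoodImage": "bibo:Image",
    "muni:ParkImage": "bibo:Image",
}
# Explicit type of each indexed record.
RECORDS = {
    "<1>": "bibo:Image",
    "<2>": "muni:HeritageImage",
    "<3>": "muni:HeritageImage",
    "<4>": "muni:ParkHeritageImage",
    "<5>": "muni:ParkImage",
}


def is_a(record_type, target):
    """True if record_type equals target or is one of its descendants."""
    current = record_type
    while current is not None:
        if current == target:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def search(type_filter, inference=False):
    """Return the records matching the type filter, with or without inference."""
    if inference:
        return [uri for uri, t in RECORDS.items() if is_a(t, type_filter)]
    return [uri for uri, t in RECORDS.items() if t == type_filter]


print(search("muni:HeritageImage"))                  # Use Case #1
print(search("muni:HeritageImage", inference=True))  # Use Case #2
print(search("bibo:Image", inference=True))          # Use Case #4
```

Re-parenting muni:ParkHeritageImage under bibo:Image in SUBCLASS_OF makes the second call return the same records as the first, which is the change to the hierarchy discussed under Use Case #2.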

Filtering via the AGGR Ontology

The AGGR Ontology also has an impact on anything that displays facets of filtered searches. Amongst others, it impacts the structSearch and structBrowse conStruct modules. It also impacts different user interfaces that use the Search Web service endpoint to perform auto-completion tasks.

Tag Concepts in Text Documents

In the Open Semantic Framework, the Scones Web service endpoint analyzes unstructured text documents and turns them into semi-structured text documents by automatically tagging concepts. The concept tagging takes place using ontology-based information extraction, or OBIE. Named entity dictionaries are the basis for entity tagging.

These concepts used for the tagging come from selected ontologies loaded on the system. The way these ontologies have been created, and the way the classes and named individuals have been defined, has a dramatic impact on the quality of the documents tagged by Scones.

Scones uses two things from ontologies:

  • its classes
  • its named individuals

Depending on settings, one or both of these sources may be used for Scones tagging.

There are a few properties intimately related to the Scones Web service endpoint:

Properties Impact on Scones
iron:prefLabel Preferred label to refer to an instance record. This property impacts the behavior of the Scones tagger in the sense that the value of the iron:prefLabel property is used to detect and tag as a reference the corresponding class or named individual.
iron:altLabel Alternative label to refer to an instance record. This property impacts the behavior of the Scones tagger in the sense that the value(s) of the iron:altLabel property are used to detect and tag as a reference the corresponding class(es) or named individual(s).
iron:hiddenLabel Hidden labels are labels that are not displayed in any user interface, but may be used by different systems for indexing purposes (such as for recognizing misspellings). This property impacts the behavior of the Scones tagger in the sense that the value(s) of the iron:hiddenLabel property are used to detect and tag as a reference the corresponding class(es) or named individual(s). As we saw above, hidden labels are not displayed in user interfaces. However, they are used to specify variations in the way some of the other labels may be written. These hidden labels are explicitly used by the Scones tagger.
sco:namedEntity Specifies if a resource can be considered a named entity. Literal value: “true” or “false”. This property impacts the behavior of the Scones tagger in the sense that all of the records with the sco:namedEntity property set to true are automatically added by the Scones endpoint to its Named Entities Dictionaries. This means that all the records that are specified to be named entities will be used by Scones to tag any input text documents.
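
To illustrate how the three label properties feed a tagger, here is a deliberately naive Python sketch based on plain string matching. It is not how Scones works internally (Scones relies on GATE and OBIE); the concepts, their labels and both function names are hypothetical.

```python
import re

# Hypothetical concepts; the keys stand in for iron:prefLabel,
# iron:altLabel and iron:hiddenLabel values.
CONCEPTS = {
    "ex:Park": {"pref": "park", "alt": ["green space"], "hidden": ["prak"]},
    "ex:Library": {"pref": "library", "alt": [], "hidden": []},
}


def build_dictionary(concepts):
    """Map every label, lower-cased, to the URI it refers to."""
    dictionary = {}
    for uri, labels in concepts.items():
        for label in [labels["pref"], *labels["alt"], *labels["hidden"]]:
            dictionary[label.lower()] = uri
    return dictionary


def tag(text, dictionary):
    """Return (matched text, URI) pairs in order of appearance."""
    # Longest labels first so multi-word labels win over shorter ones.
    labels = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, labels)) + r")\b", re.IGNORECASE
    )
    return [(m.group(0), dictionary[m.group(0).lower()]) for m in pattern.finditer(text)]


print(tag("The new Park is a green space beside the library.", build_dictionary(CONCEPTS)))
```

The hidden label "prak" never appears in a user interface, yet a misspelled mention in a document would still be tagged as ex:Park, which is the role hidden labels play in tagging.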

Help Navigate and Organize Web Portals

In OSF, ontologies also have an impact on the general organization of a Web portal and how it is navigated.

Portal Navigation

An OSF portal uses the sRelationBrowser for general navigation of its domain ontologies. The relation browser is a tool that lets users dynamically navigate a graph (that is, nodes with arcs that link these nodes). The most widespread usage of the relation browser is to let users navigate the links between ontology concepts. These concepts are the anchor points of what other content is available on the Web portal. By navigating the concepts (classes) structure, users are able to explore an OSF portal’s entire content.

Each node in the sRelationBrowser semantic component is linked to whatever other kinds of related records exist in the system. Depending on the types of those records, other semantic components can then be invoked to display this tightly related content for each node.

Ontologies thus impact navigation and discovery on an OSF portal in two ways:

  1. They impact the navigation of the structure by defining which concepts are linked to other concepts and with what property
  2. They define what related records may get displayed to the user based on their classes and properties.

Layouts Organization

OSF Web portals are mainly organized by Layouts. A layout is a specific page presentation format with specific design, components and ordering and sizing of those components. This page presentation is highly influenced by the kind of things indexed in the system. Generally, layouts present records of a certain type (or family of types), along with specialized functions that users are able to use to perform different actions on that set of records.

Here are a few examples of such layouts:

These layouts aggregate all of the records of a certain type (like images of all kinds), display them using different kinds of tools (like an Images Gallery), and filter them depending on different filtering criteria (like showing on a map, within a specific neighborhood area, the position where each image was captured).

The ontologies impact the general organization of the Web portal because of the kind of things that are indexed in the system interacting with the available layouts.

Manage Datasets and Ontologies

Basic settings for managing datasets and ontologies are provided by the WSF Ontology. It presently does so via two mechanisms.

Datasets Syncing Framework

The Datasets Syncing Framework behaves differently depending on the value of the wsf:crudAction property for each input record.

WSF Property Impact on the DSF
wsf:crudAction States the CRUD action that should be used to index a given record into structWSF. This property is used by the Datasets Syncing Framework to determine if the record fed to it should be created, deleted or updated. The value of this property can be one of: (1) create, (2) update, (3) delete. This property impacts the behavior of the DSF in the sense that it is the record’s description, using this property, that tells the framework how to behave (create, delete or update) toward the input record. If nothing is specified, the record is simply ignored.
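
The dispatch on wsf:crudAction can be sketched as follows. The in-memory dictionary standing in for the structWSF index, the "uri" key, and the exact semantics given to create and update are assumptions made for the example; only the three action values and the "ignore when unspecified" rule come from the description above.

```python
def sync_record(record, index):
    """Apply the record's wsf:crudAction to an in-memory index (a dict)."""
    action = record.get("wsf:crudAction")
    uri = record["uri"]
    if action == "create":
        index[uri] = dict(record)
    elif action == "update":
        # Assumed semantics: merge the new values into any existing record.
        index.setdefault(uri, {}).update(record)
    elif action == "delete":
        index.pop(uri, None)
    else:
        # Nothing specified: the record is skipped. Treating unrecognized
        # values the same way is an assumption of this sketch.
        return "ignored"
    return action


index = {}
print(sync_record({"uri": "ex:r1", "wsf:crudAction": "create", "label": "a"}, index))
print(sync_record({"uri": "ex:r2", "label": "no action"}, index))
print(sorted(index))
```

Keeping the action on the record itself means a feed can mix creations, updates and deletions in a single batch, with each record carrying its own instruction.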

structOntology

The structOntology conStruct module exhibits different behavior depending on the value of the wsf:ontologyModified property for each input ontology description.

WSF Property Impact on the structOntology module
wsf:ontologyModified States if an ontology has been modified since the last time it got saved on the file system of the OSF server instance. This property impacts the behavior of the structOntology module in the sense that if, for an input ontology, the description of that ontology states that this property is “true”, then the module notifies the user via its loaded ontologies list that the ontology has been modified, and that it has not yet been saved.

Set Access Permissions and Registrations

Another principal purpose of the WSF Ontology is to describe the internal state of a structWSF instance, such as the internal access control records, the dataset descriptions, the registered web service endpoints, etc. As a result, this ontology can have multiple effects on other parts of an OSF instance.

The WSF Ontology is used to describe three main areas of a structWSF installation:

  1. datasets registry
  2. access definition registry
  3. registered web services endpoints registry

These registries are hosted in some specialized datasets in the triple store (Virtuoso for most OSF installations). The information indexed in these different registries is defined using the WSF ontology.

All structWSF Web services are affected by these registries.

by Frederick Giasson at December 05, 2011 06:02 PM

AI3:::Adaptive Information (Mike Bergman)

An Ontologies Architecture for Ontology-driven Apps

Open Semantic Framework Ontology Modularization and Roles within an OSF Instance

For some time now, Structured Dynamics (SD) has been touting the unique advantages of ODapps, or ontology-driven applications [1]. ODapps are modular, generic software applications designed to operate in accordance with the specifications contained in one or more ontologies. The relationships and structure of the information driving these applications are based on the standard functions and roles of ontologies (namely as domain ontologies), as supplemented by UI and instruction sets and validations and rules. When these supplements are added to standard ontology functions, we collectively term them adaptive ontologies [2].

To further the discussion around ODapps, today we are publishing two new documents, using the semantic technology foundation of the open semantic framework. OSF is a comprehensive, open source stack of SD and external tools that provides a turnkey environment for enterprises to adopt semantic technologies and approaches. OSF has been designed from the ground up to be an ontology-driven application framework.

The first new document, posted on Fred Giasson’s blog, provides a detailed discussion of the dozen or so roles ontologies can play within an OSF installation. Fred’s document is geared more to specific properties and configurations useful to deploy this framework; that is, the “drivers” in an ODapp setting. The second new document — this one — is more of a broad overview of the modularization and architecture of the constituent ontologies that make up an OSF installation. Both documents have also been posted to SD’s open content TechWiki [3], which now has about 360 technical articles on understanding and implementing an OSF installation, importantly including its ontologies.

OSF Constituent Ontologies

As presently configured, an OSF installation may typically utilize most or all of the following internal ontologies:

  • The SCO Ontology (Semantic Component Ontology)
  • The WSF Ontology (Web Service Framework Ontology)
  • The AGGR Ontology (Aggregation Ontology)
  • The irON Ontology (Instance Record and Object Notation Ontology)
  • One or more domain ontologies, to capture the concepts and relationships for the purposes of a given OSF installation, and
  • Possibly UMBEL (optional) or other upper-level concept ontologies, used for linkages to external systems.

(Note: the internal wiki links to each of these ontologies also provide links to the actual ontology specifications on GitHub.)

Depending on the specific OSF installation, of course, multiple external ontologies may also be employed. Some of the common external ones used in an OSF installation are described by the external ontologies document on the TechWiki. These external ontologies are important — indeed essential in order to ensure linkage to the external world — but have little to do with internal OSF control structures. That is why the rest of this discussion is focused on internal ontologies only.

The OSF Ontologies Architecture

The actual relationships between these ontologies are shown in the following diagram. Note that the ontologies tend to cluster into two main areas:

  1. Content (or domain) ontologies, which tend to embody more of the traditional ontology functions such as information interoperability, inferencing, reasoning and conceptual and knowledge capture of the applicable domain; and
  2. Administrative ontologies, which govern internal application use and user interface interactions.

This ontology architecture supports the broader open semantic framework:


The WSF ontology plays a special role in that it sets the overall permission and access rights to the other components and ontologies. The UMBEL ontology (or other upper-level ontologies that might be chosen) is also optional. Such vocabularies are included when interoperability with external applications or knowledge bases is desired.

Summary of OSF Roles

We can further disaggregate these ontology splits with respect to the specific dozen or so ontology roles discussed in Fred’s complementary piece on ontology roles in OSF. These dozen roles are shown by the rows with interactions marked for the various ontologies:

[Table: ontology roles (rows) by ontology (columns: SCO, AGGR, WSF, irON, Domain, UMBEL); the per-ontology interaction marks are not reproduced here.]

  • Define record descriptions
  • Inform interface displays
  • Integrate different data sources
  • Define component selections
  • Define component behaviors
  • Guide template selection
  • Provide reasoning and inference
  • Guide content filtering (with and without inference)
  • Tag concepts in text documents
  • Help organize and navigate Web portals
  • Manage datasets and ontologies
  • Set access permissions and registrations

One of the unique aspects of adaptive ontologies is their added role in informing user interfaces and supporting specific semantic tools. Note, for example, the role of the content ontologies in informing interface displays, as well as their use in tagging concepts (via information extraction). These additional roles are the reason that these ontologies are shown as straddling both content and administrative functions in the first figure.

See Fred’s piece to learn more about these dozen roles.

Interactions Are More Complex than Arrows

Naturally, a simple drawn arrow between ontologies (first figure) or a checkmark on a matrix (table above) can hide important details of how these interactions between ontologies and components actually work. In an earlier article, we discussed the whole workflow: users make selections in the user interface, those selections determine the types of data returned, and the data types in turn determine the semantic components (widgets) used to display them. This example interaction is shown by the following animation:


The blue nodes show the ontology interactions. These, in turn, instruct how the various components (yellow) and code (green) need to operate. These interactions are the essence of an ontology-driven app. The software is expressly designed to respond to specifications in the ontology(ies) used, and the ontologies themselves embrace some additional properties specific to driving those apps.

Possible Future Directions

ODapps are a relatively new paradigm, and we continue to learn more about their uses and potential. We have wanted to write the first versions of these two new documents for some time, but held off as we learned about and further exploited the latent potential in this design. As it stands, we see further potential in this approach, and will therefore likely be adding new ontologies and capabilities to the general system for some time.

Some of the areas that look promising to us include:

  • A generalized statistical ontology, especially as it can inform data displays in the semantic components
  • Even more capable widgets in business intelligence (BI) uses, with a concomitant expansion of the vocabulary (predicates and classes) in some of the underlying ontologies
  • More aggregation and summation functions supported by the AGGR ontology, and
  • Still further improved permissions and access layers in the WSF ontology.

These potentials arise from the native power of the design basis for ontology-driven apps. Conceptually, the design is simplicity itself. Operationally, the system is extremely flexible and robust. Strategically, it means that development and specification efforts can now move from coding and programmers to ontologies and the subject matter users who define and depend on them. With these advantages, who can argue with that?


[1] For the most comprehensive discussion of ODapps, see M. K. Bergman, 2011. “Ontology-Driven Apps Using Generic Applications,” posted on the AI3:::Adaptive Information blog, March 7, 2011. You may also search on that blog for ‘ODapps’ to see related content.
[2] See M.K. Bergman, 2009. “Ontologies as the ‘Engine’ for Data-Driven Applications”, AI3:::Adaptive Information blog, June 10, 2009, for the first presentation of these topics, though the specific term adaptive ontology was not yet used. That term was first introduced in “Confronting Misconceptions with Adaptive Ontologies” (August 17, 2009). The dedicated treatment of these topics and their interplay was provided in M.K. Bergman, 2009. “Ontology-driven Applications Using Adaptive Ontologies”, AI3:::Adaptive Information blog, November 23, 2009. The relation of these topics to enterprise software was first presented in M.K. Bergman, 2009. “Fresh Perspectives on the Semantic Enterprise”, AI3:::Adaptive Information blog, September 28, 2009.
[3] Slight revisions of these documents have been posted to the TechWiki as Role and Use of Ontologies in OSF and OSF Ontologies Modularization and Architecture, respectively.

by Mike Bergman at December 05, 2011 06:01 PM

November 21, 2011

Frederick Giasson's Weblog

Moving Projects from Google Code to GitHub

Last week we slowly migrated Structured Dynamics’ Google Code projects to GitHub. We have been thinking about moving to GitHub for some time now, but we only wanted to move projects to it if no prior history and commits were dropped in the process. One motivation for the possible change has been the seeming lack of support by Google for certain long-standing services: we are seeing disturbing trends across a number of existing services. We also needed a migration process that would work with all of our various projects, without losing a trunk, branch, tag or commits (and their related comments).

It was not until recently that I found a workable process. Other people have successfully migrated Google Code SVN projects to GitHub, but I had yet to find a consolidated guide to do it. It is for this last reason that I write this blog post: to help people, if they desire, to move projects from Google Code to GitHub.

Moving from Google Code to GitHub

The protocol outlined below may appear complex, but it looks more intimidating than it really is. Moving a project takes about two to five minutes once your GitHub account and your migration computer are properly configured.

You need four things to move a Google Code SVN project to GitHub:

  1. A Google Code project to move
  2. A GitHub user account
  3. SSH keys, and
  4. A migration computer that is configured to migrate the project from Google Code to GitHub. (In this tutorial, we will use an Ubuntu server, but any other properly configured Linux, Windows or Mac computer should do the job.)

Create GitHub Account

If you don’t already own a GitHub account, the first step is to create one here.

Create & Configure SSH Keys

Once your account has been created, you have to create and set up the SSH keys that you will use to commit the code into the Git repository on GitHub:

  1. Go to the SSH Keys Registration page of your account
  2. If you already have a key, add it to this page; otherwise, read this manual to learn how to generate one

Configure Migration Server

The next step is to configure the computer that will be used to migrate the project. For this tutorial, I use an Ubuntu server to do the migration, but any Windows, Linux or Mac computer should do the job if properly configured.

The first step is to install Git and Ruby on that computer:

 sudo apt-get install git-core git-svn ruby rubygems

To perform the migration of a Google Code SVN project to GitHub, we are using a Ruby application called svn2git that is now developed by Kevin Menard. The next step is to install svn2git on that computer:

 sudo gem install svn2git --source http://gemcutter.org

Migrate Project

Before migrating your project, you have to link the Google Code committers to GitHub accounts. This is done by populating a simple text file that will be given as input to svn2git.

Create the authors.txt file in a temporary folder:

 sudo vim /tmp/authors.txt

Then, for each author, you have to add the mapping between their Google Code and GitHub accounts. If a Google Code committer does not exist on GitHub, then you should map it to your own GitHub account.

 (no author) = Frederick Giasson <fred@f...com>
 fred@f...com = Frederick Giasson <fred@f...com>

The format of this authors.txt file is:

 Google-Account-Username = Name-Of-Author-On-GitHub <Email-Of-Author-On-GitHub>

Take note of the first Google Code committer (no author) mapping. This link is required for every authors.txt file. This placeholder is used to map the initial commit performed by the Google Code system. (When Google Code initializes a new project, it uses that username for creating the first commit of any project.)

When you are done, save the file.
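The authors.txt format described above can also be generated rather than typed by hand. The following is a minimal Python sketch, not part of svn2git: the function names and the sample committer are hypothetical, and it assumes that the required (no author) placeholder can reuse the name and email of the first mapping when none is supplied.

```python
def format_author_line(svn_username, name, email):
    """Return one authors.txt line: 'svn-user = Full Name <email>'."""
    return f"{svn_username} = {name} <{email}>"


def build_authors_file(mappings):
    """Build authors.txt content from (svn_username, name, email) tuples.

    If the caller did not supply a '(no author)' entry, one is added first,
    reusing the name and email of the first mapping (an assumption).
    """
    mappings = list(mappings)
    if not mappings:
        raise ValueError("at least one author mapping is required")
    if all(user != "(no author)" for user, _, _ in mappings):
        _, name, email = mappings[0]
        mappings.insert(0, ("(no author)", name, email))
    return "\n".join(format_author_line(*m) for m in mappings) + "\n"


if __name__ == "__main__":
    # Hypothetical committer; replace with real Google Code and GitHub details.
    content = build_authors_file(
        [("jdoe@example.com", "Jane Doe", "jdoe@example.com")]
    )
    print(content, end="")
```

The printed text can be saved as /tmp/authors.txt and passed to svn2git with the --authors option used later in this post.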

Now that setup is complete, you are ready to migrate your project. First, let’s create the folder that will be used to check out the SVN project on the server, and then to push it to GitHub.

cd /tmp/
mkdir myproject
cd myproject

In this tutorial, we have a normal migration scenario. However, your migration scenario may differ from this one, so I suggest you check the different scenarios described in the svn2git documentation and change the following command accordingly. Let’s migrate the Google Code SVN project into the local Git repository:

 /var/lib/gems/1.8/bin/svn2git http://myproject.googlecode.com/svn --authors /tmp/authors.txt

Make sure that no errors have been reported during the process. If any were, refer to the Possible Errors and Fixes section below to troubleshoot your issue.

The next step is to create a new GitHub repository to which the SVN project will be migrated. Go to this GitHub page to create your new repository. Then you have to configure Git to add a remote link, from the local Git repository you created on your migration computer, to this remote GitHub repository:

 git remote add origin git@github.com:your-github-username/myproject.git

Finally, let’s push the local Git repository master, branches and tags to GitHub. The first thing to push to GitHub is the SVN trunk. This is done by running this command:

 git push -u origin master

Then, if your project has multiple branches and tags, you can push them, one by one, using the same command. However, you will have to replace master with the name of that branch or tag. If you don’t know the exact names of these branches or tags, you can easily list all of them using this Git command:

 git show-ref

Once you have progressed through all branches and tags, you are done. If you take a look at your GitHub project’s page, you should see that the trunk, branches, tags and commits are now properly imported into that project.
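Pushing branches and tags one by one can be tedious for larger projects. As an illustration only, the Python sketch below reads the text printed by git show-ref (one "<sha> <refname>" pair per line) and builds the corresponding push commands. The function names are hypothetical, the sample hashes are placeholders, and tags are pushed by their full refs/tags/ name to avoid ambiguity with a branch of the same name; the sketch only builds the commands and does not run Git itself.

```python
def names_to_push(show_ref_output):
    """Extract local branch and tag names from `git show-ref` output.

    Only refs/heads/* and refs/tags/* are returned; remote-tracking refs
    (refs/remotes/*) are skipped.
    """
    branches, tags = [], []
    for line in show_ref_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # ignore blank or malformed lines
        ref = parts[1].strip()
        if ref.startswith("refs/heads/"):
            branches.append(ref[len("refs/heads/"):])
        elif ref.startswith("refs/tags/"):
            tags.append(ref[len("refs/tags/"):])
    return branches, tags


def push_commands(branches, tags, remote="origin"):
    """Return one `git push` command (as an argument list) per branch and tag."""
    commands = [["git", "push", "-u", remote, b] for b in branches]
    commands += [["git", "push", remote, "refs/tags/" + t] for t in tags]
    return commands


if __name__ == "__main__":
    # Placeholder output; in practice, capture the real `git show-ref` text.
    sample = (
        "1" * 40 + " refs/heads/master\n"
        + "2" * 40 + " refs/remotes/svn/trunk\n"
        + "3" * 40 + " refs/tags/v1.0\n"
    )
    for command in push_commands(*names_to_push(sample)):
        print(" ".join(command))
```

Each argument list could then be handed to subprocess.run from inside the local Git repository, after the remote has been added as described above.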

Possible Errors And Fixes

There are a few things that can go wrong while trying to migrate your project(s).

Fatal Error: Not a valid object name

One of the errors I experienced is a "fatal" error message "Not a valid object name". To fix this, we have to fix a line of code in svn2git. Open the migration.rb file. Look around line 227 for the method fix_branches(). Remove the first line of that method, and replace the second one with:

 svn_branches = @remote.find_all { |b| !@tags.include?(b) && b.strip =~ %r{^svn\/} }

Error: author not existing

While running the svn2git application, the process may finish prematurely. If you check the output, you may see that it can’t find the match for an author. What you will have to do is to add that author to your authors file and re-run svn2git. Otherwise you won’t be able to fully migrate the project.
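One way to avoid this error is to check the authors file against the SVN history before running svn2git. The Python sketch below is an assumption-laden illustration: it expects the text printed by svn log --quiet (revision lines of the form "r12 | username | date", separated by lines of dashes) and the authors.txt content, and reports the usernames that have no mapping yet. All function names are hypothetical.

```python
def svn_log_authors(svn_log_quiet_output):
    """Collect distinct author names from `svn log --quiet` output."""
    authors = set()
    for line in svn_log_quiet_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Revision lines have at least three fields and start with 'r<number>'.
        if len(fields) >= 3 and fields[0].startswith("r"):
            authors.add(fields[1])
    return authors


def mapped_authors(authors_text):
    """Return the left-hand sides of 'svn-user = Name <email>' lines."""
    return {
        line.split("=", 1)[0].strip()
        for line in authors_text.splitlines()
        if "=" in line
    }


def missing_authors(svn_log_quiet_output, authors_text):
    """Return the sorted SVN usernames that have no entry in authors.txt."""
    found = svn_log_authors(svn_log_quiet_output)
    return sorted(found - mapped_authors(authors_text))
```

Any username returned by missing_authors needs a new line in the authors file before svn2git is re-run.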

I’m not quite sure why these minor glitches occurred during my initial migration, but with the simple fixes above you should be good to go.

by Frederick Giasson at November 21, 2011 09:29 PM

November 15, 2011

AI3:::Adaptive Information (Mike Bergman)

UMBEL Services, Part 4: structOntology

Improved Ontology Navigation and Management in Read-only and Editable Forms

This continues our series on the new UMBEL portal. UMBEL, the Upper Mapping and Binding Exchange Layer, is an upper ontology of about 28,000 reference concepts and a vocabulary designed for domain ontologies and ontology mapping [1]. This part four discusses structOntology, the online ontology viewing and management tool that is an integral part of the open semantic framework (OSF), the framework that hosts the UMBEL portal.

Ontologies are the central governing structure or “brains” of a semantic installation. As provided by the OSF framework, ontologies are also the basis for instructing user interface labels and how the interface behaves. The Web is about global access, immediacy, flexibility and adaptability. Why can’t our use of ontologies be the same?

Unlike similar tools of the past, structOntology exists on the same installation as the ontology that drives it. It is a backoffice ontology editing and management tool that is part of the conStruct tool suite, accessible via the OSF admin panel. There is no need to go off to a separate application, make changes, re-import, and then test. structOntology allows all of that to occur locally with the instance in which it resides. Also, there are some important functionality differences, especially in search and in finding and selecting items, that set structOntology apart from existing, conventional tools.

Yet, that being said, structOntology is also not the complete Swiss Army knife for ontology management. It is designed for local and immediate use. Its spectrum of functionality is not as complete as other ontology frameworks (for example, supporting reasoners, consistency testers or plug-ins). So, for immediate and locally relevant use, structOntology appears to be the appropriate tool. For more detailed ontology work or testing, other frameworks are perhaps more useful. And, in recognition of these roles, structOntology also has robust import and export capabilities that enable these dual local-detailed use scenarios. For these distinctions, see further the structOntology v Protégé? document.

structOntology comes in two versions. First, there is the read-only version, which can be made publicly available, that is a great aid to ontology navigation and discovery. This is the version viewable on the UMBEL portal. Second, there is an editable version, which is only available to administrators via a back office function within an OSF instance. Some screen shots of this version, plus pointers to more documentation about it, are provided below.

OWL API as a First-class Citizen

What enables OSF to treat ontologies as first-class citizens, viewable and editable from within the applications in which they operate, is the incorporation of the OWL API as one of the major engines underlying the structWSF Web services framework, the key foundational basis to an OSF installation. As noted in Part 2 of this series, the OWL API is one of the four major engines supporting structWSF:

The OWL API is the same engine used by Protégé 4, which is why both structOntology and Protégé are fully interoperable.

Besides interoperability, the use of the OWL API also means that other OWL API-based tools, such as reasoners or mappers, may be linked into the system. This design is in keeping with our normative view of an ontology tooling landscape, which Structured Dynamics keeps pursuing in a steady, incremental manner [2]. Further, because of its sibling engines, the OWL API and OSF are also able to leverage the other engines supporting structWSF, such as Solr for advanced search or efficient indexing in the RDF triplestore. (The advantages go both ways, too, such as enabling the OWL API to feed appropriate ontology specifications to the GATE text processing area for uses such as ontology-based information extraction [OBIE].) All of this makes for a most powerful and capable foundation to an OSF instance.

The Read-Only Version (UMBEL)

Since UMBEL is a reference ontology and the UMBEL portal is an access point to those references and specifications, we really don’t want casual users making modifications to the ontology [3]. For this reason, only a read-only version of structOntology is provided on the portal.

Access to the structOntology function occurs via the Ontology link on the UMBEL portal. Upon access, you are presented with the main structOntology interface:

The organization of the structOntology application presents all currently available and active ontologies listed in the left panel; UMBEL, of course, is the one selected here. Since this is a read-only version, only the View button shows up in the right-hand panel. (For the options available in the editable version, see below.)

View Option

Upon invoking the View option, the hierarchical tree for the selected ontology appears on the left; its structure and definitions appear on the right.

You may expand the tree and explore the structure deeper by either clicking on the tree nodes in the left-hand panel or the item links in the right-hand panel. If there are further levels in the tree, you will get the JavaScript ‘working’ icon and then see the tree expanded with the new node information shown to the right.

Also note that your interaction with the structOntology application is recounted via the “breadcrumbs” listing at the upper left of the application. The green arrow icon allows you to expand or collapse various sections in the display.

Tooltips

The tree labels are themselves based on the preferred labels assigned to things. However, if you want to see the actual ontology URI reference, you can do so via the tooltip when mousing over the item:


The tooltip shows the full URI path (unique identifier) of the selected item.

Classes Tab

This example has been based on the Classes tab, which lists the reference concepts in the UMBEL context. In read-only mode, the basic information presented is the tree structure, the item description and prefLabel, and super and sub class information in the right-hand panel. (More options are available in the editable version; see below.)

Properties Tab

Properties (that is, the relations or predicates between items or nodes) are presented in a similar manner to that for Classes. The Properties tab has the same basic layout and operations as the Classes tab, including similar right-hand panels:

The Editable Version

The editable version of structOntology shares all of the functionality of the read-only version. Besides adding editing capabilities, the editable version also has other functionality related to general ontology creation and management. There is separate documentation for the editable version; the examples below are from a different instance than UMBEL.

The editable version is accessed via the backoffice admin function within an OSF instance. When invoked, it also has more management options presented in the right-hand panel:

We’ll highlight some of the differences from the read-only version below.

Create New Option

The first notable addition is the ability to create ontologies (as well as to delete, or Remove, them):

The URL (such as http://purl.org/ontology/myont#) becomes the base URI for the new ontology. The new ontology is created with a basic structure, from which you only need fill in your new concepts or classes and relationships:

Basic stubbing is provided for the new ontology to help bootstrap its development (not shown). Once created, this new ontology also now appears on the available local ontologies when first invoking the structOntology application.

View Option

Most screens are quite similar to the read-only version with the obvious change of replacing labels with edit boxes. It is via these edit fields that the ontology becomes editable. This change is quite evident for the View screen:


Searching

Searching can take place on the currently active ontology or all loaded (available) ontologies. Note that this selection is made via the radio button under the search box.

Depending on the settings, searching can take place on only the preferred label, or also on alternative labels or descriptions (in fact, all annotations).

When entering search terms, the system automatically attempts to complete the matching search phrase. A minimum of three entered characters guides this auto-completion functionality:

When search is initiated, the potential results list also auto-completes for what you have already typed into the search box. Upon selection of one of these items (or completion of the full search phrase), the structOntology system issues a search query to the remote server, which then acts to auto-populate the ontology tree on the left-hand panel. In this case, we have selected ‘community facilities’:

The desired search results then automatically expand the ontology tree. This is really helpful for longer ontologies (the example one shown has about 3000 concepts and about 6000 axioms) and means quicker initial tree loading. Once completed, the (multiple) occurrences of the search item are shown in highlight throughout the tree.

Note this search is not necessarily restricted to the actual node label. Alternative labels and descriptions may also be used to find the search results. This greatly expands the findability of the search function. Here is a great example of matching the OWL API engine to Solr underneath a structWSF instance.
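The search behavior described in this section can be pictured with a small Python sketch. This is not how structOntology or Solr implement it; it is a simplified in-memory illustration under stated assumptions: the Concept class, the autocomplete function and the sample URIs are hypothetical, matching is a plain case-insensitive substring test, and the three-character minimum and the choice between preferred labels only or all annotations follow the text above.

```python
from dataclasses import dataclass, field

MIN_CHARS = 3  # auto-completion starts after three entered characters


@dataclass
class Concept:
    uri: str
    pref_label: str
    alt_labels: list = field(default_factory=list)
    description: str = ""


def autocomplete(concepts, typed, pref_label_only=False):
    """Return the preferred labels of concepts matching the typed text.

    With pref_label_only=False, alternative labels and descriptions are
    searched as well, so a concept can be found by a term that does not
    appear in its node label.
    """
    query = typed.strip().lower()
    if len(query) < MIN_CHARS:
        return []
    hits = []
    for concept in concepts:
        texts = [concept.pref_label]
        if not pref_label_only:
            texts += concept.alt_labels + [concept.description]
        if any(query in text.lower() for text in texts):
            hits.append(concept.pref_label)
    return hits
```

For example, a concept labeled "community facilities" with a hypothetical alternative label "civic centres" would be returned for the typed text "civic" unless the search is restricted to preferred labels.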

Tab Structure

The editable version of structOntology offers more detail in the right-hand panel when Viewing an item. These sections include:

  • Annotations
  • Structural relationships
  • Instances
  • Linkage to characteristics, and
  • Advanced settings.

Each section is editable. All have auto-complete. Each section may also be expanded or collapsed.

General Operations

Each panel has an expand and collapse arrow shown at the upper right of its panel. These cause the panel’s individual entries to either be exposed or hidden. At the right of each entry, new entries can be invoked with the green plus symbol; existing entries can be deleted with the red minus symbol. (See Structural Relationships below.)

In working with each panel, note that each entry also has the search and auto-complete features noted earlier. Drag-and-drop into these panels is also contextual: whether a drop is allowed depends on the nature of the item selected in the left-hand panel (tree).

Annotations

Annotations provide the descriptions about the thing at hand and its associated metadata. (These are separately defined under the Properties tab, or as part of the imported ontology specification.) The available annotations are displayed in this panel when expanded:

Entries are simply provided by entering values into the text fields and then Saving.

Structural Relationships

The structural relationships are the means to set parent and child relations between concepts, as well as to instruct disjoint or equivalent class relations. The Structural Relationships panel is the key one for setting the interconnections within the graph structure at the heart of the governing ontology.

Most of the key structural relationships in OWL are provided by this panel. (However, note there are some additional and rarely used structural specifications in OWL. These must be set via a third-party external application. Such potential interactions are made possible via the flexible import and export options with structOntology).

Instances (Individuals)

Another right-hand panel provides the facility to assign individuals to the classes (or concepts) established under the prior two panels. In this case, we are looking at some specific ‘community facilities’ to assign to that concept:

As with the prior panels, a new instance may be added or discarded ones deleted. Individual instances and their characteristics may also be updated or changed.

Linkage to Characteristics

Another aspect to OSF ontologies is the ability to relate concepts to various metadata characteristics or attributes that might describe that concept’s instances. This relationship is done via the dedicated hasCharacteristic property, which is assigned via this right-hand panel:

This option has the specific behavior of allowing one or more properties (characteristics) to be asserted for a given class (concept).

Advanced Options

Display, widget and other options are set under the Advanced Options panel. One item to note is the set of widgets that may be assigned for displaying a given information item:

The relationship of widgets (or semantic components) to information items is a deserving topic in its own right. For more information about this topic, see the semantic components category.

Contextual Drag-and-Drop

In edit mode, it is possible to drag items from the left-hand tree panel into the specifications at the right. This is contextual. In this first example, we see an attempt to drop a “class” result (or concept) into the annotation panel, which violates the structure of the system and is therefore not allowed (as shown by the visual red X cues):

However, if we drag and drop from the tree into an allowable structural definition, we get the visual green check as a cue that the move is legal:

This functionality and feedback means that only allowable assignments can be dropped into a new structural definition.

Export Option

Another piece of functionality in the editable version is the export option. When invoked, Export brings up the save dialog with the ability to assign an ontology file name:

Upon saving, it stores the currently active ontology in RDF/XML format:

Export is not active in UMBEL due to the large size of the ontology. If you want to obtain it directly, you may do so from the UMBEL ontology CVS.

Import Option

An Import option is available in the editable version. structOntology import supports all OWL API serializations, specifically RDF/XML, N3, Manchester Syntax and Turtle. When import is invoked, a file open dialog is presented that enables you to find the ontology on your local hard drive:

The Import feature has no file extension limitations; take care to pick and assign the proper types for importation.

Via the Import and Export buttons, it is possible to work locally with structOntology while exporting to more capable third-party tools. Then, once use of those tools is complete, Import provides the ability to re-import the updated ontology back into the local collection.

File Options

Finally, as a server-based system accessed via Web services, there are some slightly different concepts necessary to keep in mind when using the editable version of structOntology. These distinctions need to be kept in mind because you might be working with the local version or the one on the main server. These file options are:

  • Save: saves all modifications to the file on the server. All modifications will then be used if you do a Reload
  • Unload: removes the currently active ontology from the local instance, but does NOT remove it from the server. It merely acts to remove that ontology for local use in the current session
  • Remove: a full delete of the ontology, both locally and on the server
  • Update: recreates the serialization files created from these ontologies, like the .SRZ files used by structWSF and conStruct, the ironXML schema used by the semantic components, etc. The Update option is the most common one when updating an ontology locally, for which you want the persistent version on the remote server to be kept in sync
  • Reload: reloads the server version. If prior local work has not been updated, then a reload acts as a way to restore the remote instance to the local one without change.

These are all available via buttons under the main right-hand panel in structOntology and are more fully described in the edit version documentation.

Additional Information

Additional information on structOntology may be found in an online video:


This is the fourth of a multi-part series on the newly updated UMBEL services. Other articles in this series are:


[1] See further the general Wikipedia description of UMBEL or its specification on the official UMBEL Web site.
[2] See especially the second figure and the accompanying discussion in this document.
[3] The appropriate pathway for suggested changes to the UMBEL ontology itself is via its official mailing list.

by Mike Bergman at November 15, 2011 07:33 PM

November 10, 2011

AI3:::Adaptive Information (Mike Bergman)

UMBEL Services, Part 3: Concept Browser

The OSF Browser is Now More Configurable

This continues our series on the new UMBEL portal. UMBEL, the Upper Mapping and Binding Exchange Layer, is an upper ontology of about 28,000 reference concepts and a vocabulary designed for domain ontologies and ontology mapping [1]. This part three deals with the portal’s navigational tool, the concept or relation browser [2]. It is a favorite component of the open semantic framework (OSF).

Discovery and navigation in a graph structure — as is the basis of ontologies and the UMBEL structure — can be difficult. It is made even more difficult when the number of concepts in the object space is large. With 28,000 concepts, UMBEL is one of the largest public ontologies extant. The relation browser is designed specifically to address these difficulties.

The concept browser in UMBEL is invoked via the main menu option or by clicking on the browser icon shown in conjunction with a given concept. Here is an example for the concept ‘tractor’:

Note in this case that the More details … link brings you to a detailed concept view, as was covered in the previous part in this series.

With its extreme configurability and flexibility (see further below), the relation browser can be an essential foundation to an open semantic framework installation. But the best part about the relation browser is that it is fun to use. Clicking bubbles and dynamically moving through the graph structure is like snorkeling through a massive school of silvery fish.

Origins of the Relation Browser

We have been featuring the relation browser since April 2008 when the first UMBEL sandbox was released:

The relation browser is the innovation of Moritz Stefaner, one of the best data and information visualization gurus around. He continues to innovate in large-scale information spaces, and is a frequent speaker at information visualization conferences. Moritz’s Web site and separate blog are each worth perusing for neat graphics and ideas.

Configurability

Since our first efforts with the browser, we have worked to extend its applicability and configurability. The relation browser can be downloaded separately from our semantic components code distribution site.

The relation browser is configured via an XML specification file. Separate specifications are available for the nodes (classes or concepts) and connecting edges (predicates or properties). Here are the current configuration options:

NODE PARAMETERS
label: the label assigned to a given node; by default, the end of the URI of the type is used as the label
displayNodeLabel: a Boolean value indicating whether to display or hide the label for a specific node
tooltips: the tooltip to be displayed when mousing over a specific node
textFont: defines the font of the text label on the node; for example: “Verdana”
textColor: defines the color of the text label on the node; value in RGB hex format
textSize: defines the size of the text to display in the node
image: a URL to an image to display at the position of the node
shape: the shape of the node to display; available values are “circle”, “block”, “polygon”, “square”, “cross”, “X”, “triangleUp”, “triangleDown”, “triangleLeft”, “triangleRight”, “diamond”
lineWeight: defines the size of the line of the border for the node’s shape
lineColor: defines the color of the line of the border for the node’s shape; value in RGB hex format
fillColor: defines the color to use within the shape for the node; value in RGB hex format
radius: defines the radius of the node; the radius is an invisible boundary where the edges get attached
backgroundScaleFactor: scale factor for the node’s shape background; a scale factor of 1.25 means that it is 125% of normal size
textScaleFactor: scale factor of the node’s text label
textOffsetX: X offset at which to start displaying the text within the node’s shape
textOffsetY: Y offset at which to start displaying the text within the node’s shape
textMultilines: multi-lines means that each word of a label is displayed on its own line
textMaxWidth: maximum width of the text; if longer, it is truncated with an ellipsis (“…”) appended
textMaxHeight: maximum height of the text; if higher, it is truncated with an ellipsis (“…”) appended
selectedNodeColorOverlay: defines a color to overlay on the center (selected) node of the graph; it is defined by a series of 4 different offsets [alpha, red, green, blue] ranging from -255 to 255 in relation to the base node’s values; can be used, for example, to make the central node of the graph brighter
overNodeColorOverlay: defines a color to overlay on a moused-over node of the graph; it is defined by a series of 4 different offsets [alpha, red, green, blue] ranging from -255 to 255 in relation to the base node’s values; can be used, for example, to make a moused-over node of the graph brighter

EDGE PARAMETERS
displayLabel: the label to display over the center of the edge
tooltipLabel: the tooltip to be displayed when mousing over a specific edge
directedArrowHead: defines the type of the arrow for the edge; available values are “none”, “triangle”, “lines”
textFont: defines the font of the text label on the edge
textColor: defines the color of the text label on the edge; value in RGB hex format
textSize: defines the size of the text to display on the edge
image: a URL to an image to display over the edge, midway between the two connected nodes
lineWeight: defines the size of the line for the edge connector
lineColor: defines the color of the line for the edge connector; value in RGB hex format
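
The two color overlay parameters are easiest to understand with a worked example. The sketch below is only an illustration in Python, not the browser's own code: the function names are hypothetical, and clamping each channel to the 0 to 255 range is an assumption. It shows how a series of four offsets [alpha, red, green, blue] could be applied to a base node color.

```python
def apply_overlay(base_argb, offsets):
    """Add per-channel offsets to a base (alpha, red, green, blue) color.

    Clamping each result to 0..255 is an assumption of this sketch.
    """
    if len(base_argb) != 4 or len(offsets) != 4:
        raise ValueError("expected four channels: alpha, red, green, blue")
    for offset in offsets:
        if not -255 <= offset <= 255:
            raise ValueError("offsets must lie between -255 and 255")
    return tuple(max(0, min(255, c + o)) for c, o in zip(base_argb, offsets))


def to_rgb_hex(argb):
    """Format the red, green and blue channels as an RGB hex string."""
    _, red, green, blue = argb
    return "{:02X}{:02X}{:02X}".format(red, green, blue)


# An opaque node whose fillColor is 336699, brightened by 40 on each channel.
base = (255, 0x33, 0x66, 0x99)
brighter = apply_overlay(base, [0, 40, 40, 40])
print(to_rgb_hex(brighter))
```

Positive offsets brighten the node and negative offsets darken it; an alpha offset of 0 leaves the transparency untouched.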

 

It is also possible to specify a breadcrumb in association with the browser.

Besides these configurations, the API for the relation browser also provides for methods to:

  • Link Nodes to Objects
  • Link Nodes to Displays

Via these mechanisms, the relation browser can become a central focal point for any OSF installation. See further the specifications for additional ideas and tips.

Some Other Examples

Here are some other examples of relation browsers you can see across the Web:


This is the third of a multi-part series on the newly updated UMBEL services. Other articles in this series are:


[1] See further the general Wikipedia description of UMBEL or its specification on the official UMBEL Web site.
[2] Various clients and users have named this widget a number of things, including spider, concept explorer, relation browser and concept browser.

by Mike Bergman at November 10, 2011 12:10 AM

November 07, 2011

AI3:::Adaptive Information (Mike Bergman)

UMBEL Services, Part 2: Full-text, Faceted Search

OSF Integration with Solr Provides Superior Search

This continues our series on the new UMBEL portal. UMBEL, the Upper Mapping and Binding Exchange Layer, is an upper ontology of about 28,000 reference concepts and a vocabulary designed for domain ontologies and ontology mapping [1]. This part focuses on the search function within the UMBEL portal based on the native engines used by the open semantic framework (OSF).

Search uses the integration of RDF and inferencing with full-text, faceted search using Solr. This has been Structured Dynamics’ standard search function for some time, as Fred Giasson initially described in April 2009. It is a very effective way for finding new and related concepts within the UMBEL structure.

Solr, as the Web service-enabled option for its parent Lucene, has most recently become a not uncommon adjunct to semantic technologies, for the very same reasons as outlined herein. However, in 2008, when we first embraced the option, it was not common at all. To my knowledge, within the semantic technology community, only the SWSE (semantic Web search engine) project was using Lucene when we began to adopt it.

The reasons for embracing Solr (or Lucene) are these:

  • Full-text search with a flexible search syntax
  • Ability to add facets (which is very powerful when combined with the structure of RDF)
  • High performance
  • Extensions for locational and time searches and many additional options, and
  • Open source.

Prior to the adoption of add-ons like Solr, RDF-only search suffered from a number of limitations, especially in the lack of a searchable correspondence of labels in relation to the object URIs used in the RDF model (see some of the limitations of standard RDF search).

Because of its advantages, Solr became the first additional main engine underneath our structWSF Web services framework, complementing the central RDF triple store (Virtuoso in most cases). We have subsequently added other main engines, as well, with a current total of four, which other parts in this UMBEL series will later discuss:

Being a main engine underneath structWSF means that datasets are fully indexed and cross-correlated with the capabilities of the other engines at time of ingest. Ingest most commonly occurs through the standard import tool, but it might also be part of the system’s large dataset import scripts or synchronization routines.

The Search Function and Syntax

The standard UMBEL search box is found at the upper left of most site pages. When searching, you may choose these operators or syntax to add to your keywords, for example:

  • park OR city — provides the most results
  • park AND city — both terms must be present; fewer results
  • park city (no quotes) — both terms must be present, and within 5 words of one another; still fewer results, or
  • “park city” — exact phrase in quotes, with the fewest results.

(At present, Boolean operators apply to full-content search, and not filtered search.)
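
The four forms listed above can be sketched as small helpers that assemble the text typed into the search box. This is a hypothetical Python illustration only; the function names are not part of UMBEL or structWSF.

```python
def any_term(*terms):
    """Broadest form: any one of the terms may match."""
    return " OR ".join(terms)


def all_terms(*terms):
    """Every term must be present somewhere in the result."""
    return " AND ".join(terms)


def nearby_terms(*terms):
    """Unquoted terms: all present and close to one another."""
    return " ".join(terms)


def exact_phrase(*terms):
    """Narrowest form: the terms as an exact phrase in double quotes."""
    return '"' + " ".join(terms) + '"'


# Ordered from the most results to the fewest, as described in the list.
for build in (any_term, all_terms, nearby_terms, exact_phrase):
    print(build("park", "city"))
```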

Upon searching, using the default of searching title, alternative labels (synonyms) and description (“TAD”), the standard search results page is displayed:

This page provides the further filtering options of searching by only title, or all content (including the linkings for each concept to its super classes and sub classes, which may produce a quite inclusive results set). These filter options are helpful in being able to sift through the 28,000 concepts within UMBEL.

The results listing provides the UMBEL concept names, their description, their alternative labels and a link to view them in the relation browser (to be discussed in more detail in the next part of this series). A simple pagination utility enables the results to be paged.

structWSF Basis and Options

This UMBEL search uses the structWSF Search Web service. It is what ties into the Solr engine to perform the full-text searches on the structured data indexed on a structWSF instance. A search query can be as simple as querying the data store for a single keyword, or as involved as querying it with a series of complex filters.

Not all of these query syntax or filtering options are active on the UMBEL instance given the simple concept structure of the UMBEL ontology. Turning these options on or off is a relatively straightforward matter of altering some configuration files and ensuring the right parameters are included in the queries issued by the application to the structWSF search endpoint.

Developers communicate with the Search Web service using the HTTP POST method. You may request one of the following mime types: (1) text/xml, (2) application/rdf+xml, (3) application/rdf+n3 or (4) application/json. The content returned by the Web service is serialized using the mime type requested and the data returned depends on the parameters selected.
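
As a rough sketch of such a call, the Python below prepares (but does not send) a POST request that asks for one of the four mime types. Several details are assumptions made for illustration: the endpoint URL is a placeholder, the `query` parameter name is hypothetical, and the mime type is assumed to be requested through the HTTP Accept header.

```python
from urllib.parse import urlencode
from urllib.request import Request

# The four mime types named in the text above.
SUPPORTED_MIME_TYPES = (
    "text/xml",
    "application/rdf+xml",
    "application/rdf+n3",
    "application/json",
)


def build_search_request(endpoint, params, mime="application/json"):
    """Prepare a POST request for a search endpoint without sending it."""
    if mime not in SUPPORTED_MIME_TYPES:
        raise ValueError("unsupported mime type: " + mime)
    body = urlencode(params).encode("utf-8")  # form-encoded POST body
    return Request(endpoint, data=body, headers={"Accept": mime}, method="POST")


# Placeholder endpoint and hypothetical parameter name.
req = build_search_request("http://example.org/ws/search/", {"query": "tractor"})
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would actually issue it; the parameter names accepted by a given structWSF instance should be checked against its own documentation.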

A. Optional Available Operators

Optionally, the structWSF Search function may be configured to support these operators and conventions. All operators, by the way, must be entered as ALL CAPS:

  • AND, which is the default operator if more than one key word is entered
  • OR, which needs to be specifically entered
  • NOT
  • Phrases, which are denoted by double quotes as this “search phrase”; single quotes are not accepted
  • Wildcard searches on single characters (?) and multiple characters (*), which can be placed anywhere except the beginning of the query term
  • Field searches, whereby the field name is used in the query followed by a colon
  • Nesting, which allows complicated Boolean expressions to be formed (so long as parentheses are balanced), and many more exotic options.

See further the Lucene search engine syntax specification.
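
The conventions listed above lend themselves to a quick check before a query is submitted. The following is a minimal Python sketch of such a check, not part of structWSF or Lucene; the function name and messages are hypothetical, and it only covers the rules stated in the list (upper-case operators, double-quoted phrases, no leading wildcard, balanced parentheses).

```python
import re


def query_problems(query):
    """Return a list of likely mistakes in a search query string."""
    problems = []

    # Parentheses must be balanced for nested Boolean expressions.
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                break
    if depth != 0:
        problems.append("unbalanced parentheses")

    # Ignore the contents of double-quoted phrases when inspecting terms.
    unquoted = re.sub(r'"[^"]*"', " ", query)
    for token in unquoted.replace("(", " ").replace(")", " ").split():
        term = token.split(":", 1)[-1]  # drop a leading field name, if any
        if term[:1] in ("?", "*"):
            problems.append("wildcard at start of term: " + token)
        if token.lower() in ("and", "or", "not") and token != token.upper():
            problems.append("operator not in upper case: " + token)
        if token.startswith("'"):
            problems.append("phrases need double quotes: " + token)
    return problems


print(query_problems('title:trac* AND (park OR "park city")'))
print(query_problems("(park and *city"))
```

A lower-case "and" may of course be an intended keyword, so the results are best treated as warnings rather than errors.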

B. Optional Available Filters

Each search query can be applied to all, or a subset of, datasets accessible by the requester. Each Search query can be filtered by these different filtering criteria:

  1. Type of the record(s) being requested
  2. Source dataset(s) for the record(s)
  3. Presence of an attribute describing the record(s)
  4. A specific value, for a specific attribute describing the record(s)
  5. A distance from a lat/long coordinate (for geo-enabled structWSF instance)
  6. A range of lat/long coordinates (for geo-enabled structWSF instance)

These filtering options allow subset searches to occur, as the example above for title and TAD in UMBEL shows. However, these filters can also be combined into more complete and structured selection options as well. For example, this same search utility applied to Structured Dynamics’ Citizen Dan local government sandbox shows how these additional filters may be applied:

  • Clicking on a given “kind” name causes the results display to be restricted to results only for that kind of class.
  • If so selected, the Filter by Dataset tab is also restricted to the datasets that contain results with that class.
  • Once selected, the filter remains in place. To remove it, click on the Remove filter icon to restore the “kinds” back to the original listing for this search.

See the example. Such filtering capabilities present all of the “kinds” (actually, classes that have similar members) that are contained within the structure of the individual results that comprise the search results. The number of records (results) returned for each class may also be shown in parentheses.
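
The behavior of these "kind" filters can be mimicked on a small scale. The Python sketch below uses invented sample records and hypothetical field names; it is a client-side illustration of the counting and restricting described above, not how the Solr engine computes facets.

```python
from collections import Counter

# Invented sample results; the field names are assumptions for illustration.
results = [
    {"label": "Riverside Park", "kind": "Park", "dataset": "Recreation"},
    {"label": "Hilltop Park", "kind": "Park", "dataset": "Recreation"},
    {"label": "City Hall", "kind": "Building", "dataset": "Facilities"},
]


def kind_counts(records):
    """Count how many results fall under each kind (class)."""
    return Counter(r["kind"] for r in records)


def filter_by_kind(records, kind):
    """Restrict the results to a single kind."""
    return [r for r in records if r["kind"] == kind]


def datasets_for(records):
    """List the datasets that still contain results."""
    return sorted({r["dataset"] for r in records})


# Show each kind with its count in parentheses, as on the results page.
for kind, count in kind_counts(results).most_common():
    print("{} ({})".format(kind, count))

parks = filter_by_kind(results, "Park")
print(datasets_for(parks))
```

Filtering by kind first and then recomputing the dataset list mirrors how the Filter by Dataset tab narrows once a kind is selected.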

Single Result (Concept) View

Clicking on an individual instance result in the UMBEL search results view (see above) provides the single result View for that specific UMBEL concept:

This view now provides a detailed description of the UMBEL concept and its structure and relationships. I briefly describe each item denoted by a checkmark.

The concept title and link to the relation browser are provided, followed by the actual concept URI identifier. Then the listing shows the alternative labels (synonyms, jargon and acronyms) provided for that concept followed by its (often detailed) description.

The structured information for that concept appears below that material. First shown is the UMBEL SuperType [2] to which the concept belongs, and then its external (non-UMBEL ontology) and internal (UMBEL) super classes and subclasses. There is also the facility to retrieve named individuals (instances) for that concept (see next).

Named Individual Listing

Choosing the ‘Get Entities from Sources’ button may provide example instances for that concept, as is shown below for the ‘Artist’ concept:

Retrieving Named Individuals

This linkage is relatively new for UMBEL (see the version 1.00 release write-up) and is still being expanded. At present, these linkages are limited to only a subset of UMBEL concepts and only linkages to Wikipedia. This aspect of the system is under active development, with more sources and more linked concepts to be released in the future.


This is the second of a multi-part series on the newly updated UMBEL services. Other articles in this series are:


[1] See further the general Wikipedia description of UMBEL or its specification on the official UMBEL Web site.
[2] SuperTypes are a collection of (mostly) similar Reference Concepts. Most of the SuperType classes have been designed to be (mostly) disjoint from the other SuperType classes. SuperTypes thus provide a higher level of clustering and organization of Reference Concepts for use in user interfaces and for reasoning purposes. Every Reference Concept in UMBEL is assigned to a SuperType; a few are assigned to more than one where disjointedness conditions are not absolute. Each of the 32 UMBEL SuperTypes has a matching predicate for relating to external topics. See further the discussion of SuperTypes in the UMBEL specification.

by Mike Bergman at November 07, 2011 02:28 PM

November 03, 2011

AI3:::Adaptive Information (Mike Bergman)

And, Now, We Pause for a Brief Commercial Message . . .

Mixing Business and Pleasure

I never talk politics here, and rarely speak of sports or family or personal matters. But I’m making an exception today.

Since we lived in Montana a couple of decades back, skiing and the mountains have been a central theme in my family. Both of my kids learned to ski at Lost Trail before they even turned three. Today, both are impressive skiers. (I’m a different story, but that is immaterial. ;) )

We have skied many places across the Western US, all enjoyable and all remarkable. But, our favorite amongst them has been Winter Park, CO (more specifically, Mary Jane — no Jane, no pain). We have been going there for nearly a decade. The slopes and the beauty are, of course, arguments in themselves. But also what makes Winter Park special is that it offers the best deal on earth for lift tickets (with an annual pass) and has a local clientele that is laid back and into substance and not flash.

As our kids have grown and taken on lives of their own, we have come to treasure those chances when all of us can be together. Skiing, and summer activities too, are great ways to make that happen.

So, it is with immeasurable pleasure that we closed the sale today on a new second home in Winter Park. It is absolutely perfect for all things outdoors. And, since we still have regular lives and work, we will be offering our new place for rental for those many weeks we cannot enjoy it ourselves. If mountains and beauty and nature are calling you, let us know. We have a fantastic place to rent to you in one of the most spectacular places on earth.

by Mike Bergman at November 03, 2011 09:08 AM

October 24, 2011

AI3:::Adaptive Information (Mike Bergman)

UMBEL Services, Part 1: Overview

New Portal Update Leverages the Open Semantic Framework

UMBEL, the Upper Mapping and Binding Exchange Layer, is an upper ontology of about 28,000 reference concepts and a vocabulary designed for domain ontologies and ontology mapping [1]. When we first released UMBEL in mid-2008 it was accompanied by a number of Web services and a SPARQL endpoint, and general APIs. In fact, these were the first Web services developed for release by Structured Dynamics. They were the prototypes for what later became the structWSF Web services framework, which incorporated many lessons learned and better practices.

By the time that the structWSF framework had evolved with many additions to comprise the Open Semantic Framework (OSF), those original UMBEL Web services had become quite dated. Thus, upon the last major update to UMBEL to version 1.0 back in February of this year, we removed these dated services.

Like what I earlier mentioned about the cobbler’s children being the last to get new shoes, it has taken us a bit to upgrade the UMBEL services. However, I am pleased to announce we have now completed the transition of UMBEL’s earlier services to use the OSF framework, and specifically the structWSF platform-independent services. As a result, there are both upgraded existing services and some exciting new ones. We will now be using UMBEL as one of our showcases for these expanding OSF features. We will be elaborating upon these features throughout this series, some parts of which will appear on Fred Giasson’s blog.

In this first part, we provide a broad overview of the new UMBEL OSF implementation. We also begin to foretell some of the parts to come that will describe some of these features in more detail.

The Overall Portal

The new UMBEL portal is a fairly classic example of an OSF installation. The content management system hosting the system is Drupal, supplemented with a standard set of third-party modules and our own conStruct semantic technology modules. The theme is a stripped-down modification of the popular Pixture Reloaded theme:

Like other vocabulary sites, the UMBEL portal contains specifications and links to community resources and downloads. It also has some specialty links not shown on typical standards sites.

Much Better Vocabulary Access and Management

The site now most prominently features our structOntology editing and maintenance tool. Built on the OWL API, the same as Protégé 4, structOntology provides the advantage of enabling edits and management of ontologies directly within the applications in which they are used. This is far superior to needing to fire up an external ontology manager and then to re-import the changed ontology. structOntology also has an arguably simpler interface and operation than other ontology management alternatives:

For the UMBEL site, the standard view of using structOntology is read-only. In a subsequent part we will also discuss structOntology’s full editing and maintenance mode.

Improved Discovery and Navigation

Like all standard OSF installations, there are two superior means for discovery and navigation of the information space:  search and the relation browser.

Search uses the integration of RDF and inferencing with full-text, faceted search using Solr. This has been Structured Dynamics’ standard search function for some time, as Fred initially described in April 2009. It is a very effective way for finding new and related concepts within the UMBEL structure.

The relation browser is what is used for casual navigation and discovery. Any concept found via search or other means within the system can have the browser invoked by clicking on its browser icon. When done, the standard relation browser appears:

The relation browser is highly configurable, as shown by some of our exemplar installations. Note in this case that the More details … link brings you to a detailed concept view, such as this example:

These various tools provide great means for discovery and navigation within the 28,000 concepts in the UMBEL reference space.

Newly Released Web Services and SPARQL Endpoints

We are also now providing updated endpoints for Ontology: Read, Search, Crud: Read, SPARQL and Scones. These will be described with access and query examples in a later part.

Some Cool New Sandboxes

We will also be discussing our OBIE (ontology-based information extraction) and entity tagger, scones, and export and ontology edit and management functions in subsequent posts.

Looking Ahead to Remaining Parts

We anticipate eight or nine more parts in this series explaining most of these options in greater detail. We hope to post a couple per week or so over the coming month. We will conclude with a discussion of next pending UMBEL releases.

This is the first of a multi-part series on the newly updated UMBEL services.

[1] See further the general Wikipedia description of UMBEL or its specification on the official UMBEL Web site.

by Mike Bergman at October 24, 2011 04:35 PM

October 22, 2011

HyperDanja (Danny Ayers)

links for 2011-10-22

by danja at October 22, 2011 09:11 PM

October 18, 2011

AI3:::Adaptive Information (Mike Bergman)

Fred’s Hair is on Fire

Today’s Post is a Testimony to the Value of Vacations

My partner, Fred Giasson, today posted the second part of his series on open source. Since returning from a well-earned vacation a few weeks back — after more than three years without a break — Fred has been writing and developing up a storm. As someone said to me last week, “Fred’s on fire!” I could not agree more.

I think Fred’s post speaks for itself as to why and how Structured Dynamics has made a conscious choice to embrace open source. The major reason he puts forth — to bootstrap the company without the need for external investment — is unusual in itself. But one thing he is silent about is why this is a compelling reason. I’ll comment on that.

Fred and I have both worked for others dependent on their capital for our ventures (a few more times in my case). Capital is great for expansion and operations, but it can be deadly when visions requiring patience are in play. Structured Dynamics is only now a bit more than halfway through its five-year plan. While semantics technologies are exciting with a world of upside potential, they have also been incubated in academic labs with (as yet) a general lack of practical deployment. The promise is there, but often the delivery and maturation have been lacking. We are committed to play a visible role in correcting that.

The approach Fred outlines was not perhaps easily available to new startups a decade ago. But now, with open source and the Internet, costs of entry and ongoing development have dropped markedly. Yet, surprisingly, the idea of financing a startup via revenues is still not talked about sufficiently — let alone often used as an actual basis for building a company.

I’ve been fortunate to be able to partner with a young, world-class technologist whose maturity exceeds that of individuals many years his senior. He understands that, in order to achieve important visions, the stewardship of those ideas cannot be left to venture capitalists committed solely or mostly to gaming terms or near-term returns. We’re placing our bets on the paying customer and our own judgment.

So, it is great to see Fred continue his phenomenal development productivity since he returned from Hawaii. The benefit of his vacation is that we are also now getting his insights on his blog again.

by Mike Bergman at October 18, 2011 04:19 AM

Frederick Giasson's Weblog

Open Sources Projects As A Pool Of Resources

In a previous blog post, I wrote about how Open Source may be unnatural, and even counter intuitive, to many people. However, that really begs some questions evident with my current company's strategy.

Why have Mike Bergman and I chosen to develop no fewer than three major open source projects (structWSF, conStruct and the Semantic Components), encompassing more than 100 000 lines of new code and leveraging between 30 and 50 other open source software packages and libraries? Why have we open sourced all our software? Why has open source formed the core business strategy of Structured Dynamics in the last three years? How have we been able to profitably sustain the company, even in the midst of the global economic crisis that began in 2008?

I will try to answer these questions in this blog post, perhaps even providing some guidance for newer startups that may follow behind us.

Why Open Sourcing?

Why did Structured Dynamics choose to open source all of its software? There are multiple reasons why people and businesses choose to go open source. For some, it is because they think that it is where the marketplace is moving. For others, it is because they think that a community will emerge around their effort and they will then get free resources that improve the piece of software. Some think that their software will promptly be reviewed by professional programmers. Others may think that their system will become more secure. Etc.

For Structured Dynamics, the reason why we chose to go open source is somewhat different:

We perceived that by open sourcing our complete software stack we could bootstrap the company without any external investment.

Making a Living out of Open Source Projects

There are multiple ways to make a living from an open source project:

  • Doing consultancy work related to the project
  • Implementing the software(s) into clients’ computer environment(s)
  • Selling training classes
  • Selling support contracts
  • Selling maintenance contracts
  • Selling hosted instances of the software (the SaaS model for one)
  • Selling development time to improve some part(s) of the software
  • Creating conferences around their open source projects
  • Selling proprietary extensions
  • I am probably missing a few, so please add them in a comment section below, and I will make sure to add them to this list.

Depending on the software you are developing, and depending on the business plan of your company, you may be doing one — or multiple — of these things to generate some money from your open source projects.

At Structured Dynamics we are doing some of them: we do get consultancy contracts related to the Open Semantic Framework and we do implement OSF in our clients’ computer environments.

But, more importantly, we are also doing development contracts related to the framework. In fact, each project we are working on is quite different. Our major projects involve companies that reside in totally different domains, have different needs and need to accommodate different kinds of users. However, most of the projects share the same core needs, and all of them advance the core technology in ways meaningful to our vision. We choose our customers (and, of course, vice versa) based on a true sense of partnership wherein both parties have their objectives furthered.

Let’s see how we use these relationships to drive the development of the Open Semantic Framework.

Open Source Project as a Pool of Resources

In the last three years, Structured Dynamics has attracted multiple companies and organizations that share our vision, and which are willing to invest in the Open Semantic Framework open source project. (See Mike's recent post on business development for a bit more on that aspect of things.) Each of these clients did want to use the OSF framework for their own needs. However, each of them did want to do something special that was not currently implemented in the framework.

What we created in these three years is a pool of resources that we used to develop the framework such that it accommodates the needs of each of our clients. Each of our clients then becomes a participant to the shared pool of innovation. Our clients have been willing to invest in the open source framework because they need their own features and because they know that they will benefit from what other participants of the pool will invest themselves down the road.

In that scenario, we are the managers of a pool of resources. We have the vision of where we want the framework to go, we know the roadmap of the project and we know the needs of each participant (our clients). What we do is to try to optimize the resources we get from each of our clients by developing the framework such that it can accommodate as broad of a spectrum of participants as possible. Then, we seek to find new participants that have some needs that will help us continue to develop the next steps of the roadmap. In this manner, we Jacob's Ladder our existing work to increase the capabilities for later clients, but earlier clients still benefit because they can upgrade to the later improvements. This is a self-sustaining model to continue to move the development of the framework forward.

By finding new clients, what we do is to give a return on investment to the other pool participants. Most of the new features that we develop for these new clients will benefit the other participants to the pool and will create new possibilities for them without any additional investment. All of our first clients have implemented what other participants later invest into the pool, thus crystallizing and augmenting their return on investment by using these new features.

Open Source is Not Just About Software

Open Source is not just about pieces of code, and this is quite important to understand. What we have open sourced with the Open Semantic Framework is much more than a series of code sources. We open sourced the entire framework:

  1. The source codes
  2. The documentation
  3. The processes
  4. The methodologies

We term this comprehensive approach our total open solution.

This distinction with other open source projects is an essential differentiator with our approach. We choose to open source all of the pieces related to the framework. What drove this decision is a simple sentence that shows our philosophy behind it:

"We're Successful When We're Not Needed"

If the APIs, processes and methodologies are not properly documented, it means that we would certainly be needed by our clients, which would mean that we failed to open source our solution. But since we are working to open source our code, our processes and our methodologies, we are on the way to successfully open source the Open Semantic Framework since we won’t be needed by our clients.

This business approach is not as crazy as it sounds. We are free to work on new and important innovations, and are not basing our company culture on dependency and a constant drain by our customers. I know, it does not sound like Larry Ellison, but sounds good to us and our clients. It is certainly not a maximum revenue objective built on the backs of individual clients.

Our life is more fun and our clients trust us with new stuff. Further, each step of the way, we are able to leverage our own framework for unbelievable productivity in what we deliver for the money. But that is a topic for another day.

We think Structured Dynamics' business approach is a contemporary winning strategy. Our customers get good and advanced capabilities at low cost and risk, while we get to work on innovative extensions that are raising the semantic baseline for the marketplace. Who knows if we will always continue this path, but for now it is leading to sustained development and market growth for open semantic frameworks, including our own OSF.

 

 

by Frederick Giasson at October 18, 2011 02:17 AM

October 11, 2011

Frederick Giasson's Weblog

Volkswagen’s RDF Data Management Workflow

TribalDDB UK’s team just published a new case study to the W3C: Case Study: Contextual Search for Volkswagen and the Automotive Industry. They discuss the benefits of some of the semantic web technologies, techniques and concepts that they use to help them manage their data. They describe their approach and outline their design. The case study covers the technical aspects of their new Semantic Web Platform that I wrote about a few weeks ago.

In this blog post, I want to further explain their data management workflow, and how their data get exposed to different kinds of users.

Two Classes of Users

Let’s take a look at their data ingest/management/publishing workflow:

As you can see, all their data get collected, transformed and imported into structWSF. As I explained in my previous blog post, they are using structWSF to manage all their RDF data and access all the functionalities from the different web service endpoints.

However, how the data get exposed to the users is not that clear. In fact, it depends on the class of user. A user can be many different things: a person, a piece of software, an organization, etc. However, there are two general classes of users:

  1. Public users, and
  2. Private users

Public users are users that have no direct relation with Volkswagen and that have no access to their internal network. Private users are generally internal departments or some internal software applications that have direct access to the structWSF instance.

Private Users

Private users generally have access to all structWSF web service endpoints. This means that all structWSF functionalities are accessible to them by querying the endpoints.

Two different kinds of private users are specified in the use case’s schema:

  1. Volkswagen Site Search
  2. Other / External Applications

The Volkswagen site search is a software application that uses the structWSF Search endpoint to search, filter and expose their data to their users (the people who perform searches on the Volkswagen UK website).

The other/external applications are software applications that have access to the structWSF instance. These are generally internal applications that run in the same network. One of these applications is an internal piece of software that exports all the RDF data from the structWSF SPARQL endpoint and imports it into Kasabi.

These are two examples of software applications that Volkswagen created around the structWSF web services to re-purpose, re-contextualize and re-publish their RDF data.

Public Users

There are currently two kinds of public users of this new Volkswagen Semantic Platform:

  • People, and
  • Software applications

Two interfaces have been made publicly available, one for each of these kinds of users:

  • A website search engine page for people, and
  • A SPARQL endpoint for software applications

When a person reaches the website’s search page, the search query gets sent to the structWSF Search web service endpoint. The result is then returned to the search engine page, templated, and displayed to that person.

A SPARQL endpoint is accessible to the software applications. This endpoint is hosted by the Kasabi information marketplace. Volkswagen chose to export everything from their structWSF instance into Kasabi in order to outsource the maintenance of their public SPARQL endpoint.
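
For a software application, querying a public SPARQL endpoint of this kind comes down to a plain HTTP request. The following is a minimal sketch, assuming the endpoint follows the standard SPARQL Protocol ("query" parameter over GET). The endpoint URL and the query are placeholders, not the actual Volkswagen or Kasabi values, and buildSparqlGetRequest is a hypothetical helper name:

```javascript
// Build the URL and headers for a SPARQL Protocol "query via GET" request.
// Both arguments are supplied by the caller; nothing here is specific to Kasabi.
function buildSparqlGetRequest(endpointUrl, sparqlQuery) {
  const url = new URL(endpointUrl);
  // The SPARQL Protocol carries the query text in the "query" parameter.
  url.searchParams.set('query', sparqlQuery);
  return {
    url: url.toString(),
    // Ask for the standard JSON serialization of SELECT results.
    headers: { Accept: 'application/sparql-results+json' }
  };
}

// Placeholder endpoint and query, for illustration only.
const request = buildSparqlGetRequest(
  'https://example.org/sparql',
  'SELECT ?s WHERE { ?s ?p ?o } LIMIT 5'
);
console.log(request.url);
```

The resulting request could then be sent with any HTTP client, for example fetch(request.url, { headers: request.headers }) in an environment that provides fetch.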

Unlock the Power

As we saw in this blog post and in the W3C use case, all Volkswagen UK data is internally managed by structWSF; however, they are not locked into that system. They can easily communicate with external services to add new functionalities to their stack or to make business decisions such as outsourcing the management of some publicly accessible data access endpoints.

This is an important characteristic of their design:

By choosing semantic web technologies (such as structWSF), techniques and concepts (such as their Vehicles OWL Ontology and RDF), they are not locking themselves into a specific framework. They can easily communicate with external systems and applications. This means that they can quickly adapt their system to their constantly changing needs.

Conclusion

I wrote this blog post to further explain Volkswagen’s data management workflow. I wanted to make sure that people understand the role that structWSF has in this use case, and the ecosystem it operates in.

by Frederick Giasson at October 11, 2011 10:53 PM

AI3:::Adaptive Information (Mike Bergman)

The Cobbler’s Shoes

The Need to Enforce Periodic Checkups on Web Properties

Face it, we all get busy and begin to overlook our own needs while we work for others on our jobs. The parable of the cobbler’s children going without shoes says it all.  It means that the shoemaker spends so much time looking after his customers’ needs that he neglects the needs of his own children.

We see the same phenomenon in relation to our own personal assets, home repairs and cleaning, and a myriad of chores and background requirements. One way we can overcome these neglects is by scheduling annual or periodic checkups or activities. Spring cleaning is one such effort, as is annual asset portfolio re-balancing or doctor’s appointments or 10,000 mile vehicle servicing.

One of the cobbler’s chores for Structured Dynamics is the periodic care and feeding of our various Web sites. This has actually proven to be a non-trivial exercise, as our properties have grown to exceed 1400 static Web pages across some 30 diverse Web addresses and properties. As our client and code base expands, this exercise is increasingly demanding.

Taking advantage of a small break in the action, we have just completed another one of these reviews and revisions. Interestingly, as I was going through the various sites, I saw that date stamps for prior revisions tended to all occur in the September and October time frame. Last September, for example, SD went through a major redesign and new logo. Apparently, without consciously realizing it, we have been doing our own Web attic cleaning in the Fall.

Thus, as a way to formalize this process for us internally, I thought I’d briefly outline the Web site changes that we have cobbled together for this year. I suspect we’ll be doing another spiffing come Fall 2012.

Rationalizing the Properties

It is kind of frightening to realize that we have allowed our Web properties to grow to about 30 individual sites. This accretion happens gradually: a new initiative or capability arises that seems to warrant its own Web site. Yet each site carries with it a need to develop and maintain, as well as to explain its role and use in the Structured Dynamics information space.

Exclusive of internal development sites or ones dedicated to specific customers, here is the roster of existing SD properties that we have needed to rationalize:

Note that all properties with strike outs have now either been retired or consolidated with other properties. We have reduced the property count by 10, or by a third. Additional consolidations will be forthcoming.

Providing a Consistent Entry to the Various Properties

With the growth of our various Web properties and the diversity of the initiatives behind them, Fred and I have grown increasingly frustrated that our site visitors lacked a consistent way to access and understand these projects. Across all properties, Structured Dynamics has about 6,000 daily visitors or RSS tracking feeds.

Providing a consistent context of what these properties mean and their relation to one another is further compounded by the sheer size of our properties. Excluding dynamically generated pages (such as from search, demonstration of our semantic components, or use of the relation browser), we have on the order of 1400 static Web pages across all properties and blogs. Users may enter our information space via any of these entry points.

The answer to how to provide a consistent context on any Web page throughout our properties resides in the nifty JavaScript popup Fred recently described for his own blog. What we realized is that we could adapt this widget to provide a single overview of SD’s resources, and then add that widget to all of our properties such that it appears as a small tab at the bottom (sometimes side) of all property pages.

Then, when the SD Resources tab is clicked, the following popup appears:

So, whenever you are on one of our properties, look for the tab (generally) at the lower right corner of every Web page. That will take you to the common entry point across Structured Dynamics’ Web properties.

Updating the Properties

In this process we also went through some of our existing sites and made content, narrative and navigation changes consistent with this rationalization and consistent entry point. These updates were not nearly as extensive as the full re-designs from one year ago.

New Shoe Designs

With a constant stream of new initiatives and new understandings, it will remain a challenge for us to describe our various products and services. An even greater challenge will be to provide coherent descriptions of how all of these initiatives fit together consistent with our overall vision. One attempt at that is our new Overview page. Meanwhile, of course, we will occasionally be offering new Web goodies and sites as developments warrant. These will need to get integrated into this picture as well.

We think we have taken an itty-bitty step to improving this process with the SD Resources tab widget. Nonetheless, I’m sure that we will continue to craft new shoes to try to find ones that are still yet more comfortable and attractive. Thing is, we may have to wait another year before we get around to it again.

by Mike Bergman at October 11, 2011 09:26 AM

October 06, 2011

Frederick Giasson's Weblog

A Man Dedicated to His Vision (and Who Changed the World)

This man literally changed the world we live in. He had a vision, he failed, but he came back to change everybody’s daily habits. He pushed others to the limit and changed entire industries. Even if I don’t always agree with his company’s decisions, I will always respect his vision, his work and his dedication. Rest in peace, Mr. Jobs.

 

Here is my collection of Steve’s best quotes, which I aggregated over time. I hope it helps you understand who the man was.

 

"Your time is limited, so don't waste it living someone else's life. Don't be trapped by dogma – which is living with the results of other people's thinking. Don't let the noise of others' opinions drown out your own inner voice. And most important, have the courage to follow your heart and intuition. They somehow already know what you truly want to become. Everything else is secondary. " - Steve Jobs

"When I was 17, I read a quote that went something like: "If you live each day as if it was your last, someday you'll most certainly be right." It made an impression on me, and since then, for the past 33 years, I have looked in the mirror every morning and asked myself: "If today were the last day of my life, would I want to do what I am about to do today?" And whenever the answer has been "No" for too many days in a row, I know I need to change something." - Steve Jobs

"Your work is going to fill a large part of your life, and the only way to be truly satisfied is to do what you believe is great work. And the only way to do great work is to love what you do. If you haven't found it yet, keep looking. Don't settle. As with all matters of the heart, you'll know when you find it. And, like any great relationship, it just gets better and better as the years roll on. So keep looking until you find it. Don't settle." - Steve Jobs

"Again, you can't connect the dots looking forward; you can only connect them looking backwards. So you have to trust that the dots will somehow connect in your future. You have to trust in something – your gut, destiny, life, karma, whatever. This approach has never let me down, and it has made all the difference in my life."  - Steve Jobs

"To design something really well you have to get it. You have to really grok what it's all about. It takes a passionate commitment to thoroughly understand something – chew it up, not just quickly swallow it. Most people don't take the time to do that. Creativity is just connecting things. When you ask a creative person how they did something, they may feel a little guilty because they didn't really do it, they just saw something. It seemed obvious to them after a while. That's because they were able to connect experiences they've had and synthesize new things. And the reason they were able to do that was that they've had more experiences or have thought more about their experiences than other people have. Unfortunately, that's too rare a commodity. A lot of people in our industry haven't had very diverse experiences. They don't have enough dots to connect, and they en up with very linear solutions, without a broad perspective on the problem. The broader one's understanding of the human experience, the better designs we will have." - Steve Jobs

 

 

by Frederick Giasson at October 06, 2011 12:37 AM

October 05, 2011

Frederick Giasson's Weblog

Unnatural Open Source

I have never been an open source software advocate. In fact, like most people, I always wondered how companies could find a business advantage in developing open source software and how they could make money from it to grow. It is nice to have open source software, but it is hard to imagine how you could justify putting thousands of hours into open source software projects other than out of passion.

In this post I will explain what I think is the main factor that puts people, businesses and organizations on guard when the time comes to think about open source software. In fact, I think it has much more to do with our nature, how we naturally are as human beings, and much less to do with any real business-related factors.

In a follow-up blog post, I will explain how Structured Dynamics embraced open source software, how we developed the company around the concept, and how we are managing the development of our project such that it benefits all our clients along with the company. But first, let’s try to figure out why many people are suspicious of open source software.

The Fear

"I must not fear. Fear is the mind-killer. Fear is the little-death that brings total obliteration. I will face my fear. I will permit it to pass over me and through me. And when it has gone past I will turn the inner eye to see its path. Where the fear has gone there will be nothing. Only I will remain."

- Dune, Frank Herbert

Have you ever heard someone telling you:

I found an incredible business idea! I am pretty sure that I am the first one to think about that. I will get some good money down the road!

Then, you naturally asked for more information about this great idea! And then the answer you got was something like:

Hooo! But I can’t tell you, this is really secret right now, at least until everything is ready to go.

Does this sound familiar? It does to me. I hear it often. But why do people react that way? It is simply out of fear: fear that someone will “steal” their ideas, start a company based on them, build projects or services that implement them, and get rich while they are flipping burgers at McDonald’s.

To me, this is the main reason why people, organizations and businesses are suspicious of open source software: because of fear; fear of losing something they don’t even have.

But the question is: is that rational? From my experience, and my understanding of how things work, I can certainly say that it is not. This way of thinking is not rational because it doesn’t take into account a few things:

  • The ability of others to do something with your ideas
  • The ability of others to have the vision you have for your ideas
  • The willingness of others to spend all their time and energy to make these ideas work
  • The fact that people tend to do what they want to do, and not what others want

The same behavior seems to happen with open source projects. When I explain to people what we are doing, one of the first reactions is: why is your work open and free? Don’t you fear that someone will steal your project and ideas? How can you make money if it is free? Won’t people just run with it for themselves?

The simple answer to all these questions is: no. No, we don’t fear that anybody will steal our projects and ideas just by cloning them from the source control. We don’t because of the four reasons listed above. We don’t because we trust our vision and our abilities to implement it in our various open source projects. And yes, we can sustain the company pretty well with these projects, which is what I will cover in my next blog post.

Conclusion

Non-open source software is just like someone who has a business idea “for the next big thing” and doesn’t want to share it with anybody else because he thinks that someone will take that idea and run with it himself. In fact, it is quite the opposite. I have learned from experience that there is only one person (or organization) that can make such a great idea a relative success: the person (or organization) that lives for that idea. An idea is just an idea, and has nothing great in it, until it gets implemented, until the idea lives by itself, propelled by its most dedicated advocates: its creators and their boundless enthusiasm. Any idea would fail without this, and would be worth nothing; it would just be an idea.

by Frederick Giasson at October 05, 2011 01:14 PM

October 02, 2011

Frederick Giasson's Weblog

WordPress’s Follow Button for Non-WordPress.com Users

About two weeks ago, the WordPress.com team released a wonderful new tool called the Follow Button to all their WordPress.com users. This button floats in the bottom-right corner of a blog and lets readers subscribe, by email, to the blog’s publications. Each time a new blog post is published, they receive an update in their inbox.

The idea is far from new, and may even look old-school. However, their implementation is simple, really well done and really clever. Also, the wording they used in the tool is perfect (for example, using the word Follow instead of Subscribe).

The only problem is that this wonderful new tool is only available to WordPress.com users! As you may know, this blog is using WordPress, but it is a self-hosted instance. After doing some research, I couldn’t find any plugins or methods to install it on my blog. Also, the email service behind this user interface is built into WordPress.com. As a last resort, I checked their Jetpack plugin to see if the Follow Button had been added to it, but apparently it hasn’t (it is probably too recent).

So, I was in a dilemma: I wanted this feature for my blog, I didn’t want to migrate everything to WordPress.com, and I didn’t have the time to write a plugin that does exactly this. So what I did was take a few hours to hack my own Follow Button using what already exists out there. In fact, I was quite surprised to see how easy it turned out to be.

It was as easy as installing the really good Subscribe2 plugin and creating the UI from the original Follow Button using some HTML, CSS and jQuery code. After some re-wiring, I ended up with my own self-hosted Follow Button.

This is what I want to share with you here, in this Hors Série blog post. I am pretty sure that many self-hosted WordPress bloggers will want it, so I took an additional hour to write and publish this blog post.

I made two additional “improvements” to the concept:

  1. I changed the icon to put some color in there. Not only to make it less dull, but also to bring a little bit more attention to it.
  2. I also added a link to my RSS feed. To me, “Follow” is not just about emails, but it is also about other syndication mediums too. However, I kept the email as the first option to keep the spirit of the tool.

Finally, I didn’t want to hack any piece of code in WordPress or in any WordPress plugin. The only thing that we will modify is the theme, by adding some code to it. The current implementation could be improved by upgrading Subscribe2, for example, but I didn’t want people to have to do this to enable the Follow Button on their blog.

Step #1: Install Subscribe2

First things first. The first thing you have to do is install the WordPress plugin that will enable your users to subscribe to your blog via email and to manage their subscriptions. We are using the really good Subscribe2 WordPress plugin, which adds these features to your WordPress instance.

To install this plugin using WordPress’ automatic plugin installation system, follow the instructions below. Read the plugin’s installation instructions if you want to do this the manual way:

  1. Log in to your WordPress blog and visit Plugins->Add New.
  2. Search for Subscribe2, click “Install Now” and then Activate the Plugin
  3. Click the “Settings” admin menu link, and select “Subscribe2”
  4. Configure the options to taste, including the email template and any categories which should be excluded from notification
  5. Click the “Tools” admin menu link, and select “Subscribers”
  6. Manually subscribe people as you see fit
  7. Create a WordPress Page to display the subscription form. When creating the page, you may click the “S2” button on the QuickBar to automatically insert the subscribe2 token. Or, if you prefer, you may manually insert the subscribe2 shortcode or token: [subscribe2] or the HTML invisible <!--subscribe2-->. Ensure the token is on a line by itself and that it has a blank line above and below. This token will automatically be replaced by dynamic subscription information and will display all forms and messages as necessary
  8. In the WordPress “Settings” area for Subscribe2, go to the “Appearance” section and select the name of the WordPress page created in step 7

On this blog, I called the page created at step #7: Follow. Once you are done installing the plugin, you can test it by visiting your Follow page, entering your own email address (preferably one that is not attached to any user account on your blog) and checking your inbox to see if you receive a subscription notification. If you haven’t, you may want to take a look at this FAQ to debug any possible issue with your outgoing email service.

Step #2: Customize your Follow Page

This next step is optional. Since the form generated by the Subscribe2 plugin is really minimalist, you may want to customize it a little bit, to change its design and to add some explanation to the page to help your readers understand what is going on. Take a look at my own Follow page to see what I did to customize that page.

Step #3: Add the Follow Button code in your theme

The third step is really what will morph the Subscribe2 plugin into the Follow Button. What we are doing here is simply adding the code to your theme to display the Follow Button.

The first thing you have to do is locate where the footer of the pages is generated in the theme. Open the theme folder of your blog: /../wordpress/wp-content/themes/mytheme/. Then you will have to open a few files to check where the closing </body> HTML tag is generated. Which file generates that code really depends on how the theme was designed. You can search all the PHP files in that folder for the string “</body>”. This should give you the answer right away. Once you have located that place, you are good to continue with the following instructions.
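
The search for the closing tag can also be scripted. Here is a minimal Node.js sketch, assuming Node.js is available on the machine that holds the theme files. findFilesContaining is a hypothetical helper name, and the theme path shown in the comment is only an example:

```javascript
const fs = require('fs');
const path = require('path');

// Recursively list the .php files under `dir` whose content includes `needle`.
function findFilesContaining(dir, needle) {
  const matches = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // Themes may keep template parts in sub-folders, so descend into them.
      matches.push(...findFilesContaining(fullPath, needle));
    } else if (entry.name.endsWith('.php') &&
               fs.readFileSync(fullPath, 'utf8').includes(needle)) {
      matches.push(fullPath);
    }
  }
  return matches;
}

// Example call (the path is an assumption; adjust it to your own theme folder):
// console.log(findFilesContaining('/path/to/wordpress/wp-content/themes/mytheme', '</body>'));
```

In many themes the match will be a single file, often the footer template, but that depends entirely on how the theme was built.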

Important note: Your theme may not use jQuery by default. If that is the case, you have to edit the header.php of your theme (or whatever file generates the header of your blog) and add the following line in the <head>...</head> section of the page:

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.min.js" type="text/javascript"></script>
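
Before adding that line, it may be worth checking whether jQuery is already present on the page. Here is a small sketch of such a check; isJQueryLoaded is a hypothetical helper name, and it simply tests for the global jQuery function that the library defines:

```javascript
// Return true when the given global object (normally `window`) exposes jQuery.
function isJQueryLoaded(globalObject) {
  return typeof globalObject.jQuery === 'function';
}

// In a browser console, one would call: isJQueryLoaded(window)
// Outside a browser there is no jQuery, so this prints false.
console.log(isJQueryLoaded(typeof window !== 'undefined' ? window : {}));
```

If the check returns true on your blog, the script tag above is not needed.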

If jQuery is not loaded, the browser will report a JavaScript error and the panel will “freeze” in the webpage. Once you have made sure that jQuery is loaded, proceed with this code:

<style type="text/css" media="screen">
  #bit, #bit * {}
  #bit {
      bottom: -300px;
      font: 13px "Helvetica Neue",sans-serif;
      position: fixed;
      right: 10px;
      z-index: 999999;
      width: 230px;
  }
 
  .loggedout-follow-typekit {
      margin-right: 4.5em;
  }
 
  #bit a.bsub {
      background-color: #464646;
      background-image: -moz-linear-gradient(center bottom , #3F3F3F, #464646 5px);
      border: 0 none;
      box-shadow: 0 -1px 5px rgba(0, 0, 0, 0.2);
      color: #CCCCCC;
      display: block;
      float: right;
      font: 13px/28px "Helvetica Neue",sans-serif;
      letter-spacing: normal;
      outline-style: none;
      outline-width: 0;
      overflow: hidden;
      padding: 0 10px 0 8px;
      text-decoration: none !important;
      text-shadow: 0 -1px 0 #444444;
  }
 
  #bit a.bsub {
      border-radius: 2px 2px 0 0;
  }
 
  #bit a.bsub span {
      background-attachment: scroll;
      background-clip: border-box;
      background-color: transparent;
      background-image: url("[[PATH-TO-THE-FAMFAM-ICON]]asterisk_orange.png");
      background-origin: padding-box;
      background-position: 2px 3px;
      background-repeat: no-repeat;
      background-size: 20% auto;
      padding-left: 18px;
  }
 
  #bit a:hover span, #bit a.bsub.open span {
      /*background-position: 0 -117px;*/
      color: #FFFFFF !important;
  }
 
  #bit a.bsub.open {
      background: none repeat scroll 0 0 #333333;
  }
 
  #bitsubscribe {
      background: none repeat scroll 0 0 #464646;
      border-radius: 2px 0 0 0;
      color: #FFFFFF;
      margin-top: 27px;
      padding: 15px;
      width: 200px;
      float: right;
      margin-top: 0;
  }
 
  div#bitsubscribe.open {
      box-shadow: 0 0 8px rgba(0, 0, 0, 0.5);
  }
 
  #bitsubscribe div {
      overflow: hidden;
  }
 
  #bit h3, #bit #bitsubscribe h3 {
      color: #FFFFFF;
      font-family: "Helvetica Neue",Helvetica,Arial,sans-serif;
      font-size: 20px;
      font-weight: 300;
      margin: 0 0 0.5em !important;
      text-align: left;
      text-shadow: 0 1px 0 #333333;
  }
 
  #bit #bitsubscribe p {
      color: #FFFFFF;
      font: 300 15px/1.3em "Helvetica Neue",Helvetica,Arial,sans-serif;
      margin: 0 0 1em;
      text-shadow: 0 1px 0 #333333;
  }
 
  #bitsubscribe p a {
      margin: 20px 0 0;
  }
 
  #bit #bitsubscribe p.bit-follow-count {
      font-size: 13px;
  }
 
  #bitsubscribe input[type="submit"] {
      -moz-transition: all 0.25s ease-in-out 0s;
      background: -moz-linear-gradient(center top , #333333 0%, #111111 100%) repeat scroll 0 0 transparent;
      border: 1px solid #282828;
      border-radius: 11px 11px 11px 11px;
      box-shadow: 0 1px 0 #444444 inset;
      color: #CCCCCC;
      padding: 2px 20px;
      text-decoration: none;
      text-shadow: 0 1px 0 #000000;
  }
 
  #bitsubscribe input[type="submit"]:hover {
      background: -moz-linear-gradient(center top , #333333 0%, #222222 100%) repeat scroll 0 0 transparent;
      box-shadow: 0 1px 0 #4F4F4F inset;
      color: #FFFFFF;
      text-decoration: none;
  }
 
  #bitsubscribe input[type="submit"]:active {
      background: -moz-linear-gradient(center top , #111111 0%, #222222 100%) repeat scroll 0 0 transparent;
      box-shadow: 0 -1px 0 #333333 inset;
      color: #AAAAAA;
      text-decoration: none;
  }
 
  #bitsubscribe input[type="text"] {
      border-radius: 3px 3px 3px 3px;
      font: 300 15px "Helvetica Neue",Helvetica,Arial,sans-serif;
  }
 
  #bitsubscribe input[type="text"]:focus {
      border: 1px solid #000000;
  }
 
  #bitsubscribe.open {
      display: block;
  }
 
  #bsub-subscribe-button {
      margin: 0 auto;
      text-align: center;
  }
 
  #bitsubscribe #bsub-credit {
      border-top: 1px solid #3C3C3C;
      font: 11px "Helvetica Neue",sans-serif;
      margin: 0 0 -15px;
      padding: 7px 0;
      text-align: center;
  }
 
  #bitsubscribe #bsub-credit a {
      background: none repeat scroll 0 0 transparent;
      color: #AAAAAA;
      text-decoration: none;
      text-shadow: 0 1px 0 #262626;
  }
 
  #bitsubscribe #bsub-credit a:hover {
      background: none repeat scroll 0 0 transparent;
      color: #FFFFFF;
  }
</style>    

<script type="text/javascript" charset="utf-8">
  // Register a cubic ease-out curve, used when the panel slides open
  jQuery.extend(jQuery.easing, {
      easeOutCubic: function (x, t, b, c, d) {
          return c * ((t = t / d - 1) * t * t + 1) + b;
      }
  });
  jQuery(document).ready(function () {
      var isopen = false,
          bitHeight = jQuery('#bitsubscribe').height(),
          // Offset that hides the form (its height plus 30px of vertical padding)
          // below the viewport, so that only the Follow tab stays visible
          closedBottom = -(bitHeight + 30) + 'px';
      // Shortly after the page loads, slide the panel to its closed position
      setTimeout(function () {
          jQuery('#bit').animate({
              bottom: closedBottom
          }, 200);
      }, 300);
      // Clicking the Follow tab toggles the panel open or closed
      jQuery('#bit a.bsub').click(function () {
          if (!isopen) {
              isopen = true;
              jQuery('#bit a.bsub').addClass('open');
              jQuery('#bit #bitsubscribe').addClass('open');
              jQuery('#bit').stop();
              jQuery('#bit').animate({
                  bottom: '0px'
              }, {
                  duration: 400,
                  easing: "easeOutCubic"
              });
          } else {
              isopen = false;
              jQuery('#bit').stop();
              jQuery('#bit').animate({
                  bottom: closedBottom
              }, 200, function () {
                  // Drop the "open" styling once the panel has finished closing
                  jQuery('#bit a.bsub').removeClass('open');
                  jQuery('#bit #bitsubscribe').removeClass('open');
              });
          }
      });
  });
</script>

<div id="bit" class="">
  <a class="bsub" href="javascript:void(0)"><span id='bsub-text'>Follow</span></a>
 
  <div id="bitsubscribe">
    <h3><label for="s2email">Follow this Blog</label></h3>
 
    <form action="[[PATH-TO-YOUR-FOLLOW-WORDPRESS-PAGE]]" method="post" accept-charset="utf-8" id="loggedout-follow">
      <p>Get every new post on this blog delivered to your Inbox.</p>
      <p class="bit-follow-count">Join <?php echo $wpdb->get_var("SELECT COUNT(id) FROM wp_subscribe2 WHERE active='1'"); ?> other followers:</p>
      <p>
        <input type="text" name="email" id="s2email" style="width: 95%; padding: 1px 2px" value="Enter email address" onfocus='this.value=(this.value=="Enter email address") ? "" : this.value;' onblur='this.value=(this.value=="") ? "Enter email address" : this.value;' />
      </p>
       
      <input type="hidden" name="ip" value="<?php echo $_SERVER['REMOTE_ADDR']; ?>">
     
      <p id='bsub-subscribe-button'>
        <input type="submit" name="subscribe"  value="Sign me up!" />
      </p>
    </form>
   
    <p style="padding-top: 10px;">Or subscribe to the RSS feed by clicking on the counter:</p>  
   
    <p>
      [[ADD-YOUR-RSS-FEED-LINK-HERE]]
    </p>
  </div>
</div>

The only thing you have to do is to copy/paste that code above the </body> tag. Then, do the following three modifications to properly wire it in your blog:

  • At line #41, replace [[PATH-TO-THE-FAMFAM-ICON]] with the path of the asterisk_orange.png icon on your blog
  • At line #211, replace [[PATH-TO-YOUR-FOLLOW-WORDPRESS-PAGE]] with the URL of your Follow page (the one you created when you installed Subscribe2)
  • At line #228, replace [[ADD-YOUR-RSS-FEED-LINK-HERE]] with the link to your RSS feed

You can get the free asterisk_orange.png icon image from the FamFamFam website. All you have to do is download that image and put it in the folder you defined for [[PATH-TO-THE-FAMFAM-ICON]]. However, you can use any image you prefer that better fits the design of your blog.

Step #4: Disable it For Mobile Devices

Some mobile devices may have issues displaying this floating window. Sometimes the window floats in the middle of the device’s screen without folding back to the bottom of the page. For this reason, you may want to disable (remove) this option when a reader uses a mobile device to read your blog. You can easily do so, whenever the web server detects that a mobile device is requesting the webpage, by adding these two blocks of code.

First, copy and paste this first block of code above the code of the Follow button (before line #1):

<?php
 $useragent = $_SERVER['HTTP_USER_AGENT'];
 if(!preg_match('/android.+mobile|avantgo|bada\/|blackberry|blazer|compal|elaine|fennec|hiptop|iemobile|ip(hone|od)|iris|kindle|lge |maemo|midp|mmp|opera m(ob|in)i|palm( os)?|phone|p(ixi|re)\/|plucker|pocket|psp|symbian|treo|up\.(browser|link)|vodafone|wap|windows (ce|phone)|xda|xiino/i',$useragent)||preg_match('/1207|6310|6590|3gso|4thp|50[1-6]i|770s|802s|a wa|abac|ac(er|oo|s\-)|ai(ko|rn)|al(av|ca|co)|amoi|an(ex|ny|yw)|aptu|ar(ch|go)|as(te|us)|attw|au(di|\-m|r |s )|avan|be(ck|ll|nq)|bi(lb|rd)|bl(ac|az)|br(e|v)w|bumb|bw\-(n|u)|c55\/|capi|ccwa|cdm\-|cell|chtm|cldc|cmd\-|co(mp|nd)|craw|da(it|ll|ng)|dbte|dc\-s|devi|dica|dmob|do(c|p)o|ds(12|\-d)|el(49|ai)|em(l2|ul)|er(ic|k0)|esl8|ez([4-7]0|os|wa|ze)|fetc|fly(\-|_)|g1 u|g560|gene|gf\-5|g\-mo|go(\.w|od)|gr(ad|un)|haie|hcit|hd\-(m|p|t)|hei\-|hi(pt|ta)|hp( i|ip)|hs\-c|ht(c(\-| |_|a|g|p|s|t)|tp)|hu(aw|tc)|i\-(20|go|ma)|i230|iac( |\-|\/)|ibro|idea|ig01|ikom|im1k|inno|ipaq|iris|ja(t|v)a|jbro|jemu|jigs|kddi|keji|kgt( |\/)|klon|kpt |kwc\-|kyo(c|k)|le(no|xi)|lg( g|\/(k|l|u)|50|54|e\-|e\/|\-[a-w])|libw|lynx|m1\-w|m3ga|m50\/|ma(te|ui|xo)|mc(01|21|ca)|m\-cr|me(di|rc|ri)|mi(o8|oa|ts)|mmef|mo(01|02|bi|de|do|t(\-| |o|v)|zz)|mt(50|p1|v )|mwbp|mywa|n10[0-2]|n20[2-3]|n30(0|2)|n50(0|2|5)|n7(0(0|1)|10)|ne((c|m)\-|on|tf|wf|wg|wt)|nok(6|i)|nzph|o2im|op(ti|wv)|oran|owg1|p800|pan(a|d|t)|pdxg|pg(13|\-([1-8]|c))|phil|pire|pl(ay|uc)|pn\-2|po(ck|rt|se)|prox|psio|pt\-g|qa\-a|qc(07|12|21|32|60|\-[2-7]|i\-)|qtek|r380|r600|raks|rim9|ro(ve|zo)|s55\/|sa(ge|ma|mm|ms|ny|va)|sc(01|h\-|oo|p\-)|sdk\/|se(c(\-|0|1)|47|mc|nd|ri)|sgh\-|shar|sie(\-|m)|sk\-0|sl(45|id)|sm(al|ar|b3|it|t5)|so(ft|ny)|sp(01|h\-|v\-|v )|sy(01|mb)|t2(18|50)|t6(00|10|18)|ta(gt|lk)|tcl\-|tdg\-|tel(i|m)|tim\-|t\-mo|to(pl|sh)|ts(70|m\-|m3|m5)|tx\-9|up(\.b|g1|si)|utst|v400|v750|veri|vi(rg|te)|vk(40|5[0-3]|\-v)|vm40|voda|vulc|vx(52|53|60|61|70|80|81|83|85|98)|w3c(\-| )|webc|whit|wi(g |nc|nw)|wmlb|wonu|x700|xda(\-|2|g)|yas\-|your|zeto|zte\-/i',substr($useragent,0,4)))
 {
?>

Then copy and paste this second block of code below the code of the follow button (after line #231):

<?php
  }
?>

This code comes from the Detect Mobile Browser project and is the best mobile device detection code I have seen so far. It leaves the Follow Button out of the HTML page if the device requesting the webpage is a mobile device. Otherwise, the Follow Button is added to the page.
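
To make the logic easier to follow, here is a minimal sketch of the same idea wrapped in a function. The function name isMobileUserAgent() is hypothetical, and the pattern is a heavily shortened stand-in for the full Detect Mobile Browser expressions above, so it will miss many devices; it only illustrates the structure of the check.

```php
<?php
// Hypothetical helper (not part of the Detect Mobile Browser project).
// The pattern is a shortened illustration of the full expressions above.
function isMobileUserAgent($useragent)
{
    $pattern = '/android.+mobile|blackberry|iemobile|ip(hone|od)|opera m(ob|in)i|windows (ce|phone)/i';

    return preg_match($pattern, $useragent) === 1;
}

// The header may be missing (e.g. on the command line), so fall back to an empty string.
$useragent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (!isMobileUserAgent($useragent)) {
    // The Follow Button markup would be emitted here, for non-mobile visitors only.
}
```

Keeping the test in one function makes it possible to try a few user agent strings by hand before wiring it around the Follow Button code.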

Step #5: Test it!

If you are reading this step #5, it means that you have finished creating your own self-hosted Follow Button!

Congratulations!

The last thing that remains to be done is to test it. Once you have saved your file with the code above, just refresh any page of your blog. You should see the Follow button appear in the bottom-right corner of your blog. If you click on it, you should see the form that lets your readers subscribe. If you enter one of your email addresses and click the subscribe button, you should get redirected to the Follow page. Finally, you should receive a confirmation email that asks you to confirm your subscription by clicking on a link.

If all these steps work properly, you are done and ready to provide this new functionality to the readers of your blog!

Conclusion

Even if this blog post is a few pages long, I hope you found the Follow Button easy to install and set up. If you have any questions regarding this hack, don’t hesitate to ask them below, in the comments section of this post. I will be happy to answer all of them.

Happy Hacking!

 

Translations

This blog post has been translated into Spanish by Federico Bozo. Other translations will be added to this section.

by Frederick Giasson at October 02, 2011 10:19 PM

September 30, 2011

HyperDanja (Danny Ayers)

links for 2011-09-30

by danja at September 30, 2011 11:15 PM

September 27, 2011

Frederick Giasson's Weblog

One of Semantic Web’s Core Added Value

Suppose I ask the question: "What added value(s) does the Semantic Web bring to the table?" In other words, what benefits would companies and organizations get from using the Semantic Web? I am pretty sure that I would get answers such as:
  • You will instantly be able to traverse graphs of relationships
  • You will be able to infer facts (so create/persist new knowledge) from other existing facts
  • You will be able to check to make sure that your knowledge base is consistent and satisfiable
  • You will be able to modify your ontologies/vocabularies/schemas without impacting the description of your instance records or the usability of any software that uses them (unlike relational databases)
  • And so on…

All these answers would be accurate. However, what if these answers were only a part of the real added value that the Semantic Web brings to the table?

Note: when I refer to the “Semantic Web” in this blog post (and across all my writings), I refer to the set of technologies, techniques and concepts known as the Semantic Web. So it is not a single thing, but a complete set of things that creates new ways of working with, and manipulating, information.

After about 7 years of research and development with Semantic Web technologies, including about 3 years of developing the Open Semantic Framework, the biggest added value that I have found from utilizing Semantic Web technologies is only partially related to these answers. In fact, the biggest added value for me, as a developer, can be summed up in one word:

PRODUCTIVITY

It is as simple as this. The biggest added value I gained from using and applying Semantic Web related technologies, techniques and concepts is a significant increase in development and data integration productivity.

This productivity gain has to do with one of the Semantic Web’s core attributes:

FLEXIBILITY

This is what I was suggesting in my latest blog post about Volkswagen’s use of the Open Semantic Framework: how Volkswagen uses the Open Semantic Framework to get flexibility that will lead to a gain in productivity to integrate, publish, and re-contextualize their data assets. The few gains that I listed above are part of the reason why the Semantic Web gives you flexibility that leads to an increase in productivity.

This same point has been re-affirmed today by Lee Feigenbaum in his latest blog post, Saving Months, Not Milliseconds: Do More Faster with the Semantic Web:

Why is this? Ultimately, it’s because of the inherent flexibility of the Semantic Web data model (RDF). This flexibility has been described in many different ways. RDF relies on an adaptive, resilient schema (from Mike Bergman); it enables cooperation without coordination (from David Wood via Kendall Clark); it can be incrementally evolved; changes to one part of a system don’t require re-designs to the rest of the system. These are all dimensions of the same core flexibility of Semantic Web technologies, and it is this flexibility that lets you do things fast with the Semantic Web.

Warning: Productivity is not synonymous with simplicity

However, I would warn people who think that productivity gains are possible because Semantic Web technologies are simpler to use, manage and implement than other existing technologies.

That is certainly not the case, and I don't think it ever will be. Semantic Web technologies, techniques and concepts are not easy to understand, and have a steep learning curve. This is partly because these techniques, technologies and concepts are relatively new in the field of computer science, and because they are not yet fully understood, defined, implemented and used.

by Frederick Giasson at September 27, 2011 05:34 PM

September 26, 2011

AI3:::Adaptive Information (Mike Bergman)

Thirty OWL API Tools

Documenting the Emerging Ecosystem Around OWL 2

We have been touting the importance of OWL 2 as the language of choice for federating and reasoning over RDF and ontologies. An absolutely essential enabler of the OWL 2 language is version 3 of the OWL API (actually, version 3.2.4 at the time of this writing), a Java-based framework for accessing and managing the language. Protégé 4, the most popular open source ontology editor and integrated development environment (IDE), for example, is built around the OWL API.

As we laid out a bit more than a year ago, now codified on our TechWiki as the Normative Landscape of Ontology Tools (especially the second figure), we see the OWL API as the essential pivot point for all forms of ontology tools moving forward.

We have attempted to assemble a definitive and comprehensive list of all known tools presently based around version 3 of the OWL API. (We have surely missed some and welcome comments to this post that identify missing ones; we promise to add them and keep tracking them.) Herein is a listing of the 30 or so known OWL API-based tools:

  • Protégé 4 is a free, open source ontology editor and knowledge-base framework based on OWL 2 and centered on the OWL API
  • CEL, FaCT++, HermiT, Pellet, and Racer Pro reasoners provide OWL API wrappers and are also available as reasoner plugins to Protégé 4
  • There is also a FaCT++ port to Java that also implements OWLReasoner and is available as a plugin for Protégé 4.1; it is at version 0.9 with user feedback welcomed
  • structOntology is an open source ontology editor and manager supporting Structured Dynamics' conStruct implementation of the Open Semantic Framework (OSF) in Drupal; more information is provided here
  • TrOWL is a Tractable reasoning infrastructure for OWL 2. TrOWL supports both standard TBox and ABox reasoning, as well as conjunctive query answering
  • SKOSEd is a SKOS editor for Protégé; just recently made compatible with Protégé 4.1
  • Populus is a semantic spreadsheet framework using RightField and OPPL for creating OWL ontologies
  • Bubastis is a tool for detecting asserted logical differences between two ontologies, such as between versions. A stand alone version of the tool is also available for download from the EFO tools page. Bubastis is powered by the OWL API
  • Tab2OWL (and its download) is a Java tool for importing classes into an already existing OWL file. The script uses the OWL API to read in a tab delimited file of class details and create OWL classes from these rows, adding them to an existing ontology
  • S-Match is a semantic matching framework, which provides several semantic matching algorithms and facilities for developing new ones. Currently S-Match contains implementations of the original S-Match semantic matching algorithm, as well as minimal semantic matching algorithm and structure preserving semantic matching algorithm
  • The Alignment API is an API and implementation for expressing and sharing ontology alignments. It uses an RDF format for expressing alignments in a uniform way. Its four main interfaces (Alignment, Cell, Relation and Evaluator) provide these services: storing, finding, and sharing alignments; piping alignment algorithms (improving an existing alignment); manipulating (thresholding and hardening); generating processing output; and comparing alignments
  • The OWLlink API is a Java interface and implementation of the OWLlink protocol on top of the Java-based OWL API. The OWLlink API enables OWL API-based applications to access remote reasoners (so-called OWLlink servers), and it turns any OWL API aware reasoner into an OWLlink server
  • OPPL2 (ontology pre-processing language) is an abstract formalism that allows for manipulating ontologies written in OWL. It is 100% based on the Manchester OWL Syntax; a query language based on OWL (logical) axioms and variables; a scripting language that allows the addition/removal of OWL (logical) axioms. It is available as a Protégé 4.1 plug-in
  • OPPL Patterns is available as a Protégé 4.1 plug-in
  • Posh (Prolog OWL Shell) is a command line utility that wraps the Thea OWL library to allow for advanced querying and processing of ontologies, combining the power of Prolog and OWL reasoning
  • POPL (Prolog Ontology Processing Language) allows you to write expressive ontology rewrite rules in a high-level declarative fashion using a syntax similar to Manchester syntax
  • OWLTools (aka OWL2LS – OWL2 Life Sciences) is a convenience Java API on top of the OWL API. Code is available here
  • LexOWL is a plug-in for Protégé 4. In order to add more powerful functionality (e.g., inferencing, editing) to the existing infrastructure and align LexGrid more closely with various Semantic Web technologies, the LexOWL plugin for Protégé 4 provides a way for representing the ontologies modeled within the LexGrid environment in OWL. A source for downloading this tool has not been found
  • Apero, a Protégé plug-in that is an ontology debugging tool based on the use of anti-patterns; see http://www.emcl-study.eu/fileadmin/master_theses/thesis_tahwil.pdf
  • DReW is a prototype DL reasoner over LDL+ ontologies and a prototype reasoner for dl-programs over LDL+ ontologies under well-founded semantics. It is not well developed or documented; it can be downloaded here
  • The LingInfo, LexOnto, LexInfo and LMF ontologies are available from the project website, as well as a corresponding Java API with an implementation for the commonly used OWL API
  • Thea2 is a Prolog library that provides complete support for querying and processing OWL 2 ontologies directly from within Prolog programs. Thea2 also offers additional capabilities including a bridge to the Java OWL API and translation of ontologies to Description Logic programs
  • GLOW is a visualization for OWL ontologies, based on Hierarchical Edge Bundles. Hierarchical Edge Bundles is a new visually attractive technique for displaying adjacency relations in hierarchical data, such as concept structures formed by `subclass-of’ and `type-of’ relations. The displayed adjacency relations can be selected from an ontology using a set of common configurations, allowing for intuitive discovery of information. It is a visualization library based on OWL API, as well as a plug-in for Protégé
  • ROWLKit is a simple GUI to reason and query over ontologies written in the OWL 2 QL profile of OWL
  • OBDA Plugin (Ontology-based data access) is an add-on for the Protégé ontology editor aimed at transforming Protégé into a fully fledged OBDA model editor. It provides data source and mapping editors, as well as querying facilities that, in conjunction with an OBDA-enabled reasoner, allow you to design and test every aspect of an OBDA system
  • OntoCAT provides high level abstraction for interacting with ontology resources including local ontology files in standard OWL and OBO formats (via OWL API)
  • SemaRule Navigator is an Eclipse-based toolkit of multiple semWeb tools, built around the OWL API, organized into a pipeline-like system (appears quite complicated)
  • OWLDB (alias Mnemosyne) is a storage system based on object-relational mappings utilising the OWL-API for the W3C Web Ontology Language OWL
  • Finally, for a periodically updated list of “official” extensions, see https://owlapi.svn.sourceforge.net/svnroot/owlapi/v3/branches/owlextensions/.

Addendum

Ignazio Palmisano also graciously suggested these additional sources:

which also further leads to this additional listing:

It is not clear if all of these offer OWL 2 support, let alone work with the current OWL API.

by Mike Bergman at September 26, 2011 08:52 AM

September 21, 2011

Frederick Giasson's Weblog

Volkswagen’s Use of structWSF in their Semantic Web Platform

TribalDDB London, Volkswagen UK‘s partner, mentioned earlier this week that Volkswagen are using some parts of the Open Semantic Framework to develop the next generation of their online platform.

This story has been published by Jennifer Zaino in her article: Volkswagen: Das Auto Company is Das Semantic Web Company!

I can now talk about this project that uses some pieces of the framework that we have been developing for more than 3 years now.

The Objective

Volkswagen’s main objective for the next version of their Web platform started out as improving their online search engine but, as William Greenly mentioned, it quickly became a strategic decision:

"So the objectives were about site search and improving it, but in the long-run it was always the idea to contextualize content, to facet content, to promote it in different contexts."

The objective is to create a platform that gives them the flexibility to leverage all the data assets they own. This flexibility will help them not only to improve their search engine, but also to contextualize that data in different parts of their websites and their partners’ websites, and to promote and publish that same information on different communication channels or devices.

The Flexibility


What is a flexible platform in that context? A flexible platform is one that can integrate any kind of information source. In the context of Volkswagen, such sources can be a series of relational dataset schemas spread around the world, Excel spreadsheets, CSV files, old plain-text technical documents about past car models, semi-structured documents such as webpages, etc.

A flexible platform is also one that minimally impacts (if at all) the data consumers when the data structure changes in the system. This is really important since the world we live in constantly changes, and we have to reflect these changes in the data we own and maintain. Because data structure changes will happen all the time, we want to minimize their impact.

Having the flexibility to constantly adapt your data, while minimally impacting the data consumers of the system, enables you to make quick decisions and adapt your strategy in a highly competitive world. This flexibility gives you a clear business advantage.

A flexible platform is also one that lets you publish your data the way you want, in the format that is needed. Such a platform has to provide an interface that gives you access to all its functionalities without having to care about what happens under the hood.

A flexible system is one that can communicate your information on any kind of communication channel, and to any device that has access to the Web.

Under the Hood

The next generation platform that Volkswagen is currently developing is partly based on a few of the main pieces of the Open Semantic Framework. These pieces help them reach their goal by giving their platform the flexibility it needs.

The first step they went through was to create their Volkswagen Vehicles Ontology, which is used to describe all the entities they want to index in their platform. The Web Ontology Language (OWL), along with the Resource Description Framework (RDF), is what gives them complete flexibility in how they integrate all the pieces of information they want, in a canonical format.

Then they chose to use structWSF (the structured data web services framework). This piece gives them a series of web interfaces (web service endpoints) to create, update, manage and query their data. This web service layer enables them to do anything they want with their data, from anywhere on the Web, because all the functionalities of the framework are exposed as web service endpoints. StructWSF also lets them communicate their data in multiple formats. This makes it a flexible system for feeding their information into different contexts, communication channels and devices.

At Volkswagen, structWSF is used to populate, and keep in sync, their Solr and Triple Store instances. It gives them the time to care about the more important aspects of their platform, and to care about how the data should be synced between the various specialized data management systems.

By using structWSF to manage their data, they are able to reach some objectives to make their platform as flexible as possible:

  • To be able to minimize the impact of data changes to the data consumers
    • Because structWSF uses OWL & RDF to describe all the data it indexes
  • To be able to manipulate their data from anywhere
    • Because all the functionalities of structWSF are exposed as web service endpoints
  • To be able to communicate the information in different contexts, communication channels and devices
    • Because structWSF is designed, at its core, to transform all the data it indexes into any other kind of format

The Next Step

One of their longer-term goals is to analyze their unstructured and semi-structured textual documents to extract some structure out of them, and to index them into their semantic platform. To do this, they are looking at using Scones, which is the structWSF semantic tagger web service endpoint. Scones will use subject reference structures such as UMBEL to semantically tag the textual documents. Once a document has been processed by Scones, and indexed in structWSF, it can be re-published in different contexts based on the reference concepts that have been tagged to it. This gives them the flexibility to leverage non-structured sources of data and to re-purpose them in different ways by publishing them in different contexts and in different systems.

This second system will enable them to leverage the investment they made in the past in writing all these textual documents, by re-purposing and re-contextualizing them in all kinds of different contexts.

Conclusion

I think that TribalDDB and Volkswagen made a good decision for their future. Taking the business decision to develop and maintain a completely new kind of information system is not easy. I am not talking about their choice to use our pieces of the stack; the decision goes far beyond this. Such a Semantic Platform challenges everything in an organization: the people who make the decisions, the people who create and manage the data, the people who develop the system, the people who maintain that system, the consumers of the system, the customers, the partners, etc. This is a big decision, whatever technology stack you plan to use. I congratulate them for the decision they made.

I strongly believe that this was the right decision, considering the future opportunities they are creating for themselves.

 

 

by Frederick Giasson at September 21, 2011 08:59 PM

September 19, 2011

Frederick Giasson's Weblog

Benchmark of PHP’s main String Search Functions

I am currently upgrading the structWSF ontology-related web service endpoints, along with the structOntology conStruct module, to make them perform better so that we can load ontologies that have thousands of classes and properties (at least up to 30 000 of them).

While testing these new upgrades with the UMBEL ontology, I noticed that much of the time was spent in a small number of stripos() calls located in the loadXML() function of ProcessorXML.php, the internal structXML parser. They were used to extract the prefixes in the header of the structXML files, and then to resolve them within the XML file. I was using stripos() instead of strpos() to make the parsing of these structXML files case-insensitive, even though XML itself is case-sensitive. However, due to their processing cost, I changed this behavior by using the strpos() function instead. Here are the main reasons for this change:

  • XML is itself case-sensitive, so don’t try to be too clever
  • These structXML files that are exchanged are mostly internal to structXML
  • Their parsing performance is critical
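
As a small illustration of what changes when switching from stripos() to strpos(), here is a sketch using a made-up header fragment. The $header string is an assumption for the example only; it is not the actual structXML header format.

```php
<?php
// Hypothetical header fragment, used only to illustrate the two functions.
$header = 'xmlns:owl="http://www.w3.org/2002/07/owl#"';

$exact   = strpos($header, 'xmlns:owl');   // 0: found at the very start
$wrong   = strpos($header, 'XMLNS:OWL');   // false: strpos() is case-sensitive
$relaxed = stripos($header, 'XMLNS:OWL');  // 0: stripos() ignores case

// A match at offset 0 is "falsy", so always compare against false with !==.
if ($exact !== false) {
    echo "prefix declaration found\n";
}
```

With strpos(), a prefix written in a different case is simply not found, which is consistent with XML's own case-sensitivity.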

The Tests

This is a non-scientific post about some experimentation I made related to the various PHP 5.3 string search functions. These tests have been performed on a small Amazon EC2 instance using DBG and PHPeD.

<?php
 
$text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce malesuada aliquet pharetra. Nunc tincidunt tempus eleifend. Cras aliquet risus eget tortor elementum at molestie erat auctor. Sed sapien nulla, auctor a aliquam in, ornare eget enim. Ut ac luctus nunc. Etiam et tortor felis, sed fringilla orci. Fusce laoreet ligula turpis, quis sodales enim. Pellentesque at sapien ut dolor malesuada placerat eu ac quam. Pellentesque purus elit, sodales in fringilla eu, egestas vitae ipsum. Nam condimentum, nisi ac tincidunt luctus, odio erat porta turpis, eget varius felis leo sit amet lorem. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Maecenas quis pulvinar dui. Integer quis eros nibh. Donec in lectus vitae ligula euismod vulputate ut euismod enim. Ut vehicula, sapien at faucibus ornare, nulla lorem luctus purus, sed imperdiet augue purus quis enim.";
$explodedText = explode(" ", $text);
 
for($i = 0; $i < 10000; $i++)
{
  $word = $explodedText[array_rand($explodedText)]; // array_rand() returns a random key, so look up the word itself
   
  strpos($text, $word);
  stripos($text, $word);
  strstr($text, $word);
  stristr($text, $word);
}

?>

The first test uses a text of 138 words. That text gets exploded into an array where each value is a word of that text. Then, at each iteration, we randomly select a word that we search for, within the text, using each of the 4 search functions.

Note that in the result images below, the line numbers in the left-most column correspond to the lines of the PHP code above.

That first test starts with 10 000 iterations. Here are the results of the first run:


The second test uses the same 138 words, but the test is performed 100 000 times:

As we can see, strpos() and strstr() are clearly faster than their case-insensitive counterparts.
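
For readers who do not have a profiler at hand, a rough alternative is to time each function separately with microtime(). This is only a sketch under stated assumptions: the timeSearch() helper is hypothetical, the sample text is shortened, and wall-clock timings will vary from machine to machine and from run to run.

```php
<?php
// Hypothetical helper (not from the tests above): runs one search function
// over a fixed list of needles and returns the elapsed wall-clock seconds.
function timeSearch($function, $haystack, array $needles)
{
    $start = microtime(true);
    foreach ($needles as $needle) {
        call_user_func($function, $haystack, $needle);
    }
    return microtime(true) - $start;
}

// Shortened sample text; the tests above use much longer ones.
$text  = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";
$words = explode(" ", $text);

// Pick the random words once, so every function searches for the same needles.
$needles = array();
for ($i = 0; $i < 10000; $i++) {
    $needles[] = $words[array_rand($words)];
}

$timings = array();
foreach (array('strpos', 'stripos', 'strstr', 'stristr') as $function) {
    $timings[$function] = timeSearch($function, $text, $needles);
}

foreach ($timings as $function => $seconds) {
    printf("%-8s %.4f s\n", $function, $seconds);
}
```

Choosing the needles before the timed loops keeps the cost of array_rand() out of the measurement and gives each function exactly the same work.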

Now, let’s see the impact of the size of the text to search. We will now perform the same two tests, with 10 000 and 100 000 iterations, but with a text that has 497 words.

<?php
 
$longText = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce malesuada aliquet pharetra. Nunc tincidunt tempus eleifend. Cras aliquet risus eget tortor elementum at molestie erat auctor. Sed sapien nulla, auctor a aliquam in, ornare eget enim. Ut ac luctus nunc. Etiam et tortor felis, sed fringilla orci. Fusce laoreet ligula turpis, quis sodales enim. Pellentesque at sapien ut dolor malesuada placerat eu ac quam. Pellentesque purus elit, sodales in fringilla eu, egestas vitae ipsum. Nam condimentum, nisi ac tincidunt luctus, odio erat porta turpis, eget varius felis leo sit amet lorem. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Maecenas quis pulvinar dui. Integer quis eros nibh. Donec in lectus vitae ligula euismod vulputate ut euismod enim. Ut vehicula, sapien at faucibus ornare, nulla lorem luctus purus, sed imperdiet augue purus quis enim. Nunc eu consectetur quam. Duis nulla sem, tincidunt vel placerat at, ultricies eu est. Vestibulum sed nulla nunc, et tristique orci. Aliquam nulla sapien, lobortis in sagittis vitae, tincidunt ut felis. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut condimentum, orci venenatis mollis faucibus, purus enim euismod massa, a imperdiet sapien arcu in sapien. Nulla convallis sodales pretium. Nulla facilisi. Maecenas molestie est tortor. Fusce congue, leo eu tristique sodales, odio leo facilisis lectus, in euismod odio tellus ut sapien. Fusce odio orci, facilisis eu convallis et, consectetur nec mauris. Nullam nulla lacus, volutpat sit amet pulvinar quis, pulvinar eget dolor. Curabitur sit amet odio sem, at dapibus tellus. Donec nec dictum eros. Morbi convallis libero ultrices magna varius suscipit. Duis bibendum volutpat felis non fermentum. Phasellus nunc mi, ornare et vulputate sed, pellentesque sed enim. Mauris suscipit, nisl quis tempor mollis, tortor nunc varius odio, eu dictum odio mi quis sapien. 
Morbi placerat, erat quis mattis iaculis, urna nisi faucibus nisi, eu mattis elit mauris eu quam. Mauris euismod tincidunt ante quis interdum. Phasellus elementum libero in arcu tempus tincidunt. Praesent in nunc eget nibh porta imperdiet eget eget mauris. Morbi pellentesque dapibus lacus, rutrum sollicitudin nisi fermentum vel. Cras tempor mattis urna, sit amet semper eros varius ut. Fusce erat elit, tempus non commodo et, egestas sit amet odio. Suspendisse libero neque, porttitor vel volutpat eget, placerat in mi. Proin pharetra leo in ligula porttitor vestibulum. Curabitur vel mauris nec lorem sollicitudin porttitor. Sed suscipit, mauris ac sollicitudin tempus, orci velit aliquet leo, vitae ornare mi nulla a tellus. Morbi turpis justo, vestibulum ac auctor sed, vulputate nec nisl. Quisque ut ultricies orci. Sed vel dolor at felis egestas venenatis in ut elit. Nam quis neque sem. Morbi turpis magna, porttitor vulputate dignissim commodo, auctor eu nibh. Ut at nisl tortor. Quisque cursus interdum mi ut molestie. Vivamus nec ipsum ipsum. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Sed quis ipsum erat, quis dignissim nunc. Sed eu diam dapibus tortor fermentum dignissim. Phasellus ac turpis nisl, dictum consequat elit. Suspendisse at turpis quis eros pharetra imperdiet. Mauris ut nisl augue. ";
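$explodedLongText = explode(" ", trim($longText)); // trim() avoids an empty last word caused by the trailing space
 
for($i = 0; $i < 10000; $i++) // use 100000 for the fourth test
{
  $word = $explodedLongText[array_rand($explodedLongText)]; // array_rand() returns a random key, so look up the word itself
   
  strpos($longText, $word);
  stripos($longText, $word);
  strstr($longText, $word);
  stristr($longText, $word);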
}

?>

That third test starts with 10 000 iterations. Here are the results of the third run:

The fourth test uses the same 497 words, but the test is performed 100 000 times:

As we can see, even if we add more words, the same kind of performance is observed.

Conclusion

After many runs (I only showed a few here), I think I can affirm that strpos() and strstr() are way faster than their case-insensitive counterparts. strpos() seems a little bit faster than strstr(), but this seems to depend on the context and on which random words are being searched for. In any case, according to PHP’s documentation, we should use strpos() instead of strstr() when we only need to know whether a string occurs within another one, because it is faster and uses less memory.

There may also be some unknown memory considerations that affect the code I used to test these functions. In any case, I can affirm that in a real context, where queries are sent to the Ontology: Read web service endpoint that hosts the UMBEL ontology, strpos() is way faster than stripos().

by Frederick Giasson at September 19, 2011 05:57 PM

September 18, 2011

Frederick Giasson's Weblog

What is an Ontology?

An ontology is the definition of a vocabulary, and the rules for combining its terms, used to describe things that need to be communicated.

This is yet another tentative definition of what an ontology is in the context of the semantic web. Before explaining that definition, I would like to state what I think is the main purpose of an ontology:

The main purpose of an ontology is to communicate coherent and consistent information.

Different Kinds of Ontologies

Over the years, I have tended to use the word “vocabulary,” along with the word “ontology,” in different blog posts and technical documents. However, the usage of each word may not always have been clear. Is a vocabulary an ontology? Is an ontology a vocabulary? Are these concepts synonymous? There is an important distinction to make: an ontology can be a vocabulary, but an ontology is much more than a simple vocabulary.

Ontologies can describe all kinds of well-known knowledge representation structures, some simple and others much more complex. Here is a short list of some of them:

  • lexicons
  • taxonomies, or
  • higher order knowledge description frameworks

In its most basic usage, an ontology will define a vocabulary. It will simply define the terms (words) that belong to that vocabulary without saying anything regarding the usage of these words.

Then, an ontology could evolve into a taxonomy by defining hierarchical relationships between the terms that compose the vocabulary.

Finally, it can evolve further to become a higher order knowledge description framework that defines more complex usage rules such as usage restrictions, all kinds of relationships between described entities, etc. New knowledge could also be inferred. This is why I say that an ontology is not strictly a simple vocabulary, but a powerful knowledge description framework.
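
The progression from vocabulary to taxonomy to a richer framework can be sketched in a few lines of Python. This is only an illustration: the terms, the `parent_of` mapping and the `is_a` helper are made up for the example and are not part of any ontology language or of the Open Semantic Framework.

```python
# Illustrative only: the terms and structures below are hypothetical.

# 1. A vocabulary: just the terms, with no usage rules.
vocabulary = {"Document", "Book", "Proceeding", "Novel"}

# 2. A taxonomy: the same terms plus hierarchical relationships.
parent_of = {
    "Book": "Document",
    "Proceeding": "Document",
    "Novel": "Book",
}

def is_a(term, ancestor):
    """Return True if `term` equals `ancestor` or descends from it."""
    while term is not None:
        if term == ancestor:
            return True
        term = parent_of.get(term)
    return False

# 3. A first step toward inference: nobody stated that a Novel is a
#    Document, but it follows from the hierarchy.
print(is_a("Novel", "Document"))   # True
print(is_a("Proceeding", "Book"))  # False
```

A real higher order framework would add restrictions and other relationship types on top of this; the sketch only shows how a hierarchy already lets unstated facts be derived.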

Knowledge Base

As we saw above, the main purpose of an ontology is to be able to create a coherent and consistent knowledge base of information that can be communicated. So an ontology is a kind of language that lets you create knowledge bases that are consistent and coherent, and where new knowledge can be inferred. That is done by following the usage rules defined in the ontology.

However, there is another important aspect to take into account: an ontology will describe knowledge that is coherent and consistent, but according to that ontology's own view of the world. This means that two ontologies describing the same domain of knowledge could each describe information consistently and coherently, yet differently, according to their own view of the world.

Let's take an example. Say that two book stores each developed their own ontology to describe the books they sell. Since both sell books, there is a good chance that they use the same vocabulary to describe them. However, the usage rules for these terms may differ between the two stores. One store could say that a proceeding is a specialized kind of book. The other could say that no, a proceeding is not a kind of book, but a document just like a book is. Both would therefore describe a proceeding as a document, but they would have different interpretation rules about what a book really is. As you can see, both book stores use the same vocabulary to describe their library of books, but they interpret its meaning differently. If the two stores had to exchange information about books in the future, they would not have much difficulty because they probably share the same vocabulary, but their interpretation of that information may differ. These differences in interpretation may affect where a book is classified in the store, how customers can search for a specific book using different filtering criteria, and so on.

This is no different from what happens in our daily lives: is there a day in your life when you don't hear people arguing about different points of view? Exactly the same thing happens here. We potentially all live and see the exact same events, images, sounds, etc., but we may all have a different interpretation of these things.
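
The book store example can be made concrete with a small Python sketch. Everything here is hypothetical (the two dictionaries, the helper names, the filter logic); it only illustrates how the same vocabulary with different usage rules leads to different search results.

```python
# Hypothetical taxonomies for two book stores sharing one vocabulary.
store_a = {"Book": "Document", "Proceeding": "Book"}      # a proceeding is a kind of book
store_b = {"Book": "Document", "Proceeding": "Document"}  # a proceeding is a document, not a book

def ancestors(term, taxonomy):
    """List every broader term reachable from `term` in `taxonomy`."""
    result = []
    while term in taxonomy:
        term = taxonomy[term]
        result.append(term)
    return result

def matches_filter(term, wanted, taxonomy):
    """Would an item typed `term` show up when a customer filters on `wanted`?"""
    return term == wanted or wanted in ancestors(term, taxonomy)

for name, taxonomy in (("store A", store_a), ("store B", store_b)):
    print(name,
          matches_filter("Proceeding", "Book", taxonomy),
          matches_filter("Proceeding", "Document", taxonomy))
# store A True True
# store B False True
```

Both stores agree that a proceeding is a document, but only the first one returns proceedings to a customer who filters on books.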

Ontologies in the Open Semantic Framework?

Ontologies are so flexible that we chose to make ontologies the "brain" of the Open Semantic Framework.

We wanted to use the most flexible knowledge description framework, one that would enable us to integrate any possible information source that has been described using any existing kind of knowledge representation structure, simple or really complex: lexicons, taxonomies, relational schemas, etc. By using ontologies as its central piece, OSF is a flexible data integration framework that can consolidate information from various heterogeneous sources.

If we remember the definition we started with, ontologies are not just about describing terms and their relationships in a coherent and consistent way. The ultimate purpose is to communicate that information. That is what the structWSF part of the Open Semantic Framework does: it lets any kind of system that has access to the Internet send, receive and manipulate information in multiple formats through a series of web service endpoints.

More Reading

Finally, I suggest reading Mike's Intrepid Guide to Ontologies to get a better understanding of where ontologies come from, how they work, what other formats exist, what the different approaches to ontologies are, and what tools currently exist to work with them.

by Frederick Giasson at September 18, 2011 03:55 PM

September 12, 2011

AI3:::Adaptive Information (Mike Bergman)

Making the Argument for Semantic Technologies

Five Unique Advantages for the Enterprise

There have been some notable attempts of late to make elevator pitches [1] for semantic technologies, as well as Lee Feigenbaum’s recent series on Are We Asking the Wrong Question? about semantic technologies [2]. Some have attempted to downplay semantic Web connotations entirely and to replace the pitch with Linked Data (capitalized). These are part of a history of various ways to try to make a business case around semantic approaches [3].

What all of these attempts have in common is a view — an angst, if you will — that somehow semantic approaches have not fulfilled their promise. Marketing has failed semantic approaches. Killer apps have not appeared. The public has not embraced the semantic Web consonant with its destiny. Academics and researchers can not make the semantic argument like entrepreneurs can.

Such hand wringing, I believe, is misplaced on two grounds. First, if one looks to end user apps that solely distinguish themselves by the sizzle they offer, semantic technologies are clearly not essential. There are very effective mash-up and data-intensive sites such as many of the investment sites (Fidelity, TDAmeritrade, Morningstar, among many), real estate sites (Trulia, Zillow, among many), community data sites (American FactFinder, CensusScope, City-Data.com, among many), shopping sites (Amazon, Kayak, among many), data visualization sites (Tableau, Factual, among many), etc., etc., that work well, are intuitive and integrate much disparate information. For the most part, these sites rely on conventional relational database backends and have little semantic grounding. Effective data-intensive sites do not require semantics per se [4].

Second, despite common perceptions, semantics are in fact becoming pervasive components of many common and conventional Web sites. We see natural language processing (NLP) and extraction technologies becoming common for most search services. Google and Bing sprinkle semantic results and characterizations across their standard search results. Recommendation engines and targeted ad technologies now routinely use semantic approaches. Ontologies are creeping into the commercial spaces once occupied by taxonomies and controlled vocabularies. Semantics-based suggestion systems are now the common technology used. A surprising number of smartphone apps have semantics at their core.

So, I agree with Lee Feigenbaum that we are asking the wrong question. But I would also add that we are not even looking in the right places when we try to understand the role and place of semantic technologies.

The unwise attempt to supplant the idea of semantic technologies with linked data is only furthering this confusion. Linked data is merely a means for publishing and exposing structured data. While linked data can lead to easier automatic consumption of data, it is not necessary to effective semantic approaches and is actually a burden on data publishers [5]. While that burden may be willingly taken by publishers because of its consumption advantages, linked data is by no means an essential precursor to semantic approaches. None of the unique advantages for semantic technologies noted below rely on or need to be preceded by linked data. In semantic speak, linked data is not the same as semantic technologies.

The essential thing to know about semantic technologies is that they are a conceptual and logical foundation for how information is modeled and interrelated. In these senses, semantic technologies are infrastructure and grounding, not applications per se. There is a mindset and worldview associated with the use of semantic technologies that is far more essential to understand than linked data techniques and is certainly more fundamental than elevator pitches or “killer apps.”

Five Unique Advantages

Thus, the argument for semantic technologies needs to be grounded in their foundations. It is within the five unique advantages of semantic technologies described below that the benefits to enterprises ultimately reside.

#1: Modern, Back-end Data Federation

The RDF data model — and its ability to represent the simplest of data up through complicated domain schema and vocabularies via the OWL ontology language — means that any existing schema or structure can be represented. Because of this expressiveness and flexibility, any extant data source or schema can be represented via RDF and its extensions. This breadth means that a common representation for any existing schema may be expressed. That expressiveness, in turn, means that any and all data representations can be described in a canonical way.

A shared, canonical representation of all existing schema and data types means that all of that information can now be federated and interrelated. The canonical means of federating information via the RDF data model is the foundational benefit of semantic technologies. Further, the practice of giving URIs as unique identifiers to all of the constituent items in this approach makes it perfectly suitable to today’s reality of distributed data accessible via the Web [6].
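
As a rough sketch of what federation through a canonical form looks like, the Python below turns records from two separate sources into subject-predicate-object triples and merges them. The `http://example.org/` namespace, the field names and the `row_to_triples` helper are all invented for the illustration; a real system would use an RDF library and proper vocabularies.

```python
# Hypothetical data: the URIs, field names and records are invented.
BASE = "http://example.org/"

def row_to_triples(table, key, row):
    """Turn one relational-style row (a dict) into (subject, predicate, object) triples."""
    subject = f"{BASE}{table}/{row[key]}"
    return {(subject, f"{BASE}prop/{col}", val)
            for col, val in row.items() if col != key}

# Source 1: a row from a customer table.
crm = row_to_triples("customer", "id", {"id": 42, "name": "Acme", "city": "Boston"})
# Source 2: a record from a separate billing system about the same customer.
billing = row_to_triples("customer", "id", {"id": 42, "balance": 1250.0})

# Federation is a plain set union because both sources use the same URI.
graph = crm | billing

def describe(subject):
    """Collect everything known about one URI, whatever source it came from."""
    return {p.rsplit("/", 1)[1]: o for s, p, o in graph if s == subject}

print(describe("http://example.org/customer/42"))
```

The point of the sketch is that neither source needed to know about the other's schema: once both are expressed as triples keyed by a shared URI, combining them requires no join logic.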

#2: Universal Solvent for Structure

I have stated many times that I have not met a form of structured data I did not like [7]. Any extant data structure or format can be represented as RDF. RDF can readily express information contained within structured (conventional databases), semi-structured (Web page or XML data streams), or unstructured (documents and images) information sources. Indeed, the use of ontologies and entity instance records in RDF is a powerful basis for driving the extraction systems now common for tagging unstructured sources.

(One of the disservices perpetuated by an insistence on linked data is to undercut this representational flexibility of RDF. Since most linked data is merely communicating value-attribute pairs for instance data, virtually any common data format can be used as the transmittal form.)

The ease of representing any existing data format or structure and the ability to extract meaningful structure from unstructured sources makes RDF a “universal solvent” for any and all information. Thus, with only minor conversion or extraction penalties, all information in its extant form can be staged and related together via RDF.

#3: Adaptive, Resilient Schema

A singular difference between semantic technologies (as we practice them) and conventional relational data systems is the use of an open world approach [8]. The relational model is a paradigm where the information must be complete and it must be described by a schema defined in advance. The relational model assumes that the only objects and relationships that exist in the domain are those that are explicitly represented in the database. This makes the closed world of relational systems a very poor choice when attempting to combine information from multiple sources, to deal with uncertainty or incompleteness in the world, or to try to integrate internal, proprietary information with external data.

Semantic technologies, on the other hand, allow domains to be captured and modeled in an incremental manner. As new knowledge is gained or new integrations occur, the underlying schema can be added to and modified without affecting the information that already exists in the system. This adaptability is generally the biggest source of economic benefits to the enterprise from semantic technologies. It is also a benefit that enables experimentation and lowers risk.
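
The contrast between the two assumptions, and the way new structure can be added without disturbing existing statements, can be sketched as follows. The facts and function names are hypothetical, and the open world function is a deliberate simplification (it only distinguishes "stated" from "unknown").

```python
# Hypothetical facts; the names are invented for illustration.
facts = {
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
}

def closed_world(s, p, o):
    """Closed world: anything not stated is taken to be false."""
    return (s, p, o) in facts

def open_world(s, p, o):
    """Open world: anything not stated is unknown (None), not false."""
    return True if (s, p, o) in facts else None

print(closed_world("carol", "worksFor", "acme"))  # False
print(open_world("carol", "worksFor", "acme"))    # None

# Extending the "schema" is just adding statements with a new predicate.
before = set(facts)
facts.add(("alice", "hasSkill", "sparql"))
print(before <= facts)  # True: existing statements are untouched
```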

#4: Unmatched Productivity

Having all information in a canonical form means that generic tools and applications can be designed to work against that form. That, in turn, leads to user productivity and developer productivity. New datasets, structure and relationships can be added at any time to the system, but how the tools that manipulate that information behave remains unchanged.

User productivity arises from only needing to learn and master a limited number of toolsets. The relationships in the constituent datasets are modeled at the schema (that is, ontology) level. Since manipulation of the information at the user interface level consists of generic paradigms regarding the selection, view or modification of the simple constructs of datasets, types and instances, adding or changing out new data does not change the interface behavior whatsoever. The same bases for manipulating information can be applied no matter the datasets, the types of things within them, or the relationships between things. The behavior of semantic technology applications is very much akin to having generic mashups.

Developer productivity results from leveraging generic interfaces and APIs and not bespoke ones that change every time a new dataset is added to the system. In this regard, ontology-driven applications [9] arising from a properly designed semantic technology framework also work on the simple constructs of datasets, types and instances. The resulting generalization enables the developer to focus on creating logical “packages” of functionality (mapping, viewing, editing, filtering, etc.) designed to operate at the construct level, and not the level of the atomic data.

#5: Natural, Connected Knowledge Systems

All of these factors combine to enable more and disparate information to be assembled and related to one another. That, in turn, supports the idea of capturing entire knowledge domains, which can then be expanded and shifted in direction and emphasis at will. These combinations begin to finally achieve knowledge capture and representation in its desired form.

Any kind of information, any relationship between information, and any perspective on that information can be captured and modeled. When done, the information remains amenable to inspection and manipulation through a set of generic tools. Rather simple and direct converters can move that canonical information to other external forms for use by existing external tools. Similarly, external information in its various forms can be readily converted to the internal canonical form.

These capabilities are the direct opposite of today’s information silos. From their very foundations, semantic technologies are perfectly suited to capture the natural connections and nature of relevant knowledge systems.

A Summary of Advantages Greater than the Parts

There are no other IT approaches available to the enterprise that can come close to matching these unique advantages. The ideal of total information integration, both public and private, with the potential for incremental changes to how that information is captured, manipulated and combined, is exciting. And, it is achievable today.

With semantic technologies, more can be done with less and done faster. It can be done with less risk. And, it can be implemented on a pay-as-you-benefit basis [10] responsive to the current economic climate.

But awareness of this reality is not yet widespread. This lack of awareness is the result of a couple of factors. One factor is that semantic technologies are relatively new and embody a different mindset. Enterprises are only beginning to get acquainted with these potentials. Semantic technologies require both new concepts to be learned, and old prejudices and practices to be questioned.

A second factor is the semantic community itself. The early idea of autonomic agents and the heavy AI emphasis of the initial semantic Web advocacy now feels dated and premature at best. Then, the community hardly improved matters with its shift in emphasis to linked data, which is merely a technique and which completely overlooks the advantages noted above.

However, none of this likely matters. The five unique advantages for enterprises from semantic technologies are real and demonstrable today. While my crystal ball is cloudy as to how fast these realities will become understood and widely embraced, I have no question they will be. The foundational benefits of semantic technologies are compelling.

I think I’ll take this to the bank while others ride the elevator.


[1] This series was called for by Eric Franzon of SemanticWeb.com. Contributions to date have been provided by Sandro Hawke, David Wood, and Mark Montgomery.
[2] See Lee Feigenbaum, 2011. “Why Semantic Web Technologies: Are We Asking the Wrong Question?,” TechnicaLee Speaking blog, August 22, 2011; see http://www.thefigtrees.net/lee/blog/2011/08/why_semantic_web_technologies.html, and its follow up on “The Magic Crank,” August 29, 2011; see http://www.thefigtrees.net/lee/blog/2011/08/the_magic_crank.html. For a further perspective on this issue from Lee’s firm, Cambridge Semantics, see Sean Martin, 2010. “Taking the Tech Out of SemTech,” presentation at the 2010 Semantic Technology Conference, June 23, 2010. See http://www.slideshare.net/LeeFeigenbaum/taking-the-tech-out-of-semtech.
[3] See, for example, Jeff Pollock, 2008. “A Semantic Web Business Case,” Oracle Corporation; see http://www.w3.org/2001/sw/sweo/public/BusinessCase/BusinessCase.pdf.
[4] Indeed, many semantics-based sites are disappointingly ugly with data and triples and URIs shoved in the user’s face rather than sizzle.
[5] Linked data and its linking predicates are also all too often misused or misapplied, leading to poor quality of integrations. See, for example, M.K. Bergman and F. Giasson, 2009. “When Linked Data Rules Fail,” AI3:::Adaptive Innovation blog, November 16, 2009. See http://www.mkbergman.com/846/when-linked-data-rules-fail/.
[6] Greater elaboration on all of these advantages is provided in M. K. Bergman, 2009. “Advantages and Myths of RDF,” AI3:::Adaptive Innovation blog, April 8, 2009. See http://www.mkbergman.com/483/advantages-and-myths-of-rdf/.
[7] See M.K. Bergman, 2009. “‘Structs’: Naïve Data Formats and the ABox,” AI3:::Adaptive Innovation blog, January 22, 2009. See http://www.mkbergman.com/471/structs-naive-data-formats-and-the-abox/.
[8] A considerable expansion on this theme is provided in M.K. Bergman, 2009. “The Open World Assumption: Elephant in the Room,” AI3:::Adaptive Innovation blog, December 21, 2009. See http://www.mkbergman.com/852/the-open-world-assumption-elephant-in-the-room/.
[9] For a full expansion on this topic, see M.K. Bergman, 2011. “Ontology-driven Apps Using Generic Applications,” AI3:::Adaptive Innovation blog, March 7, 2011. See http://www.mkbergman.com/948/ontology-driven-apps-using-generic-applications/.
[10] See M.K. Bergman, 2010. “‘Pay as You Benefit’: A New Enterprise IT Strategy,” AI3:::Adaptive Innovation blog, July 12, 2010. See http://www.mkbergman.com/896/pay-as-you-benefit-a-new-enterprise-it-strategy/.

by Mike Bergman at September 12, 2011 09:11 AM

September 11, 2011

DBpedia Blog

DBpedia 3.7 released, including 15 localized Editions

Hi all,

we are happy to announce the release of DBpedia 3.7. The new release is based on Wikipedia dumps dating from late July 2011.

The new DBpedia data set describes more than 3.64 million things, of which 1.83 million are classified in a consistent ontology, including 416,000 persons, 526,000 places, 106,000 music albums, 60,000 films, 17,500 video games, 169,000 organizations, 183,000 species and 5,400 diseases.

The DBpedia data set features labels and abstracts for 3.64 million things in up to 97 different languages; 2,724,000 links to images and 6,300,000 links to external web pages; 6,200,000 external links into other RDF datasets, and 740,000 Wikipedia categories. The dataset consists of 1 billion pieces of information (RDF triples) out of which 385 million were extracted from the English edition of Wikipedia and roughly 665 million were extracted from other language editions and links to external datasets.

Localized Editions

Up till now, we extracted data from non-English Wikipedia pages only if an equivalent English page existed, as we wanted a single URI to identify a resource across all 97 languages. However, many pages in the non-English Wikipedia editions have no equivalent English page (especially small towns in different countries, e.g. the Austrian village Endach, or legal and administrative terms that are relevant to a single country only). Relying on English URIs only therefore meant that DBpedia did not contain data for these entities, and many DBpedia users have complained about this shortcoming.

As part of the DBpedia 3.7 release, we now provide 15 localized DBpedia editions for download that contain data from all Wikipedia pages in a specific language. These localized editions cover the following languages: ca, de, el, es, fr, ga, hr, hu, it, nl, pl, pt, ru, sl, tr. The URIs identifying entities in these i18n data sets are constructed directly from the non-English title and a language-specific URI namespace (e.g. http://ru.dbpedia.org/resource/Berlin), so there are now 16 different URIs in DBpedia that refer to Berlin. We also extract the inter-language links from the different Wikipedia editions. Thus, whenever an inter-language link between a non-English Wikipedia page and its English equivalent exists, the resulting owl:sameAs link can be used to relate the localized DBpedia URI to the equivalent in the main (English) DBpedia edition. The localized DBpedia editions are provided for download on the DBpedia download page (http://wiki.dbpedia.org/Downloads37). Note that we have not provided public SPARQL endpoints for the localized editions, nor do the localized URIs dereference. This might change in the future, as more local DBpedia chapters are set up in different countries as part of the DBpedia internationalization effort (http://dbpedia.org/Internationalization).
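
To illustrate the URI scheme, here is a minimal Python sketch that builds a localized URI from a language code and a page title, then follows an owl:sameAs link to the main edition. The helper names and the `same_as` table are hypothetical, and the encoding rule (underscores for spaces, percent-encoding for the rest) is an assumption, since the announcement does not spell it out.

```python
from urllib.parse import quote

def localized_uri(lang, title):
    """Build a URI in the language-specific namespace for a page title.

    Assumption: spaces become underscores and other characters are
    percent-encoded; the release's exact encoding rules are not given here.
    """
    return f"http://{lang}.dbpedia.org/resource/{quote(title.replace(' ', '_'))}"

# Hypothetical owl:sameAs links derived from inter-language links.
same_as = {
    localized_uri("ru", "Berlin"): "http://dbpedia.org/resource/Berlin",
    localized_uri("de", "Berlin"): "http://dbpedia.org/resource/Berlin",
}

def to_main_edition(uri):
    """Follow owl:sameAs to the main (English) edition, if a link exists."""
    return same_as.get(uri)

print(localized_uri("ru", "Berlin"))
print(to_main_edition(localized_uri("de", "Berlin")))
print(to_main_edition(localized_uri("de", "Endach")))  # None: no link in this table
```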

Other Changes

Besides the new localized editions, the DBpedia 3.7 release provides the following improvements and changes compared to the last release:

1. Framework

  • Redirects are resolved in a post-processing step for increased inter-connectivity of 13% (applied for English data sets)
  • Extractor configuration using the dependency injection principle
  • Simple threaded loading of mappings in server
  • Improved international language parsing support thanks to the members of the Internationalization Committee: http://dbpedia.org/Internationalization

2. Bugfixes

  • Encode homepage URLs to conform with N-Triples spec
  • Correct reference parsing
  • Recognize MediaWiki parser functions
  • Raw infobox extraction produces more object properties again
  • skos:related for category links starting with “:” and having an anchor text
  • Restrict objects to Main namespace in MappingExtractor
  • Double rounding (e.g. a person’s height should not be 1800.00000001 cm)
  • Start position in abstract extractor
  • Server can handle template names containing a slash
  • Encoding issues in YAGO dumps

3. Ontology

  • 320 ontology classes
  • 750 object properties
  • 893 datatype properties
  • owl:equivalentClass and owl:equivalentProperty mappings to http://schema.org

Note that the ontology is now a directed acyclic graph. Classes can have multiple superclasses, which was important for the mappings to schema.org. A taxonomy can still be constructed by ignoring all superclasses but the one that is specified first in the list and is considered the most important.
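
The rule for deriving a taxonomy from the graph can be sketched as below. The class names and their superclass lists are hypothetical and are not taken from the DBpedia ontology; only the "keep the first-listed superclass" convention comes from the note above.

```python
# Hypothetical class names; only the ordering convention follows the note above.
superclasses = {
    "Thing": [],
    "Place": ["Thing"],
    "Organisation": ["Thing"],
    "School": ["Organisation", "Place"],  # two superclasses: a DAG, not a tree
}

# Taxonomy: keep only the first-listed superclass of every class.
taxonomy = {cls: parents[0] for cls, parents in superclasses.items() if parents}

def path_to_root(cls):
    """Follow the single retained superclass up to the root."""
    path = [cls]
    while path[-1] in taxonomy:
        path.append(taxonomy[path[-1]])
    return path

print(path_to_root("School"))  # ['School', 'Organisation', 'Thing']
```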

4. Mappings

  • Dynamic statistics for infobox mappings showing the overall and individual coverage of the mappings in each language: http://mappings.dbpedia.org/index.php/Mapping_Statistics
  • Improved DBpedia Ontology as well as improved Infobox mappings using http://mappings.dbpedia.org/. These improvements are largely due to collective work by the community before and during the DBpedia Mapping Creation Sprint. For English, there are 17.5 million RDF statements based on mappings (13.8 million in version 3.6) (see also http://dbpedia.org/Downloads37#ontologyinfoboxproperties).
  • ConstantProperty mappings to capture information from the template title (e.g. Infobox_Australian_Road {{TemplateMapping | mapToClass = Road | mappings = {{ConstantMapping | ontologyProperty = country | value = Australia }}}})
  • Language specification for string properties in PropertyMappings (e.g. Infobox_japan_station: {{PropertyMapping | templateProperty = name | ontologyProperty = foaf:name | language = ja}} )
  • Multiplication factor in PropertyMappings (e.g. Infobox_GB_station: {{PropertyMapping | templateProperty = usage0910 | ontologyProperty = passengersPerYear | factor = 1000000}}, because it’s always specified in millions)
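
As an illustration of the multiplication factor, the following Python sketch applies a mapping shaped like the Infobox_GB_station example to a raw infobox value. The dictionary layout and the `apply_property_mapping` helper are hypothetical and do not reflect how the extraction framework is actually implemented.

```python
def apply_property_mapping(infobox, mapping):
    """Return an (ontology_property, value) pair for one mapped template property."""
    raw = infobox[mapping["templateProperty"]]
    # The factor defaults to 1 when the mapping does not specify one.
    value = float(raw) * mapping.get("factor", 1)
    return mapping["ontologyProperty"], value

mapping = {
    "templateProperty": "usage0910",
    "ontologyProperty": "passengersPerYear",
    "factor": 1000000,
}

# Hypothetical infobox value, given in millions as the mapping assumes.
infobox = {"usage0910": "1.25"}
print(apply_property_mapping(infobox, mapping))  # ('passengersPerYear', 1250000.0)
```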

5. RDF Links to External Data Sources

  • New RDF links pointing at resources in the following Linked Data sources: Umbel, EUnis, LinkedMDB, Geospecies
  • Updated RDF links pointing at resources in the following Linked Data sources: Freebase, WordNet, Opencyc, New York Times, Drugbank, Diseasome, Flickrwrapper, Sider, Factbook, DBLP, Eurostat, Dailymed, Revyu

Accessing the new DBpedia Release

You can download the new DBpedia dataset from http://dbpedia.org/Downloads37.

As usual, the dataset is also available as Linked Data and via the DBpedia SPARQL endpoint (http://dbpedia.org/sparql).
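
For readers who want to try the endpoint, here is a small Python sketch that only builds the request URL for a query; it does not send it. The `query` parameter is the standard SPARQL protocol parameter, while the `format` parameter and the sample query are assumptions about what this endpoint accepts and returns.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named in the announcement

# Sample query (an assumption): labels attached to the Berlin resource.
QUERY = """
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Berlin>
      <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 5
"""

def build_request_url(endpoint, query, result_format="application/sparql-results+json"):
    """Encode a SPARQL query as a GET URL.

    `query` is the standard SPARQL protocol parameter; `format` is an
    assumption about what this particular endpoint accepts.
    """
    return endpoint + "?" + urlencode({"query": query, "format": result_format})

url = build_request_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL can be pasted into a browser or passed to any HTTP client.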

Credits

Lots of thanks to

  • All editors that contributed to the DBpedia ontology mappings via the Mappings Wiki.
  • Max Jakob (Freie Universität Berlin, Germany) for improving the DBpedia extraction framework and for extracting the new datasets.
  • Dimitris Kontokostas (Aristotle University of Thessaloniki, Greece) for providing language generalizations to the extraction framework.
  • Paul Kreis (Freie Universität Berlin, Germany) for administering the ontology and for delivering the mapping statistics and schema.org mappings.
  • Uli Zellbeck (Freie Universität Berlin, Germany) for providing the links to external datasets using the Silk framework.
  • The whole Internationalization Committee for expanding some DBpedia extractors to a number of languages:
    http://dbpedia.org/Internationalization.
  • Kingsley Idehen and Mitko Iliev (both OpenLink Software) for loading the dataset into the Virtuoso instance that serves the Linked Data view and SPARQL endpoint. OpenLink Software (http://www.openlinksw.com/) altogether for providing the server infrastructure for DBpedia.

The work on the new release was financially supported by:

  • The European Commission through the project LOD2 - Creating Knowledge out of Linked Data (http://lod2.eu/, improvements to the extraction framework).
  • The European Commission through the project LATC - LOD Around the Clock (http://latc-project.eu/, creation of external RDF links).
  • Vulcan Inc. as part of its Project Halo (http://www.projecthalo.com/).

More information about DBpedia is found at http://dbpedia.org/About

Have fun with the new data set!

Cheers,

Chris Bizer

by ChrisBizer at September 11, 2011 09:14 AM

August 30, 2011

HyperDanja (Danny Ayers)

August 19, 2011

DBTune Blog

4Store stuff

Update: The repository below is no longer maintained, as official packages have been pushed into Debian. They are not yet available for Ubuntu 11.04, though. In order to install 4store on Natty, you have to install the following packages from the Oneiric repository, in this order:

  • libyajl1
  • libgmp10
  • libraptor2
  • librasqal3
  • lib4store0
  • 4store

And you should have a running 4store (1.1.3).

Old post, for reference: I've been playing a lot with Garlik's 4store recently, and I have been building a few things around it. I just finished building packages for Ubuntu Jaunty, which you can get by adding the following lines in your /etc/apt/sources.list:

deb http://moustaki.org/apt jaunty main
deb-src http://moustaki.org/apt jaunty main

And then, an apt-get update && apt-get install 4store should do the trick. The packages are available for i386 and amd64. It is also one of my first packages, so feedback is welcomed (I may have gotten it completely wrong). After being installed, you can create a database and start a SPARQL server.

I've also been writing two client libraries for 4store, both available on GitHub:

  • 4store-php, a PHP library to interact with 4store over HTTP (so not exactly similar to Alexandre's PHP library, which interacts with 4store through the command-line tools);
  • 4store-ruby, a Ruby library to interact with 4store over HTTP or HTTPS.

by Yves at August 19, 2011 09:29 AM

August 15, 2011

Displacement Activities (Tom Heath)

Back Online after the Spam-fest

Just a quick post now that this blog is back online after being badly compromised by spammers. I took everything down and let the links 404 for a while in the hope that it would encourage search engines to clear out their indexes, and the search engine referrals seem to be getting cleaner now, which is a relief. May this be the last of it.

by Tom Heath at August 15, 2011 10:01 AM

AI3:::Adaptive Information (Mike Bergman)

Of Flagpoles and Fishes

The New Paradigm of ‘Substantive Marketing’ for Innovative IT

World's Tallest Flagpole; see ref [9]

This decade has clearly marked a sea change in the move of enterprise software from proprietary to open source, as I have recently discussed [1]. It is instructive that only a mere six years ago I was in heated fights with my then Board about open source; today, that seems so quaint and dated.

Also during this period many have noted how open source has changed the capital required to begin a new software startup [2]. Open source provides both the tooling and the components for cobbling together specialty apps and extensions. Six, seven and even eight figure startup costs common just a decade ago have now dropped to four or five figures. When we see the explosion of hundreds of thousands of smartphone apps we are seeing the glowing residue of these additional sea changes. Dropping startup costs by one to three orders of magnitude is truly democratizing innovation.

But something else has been going on that is changing the face of enterprise software (besides consolidation, another factor I also recently commented on). And that factor is “marketing”. Much less commentary is made about this change, but it, too, is greatly lowering costs and fundamentally changing market penetration strategies. That topic — and my personal experience with it — is the focus of this article.

The Obsolete Recent Past

Besides the few remaining big providers of enterprise software (like IBM, Oracle, HP, SAP), most vendors have totally remade their sales practices of just a few years ago. Large sales forces with big commissions and one- to two-year sales cycles can no longer be justified when software license fees and the percentage maintenance annuities that flow from them are dropping rapidly. Today’s mantras are doing more with less and doing it faster, hardly consistent with the traditional enterprise software model. Sure, big enterprises, especially big government and big business, have large sunk costs in legacy systems that will continue to be milked by existing vendors. But the flow is constricting with longer-term trends clear to see. The old enterprise software model is obsolete.

Even if it were not dying, it is hard to square huge investments in sales and marketing when product development has become inexpensive and agile. The proliferation of three-letter marketing acronyms for branding “new” product areas and standard formulas for product hype of just a few years ago also feels old and dated. Cozy relationships with conventional trade press pundits and market analysts seem to be diminishing in importance, possibly because the authoritativeness of their influence is also diminishing. It is harder to justify market firm subscription costs when priority budget items are being cut and new information outlets have emerged.

In response to this, many developers have forsaken the enterprise market for the consumer one. Indeed enterprises themselves are looking more and more to the consumer sector and commodity apps for innovation and answers. But, still, problems unique to enterprises remain and how to effectively reach them in this brave new world is today’s marketing problem for enterprise software vendors.

Most entities today, when opining about these challenges, tend to emphasize the need for “laser focus” and “rifle-shot” targeting of prospects. The advice takes the form of: 1) emphasize well-defined verticals; 2) know your market well; and 3) target and go after your likely prospects. Prospect data mining and targeted ad analysis are the proffered elixirs.

But, there is little evidence such refined methods for prospect identification and targeting are really working. Like politicians doing focus groups and opinion polling to capture the desired “message” of their potential electorates, these are all still “push” models of marketing. Yet we are swamped with pushed messages and marketing everywhere we turn. The model is failing.

Besides message overload, there are two issues with laser targeting. First, despite all that we try to know about ready buyers (for enterprise software), we really don’t know if any particular individual is truly needful, in a position to buy, has the authority to buy, or is the right advocate to make the internal sell. Second, though the idea of “laser” carries with it the image of focus and not flailing, it is in fact expensive to identify the targets and send a focused message their way. Because of these issues, decay rates for laser prospects throughout conventional sales pipelines continue to rise.

A New Marketing Paradigm

There has always been the phenomenon of the “fish jumping into the boat”; that is, the unanticipated inbound inquiry from a previously unknown prospect leading to a surprisingly swift sale. But we have seen this phenomenon increase markedly in recent years. Structured Dynamics’ current customer base, including recurring customers, comes almost exclusively from this source. As we have noted this trend in comparison with more targeted outreach, we have spent much time trying to understand why it is occurring and how we can leverage what Peter Drucker called the “unexpected success” [3].

What we are seeing, I believe, is a shift from sales to marketing, and within marketing from direct or outbound marketing to a new paradigm of marketing. Others have likened this to inbound marketing [4] or content marketing [5] or permission marketing [6]. What we are seeing at Structured Dynamics bears many resemblances to parts of what is claimed for these other approaches, but not all. And, it is also true that what we are seeing may pertain mostly to innovative IT for emerging enterprise markets, and not a generalized paradigm suitable to other products or markets.

For lack of a better term, what we are seeing we can term “substantive marketing”. By this we mean offering valuable content and solutions-oriented systems for free and without restriction. This shares aspects with content marketing. Then, in keeping with the trend for buyers doing their own research and analysis to fulfill their own needs, similar to the premises of inbound or permission marketing, potential consumers can make their own judgments as to relevance and value of our offerings.

Sometimes, of course, some prospects find our approaches and solutions lacking. Sometimes, they may grab what we have offered for free and use them on their own without compensation to us. But where the match is right — and we need to be honest with both ourselves and the customer when it is not — we can better spend the customer’s limited time and resources to tailor our generic solutions to their specific needs. In doing so, we offer higher value (tailored services) while learning better about another spectrum of consumer need that can virtuously enhance our substantive offerings for the next prospect.

So, let’s decompose these components further to see what they can tell us about this new practice of substantive marketing and how to use it as an engine for moving forward.

Substantive Marketing

The Virtuous Cycle Begins with Substantive Solutions

The premise of substantive marketing is to offer square-deal value to the marketplace in the form of solutions-based content. It resembles content marketing, which involves “the creation or sharing of content for the purpose of engaging current and potential consumer bases” [5], but substantive marketing goes even further. The whole basis and premise of the approach is to provide substantive content in one or more of these areas, preferably all:

  • Knowledge — this substantive area includes papers, commentary, survey results or listings of tools and references useful to the target market
  • Analysis — this content area includes unique analysis of market trends, data, technologies or reviews that pertain to the target market
  • Code — this area relates to the provision of open source code and tools, preferably under licenses that allow users to use the software without restriction (two examples are the Apache 2 license and the MIT license)
  • Documentation — a critical substantive area is the documentation on how to install, use, modify or customize these tools, with an emphasis on APIs and tutorial information
  • Methodologies, workflows and best practices — it is important to also discuss how to properly operate and utilize these tools and information. Taking care to document lessons learned and best practices also helps the user community avoid common mistakes and to speed adoption and utility, and
  • Demos — this area involves setting up (and sharing code and procedures for same) demos that show how the code and its methods actually work. Demos also become first use cases to aid the new user in learning and setting up the code bases.

Further, this substantive content is offered without strings, restrictions or customer fill-in forms. The content is not a come-on or a teaser. We are not trying to gather leads or prospect names, because we have no intent to dun them with emails or follow-ups.

This substantive content is as complete as can be to enable new users to adopt the information and tools in their current state without further assistance. (In some cases, the information also educates the marketplace in order to prepare future customers for adoption.) Most importantly, this substantive content is offered for free, either as open source (for code) or under Creative Commons licenses (for documentation and other content). In return, it is fair to request attribution when this material is used, and we do.

We have previously termed this complete panoply of substantive content a total open solution [7]. Some might find the provision of such robust information crazy: How can we give away the store of our proprietary knowledge and systems? But we find this kind of thinking old school. In an open source world where so much information is now available online, with a bit of effort customers can find this information anyway. Rather, our mindset is that customers do not want to pay again for what has already been done, but are willing to pay for what can be done with that knowledge for their own specific problems. Offering the complete storehouse of our knowledge in fact signals our interest in only charging the customer for new answers, new value or new formulations. The customers we like to work with feel they are getting an honest, square deal.

Flagpole Venues Help Increase Awareness

Consider your substantive content to be your flag, a unique banner for conveying and packaging your specific brand. It is thus important to find appropriate flagpoles — in the virtual territories that your customers visit — for raising this content high for them to see. Since the role of these flagpoles is to create awareness in potential prospects — who you do not likely know individually or even by group in advance — it makes sense to raise your offerings up on many flagpoles and on the highest flagpoles. Visibility is the object of the approach.

This approach is distinctly not leafletting or cramming links or emails into as many spaces as possible. The idea of substantive marketing is to fly valuable content high enough that desirous potential customers can discover and then inspect the information on their own, and only if they so choose. In this regard, substantive marketing resembles permission marketing [6].

Being visible helps ensure that the needful, questing prospect that you would never have been able to target on your own is able to see and be aware of your offerings. And, since they are seeking information and answers, your collateral needs to be of a similar nature. Solutions and substance are what they are seeking; what you have run up the flagpole should respond to that.

The mindset here is to respect your prospective customers and to allow them to receive and inspect your offerings, but only if they so choose. If flown in the right venues with the right visibility, customers will see your flags and inspect them if they meet their requirements.

Some of the venues at which you can raise your flags include:

  • Blogs — this venue is especially helpful, since you have complete control over content, message, voice and packaging
  • Social networks — the value of social networks is now accepted, and should be a core component of any visibility strategy. However, it is also important to make sure that your contributions are driven by substance and value and do not become part of the cacophonous background noise
  • Vertical media — there are always existing outlets well-read and -respected by your customer prospects. Establishing relationships and value with these third-party outlets can extend your reach
  • Web sites — this venue includes your standard Web sites, of course. But, you should also consider setting up specific project-related sites or sites dedicated to documentation (cf. our TechWiki site of 300+ technical articles) or to methodologies (the excellent MIKE2.0 site is one great example) or to other ways by which particular content (such as tools with the Sweet Tools site) can raise another flag
  • User forums — user discussion groups and forums also become their own attractants for like-interested prospects, and
  • Conferences and tradeshows — while potentially valuable, presence at conferences and tradeshows must be carefully evaluated. Since participation and opportunity costs are high, the venues should be clearly relevant to your market space with likely decision makers in attendance.

The observant reader will have already concluded that each of these venues develops slowly, and therefore raising visibility is generally a slow-and-steady game that requires patience. Start-up vendors backed by venture firms or those looking for quick visibility and cashout will not find this approach suitable. On the other hand, customer prospects looking for answers and self-sustaining solutions are not much interested in flash-in-the-pan vendors, either.

A Model Responsive to the Changing Nature of Customer Prospects

The real drivers for this changing paradigm come from customer prospects. Sophisticated buyers of enterprise IT and instrumental change agents within organizations share most if not all of these characteristics:

  • They are inundated with marketing messages and jaded about hype and “pushed” messages
  • They are generally knowledgeable about their needs and problem spaces and about approximate technologies. They are eager to learn independently and know that their recommendations affect their personal reputations and standing within their enterprises
  • With the many volatile external and internal changes, including staff reductions and fluid assignments, leadership for new technology adoption can come from many different and unknown corners of the organization; it is extremely difficult to identify and target prospects
  • The economic and competitive environment places a premium on affordability and low-risk evaluations of new technologies
  • Lock-ins of any kind — be it to specific vendors or technologies — are understood as inherently risky. This understanding is raising the importance of open and standards-based approaches
  • Being the subject of a pushy sales effort is distasteful and a negative to an eventual sale. Education and learning, however, are respected
  • Because of all that is at stake, honesty with no bullshit is highly appreciated. If you as a vendor do not offer an appropriate solution or have fulfillment weaknesses, tell the prospect so. Further, tell them who can supply the solution. One never knows when and where the next problem may arise, and providing trustworthy advice can lead to later engagements.

More often than not we find our customers to have already installed and used our existing substantive materials for some time before they approach us about further work. They appreciate the tutorial information and have taught themselves much in advance. By the time we engage, both parties are able to cost-effectively focus on what is truly missing and needed and to deliver those answers in a quick way. Re-engagements tend to occur when a next set of gaps or challenges arise.

Though it may sound trite or even unbelievable to those who have not yet experienced such a relationship, the square deal value offered by substantive marketing can really lead to true partnerships and trust between vendor and customer. We experience it daily with our customers, and vice versa. We also think this is the adaptive approach that our new environment demands.

The Free Path to Open Source and Solutions

Once prospects learn of our substantive offerings, many may decide independently that what we have is not suitable. Others may simply download and use the information on their own, often without our knowing about it, let alone receiving revenue from it. We are completely fine with this, as three different cases show.

First, some of these prospects need no more than what we already have. This increases our user base, increases our visibility and often results in contributions to our forums and documentation.

Then, some of these prospects come to learn they need or want more than what our current offerings provide, leading to two possible forks. In one fork, the second case, they may have sufficient skills internally or with other suppliers to extend the system on their own. Some of this flows back to an improved code base or improved installation or documentation bases.

In the other fork, the third case, they may decide to engage us in tailoring a solution for them. That case is the only one of the three that leads to a direct revenue path.

In all three cases we win, and the customer wins. Maybe enterprise software vendors of decades past rue this reality of lower margins and shared benefits; we agree that the absolute profit potential of substantive marketing is much less. But we gladly accept the more enjoyable work and steady revenue relationships resulting from these changes. We are not engaged in some Pollyannaish altruism here, but in a steely-eyed honest brokering that best serves our own self-interest (and fairly that of the customer, as well).

A Square Deal Baseline for Tailored Services

Great IT product does not come from idle musings or dreamed up functionality. It comes solely and directly from solving customer problems. Only via customers can software be refined and made more broadly usable.

A slipstream of those who have previously become aware of and tested our offerings will choose to engage our services. This generally takes the form of an inbound call, where the prospect not only qualifies itself, but also establishes the terms and conditions for the sale. They have chosen to select us; they are fish that have jumped into the boat.

To again quote Peter Drucker, “. . . the aim of marketing is to make selling superfluous. The aim of marketing is to know and understand the customer so well that the product or service fits him and sells itself. Ideally, marketing should result in a customer who is ready to buy. All that should be needed then is to make the product or service available . . .” [8]. This is precisely what I meant earlier about the shift in emphasis from sales to marketing.

Even at this point there may be mismatches in needs and our skills and availabilities. If such is the case, we do not hesitate to say so, and attempt to point the prospect in another direction (from which we also gain invaluable market knowledge). If there is indeed a match, we then proceed to try to find common ground on schedule and budget.

Paradoxically, this square deal and honesty about the readiness and weaknesses of our offerings often leads to forgiveness from our customers. For example, for some time we have lacked automated installation scripts that would make it easier for prospects to install our open semantic framework. But, because of compensating value in other areas, such gaps can be overlooked and tackled later on (indeed, as a current customer is now funding). By not pretending to be everything to everyone, we can offer what we do have without embarrassment and get on with the job of solving problems.

For larger potential engagements, we typically suggest a fixed-price initial effort to develop an implementation plan. The interviews and research to support this typical 4- to 6-week effort (generally in the $5K to $10K range, depending) then result in a detailed fulfillment proposal, with firm tasks, budget and schedule, specific to that customer’s requirements. Just as we respect our prospects’ time and budget, we expect the same and do not conduct these detailed plans without compensation. With respect to fulfillment contracts, we cap the contract amount and limit milestone payments to pre-set percentages or time expended, whichever is lower.

This approach ensures we understand the customer’s needs and have budgeted and tasked accordingly. Capped contracts also put the onus on us the contractor to understand our own effort and tasking structures and realities, which leads to better future estimating. For the customer, this approach caps risk and potential exposure, and ensures milestones are being met no matter the time expenditures by us, the contractor. This approach extends our square-deal basis to also embrace risks and payments.
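As a rough illustration of the capped-contract rule described above, here is a minimal sketch in Python. It assumes that "time expended" is valued at an hourly rate; the names (Milestone, milestone_payment, total_payments) and all of the figures are hypothetical, chosen only to show the "whichever is lower" logic and not taken from any actual contract.

```python
from dataclasses import dataclass


@dataclass
class Milestone:
    name: str
    preset_share: float    # pre-set fraction of the contract cap (hypothetical)
    hours_expended: float  # contractor time actually spent (hypothetical)


def milestone_payment(milestone, contract_cap, hourly_rate):
    """Return the lower of the pre-set share of the cap and the time expended."""
    by_share = contract_cap * milestone.preset_share
    by_time = milestone.hours_expended * hourly_rate
    return min(by_share, by_time)


def total_payments(milestones, contract_cap, hourly_rate):
    """Sum the milestone payments, never exceeding the overall contract cap."""
    total = sum(milestone_payment(m, contract_cap, hourly_rate) for m in milestones)
    return min(total, contract_cap)


# Hypothetical engagement: a 40,000 cap billed at an assumed rate of 100 per hour.
example = [
    Milestone("implementation plan", preset_share=0.25, hours_expended=120),
    Milestone("first deliverable", preset_share=0.50, hours_expended=150),
]
for m in example:
    print(m.name, milestone_payment(m, contract_cap=40000, hourly_rate=100))
```

In this sketch the first milestone ran over its budgeted time, so the pre-set share limits the payment; the second came in under, so the customer pays only for the time actually spent. Either way the customer's exposure stays bounded by the cap.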

New (and Open Source) Developments Fuel the Substance Pipeline

Thus, when customers engage us, they spend almost solely on new functionality specifically tailored to their needs. In doing so, we suggest they agree to release the new developments they fund as open source. We argue, and customers predominantly agree, that they are already benefitting from lower overall costs because other customers have funded sharable, open source before them. We point out that the new customers that follow them will also be independently creating new functionality, from which they will also later benefit.

(This argument does not apply to specific customer data or ontologies, which are naturally proprietary to the customer. Also, if the customer wants to retain intellectual ownership of extensions, we charge higher development fees.)

Once these new developments are completed, they are fed back into a new baseline of valuable content and code. From this new baseline the cycle of substantive marketing can be augmented anew and perpetuated.

Three Guidelines to Leverage Substantive Marketing

All of these points can really be boiled down to three guidelines for how to make substantive marketing effective:

  • First, whatever your domain or market, provide useful and substantive content. The content you offer is indeed your marketing collateral. Prospective customers can gauge from it directly whether it meets their needs, appears sound and workable, and has value. If you have little of substance to offer, this paradigm is not for you
  • Second, plant many flagpoles and raise your flags high in territories your market prospects are likely to visit. This is a process that requires thoughtfulness and patience. Thoughtfulness, because that is how you determine where to plant your flags. If you yourself are a consumer of what you offer, it is easier to find those venues. And patience, because it takes time to stack valuable content upon valuable content in order to raise visibility
  • And, third, be honest and respectful. Help your prospect work within available budget to achieve the most possible at lowest risk. And help them find others, if need be, who might be better able than you to truly solve their problems.

What we are finding — as we continue to refine our understanding of this new paradigm — is that through substantive marketing the fish are finding us and they sometimes jump into the boat. We like our enterprise customers to pre-qualify themselves and already be “sold” once they knock on the door. One never knows when that phone might ring or the email might come in. But when it does, it often results in a collaborative customer as a partner who is a joy to work with to solve exciting new problems.


[1] M.K. Bergman, 2011. “Declining IT Innovation in the Enterprise,” in AI3:::Adaptive Information blog, January 17, 2011. See http://www.mkbergman.com/940/declining-it-innovation-in-the-enterprise/.
[2] Paul Graham has been the most prominent observer of this scene; see P. Graham, 2008. “Why There Aren’t Any More Googles,” April 2008 (see http://www.paulgraham.com/googles.html) and subsequent articles.
[3] See esp. Peter F. Drucker, 1985. Innovation and Entrepreneurship: Practice and Principles, Harper & Row, New York, NY, 277 pp.
[4] Inbound marketing is a marketing strategy that focuses on getting found by customers. According to David Meerman Scott, inbound marketers “earn their way in” (via publishing helpful information on a blog etc.) in contrast to outbound marketing where they used to have to “buy, beg, or bug their way in” (via paid advertisements, issuing press releases in the hope they get picked up by the trade press, or paying commissioned sales people, respectively). Brian Halligan, cofounder and CEO of HubSpot, claims he first coined the term of inbound marketing.
[5] Content marketing is an umbrella term encompassing all marketing formats that involve the creation or sharing of content for the purpose of engaging current and potential consumer bases. In contrast to traditional marketing methods that aim to increase sales or awareness through interruption techniques, content marketing subscribes to the notion that delivering high-quality, relevant and valuable information to prospects and customers drives profitable consumer action. See also Holger Schulze, 2011. B2B Content Marketing Trends slideshow, see http://www.slideshare.net/hschulze/b2b-content-marketing-report.
[6] Seth Godin coined the term permission marketing wherein marketers obtain permission before advancing to the next step in the purchasing process. It is mostly used by online marketers, notably email marketers and search marketers, as well as certain direct marketers who send a catalog in response to a request. Godin contrasts this approach to traditional “interruption marketing” where messages are sent without prior permission.
[7] See the three-part series, M.K. Bergman, 2010. “Listening to the Enterprise: Total Open Solutions,” “Part 1,” “Part 2” and “Part 3,” AI3:::Adaptive Information blog, May 12 – 31, 2010.
[8] Peter F. Drucker, 1974. Management: Tasks, Responsibilities, Practices, Harper & Row, New York, NY, 864 pp. ISBN 0-06-011092-9.
[9] The intro photo is of the world’s tallest flagpole (at 165 m), in Dushanbe, Tajikistan. The photo is courtesy of CentralAsiaOnline.com.

by Mike Bergman at August 15, 2011 08:25 AM

August 08, 2011

AI3:::Adaptive Information (Mike Bergman)

A New Best Friend: Gephi for Large-scale Networks

Visualization + Analysis Pushes Aside Cytoscape

Though I never intended it, some posts of mine from a few years back dealing with 26 tools for large-scale graph visualization have been some of the most popular on this site. Indeed, my recommendation for Cytoscape for viewing large-scale graphs ranks within the top 5 posts all time on this site.

When that analysis was done in January 2008 my company was in the midst of needing to process the large UMBEL vocabulary, which now consists of 28,000 concepts. Like anything else, need drives research and demand, and after reviewing many graphing programs, we chose Cytoscape, then provided some ongoing guidelines in its use for semantic Web purposes. We have continued to use it productively in the intervening years.

As with any tool, one reviews and picks the best at the time of need. Most recently, however, with growing customer usage of large ontologies and the development of our own structOntology editing and managing framework, we have begun to butt up against the limitations of large-scale graph and network analysis. With this post, we announce our new favorite tool for semantic Web network and graph analysis, Gephi, and explain its use and showcase a current example.

The Cytoscape Baseline and Limitations

Three and one-half years ago when I first wrote about Cytoscape, it was at version 2.5. Today, it is at version 2.8, and many aspects have seen improvement (including its Web site). However, in other respects, development has slowed. For example, version 3.x was first discussed more than three years ago; it is still not available today.

Though the system is open source, Cytoscape has also largely been developed with external grant funds. As with other similarly funded projects, once grant funds slow, development slows as well. While there has clearly been an active community behind Cytoscape, it is beginning to feel tired and a bit long in the tooth. From a semantic Web standpoint, some of the limitations of the current Cytoscape include:

  • Difficult conversion of existing ontologies — Cytoscape requires creating a CSV input; there was an earlier RDFscape plug-in that held great promise to bridge the software into the RDF and semantic Web sphere, but it has not remained active
  • Network analysis — one of the early and valuable generalized network analysis plug-ins was NetworkAnalyzer; however, that component has not seen active devel