Discussions on Organism Classification
-
-
hey kirilly, good day. a few things:
Crime base is leaning awkwardly on your Legal case type, but probably this is unfair. I know little about law, but it probably makes sense to make different types for criminal and civil trials, so is yours the civil law type then? maybe we can do this. either way, let's link both of 'em to Lawyer. and maybe even both to Criminal Conviction?
I'd like to fill in fort types on your Fort. can you reciprocate it?
if the structure2 type goes through, and danm reciprocates Death-causing event on deceased person, disaster can shed all the specific people properties we did, and just be 'statistical' or whatever. agree? danm hasn't responded yet.
-
hey, I've stumbled on a host of topics that I think are wrongly typed. are you the go-to person for such things?
if so, it's dinosaurs. I'm trying to do a list of all dinosaurs, but for example all the Ornithopods, like Laosaurus, are already typed as 'organism classification' instead of organism. it's a real mess, and I think it's the whole dinosaur kingdom too.
-
Cross referencing my response to you about organism vs. organism classification here.
-
oop, im wrong. sorry~
-
-
-
I'm not sure that having "organism classification" as an included type for Dog Breed makes sense -- I don't think that dog breeds are considered to be taxonomically distinct from one another.
-
Depends on your taxonomy. I don't personally have a problem with it because IMO "Organism classification" should be a generic type that is co-typed with a particular taxonomy type. However, Freebase currently seems to only give credence to ITIS. According to the ITIS documentation:
"Ranks in the animal kingdom below subspecies will not be included in ITIS regardless of their occasional inclusion in datasets, as these ranks are not allowed under the zoological code. The botanical code allows the ranks variety, subvariety, forma, and subforma."
Also worth noting on that page is that:
"ITIS does not intend to serve as a forum for cutting-edge taxonomic classifications. Rather, ITIS is meant to serve as a standard to enable the comparison of biodiversity datasets, and therefore aims to incorporate classifications that have gained broad acceptance in the taxonomic literature and by professionals who work with the taxa concerned."
Therefore, it is useful for taxonomic classifications to be crosswalked to ITIS but there is also reason to maintain other taxonomies. Whether or not this includes dog breeds should be up to the user. Perhaps "Organism classification" should be renamed "ITIS organism classification"?
-
I don't know that Freebase only credits ITIS -- we also have NCBI keys, which collect some organisms not in ITIS, but I can't really speak to that. But if there is a taxonomy out there that classifies breeds, then you're right that it would make sense as a co-type. I just didn't think there were any, but it's not really my area of expertise.
-
-
-
I'd like to add a "conservation status" property to the schema, so that I can start adding endangered/extinct species data.
I'm assuming that "organism classification" would be the right topic...
-
I'd suggest modeling an "organism conservation status" type or something more inclusive sounding and then add it on a case by case basis. For example, I have imported Partners in Flight conservation status codes that are only applicable to birds in the Western Hemisphere. These should probably be added as part of CVTs with date and location properties. You could also add US conservation categories of threatened and endangered or subset it for state listings.
I would create a CVT with status code (that links to the status classification system), date, and location at a minimum. For example, Bald Eagles are going through a US federal delisting period but may still be listed as endangered at the state level in some states and their status today in the US is different than it was a year ago. So, those three properties (code, date, location) should be sufficient but are also essential.
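To make the three-property CVT above concrete, here is a rough sketch in Python, not a Freebase schema. The class name, field names, and the `status_at` helper are all made up for illustration, and the dates are placeholders, not real listing dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ConservationStatus:
    """One CVT-style record: a status code, where it applies, and when it took effect."""
    code: str        # e.g. "endangered"; would link to the status classification system
    system: str      # which listing scheme the code belongs to (hypothetical labels)
    location: str    # jurisdiction the listing applies to
    listed_on: date  # date the status took effect


def status_at(records: List[ConservationStatus], location: str, on: date) -> Optional[ConservationStatus]:
    """Return the most recent record for `location` that was in effect on `on`."""
    candidates = [r for r in records if r.location == location and r.listed_on <= on]
    return max(candidates, key=lambda r: r.listed_on, default=None)


# Illustrative values only: the dates and "Example State" are placeholders.
bald_eagle = [
    ConservationStatus("endangered", "US federal", "United States", date(2000, 1, 1)),
    ConservationStatus("delisted", "US federal", "United States", date(2001, 1, 1)),
    ConservationStatus("endangered", "state", "Example State", date(2000, 1, 1)),
]
```

With records like these, the same species can answer "delisted" federally and "endangered" in a state on the same day, which is why all three properties seem necessary.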
-
I agree with Ed's suggestion about the model, but I would add a date for delisting as well.
What I'd recommend is creating an "organism classification" type (or something similarly-named) in your private domain with the "conservation status" property, and try out some models there. Post back here (or on the data-modelers' list) when/if you want feedback. If it seems to work well, it could eventually get moved to the commons Biology domain.
-
I happened to be looking at my state's endangered & threatened wildlife list and they had both state and federal status.
So that might be something to consider. Even if you only want to do federal level, it should be clear that it is.
And something else I just thought of, is you'd probably need a country field, or source/location, so people knew where it was endangered/threatened.
-
IUCN Red List http://www.iucnredlist.org/search/search-basic
This site has lots of useful info.
-
-
-
Hi John. While manually marking the imported Wikipedia articles as Organism Classification, I've seen that some articles are combined. For example, topic Ganges and Indus River Dolphin combines the subspecies Platanista gangetica gangetica and Platanista gangetica minor. For lack of a better option, I just marked the topic as the species Platanista gangetica.
But what does your automated import do with this case? Does it mark one topic as being for multiple subspecies, or does it create separate empty topics to zero in on the subspecies? Or maybe just mark the topic as a "possible match" and have the users help disambiguate? (In the latter case, is there already an existing type to mark a topic as "possible match" during a data load?)
-
Hey Jeff,
The import currently extracts a taxobox and then determines the lowest specific taxon and tries to look it up in ITIS. The idea is to reconcile as much as possible against ITIS so that we could import that wholesale later. This specific entry parsed as:
Platanista gangetica|Platanista|Platanistidae|||Odontoceti|Cetacea|Mammalia|Chordata 267307 Ganges and Indus River Dolphin b:Platanista gangetica, s:P. gangetica, g:Platanista, f:Platanistidae, o:Cetacea, c:Mammalia, p:Chordata
So it would reconcile with taxon # 180413 in ITIS.
180413||Platanista||gangetica||||||valid||TWG standards met||||1996-06-13 14:51:08|180412|70958||5|220|12/02/2004|No|
If we then overlay the rest of ITIS, we should automatically get these children:
Subspecies Platanista gangetica gangetica (Roxburgh, 1801) -- Ganges River dolphin
Subspecies Platanista gangetica minor Owen, 1853 -- Indus River dolphin
and this parent:
Genus Platanista Wagler, 1830 -- Ganges dolphins, Indus dolphins, susu
The wholesale import would not overwrite any data that was already entered by hand that made the same assertions.
Does this make sense?
-
To answer your specific question, the ITIS import would create new topics for the missing WP articles and only reconcile the Platanista gangetica topic itself.
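For anyone following along, the "lowest specific taxon, then look it up in ITIS" step described above might look roughly like this. It is only a sketch and not the actual import code: the function names are invented, the ITIS lookup is stood in for by a plain dict, and reading the "b:" prefix as "binomial" is an assumption.

```python
from typing import Dict, Optional

# Rank prefixes as they appear in the parsed entry, ordered from most to
# least specific. Reading "b" as binomial is an assumption.
RANK_ORDER = ["b", "s", "g", "f", "o", "c", "p"]


def parse_ranks(parsed: str) -> Dict[str, str]:
    """Turn 'b:Platanista gangetica, g:Platanista' into {'b': ..., 'g': ...}."""
    ranks = {}
    for part in parsed.split(","):
        prefix, _, name = part.strip().partition(":")
        if prefix and name:
            ranks[prefix] = name.strip()
    return ranks


def lowest_taxon(ranks: Dict[str, str]) -> Optional[str]:
    """Return the name at the most specific rank that is present."""
    for prefix in RANK_ORDER:
        if prefix in ranks:
            return ranks[prefix]
    return None


def reconcile(parsed: str, itis_by_name: Dict[str, int]) -> Optional[int]:
    """Look the lowest taxon up in a name -> TSN table (a stand-in for ITIS)."""
    return itis_by_name.get(lowest_taxon(parse_ranks(parsed)))


entry = ("b:Platanista gangetica, s:P. gangetica, g:Platanista, "
         "f:Platanistidae, o:Cetacea, c:Mammalia, p:Chordata")
tsn = reconcile(entry, {"Platanista gangetica": 180413})
```

Here `tsn` comes back as the species-level TSN quoted in the post, and a topic whose lowest taxon is not in the table simply returns `None` and is left unreconciled.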
-
Thanks, John. That makes sense. See my post below comparing the detail of the different data sources.
-
I'm not sure whether Organism Classification should have a property for "ITIS number" or "ITIS page". As an experiment, I put in both and added to the Ganges and Indus River Dolphin page. How were you planning to handle link-outs to ITIS, including on a topic like Homininae which has no ITIS number?
-
I think the ITIS TSN, where available, should be captured. I'd like to see link-outs done in a functional way, using data like keys or rawstrings.
We could have /biology/ITIS/180593 as a keyspace for the same price as a rawstring property...
-
Not sure how functional keys work in Freebase. Is this explained somewhere that I can read? Is it a way to automatically convert an integer into a link?
-
Namespaces are described here. Rather than loading ITIS ids as rawstrings, I could make them keys. The issue is where to create the namespace. For example /biology/itis/180547 could get you to http://www.freebase.com/view/%239202a8c04000641f80000000002b6696
-
I see. I thought you were talking about a Freebase mechanism to automatically convert the ID into a link so that you could click on /biology/itis/180547 and it would open the page on ITIS. Is there a mechanism like that for the UI?
-
we are prototyping a mechanism like that based on http://www.ietf.org/internet-drafts/draft-gregorio-uritemplate-01.txt
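To illustrate the general idea of turning a key into a link with a template, here is a small sketch. It only handles plain `{name}` substitution, which is a tiny subset of what the URI template draft covers, and it is not the prototype mentioned above. The template URL and the function names are hypothetical.

```python
import re
from typing import Dict


def expand(template: str, values: Dict[str, str]) -> str:
    """Replace each {name} in the template with values[name] (simple substitution only)."""
    return re.sub(r"\{(\w+)\}", lambda m: str(values[m.group(1)]), template)


def link_for_key(key: str, templates: Dict[str, str]) -> str:
    """Split a key like /biology/itis/180547 into namespace and id, then expand."""
    namespace, _, ident = key.rpartition("/")
    return expand(templates[namespace], {"id": ident})


# Hypothetical template: the real ITIS URL pattern is not given in this thread.
TEMPLATES = {"/biology/itis": "http://example.org/taxon?tsn={id}"}

url = link_for_key("/biology/itis/180547", TEMPLATES)
```

The appeal of this design is that the key stays a plain identifier in the namespace, and the link-out pattern lives in one place per namespace instead of being stored on every topic.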
-
Neato! I hadn't heard about this proposal.
-
This thread seems to have died off, but I wanted to correct JG's link about namespaces for any new users going through and reading up on what's happened in the past. The URL has changed slightly, you can now find the info here.
-
The updated link is dead too.
-
That was an internal link. Try http://www.freebase.com/view/guid/9202a8c04000641f800000000544e143#namespaces instead.
-
-
-
-
I'm having trouble filling in the ITIS Taxon S/N (an Enumeration) manually ... is there any particular procedure I should follow to do this?
-
What kind of trouble, Luke? An Enumeration is just a special kind of machine-readable string; you should just be able to enter the value as for any other property. Or are you not sure what value to enter?
-
I recently created the Pentalagus classification (for a genus which just contains one species, so didn't have a Wikipedia stub).
If I click on the "Edit" button next to "ITIS Taxon S/N", the usual dialog box opens, requesting me to enter an enumeration. I can enter 625308 (the serial number according to www.itis.gov); however, the "save" button stays grey. If I click on it anyway, the circles flash indicating that data is being saved; however, when this completes, the value that I entered is not there.
I'm using Firefox v2.0.0.11 if that helps.
-
Thanks, Luke; I’ve reproduced it and filed a bug report on this issue.
-
Thanks, much appreciated.
-
Seems to be working now - thanks!
-
-
-
I have reconciled some data from ITIS which now annotates almost 39k organism classifications. The Mass Data Operation is described here. While the import was very conservative, I'm sure there may be errors. Please let me know of any that you find and I'll address them for the next import.
-
-
-
For anyone watching this discussion, you may want to look at the thread on the Data-modeling email list called "Should some of the Person attributes be moved to Organism?"
(Followups should go on the email list.)
-
-
-
I made Breed an Organism Classification Rank so that, e.g., Buddy can be marked as a Labrador Retriever and not just a Dog. I only hesitate because a breed usually doesn't have a scientific name or an ITIS number, etc. But it seems right that the Higher Classification of Labrador Retriever is Dog. Any thoughts?
-
I would disagree: Buddy's a dog. Now, according to the American Kennel Club (AKC), he also matches their Labrador Retriever Breed Classification. So domestic cats, dogs, and I guess all other domesticated animals that have breed classifications (usually for some commercial or show purposes) can be typed both for their species taxonomy and for the specific breed classification. I think several people have private types for dog breeds (I even had one a few months back, long buried by the many revisions of Freebase and time).
-
Good points. Also, the Wikipedia infobox for a breed like Labrador Retriever seems to have lots of properties that argue for separate types for Dog Breed, Horse Breed, etc.
The question is: How do I indicate that Buddy is a Labrador Retriever? If the topic for Labrador Retriever is a Dog Breed, not an Organism, then the Buddy topic needs a different type like "Purebred Dog" with a property for the Breed (such as Labrador Retriever). Likewise for "Purebred Horse", "Purebred Cat", etc. In this case, Buddy still needs to be type Organism to indicate he is a Dog so that a search for Dogs will find him.
-
Sorry. Here is the correct link to the Dog Breed example.
-
A new sub-domain for Organisms for Breed(ing)?
Next level would be Dog Breed - Cat Breed - Horse Breed - etc. (a similar one for flowers? Roses in particular)
For Dog Breed there would be:
Breed - Country of Origin - Club Classification
(Club Classification would be a CVT that has three parts: Club Name (e.g. AKC) - Classification (by that club, and from this would be the topic page with standards that comprise that particular AKC Labrador Dog Breed))
Then the Breed = Labrador Dog would have many properties for itself.
-
I made Buddy a Dog again (not a Labrador Retriever), and removed Breed as an Organism Classification Rank.
As an experiment in my own domain, I made a type Dog Breed to use, e.g., on Labrador Retriever, and a type Purebred Dog to use on Buddy.
For the club standard, I made a compound value type specific to the club, such as AKC standard, for a couple of reasons. This is similar to how Organism Classification does it with a specific "ITIS Taxon S/N", etc. Also, the standard has a Group which I wanted to be an enumerated list dropdown, and different clubs may have different enumerated lists of recognized groups, so each needs its own type.
-
Moof!
A bonny start...I'll think more about this after I finish digesting pizza from the User Meeting we had last night at Metaweb.
-
Don't you mean "woof", Gordon?
Jeff: this is indeed a bonny start. I would recommend going with a generic CVT for the club standards, however, for two reasons. First is that, either way, the "dog breed group" enum list will be the same for all clubs (the list is identical in your two sample CVTs because the properties in each point to the same type). In order to have different enum lists for each club, we'd actually need separate types for each club's dog breed groups, which I think would be unnecessarily complex. The other reason is just that I think it's less complicated to have a single CVT that has properties for club, group, and website.
-
OK, I changed it to a single CVT. See Labrador Retriever. But if as you say, "the dog breed group enum list will be the same for all clubs" then this argues that it should not be in the breed standard CVT for each club, but should be outside, such as the "country of origin" property. Or is there a case where two clubs specify different dog breed groups?
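As a rough picture of the single generic CVT (club, group, website) and of the question about clubs disagreeing, here is a sketch in Python. It is not the Freebase type itself. The class and function names are invented, "Club A" and "Club B" and the example.org links are placeholders, and the two group strings are just sample values of the kind a breed infobox shows.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class BreedStandard:
    """One generic CVT row: which club, which group it puts the breed in, and a link."""
    club: str
    group: str
    website: str


def groups_by_club(standards: List[BreedStandard]) -> Dict[str, Set[str]]:
    """Collect the group(s) each club assigns to the breed."""
    out: Dict[str, Set[str]] = {}
    for s in standards:
        out.setdefault(s.club, set()).add(s.group)
    return out


def clubs_disagree(standards: List[BreedStandard]) -> bool:
    """True when the rows do not all name the same group."""
    return len({s.group for s in standards}) > 1


# Placeholder clubs and links; the group strings are sample values only.
poodle = [
    BreedStandard("Club A", "Group 9 Section 2 #172", "http://example.org/club-a/poodle"),
    BreedStandard("Club B", "Group 7 (Non-Sporting)", "http://example.org/club-b/poodle"),
]
```

If `clubs_disagree` is ever true for real data, that would argue for keeping the group inside the per-club CVT rather than moving it out next to "country of origin".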
-
I am a firm believer in the DogCow! Long may Clarus live!
I am unaware of any major differences in group specifications...I guess none of the three of us are that well-versed in dog breeding groupings and classifications for breeds (I've watched some of the dog shows on TV because of the wife's interest in big-head dogs, and Best of Show movie, of course). I concur with the notion of the simplest CVT possible for now.
My wife I will ask tonight for more insight into this arcane world of doggery. She's expressed an interest in the past in being able to research specific traits of a breed (temperament, activity level, size, weight, etc.). Here is a somewhat simplistic site showing what would be good data to have for each breed: Canine General Characteristics
Ex. Akita
Intelligence
Stubborn
Strong Prey-Drive
Protective
Rarely Barks
Weight-Range
Heavy Fur Shedding
-
Re dog breed groups, if different clubs have different groups with the same name, we should have different topics for them (like different kinds of fictional humans).
-
Both these points raise this question: The American Kennel Club defines specific properties of a Labrador Retriever such as "The height at the withers for a dog is 22½ to 24½ inches". If these are important enough to link to, shouldn't this information be pulled into Freebase as its own topic such as "AKC Standard: Labrador Retriever"? Then the link to the AKC web site would just be a normal "Web Link" link on the topic as an external reference. Conversely, if the details of the different definitions of the American Kennel Club, the Canadian Kennel Club, etc. don't matter enough to pull into Freebase, then why complicate the Dog Breed type with properties to link out to these?
-
... in other words, this is really an issue of citation. For example, Thomas Jefferson has religion Deism. But says who? One biographer may say this, but another will claim he was Christian. This can only be resolved by settling on a best guess and citing sources. Likewise, the American Kennel Club says a Labrador Retriever has "height at the withers 22½ to 24½ inches", but the New Zealand Kennel Club says otherwise. The only point I can see of citing 7 different kennel clubs is to burden the user with opening all those pages and researching the disagreements over the definition of a Labrador Retriever.
-
I suspect that Dog breed is a very Dog-specific thing, and that it should be modeled on a Dog type for now.
-
By that, do you mean that the currently-named Purebred Dog type should just be "Dog" and we should add any other dog properties besides Breed such as, maybe, Owner? Or are you talking about changing the Dog Breed type used on topics like Labrador Retriever?
-
For the breed standards, we've been looking at Labrador Retriever, which seems to be a relatively simple case. For a more complicated case, see the Wikipedia infobox for Poodle. It's not obvious that all of the breed groups are the same, such as "Group 9 Section 2 #172" and "Group 7 (Non-Sporting)". Also, some kennel clubs have multiple links to standards, such as ANKC. Again, it seems to me that the question is whether Freebase really needs to capture all of this in a compound value type, or whether a simple link to each of the kennel clubs is enough. What do you think?
-
I support doing simple modeling for now, having a link to the standards for each organization.
Horse breeding is as complicated as dogs, if not more so. Cats as well. I think there are a few more critters with breeding standards similar to the dog breed type. Camels? ;)
-
-
-
In response to Proposed Taxo Import, if richness is a criterion, consider the most detailed taxonomy of human:
Homo sapiens sapiens, Homo sapiens, Homo, Homininae, Hominidae, Hominoidea, Catarrhini, Simiiformes, Haplorrhini, Primates, Euarchontoglires, Eutheria, Theria, Mammalia, Tetrapoda, Vertebrata, Chordata, Deuterostomia, Eumetazoa, Animalia, Eukaryota.
ITIS is missing many sub- and super-ranks: Homininae, Hominoidea, Catarrhini, Simiiformes, Haplorrhini, Euarchontoglires, Tetrapoda, Deuterostomia and Eumetazoa.
Species 2000 is missing all the same sub- and super-ranks as ITIS, in addition to Eutheria, Theria and Vertebrata, so it's even less detailed than ITIS.
Discover Life seems confused. For example, its page on Vertebrata has itself as a higher taxon. Also, the tree for the superclass Tetrapoda shows the class Sarcopterygii as a higher rank. Anyway, Discover Life has most of the sub ranks of the human tree except Euarchontoglires. But its confused hierarchies make me not trust it.
NCBI has the most detail. The lineage makes sense, but many of the entries like Vertebrata have rank "no rank".
Encyclopedia of Life doesn't seem live yet. Is that right?
International Plant Names Index: I'll compare plants later.
Wikispecies is missing Euarchontoglires and Amniota but is clearly organized.
Wikipedia-en taxoboxes: At least for this example, the taxoboxes have all the entries, but only if you piece them together. For example, the taxobox for Homo shows the subfamily, etc., immediately above it, but not the details higher up like suborder.
In summary, starting from the Wikipedia taxobox makes sense, since Wikipedia appears to list most of the levels (at least for human). I would not trust a single taxobox to show all the higher levels, as I mentioned, but it is good for getting the immediate higher level. I think the trade offs are between matching with ITIS vs. NCBI:
ITIS pros: unencumbered license, clear identification of rank, promise of future completeness. ITIS cons: missing many intermediate ranks as shown above.
NCBI pros: the most detailed intermediate ranks. NCBI cons: many ranks identified as "no rank". Not sure if license is unencumbered.
If ITIS had the missing intermediate ranks that NCBI has, it would be the clear winner.
-
Another thought in favor of ITIS: While there are millions of species, there are only a few hundred classifications at the higher ranks, so if ITIS doesn't have all of these, they can be filled in by hand. It's more important to believe that you are hooking in to a database that will fill out with all species.
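The rank-by-rank comparison above can be repeated for other organisms with a simple ordered difference. This is just a sketch. The reference lineage and the gap lists are copied from the post, so the per-source lineages here are reconstructed from those claims rather than fetched from ITIS or Species 2000, and the function names are made up.

```python
from typing import Iterable, List

# The "most detailed" human lineage quoted in the post, lowest rank first.
REFERENCE = [
    "Homo sapiens sapiens", "Homo sapiens", "Homo", "Homininae", "Hominidae",
    "Hominoidea", "Catarrhini", "Simiiformes", "Haplorrhini", "Primates",
    "Euarchontoglires", "Eutheria", "Theria", "Mammalia", "Tetrapoda",
    "Vertebrata", "Chordata", "Deuterostomia", "Eumetazoa", "Animalia", "Eukaryota",
]

# Gaps as reported in the post, taken at face value.
ITIS_GAPS = {
    "Homininae", "Hominoidea", "Catarrhini", "Simiiformes", "Haplorrhini",
    "Euarchontoglires", "Tetrapoda", "Deuterostomia", "Eumetazoa",
}
SPECIES_2000_GAPS = ITIS_GAPS | {"Eutheria", "Theria", "Vertebrata"}


def missing_ranks(reference: List[str], source_ranks: Iterable[str]) -> List[str]:
    """Ranks in the reference lineage that the source does not list, in lineage order."""
    have = set(source_ranks)
    return [rank for rank in reference if rank not in have]


def coverage(reference: List[str], source_ranks: Iterable[str]) -> float:
    """Fraction of the reference lineage that the source lists."""
    return 1 - len(missing_ranks(reference, source_ranks)) / len(reference)


# Source lineages reconstructed from the gap lists above (an assumption).
itis = [r for r in REFERENCE if r not in ITIS_GAPS]
species_2000 = [r for r in REFERENCE if r not in SPECIES_2000_GAPS]
```

On these lists ITIS covers 12 of the 21 ranks and Species 2000 covers 9, which fits the point that the gaps are few enough at the higher ranks to fill in by hand.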
-
The Homininae example is a good reason to reify taxobox first and then layer in some external DBs. Another practical reason to do so is we have images for many taxoboxes from wikicommons.
The DiscoverLife folks believe that what's most important is coverage in families. I agree that DL is not nearly of the apparent quality of ITIS / Species 2000, but they are the only other 1MM+ species list right now.
I believe that NCBI is public domain. When asked, they didn't want to export their "link-out" data, and there is the text "Disclaimer: The NCBI taxonomy database is not an authoritative source for nomenclature or classification" on their data. ITIS is willing to cite "accepted names".
Clearly NCBI keys seem very important for linking into practical informatics.
So perhaps a strategy is to reify taxobox, layer in ITIS, and then layer in NCBI?
-
> So perhaps a strategy is to reify taxobox, layer in ITIS, and then layer in NCBI?
Sounds like the right approach given that ITIS intends to be more structured. I'm sure you'll run a routine periodically that will check if ITIS added the entry that only NCBI initially had.
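One way to picture the layering, as a sketch only: treat each source as a bag of property values and let earlier layers win, so later imports fill gaps without overwriting what is already there. The property names, the `layer` function, and the first-wins rule are my assumptions, not a description of the actual data load, and the NCBI value is a placeholder.

```python
from typing import Any, Dict


def layer(*sources: Dict[str, Any]) -> Dict[str, Any]:
    """Merge property dicts in order; a later source only fills in missing properties."""
    merged: Dict[str, Any] = {}
    for source in sources:
        for prop, value in source.items():
            merged.setdefault(prop, value)
    return merged


# Hypothetical property names; values for the first two come from this thread.
taxobox = {"scientific_name": "Platanista gangetica", "higher_classification": "River dolphin"}
itis = {"itis_tsn": 180413, "higher_classification": "Platanista"}
ncbi = {"ncbi_taxon_id": "<placeholder>", "scientific_name": "Platanista gangetica"}

topic = layer(taxobox, itis, ncbi)
```

Note the side effect of first-wins: the taxobox's coarser higher classification survives even though ITIS has a more specific one, so a real load would probably need a rule for when a later layer may refine an earlier value.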
I added an NCBI Taxon ID to Organism Classification. See Ganges and Indus River Dolphin
Do you need me to try modelling something else before you do the data load?
-
I think we need an additional text field for Synonym. A synonym is an 'alternative' scientific name, e.g. Mustela lutris rather than Enhydra lutris. We could put multiple scientific names but then we have to distinguish which are 'accepted'.
Common names should collide with the Topic name, I'm hoping.
-
I added Synonym Scientific Name and made Scientific Name a unique value. See Sea Otter. (I thought just 'Synonym' would be confused with 'Also known as'.)
BTW, I also set 'Higher Classification' to unique value. Is there a good argument for allowing multiple values?
-
Looks good. I asked Jamie to promote your type to /biology.
-
On the 'Higher Classification': I noticed that the higher classification of the Ganges and Indus River Dolphin was River dolphin, i.e. going from a Species to a Superfamily. It seems that this is the only member of its Genus and Family - would we want to create an entry for each regardless?
Might there be cases where the Infraclass is not commonly used and some authorities would jump straight to Class? (Not that this is relevant, but I'd never heard of Infraclass before finding it on Freebase ... :)
-
You are right that the higher classification jumps to the Superfamily. This is merely because there were no imported articles from Wikipedia for the intermediate ranks. In some cases, I did create stub topics to fill in the rank. For example, Homo Sapiens. I'm curious to see how John's data load algorithm will fill in intermediate topics from ITIS, etc. I'd say that if one of these authorities defines an intermediate rank such as Infraclass that we should use it.
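With 'Higher Classification' set to a unique value, each topic has at most one parent, so the full lineage is just a walk up the chain. Here is a small sketch, assuming the intermediate ranks have been filled in. The dict stands in for the Freebase property and the function name is made up; the ranks are the ones from the parsed taxobox earlier on this page.

```python
from typing import Dict, List

# One parent per classification, standing in for a unique 'Higher Classification'.
HIGHER: Dict[str, str] = {
    "Platanista gangetica": "Platanista",
    "Platanista": "Platanistidae",
    "Platanistidae": "Odontoceti",
    "Odontoceti": "Cetacea",
    "Cetacea": "Mammalia",
    "Mammalia": "Chordata",
}


def lineage(name: str, higher: Dict[str, str]) -> List[str]:
    """Follow higher-classification links upward until a topic has no parent."""
    chain = [name]
    seen = {name}
    while chain[-1] in higher:
        parent = higher[chain[-1]]
        if parent in seen:
            # Guard against bad data such as a taxon listed as its own ancestor.
            raise ValueError(f"cycle in higher classification at {parent!r}")
        chain.append(parent)
        seen.add(parent)
    return chain


dolphin = lineage("Platanista gangetica", HIGHER)
```

Skipping a rank, as with the jump from Species to Superfamily, does not break the walk. It just makes the chain shorter, so stub topics only matter if you want every rank represented.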
-