Code example #1
File: ubio_findit.php Project: rdmpage/bioguid
            				$text = preg_replace($pattern, "$1[[namebankID:$namebankID|$nameString]]", $text);
            			}
            			else
            			{
            				$text = preg_replace($pattern, "$1[[$nameString]]", $text);
            			}
            		*/
        }
    }
    return $text;
}
function tag_all_names($names, $text)
{
    $text = '   ' . $text;
    // do binomials first
    $text = tag_names($names, $text, true);
    // do uninomials
    $text = tag_names($names, $text, false);
    $text = trim($text);
    return $text;
}
if (0) {
    // test
    $text = 'Philorhizus marggii n. sp. is described from Greece (southern Peloponnese). Type locality: Taygetos Massif, Profitis Illias, N 36°58’/E 022°21’, 2000-2400 m asl. Members of this micropterous species are distinguished from the other Philorhizus species occurring on the Balkans by habitus, the special colouration pattern of the elytra and the special construction of the internal sac of the median lobe. Illustrations of the habitus, the median lobe and its internal sac and a description of the habitat of the new species are presented. A key to all Philorhizus species known from Greece is given. Biogeographic notes on the distribution of micropterous Philorhizus species in the western Palaearctic realm are given. Philorhizus paulo Wrase, 1995 is recorded from France for the first time (East Pyrenees)';
    $text = 'The first comprehensive combined molecular and morphological phylogenetic analysis of the major groups of termites is presented. This was based on the analysis of three genes (cytochrome oxidase II, 12S and 28S) and worker characters for approximately 250 species of termites. Parsimony analysis of the aligned dataset showed that the monophyly of Hodotermitidae, Kalotermitidae and Termitidae were well supported, while Termopsidae and Rhinotermitidae were both paraphyletic on the estimated cladogram. Within Termitidae, the most diverse and ecologically most important family, the monophyly of Macrotermitinae, Foraminitermitinae, Apicotermitinae, Syntermitinae and Nasutitermitinae were all broadly supported, but Termitinae was paraphyletic. The pantropical genera Termes, Amitermes and Nasutitermes were all paraphyletic on the estimated cladogram, with at least 17 genera nested within Nasutitermes, given the presently accepted generic limits. Key biological features were mapped onto the cladogram. It was not possible to reconstruct the evolution of true workers unambiguously, as it was as parsimonious to assume a basal evolution of true workers and subsequent evolution of pseudergates, as to assume a basal condition of pseudergates and subsequent evolution of true workers. However, true workers were only found in species with either separate- or intermediate-type nests, so that the mapping of nest habit and worker type onto the cladogram were perfectly correlated. Feeding group evolution, however, showed a much more complex pattern, particularly within the Termitidae, where it proved impossible to estimate unambiguously the ancestral state within the family (which is associated with the loss of worker gut flagellates). 
However, one biologically plausible optimization implies an initial evolution from wood-feeding to fungus-growing, proposed as the ancestral condition within the Termitidae, followed by the very early evolution of soil-feeding and subsequent re-evolution of wood-feeding in numerous lineages.';
    $text = 'The family Kalotermitidae is redescribed. The subfamily names \'Electrotermitinae\' and \'Kalotermitinae\' are placed in synonymy. The fossil genus Eotermes is removed from the family Kalotermitidae and placed in the family Hodotermitidae. 2. Three hundred and fifty-three species, fossil and living, are classified into 24 genera. Of these 24 genera, the following eight are new: Postelectrotermes, Ceratokalotermes, Comatermes, Incisitermes, Marginitermes, Tauritermes, Bifiditermes, and Bicornitermes. The genera Pterotermes, Proneotermes, Allotermes, and Epicalotermes are resurrected. The genus name \'Proglyptotermes\' is relegated to synonymy. All the genera are described, and the generitype species are illustrated. 3. The generic classification is based on a constellation of conservative, adaptive, and regressed characters of both the imago and the soldier castes. 4. The phylogeny of the genera is discussed. The imago-nymph mandible indicates two main evolutionary lines. The first line is represented by the Proelectrotermes-Calcaritermes complex, and the second line by the Incisitermes-Cryptotermes complex. 5. Several cases of convergence are illustrated. In both the main lines of the family Kalotermitidae, the phragmotic head, the enlarged third antennal segment, and the slightly sclerotized median vein have all evolved independently many times. Also, the arolium has been convergently lost in many genera. 6. A discussion on conservative and regressed characters is included. Characters that show phylogenetic advancement or regression are also listed. 7. It is evident from the data on the hosts and Protozoa that the evolution of the genera of the Protozoa did not occur in conjunction with the evolution of the host genera and that the differentiation of the Protozoa genera took place before the differentiation of the host genera.';
    $text = 'Etheostoma erythrozonum, a new species of darter (Teleostei: Percidae) from the Meramec River drainage, Missouri';
    $names = ubio_findit($text);
    print_r($names);
}
//echo tag_all_names($names, $text);
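The body of tag_names() is cut off in this excerpt, so the following is only a minimal sketch of why binomials are tagged before uninomials. The helper name sketch_tag_names() and its plain-string name list are assumptions made for illustration; the real function takes the uBio result array and may work quite differently. The idea shown is that the longer name is wrapped first, and a lookahead then skips any match that already sits inside a [[...]] tag.

```php
<?php
// Hypothetical helper for illustration only; not the tag_names() from ubio_findit.php.
// $names is assumed to be a flat list of name strings.
function sketch_tag_names(array $names, string $text): string
{
    // Longest names first, so 'Philorhizus marggii' is wrapped before 'Philorhizus'.
    usort($names, function ($a, $b) {
        return strlen($b) <=> strlen($a);
    });

    foreach ($names as $name) {
        // The negative lookahead rejects a match that is followed by ']]' with no
        // '[' in between, i.e. a match that already lies inside a [[...]] tag.
        $pattern = '/\b' . preg_quote($name, '/') . '\b(?![^\[]*\]\])/u';
        $text = preg_replace($pattern, '[[' . $name . ']]', $text);
    }

    return $text;
}

$tagged = sketch_tag_names(
    array('Philorhizus', 'Philorhizus marggii'),
    'Philorhizus marggii n. sp. differs from other Philorhizus species'
);
echo $tagged, "\n";
```

If the uninomial were tagged first, the genus inside the binomial would be wrapped on its own and the full species name could no longer be matched as a unit.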
Code example #2
File: zootaxa.php Project: rdmpage/bioguid
    function Harvest()
    {
        global $debug;
        //echo "|" . $this->url . "|";
        $html = get($this->url);
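        // utf8_encode() is deprecated as of PHP 8.2; this is the equivalent ISO-8859-1 to UTF-8 conversion
        $html = mb_convert_encoding($html, 'UTF-8', 'ISO-8859-1');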
        //		echo $html;
        $html = str_replace("\n", "", $html);
        $html = str_replace("\r", "", $html);
        $html = str_replace("<p align=\"left\">", "\n<p align=\"left\">", $html);
        if (preg_match_all('/
		<p\\s+align="left">(.*)<\\/p>
		/x', $html, $matches, PREG_PATTERN_ORDER)) {
            if ($debug) {
                print_r($matches);
            }
            foreach ($matches[1] as $paragraph) {
                $m = array();
                $item = new stdClass();
                $item->authors = array();
                $item->title = 'Zootaxa';
                $item->issn = '1175-5326';
                // <b>2095</b>: 37-46 (<i>
                if (preg_match('/<b>(?<volume>[0-9]+)<\\/b>:\\s*(?<spage>[0-9]+)\\-(?<epage>[0-9]+)/', $paragraph, $m)) {
                    //print_r($m);
                    $item->volume = $m['volume'];
                    $item->spage = $m['spage'];
                    $item->epage = $m['epage'];
                }
                // authors
                if (preg_match('/<br>\\s*(?<authors>[A-Z]+(.*))<\\/font><br>/', $paragraph, $m)) {
                    //print_r($m);
                    $item->authorString = $m['authors'];
                    // clean
                    $a = trim($item->authorString);
                    // remove countries
                    $a = preg_replace('/\\([A-Za-z \\.]+\\)/', '', $a);
                    $a = preg_replace('/\\(Nouvelle\\-Caledonie\\)/', '', $a);
                    // protect suffix ("|" inside a character class is a literal, so it is dropped here)
                    $a = preg_replace('/, J[Rr]/', ' Jr', $a);
                    // remove punctuation
                    $a = str_replace(",", "|", $a);
                    $a = str_replace("&amp;", "|", $a);
                    //echo "a=$a\n";
                    $authors = explode("|", $a);
                    //print_r($authors);
                    foreach ($authors as $value) {
                        //array_push($item->authors, trim($auth));
                        $value = trim($value);
                        // Make nice
                        $value = mb_convert_case($value, MB_CASE_TITLE, mb_detect_encoding($value));
                        // Get parts of name
                        $parts = parse_name($value);
                        $author = new stdClass();
                        if (isset($parts['last'])) {
                            $author->lastname = $parts['last'];
                        }
                        if (isset($parts['suffix'])) {
                            $author->suffix = $parts['suffix'];
                        }
                        if (isset($parts['first'])) {
                            $author->forename = $parts['first'];
                            if (array_key_exists('middle', $parts)) {
                                $author->forename .= ' ' . $parts['middle'];
                            }
                        }
                        array_push($item->authors, $author);
                    }
                }
                // abstract
                if (preg_match('/<a href="(?<url>(.*))">Abstract/', $paragraph, $m)) {
                    //print_r($m);
                    $item->url = 'http://www.mapress.com/zootaxa/' . $m['url'];
                }
                // pdf
                if (preg_match('/<\\/font><a href="(?<url>(.*))">Full/', $paragraph, $m)) {
                    //print_r($m);
                    $item->pdf = 'http://www.mapress.com/zootaxa/' . $m['url'];
                }
                // access: anything not marked "subscription required" is treated as open access
                if (!preg_match('/subscription\\s+required/', $paragraph)) {
                    $item->availability = 'open access';
                }
                // date
                if (preg_match('/<i>(?<date>[0-9]+\\s+[A-Z][a-z]+(\\.)?\\s+[0-9]{4})<\\/i>/', $paragraph, $m)) {
                    //print_r($m);
                    $item->date = date("Y-m-d", strtotime($m['date']));
                    $item->year = date("Y", strtotime($m['date']));
                }
                // (11 <i>May 2009</i>)
                if (preg_match('/(?<date>[0-9]+\\s+<i>[A-Z][a-z]+(\\.)?\\s+[0-9]{4})<\\/i>/', $paragraph, $m)) {
                    $date = strip_tags($m['date']);
                    $item->date = date("Y-m-d", strtotime($date));
                    $item->year = date("Y", strtotime($date));
                }
                // title
                if (preg_match('/<font FACE="Times New Roman">(?<title>.*)<\\/b><br>/', $paragraph, $m)) {
                    //print_r($m);
                    $atitle = $m['title'];
                    // Some Zootaxa HTML relies on the implicit space between >< for spacing,
                    // which results in words being run together when tags are stripped.
                    $atitle = str_replace('><', '> <', $atitle);
                    $atitle = strip_tags($atitle);
                    // collapse runs of whitespace into a single space
                    $atitle = preg_replace('/\\s+/', ' ', $atitle);
                    $item->atitle = $atitle;
                }
                //print_r($item);
                // Store
                if (isset($item->atitle)) {
                    // ubio tags to extract taxonomic names and LSIDs
                    $names = ubio_findit($item->atitle);
                    $item->tags = array();
                    $item->tagids = array();
                    foreach ($names as $n) {
                        foreach ($n as $k => $v) {
                            switch ($k) {
                                case 'canonical':
                                    array_push($item->tags, $v);
                                    break;
                                case 'namebankID':
                                    array_push($item->tagids, 'urn:lsid:ubio.org:namebank:' . $v);
                                    break;
                                default:
                                    break;
                            }
                        }
                    }
                    // Store feed item
                    $feed_item = new stdClass();
                    $feed_item->title = $item->atitle;
                    $feed_item->link = $item->url;
                    $description = '';
                    $count = 0;
                    $num_authors = count($item->authors);
                    if ($num_authors > 0) {
                        foreach ($item->authors as $author) {
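                            // forename or lastname may be missing if parse_name() did not return that part
                            $description .= trim(($author->forename ?? '') . ' ' . ($author->lastname ?? ''));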
                            if (isset($author->suffix)) {
                                $description .= ' ' . $author->suffix;
                            }
                            $count++;
                            if ($count < $num_authors - 1) {
                                $description .= ', ';
                            } else {
                                if ($count < $num_authors) {
                                    $description .= ' and ';
                                }
                            }
                        }
                    }
                    $description .= '<br/>';
                    $description .= '<i>Zootaxa</i>' . ' <b>' . $item->volume . '</b> ' . $item->spage . '-' . $item->epage . ' [' . $item->date . ']' . '<br/>';
                    // tags
                    foreach ($item->tags as $tag) {
                        $description .= '<b>' . $tag . '</b><br/>';
                    }
                    $feed_item->description = $description;
                    $feed_item->id = $item->url;
                    $feed_item->created = $item->date;
                    $feed_item->payload = $item;
                    $this->StoreFeedItem($feed_item);
                }
                // to RDF 1.
            }
        }
    }
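The counter-based loop in Harvest() that builds the author line ("A, B and C") can be hard to follow at a glance. As a rough sketch, the same joining rule can be written as a small standalone function. The name sketch_join_authors() is hypothetical and does not appear in zootaxa.php, and it assumes the display names have already been assembled into plain strings.

```php
<?php
// Hypothetical helper for illustration only; not part of zootaxa.php.
// Joins display names as "A", "A and B", or "A, B and C".
function sketch_join_authors(array $names): string
{
    $names = array_values($names);
    $n = count($names);

    if ($n === 0) {
        return '';
    }
    if ($n === 1) {
        return $names[0];
    }

    // Everything except the last name is comma-separated; the last is joined with "and".
    $last = array_pop($names);
    return implode(', ', $names) . ' and ' . $last;
}

echo sketch_join_authors(array('Anna Smith', 'Bo Li', 'Carl Jones Jr')), "\n";
```

This yields the same separators as the loop above: a comma between all but the final pair, and " and " before the last author.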
Code example #3
File: ubio_findit.php Project: rdmpage/bioguid
            				$text = preg_replace($pattern, "$1[[$nameString]]", $text);
            			}
            		*/
        }
    }
    return $text;
}
function tag_all_names($names, $text)
{
    $text = '   ' . $text;
    // do binomials first
    $text = tag_names($names, $text, true);
    // do uninomials
    $text = tag_names($names, $text, false);
    $text = trim($text);
    return $text;
}
if (0) {
    // test
    $text = 'Philorhizus marggii n. sp. is described from Greece (southern Peloponnese). Type locality: Taygetos Massif, Profitis Illias, N 36°58’/E 022°21’, 2000-2400 m asl. Members of this micropterous species are distinguished from the other Philorhizus species occurring on the Balkans by habitus, the special colouration pattern of the elytra and the special construction of the internal sac of the median lobe. Illustrations of the habitus, the median lobe and its internal sac and a description of the habitat of the new species are presented. A key to all Philorhizus species known from Greece is given. Biogeographic notes on the distribution of micropterous Philorhizus species in the western Palaearctic realm are given. Philorhizus paulo Wrase, 1995 is recorded from France for the first time (East Pyrenees)';
    $text = 'The first comprehensive combined molecular and morphological phylogenetic analysis of the major groups of termites is presented. This was based on the analysis of three genes (cytochrome oxidase II, 12S and 28S) and worker characters for approximately 250 species of termites. Parsimony analysis of the aligned dataset showed that the monophyly of Hodotermitidae, Kalotermitidae and Termitidae were well supported, while Termopsidae and Rhinotermitidae were both paraphyletic on the estimated cladogram. Within Termitidae, the most diverse and ecologically most important family, the monophyly of Macrotermitinae, Foraminitermitinae, Apicotermitinae, Syntermitinae and Nasutitermitinae were all broadly supported, but Termitinae was paraphyletic. The pantropical genera Termes, Amitermes and Nasutitermes were all paraphyletic on the estimated cladogram, with at least 17 genera nested within Nasutitermes, given the presently accepted generic limits. Key biological features were mapped onto the cladogram. It was not possible to reconstruct the evolution of true workers unambiguously, as it was as parsimonious to assume a basal evolution of true workers and subsequent evolution of pseudergates, as to assume a basal condition of pseudergates and subsequent evolution of true workers. However, true workers were only found in species with either separate- or intermediate-type nests, so that the mapping of nest habit and worker type onto the cladogram were perfectly correlated. Feeding group evolution, however, showed a much more complex pattern, particularly within the Termitidae, where it proved impossible to estimate unambiguously the ancestral state within the family (which is associated with the loss of worker gut flagellates). 
However, one biologically plausible optimization implies an initial evolution from wood-feeding to fungus-growing, proposed as the ancestral condition within the Termitidae, followed by the very early evolution of soil-feeding and subsequent re-evolution of wood-feeding in numerous lineages.';
    $text = 'The family Kalotermitidae is redescribed. The subfamily names \'Electrotermitinae\' and \'Kalotermitinae\' are placed in synonymy. The fossil genus Eotermes is removed from the family Kalotermitidae and placed in the family Hodotermitidae. 2. Three hundred and fifty-three species, fossil and living, are classified into 24 genera. Of these 24 genera, the following eight are new: Postelectrotermes, Ceratokalotermes, Comatermes, Incisitermes, Marginitermes, Tauritermes, Bifiditermes, and Bicornitermes. The genera Pterotermes, Proneotermes, Allotermes, and Epicalotermes are resurrected. The genus name \'Proglyptotermes\' is relegated to synonymy. All the genera are described, and the generitype species are illustrated. 3. The generic classification is based on a constellation of conservative, adaptive, and regressed characters of both the imago and the soldier castes. 4. The phylogeny of the genera is discussed. The imago-nymph mandible indicates two main evolutionary lines. The first line is represented by the Proelectrotermes-Calcaritermes complex, and the second line by the Incisitermes-Cryptotermes complex. 5. Several cases of convergence are illustrated. In both the main lines of the family Kalotermitidae, the phragmotic head, the enlarged third antennal segment, and the slightly sclerotized median vein have all evolved independently many times. Also, the arolium has been convergently lost in many genera. 6. A discussion on conservative and regressed characters is included. Characters that show phylogenetic advancement or regression are also listed. 7. It is evident from the data on the hosts and Protozoa that the evolution of the genera of the Protozoa did not occur in conjunction with the evolution of the host genera and that the differentiation of the Protozoa genera took place before the differentiation of the host genera.';
    $text = 'Etheostoma erythrozonum, a new species of darter (Teleostei: Percidae) from the Meramec River drainage, Missouri';
    $names = ubio_findit($text);
    print_r($names);
}
//echo tag_all_names($names, $text);
if (0) {
    $names = ubio_findit('Eleutherodactylus altamazonicus');
    print_r($names);
}