Commit 86bbf12a authored by Mathieu Rodic

[FEATURE] Dates in parsing metadata - All dates are now formatted in FileParser

parent 5036bc48
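
For readers skimming the notebook output in the diff below, here is a minimal sketch of the kind of date normalization the commit message describes. It is not the code from FileParser.py. The function name and the "prefix + _month" key scheme are taken from the traceback shown in the old notebook output; everything else is an assumption made for illustration: that the date parts arrive as numeric strings, that missing parts default to the first month or day and to midnight, and that the normalized keys are the ones visible in the printed metadata.

```python
from datetime import datetime

# Suffixes seen in the notebook output (publication_year, publication_month, ...).
DATE_PARTS = ("year", "month", "day", "hour", "minute", "second")


def format_metadata_dates(metadata):
    """Return a copy of `metadata` with each '<prefix>_*' date family normalized.

    Illustrative sketch only: assumes numeric string values and fills
    missing parts with defaults (month/day -> 1, time parts -> 0).
    """
    result = dict(metadata)
    # A prefix such as 'publication' is recovered from a key like 'publication_year'.
    prefixes = {key[:-len("_year")] for key in metadata if key.endswith("_year")}
    for prefix in prefixes:
        defaults = {"month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0}
        parts = {}
        for part in DATE_PARTS:
            key = prefix + "_" + part
            # Read from `metadata`; the old traceback shows a NameError caused
            # by reading from an undefined name `value` at this step.
            parts[part] = int(metadata[key]) if key in metadata else defaults[part]
        date = datetime(**parts)
        result[prefix + "_date"] = date.strftime("%Y-%m-%d %H:%M:%S")
        result[prefix + "_year"] = "%04d" % date.year
        for part in DATE_PARTS[1:]:
            result[prefix + "_" + part] = "%02d" % getattr(date, part)
    return result


record = {"publication_year": "2010", "publication_month": "10", "publication_day": "26"}
normalized = format_metadata_dates(record)
print(normalized["publication_date"])
```

With the sample record above, the sketch yields the same shape as the first entry printed in the notebook: a 'publication_date' string plus zero-padded hour, minute and second fields. Month names (for example "OCT") are not handled here and would need a separate parsing step.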
{ {
"metadata": { "metadata": {
"name": "", "name": "",
"signature": "sha256:fdea95172a1e0072cc1f2a8f601b8abdd8aed5fbec5b600f2b29e57009dc8ef6" "signature": "sha256:01cc276bb358d5a00a128d39a90a02b0b45e8e9a43ce5670d06fd8b0657bdab5"
}, },
"nbformat": 3, "nbformat": 3,
"nbformat_minor": 0, "nbformat_minor": 0,
...@@ -9,36 +9,23 @@ ...@@ -9,36 +9,23 @@
{ {
"cells": [ "cells": [
{ {
"cell_type": "code", "cell_type": "heading",
"collapsed": false, "level": 1,
"input": [
"from parsing.FileParsers import *"
],
"language": "python",
"metadata": {}, "metadata": {},
"outputs": [], "source": [
"prompt_number": 1 "Testing if the ISI file parser works"
]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"collapsed": false, "collapsed": false,
"input": [ "input": [
"print(\"RE abcdefgh\\n\"[3:-1])\n", "from parsing.FileParsers import *"
"print(b\"english\".decode())"
], ],
"language": "python", "language": "python",
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [],
{ "prompt_number": 1
"output_type": "stream",
"stream": "stdout",
"text": [
"abcdefgh\n",
"english\n"
]
}
],
"prompt_number": 2
}, },
{ {
"cell_type": "code", "cell_type": "code",
...@@ -61,26 +48,46 @@ ...@@ -61,26 +48,46 @@
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
"ename": "NameError", "output_type": "stream",
"evalue": "name 'value' is not defined", "stream": "stdout",
"output_type": "pyerr", "text": [
"traceback": [ "{'publication_month': '10', 'authors': 'Rust, J, Singh, H, Rana, RS, McCann, T, Singh, L, Anderson, K, Sarkar, N, Nascimbene, PC, Stebner, F, Thomas, JC, Kraemer, MS, Williams, CJ, Engel, MS, Sahni, A, Grimaldi, D', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'Biogeographic and evolutionary implications of a diverse paleobiota in amber from the early Eocene of India', 'abstract': 'For nearly 100 million years, the India subcontinent drifted from Gondwana until its collision with Asia some 50 Ma, during which time the landmass presumably evolved a highly endemic biota. Recent excavations of rich outcrops of 50-52-million-year-old amber with diverse inclusions from the Cambay Shale of Gujarat, western India address this issue. Cambay amber occurs in lignitic and muddy sediments concentrated by near-shore chenier systems; its chemistry and the anatomy of associated fossil wood indicates a definitive source of Dipterocarpaceae. The amber is very partially polymerized and readily dissolves in organic solvents, thus allowing extraction of whole insects whose cuticle retains microscopic fidelity. Fourteen orders and more than 55 families and 100 species of arthropod inclusions have been discovered thus far, which have affinities to taxa from the Eocene of northern Europe, to the Recent of Australasia, and the Miocene to Recent of tropical America. Thus, India just prior to or immediately following contact shows little biological insularity. A significant diversity of eusocial insects are fossilized, including corbiculate bees, rhinotermitid termites, and modern subfamilies of ants (Formicidae), groups that apparently radiated during the contemporaneous Early Eocene Climatic Optimum or just prior to it during the Paleocene-Eocene Thermal Maximum. 
Cambay amber preserves a uniquely diverse and early biota of a modern-type of broad-leaf tropical forest, revealing 50 Ma of stasis and change in biological communities of the dipterocarp primary forests that dominate southeastern Asia today.', 'doi': '10.1073/pnas.1007407107', 'publication_day': '26', 'publication_date': '2010-10-26 00:00:00', 'publication_hour': '00', 'publication_second': '00'}\n",
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mNameError\u001b[0m Traceback (most recent call last)", "\n",
"\u001b[1;32m<ipython-input-4-785d3def061e>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mparser\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mparse\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "{'publication_month': '10', 'authors': 'Cornman, SR, Schatz, MC, Johnston, SJ, Chen, YP, Pettis, J, Hunt, G, Bourgeois, L, Elsik, C, Anderson, D, Grozinger, CM, Evans, JD', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'Genomic survey of the ectoparasitic mite Varroa destructor, a major pest of the honey bee Apis mellifera', 'abstract': 'Background: The ectoparasitic mite Varroa destructor has emerged as the primary pest of domestic honey bees (Apis mellifera). Here we present an initial survey of the V. destructor genome carried out to advance our understanding of Varroa biology and to identify new avenues for mite control. This sequence survey provides immediate resources for molecular and population-genetic analyses of Varroa-Apis interactions and defines the challenges ahead for a comprehensive Varroa genome project. Results: The genome size was estimated by flow cytometry to be 565 Mbp, larger than most sequenced insects but modest relative to some other Acari. Genomic DNA pooled from similar to 1,000 mites was sequenced to 4.3x coverage with 454 pyrosequencing. The 2.4 Gbp of sequencing reads were assembled into 184,094 contigs with an N50 of 2,262 bp, totaling 294 Mbp of sequence after filtering. Genic sequences with homology to other eukaryotic genomes were identified on 13,031 of these contigs, totaling 31.3 Mbp. Alignment of protein sequence blocks conserved among V. destructor and four other arthropod genomes indicated a higher level of sequence divergence within this mite lineage relative to the tick Ixodes scapularis. A number of microbes potentially associated with V. 
destructor were identified in the sequence survey, including similar to 300 Kbp of sequence deriving from one or more bacterial species of the Actinomycetales. The presence of this bacterium was confirmed in individual mites by PCR assay, but varied significantly by age and sex of mites. Fragments of a novel virus related to the Baculoviridae were also identified in the survey. The rate of single nucleotide polymorphisms (SNPs) in the pooled mites was estimated to be 6.2 x 10(-5)per bp, a low rate consistent with the historical demography and life history of the species. Conclusions: This survey has provided general tools for the research community and novel directions for investigating the biology and control of Varroa mites. Ongoing development of Varroa genomic resources will be a boon for comparative genomics of under-represented arthropods, and will further enhance the honey bee and its associated pathogens as a model system for studying host-pathogen interactions.', 'doi': '10.1186/1471-2164-11-602', 'publication_day': '25', 'publication_date': '2010-10-25 00:00:00', 'publication_hour': '00', 'publication_second': '00'}\n",
"\u001b[1;32m/home/mat/projects/gargantext/gargantext/parsing/FileParsers/RisFileParser.py\u001b[0m in \u001b[0;36mparse\u001b[1;34m(self, parentNode, tag)\u001b[0m\n\u001b[0;32m 32\u001b[0m \u001b[1;31m# guid = metadata[\"guid\"]\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 33\u001b[0m \u001b[1;31m# )\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 34\u001b[1;33m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mformat_metadata_dates\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mmetadata\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 35\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 36\u001b[0m \u001b[0mmetadata\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m{\u001b[0m\u001b[1;33m}\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\n",
"\u001b[1;32m/home/mat/projects/gargantext/gargantext/parsing/FileParsers/FileParser.py\u001b[0m in \u001b[0;36mformat_metadata_dates\u001b[1;34m(self, metadata)\u001b[0m\n\u001b[0;32m 155\u001b[0m \u001b[0mkey\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mprefix\u001b[0m \u001b[1;33m+\u001b[0m \u001b[1;34m\"_month\"\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 156\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mkey\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mmetadata\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 157\u001b[1;33m \u001b[0mdate_string\u001b[0m \u001b[1;33m+=\u001b[0m \u001b[1;34m\" \"\u001b[0m \u001b[1;33m+\u001b[0m \u001b[0mvalue\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mkey\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 158\u001b[0m \u001b[0mkey\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mprefix\u001b[0m \u001b[1;33m+\u001b[0m \u001b[1;34m\"_day\"\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 159\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mkey\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mmetadata\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "{'publication_month': '10', 'authors': 'Gadagkar, R', 'publication_day': '25', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'Sociobiology in turmoil again', 'publication_date': '2010-10-25 00:00:00', 'abstract': \"Altruism is defined as any behaviour that lowers the Darwinian fitness of the actor while increasing that of the recipient. Such altruism (especially in the form of lifetime sterility exhibited by sterile workers in eusocial insects such as ants, bees, wasps and termites) has long been considered a major difficulty for the theory of natural selection. In the 1960s W. D. Hamilton potentially solved this problem by defining a new measure of fitness that he called inclusive fitness, which also included the effect of an individual's action on the fitness of genetic relatives. 
This has come to be known as inclusive fitness theory, Hamilton's rule or kin selection. E. O. Wilson almost single-handedly popularized this new approach in the 1970s and thus helped create a large body of new empirical research and a large community of behavioural ecologists and kin selectionists. Adding thrill and drama to our otherwise sombre lives, Wilson is now leading a frontal attack on Hamilton's approach, claiming that the inclusive fitness theory is not as mathematically general as the standard natural selection theory, has led to no additional biological insights and should therefore be abandoned. The world cannot but sit up and take notice.\", 'publication_hour': '00', 'publication_second': '00'}\n",
"\u001b[1;31mNameError\u001b[0m: name 'value' is not defined" "\n",
"{'publication_month': '10', 'authors': 'Nemesio, A', 'publication_day': '25', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'The orchid-bee fauna (Hymenoptera: Apidae) of a forest remnant in northeastern Brazil, with new geographic records and an identification key to the known species of the Atlantic Forest of northeastern Brazil', 'publication_date': '2010-10-25 00:00:00', 'abstract': 'The orchid bee fauna of Estacao Ecologica de Murici (ESEC Murici), in the state of Alagoas, one of the largest remnants of the Atlantic Rain Forest in northeastern Brazil, was surveyed for the first time. Seven hundred and twenty-one orchid-bee males belonging to 17 species were collected from the 3(rd) to the 10(th) of September, 2009. Besides the recently described Eulaema (Apeulaema) felipei Nemesio, 2010, three other species recorded at ESEC Murici deserve further attention: Euglossa amazonica Dressler, 1982b, recorded for the first time outside the Amazon Basin; Euglossa milenae Bembe, 2007 and Euglossa analis Westwood, 1840, both recorded for the first time in the Atlantic Forest of northeastern Brazil north to Sao Francisco river. These results together with previous samplings in the state of Alagoas reveal that at least 22 orchid-bee species are now known to occur there. Three other species not recorded for Alagoas yet are known from the neighbor states of Sergipe, Pernambuco, and Paraiba. An identification key to all 25 species of Euglossina known to occur in the states of Alagoas, Sergipe, Pernambuco, Paraiba, and Rio Grande do Norte is provided.', 'publication_hour': '00', 'publication_second': '00'}\n",
"\n",
"{'publication_month': '10', 'authors': 'Rozen, JG', 'publication_day': '22', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'Immatures of the Old World Oil-Collecting Bee Ctenoplectra cornuta (Apoidea: Apidae: Apinae: Ctenoplectrini)', 'publication_date': '2010-10-22 00:00:00', 'abstract': 'The mature oocyte, all five larval instars, and the pupa of Ctenoplectra cornuta Gribodo are described based upon specimens from Taiwan. Its mature larva though larger is compared with, and found similar to, that of the African Ctenoplectra armata Magretti, the only other larval ctenoplectrine studied to date. The egg index was similar to that of the African C. albolimbata Magretti. Although Ctenoplectra shares certain larval and pupal similarities with Tetrapedia (Tetrapediini), a broader study including representatives of all apine tribes needs to be considered for evaluating tribal relationships.', 'publication_hour': '00', 'publication_second': '00'}\n",
"\n",
"{'publication_month': '10', 'authors': 'Maisonnasse, A, Lenoir, JC, Beslay, D, Crauser, D, Le Conte, Y', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'E-beta-Ocimene, a Volatile Brood Pheromone Involved in Social Regulation in the Honey Bee Colony (Apis mellifera)', 'abstract': 'Background: In honey bee colony, the brood is able to manipulate and chemically control the workers in order to sustain their own development. A brood ester pheromone produced primarily by old larvae (4 and 5 days old larvae) was first identified as acting as a contact pheromone with specific effects on nurses in the colony. More recently a new volatile brood pheromone has been identified: E-beta-ocimene, which partially inhibits ovary development in workers. Methodology and Principal Finding: Our analysis of E-beta-ocimene production revealed that young brood (newly hatched to 3 days old) produce the highest quantity of E-beta-ocimene relative to their body weight. By testing the potential action of this molecule as a non-specific larval signal, due to its high volatility in the colony, we demonstrated that in the presence of E-beta-ocimene nest workers start to forage earlier in life, as seen in the presence of real brood. Conclusions/Significance: In this way, young larvae are able to assign precedence to the task of foraging by workers in order to increase food stores for their own development. Thus, in the complexity of honey bee chemical communication, E-beta-ocimene, a pheromone of young larvae, provides the brood with the means to express their nutritional needs to the workers.', 'doi': '10.1371/journal.pone.0013531', 'publication_day': '21', 'publication_date': '2010-10-21 00:00:00', 'publication_hour': '00', 'publication_second': '00'}\n",
"\n",
"{'publication_month': '10', 'authors': 'Li, JK, Wu, J, Rundassa, DB, Song, FF, Zheng, AJ, Fang, Y', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'Differential Protein Expression in Honeybee (Apis mellifera L.) Larvae: Underlying Caste Differentiation', 'abstract': 'Honeybee (Apis mellifera) exhibits divisions in both morphology and reproduction. The queen is larger in size and fully developed sexually, while the worker bees are smaller in size and nearly infertile. To better understand the specific time and underlying molecular mechanisms of caste differentiation, the proteomic profiles of larvae intended to grow into queen and worker castes were compared at 72 and 120 hours using two dimensional electrophoresis (2-DE), network, enrichment and quantitative PCR analysis. There were significant differences in protein expression between the two larvae castes at 72 and 120 hours, suggesting the queen and the worker larvae have already decided their fate before 72 hours. Specifically, at 72 hours, queen intended larvae over-expressed transketolase, aldehyde reductase, and enolase proteins which are involved in carbohydrate metabolism and energy production, imaginal disc growth factor 4 which is a developmental related protein, long-chain-fatty-acid CoA ligase and proteasome subunit alpha type 5 which metabolize fatty and amino acids, while worker intended larvae over-expressed ATP synthase beta subunit, aldehyde dehydrogenase, thioredoxin peroxidase 1 and peroxiredoxin 2540, lethal (2) 37 and 14-3-3 protein epsilon, fatty acid binding protein, and translational controlled tumor protein. This differential protein expression between the two caste intended larvae was more pronounced at 120 hours, with particular significant differences in proteins associated with carbohydrate metabolism and energy production. 
Functional enrichment analysis suggests that carbohydrate metabolism and energy production and anti-oxidation proteins play major roles in the formation of caste divergence. The constructed network and validated gene expression identified target proteins for further functional study. This new finding is in contrast to the existing notion that 72 hour old larvae has bipotential and can develop into either queen or worker based on epigenetics and can help us to gain new insight into the time of departure as well as caste trajectory influencing elements at the molecular level.', 'doi': '10.1371/journal.pone.0013455', 'publication_day': '20', 'publication_date': '2010-10-20 00:00:00', 'publication_hour': '00', 'publication_second': '00'}\n",
"\n",
"{'publication_month': '10', 'authors': 'Ramirez, GP, Martinez, AS, Fernandez, VM, Bielsa, GC, Farina, WM', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'The Influence of Gustatory and Olfactory Experiences on Responsiveness to Reward in the Honeybee', 'abstract': 'Background: Honeybees (Apis mellifera) exhibit an extraordinarily tuned division of labor that depends on age polyethism. This adjustment is generally associated with the fact that individuals of different ages display different response thresholds to given stimuli, which determine specific behaviors. For instance, the sucrose-response threshold (SRT) which largely depends on genetic factors may also be affected by the nectar sugar content. However, it remains unknown whether SRTs in workers of different ages and tasks can differ depending on gustatory and olfactory experiences. Methodology: Groups of worker bees reared either in an artificial environment or else in a queen-right colony, were exposed to different reward conditions at different adult ages. Gustatory response scores (GRSs) and odor-memory retrieval were measured in bees that were previously exposed to changes in food characteristics. Principal Findings: Results show that the gustatory responses of pre-foraging-aged bees are affected by changes in sucrose solution concentration and also to the presence of an odor provided it is presented as scented sucrose solution. In contrast no differences in worker responses were observed when presented with odor only in the rearing environment. Fast modulation of GRSs was observed in older bees (12-16 days of age) which are commonly involved in food processing tasks within the hive, while slower modulation times were observed in younger bees (commonly nurse bees, 6-9 days of age). This suggests that older food-processing bees have a higher plasticity when responding to fluctuations in resource information than younger hive bees. 
Adjustments in the number of trophallaxis events were also found when scented food circulated inside the nest, and this was positively correlated with the differences in timing observed in gustatory responsiveness and memory retention for hive bees of different age classes. Conclusions: This work demonstrates the accessibility of chemosensory information in the honeybee colonies with respect to incoming nectar. The modulation of the sensory-response systems within the hive can have important effects on the dynamics of food transfer and information propagation.', 'doi': '10.1371/journal.pone.0013498', 'publication_day': '20', 'publication_date': '2010-10-20 00:00:00', 'publication_hour': '00', 'publication_second': '00'}\n",
"\n",
"{'publication_month': '10', 'authors': 'Munch, D, Baker, N, Kreibich, CD, Braten, AT, Amdam, GV', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'In the Laboratory and during Free-Flight: Old Honey Bees Reveal Learning and Extinction Deficits that Mirror Mammalian Functional Decline', 'abstract': 'Loss of brain function is one of the most negative and feared aspects of aging. Studies of invertebrates have taught us much about the physiology of aging and how this progression may be slowed. Yet, how aging affects complex brain functions, e.g., the ability to acquire new memory when previous experience is no longer valid, is an almost exclusive question of studies in humans and mammalian models. In these systems, age related cognitive disorders are assessed through composite paradigms that test different performance tasks in the same individual. Such studies could demonstrate that afflicted individuals show the loss of several and often-diverse memory faculties, and that performance usually varies more between aged individuals, as compared to conspecifics from younger groups. No comparable composite surveying approaches are established yet for invertebrate models in aging research. Here we test whether an insect can share patterns of decline similar to those that are commonly observed during mammalian brain aging. Using honey bees, we combine restrained learning with free-flight assays. We demonstrate that reduced olfactory learning performance correlates with a reduced ability to extinguish the spatial memory of an abandoned nest location ( spatial memory extinction). Adding to this, we show that learning performance is more variable in old honey bees. 
Taken together, our findings point to generic features of brain aging and provide the prerequisites to model individual aspects of learning dysfunction with insect models.', 'doi': '10.1371/journal.pone.0013504', 'publication_day': '19', 'publication_date': '2010-10-19 00:00:00', 'publication_hour': '00', 'publication_second': '00'}\n",
"\n"
] ]
} }
], ],
"prompt_number": 4 "prompt_number": 4
}, },
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Basic tests"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"collapsed": false, "collapsed": false,
"input": [ "input": [
"print(\"publication_date\"[-5:])\n", "print(\"RE abcdefgh\\n\"[3:-1])\n",
"print(\"publication_date\"[:-5])" "print(b\"english\".decode())"
], ],
"language": "python", "language": "python",
"metadata": {}, "metadata": {},
...@@ -89,12 +96,23 @@ ...@@ -89,12 +96,23 @@
"output_type": "stream", "output_type": "stream",
"stream": "stdout", "stream": "stdout",
"text": [ "text": [
"_date\n", "abcdefgh\n",
"publication\n" "english\n"
] ]
} }
], ],
"prompt_number": 10 "prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print(\"publication_date\"[-5:])\n",
"print(\"publication_date\"[:-5])"
],
"language": "python",
"metadata": {},
"outputs": []
}, },
{ {
"cell_type": "code", "cell_type": "code",
...@@ -104,8 +122,7 @@ ...@@ -104,8 +122,7 @@
], ],
"language": "python", "language": "python",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": []
"prompt_number": 11
}, },
{ {
"cell_type": "code", "cell_type": "code",
...@@ -115,8 +132,7 @@ ...@@ -115,8 +132,7 @@
], ],
"language": "python", "language": "python",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": []
"prompt_number": 25
}, },
{ {
"cell_type": "code", "cell_type": "code",
...@@ -126,17 +142,7 @@ ...@@ -126,17 +142,7 @@
], ],
"language": "python", "language": "python",
"metadata": {}, "metadata": {},
"outputs": [ "outputs": []
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 33,
"text": [
"'2014-10-11 01:02:03'"
]
}
],
"prompt_number": 33
}, },
{ {
"cell_type": "code", "cell_type": "code",
...@@ -146,8 +152,7 @@ ...@@ -146,8 +152,7 @@
], ],
"language": "python", "language": "python",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": []
"prompt_number": 34
}, },
{ {
"cell_type": "code", "cell_type": "code",
...@@ -157,17 +162,7 @@ ...@@ -157,17 +162,7 @@
], ],
"language": "python", "language": "python",
"metadata": {}, "metadata": {},
"outputs": [ "outputs": []
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 36,
"text": [
"'01'"
]
}
],
"prompt_number": 36
} }
], ],
"metadata": {} "metadata": {}
......
{
"metadata": {
"name": "",
"signature": "sha256:e0c3b2efe7c205a29dc4e028b10ffb7b9d0569f35c4b426febdf523069abffdb"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": [
"from pprint import pprint\n",
"from node.models import Node, NodeType, Language, Ngram\n",
"from django.contrib.auth.models import User\n",
"import parsing\n",
"from parsing.FileParsers import *"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Define user\n",
"try:\n",
" user = User.objects.get(username='Mat')\n",
"except:\n",
" user = User(username='Mat', password='0123', email='mathieu@rodic.fr')\n",
" user.save()\n",
"\n",
"# Define document types\n",
"nodetypes = {}\n",
"for name in ['Corpus', 'Document']:\n",
" try:\n",
" nodetypes[name] = NodeType.objects.get(name=name)\n",
" except:\n",
" nodetypes[name] = NodeType(name=name)\n",
" nodetypes[name].save()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Node.objects.all().delete()\n",
"corpus = Node(name='PubMed corpus', user=user, type=nodetypes['Corpus'])\n",
"corpus.save()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"fileparser = PubmedFileParser.PubmedFileParser(file='/home/mat/projects/gargantext/data_samples/pubmed.zip')"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"fileparser.parse(corpus)\n",
"print('Ok!')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Warning: parsing empty text\n",
"Warning: parsing empty text\n",
"Warning: parsing empty text"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Warning: parsing empty text\n",
"Warning: parsing empty text"
]
}
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for node_ngram in corpus.children.first().node_ngram_set.all():\n",
" print(node_ngram.ngram.terms)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
\ No newline at end of file
{ {
"metadata": { "metadata": {
"name": "", "name": "",
"signature": "sha256:1c591bade5ee27302bd6e59b899b047a82733a9529e793ccc0f33e2ea7d73c8a" "signature": "sha256:eac7c9b22e240bb0ef6d0aeec21261194d84a3f0ba53cd02af69f80d30ec5a17"
}, },
"nbformat": 3, "nbformat": 3,
"nbformat_minor": 0, "nbformat_minor": 0,
...@@ -9,36 +9,23 @@ ...@@ -9,36 +9,23 @@
{ {
"cells": [ "cells": [
{ {
"cell_type": "code", "cell_type": "heading",
"collapsed": false, "level": 1,
"input": [
"from parsing.FileParsers import *"
],
"language": "python",
"metadata": {}, "metadata": {},
"outputs": [], "source": [
"prompt_number": 1 "Testing if the ISI file parser works"
]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"collapsed": false, "collapsed": false,
"input": [ "input": [
"print(\"RE abcdefgh\\n\"[3:-1])\n", "from parsing.FileParsers import *"
"print(b\"english\".decode())"
], ],
"language": "python", "language": "python",
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [],
{ "prompt_number": 1
"output_type": "stream",
"stream": "stdout",
"text": [
"abcdefgh\n",
"english\n"
]
}
],
"prompt_number": 2
}, },
{ {
"cell_type": "code", "cell_type": "code",
...@@ -49,7 +36,7 @@ ...@@ -49,7 +36,7 @@
"language": "python", "language": "python",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"prompt_number": 3 "prompt_number": 2
}, },
{ {
"cell_type": "code", "cell_type": "code",
...@@ -64,28 +51,57 @@ ...@@ -64,28 +51,57 @@
"output_type": "stream", "output_type": "stream",
"stream": "stdout", "stream": "stdout",
"text": [ "text": [
"{'publication_month': '10', 'authors': 'Rust, J, Singh, H, Rana, RS, McCann, T, Singh, L, Anderson, K, Sarkar, N, Nascimbene, PC, Stebner, F, Thomas, JC, Kraemer, MS, Williams, CJ, Engel, MS, Sahni, A, Grimaldi, D', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'Biogeographic and evolutionary implications of a diverse paleobiota in amber from the early Eocene of India', 'abstract': 'For nearly 100 million years, the India subcontinent drifted from Gondwana until its collision with Asia some 50 Ma, during which time the landmass presumably evolved a highly endemic biota. Recent excavations of rich outcrops of 50-52-million-year-old amber with diverse inclusions from the Cambay Shale of Gujarat, western India address this issue. Cambay amber occurs in lignitic and muddy sediments concentrated by near-shore chenier systems; its chemistry and the anatomy of associated fossil wood indicates a definitive source of Dipterocarpaceae. The amber is very partially polymerized and readily dissolves in organic solvents, thus allowing extraction of whole insects whose cuticle retains microscopic fidelity. Fourteen orders and more than 55 families and 100 species of arthropod inclusions have been discovered thus far, which have affinities to taxa from the Eocene of northern Europe, to the Recent of Australasia, and the Miocene to Recent of tropical America. Thus, India just prior to or immediately following contact shows little biological insularity. A significant diversity of eusocial insects are fossilized, including corbiculate bees, rhinotermitid termites, and modern subfamilies of ants (Formicidae), groups that apparently radiated during the contemporaneous Early Eocene Climatic Optimum or just prior to it during the Paleocene-Eocene Thermal Maximum. 
Cambay amber preserves a uniquely diverse and early biota of a modern-type of broad-leaf tropical forest, revealing 50 Ma of stasis and change in biological communities of the dipterocarp primary forests that dominate southeastern Asia today.', 'doi': '10.1073/pnas.1007407107', 'publication_day': '26', 'publication_date': '2010-10-26 00:00:00', 'publication_hour': '00', 'publication_second': '00'}\n", "{'title': 'Biogeographic and evolutionary implications of a diverse paleobiota in amber from the early Eocene of India', 'authors': 'Rust, J, Singh, H, Rana, RS, McCann, T, Singh, L, Anderson, K, Sarkar, N, Nascimbene, PC, Stebner, F, Thomas, JC, Kraemer, MS, Williams, CJ, Engel, MS, Sahni, A, Grimaldi, D', 'publication_year': '2010', 'publication_day': '26', 'doi': '10.1073/pnas.1007407107', 'abstract': 'For nearly 100 million years, the India subcontinent drifted from Gondwana until its collision with Asia some 50 Ma, during which time the landmass presumably evolved a highly endemic biota. Recent excavations of rich outcrops of 50-52-million-year-old amber with diverse inclusions from the Cambay Shale of Gujarat, western India address this issue. Cambay amber occurs in lignitic and muddy sediments concentrated by near-shore chenier systems; its chemistry and the anatomy of associated fossil wood indicates a definitive source of Dipterocarpaceae. The amber is very partially polymerized and readily dissolves in organic solvents, thus allowing extraction of whole insects whose cuticle retains microscopic fidelity. Fourteen orders and more than 55 families and 100 species of arthropod inclusions have been discovered thus far, which have affinities to taxa from the Eocene of northern Europe, to the Recent of Australasia, and the Miocene to Recent of tropical America. Thus, India just prior to or immediately following contact shows little biological insularity. 
A significant diversity of eusocial insects are fossilized, including corbiculate bees, rhinotermitid termites, and modern subfamilies of ants (Formicidae), groups that apparently radiated during the contemporaneous Early Eocene Climatic Optimum or just prior to it during the Paleocene-Eocene Thermal Maximum. Cambay amber preserves a uniquely diverse and early biota of a modern-type of broad-leaf tropical forest, revealing 50 Ma of stasis and change in biological communities of the dipterocarp primary forests that dominate southeastern Asia today.', 'fields': 'Multidisciplinary Sciences', 'publication_minute': '00', 'publication_month': '10', 'publication_hour': '00', 'publication_date': '2010-10-26 00:00:00', 'publication_second': '00', 'language': 'English'}\n",
"\n", "\n",
"{'publication_month': '10', 'authors': 'Cornman, SR, Schatz, MC, Johnston, SJ, Chen, YP, Pettis, J, Hunt, G, Bourgeois, L, Elsik, C, Anderson, D, Grozinger, CM, Evans, JD', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'Genomic survey of the ectoparasitic mite Varroa destructor, a major pest of the honey bee Apis mellifera', 'abstract': 'Background: The ectoparasitic mite Varroa destructor has emerged as the primary pest of domestic honey bees (Apis mellifera). Here we present an initial survey of the V. destructor genome carried out to advance our understanding of Varroa biology and to identify new avenues for mite control. This sequence survey provides immediate resources for molecular and population-genetic analyses of Varroa-Apis interactions and defines the challenges ahead for a comprehensive Varroa genome project. Results: The genome size was estimated by flow cytometry to be 565 Mbp, larger than most sequenced insects but modest relative to some other Acari. Genomic DNA pooled from similar to 1,000 mites was sequenced to 4.3x coverage with 454 pyrosequencing. The 2.4 Gbp of sequencing reads were assembled into 184,094 contigs with an N50 of 2,262 bp, totaling 294 Mbp of sequence after filtering. Genic sequences with homology to other eukaryotic genomes were identified on 13,031 of these contigs, totaling 31.3 Mbp. Alignment of protein sequence blocks conserved among V. destructor and four other arthropod genomes indicated a higher level of sequence divergence within this mite lineage relative to the tick Ixodes scapularis. A number of microbes potentially associated with V. destructor were identified in the sequence survey, including similar to 300 Kbp of sequence deriving from one or more bacterial species of the Actinomycetales. The presence of this bacterium was confirmed in individual mites by PCR assay, but varied significantly by age and sex of mites. 
Fragments of a novel virus related to the Baculoviridae were also identified in the survey. The rate of single nucleotide polymorphisms (SNPs) in the pooled mites was estimated to be 6.2 x 10(-5)per bp, a low rate consistent with the historical demography and life history of the species. Conclusions: This survey has provided general tools for the research community and novel directions for investigating the biology and control of Varroa mites. Ongoing development of Varroa genomic resources will be a boon for comparative genomics of under-represented arthropods, and will further enhance the honey bee and its associated pathogens as a model system for studying host-pathogen interactions.', 'doi': '10.1186/1471-2164-11-602', 'publication_day': '25', 'publication_date': '2010-10-25 00:00:00', 'publication_hour': '00', 'publication_second': '00'}\n", "{'title': 'Genomic survey of the ectoparasitic mite Varroa destructor, a major pest of the honey bee Apis mellifera', 'authors': 'Cornman, SR, Schatz, MC, Johnston, SJ, Chen, YP, Pettis, J, Hunt, G, Bourgeois, L, Elsik, C, Anderson, D, Grozinger, CM, Evans, JD', 'publication_year': '2010', 'publication_day': '25', 'doi': '10.1186/1471-2164-11-602', 'abstract': 'Background: The ectoparasitic mite Varroa destructor has emerged as the primary pest of domestic honey bees (Apis mellifera). Here we present an initial survey of the V. destructor genome carried out to advance our understanding of Varroa biology and to identify new avenues for mite control. This sequence survey provides immediate resources for molecular and population-genetic analyses of Varroa-Apis interactions and defines the challenges ahead for a comprehensive Varroa genome project. Results: The genome size was estimated by flow cytometry to be 565 Mbp, larger than most sequenced insects but modest relative to some other Acari. Genomic DNA pooled from similar to 1,000 mites was sequenced to 4.3x coverage with 454 pyrosequencing. 
The 2.4 Gbp of sequencing reads were assembled into 184,094 contigs with an N50 of 2,262 bp, totaling 294 Mbp of sequence after filtering. Genic sequences with homology to other eukaryotic genomes were identified on 13,031 of these contigs, totaling 31.3 Mbp. Alignment of protein sequence blocks conserved among V. destructor and four other arthropod genomes indicated a higher level of sequence divergence within this mite lineage relative to the tick Ixodes scapularis. A number of microbes potentially associated with V. destructor were identified in the sequence survey, including similar to 300 Kbp of sequence deriving from one or more bacterial species of the Actinomycetales. The presence of this bacterium was confirmed in individual mites by PCR assay, but varied significantly by age and sex of mites. Fragments of a novel virus related to the Baculoviridae were also identified in the survey. The rate of single nucleotide polymorphisms (SNPs) in the pooled mites was estimated to be 6.2 x 10(-5)per bp, a low rate consistent with the historical demography and life history of the species. Conclusions: This survey has provided general tools for the research community and novel directions for investigating the biology and control of Varroa mites. Ongoing development of Varroa genomic resources will be a boon for comparative genomics of under-represented arthropods, and will further enhance the honey bee and its associated pathogens as a model system for studying host-pathogen interactions.', 'fields': 'Biotechnology & Applied Microbiology; Genetics & Heredity', 'publication_minute': '00', 'publication_month': '10', 'publication_hour': '00', 'publication_date': '2010-10-25 00:00:00', 'publication_second': '00', 'language': 'English'}\n",
"\n", "\n",
"{'publication_month': '10', 'authors': 'Gadagkar, R', 'publication_day': '25', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'Sociobiology in turmoil again', 'publication_date': '2010-10-25 00:00:00', 'abstract': \"Altruism is defined as any behaviour that lowers the Darwinian fitness of the actor while increasing that of the recipient. Such altruism (especially in the form of lifetime sterility exhibited by sterile workers in eusocial insects such as ants, bees, wasps and termites) has long been considered a major difficulty for the theory of natural selection. In the 1960s W. D. Hamilton potentially solved this problem by defining a new measure of fitness that he called inclusive fitness, which also included the effect of an individual's action on the fitness of genetic relatives. This has come to be known as inclusive fitness theory, Hamilton's rule or kin selection. E. O. Wilson almost single-handedly popularized this new approach in the 1970s and thus helped create a large body of new empirical research and a large community of behavioural ecologists and kin selectionists. Adding thrill and drama to our otherwise sombre lives, Wilson is now leading a frontal attack on Hamilton's approach, claiming that the inclusive fitness theory is not as mathematically general as the standard natural selection theory, has led to no additional biological insights and should therefore be abandoned. The world cannot but sit up and take notice.\", 'publication_hour': '00', 'publication_second': '00'}\n", "{'authors': 'Gadagkar, R', 'publication_year': '2010', 'publication_day': '25', 'title': 'Sociobiology in turmoil again', 'abstract': \"Altruism is defined as any behaviour that lowers the Darwinian fitness of the actor while increasing that of the recipient. 
Such altruism (especially in the form of lifetime sterility exhibited by sterile workers in eusocial insects such as ants, bees, wasps and termites) has long been considered a major difficulty for the theory of natural selection. In the 1960s W. D. Hamilton potentially solved this problem by defining a new measure of fitness that he called inclusive fitness, which also included the effect of an individual's action on the fitness of genetic relatives. This has come to be known as inclusive fitness theory, Hamilton's rule or kin selection. E. O. Wilson almost single-handedly popularized this new approach in the 1970s and thus helped create a large body of new empirical research and a large community of behavioural ecologists and kin selectionists. Adding thrill and drama to our otherwise sombre lives, Wilson is now leading a frontal attack on Hamilton's approach, claiming that the inclusive fitness theory is not as mathematically general as the standard natural selection theory, has led to no additional biological insights and should therefore be abandoned. The world cannot but sit up and take notice.\", 'fields': 'Multidisciplinary Sciences', 'publication_minute': '00', 'publication_month': '10', 'publication_hour': '00', 'publication_date': '2010-10-25 00:00:00', 'publication_second': '00', 'language': 'English'}\n",
"\n", "\n",
"{'publication_month': '10', 'authors': 'Nemesio, A', 'publication_day': '25', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'The orchid-bee fauna (Hymenoptera: Apidae) of a forest remnant in northeastern Brazil, with new geographic records and an identification key to the known species of the Atlantic Forest of northeastern Brazil', 'publication_date': '2010-10-25 00:00:00', 'abstract': 'The orchid bee fauna of Estacao Ecologica de Murici (ESEC Murici), in the state of Alagoas, one of the largest remnants of the Atlantic Rain Forest in northeastern Brazil, was surveyed for the first time. Seven hundred and twenty-one orchid-bee males belonging to 17 species were collected from the 3(rd) to the 10(th) of September, 2009. Besides the recently described Eulaema (Apeulaema) felipei Nemesio, 2010, three other species recorded at ESEC Murici deserve further attention: Euglossa amazonica Dressler, 1982b, recorded for the first time outside the Amazon Basin; Euglossa milenae Bembe, 2007 and Euglossa analis Westwood, 1840, both recorded for the first time in the Atlantic Forest of northeastern Brazil north to Sao Francisco river. These results together with previous samplings in the state of Alagoas reveal that at least 22 orchid-bee species are now known to occur there. Three other species not recorded for Alagoas yet are known from the neighbor states of Sergipe, Pernambuco, and Paraiba. 
An identification key to all 25 species of Euglossina known to occur in the states of Alagoas, Sergipe, Pernambuco, Paraiba, and Rio Grande do Norte is provided.', 'publication_hour': '00', 'publication_second': '00'}\n", "{'authors': 'Nemesio, A', 'publication_year': '2010', 'publication_day': '25', 'title': 'The orchid-bee fauna (Hymenoptera: Apidae) of a forest remnant in northeastern Brazil, with new geographic records and an identification key to the known species of the Atlantic Forest of northeastern Brazil', 'abstract': 'The orchid bee fauna of Estacao Ecologica de Murici (ESEC Murici), in the state of Alagoas, one of the largest remnants of the Atlantic Rain Forest in northeastern Brazil, was surveyed for the first time. Seven hundred and twenty-one orchid-bee males belonging to 17 species were collected from the 3(rd) to the 10(th) of September, 2009. Besides the recently described Eulaema (Apeulaema) felipei Nemesio, 2010, three other species recorded at ESEC Murici deserve further attention: Euglossa amazonica Dressler, 1982b, recorded for the first time outside the Amazon Basin; Euglossa milenae Bembe, 2007 and Euglossa analis Westwood, 1840, both recorded for the first time in the Atlantic Forest of northeastern Brazil north to Sao Francisco river. These results together with previous samplings in the state of Alagoas reveal that at least 22 orchid-bee species are now known to occur there. Three other species not recorded for Alagoas yet are known from the neighbor states of Sergipe, Pernambuco, and Paraiba. An identification key to all 25 species of Euglossina known to occur in the states of Alagoas, Sergipe, Pernambuco, Paraiba, and Rio Grande do Norte is provided.', 'fields': 'Zoology', 'publication_minute': '00', 'publication_month': '10', 'publication_hour': '00', 'publication_date': '2010-10-25 00:00:00', 'publication_second': '00', 'language': 'English'}\n",
"\n", "\n",
"{'publication_month': '10', 'authors': 'Rozen, JG', 'publication_day': '22', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'Immatures of the Old World Oil-Collecting Bee Ctenoplectra cornuta (Apoidea: Apidae: Apinae: Ctenoplectrini)', 'publication_date': '2010-10-22 00:00:00', 'abstract': 'The mature oocyte, all five larval instars, and the pupa of Ctenoplectra cornuta Gribodo are described based upon specimens from Taiwan. Its mature larva though larger is compared with, and found similar to, that of the African Ctenoplectra armata Magretti, the only other larval ctenoplectrine studied to date. The egg index was similar to that of the African C. albolimbata Magretti. Although Ctenoplectra shares certain larval and pupal similarities with Tetrapedia (Tetrapediini), a broader study including representatives of all apine tribes needs to be considered for evaluating tribal relationships.', 'publication_hour': '00', 'publication_second': '00'}\n", "{'authors': 'Rozen, JG', 'publication_year': '2010', 'publication_day': '22', 'title': 'Immatures of the Old World Oil-Collecting Bee Ctenoplectra cornuta (Apoidea: Apidae: Apinae: Ctenoplectrini)', 'abstract': 'The mature oocyte, all five larval instars, and the pupa of Ctenoplectra cornuta Gribodo are described based upon specimens from Taiwan. Its mature larva though larger is compared with, and found similar to, that of the African Ctenoplectra armata Magretti, the only other larval ctenoplectrine studied to date. The egg index was similar to that of the African C. albolimbata Magretti. 
Although Ctenoplectra shares certain larval and pupal similarities with Tetrapedia (Tetrapediini), a broader study including representatives of all apine tribes needs to be considered for evaluating tribal relationships.', 'fields': 'Biodiversity Conservation; Zoology', 'publication_minute': '00', 'publication_month': '10', 'publication_hour': '00', 'publication_date': '2010-10-22 00:00:00', 'publication_second': '00', 'language': 'English'}\n",
"\n", "\n",
"{'publication_month': '10', 'authors': 'Maisonnasse, A, Lenoir, JC, Beslay, D, Crauser, D, Le Conte, Y', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'E-beta-Ocimene, a Volatile Brood Pheromone Involved in Social Regulation in the Honey Bee Colony (Apis mellifera)', 'abstract': 'Background: In honey bee colony, the brood is able to manipulate and chemically control the workers in order to sustain their own development. A brood ester pheromone produced primarily by old larvae (4 and 5 days old larvae) was first identified as acting as a contact pheromone with specific effects on nurses in the colony. More recently a new volatile brood pheromone has been identified: E-beta-ocimene, which partially inhibits ovary development in workers. Methodology and Principal Finding: Our analysis of E-beta-ocimene production revealed that young brood (newly hatched to 3 days old) produce the highest quantity of E-beta-ocimene relative to their body weight. By testing the potential action of this molecule as a non-specific larval signal, due to its high volatility in the colony, we demonstrated that in the presence of E-beta-ocimene nest workers start to forage earlier in life, as seen in the presence of real brood. Conclusions/Significance: In this way, young larvae are able to assign precedence to the task of foraging by workers in order to increase food stores for their own development. 
Thus, in the complexity of honey bee chemical communication, E-beta-ocimene, a pheromone of young larvae, provides the brood with the means to express their nutritional needs to the workers.', 'doi': '10.1371/journal.pone.0013531', 'publication_day': '21', 'publication_date': '2010-10-21 00:00:00', 'publication_hour': '00', 'publication_second': '00'}\n", "{'title': 'E-beta-Ocimene, a Volatile Brood Pheromone Involved in Social Regulation in the Honey Bee Colony (Apis mellifera)', 'authors': 'Maisonnasse, A, Lenoir, JC, Beslay, D, Crauser, D, Le Conte, Y', 'publication_year': '2010', 'publication_day': '21', 'doi': '10.1371/journal.pone.0013531', 'abstract': 'Background: In honey bee colony, the brood is able to manipulate and chemically control the workers in order to sustain their own development. A brood ester pheromone produced primarily by old larvae (4 and 5 days old larvae) was first identified as acting as a contact pheromone with specific effects on nurses in the colony. More recently a new volatile brood pheromone has been identified: E-beta-ocimene, which partially inhibits ovary development in workers. Methodology and Principal Finding: Our analysis of E-beta-ocimene production revealed that young brood (newly hatched to 3 days old) produce the highest quantity of E-beta-ocimene relative to their body weight. By testing the potential action of this molecule as a non-specific larval signal, due to its high volatility in the colony, we demonstrated that in the presence of E-beta-ocimene nest workers start to forage earlier in life, as seen in the presence of real brood. Conclusions/Significance: In this way, young larvae are able to assign precedence to the task of foraging by workers in order to increase food stores for their own development. 
Thus, in the complexity of honey bee chemical communication, E-beta-ocimene, a pheromone of young larvae, provides the brood with the means to express their nutritional needs to the workers.', 'fields': 'Multidisciplinary Sciences', 'publication_minute': '00', 'publication_month': '10', 'publication_hour': '00', 'publication_date': '2010-10-21 00:00:00', 'publication_second': '00', 'language': 'English'}\n",
"\n", "\n",
"{'publication_month': '10', 'authors': 'Li, JK, Wu, J, Rundassa, DB, Song, FF, Zheng, AJ, Fang, Y', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'Differential Protein Expression in Honeybee (Apis mellifera L.) Larvae: Underlying Caste Differentiation', 'abstract': 'Honeybee (Apis mellifera) exhibits divisions in both morphology and reproduction. The queen is larger in size and fully developed sexually, while the worker bees are smaller in size and nearly infertile. To better understand the specific time and underlying molecular mechanisms of caste differentiation, the proteomic profiles of larvae intended to grow into queen and worker castes were compared at 72 and 120 hours using two dimensional electrophoresis (2-DE), network, enrichment and quantitative PCR analysis. There were significant differences in protein expression between the two larvae castes at 72 and 120 hours, suggesting the queen and the worker larvae have already decided their fate before 72 hours. Specifically, at 72 hours, queen intended larvae over-expressed transketolase, aldehyde reductase, and enolase proteins which are involved in carbohydrate metabolism and energy production, imaginal disc growth factor 4 which is a developmental related protein, long-chain-fatty-acid CoA ligase and proteasome subunit alpha type 5 which metabolize fatty and amino acids, while worker intended larvae over-expressed ATP synthase beta subunit, aldehyde dehydrogenase, thioredoxin peroxidase 1 and peroxiredoxin 2540, lethal (2) 37 and 14-3-3 protein epsilon, fatty acid binding protein, and translational controlled tumor protein. This differential protein expression between the two caste intended larvae was more pronounced at 120 hours, with particular significant differences in proteins associated with carbohydrate metabolism and energy production. 
Functional enrichment analysis suggests that carbohydrate metabolism and energy production and anti-oxidation proteins play major roles in the formation of caste divergence. The constructed network and validated gene expression identified target proteins for further functional study. This new finding is in contrast to the existing notion that 72 hour old larvae has bipotential and can develop into either queen or worker based on epigenetics and can help us to gain new insight into the time of departure as well as caste trajectory influencing elements at the molecular level.', 'doi': '10.1371/journal.pone.0013455', 'publication_day': '20', 'publication_date': '2010-10-20 00:00:00', 'publication_hour': '00', 'publication_second': '00'}\n", "{'title': 'Differential Protein Expression in Honeybee (Apis mellifera L.) Larvae: Underlying Caste Differentiation', 'authors': 'Li, JK, Wu, J, Rundassa, DB, Song, FF, Zheng, AJ, Fang, Y', 'publication_year': '2010', 'publication_day': '20', 'doi': '10.1371/journal.pone.0013455', 'abstract': 'Honeybee (Apis mellifera) exhibits divisions in both morphology and reproduction. The queen is larger in size and fully developed sexually, while the worker bees are smaller in size and nearly infertile. To better understand the specific time and underlying molecular mechanisms of caste differentiation, the proteomic profiles of larvae intended to grow into queen and worker castes were compared at 72 and 120 hours using two dimensional electrophoresis (2-DE), network, enrichment and quantitative PCR analysis. There were significant differences in protein expression between the two larvae castes at 72 and 120 hours, suggesting the queen and the worker larvae have already decided their fate before 72 hours. 
Specifically, at 72 hours, queen intended larvae over-expressed transketolase, aldehyde reductase, and enolase proteins which are involved in carbohydrate metabolism and energy production, imaginal disc growth factor 4 which is a developmental related protein, long-chain-fatty-acid CoA ligase and proteasome subunit alpha type 5 which metabolize fatty and amino acids, while worker intended larvae over-expressed ATP synthase beta subunit, aldehyde dehydrogenase, thioredoxin peroxidase 1 and peroxiredoxin 2540, lethal (2) 37 and 14-3-3 protein epsilon, fatty acid binding protein, and translational controlled tumor protein. This differential protein expression between the two caste intended larvae was more pronounced at 120 hours, with particular significant differences in proteins associated with carbohydrate metabolism and energy production. Functional enrichment analysis suggests that carbohydrate metabolism and energy production and anti-oxidation proteins play major roles in the formation of caste divergence. The constructed network and validated gene expression identified target proteins for further functional study. This new finding is in contrast to the existing notion that 72 hour old larvae has bipotential and can develop into either queen or worker based on epigenetics and can help us to gain new insight into the time of departure as well as caste trajectory influencing elements at the molecular level.', 'fields': 'Multidisciplinary Sciences', 'publication_minute': '00', 'publication_month': '10', 'publication_hour': '00', 'publication_date': '2010-10-20 00:00:00', 'publication_second': '00', 'language': 'English'}\n",
"\n", "\n",
"{'publication_month': '10', 'authors': 'Ramirez, GP, Martinez, AS, Fernandez, VM, Bielsa, GC, Farina, WM', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'The Influence of Gustatory and Olfactory Experiences on Responsiveness to Reward in the Honeybee', 'abstract': 'Background: Honeybees (Apis mellifera) exhibit an extraordinarily tuned division of labor that depends on age polyethism. This adjustment is generally associated with the fact that individuals of different ages display different response thresholds to given stimuli, which determine specific behaviors. For instance, the sucrose-response threshold (SRT) which largely depends on genetic factors may also be affected by the nectar sugar content. However, it remains unknown whether SRTs in workers of different ages and tasks can differ depending on gustatory and olfactory experiences. Methodology: Groups of worker bees reared either in an artificial environment or else in a queen-right colony, were exposed to different reward conditions at different adult ages. Gustatory response scores (GRSs) and odor-memory retrieval were measured in bees that were previously exposed to changes in food characteristics. Principal Findings: Results show that the gustatory responses of pre-foraging-aged bees are affected by changes in sucrose solution concentration and also to the presence of an odor provided it is presented as scented sucrose solution. In contrast no differences in worker responses were observed when presented with odor only in the rearing environment. Fast modulation of GRSs was observed in older bees (12-16 days of age) which are commonly involved in food processing tasks within the hive, while slower modulation times were observed in younger bees (commonly nurse bees, 6-9 days of age). This suggests that older food-processing bees have a higher plasticity when responding to fluctuations in resource information than younger hive bees. 
Adjustments in the number of trophallaxis events were also found when scented food circulated inside the nest, and this was positively correlated with the differences in timing observed in gustatory responsiveness and memory retention for hive bees of different age classes. Conclusions: This work demonstrates the accessibility of chemosensory information in the honeybee colonies with respect to incoming nectar. The modulation of the sensory-response systems within the hive can have important effects on the dynamics of food transfer and information propagation.', 'doi': '10.1371/journal.pone.0013498', 'publication_day': '20', 'publication_date': '2010-10-20 00:00:00', 'publication_hour': '00', 'publication_second': '00'}\n", "{'title': 'The Influence of Gustatory and Olfactory Experiences on Responsiveness to Reward in the Honeybee', 'authors': 'Ramirez, GP, Martinez, AS, Fernandez, VM, Bielsa, GC, Farina, WM', 'publication_year': '2010', 'publication_day': '20', 'doi': '10.1371/journal.pone.0013498', 'abstract': 'Background: Honeybees (Apis mellifera) exhibit an extraordinarily tuned division of labor that depends on age polyethism. This adjustment is generally associated with the fact that individuals of different ages display different response thresholds to given stimuli, which determine specific behaviors. For instance, the sucrose-response threshold (SRT) which largely depends on genetic factors may also be affected by the nectar sugar content. However, it remains unknown whether SRTs in workers of different ages and tasks can differ depending on gustatory and olfactory experiences. Methodology: Groups of worker bees reared either in an artificial environment or else in a queen-right colony, were exposed to different reward conditions at different adult ages. Gustatory response scores (GRSs) and odor-memory retrieval were measured in bees that were previously exposed to changes in food characteristics. 
Principal Findings: Results show that the gustatory responses of pre-foraging-aged bees are affected by changes in sucrose solution concentration and also to the presence of an odor provided it is presented as scented sucrose solution. In contrast no differences in worker responses were observed when presented with odor only in the rearing environment. Fast modulation of GRSs was observed in older bees (12-16 days of age) which are commonly involved in food processing tasks within the hive, while slower modulation times were observed in younger bees (commonly nurse bees, 6-9 days of age). This suggests that older food-processing bees have a higher plasticity when responding to fluctuations in resource information than younger hive bees. Adjustments in the number of trophallaxis events were also found when scented food circulated inside the nest, and this was positively correlated with the differences in timing observed in gustatory responsiveness and memory retention for hive bees of different age classes. Conclusions: This work demonstrates the accessibility of chemosensory information in the honeybee colonies with respect to incoming nectar. The modulation of the sensory-response systems within the hive can have important effects on the dynamics of food transfer and information propagation.', 'fields': 'Multidisciplinary Sciences', 'publication_minute': '00', 'publication_month': '10', 'publication_hour': '00', 'publication_date': '2010-10-20 00:00:00', 'publication_second': '00', 'language': 'English'}\n",
"\n", "\n",
"{'publication_month': '10', 'authors': 'Munch, D, Baker, N, Kreibich, CD, Braten, AT, Amdam, GV', 'publication_year': '2010', 'publication_minute': '00', 'language': 'English', 'title': 'In the Laboratory and during Free-Flight: Old Honey Bees Reveal Learning and Extinction Deficits that Mirror Mammalian Functional Decline', 'abstract': 'Loss of brain function is one of the most negative and feared aspects of aging. Studies of invertebrates have taught us much about the physiology of aging and how this progression may be slowed. Yet, how aging affects complex brain functions, e.g., the ability to acquire new memory when previous experience is no longer valid, is an almost exclusive question of studies in humans and mammalian models. In these systems, age related cognitive disorders are assessed through composite paradigms that test different performance tasks in the same individual. Such studies could demonstrate that afflicted individuals show the loss of several and often-diverse memory faculties, and that performance usually varies more between aged individuals, as compared to conspecifics from younger groups. No comparable composite surveying approaches are established yet for invertebrate models in aging research. Here we test whether an insect can share patterns of decline similar to those that are commonly observed during mammalian brain aging. Using honey bees, we combine restrained learning with free-flight assays. We demonstrate that reduced olfactory learning performance correlates with a reduced ability to extinguish the spatial memory of an abandoned nest location ( spatial memory extinction). Adding to this, we show that learning performance is more variable in old honey bees. 
Taken together, our findings point to generic features of brain aging and provide the prerequisites to model individual aspects of learning dysfunction with insect models.', 'doi': '10.1371/journal.pone.0013504', 'publication_day': '19', 'publication_date': '2010-10-19 00:00:00', 'publication_hour': '00', 'publication_second': '00'}\n", "{'title': 'In the Laboratory and during Free-Flight: Old Honey Bees Reveal Learning and Extinction Deficits that Mirror Mammalian Functional Decline', 'authors': 'Munch, D, Baker, N, Kreibich, CD, Braten, AT, Amdam, GV', 'publication_year': '2010', 'publication_day': '19', 'doi': '10.1371/journal.pone.0013504', 'abstract': 'Loss of brain function is one of the most negative and feared aspects of aging. Studies of invertebrates have taught us much about the physiology of aging and how this progression may be slowed. Yet, how aging affects complex brain functions, e.g., the ability to acquire new memory when previous experience is no longer valid, is an almost exclusive question of studies in humans and mammalian models. In these systems, age related cognitive disorders are assessed through composite paradigms that test different performance tasks in the same individual. Such studies could demonstrate that afflicted individuals show the loss of several and often-diverse memory faculties, and that performance usually varies more between aged individuals, as compared to conspecifics from younger groups. No comparable composite surveying approaches are established yet for invertebrate models in aging research. Here we test whether an insect can share patterns of decline similar to those that are commonly observed during mammalian brain aging. Using honey bees, we combine restrained learning with free-flight assays. We demonstrate that reduced olfactory learning performance correlates with a reduced ability to extinguish the spatial memory of an abandoned nest location ( spatial memory extinction). 
Adding to this, we show that learning performance is more variable in old honey bees. Taken together, our findings point to generic features of brain aging and provide the prerequisites to model individual aspects of learning dysfunction with insect models.', 'fields': 'Multidisciplinary Sciences', 'publication_minute': '00', 'publication_month': '10', 'publication_hour': '00', 'publication_date': '2010-10-19 00:00:00', 'publication_second': '00', 'language': 'English'}\n",
"\n" "\n"
] ]
} }
], ],
"prompt_number": 4 "prompt_number": 3
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Basic tests"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print(\"RE abcdefgh\\n\"[3:-1])\n",
"print(b\"english\".decode())"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"abcdefgh\n",
"english\n"
]
}
],
"prompt_number": 2
}, },
{ {
"cell_type": "code", "cell_type": "code",
......
...@@ -89,6 +89,7 @@ class FileParser: ...@@ -89,6 +89,7 @@ class FileParser:
"""Add a document to the database. """Add a document to the database.
""" """
def create_document(self, parentNode, title, contents, language, metadata, guid=None): def create_document(self, parentNode, title, contents, language, metadata, guid=None):
metadata = self.format_metadata(metadata)
# create or retrieve a resource for that document, based on its user id # create or retrieve a resource for that document, based on its user id
# if guid is None: # if guid is None:
# resource = Resource(guid=guid) # resource = Resource(guid=guid)
......
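Review note: this hunk calls `self.format_metadata(metadata)`, but the method body is not shown in the diff. The notebook output above suggests it fills in the missing date parts and adds a combined `publication_date` string such as `'2010-10-19 00:00:00'`. The following is a minimal sketch of that behaviour, assuming zero-padded string fields and a `publication` key prefix. The standalone function, its `prefix` parameter, and the default values are illustrative assumptions, not the committed implementation.

```python
from datetime import datetime

# Order of the date parts and the value assumed when a part is missing.
# These defaults are an assumption made for this sketch.
DATE_PARTS = ("year", "month", "day", "hour", "minute", "second")
DEFAULTS = {"month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0}
FORMATS = {"year": "%Y", "month": "%m", "day": "%d",
           "hour": "%H", "minute": "%M", "second": "%S"}


def format_metadata(metadata, prefix="publication"):
    """Hypothetical stand-in for FileParser.format_metadata.

    Returns a copy of `metadata` in which every `<prefix>_<part>` key is a
    zero-padded string and `<prefix>_date` holds the combined timestamp.
    """
    result = dict(metadata)
    year = result.get(prefix + "_year")
    if not year:
        # Without a year there is nothing to normalise.
        return result

    values = {}
    for part in DATE_PARTS:
        raw = result.get("%s_%s" % (prefix, part))
        values[part] = int(raw) if raw not in (None, "") else DEFAULTS[part]

    date = datetime(values["year"], values["month"], values["day"],
                    values["hour"], values["minute"], values["second"])

    result[prefix + "_date"] = date.strftime("%Y-%m-%d %H:%M:%S")
    for part in DATE_PARTS:
        result["%s_%s" % (prefix, part)] = date.strftime(FORMATS[part])
    return result


formatted = format_metadata({
    "publication_year": "2010",
    "publication_month": "10",
    "publication_day": "19",
})
```

Doing the formatting once in `create_document` means each parser only has to supply whichever date parts its source format provides, which matches the PubMed change in the same commit.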
...@@ -16,7 +16,6 @@ class PubmedFileParser(FileParser): ...@@ -16,7 +16,6 @@ class PubmedFileParser(FileParser):
with zipfile.ZipFile(self._file) as zipFile: with zipfile.ZipFile(self._file) as zipFile:
for filename in zipFile.namelist(): for filename in zipFile.namelist():
file = zipFile.open(filename, "r") file = zipFile.open(filename, "r")
# print(file.read())
xml = etree.parse(file, parser=xml_parser) xml = etree.parse(file, parser=xml_parser)
# parse all the articles, one by one # parse all the articles, one by one
...@@ -24,19 +23,17 @@ class PubmedFileParser(FileParser): ...@@ -24,19 +23,17 @@ class PubmedFileParser(FileParser):
xml_articles = xml.findall('PubmedArticle') xml_articles = xml.findall('PubmedArticle')
for xml_article in xml_articles: for xml_article in xml_articles:
# extract data from the document # extract data from the document
date_year = int(xml_article.find('MedlineCitation/DateCreated/Year').text) metadata = {}
date_month = int(xml_article.find('MedlineCitation/DateCreated/Month').text)
date_day = int(xml_article.find('MedlineCitation/DateCreated/Day').text)
metadata = {
"date_pub": '%s-%s-%s' % (date_year, date_month, date_day),
}
metadata_path = { metadata_path = {
"journal" : 'MedlineCitation/Article/Journal/Title', "journal" : 'MedlineCitation/Article/Journal/Title',
"title" : 'MedlineCitation/Article/ArticleTitle', "title" : 'MedlineCitation/Article/ArticleTitle',
"language_iso3" : 'MedlineCitation/Article/Language', "language_iso3" : 'MedlineCitation/Article/Language',
"doi" : 'PubmedData/ArticleIdList/ArticleId[type=doi]', "doi" : 'PubmedData/ArticleIdList/ArticleId[type=doi]',
"abstract" : 'MedlineCitation/Article/Abstract/AbstractText' "abstract" : 'MedlineCitation/Article/Abstract/AbstractText',
} "publication_year" : 'MedlineCitation/DateCreated/Year',
"publication_month" : 'MedlineCitation/DateCreated/Month',
"publication_day" : 'MedlineCitation/DateCreated/Day',
}
for key, path in metadata_path.items(): for key, path in metadata_path.items():
try: try:
node = xml_article.find(path) node = xml_article.find(path)
......
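Review note: the `doi` path in this hunk, `ArticleId[type=doi]`, does not look like an attribute predicate in ElementTree-style path syntax, which expects the form `[@attrib='value']`. PubMed XML marks the identifier type with an `IdType` attribute on `ArticleId`. The sketch below shows that lookup using the standard-library `xml.etree.ElementTree`; the parser in the diff uses `etree` with a custom `xml_parser`, so treat this as an illustration under that assumption. The sample document, the DOI value, and the `find_doi` helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed PubMed-like record used only for this sketch.
SAMPLE = """
<PubmedArticle>
  <MedlineCitation>
    <DateCreated><Year>2014</Year><Month>01</Month><Day>05</Day></DateCreated>
  </MedlineCitation>
  <PubmedData>
    <ArticleIdList>
      <ArticleId IdType="pubmed">00000000</ArticleId>
      <ArticleId IdType="doi">10.1000/example</ArticleId>
    </ArticleIdList>
  </PubmedData>
</PubmedArticle>
"""


def find_doi(xml_article):
    """Hypothetical helper: return the DOI text, or None if absent."""
    node = xml_article.find(
        "PubmedData/ArticleIdList/ArticleId[@IdType='doi']"
    )
    return node.text if node is not None else None


article = ET.fromstring(SAMPLE)
doi = find_doi(article)
year = article.find("MedlineCitation/DateCreated/Year").text
```

Because the loop in the hunk wraps each `find` in a `try`, a path that fails to match or to parse would most likely leave the `doi` key unset silently, so it may be worth checking the parsed metadata for that field.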
{
"metadata": {
"name": "",
"signature": "sha256:71dcc854ee670084dd2d3795a96e0faa7d3feb1f1958d41b08c32fe1a0d70be9"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": [
"from pprint import pprint\n",
"from node.models import Node, NodeType, Language, Ngram\n",
"from django.contrib.auth.models import User\n",
"import parsing\n",
"from parsing.FileParsers import *"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Define user\n",
"try:\n",
" user = User.objects.get(username='Mat')\n",
"except:\n",
" user = User(username='Mat', password='0123', email='mathieu@rodic.fr')\n",
" user.save()\n",
"\n",
"# Define document types\n",
"nodetypes = {}\n",
"for name in ['Corpus', 'Document']:\n",
" try:\n",
" nodetypes[name] = NodeType.objects.get(name=name)\n",
" except:\n",
" nodetypes[name] = NodeType(name=name)\n",
" nodetypes[name].save()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Node.objects.all().delete()\n",
"corpus = Node(name='PubMed corpus', user=user, type=nodetypes['Corpus'])\n",
"corpus.save()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"fileparser = PubmedFileParser.PubmedFileParser(file='/home/mat/projects/gargantext/data_samples/pubmed.zip')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"fileparser.parse(corpus)\n",
"print('Ok!')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Warning: parsing empty text\n",
"Warning: parsing empty text\n",
"Warning: parsing empty text"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Warning: parsing empty text\n",
"Warning: parsing empty text"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Warning: parsing empty text\n",
"Warning: parsing empty text"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Warning: parsing empty text"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Warning: parsing empty text"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Warning: parsing empty text"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Warning: parsing empty text"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Warning: parsing empty text"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Warning: parsing empty text"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Warning: parsing empty text"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Warning: parsing empty text"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Ok!"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for node_ngram in corpus.children.first().node_ngram_set.all():\n",
" print(node_ngram.ngram.terms)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"plant-pathogenic rna virus\n",
"significant source\n",
"result\n",
"host populations\n",
"in\n",
"arthropod hosts\n",
"unique example\n",
"spread\n",
"tobacco ringspot\n",
"colony survival\n",
"apis mellifera\n",
"other bee viruses"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"negative impact\n",
"threat\n",
"honeybees\n",
"varroa mites\n",
"intracellular life cycle\n",
"virus\n",
"conjunction\n",
"honeybee hosts\n",
"bee hemolymph\n",
"distinct lineage\n",
"transkingdom host alteration"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"monophyletic clade\n",
"prevalence\n",
"winter\n",
"pathogen host shifts\n",
"furthermore\n",
"species-level genetic variation\n",
"trsvs\n",
"diseases\n",
"gradual decline\n",
"domesticates\n",
"systemic invasion"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"high mutation rates\n",
"pathogenesis\n",
"entire body\n",
"humans\n",
"plant hosts\n",
"infections\n",
"virions\n",
"plant\n",
"varroa"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"infectious diseases\n",
"winter colony collapse\n",
"infected colonies\n",
"rna viruses\n",
"gastric cecum"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"trsv-infected individuals\n",
"instances\n",
"host ranges\n",
"health\n",
"viruses\n",
"study\n",
"bees\n",
"ectoparasitic varroa\n",
"present study\n",
"tree topology"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"animal kingdoms\n",
"phylogenetic analysis\n",
"colonies\n",
"feed\n",
"common ancestor\n",
"trsv\n"
]
}
],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
}
],
"metadata": {}
}
]
}
\ No newline at end of file