Commit aac329cd authored by delanoe

[DOCS/SPECS] Architecture for TF ICF.

parent bdf80283
@@ -110,7 +110,9 @@ Let be a set of ngrams where NodeNgram != 0 then
A classifier (Support Vector Machine) is used on the following scaled measures
for each step:
- n (of the "n" gram)
- Occurrences: Zipf's law (in fact already used in TFICF; these
features are correlated and are listed here for pedagogical purposes)
- TFICF-CORPUS-SOURCETYPE
- TFICF-SOURCETYPE-ALL
- Genericity score
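
The measures above live on very different scales (counts versus scores), which is why they are scaled before classification. The spec does not say which scaling is used; the following is a minimal sketch assuming min-max scaling to [0, 1]. The feature keys and the sample values are hypothetical.

```python
from typing import Dict, List


def min_max_scale(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Rescale every feature to [0, 1] across all ngrams (min-max scaling).

    Assumption: min-max is only one possible choice; the spec just says
    "scaled measures".
    """
    if not rows:
        return []
    features = list(rows[0].keys())
    lo = {f: min(r[f] for r in rows) for f in features}
    hi = {f: max(r[f] for r in rows) for f in features}
    scaled = []
    for r in rows:
        scaled.append({
            # A constant feature carries no information; map it to 0.0.
            f: (r[f] - lo[f]) / (hi[f] - lo[f]) if hi[f] > lo[f] else 0.0
            for f in features
        })
    return scaled


# Hypothetical raw measures for three ngrams (values are made up).
raw = [
    {"n": 1, "occurrences": 120, "tficf_corpus_sourcetype": 0.2, "genericity": 0.9},
    {"n": 2, "occurrences": 30, "tficf_corpus_sourcetype": 1.4, "genericity": 0.3},
    {"n": 3, "occurrences": 10, "tficf_corpus_sourcetype": 2.0, "genericity": 0.1},
]
scaled = min_max_scale(raw)
```

The scaled rows could then be passed as feature vectors to any SVM implementation.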
@@ -144,18 +146,72 @@ If the context is a document in a set of documents (corpus), then it is a TFIDF
Then TFICF-DOCUMENT-CORPUS == TFICF(ngram,DOCUMENT,CORPUS) = TFIDF.
TFICF is the generalization of [TFIDF, Term Frequency - Inverse Document Frequency](https://en.wikipedia.org/wiki/Tf%E2%80%93idf).
#### Implementation
TFICF = TF * log(ICF)
To prepare the groups, we need to store TF and ICF separately (in
NodeNgram via 2 nodes).
Let TF and ICF be type names of Nodes.
Node[USER](gargantua)
├── Node[OCCURRENCES](source)
├── Node[TF](all sourcetype)
├── Node[ICF](all sourcetype)
├── Node[SOURCETYPE](Pubmed)
│   ├── Node[OCCURRENCES](all corpora)
│   ├── Node[TF](all corpora)
│   └── Node[ICF](all corpora)
├── Node[SOURCETYPE](WOS)
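
To make the formula concrete, here is a minimal sketch that keeps TF and ICF as separate values and only combines them at the end, mirroring the two-node storage described above. The exact definitions are assumptions made by analogy with TFIDF: TF is taken as the relative frequency of the ngram in one context, and ICF as the number of contexts divided by the number of contexts containing the ngram. The function names and the sample counts are hypothetical.

```python
import math
from typing import Dict

# Hypothetical occurrence counts: context name -> {ngram: occurrences}.
Contexts = Dict[str, Dict[str, int]]


def term_frequency(ngram: str, counts: Dict[str, int]) -> float:
    """Relative frequency of the ngram inside one context (assumed definition)."""
    total = sum(counts.values())
    return counts.get(ngram, 0) / total if total else 0.0


def inverse_context_frequency(ngram: str, contexts: Contexts) -> float:
    """Number of contexts divided by the number of contexts containing the ngram."""
    containing = sum(1 for counts in contexts.values() if counts.get(ngram, 0) > 0)
    if containing == 0:
        raise ValueError(f"ngram {ngram!r} does not occur in any context")
    return len(contexts) / containing


def tficf(tf: float, icf: float) -> float:
    """TFICF = TF * log(ICF), as stated in the spec."""
    return tf * math.log(icf)


contexts: Contexts = {
    "Pubmed": {"gene": 8, "cell": 2},
    "WOS": {"cell": 5, "graph": 5},
}

# "gene" occurs in only one of the two contexts, so it gets a positive score.
gene_score = tficf(term_frequency("gene", contexts["Pubmed"]),
                   inverse_context_frequency("gene", contexts))

# "cell" occurs in every context: ICF = 1, log(1) = 0, so the score is 0.
cell_score = tficf(term_frequency("cell", contexts["Pubmed"]),
                   inverse_context_frequency("cell", contexts))
```

Keeping the two factors apart means ICF can be recomputed when a context is added without touching the stored TF values.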
## Other ngrams lists
### Group List
#### Definition
A group list gives a quantifiable link between two ngrams.
#### Policy to build group lists
To group the ngrams:
- stemming or lemmatization
- c-value
- clustering (see graphs)
- manually by the user (supervised learning)
The scale is the character.
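
As an illustration of the first policy, the sketch below groups ngrams that share the same stem. The stemmer is a deliberately naive, hypothetical suffix stripper written for this example; a real pipeline would more likely rely on an established stemmer or lemmatizer.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical toy suffix list, checked in order (longest first).
SUFFIXES = ("ations", "ation", "ing", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(ngrams: Iterable[str]) -> Dict[str, List[str]]:
    """Map each stemmed form to the list of ngrams that reduce to it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ngram in ngrams:
        # Stem every word of the ngram so multi-word ngrams are handled too.
        key = " ".join(toy_stem(word) for word in ngram.lower().split())
        groups[key].append(ngram)
    return dict(groups)


groups = group_by_stem(["gene", "genes", "Gene", "cluster", "clustering", "graph"])
```

Each resulting group could then be turned into pairwise ngram links.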
#### Storage
Groups are stored in the table NodeNgramNgram, where the Node has type
name Group and each row links ngram1 to ngram2.
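
The storage layout can be sketched with an in-memory SQLite table. This is only an illustration: the table and column names (`node_ngram_ngram`, `node_id`, `ngram1_id`, `ngram2_id`, `score`) and the sample ids are assumptions, and the `score` column is one possible way to hold the "quantifiable link" between the two ngrams.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per (group node, ngram1, ngram2) link.
conn.execute("""
    CREATE TABLE node_ngram_ngram (
        node_id   INTEGER NOT NULL,  -- the Node whose type name is Group
        ngram1_id INTEGER NOT NULL,  -- main form of the group
        ngram2_id INTEGER NOT NULL,  -- ngram attached to the main form
        score     REAL    NOT NULL,  -- strength of the link (assumption)
        PRIMARY KEY (node_id, ngram1_id, ngram2_id)
    )
""")

GROUP_NODE_ID = 1  # made-up id of a Node of type Group

conn.executemany(
    "INSERT INTO node_ngram_ngram VALUES (?, ?, ?, ?)",
    [(GROUP_NODE_ID, 10, 11, 1.0), (GROUP_NODE_ID, 10, 12, 0.8)],
)


def grouped_with(connection: sqlite3.Connection, node_id: int, ngram_id: int):
    """Return (ngram2_id, score) pairs linked to ngram_id in the given group node."""
    cursor = connection.execute(
        "SELECT ngram2_id, score FROM node_ngram_ngram "
        "WHERE node_id = ? AND ngram1_id = ? ORDER BY ngram2_id",
        (node_id, ngram_id),
    )
    return cursor.fetchall()


members = grouped_with(conn, GROUP_NODE_ID, 10)
```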
### Favorite List
#### Definition
Favorite Nodes
The scale is the node.
#### Building policy
- manually by the user (supervised learning)
#### Storage
NodeNode relation where first Node has type Favorite.