Commit aac329cd authored Jun 28, 2017 by delanoe
[DOCS/SPECS] Architecture for TF ICF.
parent bdf80283
Showing 1 changed file with 58 additions and 2 deletions
docs/ngram_lists.md
@@ -110,7 +110,9 @@ Let be a set of ngrams where NodeNgram != 0 then
A classifier (Support Vector Machine) is used on the following scaled measures
for each step:
- n (of the "n" gram)
- Occurrences: Zipf's law (in fact already used in TFICF; these
  features are correlated and are listed here for pedagogical purposes)
- TFICF-CORPUS-SOURCETYPE
- TFICF-SOURCETYPE-ALL
- Genericity score
@@ -144,18 +146,72 @@ If the context is a document in a set of documents (corpus), then it is a TFIDF
Then TFICF-DOCUMENT-CORPUS == TFICF(ngram,DOCUMENT,CORPUS) = TFIDF.
TFICF is the generalization of [TFIDF, Term Frequency - Inverse Document Frequency](https://en.wikipedia.org/wiki/Tf%E2%80%93idf).
#### Implementation
TFICF = TF * log (ICF)
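As an illustration of the formula, here is a minimal Python sketch. It is not the project's implementation. The function name `tficf`, the representation of a context as a plain list of ngrams, and the definition of ICF as (number of contexts) / (number of contexts containing the ngram), chosen by analogy with IDF, are all assumptions made for this example.

```python
import math
from collections import Counter


def tficf(ngram, context, contexts):
    """Return TF * log(ICF) for `ngram` in `context`, relative to `contexts`.

    Assumptions (not from the spec): a context is a list of ngrams, and
    ICF = len(contexts) / number of contexts that contain the ngram.
    """
    if not context:
        return 0.0
    # Term frequency: share of the context's ngrams equal to `ngram`.
    tf = Counter(context)[ngram] / len(context)
    # Number of contexts in which the ngram occurs at least once.
    containing = sum(1 for c in contexts if ngram in c)
    if containing == 0:
        return 0.0
    icf = len(contexts) / containing
    return tf * math.log(icf)


# Hypothetical toy data: three documents in one corpus, so this is the
# TFICF-DOCUMENT-CORPUS case, which coincides with TFIDF.
docs = [
    ["gene", "protein", "gene"],
    ["protein", "cell"],
    ["cell", "membrane"],
]
score = tficf("gene", docs[0], docs)
```

The same function would apply one level up by passing corpora (or source types) as the contexts instead of documents. An ngram that occurs in every context gets a score of 0, since log(1) = 0.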
To prepare the groups, we need to store TF and ICF separately (in
NodeNgram via 2 nodes).
Let TF and ICF be type names of Nodes.
Node[USER](gargantua)
├── Node[OCCURRENCES](source)
├── Node[TF](all sourcetype)
├── Node[ICF](all sourcetype)
├── Node[SOURCETYPE](Pubmed)
│ ├── Node[OCCURRENCES](all corpora)
│ ├── Node[TF](all corpora)
│ └── Node[ICF](all corpora)
├── Node[SOURCETYPE](WOS)
## Other ngram lists
### Group List
#### Definition
Group list gives a quantifiable link between two ngrams.
#### Policy to build group lists
To group the ngrams:
- stemming or lemmatization
- c-value
- clustering (see graphs)
- manually by the user (supervised learning)

The scale is the character.
#### Storage
Groups are stored in the table NodeNgramNgram, where the Node has type
name Group and links ngram1 and ngram2.
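To make the grouping and storage concrete, here is a hedged Python sketch of the stemming policy only. The toy stemmer `naive_stem` (it merely lowercases and strips a trailing "s"), the helper `group_rows`, the row layout `(node_id, ngram1, ngram2, score)` and the constant score of 1.0 are hypothetical. The spec only states that a Group node links ngram1 and ngram2 with a quantifiable link.

```python
from collections import defaultdict
from itertools import combinations


def naive_stem(ngram):
    """Toy stemmer for illustration only: lowercase each word and strip a
    trailing 's' from words longer than three characters."""
    words = ngram.lower().split()
    return " ".join(
        w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words
    )


def group_rows(group_node_id, ngrams):
    """Build NodeNgramNgram-like rows (node_id, ngram1, ngram2, score)
    linking every pair of ngrams that share the same stem."""
    by_stem = defaultdict(list)
    for ngram in ngrams:
        by_stem[naive_stem(ngram)].append(ngram)
    rows = []
    for members in by_stem.values():
        # One row per unordered pair inside a group; singletons yield none.
        for ngram1, ngram2 in combinations(sorted(members), 2):
            rows.append((group_node_id, ngram1, ngram2, 1.0))
    return rows


# 42 stands for the id of a hypothetical Node of type Group.
rows = group_rows(
    42,
    ["gene", "genes", "Genes", "cell membrane", "cell membranes", "protein"],
)
```

A real pipeline would presumably replace the toy stemmer with a proper stemmer or lemmatizer and could derive the score from c-value or clustering instead of a constant.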
### Favorite List
#### Definition
Favorite Nodes
The scale is the node.
#### Building policy
- manually by the user (supervised learning)
#### Storage
NodeNode relation where first Node has type Favorite.