Skip to content

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
    • Help
    • Submit feedback
    • Contribute to GitLab
  • Sign in
haskell-gargantext
haskell-gargantext
  • Project
    • Project
    • Details
    • Activity
    • Releases
    • Cycle Analytics
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Charts
  • Issues 159
    • Issues 159
    • List
    • Board
    • Labels
    • Milestones
  • Merge Requests 7
    • Merge Requests 7
  • CI / CD
    • CI / CD
    • Pipelines
    • Jobs
    • Schedules
    • Charts
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Members
    • Members
  • Collapse sidebar
  • Activity
  • Graph
  • Charts
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
  • gargantext
  • haskell-gargantexthaskell-gargantext
  • Issues
  • #50

Closed
Open
Opened Oct 20, 2020 by david Chavalarias@davidchavalarias
  • Report abuse
  • New issue
Report abuse New issue

Similarité distributionnelle

Ditrance ditributionelle

tags: gargantext

Objective : We want to compute with matrices processing the similarity between term $i$ and term $j$ : prox(i,j)=\frac{\Sigma_{k \neq i,j} min(\frac{n_{ik}^2}{n_{ii}n_{kk}},\frac{n_{jk}^2}{n_{jj}n_{kk}})}{\Sigma_{k \neq i,j}\frac{n_{ik}^2}{ n_{ii}n_{kk}}}

where n_{ij} is the number of cooccurrences between term i and term j

Definitions and Operators for multi-dimensional arrays

Definitions

  • operator : .* and ./ cell by cell multiplication and division of the matrix
  • operator * is the matrix multiplication
  • operator : Diag(M)=[n_{ii}]_i (vecteur)
  • Matrice M=[n_{ij}]_{i,j}
  • Id = identity Matrix (ones on the diagonal, zeros everywhere else)
  • Matrice ones : O=[1]_{i,j} (one everywhere)
  • D(M)=Id .* M (the square matrix that has zeros everywhere except on its diagonal where it is equal to M)

Operators

repmat :: Matrix -> Int -> Int -> Int -> 3 dimensional array repmat M $n_1$ $n_2$ $n_3$ is the 3 dimensional array that replicates M along dimensions 1 2 and 3 respectively n_1, n_2 and n_3 times.

Example : if M=[m_{i,j}]_{i,j \leq n} is a n x n Matrix, M'=repmat M 1 1 n = [m'_{i,j,k}]_{i,j,k \leq n} is a 3 dimentional cube shaped array such that \forall i,j,k, m'_{i,j,k}=m_{i,j}

Type definition :: Vect123 [i j k] is of type Vect123 iff :

  • i,j,k\in\{1,~2,~3\}
  • sort asc {i,j,k}={1, 2, 3}

proj :: Int \in {1 2 3} -> Vect123 -> Int

Noted proj n [i j k] it returns the pth element of [i j k] Operator that permute the numbers 1 2 3. Example :

  • proj 1 [3 1 2] = 3 ;
  • proj 2 [3 1 2] = 1

permute :: 3 dimentional array -> Vect123 -> 3 dimensional array permute M' [x y z] = M" where

  • M'=[m'_{i,j,k}]_{i,j,k}

  • M"=[m'_{proj~x~[i~j~k],proj~y~[i~j~k],proj~z~[i~j~k]}]_{i,j,k}

sum3 :: 3 dimentional array -> Matrix -- projection of a 3 dimentional array onto a matrix by summation over one of the dimensions sum3 M'=M where
M'=[m'_{i,j,k}]_{i,j,k} M=[m_{i,j}]_{i,j} where m_{i,j}=\Sigma_k m'_{i,j,k}

Main operations

We first build the matrix that will guaranty the condition k\neq i,j

  • I=repmat (O-Id) 1 1 N
  • I_1=permute~I~[1~3~2] has 0 where i=k and ones elsewhere
  • I_2=permute~I~[3~2~1] has 0 where j=k and ones elsewhere
  • II=I_1.*I_2=[r_{i,j,k}] is a matrix such that k= i,j \implies r_{i,j,k}=0 ; r_{i,j,k}=1 otherwise.

Then Let

  • D_1=O * D(M) =[n_{jj}]_{i,j}
  • D_2=D(M) * O =[n_{ii}]_{i,j}
  • MI=(M ./ D_1) .* (M ./ D_2 )=[\frac{n_{ij}^2}{n_{ii}n_{jj}}]_{ij}
  • W=repmat MI 1 1 N=[\frac{n_{ij}^2}{n_{ii}n_{jj}}]_{ijk}
  • W_1=permute~W~[1~3~2]=[\frac{n_{ik}^2}{n_{ii}n_{kk}}]_{ijk}
  • W_2=permute~W~[3~2~1]=[\frac{n_{jk}^2}{n_{jj}n_{kk}}]_{ijk}
  • W'=min(W_1,W_2)=[min(\frac{n_{ik}^2}{n_{ii}n_{kk}},\frac{n_{jk}^2}{n_{jj}n_{kk}})]_{ijk}
  • Z_1=sum3~(W'.*II)=[\Sigma_{k \neq i,j} min(\frac{n_{ik}^2}{n_{ii}n_{kk}},\frac{n_{jk}^2}{n_{jj}n_{kk}})]_{i,j}
  • Z_2=sum3~(W_1.*II)=[\Sigma_{k \neq i,j} \frac{n_{ik}^2}{n_{ii}n_{kk}}]_{i,j}

Then prox(M)=[prox(i,j)]_{i,j}=Z_1./Z_2

Log Distributionnal Similarity

In order to be aligned with the V3 and also to take into account terms Zipf law distribution,the final implementation of the distributional similarity should be a variant of the above.

We will consider

To=\Sigma_{i,j}n_{ij}

s_i=\Sigma_{i\neq k}n_{ik}

For i\neq j, m_{i,j}=log(To*\frac{n_{ij}}{s_i*s_j})

sumMin_{ij}=\Sigma_{k \neq i,j} min(m_{ik},m_{jk})

SumM_i=\Sigma_{k\neq i,j} m_{ik}

And the similarity LogSim will be defined by:

LogSim(i,j)=\frac{sumMin_{ij}}{SumM_i}

Edited Feb 25, 2021 by david Chavalarias
Assignee
Assign to
V4 Release
Milestone
V4 Release
Assign milestone
Time tracking
None
Due date
None
1
Labels
map/Graph
Assign labels
  • View project labels
Reference: gargantext/haskell-gargantext#50