Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
H
haskell-gargantext
Project
Project
Details
Activity
Releases
Cycle Analytics
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Charts
Issues
0
Issues
0
List
Board
Labels
Milestones
Merge Requests
0
Merge Requests
0
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Charts
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Charts
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
Przemyslaw Kaminski
haskell-gargantext
Commits
0f0feaac
Commit
0f0feaac
authored
Nov 22, 2017
by
Alexandre Delanoë
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
[FEAT] Ngrams metrics, thanks to text-metrics to begin with.
parent
e6a1adfe
Changes
6
Hide whitespace changes
Inline
Side-by-side
Showing
6 changed files
with
223 additions
and
10 deletions
+223
-10
gargantext.cabal
gargantext.cabal
+8
-0
Ngrams.hs
src/Data/Gargantext/Ngrams.hs
+10
-8
Metrics.hs
src/Data/Gargantext/Ngrams/Metrics.hs
+75
-0
stack.yaml
stack.yaml
+3
-1
Main.hs
test/Main.hs
+2
-1
Metrics.hs
test/Ngrams/Metrics.hs
+125
-0
No files found.
gargantext.cabal
View file @
0f0feaac
...
@@ -27,10 +27,12 @@ library
...
@@ -27,10 +27,12 @@ library
, conduit
, conduit
, conduit-extra
, conduit-extra
, directory
, directory
, duckling
, extra
, extra
, filepath
, filepath
, http-conduit
, http-conduit
, lens
, lens
, logging-effect
, opaleye
, opaleye
, path
, path
, parsec
, parsec
...
@@ -49,9 +51,12 @@ library
...
@@ -49,9 +51,12 @@ library
, split
, split
, tagsoup
, tagsoup
, text
, text
, text-metrics
, time
, time
, timezone-series
, time-locale-compat
, time-locale-compat
, transformers
, transformers
, unordered-containers
, uuid
, uuid
, vector
, vector
, wai
, wai
...
@@ -82,10 +87,12 @@ library
...
@@ -82,10 +87,12 @@ library
, Data.Gargantext.Ngrams.Parser
, Data.Gargantext.Ngrams.Parser
, Data.Gargantext.Ngrams.Lang.En
, Data.Gargantext.Ngrams.Lang.En
, Data.Gargantext.Ngrams.Lang.Fr
, Data.Gargantext.Ngrams.Lang.Fr
, Data.Gargantext.Ngrams.Metrics
, Data.Gargantext.Ngrams.TextMining
, Data.Gargantext.Ngrams.TextMining
, Data.Gargantext.Ngrams.Occurrences
, Data.Gargantext.Ngrams.Occurrences
, Data.Gargantext.Parsers
, Data.Gargantext.Parsers
, Data.Gargantext.Parsers.WOS
, Data.Gargantext.Parsers.WOS
, Data.Gargantext.Parsers.Date
, Data.Gargantext.Prelude
, Data.Gargantext.Prelude
, Data.Gargantext.Server
, Data.Gargantext.Server
, Data.Gargantext.Types
, Data.Gargantext.Types
...
@@ -117,6 +124,7 @@ test-suite garg-test-ngrams
...
@@ -117,6 +124,7 @@ test-suite garg-test-ngrams
Ngrams.Lang.Fr
Ngrams.Lang.Fr
Ngrams.Lang.En
Ngrams.Lang.En
Ngrams.Lang.Occurrences
Ngrams.Lang.Occurrences
Ngrams.Metrics
build-depends: base
build-depends: base
, extra
, extra
, text
, text
...
...
src/Data/Gargantext/Ngrams.hs
View file @
0f0feaac
module
Data.Gargantext.Ngrams
(
module
Data.Gargantext.Ngrams
(
module
Data
.
Gargantext
.
Ngrams
.
Count
module
Data
.
Gargantext
.
Ngrams
.
Count
,
--, module Data.Gargantext.Ngrams.Hetero
--module Data.Gargantext.Ngrams.Hetero,
,
module
Data
.
Gargantext
.
Ngrams
.
CoreNLP
module
Data
.
Gargantext
.
Ngrams
.
CoreNLP
,
,
module
Data
.
Gargantext
.
Ngrams
.
Parser
module
Data
.
Gargantext
.
Ngrams
.
Parser
,
,
module
Data
.
Gargantext
.
Ngrams
.
Occurrences
module
Data
.
Gargantext
.
Ngrams
.
Occurrences
,
,
module
Data
.
Gargantext
.
Ngrams
.
TextMining
module
Data
.
Gargantext
.
Ngrams
.
TextMining
,
module
Data
.
Gargantext
.
Ngrams
.
Metrics
--
module Data.Gargantext.Ngrams.Words
--,
module Data.Gargantext.Ngrams.Words
)
where
)
where
import
Data.Gargantext.Ngrams.Count
import
Data.Gargantext.Ngrams.Count
...
@@ -17,3 +17,5 @@ import Data.Gargantext.Ngrams.Parser
...
@@ -17,3 +17,5 @@ import Data.Gargantext.Ngrams.Parser
import
Data.Gargantext.Ngrams.Occurrences
import
Data.Gargantext.Ngrams.Occurrences
import
Data.Gargantext.Ngrams.TextMining
import
Data.Gargantext.Ngrams.TextMining
--import Data.Gargantext.Ngrams.Words
--import Data.Gargantext.Ngrams.Words
import
Data.Gargantext.Ngrams.Metrics
src/Data/Gargantext/Ngrams/Metrics.hs
0 → 100644
View file @
0f0feaac
module
Data.Gargantext.Ngrams.Metrics
(
levenshtein
,
levenshteinNorm
,
damerauLevenshtein
,
damerauLevenshteinNorm
,
overlap
,
jaccard
,
hamming
)
where
import
Data.Text
(
Text
)
import
GHC.Real
(
Ratio
)
import
qualified
Data.Text.Metrics
as
DTM
-- | This module provide metrics to compare Text
-- starting as an API rexporting main functions of the great lib
-- text-metrics of Mark Karpov
-- | Levenshtein Distance
-- In information theory, Linguistics and computer science,
-- the Levenshtein distance is a string metric for measuring
-- the difference between two sequences.
-- See: https://en.wikipedia.org/wiki/Levenshtein_distance
levenshtein
::
Text
->
Text
->
Int
levenshtein
=
DTM
.
levenshtein
-- | Return normalized Levenshtein distance between two 'Text' values.
-- Result is a non-negative rational number (represented as @'Ratio'
-- 'Data.Numeric.Natural'@), where 0 signifies no similarity between the
-- strings, while 1 means exact match.
levenshteinNorm
::
Text
->
Text
->
Ratio
Int
levenshteinNorm
=
DTM
.
levenshteinNorm
-- | Return Damerau-Levenshtein distance between two 'Text' values. The
-- function works like 'levenshtein', but the collection of allowed
-- operations also includes transposition of two /adjacent/ characters.
-- See also:
-- <https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance>
damerauLevenshtein
::
Text
->
Text
->
Int
damerauLevenshtein
=
DTM
.
damerauLevenshtein
-- damerau-Levenshtein distance normalized
damerauLevenshteinNorm
::
Text
->
Text
->
Ratio
Int
damerauLevenshteinNorm
=
DTM
.
damerauLevenshteinNorm
-- Treating inputs like sets
-- | Return overlap coefficient for two 'Text' values. Returned value
-- is in the range from 0 (no similarity) to 1 (exact match). Return 1
-- if both 'Text' values are empty.
--
-- See also: <https://en.wikipedia.org/wiki/Overlap_coefficient>.
overlap
::
Text
->
Text
->
Ratio
Int
overlap
=
DTM
.
overlap
-- | Jaccard distance
-- measures dissimilarity between sample sets
jaccard
::
Text
->
Text
->
Ratio
Int
jaccard
=
DTM
.
jaccard
-- | Hamming Distance
-- In information theory, the Hamming distance between two strings of
-- equal length is the number of positions at which the corresponding
-- symbols are different. In other words, it measures the minimum number of
-- substitutions required to change one string into the other
-- See: https://en.wikipedia.org/wiki/Hamming_distance
hamming
::
Text
->
Text
->
Maybe
Int
hamming
=
DTM
.
hamming
stack.yaml
View file @
0f0feaac
...
@@ -4,5 +4,7 @@ packages:
...
@@ -4,5 +4,7 @@ packages:
-
.
-
.
-
/home/alexandre/local/logiciels/haskell/servant/servant-multipart
-
/home/alexandre/local/logiciels/haskell/servant/servant-multipart
extra-deps
:
extra-deps
:
-
utc-0.2.0.1
-
aeson-1.0.2.1
-
duckling-0.1.3.0
-
protolude-0.2
resolver
:
lts-9.2
resolver
:
lts-9.2
test/Main.hs
View file @
0f0feaac
...
@@ -2,10 +2,11 @@ import Data.Gargantext.Types.Main (Language(..))
...
@@ -2,10 +2,11 @@ import Data.Gargantext.Types.Main (Language(..))
--import qualified Ngrams.Lang.Fr as Fr
--import qualified Ngrams.Lang.Fr as Fr
import
qualified
Ngrams.Lang
as
Lang
import
qualified
Ngrams.Lang
as
Lang
import
qualified
Ngrams.Lang.Occurrences
as
Occ
import
qualified
Ngrams.Lang.Occurrences
as
Occ
import
qualified
Ngrams.Metrics
as
Metrics
main
::
IO
()
main
::
IO
()
main
=
do
main
=
do
Occ
.
parsersTest
Occ
.
parsersTest
Lang
.
ngramsExtractionTest
EN
Lang
.
ngramsExtractionTest
EN
Metrics
.
main
--Lang.ngramsExtractionTest FR
--Lang.ngramsExtractionTest FR
test/Ngrams/Metrics.hs
0 → 100644
View file @
0f0feaac
{-# LANGUAGE CPP #-}
{-# LANGUAGE OverloadedStrings #-}
{-# OPTIONS_GHC -fno-warn-orphans #-}
module
Ngrams.Metrics
(
main
)
where
import
Data.Ratio
import
Data.Text
(
Text
)
import
Data.Gargantext.Ngrams.Metrics
import
Test.Hspec
import
Test.QuickCheck
import
qualified
Data.Text
as
T
#
if
!
MIN_VERSION_base
(
4
,
8
,
0
)
import
Control.Applicative
#
endif
instance
Arbitrary
Text
where
arbitrary
=
T
.
pack
<$>
arbitrary
main
::
IO
()
main
=
hspec
spec
spec
::
Spec
spec
=
do
describe
"levenshtein"
$
do
testSwap
levenshtein
context
"with concrete examples"
$
do
testPair
levenshtein
"kitten"
"sitting"
3
testPair
levenshtein
"cake"
"drake"
2
testPair
levenshtein
"saturday"
"sunday"
3
testPair
levenshtein
"red"
"wax"
3
#
if
__GLASGOW_HASKELL__
>=
710
testPair
levenshtein
"a😀c"
"abc"
1
#
endif
testPair
levenshtein
"lucky"
"lucky"
0
testPair
levenshtein
""
""
0
describe
"levenshteinNorm"
$
do
testSwap
levenshteinNorm
testPair
levenshteinNorm
"kitten"
"sitting"
(
4
%
7
)
testPair
levenshteinNorm
"cake"
"drake"
(
3
%
5
)
testPair
levenshteinNorm
"saturday"
"sunday"
(
5
%
8
)
testPair
levenshteinNorm
"red"
"wax"
(
0
%
1
)
#
if
__GLASGOW_HASKELL__
>=
710
testPair
levenshteinNorm
"a😀c"
"abc"
(
2
%
3
)
#
endif
testPair
levenshteinNorm
"lucky"
"lucky"
(
1
%
1
)
testPair
levenshteinNorm
""
""
(
1
%
1
)
describe
"damerauLevenshtein"
$
do
testSwap
damerauLevenshtein
testPair
damerauLevenshtein
"veryvery long"
"very long"
4
testPair
damerauLevenshtein
"thing"
"think"
1
testPair
damerauLevenshtein
"nose"
"ones"
2
testPair
damerauLevenshtein
"thing"
"sign"
3
testPair
damerauLevenshtein
"red"
"wax"
3
#
if
__GLASGOW_HASKELL__
>=
710
testPair
damerauLevenshtein
"a😀c"
"abc"
1
#
endif
testPair
damerauLevenshtein
"lucky"
"lucky"
0
testPair
damerauLevenshtein
""
""
0
describe
"damerauLevenshteinNorm"
$
do
testSwap
damerauLevenshteinNorm
testPair
damerauLevenshteinNorm
"veryvery long"
"very long"
(
9
%
13
)
testPair
damerauLevenshteinNorm
"thing"
"think"
(
4
%
5
)
testPair
damerauLevenshteinNorm
"nose"
"ones"
(
1
%
2
)
testPair
damerauLevenshteinNorm
"thing"
"sign"
(
2
%
5
)
testPair
damerauLevenshteinNorm
"red"
"wax"
(
0
%
1
)
#
if
__GLASGOW_HASKELL__
>=
710
testPair
damerauLevenshteinNorm
"a😀c"
"abc"
(
2
%
3
)
#
endif
testPair
damerauLevenshteinNorm
"lucky"
"lucky"
(
1
%
1
)
testPair
damerauLevenshteinNorm
""
""
(
1
%
1
)
describe
"hamming"
$
do
testSwap
hamming
testPair
hamming
"karolin"
"kathrin"
(
Just
3
)
testPair
hamming
"karolin"
"kerstin"
(
Just
3
)
testPair
hamming
"1011101"
"1001001"
(
Just
2
)
testPair
hamming
"2173896"
"2233796"
(
Just
3
)
testPair
hamming
"toned"
"roses"
(
Just
3
)
testPair
hamming
"red"
"wax"
(
Just
3
)
#
if
__GLASGOW_HASKELL__
>=
710
testPair
hamming
"a😀c"
"abc"
(
Just
1
)
#
endif
testPair
hamming
"lucky"
"lucky"
(
Just
0
)
testPair
hamming
""
""
(
Just
0
)
testPair
hamming
"small"
"big"
Nothing
describe
"overlap"
$
do
testSwap
overlap
testPair
overlap
"fly"
"butterfly"
(
1
%
1
)
testPair
overlap
"night"
"nacht"
(
3
%
5
)
testPair
overlap
"context"
"contact"
(
5
%
7
)
testPair
overlap
"red"
"wax"
(
0
%
1
)
#
if
__GLASGOW_HASKELL__
>=
710
testPair
overlap
"a😀c"
"abc"
(
2
%
3
)
#
endif
testPair
overlap
"lucky"
"lucky"
(
1
%
1
)
describe
"jaccard"
$
do
testSwap
jaccard
testPair
jaccard
"xxx"
"xyx"
(
1
%
2
)
testPair
jaccard
"night"
"nacht"
(
3
%
7
)
testPair
jaccard
"context"
"contact"
(
5
%
9
)
#
if
__GLASGOW_HASKELL__
>=
710
testPair
overlap
"a😀c"
"abc"
(
2
%
3
)
#
endif
testPair
jaccard
"lucky"
"lucky"
(
1
%
1
)
-- | Test that given function returns the same results when order of
-- arguments is swapped.
testSwap
::
(
Eq
a
,
Show
a
)
=>
(
Text
->
Text
->
a
)
->
SpecWith
()
testSwap
f
=
context
"if we swap the arguments"
$
it
"produces the same result"
$
property
$
\
a
b
->
f
a
b
===
f
b
a
-- | Create spec for given metric function applying it to two 'Text' values
-- and comparing the result with expected one.
testPair
::
(
Eq
a
,
Show
a
)
=>
(
Text
->
Text
->
a
)
-- ^ Function to test
->
Text
-- ^ First input
->
Text
-- ^ Second input
->
a
-- ^ Expected result
->
SpecWith
()
testPair
f
a
b
r
=
it
(
"‘"
++
T
.
unpack
a
++
"’ and ‘"
++
T
.
unpack
b
++
"’"
)
$
f
a
b
`
shouldBe
`
r
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment