Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
haskell-gargantext
Project
Project
Details
Activity
Releases
Cycle Analytics
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Charts
Issues
160
Issues
160
List
Board
Labels
Milestones
Merge Requests
14
Merge Requests
14
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Charts
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Charts
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
gargantext
haskell-gargantext
Commits
fcb48b8f
Verified
Commit
fcb48b8f
authored
Dec 30, 2024
by
Przemyslaw Kaminski
1
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
[ngrams] some more simplification of ngramsByDoc'
parent
ab7c1766
Pipeline
#7173
failed with stages
in 14 minutes and 21 seconds
Changes
2
Pipelines
1
Show whitespace changes
Inline
Side-by-side
Showing
2 changed files
with
28 additions
and
12 deletions
+28
-12
Utils.hs
src/Gargantext/Database/Action/Flow/Utils.hs
+26
-10
Count.hs
test/Test/Ngrams/Count.hs
+2
-2
No files found.
src/Gargantext/Database/Action/Flow/Utils.hs
View file @
fcb48b8f
...
...
@@ -197,22 +197,38 @@ ngramsByDoc l nt ts docs =
ngramsByDoc'
l
nt
ts
<$>
docs
-- | Given list of terms and a document, produce a map for this doc's
-- terms count and weights
-- terms count and weights. Notice that the weight is always 1 here.
-- ngramsByDoc' :: Lang
-- -> NgramsType
-- -> [NT.NgramsTerm]
-- -> ContextOnlyId HyperdataDocument
-- -> HashMap.HashMap ExtractedNgrams (DM.Map NgramsType (Map NodeId (Int, TermsCount)))
-- ngramsByDoc' l nt ts doc =
-- HashMap.fromListWith (DM.unionWith (DM.unionWith (\(_a,b) (_a',b') -> (1,b+b')))) withExtractedNgrams
-- where
-- _docNgrams' :: ([(MatchedText, TermsCount)], NodeId)
-- _docNgrams'@(matched, nId) = (docNgrams l ts doc, doc ^. context_oid_id)
-- withExtractedNgrams :: [(ExtractedNgrams, Map NgramsType (Map NodeId (Int, TermsCount)))]
-- withExtractedNgrams =
-- map (\(matchedText, cnt) ->
-- ( SimpleNgrams (text2ngrams matchedText)
-- , DM.singleton nt $ DM.singleton nId (1, cnt) ) ) matched
ngramsByDoc'
::
Lang
->
NgramsType
->
[
NT
.
NgramsTerm
]
->
ContextOnlyId
HyperdataDocument
->
HashMap
.
HashMap
ExtractedNgrams
(
DM
.
Map
NgramsType
(
Map
NodeId
(
Int
,
TermsCount
)))
ngramsByDoc'
l
nt
ts
doc
=
HashMap
.
fromListWith
(
DM
.
unionWith
(
DM
.
unionWith
(
\
(
_a
,
b
)
(
_a'
,
b'
)
->
(
1
,
b
+
b'
))))
withExtractedNgrams
HashMap
.
map
(
\
cnt
->
DM
.
singleton
nt
$
DM
.
singleton
nId
(
1
,
cnt
))
extractedMap
where
docNgrams'
::
([(
MatchedText
,
TermsCount
)],
NodeId
)
docNgrams'
=
(
docNgrams
l
ts
doc
,
doc
^.
context_oid_id
)
_
docNgrams'
::
([(
MatchedText
,
TermsCount
)],
NodeId
)
_docNgrams'
@
(
matched
,
nId
)
=
(
docNgrams
l
ts
doc
,
doc
^.
context_oid_id
)
(
matched
,
nId
)
=
docNgrams'
withExtractedNgrams
::
[(
ExtractedNgrams
,
TermsCount
)]
withExtractedNgrams
=
first
(
SimpleNgrams
.
text2ngrams
)
<$>
matched
withExtractedNgrams
::
[(
ExtractedNgrams
,
Map
NgramsType
(
Map
NodeId
(
Int
,
TermsCount
)))]
withExtractedNgrams
=
map
(
\
(
matchedText
,
cnt
)
->
(
SimpleNgrams
(
text2ngrams
matchedText
)
,
DM
.
singleton
nt
$
DM
.
singleton
nId
(
1
,
cnt
)
)
)
matched
extractedMap
::
HashMap
.
HashMap
ExtractedNgrams
TermsCount
extractedMap
=
HashMap
.
fromListWith
(
+
)
withExtractedNgrams
test/Test/Ngrams/Count.hs
View file @
fcb48b8f
...
...
@@ -138,7 +138,7 @@ testNgramsByDoc01 = do
let
hd1
=
emptyHyperdataDocument
{
_hd_title
=
Just
"hello world, kaboom"
,
_hd_abstract
=
Nothing
}
let
ctx1
=
ContextOnlyId
1
hd1
let
hd2
=
emptyHyperdataDocument
{
_hd_title
=
Just
"world, boom"
let
hd2
=
emptyHyperdataDocument
{
_hd_title
=
Just
"world, boom
world
"
,
_hd_abstract
=
Nothing
}
let
ctx2
=
ContextOnlyId
2
hd2
...
...
@@ -151,7 +151,7 @@ testNgramsByDoc01 = do
]
,
HashMap
.
fromList
[
(
SimpleNgrams
$
UnsafeNgrams
{
_ngramsTerms
=
"world"
,
_ngramsSize
=
1
}
,
Map
.
singleton
NgramsTerms
$
Map
.
singleton
(
UnsafeMkNodeId
2
)
(
1
,
1
)
)
,
Map
.
singleton
NgramsTerms
$
Map
.
singleton
(
UnsafeMkNodeId
2
)
(
1
,
2
)
)
]
]
...
...
Przemyslaw Kaminski
@cgenie
mentioned in commit
03b33383
·
Jan 30, 2025
mentioned in commit
03b33383
mentioned in commit 03b33383dd67c1821a4edb4628923cf7bd039d90
Toggle commit list
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment