Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
haskell-gargantext
Project
Project
Details
Activity
Releases
Cycle Analytics
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Charts
Issues
198
Issues
198
List
Board
Labels
Milestones
Merge Requests
12
Merge Requests
12
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Charts
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Charts
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
gargantext
haskell-gargantext
Commits
93e711b1
Commit
93e711b1
authored
Nov 25, 2022
by
Alexandre Delanoë
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
[FIX][DOC] Indexing issue
parent
439d3771
Pipeline
#3420
passed with stage
in 92 minutes and 38 seconds
Changes
5
Pipelines
1
Hide whitespace changes
Inline
Side-by-side
Showing
5 changed files
with
31 additions
and
23 deletions
+31
-23
CHANGELOG.md
CHANGELOG.md
+6
-0
gargantext.cabal
gargantext.cabal
+2
-2
package.yaml
package.yaml
+1
-1
List.hs
src/Gargantext/API/Ngrams/List.hs
+7
-10
WithList.hs
src/Gargantext/Core/Text/Terms/WithList.hs
+15
-10
No files found.
CHANGELOG.md
View file @
93e711b1
## Version 0.0.6.8.5.1
*
[
BACK
][
FIX
]
Indexing issue: taking all terms instead of longest of terms in case of ngrams included in others
*
[
FRONT
][
FIX
][
Disconnection of instance causes a blank page (#464)
](
https://gitlab.iscpif.fr/gargantext/purescript-gargantext/issues/464
)
*
[
BACK
][
FIX
]
ArXiv search in Abstracts by default
## Version 0.0.6.8.5
*
[
BACK
][
FIX
][
Ngrams Table, page sort / limit (#149)
](
https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/149
)
*
[
FRONT
][
FIX
][
Security Issue with Teams (#452)
](
https://gitlab.iscpif.fr/gargantext/purescript-gargantext/issues/452
)
...
...
gargantext.cabal
View file @
93e711b1
cabal-version: 1.12
-- This file has been generated from package.yaml by hpack version 0.3
4.7
.
-- This file has been generated from package.yaml by hpack version 0.3
5.0
.
--
-- see: https://github.com/sol/hpack
name: gargantext
version: 0.0.6.8.5
version: 0.0.6.8.5
.1
synopsis: Search, map, share
description: Please see README.md
category: Data
...
...
package.yaml
View file @
93e711b1
...
...
@@ -6,7 +6,7 @@ name: gargantext
# | | | +----- Layers * : New versions with API additions
# | | | | +--- Layers * : New versions without API breaking changes
# | | | | |
version
:
'
0.0.6.8.5'
version
:
'
0.0.6.8.5
.1
'
synopsis
:
Search, map, share
description
:
Please see README.md
category
:
Data
...
...
src/Gargantext/API/Ngrams/List.hs
View file @
93e711b1
...
...
@@ -130,26 +130,23 @@ reIndexWith :: ( HasNodeStory env err m
->
Set
ListType
->
m
()
reIndexWith
cId
lId
nt
lts
=
do
printDebug
"(cId,lId,nt,lts)"
(
cId
,
lId
,
nt
,
lts
)
-- Getting [NgramsTerm]
ts
<-
List
.
concat
<$>
map
(
\
(
k
,
vs
)
->
k
:
vs
)
<$>
HashMap
.
toList
<$>
getTermsWith
identity
[
lId
]
nt
lts
-- printDebug "ts" ts
-- Taking the ngrams with 0 occurrences only (orphans)
-- occs <- getOccByNgramsOnlyFast' cId lId nt ts
-- printDebug "occs" occs
let
orphans
=
ts
{- List.concat
$ map (\t -> case HashMap.lookup t occs of
Nothing -> [t]
Just n -> if n <= 1 then [t] else [ ]
) ts
-}
-- printDebug "orphans" orphans
-}
printDebug
"orphans"
orphans
-- Get all documents of the corpus
docs
<-
selectDocNodes
cId
...
...
@@ -171,12 +168,12 @@ reIndexWith cId lId nt lts = do
(
List
.
cycle
[
Map
.
fromList
$
[(
nt
,
Map
.
singleton
(
doc
^.
context_id
)
1
)]])
)
docs
-- printDebug "ngramsByDoc
" ngramsByDoc
printDebug
"ngramsByDoc:
"
ngramsByDoc
-- Saving the indexation in database
_
<-
mapM
(
saveDocNgramsWith
lId
)
ngramsByDoc
pure
()
-- ngramsByDoc
pure
()
toIndexedNgrams
::
HashMap
Text
NgramsId
->
Text
->
Maybe
(
Indexed
Int
Ngrams
)
toIndexedNgrams
m
t
=
Indexed
<$>
i
<*>
n
...
...
src/Gargantext/Core/Text/Terms/WithList.hs
View file @
93e711b1
...
...
@@ -37,8 +37,11 @@ data Pattern = Pattern
type
Patterns
=
[
Pattern
]
------------------------------------------------------------------------
replaceTerms
::
Patterns
->
[
Text
]
->
[[
Text
]]
replaceTerms
pats
terms
=
go
0
data
ReplaceTerms
=
KeepAll
|
LongestOnly
replaceTerms
::
ReplaceTerms
->
Patterns
->
[
Text
]
->
[[
Text
]]
replaceTerms
rplaceTerms
pats
terms
=
go
0
where
terms_len
=
length
terms
...
...
@@ -49,15 +52,17 @@ replaceTerms pats terms = go 0
Just
(
len
,
term
)
->
term
:
go
(
ix
+
len
)
merge
(
len1
,
lab1
)
(
len2
,
lab2
)
=
if
len2
<
len1
then
(
len1
,
lab1
)
else
(
len2
,
lab2
)
m
=
IntMap
.
fromListWith
merge
m
=
toMap
[
(
ix
,
(
len
,
term
))
|
Pattern
pat
len
term
<-
pats
,
ix
<-
KMP
.
match
pat
terms
]
toMap
=
case
rplaceTerms
of
KeepAll
->
IntMap
.
fromList
LongestOnly
->
IntMap
.
fromListWith
merge
where
merge
(
len1
,
lab1
)
(
len2
,
lab2
)
=
if
len2
<
len1
then
(
len1
,
lab1
)
else
(
len2
,
lab2
)
buildPatterns
::
TermList
->
Patterns
buildPatterns
=
sortWith
(
Down
.
_pat_length
)
.
concatMap
buildPattern
where
...
...
@@ -82,14 +87,14 @@ termsInText pats txt = groupWithCounts
--------------------------------------------------------------------------
extractTermsWithList
::
Patterns
->
Text
->
Corpus
[
Text
]
extractTermsWithList
pats
=
map
(
replaceTerms
pats
)
.
monoTextsBySentence
extractTermsWithList
pats
=
map
(
replaceTerms
KeepAll
pats
)
.
monoTextsBySentence
-- | Extract terms
-- >>> let termList = [(["chat blanc"], [["chat","blanc"]])] :: TermList
-- extractTermsWithList' (buildPatterns termList) "Le chat blanc"["chat blanc"]
-- ["chat blanc"]
extractTermsWithList'
::
Patterns
->
Text
->
[
Text
]
extractTermsWithList'
pats
=
map
(
concat
.
map
concat
.
replaceTerms
pats
)
extractTermsWithList'
pats
=
map
(
concat
.
map
concat
.
replaceTerms
KeepAll
pats
)
.
monoTextsBySentence
--------------------------------------------------------------------------
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment