david Chavalarias / clinicaltrials / Commits / e74b8d26
Commit e74b8d26 authored Sep 17, 2017 by Romain Loth
phpAPI: (csv indexing) better tokenization + data exemples
parent e95b3657
Changes: 2
Showing 2 changed files with 64 additions and 3 deletions
twbackends/phpAPI/csv_indexation.php  +63 -2
twbackends/phpAPI/info_div.php  +1 -1
twbackends/phpAPI/csv_indexation.php @ e74b8d26
@@ -4,6 +4,65 @@
 // --------------------------------------------------------------
 // returns the full csv array (the documents base)
 // AND a list of postings (the search index)
+//
+// The documents-base gets a [{1 obj per row: 1 property per column}] structure
+//
+// example of the documents base structure:
+// -------
+//   {
+//     "title": "A three-dimensional photoelastic method for analysis of differential-contraction stresses",
+//     "source": "Experimental Mechanics",
+//     "publication_year": "1963",
+//     "publication_month": "01",
+//     "publication_day": "01",
+//     "abstract": "Abstract: The property of homogeneous and isotropic
+//                  contraction accompanying the slow polymerization
+//                  of a photoelastic epoxy resin is utilized to produce
+//                  a photoelastic model of the same size and shape,
+//                  at the elevated cure temperature, as the container
+//                  in which it was cast. (...).",
+//     "authors": "Robert C. Sampson"
+//   },
+//   {
+//     "title": "Use of subjective information in estimation of aquifer parameters",
+//     "source": "Water Resources Research",
+//     "publication_year": "1972",
+//     "publication_month": "01",
+//     "publication_day": "01",
+//     "abstract": "In the calibration of aquifer models, the desire for
+//                  an automated adjustment process is sometimes
+//                  in conflict with the need for subjective intervention
+//                  during the calibration process. (...)",
+//     "authors": "R. E. Lovell, L. Duckstein, C. C. Kisiel"
+//   },
+//   {
+//     "title": "Man-machine interactive transit system planning",
+//     "source": "Socio-Economic Planning Sciences",
+//     "publication_year": "1972",
+//     "publication_month": "01",
+//     "publication_day": "01",
+//     "abstract": "The problem of finding the best fixed routes for node
+//                  oriented transit systems is used for an initial
+//                  implementation and evaluation of a man-machine
+//                  interactive problem solving system. (...)",
+//     "authors": "Matthias H. Rapp"
+//   },
+//
+//
+// The postings have the form: {
+//   col_i => {
+//     "tokenA" => {
+//       docid0: occs i.A.0,
+//       docid1: occs i.A.1,
+//       ...
+//     },
+//     ...
+//   },
+//   ...
+// }
+//
+//
+//
 function parse_and_index_csv($filename, $typed_cols_to_index, $separator, $quotechar) {
   // list of csv rows
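Reviewer note: the postings layout described in the new comment block can be illustrated with a small sketch. This is not the body of parse_and_index_csv(), which the diff does not show in full. The function name build_postings_sketch, its parameters, and the sample rows are hypothetical, and the sketch ignores the per-node-type level ($ndtypeid) that the real code appears to add on top of the column level. It also assumes empty tokens should be dropped, which the commit itself does not do.

```php
<?php
// Hypothetical illustration of the structure
//   col => token => docid => number of occurrences
// $docs is a list of rows (one associative array per CSV row),
// $cols is the list of column names to index.
function build_postings_sketch(array $docs, array $cols): array
{
    $postings = [];
    foreach ($docs as $docid => $doc) {
        foreach ($cols as $col) {
            if (!array_key_exists($col, $doc)) {
                continue;
            }
            // Same pattern as the commit; PREG_SPLIT_NO_EMPTY is an
            // assumption of this sketch, not part of the commit.
            $tokens = preg_split('/[\p{Z}\p{P}\p{C}]+/u', $doc[$col], -1, PREG_SPLIT_NO_EMPTY);
            if ($tokens === false) {
                continue; // e.g. invalid UTF-8 in the field
            }
            foreach ($tokens as $tok) {
                if (!isset($postings[$col][$tok][$docid])) {
                    $postings[$col][$tok][$docid] = 0;
                }
                $postings[$col][$tok][$docid]++;
            }
        }
    }
    return $postings;
}

$docs = [
    ["title" => "Man-machine interactive transit system planning"],
    ["title" => "Use of subjective information in estimation of aquifer parameters"],
];
$postings = build_postings_sketch($docs, ["title"]);
print_r($postings["title"]["of"]); // docid 1 maps to 2 occurrences
```

Note that PHP converts purely numeric tokens such as "1963" to integer array keys, which is harmless here because lookups go through the same conversion.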
@@ -58,8 +117,10 @@ function parse_and_index_csv($filename, $typed_cols_to_index, $separator, $quote
       for ($ndtypeid = 0; $ndtypeid < $GLOBALS["ntypes"]; $ndtypeid++) {
         if (array_key_exists($ndtypeid, $postings)) {
           if (array_key_exists($colname, $postings[$ndtypeid])) {
-            // basic tokenisation (TODO specify tokenisation delimiters etc.)
-            $tokens = preg_split("/\W/", $line_fields[$c]);
+            // basic tokenisation on unicode punctuation and separators
+            // cf http://unicode.org/reports/tr18/#General_Category_Property
+            $tokens = preg_split("/[\p{Z}\p{P}\p{C}]+/u", $line_fields[$c]);
             // for debug
             // echo("indexing column:".$colname." under type:".$ndtypeid.'<br>');
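Reviewer note: a short comparison of the two patterns may help when reading this hunk. The old pattern has no quantifier, so adjacent delimiters such as ", " produce an empty token, and without the /u modifier it matches byte by byte. The new pattern collapses runs of separators (\p{Z}), punctuation (\p{P}) and control or other characters (\p{C}) and treats the input as UTF-8. The helper tokenize_sketch and the sample strings below are hypothetical and only exercise the two regular expressions taken from the diff.

```php
<?php
// The two patterns from the diff.
$old = "/\W/";
$new = '/[\p{Z}\p{P}\p{C}]+/u';

// Hypothetical wrapper: returns an empty list if preg_split fails
// (for instance on invalid UTF-8 when the /u modifier is used).
function tokenize_sketch(string $pattern, string $text): array
{
    $tokens = preg_split($pattern, $text);
    return $tokens === false ? [] : $tokens;
}

// Old pattern: the comma and the space are two separate delimiters,
// so an empty token appears between them.
print_r(tokenize_sketch($old, "stress, strain"));   // stress, "", strain

// New pattern: the run ", " is a single delimiter.
print_r(tokenize_sketch($new, "stress, strain"));   // stress, strain

// New pattern with /u keeps accented words whole.
print_r(tokenize_sketch($new, "différentielle, contraction"));

// A trailing delimiter still yields one trailing empty token.
print_r(tokenize_sketch($new, "it was cast."));     // it, was, cast, ""
```

The last case suggests that passing PREG_SPLIT_NO_EMPTY to preg_split could be worth considering if empty tokens should never reach the index. Symbols in the \p{S} category, such as "+", are not delimiters under the new pattern.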
twbackends/phpAPI/info_div.php @ e74b8d26
@@ -99,7 +99,7 @@ else {
     $searchcols = $my_conf["node".$ntid][$dbtype]['qcols'];
     // a - split the query
-    $qtokens = preg_split('/\W/', $_GET["query"]);
+    $qtokens = preg_split('/[\p{Z}\p{P}\p{C}]+/u', $_GET["query"]);
     // b - compute freq similarity per doc
     $sims = array();
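Reviewer note: the query side has to use the same tokenization as the index, otherwise query tokens will not match the posting keys; that is presumably why this file changes together with csv_indexation.php. The diff does not show how step "b" computes the similarity, so the following is only a guess at one simple possibility: summing the occurrence counts of each query token per document. The function score_docs_sketch, its parameters, and the sample postings are hypothetical.

```php
<?php
// Hypothetical scoring sketch: $postings follows the
//   col => token => docid => occurrences
// layout documented in csv_indexation.php.
function score_docs_sketch(string $query, array $postings, array $searchcols): array
{
    // a - split the query with the same pattern as the indexer
    //     (dropping empty tokens is an assumption of this sketch)
    $qtokens = preg_split('/[\p{Z}\p{P}\p{C}]+/u', $query, -1, PREG_SPLIT_NO_EMPTY);
    $sims = array();
    if ($qtokens === false) {
        return $sims;
    }
    // b - accumulate occurrence counts per document
    foreach ($qtokens as $tok) {
        foreach ($searchcols as $col) {
            if (!isset($postings[$col][$tok])) {
                continue;
            }
            foreach ($postings[$col][$tok] as $docid => $occs) {
                if (!isset($sims[$docid])) {
                    $sims[$docid] = 0;
                }
                $sims[$docid] += $occs;
            }
        }
    }
    arsort($sims); // highest score first, document ids preserved as keys
    return $sims;
}

$postings = [
    "title" => [
        "transit"  => [2 => 1],
        "planning" => [2 => 1, 5 => 3],
    ],
];
print_r(score_docs_sketch("transit planning", $postings, ["title"]));
```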