gargantext / searx-engine
Commit 6ca16223
authored Oct 16, 2019 by Adam Tauber

[fix] update 1x engine

parent c98a2df3

Showing 1 changed file with 6 additions and 29 deletions

searx/engines/www1x.py (+6 -29)
@@ -11,8 +11,8 @@
 """
 
 from lxml import html
-import re
 from searx.url_utils import urlencode, urljoin
+from searx.engines.xpath import extract_text
 
 # engine dependent config
 categories = ['images']
@@ -34,41 +34,18 @@ def request(query, params):
 def response(resp):
     results = []
 
-    # get links from result-text
-    regex = re.compile('(</a>|<a)')
-    results_parts = re.split(regex, resp.text)
-
-    cur_element = ''
-
-    # iterate over link parts
-    for result_part in results_parts:
+    dom = html.fromstring(resp.text)
+    for res in dom.xpath('//div[@class="List-item MainListing"]'):
         # processed start and end of link
-        if result_part == '<a':
-            cur_element = result_part
-            continue
-        elif result_part != '</a>':
-            cur_element += result_part
-            continue
-
-        cur_element += result_part
-
-        # fix xml-error
-        cur_element = cur_element.replace('"></a>', '"/></a>')
-
-        dom = html.fromstring(cur_element)
-        link = dom.xpath('//a')[0]
+        link = res.xpath('//a')[0]
 
         url = urljoin(base_url, link.attrib.get('href'))
-        title = link.attrib.get('title', '')
+        title = extract_text(link)
 
-        thumbnail_src = urljoin(base_url, link.xpath('.//img')[0].attrib['src'])
+        thumbnail_src = urljoin(base_url, res.xpath('.//img')[0].attrib['src'])
         # TODO: get image with higher resolution
         img_src = thumbnail_src
 
-        # check if url is showing to a photo
-        if '/photo/' not in url:
-            continue
-
         # append result
         results.append({'url': url,
                         'title': title,
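
Note on the XPath calls in this change: in lxml, an expression that starts with `//` is evaluated from the document root even when it is called on a sub-element, whereas `.//` is evaluated relative to that element. The thumbnail lookup uses `.//img`, but the link lookup uses `//a`, so it may return the first `<a>` of the whole page for every result container. The sketch below illustrates the difference. The markup, the `BASE_URL` value and the `collect_links` helper are hypothetical (the real page structure and `base_url` lie outside the diff), and `urllib.parse.urljoin` stands in for `searx.url_utils.urljoin`.

```python
from urllib.parse import urljoin

from lxml import html

# Hypothetical markup; the real 1x.com page structure is not shown in the diff.
SAMPLE = """
<html><body>
  <div class="List-item MainListing"><a href="/photo/1">First<img src="/thumb/1.jpg"></a></div>
  <div class="List-item MainListing"><a href="/photo/2">Second<img src="/thumb/2.jpg"></a></div>
</body></html>
"""

BASE_URL = "https://1x.com"  # assumed value; base_url is defined outside the diff


def collect_links(page, link_xpath):
    """Return the absolute link URL found for each result container."""
    dom = html.fromstring(page)
    urls = []
    for res in dom.xpath('//div[@class="List-item MainListing"]'):
        # Same lookup shape as in the commit, with the expression made a parameter.
        link = res.xpath(link_xpath)[0]
        urls.append(urljoin(BASE_URL, link.attrib.get('href')))
    return urls


absolute = collect_links(SAMPLE, '//a')   # searched from the document root
relative = collect_links(SAMPLE, './/a')  # searched below each container
print(absolute)
print(relative)
```

With this sample, the `//a` variant yields the first link twice, while `.//a` yields one link per container. If the same holds for the live page, `res.xpath('.//a')[0]` would be the safer form.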