Commit 292225f7 authored by Atrax Nicolas

Add TsvTranslator to Streamlit

parent 1d088dd9
......@@ -49,6 +49,10 @@ name = "Clean CSV To TSV"
path = "pages/CSV_Harzing_to_TSV.py"
name = "CSV Harzing To TSV"
[[pages]]
path = "pages/TSV_Translator.py"
name = "TSV Translator"
[[pages]]
name = "PDF Convert"
icon = ":twisted_rightwards_arrows:"
......
locale,key,value
fr,title,"# Traducteur de TSV"
en,title,"# TSV Translator"
fr,text,"Traduit un fichier TSV dans la langue de votre choix"
en,text,"Translate a TSV file into the language of your choice"
fr,file,"Choisir un fichier"
en,file,"Choose a file"
fr,new_file,"Télécharge ton fichier TSV traduit:"
en,new_file,"Download your translated TSV file : "
fr,submit," Soumettre "
en,submit,"Submit "
fr,detect," Détecter les langues "
en,detect," Detect languages"
fr,translate1,"Traduire de "
en,translate1,"Translate from "
fr,translate2," Vers "
en,translate2," To "
fr,detected,"Langues détectées : "
en,detected,"Detected languages : "
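The rows above follow a `locale,key,value` layout. A minimal sketch of how such a file could be turned into a per-locale lookup table follows; `load_texts` is a hypothetical helper, and how the application actually fills `st.session_state.general_text_dict` is not shown in this commit.

```python
import csv
import io

# A few rows copied from the locale file above.
SAMPLE = '''locale,key,value
fr,title,"# Traducteur de TSV"
en,title,"# TSV Translator"
fr,file,"Choisir un fichier"
en,file,"Choose a file"
'''


def load_texts(csv_text, locale):
    """Return {key: value} for the rows matching `locale` (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["key"]: row["value"] for row in reader if row["locale"] == locale}


texts = load_texts(SAMPLE, "en")
print(texts["title"])
```

A page could then look up `texts['title']` or `texts['file']` instead of hard-coding the strings.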
Googletrans
===========
|GitHub license| |travis status| |Documentation Status| |PyPI version|
|Coverage Status| |Code Climate|
Googletrans is a **free** and **unlimited** Python library that
implements the Google Translate API. It uses the `Google Translate Ajax
API <https://translate.google.com>`__ to make calls to methods such as
detect and translate.
Compatible with Python 3.6+.
For details refer to the `API
Documentation <https://py-googletrans.readthedocs.io/en/latest>`__.
Features
--------
- Fast and reliable - it uses the same servers that
translate.google.com uses
- Auto language detection
- Bulk translations
- Customizable service URL
- HTTP/2 support
TODO
~~~~
More features are coming soon.
- Proxy support
- Internal session management (for better bulk translations)
HTTP/2 support
~~~~~~~~~~~~~~
This library uses httpx for HTTP requests so HTTP/2 is supported by default.
You can check whether HTTP/2 is enabled and working by inspecting the `._response.http_version` attribute of a `Translated` or `Detected` object:
.. code:: python
>>> translator.translate('테스트')._response.http_version
# 'HTTP/2'
How does this library work
~~~~~~~~~~~~~~~~~~~~~~~~~~
You may wonder why this library works properly whereas other
approaches such as goslate do not: Google has updated its
translation service with a ticket mechanism to block
crawler programs.
I eventually figured out a way to generate a ticket by reverse
engineering the `obfuscated and minified code used by Google to
generate such a
token <https://translate.google.com/translate/releases/twsfe_w_20170306_RC00/r/js/desktop_module_main.js>`__,
and implemented it in Python. However, this could be blocked at
any time.
--------------
Installation
------------
To install, either use a tool such as pip with the package "googletrans",
or download the package and put the "googletrans" directory on your
Python path.
.. code:: bash
$ pip install googletrans
Basic Usage
-----------
If the source language is not given, Google Translate attempts to detect
it.
.. code:: python
>>> from googletrans import Translator
>>> translator = Translator()
>>> translator.translate('안녕하세요.')
# <Translated src=ko dest=en text=Good evening. pronunciation=Good evening.>
>>> translator.translate('안녕하세요.', dest='ja')
# <Translated src=ko dest=ja text=こんにちは。 pronunciation=Kon'nichiwa.>
>>> translator.translate('veritas lux mea', src='la')
# <Translated src=la dest=en text=The truth is my light pronunciation=The truth is my light>
Customize service URL
~~~~~~~~~~~~~~~~~~~~~
You can use another Google Translate domain for translation. If multiple
URLs are provided, one of them is chosen at random.
.. code:: python
>>> from googletrans import Translator
>>> translator = Translator(service_urls=[
'translate.google.com',
'translate.google.co.kr',
])
Customize service URL to point to standard api
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Because the translate.google.<domain> services use the web app, which requires a token,
you may prefer the direct API, which does not need any token.
This can work around an unstable token generation process (refer to issue #234).
.. code:: python
>>> from googletrans import Translator
>>> translator = Translator(service_urls=[
'translate.googleapis.com'
])
Advanced Usage (Bulk)
~~~~~~~~~~~~~~~~~~~~~
An array can be used to translate a batch of strings in a single method
call and a single HTTP session. The exact same method shown above works
for arrays as well.
.. code:: python
>>> translations = translator.translate(['The quick brown fox', 'jumps over', 'the lazy dog'], dest='ko')
>>> for translation in translations:
... print(translation.origin, ' -> ', translation.text)
# The quick brown fox -> 빠른 갈색 여우
# jumps over -> 이상 점프
# the lazy dog -> 게으른 개
Language detection
~~~~~~~~~~~~~~~~~~
The detect method, as its name implies, identifies the language used in
a given sentence.
.. code:: python
>>> from googletrans import Translator
>>> translator = Translator()
>>> translator.detect('이 문장은 한글로 쓰여졌습니다.')
# <Detected lang=ko confidence=0.27041003>
>>> translator.detect('この文章は日本語で書かれました。')
# <Detected lang=ja confidence=0.64889508>
>>> translator.detect('This sentence is written in English.')
# <Detected lang=en confidence=0.22348526>
>>> translator.detect('Tiu frazo estas skribita en Esperanto.')
# <Detected lang=eo confidence=0.10538048>
GoogleTrans as a command line application
-----------------------------------------
.. code:: bash
$ translate -h
usage: translate [-h] [-d DEST] [-s SRC] [-c] text
Python Google Translator as a command-line tool
positional arguments:
text The text you want to translate.
optional arguments:
-h, --help show this help message and exit
-d DEST, --dest DEST The destination language you want to translate.
(Default: en)
-s SRC, --src SRC The source language you want to translate. (Default:
auto)
-c, --detect
$ translate "veritas lux mea" -s la -d en
[veritas] veritas lux mea
->
[en] The truth is my light
[pron.] The truth is my light
$ translate -c "안녕하세요."
[ko, 1] 안녕하세요.
--------------
Note on library usage
---------------------
DISCLAIMER: this is an unofficial library that uses the web API of translate.google.com
and is not associated with Google.
- **The maximum character limit on a single text is 15k.**
- Due to limitations of the web version of Google Translate, this library
is not guaranteed to work properly at all times
(so please use it only if you don't care about stability).
- **Important:** If you want to use a stable API, I highly recommend using
`Google's official translate
API <https://cloud.google.com/translate/docs>`__.
- If you get an HTTP 5xx error or errors like #6, it's probably because
Google has banned your client IP address.
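Given the 15k-character limit noted above, longer inputs need to be split before translation. The following is a minimal sketch, assuming that breaking at whitespace is acceptable for the text at hand; `split_text` and its `limit` default are illustrative names and are not part of googletrans.

```python
def split_text(text, limit=15000):
    """Split `text` into chunks of at most `limit` characters.

    Breaks at the last space inside each window when possible,
    otherwise falls back to a hard cut at `limit`.
    """
    chunks = []
    while len(text) > limit:
        # Look for the last space that keeps the chunk within the limit.
        cut = text.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit
        chunks.append(text[:cut])
        # Drop the spaces at the break point before continuing.
        text = text[cut:].lstrip(" ")
    if text:
        chunks.append(text)
    return chunks


print(split_text("aaa bbb ccc", limit=7))
```

Each chunk could then be passed to `translator.translate` and the results joined. Sentence boundaries may be a better split point for translation quality.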
--------------
Versioning
----------
This library follows `Semantic Versioning <http://semver.org/>`__ from
v2.0.0. Any release versioned 0.x.y is subject to backwards incompatible
changes at any time.
Contributing
-------------------------
Contributions are more than welcome. See
`CONTRIBUTING.md <CONTRIBUTING.md>`__
-----------------------------------------
License
-------
Googletrans is licensed under the MIT License. The terms are as
follows:
::
The MIT License (MIT)
Copyright (c) 2015 SuHun Han
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
.. |GitHub license| image:: https://img.shields.io/github/license/mashape/apistatus.svg
:target: http://opensource.org/licenses/MIT
.. |travis status| image:: https://travis-ci.org/ssut/py-googletrans.svg?branch=master
:target: https://travis-ci.org/ssut/py-googletrans
.. |Documentation Status| image:: https://readthedocs.org/projects/py-googletrans/badge/?version=latest
:target: https://readthedocs.org/projects/py-googletrans/?badge=latest
.. |PyPI version| image:: https://badge.fury.io/py/googletrans.svg
:target: http://badge.fury.io/py/googletrans
.. |Coverage Status| image:: https://coveralls.io/repos/github/ssut/py-googletrans/badge.svg
:target: https://coveralls.io/github/ssut/py-googletrans
.. |Code Climate| image:: https://codeclimate.com/github/ssut/py-googletrans/badges/gpa.svg
:target: https://codeclimate.com/github/ssut/py-googletrans
MANIFEST.in
README.rst
setup.cfg
setup.py
translate
googletrans/__init__.py
googletrans/client.py
googletrans/constants.py
googletrans/gtoken.py
googletrans/models.py
googletrans/urls.py
googletrans/utils.py
googletrans.egg-info/PKG-INFO
googletrans.egg-info/SOURCES.txt
googletrans.egg-info/dependency_links.txt
googletrans.egg-info/requires.txt
googletrans.egg-info/top_level.txt
\ No newline at end of file
"""Free Google Translate API for Python. Translates totally free of charge."""
__all__ = 'Translator',
__version__ = '4.0.0-rc.1'
from .client import Translator
from .constants import LANGCODES, LANGUAGES # noqa
DEFAULT_USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
DEFAULT_CLIENT_SERVICE_URLS = (
'translate.google.com',
)
DEFAULT_FALLBACK_SERVICE_URLS = (
'translate.googleapis.com',
)
DEFAULT_SERVICE_URLS = ('translate.google.ac', 'translate.google.ad', 'translate.google.ae',
'translate.google.al', 'translate.google.am', 'translate.google.as',
'translate.google.at', 'translate.google.az', 'translate.google.ba',
'translate.google.be', 'translate.google.bf', 'translate.google.bg',
'translate.google.bi', 'translate.google.bj', 'translate.google.bs',
'translate.google.bt', 'translate.google.by', 'translate.google.ca',
'translate.google.cat', 'translate.google.cc', 'translate.google.cd',
'translate.google.cf', 'translate.google.cg', 'translate.google.ch',
'translate.google.ci', 'translate.google.cl', 'translate.google.cm',
'translate.google.cn', 'translate.google.co.ao', 'translate.google.co.bw',
'translate.google.co.ck', 'translate.google.co.cr', 'translate.google.co.id',
'translate.google.co.il', 'translate.google.co.in', 'translate.google.co.jp',
'translate.google.co.ke', 'translate.google.co.kr', 'translate.google.co.ls',
'translate.google.co.ma', 'translate.google.co.mz', 'translate.google.co.nz',
'translate.google.co.th', 'translate.google.co.tz', 'translate.google.co.ug',
'translate.google.co.uk', 'translate.google.co.uz', 'translate.google.co.ve',
'translate.google.co.vi', 'translate.google.co.za', 'translate.google.co.zm',
'translate.google.co.zw', 'translate.google.com.af', 'translate.google.com.ag',
'translate.google.com.ai', 'translate.google.com.ar', 'translate.google.com.au',
'translate.google.com.bd', 'translate.google.com.bh', 'translate.google.com.bn',
'translate.google.com.bo', 'translate.google.com.br', 'translate.google.com.bz',
'translate.google.com.co', 'translate.google.com.cu', 'translate.google.com.cy',
'translate.google.com.do', 'translate.google.com.ec', 'translate.google.com.eg',
'translate.google.com.et', 'translate.google.com.fj', 'translate.google.com.gh',
'translate.google.com.gi', 'translate.google.com.gt', 'translate.google.com.hk',
'translate.google.com.jm', 'translate.google.com.kh', 'translate.google.com.kw',
'translate.google.com.lb', 'translate.google.com.ly', 'translate.google.com.mm',
'translate.google.com.mt', 'translate.google.com.mx', 'translate.google.com.my',
'translate.google.com.na', 'translate.google.com.ng', 'translate.google.com.ni',
'translate.google.com.np', 'translate.google.com.om', 'translate.google.com.pa',
'translate.google.com.pe', 'translate.google.com.pg', 'translate.google.com.ph',
'translate.google.com.pk', 'translate.google.com.pr', 'translate.google.com.py',
'translate.google.com.qa', 'translate.google.com.sa', 'translate.google.com.sb',
'translate.google.com.sg', 'translate.google.com.sl', 'translate.google.com.sv',
'translate.google.com.tj', 'translate.google.com.tr', 'translate.google.com.tw',
'translate.google.com.ua', 'translate.google.com.uy', 'translate.google.com.vc',
'translate.google.com.vn', 'translate.google.com', 'translate.google.cv',
'translate.google.cz', 'translate.google.de', 'translate.google.dj',
'translate.google.dk', 'translate.google.dm', 'translate.google.dz',
'translate.google.ee', 'translate.google.es', 'translate.google.eu',
'translate.google.fi', 'translate.google.fm', 'translate.google.fr',
'translate.google.ga', 'translate.google.ge', 'translate.google.gf',
'translate.google.gg', 'translate.google.gl', 'translate.google.gm',
'translate.google.gp', 'translate.google.gr', 'translate.google.gy',
'translate.google.hn', 'translate.google.hr', 'translate.google.ht',
'translate.google.hu', 'translate.google.ie', 'translate.google.im',
'translate.google.io', 'translate.google.iq', 'translate.google.is',
'translate.google.it', 'translate.google.je', 'translate.google.jo',
'translate.google.kg', 'translate.google.ki', 'translate.google.kz',
'translate.google.la', 'translate.google.li', 'translate.google.lk',
'translate.google.lt', 'translate.google.lu', 'translate.google.lv',
'translate.google.md', 'translate.google.me', 'translate.google.mg',
'translate.google.mk', 'translate.google.ml', 'translate.google.mn',
'translate.google.ms', 'translate.google.mu', 'translate.google.mv',
'translate.google.mw', 'translate.google.ne', 'translate.google.nf',
'translate.google.nl', 'translate.google.no', 'translate.google.nr',
'translate.google.nu', 'translate.google.pl', 'translate.google.pn',
'translate.google.ps', 'translate.google.pt', 'translate.google.ro',
'translate.google.rs', 'translate.google.ru', 'translate.google.rw',
'translate.google.sc', 'translate.google.se', 'translate.google.sh',
'translate.google.si', 'translate.google.sk', 'translate.google.sm',
'translate.google.sn', 'translate.google.so', 'translate.google.sr',
'translate.google.st', 'translate.google.td', 'translate.google.tg',
'translate.google.tk', 'translate.google.tl', 'translate.google.tm',
'translate.google.tn', 'translate.google.to', 'translate.google.tt',
'translate.google.us', 'translate.google.vg', 'translate.google.vu',
'translate.google.ws')
SPECIAL_CASES = {
'ee': 'et',
}
LANGUAGES = {
'af': 'afrikaans',
'sq': 'albanian',
'am': 'amharic',
'ar': 'arabic',
'hy': 'armenian',
'az': 'azerbaijani',
'eu': 'basque',
'be': 'belarusian',
'bn': 'bengali',
'bs': 'bosnian',
'bg': 'bulgarian',
'ca': 'catalan',
'ceb': 'cebuano',
'ny': 'chichewa',
'zh-cn': 'chinese (simplified)',
'zh-tw': 'chinese (traditional)',
'co': 'corsican',
'hr': 'croatian',
'cs': 'czech',
'da': 'danish',
'nl': 'dutch',
'en': 'english',
'eo': 'esperanto',
'et': 'estonian',
'tl': 'filipino',
'fi': 'finnish',
'fr': 'french',
'fy': 'frisian',
'gl': 'galician',
'ka': 'georgian',
'de': 'german',
'el': 'greek',
'gu': 'gujarati',
'ht': 'haitian creole',
'ha': 'hausa',
'haw': 'hawaiian',
'iw': 'hebrew',
'he': 'hebrew',
'hi': 'hindi',
'hmn': 'hmong',
'hu': 'hungarian',
'is': 'icelandic',
'ig': 'igbo',
'id': 'indonesian',
'ga': 'irish',
'it': 'italian',
'ja': 'japanese',
'jw': 'javanese',
'kn': 'kannada',
'kk': 'kazakh',
'km': 'khmer',
'ko': 'korean',
'ku': 'kurdish (kurmanji)',
'ky': 'kyrgyz',
'lo': 'lao',
'la': 'latin',
'lv': 'latvian',
'lt': 'lithuanian',
'lb': 'luxembourgish',
'mk': 'macedonian',
'mg': 'malagasy',
'ms': 'malay',
'ml': 'malayalam',
'mt': 'maltese',
'mi': 'maori',
'mr': 'marathi',
'mn': 'mongolian',
'my': 'myanmar (burmese)',
'ne': 'nepali',
'no': 'norwegian',
'or': 'odia',
'ps': 'pashto',
'fa': 'persian',
'pl': 'polish',
'pt': 'portuguese',
'pa': 'punjabi',
'ro': 'romanian',
'ru': 'russian',
'sm': 'samoan',
'gd': 'scots gaelic',
'sr': 'serbian',
'st': 'sesotho',
'sn': 'shona',
'sd': 'sindhi',
'si': 'sinhala',
'sk': 'slovak',
'sl': 'slovenian',
'so': 'somali',
'es': 'spanish',
'su': 'sundanese',
'sw': 'swahili',
'sv': 'swedish',
'tg': 'tajik',
'ta': 'tamil',
'te': 'telugu',
'th': 'thai',
'tr': 'turkish',
'uk': 'ukrainian',
'ur': 'urdu',
'ug': 'uyghur',
'uz': 'uzbek',
'vi': 'vietnamese',
'cy': 'welsh',
'xh': 'xhosa',
'yi': 'yiddish',
'yo': 'yoruba',
'zu': 'zulu',
}
LANGCODES = dict(map(reversed, LANGUAGES.items()))
DEFAULT_RAISE_EXCEPTION = False
DUMMY_DATA = [[["", None, None, 0]], None, "en", None,
None, None, 1, None, [["en"], None, [1], ["en"]]]
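`LANGCODES` is built by reversing each `(code, name)` pair of `LANGUAGES`. A small self-contained sketch with a subset of the table shows the effect, including what happens to the duplicated `'hebrew'` name (`iw` and `he`): the later entry wins.

```python
# Subset of the LANGUAGES table above, in the same order.
LANGUAGES = {
    'en': 'english',
    'fr': 'french',
    'iw': 'hebrew',
    'he': 'hebrew',
}

# Same construction as in constants.py: swap each (code, name) pair.
LANGCODES = dict(map(reversed, LANGUAGES.items()))

print(LANGCODES['french'])
print(LANGCODES['hebrew'])
```

Because of this, a lookup by language name yields `he` rather than `iw` for Hebrew.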
# -*- coding: utf-8 -*-
import ast
import math
import re
import time
import httpx
from .utils import rshift
class TokenAcquirer:
"""Google Translate API token generator
translate.google.com uses a token to authorize the requests. If you are
not Google, you do not have this token and will have to pay for use.
This class is the result of reverse engineering on the obfuscated and
minified code used by Google to generate such token.
The token is based on a seed which is updated once per hour and on the
text that will be translated.
Both are combined - by some strange math - in order to generate a final
token (e.g. 744915.856682) which is used by the API to validate the
request.
This operation will cause an additional request to get an initial
token from translate.google.com.
Example usage:
>>> from googletrans.gtoken import TokenAcquirer
>>> acquirer = TokenAcquirer()
>>> text = 'test'
>>> tk = acquirer.do(text)
>>> tk
950629.577246
"""
RE_TKK = re.compile(r'tkk:\'(.+?)\'', re.DOTALL)
RE_RAWTKK = re.compile(r'tkk:\'(.+?)\'', re.DOTALL)
def __init__(self, client: httpx.Client, tkk='0', host='translate.google.com'):
self.client = client
self.tkk = tkk
self.host = host if 'http' in host else 'https://' + host
def _update(self):
"""update tkk
"""
# we don't need to update the base TKK value when it is still valid
now = math.floor(int(time.time() * 1000) / 3600000.0)
if self.tkk and int(self.tkk.split('.')[0]) == now:
return
r = self.client.get(self.host)
raw_tkk = self.RE_TKK.search(r.text)
if raw_tkk:
self.tkk = raw_tkk.group(1)
return
try:
# this will be the same as python code after stripping out a reserved word 'var'
code = self.RE_TKK.search(r.text).group(1).replace('var ', '')
# unescape special ASCII characters such as \x3d (=)
code = code.encode().decode('unicode-escape')
except AttributeError:
raise Exception('Could not find TKK token for this request.\nSee https://github.com/ssut/py-googletrans/issues/234 for more details.')
except:
raise
if code:
tree = ast.parse(code)
visit_return = False
operator = '+'
n, keys = 0, dict(a=0, b=0)
for node in ast.walk(tree):
if isinstance(node, ast.Assign):
name = node.targets[0].id
if name in keys:
if isinstance(node.value, ast.Num):
keys[name] = node.value.n
# the value can sometimes be negative
elif isinstance(node.value, ast.UnaryOp) and \
isinstance(node.value.op, ast.USub): # pragma: nocover
keys[name] = -node.value.operand.n
elif isinstance(node, ast.Return):
# parameters should be set after this point
visit_return = True
elif visit_return and isinstance(node, ast.Num):
n = node.n
elif visit_return and n > 0:
# the default operator is '+' but implement some more for
# all possible scenarios
if isinstance(node, ast.Add): # pragma: nocover
pass
elif isinstance(node, ast.Sub): # pragma: nocover
operator = '-'
elif isinstance(node, ast.Mult): # pragma: nocover
operator = '*'
elif isinstance(node, ast.Pow): # pragma: nocover
operator = '**'
elif isinstance(node, ast.BitXor): # pragma: nocover
operator = '^'
# a safe way to avoid exceptions
clause = compile('{1}{0}{2}'.format(
operator, keys['a'], keys['b']), '', 'eval')
value = eval(clause, dict(__builtin__={}))
result = '{}.{}'.format(n, value)
self.tkk = result
def _lazy(self, value):
"""like lazy evaluation, this method returns a lambda function that
returns value given.
We won't be needing this because this seems to have been built for
code obfuscation.
the original code of this method is as follows:
.. code-block:: javascript
var ek = function(a) {
return function() {
return a;
};
}
"""
return lambda: value
def _xr(self, a, b):
size_b = len(b)
c = 0
while c < size_b - 2:
d = b[c + 2]
d = ord(d[0]) - 87 if 'a' <= d else int(d)
d = rshift(a, d) if '+' == b[c + 1] else a << d
a = a + d & 4294967295 if '+' == b[c] else a ^ d
c += 3
return a
def acquire(self, text):
a = []
# Convert text to ints
for i in text:
val = ord(i)
if val < 0x10000:
a += [val]
else:
# Python doesn't natively use Unicode surrogates, so account for those
a += [
math.floor((val - 0x10000) / 0x400 + 0xD800),
math.floor((val - 0x10000) % 0x400 + 0xDC00)
]
b = self.tkk if self.tkk != '0' else ''
d = b.split('.')
b = int(d[0]) if len(d) > 1 else 0
# assume e means char code array
e = []
g = 0
size = len(a)
while g < size:
l = a[g]
# just append if l is less than 128(ascii: DEL)
if l < 128:
e.append(l)
# append calculated value if l is less than 2048
else:
if l < 2048:
e.append(l >> 6 | 192)
else:
# append calculated value if l matches special condition
if (l & 64512) == 55296 and g + 1 < size and \
a[g + 1] & 64512 == 56320:
g += 1
l = 65536 + ((l & 1023) << 10) + (a[g] & 1023) # This bracket is important
e.append(l >> 18 | 240)
e.append(l >> 12 & 63 | 128)
else:
e.append(l >> 12 | 224)
e.append(l >> 6 & 63 | 128)
e.append(l & 63 | 128)
g += 1
a = b
for i, value in enumerate(e):
a += value
a = self._xr(a, '+-a^+6')
a = self._xr(a, '+-3^+b+-f')
a ^= int(d[1]) if len(d) > 1 else 0
if a < 0: # pragma: nocover
a = (a & 2147483647) + 2147483648
a %= 1000000 # int(1E6)
return '{}.{}'.format(a, a ^ b)
def do(self, text):
self._update()
tk = self.acquire(text)
return tk
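`_update` skips the network request while the cached TKK still belongs to the current hour: the integer part of the TKK is compared with the number of whole hours since the Unix epoch. A standalone sketch of that check follows; `hour_bucket` and `tkk_is_current` are illustrative names, not functions of the library.

```python
import math


def hour_bucket(timestamp_seconds):
    """Whole hours since the Unix epoch, computed as in `_update`."""
    return math.floor(int(timestamp_seconds * 1000) / 3600000.0)


def tkk_is_current(tkk, timestamp_seconds):
    """True when the integer part of `tkk` equals the current hour bucket."""
    return bool(tkk) and int(tkk.split('.')[0]) == hour_bucket(timestamp_seconds)


now = 1_600_000_000  # fixed timestamp so the example is reproducible
print(hour_bucket(now))
print(tkk_is_current('444444.123456', now))
print(tkk_is_current('0', now))
```

The default `tkk='0'` never matches a real hour bucket, so the first call always fetches a fresh value.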
from httpx import Response
from typing import List
class Base:
def __init__(self, response: Response = None):
self._response = response
class TranslatedPart:
def __init__(self, text: str, candidates: List[str]):
self.text = text
self.candidates = candidates
def __str__(self):
return self.text
def __dict__(self):
return {
'text': self.text,
'candidates': self.candidates,
}
class Translated(Base):
"""Translate result object
:param src: source language (default: auto)
:param dest: destination language (default: en)
:param origin: original text
:param text: translated text
:param pronunciation: pronunciation
"""
def __init__(self, src, dest, origin, text, pronunciation, parts: List[TranslatedPart],
extra_data=None, **kwargs):
super().__init__(**kwargs)
self.src = src
self.dest = dest
self.origin = origin
self.text = text
self.pronunciation = pronunciation
self.parts = parts
self.extra_data = extra_data
def __str__(self): # pragma: nocover
return self.__unicode__()
def __unicode__(self): # pragma: nocover
return (
u'Translated(src={src}, dest={dest}, text={text}, pronunciation={pronunciation}, '
u'extra_data={extra_data})'.format(
src=self.src, dest=self.dest, text=self.text,
pronunciation=self.pronunciation,
extra_data='"' + repr(self.extra_data)[:10] + '..."'
)
)
def __dict__(self):
return {
'src': self.src,
'dest': self.dest,
'origin': self.origin,
'text': self.text,
'pronunciation': self.pronunciation,
'extra_data': self.extra_data,
'parts': list(map(lambda part: part.__dict__(), self.parts)),
}
class Detected(Base):
"""Language detection result object
:param lang: detected language
:param confidence: the confidence of detection result (0.00 to 1.00)
"""
def __init__(self, lang, confidence, **kwargs):
super().__init__(**kwargs)
self.lang = lang
self.confidence = confidence
def __str__(self): # pragma: nocover
return self.__unicode__()
def __unicode__(self): # pragma: nocover
return u'Detected(lang={lang}, confidence={confidence})'.format(
lang=self.lang, confidence=self.confidence)
# -*- coding: utf-8 -*-
"""
Predefined URLs used to make google translate requests.
"""
BASE = 'https://translate.google.com'
TRANSLATE = 'https://{host}/translate_a/single'
TRANSLATE_RPC = 'https://{host}/_/TranslateWebserverUi/data/batchexecute'
"""A conversion module for googletrans"""
import json
import re
def build_params(client, query, src, dest, token, override):
params = {
'client': client,
'sl': src,
'tl': dest,
'hl': dest,
'dt': ['at', 'bd', 'ex', 'ld', 'md', 'qca', 'rw', 'rm', 'ss', 't'],
'ie': 'UTF-8',
'oe': 'UTF-8',
'otf': 1,
'ssel': 0,
'tsel': 0,
'q': query,
}
if token != '':
params['tk'] = token
if override is not None:
for key, value in get_items(override):
params[key] = value
return params
def legacy_format_json(original):
# save state
states = []
text = original
# save position for double-quoted texts
for i, pos in enumerate(re.finditer('"', text)):
# pos.start() is a double-quote
p = pos.start() + 1
if i % 2 == 0:
nxt = text.find('"', p)
states.append((p, text[p:nxt]))
# replace all weird characters in text
while text.find(',,') > -1:
text = text.replace(',,', ',null,')
while text.find('[,') > -1:
text = text.replace('[,', '[null,')
# recover state
for i, pos in enumerate(re.finditer('"', text)):
p = pos.start() + 1
if i % 2 == 0:
j = int(i / 2)
nxt = text.find('"', p)
# replacing a portion of a string
# use slicing to extract those parts of the original string to be kept
text = text[:p] + states[j][1] + text[nxt:]
converted = json.loads(text)
return converted
def get_items(dict_object):
for key in dict_object:
yield key, dict_object[key]
def format_json(original):
try:
converted = json.loads(original)
except ValueError:
converted = legacy_format_json(original)
return converted
def rshift(val, n):
    """Python port of JavaScript's '>>>' (unsigned right shift on a 32-bit value)."""
    return (val % 0x100000000) >> n
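Python's `>>` on a negative integer is an arithmetic shift, whereas JavaScript's `>>>` treats the operand as an unsigned 32-bit value. A short sketch comparing the two follows; the function body is the one above, repeated so the example is self-contained.

```python
def rshift(val, n):
    """Unsigned 32-bit right shift, like JavaScript's `>>>`."""
    return (val % 0x100000000) >> n


print(-1 >> 28)        # arithmetic shift keeps the sign
print(rshift(-1, 28))  # -1 is first mapped to 0xFFFFFFFF
print(rshift(16, 2))   # non-negative values behave like a plain shift
```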
[metadata]
description-file = README.md
[egg_info]
tag_build =
tag_date = 0
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os.path
import re
from setuptools import setup, find_packages
def get_file(*paths):
path = os.path.join(*paths)
try:
with open(path, 'rb') as f:
return f.read().decode('utf8')
except IOError:
pass
def get_version():
init_py = get_file(os.path.dirname(__file__), 'googletrans', '__init__.py')
pattern = r"{0}\W*=\W*'([^']+)'".format('__version__')
version, = re.findall(pattern, init_py)
return version
def get_description():
init_py = get_file(os.path.dirname(__file__), 'googletrans', '__init__.py')
pattern = r'"""(.*?)"""'
description, = re.findall(pattern, init_py, re.DOTALL)
return description
def get_readme():
return get_file(os.path.dirname(__file__), 'README.rst')
def install():
setup(
name='googletrans',
version=get_version(),
description=get_description(),
long_description=get_readme(),
license='MIT',
author='SuHun Han',
author_email='ssut' '@' 'ssut.me',
url='https://github.com/ssut/py-googletrans',
classifiers=['Development Status :: 5 - Production/Stable',
'Intended Audience :: Education',
'Intended Audience :: End Users/Desktop',
'License :: Freeware',
'Operating System :: POSIX',
'Operating System :: Microsoft :: Windows',
'Operating System :: MacOS :: MacOS X',
'Topic :: Education',
'Programming Language :: Python',
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
'Programming Language :: Python :: 3.8'],
packages=find_packages(exclude=['docs', 'tests']),
keywords='google translate translator',
install_requires=[
'httpx==0.13.3',
],
python_requires= '>=3.6',
tests_require=[
'pytest',
'coveralls',
],
scripts=['translate']
)
if __name__ == "__main__":
install()
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import argparse
import sys
from googletrans import Translator
def main():
parser = argparse.ArgumentParser(
description='Python Google Translator as a command-line tool')
parser.add_argument('text', help='The text you want to translate.')
parser.add_argument('-d', '--dest', default='en',
help='The destination language you want to translate. (Default: en)')
parser.add_argument('-s', '--src', default='auto',
help='The source language you want to translate. (Default: auto)')
parser.add_argument('-c', '--detect', action='store_true', default=False,
help='')
args = parser.parse_args()
translator = Translator()
if args.detect:
result = translator.detect(args.text)
result = """
[{lang}, {confidence}] {text}
""".strip().format(text=args.text,
lang=result.lang, confidence=result.confidence)
print(result)
return
result = translator.translate(args.text, dest=args.dest, src=args.src)
result = u"""
[{src}] {original}
->
[{dest}] {text}
[pron.] {pronunciation}
""".strip().format(src=result.src, dest=result.dest, original=result.origin,
text=result.text, pronunciation=result.pronunciation)
print(result)
if __name__ == '__main__':
main()
"""
Streamlit Application
Nicolas Atrax
"""
import streamlit as st
import pandas as pd
import chardet
import re
import codecs
import os
import tempfile
import csv
from lib.langdetect.langdetect import detect
from lib.langdetect.langdetect.lang_detect_exception import LangDetectException
from lib.googletrans.googletrans import Translator as GoogleTranslator
from lib.googletrans.googletrans import LANGUAGES
import src.basic as tmp
tmp.base("TsvTranslator")
# Tool Code Start
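def detectLanguages(abstract, languages):
    # Skip values without Latin letters (empty cells, numbers, punctuation only).
    if re.search('[a-zA-Z]', abstract) is None:
        return languages
    try:
        # `lang` instead of `tmp`, which would shadow the `src.basic` module alias.
        lang = detect(abstract)
        if lang in languages:
            languages[lang] += 1
        else:
            languages[lang] = 1
    except LangDetectException:
        pass
    return languages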
def estimateLanguagesPercentage(languages):
st.session_state.detected = ""
total = 0
res = {}
for l in languages:
total += languages[l]
for l in languages:
tmp = (languages[l] / total) * 100
if tmp >= 15:
res[l] = tmp
if st.session_state.detected != "":
st.session_state.detected += "| "
st.session_state.detected += l + " : " + str(tmp) + "%"
return res
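The percentage filter above can be expressed as a pure function, which makes it testable outside Streamlit. This is a minimal sketch, assuming the same 15% threshold; `language_shares` is a hypothetical name and, unlike the original, it does not write to `st.session_state`.

```python
def language_shares(counts, threshold=15.0):
    """Return {lang: percentage} for languages at or above `threshold` percent."""
    total = sum(counts.values())
    if total == 0:
        # No abstracts were detected, so there is nothing to report.
        return {}
    shares = {lang: n / total * 100 for lang, n in counts.items()}
    return {lang: pct for lang, pct in shares.items() if pct >= threshold}


shares = language_shares({'en': 8, 'fr': 2, 'de': 1})
print(sorted(shares))
```

The display string stored in `st.session_state.detected` could then be built from the returned dictionary in a separate step.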
def inspectLanguages(file):
languages = {}
reader = csv.DictReader(codecs.iterdecode(
file, 'utf-8'), delimiter=st.session_state.separator)
for row in reader:
for name, value in row.items():
if name.lower() == 'abstract':
languages = detectLanguages(value, languages)
file.seek(0)
return estimateLanguagesPercentage(languages)
def translate(text, srcLang, destLang):
translator = GoogleTranslator()
return translator.translate(text, src=srcLang, dest=destLang).text
def correctedSequence(text, separator):
tmp = text.replace("\"", "\"\"")
find = separator in text or "\"" in text or "\n" in text
if find:
if text[len(text) - 1] == "\n":
tmp = tmp[:-1]
tmp = "\"" + tmp + "\""
return tmp
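`correctedSequence` quotes fields by hand. The standard library's `csv.writer` applies similar minimal-quoting rules: a field is quoted when it contains the delimiter, a quote character, or a line break, and embedded quotes are doubled. Below is a sketch of an equivalent output path, assuming tab-separated output; `rows_to_tsv` is an illustrative name and is not part of this commit.

```python
import csv
import io


def rows_to_tsv(rows):
    """Serialize rows as tab-separated text, letting csv.writer handle quoting."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_tsv([['id', 'abstract'], ['1', 'He said "hi"\tthen left']]))
```

Since the writer is configured with the output delimiter, a tab inside a field is quoted even when the input file was comma-separated.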
def getSeparator(file):
line = file.readline().decode('utf-8')
file.seek(0)
if ',' in line:
return ','
if ';' in line:
return ';'
return '\t'
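`getSeparator` returns `,` as soon as the header line contains a comma, so a tab-separated file whose header includes a comma is read as CSV. One possible alternative, shown here as a sketch rather than the author's method, is to pick the candidate that occurs most often in the header line; `guess_separator` is a hypothetical helper.

```python
def guess_separator(header_line, candidates=(',', ';', '\t')):
    """Return the candidate delimiter that occurs most often; default to tab."""
    counts = {sep: header_line.count(sep) for sep in candidates}
    # On ties, max() keeps the first candidate, so ',' is preferred over ';' and tab.
    best = max(candidates, key=lambda sep: counts[sep])
    return best if counts[best] > 0 else '\t'


print(repr(guess_separator("title\tabstract, short\tyear")))
```

Counting occurrences is still a heuristic: a header with more commas inside column names than tabs between them would be misread.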
def getColumnsNames(file):
data = ""
total = 1
reader = csv.DictReader(codecs.iterdecode(
file, 'utf-8'), delimiter=st.session_state.separator)
columnsNames = []
for row in reader:
for name, value in row.items():
if data != "":
data += "\t"
data += name.replace("\ufeff", "")
break
data += "\n"
total += sum(1 for row in reader)
file.seek(0)
return data, total
def getContent(file, data, total, separator):
    reader = csv.DictReader(codecs.iterdecode(
        file, 'utf-8'), delimiter=separator)
    count = 1
    bar = st.progress(0, "Translation progress : 0%")
    for row in reader:
        tmp = ""
        first = True
        loading = int(count / total * 100)
        bar.progress(loading, "Translation progress : " + str(loading) + "%")
        for name, value in row.items():
            if not first:
                tmp += "\t"
            else:
                first = False
            # Only the "abstract" column is translated; other cells are copied as-is.
            if name.lower() == "abstract" and re.search('[a-zA-Z]', value) is not None:
                tmp += correctedSequence(translate(value,
                                                   st.session_state.srcLang, st.session_state.destLang), separator)
            else:
                tmp += correctedSequence(value, separator)
        count += 1
        data += tmp + "\n"
    return data
def translateTSV(file):
if st.session_state.srcLang == st.session_state.destLang:
return None
data, total = getColumnsNames(file)
data = getContent(file, data, total, st.session_state.separator)
return data
# Tool Code End
form = st.form('api')
# Page Code Start
if 'page' not in st.session_state:
st.session_state.page = 0
if 'submit' not in st.session_state:
st.session_state.submit = False
if 'detect' not in st.session_state:
st.session_state.detect = False
if 'uploadedTSV' not in st.session_state:
st.session_state.uploadedTSV = False
if 'languages' not in st.session_state:
st.session_state.languages = []
def setDetect():
st.session_state.detect = True
def setSubmit():
st.session_state.submit = True
def setTSV():
st.session_state.uploadedTSV = True
def resetPage():
st.session_state.page = 0
def uploadTSV():
    with st.form("Detect"):
        st.write(st.session_state.general_text_dict['text'])
        st.file_uploader(
            st.session_state.general_text_dict['file'], type=["tsv", "csv"], key='file')
        # Pass the callback itself; `setDetect()` would run it while rendering.
        st.form_submit_button(
            st.session_state.general_text_dict['detect'], on_click=setDetect)

def askTranslateLanguages(file):
    with st.form("Submit"):
        name = file.name
        st.write(name)
        st.write(
            st.session_state.general_text_dict['detected'] + st.session_state.detected)
        col1, col2 = st.columns(2)
        with col1:
            st.selectbox(st.session_state.general_text_dict['translate1'], st.session_state.languages,
                         key='srcLang')
        with col2:
            st.selectbox(st.session_state.general_text_dict['translate2'], st.session_state.languages,
                         key='destLang')
        # Pass the callback itself; `setSubmit()` would run it while rendering.
        st.form_submit_button(
            st.session_state.general_text_dict['submit'], on_click=setSubmit)
# Page Code End
if st.session_state.page == 0:
    if st.session_state.detect:
        if st.session_state.file is not None:
            st.session_state.separator = getSeparator(st.session_state.file)
            st.session_state.languages = inspectLanguages(
                st.session_state.file)
            st.session_state.page = 1
            st.session_state.detect = False
            st.session_state.tmpFile = st.session_state.file
        else:
            uploadTSV()
    else:
        uploadTSV()
if st.session_state.page == 1:
    if st.session_state.submit:
        st.session_state.page = 2
        st.session_state.submit = False
    else:
        askTranslateLanguages(st.session_state.tmpFile)
if st.session_state.page == 2:
    tsv = translateTSV(st.session_state.tmpFile)
    st.write(st.session_state.general_text_dict['new_file'])
    name = st.session_state.tmpFile.name
    # Pass the callback itself so the page is reset only when the button is clicked.
    st.download_button(name,
                       tsv, name, on_click=resetPage)
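Streamlit's `on_click` parameter expects a callable that is invoked when the button is pressed. A minimal sketch without Streamlit shows the difference between passing a function and passing the result of calling it; `FakeButton` is a stand-in written for this illustration, not a Streamlit class.

```python
state = {'page': 2}


def reset_page():
    state['page'] = 0


class FakeButton:
    """Stand-in for a widget that stores a callback and runs it on click."""

    def __init__(self, on_click=None):
        self.on_click = on_click

    def click(self):
        if self.on_click is not None:
            self.on_click()


# Passing the function object: nothing happens until the click.
button = FakeButton(on_click=reset_page)
page_before_click = state['page']
button.click()
page_after_click = state['page']

# Calling the function while building the widget: it runs immediately,
# and the widget receives its return value (None) as the callback.
state['page'] = 2
eager = FakeButton(on_click=reset_page())
page_after_construction = state['page']

print(page_before_click, page_after_click, page_after_construction)
```

In the second case the state changes as soon as the widget is created, whether or not the user ever clicks.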