Gitmask Anonymous PR #40

gitmask-anonymous · 2019-03-04T13:15:20Z

This is an anonymous PR submitted via Gitmask - https://www.gitmask.com

rysson

Once PR #45 is merged, the change to normalize will no longer be necessary.

rysson · 2019-03-05T18:31:15Z

plugin.video.fanfilm/resources/lib/sources/pl/filiser.py

@@ -30,7 +30,7 @@ def __init__(self):
        self.domains = ['fili.cc']

        self.base_link = 'https://fili.cc'
-        self.url_transl = 'embed?type=%s&code=%s&code2=%s&salt=%s'
+        self.url_transl = 'embed?type=%s&code=%s&code2=%s&salt=%s&title=title&title2=title2'


Does this site really need the fixed string title=title&title2=title2?
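To make the question concrete, here is a minimal sketch of how the new template expands. Only the template string comes from the diff; the argument values are hypothetical placeholders, and filling it with plain % formatting is an assumption about how url_transl is consumed.

```python
try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    from urlparse import parse_qs      # Python 2

# Template string taken from the diff above.
url_transl = 'embed?type=%s&code=%s&code2=%s&salt=%s&title=title&title2=title2'

# Hypothetical placeholder values, for illustration only.
path = url_transl % ('TYPE', 'CODE', 'CODE2', 'SALT')

# Parse the query part back to see which parameters vary and which are fixed.
params = parse_qs(path.split('?', 1)[1])
print(params['title'], params['title2'])
```

Whatever is passed in, title and title2 always carry the literal values 'title' and 'title2', which is what the question is about.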

rysson · 2019-03-05T19:01:18Z

script.module.ptw/lib/ptw/libraries/cleantitle.py

@@ -77,7 +77,7 @@ def normalize(title):
    try:
        try: return title.decode('ascii').encode("utf-8")
        except: pass
-        return str(''.join(c for c in unicodedata.normalize('NFKD', unicode(title.decode('utf-8'))) if unicodedata.category(c) != 'Mn')).replace('ł','l')
+        return str(''.join(c for c in unicodedata.normalize('NFKD', unicode(title.decode('utf-8'))) if unicodedata.category(c) != 'Mn').replace(u"\u0142",'l'))


This one is interesting. I know this code was already there and the fix is minor, but while we are at it:

Is this function meant to strip diacritics and other national-character clutter?

Are we sure the input will only ever contain lowercase letters? Lowercase ł is handled (it does not decompose, because the stroke is not a combining accent), but uppercase Ł is not. The same applies to any manual mapping of 'ąć..' to 'ac..'.
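A small sketch of the point about Ł, using only the standard unicodedata module. The helper name strip_marks is made up for this illustration; it repeats the NFKD-plus-category filter from the diff above.

```python
# -*- coding: utf-8 -*-
import unicodedata

def strip_marks(text):
    """Decompose with NFKD and drop combining marks (category 'Mn')."""
    decomposed = unicodedata.normalize('NFKD', text)
    return u''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')

# An accented letter has a decomposition (base letter + combining mark) ...
print(repr(unicodedata.decomposition(u'\u00f3')))  # ó
# ... while Ł and ł have none, so NFKD leaves them untouched.
print(repr(unicodedata.decomposition(u'\u0141')))  # Ł
print(repr(unicodedata.decomposition(u'\u0142')))  # ł

# Replacing only the lowercase form, as in the diff, leaves the capital Ł in place.
result = strip_marks(u'Łódź łąka').replace(u'\u0142', u'l')
print(result)
```

Handling both cases needs an explicit mapping for Ł as well as ł.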

Why unicode(title.decode('utf-8')), when title.decode('utf-8') on its own already returns unicode?

Perhaps the remaining odd characters could be replaced with ?, e.g.

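    import unicodedata

    # Build a translate table (code point -> code point) from two equal-length strings.
    maketrans = lambda s1, s2: dict(zip(map(ord, s1), map(ord, s2)))
    # Characters that NFKD does not reduce to ASCII, mapped by hand.
    unicode_translate_table = maketrans(u'łŁ–—\u2044•„”«»', u'lL--/.""<>')

    def normalize(title):
        """Convert UTF-8 title to ASCII as well as we can."""
        # Accept both byte strings (assumed UTF-8) and unicode input.
        if not isinstance(title, type(u'')):
            title = title.decode('utf-8')
        # Decompose, then drop the combining marks (category 'Mn').
        title = u''.join(c for c in unicodedata.normalize('NFKD', title) if unicodedata.category(c) != 'Mn')
        # Anything still outside ASCII becomes '?'.
        return title.translate(unicode_translate_table).encode('ascii', 'replace')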

This gives the following conversion (unicode_translate_table can always be extended if needed):

UTF-8                                ASCII
Zażółć AŁĆ                           Zazolc ALC
• Hit – „Česki Film «Arabela ½»”     . Hit - "Ceski Film <Arabela 1/2>"
µ-film                               ?-film

Works in Python 2 and Python 3.

EDIT: I added a check for whether the input is already unicode, so there is no need to call encode before invoking the function, as is done e.g. in #41.
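The rows of the conversion table can be checked with a short script. This is a sketch that restates the proposed function so it runs on its own; the constant name TRANSLATE_TABLE and the sample loop are additions for illustration. Under Python 3 the final encode returns a bytes object, so callers that want text would need to decode the result.

```python
# -*- coding: utf-8 -*-
import unicodedata

# Same hand-written mapping as in the proposal above.
TRANSLATE_TABLE = dict(zip(map(ord, u'łŁ–—\u2044•„”«»'),
                           map(ord, u'lL--/.""<>')))

def normalize(title):
    """Convert a UTF-8 (bytes) or unicode title to ASCII as well as we can."""
    if not isinstance(title, type(u'')):
        title = title.decode('utf-8')
    # NFKD splits accented letters into base letter + combining mark,
    # and expands compatibility characters such as the fraction one half.
    title = u''.join(c for c in unicodedata.normalize('NFKD', title)
                     if unicodedata.category(c) != 'Mn')
    return title.translate(TRANSLATE_TABLE).encode('ascii', 'replace')

samples = [
    u'Zażółć AŁĆ',
    u'• Hit – „Česki Film «Arabela ½»”',
    u'µ-film',
    u'Zażółć AŁĆ'.encode('utf-8'),  # byte-string input is decoded first
]
for raw in samples:
    print(repr(normalize(raw)))
```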

anonymous commit

fe8190d

rysson mentioned this pull request Mar 5, 2019

refactor cleantitle.normalize in PTW #45

Merged

rysson reviewed Mar 5, 2019

rysson self-requested a review March 5, 2019 20:09

rysson mentioned this pull request Mar 5, 2019

Gitmask Anonymous PR #41

Open
