Porkepix / wantzel
Commit ca112f58
Authored May 22, 2015 by Mindiell
Modifying the unescape part of webpage's title
Parent: 8b89bec5
Changes: 1
wantzel.py
...
...
@@ -14,6 +14,7 @@ TODO:
 """
 
 import feedparser
+import HTMLParser
 import importlib
 from irc import IrcClientFactory
 import MySQLdb
...
...
@@ -76,16 +77,8 @@ def get_title(message):
         title = re.search("<title>([^<]+)</title>", content).group(1)
     except:
         pass
-    # Unescaping HTML entities
-    if title:
-        title = re.sub("&gt;|&#62;", ">", title)
-        title = re.sub("&lt;|&#60;", "<", title)
-        title = re.sub("&quot;|&#34;", '"', title)
-        title = re.sub("&apos;|&#39;", "'", title)
-        title = re.sub("&amp;|&#38;", "&", title)
-        title = re.sub("&ndash;|&#8211;", "–", title)
-        # Multiple lines titles are compressed
-        title = re.sub("\n|\r", "", title)
+    # Unescaping HTML entities and removing multiple lines
+    title = HTMLParser.HTMLParser().unescape(re.sub("\n|\r", "", title))
     return (title, website)
 
 def is_moderator(name):
...
...
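Note for readers on Python 3: `HTMLParser` as imported in this commit is the Python 2 module name, and the `unescape` method was removed from `html.parser.HTMLParser` in Python 3.9. A minimal sketch of the same cleanup with the standard-library `html.unescape` could look like the following. `clean_title` is a hypothetical helper name used only for illustration; it is not part of wantzel.py.

```python
import html
import re


def clean_title(raw_title):
    """Hypothetical helper (not in wantzel.py).

    Removes line breaks first, then decodes named and numeric
    HTML entities in a single call.
    """
    single_line = re.sub(r"\n|\r", "", raw_title)
    return html.unescape(single_line)


if __name__ == "__main__":
    print(clean_title("Tom &amp; Jerry &#8211;\n a &lt;b&gt; story"))
```

A single `unescape` call covers every entity the removed `re.sub` chain handled, plus any the chain did not list.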