91av视频/亚洲h视频/操亚洲美女/外国一级黄色毛片 - 国产三级三级三级三级

  • Size: 37.75MB
    File type: .zip
    Coins: 2
    Downloads: 2
    Published: 2023-07-01
  • Language: Other
  • Tags: 學堂在線 (XuetangX)

Description

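Goal: scrape the partner-institutions page of XuetangX (學堂在線). URL: https://v1-www.xuetangx.com/partners. Requirements: collect the name of each partner institution and the number of courses it offers on XuetangX, and save the scraped data to a JSON file, for example "清華大學,308" (Tsinghua University, 308 courses). Deliverable: the whole project as a rar or zip archive, named "ID-<assignment number>".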
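The task boils down to three steps: download the partners page, extract each institution's name and course count, and write the records to a JSON file (the archive listing shows the project's output as `MyData.json`). The following is a minimal standard-library sketch, not the project's actual code. It assumes a hypothetical markup in which each name sits in `<span class="name">` and its course count in a following `<span class="count">`; the real page's HTML is not shown here and will probably differ, so the tag and class names, and the helper names `PartnerParser`, `parse_partners`, `fetch_html`, `to_json` and `save_partners`, are placeholders.

```python
import json
import re
import urllib.request
from html.parser import HTMLParser


class PartnerParser(HTMLParser):
    """Collect {"name", "courses"} records from partner-page HTML.

    ASSUMPTION (hypothetical markup): each institution is rendered as
        <span class="name">清華大學</span> ... <span class="count">308</span>
    Adapt the tag/class checks to the real page.
    """

    def __init__(self):
        super().__init__()
        self.partners = []   # collected {"name": str, "courses": int} dicts
        self._field = None   # "name" or "count" while inside such a <span>
        self._name = None    # last name seen, waiting for its course count

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if "name" in classes:
            self._field = "name"
        elif "count" in classes:
            self._field = "count"

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "name":
            self._name = text
        elif self._name is not None:
            # Keep only the digits, so "308" and "308 门课程" both give 308.
            match = re.search(r"\d+", text)
            if match:
                self.partners.append({"name": self._name, "courses": int(match.group())})
                self._name = None


def parse_partners(html):
    parser = PartnerParser()
    parser.feed(html)
    parser.close()
    return parser.partners


def fetch_html(url="https://v1-www.xuetangx.com/partners"):
    # A browser-like User-Agent; some sites reject the urllib default.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")


def to_json(partners):
    # ensure_ascii=False keeps Chinese names readable in the file.
    return json.dumps(partners, ensure_ascii=False, indent=2)


def save_partners(partners, path="MyData.json"):
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_json(partners))
```

A possible run is `save_partners(parse_partners(fetch_html()))`. If the page fills its list with JavaScript, the downloaded HTML may not contain the data at all, and the records would have to come from the request the page itself makes. Storing each record as an object rather than the literal string "清華大學,308" is a design choice; a list of such strings would also be valid JSON.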


Code snippet and file information

import logging
import re
from collections import namedtuple
from datetime import time

import six
from six.moves.urllib.parse import (ParseResult, quote, urlparse,
                                    urlunparse)

logger = logging.getLogger(__name__)

_Rule = namedtuple('Rule', ['field', 'value'])
RequestRate = namedtuple(
    'RequestRate', ['requests', 'seconds', 'start_time', 'end_time'])

# Common misspellings of "disallow" are accepted on purpose.
_DISALLOW_DIRECTIVE = {'disallow', 'dissallow', 'dissalow', 'disalow', 'diasllow', 'disallaw'}
_ALLOW_DIRECTIVE = {'allow'}
_USER_AGENT_DIRECTIVE = {'user-agent', 'useragent', 'user agent'}
_SITEMAP_DIRECTIVE = {'sitemap', 'sitemaps', 'site-map'}
_CRAWL_DELAY_DIRECTIVE = {'crawl-delay', 'crawl delay'}
_REQUEST_RATE_DIRECTIVE = {'request-rate', 'request rate'}
_HOST_DIRECTIVE = {'host'}

_WILDCARDS = {'*', '$'}

_HEX_DIGITS = set('0123456789ABCDEFabcdef')

__all__ = ['RequestRate', 'Protego']


def _is_valid_directive_field(field):
    return any([field in _DISALLOW_DIRECTIVE,
                field in _ALLOW_DIRECTIVE,
                field in _USER_AGENT_DIRECTIVE,
                field in _SITEMAP_DIRECTIVE,
                field in _CRAWL_DELAY_DIRECTIVE,
                field in _REQUEST_RATE_DIRECTIVE,
                field in _HOST_DIRECTIVE])


def _enforce_path(pattern):
    if pattern.startswith('/'):
        return pattern

    return '/' + pattern


class _URLPattern(object):
    """Internal class which represents a URL pattern."""

    def __init__(self, pattern):
        self._pattern = pattern
        self.priority = len(pattern)
        self._contains_asterisk = '*' in self._pattern
        self._contains_dollar = self._pattern.endswith('$')

        if self._contains_asterisk:
            self._pattern_before_asterisk = self._pattern[:self._pattern.find('*')]
        elif self._contains_dollar:
            self._pattern_before_dollar = self._pattern[:-1]

        self._pattern_compiled = False

    def match(self, url):
        """Return True if pattern matches the given URL, otherwise return False."""
        # check if pattern is already compiled
        if self._pattern_compiled:
            return self._pattern.match(url)

        if not self._contains_asterisk:
            if not self._contains_dollar:
                # answer directly for patterns without wildcards
                return url.startswith(self._pattern)

            # pattern only contains $ wildcard.
            return url == self._pattern_before_dollar

        # cheap prefix check before compiling a regex
        if not url.startswith(self._pattern_before_asterisk):
            return False

        self._pattern = self._prepare_pattern_for_regex(self._pattern)
        self._pattern = re.compile(self._pattern)
        self._pattern_compiled = True
        return self._pattern.match(url)

    def _prepare_pattern_for_regex(self, pattern):
        """Return equivalent regex pattern for the given URL pattern."""
        pattern = re.sub(r'\*+', '*', pattern)
        s = re.split(r'(\*|\$$)', pattern)
        for index, substr in

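The snippet appears to come from the Protego robots.txt parser (its `__all__` exports `Protego`). Its `_URLPattern` class treats a plain pattern as a prefix match, a trailing `$` as an exact match, and `*` as "any run of characters", compiling the pattern to a regular expression only when a `*` is present. The listing breaks off inside `_prepare_pattern_for_regex`, so here is a minimal stand-alone sketch of that translation step, written for illustration rather than copied from the library; `robots_pattern_to_regex` and `pattern_matches` are hypothetical names.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any character sequence; a trailing '$' anchors the end.
    All other characters are matched literally via re.escape.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Collapse runs of '*' (as the snippet's re.sub(r'\*+', '*', ...) does),
    # then escape the literal pieces between the wildcards.
    parts = re.sub(r"\*+", "*", pattern).split("*")
    regex = ".*".join(re.escape(part) for part in parts)
    if anchored:
        regex += "$"
    return re.compile(regex)


def pattern_matches(pattern, path):
    # re.match anchors at the start only, so a pattern without '$'
    # behaves like the prefix test url.startswith(pattern).
    return robots_pattern_to_regex(pattern).match(path) is not None
```

For example, `pattern_matches("/*.php$", "/index.php")` is true while `pattern_matches("/*.php$", "/index.php?x=1")` is false, because the trailing `$` forbids anything after `.php`.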
 Type        Size  Date        Time   Name
-----  ----------  ----------  -----  ----
 Dir            0  2020-05-19  10:51  學堂在線\
 Dir            0  2020-05-19  10:49  學堂在線\.idea\
 Dir            0  2020-05-19  10:49  學堂在線\.idea\inspectionProfiles\
 File         174  2020-05-19  10:49  學堂在線\.idea\inspectionProfiles\profiles_settings.xml
 File         200  2020-05-19  10:49  學堂在線\.idea\misc.xml
 File         283  2020-05-19  10:49  學堂在線\.idea\modules.xml
 File        4558  2020-05-19  10:49  學堂在線\.idea\workspace.xml
 File         570  2020-05-19  10:49  學堂在線\.idea\學堂在線.iml
 Dir            0  2020-05-19  10:46  學堂在線\xtzx\
 File    19740888  2020-05-19  10:51  學堂在線\xtzx.zip
 Dir            0  2020-05-19  10:54  學堂在線\xtzx\.idea\
 Dir            0  2020-05-09  16:50  學堂在線\xtzx\.idea\inspectionProfiles\
 File         174  2020-05-09  16:25  學堂在線\xtzx\.idea\inspectionProfiles\profiles_settings.xml
 File         195  2020-05-09  16:25  學堂在線\xtzx\.idea\misc.xml
 File         267  2020-05-09  16:25  學堂在線\xtzx\.idea\modules.xml
 File        6315  2020-05-19  10:54  學堂在線\xtzx\.idea\workspace.xml
 File         361  2020-05-09  16:25  學堂在線\xtzx\.idea\xtzx.iml
 File       11010  2020-05-19  10:39  學堂在線\xtzx\MyData.json
 Dir            0  2020-05-09  16:50  學堂在線\xtzx\venv\
 Dir            0  2020-05-09  16:25  學堂在線\xtzx\venv\Include\
 Dir            0  2020-05-09  16:50  學堂在線\xtzx\venv\Lib\
 Dir            0  2020-05-09  16:50  學堂在線\xtzx\venv\Lib\site-packages\
 Dir            0  2020-05-09  16:50  學堂在線\xtzx\venv\Lib\site-packages\attr\
 Dir            0  2020-05-09  16:50  學堂在線\xtzx\venv\Lib\site-packages\attrs-19.3.0.dist-info\
 File           4  2020-05-09  16:29  學堂在線\xtzx\venv\Lib\site-packages\attrs-19.3.0.dist-info\INSTALLER
 File        1082  2020-05-09  16:29  學堂在線\xtzx\venv\Lib\site-packages\attrs-19.3.0.dist-info\LICENSE
 File        9022  2020-05-09  16:29  學堂在線\xtzx\venv\Lib\site-packages\attrs-19.3.0.dist-info\METADATA
 File        2184  2020-05-09  16:29  學堂在線\xtzx\venv\Lib\site-packages\attrs-19.3.0.dist-info\RECORD
 File           5  2020-05-09  16:29  學堂在線\xtzx\venv\Lib\site-packages\attrs-19.3.0.dist-info\top_level.txt
 File         110  2020-05-09  16:29  學堂在線\xtzx\venv\Lib\site-packages\attrs-19.3.0.dist-info\WHEEL
 File        2141  2020-05-09  16:29  學堂在線\xtzx\venv\Lib\site-packages\attr\converters.py
............ (4037 more file entries omitted)
