Output of the new DB entries
parent bad48e1627
commit cfbbb9ee3d
2399 changed files with 843193 additions and 43 deletions
@@ -0,0 +1,113 @@
Metadata-Version: 2.1
Name: Protego
Version: 0.1.16
Summary: Pure-Python robots.txt parser with support for modern conventions
Home-page: UNKNOWN
Author: Anubhav Patel
Author-email: anubhavp28@gmail.com
License: BSD
Description: # Protego

## Overview
Protego is a pure-Python `robots.txt` parser with support for modern conventions.

## Requirements
* Python 2.7 or Python 3.5+
* Works on Linux, Windows, Mac OSX, BSD

## Install

To install Protego, simply use pip:

```
pip install protego
```

## Usage

```python
>>> from protego import Protego
>>> robotstxt = """
... User-agent: *
... Disallow: /
... Allow: /about
... Allow: /account
... Disallow: /account/contact$
... Disallow: /account/*/profile
... Crawl-delay: 4
... Request-rate: 10/1m                 # 10 requests every 1 minute
...
... Sitemap: http://example.com/sitemap-index.xml
... Host: http://example.co.in
... """
>>> rp = Protego.parse(robotstxt)
>>> rp.can_fetch("http://example.com/profiles", "mybot")
False
>>> rp.can_fetch("http://example.com/about", "mybot")
True
>>> rp.can_fetch("http://example.com/account", "mybot")
True
>>> rp.can_fetch("http://example.com/account/myuser/profile", "mybot")
False
>>> rp.can_fetch("http://example.com/account/contact", "mybot")
False
>>> rp.crawl_delay("mybot")
4.0
>>> rp.request_rate("mybot")
RequestRate(requests=10, seconds=60, start_time=None, end_time=None)
>>> list(rp.sitemaps)
['http://example.com/sitemap-index.xml']
>>> rp.preferred_host
'http://example.co.in'
```
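
The results above are consistent with three common `robots.txt` conventions: `*` matches any run of characters, a trailing `$` anchors the end of the path, and when several rules match, the longest pattern wins. The sketch below is a standalone illustration of that matching logic, not Protego's actual implementation. The names `pattern_to_regex`, `is_allowed`, and `RULES` are hypothetical, and breaking ties in favour of `Allow` is an assumption.

```python
import re

# Rules from the sample robots.txt above, as (allow, pattern) pairs.
RULES = [
    (False, "/"),
    (True, "/about"),
    (True, "/account"),
    (False, "/account/contact$"),
    (False, "/account/*/profile"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex (illustrative only)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each "*" wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Return the verdict of the longest matching pattern; no match means allowed."""
    best = None
    for allow, pattern in rules:
        # re.match anchors at the start of the path, like a prefix rule.
        if pattern_to_regex(pattern).match(path):
            key = (len(pattern), allow)  # assumption: ties go to Allow
            if best is None or key > best:
                best = key
    return True if best is None else best[1]
```

Under these assumptions, `is_allowed(RULES, "/account/contact")` is rejected by the anchored rule, while a deeper path such as `/account/contact/form` falls back to the shorter `Allow: /account` rule.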

Using Protego with [Requests](https://3.python-requests.org/):

```python
>>> from protego import Protego
>>> import requests
>>> r = requests.get("https://google.com/robots.txt")
>>> rp = Protego.parse(r.text)
>>> rp.can_fetch("https://google.com/search", "mybot")
False
>>> rp.can_fetch("https://google.com/search/about", "mybot")
True
>>> list(rp.sitemaps)
['https://www.google.com/sitemap.xml']
```
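
The example above passes `r.text` to the parser without looking at the HTTP status. A crawler may want to decide what to parse when the fetch does not succeed. The sketch below follows the convention described in RFC 9309 (4xx means no restrictions, 5xx means assume everything is disallowed); it is not something Protego does for you. `robots_body_for_status`, `ALLOW_ALL`, and `DISALLOW_ALL` are hypothetical names, and treating an empty body as "no rules" is an assumption.

```python
ALLOW_ALL = ""  # assumption: an empty robots.txt body carries no rules
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"


def robots_body_for_status(status_code, text):
    """Pick the robots.txt text to hand to a parser, given an HTTP response."""
    if 200 <= status_code < 300:
        # Successful fetch: use the body as served.
        return text
    if 400 <= status_code < 500:
        # File unavailable: treat the site as having no restrictions.
        return ALLOW_ALL
    # Server errors and anything unexpected: be conservative and block everything.
    return DISALLOW_ALL
```

With the response object from the example, this could be used as `Protego.parse(robots_body_for_status(r.status_code, r.text))`.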

## Documentation

Class `protego.Protego`:

### Properties

* `sitemaps` {`list_iterator`} An iterator over the sitemaps specified in `robots.txt`.
* `preferred_host` {string} Preferred host specified in `robots.txt`.

### Methods

* `parse(robotstxt_body)` Parse `robots.txt` and return a new instance of `protego.Protego`.
* `can_fetch(url, user_agent)` Return True if the user agent can fetch the URL, otherwise return False.
* `crawl_delay(user_agent)` Return the crawl delay specified for the user agent as a float. If nothing is specified, return None.
* `request_rate(user_agent)` Return the request rate specified for the user agent as a named tuple `RequestRate(requests, seconds, start_time, end_time)`. If nothing is specified, return None.
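
Since both `crawl_delay` and `request_rate` may return None, a caller usually needs to combine them into a single wait time. The sketch below shows one way to do that for Python 3, relying only on the return shapes documented above. `polite_delay` and `StubParser` are hypothetical names, the local `RequestRate` tuple merely mirrors the documented fields so the example runs without Protego installed, and taking the stricter of the two limits is an assumption rather than library behaviour.

```python
from collections import namedtuple

# Stand-in with the same fields as the documented named tuple; used by the stub only.
RequestRate = namedtuple("RequestRate", "requests seconds start_time end_time")


def polite_delay(rp, user_agent, default=1.0):
    """Return the number of seconds to wait between requests for user_agent."""
    candidates = []
    delay = rp.crawl_delay(user_agent)
    if delay is not None:
        candidates.append(float(delay))
    rate = rp.request_rate(user_agent)
    if rate is not None and rate.requests:
        # 10 requests per 60 seconds means one request every 6 seconds.
        candidates.append(rate.seconds / rate.requests)
    # Honour the stricter limit; fall back to a default when nothing is specified.
    return max(candidates) if candidates else default


class StubParser:
    """Hypothetical stand-in for a parsed robots.txt; not part of Protego."""

    def __init__(self, delay=None, rate=None):
        self._delay = delay
        self._rate = rate

    def crawl_delay(self, user_agent):
        return self._delay

    def request_rate(self, user_agent):
        return self._rate
```

Any object exposing the two documented methods, such as the result of `Protego.parse(...)`, could be passed in place of the stub.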

Keywords: robots.txt,parser,robots,rep
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*
Description-Content-Type: text/markdown