Megu Package

megu.cli

style

Contains some style definitions for the CLI interface.

class megu.cli.style.Colors[source]

Object containing standard colors for the CLI interface.

class megu.cli.style.Symbols[source]

Object containing standard symbols for the CLI interface.

ui

utils


megu.constants

Contains project-wide constants.

megu.constants.APP_NAME

The name of the application. Should always be megu.

Type:

str

megu.constants.APP_VERSION

The current version of the megu application.

Type:

str

megu.constants.CONFIG_DIR

The directory path where the application configuration lives.

Type:

Path

megu.constants.PLUGIN_DIR

The directory path where plugins are installed to.

Type:

Path

megu.constants.CACHE_DIR

The directory path where the application cache lives.

Type:

Path

megu.constants.LOG_DIR

The directory path where the application logs live.

Type:

Path

megu.constants.TEMP_DIR

The directory path where the application temporary files live.

Type:

Path

megu.constants.STAGING_DIR

The directory path where the application downloads content fragments to.

Type:

Path

megu.constants.DOWNLOAD_DIR

The directory path where downloads are stored to by default.

Type:

Path


megu.env

Contains available environment configs and defaults.

class megu.env.MeguEnv(cache_dir=PosixPath('/home/docs/.cache/megu'), log_dir=PosixPath('/home/docs/.cache/megu/log'), plugin_dir=PosixPath('/home/docs/.config/megu/plugins'), download_dir=PosixPath('/home/docs/Downloads'))[source]

Defines available environment configuration values.

cache_dir

The directory where persistent caches should be stored. Read from MEGU_CACHE_DIR.

Type:

Path

log_dir

The directory where logs should be stored. Read from MEGU_LOG_DIR.

Type:

Path

plugin_dir

The directory where plugins will be read from. Read from MEGU_PLUGIN_DIR.

Type:

Path

download_dir

The directory where downloads are stored to by default. Read from MEGU_DOWNLOAD_DIR.

Type:

Path
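
As a rough illustration of how environment-backed values like these can be resolved, the sketch below reads the four MEGU_* variables and falls back to defaults. It uses only the standard library. The EnvSketch class, the load_env helper, and the default paths are assumptions made for illustration and are not megu's actual implementation.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvSketch:
    """Hypothetical stand-in for MeguEnv; not the real class."""

    cache_dir: Path
    log_dir: Path
    plugin_dir: Path
    download_dir: Path


def load_env(environ: Optional[Mapping[str, str]] = None) -> EnvSketch:
    """Resolve each directory from its MEGU_* variable, else use a default."""
    env = os.environ if environ is None else environ
    home = Path.home()
    # Default locations are assumed here purely for illustration.
    cache_dir = Path(env.get("MEGU_CACHE_DIR", home / ".cache" / "megu"))
    return EnvSketch(
        cache_dir=cache_dir,
        log_dir=Path(env.get("MEGU_LOG_DIR", cache_dir / "log")),
        plugin_dir=Path(env.get("MEGU_PLUGIN_DIR", home / ".config" / "megu" / "plugins")),
        download_dir=Path(env.get("MEGU_DOWNLOAD_DIR", home / "Downloads")),
    )
```

Passing a plain dictionary instead of os.environ keeps the helper easy to exercise without touching the real process environment.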


megu.config

Contains project-wide configuration values.

class megu.config.MeguConfig(cache_dir=PosixPath('/home/docs/.cache/megu'), log_dir=PosixPath('/home/docs/.cache/megu/log'), plugin_dir=PosixPath('/home/docs/.config/megu/plugins'), download_dir=PosixPath('/home/docs/Downloads'))[source]

Project-wide configuration values.

app_name

The name of the CLI app.

Type:

str

app_version

The semver version of the CLI app.

Type:

str

temp_dir

The directory where temporary files should be stored.

Type:

Path

staging_dir

The directory where files can be persisted for staging downloads.

Type:

Path

cache_dir

The directory where persistent caches should be stored.

Type:

Path

log_dir

The directory where logs should be stored.

Type:

Path

plugin_dir

The directory where plugins should be read from.

Type:

Path

Parameters:
  • cache_dir (Path) – The directory where persistent caches should be stored.

  • log_dir (Path) – The directory where logs should be stored.

  • plugin_dir (Path) – The directory where plugins should be read from.

  • download_dir (Path) – The directory where downloads are stored to by default.


megu.download

Contains the namespace for content downloaders.

base

Contains the abstractions necessary to build content downloaders.

megu.download.base.DEFAULT_MAX_CONNECTIONS

The maximum number of connections permitted for a standard download.

Type:

int

class megu.download.base.BaseDownloader[source]

The base downloader that all content downloaders should inherit from.

abstract classmethod can_handle(content)[source]

Check if some given content can be handled by the downloader.

Parameters:

content (Content) – The content to check against the downloader.

Returns:

True if the downloader can handle downloading the content, otherwise False

Return type:

bool

abstract download_content(content, max_connections=8, update_hook=None)[source]

Download the resources of some content to temporary storage.

Parameters:
  • content (Content) – The content to download.

  • max_connections (int, optional) – The limit of connections to make to handle downloading the content. Defaults to DEFAULT_MAX_CONNECTIONS.

  • update_hook (Optional[Callable[[int], Any]], optional) – Callable for reporting downloaded chunk sizes. Defaults to None.

Returns:

The manifest of downloaded content and local file artifacts.

Return type:

Manifest

abstract property name: str

Human readable name for the plugin.

Return type:

str

discover

Contains the functionality to discover the currently available downloaders.

megu.download.discover.discover_downloaders()[source]

Discover the available downloaders in the project.

Yields:

Type[BaseDownloader] – The currently available downloaders.

Return type:

Generator[Type[BaseDownloader], None, None]

http

Contains logic for handling HTTP downloads.

megu.download.http.DEFAULT_CHUNK_SIZE

The default bytesize that the HTTP downloader should use for streaming content.

Type:

int

megu.download.http.DEFAULT_MAX_CONNECTIONS

The default maximum number of HTTP connections the downloader should use.

Type:

int

megu.download.http.CONTENT_RANGE_PATTERN

A compiled regex pattern to help matching content range header values.

Type:

Pattern
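
To make the purpose of such a pattern concrete, here is a minimal sketch that parses a Content-Range header value of the form "bytes start-end/size". The pattern and the parse_content_range helper are hypothetical; the regex that megu actually compiles may differ.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern; megu's CONTENT_RANGE_PATTERN may differ.
CONTENT_RANGE_SKETCH = re.compile(
    r"^bytes\s+(?P<start>\d+)-(?P<end>\d+)/(?P<size>\d+|\*)$"
)


def parse_content_range(value: str) -> Optional[Tuple[int, int, Optional[int]]]:
    """Return (start, end, size), or None when the header does not match."""
    match = CONTENT_RANGE_SKETCH.match(value.strip())
    if match is None:
        return None
    size = match.group("size")
    return (
        int(match.group("start")),
        int(match.group("end")),
        # A "*" size means the total length is unknown.
        None if size == "*" else int(size),
    )
```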

class megu.download.http.HttpDownloader[source]

Downloader for traditional HTTP resources.

classmethod can_handle(content)[source]

Check if some given content can be handled by the HTTP downloader.

Parameters:

content (Content) – The content to check against the HTTP downloader.

Returns:

True if the downloader can handle downloading the content, otherwise False

Return type:

bool

download_content(content, max_connections=8, update_hook=None)[source]

Download the resource of some content to temporary storage.

Parameters:
  • content (Content) – The content to download.

  • max_connections (int, optional) – The limit of connections to make to handle downloading the content. Defaults to DEFAULT_MAX_CONNECTIONS.

  • update_hook (Optional[Callable[[int], Any]], optional) – Callable for reporting downloaded chunk sizes. Defaults to None.

Returns:

The manifest of downloaded content and local file artifacts.

Return type:

Manifest

download_resource(resource, resource_index, to_path, chunk_size=4096, update_hook=None)[source]

Download some resource to a specific filepath.

Parameters:
  • resource (HttpResource) – The resource to download.

  • resource_index (int) – The content’s index of the resource in its list of resources.

  • to_path (Path) – The filepath to download the resource to.

  • chunk_size (int, optional) – The byte size of chunks to stream the resource data in. Defaults to DEFAULT_CHUNK_SIZE.

  • update_hook (Optional[Callable[[int], Any]], optional) – Callable for reporting downloaded chunk sizes. Defaults to None.

Raises:

ValueError – When attempting to download the resource fails for any reason.

Returns:

A tuple containing the index, the resource, and the path the resource was downloaded to.

Return type:

Tuple[int, HttpResource, Path]
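
The interplay of chunk_size and update_hook can be sketched without any network access. The example below copies an in-memory stream in chunks and reports each chunk's size to a hook. The copy_in_chunks helper is hypothetical and only mirrors the parameter names documented above; it is not the downloader's real implementation.

```python
from io import BytesIO
from typing import Any, BinaryIO, Callable, Optional


def copy_in_chunks(
    source: BinaryIO,
    target: BinaryIO,
    chunk_size: int = 4096,
    update_hook: Optional[Callable[[int], Any]] = None,
) -> int:
    """Copy source to target chunk by chunk, reporting each chunk's size."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        target.write(chunk)
        total += len(chunk)
        if update_hook is not None:
            # The hook receives the size of the chunk just written.
            update_hook(len(chunk))
    return total


reported = []
written = copy_in_chunks(BytesIO(b"x" * 10000), BytesIO(), update_hook=reported.append)
```

A progress bar is a typical consumer of such a hook, since it only needs the number of bytes added since the last call.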

session

HTTP session to use for downloading resources.

Returns:

The HTTP session to use for downloading resources.

Return type:

Session


megu.exceptions

Contains definitions for custom project exceptions.

exception megu.exceptions.MeguException(message)[source]

Provides a namespace for the project specific exceptions.

Parameters:

message (str) –

exception megu.exceptions.PluginFailure(message)[source]

Describes when a plugin fails to load for some reason.

Parameters:

message (str) –


megu.filters

Contains some really basic content filters.

megu.filters.best_content(content)[source]

Get the best quality content from the extracted content iterator.

Parameters:

content (Iterable[Content]) – The iterable of content that was extracted

Returns:

The highest quality content

Return type:

Content

megu.filters.specific_content(content, **conditions)[source]

Apply many filters to an iterable of content instances.

With no conditions provided, no content will be filtered out and all content instances will be returned. When conditions are provided, matching filter handlers will be dynamically applied to filter out content instances.

Parameters:
  • content (Iterable[Content]) – An iterable of content to apply many filters to.

  • conditions (Dict[str, Any]) – A dictionary of filters to apply to the given content iterable.

Returns:

An iterator for filtered content.

Return type:

Iterable[Content]

Yields:

Content – Content instances which have passed all defined filters.
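
One possible way to apply keyword conditions dynamically is sketched below. Plain dictionaries stand in for Content instances, and the HANDLERS table with its "quality" and "type" entries is an assumption for illustration; the real filter handlers and their matching rules may differ.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

# Hypothetical stand-in for Content: plain dictionaries.
Item = Dict[str, Any]

# Hypothetical handlers keyed by condition name.
HANDLERS: Dict[str, Callable[[Item, Any], bool]] = {
    "quality": lambda item, value: item.get("quality") == value,
    "type": lambda item, value: item.get("type") == value,
}


def filter_items(items: Iterable[Item], **conditions: Any) -> Iterator[Item]:
    """Yield items that pass every condition with a known handler."""
    active = [
        (HANDLERS[name], value)
        for name, value in conditions.items()
        if name in HANDLERS
    ]
    for item in items:
        # With no active handlers, all() is True and every item passes.
        if all(handler(item, value) for handler, value in active):
            yield item
```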


megu.hasher

This module provides simple safe hashing functions.

We support only a subset of the hashing algorithms available in hashlib, since several of them (such as sha224) are rarely used.

Tip

The provided basic functions allow you to calculate multiple hashes at the same time, which means that your bottleneck will be the slowest hashing algorithm you request.

>>> from megu.hasher import hash_io, HashType
>>> with open("/home/user/A/PATH/TO/A/FILE", "rb") as file_io:
...     hashes = hash_io(file_io, {HashType.MD5, HashType.SHA256})
...
>>> hashes
{
    <HashType.SHA256: 'sha256'>: 'f0e4c2f76c58916ec258f246851bea091d14d4247a2f...',
    <HashType.MD5: 'md5'>: 'a46062d24103b87560b2dc0887a1d5de'
}

megu.hasher.DEFAULT_CHUNK_SIZE

The default size in bytes to chunk file streams for hashing.

Type:

int

class megu.hasher.HashType(value)[source]

Enumeration of supported hash types.

property hasher: Callable[[bytes | bytearray | memoryview], hashlib._Hash]

Get the hasher callable for the current hash type.

megu.hasher.hash_file(filepath, types, chunk_size=65536)[source]

Calculate the requested hash types for some given file path instance.

Basic usage of this function typically looks like the following:

>>> from pathlib import Path
>>> from megu.hasher import hash_file, HashType
>>> big_file_path = Path("/home/USER/A/PATH/TO/A/BIG/FILE")
>>> hash_file(big_file_path, {HashType("md5"), HashType.SHA256})
{
    <HashType.SHA256: 'sha256'>: 'f0e4c2f76c58916ec258f246851bea091d14d4247a2f...',
    <HashType.MD5: 'md5'>: 'a46062d24103b87560b2dc0887a1d5de'
}

Parameters:
  • filepath (Path) – The filepath to calculate hashes for.

  • types (Set[HashType]) – The set of names for hash types to calculate.

  • chunk_size (int) – The number of bytes to load from the file into memory at a time. Defaults to DEFAULT_CHUNK_SIZE.

Raises:
  • FileNotFoundError – If the given filepath does not point to an existing file.

  • ValueError – If one of the given types is not supported.

Returns:

A dictionary of hash type strings and the calculated hexdigest of the hash.

Return type:

Dict[HashType, str]

megu.hasher.hash_io(io, types, chunk_size=65536)[source]

Calculate the requested hash types for some given binary IO instance.

>>> from io import BytesIO
>>> from megu.hasher import hash_io, HashType
>>> hash_io(BytesIO(b"Hey, I'm a string"), {HashType("sha256"), HashType.MD5})
{
    <HashType.SHA256: 'sha256'>: 'f0e4c2f76c58916ec258f246851bea091d14d4247a2f...',
    <HashType.MD5: 'md5'>: '25cb7b2c4e2064c1deebac4b66195c9c'
}

If you instead need to hash a StringIO, it’s up to you to do whatever conversions are needed to create a BytesIO instance. This typically involves reading the entire string and encoding it.

>>> from io import BytesIO, StringIO
>>> from megu.hasher import hash_io, HashType
>>> string_io = StringIO("Hey, I'm a string")
>>> byte_io = BytesIO(string_io.read().encode("utf-8"))
>>> hash_io(byte_io, {HashType.SHA256, HashType("md5")})
{
    <HashType.SHA256: 'sha256'>: 'f0e4c2f76c58916ec258f246851bea091d14d4247a2f...',
    <HashType.MD5: 'md5'>: '25cb7b2c4e2064c1deebac4b66195c9c'
}

Parameters:
  • io (BinaryIO) – The IO to calculate hashes for.

  • types (Set[HashType]) – The set of names for hash types to calculate.

  • chunk_size (int) – The number of bytes to load from the buffer into memory at a time. Defaults to DEFAULT_CHUNK_SIZE.

Raises:

ValueError – If one of the given types is not supported.

Returns:

A dictionary of hash type strings and the calculated hexdigest of the hash.

Return type:

Dict[HashType, str]
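
The idea of calculating several hashes in a single pass over a stream can be sketched with hashlib alone. The multi_hash helper below is hypothetical and takes plain algorithm names rather than HashType members; it illustrates the technique, not megu's own code.

```python
import hashlib
from io import BytesIO
from typing import BinaryIO, Dict, Iterable


def multi_hash(stream: BinaryIO, names: Iterable[str], chunk_size: int = 65536) -> Dict[str, str]:
    """Feed each chunk to every hasher so the stream is read only once."""
    hashers = {name: hashlib.new(name) for name in names}
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


digests = multi_hash(BytesIO(b"Hey, I'm a string"), {"md5", "sha256"})
```

Because every chunk is handed to each hasher before the next read, only one chunk is held in memory at a time regardless of how many hash types are requested.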


megu.helpers

Contains helper methods that plugins can use to simplify usage.

megu.helpers.disk_cache(cache_name)[source]

Context manager for creating or accessing a local disk cache.

We recommend that you avoid using a disk cache if at all possible. The ability to define and use a disk-persisted cache was introduced for caching fetched API tokens between runs (such as OAuth Bearer tokens). You should not be caching content; you should be downloading content.

Important

For some relatively naive precautions, we don’t allow for path separators or spaces in the cache name. For this purpose, we are enforcing that the name of the cache must match the following pattern: ^[a-z]+[a-z0-9_-]{3,31}[a-z0-9]$.

For this reason, we recommend that you use your plugin’s package name as the name for your plugin’s disk-persisted cache.

Warning

Please be reasonable about what you are caching. No one wants people taking advantage of their disk-space.

Parameters:

cache_name (str) – The name of the cache to create or access.

Raises:

ValueError – If the given cache_name does not match the approved naming pattern.

Yields:

Cache – The diskcache Cache instance.

Return type:

Generator[Cache, None, None]
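
The naming rule can be checked ahead of time with the pattern quoted in the note above. The validate_cache_name helper is hypothetical, and the use of fullmatch is an assumption; the real function may match the pattern slightly differently.

```python
import re

# Pattern copied from the note in this section.
CACHE_NAME_PATTERN = re.compile(r"^[a-z]+[a-z0-9_-]{3,31}[a-z0-9]$")


def validate_cache_name(cache_name: str) -> str:
    """Return the name unchanged, or raise ValueError if it is not allowed."""
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    if CACHE_NAME_PATTERN.fullmatch(cache_name) is None:
        raise ValueError(f"invalid cache name: {cache_name!r}")
    return cache_name
```

Note that the pattern requires at least five characters, a lowercase first letter, and a final letter or digit, so names such as "abc" or "plugin_" are rejected.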

megu.helpers.get_soup(markup)[source]

Get a BeautifulSoup instance for some HTML markup.

Parameters:

markup (str) – The HTML markup to use when building a BeautifulSoup instance.

Returns:

The parsed soup for the given HTML markup.

Return type:

BeautifulSoup

megu.helpers.http_session()[source]

Context manager for creating a requests HTTP session to make basic requests.

Yields:

Session – A new clean session that plugins can use for requests.

Return type:

Generator[Session, None, None]

megu.helpers.noop(*args, **kwargs)[source]

Noop function that does absolutely nothing.

Return type:

None

class megu.helpers.noop_class(**kwargs)[source]

Noop class that allows for everything but does nothing.

__call__(*args, **kwargs)[source]

Noop class call (returns itself).

__getattr__(*args, **kwargs)[source]

Noop getter (returns itself).
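
The behavior described here, where both calls and attribute access return the object itself, can be sketched in a few lines. NoopSketch is a hypothetical stand-in written for illustration, not the real noop_class.

```python
from typing import Any


class NoopSketch:
    """Hypothetical stand-in for noop_class: absorbs calls and attribute access."""

    def __init__(self, **kwargs: Any) -> None:
        # Keyword arguments are accepted and ignored.
        pass

    def __call__(self, *args: Any, **kwargs: Any) -> "NoopSketch":
        return self

    def __getattr__(self, name: str) -> "NoopSketch":
        # Only invoked for attributes that are not found normally.
        return self


sink = NoopSketch()
result = sink.progress.update(42).close()
```

An object like this can be passed where a progress reporter or similar collaborator is expected but nothing should actually happen.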

megu.helpers.python_path(*paths)[source]

Context manager for temporarily adding directories to the Python search path.

Parameters:

*paths (Tuple[PathLike]) – The paths of directories that you want to add to the Python path.

Yields:

List[str] – The temporarily mutated sys.path.

Return type:

Generator[List[str], None, None]
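
A minimal sketch of this kind of context manager follows. The python_path_sketch name is hypothetical, and prepending the paths (rather than appending them) is an assumption; the real helper may order entries differently.

```python
import sys
from contextlib import contextmanager
from pathlib import Path
from typing import Generator, List, Union


@contextmanager
def python_path_sketch(*paths: Union[str, Path]) -> Generator[List[str], None, None]:
    """Temporarily prepend paths to sys.path and restore it afterwards."""
    original = list(sys.path)
    try:
        for path in paths:
            sys.path.insert(0, str(path))
        yield sys.path
    finally:
        # Restore in place so other references to sys.path stay valid.
        sys.path[:] = original
```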

megu.helpers.temporary_directory(prefix, dirpath=None)[source]

Context manager for creating a temporary directory at the appropriate location.

Parameters:
  • prefix (str) – The prefix of the temporary directory.

  • dirpath (Path, optional) – The directory path the temporary directory should be created in. Defaults to TEMP_DIR.

Raises:

NotADirectoryError – When the provided dirpath does not exist.

Yields:

Path – The temporary directory’s path.

Return type:

Generator[Path, None, None]
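
The documented behavior can be approximated with the standard tempfile module. In this sketch, temporary_directory_sketch is a hypothetical name, and a dirpath of None falls back to the system default location rather than megu's TEMP_DIR.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Generator, Optional


@contextmanager
def temporary_directory_sketch(
    prefix: str, dirpath: Optional[Path] = None
) -> Generator[Path, None, None]:
    """Yield a temporary directory path that is removed on exit."""
    if dirpath is not None and not dirpath.is_dir():
        raise NotADirectoryError(f"no such directory: {dirpath}")
    # dir=None lets tempfile pick the system default location.
    with tempfile.TemporaryDirectory(prefix=prefix, dir=dirpath) as name:
        yield Path(name)
```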

megu.helpers.temporary_file(prefix, mode, dirpath=None)[source]

Context manager for opening a temporary file at the appropriate location.

Parameters:
  • prefix (str) – The prefix of the temporary file.

  • mode (str) – The mode the file should be opened with.

  • dirpath (Path, optional) – The directory path the temporary file should be opened in. Defaults to TEMP_DIR.

Raises:

NotADirectoryError – When the provided dirpath does not exist.

Yields:

Tuple[Path, IO] – A tuple containing the temporary file’s path and the file handle.

Return type:

Generator[Tuple[Path, IO], None, None]


megu.log

Contains logger configuration and creation.

We use Loguru to handle all the complexities of logging. They work with the concept of a single global logger which is used throughout the entire application. Since this project is just a single tool that doesn’t need to handle too complex threading or distributed processing, this style of a single global logger works fine.

Examples

Almost all usage of this logger should look like the following:

from .log import instance as log
log.debug("My logged message here")

If you need to re-configure the logger for debug logging or for other intricate logging handler settings, you should do so through the configure_logger() function:

from .log import configure_logger, instance
configure_logger(instance, debug=True)

megu.log.instance

The configured global logger instance that should likely always be used.

Type:

loguru.Logger

megu.log.configure_logger(logger, level='CRITICAL', debug=False, record=False)[source]

Configure the global logger.

Parameters:
  • logger (loguru.Logger) – The global logger instance to configure.

  • level (str, optional) – The string level to filter logging messages through. Defaults to “CRITICAL”.

  • debug (bool, optional) – If True, configures the logger with the debug configuration. Defaults to False.

  • record (bool, optional) – If True, logs will be recorded and written out to the log directory. Defaults to False.

Returns:

The newly configured global logger

Return type:

loguru.Logger

megu.log.get_logger(debug=False)[source]

Get the configured global logger.

Parameters:

debug (bool, optional) – If True, enables debug logging. Defaults to False.

Returns:

The configured global logger

Return type:

loguru.Logger


megu.models

Contains data models to use throughout the project.

content

Contains definitions of content types used throughout the project.

class megu.models.content.Url[source]

A basic wrapper around a furl URL to keep things consistent between plugins and the internals of the package without declaring a direct dependency on a third-party.

class megu.models.content.Checksum(**data)[source]

Describes a checksum that should be used for content validation.

Parameters:
  • type (HashType) – The type of checksum hash being defined.

  • hash (str) – The value of the checksum hash being defined.

  • data (Dict, optional) – Model parameter dictionary provided by pydantic. You should likely never use this property unless you need a keyword argument for a dictionary payload to construct the model.

class megu.models.content.Content(**data)[source]

Describes some extracted content that can be downloaded.

Parameters:
  • id (str) – The plugin-defined content-unique identifier for the content.

  • name (str) – The human-readable name to describe the content.

  • url (str) – The absolute URL from where the plugin extracted the content. This URL string gets translated into a Url instance.

  • quality (float) – The plugin-defined arbitrary quality of the content.

  • size (int) – The size in bytes the content will take up on the local filesystem.

  • type (str) – The appropriate mimetype of the content.

  • resources (List[Resource]) – The resources required to fetch and download the extracted content.

  • meta (Meta) – The structured metadata of the extracted content.

  • checksums (List[Checksum]) – A list of checksums that can be used to verify the downloaded content.

  • extra (Dict[str, Any]) – The unstructured metadata of the extracted content.

  • data (Dict, optional) – Model parameter dictionary provided by pydantic. You should likely never use this property unless you need a keyword argument for a dictionary payload to construct the model.

class Config[source]

Configuration for Content model validation.

property ext: str

File extension for the content.

Returns:

The most suitable file extension for the content. May be a blank string if an extension cannot be determined.

Return type:

str
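
One plausible way to derive an extension from a mimetype is the standard mimetypes module, sketched below. Whether the ext property actually uses mimetypes is an assumption, and guess_ext is a hypothetical helper.

```python
import mimetypes


def guess_ext(mimetype: str) -> str:
    """Return an extension such as ".json", or "" when none is known."""
    # guess_extension returns None for unknown types, so fall back to "".
    return mimetypes.guess_extension(mimetype) or ""
```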

property filename: str

Filename for the content.

Returns:

The appropriate filename for the content.

Return type:

str

class megu.models.content.Manifest(**data)[source]

Describes the downloaded artifacts ready to be merged.

Parameters:
  • content (Content) – The content instance that was downloaded.

  • artifacts (List[Tuple[Resource, Path]]) – A list of (resource, path) tuples for the content resources that were downloaded to the local filesystem.

  • data (Dict, optional) – Model parameter dictionary provided by pydantic. You should likely never use this property unless you need a keyword argument for a dictionary payload to construct the model.

class megu.models.content.Meta(**data)[source]

Describes some additional metadata about the extracted content.

Parameters:
  • id (Optional[str], optional) – The site internal identifier for the extracted content.

  • title (Optional[str], optional) – The site defined title for the extracted content.

  • description (Optional[str], optional) – The site defined description for the extracted content.

  • publisher (Optional[str], optional) – The site defined publisher name for the extracted content.

  • published_at (Optional[datetime], optional) – The site defined datetime timestamp for when the extracted content was published.

  • filename (Optional[str], optional) – The site defined filename for the extracted content.

  • thumbnail (Optional[str], optional) – The URL for the thumbnail of the extracted content.

  • data (Dict, optional) – Model parameter dictionary provided by pydantic. You should likely never use this property unless you need a keyword argument for a dictionary payload to construct the model.

class megu.models.content.Resource(**data)[source]

The base resource class that resource types must inherit from.

Important

This class is abstract and used as a typing interface for the Content model. Concrete implementations of this abstract class, such as HttpResource, must be provided to content in order for the application to understand how to fetch the content.

Parameters:

data (Dict, optional) – You should never use this parameter. Since this is an abstract class, you should never be instantiating it.

abstract property fingerprint: str

Get the unique identifier of a resource.

Raises:

NotImplementedError – Subclasses must implement this property.

Returns:

A string fingerprint of the resource.

Return type:

str

http

Contains definitions of HTTP resource types used throughout the project.

class megu.models.http.HttpMethod(value)[source]

Enumeration of the available HTTP methods that resources can use.

class megu.models.http.HttpResource(**data)[source]

Describes a downloadable HTTP resource that is part of some local content.

Parameters:
  • method (HttpMethod) – The HTTP method that should be used to fetch this resource.

  • url (str) – The URL that should be used to fetch this resource. This URL string gets translated into a Url instance.

  • headers (dict) – The dictionary of headers to use to fetch this resource (if any).

  • data (Optional[bytes], optional) – The data body to send in the resource request (if any).

  • auth (Optional[Callable[[requests.Request], requests.Request]], optional) – A callable that mutates a request to ensure it is authenticated for fetching the resource.

class Config[source]

Model configuration for the Resource model.

fingerprint

Get a computed unique identifier for the resource.

Returns:

The unique identifier for the resource.

Return type:

str

classmethod from_request(request)[source]

Produce a resource from an existing prepared request.

Parameters:

request (PreparedRequest) – The request to construct a resource from.

Returns:

The newly produced resource.

Return type:

HttpResource

to_request()[source]

Get a matching prepared request for the current resource.

Returns:

The matching prepared request for the current resource.

Return type:

PreparedRequest

types

Contains custom model types to be used in model implementations.

class megu.models.types.Url(url='', args=<object object>, path=<object object>, fragment=<object object>, scheme=<object object>, netloc=<object object>, origin=<object object>, fragment_path=<object object>, fragment_args=<object object>, fragment_separator=<object object>, host=<object object>, port=<object object>, query=<object object>, query_params=<object object>, username=<object object>, password=<object object>, strict=False)[source]

A URL validated by AnyHttpUrl and cast as a furl.

classmethod __get_validators__()[source]

Yield the appropriate validators for the class.

Yields:

Callable[[Any, ModelField, BaseConfig], Any] – A validator callable.

Return type:

Generator[Callable[[Any, ModelField, BaseConfig], Any], None, None]

classmethod __modify_schema__(field_schema)[source]

Modify the field schema entry.

Parameters:

field_schema (Dict[str, Any]) – The current field schema.

Return type:

None

classmethod validate(value, field, config)[source]

Validate and parse the given URL string value.

Parameters:
  • value (Any) – The URL provided by a user.

  • field (ModelField) – The field instance the URL is using.

  • config (BaseConfig) – The config instance the URL is in.

Returns:

The furl instance of the given URL string.

Return type:

furl.furl.furl


megu.plugin

Contains logic for producing and loading plugins for the project.

base

Contains the abstractions necessary for the plugin discovery to work.

class megu.plugin.base.BasePlugin[source]

The base plugin that all plugins should inherit from.

This class should mostly be excluded from testing as it should only ever define an interface and not provide much if any implementation.

__str__()[source]

Build a human-friendly string representation of a plugin.

Returns:

The human-friendly string representation of a plugin.

Return type:

str

abstract can_handle(url)[source]

Check if a given Url can be handled by the plugin.

Parameters:

url (Url) – The URL to check against the current plugin.

Returns:

True if the plugin can handle the given URL, otherwise False

Return type:

bool

abstract property domains: Set[str]

Set of domains that this plugin supports.

Return type:

Set[str]

abstract extract_content(url)[source]

Extract content from the given URL.

Parameters:

url (Url) – The URL to extract content from.

Yields:

Content – The discovered content from the given URL.

Return type:

Generator[Content, None, None]

abstract merge_manifest(manifest, to_path)[source]

Merge downloaded artifacts from a manifest to a singular local filepath.

Parameters:
  • manifest (Manifest) – The manifest containing the content and its downloaded artifacts.

  • to_path (Path) – The path to merge the artifacts to.

Returns:

The path the artifacts have been merged to.

Return type:

Path

abstract property name: str

Human readable name for the plugin.

Return type:

str

discover

Contains logic to discover and load compatible plugins from a directory.

megu.plugin.discover.discover_plugins(package_dirpath, plugin_type=<class 'megu.plugin.base.BasePlugin'>)[source]

Discover and load plugins from a given directory of plugin modules.

Parameters:
  • package_dirpath (Path) – The path of the directory to look for plugins in.

  • plugin_type (Type, optional) – The type of plugin to filter for and attempt to load. Defaults to BasePlugin

Raises:

PluginFailure – When a discovered plugin fails to load

Yields:

Tuple[str, List[BasePlugin]] – A tuple of the plugin name and the instances of exported plugins from that plugin module

Return type:

Generator[Tuple[str, List[BasePlugin]], None, None]

megu.plugin.discover.iter_available_plugins(plugin_dirpath=None, plugin_type=<class 'megu.plugin.base.BasePlugin'>)[source]

Get all available plugins from the given plugin directory.

Parameters:
  • plugin_dirpath (Path, optional) – The path to the directory where plugins are installed. Defaults to PLUGIN_DIR.

  • plugin_type (Type, optional) – The type of plugins to load. Defaults to BasePlugin.

Yields:

Tuple[str, List[BasePlugin]] – A tuple of the plugin name and the instances of exported plugins from available plugin modules.

Return type:

Generator[Tuple[str, List[BasePlugin]], None, None]

megu.plugin.discover.load_plugin(plugin_name, plugin_class)[source]

Load a plugin instance from a given plugin class.

Parameters:
  • plugin_name (str) – The name of the plugin package

  • plugin_class (Type[BasePlugin]) – The plugin class from the plugin package

Raises:

PluginFailure – When the plugin fails to load

Returns:

The loaded plugin instance

Return type:

BasePlugin

megu.plugin.discover.load_plugin_module(module_name)[source]

Load/import a plugin module given the module name.

Parameters:

module_name (str) – The name of the plugin module

Raises:

PluginFailure – When the plugin module fails to import

Returns:

The imported plugin module

Return type:

ModuleType

generic

Contains a very generic fallback plugin.

class megu.plugin.generic.GenericPlugin[source]

A very generic fallback plugin.

This plugin assumes that the given URL can just be downloaded with a single HTTP GET request and that it produces a single artifact that only needs to be renamed.

can_handle(url)[source]

Check if the plugin can handle the given URL.

Parameters:

url (Url) – The URL to check against the generic plugin.

Returns:

This plugin assumes it can handle any URL, therefore it always returns True.

Return type:

Literal[True]

extract_content(url)[source]

Extract the content from the given Url instance.

This extraction makes a single HTTP HEAD request to fetch Content-Length and Content-Type. Otherwise, it returns a single content instance based on the hash of the given Url.

Parameters:

url (Url) – The URL to extract content from.

Yields:

Content – The extracted content from the given Url instance.

Return type:

Generator[Content, None, None]

merge_manifest(manifest, to_path)[source]

Merge the given manifest artifacts into a single filepath.

Parameters:
  • manifest (Manifest) – The manifest of the downloaded artifacts.

  • to_path (Path) – The path that the artifacts should be merged into.

Raises:

ValueError – When the provided manifest contains more than 1 artifact.

Returns:

The filepath the artifacts were merged into.

Return type:

Path

manage

Contains logic to install plugins into a directory.

megu.plugin.manage.add_plugin(package, plugin_dirpath=None, silence_subprocess=False)[source]

Install a plugin utilizing pip.

Important

If your package is not installable via pip through any of the distribution methods that pip checks (pypi, git, local, etc.), installation of your plugin simply will not work.

Parameters:
  • package (str) – The package identifier that pip should use to discover and install your plugin.

  • plugin_dirpath (Path, optional) – The directory the plugin should be installed to. Defaults to PLUGIN_DIR.

  • silence_subprocess (bool) – If set to True, will redirect output of subprocess calls to /dev/null. Defaults to False.

Returns:

The directory the plugin was installed to.

Return type:

Path

megu.plugin.manage.remove_plugin(package, plugin_dirpath=None)[source]

Remove the given package if it exists in the plugin directory.

Parameters:
  • package (str) – The name of the package to remove.

  • plugin_dirpath (Path, optional) – The plugin directory to remove the package from. Defaults to PLUGIN_DIR.

Raises:

NotADirectoryError – If the given package does not exist as a subdirectory within the given plugin directory.


megu.services

Contains helpful service functions that should really only be used during runtime.

megu.services.get_downloader(content)[source]

Get the best available downloader for the given content.

Parameters:

content (Content) – The content that the downloader should be able to handle.

Returns:

The best available downloader instance for the given content.

Return type:

BaseDownloader

megu.services.get_plugin(url, plugin_dirpath=None)[source]

Get the best available plugin for a given url.

Parameters:
  • url (Union[str, Url]) – The URL string to fetch the appropriate plugin for.

  • plugin_dirpath (Optional[Path]) – The path to the directory of plugins to read through. Defaults to None.

Returns:

The best available plugin that can handle the given url.

Return type:

BasePlugin

megu.services.iter_content(url, plugin)[source]

Shortcut to discover and iterate over content for a given URL.

Parameters:
  • url (Union[str, Url]) – The URL to discover content for.

  • plugin (BasePlugin) – The plugin to use for extracting content.

Yields:

Content – The content extracted for the URL by the given plugin.

Return type:

Generator[Content, None, None]

megu.services.merge_manifest(plugin, manifest, to_path)[source]

Merge a manifest with the given plugin and finalize content to the given path.

Parameters:
  • plugin (BasePlugin) – The plugin that was used to extract the content of the manifest.

  • manifest (Manifest) – The resulting content and artifact manifest.

  • to_path (Path) – The path the content should be finalized at.

Raises:

FileExistsError – If the given output path already exists.

Returns:

The path the merged content was finalized to.

Return type:

Path

megu.services.normalize_url(url)[source]

Normalize a given URL to a formatted Url instance.

Parameters:

url (Union[str, Url]) – The given url as either a string or a Url instance.

Returns:

The normal Url instance.

Return type:

Url


megu.utils

Contains utilities for the framework to use.

These helper/utility functions should not be exposed to plugins.

megu.utils.allocate_storage(to_path, size)[source]

Allocate a specific number of bytes to a non-existing filepath.

Parameters:
  • to_path (Path) – The filepath to allocate a specific number of bytes to.

  • size (int) – The number of bytes to allocate.

Raises:

FileExistsError – If the given filepath already exists

Returns:

The given filepath

Return type:

Path
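
A minimal sketch of pre-allocating a file of a given size is shown below. The allocate_storage_sketch name is hypothetical, and using truncate to extend the file is an assumption about technique; the real function may allocate the bytes differently.

```python
import tempfile
from pathlib import Path


def allocate_storage_sketch(to_path: Path, size: int) -> Path:
    """Create a new file of the given size and return its path."""
    # Mode "xb" raises FileExistsError when the path already exists.
    with to_path.open("xb") as handle:
        # Extending an empty file with truncate sets its length to size.
        handle.truncate(size)
    return to_path


with tempfile.TemporaryDirectory() as tmp:
    target = allocate_storage_sketch(Path(tmp) / "blob.bin", 1024)
    allocated_size = target.stat().st_size
```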

megu.utils.compose_functions(*functions)[source]

Compose many similar functions together.

Parameters:

functions (List[Callable[[_T], _T]]) – Many functions to compose together.

Returns:

A new function that applies all provided functions in order.

Return type:

Callable[[_T], _T]
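
Function composition of this kind can be sketched with functools.reduce. The compose_sketch name is hypothetical, and reading "in order" as left to right (the first function given runs first) is an assumption.

```python
from functools import reduce
from typing import Callable, TypeVar

T = TypeVar("T")


def compose_sketch(*functions: Callable[[T], T]) -> Callable[[T], T]:
    """Return a function applying each given function left to right."""

    def composed(value: T) -> T:
        # Thread the value through every function in the order given.
        return reduce(lambda acc, func: func(acc), functions, value)

    return composed


add_one_then_double = compose_sketch(lambda x: x + 1, lambda x: x * 2)
```

With no functions supplied, the composed function simply returns its input unchanged.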

megu.utils.create_required_directories()[source]

Handle setting up the required directories on the local machine.