hermes.commands.harvest.file_exists
Module for the FileExistsHarvestPlugin and its associated models and helpers.
Classes
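URL | Basic model of a schema:URL.
MediaObject | Basic model of a schema:MediaObject.
CreativeWork | Basic model of a schema:CreativeWork.
FileExistsHarvestSettings | Settings for the file_exists harvester.
FileExistsHarvestPlugin | Harvest plugin that finds and tags files based on patterns.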
Functions
_path_matches_pattern | Case-insensitive path matching.
_ls_files | Get a list of all files by recursively searching the working_directory.
_git_ls_files | Get a list of all files by calling git ls-files in working_directory.
Module Contents
- class hermes.commands.harvest.file_exists.URL
Basic model of a schema:URL. See also: https://schema.org/URL
- class hermes.commands.harvest.file_exists.MediaObject
Basic model of a schema:MediaObject. See also: https://schema.org/MediaObject
- url: URL
- class hermes.commands.harvest.file_exists.CreativeWork
Basic model of a schema:CreativeWork. See also: https://schema.org/CreativeWork
- associated_media: MediaObject
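The three models nest: a CreativeWork holds a MediaObject in associated_media, which in turn holds a URL in url. The following sketch illustrates that nesting with plain dataclasses. It is hypothetical: the real classes' bases are not documented on this page, URL's fields are not listed (a `value` string field is assumed), and the `to_jsonld` helper and its dict layout are invented for illustration only.

```python
# Illustrative sketch of the CreativeWork / MediaObject / URL nesting.
# ASSUMPTIONS: plain dataclasses stand in for the real models; URL.value and
# every to_jsonld() method are hypothetical, not part of the documented API.
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class URL:
    value: str  # hypothetical field: the URL string itself

    def to_jsonld(self) -> Dict[str, Any]:
        return {"@type": "schema:URL", "@value": self.value}


@dataclass
class MediaObject:
    url: URL  # documented attribute

    def to_jsonld(self) -> Dict[str, Any]:
        return {"@type": "schema:MediaObject", "schema:url": self.url.to_jsonld()}


@dataclass
class CreativeWork:
    associated_media: MediaObject  # documented attribute

    def to_jsonld(self) -> Dict[str, Any]:
        return {
            "@type": "schema:CreativeWork",
            "schema:associatedMedia": self.associated_media.to_jsonld(),
        }


# Usage: wrap a single file URL in the three nested models.
work = CreativeWork(associated_media=MediaObject(url=URL("file:///repo/README.md")))
print(work.to_jsonld())
```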
- class hermes.commands.harvest.file_exists.FileExistsHarvestSettings(/, **data: Any)
Bases: pydantic.BaseModel
Settings for the file_exists harvester.
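A minimal sketch of what such a settings model could hold, assuming a plain dataclass in place of pydantic.BaseModel so it runs without third-party packages. Only enable_git_ls_files and keep_untagged_files are named on this page, and their defaults here are inferred from the plugin description; the search_patterns field, its group-to-patterns layout and the example patterns are hypothetical.

```python
# Sketch of the harvester settings (dataclass instead of pydantic.BaseModel).
# ASSUMPTIONS: the search_patterns field and its contents are hypothetical;
# the two boolean defaults are inferred from the plugin description.
from dataclasses import dataclass, field
from typing import Dict, List


def _default_patterns() -> Dict[str, List[str]]:
    # Group name -> file name patterns; the group name becomes the tag.
    return {
        "readme": ["readme.md"],
        "license": ["licenses/*.txt"],
    }


@dataclass
class FileExistsHarvestSettings:
    enable_git_ls_files: bool = True   # False forces the recursive search
    keep_untagged_files: bool = False  # untagged files are removed by default
    # default_factory gives every instance its own dict
    search_patterns: Dict[str, List[str]] = field(default_factory=_default_patterns)
```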
- class hermes.commands.harvest.file_exists.FileExistsHarvestPlugin
Bases: hermes.commands.harvest.base.HermesHarvestPlugin
Harvest plugin that finds and tags files based on patterns.
Files are searched using git ls-files or a recursive traversal of the working directory. If available, git ls-files is used; this can be disabled with the enable_git_ls_files setting.
The found files are then tagged based on patterns such as readme.md or licenses/*.txt. Matching of the file paths is implemented using the match function of Python’s Path objects, which means matching is performed from the end of the path. Search patterns are case-insensitive and use / as the path separator.
Files are tagged using the name of the file name pattern’s “group” as the keyword. If a file matches multiple patterns, all appropriate keywords are added. Unless keep_untagged_files is enabled, files without any tags are then removed from the file list (removal is the default).
Files tagged with readme are added to the data model as a schema:URL using the codemeta:readme property. Files tagged with license are added to the data model as a schema:URL using the schema:license property. All files are added to the data model as a schema:CreativeWork using the schema:hasPart property. All file URLs use the file: scheme and the absolute path of the file at the time of harvesting.
- settings: FileExistsHarvestSettings
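The tagging and filtering steps of the plugin can be sketched as follows. This is an illustration under assumptions: tag_files and _matches are hypothetical names, and patterns are assumed to be given as a mapping from group name to a list of patterns.

```python
# Sketch of the tagging step: each file gets the group name of every pattern
# it matches; untagged files are dropped unless keep_untagged_files is set.
# ASSUMPTIONS: tag_files, _matches and the dict-of-lists layout are hypothetical.
from pathlib import Path, PurePosixPath
from typing import Dict, List, Set


def _matches(path: Path, pattern: str) -> bool:
    # Case-insensitive, "/"-separated matching from the end of the path.
    return PurePosixPath(path.as_posix().lower()).match(pattern.lower())


def tag_files(
    files: List[Path],
    patterns: Dict[str, List[str]],
    keep_untagged_files: bool = False,
) -> Dict[Path, Set[str]]:
    tagged: Dict[Path, Set[str]] = {}
    for path in files:
        # A file may match several groups and then receives all their tags.
        tags = {
            group
            for group, group_patterns in patterns.items()
            if any(_matches(path, p) for p in group_patterns)
        }
        if tags or keep_untagged_files:
            tagged[path] = tags
    return tagged


# Usage: src/main.py matches no pattern and is dropped by default.
files = [Path("/repo/README.md"), Path("/repo/LICENSES/MIT.txt"), Path("/repo/src/main.py")]
print(tag_files(files, {"readme": ["readme.md"], "license": ["licenses/*.txt"]}))
```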
- __call__(command: hermes.commands.harvest.base.HermesHarvestCommand)
Execute the plugin.
- Parameters:
command – The command that triggered this plugin to run.
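A hedged sketch of the data model mapping that the class description attributes to the plugin, applied to the output of a tagging step. The function build_metadata, the plain-dict output and the nesting under schema:hasPart (mirroring the CreativeWork, MediaObject and URL models) are assumptions, not the plugin's actual data model API.

```python
# Sketch of the documented mapping:
#   files tagged "readme"  -> codemeta:readme (as schema:URL)
#   files tagged "license" -> schema:license  (as schema:URL)
#   every file             -> schema:hasPart  (as schema:CreativeWork)
# ASSUMPTIONS: build_metadata, the dict layout and the hasPart nesting are hypothetical.
from pathlib import Path
from typing import Any, Dict, List, Set


def build_metadata(tagged: Dict[Path, Set[str]]) -> Dict[str, List[Any]]:
    data: Dict[str, List[Any]] = {
        "codemeta:readme": [],
        "schema:license": [],
        "schema:hasPart": [],
    }
    for path, tags in tagged.items():
        # as_uri() needs an absolute path, so resolve it first (file: scheme).
        url = path.resolve().as_uri()
        if "readme" in tags:
            data["codemeta:readme"].append({"@type": "schema:URL", "@value": url})
        if "license" in tags:
            data["schema:license"].append({"@type": "schema:URL", "@value": url})
        data["schema:hasPart"].append({
            "@type": "schema:CreativeWork",
            "schema:associatedMedia": {
                "@type": "schema:MediaObject",
                "schema:url": {"@type": "schema:URL", "@value": url},
            },
        })
    return data
```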
- _find_files() → List[pathlib.Path]
Find files.
If the setting enable_git_ls_files is True, git ls-files is used to list the files. If it is set to False or getting the list from git fails, the working directory is searched recursively.
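The fallback order in _find_files can be sketched as a small function. The name find_files and the injected git_ls_files and ls_files callables are hypothetical; they stand in for the plugin's settings and helper functions so the selection logic can be shown in isolation.

```python
# Sketch of the fallback logic: prefer git ls-files when enabled, otherwise
# (or when git returns None) search the working directory recursively.
# ASSUMPTIONS: find_files and its callable parameters are hypothetical.
from pathlib import Path
from typing import Callable, List, Optional


def find_files(
    working_directory: Path,
    enable_git_ls_files: bool,
    git_ls_files: Callable[[Path], Optional[List[Path]]],
    ls_files: Callable[[Path], List[Path]],
) -> List[Path]:
    if enable_git_ls_files:
        files = git_ls_files(working_directory)
        if files is not None:  # None signals that git failed or is missing
            return files
    # git disabled or unavailable: fall back to the recursive search
    return ls_files(working_directory)
```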
- hermes.commands.harvest.file_exists._path_matches_pattern(path: pathlib.Path, pattern: str) → bool
Case-insensitive path matching.
Python 3.12 introduces the case_sensitive keyword argument to the match function. For older Python versions, we have to implement this behaviour ourselves.
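One way to get the documented behaviour on all Python versions is sketched below: pass case_sensitive=False where PurePath.match supports it (3.12 and later), otherwise lower-case both sides. The function name path_matches_pattern is hypothetical, and this is not necessarily how _path_matches_pattern is written.

```python
# Sketch of version-dependent, case-insensitive path matching.
# Converting to PurePosixPath keeps "/" as the separator on every platform.
# ASSUMPTION: mirrors the documented behaviour, not the exact implementation.
import sys
from pathlib import Path, PurePosixPath


def path_matches_pattern(path: Path, pattern: str) -> bool:
    posix_path = PurePosixPath(path.as_posix())
    if sys.version_info >= (3, 12):
        # Python 3.12+ supports the case_sensitive keyword directly.
        return posix_path.match(pattern, case_sensitive=False)
    # Older versions: emulate case-insensitivity by lower-casing both sides.
    return PurePosixPath(posix_path.as_posix().lower()).match(pattern.lower())
```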
- hermes.commands.harvest.file_exists._ls_files(working_directory: pathlib.Path) → List[pathlib.Path]
Get a list of all files by recursively searching the working_directory.
Only regular files (i.e. files which are not directories, pipes, etc.) are returned.
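A minimal sketch of such a recursive search, assuming Path.rglob is an acceptable traversal. The name ls_files is hypothetical, the result is sorted only to make it deterministic, and the real helper may handle symlinks or the .git directory differently.

```python
# Sketch of a recursive search that keeps only regular files.
# ASSUMPTION: uses Path.rglob; the real helper's traversal may differ.
from pathlib import Path
from typing import List


def ls_files(working_directory: Path) -> List[Path]:
    # rglob("*") yields every entry below working_directory; is_file() keeps
    # regular files (and symlinks to them) and skips directories, pipes, etc.
    return sorted(p for p in working_directory.rglob("*") if p.is_file())
```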
- hermes.commands.harvest.file_exists._git_ls_files(working_directory: pathlib.Path) → List[pathlib.Path] | None
Get a list of all files by calling git ls-files in working_directory.
git ls-files is called with the --cached flag, which lists all files tracked by git. The returned file paths are converted to a list of Path objects. Files that are tracked by git but don’t exist on disk are not returned. If the git command fails or git is not found, None is returned.
The result of this function is cached: git is only executed once per given working_directory.
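A sketch of the documented behaviour using subprocess and functools.lru_cache. The name git_ls_files and the choice of lru_cache as the caching mechanism are assumptions; the real helper's error handling may differ.

```python
# Sketch of a cached wrapper around `git ls-files --cached`.
# ASSUMPTIONS: git_ls_files is hypothetical; lru_cache is one possible cache.
import functools
import subprocess
from pathlib import Path
from typing import List, Optional


@functools.lru_cache(maxsize=None)  # git runs at most once per working_directory
def git_ls_files(working_directory: Path) -> Optional[List[Path]]:
    try:
        result = subprocess.run(
            ["git", "ls-files", "--cached"],
            cwd=working_directory,
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # git is not installed, or the command failed (e.g. not a repository)
        return None
    # git prints paths relative to working_directory, one per line.
    files = [working_directory / line for line in result.stdout.splitlines() if line]
    # Drop files that git tracks but that no longer exist on disk.
    return [path for path in files if path.exists()]
```

One caveat of this sketch: by default git quotes paths containing unusual characters (controlled by core.quotePath), so such names would not round-trip; passing -z and splitting the output on NUL bytes avoids that.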