subaligner.embedder module

class subaligner.embedder.FeatureEmbedder(n_mfcc: int = 13, frequency: int = 16000, hop_len: int = 512, step_sample: float = 0.04, len_sample: float = 0.075)

Bases: object

Audio and subtitle feature embedding.

duration_to_position(seconds: float) → int

Return the cell position from a time in seconds.

Parameters: seconds (float) – The duration in seconds.
Returns: int – The cell position.

extract_data_and_label_from_audio(audio_file_path: str, subtitle_file_path: Optional[str], subtitles: Optional[pysrt.SubRipFile] = None, sound_effect_start_marker: Optional[str] = None, sound_effect_end_marker: Optional[str] = None) → Tuple[numpy.ndarray, numpy.ndarray]

Generate a training dataset from an audio file and its subtitles.

Parameters

audio_file_path (str) – The path to the audio file.
subtitle_file_path (str) – The path to the subtitle file.

Keyword Arguments

subtitles (pysrt.SubRipFile) – The SubRipFile object (default: None).
sound_effect_start_marker (str) – A string indicating the start of the ignored sound effect (default: None).
sound_effect_end_marker (str) – A string indicating the end of the ignored sound effect (default: None).

Returns

tuple – The training data and the training labels.
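A minimal usage sketch, based only on the signature documented above. The wrapper name `build_training_set` and the file paths in the usage note are hypothetical, and the constructor arguments simply repeat the documented defaults.

```python
def build_training_set(audio_path: str, subtitle_path: str):
    """Hypothetical wrapper: returns (training data, training labels)."""
    # Imported lazily so this sketch can be loaded without subaligner installed.
    from subaligner.embedder import FeatureEmbedder

    # These values repeat the documented constructor defaults.
    embedder = FeatureEmbedder(
        n_mfcc=13, frequency=16000, hop_len=512,
        step_sample=0.04, len_sample=0.075,
    )
    # Both paths are passed positionally, matching the documented signature.
    train_data, labels = embedder.extract_data_and_label_from_audio(
        audio_path, subtitle_path
    )
    return train_data, labels
```

It could be called as `build_training_set("episode.wav", "episode.srt")`, where both file names are placeholders for an audio file and its matching subtitle file.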

property frequency

Get the sample rate.

Returns: int – The sample rate.

get_len_mfcc() → float

Get the number of MFCC frames that span LEN_SAMPLE seconds, computed as LEN_SAMPLE / (HOP_LEN / FREQUENCY).

Returns: float – The number of frames.

get_step_mfcc() → float

Get the number of MFCC frames that span STEP_SAMPLE seconds, computed as STEP_SAMPLE / (HOP_LEN / FREQUENCY).

Returns: float – The number of frames.
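The two formulas above can be checked with plain arithmetic. The sketch below does not call the library; `frames_per_span` is a hypothetical helper that evaluates the documented expression using the constructor defaults (`hop_len=512`, `frequency=16000`).

```python
def frames_per_span(span_seconds: float, hop_len: int = 512,
                    frequency: int = 16000) -> float:
    """Evaluate span / (hop_len / frequency), as documented above."""
    seconds_per_hop = hop_len / frequency  # 0.032 s with the defaults
    return span_seconds / seconds_per_hop


# Default len_sample (0.075 s) and step_sample (0.04 s).
len_mfcc = frames_per_span(0.075)   # approximately 2.34
step_mfcc = frames_per_span(0.04)   # approximately 1.25
```

With the defaults, one input sample therefore covers a little over two hops, and consecutive samples start about 1.25 hops apart.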

property hop_len

Get the number of samples per frame.

Returns: int – The number of samples per frame.

property len_sample

Get the length in seconds for the input samples.

Returns: float – The length in seconds for the input samples.

property n_mfcc

Get the number of MFCC components.

Returns: int – The number of MFCC components.

position_to_duration(position: int) → float

Return the time in seconds from a cell position.

Parameters: position (int) – The cell position.
Returns: float – The number of seconds.

position_to_time_str(position: int) → str

Return the time string from a cell position.

Parameters: position (int) – The cell position.
Returns: str – The time string (e.g., 01:23:20,150).
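The example time string follows the SubRip `HH:MM:SS,mmm` layout. The sketch below shows one way to format a duration in seconds that way; `seconds_to_time_str` is a hypothetical helper, not the library's implementation, and it starts from seconds rather than from a cell position.

```python
def seconds_to_time_str(seconds: float) -> str:
    """Format a duration in seconds as HH:MM:SS,mmm (SubRip style)."""
    # Work in whole milliseconds to avoid floating-point drift.
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


example = seconds_to_time_str(5000.15)  # "01:23:20,150"
```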

property step_sample

The interval (in seconds) between the beginnings of consecutive samples.

Returns: float – The interval (in seconds) between the beginnings of consecutive samples.

time_to_position(pysrt_time: pysrt.SubRipTime) → int

Return a cell position from timestamp.

Parameters: pysrt_time (pysrt.SubRipTime) – SubRipTime or coercible.
Returns: int – The cell position.

classmethod time_to_sec(pysrt_time: pysrt.SubRipTime) → float

Convert timestamp to seconds.

Parameters: pysrt_time (pysrt.SubRipTime) – SubRipTime or coercible.
Returns: float – The number of seconds.
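As an illustration of the conversion, the sketch below turns the four components of a SubRip timestamp (hours, minutes, seconds, milliseconds) into seconds. `components_to_seconds` is a hypothetical stand-in that takes plain integers, so it runs without pysrt. It is assumed, not confirmed by this page, that the library performs equivalent arithmetic.

```python
def components_to_seconds(hours: int, minutes: int, seconds: int,
                          milliseconds: int) -> float:
    """Convert timestamp components to a total number of seconds."""
    return hours * 3600 + minutes * 60 + seconds + milliseconds / 1000.0


# The timestamp 01:23:20,150 expressed as components.
total = components_to_seconds(1, 23, 20, 150)  # approximately 5000.15
```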