# Topic Modeling for Uncertainty Analysis

Classifying uncertainty with topic modeling requires a more specialized setup than a conventional LDA run. The approach described here differs from the baseline model in three ways, each aimed at identifying and analyzing uncertainty-related themes more precisely.

## Three Modifications

### 1. Integration of Prior Knowledge

The model incorporates prior knowledge in the form of seed words that guide it toward topics related to uncertainty. The priors are defined as follows:

- **Prior 0**: terms expressing uncertainty, such as "uncertain," "risk," and "uncertainty."
- **Prior 1**: terms denoting improvement or reinforcement, such as "improve," "strengthen," "ensure," and "enhance."

These priors steer the model toward themes tied to uncertainty, which makes the extracted topics more relevant and specific.
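The sketch below shows one way seed words can be turned into topic-word priors. Each seed word gets a per-topic weight vector that boosts its target topic and damps the others. The seed lists mirror Prior 0 and Prior 1 above, and the topic count of 20 matches the model shown in the word cloud figure. The `high`/`low` weights and the helper name `build_word_priors` are hypothetical and do not come from the project's configuration.

```python
from typing import Dict, List

# Seed lists mirroring Prior 0 and Prior 1 above.
SEED_PRIORS: Dict[int, List[str]] = {
    0: ["uncertain", "risk", "uncertainty"],
    1: ["improve", "strengthen", "ensure", "enhance"],
}


def build_word_priors(
    seeds: Dict[int, List[str]],
    num_topics: int,
    high: float = 1.0,  # hypothetical weight for the seed word's target topic
    low: float = 0.1,   # hypothetical weight for every other topic
) -> Dict[str, List[float]]:
    """Map each seed word to a per-topic prior vector of length num_topics."""
    priors: Dict[str, List[float]] = {}
    for topic, words in seeds.items():
        if not 0 <= topic < num_topics:
            raise ValueError(f"topic {topic} out of range for {num_topics} topics")
        for word in words:
            weights = [low] * num_topics  # damp all topics by default
            weights[topic] = high         # boost the seed word's target topic
            priors[word] = weights
    return priors


priors = build_word_priors(SEED_PRIORS, num_topics=20)
print(priors["risk"][:3])  # [1.0, 0.1, 0.1]
```

tomotopy's `LDAModel.set_word_prior(word, prior)` accepts a per-topic list of this shape and has to be called before training. Whether thematos passes priors to the model this way is an assumption.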

### 2. Strategic Removal of Stop Words

Beyond the standard stop words, the model also excludes the top 100 words from each of the 10 topics found by the previous model. The resulting list contains 896 words, fewer than the 10 × 100 = 1,000 candidates, and uncertainty-related words were kept off it through manual inspection. Removing these frequent but uninformative words lets the model concentrate on terms that carry contextual meaning for uncertainty.
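The exclusion list can be sketched as a union of each topic's top words, minus a manually curated keep-list. The helper `build_extra_stopwords`, the toy topic lists, and the `keep` set (which stands in for the manual inspection) are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def build_extra_stopwords(
    topic_top_words: Dict[int, List[str]],
    keep: Iterable[str],
    top_n: int = 100,
) -> Set[str]:
    """Union the top_n words of every topic, minus manually kept words."""
    stopwords: Set[str] = set()
    for words in topic_top_words.values():
        stopwords.update(words[:top_n])  # words assumed sorted by topic weight
    return stopwords - set(keep)         # protect uncertainty-related terms


# Toy input standing in for the previous model's topics.
previous_topics = {
    0: ["bank", "rate", "risk", "policy"],
    1: ["rate", "market", "uncertainty", "growth"],
}
extra = build_extra_stopwords(previous_topics, keep={"risk", "uncertainty"}, top_n=3)
print(sorted(extra))  # ['bank', 'market', 'rate']
```

A union like this is smaller than the number of topics times `top_n` whenever topics share words, as "rate" does here.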

### 3. Fine-Tuning of Model Parameters

The model's parameters are tuned to the goal of classifying uncertainty. The key settings are:

- **min_cf**: set to 500. This is the minimum collection frequency (total occurrences across the corpus) a word needs in order to be kept, which restricts the vocabulary to terms with a substantial corpus-wide presence.
- **min_df**: also set to 500. This is the minimum document frequency (number of documents containing the word), which keeps only terms that recur across many documents.

Together with the seed-word priors and the extended stop-word list, these settings produce a topic model tailored to capturing and classifying uncertainty-related themes, in line with the research goals.
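The two frequency thresholds can be made concrete by computing both counts on a toy corpus. The sketch below assumes a word must satisfy both thresholds to stay in the vocabulary. The thresholds are scaled down from 500 so the example stays small, and the helper `filter_vocab` is hypothetical.

```python
from collections import Counter
from typing import List, Set


def filter_vocab(docs: List[List[str]], min_cf: int, min_df: int) -> Set[str]:
    """Keep words meeting both the collection- and document-frequency thresholds."""
    cf: Counter = Counter()  # total occurrences across the corpus
    df: Counter = Counter()  # number of documents containing the word
    for doc in docs:
        cf.update(doc)
        df.update(set(doc))  # count each word at most once per document
    return {w for w in cf if cf[w] >= min_cf and df[w] >= min_df}


docs = [["risk", "risk", "rate"], ["risk", "rate"], ["growth"]]
print(sorted(filter_vocab(docs, min_cf=2, min_df=2)))  # ['rate', 'risk']
print(sorted(filter_vocab(docs, min_cf=3, min_df=2)))  # ['risk']
```

"risk" occurs three times in two documents, so it survives both settings. "rate" occurs only twice and is dropped once `min_cf` rises to 3.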

## Configuration Details

The configuration of the uncertainty topic model can be displayed with the following command (output truncated):


```
!nbcpu +model=nbcpu-topic_uncertainty noop=1
```

```
## Command Line Interface for HyFI ##
{'about': {'authors': 'Young Joon Lee <entelecheia@hotmail.com>',
           'description': 'Quantifying Central Bank Policy Uncertainty in a '
                          'Highly Dollarized Economy: A Topic Modeling '
                          'Approach',
           'homepage': 'https://nbcpu.entelecheia.ai',
           'license': 'MIT',
           'name': 'Measuring Central Bank Policy Uncertainty'},
 'debug_mode': False,
 'dryrun': False,
 'hydra_log_dir': '/home/yjlee/.hyfi/logs/hydra',
 'model': {'_config_group_': '/model',
           '_config_name_': 'lda',
           '_target_': 'thematos.models.lda.LdaModel',
           'autosave': True,
           'batch': {'_config_group_': '/batch',
                     '_config_name_': '__init__',
                     'batch_name': 'model',
                     'batch_num': None,
                     'batch_num_auto': False,
                     'batch_root': 'workspace/topic',
                     'confi
```

## Running the Workflow

The workflow task `nbcpu-topic_uncertainty` can be run with the following command (log output truncated):


```
!nbcpu +workflow=nbcpu tasks='[nbcpu-topic_uncertainty]' mode=__info__
```

```
[2023-08-15 19:07:43,714][hyfi.joblib.joblib][INFO] - initialized batcher with <hyfi.joblib.batch.batcher.Batcher object at 0x7f5a586f5670>
[2023-08-15 19:07:43,714][hyfi.main.config][INFO] - HyFi project [nbcpu] initialized
[2023-08-15 19:07:43,912][hyfi.main.main][INFO] - The HyFI config is not instantiatable, running HyFI task with the config
[2023-08-15 19:07:44,736][hyfi.joblib.joblib][INFO] - initialized batcher with <hyfi.joblib.batch.batcher.Batcher object at 0x7f5a387bf250>
[2023-08-15 19:07:45,870][hyfi.task.batch][INFO] - Initalized batch: corpus(1) in /home/yjlee/workspace/projects/nbcpu/workspace/topic/corpus
[2023-08-15 19:07:47,283][hyfi.task.batch][INFO] - Initalized batch: corpus(1) in /home/yjlee/workspace/projects/nbcpu/workspace/topic/corpus
[2023-08-15 19:07:47,283][hy
```

## Model Results

The specialized Latent Dirichlet Allocation (LDA) model for classifying uncertainty was applied to a corpus of 27,594 documents containing 1,916,988 words. Of the 131,732 words in the full vocabulary, 1,102 were retained for the analysis after filtering, with the parameters set to focus on uncertainty-related topics.

### Key Findings

1. **Topic #0**: This topic prominently features terms directly related to uncertainty, such as "risk," "slow recovery," "recession," "uncertainty," and economic fluctuations like "cut," "hike," "fall," "increase," and "decline." It encapsulates the economic uncertainty and potential risks in recovery and growth.

2. **Topic #1**: This topic aligns with the prior emphasizing improvement and strengthening. Terms like "improve," "strengthen," "ensure," "achieve," and "enhance" reflect efforts to mitigate uncertainty by enhancing frameworks, addressing goals, and implementing reforms.

3. **Topic Coherence Scores**: The coherence scores are u_mass = -2.0798, c_uci = 0.4653, c_npmi = 0.0657, and c_v = 0.5938. For reference, u_mass is typically negative, with values closer to zero indicating higher coherence. c_npmi ranges from -1 to 1, and c_v usually falls between 0 and 1. Taken together, the scores suggest moderately interpretable topics, with room for further refinement.
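As a rough illustration of what the c_npmi score measures, the sketch below averages normalized pointwise mutual information, NPMI(wᵢ, wⱼ) = log(P(wᵢ, wⱼ) / (P(wᵢ)P(wⱼ))) / −log P(wᵢ, wⱼ), over all pairs of a topic's top words. It is a simplified variant that estimates probabilities from whole-document co-occurrence. gensim's `c_npmi` uses a sliding window over the text instead, so this sketch will not reproduce the reported 0.0657. The helper name `npmi_coherence` is hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set


def npmi_coherence(top_words: Sequence[str], docs: List[Set[str]]) -> float:
    """Average NPMI over all pairs of top_words, estimated from document co-occurrence."""
    if len(top_words) < 2 or not docs:
        raise ValueError("need at least two words and one document")
    n = len(docs)

    def prob(*words: str) -> float:
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in docs) / n

    scores = []
    for wi, wj in combinations(top_words, 2):
        p_ij = prob(wi, wj)
        if p_ij == 0.0:
            scores.append(-1.0)  # never co-occur: minimum NPMI by convention
        elif p_ij == 1.0:
            scores.append(1.0)   # co-occur in every document: maximum NPMI
        else:
            pmi = math.log(p_ij / (prob(wi) * prob(wj)))
            scores.append(pmi / -math.log(p_ij))
    return sum(scores) / len(scores)


docs = [{"risk", "uncertainty"}, {"risk", "uncertainty"}, {"growth"}, {"rate"}]
print(npmi_coherence(["risk", "uncertainty"], docs))  # ≈ 1.0: the pair always co-occurs
```

Pairs that never co-occur pull the average toward −1, and statistically independent pairs contribute 0. A small positive c_npmi therefore indicates only weak co-occurrence among a topic's top words.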

### Interpretation

The results suggest that the model identifies and separates uncertainty-related topics. The seed-word priors, the extended stop-word list, and the tuned vocabulary thresholds together yield themes that align with the concept of uncertainty.

Topic #0 captures economic uncertainty, including market volatility and downside risks. Topic #1, by contrast, reflects strategies and efforts to navigate and mitigate uncertainty.

Overall, the tailored configuration surfaces the themes of risk, recovery, and mitigation strategies in line with the research objectives. The coherence scores indicate that the topics are interpretable, although further refinement could improve the model's precision and depth of analysis.


{numref}`fig-lda-wordcloud-uncertainty` shows the wordcloud of the top 500 words in each topic from the LDA model with 20 topics and uncertainty prior.

```{figure} ./figs/LDA_model(3)_k(20)_wordcloud_00.png
---
name: fig-lda-wordcloud-uncertainty
---
Wordcloud of the top 500 words in each topic from the LDA model with 20 topics and uncertainty prior.
```


## Refinement of Uncertainty Topic Modeling

Uncertainty topic modeling is refined in two stages. First, documents are filtered by their relevance to uncertainty. Second, the topic model is reapplied to the filtered dataset. Narrowing the corpus this way sharpens the focus on uncertainty-related themes and supports a more detailed analysis.

In the first stage, documents are scored on their relevance to the uncertainty topics, topics 0 and 1, using the combined weight of these two topics in each document. The selected documents are then merged with the original dataset.

The second stage filters the merged data with the same selection criterion and prepares it for another round of topic modeling. This is done with the query `"topic_relevant > 0.5"`, which retains only documents whose `topic_relevant` value exceeds 0.5.
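The scoring, merging, and filtering steps can be sketched as follows. Several details here are assumptions for illustration only: the record layout with an `"id"` key, the `doc_topics` mapping from document id to topic weights, inner-join semantics for the merge, and the helper names `add_topic_relevance` and `filter_relevant`. None of them come from the project's pipeline.

```python
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def add_topic_relevance(
    records: List[Record],
    doc_topics: Dict[str, Sequence[float]],
    relevant_topics: Sequence[int] = (0, 1),
) -> List[Record]:
    """Attach `topic_relevant` = summed weight of the uncertainty topics (inner join on "id")."""
    merged = []
    for rec in records:
        weights = doc_topics.get(rec["id"])
        if weights is None:
            continue  # assumption: documents without topic weights are dropped
        score = sum(weights[t] for t in relevant_topics)
        merged.append({**rec, "topic_relevant": score})
    return merged


def filter_relevant(records: List[Record], threshold: float = 0.5) -> List[Record]:
    """Mirror of the query "topic_relevant > 0.5": strictly greater than the threshold."""
    return [r for r in records if r["topic_relevant"] > threshold]


records = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
doc_topics = {"a": [0.5, 0.25, 0.25], "b": [0.25, 0.25, 0.5], "c": [0.125, 0.125, 0.75]}
scored = add_topic_relevance(records, doc_topics)
print([(r["id"], r["topic_relevant"]) for r in filter_relevant(scored)])  # [('a', 0.75)]
```

Document `b` sits exactly at 0.5 and is dropped, because the query uses a strict inequality. If the dataset is held in a pandas DataFrame, the same filter reads `df.query("topic_relevant > 0.5")`.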

This refinement keeps the analysis focused on documents about risk, uncertainty, and mitigation strategies, which improves both the precision and the interpretability of the findings. Defining the steps as a pipeline configuration also makes the procedure systematic and reproducible.

The filtering task can be run with the following command (log output truncated):


```
!nbcpu +workflow=nbcpu tasks='[nbcpu-datasets_uncertainty_filter]' mode=__info__
```

```
[2023-08-15 19:16:38,472][hyfi.joblib.joblib][INFO] - initialized batcher with <hyfi.joblib.batch.batcher.Batcher object at 0x7f27d820e610>
[2023-08-15 19:16:38,473][hyfi.main.config][INFO] - HyFi project [nbcpu] initialized
[2023-08-15 19:16:38,669][hyfi.main.main][INFO] - The HyFI config is not instantiatable, running HyFI task with the config
[2023-08-15 19:16:39,529][hyfi.joblib.joblib][INFO] - initialized batcher with <hyfi.joblib.batch.batcher.Batcher object at 0x7f27b8350760>
[2023-08-15 19:16:40,726][hyfi.workflow.workflow][INFO] - Running task [nbcpu-datasets_uncertainty_filter] with [run={} verbose=False uses='nbcpu-datasets_uncertainty_filter']
[2023-08-15 19:16:40,750][hyfi.task.task][INFO] - Running 1 pipeline(s)
[2023-08-15 19:16:40,750][hyfi.task.task][INFO] - R
```