# Topic Modeling with Prior

In the second stage of the analysis, the research uses topic modeling with prior information to refine the topics relevant to central bank policy uncertainty in Cambodia's highly dollarized economy. Seeding the model with prior knowledge steers it toward the themes of interest rather than leaving topic discovery entirely unsupervised. The derived topics therefore align with the research objectives and give a sound basis for subsequent analysis and interpretation.

## Prior Information

The prior information is set to guide the topic modeling towards specific themes relevant to central bank policy uncertainty. The prior consists of two main groups:

- **Group 0**: Focuses on general economic indicators, including terms like 'price', 'inflation', 'growth', and 'economy'.
- **Group 1**: Concentrates on central banking aspects, with terms such as 'nbc', 'central_bank', 'national_bank', and 'national_bank_cambodia'.
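The two groups above can be pictured as per-word weight vectors over the topics, where each seed word gets a larger weight on the topic it should be drawn to. The following is a minimal sketch of that idea, not the project's actual implementation: the function name `build_word_priors` and the weights `1.0` and `0.1` are illustrative assumptions, and the word lists contain only the example terms named above. Some LDA libraries accept weights of this form (tomotopy's `LDAModel.set_word_prior` is one example); the text does not state which mechanism the pipeline uses.

```python
# Hypothetical sketch: turn seed-word groups into per-word topic weight vectors.
# The weights below are illustrative assumptions, not values from the study.

NUM_TOPICS = 10      # the model described in this section uses 10 topics
SEED_WEIGHT = 1.0    # assumed weight on the topic a seed word is steered toward
OTHER_WEIGHT = 0.1   # assumed weight on every other topic

# Example seed words from the text; the group index doubles as the target topic.
PRIOR_GROUPS = {
    0: ["price", "inflation", "growth", "economy"],
    1: ["nbc", "central_bank", "national_bank", "national_bank_cambodia"],
}


def build_word_priors(groups, num_topics, seed_weight, other_weight):
    """Return {word: [weight per topic]} with seed_weight on the group's topic."""
    priors = {}
    for topic_id, words in groups.items():
        if not 0 <= topic_id < num_topics:
            raise ValueError(f"group {topic_id} has no matching topic")
        for word in words:
            priors[word] = [
                seed_weight if k == topic_id else other_weight
                for k in range(num_topics)
            ]
    return priors


word_priors = build_word_priors(PRIOR_GROUPS, NUM_TOPICS, SEED_WEIGHT, OTHER_WEIGHT)
print(word_priors["inflation"][:3])  # weights for topics 0, 1 and 2
```

Words outside the two groups receive no entry, so under this sketch they would keep whatever default prior the model applies.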

## Configuration

The configuration of the topic modeling with prior is as follows:


    !nbcpu +model=nbcpu-topic_prior noop=1

    ## Command Line Interface for HyFI ##
    {'about': {'authors': 'Young Joon Lee <entelecheia@hotmail.com>',
               'description': 'Quantifying Central Bank Policy Uncertainty in a '
                              'Highly Dollarized Economy: A Topic Modeling '
                              'Approach',
               'homepage': 'https://nbcpu.entelecheia.ai',
               'license': 'MIT',
               'name': 'Measuring Central Bank Policy Uncertainty'},
     'debug_mode': False,
     'dryrun': False,
     'hydra_log_dir': '/home/yjlee/.hyfi/logs/hydra',
     'model': {'_config_group_': '/model',
               '_config_name_': 'lda',
               '_target_': 'thematos.models.lda.LdaModel',
               'autosave': True,
               'batch': {'_config_group_': '/batch',
                         '_config_name_': '__init__',
                         'batch_name': 'model',
                         'batch_num': None,
                         'batch_num_auto': False,
                         'batch_root': 'workspace/topic',
                         'confi

## Running the Workflow

The entire workflow can be executed using the following command:


    !nbcpu +workflow=nbcpu tasks='[nbcpu-topic_prior]' mode=__info__

    [2023-08-15 18:39:58,536][hyfi.joblib.joblib][INFO] - initialized batcher with <hyfi.joblib.batch.batcher.Batcher object at 0x7f15a43f6b80>
    [2023-08-15 18:39:58,536][hyfi.main.config][INFO] - HyFi project [nbcpu] initialized
    [2023-08-15 18:39:58,736][hyfi.main.main][INFO] - The HyFI config is not instantiatable, running HyFI task with the config
    [2023-08-15 18:39:59,574][hyfi.joblib.joblib][INFO] - initialized batcher with <hyfi.joblib.batch.batcher.Batcher object at 0x7f15785c1220>
    [2023-08-15 18:40:00,715][hyfi.task.batch][INFO] - Initalized batch: corpus(1) in /home/yjlee/workspace/projects/nbcpu/workspace/topic/corpus
    [2023-08-15 18:40:02,146][hyfi.task.batch][INFO] - Initalized batch: corpus(1) in /home/yjlee/workspace/projects/nbcpu/workspace/topic/corpus
    [2023-08-15 18:40:02,147][hy

## Model Results

The Latent Dirichlet Allocation (LDA) model was trained on a corpus of 27,594 documents containing 4,810,963 words, using 18,158 of the 126,469 terms in the total vocabulary. It was configured with 10 topics and explicit alpha and eta hyperparameters, and ran for 100 iterations with no burn-in steps and an optimization interval of 10.
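As a quick arithmetic check on these figures, the snippet below derives the share of the vocabulary the model retained and the average document length. It is only a sketch: it assumes the reported word count is the number of tokens across all documents as counted by the model, and the variable names are illustrative.

```python
# Figures reported in the text above.
num_docs = 27_594
num_words = 4_810_963
used_vocab = 18_158
total_vocab = 126_469

# Share of the full vocabulary that the model actually used.
vocab_share = used_vocab / total_vocab

# Average number of counted words per document (assumes the word
# count covers all 27,594 documents).
avg_doc_length = num_words / num_docs

print(f"Vocabulary retained: {vocab_share:.1%}")
print(f"Average document length: {avg_doc_length:.1f} words")
```

Roughly one term in seven from the full vocabulary enters the model, which is consistent with the rest of the vocabulary having been filtered out before training.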

The resultant topics reflect a diverse spectrum of subjects, including economics, finance, politics, technology, and social matters. Through the incorporation of prior information, the model was steered towards themes pertinent to central bank policy and economic indicators. Specifically:

- **Topic #0** aligns with the economic indicators prior (Group 0), emphasizing price, market growth, inflation, and global economic aspects.
- **Topic #1** covers the banking sector, including central banking, and mirrors the central banking prior (Group 1).
- **Topic #7** covers trade, investment, and regional cooperation, including ASEAN, and is indirectly related to economic policy and central banking.

Together, these topics show that the prior information helps the model home in on the areas of interest, namely central banking and economic indicators. The topics are coherent enough to be readily interpretable and relevant to the overall research focus.


{numref}`fig-lda-wordcloud-prior` shows a word cloud of the top 500 words in each topic from the LDA model with 10 topics and prior. The size of each word is proportional to its frequency in the topic.

```{figure} ./figs/LDA_model(0)_k(10)_wordcloud_00.png
---
name: fig-lda-wordcloud-prior
---
Word cloud of the top 500 words in each topic from the LDA model with 10 topics and prior. The size of each word is proportional to its frequency in the topic.
```
